Posts by Collection

publications

Stochastic Gradient Descent Works Really Well for Stress Minimization

Published in Graph Drawing and Network Visualization, 2021

Abstract: Stress minimization is among the best studied force-directed graph layout methods because it reliably yields high-quality layouts. It thus comes as a surprise that a novel approach based on stochastic gradient descent (Zheng, Pawar and Goodman, TVCG 2019) is claimed to improve on state-of-the-art approaches based on majorization. We present experimental evidence that the new approach does not actually yield better layouts, but that it is still to be preferred because it is simpler and robust against poor initialization.

Recommended citation: Katharina Börsig, Ulrik Brandes, Barna Pásztor (2021). "Stochastic Gradient Descent Works Really Well for Stress Minimization." Graph Drawing and Network Visualization. https://link.springer.com/chapter/10.1007/978-3-030-68766-3_2
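
As a rough illustration of the stochastic gradient descent approach to stress minimization referred to in the abstract above, the following is a minimal sketch and not the code used in the paper. It assumes 2D positions, pair weights w_ij = d_ij^-2, and an exponentially decaying step size. The function names `stress` and `sgd_layout` and the default values for `iterations` and `eta_min` are hypothetical choices made for this example.

```python
import math
import random


def stress(pos, dist):
    """Weighted stress: sum over pairs of (||x_i - x_j|| - d_ij)^2 / d_ij^2."""
    total = 0.0
    for (i, j), d in dist.items():
        gap = math.dist(pos[i], pos[j]) - d
        total += (gap * gap) / (d * d)
    return total


def sgd_layout(pos, dist, iterations=30, eta_min=0.1, seed=0):
    """Move one node pair at a time toward its target distance.

    pos  maps node -> (x, y); the input is copied, not modified.
    dist maps (i, j) -> target distance d_ij > 0 (e.g. shortest-path length).
    """
    rng = random.Random(seed)
    pos = {v: list(p) for v, p in pos.items()}
    pairs = list(dist.items())
    # Assumed schedule: start at 1 / w_min = d_max^2 and decay to eta_min.
    eta_max = max(d for _, d in pairs) ** 2
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)
        rng.shuffle(pairs)  # visit the pairs in random order each sweep
        for (i, j), d in pairs:
            mu = min(eta / (d * d), 1.0)  # capped step: w_ij * eta, at most 1
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            norm = math.hypot(dx, dy)
            if norm == 0.0:
                continue  # coincident nodes have no defined direction
            r = mu * (norm - d) / (2.0 * norm)
            pos[i][0] -= r * dx
            pos[i][1] -= r * dy
            pos[j][0] += r * dx
            pos[j][1] += r * dy
    return pos


if __name__ == "__main__":
    # Shortest-path distances of a 4-cycle 0-1-2-3-0.
    cycle = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0,
             (0, 2): 2.0, (1, 3): 2.0}
    start = {0: (0.0, 0.0), 1: (0.1, 0.3), 2: (0.2, -0.1), 3: (0.5, 0.2)}
    final = sgd_layout(start, cycle)
    print("stress before:", stress(start, cycle))
    print("stress after: ", stress(final, cycle))
```

With a step cap of 1, a single update places the two nodes exactly at their target distance, which is one way to see why this kind of update tolerates poor initial positions.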

On the impact of publicly available news and information transfer to financial markets

Published in Royal Society Open Science, 2021

Abstract: We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a non-profit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of US companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the US stock market. Furthermore, we analyse and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings support the view that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.

Recommended citation: Metod Jazbec, Barna Pásztor, Felix Faltings, Nino Antulov-Fantulin, Petter N Kolm (2021). "On the impact of publicly available news and information transfer to financial markets." Royal Society Open Science. https://royalsocietypublishing.org/doi/full/10.1098/rsos.202321

Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning

Published in Transactions on Machine Learning Research, 2023

Abstract: Learning in multi-agent systems is highly challenging due to several factors including the non-stationarity introduced by agents’ interactions and the combinatorial nature of their state and action spaces. In particular, we consider the Mean-Field Control (MFC) problem which assumes an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward. In many cases, solutions of an MFC problem are good approximations for large systems, hence, efficient learning for MFC is valuable for the analogous discrete agent setting with many agents. Specifically, we focus on the case of unknown system dynamics where the goal is to simultaneously optimize for the rewards and learn from experience. We propose an efficient model-based reinforcement learning algorithm, M3-UCRL, that runs in episodes, balances between exploration and exploitation during policy learning, and provably solves this problem. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC, obtained via a novel mean-field type analysis. To learn the system’s dynamics, M3-UCRL can be instantiated with various statistical models, e.g., neural networks or Gaussian Processes. Moreover, we provide a practical parametrization of the core optimization problem that facilitates gradient-based optimization techniques when combined with differentiable dynamics approximation methods such as neural networks.

Recommended citation: Barna Pásztor, Andreas Krause, Ilija Bogunovic (2023). "Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning." Transactions on Machine Learning Research. https://openreview.net/pdf?id=gvcDSDYUZx

Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning

Published in AAMAS 2024, 2024

Abstract: Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M3-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. We showcase Safe-M3-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.

Recommended citation: Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco Corman, Andreas Krause, Ilija Bogunovic (2024). "Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning." Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS '24). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 973–982. https://dl.acm.org/doi/10.5555/3635637.3662952

Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes

Published in NeurIPS 2024, 2024

Abstract: In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader’s control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward shaping and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers’ trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm.

Recommended citation: Vinzenz Thoma, Barna Pásztor, Andreas Krause, Giorgia Ramponi, Yifan Hu (2024). "Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/e66309ead63bc1410d2df261a28f602d-Abstract-Conference.html

Bandits with Preference Feedback: A Stackelberg Game Perspective

Published in NeurIPS 2024, 2024

Abstract: Bandits with preference feedback present a powerful tool for optimizing unknown target functions when only pairwise comparisons are allowed instead of direct value queries. This model allows for incorporating human feedback into online inference and optimization and has been employed in systems for tuning large language models. The problem is well understood in simplified settings with linear target functions or over finite small domains that limit practical interest. Taking the next step, we consider infinite domains and nonlinear (kernelized) rewards. In this setting, selecting a pair of actions is quite challenging and requires balancing exploration and exploitation at two levels: within the pair, and along the iterations of the algorithm. We propose MaxMinLCB, which emulates this trade-off as a zero-sum Stackelberg game and chooses action pairs that are informative and yield favorable rewards. MaxMinLCB consistently outperforms existing algorithms and satisfies an anytime-valid rate-optimal regret guarantee. This is due to our novel preference-based confidence sequences for kernelized logistic estimators.

Recommended citation: Barna Pásztor, Parnian Kassraie, Andreas Krause (2024). "Bandits with Preference Feedback: A Stackelberg Game Perspective." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/1646e34971facbcda3727d1dc28ab635-Abstract-Conference.html

Learning Collusion in Episodic, Inventory-Constrained Markets

Published in AAMAS 2025, 2025

Abstract: Pricing algorithms have demonstrated the capability to learn tacit collusion that is largely unaddressed by current regulations. Their increasing use in markets, including oligopolistic industries with a history of collusion, calls for closer examination by competition authorities. In this paper, we extend the study of tacit collusion in learning algorithms from basic pricing games to more complex markets characterized by perishable goods with fixed supply and sell-by dates, such as airline tickets, perishables, and hotel rooms. We formalize collusion within this framework and introduce a metric based on price levels under both the competitive (Nash) equilibrium and collusive (monopolistic) optimum. Since no analytical expressions for these price levels exist, we propose an efficient computational approach to derive them. Through experiments, we demonstrate that deep reinforcement learning agents can learn to collude in this more complex domain. Additionally, we analyze the underlying mechanisms and structures of the collusive strategies these agents adopt.

Recommended citation: Paul Friedrich, Barna Pásztor, Giorgia Ramponi (2025). "Learning Collusion in Episodic, Inventory-Constrained Markets." AAMAS 2025. https://ifaamas.csc.liv.ac.uk/Proceedings/aamas2025/pdfs/p803.pdf

talks

Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning

Published: June 06, 2023

Recording: YouTube
Slides: PDF

teaching

Teaching Assistant for Algorithmic Game Theory and Mechanism Design

Master's course, University of Zurich, Department of Informatics, 2022

Content

Game Theory
P2P File Sharing
Mechanism Design
Online Advertising Auctions
Linear and Integer Programming
Combinatorial Auctions
Matching Markets
Computational Social Choice