Last lecture proved that coarse correlated equilibria are tractable via no-regret algorithms. This lecture extends that result to correlated equilibria using "no-swap-regret" algorithms, a refinement of no-regret. A polynomial-time reduction that turns any no-external-regret algorithm into a no-swap-regret algorithm is presented, establishing the tractability of correlated equilibria. Finally, the min-max theorem for two-player zero-sum games is derived using no-regret algorithms, which in turn implies the existence of mixed Nash equilibria in such games.

This segment describes the online learning setting with a known time horizon: on each day a single player picks a distribution over its actions, after which an adversary reveals a cost vector. The speaker introduces "swap regret," a notion defined analogously to external regret but measured against a stronger benchmark: the best switching function that replaces each action i with some other action δ(i), rather than the best single fixed action. Minimizing swap regret through dynamics leads to the correlated equilibrium concept; the formal definitions are collected in the block below.

This segment explains coarse correlated equilibria and their tractability via multiplicative weights and other no-regret algorithms (a minimal sketch of multiplicative weights also appears below). The speaker highlights that if all players use such algorithms, the joint play converges to the set of coarse correlated equilibria, the largest of the four equilibrium sets discussed.

The speaker then introduces correlated equilibria, a more refined equilibrium concept than coarse correlated equilibria. While they require more sophisticated algorithms (no-swap-regret algorithms), correlated equilibria are still tractable: if all players use these refined algorithms, the history of joint play converges to the smaller set of correlated equilibria. The discussion also notes that this convergence happens within polynomially many rounds.

This segment compares two approaches to achieving tractability in game theory: linear programming and learning dynamics. Linear programming computes exact solutions in polynomial time, while learning dynamics yield only approximate solutions but better reflect how players actually behave in games. The speaker weighs the trade-offs between these two methods.

This segment introduces the reduction from no-external-regret algorithms to no-swap-regret algorithms. The high-level idea is to run n copies of a no-regret algorithm, one for each action, inside a master no-swap-regret algorithm; each sub-algorithm M_j is responsible for protecting against deviations from action j. The segment sets the stage for the detailed explanation of the reduction (a code sketch follows below).

This segment highlights the core challenge remaining in the proof: making the left-hand side of the main inequality match the right-hand side. Everything hinges on choosing the consensus distribution p_t appropriately, which sets the stage for the final trick. The ingenious solution uses Markov chains to compute p_t: the speaker introduces a Markov chain whose transition probabilities are derived from the distributions q^j proposed by the sub-algorithms, hinting that the desired equation is exactly the condition that p_t be a stationary distribution of this chain.

This segment lays out the objective of proving that the master algorithm has no swap regret, defining key terms like the expected cost of the master and the counterfactual cost under a switching function. It sets the stage by outlining the necessary conditions and the approach of leveraging the no-external-regret property of each individual algorithm.
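For reference, the two regret notions discussed above can be written out as follows, under the lecture's convention that the player plays distribution p_t over n actions on day t and then sees cost vector c_t; the symbols T, [n], and δ follow the standard presentation rather than necessarily matching the lecture's exact notation.

```latex
% External regret: benchmark is the best single fixed action in hindsight.
\mathrm{ExtRegret}_T
  = \sum_{t=1}^{T} \langle p_t, c_t \rangle
  - \min_{i \in [n]} \sum_{t=1}^{T} c_t(i)

% Swap regret: benchmark is the best switching function
% \delta : [n] \to [n], replacing every play of action i by \delta(i).
\mathrm{SwapRegret}_T
  = \sum_{t=1}^{T} \langle p_t, c_t \rangle
  - \min_{\delta : [n] \to [n]} \sum_{t=1}^{T} \sum_{i=1}^{n} p_t(i)\, c_t(\delta(i))
```

External regret is the special case in which δ is restricted to constant functions, which is why swap regret is the stronger notion and an algorithm with vanishing swap regret also has vanishing external regret.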
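Here is a minimal sketch of the multiplicative weights algorithm referred to above, assuming the full-information setting with cost vectors in [0, 1]; the function name and the fixed step size eta are illustrative choices, not the lecture's.

```python
import numpy as np

def multiplicative_weights(cost_vectors, eta=0.1):
    """Hedge / multiplicative weights: a no-external-regret algorithm.

    cost_vectors: iterable of length-n numpy arrays with entries in [0, 1],
    revealed one per day by the adversary.
    Returns the sequence of distributions played.
    """
    cost_vectors = list(cost_vectors)
    n = len(cost_vectors[0])
    weights = np.ones(n)
    history = []
    for c in cost_vectors:
        p = weights / weights.sum()   # distribution played this day
        history.append(p)
        weights *= (1.0 - eta) ** c   # penalize each action by its cost
    return history
```

With eta on the order of sqrt(ln n / T), the external regret is O(sqrt(T log n)), which vanishes after dividing by T.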
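And here is a sketch of the master construction just described, with one multiplicative-weights copy per action and the consensus distribution computed as a stationary distribution; the class name SwapRegretMaster and the eigenvector-based helper are illustrative, and the proof that this master has no swap regret is exactly what the next segments walk through.

```python
import numpy as np

def stationary_distribution(Q):
    """A distribution p with p = p @ Q, i.e. p(i) = sum_j p(j) * Q[j, i].

    Computed as a left eigenvector of Q for eigenvalue 1; one always
    exists because every row of Q is a probability distribution.
    """
    vals, vecs = np.linalg.eig(Q.T)
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    p = np.abs(p)
    return p / p.sum()

class SwapRegretMaster:
    """Master algorithm built from n no-external-regret sub-algorithms.

    Sub-algorithm M_j (row j of `weights`) protects against deviations
    from action j; here each M_j is a multiplicative-weights copy.
    """

    def __init__(self, n, eta=0.1):
        self.eta = eta
        self.weights = np.ones((n, n))  # row j: action weights kept by M_j

    def play(self):
        # Row j of Q is the distribution q^j proposed by M_j; the master
        # plays the stationary distribution of the Markov chain Q.
        Q = self.weights / self.weights.sum(axis=1, keepdims=True)
        self.p = stationary_distribution(Q)  # consensus distribution p_t
        return self.p

    def observe(self, cost):
        # M_j is charged the true cost vector scaled by p_t(j).
        for j, pj in enumerate(self.p):
            self.weights[j] *= (1.0 - self.eta) ** (pj * cost)
```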
This section focuses on the perspective of each individual no-regret algorithm M_j, detailing how it perceives its expected cost based on the scaled cost vectors it receives: on day t, M_j is charged the true cost vector c_t scaled by p_t(j), the probability the master assigned to action j. It introduces the crucial inequality comparing the algorithm's perceived cost to what that cost would have been had it played one fixed action every day, laying the groundwork for the summation step.

This segment sums the inequalities derived in the previous segment over all n no-regret algorithms. A careful examination of the regret terms shows that they vanish after time-averaging as the horizon grows, bringing the proof close to its conclusion.

This segment fully explains the Markov chain construction: the chain moves from state j to state i with probability q^j(i), so its stationary distribution satisfies p_t(i) = Σ_j p_t(j) q^j(i), exactly the equation needed to complete the proof. It connects this theoretical device to the algorithm's practical implementation and concludes with the final result: the master algorithm has no swap regret.

This segment gives a high-level interpretation of the result, reading the chosen consensus distribution p_t as the outcome of iteratively asking different no-regret algorithms for advice and following it. It summarizes the overall achievement: correlated equilibria are tractable in this context.

This segment explores the counterintuitive min-max theorem, which states that in two-player zero-sum games it does not matter whether the players move simultaneously or sequentially: going second confers no inherent advantage. The discussion contrasts the intuitive first-mover disadvantage with the theorem's surprising conclusion, highlighting the historical significance of, and initial disbelief surrounding, this result.

This segment introduces a novel approach to proving the min-max theorem using no-external-regret algorithms. Since the inequality max-min ≤ min-max holds trivially (moving second can only help), the strategy is to focus on the reverse inequality and use the properties of no-regret algorithms to show that the two sides of the min-max equation coincide (the inequality chain is worked out below).

This segment delves into the mechanics of the no-regret algorithms in the zero-sum setting: each player's payoff vector on a given day is induced by the opponent's current mixed strategy, and these vectors are fed into the algorithms, which guide the players' strategies. The significance of the time-averaged strategies is emphasized.

This segment demonstrates that the time-averaged strategies generated by the no-regret algorithms satisfy the min-max theorem up to a small error ε. The speaker meticulously explains how the no-regret condition ensures that neither player could have improved their time-averaged payoff by more than ε using any fixed strategy.

This segment connects the min-max theorem to the concept of approximate Nash equilibrium. The no-regret algorithms generate time-averaged strategies that form an approximate Nash equilibrium, and taking the limit as the error approaches zero proves the existence of an exact Nash equilibrium, offering an alternative, more accessible proof than the traditional fixed-point arguments (a simulation sketch appears at the end of these notes).
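The inequality chain behind this argument can be reconstructed as follows, assuming A is the row player's payoff matrix (the row player maximizes x^T A y while the column player minimizes it) and both players' external regret after T rounds is at most εT; the symbols are the standard ones, not necessarily the lecture's.

```latex
% Time-averaged strategies and the average realized payoff.
\hat{x} = \tfrac{1}{T}\sum_{t=1}^{T} x_t, \qquad
\hat{y} = \tfrac{1}{T}\sum_{t=1}^{T} y_t, \qquad
v = \tfrac{1}{T}\sum_{t=1}^{T} x_t^{\top} A\, y_t

% Row player (maximizer) has no regret:    v \ge \max_x x^{\top} A\, \hat{y} - \epsilon
% Column player (minimizer) has no regret: v \le \min_y \hat{x}^{\top} A\, y + \epsilon
% Chaining these:
\min_y \max_x x^{\top} A\, y
  \le \max_x x^{\top} A\, \hat{y}
  \le v + \epsilon
  \le \min_y \hat{x}^{\top} A\, y + 2\epsilon
  \le \max_x \min_y x^{\top} A\, y + 2\epsilon
```

Combined with the trivial direction max-min ≤ min-max and letting ε go to zero, the two sides must be equal, and (x̂, ŷ) is a 2ε-approximate Nash equilibrium.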
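Finally, a runnable sketch of these dynamics on matching pennies: both players run multiplicative weights (here in its payoff/gain form), and the time-averaged strategies come out close to the uniform equilibrium, with an exploitability gap that shrinks as T grows. The function name, step size, and horizon are illustrative choices, not the lecture's.

```python
import numpy as np

def zero_sum_no_regret(A, T=5000, eta=0.05):
    """Two multiplicative-weights learners playing a zero-sum game.

    A[i, j]: payoff to the row player (paid by the column player).
    Returns the time-averaged strategies, which form an approximate
    min-max pair / approximate Nash equilibrium.
    """
    m, n = A.shape
    wx, wy = np.ones(m), np.ones(n)
    xs, ys = [], []
    for _ in range(T):
        x, y = wx / wx.sum(), wy / wy.sum()
        xs.append(x)
        ys.append(y)
        # Each learner sees the payoff vector induced by the opponent's mix.
        wx *= (1.0 + eta) ** (A @ y)     # row player maximizes payoff
        wy *= (1.0 - eta) ** (A.T @ x)   # column player minimizes cost
    return np.mean(xs, axis=0), np.mean(ys, axis=0)

# Matching pennies: the unique equilibrium is uniform play, value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = zero_sum_no_regret(A)
print(x_bar, y_bar)                              # both close to [0.5, 0.5]
print((A @ y_bar).max() - (A.T @ x_bar).min())   # exploitability gap -> 0
```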