This segment introduces limited-memory algorithms as a remedy for the memory cost of updating the Jacobian matrix in Newton's method. The speaker explains the strategy of resetting or reconstructing the Jacobian after a fixed number of iterations to keep the method efficient and avoid excessive memory usage.

Several general points frame the discussion. Many root-finding methods (e.g., Newton's) require Jacobian computations, and to avoid expensive Jacobian inversions, limited-memory methods update an approximation (e.g., via the Sherman-Morrison formula) rather than recomputing the full Jacobian. Automatic differentiation provides exact Jacobians, albeit more slowly, by augmenting variables with their derivatives. Optimization frequently amounts to minimizing an energy function; closed-form solutions are ideal, but iterative methods become necessary for complicated objectives (e.g., non-linear least squares, maximum likelihood estimation). Analyzing the Hessian matrix determines whether a stationary point (gradient = 0) is a minimum or a maximum.

This segment introduces the Sherman-Morrison formula, a crucial tool in optimization that gives an efficient way to update the inverse of a matrix after a rank-one change. The speaker emphasizes the formula's practical value and its widespread use across optimization algorithms.

This segment focuses on two main challenges in Newton's method: differentiating the objective and the computational cost of inverting the Jacobian. The speaker highlights secant methods for approximating derivatives and previews a cleverer way to handle the inversion cost, developed later in the Sherman-Morrison discussion.

This segment gives a clear, concise explanation of global and local minima in optimization. It first defines the concepts intuitively, then moves to a formal mathematical description, stressing why these definitions matter for solving optimization problems. The discussion also touches on non-differentiable functions, noting that the definitions remain robust and applicable across a wide range of function types.

This segment explains maximum likelihood estimation (MLE) for estimating the parameters of probability distributions and notes its prevalence in machine learning. It then introduces the difficulties that arise with complex distributions or high-dimensional parameter spaces, which typically demand advanced optimization techniques. The discussion moves to real-world examples, such as fitting a distribution to student heights and the complexities of Bayesian networks, showing both the practical applications and the obstacles encountered.

This segment delves into the geometric median problem, a non-trivial optimization task: find the point that minimizes the sum of distances to a given set of points in the plane. The speaker contrasts it with the simpler least-squares problem, highlighting the computational difficulty introduced by dropping the squaring. The discussion then covers properties of the geometric median, including its connection to the median on a number line and its dependence on relatively few of the closest points, giving useful insight into this particular problem and its broader implications.

This segment discusses the limitations of approximate Jacobians, especially for large steps or complicated functions.
It sets the stage for the introduction of automatic differentiation as a way to obtain accurate Jacobians without manual computation. Short illustrative sketches of several ideas from these segments follow below.
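
As a minimal numerical sketch of the Sherman-Morrison update described above, the snippet below compares the rank-one-updated inverse against a direct inversion; the matrix and vectors are arbitrary values chosen for illustration, not data from the lecture.

```python
import numpy as np

# Sherman-Morrison: (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u).
# The matrix and vectors below are arbitrary illustrative values.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned example matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)

# Rank-one update of the inverse in O(n^2) instead of a fresh O(n^3) inversion.
Au = A_inv @ u
vA = v @ A_inv
updated_inv = A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

# Check against directly inverting the updated matrix.
direct = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated_inv, direct))   # True
```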
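The secant idea mentioned in the Newton's-method segment is easiest to see in one dimension: the derivative in Newton's update is replaced by the slope through the two most recent iterates. This is a generic sketch; the test function and starting points are my own choices.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Find a root of f by replacing f'(x) in Newton's update
    with the slope through the two most recent iterates."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1 - f0) < 1e-30:          # avoid dividing by ~0
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Example: root of x^3 - 2x - 5, found without ever evaluating a derivative.
root = secant(lambda x: x**3 - 2*x - 5, 2.0, 3.0)
print(root)   # ~2.0945514815
```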
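For reference, the formal definitions of global and local minimizers, together with the second-order (Hessian) condition at a stationary point, can be written as follows; the notation is standard rather than taken from the lecture.

```latex
% Global vs. local minimizers of f : R^n -> R, and the second-order test.
\[
x^{*} \text{ is a global minimizer} \;\iff\; f(x^{*}) \le f(x)
\quad \text{for all } x \in \mathbb{R}^{n},
\]
\[
x^{*} \text{ is a local minimizer} \;\iff\; \exists\, \varepsilon > 0
\text{ such that } f(x^{*}) \le f(x)
\text{ whenever } \lVert x - x^{*} \rVert < \varepsilon.
\]
\[
\text{At a stationary point } \nabla f(x^{*}) = 0:\qquad
\nabla^{2} f(x^{*}) \succ 0 \;\Rightarrow\; \text{local minimum},
\qquad
\nabla^{2} f(x^{*}) \prec 0 \;\Rightarrow\; \text{local maximum}.
\]
```

An indefinite Hessian indicates a saddle point, and a semidefinite one leaves the test inconclusive; note also that the definitions in the first two displays require no differentiability at all, which matches the segment's point about non-differentiable functions.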
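To make the MLE example concrete, the sketch below fits a Gaussian to a handful of hypothetical height measurements (the numbers are invented). For a Gaussian the maximum-likelihood estimates have closed forms, the sample mean and the 1/N variance, which is exactly the easy case that precedes the harder distributions the segment mentions.

```python
import numpy as np

# Hypothetical height measurements in cm (invented for illustration).
heights = np.array([158.0, 162.5, 170.2, 175.4, 168.9, 181.3, 165.0, 172.8])

# For a Gaussian, maximizing the log-likelihood has a closed form:
# mu_hat is the sample mean, sigma_hat^2 is the (biased, 1/N) variance.
mu_hat = heights.mean()
sigma_hat = np.sqrt(((heights - mu_hat) ** 2).mean())

# Log-likelihood at the estimate, to make the objective being maximized explicit.
n = heights.size
log_lik = (-0.5 * n * np.log(2 * np.pi * sigma_hat**2)
           - ((heights - mu_hat) ** 2).sum() / (2 * sigma_hat**2))
print(mu_hat, sigma_hat, log_lik)
```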
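One standard iterative scheme for the geometric median is Weiszfeld's algorithm, a fixed-point iteration with inverse-distance weights; the segment does not name a specific algorithm, so this is just an illustrative choice, and the points below are arbitrary.

```python
import numpy as np

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration: a fixed-point scheme for the point that
    minimizes the sum of Euclidean distances to the given points."""
    y = points.mean(axis=0)                      # start at the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(points - y, axis=1)
        if np.any(d < 1e-12):                    # iterate landed on a data point
            return y
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Arbitrary planar points for illustration.
pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 6.0]])
print(geometric_median(pts))
```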
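"Augmenting variables with derivatives" is the idea behind forward-mode automatic differentiation with dual numbers. The sketch below is a minimal scalar version supporting only addition and multiplication; it is my own illustration, not code from the lecture.

```python
class Dual:
    """Minimal forward-mode AD: each value carries its derivative along."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule propagates the derivative exactly, not approximately.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1      # example function; f'(x) = 6x + 2

x = Dual(2.0, 1.0)                    # seed the input's derivative with 1
y = f(x)
print(y.val, y.dot)                   # 17.0 and 14.0 (exact derivative at x = 2)
```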