The CS205 midterm was hard, but the grades are generously curved. Homework 4 is out, and next week covers the conjugate gradient algorithm; read Jonathan Shewchuk's paper beforehand. The lecture covers optimization: Newton's method, bisection, unimodal functions, golden section search, gradient descent, and quasi-Newton methods. Derivative-free methods are discussed because computing Hessians can be expensive.

The professor acknowledges the difficulty of the midterm, reassures students about the generous grading curve, and notes that while midterm scores are final, the exam counts for only a small percentage of the final grade. Class participation earns extra credit and can significantly improve the overall grade: attending lectures, engaging online, and helping fellow students can compensate for a lower midterm score. He also suggests that the final exam may be of similar difficulty and advises reviewing the midterm problems that many students missed, since these might reappear on the final.

The lecture introduces the concept of optimization, focusing on minimizing functions. The instructor connects this to variational problems, where minimizing an energy function yields a desired property, illustrating the concept's broad applicability. The lecture then transitions from single-variable to multivariable optimization, introducing the gradient as the direction of steepest ascent (and the negative gradient as the direction of steepest descent), which forms the basis of iterative optimization via gradient descent.

This segment explains the gradient descent method for optimization, detailing how to move toward a minimum by repeatedly following the negative gradient. It introduces the line search, a crucial step in gradient descent in which the step size t along the descent direction is chosen by minimizing a one-variable function, effectively finding the best point along that line before the gradient is recomputed and the process repeats. The explanation clarifies the iterative nature of the algorithm and the importance of the line search for efficient convergence.

This segment delves into Newton's method for optimization, which uses the Hessian matrix (the matrix of second derivatives) to approximate the function by a quadratic near the current point. Minimizing that quadratic approximation yields an update rule involving the inverse of the Hessian applied to the gradient. The discussion emphasizes checking that the Hessian is positive definite, so the iteration converges toward a minimum rather than a maximum or saddle point, and introduces the Cholesky factorization as a practical way to verify positive definiteness. Short illustrative sketches of golden section search, gradient descent with a line search, and Newton's method follow below.
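
As an illustration of the one-dimensional methods listed above, here is a minimal sketch of golden section search for a unimodal function. The interval, tolerance, and test function are illustrative assumptions, not taken from the lecture or homework.

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal f on [a, b] by shrinking the bracket by 1/phi each step."""
    invphi = (math.sqrt(5) - 1) / 2            # 1/phi ~ 0.618
    c = b - invphi * (b - a)                   # interior points, c < d
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):                        # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                                  # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
        # (Caching f(c) and f(d) would halve the function evaluations;
        # omitted here to keep the sketch short.)
    return (a + b) / 2

print(golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0))   # ~ 2.0
```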
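
The gradient descent with line search described above can be sketched as follows. The quadratic test function, the parameter names, and the use of SciPy's one-dimensional minimizer for the line search are assumptions for illustration, not the lecture's exact setup.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Step along the negative gradient; the step size t is chosen by
    minimizing the one-variable function phi(t) = f(x - t * grad(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # gradient ~ 0: stop at a (local) minimum
            break
        # Line search: minimize f along the ray x - t*g (bounds are illustrative).
        phi = lambda t: f(x - t * g)
        t = minimize_scalar(phi, bounds=(0.0, 1.0), method="bounded").x
        x = x - t * g                          # take the step, then recompute the gradient
    return x

# Example: a quadratic bowl with minimum at (1, 2).
f = lambda x: (x[0] - 1.0) ** 2 + 4.0 * (x[1] - 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0)])
print(gradient_descent(f, grad, x0=[0.0, 0.0]))   # ~ [1. 2.]
```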
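
Finally, a minimal sketch of the Newton step described above: minimizing the local quadratic model gives the update x_{k+1} = x_k - H^{-1} grad f(x_k), and an attempted Cholesky factorization serves as the positive-definiteness check. The simple quadratic example and all parameter names are illustrative assumptions.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        # Cholesky succeeds iff H is positive definite, i.e. the local
        # quadratic model has a minimum (not a maximum or saddle point).
        try:
            np.linalg.cholesky(H)
        except np.linalg.LinAlgError:
            raise ValueError("Hessian is not positive definite at this iterate")
        # Solve H p = -g rather than forming H^{-1} explicitly
        # (in practice one would reuse the Cholesky factor here).
        p = np.linalg.solve(H, -g)
        x = x + p
    return x

# Example: f(x) = (x0 - 1)^2 + (x1 - 3)^2, minimized in one Newton step.
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 3.0)])
hess = lambda x: np.diag([2.0, 2.0])
print(newton_minimize(grad, hess, x0=[0.0, 0.0]))   # ~ [1. 3.]
```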