Suppose we have a multivariate objective function $f : \mathbb{R}^n \to \mathbb{R}$ that we want to minimize.
Choose an initial vector $\mathbf{x}^{(1)}$, and then at each iteration $k$:
- Determine a direction vector $\mathbf{d}^{(k)}$
- Determine a scalar step length $\alpha^{(k)}$
- Compute the next design point $\mathbf{x}^{(k+1)}$ using the update formula $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$
Thus, we are creating a sequence of vectors $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \mathbf{x}^{(3)}, \dots$ that (ideally) converges to a local minimum.
One popular method or algorithm is gradient descent, which uses the negative gradient as the descent direction: $\mathbf{d}^{(k)} = -\nabla f(\mathbf{x}^{(k)})$.
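As a minimal sketch of this update loop, here is gradient descent with a fixed step size (the step size, iteration count, and quadratic test function below are illustrative assumptions, not from the source):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, k_max=1000):
    """Gradient descent with a fixed step size alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(k_max):
        d = -grad_f(x)      # direction: negative gradient
        x = x + alpha * d   # update: x_{k+1} = x_k + alpha * d_k
    return x

# Illustrative example: f(x) = ||x||^2, whose gradient is 2x; the minimum is at the origin.
print(gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0]))
```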
Reference
Sections 4.1–4.3 in Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for Optimization. MIT Press.
Algorithm
Usage: In Descent Methods
Line search is typically integrated into a descent method (e.g., gradient descent, the Newton-Raphson method, conjugate gradient) to determine the optimal step size $\alpha^{(k)}$ for each iteration.
- Initialize at the starting point $\mathbf{x}^{(1)}$.
- Compute the direction $\mathbf{d}^{(k)}$ for descent, typically the negative gradient direction in gradient descent or other calculated directions in other methods.
- Perform line search to find the optimal step size $\alpha^{(k)}$ that minimizes the function along the direction $\mathbf{d}^{(k)}$: $\alpha^{(k)} = \arg\min_{\alpha} f(\mathbf{x}^{(k)} + \alpha \mathbf{d}^{(k)})$.
- Update the point: $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$.
- Repeat until convergence.
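As a rough illustration, here is a minimal sketch of this loop in Python, using SciPy's `minimize_scalar` for the one-dimensional minimization over $\alpha$; the test function, tolerance, and iteration limit are assumptions made for the example, not part of the source:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def descent_with_line_search(f, grad_f, x0, eps=1e-8, k_max=1000):
    """Gradient descent where each step size is chosen by a 1-D line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(k_max):
        d = -grad_f(x)                         # descent direction (negative gradient)
        if np.linalg.norm(d) < eps:            # stop when the gradient is (almost) zero
            break
        phi = lambda alpha: f(x + alpha * d)   # objective restricted to the search ray
        alpha = minimize_scalar(phi).x         # line search: alpha = argmin_a phi(a)
        x = x + alpha * d                      # update the point
    return x

# Illustrative example: f(x) = (x1 - 1)^2 + 10 * x2^2, minimized at (1, 0).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * x[1]])
print(descent_with_line_search(f, grad_f, [5.0, 5.0]))
```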
Termination conditions
Given a small number $\epsilon > 0$:
Absolute Improvement
The algorithm terminates when the difference in function value between two consecutive iterations falls below this threshold: $f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)}) < \epsilon$.
Relative Improvement
With this, the descent method terminates when the relative change in the function value between iterations becomes negligible, i.e., smaller than a small fraction of the current function value: $f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)}) < \epsilon \, |f(\mathbf{x}^{(k)})|$.
This condition is useful when the magnitude of the objective function is large. If the function values are large, an improvement that exceeds a fixed absolute threshold may still be insignificant relative to the current value, so scaling the threshold by $|f(\mathbf{x}^{(k)})|$ stops the algorithm once further progress is negligible; conversely, when the function values are small, it prevents premature stopping while the function is still improving at a reasonable rate.
Gradient Magnitude
This condition stops the descent method when the magnitude of the gradient becomes nearly zero: $\|\nabla f(\mathbf{x}^{(k+1)})\| < \epsilon$.
This condition is a strong indicator of convergence, but requires the objective function to be differentiable.
Max Iterations
This condition limits the maximum number of iterations the algorithm can run, regardless of the other conditions. It is an important safety stop to prevent the algorithm from running indefinitely, especially in situations where convergence is slow or not guaranteed (e.g. in gradient descent).
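A sketch of how these conditions might be combined into a single stopping check (the default tolerance and iteration cap are illustrative assumptions):

```python
import numpy as np

def should_terminate(f_prev, f_curr, grad_curr, k, eps=1e-6, k_max=1000):
    """Return True if any of the usual stopping criteria is met."""
    if abs(f_prev - f_curr) < eps:                  # absolute improvement
        return True
    if abs(f_prev - f_curr) < eps * abs(f_prev):    # relative improvement
        return True
    if np.linalg.norm(grad_curr) < eps:             # gradient magnitude
        return True
    if k >= k_max:                                  # max iterations (safety stop)
        return True
    return False
```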
Why it’s good to know different termination conditions
- Flexibility: Different optimization problems may require different types of termination criteria. For example, some problems may require higher precision in the objective function value (where absolute improvement might be important), while others may be more sensitive to the gradient (where gradient magnitude might be the priority).
- Robustness: By using multiple conditions, the algorithm can be made more robust, ensuring that it neither terminates too early nor runs for too long. For example, in non-convex optimization problems the gradient magnitude alone may not be sufficient, and a combination of conditions like relative improvement and max iterations can provide a better stopping criterion.
- Efficiency: Different termination conditions can be used to speed up the optimization process by stopping the algorithm once it has converged sufficiently, without requiring excessive computation. For example, if the algorithm has achieved a certain level of relative improvement, it may not be necessary to continue refining the solution further.
Caveats
- Likelihood of getting stuck in a local minimum or on a plateau
- The function has to have a minimum for the descent method to converge to a point.