Suppose we have a multivariate objective function $f$, and let $\mathbf{x}$ be our design variable(s), represented as an $n$-dimensional column vector. The descent method has 4 components: an initial point $\mathbf{x}^{(1)}$, a step size $\alpha$, direction vector(s) $\mathbf{d}$, and termination conditions.

Choose an initial vector $\mathbf{x}^{(1)}$ in the domain (feasible region) of $f$, then repeat the following, indexing the steps using $k$, until a termination condition is met:

  • Determine a direction vector $\mathbf{d}^{(k)}$
  • Determine a scalar step length $\alpha^{(k)}$ (also called the learning rate)
  • Compute the next design point using the update formula $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$

Thus, we are creating a sequence of vectors $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots$ that step through the design space towards a local minimum.
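As a rough sketch (not the book's implementation), the loop might look like the following, assuming a fixed step size $\alpha$ and the negative gradient as the direction, i.e. plain gradient descent; the function name gradient_descent and the default thresholds are my own placeholders:

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, eps=1e-6, max_iter=1_000):
    """Generic descent loop with a fixed step size and the negative
    gradient as the direction vector (i.e., gradient descent)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        d = -grad(x)                   # direction vector d^(k)
        if np.linalg.norm(d) < eps:    # stop once the gradient is ~0
            break
        x = x + alpha * d              # x^(k+1) = x^(k) + alpha * d^(k)
    return x

# Minimize f(x, y) = (x - 1)^2 + (y + 2)^2, whose minimizer is (1, -2).
grad_f = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] + 2)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # approximately [ 1. -2.]
```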

One popular method or algorithm is gradient descent

Reference

Sections 4.1–4.3 in Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for Optimization. MIT Press.

Algorithm

Usage: In Descent Method

Line search is typically integrated into descent methods (e.g., gradient descent, the Newton-Raphson method, conjugate gradient) to determine the optimal step size $\alpha^{(k)}$ for each iteration.

  1. Initialize at the starting point $\mathbf{x}^{(1)}$.
  2. Compute the direction $\mathbf{d}^{(k)}$ for descent, typically the negative gradient $-\nabla f(\mathbf{x}^{(k)})$ in gradient descent or other calculated directions in other methods.
  3. Perform line search to find the optimal step size $\alpha^{(k)}$ that minimizes the function along the direction $\mathbf{d}^{(k)}$: $\alpha^{(k)} = \arg\min_{\alpha > 0} f(\mathbf{x}^{(k)} + \alpha \mathbf{d}^{(k)})$
  4. Update the point $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$.
  5. Repeat until convergence.
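As a hedged illustration of these steps (a sketch under my own assumptions, not the referenced algorithm verbatim), the one-dimensional subproblem in step 3 can be handed to any scalar minimizer; here I assume SciPy's minimize_scalar, the negative gradient as the direction, and an arbitrary upper bound of 10 on $\alpha$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def descent_with_line_search(f, grad, x0, eps=1e-6, max_iter=1_000):
    """Gradient descent in which each step size is chosen by a line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                              # descent direction d^(k)
        if np.linalg.norm(d) < eps:
            break
        # Line search: minimize phi(alpha) = f(x + alpha * d) over alpha > 0
        phi = lambda alpha: f(x + alpha * d)
        alpha = minimize_scalar(phi, bounds=(0, 10), method='bounded').x
        x = x + alpha * d                         # x^(k+1) = x^(k) + alpha^(k) d^(k)
    return x

f = lambda v: v[0]**2 + 4 * v[1]**2
grad_f = lambda v: np.array([2 * v[0], 8 * v[1]])
print(descent_with_line_search(f, grad_f, x0=[2.0, 1.0]))  # close to [0, 0]
```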

Termination conditions

Given a small number $\epsilon > 0$:

Absolute Improvement

The algorithm terminates when the difference in function value between two consecutive iterations $\mathbf{x}^{(k)}$ and $\mathbf{x}^{(k+1)}$ becomes smaller than a small threshold $\epsilon$, i.e. $f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)}) < \epsilon$, meaning that the function value is no longer improving significantly and implying that we are approaching a local minimum.

Relative Improvement

With this, the descent method terminates when the relative change in the function value between iterations becomes negligible, i.e. smaller than a small fraction of the current function value: $f(\mathbf{x}^{(k)}) - f(\mathbf{x}^{(k+1)}) < \epsilon \, |f(\mathbf{x}^{(k)})|$.

This condition is useful because it adapts to the scale of the objective function. With a fixed absolute threshold, large function values can keep the algorithm running even when the improvement is negligible relative to the current value, while small function values can trigger premature stopping even though the function is still improving at a reasonable rate; scaling the threshold by $|f(\mathbf{x}^{(k)})|$ avoids both problems.

Gradient Magnitude

This condition stops the descent method when the magnitude of the gradient of the objective function is close to 0, i.e. $\|\nabla f(\mathbf{x}^{(k)})\| < \epsilon$. The gradient represents the direction and rate of change of the function. When the gradient is close to zero, it means that the function is close to a critical point (which could be a local minimum, maximum, or saddle point).

This condition is a strong indicator of convergence, but requires the objective function to be differentiable.

Max Iterations

This condition limits the maximum number of iterations the algorithm can run, regardless of the other conditions. It is an important safety stop to prevent the algorithm from running indefinitely, especially in situations where convergence is slow or not guaranteed (e.g. in gradient descent).
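To make the four conditions concrete, here is a minimal sketch of a combined stopping check; the function name, argument names, and threshold values are placeholder assumptions rather than anything prescribed by the reference:

```python
import numpy as np

def should_terminate(f_prev, f_curr, grad_curr, k,
                     eps_abs=1e-8, eps_rel=1e-6, eps_grad=1e-6, k_max=10_000):
    """Return True if any of the four termination conditions is met."""
    if f_prev - f_curr < eps_abs:                 # absolute improvement
        return True
    if f_prev - f_curr < eps_rel * abs(f_prev):   # relative improvement
        return True
    if np.linalg.norm(grad_curr) < eps_grad:      # gradient magnitude
        return True
    if k >= k_max:                                # max iterations
        return True
    return False
```

In practice several of these checks are combined like this, so the loop stops on whichever condition triggers first.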

Why it’s good to know different termination conditions

  1. Flexibility: Different optimization problems may require different types of termination criteria. For example, some problems may require higher precision in the objective function value (where absolute improvement might be important), while others may be more sensitive to the gradient (where gradient magnitude might be the priority).

  2. Robustness: By using multiple conditions, the algorithm can be made more robust and ensure that it doesn’t terminate too early or run for too long. For example, in non-convex optimization problems, the gradient magnitude alone may not be sufficient, and a combination of conditions like relative improvement and max iterations can provide a better stopping criterion.

  3. Efficiency: Different termination conditions can be used to speed up the optimization process by stopping the algorithm once it has converged sufficiently, without requiring excessive computation. For example, if the algorithm has achieved a certain level of relative improvement, it may not be necessary to continue refining the solution further.

Caveat

  • The method may get stuck in a local minimum or on a plateau
  • The function has to have a minimum for the descent method to converge to a point.