
In machine learning, the cost function describes how well the current learning algorithm is performing on the given dataset.

The goal is to choose parameters so that the learning algorithm's predictions are as close as possible to the targets in the dataset. In other words, the objective is to minimize the cost function.

Linear models

In linear models, minimizing the cost function is equivalent to minimizing the sum of squared residuals.
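A minimal sketch of this equivalence on toy data (the numbers here are illustrative, not from the notes): the cost is just the sum of squared residuals scaled by 1/(2m).

```python
import numpy as np

# Toy data and parameters (illustrative only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w, b = 1.5, 0.5

residuals = (w * x + b) - y                      # prediction minus target, per example
cost = np.sum(residuals ** 2) / (2 * x.shape[0])  # scaled sum of squared residuals
print(residuals)  # [ 0.  -0.5 -1. ]
print(cost)       # 0.2083...
```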

Code

(Source)

# Sample cost function for linear regression
import numpy as np
 
def compute_cost(x, y, w, b): 
    """
    Computes the cost function for linear regression.
    
    Args:
      x (ndarray (m,)): Data, m examples 
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters  
    
    Returns
        total_cost (float): The cost of using w,b as the parameters for linear regression
               to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0] 
    
    cost_sum = 0 
    for i in range(m): 
        f_wb = w * x[i] + b   
        cost = (f_wb - y[i]) ** 2  
        cost_sum = cost_sum + cost  
    total_cost = (1 / (2 * m)) * cost_sum  
 
    return total_cost
 
# X, y: training data, assumed to be loaded elsewhere
J = compute_cost(X, y, w=0.0, b=0.0)
print('With w = 0, b = 0 \nCost computed = %.2f' % J)
print('Expected cost value (approximately) 32.07\n')
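The explicit loop above can be replaced with vectorized NumPy operations; a sketch of an equivalent version:

```python
import numpy as np

def compute_cost_vectorized(x, y, w, b):
    """Vectorized equivalent of the looped compute_cost above."""
    m = x.shape[0]
    errors = (w * x + b) - y              # residuals for all m examples at once
    return np.sum(errors ** 2) / (2 * m)
```

For large m this avoids Python-level loop overhead while computing exactly the same value.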
 

With regularization

Cost function with regularization

A regularization term is added directly to the cost function.

For example, mean squared error (cost of linear regression) with Ridge regularization:

$$J(\mathbf{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

  • $\mathbf{w}$: a vector of weights; $b$: a scalar-valued bias term
  • $m$: number of training examples
  • $f_{\mathbf{w},b}$: learning algorithm to fit a vector of features $\mathbf{x}$
  • $(\mathbf{x}^{(i)}, y^{(i)})$: the $i$-th training example in the dataset
  • $\lambda$: regularization parameter. How much we want to shrink the impact of some predictors. The larger $\lambda$, the more penalty.
  • $n$: number of features
  • $w_j$: weight parameter (to be penalized)

In practice, we might or might not penalize the bias parameter $b$, because it makes little difference. It’s just a number.
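The regularized cost above can be sketched for the multi-feature case as follows; `X` as an (m, n) feature matrix and the name `compute_cost_reg` are illustrative assumptions, not from the source:

```python
import numpy as np

def compute_cost_reg(X, y, w, b, lambda_=1.0):
    """Mean squared error cost with a Ridge (L2) penalty on the weights."""
    m = X.shape[0]
    predictions = X @ w + b                            # f_{w,b}(x) for each example
    mse_term = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)    # note: b is not penalized
    return mse_term + reg_term
```

Setting `lambda_=0` recovers the unregularized cost, which is a quick sanity check when implementing this.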

How to choose $\lambda$

(Source)

  1. Try different values for $\lambda$, each doubling the previous:
    1. Minimize the regularized cost function, just as we do with the normal cost function.
    2. Evaluate the parameters on the validation set.
  2. Pick the parameters with the lowest validation error.
  3. Report the test error (or cross-validation error as an estimate).
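The steps above can be sketched as a simple doubling search. The `fit` and `val_cost` callables, the data splits, and the starting value are all assumptions for illustration; any optimizer that minimizes the regularized cost would plug in here:

```python
def pick_lambda(fit, val_cost, X_train, y_train, X_val, y_val,
                start=0.01, steps=10):
    """Try lambda values, each doubling the previous; keep the best on validation.

    fit(X, y, lam)       -> fitted parameters (minimizes the regularized cost)
    val_cost(X, y, params) -> validation error (evaluated WITHOUT regularization)
    """
    best_lam, best_err, best_params = None, float('inf'), None
    lam = start
    for _ in range(steps):
        params = fit(X_train, y_train, lam)
        err = val_cost(X_val, y_val, params)
        if err < best_err:
            best_lam, best_err, best_params = lam, err, params
        lam *= 2  # each value doubles the previous
    return best_lam, best_err, best_params
```

After picking $\lambda$ this way, the held-out test error is reported as the final estimate, per step 3 above.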