Iterative loop of ML development (example)

Choose architecture

Split data

Split a large dataset into training, validation, and test sets

  1. Popular splits are
    1. 70-15-15
    2. 80-10-10
    3. 60-20-20
  2. Cross-validation
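The splitting options above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `train_val_test_split` and `k_fold_splits`, the default 70-15-15 fractions, and the fixed seed are all assumptions made for the example.

```python
import random


def train_val_test_split(examples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n = len(shuffled)
    n_val = int(round(n * val_frac))
    n_test = int(round(n * test_frac))
    n_train = n - n_val - n_test  # whatever remains goes to training
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def k_fold_splits(examples, k=5):
    """Yield (train, validation) pairs, using each fold once as the validation set."""
    items = list(examples)
    folds = [items[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        held_out = folds[i]
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield rest, held_out


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

In practice a library routine would usually be used instead; the point here is only that the three sets are disjoint and together cover the whole dataset.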

Exploratory data analysis

  1. Examine
    1. the size of the data (number of features, number of training examples)
    2. missing data
    3. skewness
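As a rough sketch of these three checks, the function below reports dataset size, per-feature missing counts, and per-feature skewness. It assumes, purely for illustration, that the data is a list of dictionaries with `None` marking a missing value. The name `summarize` and the moment-based skewness formula (third central moment divided by variance to the power 1.5) are choices made for this example.

```python
def summarize(rows, feature_names):
    """Report dataset size, plus missing count and skewness for each feature."""
    report = {
        "n_examples": len(rows),
        "n_features": len(feature_names),
        "features": {},
    }
    for name in feature_names:
        # Keep only the values that are present for this feature.
        values = [row[name] for row in rows if row.get(name) is not None]
        stats = {"missing": len(rows) - len(values), "skewness": None}
        if values:
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            if variance > 0:
                third_moment = sum((v - mean) ** 3 for v in values) / len(values)
                stats["skewness"] = third_moment / variance ** 1.5
            else:
                stats["skewness"] = 0.0  # constant feature: no asymmetry
        report["features"][name] = stats
    return report


rows = [
    {"x": 1, "y": 2.0},
    {"x": 2, "y": None},
    {"x": 3, "y": 4.0},
    {"x": 100, "y": 6.0},
]
report = summarize(rows, ["x", "y"])
print(report["n_examples"], report["n_features"])
print(report["features"]["y"]["missing"])
```

A large positive skewness (as for `x` here, because of the value 100) suggests a long right tail, which may call for a transformation or closer inspection.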

Specify model

  1. Specify the model
    1. parameters
    2. learning algorithm
  2. Define the cost function, including a regularization term
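To make "cost function with regularization" concrete, here is a minimal sketch for one possible choice: a linear model with squared-error cost and an L2 penalty on the weights. The notes do not say which model or cost is intended, so the linear model, the 1/(2m) scaling, and the names `predict`, `regularized_cost`, and `lam` are assumptions.

```python
def predict(w, b, x):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def regularized_cost(w, b, X, y, lam):
    """Squared-error cost plus an L2 penalty on the weights (not on the bias)."""
    m = len(X)
    squared_error = sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)
    penalty = lam / (2 * m) * sum(wj ** 2 for wj in w)
    return squared_error + penalty


X = [[1.0], [2.0]]
y = [2.0, 4.0]
# The weights fit the data exactly, so only the penalty term contributes.
print(regularized_cost([2.0], 0.0, X, y, lam=1.0))  # 1.0
```

A larger `lam` pushes the weights toward zero, trading a worse fit on the training set for a simpler model.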

Train model

We can train several candidate models; they are compared and one is chosen in the evaluation stage

  1. Train on the training set
  2. Compute training error
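The two steps above might look as follows for a linear model fitted by batch gradient descent. This is a sketch under assumptions: the notes do not name a learning algorithm, and the function names, learning rate, step count, and toy data are illustrative only.

```python
def predict(w, b, x):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def squared_error(w, b, X, y):
    """Average squared error (with the conventional 1/2 factor), no regularization."""
    m = len(X)
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def train(X, y, lam=0.0, lr=0.1, steps=2000):
    """Fit weights and bias by batch gradient descent on the L2-regularized cost."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        errors = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_w = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / m + lam / m * w[j]
            for j in range(n)
        ]
        grad_b = sum(errors) / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy training set generated from y = 2x + 1.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]
w, b = train(X_train, y_train)
print(w, b)
print(squared_error(w, b, X_train, y_train))  # training error
```

Repeating `train` with different settings (for example, different values of `lam`) would give the multiple candidates that are compared in the next stage.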

Evaluate

  1. For each model, compute the validation error (the cost function evaluated on the validation set, or averaged over the cross-validation folds)
  2. Pick the model with the lowest validation error
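The selection rule above can be sketched as follows. The candidates here are hypothetical, already-trained models represented as plain functions, and the validation error is computed as squared error without the regularization term (a common convention, assumed here rather than stated in the notes).

```python
def validation_error(model, X_val, y_val):
    """Squared-error cost of `model` on the validation set, without regularization."""
    m = len(X_val)
    return sum((model(x) - y) ** 2 for x, y in zip(X_val, y_val)) / (2 * m)


def select_model(candidates, X_val, y_val):
    """Return the name of the candidate with the lowest validation error, plus all errors."""
    errors = {
        name: validation_error(model, X_val, y_val)
        for name, model in candidates.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


# Hypothetical candidates standing in for models trained in the previous stage.
candidates = {
    "constant": lambda x: 4.0,
    "linear": lambda x: 2.0 * x + 1.0,
    "quadratic": lambda x: x * x,
}
X_val = [0.0, 1.0, 2.0, 3.0]
y_val = [1.1, 2.9, 5.2, 6.8]
best, errors = select_model(candidates, X_val, y_val)
print(best)  # linear
```

Because the validation set was used to choose the model, its error is an optimistic estimate; that is why a separate test set is held back.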

Error diagnostics

  1. Bias-variance analysis
  2. Error analysis
  3. For skewed datasets: precision and recall (sensitivity)
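Two of these diagnostics are easy to sketch. The `diagnose` function below applies a common rule of thumb: a large gap between a baseline error and the training error suggests high bias, and a large gap between training and validation error suggests high variance. The `tolerance` threshold and the function names are arbitrary assumptions for illustration. `precision_recall` computes the two metrics for binary labels, where 1 marks the rare positive class.

```python
def diagnose(baseline_error, train_error, val_error, tolerance=0.05):
    """Heuristic bias/variance diagnosis from three error values."""
    high_bias = train_error - baseline_error > tolerance
    high_variance = val_error - train_error > tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "no clear bias or variance problem"


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(diagnose(0.10, 0.11, 0.30))  # high variance
print(diagnose(0.10, 0.30, 0.32))  # high bias

y_true = [1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
print(precision_recall(y_true, y_pred))
```

On a skewed dataset, accuracy alone can look high for a model that never predicts the rare class, which is why precision and recall are reported instead.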

Improve (for the next round)

  1. Add more data for the categories where errors occur (identified by error analysis)
    1. data augmentation
    2. data synthesis
  2. Transfer learning
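As one very simple illustration of data augmentation, the sketch below adds noisy copies of the examples in a single category, the one that error analysis flagged. Real augmentation is domain-specific (for images, typically flips, crops, or distortions); the Gaussian jitter on numeric features, the function name `augment_category`, and its parameters are assumptions made for this example.

```python
import random


def augment_category(examples, target_label, copies=2, noise=0.01, seed=0):
    """Return the original examples plus jittered copies of those with `target_label`.

    Each example is a (features, label) pair, where features is a list of numbers.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for features, label in examples:
        if label != target_label:
            continue  # only augment the category where errors occur
        for _ in range(copies):
            jittered = [v + rng.gauss(0.0, noise) for v in features]
            augmented.append((jittered, label))  # the label is unchanged
    return augmented


examples = [([1.0, 2.0], "cat"), ([3.0, 4.0], "dog"), ([5.0, 6.0], "cat")]
bigger = augment_category(examples, "cat")
print(len(examples), len(bigger))  # 3 7
```

The noise level should stay small enough that a jittered example would still plausibly carry the same label.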

Test

  1. Confirm the results on the test set
  2. Calculate the test error (an estimate of the generalization error)