Iterative loop of ML development (example)
Choose architecture
Split data
Split the data into training, validation, and test sets (see the sketch below)
- Popular splits (train-validation-test):
  - 70-15-15
  - 80-10-10
  - 60-20-20
- Cross-validation (an alternative to a fixed validation split, useful for small datasets)
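A minimal splitting sketch with scikit-learn, assuming a generic array-like dataset (`X` and `y` here are synthetic placeholders); applying `train_test_split` twice yields a 60-20-20 split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 1000 examples, 10 features.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)

# First carve out the 60% training set, then divide the remaining
# 40% equally into validation (20%) and test (20%).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)
```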
Exploratory data analysis
- Examine:
  - the size of the data (number of features, number of training examples)
  - missing data
  - skewness
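A quick pandas sketch of these checks (the DataFrame below is a hypothetical stand-in for the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset stand-in; replace with e.g. pd.read_csv("data.csv").
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, np.nan, 4.0, 100.0],
    "feature_b": [0.1, 0.2, 0.2, 0.3, 0.1],
})

print(df.shape)                     # (number of examples, number of features)
print(df.isna().sum())              # missing values per column
print(df.skew(numeric_only=True))   # skewness of each numeric feature
```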
Specify model
- Specify the model:
  - parameters
  - learning algorithm
- Determine the cost function, including regularization (a sketch follows)
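As an example, the regularized squared-error cost for linear regression could be sketched like this (the usual ridge-style form; `lam` is an assumed name for the regularization strength):

```python
import numpy as np

def regularized_cost(w, b, X, y, lam):
    """Squared-error cost with an L2 (ridge) regularization term:

    J(w, b) = 1/(2m) * sum((X@w + b - y)^2) + lam/(2m) * sum(w^2)
    """
    m = X.shape[0]
    err = X @ w + b - y
    return (err @ err) / (2 * m) + lam * (w @ w) / (2 * m)
```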
Train model
We may train multiple candidate models; one of them is chosen in the evaluation stage (see the combined train-and-evaluate sketch after the Evaluate step)
- Train on the training set
- Compute training error
Evaluate
- For each model, compute the validation error (= the cost function evaluated on the validation set, or via cross-validation)
- Pick the model with lowest validation error
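A combined sketch of the train and evaluate steps, assuming the candidates are polynomial regressions of increasing degree on synthetic data (a common illustration, not a prescribed choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D regression data (placeholder for a real dataset).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(1000, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=1000)

# 60-20-20 split (kept as simple index slices for the sketch).
X_train, y_train = X[:600], y[:600]
X_val, y_val = X[600:800], y[600:800]
X_test, y_test = X[800:], y[800:]

# Candidate models: polynomial regressions of increasing capacity.
candidates = {d: make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
              for d in (1, 2, 3, 4)}

val_errors = {}
for degree, model in candidates.items():
    model.fit(X_train, y_train)  # train on the training set only
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_errors[degree] = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree}  train={train_err:.4f}  val={val_errors[degree]:.4f}")

# Pick the candidate with the lowest validation error.
best_degree = min(val_errors, key=val_errors.get)
best_model = candidates[best_degree]
```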
Error diagnostics
- Bias-variance analysis
- Error analysis
- For skewed datasets: precision and recall (sensitivity)
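A minimal sketch of precision and recall on a hypothetical skewed label set:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]  # hypothetical skewed labels
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # hypothetical predictions

# Precision: of the examples predicted positive, how many are truly positive.
# Recall (sensitivity): of the truly positive examples, how many were found.
print(precision_score(y_true, y_pred))  # 2/3
print(recall_score(y_true, y_pred))     # 2/3
```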
Improve (for the next round)
- Add more data for the categories where errors occur (guided by error analysis)
- Transfer learning
Test
- Confirm results on a test set
- Calculate test error (generalization error)
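Continuing the model-selection sketch above (`best_model`, `X_test`, and `y_test` are defined there), the test error is the same cost evaluated once on the held-out test set:

```python
from sklearn.metrics import mean_squared_error

# The test set was never used for fitting parameters or picking a model,
# so this is a fair estimate of the generalization error.
test_err = mean_squared_error(y_test, best_model.predict(X_test))
print(f"test (generalization) error: {test_err:.4f}")
```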
Why don't we fit any parameters to the test set?
This procedure ensures that we haven't accidentally fit anything to the test set, so the test error gives a fresh, fair, and not overly optimistic estimate of how well the model will generalize to new data. BEST PRACTICE: Make all decisions and tweaks to the learning algorithm before touching the test set.