Clustering is an unsupervised learning technique that automatically groups similar data points into homogeneous clusters.
Applications
- Group similar news
- Market segmentation
- Group DNA sequences by similarity
- Group astronomical data
Characteristics of an effective clustering model
- The clusters are clearly identifiable.
- Between clusters (inter-cluster), there is a lot of empty space.
- Within each cluster (intra-cluster), the points are close to each other.
Metrics
For an effective model, we want to:
- minimize inertia
- maximize silhouette score
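As a rough illustration of reading the two metrics together, the sketch below fits k-means for several values of k and reports both numbers. It assumes scikit-learn is installed. The synthetic blobs, the choice of k-means, and the names `X`, `results`, and `best_k` are assumptions made for this example. Inertia generally keeps falling as k grows, so it is usually compared at a fixed k or read alongside the silhouette score rather than minimized on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the fitted model's within-cluster sum of squared distances.
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}  inertia={results[k][0]:.1f}  silhouette={results[k][1]:.3f}")

# Pick the k whose clustering has the highest mean silhouette score.
best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette score:", best_k)
```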
Inertia
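Inertia measures how tightly the points in each cluster pack around their cluster's centroid. For a fixed number of clusters, lower is better.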
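A minimal sketch of the computation, assuming the usual k-means definition: inertia is the sum, over all points, of the squared Euclidean distance from the point to the centroid of its assigned cluster. The toy points, the two labelings, and the helper names `centroid` and `inertia` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def inertia(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


# Hypothetical toy data: two tight groups, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # pairs each point with a distant one

print(inertia(points, good_labels))  # 4.0
print(inertia(points, bad_labels))   # 100.0
```

The labeling that keeps nearby points together gives the much smaller value, which is the sense in which lower inertia indicates tighter clusters.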
Silhouette score
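The silhouette score compares how close each point is to its own cluster with how close it is to the nearest other cluster. It ranges from -1 to 1, and higher is better.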
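A minimal sketch of the computation, assuming the standard definition: for each point, let a be its mean distance to the other points in its own cluster and b its mean distance to the points of the nearest other cluster. The point's silhouette is (b - a) / max(a, b), and the score is the mean over all points. The helper name `mean_silhouette`, the toy data, and the convention of scoring singleton clusters as 0 are assumptions for illustration.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_distance(i, cluster):
        # Mean distance from point i to the members of `cluster`, excluding i itself.
        others = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            scores.append(0.0)  # assumed convention: a singleton cluster scores 0
            continue
        a = mean_distance(i, own)  # cohesion: distance to own cluster
        b = min(mean_distance(i, c) for c in cluster_ids if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # nearby points grouped together
bad = mean_silhouette(points, [0, 1, 0, 1])   # each point paired with a distant one

print(f"good labeling: {good:.3f}")
print(f"bad labeling:  {bad:.3f}")
```

On this data the sensible labeling scores close to 1 and the mixed labeling scores below 0, matching the reading that values near 1 indicate compact, well-separated clusters.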