Olist — Executive Report on Unsupervised Modeling
Customer segmentation (K-Means) + latent factors (FA/PCA). Data & figures from /static/data/OLIST.
Summary
Auto-parsed from outputs
Rows processed
—
Variables
—
Clusters (k)
—
Silhouette
—
Calinski–Harabasz
—
Davies–Bouldin
—
We read outputs/customer_segments.csv and logs/run.log to populate these metrics.
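As a rough illustration of how the three validity scores in the summary could be recomputed from a feature matrix and its cluster labels, here is a minimal sketch using scikit-learn. The synthetic data from `make_blobs` and the helper name `summarize` are assumptions for the example; in the real pipeline the features and the cluster column would come from outputs/customer_segments.csv.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def summarize(X, labels):
    """Return the summary metrics for a feature matrix and its cluster labels."""
    return {
        "rows": int(X.shape[0]),
        "variables": int(X.shape[1]),
        "k": len(set(labels.tolist())),
        # Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
        "silhouette": float(silhouette_score(X, labels)),
        # Calinski-Harabasz: higher is better.
        "calinski_harabasz": float(calinski_harabasz_score(X, labels)),
        # Davies-Bouldin: lower is better.
        "davies_bouldin": float(davies_bouldin_score(X, labels)),
    }


# Synthetic stand-in data (assumption): 200 customers, 4 features, 3 groups.
X, labels = make_blobs(n_samples=200, centers=3, n_features=4, random_state=0)
summary = summarize(X, labels)
print(summary)
```

Silhouette and Calinski–Harabasz reward higher values, while Davies–Bouldin rewards lower ones, so the three should be read together rather than ranked on a single scale.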
Cluster composition
Distribution
Source: customer_segments.csv.
Satisfaction & Logistics
Cluster averages
Averages of avg_review_score and late_rate (if present).
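The "if present" handling can be sketched with pandas as follows. The column name `cluster` and the helper `cluster_averages` are assumptions for illustration; only `avg_review_score` and `late_rate` are named in the report. The demo frame deliberately omits `late_rate` to show that a missing column is skipped rather than raising an error.

```python
import pandas as pd


def cluster_averages(df, cluster_col="cluster",
                     metrics=("avg_review_score", "late_rate")):
    """Mean of each available metric per cluster; absent columns are skipped."""
    present = [m for m in metrics if m in df.columns]
    if not present:
        # No metric columns at all: return an empty frame indexed by cluster.
        return pd.DataFrame(index=sorted(df[cluster_col].unique()))
    return df.groupby(cluster_col)[present].mean()


# Hypothetical rows; late_rate is intentionally missing.
demo = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2],
    "avg_review_score": [5.0, 4.0, 3.0, 4.0, 2.0],
})
averages = cluster_averages(demo)
print(averages)
```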
Model visual evidence
Silhouette • Elbow • PCA • Scree
Downloads
If FA wasn’t available, the pipeline produces pca_loadings.xlsx and pca_scores.parquet.
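For readers who want to see what the PCA fallback outputs might contain, the sketch below computes a loadings table and a scores table with scikit-learn. It is not the pipeline's actual code: the function name `pca_outputs`, the feature names, and the random demo data are assumptions, and loadings are defined here by one common convention (component vectors scaled by the square root of their explained variance). Writing the results with `DataFrame.to_excel` and `DataFrame.to_parquet` is left out because those calls need optional engines such as openpyxl and pyarrow.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_outputs(features, n_components=2):
    """Return (loadings, scores) DataFrames for standardized features."""
    X = StandardScaler().fit_transform(features.to_numpy(dtype=float))
    pca = PCA(n_components=n_components).fit(X)
    names = [f"PC{i + 1}" for i in range(n_components)]
    # Loadings: one row per variable, one column per component.
    loadings = pd.DataFrame(
        pca.components_.T * np.sqrt(pca.explained_variance_),
        index=features.columns,
        columns=names,
    )
    # Scores: one row per customer, projected onto the components.
    scores = pd.DataFrame(pca.transform(X), index=features.index, columns=names)
    return loadings, scores


# Hypothetical feature names and random data, for illustration only.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["total_spend", "avg_ticket", "late_rate", "avg_review_score"],
)
loadings, scores = pca_outputs(features)
print(loadings.round(2))
```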
Executive narrative
- Optimal k = 3 (validated by Silhouette and the Elbow pattern).
- Clear separation in 2D PCA; one cluster shows wider dispersion → candidate for sub-segmentation.
- Latent factors capture sales/ticket, logistics/timing, and purchase patterns.
- Recommendation: persist the cluster label in the DB for use in STP campaigns and churn models.
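The kind of check behind "validated by Silhouette and the Elbow pattern" can be sketched as a scan over candidate values of k, recording K-Means inertia (for the elbow plot) and the silhouette score for each. This is a minimal sketch on synthetic, well-separated data; the helper `scan_k`, the candidate range 2 to 6, and the blob centers are assumptions and do not reproduce the Olist run.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def scan_k(X, k_values=range(2, 7), seed=0):
    """Fit K-Means for each k and return (k, inertia, silhouette) tuples."""
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results.append((k, model.inertia_, silhouette_score(X, model.labels_)))
    return results


# Synthetic data with three well-separated groups (assumption).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

results = scan_k(X)
# Pick the k with the highest silhouette; inertia is kept for the elbow plot.
best_k = max(results, key=lambda r: r[2])[0]
for k, inertia, sil in results:
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
```

Inertia always tends to fall as k grows, so the elbow is judged visually from where the drop flattens, while the silhouette gives a single number per k that can be compared directly.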
Segment profiles
| Cluster | Label | Profile | Recommended action |
|---|---|---|---|
| 0 | High-Value Loyalists | High recency, above-average ticket, strong review scores, low late rate | Loyalty program, cross-sell premium categories, early access offers |
| 1 | At-Risk Occasionals | Medium recency, average ticket, mixed logistics experience | Win-back campaigns, service recovery, targeted discounts on repeat categories |
| 2 | Low-Engagement Churners | Low recency, small ticket, high late rate, lower satisfaction | Re-engage with aggressive incentives, or deprioritize to reduce CAC |
Profiles derived from cluster centroids on standardized features. Use probability × expected ticket to prioritize within each segment.
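One way to apply the probability × expected ticket rule is to compute the product per customer and rank customers inside each cluster. The report does not say which probability is meant, so the sketch below treats it as a generic per-customer score; the column names `probability`, `expected_ticket`, `cluster`, and `customer_id`, and all demo values, are hypothetical.

```python
import pandas as pd


def prioritize(df, prob_col="probability", ticket_col="expected_ticket",
               cluster_col="cluster"):
    """Add a priority score and a within-cluster rank (1 = highest priority)."""
    out = df.copy()
    out["priority"] = out[prob_col] * out[ticket_col]
    out["rank_in_cluster"] = (
        out.groupby(cluster_col)["priority"]
        .rank(ascending=False, method="first")
        .astype(int)
    )
    return out.sort_values([cluster_col, "rank_in_cluster"])


# Hypothetical customers, for illustration only.
demo = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1],
        "probability": [0.5, 0.2, 0.1, 0.4],
        "expected_ticket": [100.0, 400.0, 300.0, 50.0],
    },
    index=pd.Index(["a", "b", "c", "d"], name="customer_id"),
)
ranked = prioritize(demo)
print(ranked)
```

In this toy data, customer b outranks customer a within cluster 0 despite a lower probability, because the larger expected ticket more than compensates.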
Quick data preview
First rows
Source: outputs/customer_segments.csv