Olist — Executive Report on Unsupervised Modeling

Customer segmentation (K-Means) + latent factors (FA/PCA). Data & figures from /static/data/OLIST.

Summary

Auto-parsed from outputs: Rows processed • Variables • Clusters (k) • Silhouette • Calinski–Harabasz • Davies–Bouldin
We read outputs/customer_segments.csv and logs/run.log to populate these metrics.
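The three validity indices on the summary card are standard internal cluster-quality measures. The sketch below shows how they could be computed with scikit-learn for one K-Means run. It uses synthetic make_blobs data as a stand-in for the customer features, and the helper name summary_metrics is hypothetical, not taken from the pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def summary_metrics(X: np.ndarray, labels: np.ndarray) -> dict:
    """Compute the summary-card metrics for one clustering run (hypothetical helper)."""
    return {
        "rows_processed": X.shape[0],
        "variables": X.shape[1],
        "k": int(np.unique(labels).size),
        # Range [-1, 1]; higher means tighter, better-separated clusters.
        "silhouette": float(silhouette_score(X, labels)),
        # Unbounded above; higher means more between- vs. within-cluster dispersion.
        "calinski_harabasz": float(calinski_harabasz_score(X, labels)),
        # Non-negative; lower means each cluster is less similar to its closest cluster.
        "davies_bouldin": float(davies_bouldin_score(X, labels)),
    }


# Synthetic stand-in for the standardized customer feature matrix (not Olist data).
X_raw, _ = make_blobs(n_samples=600, n_features=5, centers=3, random_state=0)
X = StandardScaler().fit_transform(X_raw)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

metrics = summary_metrics(X, labels)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Read the indices together. Silhouette and Calinski–Harabasz should rise and Davies–Bouldin should fall as cluster separation improves.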

Cluster composition

Distribution
Source: customer_segments.csv.
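A minimal sketch of the per-cluster distribution, assuming the CSV has a column named cluster (an assumption). A small in-memory frame stands in for the real file here.

```python
import pandas as pd

# Hypothetical stand-in for outputs/customer_segments.csv; column names are assumptions.
segments = pd.DataFrame(
    {
        "customer_id": ["a", "b", "c", "d", "e", "f"],
        "cluster": [0, 0, 1, 2, 2, 2],
    }
)

# Count customers per cluster, ordered by cluster id, and add each cluster's share.
distribution = (
    segments["cluster"]
    .value_counts()
    .sort_index()
    .rename("customers")
    .to_frame()
)
distribution["share"] = distribution["customers"] / distribution["customers"].sum()
print(distribution)
```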

Satisfaction & Logistics

Cluster averages
Per-cluster means of avg_review_score and late_rate (each shown only if the column is present in the output).
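The "if present" behavior can be sketched as a groupby over whichever metric columns exist. The function name cluster_averages and the cluster column name are hypothetical, and the toy data is not Olist data.

```python
import pandas as pd


def cluster_averages(df: pd.DataFrame, cols=("avg_review_score", "late_rate")) -> pd.DataFrame:
    """Mean of each available metric per cluster (hypothetical helper)."""
    # Keep only the metric columns that actually exist in this run's output.
    present = [c for c in cols if c in df.columns]
    if not present:
        return pd.DataFrame()
    return df.groupby("cluster")[present].mean().round(3)


# Toy data standing in for outputs/customer_segments.csv.
df = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1, 2],
        "avg_review_score": [5.0, 4.0, 3.0, 4.0, 2.0],
        "late_rate": [0.0, 0.1, 0.2, 0.3, 0.5],
    }
)
print(cluster_averages(df))
print(cluster_averages(df.drop(columns="late_rate")))  # late_rate absent: only reviews shown
```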

Model visual evidence

Silhouette • Elbow • PCA • Scree
Figures: silhouette score across values of k for K-Means; elbow plot of inertia vs. number of clusters for selecting k; 2D PCA scatter of customer clusters projected onto the first two principal components; scree plot of explained variance ratio per principal component.
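The silhouette and elbow figures come from fitting K-Means over a range of k and recording silhouette and inertia for each fit. A minimal sketch follows, using synthetic data and an assumed range of k = 2..8. The pipeline's actual range and seed are not stated in the report.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer features (not Olist data).
X_raw, _ = make_blobs(n_samples=500, n_features=4, centers=3, random_state=42)
X = StandardScaler().fit_transform(X_raw)


def k_sweep(X, k_values=range(2, 9), seed=0):
    """Fit K-Means for each k; return (k, inertia, silhouette) tuples."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_ feeds the elbow plot; silhouette feeds the silhouette plot.
        rows.append((k, km.inertia_, silhouette_score(X, km.labels_)))
    return rows


results = k_sweep(X)
for k, inertia, sil in results:
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")

# One simple rule: pick the k with the highest silhouette, then check the elbow visually.
best_k = max(results, key=lambda r: r[2])[0]
print("best k by silhouette:", best_k)
```

Inertia always tends to fall as k grows, so the elbow only shows where gains flatten out. The silhouette peak gives a complementary, bounded signal.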

Downloads

If factor analysis (FA) was not available, the pipeline falls back to PCA and produces pca_loadings.xlsx and pca_scores.parquet.
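The PCA fallback outputs can be sketched as two tables: loadings (variables × components) and scores (customers × components). The feature names below are hypothetical, and the loading convention (eigenvectors scaled by the square root of each eigenvalue) is one common choice, not necessarily the pipeline's.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table; names and random values are illustrative only.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["ticket", "freight", "delivery_days", "n_orders"],
)

Z = StandardScaler().fit_transform(features)
pca = PCA(n_components=3, random_state=0).fit(Z)
components = [f"PC{i + 1}" for i in range(pca.n_components_)]

# Eigenvectors scaled by sqrt(eigenvalue); on standardized inputs these
# approximate variable-component correlations.
loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    index=features.columns,
    columns=components,
)
scores = pd.DataFrame(pca.transform(Z), columns=components)
explained = pca.explained_variance_ratio_  # values behind the scree plot

print(loadings.round(2))
# loadings.to_excel("pca_loadings.xlsx")   # requires openpyxl
# scores.to_parquet("pca_scores.parquet")  # requires pyarrow or fastparquet
```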

Executive narrative

  • Optimal k = 3 (validated by Silhouette and the Elbow pattern).
  • Clusters separate clearly in the 2D PCA projection; one cluster shows wider dispersion, making it a candidate for sub-segmentation.
  • Latent factors capture sales/ticket, logistics/timing, and purchase patterns.
  • Recommendation: persist the cluster label in the database to drive STP (segmentation, targeting, positioning) campaigns and to feed churn models.
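One way to persist the labels is a per-customer table that downstream tools join on. The sketch below uses an in-memory SQLite database as a stand-in for the production DB. The table name customer_segment and the columns customer_id and model_version are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical output of the segmentation run; column names are assumptions.
segments = pd.DataFrame(
    {
        "customer_id": ["c001", "c002", "c003", "c004"],
        "cluster": [0, 2, 1, 0],
    }
)
segments["model_version"] = "kmeans_k3"  # tag the run so campaigns can trace labels

conn = sqlite3.connect(":memory:")  # stand-in for the production database
segments.to_sql("customer_segment", conn, if_exists="replace", index=False)

# Campaign tooling or churn-model feature jobs can then join on customer_id.
counts = conn.execute(
    "SELECT cluster, COUNT(*) FROM customer_segment GROUP BY cluster ORDER BY cluster"
).fetchall()
print(counts)
conn.close()
```

Storing a model version alongside the label lets you re-score customers later without losing track of which run produced a campaign's audience.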

Segment profiles

Cluster | Label | Profile | Recommended action
0 | High-Value Loyalists | High recency, above-average ticket, strong review scores, low late rate | Loyalty program, cross-sell premium categories, early-access offers
1 | At-Risk Occasionals | Medium recency, average ticket, mixed logistics experience | Win-back campaigns, service recovery, targeted discounts on repeat categories
2 | Low-Engagement Churners | Low recency, small ticket, high late rate, lower satisfaction | Re-engage with aggressive incentives, or deprioritize to reduce CAC
Profiles are derived from cluster centroids on standardized features. Within each segment, prioritize customers by predicted probability (e.g., of repurchase) × expected ticket.
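A minimal sketch of both ideas follows: centroids read in z-score units, and a within-segment priority score. All column names are hypothetical and the data is random. The p_repurchase column is a placeholder for whatever probability model is used, and avg_ticket stands in for expected ticket.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features (random, not Olist data).
rng = np.random.default_rng(7)
n = 300
customers = pd.DataFrame(
    {
        "recency_days": rng.integers(1, 600, size=n),
        "avg_ticket": rng.gamma(2.0, 60.0, size=n),
        "avg_review_score": rng.uniform(1, 5, size=n),
        "late_rate": rng.uniform(0, 1, size=n),
    }
)

scaler = StandardScaler()
Z = scaler.fit_transform(customers)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)

# Centroids in z-score units: +1.0 means one standard deviation above the overall mean.
profiles = pd.DataFrame(km.cluster_centers_, columns=customers.columns).round(2)
print(profiles)

# Priority = probability x expected ticket (placeholder probability, avg_ticket as proxy).
customers["cluster"] = km.labels_
customers["p_repurchase"] = rng.uniform(0, 1, size=n)
customers["priority"] = customers["p_repurchase"] * customers["avg_ticket"]

# Top 5 customers per segment by priority.
top = customers.sort_values("priority", ascending=False).groupby("cluster").head(5)
print(top[["cluster", "priority"]])
```

To describe centroids in original units instead, scaler.inverse_transform(km.cluster_centers_) maps them back.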

Quick data preview

First rows
Source: outputs/customer_segments.csv

Run log

logs/run.log