Olist — Executive Report on Unsupervised Modeling

Customer segmentation (K-Means) + latent factors (FA/PCA). Data & figures from /static/data/OLIST.

Auto-parsed from outputs: rows processed, variables, clusters (k), Silhouette, Calinski–Harabasz, and Davies–Bouldin.

We read outputs/customer_segments.csv and logs/run.log to populate these metrics.
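The report does not show how this panel is filled. Below is a minimal standard-library sketch. It assumes, hypothetically, that customer_segments.csv has a `cluster` label column plus optional ID columns, and that run.log prints scores as `name: value` or `name=value`. Neither the CSV schema nor the log format is documented here, so treat both as placeholders.

```python
import csv
import io
import re

# Hypothetical schema: the report does not document the columns of
# customer_segments.csv, so the label and ID column names are assumptions.
CLUSTER_COL = "cluster"
ID_COLS = ("customer_id", "customer_unique_id")

# Hypothetical log format: lines such as "Silhouette: 0.41" or
# "davies_bouldin=0.90". The real run.log layout is not shown.
METRIC_RE = re.compile(
    r"(silhouette|calinski[_\-\s\u2013]harabasz|davies[_\-\s\u2013]bouldin)"
    r"\s*[:=]\s*([-+]?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def summarize_segments(csv_text, cluster_col=CLUSTER_COL, id_cols=ID_COLS):
    """Return (rows processed, number of variables, number of clusters k)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # Variables = every column that is neither the cluster label nor an ID.
    variables = [f for f in fields if f != cluster_col and f not in id_cols]
    k = len({row[cluster_col] for row in rows})
    return len(rows), len(variables), k


def parse_log_metrics(log_text):
    """Map normalized metric names to floats; later lines overwrite earlier ones."""
    metrics = {}
    for name, value in METRIC_RE.findall(log_text):
        key = re.sub(r"[_\-\s\u2013]", "_", name.lower())
        metrics[key] = float(value)
    return metrics
```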

Distribution

Source: customer_segments.csv.
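Cluster sizes and shares can be tallied directly from the CSV. The following is a minimal sketch, again assuming a hypothetical `cluster` column.

```python
import csv
import io
from collections import Counter


def cluster_distribution(csv_text, cluster_col="cluster"):
    """Return {cluster label: (count, share of all rows)}, sorted by label."""
    labels = [row[cluster_col] for row in csv.DictReader(io.StringIO(csv_text))]
    counts = Counter(labels)
    total = len(labels)
    return {c: (n, n / total) for c, n in sorted(counts.items())}
```

Labels are read as strings, so sorting is lexicographic. That is fine for k = 3 but would order "10" before "2" for larger k.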

Cluster averages

Per-cluster means of avg_review_score and late_rate, computed only for the columns present in the file.
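Here is a sketch of these per-cluster averages. It skips whichever of the two metric columns is missing and ignores blank cells. The `cluster` column name is an assumption.

```python
import csv
import io
from collections import defaultdict

METRICS = ("avg_review_score", "late_rate")


def cluster_averages(csv_text, cluster_col="cluster", metrics=METRICS):
    """Mean of each available metric column per cluster; absent columns are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = [m for m in metrics if m in (reader.fieldnames or [])]
    values = defaultdict(lambda: defaultdict(list))
    for row in reader:
        for m in present:
            if row[m] != "":  # ignore blank cells instead of treating them as 0
                values[row[cluster_col]][m].append(float(row[m]))
    return {
        c: {m: sum(v) / len(v) for m, v in per_metric.items()}
        for c, per_metric in sorted(values.items())
    }
```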

Silhouette • Elbow • PCA • Scree

If Factor Analysis (FA) is not available, the pipeline falls back to PCA and writes pca_loadings.xlsx and pca_scores.parquet.
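The report does not show how the PCA fallback is computed. As an illustration, here is a dependency-free sketch of PCA on z-scored features, using power iteration with deflation. It returns three things:

- loadings, using one common convention (eigenvector × √eigenvalue);
- per-row scores;
- the eigenvalue shares that a scree plot displays.

A real pipeline would more likely use `sklearn.decomposition.PCA`, and write the two files with pandas `DataFrame.to_excel` and `DataFrame.to_parquet` (which need openpyxl and pyarrow or fastparquet). The function names below are hypothetical.

```python
import math


def standardize(X):
    """Z-score each column (population std); constant columns become 0."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] if stds[j] else 0.0 for j in range(p)]
            for row in X]


def covariance(Z):
    """Covariance of centered data; equals the correlation matrix for z-scores."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / n for j in range(p)] for i in range(p)]


def top_eigenpair(C, iters=1000, tol=1e-12):
    """Power iteration for the dominant eigenvalue/eigenvector of symmetric C."""
    p = len(C)
    # Non-uniform unit start vector, so it is unlikely to be orthogonal to the answer.
    v = [float(i + 1) for i in range(p)]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        new_lam = sum(w[i] * sum(C[i][j] * w[j] for j in range(p)) for i in range(p))
        if abs(new_lam - lam) < tol:
            return new_lam, w
        v, lam = w, new_lam
    return lam, v


def pca(X, n_components=2):
    """Return (loadings, scores, explained variance ratios) for standardized X."""
    Z = standardize(X)
    C = covariance(Z)
    p = len(C)
    total = sum(C[i][i] for i in range(p))  # total variance (= p without constant columns)
    components = []
    for _ in range(n_components):
        lam, v = top_eigenpair(C)
        components.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    loadings = [[v[j] * math.sqrt(max(lam, 0.0)) for lam, v in components] for j in range(p)]
    scores = [[sum(z[j] * v[j] for j in range(p)) for _, v in components] for z in Z]
    explained = [lam / total for lam, _ in components] if total else [0.0] * n_components
    return loadings, scores, explained
```

Components beyond the rank of the data come out with near-zero eigenvalues, and their directions are not meaningful.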

Optimal k = 3 (validated by Silhouette and the Elbow pattern).
Clusters separate clearly in the 2D PCA projection; one cluster shows wider dispersion, which makes it a candidate for sub-segmentation.
Latent factors capture sales/ticket, logistics/timing, and purchase patterns.
Recommendation: persist each customer's cluster label in the database for STP (segmentation, targeting, positioning) campaigns and churn models.
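The report says k = 3 was supported by the silhouette score and the elbow pattern, but not how the search was run. Below is a dependency-free sketch of such a search. It uses:

- Lloyd's k-means with deterministic farthest-point seeding, swapped in for the k-means++ seeding that `sklearn.cluster.KMeans` uses by default;
- inertia (within-cluster sum of squares) for the elbow curve;
- the mean silhouette for each k.

All function names are hypothetical. In practice `sklearn.metrics.silhouette_score` would replace the hand-written silhouette.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_farthest(X, k):
    """Deterministic farthest-point seeding (a simple stand-in for k-means++)."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def assign(X, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c])) for p in X]


def kmeans(X, k, iters=100):
    """Lloyd's algorithm; returns (labels, inertia = within-cluster sum of squares)."""
    centroids = init_farthest(X, k)
    for _ in range(iters):
        labels = assign(X, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(X, centroids)
    inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(X, labels))
    return labels, inertia


def silhouette(X, labels):
    """Mean silhouette; s(i) = (b - a) / max(a, b), and 0 for singleton clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(X):
        own = [dist(p, q) for j, q in enumerate(X) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton: s(i) = 0
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(X, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(X)


def choose_k(X, ks=range(2, 7)):
    """Return (k with the highest mean silhouette, {k: (inertia, silhouette)})."""
    results = {}
    for k in ks:
        labels, inertia = kmeans(X, k)
        results[k] = (inertia, silhouette(X, labels))
    best = max(results, key=lambda k: results[k][1])
    return best, results
```

Inertia tends to keep shrinking as k grows, so the elbow is read where the decrease flattens. The silhouette peak, by contrast, gives a single value to compare across k.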

Cluster	Label	Profile	Recommended action
0	High-Value Loyalists	High recency, above-average ticket, strong review scores, low late rate	Loyalty program, cross-sell premium categories, early access offers
1	At-Risk Occasionals	Medium recency, average ticket, mixed logistics experience	Win-back campaigns, service recovery, targeted discounts on repeat categories
2	Low-Engagement Churners	Low recency, small ticket, high late rate, lower satisfaction	Re-engagement with aggressive incentives or deprioritize to reduce CAC

Profiles are derived from cluster centroids on standardized features. Within each segment, prioritize customers by probability × expected ticket.
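The two ideas above can be illustrated together.

- **Standardized centroids.** These are per-cluster means of z-scores, so a positive value reads as "above the overall average" (for example, an above-average ticket).
- **Priority score.** The report does not say which probability is meant. The sketch assumes hypothetical per-customer fields `p_purchase` (for example, a model's repurchase probability) and `expected_ticket`, and ranks customers within each cluster by their product.

```python
import math
from collections import defaultdict


def zscore_columns(rows, features):
    """Standardize each feature to mean 0 and population std 1."""
    stats = {}
    for f in features:
        vals = [r[f] for r in rows]
        mean = sum(vals) / len(vals)
        # Fall back to 1.0 for a constant column to avoid dividing by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[f] = (mean, std)
    return [{f: (r[f] - stats[f][0]) / stats[f][1] for f in features} for r in rows]


def standardized_centroids(rows, features, cluster_key="cluster"):
    """Per-cluster mean of z-scores: > 0 means above the overall average."""
    groups = defaultdict(list)
    for r, z in zip(rows, zscore_columns(rows, features)):
        groups[r[cluster_key]].append(z)
    return {
        c: {f: sum(m[f] for m in members) / len(members) for f in features}
        for c, members in sorted(groups.items())
    }


def prioritize(rows, cluster_key="cluster", p_key="p_purchase", ticket_key="expected_ticket"):
    """Within each cluster, order customers by p * expected ticket, highest first."""
    scored = defaultdict(list)
    for r in rows:
        scored[r[cluster_key]].append((r[p_key] * r[ticket_key], r))
    return {
        c: [r for _, r in sorted(items, key=lambda t: t[0], reverse=True)]
        for c, items in scored.items()
    }
```

Sorting on the score alone (`key=lambda t: t[0]`) avoids comparing customer records when two scores tie.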

First rows

Source: outputs/customer_segments.csv

logs/run.log
