Unsupervised — EFA + Clustering
We discover segments and their drivers, and define actions per profile.
- EFA (KMO/Bartlett) → latent factors
- K-Means / PAM / CLARA / hierarchical
- Silhouette, stability, and naming
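The KMO and Bartlett checks listed above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the project's actual pipeline: the two-factor data, the function names `bartlett_sphericity` and `kmo`, and the noise level are all assumptions made for the example.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: returns (chi-square statistic, degrees of freedom)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet avoids underflow when the correlation matrix is near-singular
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

# Synthetic data (assumption): 6 observed variables driven by 2 latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X = factors @ loadings + 0.5 * rng.normal(size=(500, 6))

chi2, dof = bartlett_sphericity(X)
kmo_value = kmo(X)
print(f"Bartlett chi2={chi2:.1f} (dof={dof}), KMO={kmo_value:.2f}")
```

A large Bartlett statistic relative to its degrees of freedom and a KMO well above 0.5 are the usual signals that the correlation structure is worth factoring. The p-value for the statistic would come from a chi-square distribution, which is omitted here to keep the sketch dependency-free.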
SQL architecture to consolidate facts/dims → segmentation with EFA + clustering → calibrated repurchase models to drive targeted campaigns by probability × ticket.
Multi-product marketplace with heterogeneous customers. We need to identify segments and predict repurchase to focus campaigns and offers, increasing frequency and ticket while reducing churn.
vw_customer_features (R/F/Monetary, logistics, reviews, categories).
Targeting the top two deciles by predicted probability concentrates the majority of expected revenue, reducing contact cost by up to 4× compared to untargeted campaigns.
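The decile targeting idea can be illustrated with a small sketch: rank customers by predicted repurchase probability, keep the top 20%, and measure what share of expected revenue (probability × ticket) that group carries. The ten customers below are made-up values for illustration only, and `top_share` is a hypothetical helper name.

```python
def top_share(customers, fraction=0.2):
    """Share of expected revenue held by the top `fraction` of customers,
    ranked by predicted repurchase probability.

    `customers` is a list of (probability, expected_ticket) pairs.
    """
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    expected = [p * t for p, t in ranked]
    return sum(expected[:cutoff]) / sum(expected)

# Illustrative data (assumption): (predicted probability, expected ticket)
customers = [
    (0.80, 120), (0.60, 90), (0.30, 100), (0.20, 80), (0.15, 110),
    (0.10, 70), (0.08, 95), (0.05, 60), (0.04, 85), (0.03, 75),
]

share = top_share(customers, fraction=0.2)
print(f"Top two deciles hold {share:.0%} of expected revenue")
```

Comparing this share with the 20% of customers contacted gives a rough sense of how much contact volume targeting saves. The real figure depends on how skewed the predicted probabilities and tickets actually are.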
A calibration curve near the diagonal (Brier = 0.11) means a predicted 30% probability corresponds to roughly a 30% observed repurchase rate, which enables reliable per-segment budget allocation.
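To make the calibration claim concrete, here is a minimal sketch of the two quantities involved: the Brier score and a binned reliability table that compares mean predicted probability with the observed repurchase rate. The toy labels and probabilities are invented for the example, and `brier_score` and `reliability_table` are hypothetical helper names (scikit-learn offers `brier_score_loss` and `calibration_curve` for the same purpose).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def reliability_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted, observed rate, count) for each non-empty probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows

# Toy data (assumption): 10 customers scored 0.3 (3 repurchased),
# 10 customers scored 0.7 (7 repurchased)
y_prob = [0.3] * 10 + [0.7] * 10
y_true = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3

brier = brier_score(y_true, y_prob)
rows = reliability_table(y_true, y_prob)
for mean_pred, observed, count in rows:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

In this toy case every bin sits exactly on the diagonal, yet the Brier score is not zero. The score also reflects how far the predictions are from 0 or 1, so it is best read alongside the reliability table rather than on its own.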
Recency and delivery experience are the strongest predictors. Improving logistics SLA by 1 point increases repurchase probability by ~8%. Ticket is modeled separately via OLS.
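As a rough sketch of modeling ticket separately with OLS and then combining it with a repurchase probability, the example below fits a least-squares model with NumPy. The two features (items per order and a premium-category flag), the noise-free synthetic tickets, and the 0.30 probability are all assumptions for illustration, not the project's actual feature set or results.

```python
import numpy as np

# Hypothetical features (assumption): items per order, premium-category flag
features = np.array([[1, 0], [2, 0], [3, 1], [4, 1], [5, 0], [6, 1]], dtype=float)

# Synthetic, noise-free tickets so the fitted coefficients are easy to check
ticket = 20.0 + 15.0 * features[:, 0] + 30.0 * features[:, 1]

# Add an intercept column and solve the least-squares problem
design = np.column_stack([np.ones(len(features)), features])
coef, *_ = np.linalg.lstsq(design, ticket, rcond=None)

# Predict the ticket for a new customer: 3 items, premium category
new_customer = np.array([1.0, 3.0, 1.0])
predicted_ticket = float(new_customer @ coef)

# Combine with a repurchase probability from a separate classifier (assumed value)
p_repurchase = 0.30
expected_value = p_repurchase * predicted_ticket
print(f"predicted ticket={predicted_ticket:.2f}, expected value={expected_value:.2f}")
```

Keeping the two models separate means the probability can be calibrated on its own, while the ticket model only needs to be accurate for customers who do buy.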
High-Value Loyalists → loyalty & cross-sell. At-Risk Occasionals → win-back campaigns. Low-Engagement Churners → aggressive incentives or deprioritize.
features catalog, model versioning, and experiment logbook.
Calibrated repurchase probability and expected ticket to prioritize campaigns.
avg_ticket
Executive sequence that integrates dashboards and key findings.
SQL (Snowflake)
Snowflake schema (facts/dims), analytic views, and warehouse orchestration.
scikit-learn
Clustering & supervised models, calibration, validation, and evaluation.
Tableau
Executive story, dashboards, KPI tracking, and sharing.
pandas
Feature engineering, time windows, joins, and I/O.
NumPy
Vectorized math and numerical helpers.