About Project

LATAM Equity Scanner — Cross-Market Profitability Analysis

Full CRISP-DM pipeline in R: 1,200 records from regulatory sources (CVM Brasil, SVS Chile, SBS Perú, Superfinanciera Colombia, EMIS) → cleaned to 1,062 companies across 11 countries and 18 sectors → non-parametric hypothesis testing (Anderson-Darling, Kruskal-Wallis, Wilcoxon, Spearman) to build a distribution-free profitability benchmark that reveals which markets and industries actually outperform.

1) The Challenge

South American equity markets are fragmented across 11 regulatory environments, currencies, and reporting standards. Financial ratios like ROA and ROE exhibit extreme skewness (ROA skewness = −2.91, kurtosis = 88.4) and outlier contamination (10–12% of records), which makes means and parametric comparisons misleading.

The mean lies. Average ROE across LATAM is −108.56%, which suggests catastrophic value destruction, but the median is +8.04%. This 116.6-point gap reveals extreme outlier contamination, which makes non-parametric methods essential for any credible cross-market comparison.
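The gap between mean and median can be reproduced on a toy vector. The sketch below uses invented ROE values (not the project's data) to show how a single firm with near-zero equity can pull the mean far below zero while the median stays with the typical firm.

```r
# Synthetic ROE values (%), NOT the project's data: nine ordinary firms
# plus one firm whose near-zero equity produces an extreme negative ratio.
roe <- c(4.1, 6.3, 7.5, 8.0, 8.1, 9.2, 10.4, 12.0, 15.6, -1200)

roe_mean   <- mean(roe)     # dragged far below zero by a single record
roe_median <- median(roe)   # stays with the typical firm

cat(sprintf("mean = %.2f%%, median = %.2f%%\n", roe_mean, roe_median))
# prints: mean = -111.88%, median = 8.05%
```

One record out of ten is enough to flip the sign of the mean, which is the same pattern the full dataset shows at a larger scale.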

2) Approach — CRISP-DM in R

  1. Business Understanding: define ROA and ROE as primary variables; justify their complementary diagnostic power (asset efficiency vs. leverage amplification).
  2. Data Understanding: load 1,200 records × 26 variables from Cotizadas.xlsx; explore missingness, distributions, and qualitative structure.
  3. Data Preparation: a 4-step pipeline removes 1 duplicate, 87 records without ROA/ROE, 7 without assets, and 43 without sector, leaving 1,062 clean records (11.5% eliminated). Derived variables: Apalancamiento (leverage) and Margen_Neto (net margin).
  4. Modeling: IQR 1.5× outlier fences → 164 outliers (15.4%). Full descriptive stats by country & sector. 14 statistical visualizations.
  5. Evaluation: Anderson-Darling normality → Kruskal-Wallis (4 tests) → Wilcoxon rank-sum (2 pairwise) → Spearman & Pearson correlation. All p-values documented.
  6. Deployment: tier classification, financial opinion (dictamen), Excel dashboard with aggregated KPIs.

  • Companies: 1,062 (from 1,200 raw)
  • Countries: 11 (all South America)
  • Sectors: 18 (EMIS classification)
  • Spearman \(\rho\): 0.826 (ROA ↔ ROE)

3) Data Overview

Cleaning Funnel

Step                       | N     | Cut
Raw dataset                | 1,200 |
(-) Duplicates             | 1,199 | 1
(-) No ROA & ROE           | 1,112 | 87
(-) No assets / net income | 1,105 | 7
(-) No sector              | 1,062 | 43

11.5% of records eliminated. Guyana (7 companies) dropped entirely due to incomplete data.
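The four filters can be sketched in base R on a toy table. The column names (`Empresa`, `ROA`, `ROE`, `Activos`, `Sector`) are hypothetical stand-ins, since the real workbook's 26 variable names are not listed here. The second filter drops only records missing both ratios, an interpretation that matches the ROA count of 1,061 against 1,062 clean records.

```r
# Toy stand-in for the raw table. Column names are hypothetical;
# the values are invented so that each filter removes exactly one row.
raw <- data.frame(
  Empresa = c("A", "A", "B", "C", "D", "E", "F"),
  ROA     = c(3.1, 3.1, NA, 2.0, 1.5, -4.2, 6.0),
  ROE     = c(9.0, 9.0, NA, 7.5, 4.0, -9.8, 14.0),
  Activos = c(100, 100, 50, NA, 80, 120, 300),
  Sector  = c("Energy", "Energy", "Banking", "Banking", NA, "Mining", "Energy"),
  stringsAsFactors = FALSE
)

step1 <- raw[!duplicated(raw), ]                          # exact duplicates
step2 <- step1[!(is.na(step1$ROA) & is.na(step1$ROE)), ]  # missing both ratios
step3 <- step2[!is.na(step2$Activos), ]                   # missing total assets
clean <- step3[!is.na(step3$Sector), ]                    # missing sector

# Funnel table: remaining rows after each step and rows cut by that step.
funnel <- data.frame(
  step = c("raw", "duplicates", "no ROA & ROE", "no assets", "no sector"),
  n    = c(nrow(raw), nrow(step1), nrow(step2), nrow(step3), nrow(clean))
)
funnel$cut <- c(NA, -diff(funnel$n))
print(funnel)
```

Keeping each intermediate data frame makes the funnel auditable: the `cut` column is derived from row counts rather than typed in by hand.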

Global Descriptive Statistics

Metric    | ROA (%)   | ROE (%)
N         | 1,061     | 1,062
Mean      | 2.54      | −108.56
Median    | 2.54      | 8.04
Std. Dev. | 21.61     | 3,431.26
IQR       | 7.24      | 16.70
Skewness  | −2.91     | extreme neg.
Kurtosis  | 88.40     | extreme
Range     | 569.55 pp | 113,675 pp

Key insight: ROE mean = −108.56% vs. median = +8.04%. A Venezuelan micro-bank (Inversiones Crece Pymes, ROA −307%) and a Brazilian airline in restructuring (Azul S.A., ROA −251%) are among the records that distort the mean catastrophically. Use medians for comparisons.

Performance by Country

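Country   | N   | %     | ROA Med. | ROE Med. | ROA SD | ROE SD   | Tier
Peru      | 153 | 14.4% | 3.42     | 9.90     | 12.54  | 30.14    | Tier 1
Brazil    | 376 | 35.4% | 3.24     | 10.30    | 23.56  | 5,765.64 | Tier 1
Colombia  | 56  | 5.3%  | 2.92     | 6.66     | 6.08   | 13.99    | Tier 2
Chile     | 167 | 15.7% | 2.89     | 7.53     | 14.55  | 32.64    | Tier 2
Paraguay  | 39  | 3.7%  | 2.87     | 9.86     | 4.48   | 15.33    | Tier 2
Argentina | 84  | 7.9%  | 2.63     | 4.09     | 33.11  | 108.39   | Tier 2
Venezuela | 29  | 2.7%  | 1.75     | 3.62     | 61.88  | 127.15   | Tier 2
Ecuador   | 81  | 7.6%  | 0.88     | 4.49     | 7.54   | 15.14    | Tier 3
Bolivia   | 67  | 6.3%  | 0.66     | 3.88     | 8.80   | 16.91    | Tier 3
Uruguay   | 8   | 0.8%  | 9.56     | 22.15    | 32.15  | 49.61    | n=8
Suriname  | 2   | 0.2%  | 0.17     | 1.26     | 2.44   | 35.45    | n=2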

Top Sectors by Median ROA

Sector              | N   | ROA Med. | ROE Med. | ROA Mean | ROE Mean | Note
Energy & Utilities  | 112 | 5.64     | 12.75    | 4.14     | 12.11    | Regulated revenues, low dispersion
Food & Beverage     | 37  | 4.40     | 10.56    | 5.52     | 12.75    | Stable demand, defensive
Metals & Mining     | 39  | 4.28     | 7.43     | 1.26     | 6.62     | Mean < median: outlier miners
Healthcare          | 11  | 3.84     | 7.57     | 4.07     | 6.38     | Small sample
Wholesale           | 25  | 3.38     | 7.74     | 3.73     | 7.90     | Lowest dispersion (SD=4.15)
Services            | 43  | 3.36     | 11.13    | 3.56     | 12.78    | Strong leverage effect
Banking & Insurance | 216 | 1.86     | 11.28    | 2.86     | 8.92     | ROA→ROE: 6× leverage amplification
Tourism & Leisure   | 31  | 0.00     | −0.62    | −1.24    | −15.45   | Only sector with negative median ROE

Outlier Detection (IQR 1.5×)

Variable | Fences          | N Outliers | %
ROA      | [−10.81, 18.15] | 109        | 10.27%
ROE      | [−24.60, 42.20] | 125        | 11.77%

164 unique outlier records removed (15.4%) → 898 for robust analysis. ROA SD drops from 21.61 to 4.99.
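The fences follow the usual Tukey rule, [Q1 − 1.5×IQR, Q3 + 1.5×IQR]. The sketch below checks that rule against the reported numbers. The quartiles are not quoted in the write-up; they are back-derived here from the published fences and IQRs, so treat them as an inference. `tukey_fences` and `flag_outliers` are illustrative helper names, not functions from the project notebook.

```r
# Tukey fences: [Q1 - k*IQR, Q3 + k*IQR], with k = 1.5 as in the write-up.
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles back-derived from the reported fences and IQR (an inference,
# not values quoted in the source): ROA Q1 = 0.05, Q3 = 7.29 (IQR 7.24);
# ROE Q1 = 0.45, Q3 = 17.15 (IQR 16.70).
roa_fences <- tukey_fences(0.05, 7.29)
roe_fences <- tukey_fences(0.45, 17.15)
print(round(rbind(ROA = roa_fences, ROE = roe_fences), 2))

# Flag records on a data vector, computing the quartiles from the data itself.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  f <- tukey_fences(q[1], q[2], k)
  !is.na(x) & (x < f[["lower"]] | x > f[["upper"]])
}

# Toy ROA vector containing one extreme value; only that record is flagged.
flags <- flag_outliers(c(2.1, 3.4, 2.9, 0.8, 4.0, -307.64, 3.3))
which(flags)
```

Since 109 ROA outliers and 125 ROE outliers yield 164 unique records, 70 records (109 + 125 − 164) must fall outside the fences on both variables.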

Most Extreme Companies

Company                        | ROA (%) | Country
Inversiones Crece Pymes        | −307.64 | Venezuela
Azul S.A. (restructuring)      | −251.32 | Brazil
Nexpe Participacoes (judicial) | −140.67 | Brazil
Paranapanema S.A. (judicial)   | −131.44 | Brazil

Companies in judicial restructuring or with micro-assets produce extreme ratios that devastate the mean.

4) Key Findings

Geographic Tiers

Peru leads Tier 1 — ROA 3.42%, ROE 9.90%

Best risk-return profile with a reliable sample (n=153). Lower ROA dispersion (SD=12.54) than Brazil's extreme volatility (SD=23.56). Kruskal-Wallis on ROA shows that cross-country differences are highly significant (p = 6.5×10⁻⁶).
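A Kruskal-Wallis test of this kind can be run with `kruskal.test` from base R. The sketch below uses synthetic heavy-tailed data with nine made-up groups, so it will not reproduce the reported statistic. Nine groups give df = 8, the same degrees of freedom reported for the country tests, which would be consistent with leaving out the two smallest samples (an assumption, not something the write-up states).

```r
set.seed(42)  # synthetic data only; does not reproduce the reported statistics

# Nine hypothetical groups standing in for the countries tested.
countries <- paste0("C", 1:9)
toy <- data.frame(
  country = factor(rep(countries, each = 30)),
  # heavy-tailed returns (t with 3 df) with a different location per group
  roa = rt(9 * 30, df = 3) * 4 + rep(seq(0.5, 4.5, length.out = 9), each = 30)
)

# Rank-based one-way test: H0 is that all groups share the same distribution.
kw <- kruskal.test(roa ~ country, data = toy)
print(kw)

# Degrees of freedom = number of groups - 1
kw_df <- unname(kw$parameter)
kw_df
```

Because the test works on ranks, a single extreme ratio shifts the statistic no more than any other largest value would, which is why it suits data like ROA and ROE.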

Sector Dominance

Energy leads at 5.64% median ROA

Wilcoxon confirms Energy > Banking (W=11,923.5, p=9.4×10⁻⁵). Regulated revenues in resource-rich economies create structural advantage. Food & Beverage (4.40%) and Metals & Mining (4.28%) complete the top tier.

Leverage Effect

Banking: ROA 1.86% → ROE 11.28%

Spearman \(\rho\) = 0.826 between ROA and ROE across the full dataset (p = 4.65×10⁻²²⁵). Banks exemplify the leverage amplifier: modest asset returns multiply into double-digit equity returns. ROE median exceeds ROA median in all 11 countries.
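For a single firm, ROE = ROA × (assets / equity), so the ratio ROE / ROA is the equity multiplier. The ratio of sector medians is only a rough proxy for that multiplier, but it is enough to illustrate the roughly 6× figure quoted for banks. The values below are copied from the sector table; the `multiple` column is an illustrative name.

```r
# Median ROA / ROE (%) copied from the sector table in this write-up.
sectors <- data.frame(
  sector  = c("Energy & Utilities", "Services", "Banking & Insurance"),
  roa_med = c(5.64, 3.36, 1.86),
  roe_med = c(12.75, 11.13, 11.28)
)

# Ratio of medians: a rough proxy for the equity multiplier (assets / equity).
sectors$multiple <- round(sectors$roe_med / sectors$roa_med, 2)
print(sectors)
# multiples: 2.26 (Energy & Utilities), 3.31 (Services), 6.06 (Banking & Insurance)
```

Banking turns the lowest median ROA of the three into a median ROE close to the other two, which is the leverage effect described above.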

Post-COVID

Tourism: only sector with negative ROE

Median ROA = 0.00%, median ROE = −0.62%. Mean ROE = −15.45%. The sole sector where profitability has not recovered. Kruskal-Wallis for ROE by sector: \(\chi^2\) = 46.25, p = 6.3×10⁻⁶.

Convergence

Brazil ≈ Chile in ROA

Brazil and Chile are the two largest markets (376 + 167 = 51% of sample), yet the test finds no significant difference in ROA between them. Wilcoxon: W=23,774, p=0.358. On this evidence, investors see similar asset returns in either market.
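A two-sample comparison like this one can be sketched with `wilcox.test` from base R. The data below are synthetic, with arbitrary group sizes and the same location by construction, so the output is illustrative and will not match the reported W.

```r
set.seed(7)  # synthetic data; sizes and values are arbitrary assumptions

market_a <- rt(200, df = 3) * 5 + 3   # heavy-tailed "ROA" for market A
market_b <- rt(120, df = 3) * 5 + 3   # same location by construction

# Wilcoxon rank-sum (Mann-Whitney) test, two-sided.
wt <- wilcox.test(market_a, market_b, alternative = "two.sided")
print(wt)

# In R, W counts the (a, b) pairs with a > b (ties count 1/2),
# so it is bounded by n_a * n_b and equals half of that when ranks are balanced.
max_w <- length(market_a) * length(market_b)
c(W = unname(wt$statistic), max_W = max_w, share = unname(wt$statistic) / max_w)
```

A large p-value here means the test found no evidence of a difference. It does not establish that the two markets are equivalent; an equivalence claim would need a dedicated test or a confidence interval for the location shift.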

Profitability

23% of listed companies lose money

Nearly 1 in 4 firms has negative ROA. Paraguay leads with 94.9% profitable; Bolivia, Ecuador, and Venezuela drag the average. Contingency analysis reveals structural country-level differences in profitability rates.
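A contingency analysis of profitability rates could look like the sketch below. Only the Paraguay row is tied to the text: with n = 39, 37 profitable firms is the count that reproduces 94.9%. The other two rows are invented counts under generic labels, and the use of `chisq.test` is an assumption about the kind of test involved.

```r
# 37 of 39 profitable reproduces the 94.9% quoted for Paraguay.
paraguay_rate <- round(100 * 37 / 39, 1)
paraguay_rate

# Hypothetical country x status table (rows B and C are invented counts).
tab <- matrix(c(37,  2,
                60, 21,
                48, 19),
              ncol = 2, byrow = TRUE,
              dimnames = list(country = c("Paraguay", "Country B", "Country C"),
                              status  = c("profitable", "loss")))

round(100 * prop.table(tab, margin = 1), 1)   # row-wise profitability rates

# Chi-squared test of independence between country and profitability status.
ct <- chisq.test(tab)
print(ct)
```

With the real table, the row-wise rates give the per-country share of profitable firms and the test indicates whether those shares differ by more than chance would explain.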

The mean misleads here: ROE mean = −108.56% paints a picture of regional value destruction. In reality (median = +8.04%), the typical LATAM listed company generates solid returns. A handful of restructuring cases (Azul, Nexpe, Paranapanema) and micro-firms with near-zero assets produce ratios near −300% that obliterate the average. This project demonstrates why non-parametric methods are not optional for emerging-market financial analysis.

5) Hypothesis Tests — Summary

All tests at \(\alpha = 0.05\). Non-parametric methods chosen after Anderson-Darling rejected normality.

Anderson-Darling: ROA Normality
AD = 10.0155  |  p < 2.2×10⁻¹⁶
Reject H0: ROA is not normally distributed, even after outlier removal.

Anderson-Darling: ROE Normality
AD = 1.8190  |  p = 0.0001
Reject H0: ROE is not normally distributed.

Kruskal-Wallis: ROA by Country
\(\chi^2\) = 38.36, df = 8  |  p = 6.455×10⁻⁶
Reject H0: significant ROA differences between countries.

Kruskal-Wallis: ROE by Country
\(\chi^2\) = 36.96, df = 8  |  p = 1.169×10⁻⁵
Reject H0: significant ROE differences between countries.

Kruskal-Wallis: ROA by Sector
\(\chi^2\) = 34.47, df = 12  |  p = 5.692×10⁻⁴
Reject H0: significant ROA differences between sectors.

Kruskal-Wallis: ROE by Sector
\(\chi^2\) = 46.25, df = 12  |  p = 6.277×10⁻⁶
Reject H0: significant ROE differences between sectors.

Wilcoxon: ROA Brazil vs Chile
W = 23,774  |  p = 0.358
Fail to reject H0: no significant difference detected between the two largest markets.

Wilcoxon: ROA Energy vs Banking
W = 11,923.5  |  p = 9.397×10⁻⁵
Reject H0: Energy has significantly higher ROA than Banking.

Spearman: ROA vs ROE
\(\rho\) = 0.8258  |  p = 4.65×10⁻²²⁵
Reject H0: strong positive monotonic association.

Pearson: ROA vs ROE
r = 0.7767  |  p = 4.28×10⁻¹⁸²
Reject H0: the linear component is also strong. Spearman exceeding Pearson suggests the relationship is monotonic but not strictly linear.
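The correlation and normality checks can be sketched as follows. The data are synthetic, with a monotonic but non-linear link assumed purely for illustration, so the coefficients will not match the reported ones. Anderson-Darling comes from the `nortest` package; the sketch falls back to base R's Shapiro-Wilk test when that package is not installed, which is a stand-in rather than the test the project used.

```r
set.seed(123)  # synthetic data; will not reproduce the reported coefficients

n   <- 500
roa <- rt(n, df = 3) * 4 + 2.5                 # heavy-tailed "ROA"
# Monotonic but non-linear link plus noise (an illustrative assumption).
roe <- sign(roa) * abs(roa)^1.5 + rnorm(n, sd = 3)

sp <- cor.test(roa, roe, method = "spearman", exact = FALSE)
pe <- cor.test(roa, roe, method = "pearson")
print(round(c(spearman = unname(sp$estimate), pearson = unname(pe$estimate)), 3))

# Spearman's rho is Pearson's r computed on the ranks.
rho_from_ranks <- cor(rank(roa), rank(roe))

# Normality check: Anderson-Darling if nortest is available,
# otherwise Shapiro-Wilk from base R as a stand-in.
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(roa))
} else {
  print(shapiro.test(roa))
}
```

Because Spearman depends only on ranks, extreme ratios cannot dominate it the way they can dominate Pearson, which is the practical reason to report both.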

6) Analytical Modules

Statistical Analysis

R Notebook — Non-Parametric Pipeline

Full CRISP-DM analysis in R (IRkernel on Google Colab). From raw data to hypothesis testing and financial opinion (dictamen).

  • 14 statistical visualizations (boxplots, QQ, density, scatter)
  • Anderson-Darling + 4 Kruskal-Wallis + 2 Wilcoxon + Spearman/Pearson
  • IQR outlier detection & impact analysis
  • Country & sector tier classification

Dashboard

Executive Dashboard — Financial KPIs

Aggregated KPIs, country & sector breakdowns, top 10 companies, and alert signals — built from the cleaned dataset.

  • 1,036 companies × 11 countries × $5.08T MM total assets
  • Avg. ROA 2.97%  |  75.2% profitable companies
  • Top 10 by revenue  |  18 sector comparison
  • 8 alerts: critical, warnings & positive signals

7) Next Steps

  1. Panel data: extend to multi-year longitudinal analysis for trend identification and causal inference (2015–2024 fiscal years are available).
  2. Risk-adjusted metrics: incorporate Sharpe-like ratios and downside risk measures per country-sector pair.
  3. Sector deep-dives: detailed sub-sector analysis for Energy, Banking, and Metals & Mining.
  4. Interactive dashboard: Shiny/Plotly app with country-sector drill-downs and dynamic filtering.
  5. Investment lens: Tier-1 markets (Peru, Brazil) show the highest median profitability; Tier-3 markets (Ecuador, Bolivia) need governance adjustments before capital allocation.

Tech Stack

R (IRkernel)

Primary language. dplyr, tidyr, readxl for data wrangling on 1,200×26 dataset.

ggplot2 / gridExtra

14 statistical visualizations: boxplots, QQ plots, density overlays, scatter with regression.

nortest / stats

Anderson-Darling, Kruskal-Wallis, Wilcoxon rank-sum, Spearman & Pearson correlation.

knitr / kableExtra

Publication-quality tables, styled HTML output, frequency distributions.

Google Colab

Cloud-based R notebook for reproducible analysis. IRkernel runtime.

CRISP-DM

End-to-end methodology across all 6 phases: business understanding through deployment.

Scope & Limitations

Scope
  • End-to-end CRISP-DM pipeline: raw data → cleaning → EDA → hypothesis testing → tiers.
  • 7 non-parametric tests with documented statistics and p-values.
  • Country and sector tier rankings based on median ROA/ROE with sufficient sample sizes.
  • Leverage-profitability dynamics via Spearman correlation (\(\rho\) = 0.826).
  • Excel dashboard with aggregated KPIs for 1,036 companies.
Limitations
  • Cross-sectional only: majority of data is 2023–2024; no panel/time-series dynamics.
  • No FX adjustment: financial variables in millions USD, but ratios reflect local reporting.
  • Survivorship bias: only currently listed firms; delisted/failed companies excluded.
  • Sample imbalance: Brazil = 35.4% of data; Uruguay (n=8) and Suriname (n=2) are unreliable alone.
  • Sector classifications from EMIS may not align perfectly across all exchanges.