About Project

LATAM Equity Scanner — Cross-Market Profitability Analysis

Full CRISP-DM pipeline in R: 1,200 records from regulatory and commercial sources (CVM Brasil, SVS Chile, SBS Perú, Superfinanciera Colombia, EMIS) → cleaned to 1,062 companies across 11 countries and 18 sectors → non-parametric hypothesis testing (Anderson-Darling, Kruskal-Wallis, Wilcoxon, Spearman) to build a distribution-free profitability benchmark that reveals which markets and industries actually outperform.

1) The Challenge

South American equity markets are fragmented across 11 regulatory environments, currencies, and reporting standards. Financial ratios such as ROA and ROE exhibit extreme skewness (ROA skewness = −2.91, kurtosis = 88.4) and outlier contamination (10–12% of records), so means and parametric comparisons are misleading. The ROE mean of −108.56% suggests catastrophic value destruction, yet the median is +8.04%.

Fragmentation: Brazil alone holds 35.4% of listings; Uruguay and Suriname together represent <1%.
Non-normality: Anderson-Darling rejects normality even after outlier removal (p < 2.2×10⁻¹⁶ for ROA).
Goal: rank countries and sectors using robust, distribution-free methods — medians, IQR, and rank-based tests.

The mean lies. Average ROE across the sample is −108.56%, but the median is +8.04%. This gap of 116.6 percentage points reflects extreme outlier contamination, which makes non-parametric methods essential for any credible cross-market comparison.
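
As a minimal sketch of the effect described above, the toy vector below (illustrative values, not taken from the dataset) shows how a single firm with near-zero equity can drag the mean far below zero while the median barely moves.

```r
# Toy ROE values in percent (hypothetical): nine ordinary firms
# plus one firm whose near-zero equity produces an extreme ratio.
roe <- c(4, 6, 7, 8, 8, 9, 10, 12, 15, -1200)

mean_roe   <- mean(roe)    # dominated by the single extreme value
median_roe <- median(roe)  # reflects the typical firm

cat(sprintf("mean = %.1f%%, median = %.1f%%\n", mean_roe, median_roe))
# prints: mean = -112.1%, median = 8.0%
```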

2) Approach — CRISP-DM in R

Business Understanding: define ROA and ROE as primary variables; justify their complementary diagnostic power (asset efficiency vs. leverage amplification).
Data Understanding: load 1,200 records × 26 variables from Cotizadas.xlsx; explore missingness, distributions, and qualitative structure.
Data Preparation: 4-step pipeline that removes 1 duplicate, 87 records without ROA and ROE, 7 without assets or net income, and 43 without sector → 1,062 clean records (11.5% eliminated). Create derived variables: Apalancamiento (leverage) and Margen_Neto (net margin).
Modeling: IQR 1.5× outlier fences → 164 outliers (15.4%). Full descriptive stats by country & sector. 14 statistical visualizations.
Evaluation: Anderson-Darling normality → Kruskal-Wallis (4 tests) → Wilcoxon rank-sum (2 pairwise) → Spearman & Pearson correlation. All p-values documented.
Deployment: tier classification, financial opinion (dictamen), Excel dashboard with aggregated KPIs.

Companies: 1,062 (from 1,200 raw)
Countries: 11 (South America)
Sectors: 18 (EMIS classification)
Spearman $\rho$: 0.826 (ROA ↔ ROE)

3) Data Overview

Cleaning Funnel

Step	Records remaining	Records removed
Raw dataset	1,200	—
(-) Duplicates	1,199	1
(-) No ROA & ROE	1,112	87
(-) No assets / net income	1,105	7
(-) No sector	1,062	43

11.5% of records eliminated. Guyana (7 companies) dropped entirely due to incomplete data.
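
The funnel above can be sketched in base R as follows. The column names and toy rows are hypothetical (they are not those of Cotizadas.xlsx), and the second step assumes a record is dropped only when both ROA and ROE are missing, which is one reading of "No ROA & ROE".

```r
# Hypothetical toy data; column names are assumptions for illustration.
raw <- data.frame(
  company = c("A", "A", "B", "C", "D", "E", "F"),
  ROA     = c(2.1, 2.1, NA, 3.0, 1.2, NA, 4.4),
  ROE     = c(8.0, 8.0, NA, 9.5, 6.1, 5.0, 11.0),
  assets  = c(100, 100, 50, NA, 80, 60, 70),
  sector  = c("Energy", "Energy", "Banking", "Banking", NA, "Services", "Mining"),
  stringsAsFactors = FALSE
)

funnel <- c(raw = nrow(raw))

step1 <- raw[!duplicated(raw), ]                          # drop exact duplicates
funnel["no_duplicates"] <- nrow(step1)

step2 <- step1[!(is.na(step1$ROA) & is.na(step1$ROE)), ]  # drop if both ratios missing
funnel["has_roa_or_roe"] <- nrow(step2)

step3 <- step2[!is.na(step2$assets), ]                    # drop if assets missing
funnel["has_assets"] <- nrow(step3)

clean <- step3[!is.na(step3$sector), ]                    # drop if sector missing
funnel["has_sector"] <- nrow(clean)

print(funnel)  # records remaining after each step: 7, 6, 5, 4, 3

# Share eliminated, using the counts reported in the table above
pct_removed <- round(100 * (1 - 1062 / 1200), 1)          # 11.5
```

Recording the row count after every filter keeps the funnel auditable: each removal can be traced to exactly one rule.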

Global Descriptive Statistics

Metric	ROA (%)	ROE (%)
N	1,061	1,062
Mean	2.54	−108.56
Median	2.54	8.04
Std. Dev.	21.61	3,431.26
IQR	7.24	16.70
Skewness	−2.91	extreme neg.
Kurtosis	88.40	extreme
Range	569.55 pp	113,675 pp

Key insight: ROE mean = −108.56% vs. median = +8.04%. A handful of extreme records, such as a Venezuelan micro-bank (Inversiones Crece Pymes, ROA −307%) and a Brazilian airline in restructuring (Azul S.A., ROA −251%), distort the means catastrophically. Compare medians, not means.
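
For readers who want to reproduce shape statistics like those in the table, here is a minimal base-R sketch using population moments. The source does not say which skewness or kurtosis convention (raw vs. excess, bias-corrected or not) was used, so these helper functions are an assumption, and the input values are toy numbers.

```r
# Moment-based shape statistics (population moments, no bias correction).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Raw (non-excess) kurtosis: a normal distribution scores 3.
kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

symmetric <- c(-2, -1, 0, 1, 2)   # toy values
with_tail <- c(symmetric, -50)    # same values plus one extreme loss

skewness(symmetric)   # 0
kurtosis(symmetric)   # 1.7
skewness(with_tail)   # negative: the left tail dominates
kurtosis(with_tail)   # larger than for the symmetric sample
```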

Performance by Country

Country	N	Share	ROA Med. (%)	ROE Med. (%)	ROA SD	ROE SD	Tier
Peru	153	14.4%	3.42	9.90	12.54	30.14	Tier 1
Brazil	376	35.4%	3.24	10.30	23.56	5,765.64	Tier 1
Colombia	56	5.3%	2.92	6.66	6.08	13.99	Tier 2
Chile	167	15.7%	2.89	7.53	14.55	32.64	Tier 2
Paraguay	39	3.7%	2.87	9.86	4.48	15.33	Tier 2
Argentina	84	7.9%	2.63	4.09	33.11	108.39	Tier 2
Venezuela	29	2.7%	1.75	3.62	61.88	127.15	Tier 2
Ecuador	81	7.6%	0.88	4.49	7.54	15.14	Tier 3
Bolivia	67	6.3%	0.66	3.88	8.80	16.91	Tier 3
Uruguay	8	0.8%	9.56	22.15	32.15	49.61	Unranked (n=8)
Suriname	2	0.2%	0.17	1.26	2.44	35.45	Unranked (n=2)
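
A per-country summary with the same columns as the table above could be built along these lines. This is a sketch on hypothetical toy data with assumed column names, not the notebook's actual code. The tier thresholds are not stated in the source, so no tier column is computed here.

```r
# Hypothetical firm-level data (toy values, assumed column names).
firms <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", "B"),
  ROA     = c(1, 3, 8, -2, 0, 2, 40),
  ROE     = c(4, 9, 20, -10, 1, 6, 95),
  stringsAsFactors = FALSE
)

# One summary row per country: size, medians, and dispersion.
by_country <- do.call(rbind, lapply(split(firms, firms$country), function(g) {
  data.frame(
    country = g$country[1],
    N       = nrow(g),
    ROA_med = median(g$ROA),
    ROE_med = median(g$ROE),
    ROA_sd  = sd(g$ROA),
    ROE_sd  = sd(g$ROE)
  )
}))

by_country$share_pct <- round(100 * by_country$N / sum(by_country$N), 1)
by_country <- by_country[order(-by_country$ROA_med), ]  # rank by median ROA
print(by_country)
```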

Selected Sectors by Median ROA

Sector	N	ROA Med.	ROE Med.	ROA Mean	ROE Mean	Note
Energy & Utilities	112	5.64	12.75	4.14	12.11	Regulated revenues, low dispersion
Food & Beverage	37	4.40	10.56	5.52	12.75	Stable demand, defensive
Metals & Mining	39	4.28	7.43	1.26	6.62	Mean < median: outlier miners
Healthcare	11	3.84	7.57	4.07	6.38	Small sample
Wholesale	25	3.38	7.74	3.73	7.90	Lowest dispersion (SD=4.15)
Services	43	3.36	11.13	3.56	12.78	Strong leverage effect
Banking & Insurance	216	1.86	11.28	2.86	8.92	ROA→ROE: 6× leverage amplification
Tourism & Leisure	31	0.00	−0.62	−1.24	−15.45	Only sector with negative median ROE

Outlier Detection (IQR 1.5×)

Variable	Fences	N Outliers	%
ROA	[−10.81, 18.15]	109	10.27%
ROE	[−24.60, 42.20]	125	11.77%

164 unique outlier records removed (15.4%) → 898 for robust analysis. ROA SD drops from 21.61 to 4.99.
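
The 1.5×IQR rule can be sketched in base R as below, using toy vectors rather than the real data. Note that the reported counts (109 ROA outliers, 125 ROE outliers, 164 unique records) imply that 109 + 125 − 164 = 70 records were flagged on both variables, which is why the union is taken row-wise. The function names are hypothetical.

```r
# Tukey fences: Q1 - k*IQR and Q3 + k*IQR (default quantile type 7).
tukey_fences <- function(x, k = 1.5) {
  q   <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# TRUE where a value falls outside the fences; NA values are not flagged.
is_outlier <- function(x, k = 1.5) {
  f <- tukey_fences(x, k)
  !is.na(x) & (x < f[["lower"]] | x > f[["upper"]])
}

# Toy data (hypothetical): firm 10 is extreme on both ratios, firm 9 on ROE only.
roa <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, -300)
roe <- c(5, 6, 7, 8, 9, 10, 11, 12, 90, -900)

tukey_fences(roa)                        # lower = -4.5, upper = 13.5
flag <- is_outlier(roa) | is_outlier(roe)  # a record is dropped if flagged on either
sum(flag)                                # 2 unique records
robust <- data.frame(roa, roe)[!flag, ]
```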

Most Extreme Companies

Company	ROA (%)	Country
Inversiones Crece Pymes	−307.64	Venezuela
Azul S.A. (restructuring)	−251.32	Brazil
Nexpe Participacoes (judicial)	−140.67	Brazil
Paranapanema S.A. (judicial)	−131.44	Brazil

Companies in judicial restructuring or with micro-assets produce extreme ratios that devastate the mean.

4) Key Findings

Geographic Tiers

Peru leads Tier 1 — ROA 3.42%, ROE 9.90%

Highest median ROA among countries with a reliable sample (n=153), with roughly half the ROA dispersion of Brazil (SD=12.54 vs. 23.56). Kruskal-Wallis confirms that cross-country ROA differences are highly significant (p = 6.5×10⁻⁶).

Sector Dominance

Energy leads at 5.64% median ROA

A Wilcoxon rank-sum test confirms Energy > Banking in ROA (W=11,923.5, p=9.4×10⁻⁵). Regulated revenues in resource-rich economies plausibly create a structural advantage. Food & Beverage (4.40%) and Metals & Mining (4.28%) complete the top tier.

Leverage Effect

Banking: ROA 1.86% → ROE 11.28%

Spearman $\rho$ = 0.826 between ROA and ROE across the full dataset (p = 4.65×10⁻²²⁵). Banks exemplify the leverage amplifier: modest asset returns multiply into double-digit equity returns. ROE median exceeds ROA median in all 11 countries.
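
The amplification follows from the standard accounting identity below, shown as a sketch. The equity multiplier (assets over equity) is not reported in the source, so the figure on the right is only what the two banking medians imply. A ratio of medians approximates, but does not equal, the median multiplier of individual banks.

$$
\mathrm{ROE} = \frac{\text{Net income}}{\text{Equity}} = \frac{\text{Net income}}{\text{Assets}} \times \frac{\text{Assets}}{\text{Equity}} = \mathrm{ROA} \times \frac{\text{Assets}}{\text{Equity}}, \qquad \frac{11.28}{1.86} \approx 6.06
$$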

Post-COVID

Tourism: only sector with negative median ROE

Median ROA = 0.00%, median ROE = −0.62%, mean ROE = −15.45%. Tourism & Leisure is the sole sector whose median ROE is negative, consistent with an incomplete post-COVID recovery. Kruskal-Wallis for ROE by sector: $\chi^2$ = 46.25, p = 6.3×10⁻⁶.

Convergence

Brazil ≈ Chile in ROA

Brazil and Chile are the two largest markets (376 + 167 = 543 firms, 51% of the sample), yet their ROA distributions are statistically indistinguishable. Wilcoxon: W=23,774, p=0.358. Investors see similar typical asset returns in either market.

Profitability

23% of listed companies lose money

Nearly 1 in 4 firms has negative ROA. Paraguay leads with 94.9% profitable; Bolivia, Ecuador, and Venezuela drag the average. Contingency analysis reveals structural country-level differences in profitability rates.
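
The source does not name the contingency test used. One common choice is a chi-squared test of independence on a country × profitability table, sketched here with hypothetical counts for three made-up countries.

```r
# Hypothetical counts of profitable (ROA > 0) and unprofitable firms.
counts <- matrix(
  c(38,  2,
    45, 15,
    30, 20),
  ncol = 2, byrow = TRUE,
  dimnames = list(country = c("A", "B", "C"),
                  status  = c("profitable", "unprofitable"))
)

# Share of profitable firms per country (row proportions).
share <- prop.table(counts, margin = 1)[, "profitable"]
print(round(100 * share, 1))   # A = 95, B = 75, C = 60

# H0: the profitability rate does not depend on country.
ct <- chisq.test(counts)
ct$parameter   # df = (3 - 1) * (2 - 1) = 2
ct$p.value     # below 0.05 for these toy counts
```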

The mean is always wrong: ROE mean = −108.56% paints a picture of regional value destruction. Reality (median = +8.04%) is that the typical LATAM listed company generates solid returns. A handful of restructuring cases (Azul, Nexpe, Paranapanema) and micro-firms with near-zero assets produce ratios of −300% that obliterate the average. This project demonstrates why non-parametric methods are not optional for emerging-market financial analysis.

5) Hypothesis Tests — Summary

All tests at $\alpha = 0.05$. Non-parametric methods chosen after Anderson-Darling rejected normality.

Anderson-Darling: ROA Normality

AD = 10.0155 | p < 2.2×10⁻¹⁶

Reject H₀ — ROA is not normally distributed, even after outlier removal.

Anderson-Darling: ROE Normality

AD = 1.8190 | p = 0.0001

Reject H₀ — ROE is not normally distributed.
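
A minimal sketch of the normality check, assuming the `nortest` package listed in the tech stack is installed (the code skips the test otherwise). The input is synthetic: a normal core with a few extreme values standing in for a contaminated ratio.

```r
set.seed(42)
# Synthetic stand-in for a contaminated ratio (not the real ROA/ROE data).
x <- c(rnorm(200, mean = 3, sd = 4), -60, -45, 55)

if (requireNamespace("nortest", quietly = TRUE)) {
  ad <- nortest::ad.test(x)                  # H0: x is normally distributed
  cat(sprintf("A = %.4f, p = %.3g\n", ad$statistic, ad$p.value))
  reject_normality <- ad$p.value < 0.05
} else {
  message("Package 'nortest' is not installed; skipping the test.")
  reject_normality <- NA
}
```

A rejection here is what motivates switching to rank-based tests for every later comparison.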

Kruskal-Wallis: ROA by Country

$\chi^2$ = 38.36, df = 8 | p = 6.455×10⁻⁶

Reject H₀ — Significant ROA differences between countries.

Kruskal-Wallis: ROE by Country

$\chi^2$ = 36.96, df = 8 | p = 1.169×10⁻⁵

Reject H₀ — Significant ROE differences between countries.

Kruskal-Wallis: ROA by Sector

$\chi^2$ = 34.47, df = 12 | p = 5.692×10⁻⁴

Reject H₀ — Significant ROA differences between sectors.

Kruskal-Wallis: ROE by Sector

$\chi^2$ = 46.25, df = 12 | p = 6.277×10⁻⁶

Reject H₀ — Significant ROE differences between sectors.
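
The four Kruskal-Wallis tests above share one pattern, sketched here with `stats::kruskal.test` on synthetic groups. Since df equals the number of groups minus one, the reported df = 8 and df = 12 would correspond to 9 country groups and 13 sector groups, presumably because very small groups were left out. That reading is an inference, not something the source states.

```r
set.seed(1)
# Synthetic data: groups X and Y share a location, Z is shifted upward.
firms <- data.frame(
  country = factor(rep(c("X", "Y", "Z"), each = 40)),
  ROA     = c(rnorm(40, 3, 5), rnorm(40, 3, 5), rnorm(40, 9, 5))
)

# H0: all groups come from the same distribution.
kw <- kruskal.test(ROA ~ country, data = firms)

cat(sprintf("chi-squared = %.2f, df = %d, p = %.3g\n",
            kw$statistic, as.integer(kw$parameter), kw$p.value))
```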

Wilcoxon: ROA Brasil vs Chile

W = 23,774 | p = 0.358

Fail to reject H₀ — No significant difference. The two largest markets converge.

Wilcoxon: ROA Energy vs Banking

W = 11,923.5 | p = 9.397×10⁻⁵

Reject H₀ — Energy has significantly higher ROA than Banking.
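
The pairwise comparisons can be sketched with `stats::wilcox.test`. The source does not say whether the tests were one- or two-sided, so the one-sided alternative below is an assumption, and the data are synthetic. The half-integer W reported above (11,923.5) indicates tied values, in which case R falls back to a normal approximation.

```r
set.seed(7)
# Synthetic ROA samples (not the real sector data).
energy  <- rnorm(60, mean = 5.5, sd = 4)
banking <- rnorm(80, mean = 2.0, sd = 3)

# H1: energy tends to take larger values than banking.
wt <- wilcox.test(energy, banking, alternative = "greater", exact = FALSE)

# W equals the number of (energy, banking) pairs where energy is larger,
# with ties counted as one half.
w_manual <- sum(outer(energy, banking, ">")) +
  0.5 * sum(outer(energy, banking, "=="))

cat(sprintf("W = %.1f (manual %.1f), p = %.3g\n",
            wt$statistic, w_manual, wt$p.value))
```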

Spearman: ROA vs ROE

$\rho$ = 0.8258 | p = 4.65×10⁻²²⁵

Reject H₀ — Strong positive monotonic association.

Pearson: ROA vs ROE

r = 0.7767 | p = 4.28×10⁻¹⁸²

Reject H₀. The linear component is also strong; Spearman > Pearson suggests a monotonic but not strictly linear relationship, although extreme values can also depress Pearson's r.
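
To illustrate how the two coefficients can diverge, the toy example below uses a strictly increasing but convex relationship: the ranks agree perfectly, so Spearman's $\rho$ is 1, while Pearson's r falls below 1. The data are synthetic and chosen only to make the contrast visible.

```r
x <- 1:20
y <- exp(x / 3)   # strictly increasing, strongly non-linear (toy relationship)

sp <- cor.test(x, y, method = "spearman")
pe <- cor.test(x, y, method = "pearson")

rho <- unname(sp$estimate)   # 1: perfect monotonic association
r   <- unname(pe$estimate)   # below 1: the relationship is not a straight line

cat(sprintf("Spearman rho = %.3f, Pearson r = %.3f\n", rho, r))
```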

Analytical Modules

Statistical Analysis

R Notebook — Non-Parametric Pipeline

Full CRISP-DM analysis in R (IRkernel on Google Colab). From raw data to hypothesis testing and a financial opinion (dictamen).

14 statistical visualizations (boxplots, QQ, density, scatter)
Anderson-Darling + 4 Kruskal-Wallis + 2 Wilcoxon + Spearman/Pearson
IQR outlier detection & impact analysis
Country & sector tier classification

Dashboard

Executive Dashboard — Financial KPIs

Aggregated KPIs, country & sector breakdowns, top 10 companies, and alert signals — built from the cleaned dataset.

1,036 companies × 11 countries × $5.08T MM total assets
Avg. ROA 2.97% | 75.2% profitable companies
Top 10 by revenue | 18 sector comparison
8 alerts: critical, warnings & positive signals

7) Next Steps

Panel data: extend to multi-year longitudinal analysis for trend identification and causal inference (2015–2024 fiscal years are available).
Risk-adjusted metrics: incorporate Sharpe-like ratios and downside risk measures per country-sector pair.
Sector deep-dives: detailed sub-sector analysis for Energy, Banking, and Metals & Mining.
Interactive dashboard: Shiny/Plotly app with country-sector drill-downs and dynamic filtering.
Investment lens: Tier-1 markets (Peru, Brazil) show the highest median profitability; Tier-3 markets (Ecuador, Bolivia) need governance adjustments before capital allocation.

Tech Stack

R (IRkernel)

Primary language. dplyr, tidyr, readxl for data wrangling on 1,200×26 dataset.

ggplot2 / gridExtra

14 statistical visualizations: boxplots, QQ plots, density overlays, scatter with regression.

nortest / stats

Anderson-Darling, Kruskal-Wallis, Wilcoxon rank-sum, Spearman & Pearson correlation.

knitr / kableExtra

Publication-quality tables, styled HTML output, frequency distributions.

Google Colab

Cloud-based R notebook for reproducible analysis. IRkernel runtime.

CRISP-DM

End-to-end methodology across all 6 phases: business understanding through deployment.

Scope & Limitations

Scope

End-to-end CRISP-DM pipeline: raw data → cleaning → EDA → hypothesis testing → tiers.
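7 rank-based tests (4 Kruskal-Wallis, 2 Wilcoxon, 1 Spearman), plus 2 Anderson-Darling normality checks and a Pearson comparison, all with documented statistics and p-values.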
Country and sector tier rankings based on median ROA/ROE with sufficient sample sizes.
Leverage-profitability dynamics via Spearman correlation ($\rho$ = 0.826).
Excel dashboard with aggregated KPIs for 1,036 companies.

Limitations

Cross-sectional only: majority of data is 2023–2024; no panel/time-series dynamics.
No FX adjustment: financial variables in millions USD, but ratios reflect local reporting.
Survivorship bias: only currently listed firms; delisted/failed companies excluded.
Sample imbalance: Brazil = 35.4% of data; Uruguay (n=8) and Suriname (n=2) are unreliable alone.
Sector classifications from EMIS may not align perfectly across all exchanges.