How accurate is the Haymaker prediction model?

65.2% overall test accuracy on 1,605 fights from 2023-2026. High-confidence picks (>72% model probability) hit at 81.3%.

What machine learning model does Haymaker use?

A weighted ensemble of LightGBM (65%) and CatBoost (35%), tuned with Optuna hyperparameter optimization. The model uses 23+ features including Elo ratings, rolling performance stats, style matchup history, and betting odds.

How often is the model updated?

Fight data and derived stats (Elo, snapshots, styles) are updated after every UFC event. The model is retrained periodically as new training data accumulates.

Does Haymaker beat the betting market?

On the 1,080 of 1,605 test fights with available odds, the model picks the winner 68.1% of the time versus 68.0% for the closing line, which is effectively a tie. The model's edge is on high-confidence picks and on fights without odds, where it provides an independent signal.

Model Transparency

We are fully open about our prediction model's methodology and accuracy. No black boxes.

65.2%

Test Accuracy

1,605 fights (2023-2026)

0.7142

AUC-ROC

Discrimination

0.6167

Log-Loss

vs 0.693 coin flip

0.2146

Brier Score

vs 0.250 coin flip
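The coin-flip baselines quoted for log-loss and Brier score follow from predicting 0.5 for every fight. A minimal sketch in plain Python; the function names and the toy outcomes are illustrative assumptions, not Haymaker's code:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

outcomes = [1, 0, 1, 1, 0, 0]        # made-up results; any mix gives the same baseline
coin_flip = [0.5] * len(outcomes)    # always predict 50%

baseline_log_loss = log_loss(outcomes, coin_flip)  # ln(2), about 0.693
baseline_brier = brier_score(outcomes, coin_flip)  # 0.25
```

Lower is better for both metrics, so 0.6167 and 0.2146 each improve on the corresponding baseline.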

How It Works

Data

Every UFC fight since 1994, scraped from UFCStats.com. Fighter stats computed as point-in-time snapshots to prevent data leakage.
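As a rough illustration of a point-in-time snapshot, the sketch below computes a fighter's average using only bouts dated strictly before the fight being predicted. The record layout, the fighter name, and the helper `snapshot_avg_strikes` are hypothetical and stand in for whatever the real pipeline uses:

```python
from datetime import date

# Hypothetical records: (fight_date, fighter, significant_strikes_landed)
FIGHTS = [
    (date(2021, 3, 6), "Fighter A", 40),
    (date(2022, 7, 2), "Fighter A", 60),
    (date(2023, 1, 14), "Fighter A", 80),
]

def snapshot_avg_strikes(fights, fighter, as_of):
    """Average strikes landed, using only fights strictly before `as_of`."""
    prior = [s for d, f, s in fights if f == fighter and d < as_of]
    return sum(prior) / len(prior) if prior else None

# Features for the 2023-01-14 fight see only the two earlier bouts,
# so the outcome being predicted cannot leak into its own inputs.
pre_fight = snapshot_avg_strikes(FIGHTS, "Fighter A", date(2023, 1, 14))
debut = snapshot_avg_strikes(FIGHTS, "Fighter A", date(2021, 3, 6))  # no history yet
```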

Model

LightGBM + CatBoost ensemble (65/35 blend), tuned with Optuna (50 trials each). Trained on pre-2022 data, validated on 2022, tested on 2023-2026.
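A weighted blend of this kind can be sketched as a per-fight weighted average of the two models' probabilities. The function `blend` and the placeholder probabilities are assumptions for illustration; the weight is left as a parameter rather than hard-coded:

```python
def blend(p_lgbm, p_catboost, w_lgbm=0.65):
    """Weighted average of two models' win probabilities for the same fighters."""
    if not 0.0 <= w_lgbm <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [w_lgbm * a + (1.0 - w_lgbm) * b for a, b in zip(p_lgbm, p_catboost)]

lgbm_probs = [0.70, 0.40]       # placeholder LightGBM outputs
catboost_probs = [0.60, 0.50]   # placeholder CatBoost outputs
ensemble = blend(lgbm_probs, catboost_probs)  # roughly [0.665, 0.435]
```

Because both inputs are probabilities and the weights sum to 1, the blended value stays between the two model outputs.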

Features

45 features including Elo ratings, rolling fight stats, defensive metrics, style matchups, and market odds when available.

Confidence Scaling

The model's confidence labels reflect real accuracy. When we say “Strong”, we mean it.

Toss-up (<55%)

50.0%

n=313

Lean (55-62%)

58.7%

n=322

Confident (62-72%)

65.0%

n=504

Strong (>72%)

81.3%

n=466
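The tier labels can be read as a simple mapping from the picked fighter's probability to a bucket. A minimal sketch, assuming the tier is based on the favored side's probability and that values exactly on a boundary fall as coded below (the page does not specify boundary handling):

```python
def confidence_tier(p_win):
    """Map a win probability to a tier using the favored side's probability."""
    p_pick = max(p_win, 1.0 - p_win)  # probability of whichever fighter is picked
    if p_pick < 0.55:
        return "Toss-up"
    if p_pick < 0.62:
        return "Lean"
    if p_pick <= 0.72:                # "Strong" is defined as strictly above 72%
        return "Confident"
    return "Strong"

# The four tier sample sizes cover the whole 1,605-fight test set.
tier_counts = {"Toss-up": 313, "Lean": 322, "Confident": 504, "Strong": 466}
total_fights = sum(tier_counts.values())
```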

Calibration

When the model says 70%, does that fighter actually win about 70% of the time? The closer the actual win rate is to the mean prediction (a difference near 0pp), the better calibrated the model.

Predicted Range	Mean Predicted	Actual Win Rate	Fights	Difference (actual - predicted)
10-20%	17.7%	23.7%	38	+6.0pp
20-30%	25.6%	25.7%	171	+0.1pp
30-40%	35.1%	32.2%	239	-2.9pp
40-50%	44.8%	48.3%	331	+3.5pp
50-60%	54.8%	57.6%	304	+2.7pp
60-70%	64.9%	64.1%	265	-0.8pp
70-80%	74.5%	75.0%	152	+0.5pp
80-90%	84.7%	93.8%	96	+9.1pp
90-100%	91.0%	88.9%	9	-2.1pp
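A table like this one can be produced by grouping predictions into equal-width probability bins and comparing each bin's mean prediction with its observed win rate. The sketch below is one way to do that in plain Python; the function name, the bin-edge handling, and the toy data are assumptions:

```python
def calibration_table(y_true, p_pred, n_bins=10):
    """Bin predictions by probability and compare predicted vs actual win rate."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx].append((y, p))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip empty bins, as the table above does for 0-10%
        mean_pred = sum(p for _, p in members) / len(members)
        actual = sum(y for y, _ in members) / len(members)
        rows.append({
            "bin_index": idx,
            "mean_predicted": mean_pred,
            "actual_win_rate": actual,
            "fights": len(members),
            "difference_pp": 100 * (actual - mean_pred),
        })
    return rows

# Toy data: four predictions in the 70-80% bin, three of which won.
rows = calibration_table([1, 1, 1, 0], [0.72, 0.74, 0.76, 0.78])
```

Bins with few fights, such as the 9 fights in the 90-100% range, give noisy win rates, so their differences should be read with caution.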

Performance by Weight Class

Weight Class	Accuracy	Fights
Women's Featherweight	80.0%	5
Middleweight	70.3%	195
Women's Flyweight	69.3%	88
Lightweight	68.2%	217
Catch Weight	68.0%	25
Bantamweight	65.2%	184
Heavyweight	65.1%	106
Women's Bantamweight	64.5%	62
Flyweight	64.3%	129
Light Heavyweight	63.0%	108
Featherweight	62.6%	190
Welterweight	61.4%	189
Women's Strawweight	59.8%	107

What Drives Predictions

Top features ranked by SHAP importance (mean absolute impact on predictions).

Rank	Feature	Mean |SHAP|
1	Market odds lean	0.3527
2	Age difference	0.1701
3	Elo rating gap	0.1377
4	Damage absorption rate	0.0794
5	Takedown rate advantage	0.0718
6	Striking defense advantage	0.0714
7	Striking volume advantage	0.0659
8	Reach advantage	0.0606
9	Striking accuracy edge	0.0547
10	Knockdown rate advantage	0.0368
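"Mean absolute impact" means averaging the absolute SHAP value of each feature across fights, then sorting. A minimal sketch that assumes the per-fight SHAP values are already available as a list of rows; the feature names and numbers here are made up:

```python
def mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across all rows."""
    n_rows = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up SHAP values: rows are fights, columns are features.
names = ["odds_lean", "age_diff", "elo_gap"]
values = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, -0.15],
]
ranking = mean_abs_shap(values, names)
```

Taking the absolute value matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero.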

Honest Assessment: Model vs Market

On fights with betting odds (1,080 of 1,605 test fights), the closing line achieves 68.0% accuracy — and our model matches it at 68.1% on the same subset. Overall test accuracy is 65.2%.

Where our model adds value:

Fights without odds data — we still predict at 59.0% accuracy
Structural analysis: Elo, style matchups, and stat differentials give context the line doesn't
Early line detection: spots value before lines sharpen
High-confidence picks (>72% model probability) hit at 81.3%
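The subset figures can be cross-checked against the overall accuracy with a weighted average. The sketch below uses only numbers quoted on this page; the small gap to 65.2% is what rounding of the two subset accuracies would produce:

```python
n_total = 1605
n_odds = 1080
n_no_odds = n_total - n_odds   # 525 fights without odds

acc_odds = 0.681               # model accuracy where odds exist
acc_no_odds = 0.590            # model accuracy where they do not

# Fight-weighted average of the two subsets, about 0.651.
overall = (n_odds * acc_odds + n_no_odds * acc_no_odds) / n_total
```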

Training Details

Training Set

6,288 fights

1994-03-11 to 2021-12-18

Validation Set

505 fights

2022-01-15 to 2022-12-17

Test Set

1,644 fights

2023-01-14 to 2026-03-28
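A chronological split like this one assigns each fight to a set purely by date, so the model never trains on fights that happen after the ones it is evaluated on. A minimal sketch, with year-end cutoffs assumed from the date ranges listed (the exact cutoffs used by Haymaker are not stated):

```python
from datetime import date

TRAIN_END = date(2021, 12, 31)  # assumed cutoff; last listed training fight is 2021-12-18
VAL_END = date(2022, 12, 31)    # assumed cutoff; last listed validation fight is 2022-12-17

def assign_split(fight_date):
    """Return 'train', 'val', or 'test' based only on the fight date."""
    if fight_date <= TRAIN_END:
        return "train"
    if fight_date <= VAL_END:
        return "val"
    return "test"

splits = [assign_split(d) for d in (date(2021, 12, 18), date(2022, 1, 15), date(2023, 1, 14))]
```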

Ensemble Weights

LightGBM 50% / CatBoost 50%

Validation log-loss: 0.6023

What Changed in v2

Dropped team features — removed gym/team encodings that added noise without improving accuracy.
Added interaction features — offensive efficiency (strikes landed per absorbed) captures two-way matchup dynamics better than raw stat diffs.
Adaptive Elo K-factor — Elo ratings now update faster for early-career fighters and slow down for veterans, improving rating responsiveness.
Data-driven confidence thresholds — tier boundaries (toss-up / lean / confident / strong) are now set from observed accuracy curves instead of arbitrary cutoffs.
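To make the adaptive K-factor idea concrete, here is a sketch of an Elo update where K shrinks as a fighter accumulates bouts. The expected-score formula is standard Elo; the K values (40 down to 20), the 10-fight ramp, and the linear schedule are hypothetical choices, not Haymaker's actual parameters:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for fighter A against fighter B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def k_factor(num_prior_fights, k_new=40.0, k_veteran=20.0, ramp_fights=10):
    """Hypothetical schedule: K decays linearly from k_new to k_veteran."""
    progress = min(num_prior_fights, ramp_fights) / ramp_fights
    return k_new + (k_veteran - k_new) * progress

def update(rating_a, rating_b, a_won, fights_a, fights_b):
    """Update both ratings, each with its own experience-dependent K."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k_factor(fights_a) * (score_a - exp_a)
    new_b = rating_b + k_factor(fights_b) * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# A debuting fighter beats a 20-fight veteran with the same rating:
# the newcomer gains 20 points while the veteran loses only 10.
new_a, new_b = update(1500, 1500, True, fights_a=0, fights_b=20)
```

With per-fighter K values the update is no longer zero-sum, which is a known trade-off of this kind of scheme.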