NuroBets.ai

/ METHODOLOGY

How the AI actually generates picks

We get asked "what's the AI" a lot. Here's the full six-stage pipeline, with no hand-waving. If we don't explain the math, you shouldn't trust the picks.

Launch status: stages 1, 2, and 4 live via stub data. Stage 3 (real XGBoost) and stages 5-6 (CLV + retrain loop) ship within 4 weeks of launch. Follow the build-in-public updates on the blog.

STAGE 1: Data intake

Live odds from The Odds API across 12 retail + sharp books (DraftKings, FanDuel, BetMGM, Caesars, Circa, Pinnacle, plus regional/offshore). Injury feeds from team beat writers + official sources. Weather feeds for outdoor games. Line movement history with timestamps down to the minute.

  • 12-book odds scraper running every 5 min
  • Injury + lineup feeds polled every 15 min on game days
  • Weather (wind + precip) for outdoor sports every 30 min
  • Historical odds + results archive (5 seasons per major league)
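The polling cadence above can be sketched as a simple due-check. This is a minimal illustration, not the production scheduler; the feed names and `Feed` structure are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Feed:
    name: str
    interval_s: int       # polling interval in seconds
    last_polled_s: float  # unix timestamp of last successful poll

# Hypothetical registry mirroring the cadences listed above.
FEEDS = [
    Feed("odds_12_books", 5 * 60, 0.0),
    Feed("injuries_lineups", 15 * 60, 0.0),
    Feed("weather_outdoor", 30 * 60, 0.0),
]

def feeds_due(feeds, now_s):
    """Return the feeds whose polling interval has elapsed."""
    return [f for f in feeds if now_s - f.last_polled_s >= f.interval_s]
```

A scheduler loop would call `feeds_due` each tick, poll whatever comes back, and stamp `last_polled_s` on success.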

STAGE 2: Feature engineering

Raw data becomes ~80 features per game, per sport: pace-adjusted efficiency, rest differential, home/road splits, usage rates, exponentially recency-weighted form, schedule density, and opponent-adjusted metrics. The feature set is versioned and reproducible.

  • 80+ per-game features across NFL / NBA / MLB / NHL / NCAAF / NCAAB
  • Pace-adjusted (not raw) efficiency numbers per team
  • Recency-weighted recent form (last 5 games > last 10 games > last 20)
  • Opponent-adjusted variance, not just averages
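The recency weighting can be sketched as an exponential decay over game age. The half-life of 5 games here is an illustrative assumption, not the production constant:

```python
def recency_weights(n_games, half_life=5.0):
    """Exponential weights, normalized to sum to 1.
    The most recent game counts most; a game `half_life`
    games old counts half as much as the latest one."""
    raw = [0.5 ** (age / half_life) for age in range(n_games)]  # age 0 = most recent
    total = sum(raw)
    return [w / total for w in raw]

def weighted_form(stat_values, half_life=5.0):
    """Recency-weighted average of a per-game stat, most recent first."""
    weights = recency_weights(len(stat_values), half_life)
    return sum(v * w for v, w in zip(stat_values, weights))
```

With this scheme the last 5 games dominate the last 10, which dominate the last 20, matching the ordering in the bullet above.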

STAGE 3: XGBoost ensemble scoring

An ensemble of gradient-boosted decision trees per sport, trained on 3+ seasons of historical data with walk-forward cross-validation (the model never trains on future games and never leaks results). It outputs a win probability and a confidence tier for each market per game, with sub-100ms inference.

  • Per-sport ensemble (not one-size-fits-all across all leagues)
  • Walk-forward CV prevents data leakage
  • Out-of-sample test on last season held out at training
  • Inference served via lightweight FastAPI, <100ms per prediction
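The walk-forward split logic can be sketched in pure Python. This generates the train/test index windows only; the XGBoost fit itself is omitted, and `min_train` / `n_folds` are illustrative defaults, not the production values:

```python
def walk_forward_splits(n_games, n_folds=4, min_train=20):
    """Yield (train_idx, test_idx) pairs where the test window
    always follows the train window in time -- the model is never
    fit on games that happen after the ones it is evaluated on."""
    test_size = (n_games - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = min(train_end + test_size, n_games)
        yield list(range(train_end)), list(range(train_end, test_end))
```

Each successive fold trains on a longer history and tests on the next block of games, which is what prevents the leakage a shuffled K-fold split would introduce.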

STAGE 4: SHAP explainability

Every prediction is decomposed into per-feature SHAP values. Users see not just a confidence tier but which factors contributed: +0.8% from rest differential, +0.4% from injury impact, -0.2% from line movement, and so on. No black boxes. Users who disagree with the weighting can override with context the model can't see.

  • SHAP values computed per pick, stored with the pick record
  • Top 5 factors surfaced in the Discord embed
  • Full factor breakdown visible in Elite-tier dashboard
  • SHAP drift monitoring flags when a factor starts behaving oddly
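SHAP values are additive by construction: a baseline expectation plus the per-feature contributions reconstructs the model's output exactly. A toy illustration of that decomposition, using the example numbers from the paragraph above (the baseline and factor names are illustrative):

```python
# Illustrative baseline: the model's average win probability
# before seeing this game's features.
base_win_prob = 0.524

# Per-feature SHAP contributions in probability points,
# matching the example above.
contributions = {
    "rest_differential": +0.008,
    "injury_impact":     +0.004,
    "line_movement":     -0.002,
}

# Additivity: baseline + contributions = this game's win probability.
win_prob = base_win_prob + sum(contributions.values())

# The factors surfaced in the Discord embed: largest absolute impact first.
top_factors = sorted(contributions, key=lambda k: abs(contributions[k]), reverse=True)
```

Because the decomposition sums exactly to the prediction, no part of the model's output is left unexplained.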

STAGE 5: Closing Line Value tracker

Every posted pick is compared against the final closing number across all 12 books when the game starts. Positive CLV means we beat the market; negative means we didn't. CLV over a 100+ pick window is the single honest indicator of whether the model has edge.

  • Settlement job runs nightly across all tracked picks
  • Per-sport + per-market + per-confidence-tier CLV aggregated
  • Public CLV rolling 30 / 60 / 90 day on Elite dashboard
  • Model target: sustained +2% average CLV (sharp territory)
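CLV reduces to comparing implied probabilities at bet time versus close. A minimal sketch with American odds (the exact production formula, e.g. any no-vig adjustment, may differ):

```python
def implied_prob(american_odds):
    """Implied win probability of an American price (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def clv_pct(bet_odds, closing_odds):
    """Closing Line Value in probability points: positive means
    the price we bet was better than where the market closed."""
    return (implied_prob(closing_odds) - implied_prob(bet_odds)) * 100
```

For example, betting a side at -105 that closes -120 yields roughly +3.3 points of CLV: the market moved toward the pick after it was posted.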

STAGE 6: Daily retrain feedback loop

Settled results feed back into the training set. The model retrains on a rolling window (weekly full retrain, daily incremental fine-tune on fresh data), so tomorrow's predictions are informed by yesterday's results. Over 90 days the model measurably sharpens against the closing line.

  • Weekly full retrain with updated feature importances
  • Daily incremental fine-tune with the most recent game nights
  • A/B shadow models evaluated before any live swap
  • Rollback automatic if live CLV drops 0.5+ points vs prior version over 50 picks
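The automatic-rollback rule in the last bullet reduces to a simple guard. A sketch (function and parameter names are illustrative, not the production code):

```python
def should_rollback(live_clv, prior_clv, picks_settled,
                    drop_threshold=0.5, min_picks=50):
    """Roll back to the prior model version when the live model's
    average CLV trails the prior version by at least `drop_threshold`
    points over at least `min_picks` settled picks."""
    if picks_settled < min_picks:
        return False  # not enough evidence yet
    return (prior_clv - live_clv) >= drop_threshold
```

Requiring a minimum pick count before acting keeps a few unlucky nights from triggering a rollback on noise.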