PnL lies. Calibration doesn't.
VIGIL grades every Polymarket wallet A through F on actual forecasting skill — Brier Skill Score against the market's implied probability, not dollars earned. Paste any wallet. See the grade render below in about two seconds. We've graded over 23,000 wallets and about half of the self-declared "whales" on this platform aren't actually sharp.
Six dimensions. One grade. Every grade backed by onchain evidence.
Each wallet is scored across six weighted dimensions of forecasting skill. Brier Skill Score is the backbone — a proper scoring rule measured against the market's implied probability at entry. Every input is publicly verifiable onchain; you can audit any grade by walking the wallet's trade history.
Read the one-page methodology (inline, no page jump)
Brier Skill Score — the reference
BSS = 1 − (BrierForecast / BrierReference), where BrierReference is the Brier score of the market-implied probability at your entry. A positive BSS means your forecast beat the consensus. A negative BSS means it underperformed the consensus. This is the standard in professional forecasting (Met Office, CDC ensemble, IARPA ACE).
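A minimal sketch of that formula in Python, assuming each resolved bet has already been reduced to a forecast probability, a market-implied probability at entry, and a 0/1 outcome. How VIGIL derives the forecast probability from a trade is not specified here, so the inputs and function names are illustrative only.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    forecast: Sequence[float],
    market: Sequence[float],
    outcomes: Sequence[int],
) -> float:
    """BSS = 1 - BrierForecast / BrierReference, with the market as reference."""
    reference = brier_score(market, outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(forecast, outcomes) / reference


# Hypothetical three-bet record (made-up numbers, not real wallet data).
forecast = [0.8, 0.3, 0.9]   # assumed forecast probabilities
market = [0.6, 0.5, 0.7]     # market-implied probabilities at entry
outcomes = [1, 0, 1]         # how each market resolved

bss = brier_skill_score(forecast, market, outcomes)
print(round(bss, 2))  # 0.72
```

In this toy record the forecast is closer to the outcome than the market on every bet, so the BSS is strongly positive.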
Why non-linear mapping?
A linear BSS→score would reward a BSS of 0.02 almost as much as 0.20. But the jump from 0 to 0.10 is roughly the difference between "noise" and "calibrated trader." We use a logistic curve that compresses the middle and expands the tails — small BSS deltas near zero matter less; BSS > 0.10 gets meaningful lift. See the curve below.
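To make the shape concrete, here is a hedged sketch of a logistic BSS-to-score mapping. The `midpoint` and `steepness` values are placeholders chosen for illustration; VIGIL's actual curve parameters are not stated on this page.

```python
import math


def bss_to_score(bss: float, midpoint: float = 0.10, steepness: float = 20.0) -> float:
    """Map BSS onto a 0-100 scale with a logistic curve.

    midpoint and steepness are hypothetical, not VIGIL's published values.
    The curve is steepest at `midpoint` and flattens far away from it.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (bss - midpoint)))


# The same +0.02 improvement is worth more near the midpoint than near zero.
gain_near_zero = bss_to_score(0.02) - bss_to_score(0.00)
gain_near_midpoint = bss_to_score(0.12) - bss_to_score(0.10)
print(gain_near_midpoint > gain_near_zero)  # True
```

With these placeholder values the curve is relatively flat around BSS = 0 and steepest around BSS = 0.10. Note that any logistic curve also saturates at both extremes.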
Validation
18 months of resolved-market data as training; final 90 days as holdout. Grade-to-out-of-sample-BSS correlation ρ ≈ 0.71 on the holdout. Weights ridge-regularized against rank stability. Full train/test notebook linked below.
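The holdout check can be sketched as a rank correlation between the grade assigned on training data and the BSS later observed on the holdout window. This assumes ρ is Spearman's rank correlation, which the text does not confirm, and the sample numbers are invented.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: training-period scores vs. holdout-period BSS.
train_scores = [35.0, 48.0, 61.0, 72.0, 90.0]
holdout_bss = [-0.02, -0.05, 0.08, 0.04, 0.21]
rho = spearman_rho(train_scores, holdout_bss)
```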
Proven-winner tiers
A wallet with 40 resolved bets and BSS +0.15 is probably calibrated. A wallet with 500 resolved bets and BSS +0.15 is almost certainly calibrated. Tiered bonuses reward sustained evidence: +3 at 100 resolved bets, +5 at 250, +8 at 500.
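A small sketch of the tier lookup, assuming the thresholds are inclusive and the bonuses do not stack (only the highest tier reached applies). Both are assumptions, and any extra eligibility conditions are left out.

```python
def proven_winner_bonus(resolved_bets: int) -> int:
    """Return the bonus points for the highest tier reached.

    Assumes inclusive thresholds and non-stacking bonuses.
    """
    tiers = [(500, 8), (250, 5), (100, 3)]  # (min resolved bets, bonus points)
    for threshold, bonus in tiers:
        if resolved_bets >= threshold:
            return bonus
    return 0


print(proven_winner_bonus(40))   # 0
print(proven_winner_bonus(260))  # 5
print(proven_winner_bonus(500))  # 8
```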
Uncertainty bands
Every grade ships with a 95% confidence interval, bootstrapped from the wallet's resolved-bet record (10k resamples). Under 100 bets the CI is wide enough that we surface it prominently. Over 500, it tightens to ±2–3 points.
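One way such an interval could be computed is a percentile bootstrap over the resolved bets. The sketch below bootstraps the BSS itself rather than the final grade, and the function name, inputs, and percentile method are assumptions rather than VIGIL's confirmed procedure.

```python
import random
from typing import Sequence, Tuple


def bootstrap_bss_ci(
    forecast: Sequence[float],
    market: Sequence[float],
    outcomes: Sequence[int],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for BSS, resampling bets with replacement."""
    n = len(outcomes)
    if n == 0 or len(forecast) != n or len(market) != n:
        raise ValueError("need equal-length, non-empty inputs")
    # Per-bet squared errors for the wallet and for the market reference.
    f_err = [(f - o) ** 2 for f, o in zip(forecast, outcomes)]
    m_err = [(m - o) ** 2 for m, o in zip(market, outcomes)]
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ref = sum(m_err[i] for i in idx)
        if ref == 0:
            continue  # BSS is undefined for this resample; skip it
        stats.append(1.0 - sum(f_err[i] for i in idx) / ref)
    if not stats:
        raise ValueError("no resample had a usable reference score")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

With few resolved bets the resampled BSS values spread widely, which is why the interval is wide for small records and narrows as the record grows.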
vigil-v1.20.2-validation.ipynb · 18mo train / 90d test · ρ=0.71 on holdout
How Brier Skill Score becomes a letter.
The mapping from BSS to VIGIL score. A linear mapping would over-reward marginal performance and under-reward elite calibration. Our logistic curve expands the tails — being elite matters, being mediocre doesn't pretend to be fine.
Worse than the consensus. Probably a copy trader or a gambler.
Roughly at market. ~5 calibration points.
Calibrated trader. Curve steepens here on purpose.
Elite. BSS > 0.25 is world-class, very rare at sample.
Real wallets. Real grades. Same data you can audit.
Three live examples. PnL alone won't tell you which one to copy. The grades do. Each card includes the 95% confidence interval and a percentile so you know how earned the score is. Numbers from today's crawl — the live ticker above confirms freshness.
Where VIGIL sits vs. everything else.
Four ways people size up Polymarket traders today. Only one measures forecasting skill.
| | VIGIL | Polymarket leaderboard | Nansen / Arkham | Self-reported tweets |
|---|---|---|---|---|
| Measures forecasting skill (not PnL) | ✓ | ✗ PnL only | ✗ onchain flow | ✗ vibes |
| Brier Skill Score backbone | ✓ | ✗ | ✗ | ✗ |
| Confidence intervals on every grade | ✓ (95% CI, bootstrap) | ✗ | ✗ | ✗ |
| Catches penny-lottery + bot patterns | ✓ | ✗ bots can top it | partial | ✗ |
| Free · no sign-up · free API tier | ✓ | ✓ | ✗ $150+/mo | ✓ |
| Chrome extension injects inline badges | ✓ | ✗ | ✗ | ✗ |
| Open source · verifiable onchain | ✓ MIT | ✗ | ✗ black box | ✗ |
| Time-decays stale activity | ✓ rolling 90d weight | ✗ lifetime PnL | partial | ✗ |
| Published validation notebook | ✓ ρ=0.71 holdout | ✗ | ✗ | ✗ |
A D-grade wallet can still be up $500K.
PnL measures outcomes. The grade measures the quality of the forecast. A lucky gambler with huge PnL and a penny-lottery pattern gets a D. A small, calibrated, high-BSS wallet with modest PnL gets an A.
// WHAT WE PENALIZE
- Penny-lottery spraying (80%+ sub-$0.10 bets with negative BSS → hard cap at D/49)
- Receive-only wallets with no outbound transactions
- High-concentration, all-in-on-one-binary patterns
- Negative Brier Skill Score regardless of bottom-line PnL
- Under-sampling (<50 resolved bets caps at C until proven)
- Stale activity — idle wallets time-decay after 90 days
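The first penalty in the list above can be sketched as a simple cap. This assumes the 80% share is counted by number of bets (not dollar volume) and that entry prices are expressed in dollars per share; the function and parameter names are illustrative.

```python
from typing import Sequence


def apply_penny_lottery_cap(
    score: float,
    entry_prices: Sequence[float],
    bss: float,
) -> float:
    """Cap the score at 49 when 80%+ of bets are sub-$0.10 and BSS is negative.

    The share is computed by bet count, which is an assumption.
    """
    if not entry_prices:
        return score
    cheap_share = sum(1 for p in entry_prices if p < 0.10) / len(entry_prices)
    if cheap_share >= 0.80 and bss < 0:
        return min(score, 49.0)
    return score


# Nine of ten bets under $0.10 with negative BSS: capped.
print(apply_penny_lottery_cap(72.0, [0.05] * 9 + [0.50], bss=-0.10))  # 49.0
# Same pattern with positive BSS: untouched.
print(apply_penny_lottery_cap(72.0, [0.05] * 9 + [0.50], bss=0.10))   # 72.0
```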
// WHAT WE REWARD
- Positive Brier Skill Score across 100+ resolved markets
- Calibrated entries near the eventual resolved probability
- Stable IQR — calibrated wins that repeat across eras
- Market diversification across categories and time horizons
- Cross-category transfer (politics + sports + crypto + news)
- Proven-winner bonus at 500+ resolved bets with positive BSS and PnL (+8 pts)
Free JSON API. One endpoint to learn.
VIGIL's scoring engine is the product. The landing page is a skin over it. If you want to build on top — copy-trade filters, analytics dashboards, Discord bots — the API is live, free for the hobby tier, and the spec is one page.
// API Tiers
No auth on hobby. Rate-limited by IP. If you need more, email gatson32@gmail.com and tell me what you're building — I'll turn the spigot up.
Trust badges on every Polymarket profile.
Install once. Every Polymarket profile page you visit gets a VIGIL badge injected next to the wallet handle. Works silently. Free. Open source. Zero tracking.
Before a trade, before a copy, before a tweet — know who you're looking at. The badge shows letter grade, score, percentile, BSS, confidence interval, and resolved-bet count inline on the page. Manifest V3. Zero tracking. Site-scoped to polymarket.com.
Solo build. Public wallet. No investors.
In a trust product, the people behind it matter more than the logo. VIGIL is built by one person, in public, over the last six months. You can read every commit, fork the model, or tell me I'm wrong on X.
Chris Gatson
I started VIGIL because I kept watching Polymarket "whales" get copied on X while quietly losing money on a calibration-adjusted basis. PnL rewards size. VIGIL rewards being right. If you find a bug in the model, email me and I'll credit you in the next release notes. See the changelog at the top of this page for the first paid-in-credit bug fix.
Harder questions.
Quick objections live near the proof cards above. These are the ones that require a paragraph.
Single-player today. Social tomorrow.
Grading one wallet at a time is step one. Grading a group of wallets — your crew, a Discord, a copy-trade list — is where this gets fun.
Grade My Crew
Paste 5–50 wallets. Get a composite skill score and the per-member breakdown. Share the image. Name and shame the weakest link.
ships may · join waitlist
Tier-Change Alerts
Subscribe to any wallet. Get pinged when it drops from A to B, or climbs out of D. Free on the Pro tier. Webhooks, email, or Discord.
ships june
Grade-vs-Grade
Two-wallet showdown. Same markets, different calibration. Who actually forecasted better? Shareable image. Perfect for Twitter.
ships june