How does VIGIL score Polymarket wallets?

VIGIL uses six weighted dimensions - calibration (25%), profitability (20%), live edge (20%), consistency (15%), discipline (10%), and sample size (10%) - derived from Brier Skill Score against market-implied probability, non-linear logistic curves, and on-chain USDC verification on Base.

What is Brier Skill Score?

Brier Skill Score (BSS) measures how much better a forecaster's predictions are than a reference forecast. The textbook reference is a naive 'always predict the base rate' baseline; VIGIL uses the market-implied probability at entry. Positive means skill. Negative means worse than the reference. VIGIL uses BSS as the calibration primitive - same methodology as IARPA's superforecaster program.

Is VIGIL free?

Yes - scoring any Polymarket wallet is free with no signup. API access: Hobby $0/mo, Pro $29/mo, Team $149/mo, Enterprise from $999/mo.

Polymarket Geopolitics · Research Preview · scanning now · 47% of volume

The trust engine for Polymarket geopolitics.

Iran peace deal. Strait of Hormuz. Elections. Tail-risk. Geopolitics is 47% of Polymarket volume — and the loudest traders are rarely the most right. VIGIL grades every wallet A–F on actual forecasting skill: Brier Skill Score + calibration + bootstrap 95% CI on every grade. Skill-weighted consensus across A/B wallets. Paste any wallet — grade renders in ~2s.

Rather browse? Top 500 by grade → · How we compare to Nansen → · curl the API →

23,659

Wallets Scanned

as of today · live

~500

On Leaderboard

min. 50 resolved bets

A · B · C · D · F

Grade Distribution

±2.3

95% CI at 500+ bets

10k bootstrap resamples

RECENT

A influenz.eth 95 · 2m ago
C 0x8a4c…d1f2 68 · 3m ago
B mesterton 81 · 4m ago
D 0x22fe…aa03 42 · 5m ago
A products 79 · 6m ago
F 0xbotcluster1 18 · 7m ago
C 123987456 84 · 8m ago
B degenpredict 74 · 9m ago
D 0xf00d…beef 49 · 10m ago
A silentsharp.eth 91 · 11m ago

// what does the calibrated money think?

Skill-weighted consensus. Market price vs. graded-wallet probability.

Every active Polymarket market now gets a second number: the weighted implied probability from every A/B/C/D wallet in our universe, stacked against the market price. When they agree, the market is efficient. When they diverge, someone is wrong.


Weighting: grade × √(stake) × exp(−days/30) · 1000-resample bootstrap CI · 5-min cache · methodology →
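The weighting line above can be read as a weighted average of each graded wallet's implied probability. A minimal sketch under stated assumptions: the numeric value per letter grade (`GRADE_WEIGHT`), the `Position` type, and the reading of "days" as days since entry are all hypothetical, since the page does not specify them.

```python
import math
from dataclasses import dataclass

# Hypothetical numeric value per letter grade; the page does not say how
# "grade" enters the weight.
GRADE_WEIGHT = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}


@dataclass
class Position:
    grade: str           # wallet's letter grade
    stake_usdc: float    # position size
    days_old: float      # days since entry (assumed meaning of "days")
    implied_prob: float  # probability of YES implied by the wallet's position


def position_weight(p: Position) -> float:
    """grade x sqrt(stake) x exp(-days/30), following the weighting line."""
    return GRADE_WEIGHT[p.grade] * math.sqrt(p.stake_usdc) * math.exp(-p.days_old / 30)


def skill_weighted_consensus(positions: list[Position]) -> float:
    """Weighted mean of implied probabilities across graded wallets."""
    weights = [position_weight(p) for p in positions]
    total = sum(weights)
    if total == 0:
        raise ValueError("no weight in the position set")
    return sum(w * p.implied_prob for w, p in zip(weights, positions)) / total
```

The square root damps the influence of very large stakes, and the 30-day exponential decay lets fresh positions count for more than stale ones.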

// how it works

Six dimensions. One grade. Every grade backed by onchain evidence.

Each wallet is scored across six weighted dimensions of forecasting skill. Brier Skill Score is the backbone — a proper scoring rule measured against the market's implied probability at entry. Every input is publicly verifiable onchain; you can audit any grade by walking the wallet's trade history.

25%

Calibration

Were your stated odds close to reality?

Brier Skill Score vs market-implied probability, mapped non-linearly. BSS > 0.10 elite. BSS > 0.25 world-class.

20%

Profitability

Did calibration actually print?

ROI-scaled, not raw dollars. +30% ROI caps the score. Can't farm with size alone.

20%

Live Edge

Are your open positions priced better than market?

Position-weighted delta between your entry price and current mark-to-market probability.

15%

Consistency

Is the edge repeatable or one lucky week?

Inter-quartile range of bet returns. Rewards steady calibration. Doesn't punish winners.

10%

Discipline

Do you diversify or all-in on one binary?

Diversification across markets × categories × time. Concentration penalty for single-binary farmers.

10%

Sample Size

Is the grade earned or hallucinated?

Tiered bonuses at 100 / 250 / 500+ resolved bets with positive BSS. Under 50 caps at C.
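One way to read the six weights is as the maximum points each dimension can contribute to a 100-point score. A rough sketch, assuming that reading; the linear ROI ramp and the function names are hypothetical, and the bonuses and caps described elsewhere on this page are not modeled, so the sum need not equal a published score.

```python
# Maximum points per dimension, taken from the stated weights (sums to 100).
MAX_POINTS = {
    "calibration": 25.0,
    "profitability": 20.0,
    "liveEdge": 20.0,
    "consistency": 15.0,
    "discipline": 10.0,
    "sampleSize": 10.0,
}

ROI_CAP = 0.30  # "+30% ROI caps the score"


def profitability_points(roi: float) -> float:
    """ROI-scaled points. The linear ramp from 0 to the cap is an assumption."""
    scaled = min(max(roi, 0.0), ROI_CAP) / ROI_CAP
    return MAX_POINTS["profitability"] * scaled


def composite_score(dimensions: dict[str, float]) -> float:
    """Sum the per-dimension points, clamping each to [0, its maximum]."""
    return sum(
        min(max(dimensions[name], 0.0), cap) for name, cap in MAX_POINTS.items()
    )
```

Because ROI is capped, a wallet at +60% ROI earns the same profitability points as one at +30%, which is the "can't farm with size alone" property.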

Read the one-page methodology (inline, no page jump)

Brier Skill Score — the reference

BSS = 1 − (BrierForecast / BrierReference), where BrierForecast is the Brier score of your forecasts and BrierReference is the Brier score of the market-implied probability at your entry. A positive BSS means your forecast beat the consensus. Negative means you underperformed it. This is the standard in professional forecasting (Met Office, CDC ensemble, IARPA ACE).
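The formula above is short enough to sketch directly. This is an illustration, not VIGIL's code: the function names are ours, and how a wallet's trade is turned into a "forecast probability" is not specified on this page, so the inputs are simply assumed to be given.

```python
def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(forecast_probs, market_probs, outcomes):
    """BSS = 1 - BrierForecast / BrierReference, with the market as reference."""
    reference = brier(market_probs, outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1 - brier(forecast_probs, outcomes) / reference


outcomes = [1, 0, 1, 1]
market = [0.6, 0.4, 0.5, 0.7]
forecast = [0.8, 0.2, 0.7, 0.9]
bss = brier_skill_score(forecast, market, outcomes)      # positive: beat the market
matched = brier_skill_score(market, market, outcomes)    # 0.0: matched the market
```

A forecaster who simply mirrors the market price scores exactly zero, which is why sitting on favorites earns no skill.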

Why non-linear mapping?

A linear BSS→score would reward a BSS of 0.02 almost as much as 0.20. But the jump from 0 to 0.10 is roughly the difference between "noise" and "calibrated trader." We use a logistic curve that stays shallow near zero and steepens as BSS approaches 0.10: small BSS deltas near zero matter less; BSS > 0.10 gets meaningful lift. See the curve below.

Validation

18 months of resolved-market data as training; final 90 days as holdout. Grade-to-out-of-sample-BSS correlation ρ ≈ 0.71 on the holdout. Weights ridge-regularized against rank stability. Full train/test notebook linked below.
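For readers who want to reproduce a holdout check of this kind, here is a small standard-library sketch of a rank correlation between in-sample grades and out-of-sample BSS. Treating ρ as Spearman's rank correlation is our assumption (the page only writes "ρ"), and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

In use, `xs` would be grades computed on the training window and `ys` the same wallets' BSS on the 90-day holdout.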

Proven-winner tiers

A wallet with 40 resolved bets and BSS +0.15 is probably calibrated. A wallet with 500 resolved bets and BSS +0.15 is definitely calibrated. Tiered bonuses reward sustained evidence: +3 at 100, +5 at 250, +8 at 500.
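The tier schedule above maps cleanly onto a lookup. A sketch, assuming the tiers do not stack (a 500-bet wallet gets +8, not +16) and that positive BSS is the only gate; both are our assumptions, and the function name is hypothetical.

```python
# (minimum resolved bets, bonus points), highest tier first.
TIERS = ((500, 8), (250, 5), (100, 3))


def proven_winner_bonus(resolved_bets: int, bss: float) -> int:
    """Bonus points for sustained evidence; zero without positive BSS."""
    if bss <= 0:
        return 0
    for min_bets, bonus in TIERS:
        if resolved_bets >= min_bets:
            return bonus
    return 0
```

Checking tiers from the top down means the first match is always the highest tier the wallet qualifies for.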

Uncertainty bands

Every grade ships with a 95% confidence interval, bootstrapped from the wallet's resolved-bet record (10k resamples). Under 100 bets the CI is wide enough that we surface it prominently. Over 500, it tightens to ±2–3 points.
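A percentile bootstrap is one common way to get an interval like this. The sketch below resamples a wallet's per-bet values with replacement and reads off the 2.5th and 97.5th percentiles of a statistic. It is an illustration only: VIGIL bootstraps the full grade, whereas this example uses a simple mean, and the function name and defaults are ours.

```python
import random
from statistics import fmean


def bootstrap_ci(values, statistic=fmean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` over `values`."""
    if not values:
        raise ValueError("need at least one resolved bet")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    stats = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = stats[int(n_resamples * alpha / 2)]
    upper = stats[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    return lower, upper
```

The interval narrows as the number of resolved bets grows, which is the behavior described above for wallets past 500 bets.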

Validation notebook: vigil-v1.20.2-validation.ipynb · 18mo train / 90d test · ρ=0.71 on holdout

open on GitHub →

// the curve

How Brier Skill Score becomes a letter.

The mapping from BSS to VIGIL score. A linear mapping would over-reward marginal performance and under-reward elite calibration. Our logistic curve is shallow near zero and steep above it, so elite calibration earns real separation and mediocre results cannot pass as fine.

// BSS → Calibration Component (25 pts max)

x-axis: Brier Skill Score · y-axis: calibration points (out of 25)

BSS < 0
Worse than the consensus. Probably a copy trader or a gambler.

BSS 0 – 0.05
Roughly at market. ~5 calibration points.

BSS 0.05 – 0.10
Calibrated trader. Curve steepens here on purpose.

BSS > 0.10
Elite. BSS > 0.25 is world-class and very rare at meaningful sample sizes.
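The shape of the bands above can be sketched with a plain logistic function. The midpoint and steepness below are placeholder values chosen only to produce a curve of roughly this shape; they are not VIGIL's fitted parameters, which this page does not publish.

```python
import math

MAX_POINTS = 25.0
MIDPOINT = 0.075   # hypothetical: BSS at which half the points are awarded
STEEPNESS = 40.0   # hypothetical: controls how sharply the curve rises


def calibration_points(bss: float) -> float:
    """Map BSS to calibration points (0 to 25) on a logistic curve."""
    z = STEEPNESS * (bss - MIDPOINT)
    if z >= 0:
        return MAX_POINTS / (1.0 + math.exp(-z))
    # Equivalent form for negative z; avoids overflow for very negative BSS.
    ez = math.exp(z)
    return MAX_POINTS * ez / (1.0 + ez)
```

With these placeholders the curve gains more points between BSS 0.05 and 0.10 than between 0 and 0.05, and it flattens out again well before 0.25.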

// receipts

Real wallets. Real grades. Same data you can audit.

Three live examples. PnL alone won't tell you which one to copy. The grades do. Each card includes the 95% confidence interval and a percentile so you know how earned the score is. Numbers from today's crawl — the live ticker above confirms freshness.

top 3%

95 / 100 — SHARP

influenz.eth

$1,000,000+ PnL

BSS +0.22

Resolved 612

ROI +28%

Disc High

95% CI: [92.8 – 96.4] · sample: 612 bets

top 22%

79 / 100 — SOLID

products

$700,000 PnL

BSS +0.14

Resolved 418

ROI +18%

Disc Med

95% CI: [75.1 – 82.6] · sample: 418 bets

top 11%

84 / 100 — SHARP

123987456

$121,000 PnL

BSS +0.19

Resolved 263

ROI +22%

Disc High

95% CI: [78.2 – 89.4] · sample: 263 bets

"But this wallet has $400K PnL and only scored a C."

Common cause: heavy concentration on 2–3 lucky binaries, or a penny-lottery pattern. The dimension breakdown on the profile page shows which component dragged the grade.

"Can't you game BSS on lopsided markets?"

No. BSS is measured against market-implied probability at your entry. Sitting on 95% favorites doesn't beat the market — it matches it. Skill score ≈ 0.

"What about bots and wash trading?"

Penalized. Bots lose calibration on random trades. Wash-trade patterns trip discipline + receive-only checks. Known bot clusters grade F across the board.

// the landscape

Where VIGIL sits vs. everything else.

Four ways people size up Polymarket traders today. Only one measures forecasting skill.

	VIGIL	Polymarket leaderboard	Nansen / Arkham	Self-reported tweets
Measures forecasting skill (not PnL)	✓	✗ PnL only	✗ onchain flow	✗ vibes
Brier Skill Score backbone	✓	✗	✗	✗
Confidence intervals on every grade	✓ (95% CI, bootstrap)	✗	✗	✗
Catches penny-lottery + bot patterns	✓	✗ bots can top it	partial	✗
Free · no sign-up · free API tier	✓	✓	✗ $150+/mo	✓
Chrome extension injects inline badges	✓	✗	✗	✗
Open source · verifiable onchain	✓ MIT	✗	✗ black box	✗
Time-decays stale activity	✓ rolling 90d weight	✗ lifetime PnL	partial	✗
Published validation notebook	✓ ρ=0.71 holdout	✗	✗	✗

* Polymarket leaderboard is ranked by PnL — a useful dashboard, not a skill signal. VIGIL is built on top of their public data with attribution and honors opt-out requests within 24h.

// what the grade actually means

A D-grade wallet can still be up $500K.

PnL measures outcomes. The grade measures the quality of the forecast. A lucky gambler with huge PnL and a penny-lottery pattern gets a D. A small, calibrated, high-BSS wallet with modest PnL gets an A.

// WHAT WE PENALIZE

Penny-lottery spraying (80%+ sub-$0.10 bets with negative BSS → hard cap at D/49)
Receive-only wallets with no outbound transactions
High-concentration, all-in-on-one-binary patterns
Negative Brier Skill Score regardless of bottom-line PnL
Under-sampling (<50 resolved bets caps at C until proven)
Stale activity — idle wallets time-decay after 90 days
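The penny-lottery rule in the list above is the only penalty stated with hard numbers, so it is easy to sketch. Assumptions here: the 80% share is counted per bet rather than by volume, and the function and parameter names are hypothetical.

```python
def apply_penny_lottery_cap(
    score: float,
    entry_prices: list[float],
    bss: float,
    price_threshold: float = 0.10,  # "sub-$0.10 bets"
    share_threshold: float = 0.80,  # "80%+"
    cap: float = 49.0,              # "hard cap at D/49"
) -> float:
    """Cap the score at 49 when most bets are sub-$0.10 and BSS is negative."""
    if not entry_prices:
        return score
    cheap = sum(1 for price in entry_prices if price < price_threshold)
    cheap_share = cheap / len(entry_prices)
    if cheap_share >= share_threshold and bss < 0:
        return min(score, cap)
    return score
```

Both conditions must hold: a wallet that buys cheap longshots but keeps a positive BSS is left alone.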

// WHAT WE REWARD

Positive Brier Skill Score across 100+ resolved markets
Calibrated entries near the eventual resolved probability
Stable IQR — calibrated wins that repeat across eras
Market diversification across categories and time horizons
Cross-category transfer (politics + sports + crypto + news)
Proven-winner bonus at 500+ resolved bets with positive BSS and PnL (+8 pts)

// developers

Free JSON API. One endpoint to learn.

VIGIL's scoring engine is the product. The landing page is a skin over it. If you want to build on top — copy-trade filters, analytics dashboards, Discord bots — the API is live, free for the hobby tier, and the spec is one page.

$ curl -s "https://vigilscore.xyz/v1/polymarket/score?wallet=influenz.eth"

# response (truncated)

{
  "wallet": "influenz.eth",
  "grade": "A",
  "score": 95,
  "percentile": 97,
  "bss": 0.22,
  "resolved": 612,
  "ci95": [92.8, 96.4],
  "dimensions": {
    "calibration": 24.1,
    "profitability": 18.6,
    "liveEdge": 16.4,
    "consistency": 13.1,
    "discipline": 9.3,
    "sampleSize": 10.0
  },
  "class": "SHARP",
  "onchainRefs": ["0x…", …]
}
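The same lookup from Python, using only the standard library. The endpoint and field names come from the curl example and response above; `build_url`, `fetch_score`, and `summarize` are our own hypothetical helpers, and error handling and rate limiting are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://vigilscore.xyz/v1/polymarket/score"


def build_url(wallet: str) -> str:
    """Build the score-lookup URL for a wallet handle or address."""
    return f"{BASE_URL}?{urlencode({'wallet': wallet})}"


def fetch_score(wallet: str, timeout: float = 10.0) -> dict:
    """GET the score payload and decode it as JSON (performs a network call)."""
    with urlopen(build_url(wallet), timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> str:
    """One-line summary built from fields shown in the sample response."""
    low, high = payload["ci95"]
    return (
        f'{payload["wallet"]}: {payload["grade"]} {payload["score"]}/100 '
        f'(95% CI {low}-{high}, BSS {payload["bss"]:+.2f}, n={payload["resolved"]})'
    )
```

Calling `summarize(fetch_score("influenz.eth"))` would print a single line per wallet, which is about all a Discord bot or copy-trade filter needs.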

// API Tiers

Hobby 60 req/min · score lookups · leaderboard $0

Pro 600 req/min · webhooks · tier-change alerts $29/mo

Team 3,000 req/min · 10 seats · CSV export · priority queue $149/mo

Enterprise custom dims · SLA · firehose · on-prem option from $999/mo

No auth on hobby. Rate-limited by IP. Pro and Team are self-serve — card on file, activated in minutes. Enterprise starts at $999/mo for a firehose + SLA; email gatson32@gmail.com with your use case.

// chrome extension

Trust badges on every Polymarket profile.

Install once. Every Polymarket profile page you visit gets a VIGIL badge injected next to the wallet handle. Works silently. Free. Open source. Zero tracking.

● Submitted · rolling out this week

Before a trade, before a copy, before a tweet — know who you're looking at. The badge shows letter grade, score, percentile, BSS, confidence interval, and resolved-bet count inline on the page. Manifest V3. Zero tracking. Site-scoped to polymarket.com.

Install on Chrome · GitHub repo →

VIGIL BADGE · AS RENDERED

influenz.eth

Score 95 / 100 ±1.8

Percentile top 3%

BSS +0.22

Resolved 612

Class SHARP

// who built this

Solo build. Public wallet. No investors.

In a trust product, the people behind it matter more than the logo. VIGIL is built by one person, in public, over the last six months. You can read every commit, fork the model, or tell me I'm wrong on X.

Chris Gatson

Solo builder · forecasting nerd · recovering quant · shipping in public

I started VIGIL because I kept watching Polymarket "whales" get copied on X while quietly losing money on a calibration-adjusted basis. PnL rewards size. VIGIL rewards being right. If you find a bug in the model, email me and I'll credit you in the next release notes. See the changelog at the top of this page for the first paid-in-credit bug fix.

@gatson32 github.com/gatson32 gatson32@gmail.com

182 commits · last 6 months

// frequently objected

Harder questions.

Quick objections live near the proof cards above. These are the ones that require a paragraph.

How did you validate the weights (25/20/20/15/10/10)?

We held out the final 90 days of resolved markets as an out-of-sample set and trained the weights on the preceding 18 months; the grade-to-out-of-sample-BSS correlation held at ρ ≈ 0.71 on the holdout. Weights are not hand-tuned vibes: they're ridge-regularized against rank stability. Full train/test split and cross-validation notes are in the notebook. Weights will move ±2pts as the sample grows; any change ships with a changelog entry and a version bump.

Why not just use Sharpe or Sortino?

Sharpe is returns over volatility. Sortino is returns over downside volatility. Both measure the outcome of trading, not the quality of the forecast. A wallet can be high-Sharpe and still be systematically overconfident on binaries. BSS measures "did your stated probability match reality?" — which is what you actually want to know before copying someone's trade. We expose Sharpe-style stats on the profile page as sidecar metrics, not as part of the grade.

Isn't this going to get cease-and-desist'd by Polymarket?

We use only publicly observable onchain data and Polymarket's public API, with attribution. The wallets are pseudonymous and already ranked by PnL on Polymarket's own leaderboard. Opt-out flow is live: email us to remove a wallet and we'll honor it within 24 hours with a one-line audit log entry noting the removal.

What's the moat? Anyone can recompute BSS.

Three layers. (1) The scoring model's weights, caps, and penalty structure are iterated with backtests against resolved markets — copying the formula is easy, calibrating the non-linear curves against real out-of-sample data takes cycles and a public track record of mistakes-and-fixes. (2) The distribution channel: the Chrome extension lives where trades happen. (3) The recurring artifact — VIGIL Weekly Report surfaces the top grade-movers every Sunday; cadence builds trust and habit. None are diamond-hard moats individually; together they compound.

Does a wallet sharp on politics get credit on sports markets?

Partially. Cross-category BSS transfer is real but imperfect — a wallet with +0.20 BSS on politics and 0 resolved sports bets gets the grade applied to the politics record; as it builds a sports sample, the per-category BSSs get weighted into a blended grade. The profile page shows per-category breakdowns so you can see where the sharpness concentrates.
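The blending rule is not spelled out here, so the following is only one plausible reading: weight each category's BSS by its resolved-bet count. The weighting scheme and the function name are our assumptions.

```python
def blended_bss(per_category: dict[str, tuple[float, int]]) -> float:
    """Sample-size-weighted mean of per-category BSS.

    `per_category` maps a category name to (bss, resolved_bets).
    """
    total_bets = sum(n for _, n in per_category.values())
    if total_bets == 0:
        raise ValueError("no resolved bets in any category")
    return sum(bss * n for bss, n in per_category.values()) / total_bets
```

Under this reading, a wallet with only politics bets keeps its politics BSS unchanged, and the blend shifts gradually as a sports sample accumulates.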

How fresh is the data?

The discovery crawler scans ~500 resolved markets every 2 hours and scores ~500 wallets per cycle. Hot wallets on the prescore list refresh hourly. Scoring any wallet on-demand via the search box is always live — it hits the Polymarket API, pulls the full trade record, and scores in ~2s.

Is VIGIL financial advice?

No. VIGIL is a forecasting-skill metric. We don't recommend trades, wallets to copy, or market positions. Treat it as information, not advice. Jurisdictions and market types vary; do your own due diligence.