Opening note
A shorter, more readable version of the original archive entry, focused on the parts that remained technically useful.
What I Learned Trying to Predict Polymarket Bets
Phase 1 of the Polymarket work: build the dataset, enrich it properly, and discover that collection was harder than modeling.
I started with the most intuitive idea: predict whether a binary Polymarket market was mispriced before resolution. The real surprise was not the modeling challenge. It was how much engineering was needed before the dataset even existed.
This phase came before the 15-minute bot and before the HFT recorder. It set the rule that shaped everything afterwards: own the data path first, then decide if the research deserves more time.
If the dataset does not exist, data collection becomes the real project.
Pipeline map
A high-level view of how the first Polymarket dataset moved from public endpoints to reusable research artifacts.
- Sources: the raw market universe and activity feeds.
- Enrichment: what made the scraper more than a plain ETL.
- Feature layer: signals distilled for training and scoring.
- Outputs: artifacts that survived beyond phase 1.
This map stays high level on purpose: the real story in phase 1 is that collection, normalization, and reuse were already a serious system.
The hypothesis
The thesis was broader than simple price prediction. I wanted to combine:
- trade flow and price buckets from the Data API,
- CLOB structure such as spread, midpoint, and depth,
- market metadata such as duration, category, and volume,
- holder-level signals such as whale concentration, trader skill, and side lean.
That last part matters. The scraper was not only gathering trades. It also had a full optional layer for top holders, trader metrics, and market lean weighted by whale score.
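To make that concrete, here is a rough sketch of what one training row could look like once those four signal families are merged. The field names are illustrative placeholders under my own assumptions, not the pipeline's exact columns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MarketFeatureRow:
    """One row per market. Names are illustrative, not the project's actual schema."""
    # Trade flow and price buckets (Data API)
    trade_count: int
    net_flow_usd: float
    price_bucket: float
    # CLOB structure
    spread: float
    midpoint: float
    top_of_book_depth_usd: float
    # Market metadata (Gamma)
    duration_days: float
    category: str
    volume_usd: float
    # Holder-level signals (optional layer)
    whale_score: float
    lean_yes_pct: float
    lean_no_pct: float
    # Label: only known once the market resolves
    resolved_yes: Optional[bool] = None
```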
The pipeline I actually built
The root project is much more than a one-off script. It has a proper package structure, central config, bundle generators, and several outputs designed for later analysis.
The core scrape flow looked like this, with a rough per-market sketch after the list:
- Use Gamma API to scan events.
- Pull trades from the Data API.
- Optionally pull CLOB book and midpoint features.
- Optionally run top holders analysis with trader-metrics caching.
- Merge everything into market-level outputs for training or live-style scoring.
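A minimal Python sketch of that loop is below. The endpoint hosts, paths, query parameters, and field names are written from memory of the public Polymarket APIs and should be treated as assumptions, not as a copy of the project's code.

```python
import requests

GAMMA = "https://gamma-api.polymarket.com"    # hosts, paths, and params here
DATA_API = "https://data-api.polymarket.com"  # are assumptions; check against
CLOB = "https://clob.polymarket.com"          # the current API docs

def scan_events(limit: int = 100, offset: int = 0) -> list[dict]:
    """Step 1: page through Gamma events."""
    resp = requests.get(f"{GAMMA}/events",
                        params={"limit": limit, "offset": offset, "closed": "true"},
                        timeout=30)
    resp.raise_for_status()
    return resp.json()

def collect_market(condition_id: str, token_id: str,
                   with_clob: bool = True, with_holders: bool = False) -> dict:
    """Steps 2-4: trades, optional CLOB features, optional top holders."""
    out = {"trades": requests.get(f"{DATA_API}/trades",
                                  params={"market": condition_id, "limit": 1000},
                                  timeout=30).json()}
    if with_clob:
        out["book"] = requests.get(f"{CLOB}/book",
                                   params={"token_id": token_id}, timeout=30).json()
        out["midpoint"] = requests.get(f"{CLOB}/midpoint",
                                       params={"token_id": token_id}, timeout=30).json()
    if with_holders:
        out["holders"] = requests.get(f"{DATA_API}/holders",
                                      params={"market": condition_id, "limit": 20},
                                      timeout=30).json()
    return out
```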
The main outputs were not just one CSV. The project writes features.csv, status.csv, raw_trades.csv, top_holders.csv, and trader_metrics.csv, which makes the work feel much closer to a research pipeline than to a notebook.
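The writer behind those artifacts can be imagined as something like the following; only the file names come from the project, the function itself is a sketch.

```python
from pathlib import Path
import pandas as pd

def write_bundle(out_dir: Path, frames: dict[str, pd.DataFrame]) -> None:
    """Persist the phase-1 artifacts named above, one CSV per table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in ("features", "status", "raw_trades", "top_holders", "trader_metrics"):
        if name in frames:
            frames[name].to_csv(out_dir / f"{name}.csv", index=False)
```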
What made the scraper richer than a plain ETL
This first phase already contained signals that later justified the whole portfolio story.
- `execution_mode` supported `TRAINING` and `LIVE`.
- `min_volume_usd` and `initial_offset` controlled scope from the config layer instead of being hardcoded.
- `top_holders_limit` went up to the API cap of 20 holders.
- Trader metrics were cached for 5 days so the pipeline did not waste calls recomputing wallet histories.
- Whale influence was summarized into `whale_score`, then reused to compute `lean_yes_pct` and `lean_no_pct`.
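A compact way to picture that config layer and the whale-weighted lean is sketched below. The field names come from the list above; the defaults, record shapes, and the exact weighting formula are my assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class ExecutionMode(str, Enum):
    TRAINING = "TRAINING"
    LIVE = "LIVE"

@dataclass
class ScraperConfig:
    execution_mode: ExecutionMode = ExecutionMode.TRAINING
    min_volume_usd: float = 10_000.0     # scope filter; default value illustrative
    initial_offset: int = 0              # where event pagination starts
    top_holders_limit: int = 20          # API cap mentioned above
    trader_metrics_cache_days: int = 5   # TTL for cached wallet histories

def side_lean(holders: list[dict]) -> tuple[float, float]:
    """Whale-weighted market lean. Assumes each holder record carries a
    'whale_score' and a 'side' of 'YES'/'NO'; the weighting is a plausible
    reconstruction, not the original formula."""
    total = sum(h["whale_score"] for h in holders) or 1.0
    yes = sum(h["whale_score"] for h in holders if h["side"] == "YES")
    lean_yes_pct = 100.0 * yes / total
    return lean_yes_pct, 100.0 - lean_yes_pct
```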
That means the project was not just "download trades and train a model." It was already trying to express market structure, participant quality, and operational robustness.
Where the real cost appeared
The blocker was not feature engineering. It was the cost of collecting enough history to trust the research, broken down in the table below (a retry-and-backoff sketch follows it).
| What looked easy on paper | What it meant in practice |
|---|---|
| "Get historical markets" | No public bulk dump, so each event meant several API calls across Gamma, Data API, and CLOB |
| "Scale the dataset" | min_volume_usd, pagination offsets, and wide feature tables limited practical throughput |
| "Keep the scraper stable" | 429 responses, retries, backoff, and resume logic shaped runtime as much as the model did |
| "Use holder analytics too" | Wallet histories, top-holder reports, and cached trader metrics made the data richer but even more expensive to build |
A few thousand events could already mean hours of work. Tens of thousands of clean examples meant many long runs, more storage, and much more operational babysitting than I wanted for a first research front.
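For a rough sense of scale, a back-of-envelope calculation with openly invented but plausible numbers already lands in the hours range before any retries or reruns:

```python
# Back-of-envelope runtime; every number here is illustrative, not measured.
events = 3_000
calls_per_event = 5        # Gamma lookup + trades + book + midpoint + holders
requests_per_second = 2.0  # effective rate after rate limits and backoff

total_calls = events * calls_per_event
hours = total_calls / requests_per_second / 3600
print(f"{total_calls:,} calls ≈ {hours:.1f} hours")  # 15,000 calls ≈ 2.1 hours
```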
Why I paused this front
The pause was not a verdict on prediction markets as a topic. It was a scope decision.
- Long-horizon markets resolve slowly, so the learning loop is slow.
- Coverage had to be broad enough to avoid training on a thin or biased slice.
- The richer I made the dataset, the more collection became the bottleneck.
Technically, the phase was successful. Strategically, it was too expensive relative to the feedback it was giving me.
What carried forward
This phase still paid off because it clarified what the next phases should look like.
- It pushed me toward 15-Minute Trading on Polymarket, where the loop was much faster.
- It made the case for HFT on Polymarket: Model, Rust, and the 98% Lie, where recording my own data became practical.
- It reinforced the realism mindset that later became How a Real Backtest Works.
It also left behind bundle tooling and a clean module layout, which mattered later when the project had to become portable and server-friendly.
Takeaway
Prediction on Polymarket was technically viable, but the first serious obstacle was not the model. It was building a trustworthy dataset rich enough to deserve one.
That lesson shaped the rest of the journey: shorter horizons, tighter loops, and a much clearer respect for the difference between a good idea and a sustainable data problem.