
Latency and infra

HFT on Polymarket: Model, Rust, and the 98% Lie

Phase 3: recorder-first HFT research, Rust on the hot path, prepared datasets, and why your own timestamps matter more than a flashy hit rate.

Phase 03 · 8 min · 7 sections

Tags: Rust, Python, LightGBM, WebSockets, PyO3, JSONL, Parquet, Polars, Adverse Selection, Low Latency


Opening note

A shorter, more readable version of the original archive entry, focused on the parts that remained technically useful.


After long-horizon prediction hit data bottlenecks and the 15-minute system showed little durable edge, the next logical question was narrower: could second-scale signals still contain something useful?

That pushed the work toward HFT-style infrastructure. The problem was no longer just "can I train a model?" It was "can I observe, align, and execute honestly enough that the model means anything at all?"

HFT stack map

The HFT phase became a recorder-first stack: capture fast, prepare honestly, model carefully, and keep blocking live actions away from the hot path.

Why HFT was the next step

The thesis here was much more specific than before.

  • Maybe Polymarket lagged the reference market.
  • Maybe order-book imbalance and short lag features contained signal.
  • Maybe the real edge was not prediction alone, but data freshness and execution discipline.

That meant the project had to become infrastructure-first.

The architecture that made it possible

I split the stack by responsibility.

Rust on the hot path

  • WebSockets for Binance and Polymarket.
  • Event-driven waiting on the next Binance tick instead of sleep-based polling.
  • A single-source feature path for the HFT vector, including spread, OBI, and lag features.
  • Recorder and latency primitives exposed through poly_rust_core.
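The event-driven wait can be sketched with a blocking queue. This is a minimal illustration, not the real implementation: the actual hot path lives in Rust, and `TickBus` and the tick shape here are assumptions.

```python
import threading
import queue

class TickBus:
    """Minimal event-driven tick fan-out: consumers block on a queue
    until the next tick arrives, instead of sleep-polling shared state.
    (Illustrative sketch; the real hot path is Rust.)"""
    def __init__(self):
        self._q = queue.Queue()

    def publish(self, tick):
        # Called by the feed handler when a new Binance tick lands.
        self._q.put(tick)

    def next_tick(self, timeout=1.0):
        # Blocks until a tick is available -- no sleep/poll loop.
        return self._q.get(timeout=timeout)

bus = TickBus()
threading.Thread(target=lambda: bus.publish({"px": 101.2, "ts_ms": 1})).start()
tick = bus.next_tick()
```

The design point is that the consumer wakes exactly when data arrives, so reaction latency is bounded by the feed, not by a polling interval.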

Python around it

  • Recorder orchestration and token refresh.
  • JSONL to Parquet conversion, including incremental ETL.
  • prepared_l1 and prepared_l2 dataset generation for train, val, and test.
  • LightGBM training, experiment sweeps, visualization, and paper/live wrappers.
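A minimal sketch of the JSONL side of that ETL, with made-up field names and inline sample data; in the real pipeline the parsed rows land in Parquet via Polars rather than staying as Python dicts.

```python
import json
import io

# Hypothetical one-pass JSONL reader feeding a prepared_l1-style stage.
# Field names and values are illustrative, not the repo's schema.
RAW = io.StringIO(
    '{"local_receipt_ts_ms": 1700000000000, "bid": 0.52, "ask": 0.54}\n'
    '{"local_receipt_ts_ms": 1700000000150, "bid": 0.53, "ask": 0.54}\n'
)

def jsonl_rows(fh, keep=("local_receipt_ts_ms", "bid", "ask")):
    # Parse each line and keep only the columns the next stage needs,
    # so malformed extras never reach the training tables.
    for line in fh:
        rec = json.loads(line)
        yield {k: rec[k] for k in keep}

rows = list(jsonl_rows(RAW))
```

Reading line by line keeps the conversion incremental, which is what makes append-only recorder output cheap to re-process.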

This division worked well because it kept the timing-sensitive path in Rust while leaving iteration flexible in Python.

Data quality mattered more than model complexity

The real improvement in this phase was the recorder and the ETL around it.

  • Each tick carried local_receipt_ts_ms.
  • Alignment followed a "last known value" philosophy instead of pretending two feeds share exact timestamps.
  • Freshness columns such as binance_age_ms and poly_age_ms turned misalignment into explicit information.
  • Training rows could be filtered by latency thresholds such as 200-500 ms.
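The "last known value" alignment and the freshness columns can be sketched with stdlib Python. Timestamps, prices, and field names below are illustrative; the real pipeline does this over Parquet tables.

```python
from bisect import bisect_right

# For each Polymarket tick, take the most recent Binance tick at or
# before it, and record the gap as an explicit binance_age_ms column.
binance = [(1000, 101.0), (1400, 101.5), (2100, 102.0)]  # (ts_ms, px)
poly    = [(1450, 0.55), (2250, 0.57)]                   # (ts_ms, prob)

b_ts = [t for t, _ in binance]

def align(poly_ticks):
    rows = []
    for ts, prob in poly_ticks:
        i = bisect_right(b_ts, ts) - 1
        if i < 0:
            continue  # no prior reference tick: drop, never peek forward
        b_time, b_px = binance[i]
        rows.append({"ts_ms": ts, "prob": prob,
                     "binance_px": b_px,
                     "binance_age_ms": ts - b_time})
    return rows

aligned = align(poly)
# Latency-threshold filter, mirroring the 200-500 ms idea above.
fresh = [r for r in aligned if r["binance_age_ms"] <= 200]
```

Turning staleness into a column, instead of silently joining on nearest timestamps, is what makes the later latency filters possible at all.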

The project also includes explicit leakage checks and dataset preparation stages, which made the pipeline feel much closer to production research than to a simple backtest.

To make that stage visible instead of abstract, I embedded a real source_recorded_l1_sol feature explorer inside the site. It lets you inspect a recorded slug, compare raw versus smoothed series, and see the kind of feature-debugging surface that sat behind prepared_l1.
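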

In this phase, the recorder mattered more than the model headline.

Interactive appendix

Recorded L1 feature explorer

One real `source_recorded_l1_sol` export from the HFT repo, hosted inside the site so the feature layer can be inspected instead of just summarized.

slug=sol_1771810200 · 58,613 rows · 57 columns · Plotly · prepared_l1 context

This is the actual self-contained HTML explorer used to inspect recorded HFT series and feature behavior. Embedding it here makes the dataset and the feature-engineering story much easier to validate.

The models got better, but not magically tradable

This phase is where the repo became most honest.

The HFT module supports multiple target families:

  • taker-style horizons such as hft_1s, hft_5s, and hft_10s,
  • maker protection models such as maker_1s,
  • adverse-selection labels such as adverse_bid_1s,
  • fair-value and expiry-oriented variants such as fair_up and expiry.
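As one hedged example of what a taker-style label can look like, here is a simplified up-move label over a 1-second horizon using last-known-value lookups. The repo's actual hft_1s definition may differ; the tick data is made up.

```python
from bisect import bisect_right

def label_up_move(ticks, horizon_ms=1000):
    # ticks: sorted (ts_ms, mid). Label 1 if the last known mid at
    # ts + horizon_ms is above the current mid, else 0; None when the
    # horizon runs off the end of the recording (no peeking past data).
    ts_list = [t for t, _ in ticks]
    labels = []
    for ts, mid in ticks:
        if ts + horizon_ms > ts_list[-1]:
            labels.append(None)
            continue
        j = bisect_right(ts_list, ts + horizon_ms) - 1
        labels.append(int(ticks[j][1] > mid))
    return labels

ticks = [(0, 0.50), (400, 0.51), (900, 0.52), (1500, 0.50), (2600, 0.53)]
labels = label_up_move(ticks)
```

Dropping the tail rows instead of labeling them is a small thing, but it is exactly the kind of decision the leakage checks exist to enforce.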

One of the most telling outcomes is the phase1_adverse_bid_1s_lightgbm evaluation:

  • test accuracy around 0.921,
  • Brier score around 0.068,
  • log loss around 0.244,
  • but still negative maker-style PnL in evaluation.

That is exactly the kind of result I trust more than a flashy headline. It proves the pipeline was capable of producing a decent classifier while still refusing to confuse classification quality with executable edge.
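That gap is easy to reproduce in miniature. A toy example with made-up numbers: the classifier's accuracy looks fine, yet a fixed per-trade spread cost still sinks the PnL.

```python
# Illustrative only: probabilities, labels, and costs are invented.
probs  = [0.92, 0.95, 0.90, 0.93, 0.10]
truth  = [1, 1, 1, 0, 0]
payoff = 0.004   # gross edge captured per correct signal
cost   = 0.01    # spread + fees charged on every attempt

brier = sum((p - y) ** 2 for p, y in zip(probs, truth)) / len(truth)
acc   = sum((p >= 0.5) == bool(y) for p, y in zip(probs, truth)) / len(truth)

# Trade only when the model says "up"; pay the cost either way.
pnl = sum((payoff if (p >= 0.5) == bool(y) else -payoff) - cost
          for p, y in zip(probs, truth) if p >= 0.5)
```

When the per-trade edge is smaller than the per-trade cost, no amount of accuracy rescues the strategy, which is the shape of the adverse_bid_1s result above.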

The same evaluation also measured inference latency in the tens of microseconds per batch on the model side, which tells a very different story from the earlier simplistic "will this strategy make money?" framing.

Where the 98% lie came from

Short-horizon systems are especially vulnerable to stale quotes.

If you assume you buy at the visible ask at time t and react instantly, the backtest can look absurdly strong. That is where the fake 98%-style hit rate comes from.

Once I forced the simulation to respect reaction time and execution pain, the picture changed:

  • reaction_latency_ms >= 300 cut hit rate sharply,
  • slippage and queue realism made the result even more fragile,
  • the quant diagnostics exposed several ways to accidentally fake PnL if the simulator was careless.
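A minimal sketch of that re-pricing, with an illustrative quote stream: fills are taken at the first ask at or after signal time plus reaction latency, rather than the stale ask visible at signal time.

```python
from bisect import bisect_left

# Invented quote stream: (ts_ms, ask). A fast-moving book where the
# quote you saw at t=0 is gone by the time you can act.
asks = [(0, 0.50), (250, 0.53), (600, 0.56)]
ask_ts = [t for t, _ in asks]

def fill_price(signal_ts, reaction_latency_ms=300):
    # First ask quoted at or after signal_ts + latency.
    i = bisect_left(ask_ts, signal_ts + reaction_latency_ms)
    if i == len(asks):
        return None  # recording ends before we could act: no fill
    return asks[i][1]

naive  = asks[0][1]      # the "98% backtest" fill at the visible ask
honest = fill_price(0)   # what you actually pay ~300 ms later
```

The whole "98% lie" lives in the gap between `naive` and `honest`; queue position and slippage only widen it.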

This is exactly why How a Real Backtest Works became a companion note to the whole project.

The execution architecture got more serious too

The HFT branch did not stop at paper labeling and offline models.

  • real_trading_hft unified paper and live modes around the same architecture.
  • The hot loop stayed free of direct API calls.
  • A blocking action_queue handed place and cancel tasks to a secondary API worker.
  • Pre-flight ping checks and a latency monitor acted like a circuit breaker.
  • The live logic refused price extremes outside the safer band, roughly 0.10-0.90, and demanded extra probability when spread cost was too large.
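The hot-loop/worker split and the guardrails can be sketched in Python. Names like `maybe_place` and the thresholds are assumptions for illustration; the real engine enforces this in Rust.

```python
import queue
import threading

# The hot loop only enqueues; a worker thread does the blocking calls.
action_queue = queue.Queue()
sent = []

def api_worker():
    while True:
        action = action_queue.get()
        if action is None:
            break  # shutdown sentinel
        sent.append(action)  # stands in for a blocking REST call
        action_queue.task_done()

def maybe_place(side, price, prob, spread_cost, band=(0.10, 0.90)):
    # Guardrails: refuse price extremes outside the safer band, and
    # demand extra probability when the spread cost is large.
    if not (band[0] <= price <= band[1]):
        return False
    if prob < 0.5 + spread_cost:
        return False
    action_queue.put({"op": "place", "side": side, "price": price})
    return True

t = threading.Thread(target=api_worker)
t.start()
ok = maybe_place("buy", 0.55, prob=0.62, spread_cost=0.02)
action_queue.put(None)
t.join()
```

Because `maybe_place` never blocks, a slow or failing API cannot stall the tick-processing loop; it only backs up the queue, which the latency monitor can then trip on.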

That made the system much more like a small execution engine than like a notebook wrapped in a CLI.

Takeaway

The HFT phase was the technically strongest part of the Polymarket journey because it forced everything to become more precise: timestamps, ETL, feature contracts, model evaluation, and execution architecture.

It also made the main lesson impossible to ignore: if your data path or simulator is weak, the model is just decorating a timing artifact.