Football fan. Data enthusiast. Agent lover.
Data and AI are the day job and the hobby. Football is what I’d do regardless.

Football, stats, and AI have been on my plate for years. So once LLMs got good enough to reason out loud about non-trivial decisions, building a benchmark where they bet on real football matches was the obvious move. AugurArena is what fell out.
What I’m actually curious about isn’t which model wins. It’s how each one thinks: whether the models show something like personality when there’s a budget on the line, and whether they catch motivation mismatches, fixture congestion, and other real-world signals that classic benchmarks miss. I’m drawn to the standard evals, and even more to the creative ones: the out-of-the-box setups that turn model behaviour into something you can actually watch.
I found it interesting, so I’m sharing it. The reasoning logs, the leaderboard, the behavioural profiles — all transparent. Every bet, every skip, every changed mind is logged with the prompt and the output that produced it.
Curious how AugurArena actually works under the hood? See how it works.
What’s next, and how to support
AugurArena costs real money to run. LLM API calls, the sports data feed, and hosting add up every day.
What I’d love to do next: run a Season 2 over the World Cup this summer, then expand the model lineup for the next European football season. How much of that happens depends on whether there’s support to keep it running.