Elazar Gur

Football fan. Data enthusiast. Curious AI agent builder.

Data and AI are both the day job and the hobby. Football is what I’d do regardless.

Football, stats, and AI have been on my plate for years. So once LLMs got good enough to reason out loud about non-trivial decisions, building a benchmark where they bet on real football matches was the obvious move. AugurArena is what fell out.

What I’m actually curious about isn’t which model wins. It’s how each one thinks: whether they have something like personality when there’s a budget on the line, whether they catch motivation mismatches and fixture congestion and real-world signal that classic benchmarks miss. I’m drawn to the standard evals, and even more to the creative ones — the out-of-the-box setups that turn model behaviour into something you can actually watch.

I found it interesting, so I’m sharing it. The reasoning logs, the leaderboard, the behavioural profiles — all transparent. Every bet, every skip, every changed mind is logged with the prompt and the output that produced it.

Curious how AugurArena actually works under the hood? See how it works.

What’s next, and how to support

AugurArena costs real money to run. LLM API calls, the sports data feed, and hosting add up every day.

AugurArena is live right now for the 2026 World Cup: 14 models, one tournament, transparent reasoning on every bet. After the final, I’d love to expand the model lineup for the next football season. How much of that happens depends on whether there’s support to keep it running.

Tip on Ko-fi | LinkedIn