
How AI Plays Dots and Boxes (and What Computer Analysis Has Taught Us)

From Berlekamp's combinatorial proofs to modern neural network solvers, computer analysis has fundamentally changed our understanding of dots and boxes. Here's what AI knows that humans missed.

9 min read · AI · computer analysis · dots and boxes · theory

For most of dots and boxes' history, the strongest play came from grandmasters who had played thousands of games and developed deep intuition. Then computers arrived, and a strange thing happened: the computers started winning consistently against the grandmasters. Not because they were smarter, but because they could see things humans couldn't.

This post is a tour of what computer analysis has taught us about dots and boxes — from Berlekamp's foundational theorems in the 1980s to modern reinforcement-learning solvers — and what those lessons mean for how humans should play.

The Berlekamp era: 1980s–2000s

Elwyn Berlekamp, the great combinatorial game theorist, did the foundational work on dots and boxes in the 1980s and 1990s. His book "The Dots and Boxes Game: Sophisticated Child's Play" (2000) consolidated decades of analysis and proved several results that changed the game.

The key insight from Berlekamp's era was the chain rule. Until Berlekamp's work, players knew chains mattered, but no one had a precise way to predict who would win from the chain structure. Berlekamp made it precise: count the long chains (with loops weighted separately), and the parity of that count determines which player will be forced to open the first long chain, and therefore who controls the endgame. On the standard board, that parity battle favors the second player.

For the foundational explanation, see the chain rule explained simply. For the deeper math, see the mathematics of dots and boxes.

What's remarkable is that the chain rule is a combinatorial result — Berlekamp didn't need a computer to prove it. He used pencil-and-paper analysis. But the rule was so non-obvious that no human had spotted it in over a century of play. Once stated, it transformed the game.

The brute-force era: 2000s–2010s

After Berlekamp, the question became: can we solve small dots and boxes positions? "Solve" means: prove which player wins with perfect play and provide the moves to do it.
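
In miniature, "solving" is just exhaustive game-tree search with the extra-turn rule built in. Here is a minimal sketch for the tiny 1×2 board (two boxes, seven lines); the edge numbering and layout are illustrative, not taken from any published solver:

```python
from functools import lru_cache

# 1x2 board: two boxes side by side, 7 edges numbered 0-6.
# Each box lists the 4 edge ids that enclose it; edge 4 is the shared wall.
BOXES = [(0, 2, 3, 4), (1, 4, 5, 6)]
FULL = (1 << 7) - 1  # bitmask with every edge drawn

@lru_cache(maxsize=None)
def solve(drawn):
    """Best achievable (mover's boxes minus opponent's boxes) from this state."""
    if drawn == FULL:
        return 0
    best = None
    for e in range(7):
        bit = 1 << e
        if drawn & bit:
            continue
        after = drawn | bit
        captured = sum(1 for box in BOXES
                       if e in box and all(after >> i & 1 for i in box))
        if captured:      # completing a box grants another turn
            value = captured + solve(after)
        else:             # turn passes, so the opponent's best is negated
            value = -solve(after)
        best = value if best is None else max(best, value)
    return best
```

`solve(0)` returns the first player's optimal net score on this toy board. Real solvers run the same recursion with transposition tables, symmetry reduction, and pruning on top; the recursion itself is the easy part.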

Brute-force computer search solved positions of increasing size:

  • 3×3 box (24 lines): solved in the 2000s. Second player wins.
  • 4×4 box (40 lines): solved by computer search by 2010. Second player wins.
  • 5×5 box (60 lines): solved by computer search by 2014. Second player wins by 1 box (13–12).
  • 6×6 and larger: still unsolved. The state space is too big for full search.
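
The line counts above follow from a simple formula: an n×n-box board has n+1 rows of n horizontal lines plus n+1 columns of n vertical lines, and the naive state space (every line either drawn or not, ignoring legality and symmetry) grows as 2 to that power:

```python
def line_count(n):
    """Lines on an n x n boxes board: (n+1) rows of n horizontal lines
    plus (n+1) columns of n vertical lines, i.e. 2 * n * (n + 1)."""
    return 2 * n * (n + 1)

for n in range(3, 8):
    print(f"{n}x{n}: {line_count(n)} lines, at most 2**{line_count(n)} raw states")
```

2**60 is on the order of 10^18, which clever search can just about reach; 2**84 and beyond is not, which is why 6×6 remains open.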

The 5×5 result was big news. The standard format used in tournaments and schoolyards was solved: second player wins by exactly one box with perfect play. Tournament formats now often compensate for the seat imbalance, since being second is decisively better.

For the competitive scene, this changed everything. Top players now rehearse the second-player winning lines on 5×5, and the strategic question for the other side becomes: how does the first player maximize variance?

The reinforcement-learning era: 2015–present

Brute-force search hits its limits past 5×5. To go further, researchers turned to reinforcement-learning agents: neural networks that learn by playing against themselves millions of times, much as AlphaGo did.

These agents are not provably optimal. They don't "solve" the game. But they get very strong, and they reveal patterns that human analysis missed.

Some of the surprising findings from RL play:

Finding 1: Opening play matters more than humans thought

Humans treat the dots-and-boxes opening as somewhat arbitrary because no captures happen. Computer agents disagree. They show that specific opening sequences have measurable winning-rate advantages, especially for the player trying to overcome the inherent disadvantage of moving first.

This shifted competitive practice. Top players now study the first 12 moves on the 5×5 board more carefully than they used to, and the Spine, Picket, and Open Middle openings now come with detailed analysis attached.

Finding 2: Spite moves are more common in optimal play than expected

Spite moves — deliberately drawing the third side of a box — were considered desperation tactics by humans. Computer analysis shows they're correct surprisingly often, especially in mid-game positions where parity is hard to manage otherwise.

In top-level RL play on the 5×5 board, spite moves appear in roughly 1 of every 8 games as part of the winning line for the player who would otherwise be in zugzwang. Humans use them in fewer than 1 of every 50 games. The difference is a measurable skill gap.

Finding 3: Loops are tactically richer than chains

Berlekamp emphasized the long-chain rule. Computer analysis shows that loops — chains that close back on themselves — are tactically richer. A loop forces a 4-box double-cross instead of a 2-box one, which doubles the size of the sacrifice and therefore the value of the trade.

This means that converting a long chain into a loop, when possible, is one of the highest-leverage moves in the game. Humans rarely think about this conversion explicitly. RL agents do it on purpose.
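
The trade math is easy to check. Under the standard endgame model — the controller declines the last 2 boxes of every chain and the last 4 of every loop, the opponent opens loops first, and the controller takes the final chain whole — the controller's net score works out as below. The helper name and inputs are illustrative:

```python
def controller_net(chains, loops):
    """Net score (controller minus opponent) for a controlled endgame.
    Assumes every chain has >= 3 boxes, every loop >= 4, at least one chain,
    and that the opponent opens loops first, then chains."""
    total = sum(chains) + sum(loops)
    # Boxes conceded via double-crosses: 2 per chain except the last, 4 per loop.
    conceded = 4 * len(loops) + 2 * (len(chains) - 1)
    return total - 2 * conceded

print(controller_net([3, 3], []))  # 2: controller wins two 3-chains 4-2
print(controller_net([3], [4]))    # -1: the 4-box loop sacrifice flips the result
```

Swapping a chain for a loop of similar size can turn a comfortable controlled endgame into a loss, which is exactly why chain-to-loop conversion is so high-leverage.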

For human-readable coverage, see endgame loops and chains.

Finding 4: The middlegame matters more than humans realize

Competitive humans focus on the endgame because that's where captures happen. Computer agents treat the middlegame as critical — specifically, which player has more flexibility heading into the endgame.

A position with 5 safe moves is much better than one with 3 safe moves, even if both have correct parity, because the player with more safe moves has more time to react to opponent moves. RL agents systematically prefer flexibility-preserving moves over influence-maximizing moves in the middlegame, contrary to human intuition.

This maps to the influence vs. territory tradeoff. Humans tend to over-commit; computers stay flexible.
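
"Safe move" has a precise meaning here: a line that gives no box its third side. A sketch of counting them, using a throwaway edge-numbering scheme of my own (any consistent indexing works):

```python
def box_edges(n):
    """Map each box on an n x n board to its 4 edge ids.
    Horizontal edge (r, c) sits above box row r; vertical (r, c) left of col c."""
    h = lambda r, c: r * n + c                       # (n+1)*n horizontal ids first
    v = lambda r, c: (n + 1) * n + r * (n + 1) + c   # then n*(n+1) vertical ids
    return [(h(r, c), h(r + 1, c), v(r, c), v(r, c + 1))
            for r in range(n) for c in range(n)]

def safe_moves(n, drawn):
    """Undrawn lines that can be added without giving any box a third side."""
    boxes = box_edges(n)
    return [e for e in range(2 * n * (n + 1))
            if e not in drawn
            and all(sum(x in drawn for x in box) < 2 for box in boxes if e in box)]
```

On an empty 2×2 board all 12 lines are safe; draw two sides of one box and the two lines that would hand it a third side drop out of the list.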

What computers cannot do (yet)

It's worth being clear about the limits.

Computers don't solve large grids. The 7×7 box (112 lines) is past the limit of full search and not reliably solved by RL either. As the grid grows, the state space explodes faster than computational tricks can keep up.

Computers don't generalize across grid types. A solver trained on 5×5 box grids does not automatically handle hex variants or Dot Clash's 25×25 grid. Each grid type requires retraining.

Computers don't explain themselves. An RL agent will tell you "play move X" but not "because of principle Y." Humans extract principles from observing computer play, but the extraction is manual and partial.

This is why human analysis still matters even in 2026. The Berlekamp-style work — identifying principles that humans can apply across positions — is still ahead of what AI can produce by itself.

What this means for human players

Three practical takeaways:

Takeaway 1: Study computer-annotated games

If you can find them, computer-analyzed games are gold for improvement. The best public source is the literature around the competitive scene, where top players publish annotated games with computer-suggested alternative lines.

Even one carefully studied annotated game teaches more than 20 unannotated games. It's the difference between watching pro tennis on TV and having a coach explain what each player is doing and why.

Takeaway 2: Adopt the tactics computers have proven

Three specific tactics that human-only play under-uses:

  • Spite moves, when parity is otherwise unfixable. See the dedicated post.
  • Chain-to-loop conversion, when the trade math favors the larger double-cross.
  • Flexibility-preserving moves in the middlegame, even when they don't directly capture or threaten.

These are not natural human instincts. You have to install them deliberately.

Takeaway 3: Don't fall into the "computers will solve it eventually" trap

Some players hear that 5×5 is solved and lose interest, thinking the game is "finished." This is wrong on two levels.

First, humans cannot play perfect 5×5 even knowing the solved lines exist: the winning lines are too long to memorize and too sensitive to deviations. Knowing the game is solved doesn't help you play it.

Second, larger grids are not solved and may never be. Anything beyond 6×6 box and any of the non-square variants is still wide open. There's plenty of game left to play.

For Dot Clash specifically, the 25×25 grid is enormous. Computer analysis on it is in its infancy. Human pattern-recognition still has a real edge.

In short

  • Berlekamp's 1980s combinatorial work gave us the chain rule and the foundational theorems.
  • Brute-force solvers solved up to 5×5 box. Second player wins by 1.
  • RL agents revealed under-appreciated tactics: spite moves, loop conversions, flexibility preservation.
  • Limits remain — large grids, hex variants, and Dot Clash are still unsolved.
  • Human players should adopt computer-validated tactics and study annotated games.

The computer revolution in dots and boxes ran 30 years behind chess and 20 years behind Go, but it has now arrived. Players who understand both the human intuition and the computer-validated tactics play at a higher level than either alone.

For the human-side foundations, work through the chain rule, the double-cross, and parity counting in live play. Then layer the computer-derived tactics on top.