AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim
Despite heavy training, agents show blind spots and can miss optimal moves
Queen Mary University of London
Image: An illustration of Nim game states. The left panel shows an initial board configuration of five heaps: [n1, n2, n3, n4, n5] = [1, 3, 5, 7, 9]. The middle panel depicts an intermediate board state during gameplay: [v1, v2, v3, v4, v5] = [1, 2, 4, 4, 3], resulting from players removing counters. The right panel represents the terminal state where all counters are cleared, signifying a win for the player who made the last move.
Credit: Image by Dr Bei Zhou, Research Associate at Imperial College London, and Dr Søren Riis, Reader in Computer Science, Queen Mary University of London.
- Researchers tested AlphaZero-style self-play on Nim, a children’s game solved mathematically.
- Despite heavy training, agents show blind spots and can miss optimal moves.
- Strong performance does not always mean the system has learned the underlying winning rule.
Embargo: immediate.
New research published in Machine Learning shows that pattern learning alone is not enough to train AI to master certain games – and that abstract representations or hybrid approaches may help.
Many AI researchers describe game-playing as the “Formula 1” of AI: a controlled test environment with clear rules and clear success criteria. This paper uses that idea as a diagnostic by studying a very simple game, Nim, a children’s matchstick game whose optimal strategy is known exactly.
Because the correct move is known for every position, the researchers could measure whether an agent plays optimally across the entire state space. They found that, although agents can master small boards, heavy training and search still leave blind spots: agents miss optimal moves, and performance degrades as the board grows, with predictions approaching chance. This suggests that impartial games may require analytic representations, not just pattern learning.
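The exact solution the researchers compare against is classical Nim theory: a position is losing for the player to move exactly when the XOR (the “nim-sum”) of the heap sizes is zero, and an optimal move restores a nim-sum of zero. A minimal sketch of that ground-truth rule (standard Nim theory, not the authors’ code), using the heap configuration [1, 3, 5, 7, 9] from the article’s illustration:

```python
from functools import reduce
from operator import xor

def nim_sum(heaps):
    """XOR of all heap sizes; zero means the player to move is losing."""
    return reduce(xor, heaps, 0)

def optimal_move(heaps):
    """Return (heap_index, new_size) that restores nim-sum 0,
    or None if the position is already lost (nim-sum is zero)."""
    s = nim_sum(heaps)
    if s == 0:
        return None  # every move hands the opponent a winning position
    for i, h in enumerate(heaps):
        target = h ^ s
        if target < h:  # legal: counters can only be removed, not added
            return (i, target)

heaps = [1, 3, 5, 7, 9]
print(nim_sum(heaps))       # 9 (non-zero: the first player can force a win)
print(optimal_move(heaps))  # (4, 0): clear the heap of 9, leaving nim-sum 0
```

Because this rule labels every position exactly, an agent’s move choices can be graded as optimal or suboptimal everywhere, which is what makes Nim such a sharp diagnostic.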
What does this mean for gaming with machines?
Self-play AIs can be very strong, but in games where both players share the “pieces” and the winning strategy is an abstract arithmetic rule, pattern-recognition from raw positions may not be enough on its own.
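One way to see why raw pattern-recognition struggles here: positions that share the decisive property (nim-sum zero, i.e. losing for the player to move) can look nothing alike on the surface. A small illustrative check (heap values chosen for illustration, not taken from the paper):

```python
from functools import reduce
from operator import xor

# Four losing positions with no obvious visual similarity:
# the shared structure is arithmetic (XOR of heap sizes is 0),
# not anything a surface-level pattern matcher would group together.
losing_positions = [[1, 2, 3], [4, 5, 1], [8, 13, 5], [7, 9, 14]]
for heaps in losing_positions:
    print(heaps, reduce(xor, heaps))  # each prints nim-sum 0
```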
Wider implications:
The results don’t diminish the achievements of self-play AI in games like chess and Go. Rather, they help map where today’s methods can struggle, and where more abstract representations or hybrid approaches may be beneficial. More broadly, it’s a reminder that systems can perform well in common cases while remaining brittle in rare-but-important ones.
Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, said: “Nim is a children’s game with a complete mathematical solution, yet AlphaZero-style self-play can still develop blind spots—becoming competitive while missing optimal moves across many positions.”
“This suggests that, for future work in AI, impressive performance alone is not proof that a system has learned the underlying principle: methods that capture abstract structure may be needed to reduce blind spots.”
--
“Impartial Games: A Challenge for Reinforcement Learning” by Dr Bei Zhou, Research Associate at Imperial College London, and Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, is published in Machine Learning.
Journal
Machine Learning
Method of Research
Experimental study
Subject of Research
People
Article Title
Impartial Games: A Challenge for Reinforcement Learning
Article Publication Date
13-Mar-2026