AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim
Despite heavy training, agents show blind spots and can miss optimal moves
Queen Mary University of London
Image: An illustration of Nim game states. The left panel shows an initial board configuration of five heaps: [n1, n2, n3, n4, n5] = [1, 3, 5, 7, 9]. The middle panel depicts an intermediate board state during gameplay: [v1, v2, v3, v4, v5] = [1, 2, 4, 4, 3], resulting from players removing counters. The right panel represents the terminal state where all counters are cleared, signifying a win for the player who made the last move.
Credit: Image by Dr Bei Zhou, Research Associate at Imperial College London, and Dr Søren Riis, Reader in Computer Science, Queen Mary University of London.
- Researchers tested AlphaZero-style self-play on Nim, a children’s game solved mathematically.
- Despite heavy training, agents show blind spots and can miss optimal moves.
- Strong performance does not always mean the system has learned the underlying winning rule.
Embargo: immediate.
New research published in Machine Learning shows that pattern learning alone is not enough to train AI to master certain games – and that abstract representations or hybrid approaches may help.
Many AI researchers describe game-playing as the “Formula 1” of AI: a controlled test environment with clear rules and clear success criteria. This paper uses that idea as a diagnostic by studying a very simple game, Nim, a children’s matchstick game whose optimal strategy is known exactly.
Because the correct move is known for every position, the researchers could measure whether an agent plays optimally across the entire state space. They found that, although agents can master small boards, heavy training and search still leave blind spots: agents miss optimal moves, and performance degrades as the board grows, with predictions approaching chance. This suggests that impartial games may require analytic representations, not just pattern learning.
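The exact solution the researchers compare against is classical Nim theory: a position is losing for the player to move exactly when the XOR (the “nim-sum”) of the heap sizes is zero, and an optimal move restores a nim-sum of zero. A minimal sketch of that ground-truth rule (standard Nim theory, not the authors’ code), using the heap configuration [1, 3, 5, 7, 9] from the article’s illustration:

```python
from functools import reduce
from operator import xor

def nim_sum(heaps):
    """XOR of all heap sizes; zero means the player to move is losing."""
    return reduce(xor, heaps, 0)

def optimal_move(heaps):
    """Return (heap_index, new_size) that restores nim-sum 0,
    or None if the position is already lost (nim-sum is zero)."""
    s = nim_sum(heaps)
    if s == 0:
        return None  # every move hands the opponent a winning position
    for i, h in enumerate(heaps):
        target = h ^ s
        if target < h:  # legal: counters can only be removed, not added
            return (i, target)

heaps = [1, 3, 5, 7, 9]
print(nim_sum(heaps))       # 9 (non-zero: the first player can force a win)
print(optimal_move(heaps))  # (4, 0): clear the heap of 9, leaving nim-sum 0
```

Because this rule labels every position exactly, an agent’s move choices can be graded as optimal or suboptimal everywhere, which is what makes Nim such a sharp diagnostic.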
What does this mean for gaming with machines?
Self-play AIs can be very strong, but in games where both players share the “pieces” and the winning strategy is an abstract arithmetic rule, pattern-recognition from raw positions may not be enough on its own.
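One way to see why raw pattern-recognition struggles here: positions that share the decisive property (nim-sum zero, i.e. losing for the player to move) can look nothing alike on the surface. A small illustrative check (heap values chosen for illustration, not taken from the paper):

```python
from functools import reduce
from operator import xor

# Four losing positions with no obvious visual similarity:
# the shared structure is arithmetic (XOR of heap sizes is 0),
# not anything a surface-level pattern matcher would group together.
losing_positions = [[1, 2, 3], [4, 5, 1], [8, 13, 5], [7, 9, 14]]
for heaps in losing_positions:
    print(heaps, reduce(xor, heaps))  # each prints nim-sum 0
```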
Wider implications:
The results don’t diminish the achievements of self-play AI in games like chess and Go. Rather, they help map where today’s methods can struggle, and where more abstract representations or hybrid approaches may be beneficial. More broadly, it’s a reminder that systems can perform well in common cases while remaining brittle in rare-but-important ones.
Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, said: “Nim is a children’s game with a complete mathematical solution, yet AlphaZero-style self-play can still develop blind spots—becoming competitive while missing optimal moves across many positions.”
“This suggests that, for future work in AI, impressive performance alone is not proof that a system has learned the underlying principle: methods that capture abstract structure may be needed to reduce blind spots.”
--
“Impartial Games: A Challenge for Reinforcement Learning” by Dr Bei Zhou, Research Associate at Imperial College London, and Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, is published in Machine Learning.
Journal
Machine Learning
Method of Research
Experimental study
Subject of Research
People
Article Title
Impartial Games: A Challenge for Reinforcement Learning
Article Publication Date
13-Mar-2026