A new approach to artificial intelligence has achieved near-optimal performance in solving the classic code-breaking game Mastermind, demonstrating how strategic information valuation can push heuristic methods close to their theoretical limits. Researchers have developed a weighted entropy framework that assigns context-dependent utility values to different types of feedback, allowing the AI to make better-informed guesses than previous heuristic methods. The result not only advances game-solving algorithms but also offers insight into how AI can prioritize information in decision-making, with potential applications in broader combinatorial problems.
The key finding is that this weighted entropy method achieves state-of-the-art performance among known heuristic approaches for Mastermind. Specifically, the stage-weighted version of the heuristic achieves an average of 4.3488 guesses over all 1296 possible secret codes, with a maximum of 6 guesses needed. This places it within 0.2% of the theoretical optimum of 4.3403 average guesses, which was established through exhaustive search. The fixed-weight version also performs remarkably well, achieving 4.3565 average guesses with a maximum of 5, which already ranks it among the top published methods. These results show that by valuing not just the amount of information gained but also its strategic quality, AI can approach optimal game play without the computational burden of exhaustive search.
The methodology centers on applying weighted Shannon entropy, based on the Belis-Guiașu framework, to Mastermind. Instead of treating all feedback equally, as standard entropy does, this approach assigns a utility weight to each of the 14 possible feedback types (such as 0 bulls and 1 cow, or 2 bulls and 0 cows). The AI selects the guess that maximizes this weighted entropy, effectively prioritizing feedback that leads to easier subproblems in future turns. For the stage-weighted heuristic, a distinct weight vector is optimized for each turn using a genetic algorithm, allowing the strategy to adapt as the game progresses from broad exploration to precise resolution. The genetic algorithm was implemented in C++/CUDA and run on an NVIDIA RTX 3090 GPU, which evaluates 64 individuals in 250 ms; weights were constrained between 0.1 and 1.0 for stability.
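The guess-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++/CUDA implementation: the function names `feedback`, `weighted_entropy`, and `best_guess`, the digit-string encoding of colors, the base-2 logarithm, and the default weight of 1.0 for unlisted feedback types are all assumptions made for the example. It scores a guess with the weighted entropy in its usual Belis-Guiașu form, the negative sum over feedback types of weight times probability times log-probability.

```python
from collections import Counter
from itertools import product
from math import log2

COLORS = "123456"  # six colors, encoded as digits (an assumption of this sketch)
CODES = ["".join(p) for p in product(COLORS, repeat=4)]  # all 6**4 = 1296 codes


def feedback(guess, secret):
    """Return (bulls, cows) for a guess scored against a secret."""
    bulls = sum(g == s for g, s in zip(guess, secret))
    # Multiset intersection counts color matches regardless of position.
    common = sum((Counter(guess) & Counter(secret)).values())
    return bulls, common - bulls


def weighted_entropy(guess, candidates, weights):
    """Compute -sum_f w_f * p_f * log2(p_f) over the feedback types f.

    `weights` maps (bulls, cows) to a utility weight; feedback types that
    are missing from the mapping default to 1.0 (hypothetical convention).
    """
    counts = Counter(feedback(guess, s) for s in candidates)
    n = len(candidates)
    return -sum(
        weights.get(f, 1.0) * (c / n) * log2(c / n) for f, c in counts.items()
    )


def best_guess(candidates, weights, pool=CODES):
    """Pick the guess in `pool` with the highest weighted entropy."""
    return max(pool, key=lambda g: weighted_entropy(g, candidates, weights))
```

With an empty `weights` mapping every feedback type gets weight 1.0, so the score reduces to plain Shannon entropy. A stage-weighted variant would simply pass a different mapping on each turn. Scoring all 1296 guesses against all 1296 candidates is slow in pure Python, which is one reason a GPU implementation is attractive when the heuristic sits inside a genetic-algorithm loop.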
Analysis of the results reveals interpretable strategic patterns in the optimized weights, rather than arbitrary parameter fitting. For instance, the feedback indicating that the secret code has been found (4 bulls and 0 cows) gains importance in turns 2-4, with deviations from a uniform weighting of +76%, +76%, and +100% respectively, highlighting the value of consistent guesses that can solve the game early. Conversely, feedback such as 0 bulls and 2 cows averages a -17% deviation, confirming that it is generally less valuable. The stage-dependent weights show a clear progression: turn 1 has low deviation (±2%), suited to broad exploration, while turn 6 exhibits high deviation (-59% to +65%), reflecting specialized endgame requirements. Performance comparisons show that the stage-weighted heuristic outperforms other heuristics such as Shannon entropy (4.4151 average guesses) and Most Parts (4.3735 average guesses), closing most of the remaining gap to optimality.
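To put the averages quoted in this article side by side, the short script below computes each heuristic's relative gap to the 4.3403 optimum and the total number of guesses implied over all 1296 secret codes. The figures are taken from the text; the names `AVERAGES`, `gap_percent`, and `implied_total` are illustrative, and the totals are rounded back-calculations rather than numbers reported in the paper.

```python
N_CODES = 6 ** 4   # 1296 possible secret codes
OPTIMUM = 4.3403   # optimal average number of guesses, as quoted in the article

# Average guesses per heuristic, as quoted in the article.
AVERAGES = {
    "stage-weighted entropy": 4.3488,
    "fixed-weight entropy": 4.3565,
    "Most Parts": 4.3735,
    "Shannon entropy": 4.4151,
}


def gap_percent(average, optimum=OPTIMUM):
    """Relative excess over the optimum, in percent."""
    return 100.0 * (average - optimum) / optimum


def implied_total(average, n_codes=N_CODES):
    """Total guesses over all codes implied by an average (rounded)."""
    return round(average * n_codes)


if __name__ == "__main__":
    for name, avg in AVERAGES.items():
        print(f"{name:24s} gap = {gap_percent(avg):.3f}%  "
              f"total = {implied_total(avg)}")
```

Under these figures the stage-weighted heuristic sits about 0.196% above the optimum, which matches the "within 0.2%" claim, while plain Shannon entropy is roughly 1.7% above it.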
The implications extend beyond Mastermind to broader AI and decision-making contexts. This work demonstrates that principled utility functions can enhance one-step-ahead heuristics, making them more effective while retaining computational efficiency, unlike optimal strategies that require extensive precomputation. According to the authors, the method generalizes naturally to larger Mastermind variants and to related combinatorial problems, such as constraint satisfaction, where efficient information gathering is crucial. By showing how AI can learn and apply strategic patterns through weighted information valuation, the work offers a framework for improving algorithms in areas like data analysis, optimization, and even educational tools for teaching logical reasoning. The fully reproducible implementation, including source code and optimized parameters, provides a foundation for future research in these directions.
Limitations of the approach are noted in the paper, primarily the remaining 0.2% gap to true optimality, which reflects the fundamental constraint of one-step-ahead heuristics that optimize average-case rather than worst-case performance. The stage-weighted heuristic reaches a slightly higher maximum of 6 guesses compared to the optimal strategy's maximum of 5, trading worst-case performance for a near-optimal average. Additionally, while the method performs well on standard Mastermind, its application to more complex variants such as MM(5,8) requires further evaluation to assess adaptability. Future work could explore alternative entropy formulations, such as Rényi or Tsallis entropy, within the weighted framework to potentially improve performance further, but the current results already establish a strong baseline for heuristic approaches in this domain.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.