In the world of artificial intelligence, researchers have long sought to understand why humans often make decisions that seem irrational. A new study reveals that AI agents, when constrained by limited computational power, exhibit behaviors strikingly similar to human cognitive biases such as optimism bias and negativity bias. This finding challenges the traditional economic view of humans as fully rational actors and suggests that what we perceive as irrationality might simply be the best possible decision-making under cognitive constraints. For anyone interested in how AI can model human behavior or improve decision-making systems, this research offers a fresh perspective on the interplay between intelligence and limitation.
The key discovery is that probabilistic finite automata (PFAs)—simple computational models with a finite number of states—can effectively solve the multi-armed bandit problem, a classic scenario where an agent must choose between multiple options with unknown rewards, much like a gambler picking slot machines in Las Vegas. The researchers found that these PFAs perform near-optimally when given enough states, and their performance degrades gracefully as states decrease. Importantly, the PFAs display human-like traits, including satisficing (setting an aspiration level and stopping when it's met), optimism bias (starting with high expectations), and negativity bias (giving more weight to negative outcomes). This indicates that such behaviors are not necessarily irrational but could arise naturally from bounded computational resources.
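To make the setup concrete, here is a minimal Bernoulli bandit simulation in Python. The class name and arm probabilities are illustrative choices for this sketch, not taken from the paper:

```python
import random

class BernoulliBandit:
    """K slot machines; pulling arm i pays 1 with probability probs[i], else 0."""
    def __init__(self, probs):
        self.probs = probs

    def pull(self, arm):
        return 1 if random.random() < self.probs[arm] else 0

# Three machines with success probabilities unknown to the agent.
random.seed(0)
bandit = BernoulliBandit([0.2, 0.5, 0.8])
rewards = [bandit.pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # close to 0.8, the best arm's rate
```

The agent's challenge is that it sees only the 0/1 outcomes, never the underlying probabilities, and must decide which arm to keep pulling.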
To achieve this, the researchers designed an 'aspiration-level' protocol, where the PFA sets an expectation for success and tests arms against a virtual competitor representing that level. If an arm meets or exceeds the expectation, it is played; otherwise, the aspiration is adjusted based on feedback. This method uses significantly fewer states than other approaches, such as the elimination tournament protocol, which compares arms in a pairwise manner. For example, in simulations with 50 arms, the aspiration-level protocol required only 115,000 states, encoded in about 17 bits, making it efficient and scalable. The approach avoids complex computations, relying instead on simple counters and thresholds to guide decisions.
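The counters-and-thresholds idea can be sketched as follows. This is a loose Python interpretation of an aspiration-level strategy, not the paper's exact PFA: the function name, window size, and decay factor are all assumptions made for illustration.

```python
import random

def aspiration_level_play(bandit_probs, aspiration=0.9, window=50,
                          decay=0.95, steps=5000, seed=1):
    """Simplified aspiration-level sketch: test each arm against the
    current aspiration using a small success counter; if the arm falls
    short over a window of pulls, lower the aspiration and move on."""
    rng = random.Random(seed)
    k = len(bandit_probs)
    arm = 0
    successes = trials = 0
    total_reward = 0
    for _ in range(steps):
        reward = 1 if rng.random() < bandit_probs[arm] else 0
        total_reward += reward
        successes += reward
        trials += 1
        if trials == window:
            if successes / window < aspiration:
                # Arm failed the test: lower expectations, try the next arm.
                aspiration *= decay
                arm = (arm + 1) % k
            successes = trials = 0
    return total_reward / steps

print(round(aspiration_level_play([0.2, 0.5, 0.8]), 3))
```

Note how the optimistic starting aspiration (0.9) makes the agent reject mediocre arms early, while the decay plays the role of adjusting expectations downward on negative feedback, echoing the biases described above.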
The results, drawn from simulations of multi-armed bandits whose success probabilities are uniformly distributed between 0 and a random upper bound, show that the aspiration-level protocol achieves an average regret (the difference in reward compared to always playing the optimal arm) as low as 0.007 after 50,000 steps. This is comparable to the elimination tournament (0.008 average regret) and even outperforms the ε-greedy approach (0.025 average regret), despite the latter effectively requiring infinitely many states, since it tracks real-valued reward estimates. The protocol's ability to converge quickly to near-optimal behavior, while using a manageable number of states, highlights its practicality for real-world applications where computational resources are limited.
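For comparison, the ε-greedy baseline mentioned above can be simulated and its average regret measured directly. This is a standard textbook ε-greedy learner, not the paper's implementation; the parameters and arm probabilities are illustrative:

```python
import random

def epsilon_greedy_regret(probs, epsilon=0.1, steps=50000, seed=2):
    """Average per-step regret of an epsilon-greedy learner that keeps
    a running mean reward estimate for every arm (real-valued state,
    hence the unbounded-memory observation in the text)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    values = [0.0] * k
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)          # explore a random arm
        else:
            arm = max(range(k), key=lambda i: values[i])  # exploit
        r = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    # Regret: best arm's mean reward minus the reward actually achieved.
    return max(probs) - total / steps

print(round(epsilon_greedy_regret([0.2, 0.5, 0.8]), 3))
```

The persistent regret of ε-greedy comes from the fixed fraction of exploratory pulls it never stops making, which is one reason a satisficing strategy that settles on a good-enough arm can match or beat it.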
This research matters because it bridges AI and human psychology, offering insights into how bounded rationality influences decision-making in everyday life. For instance, in fields like economics or game theory, understanding these biases could lead to better models of human behavior, improving everything from consumer choice predictions to security strategies, such as in wildlife protection games where rangers must outsmart poachers. By demonstrating that AI can replicate human-like decision patterns, the study suggests that what seems irrational might be a rational adaptation to constraints, potentially informing the design of more intuitive AI systems for education, healthcare, or autonomous agents.
However, the study has limitations. The research focuses on static environments where reward probabilities do not change over time, and it does not address how these models would perform in dynamic settings. Additionally, while the aspiration-level protocol exhibits biases that mirror human behavior, it may not capture all aspects of irrationality, and further work is needed to explore its applicability to more complex, real-time decision scenarios. The paper also notes that reducing the number of states increases the prominence of biases like negativity, indicating a trade-off between efficiency and behavioral accuracy that warrants deeper investigation.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.