AI Learns to Think Randomly Like Humans

Large language models (LLMs) excel at producing a single correct answer but struggle with tasks that call for probabilistic or diverse responses, such as simulating human behavior or generating creative content. Recent research highlights this limitation, which can lead to biased outputs and low diversity in applications that depend on variety, such as multiplayer games, opinion simulation, and test-time scaling. A new method called String Seed of Thought (SSoT) addresses this by instructing LLMs to generate a random string internally and use it to make decisions, significantly improving their ability to follow probability distributions and increasing response variety without compromising quality.

The study's key finding is that SSoT enables LLMs to achieve near-ideal performance in probabilistic instruction following (PIF) and diversity-aware generation (DAG). In PIF tasks, such as simulating a fair coin flip where heads and tails should each appear 50% of the time, standard prompting often produces skewed distributions. With SSoT, models like DeepSeek-R1 and QwQ-32B improved substantially, approaching the performance of a pseudo-random number generator (PRNG). In experiments, SSoT reduced deviations from target distributions, with Jensen-Shannon divergence dropping by up to 97% in some cases, as detailed in Table 1 of the paper. In DAG tasks, such as generating diverse stories or ideas, SSoT made outputs more distinct while maintaining utility, outperforming baselines such as paraphrasing or adjusting temperature parameters.
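To make the Jensen-Shannon metric concrete, here is a minimal sketch of how the deviation between a model's observed choices and a target distribution could be measured. The sample counts are hypothetical and chosen only for illustration; they are not results from the paper, and the paper's exact evaluation code may differ.

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)

def empirical(samples, outcomes):
    """Relative frequency of each outcome in a list of sampled answers."""
    counts = Counter(samples)
    return [counts[o] / len(samples) for o in outcomes]

outcomes = ["heads", "tails"]
target = [0.5, 0.5]

# Hypothetical runs: one heavily skewed toward heads, one close to fair.
skewed = ["heads"] * 80 + ["tails"] * 20
balanced = ["heads"] * 51 + ["tails"] * 49

print("skewed:  ", js_divergence(empirical(skewed, outcomes), target))
print("balanced:", js_divergence(empirical(balanced, outcomes), target))
```

A value of 0 means the observed frequencies match the target exactly, and with base-2 logarithms the maximum is 1.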

Methodologically, SSoT is simple and requires minimal changes to existing prompts. It involves a two-stage process: first, the LLM generates a complex, unpredictable random string (e.g., "7fG2#kL9!pY4@zR6%vD1*sN0&qX5$wT8"), and second, it uses this string to deterministically select an action based on the desired probability distribution or to create diverse responses. The approach leverages the LLM's internal reasoning capabilities, such as summing ASCII codes and applying modulo operations, to map the string to outcomes. This method is highly versatile, working across various tasks with a single, unified prompt and without needing model retraining or access to generation history, making it fully parallelizable and scalable.
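The second stage can be sketched in ordinary code. The example below is an illustration of the sum-of-ASCII-codes and modulo idea described above, not the paper's prompt or implementation: in SSoT the model carries out this reasoning itself, and the function name `pick_action` and the integer-weight scheme are assumptions made for this sketch.

```python
def pick_action(seed: str, actions: list[str], weights: list[int]) -> str:
    """Deterministically map a seed string to an action.

    `weights` are integer weights (hypothetical convention for this sketch):
    weights [3, 1] mean the first action should be chosen 75% of the time.
    """
    total = sum(weights)
    # Sum the character codes, then reduce to a bucket in [0, total).
    bucket = sum(ord(ch) for ch in seed) % total
    cumulative = 0
    for action, weight in zip(actions, weights):
        cumulative += weight
        if bucket < cumulative:
            return action
    raise AssertionError("unreachable: bucket is always below the total weight")

seed = "7fG2#kL9!pY4@zR6%vD1*sN0&qX5$wT8"  # example string from the text
print(pick_action(seed, ["heads", "tails"], [1, 1]))
print(pick_action(seed, ["rock", "paper", "scissors"], [1, 1, 1]))
```

The same seed always yields the same action, so all of the randomness comes from the string itself. That is why each call can run independently, with no shared generation history.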

Results from extensive experiments demonstrate SSoT's effectiveness. In PIF scenarios with multiple choice options, SSoT consistently outperformed baselines, including ensembling and few-shot examples, across five frontier LLMs. For instance, in a rock-paper-scissors game against adversarial bots, SSoT enabled mixed-strategy play that resisted exploitation, achieving scores near zero compared to negative scores with standard methods, as shown in Figure 4. In DAG tasks evaluated on datasets like NoveltyBench, SSoT produced more diverse responses without quality loss, with distinctness scores rising in categories like creativity. Theoretical analyses confirmed that SSoT ensures faithfulness to target distributions under mild assumptions, with total variation distance diminishing as the string length increases, supporting its reliability.
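The claim that total variation distance shrinks with string length can be illustrated with a small calculation. The sketch below rests on an idealized assumption that is not taken from the paper: each character is drawn independently and uniformly from the printable ASCII codes 33 to 126. It then computes the exact distribution of the sum of codes modulo k and its distance from uniform. The paper's own assumptions and bounds may differ.

```python
def residue_distribution(length, k, alphabet=range(33, 127)):
    """Exact distribution of (sum of character codes) mod k for a string of
    `length` characters drawn i.i.d. uniformly from `alphabet` (an assumption)."""
    codes = list(alphabet)
    single = [0.0] * k
    for code in codes:
        single[code % k] += 1 / len(codes)
    dist = [1.0] + [0.0] * (k - 1)  # empty string: the sum is 0 with certainty
    for _ in range(length):
        nxt = [0.0] * k
        for r, p in enumerate(dist):
            for s, q in enumerate(single):
                nxt[(r + s) % k] += p * q  # add one more character's residue
        dist = nxt
    return dist

def tv_from_uniform(dist):
    """Total variation distance between `dist` and the uniform distribution."""
    k = len(dist)
    return 0.5 * sum(abs(p - 1 / k) for p in dist)

for n in (1, 2, 3, 4):
    print(n, tv_from_uniform(residue_distribution(n, 3)))
```

Under this idealization, the three-way split needed for rock-paper-scissors is slightly uneven for a single character, and the unevenness shrinks rapidly as characters are added.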

The implications of this research are broad for real-world applications. In gaming, SSoT could enable AI players to adopt unpredictable strategies, enhancing fairness and engagement. For simulations of human behavior, such as in social science or market research, it allows models to reflect diverse opinions more accurately. In creative fields, it aids in brainstorming and content generation by producing varied ideas. The method's ability to improve diversity is particularly valuable for test-time scaling, where generating multiple candidate solutions is essential for selecting the best one.

Limitations noted in the paper include that SSoT is most effective with highly capable LLMs that support long reasoning traces, and that its performance depends on the quality of the internally generated random string. The study also leaves some directions unexplored, such as optimizing randomness extraction strategies for specific applications. Additionally, while SSoT reduces biases, it may not eliminate them entirely in all contexts, so further refinement is needed for complex scenarios.

About the Author

Guilherme A.