TL;DR
LLMs can't replicate human scientific reasoning: they struggle to form new hypotheses or design experiments that go beyond existing data.
Artificial intelligence systems that can autonomously conduct scientific research have long been a goal of AI development, but new research reveals significant limitations in current approaches. The study demonstrates that while AI can process and summarize existing scientific information, it falls short when tasked with the creative reasoning required for genuine discovery.
The researchers found that large language models (LLMs) consistently failed to generate novel scientific hypotheses or design meaningful experiments. When tested across multiple scientific domains including biology, chemistry, and physics, the AI systems primarily reproduced patterns from their training data rather than creating new insights. The models showed particular difficulty with counterfactual reasoning—imagining scenarios that contradict established scientific knowledge.
The investigation used a systematic evaluation framework where AI systems were given scientific problems and assessed on their ability to propose testable hypotheses, design controlled experiments, and interpret unexpected results. The methodology involved presenting the models with open-ended scientific questions and analyzing their responses against criteria established by human scientific experts. This approach allowed direct comparison between AI-generated scientific reasoning and human scientific practice.
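To make the comparison concrete, here is a minimal Python sketch of how expert judgments against such criteria could be tallied per source. It is an illustration under stated assumptions, not the study's actual framework: the criterion names, the `Judgment` record, the `success_rate` helper, and the sample verdicts are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rubric mirroring the three abilities named in the article.
CRITERIA = (
    "testable_hypothesis",
    "controlled_experiment",
    "interprets_unexpected_results",
)


@dataclass
class Judgment:
    """One expert verdict on a single response (hypothetical record shape)."""
    problem_id: str
    source: str   # "model" or "human"
    passed: dict  # criterion name -> bool


def success_rate(judgments, source, criterion):
    """Fraction of responses from `source` that passed `criterion`."""
    relevant = [j for j in judgments if j.source == source]
    if not relevant:
        raise ValueError(f"no judgments for source {source!r}")
    return sum(j.passed[criterion] for j in relevant) / len(relevant)


# Made-up verdicts for illustration only; these are not data from the study.
judgments = [
    Judgment("p1", "model", dict.fromkeys(CRITERIA, False)),
    Judgment("p2", "model", dict.fromkeys(CRITERIA, True)),
    Judgment("p1", "human", dict.fromkeys(CRITERIA, True)),
    Judgment("p2", "human", dict.fromkeys(CRITERIA, True)),
]

for criterion in CRITERIA:
    model = success_rate(judgments, "model", criterion)
    human = success_rate(judgments, "human", criterion)
    print(f"{criterion}: model={model:.0%} human={human:.0%}")
```

Scoring each criterion separately, rather than as one combined pass/fail, would show whether a system fails at forming hypotheses, at designing experiments, or at interpreting results.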
Analysis of the results showed that AI systems achieved only a 23% success rate in generating scientifically valid hypotheses, compared to 78% for human scientists working on identical problems. The models frequently proposed experiments that were logically flawed or failed to account for key variables. Notably, when presented with anomalous experimental data, the AI systems tended to dismiss the anomalies as measurement errors rather than considering them as potential evidence for new phenomena.
The implications extend beyond academic interest. Many organizations are investing in AI systems to accelerate scientific research, particularly in fields like drug discovery and materials science. These findings suggest that current AI approaches may be better suited for data analysis and literature review than for the creative aspects of scientific discovery. The research indicates that AI systems can assist human scientists but cannot yet replace the intuitive leaps and conceptual breakthroughs that drive scientific progress.
The study acknowledges several limitations. The evaluation focused primarily on text-based reasoning and did not test AI systems integrated with laboratory equipment or real-world experimentation. The researchers note that future AI systems with different architectures or training approaches might perform better. Additionally, the study examined only current state-of-the-art models, leaving open the possibility that future developments could overcome these limitations.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.