A new approach to artificial intelligence has researchers looking backward to move forward, using a programming language inspired by 1980s home computers to make AI reasoning more transparent and reliable. Cognitive BASIC, developed by researchers at the University of Oldenburg, represents a significant departure from current AI prompting approaches by structuring how language models think using explicit, stepwise programs that run entirely within the model itself. This innovation addresses a critical limitation of modern AI systems: while they can generate impressive text, their internal reasoning processes often remain opaque and difficult to audit, making it hard to understand how they reach conclusions or to identify where their thinking breaks down.
The key finding from this research is that large language models can reliably execute structured programs written in a simplified BASIC-style language, creating transparent reasoning traces that reveal exactly how they process information. Unlike traditional prompting approaches that describe reasoning procedures at the prompt level, Cognitive BASIC executes them through line-numbered programs interpreted entirely by the AI model. This allows researchers to see not just what the model concludes, but how it gets there—including how it extracts facts, identifies contradictions, and resolves inconsistencies in its thinking. The approach transforms text generation from an opaque process into a sequence of auditable cognitive operations that can be examined step by step.
The methodology centers on a minimal programming language that borrows from early BASIC conventions, with numbered lines executed sequentially unless redirected by control-flow commands like IF...THEN or GOTO. An interpreter file, written entirely in natural language, defines the semantics of each command and how they manipulate a compact memory structure inside the model. This memory contains five key components: working memory for current scenario text, declarative memory for factual propositions, procedural memory for operational rules, a conflicts list for detected contradictions, and a resolution field for reconciled statements. Each instruction in a Cognitive BASIC program explicitly updates this memory state, creating a transparent record of the model's reasoning process.
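The mechanics described above can be illustrated with a minimal sketch. Note that the field names, command names, and the example program below are hypothetical, modeled on the conventions the article describes (numbered lines, IF...THEN, GOTO, a five-part memory) rather than taken from the paper's actual interpreter file.

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveMemory:
    """Illustrative model of the five-part memory state described in the text."""
    working: str = ""                                      # current scenario text
    declarative: list = field(default_factory=list)        # factual propositions
    procedural: list = field(default_factory=list)         # operational rules
    conflicts: list = field(default_factory=list)          # detected contradictions
    resolution: str = ""                                   # reconciled statement

# A hypothetical line-numbered program in the described BASIC style.
# The model itself interprets each line and updates the memory state.
PROGRAM = """
10 LOAD SCENARIO INTO WORKING
20 EXTRACT FACTS FROM WORKING INTO DECLARATIVE
30 SCAN DECLARATIVE FOR CONTRADICTIONS
40 IF CONFLICTS EMPTY THEN GOTO 60
50 RESOLVE CONFLICTS INTO RESOLUTION
60 REPORT MEMORY
"""

# Tracing the memory updates by hand for a toy scenario:
mem = CognitiveMemory(working="The meeting is at 3pm. The meeting is at 4pm.")
mem.declarative = ["meeting at 3pm", "meeting at 4pm"]            # after line 20
mem.conflicts = [("meeting at 3pm", "meeting at 4pm")]            # after line 30
mem.resolution = "meeting time inconsistent; later statement kept"  # after line 50
```

Each step leaves an auditable snapshot of memory, which is the property the article emphasizes: the trace shows not just the conclusion but every intermediate state that led to it.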
Results from testing across three language models—granite3.3, gpt-oss:20b, and mistral:7b—reveal both strengths and weaknesses in current AI reasoning capabilities. On a benchmark of 25 scenarios containing contradictory factual statements, all models demonstrated strong declarative extraction, with granite3.3 and mistral:7b achieving perfect 1.00 accuracy at extracting facts into declarative memory. However, significant differences emerged in more complex reasoning tasks: conflict detection and resolution proved substantially more challenging. Granite3.3 showed the strongest overall performance with 0.92 accuracy in both conflict detection and resolution, while gpt-oss:20b dropped to 0.60 in both categories, and mistral:7b achieved 0.84 in detection and 0.80 in resolution. These patterns reveal specific cognitive weaknesses, particularly in handling temporal and numeric inconsistencies that some models struggled to detect or resolve properly.
The implications of this research extend beyond academic interest to practical applications where transparent AI reasoning matters. By making AI thinking processes explicit and auditable, Cognitive BASIC could help address concerns about AI reliability in fields like medical diagnosis, legal analysis, or scientific research, where understanding how conclusions are reached is as important as the conclusions themselves. The approach also provides a new diagnostic tool for identifying specific failure modes in AI systems—revealing not just that a model is wrong, but exactly where and why its reasoning breaks down. This level of transparency could accelerate improvements in AI safety and reliability by giving developers precise targets for model improvement rather than relying on black-box evaluations.
Despite its promise, Cognitive BASIC has important limitations that the researchers acknowledge. The current implementation lacks tool-use capabilities, meaning any operation requiring external retrieval or computation must be handled by an outside controller before resuming the program. This limits its applicability to purely internal reasoning tasks without access to external data or computational resources. Additionally, the researchers note that preliminary trials with smaller models (1B–3B parameters) revealed unreliable program following and incomplete conflict pipelines, suggesting the approach may not scale down to more resource-constrained environments. Future work will need to address these limitations while exploring extensions like hierarchical control systems and more integrated tool-calling capabilities.
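The outside-controller pattern mentioned above could look roughly like the following sketch. The `run_program` driver, the `TOOL:` marker, and the `END` sentinel are assumptions for illustration; the paper does not specify how an external controller would interleave with the in-model interpreter.

```python
def run_program(model_step, tools):
    """Drive a hypothetical Cognitive BASIC session, pausing whenever the
    model requests an external operation (marked here by a 'TOOL:' prefix).

    model_step: callable taking the last tool observation (or None) and
                returning the model's next output line.
    tools:      dict mapping tool names to callables the controller runs.
    """
    observation = None
    transcript = []
    while True:
        output = model_step(observation)   # one interpreter step inside the model
        transcript.append(output)
        if output.startswith("TOOL:"):
            # The model cannot retrieve or compute externally itself,
            # so the controller performs the call and feeds the result back.
            name, _, arg = output[5:].partition(" ")
            observation = tools[name](arg)
        elif output == "END":
            return transcript
        else:
            observation = None

# Toy stand-in for the model: asks for one lookup, then finishes.
script = iter(["TOOL:lookup capital of France", "RESOLUTION: Paris", "END"])
trace = run_program(lambda obs: next(script), {"lookup": lambda q: "Paris"})
```

The controller stays outside the model, matching the article's point that any retrieval or computation must happen externally before the program resumes.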
The research also highlights broader questions about how we should structure AI reasoning systems. By borrowing from early programming paradigms, Cognitive BASIC demonstrates that sometimes looking backward can provide forward momentum—using simple, transparent structures to manage complex cognitive processes. As AI systems become more integrated into critical decision-making contexts, approaches like this that prioritize transparency and auditability may become increasingly important for building trust and ensuring reliability. The fact that all tested models could execute Cognitive BASIC programs suggests this approach taps into fundamental capabilities of modern language models, potentially opening new avenues for making AI thinking more understandable and controllable.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.