AIResearch

AI Solves Complex Database Queries With Human-Like Planning

AI solves complex data questions by thinking like a human analyst. See how this system translates natural language into accurate database queries with unprecedented precision.

November 14, 2025

A new artificial intelligence system translates complex natural-language questions into accurate database queries by working through each problem step by step, much as a human analyst would. It targets a persistent bottleneck in data analysis: the gap between how people naturally ask questions and the structured queries that databases require.

Researchers at Oracle developed OraPlan–SQL, which outperformed the second-best system by 6.3 percentage points in English and 12.6 percentage points in Chinese on the Archer Evaluation Challenge benchmark. The system achieved 55% accuracy in English and 56.7% in Chinese while maintaining near-perfect SQL syntax validity at 99%. This performance is particularly notable because the benchmark requires handling complex reasoning involving arithmetic calculations, commonsense knowledge, and logical inference—not just simple database lookups.

The key innovation lies in how the system approaches problem-solving. Instead of trying to directly convert questions into SQL code, OraPlan–SQL first creates an intermediate plan in natural language. This planning stage breaks down complex questions into manageable steps, making the reasoning process transparent and easier to debug. For example, when asked "What is the ratio of stations in the city with the most stations to the city with the fewest?" the system first identifies that it needs to count stations per city, then find the maximum and minimum counts, and finally compute their ratio.
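The plan-then-query idea can be sketched in a few lines of Python. This is an illustration only, not OraPlan–SQL's code: the `station` table, its rows, and the `PLAN` and `SQL` names are invented for the example, and the SQL is hand-written to mirror the plan rather than generated by a model.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO station (city) VALUES
        ('San Jose'), ('San Jose'), ('San Jose'), ('San Jose'),
        ('Palo Alto'), ('Palo Alto'),
        ('Redwood City');
""")

# Natural-language plan, one step per entry, mirroring the breakdown above.
PLAN = [
    "Count stations per city.",
    "Find the largest and smallest per-city counts.",
    "Divide the largest count by the smallest count.",
]

# SQL written to follow the plan step by step.
SQL = """
WITH per_city AS (                 -- step 1: stations per city
    SELECT city, COUNT(*) AS n
    FROM station
    GROUP BY city
)
SELECT CAST(MAX(n) AS REAL) / MIN(n) AS ratio   -- steps 2 and 3
FROM per_city;
"""

ratio = conn.execute(SQL).fetchone()[0]
print(ratio)  # 4.0 with the sample rows above (4 stations vs. 1)
```

Keeping the plan as plain text next to the query is what makes a wrong answer easier to trace: a reader can check whether the plan was wrong or whether the SQL failed to follow it.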

The planning component uses what researchers call "meta-prompting"—analyzing common error patterns from previous attempts and incorporating corrective guidelines into the system's instructions. When the system encounters percentage calculations, it now explicitly breaks them into per-row computations before aggregation. For hypothetical questions using words like "if" or "suppose," it separates assumptions from actions to avoid logical errors.
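One way to picture how such guidelines reach the planner is a prompt template with the rules inlined. The sketch below is an assumption about the general shape, not the prompt used in the paper: the guideline wording, the `GUIDELINES` list, and the `build_planner_prompt` function are all hypothetical.

```python
# Hypothetical guideline texts paraphrasing the two rules described above.
GUIDELINES = [
    "For percentage questions, compute the value per row first, then aggregate.",
    "For hypothetical questions ('if', 'suppose'), state the assumption "
    "as its own step before the action that depends on it.",
]


def build_planner_prompt(question: str, schema: str) -> str:
    """Assemble a planning prompt with the corrective guidelines inlined."""
    rules = "\n".join(f"{i}. {g}" for i, g in enumerate(GUIDELINES, start=1))
    return (
        "Write a step-by-step natural-language plan for answering the "
        "question against the schema. Do not write SQL yet.\n\n"
        f"Guidelines:\n{rules}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Plan:"
    )


prompt = build_planner_prompt(
    "Suppose every station gained 2 docks; what percentage would have over 20?",
    "station(id, city, dock_count)",
)
print(prompt)
```

In a setup like this, each new error pattern found during development becomes one more entry in the guideline list, so the planner's instructions change while the model itself does not.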

Results show that the planning stage matters. When researchers removed it and fed questions directly to the SQL generator, performance dropped by 7.7 percentage points. The meta-prompting refinement provided an even larger gain, raising accuracy from 44.2% to 79.8% on development tests (35.6 percentage points) by addressing systematic error patterns.

The system also handles bilingual queries effectively, closing the performance gap between English and Chinese that plagues many existing systems. Rather than translating Chinese queries to English first—which can introduce errors like converting "格里公园" to "Geli" instead of "Glebe Park"—OraPlan–SQL processes queries directly in their original language while using entity linking to match variations like "NYC" and "New York City."
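As a rough illustration of the entity-linking idea, the snippet below maps surface forms in a question to canonical database values through a lookup table. This is a deliberately simple stand-in and not the method used by OraPlan–SQL: the `ALIASES` table and the `link_entities` function are hypothetical, and plain substring matching would be too crude for real use.

```python
# Hypothetical alias table; a real system would derive candidates
# from the values actually stored in the database.
ALIASES = {
    "nyc": "New York City",
    "new york city": "New York City",
    "格里公园": "Glebe Park",
}


def link_entities(question: str) -> dict[str, str]:
    """Map each alias found in the question to its canonical database value."""
    lowered = question.casefold()
    return {alias: value for alias, value in ALIASES.items() if alias in lowered}


print(link_entities("How many stations are in NYC?"))
print(link_entities("格里公园有多少个站点？"))
```

The Chinese question is matched in its original form, so no translation step has a chance to turn the park name into a wrong romanization.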

For reliability, the system generates multiple query plans for each question and selects the final answer through majority voting of execution results. This approach reduces dependence on any single generated query and provides a small but consistent performance improvement.
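Majority voting over execution results can be sketched as follows. The details are assumptions for illustration: the `majority_vote` function, the sample table, and the candidate queries are invented, and treating results as equal regardless of row order is a choice made here, not something the article specifies.

```python
import sqlite3
from collections import Counter


def majority_vote(conn, candidate_sqls):
    """Run each candidate query and return the most common result set."""
    results = []
    for sql in candidate_sqls:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # candidates that fail to execute get no vote
        # Sort rows so that two results differing only in order count as equal.
        results.append(tuple(sorted(rows, key=repr)))
    if not results:
        return None
    winner, _count = Counter(results).most_common(1)[0]
    return list(winner)


# Hypothetical data and candidate queries, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO station (city) VALUES ('San Jose'), ('San Jose'), ('Palo Alto');
""")
candidates = [
    "SELECT COUNT(*) FROM station WHERE city = 'San Jose'",
    "SELECT COUNT(id) FROM station WHERE city = 'San Jose'",
    "SELECT COUNT(*) FROM station",   # wrong filter, returns 3
    "SELECT COUNT(*) FROM stations",  # wrong table name, fails to run
]
answer = majority_vote(conn, candidates)
print(answer)  # [(2,)]
```

Voting on results rather than on query text means two differently written queries that return the same rows reinforce each other, while a single faulty query is outvoted.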

The implications extend beyond technical benchmarks. Businesses and researchers dealing with multilingual datasets could use this technology to make data analysis more accessible to non-technical users who need to ask complex questions in their native language. The planning-centric approach also makes the system's reasoning more interpretable, allowing users to understand how it arrived at particular answers.

However, the system still faces limitations. Performance depends heavily on the underlying language model: GPT-5 achieved 79.8% accuracy in development tests, while GPT-4o reached only 52.9%. The approach also requires careful analysis of error patterns to write effective guidelines, and those guidelines may not generalize to question types that did not appear during development.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.