AI Model Masters Ayurvedic Medicine in Hindi and English

TL;DR

A new language model beats larger rivals on Ayurvedic knowledge, delivering reliable answers in both Hindi and English while exposing gaps in clinical use.

Ayurveda, a centuries-old system of traditional medicine, holds deep cultural and clinical significance, yet its complex principles often elude mainstream artificial intelligence. Researchers have developed AyurParam-2.9B, a specialized bilingual language model built to interpret and apply Ayurvedic knowledge in an area where general-purpose models often fail. The work supports education and decision-making in a field that blends philosophy, wellness, and medicine, making it relevant for practitioners, students, and anyone interested in holistic health.

The key finding is that AyurParam-2.9B achieves state-of-the-art performance in Ayurvedic question-answering among models of its size and competes with much larger ones. On the BhashaBench-Ayur benchmark, which includes 14,963 exam-style questions, AyurParam scored 41.12% accuracy in Hindi and 38.04% in English, outperforming models like Llama-3.2-3B-Instruct and Qwen2.5-3B-Instruct. On multiple-choice questions it reached 40.12% accuracy, suggesting it can discriminate between closely related therapeutic approaches, a skill essential for reliable advice.

The team fine-tuned the Param-1-2.9B-Instruct base model on a curated dataset of Ayurvedic texts. They collected over 1,000 books, including classics like Charaka Samhita and Sushruta Samhita, in languages such as Hindi, Sanskrit, and English, ensuring broad coverage of domains like pharmacology and diagnostics. After using optical character recognition to convert scanned pages into text, they generated 4.75 million question-answer pairs through knowledge-grounded synthesis, in which each answer is anchored to a specific text span to reduce errors. Human experts reviewed samples to refine the data, focusing on accuracy and cultural relevance.
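To make the idea of span-anchored answers concrete, here is a minimal sketch of how a generated question-answer pair could be checked against its source passage. It is not the authors' pipeline: the GroundedQA record, its field names, and the is_anchored filter are all hypothetical, and the simple substring check stands in for whatever validation the team actually used.

```python
from dataclasses import dataclass


@dataclass
class GroundedQA:
    """Hypothetical record for one synthesized question-answer pair."""
    question: str
    answer: str
    source_text: str  # passage the pair was generated from
    span_start: int   # character offsets of the supporting span
    span_end: int

    def supporting_span(self) -> str:
        return self.source_text[self.span_start:self.span_end]


def is_anchored(pair: GroundedQA) -> bool:
    """Keep a pair only if its span is valid and contains the answer text."""
    if not (0 <= pair.span_start < pair.span_end <= len(pair.source_text)):
        return False
    return pair.answer.casefold() in pair.supporting_span().casefold()


# Illustrative passage written for this sketch, not taken from the corpus.
passage = (
    "Triphala is a formulation made from three fruits: "
    "amalaki, bibhitaki, and haritaki."
)
start = passage.index("three fruits")
end = start + len("three fruits")

good = GroundedQA("How many fruits make up Triphala?", "three fruits",
                  passage, start, end)
bad = GroundedQA("How many fruits make up Triphala?", "five fruits",
                 passage, start, end)

# Only the pair whose answer appears inside its supporting span survives.
kept = [p for p in (good, bad) if is_anchored(p)]
```

A filter like this only catches answers that contradict or drift from the cited span; it says nothing about whether the question itself is well formed, which is where the human expert review described above would come in.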

The paper's tables show consistent performance across difficulty levels and question types. For example, on easy questions AyurParam achieved 31.21% accuracy, compared to 28.51% for Llama-3.2-3B-Instruct, and it showed strength in domains like Kayachikitsa (internal medicine) and Dravyaguna (pharmacology). However, it struggled with reasoning-intensive areas such as Panchakarma and Rasayana, indicating room for improvement in complex analytical tasks. The model's bilingual capability, though stronger in Hindi, underscores its utility in making Ayurveda accessible to diverse audiences.
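Breakdowns like these come down to grouping multiple-choice results by an attribute and computing the share answered correctly. The sketch below shows that arithmetic under stated assumptions: the record fields (language, difficulty, gold, predicted) and the accuracy_by helper are hypothetical, and the four records are invented toy data, not BhashaBench-Ayur results.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: percent correct} for multiple-choice records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        correct[group] += int(record["predicted"] == record["gold"])
    return {group: 100.0 * correct[group] / total[group] for group in total}


# Toy records, invented purely for illustration (not benchmark data).
records = [
    {"language": "hi", "difficulty": "easy", "gold": "A", "predicted": "A"},
    {"language": "hi", "difficulty": "hard", "gold": "C", "predicted": "B"},
    {"language": "en", "difficulty": "easy", "gold": "D", "predicted": "D"},
    {"language": "en", "difficulty": "hard", "gold": "B", "predicted": "B"},
]

by_language = accuracy_by(records, "language")
by_difficulty = accuracy_by(records, "difficulty")
```

The same helper can be pointed at any other attribute, such as subject domain or question type, provided each record carries that field.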

This model matters because it makes AI more reliable in a specialized field, potentially supporting Ayurvedic education, consultation, and wellness guidance. By providing context-aware responses in both Hindi and English, it bridges linguistic and knowledge gaps for users in regions where Ayurveda is practiced. This could lead to better-informed decisions in holistic health, though the model is designed as an educational tool, not a replacement for professional diagnosis or care.

The paper notes several limitations. The training data consists primarily of historical texts and extends only to 2024, so it misses more recent medical advances. The performance gap between Hindi and English questions suggests insufficient English content in the corpus. The evaluation relies solely on exam-style benchmarks and does not assess open-ended reasoning or real-world usability. Finally, the model lacks safety guardrails against inappropriate advice, highlighting the need for further validation and ethical safeguards in future iterations.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
