Open-Source AI Matches Top Models on Research Questions

TL;DR

A new study finds free AI tools answer computer science questions as accurately as paid systems when updated with current research data.

Researchers have demonstrated that open-source artificial intelligence models, when properly enhanced, can answer complex computer science questions with accuracy rivaling expensive commercial systems. This finding challenges the assumption that only premium AI services can deliver reliable results for specialized domains, potentially making advanced research tools more accessible to students, educators, and developers worldwide.

The key discovery from the University of Moratuwa study shows that Mistral-7b-instruct, an open-source model, achieved 85.7% accuracy on binary questions when augmented with retrieval-augmented generation (RAG), coming remarkably close to OpenAI's GPT-3.5's 90.5% accuracy. This performance gap narrows significantly when considering that the open-source model operates without the financial costs associated with commercial APIs. The research compared four open-source models—Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct, and Orca-mini-v3-7b—against GPT-3.5, evaluating their ability to answer questions about recent computer science literature.

Methodologically, the team built a specialized database containing 4,929 journal paper abstracts from 2023-2024, focusing on three trending areas: large language models, quantum computing, and edge computing. They converted these abstracts into numerical vectors using SPECTER, a scientific document embedding model, and stored them in FAISS for efficient similarity searching. When a question was posed, the system retrieved the most relevant document chunks and passed them to the language models along with the original query, providing current research context that vanilla models lack.

The results revealed clear performance patterns across different question types. For binary (yes/no) questions, GPT-3.5+RAG achieved the highest precision at 90.5%, followed closely by Mistral-7b-instruct+RAG at 85.7%. However, when evaluating long-form answers using cosine similarity scores and expert rankings, GPT-3.5 scored 0.4479 while Mistral-7b-instruct scored 0.2339, indicating commercial models still produce more creatively varied responses. The human expert evaluation ranked GPT-3.5 responses as "excellent" in 12 of 30 cases, compared to 10 for Mistral-7b-instruct. Notably, Orca-mini-v3-7b generated answers fastest at 99.2 seconds, while LLaMa2-7b-chat was slowest at 107.5 seconds.

This research matters because it demonstrates that cost-effective AI solutions can now handle specialized academic queries with reasonable accuracy. For computer science students and researchers operating with limited budgets, open-source models enhanced with current literature provide a viable alternative to expensive commercial services. The approach also addresses the critical problem of AI hallucination—where models generate confident but incorrect information—by grounding responses in verified research sources. As the paper notes, standard ChatGPT trained only until January 2022 cannot accurately capture recent theoretical developments, making RAG enhancement essential for current domain knowledge.

The study acknowledges several limitations, including its focus on computer science literature only and the use of abstracts rather than full papers due to computational constraints. The evaluation relied on a single human expert's perspective, which might introduce bias, and the cosine similarity scores for long answers remained relatively low across all models. Additionally, some open-source models required specific prompt formatting to generate expected responses, highlighting the ongoing importance of prompt engineering even with RAG systems.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn