AI Retrieval Is Now Up to 500x Faster and 15% More Accurate

TL;DR

A new method pairs hierarchical knowledge trees with a fast filter to sharpen AI responses and cut retrieval time dramatically on complex queries.

Artificial intelligence systems that retrieve external information to answer questions often struggle to balance speed and accuracy, but a new approach from researchers at Peking University tackles both problems head-on. The framework, called Bridge-RAG, introduces a hierarchical structure for organizing knowledge and a fast lookup filter, resulting in significant improvements over existing methods. This matters for real-world applications where timely and precise responses are essential, from medical diagnostics to financial analysis, because it offers a more efficient way for AI to access and use vast amounts of data.

The key finding from the research is that Bridge-RAG achieves an accuracy improvement of about 15.65% and reduces retrieval time by a factor of 10 to 500 compared to other retrieval-augmented generation frameworks. On datasets like MedQA and AALCR, Bridge-RAG demonstrated substantial gains: for example, on the AALCR dataset, it improved accuracy by 22.1% in BLEU score compared to Tree-RAG while being 500 times faster. These results, detailed in Table 1 of the paper, show that the system can retrieve relevant information more quickly and accurately, addressing a common problem in which AI systems retrieve fragmented or noisy data that leads to errors in final responses.

The methodology behind Bridge-RAG involves two main designs to overcome accuracy and efficiency challenges. First, to enhance accuracy, the researchers introduced the concept of an abstract, which groups five consecutive document chunks into a higher-level knowledge unit. These abstracts are organized into a tree structure, with upper levels representing general concepts and lower levels capturing detailed information. This hierarchical organization allows for multi-level retrieval, where the system traces parent-child relationships to gather comprehensive context from multiple abstraction levels. Second, to boost efficiency, the researchers designed an improved Cuckoo Filter data structure that provides constant-time entity lookups. This filter includes a block linked list to store abstract addresses and an entity temperature-based sorting mechanism that prioritizes frequently accessed entities, optimizing retrieval speed as shown in Figure 4 of the paper.
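To make the first design more concrete, here is a minimal Python sketch of grouping chunks into abstracts and tracing parent links for multi-level retrieval. It is an illustration under stated assumptions, not the paper's implementation: the names `Node`, `build_levels`, and `gather_context` are hypothetical, the "summary" of each group is plain string concatenation, and the assumption that upper levels are also formed by grouping five nodes at a time is ours, since the article does not say how they are built.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GROUP_SIZE = 5  # the article's "five consecutive document chunks" per abstract


@dataclass
class Node:
    text: str
    level: int  # 1 = abstract over raw chunks; higher = more general
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def build_levels(chunks: List[str], max_levels: int = 3) -> List[List[Node]]:
    """Build up to max_levels layers of abstracts, most detailed first."""
    # Level 1: each abstract covers five consecutive chunks.
    # Placeholder summary: joining text stands in for a real summarizer.
    current = [
        Node(text=" ".join(chunks[i:i + GROUP_SIZE]), level=1)
        for i in range(0, len(chunks), GROUP_SIZE)
    ]
    levels = [current]
    # Assumption: upper levels group five lower-level abstracts at a time.
    while len(levels) < max_levels and len(current) > 1:
        parents = []
        for i in range(0, len(current), GROUP_SIZE):
            members = current[i:i + GROUP_SIZE]
            parent = Node(
                text=" / ".join(m.text for m in members),
                level=len(levels) + 1,
                children=members,
            )
            for member in members:
                member.parent = parent
            parents.append(parent)
        levels.append(parents)
        current = parents
    return levels


def gather_context(node: Node, depth: int = 3) -> List[Node]:
    """Collect the node and its ancestors, visiting at most `depth` levels.

    The result is ordered from the most general node to the most detailed.
    """
    context = []
    current: Optional[Node] = node
    while current is not None and len(context) < depth:
        context.append(current)
        current = current.parent
    return list(reversed(context))


chunks = [f"c{i}" for i in range(30)]
levels = build_levels(chunks)
context = gather_context(levels[0][0], depth=3)
```

With 30 chunks this produces six level-1 abstracts, two level-2 nodes, and one root, so a depth of 3 returns the matched abstract together with its parent and grandparent. A depth of 1 returns only the matched abstract, which mirrors the depth setting the article describes.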

The analysis shows that Bridge-RAG outperforms baseline models like Naive RAG, Graph-RAG, and Tree-RAG across multiple metrics. On the MedQA dataset, Bridge-RAG improved accuracy by 15% in ROUGE-L and BLEU compared to Tree-RAG while being 226 times faster. The speed advantage comes from the Cuckoo Filter's O(1) lookup time, which avoids exhaustive searches, and the accuracy improvement stems from retrieving context from parent and child abstracts, which provides more structured and semantically coherent information. Experiments also showed that increasing the retrieval depth from one to three levels improved accuracy by capturing broader semantic context, though it slightly increased retrieval time due to additional tree traversal, as indicated in Table 1. The error rate in searching abstracts was nearly zero, with hash collisions causing minimal issues in datasets containing thousands of abstracts.
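For readers who want a feel for why the lookup is constant time, the following is a rough Python sketch of a cuckoo-filter-style index that maps an entity to abstract addresses. It is not the paper's data structure: the class `CuckooIndex`, its parameters, the 16-bit fingerprint, and the use of a plain Python list in place of the block linked list are all assumptions made for illustration, and the temperature rule here (increment on each hit, then sort the bucket) is only one plausible reading of the sorting mechanism.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash derived from SHA-256 (an arbitrary choice for this sketch)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class Entry:
    def __init__(self, fingerprint, addresses):
        self.fingerprint = fingerprint
        self.addresses = addresses  # stand-in for the block linked list
        self.temperature = 0        # how often this entity has been looked up


class CuckooIndex:
    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _fingerprint(self, entity):
        return (_h(b"fp:" + entity.encode()) & 0xFFFF) or 1

    def _alt(self, index, fingerprint):
        # XOR trick: applying _alt twice returns the original bucket index.
        return (index ^ _h(fingerprint.to_bytes(2, "big"))) & self.mask

    def _indexes(self, entity, fingerprint):
        first = _h(entity.encode()) & self.mask
        return first, self._alt(first, fingerprint)

    def insert(self, entity, address):
        fp = self._fingerprint(entity)
        i1, i2 = self._indexes(entity, fp)
        # Entity (or a fingerprint collision) already present: extend its list.
        for i in (i1, i2):
            for entry in self.buckets[i]:
                if entry.fingerprint == fp:
                    entry.addresses.append(address)
                    return True
        entry = Entry(fp, [address])
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(entry)
                return True
        # Both buckets full: evict a random resident and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(len(self.buckets[i]))
            entry, self.buckets[i][slot] = self.buckets[i][slot], entry
            i = self._alt(i, entry.fingerprint)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(entry)
                return True
        return False  # table is effectively full; the last evicted entry is dropped

    def lookup(self, entity):
        """Check at most two buckets, so cost does not grow with table size."""
        fp = self._fingerprint(entity)
        for i in self._indexes(entity, fp):
            bucket = self.buckets[i]
            for entry in bucket:
                if entry.fingerprint == fp:
                    entry.temperature += 1
                    # Keep hot entities at the front of the bucket.
                    bucket.sort(key=lambda e: -e.temperature)
                    return entry.addresses
        return []


index = CuckooIndex(num_buckets=64)
index.insert("aspirin", 3)
index.insert("aspirin", 7)
index.insert("ibuprofen", 12)
addresses = index.lookup("aspirin")
```

Because only fingerprints are stored, two different entities can occasionally collide and share an entry, which is the kind of hash collision the article says caused minimal issues in practice.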

The implications of this research are significant for applications requiring high precision and low latency, such as healthcare, finance, and customer service. By enabling AI systems to retrieve more accurate and contextually rich information faster, Bridge-RAG could improve decision-making in fields where data synthesis is critical. For instance, in medical question-answering, it could help provide more reliable diagnoses by accessing multi-level knowledge from documents. Bridge-RAG's efficiency gains also make it scalable for large datasets, potentially reducing computational costs and energy usage in AI deployments, as it minimizes the need for extensive similarity calculations across full document sets.

However, the paper notes several limitations. Bridge-RAG assumes that fixed-length abstracts of five chunks offer an optimal balance between semantic coverage and retrieval granularity, but variable-length groupings might behave differently. The hierarchical depth is constrained to three levels to control latency, and deeper trees could affect the accuracy-speed trade-off. Additionally, the system inherits the risk of hallucinations from the underlying large language model, meaning it may propagate errors if retrieved chunks contain conflicting or outdated facts. These limitations suggest areas for future research, such as exploring dynamic depth selection or improving fact-checking mechanisms to further enhance reliability.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
