AI Achieves Fast, Accurate Search Without Slowdown

A new artificial intelligence method speeds up how computers find and rank information while maintaining high accuracy, addressing a critical bottleneck in modern search systems. The approach, called E2Rank, allows a single AI model to perform both initial retrieval and sophisticated reranking, tasks that previously required separate systems. It reduces inference latency by up to a factor of five while matching or exceeding the performance of current state-of-the-art methods.

Researchers from Renmin University of China and Alibaba Group found that by treating complex ranking prompts as enhanced queries, they could eliminate the computational overhead that slows down current AI search systems. The key insight is that the detailed instructions given to large language models for ranking documents can be reinterpreted as pseudo-relevance feedback, a technique from traditional information retrieval in which the top results of a first search are used to refine the query.

The method works through a two-stage training process. First, the AI model learns to create effective text embeddings: numerical vectors that capture semantic meaning. Then, through multi-task learning that combines contrastive learning with RankNet pairwise ranking optimization, the model gains the ability to perform sophisticated listwise ranking while keeping its efficient embedding capabilities. During operation, the system builds an enhanced query representation from the original query and the top candidate documents, then scores each candidate with a simple cosine similarity against precomputed document embeddings.
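The second training stage described above can be sketched as a weighted sum of two losses. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature, the loss weight, and the use of graded relevance labels to form RankNet pairs are all hypothetical choices made for the example.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.05):
    """Contrastive (InfoNCE) loss for one query.

    sim_pos: similarity between the query and its relevant document.
    sim_negs: similarities between the query and negative documents.
    temperature is an assumed value, not taken from the article.
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]  # equals -log softmax(positive)

def ranknet(scores, labels):
    """RankNet pairwise loss averaged over all ordered pairs.

    For every pair (i, j) where document i should rank above document j
    (labels[i] > labels[j]), penalise log(1 + exp(s_j - s_i)).
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += math.log1p(math.exp(scores[j] - scores[i]))
                pairs += 1
    return total / pairs if pairs else 0.0

def multitask_loss(sim_pos, sim_negs, scores, labels, weight=1.0):
    """Contrastive loss plus a weighted ranking loss (weight is hypothetical)."""
    return info_nce(sim_pos, sim_negs) + weight * ranknet(scores, labels)
```

The contrastive term keeps the model useful as a general embedder, while the pairwise term pushes the scores of better documents above those of worse ones. In this sketch a correctly ordered pair yields a smaller RankNet loss than the same pair in the wrong order.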

Experimental results across multiple benchmarks support the approach. On the BEIR benchmark, E2Rank achieved performance competitive with existing methods while significantly reducing latency. The 0.6 billion parameter version showed an average gain of +4.06 NDCG@10 over comparable baselines, with larger models achieving even stronger results. Notably, the 8 billion parameter model achieved the highest overall score of 54.35 on BEIR, surpassing much larger models. On the challenging, reasoning-intensive BRIGHT benchmark, E2Rank-8B attained a competitive score of 33.4 without any reinforcement learning, which points to strong reasoning capabilities.
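For readers unfamiliar with the metric behind these numbers, NDCG@10 rewards placing highly relevant documents near the top of the first ten results. The sketch below shows one common formulation with linear gain. It assumes the input list holds the relevance grade of every judged document in ranked order. Evaluation toolkits may differ in details such as the gain function, and scores are often reported on a 0 to 100 scale.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k ranked results.

    relevances[i] is the relevance grade of the document at rank i + 1.
    Each grade is discounted by log2(rank + 1).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A ranking that puts the only relevant document second instead of first.
print(round(ndcg_at_k([0, 1, 0, 0]), 4))
```

A perfect ordering scores 1.0, and moving a relevant document down the list lowers the score, with the penalty shrinking at deeper ranks.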

The efficiency improvements are particularly notable. Analysis showed that E2Rank reduces inference latency by approximately a factor of five compared to RankGPT-like rerankers at the 8 billion parameter size. Even the E2Rank-8B model runs faster than the 0.6 billion parameter version of those generative rerankers. The speed advantage comes from eliminating expensive auto-regressive generation and from supporting batch inference, which further optimizes online processing.
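To see where the speed-up comes from, consider a minimal sketch of the scoring path: the enhanced query is embedded once, and every candidate is scored with a cosine similarity against a vector computed ahead of time, with no token-by-token generation. Everything named here is an assumption for illustration. In particular, `toy_embed` is a bag-of-words stand-in for the model's embedding call, and the way the query and top documents are concatenated does not reproduce the actual prompt template.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for the embedding model: word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(query, docs, doc_embs, top_k=20):
    """Rerank docs using an enhanced query built from the top candidates.

    doc_embs are precomputed once, offline; only the enhanced query
    is embedded at request time.
    """
    # First stage: order candidates by similarity to the plain query.
    q = toy_embed(query)
    first = sorted(range(len(docs)),
                   key=lambda i: cosine(q, doc_embs[i]), reverse=True)
    # Enhanced query: top-k candidate texts plus the original query
    # (an assumed layout, in the spirit of pseudo-relevance feedback).
    prompt = "\n".join([docs[i] for i in first[:top_k]] + [query])
    enhanced = toy_embed(prompt)
    # Second stage: one cosine similarity per candidate.
    return sorted(first,
                  key=lambda i: cosine(enhanced, doc_embs[i]), reverse=True)

docs = ["cats purr softly", "dogs bark loudly", "cats and dogs play"]
doc_embs = [toy_embed(d) for d in docs]  # precomputed document vectors
print(rerank("cats", docs, doc_embs, top_k=2))
```

The request-time cost is one embedding pass plus a handful of dot products, which is also why many queries can be batched together.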

For everyday users, this breakthrough means faster, more accurate search experiences across applications from web search to question-answering systems and retrieval-augmented generation. The unified approach also simplifies system architecture, reducing the complexity of maintaining separate retrieval and reranking components. The method maintains strong embedding capabilities as measured by the Massive Text Embedding Benchmark, ensuring it remains effective for various text understanding tasks beyond just search and ranking.

The research does note limitations. Performance gains plateau when more than about 20 documents are included in the listwise prompt, suggesting diminishing returns from additional contextual signals. The method also depends on the quality of the initial retrieval, though experiments show it remains robust across first-stage retrievers of varying quality. Future work could explore extending this unified approach to other AI tasks beyond information retrieval.

About the Author

Guilherme A.