AI Detects Unknown User Requests in Chatbots

TL;DR

New method uses flexible boundaries to flag unfamiliar intents in dialogue systems, boosting accuracy without manual tuning for real-world use.

In AI-driven customer service and virtual assistants, a persistent challenge has been handling user requests that fall outside the system's known capabilities. Traditional systems either fail to recognize these unfamiliar intents or require constant manual adjustment to stay effective. A new approach called EliDecide, developed by researchers from Tsinghua University and Hebei University of Science and Technology, addresses this by learning flexible decision boundaries that adapt to the shape of the known data, enabling more robust detection of unknown user intents without prior knowledge of them. This matters for real-world applications like banking chatbots, where a system must reliably classify known requests (such as checking payment status or resetting a passcode) while flagging unexpected ones, like a user asking to eat something, for exception handling.

The key finding of this research is that ellipsoid-shaped decision boundaries, rather than the spherical ones common in existing methods, significantly improve the detection of unknown intents. The researchers demonstrated that ellipsoids offer greater geometric flexibility, allowing them to better capture the directional variance in real-world data distributions. As shown in Figure 2 of the paper, spherical boundaries often exclude many known samples to avoid including unknown ones, whereas ellipsoids can include most known samples without compromising detection accuracy. This led to state-of-the-art performance on benchmark datasets, with EliDecide achieving improvements of 0.54% to 2.66% over previous methods across various settings, as detailed in Table 1 of the paper.
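The geometric argument can be illustrated with a toy example. The sketch below is not taken from the paper: the 2D points, the axis-aligned ellipse, and the helper names `in_ball` and `in_ellipsoid` are all assumptions chosen for illustration. It shows a known class stretched along one axis, where the smallest ball covering every known point also swallows an unknown point, while an ellipse with unequal axes does not.

```python
import math

def in_ball(x, center, radius):
    """True if x lies inside the ball of the given radius around center."""
    return math.dist(x, center) <= radius

def in_ellipsoid(x, center, semi_axes):
    """True if x lies inside an axis-aligned ellipsoid (illustrative simplification)."""
    return sum(((xi - ci) / a) ** 2
               for xi, ci, a in zip(x, center, semi_axes)) <= 1.0

# Toy data (hypothetical): a known class elongated along the first axis.
known = [(-3.0, 0.1), (-1.5, -0.2), (0.0, 0.0), (1.5, 0.2), (3.0, -0.1)]
unknown = (0.0, 2.0)
center = (0.0, 0.0)

# Smallest ball around the center that still covers every known point.
ball_radius = max(math.dist(p, center) for p in known)

# An ellipse that is long in the first direction and short in the second.
semi_axes = (3.5, 0.5)

ball_accepts_unknown = in_ball(unknown, center, ball_radius)
ellipse_accepts_unknown = in_ellipsoid(unknown, center, semi_axes)
ellipse_covers_known = all(in_ellipsoid(p, center, semi_axes) for p in known)
```

Shrinking the ball enough to reject the unknown point would drop the known points at the two ends, which is the trade-off the article attributes to spherical boundaries.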

The methodology behind EliDecide involves a two-stage process. First, the researchers used supervised contrastive learning to create a discriminative feature space from known samples, leveraging a pre-trained BERT model to extract sentence-level representations. This step ensures that similar intents are grouped closely together in the feature space. Second, they constructed an ellipsoid boundary for each known class, parameterized by learnable matrices that encode both the direction and the scale of the ellipsoid's axes. These boundaries were optimized with a dual loss function: an expansion loss that encourages boundaries to encompass known samples, and a contraction loss that uses synthesized pseudo-open samples to prevent boundaries from over-expanding into regions where unknown intents might appear.
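The second stage can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's formulation: the ellipsoid is taken to be the set of points x with ||A(x - c)||^2 <= 1 for a center c and matrix A, both losses are written as simple hinge penalties, and the names `quad_form`, `expansion_loss`, `contraction_loss`, and the `weight` argument are hypothetical. In practice A and c would be learned by gradient descent; here they are fixed by hand.

```python
def quad_form(x, center, A):
    """Squared norm of A @ (x - center); values <= 1 are inside the ellipsoid."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    proj = [sum(a * d for a, d in zip(row, diff)) for row in A]
    return sum(p * p for p in proj)

def expansion_loss(known, center, A):
    """Penalize known samples that fall outside the boundary (assumed hinge form)."""
    return sum(max(0.0, quad_form(x, center, A) - 1.0) for x in known) / len(known)

def contraction_loss(pseudo_open, center, A, weight=1.0):
    """Penalize pseudo-open samples that fall inside the boundary (assumed hinge form)."""
    total = sum(max(0.0, 1.0 - quad_form(x, center, A)) for x in pseudo_open)
    return weight * total / len(pseudo_open)

# Hand-picked parameters: semi-axes of length 2 and 0.5 along the coordinate axes.
center = (0.0, 0.0)
A = [[0.5, 0.0],
     [0.0, 2.0]]

known = [(1.0, 0.1), (3.0, 0.0)]         # the second point lies outside the boundary
pseudo_open = [(0.0, 1.0), (0.0, 0.25)]  # the second point lies inside the boundary

exp_loss = expansion_loss(known, center, A)
con_loss = contraction_loss(pseudo_open, center, A)
```

The two terms pull in opposite directions: the expansion term is zero only when every known sample is covered, and the contraction term is zero only when every pseudo-open sample is excluded, so minimizing their sum pushes the boundary to fit the known class tightly.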

Results from extensive experiments on intent datasets like Banking and OOS, as well as a question classification dataset called StackOverflow, show that EliDecide consistently outperforms existing methods. For instance, on the Banking dataset with 25% of classes known, EliDecide achieved an F1-score of 77.75% and an accuracy of 85.81%, compared to 77.30% and 85.72% for the next best method, KNNCL. The paper's Table 1 highlights these gains across different known-class ratios, with the method showing particular strength in low-information scenarios where many classes are unknown. Additional experiments, such as those comparing ellipsoids to ball boundaries with fixed coverage fractions, revealed that even optimally tuned spherical boundaries underperform by 0.8% to 1.9%, underscoring the advantage of ellipsoid flexibility.

The implications of this work are significant for practical AI systems, especially in domains like customer service, healthcare, and autonomous systems, where robustness to unexpected inputs is critical. By eliminating the need for manual threshold tuning, a limitation of scoring-based approaches, EliDecide offers a more deployable solution for real-world dialogue systems. The method's success on diverse datasets suggests it could generalize to other text classification tasks in open-world scenarios, such as detecting out-of-scope questions in educational tools or anomalous commands in robotics. Moreover, comparisons with large language models like Llama 3, shown in Table 3, indicate that specialized approaches with compact encoders can outperform larger models in open intent detection, highlighting the value of tailored methodologies over brute-force scaling.

Despite its strengths, the research acknowledges limitations. The method relies on synthesized pseudo-open samples for training, which may not fully capture the distribution of real unknown intents, potentially affecting performance in highly dynamic environments. Additionally, while the ellipsoid boundaries provide flexibility, they require careful hyperparameter tuning, such as the penalty strength β and the Dirichlet parameter α, though the paper notes stability across reasonable ranges, as shown in Figures 5 and 6. Future work could explore integrating real open samples during training or extending the approach to multimodal contexts. The paper also points out that the current evaluation is limited to text-based datasets, leaving open questions about applicability to other data types like images or audio.
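The article does not say how the pseudo-open samples are synthesized or where the Dirichlet parameter α enters. One plausible reading, offered purely as an assumption, is that a pseudo-open sample is a convex combination of known samples from different classes, with mixing weights drawn from a symmetric Dirichlet(α) distribution. The sketch below illustrates that idea only; `dirichlet_weights`, `mix`, and the toy anchor points are hypothetical and may differ from what the paper does.

```python
import random

def dirichlet_weights(k, alpha, rng):
    """Draw k positive weights summing to 1 from a symmetric Dirichlet(alpha).

    Uses the standard construction: normalize independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mix(samples, weights):
    """Convex combination of equal-length feature vectors."""
    dim = len(samples[0])
    return [sum(w * s[i] for w, s in zip(weights, samples)) for i in range(dim)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical 2D features, one anchor sample per known class.
anchors = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

weights = dirichlet_weights(len(anchors), alpha=1.0, rng=rng)
pseudo_open = mix(anchors, weights)
```

Under this reading, a small α concentrates the weight on one anchor, so the synthetic point stays close to a known class, while a large α pushes it toward the average of the anchors, between the classes. That would be one way α could control how hard the pseudo-open samples are.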

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
