Large language models (LLMs) are transforming how we interact with technology, but they often fall short in high-stakes fields like healthcare and finance by merely responding to queries instead of guiding conversations toward goals. A new study from Alibaba Group introduces Learn-to-Ask, a framework that teaches AI to act as proactive partners, learning directly from real-world expert dialogues without relying on simulations. This breakthrough could make AI assistants more effective in critical applications, improving efficiency and decision-making in daily services.
The researchers found that Learn-to-Ask enables LLMs to learn both what to ask and when to stop in a conversation, addressing a key limitation of current AI systems. In tests with the RealMedConv dataset of pharmacist-patient interactions, the method more than tripled the ability of a 7-billion-parameter model to ask precise, goal-oriented questions and let it correctly terminate dialogues with 92% accuracy. For a larger 32-billion-parameter model, it improved question quality by 185% and raised stopping accuracy to 88%, suggesting that the approach scales with model size.
To achieve this, the team developed a simulator-free approach that reframes proactive dialogue as a series of supervised tasks. Instead of relying on user simulators, which often fail to mimic real human behavior, Learn-to-Ask analyzes existing conversation logs to infer what experts aimed to achieve at each turn. From the observed future of each conversation it extracts two targets: a 'micro-goal' (the specific information sought next) and a 'macro-goal' (whether to continue or stop the dialogue). This hindsight-driven method grounds the learning process in actual expert strategies, avoiding the 'reality gap' in which AI trained in synthetic environments underperforms in real-world use.
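The hindsight idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the log format, the `HindsightSample` class, and the `build_samples` function are hypothetical names introduced here. As a crude stand-in for the paper's goal-extraction step, the sketch simply treats the expert's later questions as the micro-goal. The macro-goal is CONTINUE if the expert asks anything further and STOP otherwise.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical log format: (speaker, utterance), speaker is "expert" or "patient".
Turn = Tuple[str, str]


@dataclass
class HindsightSample:
    context: List[Turn]     # dialogue observed up to the decision point
    micro_goal: List[str]   # proxy: questions the expert went on to ask
    macro_goal: str         # "CONTINUE" or "STOP"


def build_samples(dialogue: List[Turn]) -> List[HindsightSample]:
    """Create one training sample per patient turn, labeled from the future."""
    samples = []
    for t, (speaker, _) in enumerate(dialogue):
        if speaker != "patient":
            continue  # the expert decides what to do after each patient turn
        # Look ahead in the log: which questions did the expert still ask?
        future_questions = [
            text for who, text in dialogue[t + 1:]
            if who == "expert" and text.strip().endswith("?")
        ]
        samples.append(HindsightSample(
            context=dialogue[: t + 1],
            micro_goal=future_questions,
            macro_goal="CONTINUE" if future_questions else "STOP",
        ))
    return samples


# Invented example conversation, for illustration only.
log = [
    ("patient", "I have a headache."),
    ("expert", "How long has it lasted?"),
    ("patient", "Two days."),
    ("expert", "Are you taking any other medication?"),
    ("patient", "No."),
    ("expert", "You could try an over-the-counter pain reliever."),
]

samples = build_samples(log)
for s in samples:
    print(len(s.context), s.macro_goal, s.micro_goal)
```

Each sample pairs what the expert knew at that moment with what the expert actually did next, so the labels come from recorded behavior and no simulated patient is needed.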
The framework employs an Automated Prompt Calibration system to refine the AI's components, such as information extraction and reward grading, with minimal human oversight. This ensures that the model learns from high-quality, verified data, reducing noise and bias. For policy optimization, the researchers used reinforcement fine-tuning techniques like Group Relative Policy Optimization, which efficiently handles the nuanced rewards derived from expert behaviors without requiring separate value models.
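To show why Group Relative Policy Optimization needs no separate value model, here is a minimal sketch of its core step: several candidate responses are sampled for the same dialogue context, each is scored, and every score is normalized against the group's own mean and standard deviation. The function name and the reward values are hypothetical. The use of the population standard deviation and the small `eps` term are assumptions of this sketch, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group (no learned value baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical grader scores for four candidate responses to the same context.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive a positive advantage and are reinforced, while those below it are discouraged. The group itself serves as the baseline that a value model would otherwise have to estimate.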
Results from the study, detailed in Table 1 of the paper, show that Learn-to-Ask outperforms baselines like supervised fine-tuning and direct preference optimization. For instance, in the 7B model, the method increased the 'good-question hit rate' from 13% to 41% and termination accuracy from 16% to 93%. These metrics indicate that the AI not only asks better questions but also knows when enough information has been gathered, enhancing dialogue efficiency and user experience.
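For readers who want to compare these figures with the "more than tripled" description, the arithmetic can be checked with a short sketch. The helper names are hypothetical. The inputs are the 7B percentages quoted above.

```python
def fold_change(before: float, after: float) -> float:
    """How many times larger the new value is than the old one."""
    return after / before


def relative_gain_pct(before: float, after: float) -> float:
    """Relative increase over the baseline, in percent."""
    return (after - before) / before * 100.0


# 7B figures quoted above: good-question hit rate and termination accuracy.
hit_fold = fold_change(13, 41)
hit_gain = relative_gain_pct(13, 41)
stop_fold = fold_change(16, 93)

print(f"hit rate: {hit_fold:.2f}x ({hit_gain:.0f}% relative gain)")
print(f"termination accuracy: {stop_fold:.2f}x")
```

A fold change just above 3 for the hit rate matches the claim that question quality more than tripled. The gain in termination accuracy is larger still in relative terms.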
The real-world impact was validated through deployment in a live online 'Medication Assistant' service, handling thousands of daily users. In a four-week A/B test, the Learn-to-Ask-trained model achieved a 93% dialogue completeness rate and an 88% good-question rate, with business metrics like conversion rates showing a 1.87-fold improvement over previous systems. This demonstrates that the framework translates academic gains into tangible benefits, offering a practical blueprint for industries seeking to upgrade passive AI into active collaborators.
However, the study notes limitations, such as the AI inheriting biases from human expert data, which may include tendencies toward brevity or omitted safety checks. The paper calls for future work to explore how these systems can evolve beyond imitation to superhuman performance, potentially by incorporating organizational protocols or enabling exploration of unasked questions. By bridging the gap between data and deployment, Learn-to-Ask sets the stage for AI that not only responds but strategically engages, making everyday interactions smarter and more reliable.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.