AI Learns From Human 'No' Instead of 'Yes'

As artificial intelligence systems approach and surpass expert human performance across many domains, training and evaluating them becomes increasingly challenging. The best human experts are typically knowledgeable only in narrow areas and cannot comprehensively judge whether an answer to a task beyond their own expertise is correct. However, researchers have found that these specialists can reliably provide a different kind of signal: they can confidently state when something is wrong, even if they cannot identify the correct answer.

Researchers from the University of Tokyo and RIKEN developed a framework called 'partitioned human supervision' that enables AI oversight without requiring ground truth labels. Their key finding is that when AI systems tackle complex, cross-disciplinary problems beyond any single expert's ability, human specialists can still provide valuable 'complementary labels', which indicate options that are definitely incorrect. The approach rests on an asymmetry: experts may be unable to positively certify that an answer is correct, yet they can readily certify that an option within their own specialty is wrong.

The methodology works through a carefully designed labeling protocol. For multiple-choice questions, each item is routed to a randomly selected specialist responsible for exactly one answer class. The specialist either confirms the answer belongs to their field (yielding an ordinary label) or confidently rejects it (yielding a complementary label). Under this design, which the researchers call the uniform wrong-index design, they derived an unbiased estimator of accuracy that uses only these 'no' responses, along with finite-sample guarantees for its reliability.
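To make the idea of estimating accuracy from 'no' responses concrete, here is a minimal sketch in Python. It is not taken from the paper; it assumes only that each complementary label is drawn uniformly from the wrong options. Under that assumption a correct prediction never matches the rejected option, while a wrong prediction matches it with probability 1/(K-1), so accuracy equals 1 - (K-1) times the match rate. The function names and the simulation setup are hypothetical and chosen for illustration.

```python
import random


def estimate_accuracy(predictions, complementary_labels, num_classes):
    """Estimate accuracy from complementary ('this option is wrong') labels.

    Assumes each complementary label is uniform over the K-1 wrong options,
    so that P(prediction == complementary label) = (1 - accuracy) / (K - 1).
    """
    if len(predictions) != len(complementary_labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("at least one labeled item is required")
    matches = sum(p == c for p, c in zip(predictions, complementary_labels))
    match_rate = matches / len(predictions)
    return 1.0 - (num_classes - 1) * match_rate


def simulate(true_accuracy, num_classes, n, seed=0):
    """Hypothetical simulation: a model with a known accuracy and
    uniformly sampled complementary labels."""
    rng = random.Random(seed)
    predictions, complementary = [], []
    for _ in range(n):
        truth = rng.randrange(num_classes)
        wrong = [k for k in range(num_classes) if k != truth]
        # The model is right with probability true_accuracy,
        # otherwise it picks a wrong option uniformly at random.
        pred = truth if rng.random() < true_accuracy else rng.choice(wrong)
        predictions.append(pred)
        # The specialist's 'no' rules out one wrong option, chosen uniformly.
        complementary.append(rng.choice(wrong))
    return predictions, complementary


if __name__ == "__main__":
    preds, comps = simulate(true_accuracy=0.78, num_classes=10, n=50000)
    print(estimate_accuracy(preds, comps, num_classes=10))
```

With many labeled items the printed estimate should land close to the simulated accuracy. On small samples the raw value can fall outside the range 0 to 1, because a single match shifts the estimate by (K-1)/n.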

Experimental results across multiple benchmarks demonstrate the practical effectiveness of this approach. On the MMLU-Pro benchmark with 10-option questions, the complementary-label estimator produced an accuracy estimate of 78.33% with minimal deviation. In real-world applications like Japanese financial report classification and medical abstract analysis, the method enabled accurate performance estimation without requiring any single expert to solve the problems alone. The researchers also showed that these weak signals can guide automated agent design, with AI systems trained using only complementary feedback outperforming manually designed baselines.

This breakthrough matters because it addresses a fundamental bottleneck in AI development: as systems become more capable, obtaining high-quality supervision becomes increasingly difficult. The partitioned supervision framework provides a practical pathway for evaluating and training AI systems on tasks where no single human can verify correctness. This could accelerate progress in fields like medicine, finance, and scientific research where problems often require deep knowledge across multiple specialized domains.

The approach does have limitations. It assumes that complementary labels are sampled uniformly from wrong options, and systematic biases in human judgment could affect performance. Additionally, the method's effectiveness depends on having sufficient complementary labels to offset their lower per-sample information compared to ordinary labels. The researchers note that small sample sizes can lead to unstable estimates, particularly when the number of answer choices is large.
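The trade-off between label count and number of answer choices can be sketched with a short calculation. This is an illustration under stated assumptions, not the paper's analysis: it assumes independent items, uniformly sampled complementary labels, and an estimator of the form 1 - (K-1) times the rate at which the model's prediction matches the rejected option. Under those assumptions the variance works out to (1-A)(K-2+A)/n for true accuracy A, compared with A(1-A)/n for ordinary labels. The function names are hypothetical.

```python
import math


def complementary_std_error(accuracy, num_classes, n):
    """Standard error of the estimate 1 - (K-1) * match_rate.

    The match rate is a binomial proportion with p = (1 - A) / (K - 1),
    so the variance is (K-1)^2 * p * (1-p) / n = (1-A) * (K-2+A) / n.
    """
    variance = (1.0 - accuracy) * (num_classes - 2 + accuracy) / n
    return math.sqrt(variance)


def ordinary_std_error(accuracy, n):
    """Standard error of plain accuracy measured with ordinary labels."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def label_cost_ratio(accuracy, num_classes):
    """Number of complementary labels needed to match the variance of one
    ordinary label. Requires accuracy > 0."""
    return (num_classes - 2 + accuracy) / accuracy


if __name__ == "__main__":
    for k in (2, 4, 10):
        print(k, label_cost_ratio(0.8, k), complementary_std_error(0.8, k, 500))
```

With two options a 'no' is as informative as a 'yes', so the ratio is 1. The ratio grows roughly linearly with the number of options, which is consistent with the instability the researchers report for small samples and many answer choices.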

Despite these limitations, the partitioned supervision framework represents a significant advance in scalable AI oversight. By leveraging human experts' ability to reliably identify what's wrong rather than requiring them to know what's right, it opens new possibilities for training and evaluating AI systems on increasingly complex problems that exceed individual human capabilities.

About the Author

Guilherme A.