Short Review
Overview of Interactive Search Agent Evaluation
This research introduces InteractComp, a benchmark designed to evaluate whether language agents can resolve ambiguous user queries by actively interacting with the user during web search. Current search agents often operate under the unrealistic assumption that user input is complete and unambiguous, lacking the interactive mechanisms real-world scenarios require. To address this gap, InteractComp comprises 210 expert-curated questions across nine domains, using a target-distractor methodology to create genuine ambiguity that can be resolved only through interaction. The findings are striking: across 17 evaluated models, the best achieved only 13.73% accuracy, compared with 71.50% when complete context was provided. This underperformance is attributed primarily to systematic overconfidence rather than to deficient reasoning, as shown by the dramatic gains observed when interaction is forced. A longitudinal analysis further reveals that interaction capabilities stagnated over 15 months despite substantial improvements in general search performance, exposing a blind spot in current AI development.
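The target-distractor idea described above can be made concrete with a small sketch. The field names, example question, and toy agent below are our own illustrations, not InteractComp's actual schema or evaluation code; the point is only that the query alone cannot separate target from distractor, so only an agent that asks can answer correctly.

```python
# Hypothetical sketch of a target-distractor question. All names and
# content here are illustrative assumptions, not the benchmark's format.
from dataclasses import dataclass

@dataclass
class AmbiguousQuestion:
    query: str          # underspecified user query shown to the agent
    target: str         # the intended answer
    distractor: str     # a plausible alternative matching the same query
    clarification: str  # a question the agent could ask
    user_reply: str     # a reply that separates target from distractor

q = AmbiguousQuestion(
    query="Find the paper by Chen on graph transformers.",
    target="the 2023 paper by Chen",
    distractor="the 2022 paper by Chen",
    clarification="Which year was the paper published?",
    user_reply="It was published in 2023.",
)

def answer(q: AmbiguousQuestion, interacted: bool) -> str:
    """Toy agent: without interaction it must guess between two equally
    plausible candidates; with the user's reply it can commit correctly."""
    if not interacted:
        return q.distractor  # overconfident guess, no clarification asked
    return q.target if "2023" in q.user_reply else q.distractor

assert answer(q, interacted=False) != q.target
assert answer(q, interacted=True) == q.target
```

Because the target is a single verifiable string, correctness is easy to check automatically, which is what makes the "easy to verify, interact to disambiguate" construction attractive.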
Critical Evaluation of InteractComp: Benchmarking Interactive AI
Strengths: A Novel Approach to Query Disambiguation
The introduction of InteractComp is a significant strength, filling a gap in search-agent evaluation by focusing on an agent's ability to handle ambiguous queries interactively. Its "easy to verify, interact to disambiguate" principle, combined with the target-distractor methodology, ensures that ambiguity is genuine and resolvable only through interaction, yielding clean reward signals well suited to Reinforcement Learning with Verifiable Rewards (RLVR). The benchmark also uncovers latent interaction capabilities in models that otherwise fail, demonstrating that the problem often lies in engagement strategy rather than an outright lack of reasoning. This design offers a clear pathway for both evaluating and training more sophisticated, human-like interactive agents.
Weaknesses: Unveiling Agent Overconfidence and Stagnation
While the benchmark itself is robust, the study exposes significant weaknesses in current language agents. The best model's 13.73% accuracy underscores how poorly agents handle uncertainty. The core issue is systematic overconfidence: models fail to recognize that a query is ambiguous and therefore never initiate disambiguation, even though their underlying reasoning is largely adequate. This overconfidence is the main performance bottleneck. Moreover, the longitudinal analysis shows interaction capabilities stagnating over time even as general search performance improves, indicating a development blind spot that needs urgent attention.
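The forced-interaction condition that reveals these latent capabilities can be sketched as a harness that rejects any answer given before the agent has asked at least one clarifying question. The sketch below is our own minimal illustration of that idea; the function names, message format, and toy agent are assumptions, not the paper's implementation.

```python
# Hypothetical forced-interaction harness: the agent may not answer
# until it has asked a clarifying question. Illustrative only.
from typing import Callable, List

def forced_interaction(agent_step: Callable[[str, List[str]], str],
                       user_reply: Callable[[str], str],
                       query: str,
                       max_turns: int = 4) -> str:
    history: List[str] = []
    asked = False
    for _ in range(max_turns):
        action = agent_step(query, history)
        if action.startswith("ASK:"):
            asked = True
            history.append(action)
            history.append("USER: " + user_reply(action))
        elif asked:
            return action  # answers are accepted only after interaction
        else:
            # reject the premature answer and push back
            history.append("SYSTEM: ask a clarifying question first.")
    return agent_step(query, history)

def toy_agent(query: str, history: List[str]) -> str:
    """Overconfident by default; answers correctly once it has a reply."""
    if any(h.startswith("USER:") for h in history):
        return "correct answer"
    if any(h.startswith("SYSTEM:") for h in history):
        return "ASK: which one did you mean?"
    return "wrong guess"

result = forced_interaction(
    toy_agent,
    user_reply=lambda ask: "I meant the 2023 paper.",
    query="Find the paper by Chen.",
)
assert result == "correct answer"
```

The toy agent answers wrongly when left alone but correctly once the harness forces an exchange, mirroring the paper's observation that the capability exists and only the engagement strategy is missing.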
Implications: Charting the Future of Interactive AI in Search
The findings from InteractComp carry profound implications for the future of interactive AI and web search. By clearly demonstrating that agents possess latent interaction capabilities that current strategies fail to engage, the benchmark provides a compelling call to action for researchers and developers. It emphasizes the necessity of designing new mechanisms that actively encourage agents to recognize ambiguity and proactively seek clarification. InteractComp is not merely an evaluation tool; it is a valuable resource for training agents to overcome overconfidence and develop more effective interactive behaviors. This research points towards a future where search agents are not just information retrievers but intelligent, conversational partners capable of truly understanding and fulfilling complex, evolving user needs, thereby enhancing human-computer interaction significantly.
Conclusion: Advancing Human-Like Interaction in Search Agents
In conclusion, InteractComp stands as a groundbreaking contribution to the field of search agent development, offering an indispensable tool for assessing and improving interactive capabilities. The study's revelation of widespread agent overconfidence and the stagnation of interaction skills highlights a critical area for future research and development. By providing a clear framework for evaluating and training, InteractComp is poised to drive innovation towards more adaptive, context-aware, and truly interactive AI systems. This work is essential for fostering the next generation of search agents that can engage in dynamic, human-like dialogue to navigate the complexities of real-world information retrieval, ultimately enhancing the utility and intelligence of interactive AI.