AgentFold: Long-Horizon Web Agents with Proactive Context Management

29 Oct 2025     3 min read


AI-generated image, based on the article abstract

Quick Insight

AgentFold: How Smart Web Helpers Remember What Matters

Ever wonder why some AI assistants lose track of a long conversation? AgentFold changes that by giving web agents a clever way to “fold” their memory, much like folding a paper notebook to keep the most important notes handy. Instead of stuffing every click and answer into an ever-growing list, this new approach actively reshapes its memory, keeping key details crisp while safely tucking away older steps. Imagine a traveler packing a suitcase: the most useful items stay on top, while the rest are neatly rolled and stored away. The result? The AI can handle complex, multi-step web searches without getting confused, beating even much larger models and big-brand assistants. This means future chatbots could help you plan trips, research topics, or shop online with far fewer mix-ups. The researchers found that a modestly sized model using AgentFold outperformed heavyweight rivals, showing that smarter memory can beat sheer size. It’s a glimpse of a future where digital helpers stay focused, reliable, and truly useful every time you ask.

The next time you chat with an AI, imagine it folding its thoughts just for you.


Short Review

Overview of AgentFold's Proactive Context Management for LLM Agents

This groundbreaking article introduces AgentFold, a novel paradigm for LLM-based web agents designed to overcome the inherent challenges of context management in long-horizon tasks. Traditional agents often suffer from context saturation or irreversible information loss, hindering their effectiveness. Inspired by human cognitive processes of retrospective consolidation, AgentFold proposes a dynamic cognitive workspace that actively sculpts its historical trajectory rather than passively logging it. The core methodology involves "folding" operations, which include granular condensations for fine-grained details and deep consolidations for abstracting multi-step sub-tasks. Through Supervised Fine-Tuning (SFT), AgentFold learns to internalize this proactive context curation skill, leading to remarkable performance improvements.
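The two folding operations described above can be pictured as edits on a list of past steps: granular condensation rewrites a single step as a shorter note, while deep consolidation collapses a whole sub-task into one abstract block. The sketch below is a minimal illustration of that idea; the class, method names, and rendering are hypothetical, not the paper's actual implementation, which learns when and what to fold via SFT.

```python
# Minimal sketch of a "folding" context workspace. All names and the
# data layout are illustrative assumptions, not AgentFold's real code.
from dataclasses import dataclass, field

@dataclass
class Block:
    """One workspace entry: a raw step or a folded summary."""
    text: str
    steps: int = 1  # how many original steps this block covers

@dataclass
class FoldingWorkspace:
    blocks: list = field(default_factory=list)

    def record(self, step_text: str) -> None:
        """Append a new raw interaction step (a ReAct agent stops here)."""
        self.blocks.append(Block(step_text))

    def condense(self, i: int, summary: str) -> None:
        """Granular condensation: rewrite one block as a short note."""
        self.blocks[i] = Block(summary, self.blocks[i].steps)

    def consolidate(self, lo: int, hi: int, summary: str) -> None:
        """Deep consolidation: fold blocks lo..hi into one abstract block."""
        covered = sum(b.steps for b in self.blocks[lo:hi + 1])
        self.blocks[lo:hi + 1] = [Block(summary, covered)]

    def context(self) -> str:
        """Render the workspace as the prompt context for the next step."""
        return "\n".join(b.text for b in self.blocks)

ws = FoldingWorkspace()
for t in ["open search page", "query 'cheap flights'",
          "click result 3", "extract price table"]:
    ws.record(t)
# Fold the completed "find the page" sub-task into one summary block.
ws.consolidate(0, 2, "[folded] located the price page in 3 steps")
print(len(ws.blocks))  # → 2
```

The key design point is that a folded block still records how many steps it covers, so detail is abstracted rather than silently dropped, the trade-off the article highlights between context inflation and irreversible information loss.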

Critical Evaluation of AgentFold's Performance and Methodology

Strengths: Novelty and Efficiency in LLM Agents

AgentFold presents a significant advancement in LLM agent design by introducing a truly proactive context management system. Its inspiration from human cognition provides a robust conceptual foundation, moving beyond passive context accumulation. The multi-scale folding operations, encompassing both granular and deep consolidations, effectively balance detail retention against context inflation, a critical trade-off in complex tasks. This approach yields exceptional computational efficiency, with sub-linear growth in context size and a 92% reduction in tokens compared to ReAct-based agents. The empirical results are equally compelling: AgentFold-30B-A3B achieves state-of-the-art performance on prominent BrowseComp benchmarks, surpassing larger open-source models and even leading proprietary agents such as OpenAI's o4-mini, using relatively simple SFT without extensive pre-training or reinforcement learning.
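The sub-linear context growth claimed above is easy to see with toy arithmetic: if completed sub-tasks shrink to short summaries, only the unfolded tail grows with the raw step size. The numbers below (tokens per step, summary size, fold window) are made up purely for illustration, not taken from the paper.

```python
# Toy illustration (hypothetical numbers) of why periodic folding keeps
# context growth sub-linear, versus a ReAct-style agent that appends
# every step verbatim.
STEP_TOKENS = 200      # assumed tokens per raw step
SUMMARY_TOKENS = 30    # assumed tokens per folded summary
FOLD_EVERY = 5         # fold each completed 5-step window

def react_context(n_steps: int) -> int:
    """Linear: every step stays in context verbatim."""
    return n_steps * STEP_TOKENS

def folded_context(n_steps: int) -> int:
    """Folded: each completed window collapses to one summary."""
    windows, remainder = divmod(n_steps, FOLD_EVERY)
    return windows * SUMMARY_TOKENS + remainder * STEP_TOKENS

for n in (10, 50, 100):
    print(n, react_context(n), folded_context(n))
```

At 100 steps the appended context holds 20,000 tokens while the folded one holds 600; the exact ratio depends entirely on the assumed constants, but the shape of the gap is the point.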

Weaknesses: Potential Limitations and Future Directions

While AgentFold's performance is impressive, the reliance solely on Supervised Fine-Tuning (SFT), without continual pre-training or reinforcement learning (RL), might suggest a ceiling to its ultimate performance or generalizability across an even broader spectrum of unseen, highly diverse long-horizon tasks. The complexity of the "Fold-Generator" and the training data curation process, though effective, could present challenges for replication or adaptation to new domains without significant effort. Additionally, the article primarily focuses on web browsing tasks; its efficacy and efficiency in other complex domains, such as scientific discovery or creative writing, would require further validation. Future work exploring the integration of RL could potentially unlock even greater adaptability and robustness.

Implications: Advancing Scalable AI Agents

The implications of AgentFold are substantial for the future of LLM-based web agents and AI systems requiring sustained interaction. By effectively addressing the context saturation problem, AgentFold paves the way for more scalable, robust, and cost-effective AI assistants capable of tackling increasingly complex, long-horizon tasks. Its ability to maintain a significantly smaller context while achieving superior performance translates directly into reduced computational costs and memory footprints, making advanced AI agents more accessible and deployable. This paradigm shift towards active, human-inspired context curation could inspire new research directions in cognitive AI and agent architecture, fostering the development of truly intelligent and autonomous systems.

Conclusion: AgentFold's Impact on Web Agent Development

In conclusion, AgentFold represents a pivotal advancement in the field of LLM-based web agents, offering an elegant and highly effective solution to the persistent challenge of context management in long-horizon tasks. Its novel approach, inspired by human cognition and validated by state-of-the-art performance on key benchmarks, underscores the power of proactive context folding. The demonstrated computational efficiency and superior task completion rates position AgentFold as a leading paradigm, setting a new standard for designing intelligent agents that can navigate complex, multi-step interactions with unprecedented effectiveness and scalability. This work significantly contributes to the development of more capable and practical AI systems.

Keywords

  • LLM web agents for long-horizon information seeking
  • context saturation in ReAct agents
  • dynamic cognitive workspace for LLM agents
  • proactive context folding operation
  • granular condensation of agent history
  • deep consolidation of multi-step sub-tasks
  • supervised fine-tuning without continual pre-training
  • BrowseComp benchmark performance
  • BrowseComp-ZH multilingual evaluation
  • AgentFold vs DeepSeek-V3.1-671B comparison
  • retrospective consolidation in AI agents
  • LLM agent context management trade-offs
  • RL-free fine-tuning for web agents
  • open-source large-scale agent baselines
  • human-inspired memory consolidation for LLMs

Read the full review on Paperium.net: AgentFold: Long-Horizon Web Agents with Proactive Context Management

🤖 This analysis and review was primarily generated and structured by an AI. The content is provided for informational and quick-review purposes.
