MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

Taicheng Guo, Hai Wang, ChaoChun Liu, Mohsen Golalikhani, Xin Chen, Xiangliang Zhang, Chandan K. Reddy

16 Oct 2025

Quick Insight

AI Agent Learns to Talk to Databases Over Long Conversations

Ever wondered how a chatbot could actually fetch the right data from a huge database after a back‑and‑forth chat? Researchers have built a new system called MTSQL‑R1 that does just that. Instead of guessing the answer in one go, the AI works like a diligent assistant: it proposes a query, runs it against the database, checks whether the result makes sense, and then tweaks the query until everything lines up. Think of it as a chef tasting a sauce, adjusting the seasoning, and tasting again until the flavor is right. This “propose‑execute‑verify‑refine” loop helps the AI handle long, multi‑turn conversations without losing track of earlier questions or returning nonsensical results. It means future voice assistants could help you pull exact sales numbers, schedule reports, or answer complex questions just by chatting naturally. The work brings us closer to truly conversational data tools that understand context and correct themselves on the fly. Imagine asking your phone about the latest weather trends over several questions and getting precise, reliable answers each time. Smart dialogue systems are learning to listen and improve, one step at a time.

Short Review

Advancing Multi-Turn Text-to-SQL with Agentic Reinforcement Learning

This article introduces MTSQL-R1, an innovative agentic training framework designed to tackle the complexities of multi-turn Text-to-SQL tasks. It addresses the limitations of conventional short-horizon methods that often produce non-executable or incoherent SQL queries. By framing the task as a Markov Decision Process (MDP), MTSQL-R1 employs an iterative propose-execute-verify-refine cycle. This approach leverages both database execution feedback and a persistent dialogue memory to ensure query coherence and accuracy. The framework integrates Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) within a long-horizon training pipeline. Experimental results consistently demonstrate that MTSQL-R1 significantly outperforms strong baselines, marking a substantial improvement in conversational semantic parsing.
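The propose-execute-verify-refine cycle described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `answer_turn`, `verify`, and `scripted_policy` are hypothetical, the verifier is a toy rule, and a hand-scripted function stands in for the trained model. It only shows how execution feedback and a dialogue memory can drive the loop, using the standard-library `sqlite3` module as the database environment.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Dialogue memory: (question, accepted SQL) pairs from earlier turns.
Memory = List[Tuple[str, str]]
# Hypothetical policy signature: (question, memory, feedback) -> SQL string.
Proposer = Callable[[str, Memory, Optional[str]], str]


def execute(conn: sqlite3.Connection, sql: str):
    """Run the SQL and return (rows, error message or None)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def verify(rows, error) -> Optional[str]:
    """Toy verifier (an assumption): return feedback text, or None to accept."""
    if error is not None:
        return f"execution error: {error}"
    if not rows:
        return "query returned no rows"
    return None


def answer_turn(conn, question: str, memory: Memory,
                propose: Proposer, max_steps: int = 4):
    """Propose, execute, verify, and refine until accepted or out of steps."""
    feedback, sql, rows = None, "", None
    for _ in range(max_steps):
        sql = propose(question, memory, feedback)   # propose (or refine)
        rows, error = execute(conn, sql)            # execute
        feedback = verify(rows, error)              # verify
        if feedback is None:
            break
    memory.append((question, sql))                  # persist for later turns
    return sql, rows


def scripted_policy(question, memory, feedback):
    """Stand-in for the model: a wrong column first, then a corrected query."""
    if feedback is None:
        return "SELECT total FROM sales"            # column does not exist
    return "SELECT SUM(amount) FROM sales"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("west", 5.0)])

memory: Memory = []
sql, rows = answer_turn(conn, "What are total sales?", memory, scripted_policy)
print(sql, rows)
```

In this toy run the first proposal fails at execution time, the error text is passed back as feedback, and the second proposal is accepted and stored in memory for use in later turns.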

Critical Evaluation of MTSQL-R1

Strengths

MTSQL-R1's primary strength lies in its formulation of multi-turn Text-to-SQL as an MDP, which lets the agent adapt its query over several steps instead of committing to a single guess. The iterative propose-execute-verify-refine cycle, incorporating both environment-driven verification and memory-guided refinement, is a significant methodological advancement. This design is aimed at improving the logical correctness and executability of generated SQL queries. The multi-level reward system in the RL phase, combining outcome-based and process-level rewards, further enhances the agent's learning capabilities. According to the review, this approach leads to superior performance and generalization across complex, multi-turn tasks.
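To make the idea of combining outcome-based and process-level rewards concrete, here is a small Python sketch. The review does not give the paper's actual reward definitions, so everything here is an assumption for illustration: the `Step` record, the per-step scoring, and the weights `w_outcome` and `w_process` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    """One propose-execute-verify step of a trajectory (hypothetical record)."""
    executed_ok: bool   # the proposed SQL ran without a database error
    verified_ok: bool   # the verifier accepted the execution result


def process_reward(steps: List[Step]) -> float:
    """Average per-step score in [0, 1]; the equal split is an assumption."""
    if not steps:
        return 0.0
    per_step = [0.5 * s.executed_ok + 0.5 * s.verified_ok for s in steps]
    return sum(per_step) / len(steps)


def outcome_reward(pred_rows: Sequence[Tuple], gold_rows: Sequence[Tuple]) -> float:
    """1.0 if the final result matches the reference rows (order-insensitive)."""
    return 1.0 if sorted(pred_rows) == sorted(gold_rows) else 0.0


def total_reward(steps: List[Step], pred_rows, gold_rows,
                 w_outcome: float = 0.8, w_process: float = 0.2) -> float:
    """Weighted sum of outcome and process rewards; weights are illustrative."""
    return (w_outcome * outcome_reward(pred_rows, gold_rows)
            + w_process * process_reward(steps))


# A trajectory that fails once, then succeeds with the correct final answer.
trajectory = [Step(executed_ok=False, verified_ok=False),
              Step(executed_ok=True, verified_ok=True)]
reward = total_reward(trajectory, pred_rows=[(15.0,)], gold_rows=[(15.0,)])
print(round(reward, 3))
```

The design point is that a trajectory reaching the right answer still earns less when its intermediate steps fail, so the agent gets a learning signal about the process and not only about the final query.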

Weaknesses

While MTSQL-R1 achieves impressive results, the analysis highlights some persistent challenges. The framework still encounters issues such as Aggregation Drift and struggles with certain extra-hard cases. These limitations suggest that while the iterative refinement significantly improves performance, there remains room for enhancing the agent's understanding of complex aggregation logic and highly nuanced conversational turns. Further research could focus on refining these specific areas to achieve even greater robustness.

Implications

The development of MTSQL-R1 has profound implications for the field of conversational semantic parsing and database interaction. By demonstrating the effectiveness of an agentic, long-horizon approach, it paves the way for more intelligent and reliable Text-to-SQL systems. This framework could significantly enhance user experience in natural language interfaces for databases, reducing errors and improving the efficiency of data retrieval. It also provides a strong foundation for future research into more sophisticated reasoning agents capable of handling increasingly complex dialogue contexts.

Conclusion

MTSQL-R1 represents a significant step forward in multi-turn Text-to-SQL, offering an agentic framework that addresses critical limitations of prior short-horizon methods. Its use of an MDP formulation, iterative verification, and memory-guided refinement leads to superior performance in generating executable SQL queries with improved dialogue coherence. Despite remaining challenges such as Aggregation Drift and extra-hard cases, the article's findings underscore the value of environment-driven feedback and memory in developing advanced conversational AI. This work provides a compelling blueprint for future research in long-horizon reasoning and semantic parsing.