The term 'data agent' is increasingly used to describe AI systems that handle data tasks, but the label masks a wide range of capabilities, from basic assistants to fully autonomous systems. Researchers from institutions including Tsinghua University and Hong Kong University of Science and Technology have developed a six-level taxonomy (Level 0 through Level 5) to clarify what these agents can actually do, drawing inspiration from the SAE J3016 standard for driving automation. The framework helps users gauge how much autonomy a tool really has and set realistic expectations for its performance in data management and analysis.
The taxonomy runs from Level 0 (fully manual) to Level 5 (fully autonomous with innovation), and the key finding is that most existing systems sit at Levels 1 and 2. At these levels, agents assist with tasks like generating SQL queries or cleaning data but require significant human oversight. Level 1 agents provide prompt-based help, such as suggesting how to convert temperature units in a CSV file, but cannot interact with data environments on their own. Level 2 agents, like those used in data cleaning, can execute code and adapt based on feedback but still follow human-designed workflows. The transition to Level 3, where agents orchestrate entire data tasks independently under supervision, is a major research focus but remains largely unrealized, with only emerging 'proto-L3' systems showing early progress.
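To make the ladder concrete, here is a minimal sketch of the levels as described in this article, written as a Python enum. The class name, the helper functions, and the one-line descriptions are illustrative assumptions rather than definitions from the paper, and Level 4 is left as a placeholder because the article does not describe it.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Levels L0 to L5, paraphrased from this article (not the paper's wording)."""
    L0 = 0  # fully manual: humans do all the data work
    L1 = 1  # prompt-based help; cannot interact with the data environment
    L2 = 2  # executes code and adapts to feedback inside human-designed workflows
    L3 = 3  # orchestrates entire data tasks independently, under supervision
    L4 = 4  # placeholder: not described in this article
    L5 = 5  # fully autonomous, including innovation


def can_act_on_environment(level: AutonomyLevel) -> bool:
    """Hypothetical helper: from L2 upward the agent can execute code itself."""
    return level >= AutonomyLevel.L2


def designs_own_workflow(level: AutonomyLevel) -> bool:
    """Hypothetical helper: from L3 upward the agent plans the task itself."""
    return level >= AutonomyLevel.L3


for level in AutonomyLevel:
    print(level.name, can_act_on_environment(level), designs_own_workflow(level))
```

Read this way, the gap between Level 2 and Level 3 is the point where `designs_own_workflow` flips from false to true, which is the transition the survey describes as largely unrealized.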
The survey categorized over 100 data agent systems by autonomy level, using criteria such as environmental perception, tool invocation, and task dominance. The researchers analyzed each system's ability to handle data-related tasks (such as configuration tuning, data cleaning, and report generation) without human intervention. They referenced specific examples, like AutoPrep for data preparation and Alpha-SQL for SQL generation, to illustrate how agents at different levels perform in real-world scenarios. This approach allowed a systematic review of both advancements and gaps in the field.
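As a rough illustration of how criteria like these could map to a level, the sketch below scores a system on three yes/no questions. The rubric, the field names, and the example profiles are all assumptions made for this article; the paper's actual classification procedure is not reproduced here, and no real system is being rated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical description of a system, one flag per criterion."""
    name: str
    perceives_environment: bool  # can inspect the data environment itself
    invokes_tools: bool          # can execute code or call tools
    dominates_task: bool         # plans the whole task instead of following a human pipeline


def estimate_level(profile: AgentProfile) -> int:
    """Assumed rubric: returns 1, 2, or 3; higher levels are out of scope."""
    if profile.perceives_environment and profile.invokes_tools:
        # An agent that acts on the environment is at least Level 2;
        # it reaches Level 3 only if it also drives the task end to end.
        return 3 if profile.dominates_task else 2
    # Without perception and tool use, the system can only advise.
    return 1


# Made-up profiles for illustration only.
advisor = AgentProfile("prompt-only-helper", False, False, False)
cleaner = AgentProfile("feedback-driven-cleaner", True, True, False)

print(advisor.name, estimate_level(advisor))
print(cleaner.name, estimate_level(cleaner))
```

A three-flag rubric is far coarser than a survey of more than 100 systems would need, but it shows why task dominance, rather than tool use alone, separates Level 2 from Level 3.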
The analysis shows that data agents excel at narrow tasks but struggle with broader autonomy. For instance, Level 2 agents can improve data cleaning through iterative loops, as seen in systems that detect and fix errors based on execution feedback, but they cannot devise their own workflows. Most agents remain confined to predefined procedures, which limits their adaptability. Figures from the paper, such as the overview in Figure 1, show how agents bridge user applications and data infrastructure, yet their effectiveness drops when tasks require cross-domain reasoning or long-term planning. The ongoing shift from Level 2 to Level 3 is identified as a critical hurdle, with current systems lacking the versatility to manage entire data lifecycles without human guidance.
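The Level 2 pattern described above, acting on execution feedback inside a fixed workflow, can be sketched in a few lines of Python. Everything here is hypothetical: the `age` column, the validation rules, and the table of fixes stand in for a human-designed pipeline, and no specific system from the survey is being reproduced.

```python
from typing import Callable, Optional

Row = dict[str, str]


def validate(rows: list[Row]) -> Optional[str]:
    """Execution feedback: name the first problem found, or None if clean."""
    for row in rows:
        if row["age"] != row["age"].strip():
            return "whitespace"
        if not row["age"].isdigit():
            return "non_numeric"
    return None


# The set of possible repairs is written by a human in advance.
FIXES: dict[str, Callable[[list[Row]], list[Row]]] = {
    "whitespace": lambda rows: [{**r, "age": r["age"].strip()} for r in rows],
    "non_numeric": lambda rows: [r for r in rows if r["age"].isdigit()],
}


def clean_with_feedback(rows: list[Row], max_rounds: int = 5) -> list[Row]:
    """Loop: check, apply the matching fix, check again."""
    for _ in range(max_rounds):
        error = validate(rows)
        if error is None:
            return rows
        rows = FIXES[error](rows)
    raise RuntimeError("still failing after max_rounds; hand back to a human")


raw = [{"age": " 42"}, {"age": "n/a"}, {"age": "7"}]
print(clean_with_feedback(raw))
```

The loop adapts to whatever `validate` reports, but it can only pick from repairs already listed in `FIXES`. An error type outside that table raises a `KeyError`, which mirrors the limitation the survey points to: the agent iterates, but a human still defines the workflow.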
The taxonomy matters because it helps businesses and researchers avoid over-reliance on underpowered AI tools. For general readers, the takeaway is that data agents are not yet the 'set-and-forget' solutions often advertised; they still require careful supervision to prevent errors in sensitive areas like financial analysis or healthcare data. By clarifying autonomy levels, the framework supports better decisions about adopting AI for data tasks and reduces the risk of mismatched expectations and unclear accountability in industries from tech to finance.
Limitations noted in the paper include the reliance of current agents on human-designed pipelines and their inability to self-evolve in dynamic environments. For example, even advanced systems struggle with unforeseen data changes or complex, multi-step problems unless they are updated manually. Full autonomy, where agents proactively discover and solve data issues, therefore remains a future vision, with significant research still needed to close gaps in reasoning and adaptability.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.