State of the Field
Current research on AI agents increasingly focuses on efficiency and adaptability in complex tasks, motivated by commercial concerns about scalability and reliability. Recent work develops frameworks that optimize resource allocation, such as confidence-aware routing and adaptive model selection, which aim to reduce computational cost while preserving or improving performance. Regression testing for non-deterministic workflows (AgentAssay) and controllable self-evolving systems (EvoFSM) target more robust deployment in high-stakes environments. Codifying human domain knowledge into agents is meant to let non-experts approach expert-level outcomes, easing bottlenecks in decision-making. Omni-modal agents that combine vision, audio, and language input are also gaining traction (OmniGAIA). Together, these directions point toward more efficient, reliable, and versatile agents for real-world applications.
Papers
1–10 of 15
Orchestrating Intelligence: Confidence-Aware Routing for Efficient Multi-Agent Collaboration across Multi-Scale Models
While multi-agent systems (MAS) have demonstrated superior performance over single-agent approaches in complex reasoning tasks, they often suffer from significant computational inefficiencies. Existin...
How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework
Critical domain knowledge typically resides with few experts, creating organizational bottlenecks in scalability and decision-making. Non-experts struggle to create effective visualizations, leading t...
AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology exists for verifying that an agent has not regressed after changes to its prompts, tools, models, or orchest...
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores s...
PieArena: Frontier Language Agents Achieve MBA-Level Negotiation Performance and Reveal Novel Behavioral Differences
We present an in-depth evaluation of LLMs' ability to negotiate, a central business task that requires strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena...
Agentic Confidence Calibration
AI agents are rapidly advancing from passive language models to autonomous systems executing complex, multi-step tasks. Yet their overconfidence in failure remains a fundamental barrier to deployment ...
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existi...
From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences
Generative AI is reshaping knowledge work, yet existing research focuses predominantly on software engineering and the natural sciences, with limited methodological exploration for the humanities and ...
OmniGAIA: Towards Native Omni-Modal AI Agents
Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal...
Toward Efficient Agents: Memory, Tool learning, and Planning
Recent years have witnessed increasing interest in extending large language models into agentic systems. While the effectiveness of agents has continued to improve, efficiency, which is crucial for re...