Recent advancements in agent-based systems focus on enhancing reliability, efficiency, and adaptability for complex tasks across domains. New frameworks such as Avenir-Web improve the execution of long-horizon tasks on dynamic web interfaces by integrating advanced grounding techniques and adaptive memory. Learning to Share optimizes parallel agentic systems with selective memory mechanisms that cut computational overhead while preserving performance. Boundary-Aware Policy Optimization tackles reliability in reinforcement learning by promoting accurate self-assessment, encouraging agents to acknowledge their limitations. Modular frameworks such as AgentForge lower the barrier to building autonomous agents, enabling rapid prototyping and deployment. Together, these developments address practical challenges in automation, data mining, and user interaction, paving the way for more robust and efficient agent systems in real-world applications.
Top papers
- TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration (8.0)
- ProAct: Agentic Lookahead in Interactive Environments (8.0)
- CUA-Skill: Develop Skills for Computer Using Agent (8.0)
- Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts (8.0)
- Learning to Share: Selective Memory for Efficient Parallel Agentic Systems (8.0)
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search (8.0)
- Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity (8.0)
- Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution (8.0)
- A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge (8.0)
- MAXS: Meta-Adaptive Exploration with LLM Agents (8.0)
- SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks (8.0)
- M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining (8.0)
- Learning with Challenges: Adaptive Difficulty-Aware Data Generation for Mobile GUI Agent Training (7.0)
- Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search (7.0)
- ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents (7.0)
- Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments (7.0)
- RF-Agent: Automated Reward Function Design via Language Agent Tree Search (7.0)
- PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents (7.0)
- Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents (7.0)
- From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents (7.0)
- Think like a Scientist: Physics-guided LLM Agent for Equation Discovery (7.0)
- Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization (7.0)
- GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models (7.0)
- BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents (7.0)
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning (7.0)
- Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve) (7.0)
- MINT: Minimal Information Neuro-Symbolic Tree for Objective-Driven Knowledge-Gap Reasoning and Active Elicitation (7.0)
- ST-EVO: Towards Generative Spatio-Temporal Evolution of Multi-Agent Communication Topologies (7.0)
- Discovering High Level Patterns from Simulation Traces (7.0)
- Adaptive Memory Admission Control for LLM Agents (7.0)
- VeRO: An Evaluation Harness for Agents to Optimize Agents (7.0)
- SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly (7.0)
- LLM-in-Sandbox Elicits General Agentic Intelligence (7.0)
- KARL: Knowledge Agents via Reinforcement Learning (7.0)
- Best-of-Q: Improving VLM agents with Q-function Action Ranking at Inference (7.0)
- Insight Agents: An LLM-Based Multi-Agent System for Data Insights (7.0)
- AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration (7.0)
- TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems (7.0)
- MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems (7.0)
- Emerging from Ground: Addressing Intent Deviation in Tool-Using Agents via Deriving Real Calls into Virtual Trajectories (7.0)
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution (7.0)
- ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants (7.0)
- Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering (7.0)
- FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight (7.0)
- NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents (7.0)
- AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement (7.0)
- WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents (7.0)
- GLOVE: Global Verifier for LLM Memory-Environment Realignment (7.0)
- Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue (7.0)
- Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling (7.0)