16 papers - avg viability 6.3
An LLM-powered workflow optimization system for multidisciplinary software development in the automotive industry that drastically reduces development time and improves communication efficiency.
A framework for LLMs to generate scientifically grounded research ideas by learning motivation-driven reasoning processes.
This research demonstrates that LLM agents can autonomously de-anonymize individuals by linking scattered data points that are individually non-identifying. This poses a significant privacy risk that calls for new evaluation methods.
MemArchitect provides a policy-driven governance layer for LLM agent memory that improves reliability and safety by resolving contradictions and managing privacy.
A meta-audit framework that systematically enumerates safety vulnerabilities in LLM agent tool calls that existing benchmarks miss.
Budget-Aware Value Tree (BAVT) optimizes LLM agent performance under a resource budget by managing how that budget is allocated during multi-hop reasoning.
A benchmark and evaluation framework for long-horizon resource allocation by LLM agents in dynamic enterprise environments, which identifies a critical capability gap.
A method that equips LLM agents with native retrieval by projecting their hidden states into the embedding space, eliminating the need for a separate embedding model and reducing latency.
ZebraArena provides a diagnostic environment for evaluating and improving how tool-augmented LLMs couple reasoning with action, a key challenge for advanced AI agents.
A framework that uses checklist-grounded reinforcement learning to let LLMs systematically evolve and refine scientific ideas based on fine-grained feedback.