State of the Field
Recent research in enterprise AI focuses on how well large language models (LLMs) can operate inside complex organizational systems. Current work finds that LLMs struggle with enterprise systems, in particular with predicting the cascading effects of their own actions, which can introduce silent errors into workflows. New benchmarks, such as World of Workflows and OurBench, assess LLM performance in realistic enterprise scenarios, including SQL debugging and query routing across multiple databases. These benchmarks target gaps in LLM reliability and accuracy that matter for automating tasks in data-heavy environments. On the architecture side, REGAL is proposed to ground AI agents in enterprise telemetry so that they operate within defined semantic frameworks. Alongside this technical work, The Headless Firm argues that agentic AI changes how coordination costs scale, with consequences for where the boundary of the firm falls.
Papers
World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems
Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects across intercon...
Beyond Text-to-SQL: Can LLMs Really Debug Enterprise ETL SQL?
SQL is central to enterprise data engineering, yet generating fully correct SQL code in a single attempt remains difficult, even for experienced developers and advanced text-to-SQL LLMs, often requiri...
Routing End User Queries to Enterprise Databases
We address the task of routing natural language queries in multi-database enterprise environments. We construct realistic benchmarks by extending existing NL-to-SQL datasets. Our study shows that rout...
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (LLM...
The Headless Firm: How AI Reshapes Enterprise Boundaries
The boundary of the firm is determined by coordination cost. We argue that agentic AI induces a structural change in how coordination costs scale: in prior modular systems, integration cost grew with ...