Enterprise AI

5 papers
4.6 viability
-33% 30d

State of the Field

Recent research in enterprise AI focuses on making large language models (LLMs) work reliably inside complex organizational systems. Current work shows that LLMs struggle to predict the cascading effects of their actions in enterprise systems, which can cause silent errors in workflows. New benchmarks such as World of Workflows and OurBench are being developed to assess LLM performance in realistic enterprise scenarios, including SQL debugging and query routing across multiple databases. These benchmarks target gaps in LLM reliability and accuracy that matter for automating tasks in data-heavy environments. Architectures like REGAL have also been proposed to ground AI agents in enterprise telemetry so that they operate within defined semantic frameworks. Together, this work aims to streamline enterprise operations and reduce coordination costs, which could in turn allow more agile organizational structures.

Last updated Mar 4, 2026

Papers

Research Paper · Jan 29, 2026

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems

Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects across intercon...

7.0 viability

Research Paper · Jan 26, 2026

Beyond Text-to-SQL: Can LLMs Really Debug Enterprise ETL SQL?

SQL is central to enterprise data engineering, yet generating fully correct SQL code in a single attempt remains difficult, even for experienced developers and advanced text-to-SQL LLMs, often requiri...

6.0 viability

Research Paper · Jan 27, 2026

Routing End User Queries to Enterprise Databases

We address the task of routing natural language queries in multi-database enterprise environments. We construct realistic benchmarks by extending existing NL-to-SQL datasets. Our study shows that rout...

6.0 viability

Research Paper · Mar 3, 2026

REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (LLM...

3.0 viability

Research Paper · Feb 24, 2026

The Headless Firm: How AI Reshapes Enterprise Boundaries

The boundary of the firm is determined by coordination cost. We argue that agentic AI induces a structural change in how coordination costs scale: in prior modular systems, integration cost grew with ...

1.0 viability