Papers
1–4 of 4One-step Language Modeling via Continuous Denoising
Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp d...
Addressing the Ecological Fallacy in Larger LMs with Human Context
Language model training and inference ignore a fundamental linguistic fact -- there is a dependence between multiple sequences of text written by the same person. Prior work has shown that addressing ...
Ensembling Language Models with Sequential Monte Carlo
Practitioners have access to an abundance of language models and prompting strategies for solving many language modeling tasks; yet prior work shows that modeling performance is highly sensitive to bo...
Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing
Token-choice Mixture-of-Experts (TC-MoE) routes each token to a fixed number of experts, limiting dynamic computation allocation and requiring auxiliary losses to maintain load balance. We propose Exp...