State of the Field
Recent advances in language models increasingly focus on efficiency and accessibility across diverse applications. Work on small language models for low-resource languages demonstrates a cost-effective training approach, enabling communities to build tailored models for 54 languages at minimal expense. Meanwhile, reasoning capabilities are improving through techniques such as reward-guided stitching, which raises accuracy on complex problems by reusing intermediate reasoning steps. Value-aware numerical representations address a fundamental weakness in numerical understanding, improving arithmetic performance in transformer models. Multimodal capabilities are also gaining traction, with new frameworks that integrate visual and textual data to deepen search and reasoning. Collectively, these developments point toward more robust, adaptable, and accessible language models for real-world challenges in communication, education, and information retrieval.
Papers
Kakugo: Distillation of Low-Resource Languages into Small Language Models
We present Kakugo, a novel and cost-effective pipeline designed to train general-purpose Small Language Models (SLMs) for low-resource languages using only the language name as input. By using a large...
Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching
Reasoning with large language models often benefits from generating multiple chains-of-thought, but existing aggregation strategies are typically trajectory-level (e.g., selecting the best trace or vo...
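As background for the trajectory-level aggregation this abstract contrasts itself with, the two common strategies it names (selecting the best trace, and voting over final answers) can be sketched generically. The traces and the length-based stand-in reward below are toy assumptions, not the paper's reward-guided method:

```python
from collections import Counter

def best_of_n(traces, reward_fn):
    """Select the single trace with the highest reward score."""
    return max(traces, key=reward_fn)

def majority_vote(traces, answer_fn):
    """Pick the most common final answer across traces."""
    answers = [answer_fn(t) for t in traces]
    return Counter(answers).most_common(1)[0][0]

# Toy traces: (reasoning chain, final answer) pairs.
traces = [
    ("A->B", 42),
    ("A->C->B", 41),
    ("D->B", 42),
]
reward = lambda t: len(t[0])          # stand-in reward: longer chain wins
answer = lambda t: t[1]

print(best_of_n(traces, reward))      # the single highest-reward trace
print(majority_vote(traces, answer))  # the most common answer: 42
```

Per the summary above, the paper's stitching approach reportedly operates on intermediate reasoning steps rather than on whole trajectories like these baselines.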
A.X K1 Technical Report
We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixe...
Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling
Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic and phonetic...
Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi
The dominance of large multilingual foundation models has widened linguistic inequalities in Natural Language Processing (NLP), often leaving low-resource languages underrepresented. This paper introd...
Value-Aware Numerical Representations for Transformer Language Models
Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limita...
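The excerpt does not describe the paper's actual representation; as illustrative background only, a generic "value-aware" number embedding — deriving a number token's vector from its numeric value, so that nearby values get nearby vectors, unlike arbitrary vocabulary-index rows — might look like this sketch (the sinusoidal scheme and all names are assumptions):

```python
import math

def value_embedding(value: float, dim: int = 8) -> list[float]:
    """Encode a number's magnitude with sinusoidal features of its value,
    so numerically close values map to nearby vectors."""
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))  # transformer-style frequency ladder
        emb.append(math.sin(value * freq))
        emb.append(math.cos(value * freq))
    return emb

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Nearby values yield nearby embeddings; distant values do not.
d_close = distance(value_embedding(100.0), value_embedding(101.0))
d_far = distance(value_embedding(100.0), value_embedding(500.0))
print(d_close < d_far)  # True
```

With index-based token embeddings, by contrast, the vectors for "100" and "101" bear no systematic relation, which is one plausible source of the arithmetic fragility the abstract describes.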
Language Model Inversion through End-to-End Differentiation
Despite emerging research on Language Models (LMs), few approaches analyse the invertibility of LMs. That is, given an LM and a desirable target output sequence of tokens, determining what input prompts...
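The inversion problem the abstract poses — finding an input that elicits a target output — can be illustrated with a toy differentiable stand-in for an LM (a single linear-softmax layer, not the paper's approach): gradient descent on the continuous input, with the model weights frozen, recovers an input that produces the desired token.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 4, 5
W = rng.normal(size=(dim, vocab))   # toy differentiable "LM": logits = x @ W

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

target = 3                  # desired output token id
x = rng.normal(size=dim)    # continuous input "prompt" to be optimized

for _ in range(1000):
    p = softmax(x @ W)
    grad_logits = p.copy()
    grad_logits[target] -= 1.0      # d(cross-entropy)/d(logits) = p - onehot
    x -= 0.1 * (W @ grad_logits)    # gradient step on the input, not the weights

print(int(np.argmax(x @ W)))        # the optimized input now elicits the target token
```

Real LMs complicate this picture because their inputs are discrete tokens, which is presumably why end-to-end differentiation through the model is the focus of the paper.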
MetaState: Persistent Working Memory for Discrete Diffusion Language Models
Discrete diffusion language models (dLLMs) generate text by iteratively denoising a masked sequence. Compared with autoregressive models, this paradigm naturally supports parallel decoding, bidirectio...
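The iterative masked-denoising loop this abstract describes can be shown with a toy decoder; the random "denoiser" and the unmasking policy below are placeholders for a real dLLM, chosen only to illustrate parallel, order-free commitment of tokens:

```python
import random

MASK = "<mask>"

def toy_denoiser(seq):
    """Stand-in for the model: propose a token for every masked slot."""
    vocab = ["the", "cat", "sat"]
    return [random.choice(vocab) if tok == MASK else tok for tok in seq]

def diffusion_decode(length, steps, k_per_step):
    """Start fully masked; each step, commit up to k proposed tokens in parallel."""
    seq = [MASK] * length
    for _ in range(steps):
        proposal = toy_denoiser(seq)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Commit positions in any order -- no left-to-right constraint.
        for i in random.sample(masked, min(k_per_step, len(masked))):
            seq[i] = proposal[i]
        if MASK not in seq:
            break
    return seq

random.seed(0)
out = diffusion_decode(length=6, steps=10, k_per_step=2)
print(out)  # six committed tokens, no masks left
```

The parallelism knob here is `k_per_step`: committing more tokens per step trades refinement passes for speed, which is also the trade-off the MDLM paper below characterizes.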
Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow
Masked Diffusion Language Models (MDLMs) promise parallel token generation and arbitrary-order decoding, yet it remains unclear to what extent current models truly realize these capabilities. We chara...
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Multimodal large language models (MLLMs) have achieved remarkable success across a broad range of vision tasks. However, constrained by the capacity of their internal world knowledge, prior work has p...