State of the Field
Recent advances in language models increasingly focus on efficiency and accessibility across diverse applications. Work on small language models for low-resource languages demonstrates a cost-effective training approach, enabling communities to build tailored models for 54 languages at minimal expense. Meanwhile, reasoning capabilities are improving through techniques such as reward-guided stitching, which raises accuracy on complex problems by reusing intermediate reasoning steps. Value-aware numerical representations address a fundamental weakness in numerical understanding, improving arithmetic performance in transformer models. Multimodal capabilities are also gaining traction, with new frameworks that integrate visual and textual data to deepen search and reasoning. Collectively, these developments point toward more robust, adaptable, and accessible language models for real-world challenges in communication, education, and information retrieval.
Papers
Kakugo: Distillation of Low-Resource Languages into Small Language Models
We present Kakugo, a novel and cost-effective pipeline designed to train general-purpose Small Language Models (SLMs) for low-resource languages using only the language name as input. By using a large...
Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching
Reasoning with large language models often benefits from generating multiple chains-of-thought, but existing aggregation strategies are typically trajectory-level (e.g., selecting the best trace or vo...
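As background for the trajectory-level aggregation this abstract contrasts itself with, the two common strategies it names (selecting the best trace, and voting over final answers) can be sketched generically. The traces and the length-based stand-in reward below are toy assumptions, not the paper's reward-guided method:

```python
from collections import Counter

def best_of_n(traces, reward_fn):
    """Select the single trace with the highest reward score."""
    return max(traces, key=reward_fn)

def majority_vote(traces, answer_fn):
    """Pick the most common final answer across traces."""
    answers = [answer_fn(t) for t in traces]
    return Counter(answers).most_common(1)[0][0]

# Toy traces: (reasoning chain, final answer) pairs.
traces = [
    ("A->B", 42),
    ("A->C->B", 41),
    ("D->B", 42),
]
reward = lambda t: len(t[0])          # stand-in reward: longer chain wins
answer = lambda t: t[1]

print(best_of_n(traces, reward))      # the single highest-reward trace
print(majority_vote(traces, answer))  # the most common answer: 42
```

Per the summary above, the paper's stitching approach reportedly operates on intermediate reasoning steps rather than on whole trajectories like these baselines.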
A.X K1 Technical Report
We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixe...
Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling
Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic and phonetic...
Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi
The dominance of large multilingual foundation models has widened linguistic inequalities in Natural Language Processing (NLP), often leaving low-resource languages underrepresented. This paper introd...
Value-Aware Numerical Representations for Transformer Language Models
Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limita...
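The excerpt does not describe the paper's actual representation; as illustrative background only, a generic "value-aware" number embedding — deriving a number token's vector from its numeric value, so that nearby values get nearby vectors, unlike arbitrary vocabulary-index rows — might look like this sketch (the sinusoidal scheme and all names are assumptions):

```python
import math

def value_embedding(value: float, dim: int = 8) -> list[float]:
    """Encode a number's magnitude with sinusoidal features of its value,
    so numerically close values map to nearby vectors."""
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))  # transformer-style frequency ladder
        emb.append(math.sin(value * freq))
        emb.append(math.cos(value * freq))
    return emb

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Nearby values yield nearby embeddings; distant values do not.
d_close = distance(value_embedding(100.0), value_embedding(101.0))
d_far = distance(value_embedding(100.0), value_embedding(500.0))
print(d_close < d_far)  # True
```

With index-based token embeddings, by contrast, the vectors for "100" and "101" bear no systematic relation, which is one plausible source of the arithmetic fragility the abstract describes.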
Language Model Inversion through End-to-End Differentiation
Despite emerging research on Language Models (LMs), few approaches analyse the invertibility of LMs. That is, given an LM and a desirable target output sequence of tokens, determining what input prompts...
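The inversion problem the abstract poses — finding an input that elicits a target output — can be illustrated with a toy differentiable stand-in for an LM (a single linear-softmax layer, not the paper's approach): gradient descent on the continuous input, with the model weights frozen, recovers an input that produces the desired token.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 4, 5
W = rng.normal(size=(dim, vocab))   # toy differentiable "LM": logits = x @ W

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

target = 3                  # desired output token id
x = rng.normal(size=dim)    # continuous input "prompt" to be optimized

for _ in range(1000):
    p = softmax(x @ W)
    grad_logits = p.copy()
    grad_logits[target] -= 1.0      # d(cross-entropy)/d(logits) = p - onehot
    x -= 0.1 * (W @ grad_logits)    # gradient step on the input, not the weights

print(int(np.argmax(x @ W)))        # the optimized input now elicits the target token
```

Real LMs complicate this picture because their inputs are discrete tokens, which is presumably why end-to-end differentiation through the model is the focus of the paper.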
MetaState: Persistent Working Memory for Discrete Diffusion Language Models
Discrete diffusion language models (dLLMs) generate text by iteratively denoising a masked sequence. Compared with autoregressive models, this paradigm naturally supports parallel decoding, bidirectio...
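The iterative masked-denoising loop this abstract describes can be shown with a toy decoder; the random "denoiser" and the unmasking policy below are placeholders for a real dLLM, chosen only to illustrate parallel, order-free commitment of tokens:

```python
import random

MASK = "<mask>"

def toy_denoiser(seq):
    """Stand-in for the model: propose a token for every masked slot."""
    vocab = ["the", "cat", "sat"]
    return [random.choice(vocab) if tok == MASK else tok for tok in seq]

def diffusion_decode(length, steps, k_per_step):
    """Start fully masked; each step, commit up to k proposed tokens in parallel."""
    seq = [MASK] * length
    for _ in range(steps):
        proposal = toy_denoiser(seq)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Commit positions in any order -- no left-to-right constraint.
        for i in random.sample(masked, min(k_per_step, len(masked))):
            seq[i] = proposal[i]
        if MASK not in seq:
            break
    return seq

random.seed(0)
out = diffusion_decode(length=6, steps=10, k_per_step=2)
print(out)  # six committed tokens, no masks left
```

The parallelism knob here is `k_per_step`: committing more tokens per step trades refinement passes for speed, which is also the trade-off the MDLM paper below characterizes.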
Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow
Masked Diffusion Language Models (MDLMs) promise parallel token generation and arbitrary-order decoding, yet it remains unclear to what extent current models truly realize these capabilities. We chara...
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Multimodal large language models (MLLMs) have achieved remarkable success across a broad range of vision tasks. However, constrained by the capacity of their internal world knowledge, prior work has p...