Multilingual NLP

5 papers
5.2 viability
+50% (30d)

State of the Field

Recent work in multilingual natural language processing focuses on efficiency and adaptability across languages and domains. Convolutional architectures are proving competitive with large transformer models on specific tasks such as name type classification, while reducing processing time and energy consumption. New encoder families such as MrBERT are being tailored to localized linguistic tasks and specialized domains, which points to cost-effective deployment in high-stakes applications. Datasets like BIRDTurk, a Turkish adaptation of the BIRD text-to-SQL benchmark, expose the challenges that low-resource languages pose for text-to-SQL systems and provide a framework for evaluating cross-lingual performance. Research on cross-lingual classification for social media data emphasizes content filtering strategies that manage the noise inherent in multilingual discourse. Together, these efforts address commercial demand for scalable, efficient multilingual solutions in global communication and data analysis.

Last updated Feb 26, 2026

Papers

Research Paper·Jan 16, 2026

Efficient Multilingual Name Type Classification Using Convolutional Networks

We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-separable op...

7.0 viability
Research Paper·Feb 24, 2026

MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

We introduce MrBERT, a family of 150M-300M parameter encoders built on the ModernBERT architecture and pre-trained on 35 languages and code. Through targeted adaptation, this model family achieves sta...

7.0 viability
Research Paper·Feb 3, 2026·B2B

BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish

Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTurk, the fi...

5.0 viability
Research Paper·Feb 19, 2026

Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data

Analysing multilingual social media discourse remains a major challenge in natural language processing, particularly when large-scale public debates span across diverse languages. This study investiga...

5.0 viability
Research Paper·Feb 18, 2026

When Semantic Overlap Is Not Enough: Cross-Lingual Euphemism Transfer Between Turkish and English

Euphemisms substitute socially sensitive expressions, often softening or reframing meaning, and their reliance on cultural and pragmatic context complicates modeling across languages. In this study, w...

2.0 viability