Current research in speech recognition is increasingly focused on addressing the limitations of existing systems, particularly in low-resource and high-stakes environments. Recent work has highlighted the challenges of dialectal variability in languages like Taiwanese Hakka and the need for inclusive technologies for speech-impaired individuals, such as those speaking Akan. Additionally, studies have shown that mainstream ASR systems struggle with short, critical utterances, prompting the development of synthetic data generation techniques to enhance accuracy for non-English speakers. Innovations like BBPE16 aim to streamline multilingual tokenization, while connector-sharing strategies based on linguistic family membership improve efficiency in multilingual ASR applications. Furthermore, frameworks like VibeVoice-ASR are designed to handle long-form audio and multi-speaker scenarios more effectively, integrating various speech processing tasks into a single pipeline. Collectively, these advancements signal a shift toward more robust, inclusive, and context-aware speech recognition systems capable of meeting diverse commercial needs.
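As background on the byte-level tokenization idea that BBPE16 builds on: byte-level BPE operates on a text's serialized bytes rather than on characters, so the base vocabulary is always the 256 possible byte values. The sketch below only illustrates why the choice of encoding matters before any merges are learned; the helper name `to_byte_tokens` is hypothetical, and the actual BBPE16 scheme is not reproduced here.

```python
# Minimal illustration of byte-level token sequences under UTF-8 vs UTF-16.
# NOTE: `to_byte_tokens` is a hypothetical helper, not part of BBPE16 itself.

def to_byte_tokens(text: str, encoding: str) -> list[int]:
    """Serialize text to a sequence of byte-valued base tokens (0-255)."""
    return list(text.encode(encoding))

# Many non-Latin characters take 3 bytes in UTF-8 but 2 in UTF-16-LE,
# so the pre-merge byte sequence is shorter under UTF-16 for such scripts.
word = "語音"  # "speech" in Chinese
utf8_tokens = to_byte_tokens(word, "utf-8")       # 3 bytes per character
utf16_tokens = to_byte_tokens(word, "utf-16-le")  # 2 bytes per character
```

Under these assumptions, shorter base sequences for non-Latin scripts are a plausible motivation for a UTF-16-based variant, though the paper's actual design should be consulted for specifics.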
Top papers
- Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing (7.0)
- "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most (6.0)
- BBPE16: UTF-16-based byte-level byte-pair encoding for improved multilingual speech recognition (5.0)
- Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language (5.0)
- A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment (5.0)
- When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper (5.0)
- GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR (5.0)
- Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries (4.0)
- VIBEVOICE-ASR Technical Report (3.0)
- WESR: Scaling and Evaluating Word-level Event-Speech Recognition (2.0)