Whisper-CD: Accurate Long-Form Speech Recognition using Multi-Negative Contrastive Decoding

Estimated build cost: $9K - $13K over 6-10 weeks.

Founder's Pitch

"Whisper-CD enhances long-form speech recognition accuracy and speed by using a training-free contrastive decoding method, making it a drop-in replacement for existing Whisper systems."
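This page does not spell out how the contrastive decoding step works, so the following is only a minimal sketch of a generic multi-negative contrastive scoring rule, not the paper's formulation. It assumes that each decoding step yields one set of token logits from the clean input and several sets from perturbed "negative" inputs, and that the negatives are combined by averaging their log-probabilities. The function names, the averaging, and the `alpha` weight are all hypothetical choices made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(pos_logits, neg_logits_list, alpha=1.0):
    """Score tokens as (1 + alpha) * log p_pos - alpha * mean(log p_neg).

    ASSUMPTION: averaging over negatives and the (1 + alpha) / alpha
    weighting are illustrative, not taken from the Whisper-CD paper.
    """
    if not neg_logits_list:
        raise ValueError("at least one negative distribution is required")
    pos = log_softmax(pos_logits)
    negs = [log_softmax(n) for n in neg_logits_list]
    k = len(negs)
    scores = []
    for i, p in enumerate(pos):
        neg_mean = sum(n[i] for n in negs) / k
        scores.append((1.0 + alpha) * p - alpha * neg_mean)
    return scores


def pick_token(pos_logits, neg_logits_list, alpha=1.0):
    """Greedy choice of the token index with the highest contrastive score."""
    scores = contrastive_scores(pos_logits, neg_logits_list, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: token 0 narrowly wins on the clean input, but both
# negatives favour it strongly, so the contrastive score demotes it.
pos = [2.0, 1.9, 0.0]
negs = [[3.0, 0.0, 0.0], [2.5, 0.0, 0.0]]
plain_choice = pick_token(pos, negs, alpha=0.0)
contrastive_choice = pick_token(pos, negs, alpha=1.0)
```

With `alpha=0.0` the rule reduces to ordinary greedy decoding on the clean input. A larger `alpha` penalises tokens that the negative inputs also rate highly, which is the intuition behind using contrastive decoding to suppress hallucinated tokens.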

Speech Recognition · Score: 8

Commercial Viability Breakdown

Scores on a 0-10 scale:

High Potential: 5 (2/4 signals)
Quick Build: 10 (4/4 signals)
Series A Potential: 7.5 (3/4 signals)

Sources used for this analysis

arXiv Paper: Full-text PDF analysis of the research paper
GitHub Repository: Code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/6/2026
