Daily Dose of Data Science | Avi Chawla | Substack · 10 min read

How to Achieve 2.8x Faster Automatic Speech Recognition

Standard RNN-Transducer (RNN-T) ASR decoders waste most of their compute stepping through silence one frame at a time. The Token-and-Duration Transducer (TDT) fixes this by adding a second output head to the joint network. That head predicts how many frames to skip, not just which token to emit. This lets the decoder jump over a stretch of silence in a single step instead of predicting a blank at every frame. The result is up to 2.82x faster decoding with equal or better word error rate (WER), and no encoder changes are required. Speechmatics uses TDT in production, and NVIDIA's Parakeet TDT models top the Hugging Face Open ASR Leaderboard on throughput (RTFx) while using the same encoder size as slower RNN-T competitors.
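To make the frame-skipping idea concrete, here is a minimal sketch comparing standard RNN-T greedy decoding with TDT-style greedy decoding. It is not the Speechmatics or NVIDIA NeMo implementation. The toy joint network (`make_toy_joint`), the blank id `0`, and the duration set `(0, 1, 2, 3, 4)` are hypothetical stand-ins. The toy "duration head" simply reads the distance to the next spoken frame from a scripted frame list. The point is the control flow: RNN-T pays at least one joint call per encoder frame, while TDT advances by whatever duration the second head predicts.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0                    # hypothetical blank token id
DURATIONS = (0, 1, 2, 3, 4)  # hypothetical set of durations the second head can predict
MAX_DUR = max(DURATIONS)

# (frame index, last emitted token) -> (token, duration).
# In a real model both values are argmaxes over the joint network's two output heads.
JointFn = Callable[[int, int], Tuple[int, int]]


def make_toy_joint(frames: Sequence[int]) -> JointFn:
    """Hypothetical stand-in for a trained joint network, for illustration only.

    frames[t] is the token spoken at encoder frame t, or BLANK for silence.
    Assumes no two consecutive spoken tokens are identical.
    """
    def joint(t: int, last: int) -> Tuple[int, int]:
        # Token head: emit the frame's token once, otherwise predict blank.
        token = frames[t] if frames[t] not in (BLANK, last) else BLANK
        # Duration head: jump to the next spoken frame, capped at MAX_DUR.
        nxt = next((j for j in range(t + 1, len(frames)) if frames[j] != BLANK), len(frames))
        return token, min(nxt - t, MAX_DUR)
    return joint


def rnnt_greedy(num_frames: int, joint: JointFn, max_symbols: int = 10) -> Tuple[List[int], int]:
    """Standard RNN-T greedy decoding: a blank always advances exactly one frame."""
    t, last, out, calls = 0, BLANK, [], 0
    while t < num_frames:
        emitted = 0
        while emitted < max_symbols:
            token, _ = joint(t, last)  # RNN-T has no duration head, so ignore it
            calls += 1
            if token == BLANK:
                break
            out.append(token)
            last = token
            emitted += 1
        t += 1
    return out, calls


def tdt_greedy(num_frames: int, joint: JointFn, max_symbols: int = 10) -> Tuple[List[int], int]:
    """TDT-style greedy decoding: the duration head decides how many frames to skip."""
    t, last, out, calls, emitted_here = 0, BLANK, [], 0, 0
    while t < num_frames:
        token, dur = joint(t, last)
        calls += 1
        if token != BLANK:
            out.append(token)
            last = token
            emitted_here += 1
        # A blank must advance time, and a per-frame symbol cap prevents
        # endless zero-duration emissions at a single frame.
        if dur == 0 and (token == BLANK or emitted_here >= max_symbols):
            dur = 1
        if dur > 0:
            emitted_here = 0
        t += dur
    return out, calls


if __name__ == "__main__":
    # 20 frames of mostly silence with three spoken tokens (5, 8, 2).
    frames = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 8, 0, 0, 2, 0, 0, 0, 0, 0, 0]
    joint = make_toy_joint(frames)
    print(rnnt_greedy(len(frames), joint))  # ([5, 8, 2], 23)
    print(tdt_greedy(len(frames), joint))   # ([5, 8, 2], 6)
```

On this toy input both decoders produce the same tokens. RNN-T makes 23 joint calls (one blank per frame plus one per emitted token), while TDT makes 6 because each call can skip up to four frames. Real speedups depend on the model, the duration set, and the audio. The 2.82x figure comes from the source article, not from this toy.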


#machine-learning

Source: https://blog.dailydoseofds.com/p/how-to-achieve-28x-faster-automatic. 8sync News only summarizes and links to the article; copyright belongs to the author and the original source.
