The Voxtral tech-report is up!
arxiv.org/abs/2507.13264
We release these models with a permissive Apache 2.0 license. Feedback is welcome! We have a lot more cooking; this is just the beginning.
Turns out speech self-supervised learning techniques can be generalized to sign language!
Great work led by @Shester_G (he’s looking for a PhD opportunity this year!)
💚 Big shoutout to the #FUGATTO team for making this release happen — and to cats like Coltrane and Xenakis, who envisioned a world where "saxophones bark and howl."
Together, artists and researchers, let’s build a GPT-like future for audio generation!
fugatto.github.io
Q: Why can't we get GPT-level understanding from language models on speech?
A: We need better speech tokens!
In SyllableLM, *we beat @kyutai_labs Moshi on semantic understanding in 70 hours of training* by making speech tokens at 5 frames/s
With @PuyuanPeng, David Harwath
1/n
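To make the "5 frames/s" point concrete, here is a toy Python sketch (not the SyllableLM algorithm itself, whose boundary detection and clustering are more involved): frame-level self-supervised features at ~50 Hz are cut wherever adjacent-frame similarity drops, then mean-pooled into much coarser units. The 50 Hz rate, similarity threshold, and shapes are illustrative assumptions.

```python
# Toy illustration of coarse, low-frame-rate speech units (NOT SyllableLM itself):
# cut ~50 Hz frame features at similarity drops, mean-pool each segment.
import numpy as np

def coarse_units(feats: np.ndarray, sim_threshold: float = 0.90) -> np.ndarray:
    """feats: (T, D) frame features at ~50 Hz -> (S, D) pooled segment features."""
    # cosine similarity between consecutive frames
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = (norm[:-1] * norm[1:]).sum(axis=1)            # (T-1,)
    # place a boundary wherever adjacent frames stop being similar
    boundaries = np.flatnonzero(sim < sim_threshold) + 1
    segments = np.split(feats, boundaries)               # list of (t_i, D) chunks
    return np.stack([seg.mean(axis=0) for seg in segments])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((50, 256)).astype(np.float32)  # 50 "syllable" prototypes
    feats = np.repeat(base, 10, axis=0)                        # 500 frames ~ 10 s at 50 Hz
    units = coarse_units(feats)
    print(f"{len(feats)} frames -> {len(units)} pooled units") # ~50 units ~ 5 units/s
```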
Synthetic labels are amazing! Do you need an audio labelling machine? Audio Flamingo checkpoints are available on github.com/NVIDIA/audio-f…
...and pre-training with synthetic labels from Audio Flamingo gives large improvements in text-to-audio models
arxiv.org/abs/2406.15487
Beautiful work by Alex Liu on generative pre-training for speech with Flow Matching. I just realized it's one of the main components in AudioBox!
arxiv.org/abs/2310.16338
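For context on the technique named above, here is a minimal sketch of a conditional flow-matching training loss in its generic form (straight-line probability paths between noise and data). The `model` signature, feature shapes, and conditioning are placeholder assumptions, not the exact setup of this paper or AudioBox.

```python
# Minimal conditional flow-matching training loss (generic form, PyTorch assumed).
import torch

def flow_matching_loss(model, x1: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """x1: (B, T, D) target speech features; cond: conditioning, e.g. masked context."""
    x0 = torch.randn_like(x1)                             # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1, 1, device=x1.device)   # per-example time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1                           # point on the straight-line path
    target_v = x1 - x0                                     # constant velocity along that path
    pred_v = model(xt, t.view(-1), cond)                   # hypothetical model signature
    return torch.mean((pred_v - target_v) ** 2)
```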
Recent years have witnessed significant developments in audio codec models (an overview figure from arxiv.org/abs/2402.13236). We introduce Codec-SUPERB (arxiv.org/abs/2402.13071) to boost fair and comprehensive comparison. Leaderboard: codecsuperb.com
LTU and LTU-AS code is released. As usual, it is a full release including training and inference code, pretrained checkpoints, and the datasets. We hope these will be useful. Check github.com/YuanGongND/ltu.
I'll give a keynote talk at ASRU'23! asru2023.org/motion.asp?sit…
See you soon in Taiwan!
Actually, ASRU was the first conference that rejected my first-author paper (in 2003). But 20 years later, I was given the opportunity to be a keynote speaker, haha.
We summarize our lab's activities toward speech foundation models at wavlab.org/activities/202….
We have several other ongoing activities, and a selection of these papers was presented at ASRU.
🚀 Our upgraded audio large language model LTU-2 is now hosted on HuggingFace Space at lnkd.in/eJDpsBY4. Please give it a try and let us know what you think 😀.
🗣️ Whisper is great for speech recognition, but it only recognizes ~100 languages. What if it wasn't trained on the language that you speak?
Happy to introduce my #INTERSPEECH2023 paper comparing Whisper and XLS-R for adaptation to unseen languages!
arxiv.org/abs/2305.12606
17K Followers 6K Following · Neurodivergent physics student with a keen interest in multisensory integration and emergent perception. Exploring research on a proposed ‘sixth sense’. Δ
4 Followers 96 Following · Global Head-Hunter For GenAI And Quantum. Helping Companies In AI And Quantum Find The Best Talent 🚀GenAI And Quantum Specialist🚀
621 Followers 8K Following · Scrambling tokens and daydreams, a latent-space fiddler—The next-token gradients get steeper, more rococo; I keep thinking why something exists and not nothing.
5 Followers 193 Following · Nature is the greatest artist in the world, and every leaf, bird, and flower tells its own story. She loves to explore forests, mountains, lakes, and oceans, an
56K Followers 853 Following · Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
303 Followers 295 Following · Ph.D. in CS at University of Maryland, College Park | Ex- Adobe Research, NVIDIA, Cisco | Speech, Audio and Language Processing Researcher
229 Followers 226 Following · Research scientist @NVIDIA working on deep generative models for sequences, with a particular focus on speech and audio. Personal account.
636K Followers 35 Following · We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
132 Followers 235 Following · CS Ph.D. @UTAustin @UTCompSci, previously an NTUEE undergraduate. Interested in Music Information Retrieval, Speech processing and Deep learning.
5K Followers 500 Following · Chief exploration officer @kyutai_labs, with strong interests in stochastic optimization, audio generative models, and AI for science.
92 Followers 136 Following · 3rd-year Ph.D. student at National Taiwan University. Currently working on prompting speech LLMs. SpeechPrompt / SpeechGen. Ex-intern at @Meta @RealityLabs.
35K Followers 189 Following · Co-founder and CEO https://t.co/efv72CKpAG (@WaveFormsAI) - Ex @OpenAI GPT-4o/AVM Audio Research Lead - #Her #TARS - Ex @AIatMeta, @Polytechnique (X11)