The Voxtral tech-report is up!
arxiv.org/abs/2507.13264
We release these models with a permissive Apache 2.0 license. Feedback is welcome! We have a lot more cooking; this is just the beginning.
Turns out speech self-supervised learning techniques can be generalized to sign language!
Great work led by @Shester_G (he’s looking for a PhD opportunity this year!)
💚 Big shoutout to the #FUGATTO team for making this release happen — and to cats like Coltrane and Xenakis, who envisioned a world where "saxophones bark and howl."
Together, artists and researchers, let’s build a GPT-like future for audio generation!
fugatto.github.io
Q: Why can't we get GPT-level understanding from language models on speech?
A: We need better speech tokens!
In SyllableLM, *we beat @kyutai_labs Moshi on semantic understanding in 70 hours of training* by making speech tokens at 5 frames/s
With @PuyuanPeng, David Harwath
1/n
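To make the "5 frames/s" point concrete, here is a toy Python sketch (not the SyllableLM algorithm itself, whose boundary detection and clustering are more involved): frame-level self-supervised features at ~50 Hz are cut wherever adjacent-frame similarity drops, then mean-pooled into much coarser units. The 50 Hz rate, similarity threshold, and shapes are illustrative assumptions.

```python
# Toy illustration of coarse, low-frame-rate speech units (NOT SyllableLM itself):
# cut ~50 Hz frame features at similarity drops, mean-pool each segment.
import numpy as np

def coarse_units(feats: np.ndarray, sim_threshold: float = 0.90) -> np.ndarray:
    """feats: (T, D) frame features at ~50 Hz -> (S, D) pooled segment features."""
    # cosine similarity between consecutive frames
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = (norm[:-1] * norm[1:]).sum(axis=1)            # (T-1,)
    # place a boundary wherever adjacent frames stop being similar
    boundaries = np.flatnonzero(sim < sim_threshold) + 1
    segments = np.split(feats, boundaries)               # list of (t_i, D) chunks
    return np.stack([seg.mean(axis=0) for seg in segments])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((50, 256)).astype(np.float32)  # 50 "syllable" prototypes
    feats = np.repeat(base, 10, axis=0)                        # 500 frames ~ 10 s at 50 Hz
    units = coarse_units(feats)
    print(f"{len(feats)} frames -> {len(units)} pooled units") # ~50 units ~ 5 units/s
```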
Synthetic labels are amazing! Do you need an audio labelling machine? Audio Flamingo checkpoints are available on github.com/NVIDIA/audio-f…
...and pre-training with synthetic labels from Audio Flamingo gives large improvements in text-to-audio models
arxiv.org/abs/2406.15487
Beautiful work by Alex Liu on generative pre-training for speech with Flow Matching. I just realized it's one of the main components in AudioBox!
arxiv.org/abs/2310.16338
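For context on the technique named above, here is a minimal sketch of a conditional flow-matching training loss in its generic form (straight-line probability paths between noise and data). The `model` signature, feature shapes, and conditioning are placeholder assumptions, not the exact setup of this paper or AudioBox.

```python
# Minimal conditional flow-matching training loss (generic form, PyTorch assumed).
import torch

def flow_matching_loss(model, x1: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """x1: (B, T, D) target speech features; cond: conditioning, e.g. masked context."""
    x0 = torch.randn_like(x1)                             # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1, 1, device=x1.device)   # per-example time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1                           # point on the straight-line path
    target_v = x1 - x0                                     # constant velocity along that path
    pred_v = model(xt, t.view(-1), cond)                   # hypothetical model signature
    return torch.mean((pred_v - target_v) ** 2)
```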
Recent years have witnessed significant developments in audio codec models (an overview figure from arxiv.org/abs/2402.13236). We introduce Codec-SUPERB (arxiv.org/abs/2402.13071) to boost fair and comprehensive comparison. Leaderboard: codecsuperb.com
LTU and LTU-AS code is released. As usual, it is a full release including training and inference code, pretrained checkpoints, and the datasets. We hope these will be useful. Check github.com/YuanGongND/ltu.
I'll give a keynote talk at ASRU'23! asru2023.org/motion.asp?sit…
See you soon in Taiwan!
Actually, ASRU was the first conference that rejected my first-author paper (in 2003). But 20 years later, I was given the opportunity to be a keynote speaker, haha.
We summarize our lab's activities toward speech foundation models at wavlab.org/activities/202….
We have several other ongoing activities, and a selection of these papers was presented at ASRU.
🚀 Our upgraded audio large language model LTU-2 is now hosted on HuggingFace Space at lnkd.in/eJDpsBY4. Please give it a try and let us know what you think 😀.
🗣️ Whisper is great for speech recognition, but it only recognizes ~100 languages. What if it wasn't trained on the language that you speak?
Happy to introduce my #INTERSPEECH2023 paper comparing Whisper and XLS-R for adaptation to unseen languages!
arxiv.org/abs/2305.12606
17K Followers 6K Following · Neurodivergent physics student with a keen interest in multisensory integration and emergent perception. Exploring research on a proposed ‘sixth sense’. Δ
4 Followers 96 Following · Global Head-Hunter For GenAI And Quantum. Helping Companies In AI And Quantum Find The Best Talent 🚀GenAI And Quantum Specialist🚀
621 Followers 8K Following · Scrambling tokens and daydreams, a latent-space fiddler—The next-token gradients get steeper, more rococo; I keep thinking why something exists and not nothing.
5 Followers 193 Following · Nature is the greatest artist in the world, and every leaf, bird, and flower tells its own story. She loves to explore forests, mountains, lakes, and oceans, an
56K Followers 853 Following · Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
303 Followers 295 Following · Ph.D. in CS at University of Maryland, College Park | Ex- Adobe Research, NVIDIA, Cisco | Speech, Audio and Language Processing Researcher
229 Followers 226 Following · Research scientist @NVIDIA working on deep generative models for sequences, with a particular focus on speech and audio. Personal account.
636K Followers 35 Following · We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
132 Followers 235 Following · CS Ph.D. @UTAustin @UTCompSci, previously an NTUEE undergraduate. Interested in Music Information Retrieval, Speech processing and Deep learning.
5K Followers 500 Following · Chief exploration officer @kyutai_labs, with strong interests in stochastic optimization, audio generative models, and AI for science.
92 Followers 136 Following · 3rd-year Ph.D. student at National Taiwan University. Currently working on prompting speech LLMs. SpeechPrompt / SpeechGen. Ex-intern at @Meta @RealityLabs.
35K Followers 189 Following · Co-founder and CEO https://t.co/efv72CKpAG (@WaveFormsAI) - Ex @OpenAI GPT-4o/AVM Audio Research Lead - #Her #TARS - Ex @AIatMeta, @Polytechnique (X11)