Takuya Yoshioka @_ty274

Speech technology researcher/manager @AssemblyAI
linkedin.com/in/ty274/
Bellevue, WA
Joined November 2016

Tweets

917
Followers

552
Following

57
Likes

3K

Shyam Gollakota @ShyamGollakota

a year ago

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…

15 51 282 119K 111

Jeff Dean @JeffDean

a year ago

I got an early demo of this when I visited @uwcse a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, @b_veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and @ShyamGollakota!

9 34 369 103K 82

Shinji Watanabe @shinjiw_at_cmu

2 years ago

Hi all, please let me know if you know large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.

13 27 97 14K 25

Takuya Yoshioka @_ty274

2 years ago

Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft. I'll still be in the Seattle area for the foreseeable future.

3 5 43 6K 2

AK @_akhaliq

2 years ago

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However,…

4 88 314 74K 142

Takuya Yoshioka @_ty274

2 years ago

SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873

0 16 72 6K 12

Jonathan Le Roux @JonathanLeRoux

2 years ago

To everyone booking their @IEEE_WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/

0 7 21 4K 1

Desh Raj @rdesh26

2 years ago

@ieeeICASSP Are there poster printing facilities at/near the conference venue?

1 2 0 989 0

Takuya Yoshioka @_ty274

2 years ago

Real-time target sound extraction with waveformer (to appear in ICASSP). Joint work with UW researchers. Paper (updated): arxiv.org/abs/2211.02250 Demo: waveformer.cs.washington.edu Code (both causal and non-causal): github.com/vb000/Waveform…

1 27 144 36K 54

IEEE WASPAA 2025 @IEEE_WASPAA

3 years ago

WASPAA 2023 calls for papers! The traditional intimate Mohonk Mountain House with exciting changes: double-blind review, an unprecedented amount of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023

0 15 34 6K 1

Shinji Watanabe @shinjiw_at_cmu

3 years ago

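Amazing! The world's largest speech corpus, at 19,000 hours, and a high-accuracy Japanese speech recognition model have been released as open source - 窓の杜 (Mado no Mori) forest.watch.impress.co.jp/docs/news/1471… via @madonomori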

0 8 29 2K 1

IEEE ICASSP @ieeeICASSP

3 years ago

The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0

0 6 21 0 0

Takuya Yoshioka @_ty274

3 years ago

How can we do streaming multi-talker ASR by best combining speech separation and overlap-robust ASR? t-SOT-VA does that and works for real meeting audio with any # of mics, achieving the best published WERs of 13.7%/15.5% for AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974

2 4 27 0 4

Marcin Junczys-Dowmunt (Marian NMT) @marian_nmt

3 years ago

Please retweet, @Lam19Tk, a young MT researcher, soon-to-be-PhD needs your help. He is looking for a job in speech/text translation. A job he already had lined-up has been revoked due to the hiring freezes in the industry. Here's his linkedin profile: linkedin.com/in/tsz-kin-lam…

4 16 17 0 1

Takuya Yoshioka @_ty274

3 years ago

Our new work on speaker diarization: arxiv.org/abs/2208.13085 (1) TS-VAD with cross-speaker transformer achieves a new SOTA DER in VoxConverse. (2) Further EEND-EDA integration for one-step diarization brings down the DER in CALLHOME.

2 8 32 0 5

CHiME Challenge @chimechallenge

3 years ago

The challenge submission deadline is approaching (Sep 26). If you're interested in it, please do not hesitate to ask the CHiME Steering Group ([email protected]) or members (chimechallenge.org/current/steeri…) individually!