Mu Yang

About Me

Hi there! My name is Mu Yang. I’m a third-year Ph.D. student in Electrical and Computer Engineering (ECE) at the University of Texas at Dallas. My advisor is Dr. John H. L. Hansen. I’m a member of the UTD Center for Robust Speech Systems (UTD CRSS).

My research interests include Speech Recognition and Speech Synthesis. My recent work has focused on accented (non-native) speech assessment and multilingual Automatic Speech Recognition. In the past, I also worked on Spoken Language Understanding (Intent Classification) and Natural Language Processing (Event and Event Temporal Relation Extraction).

Interests
  • Speech Recognition
  • Speech Synthesis
  • Natural/Spoken Language Processing
Education
  • Ph.D. in Electrical and Computer Engineering, 2021-

    University of Texas at Dallas, USA

  • Ph.D. in Computer Science, 2020-2021 (withdrew)

    Texas A&M University, USA

  • M.S. in Electrical Engineering, 2017-2019

    University of Southern California, USA

  • B.Eng. in Communication Engineering, 2013-2017

    Chongqing University, China

  • Exchange Student, 2016

    National Sun Yat-sen University, Taiwan

Publications

(2024). Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. arXiv preprint.

PDF Audio Samples

(2024). DiariST: Streaming Speech Translation with Speaker Diarization. ICASSP 2024.

PDF

(2023). What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model. Interspeech 2023.

PDF

(2023). Learning ASR Pathways: A Sparse Multilingual ASR Model. ICASSP 2023.

PDF

(2022). Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment. Interspeech 2022 (Oral).

PDF Audio Samples

(2022). Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis. ICASSP 2022.

PDF Code Audio Samples

(2022). Joint Hypoglycemia Prediction and Glucose Forecasting via Deep Multi-task Learning. ICASSP 2022.

PDF

(2021). EventPlus: A Temporal Event Understanding Pipeline. NAACL 2021 (Demonstrations).

PDF Code Demo

(2020). A CNN-based Active Learning Framework to Identify Mycobacteria in Digitized Ziehl-Neelsen Stained Human Tissues. Computerized Medical Imaging and Graphics 2020.

PDF

(2020). Biomedical Event Extraction with Hierarchical Knowledge Graphs. EMNLP 2020 (Findings).

PDF Code

(2019). Deep Structured Neural Network for Event Temporal Relation Extraction. CoNLL 2019.

PDF Code

(2019). Spoken Language Intent Detection using Confusion2Vec. Interspeech 2019.

PDF Dataset

Experience

Meta AI
Research Intern
May 2024 – Aug 2024 New York City, NY, USA

Mentors: Bowen Shi, Matthew Le, Wei-Ning Hsu. Manager: Andros Tjandra (FAIR Audiobox team).

Text-to-audio generation.

Microsoft
Research Intern
May 2023 – Aug 2023 Redmond, WA, USA

Mentors: Naoyuki Kanda, Xiaofei Wang. Manager: Takuya Yoshioka (Cognitive Services Research Speech team).

Speech Translation.

Meta AI
Research Intern
May 2022 – Aug 2022 New York City, NY, USA

Mentors: Andros Tjandra, Chunxi Liu, David Zhang. Managers: Duc Le, Ozlem Kalinli (AI Speech team).

Developed multilingual ASR technologies for on-device scenarios.

USC ISI
Research Assistant
Jul 2019 – Jul 2020 Los Angeles, CA, USA

Supervisor: Nanyun Peng.

NLP projects (Information Extraction).

Misc.

I love music! I play guitar (a lot), bass (a little), drums (a little) and violin (more than 10 years ago). I play and sing in a few (amateur) bands. We cover songs by our favorite artists. Check out some of our videos!