Publications

(2024). Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. arXiv preprint.

PDF Audio Samples

(2024). DiariST: Streaming Speech Translation with Speaker Diarization. ICASSP 2024.

PDF

(2023). What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model. Interspeech 2023.

PDF

(2023). Learning ASR Pathways: A Sparse Multilingual ASR Model. ICASSP 2023.

PDF

(2022). Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment. Interspeech 2022 (Oral).

PDF Audio Samples

(2022). Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis. ICASSP 2022.

PDF Code Audio Samples

(2022). Joint Hypoglycemia Prediction and Glucose Forecasting via Deep Multi-task Learning. ICASSP 2022.

PDF

(2021). EventPlus: A Temporal Event Understanding Pipeline. NAACL 2021 (Demonstrations).

PDF Code Demo

(2020). A CNN-based Active Learning Framework to Identify Mycobacteria in Digitized Ziehl-Neelsen Stained Human Tissues. Computerized Medical Imaging and Graphics 2020.

PDF

(2020). Biomedical Event Extraction with Hierarchical Knowledge Graphs. EMNLP 2020 (Findings).

PDF Code

(2019). Deep Structured Neural Network for Event Temporal Relation Extraction. CoNLL 2019.

PDF Code

(2019). Spoken Language Intent Detection using Confusion2Vec. Interspeech 2019.

PDF Dataset

(2018). Synthing: A WaveNet-based Singing Voice Synthesizer. USC course EE599: Deep Learning Lab for Speech Processing.

PDF Code Dataset Audio Samples

(2018). Collection and Classification of Lyrics. USC course CSCI544: Applied Natural Language Processing.

PDF Code

(2017). Faster-RCNN for Pedestrian Detection in Videos. Graduation Project for Undergraduates at Chongqing University.
