A mispronunciation detection system with word-level aligned phoneme predictions.
A WaveNet-based singing voice synthesizer, built as the final project for USC course EE599: Deep Learning Lab for Speech Processing. It is a partial implementation of the paper [A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs](https://www.mdpi.com/2076-3417/7/12/1313).
A web crawler that collects song lyrics and their corresponding music genres. Several baseline classifiers, including Naive Bayes, SVM, and a neural approach (LSTM), are applied to identify a song's genre from its lyrics.
A parallel second-order-based equalizer for room impulse response calibration.
Train and deploy a Faster R-CNN framework to perform pedestrian detection in videos.