Synthing: A WaveNet-based Singing Voice Synthesizer Audio Samples

Audio samples for our team's project in the USC course EE599: Deep Learning Labs for Speech Processing. Code. Final Report.

Results on NIT Japanese Nursery dataset

Trained on NIT data. We took one of the training recordings as the target and resynthesized it using the true F0 contour and the generated MFSC and AP.

Target	Synthesized

Trained on NIT data. We generated previously unseen sequences by splicing together random clips from the NIT recordings and concatenating the corresponding F0 contours and phonemes of each clip in the same order.

Target	Synthesized
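The splicing procedure described above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual code: the function name `splice_clips`, the dictionary layout of a clip, the `hop_size` parameter, and the toy data are all assumptions made for the example. It assumes each clip stores its audio samples together with a frame-level F0 contour and frame-level phoneme labels, so that concatenating all three in the same order keeps them aligned.

```python
import random


def splice_clips(clips, num_clips, hop_size, seed=None):
    """Concatenate randomly chosen clips, keeping audio, F0 and phonemes aligned.

    Hypothetical clip layout (an assumption for this sketch):
      clip["audio"]    -> list of samples, hop_size samples per frame
      clip["f0"]       -> list with one F0 value per frame (0.0 = unvoiced)
      clip["phonemes"] -> list with one phoneme label per frame
    """
    rng = random.Random(seed)
    chosen = [rng.choice(clips) for _ in range(num_clips)]

    audio, f0, phonemes = [], [], []
    for clip in chosen:
        frames = len(clip["f0"])
        # Refuse clips whose features are not aligned frame by frame.
        if len(clip["phonemes"]) != frames:
            raise ValueError("F0 and phoneme sequences differ in length")
        if len(clip["audio"]) != frames * hop_size:
            raise ValueError("audio length does not match the frame count")
        # Append all three streams in the same order.
        audio.extend(clip["audio"])
        f0.extend(clip["f0"])
        phonemes.extend(clip["phonemes"])

    return {"audio": audio, "f0": f0, "phonemes": phonemes}


# Toy data standing in for real recordings (2 samples per frame).
toy_clips = [
    {"audio": [0.0] * 4, "f0": [220.0, 220.0], "phonemes": ["a", "a"]},
    {"audio": [0.1] * 6, "f0": [0.0, 247.0, 247.0], "phonemes": ["sil", "k", "o"]},
]

spliced = splice_clips(toy_clips, num_clips=3, hop_size=2, seed=0)
print(len(spliced["audio"]), len(spliced["f0"]), len(spliced["phonemes"]))
```

In this sketch the spliced audio serves as the target, while the spliced F0 contour and phoneme sequence would serve as the model inputs.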

Results on self-created dataset

Trained on the self-created dataset. We resynthesized recordings from our self-curated Coldplay dataset using the true F0 and AP, with MFSCs generated by the harmonic submodel.

Target	Synthesized

Synthing: A WaveNet-based Singing Voice Synthesizer

Mu Yang, James Bunning, Shiyu Mou, Sharada Murali, Yixin Yang

University of Southern California, USA
