Synthing: A WaveNet-based Singing Voice Synthesizer

Mu Yang, James Bunning, Shiyu Mou, Sharada Murali, Yixin Yang

University of Southern California, USA

Audio Samples

Audio samples for our team's course project for the USC course EE599: Deep Learning Labs for Speech Processing. Code. Final Report.

Results on the NIT Japanese Nursery dataset

  • Trained on NIT data. We took one of the training recordings as the target and resynthesized it using the true F0 contour together with the model-generated MFSC and AP (see the resynthesis sketch after this list).
  • Audio: Target / Synthesized
  • Trained on NIT data. We generated previously unseen sequences by splicing together random clips from the NIT recordings and concatenating the corresponding F0 contours and phoneme sequences in the same order (see the splicing sketch after this list).
  • Audio: Target / Synthesized
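
A minimal sketch of the resynthesis step, not the project's actual code: it assumes a WORLD-style vocoder (pyworld), 48 kHz audio, 60-channel MFSCs, and that the mel warping is undone with the pseudo-inverse of a librosa mel filterbank; all of these specifics are assumptions for illustration.

# Sketch of vocoding from a true F0 contour plus generated MFSC and AP.
# All constants and the mel-inversion approach are assumptions, not taken
# from the report.
import numpy as np
import librosa
import pyworld as pw

FS = 48000        # sample rate (assumed)
N_FFT = 2048      # WORLD FFT size at this sample rate
N_MELS = 60       # number of MFSC channels (assumed)

def mfsc_to_spectral_envelope(mfsc: np.ndarray) -> np.ndarray:
    """Map (frames, N_MELS) log mel spectra back to a
    (frames, N_FFT // 2 + 1) linear-frequency spectral envelope."""
    mel_basis = librosa.filters.mel(sr=FS, n_fft=N_FFT, n_mels=N_MELS)
    inv_basis = np.linalg.pinv(mel_basis)            # (N_FFT//2+1, N_MELS)
    sp = np.exp(mfsc) @ inv_basis.T                  # undo log, undo mel warp
    return np.maximum(sp, 1e-16).astype(np.float64)  # keep strictly positive

def resynthesize(f0: np.ndarray, mfsc: np.ndarray, ap: np.ndarray) -> np.ndarray:
    """Vocode with the true F0 contour plus model-generated MFSC and AP,
    as in the samples above. ap must have shape (frames, N_FFT//2 + 1)."""
    sp = mfsc_to_spectral_envelope(mfsc)
    return pw.synthesize(np.ascontiguousarray(f0, dtype=np.float64), sp,
                         np.ascontiguousarray(ap, dtype=np.float64),
                         FS, frame_period=5.0)

# Usage with precomputed features, e.g.:
#   wav = resynthesize(np.load("f0.npy"), np.load("mfsc_gen.npy"), np.load("ap.npy"))
#   then write `wav` out with soundfile.write("synthesized.wav", wav, FS)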
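The splicing step is just time-aligned concatenation of per-clip features. A minimal sketch, with hypothetical variable names, assuming each clip comes with a framewise F0 array and a phoneme label sequence:

# Splice random clips into a previously unseen sequence, keeping the
# waveform, F0 contour, and phoneme labels in the same order.
import random
import numpy as np

def splice_clips(clips):
    """Concatenate (audio, f0, phonemes) tuples into one sequence."""
    audio = np.concatenate([c[0] for c in clips])
    f0 = np.concatenate([c[1] for c in clips])
    phonemes = [p for c in clips for p in c[2]]
    return audio, f0, phonemes

# e.g. pick a few random NIT clips (loading omitted) and splice them:
#   audio, f0, phonemes = splice_clips(random.sample(all_nit_clips, k=4))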

Results on the self-created dataset

  • Trained on the self-created dataset. We resynthesized recordings from our self-curated Coldplay dataset using the true F0 and AP together with MFSCs generated by the harmonic submodel.
  • Audio: Target / Synthesized