Synthing: A WaveNet-based Singing Voice Synthesizer

Mu Yang, James Bunning, Shiyu Mou, Sharada Murali, Yixin Yang

University of Southern California, USA

Audio samples from our team's project for the USC course EE599: Deep Learning Labs for Speech Processing. Code. Final Report.

Results on NIT Japanese Nursery dataset

  • Trained on NIT data. We took one of the training recordings as the target and resynthesized it using the true F0 contour with generated MFSC and AP.
  • Target Synthesized
  • Trained on NIT data. We generated previously unseen sequences by splicing together random clips from the NIT recordings and concatenating the corresponding F0 contours and phonemes for each audio clip in the same order.
  • Target Synthesized
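
The splicing procedure described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `splice_random_clips`, the dict layout of each recording, the fixed clip length, and the hop size are all assumptions made for the example. It only shows the idea of cutting the audio, F0 contour, and phoneme labels at the same frame boundaries so the three streams stay aligned after concatenation.

```python
import random


def splice_random_clips(recordings, n_clips, clip_frames, hop, seed=0):
    """Concatenate random, frame-aligned clips into one new sequence.

    Hypothetical layout (an assumption for this sketch): each recording
    is a dict with
      "audio":    list of samples, length == number of frames * hop
      "f0":       list of per-frame F0 values
      "phonemes": list of per-frame phoneme labels
    """
    rng = random.Random(seed)
    # Only recordings long enough to hold one full clip are eligible.
    eligible = [r for r in recordings if len(r["f0"]) >= clip_frames]
    if not eligible:
        raise ValueError("no recording is long enough for the clip length")

    out = {"audio": [], "f0": [], "phonemes": []}
    for _ in range(n_clips):
        rec = rng.choice(eligible)
        # Pick a random start frame so the clip fits inside the recording.
        start = rng.randrange(len(rec["f0"]) - clip_frames + 1)
        end = start + clip_frames
        # Cut all three streams at the same frame boundaries.
        out["audio"].extend(rec["audio"][start * hop:end * hop])
        out["f0"].extend(rec["f0"][start:end])
        out["phonemes"].extend(rec["phonemes"][start:end])
    return out
```

Cutting at frame boundaries rather than at arbitrary sample positions keeps each audio sample paired with the F0 value and phoneme label of its frame. A real pipeline might also cut at phoneme or note boundaries to avoid audible discontinuities; that is not attempted here.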

Results on self-created dataset

  • Trained on our self-created dataset. We resynthesized recordings from the self-curated Coldplay dataset using the true F0 and AP, with MFSCs generated by the harmonic submodel.
  • Target Synthesized