Lifelong Learning for Multilingual Text-To-Speech Synthesis Audio Samples
Audio samples for our paper: Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis. Code is available here.
We train on the following language sequence: German-Dutch-Chinese-Japanese. The final model, obtained after training on the entire sequence has finished, is used for speech synthesis. From the perspective of lifelong learning, our goal is to mitigate "catastrophic forgetting" of German, Dutch, and Chinese without sacrificing performance on Japanese.
Dataset:
We use the CSS10 corpus [1].
Systems:
- Reference: the ground-truth utterance recorded by a human speaker.
- Joint: all four languages are trained jointly (upper bound).
- Fine-tune: four languages are sequentially trained, without any lifelong learning techniques (lower bound).
- EWC: a regularization-based lifelong learning baseline.
- GEM: a replay-based lifelong learning baseline.
- Rdm. Samp.: our proposed lifelong learning replay scheme, with the Random Sampling strategy.
- Wtd. Samp.: our proposed lifelong learning replay scheme, with the Weighted Sampling strategy.
- Dual Samp.: our proposed lifelong learning replay scheme, with the proposed Dual Sampler.
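To make the difference between uniform and weighted replay concrete, here is a minimal sketch. It is not the implementation from the paper: the buffer layout, the utterance counts, and the inverse-frequency weighting are all assumptions chosen for illustration, and the Dual Sampler is not modeled at all.

```python
import random
from collections import Counter


def make_buffer(counts):
    """Build a hypothetical replay buffer of (language, utterance_id) pairs."""
    return [(lang, i) for lang, n in counts.items() for i in range(n)]


def random_replay_sample(buffer, k, rng):
    """Draw k distinct utterances uniformly at random (without replacement)."""
    return rng.sample(buffer, k)


def weighted_replay_sample(buffer, k, rng):
    """Draw k utterances (with replacement), weighting each one by the
    inverse of its language's frequency so that every language is about
    equally likely to be drawn. This weighting is an assumption."""
    counts = Counter(lang for lang, _ in buffer)
    weights = [1.0 / counts[lang] for lang, _ in buffer]
    return rng.choices(buffer, weights=weights, k=k)


def language_counts(sample):
    """Count how many sampled utterances belong to each language."""
    return Counter(lang for lang, _ in sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Deliberately imbalanced, made-up counts for the previously seen languages.
    buffer = make_buffer({"German": 900, "Dutch": 90, "Chinese": 10})
    print("random:  ", language_counts(random_replay_sample(buffer, 300, rng)))
    print("weighted:", language_counts(weighted_replay_sample(buffer, 300, rng)))
```

With an imbalanced buffer, uniform sampling mostly replays the largest language, whereas the weighted variant draws the smaller languages far more often, at the cost of repeating their utterances.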
German
Text | Reference | Joint | Fine-tune | EWC | GEM | Rdm. Samp. | Wtd. Samp. | Dual Samp. |
---|---|---|---|---|---|---|---|---|
Hanake war das reichste Mädchen am Biwasee, nicht bloß reich an der äußeren Schönheit, welche die Frauen ruhig und wunschlos macht, | ||||||||
Sowie Leuwenhoek seinen Feind Swammerdamm erblickte, riß er sich los mit der höchsten Anstrengung seiner letzten Kräfte, sprang zurück | ||||||||
Er schämte sich, daß das Vorurteil des Vaters ihn so habe befangen können, | ||||||||
Ich geb' ihm auf sein Werben meine Tochter zum Weibe; er zieht mit ihr nach Göthaborg, | ||||||||
etragen, niemand anders sei als die Nichte des optischen Künstlers Leuwenhoek, namens Dörtje Elverdink, | ||||||||
Dutch
Text | Reference | Joint | Fine-tune | EWC | GEM | Rdm. Samp. | Wtd. Samp. | Dual Samp. |
---|---|---|---|---|---|---|---|---|
Gewoonlijk verkrijgt men die gegevens met vrij samengestelde instrumenten, wier opgaven op zijn minst genomen twijfelachtig zijn, | ||||||||
Hij beschouwde hem nog als een onbegrepen vernuft, dat het bedrog der wereld moede, | ||||||||
of het niet beter was bij den kapitein in zijn kamer te gaan, hem eens ferm onder de oogen te zien en hem met blik en gebaar te tarten! | ||||||||
zooals gepantserde fregatten of ramschepen die hebben. Zoo zou dit onverklaarbaar verschijnsel zijn opgelost, of-er | ||||||||
Wij volgden hem alle drie, en eenige oogenblikken daarna kwamen wij, wonderbaarlijk gered, bij de boot van den visscher. | ||||||||
Chinese
Text | Reference | Joint | Fine-tune | EWC | GEM | Rdm. Samp. | Wtd. Samp. | Dual Samp. |
---|---|---|---|---|---|---|---|---|
dàn bùdà hējǐu le, yě hěnshǎo yǒu gōngfū tánxiántiān。 tā bànshì, jiān jiàoshū, shízài qínkuài dé kěyǐ。 | ||||||||
fēi tè xìucái yīnwèi shàngchéng qù bàoguān, bèi bùhǎo de gémìngdǎng jiǎn le biànzǐ, érqiě yòu pòfèi le èrshí qiān de shǎngqián, suǒyǐ quánjiā yě hàotáo le。 | ||||||||
zhèshí tā měngránjiān kànjiàn zhào dàyé xiàng tā bēn lái, érqiě shǒulǐ niē zháo yīzhī dà zhúgāng。 | ||||||||
dáyìng zháo, sìmiàn kànshí, què jiàn yīgè měinv̌ de liǎnlù zài qiángtóu shàng, xiàng tā yīxiào, yǐnqù le。 tā hěn gāoxīng; | ||||||||
dàn ān mén zhǐ kāi le yītiáo féng, bìng wú hēigǒu cóngzhōng chōngchū, wàng jìnqù zhǐyǒu yīgè lǎo nígū。 | ||||||||
Japanese
Text | Reference | Joint | Fine-tune | EWC | GEM | Rdm. Samp. | Wtd. Samp. | Dual Samp. |
---|---|---|---|---|---|---|---|---|
ano hito wa ichi nin de okonat teru n desu ka muron ichi nin desu seki wa? seki san wa kocchi yo。 kocchi ni yoga aru n desu mono | ||||||||
sore ni aitsu ga kuru to yakamashikut te ike nai kara ne nen wa shita de mo、 seishitsu no chigau kono imo-to wa、 | ||||||||
seki no tokoro ni iru ja ari mase n ka sorya fudan no hanashi yo。 watashi no iu no wa ima no koto yo。 ima doko ni irassharu ka tteyuu no yo。 | ||||||||
jufun gurai ugoi te aruko u ka aruku mai ka to mayot ta。 | ||||||||
mottomo kyokusetsu no o-i kakudo de、 arayuru ho-men ni hansha sa seru tegiwa wo itaru tokoro ni hakki shi te haba kara nai mono wa kanojo ni chigai nakat ta。 | ||||||||
References
[1] Kyubyong Park and Thomas Mulc, “CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages,” in Proc. Interspeech 2019, 2019, pp. 1566–1570.