Title:
Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation
Author(s):
Pascual, Santiago; Bonafonte Cávez, Antonio
Other authors:
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
Abstract:
Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis with multiple speakers. Some hidden layers are shared by all the speakers, while each speaker has its own specific output layer. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we also tackle speaker adaptation by adding a new output branch to the model and training it without modifying the already optimized base model. This fine-tuning method achieves better results than training the new speaker from scratch with its own model.
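The shared-layers-plus-per-speaker-heads idea described in the abstract can be sketched as follows. This is a hypothetical, simplified illustration (plain feed-forward layers with NumPy standing in for the paper's RNN-LSTM trunk, and invented names like `W_shared`, `heads`, and `forward`), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HID_DIM, OUT_DIM = 8, 16, 4

# Hidden layers shared by all speakers (stand-in for the shared LSTM layers).
W_shared = [rng.standard_normal((IN_DIM, HID_DIM)) * 0.1,
            rng.standard_normal((HID_DIM, HID_DIM)) * 0.1]

# One specific output branch (head) per speaker.
heads = {spk: rng.standard_normal((HID_DIM, OUT_DIM)) * 0.1
         for spk in ("speaker_a", "speaker_b")}

def forward(x, speaker):
    h = x
    for W in W_shared:           # layers shared by every speaker
        h = np.tanh(h @ W)
    return h @ heads[speaker]    # speaker-specific output layer

# Speaker adaptation: attach a fresh output branch and train only its
# weights, leaving the optimized shared trunk untouched.
heads["new_speaker"] = rng.standard_normal((HID_DIM, OUT_DIM)) * 0.1

x = rng.standard_normal((3, IN_DIM))   # batch of 3 input frames
y = forward(x, "new_speaker")
print(y.shape)                          # one acoustic-feature vector per frame
```

The design point is that adaptation only grows the head dictionary; the shared parameters are never modified, which is what makes fine-tuning a new speaker cheap relative to training a full model from scratch.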
Peer Reviewed
Subject(s):
- Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal
- Signal processing
- Speech synthesis
- Learning (artificial intelligence)
- Recurrent neural nets
- Speaker recognition
- Tractament del senyal
Document type:
Article - Published version; Conference object
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)