Neural Audio Generation for Speech Synthesis

To access the full text documents, please follow this link: http://hdl.handle.net/2117/117980

Title:	Neural Audio Generation for Speech Synthesis
Author:	Dorca Saez, Georgina
Other authors:	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Bonafonte Cávez, Antonio
Abstract:	Most speech synthesis systems require a linguistic module to produce the features that drive the speech generation module. In this project, a system will be designed using a deep architecture and trained to produce either linguistic features or speech directly from the raw character representation.
Abstract:	Recently, neural networks have become the state of the art in speech synthesis from raw text, and they are now a powerful force in industry. In this project, we present an end-to-end, deep-learning-based TTS system able to generate a voice signal from characters. To do so, we developed a re-implementation of the unconditional SampleRNN neural vocoder and conditioned it on the output of an adaptation of MUSA, which predicts vocoder parameters from text.
Abstract (Spanish and Catalan versions, translated):	Recently, neural networks have become the state of the art for speech synthesis tasks and currently represent a powerful force in industry. In this project, we present a deep-learning-based text-to-speech (TTS) system capable of generating a speech signal from characters. To accomplish this, we developed an adaptation of MUSA, which predicts vocoder parameters from text, and used its output to condition a re-implementation of the unconditional SampleRNN neural vocoder.
Subject(s):	-UPC subject areas::Telecommunication engineering -Neural networks (Computer science) -Machine learning -Speech processing systems -Deep learning -Neural Nets -Speech processing
Rights:	Dissemination of the work is authorized under a Creative Commons or similar 'Attribution-NonCommercial-NoDerivs' license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Document type:	Bachelor Thesis
Published by:	Universitat Politècnica de Catalunya
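The abstracts describe a two-stage pipeline: an adapted MUSA model predicts vocoder parameters from text, and a re-implemented SampleRNN vocoder generates the waveform conditioned on those parameters. The following is a minimal sketch, not taken from the thesis, of two data-preparation steps such a pipeline typically needs. SampleRNN predicts each audio sample as a categorical distribution over quantized amplitude values; here we assume 256-class mu-law quantization (linear quantization is another common option). We also assume frame-rate vocoder parameters are aligned with the sample rate by simple repetition. The function names, the 16 kHz sampling rate and the 5 ms frame shift are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: 8-bit (256-class) mu-law quantization of the waveform.
CHANNELS = 256


def mu_law_encode(x: float, channels: int = CHANNELS) -> int:
    """Map a waveform sample in [-1, 1] to an integer class in [0, channels - 1]."""
    mu = channels - 1
    x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
    # Compress the amplitude logarithmically, keeping the sign.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest class index.
    return int(round((y + 1.0) / 2.0 * mu))


def mu_law_decode(q: int, channels: int = CHANNELS) -> float:
    """Invert mu_law_encode, up to quantization error."""
    mu = channels - 1
    y = 2.0 * q / mu - 1.0  # class index back to [-1, 1]
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def upsample_frames(frames: Sequence[Sequence[float]], hop: int) -> List[Sequence[float]]:
    """Repeat each frame-level vocoder feature vector `hop` times.

    The result holds one conditioning vector per audio sample, so a
    sample-level model can look up the acoustic context of every sample.
    """
    if hop <= 0:
        raise ValueError("hop must be positive")
    return [frame for frame in frames for _ in range(hop)]


if __name__ == "__main__":
    # Hypothetical setting: 16 kHz audio, 5 ms frame shift -> 80 samples per frame.
    hop = 16000 * 5 // 1000
    frames = [[0.1, 0.2], [0.3, 0.4]]  # two made-up feature frames
    cond = upsample_frames(frames, hop)
    print(len(cond))  # 160
    q = mu_law_encode(0.5)
    print(q, round(mu_law_decode(q), 3))
```

Repetition is the simplest alignment; a learned upsampling layer could be used instead. This record does not state which alignment or quantization scheme the thesis uses.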