Title:
|
LSTM neural network-based speaker segmentation using acoustic and language modelling
|
Author:
|
India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
Abstract:
|
This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to produce a robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system. |
Abstract:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing -Automatic speech recognition -Neural networks (Neurobiology) -Speaker segmentation -Neural language modelling -I-vectors -Speaker factors -LSTM neural networks |
Document type:
|
Article - Published version; Conference Object |
Published by:
|
International Speech Communication Association (ISCA)
|