Restricted Boltzmann machines for vector representation of speech in speaker recognition

dc.contributor
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.author
Ghahabi Esfahani, Omid
dc.contributor.author
Hernando Pericás, Francisco Javier
dc.date.issued
2018-01
dc.identifier
Ghahabi, O., Hernando, J. Restricted Boltzmann machines for vector representation of speech in speaker recognition. "Computer speech and language", January 2018, vol. 47, p. 16-29.
dc.identifier
0885-2308
dc.identifier
https://hdl.handle.net/2117/106743
dc.identifier
10.1016/j.csl.2017.06.007
dc.description.abstract
Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech by keeping the computational cost as low as possible and avoiding phonetic labels, which are not always accessible. The proposed vectors are based on both Gaussian Mixture Models (GMM) and Restricted Boltzmann Machines (RBM) and are referred to as GMM–RBM vectors. The role of the RBM is to learn the total speaker and session variability among background GMM supervectors. This RBM, referred to as the Universal RBM (URBM), is then used to transform unseen supervectors into the proposed low-dimensional vectors. Different activation functions for training the URBM and different transformation functions for extracting the proposed vectors are investigated. Finally, a variant of the Rectified Linear Unit (ReLU), referred to as variable ReLU (VReLU), is proposed. Experiments on core test condition 5 of NIST SRE 2010 show that the proposed vectors achieve results comparable to conventional i-vectors with a clearly lower computational load in the vector extraction process.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (published version)
dc.format
14 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Elsevier
dc.relation
http://www.sciencedirect.com/science/article/pii/S0885230816302923?via%3Dihub
dc.rights
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights
Open Access
dc.rights
Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.subject
UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject
Automatic speech recognition
dc.subject
Restricted Boltzmann machine
dc.subject
Deep learning
dc.subject
Variable rectified linear unit
dc.subject
Speaker recognition
dc.subject
GMM–RBM vector
dc.subject
i-vector
dc.subject
Automatic speech recognition (Reconeixement automàtic de la parla)
dc.title
Restricted Boltzmann machines for vector representation of speech in speaker recognition
dc.type
Article


