Restricted Boltzmann machines for vector representation of speech in speaker recognition

Ghahabi Esfahani, Omid; Hernando Pericás, Francisco Javier

dc.contributor

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions

dc.contributor

Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla

dc.contributor.author

Ghahabi Esfahani, Omid

dc.contributor.author

Hernando Pericás, Francisco Javier

dc.date.issued

2018-01

dc.identifier

Ghahabi, O., Hernando, J. Restricted Boltzmann machines for vector representation of speech in speaker recognition. "Computer Speech and Language", January 2018, vol. 47, p. 16-29.

dc.identifier

0885-2308

dc.identifier

https://hdl.handle.net/2117/106743

dc.identifier

10.1016/j.csl.2017.06.007

dc.description.abstract

Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech that keeps the computational cost as low as possible and avoids phonetic labels, which are not always available. The proposed vectors are based on both Gaussian Mixture Models (GMM) and Restricted Boltzmann Machines (RBM) and are referred to as GMM–RBM vectors. The role of the RBM is to learn the total speaker and session variability among background GMM supervectors. This RBM, referred to as the Universal RBM (URBM), is then used to transform unseen supervectors into the proposed low-dimensional vectors. Different activation functions for training the URBM and different transformation functions for extracting the proposed vectors are investigated. Finally, a variant of Rectified Linear Units (ReLU), referred to as variable ReLU (VReLU), is proposed. Experiments on core test condition 5 of NIST SRE 2010 show that results comparable to those of conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.
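The abstract describes the GMM–RBM pipeline only at a high level. As an illustration, here is a minimal, hypothetical pure-Python sketch of the URBM idea: train an RBM on background supervectors, then map an unseen supervector to a low-dimensional vector through the hidden-layer pre-activations followed by a transformation function. The following are assumptions and are not taken from this record: Gaussian-Bernoulli units with unit-variance visibles, one-step contrastive divergence (CD-1) training, per-dimension mean/variance normalization, and the names `ToyURBM`, `fit_normalizer` and `normalize`. Random vectors stand in for real GMM supervectors. VReLU is not implemented because its definition is not given here.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_normalizer(data):
    """Per-dimension mean and std of background supervectors (assumed preprocessing)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / n for i in range(d)]
    std = [math.sqrt(sum((row[i] - mean[i]) ** 2 for row in data) / n) or 1.0
           for i in range(d)]
    return mean, std


def normalize(v, mean, std):
    return [(x - m) / s for x, m, s in zip(v, mean, std)]


class ToyURBM:
    """Hypothetical Gaussian-Bernoulli RBM (unit-variance visibles) trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_preact(self, v):
        # a_j = b_j + sum_i v_i * W_ij
        return [self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v)))
                for j in range(len(self.b_hid))]

    def visible_mean(self, h):
        # Mean of the Gaussian visible units given the hiddens: c_i + sum_j W_ij * h_j
        return [self.b_vis[i] + sum(self.W[i][j] * h[j] for j in range(len(h)))
                for i in range(len(self.b_vis))]

    def cd1_step(self, v0, lr):
        p_h0 = [sigmoid(a) for a in self.hidden_preact(v0)]
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in p_h0]  # sample binary hiddens
        v1 = self.visible_mean(h0)  # mean-field reconstruction of the visibles
        p_h1 = [sigmoid(a) for a in self.hidden_preact(v1)]
        for i in range(len(v0)):
            for j in range(len(p_h0)):
                self.W[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(p_h0)):
            self.b_hid[j] += lr * (p_h0[j] - p_h1[j])
        return sum((a - b) ** 2 for a, b in zip(v0, v1))  # squared reconstruction error

    def train(self, data, epochs=20, lr=0.01):
        data = list(data)  # shuffle a copy, not the caller's list
        history = []
        for _ in range(epochs):
            self.rng.shuffle(data)
            history.append(sum(self.cd1_step(v, lr) for v in data) / len(data))
        return history

    def transform(self, v, fn="linear"):
        """Map a normalized supervector to a low-dimensional vector."""
        a = self.hidden_preact(v)
        if fn == "linear":
            return a
        if fn == "relu":
            return [max(0.0, x) for x in a]
        if fn == "sigmoid":
            return [sigmoid(x) for x in a]
        raise ValueError(f"unknown transformation function: {fn}")


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-ins for background supervectors: 40 vectors of dimension 12
    background = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(40)]
    mean, std = fit_normalizer(background)
    urbm = ToyURBM(n_visible=12, n_hidden=4)
    urbm.train([normalize(v, mean, std) for v in background], epochs=10)
    unseen = normalize([rng.gauss(0.0, 1.0) for _ in range(12)], mean, std)
    print(urbm.transform(unseen, fn="relu"))
```

Switching `fn` between `"linear"`, `"relu"` and `"sigmoid"` loosely mirrors the comparison of transformation functions mentioned in the abstract. In this sketch, extracting a vector costs one matrix-vector product plus an element-wise function, which suggests why an RBM-based transform can be cheap at extraction time. The sketch does not reproduce the paper's actual training setup or cost figures.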

dc.description.abstract

Peer Reviewed

dc.description.abstract

Postprint (published version)

dc.format

14 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Elsevier

dc.relation

http://www.sciencedirect.com/science/article/pii/S0885230816302923?via%3Dihub

dc.rights

http://creativecommons.org/licenses/by-nc-nd/3.0/es/

dc.rights

Open Access

dc.rights

Attribution-NonCommercial-NoDerivs 3.0 Spain

dc.subject

UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing

dc.subject

Automatic speech recognition

dc.subject

Restricted Boltzmann machine

dc.subject

Deep learning

dc.subject

Variable rectified linear unit

dc.subject

Speaker recognition

dc.subject

GMM–RBM vector

dc.subject

i-vector

dc.subject

Automatic speech recognition

dc.title

Restricted Boltzmann machines for vector representation of speech in speaker recognition

dc.type

Article
