Title:
|
Audio to score matching by combining phonetic and duration information
|
Author(s):
|
Gong, Rong; Pons Puig, Jordi; Serra, Xavier
|
Abstract:
|
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held from 23 to 27 October 2017 in Suzhou, China. |
Abstract:
|
We approach the singing phrase audio to score matching
problem by using phonetic and duration information – with
a focus on studying the jingju a cappella singing case. We
argue that, due to the existence of a basic melodic contour
for each mode in jingju music, only using melodic
information (such as pitch contour) will result in an ambiguous
matching. This leads us to propose a matching
approach based on the use of phonetic and duration
information. Phonetic information is extracted with an
acoustic model shaped with our data, and duration information
is considered with the Hidden Markov Models
(HMMs) variants we investigate. We build a model for
each lyric path in our scores and we achieve the matching
by ranking the posterior probabilities of the decoded
most likely state sequences. Three acoustic models are investigated:
(i) convolutional neural networks (CNNs), (ii)
deep neural networks (DNNs) and (iii) Gaussian mixture
models (GMMs). Also, two duration models are compared:
(i) hidden semi-Markov model (HSMM) and (ii)
post-processor duration model. Results show that CNNs
perform better in our (small) audio dataset and also that
HSMM outperforms the post-processor duration model. |
Abstract:
|
This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and by the European Research Council under the European Union’s Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583). |
Subject(s):
|
-Music -- Computer science |
Rights:
|
© Rong Gong, Jordi Pons and Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Rong Gong, Jordi Pons and Xavier Serra. “Audio to score matching by combining phonetic and duration information”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
http://creativecommons.org/licenses/by/4.0/ |
Document type:
|
Conference object Article - Published version |
Publisher:
|
International Society for Music Information Retrieval (ISMIR)
|