Title:
|
Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks
|
Author:
|
Pons Puig, Jordi; Gong, Rong; Serra, Xavier
|
Abstract:
|
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held 23 to 27 October 2017 in Suzhou, China. |
Abstract:
|
This paper introduces a new score-informed method for the
segmentation of jingju a cappella singing phrases into syllables.
The proposed method estimates the most likely sequence
of syllable boundaries given the estimated syllable
onset detection function (ODF) and the score of the phrase.
We first examine the structure of jingju syllables
and propose a definition of the term “syllable onset”.
Then, we identify the challenges that jingju a
cappella singing poses. Further, we investigate how to
improve the syllable ODF estimation with convolutional
neural networks (CNNs). We propose a novel CNN architecture
that efficiently captures different time-frequency
scales for estimating syllable onsets. In addition,
we propose using a score-informed Viterbi algorithm
instead of thresholding the onset function, because the
available musical knowledge (the score) can be
used to inform the Viterbi algorithm and so overcome the identified
challenges. The proposed method outperforms the
state of the art in syllable segmentation for jingju a cappella
singing. We further provide an analysis of the segmentation
errors, which points to possible research directions. |
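The contrast the abstract draws between thresholding the ODF and decoding it with score knowledge can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_boundaries`, the Gaussian duration prior, and the `sigma` tolerance are assumptions made for illustration only. The sketch picks one onset per syllable by dynamic programming, rewarding frames with a high ODF value and penalising syllable lengths that deviate from the relative durations given in the score.

```python
import numpy as np

def decode_boundaries(odf, score_durations, sigma=0.2):
    """Pick syllable onsets by dynamic programming (hypothetical sketch).

    odf: 1-D array of onset strengths in (0, 1], one value per frame.
    score_durations: relative syllable durations taken from the score.
    sigma: assumed tolerance of the duration prior, as a fraction of the phrase.
    Returns one onset frame per syllable; the first is fixed at frame 0.
    """
    odf = np.asarray(odf, dtype=float)
    n_frames = len(odf)
    durs = np.asarray(score_durations, dtype=float)
    durs = durs / durs.sum() * n_frames      # expected durations in frames
    n_syl = len(durs)
    log_odf = np.log(np.clip(odf, 1e-6, None))

    def log_prior(length, expected):
        # Assumed Gaussian penalty on deviation from the score duration.
        return -0.5 * ((length - expected) / (sigma * n_frames)) ** 2

    # best[k, t]: best log-score with the onset of syllable k at frame t
    best = np.full((n_syl, n_frames), -np.inf)
    back = np.zeros((n_syl, n_frames), dtype=int)
    best[0, 0] = 0.0                         # the phrase starts at frame 0

    for k in range(1, n_syl):
        for t in range(k, n_frames):
            prev = np.arange(k - 1, t)       # candidate onsets of syllable k-1
            cand = best[k - 1, prev] + log_prior(t - prev, durs[k - 1])
            j = int(np.argmax(cand))
            best[k, t] = cand[j] + log_odf[t]
            back[k, t] = prev[j]

    # The last syllable is assumed to end at the end of the phrase.
    final = best[n_syl - 1] + log_prior(n_frames - np.arange(n_frames), durs[-1])
    t = int(np.argmax(final))
    onsets = [t]
    for k in range(n_syl - 1, 0, -1):        # backtrack to recover earlier onsets
        t = int(back[k, t])
        onsets.append(t)
    return onsets[::-1]
```

In this sketch a strong spurious ODF peak that would survive a fixed threshold can still be rejected when accepting it would force the neighbouring syllables to have durations far from those in the score.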
Abstract:
|
This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and the European Research Council under the European Union’s Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583). |
Subject(s):
|
-Music -- Computer science |
Rights:
|
© Jordi Pons, Rong Gong and Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Jordi Pons∗, Rong Gong∗ and Xavier Serra. “Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
http://creativecommons.org/licenses/by/4.0/ |
Document type:
|
Conference Object Article - Published version |
Published by:
|
International Society for Music Information Retrieval (ISMIR)
|