Zipf’s Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

dc.contributor.author
Haro, Martín
dc.contributor.author
Serrà, Joan
dc.contributor.author
Herrera, Perfecto
dc.contributor.author
Corral, Álvaro
dc.date.accessioned
2020-11-12T10:39:51Z
dc.date.accessioned
2024-09-19T14:34:27Z
dc.date.available
2020-11-12T10:39:51Z
dc.date.available
2024-09-19T14:34:27Z
dc.date.issued
2012-01-01
dc.identifier.uri
http://hdl.handle.net/2072/377746
dc.description.abstract
Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, such database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
eng
dc.format.extent
10 p.
cat
dc.language.iso
eng
cat
dc.relation.ispartof
PLoS ONE
cat
dc.rights
Access to the contents of this document is conditional on acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source
RECERCAT (Dipòsit de la Recerca de Catalunya)
dc.subject.other
Mathematics
cat
dc.title
Zipf’s Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals
cat
dc.type
info:eu-repo/semantics/article
cat
dc.type
info:eu-repo/semantics/publishedVersion
cat
dc.subject.udc
51
cat
dc.embargo.terms
none
cat
dc.identifier.doi
10.1371/journal.pone.0033993
cat
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Documents

ACorral09MaRcAt.pdf

802.6Kb PDF

This item appears in the following collection(s)

CRM Articles [656]