A scaling law beyond Zipf's law and its relation with Heaps' law

dc.contributor.author
Font-Clos, F.
dc.contributor.author
Boleda, G.
dc.contributor.author
Corral, A.
dc.date.accessioned
2020-10-13T11:46:08Z
dc.date.accessioned
2024-09-19T13:15:27Z
dc.date.available
2020-10-13T11:46:08Z
dc.date.available
2024-09-19T13:15:27Z
dc.date.issued
2014-01-01
dc.identifier.uri
http://hdl.handle.net/2072/377507
dc.description.abstract
The dependence on text length of the statistical properties of word occurrences has long been considered a severe limitation on the usefulness of quantitative linguistics. We propose a simple scaling form for the distribution of absolute word frequencies which uncovers the robustness of this distribution as text grows. In this way, the shape of the distribution is always the same and it is only a scale parameter which increases linearly with text length. By analyzing very long novels we show that this behavior holds both for raw, unlemmatized texts and for lemmatized texts. For the latter case, the word-frequency distribution is well fit by a double power law, maintaining Zipf's exponent value $\gamma \simeq 2$ for large frequencies but yielding a smaller exponent in the low-frequency regime. The growth of the distribution with text length allows us to estimate the size of the vocabulary at each step and to propose an alternative to Heaps' law, which turns out to be intimately connected to Zipf's law, thanks to the scaling behavior.
eng
dc.format.extent
23 p.
cat
dc.language.iso
eng
cat
dc.relation.ispartof
CRM Preprints
cat
dc.rights
Access to the contents of this document is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source
RECERCAT (Dipòsit de la Recerca de Catalunya)
dc.subject.other
Mathematics
cat
dc.title
A scaling law beyond Zipf's law and its relation with Heaps' law
cat
dc.type
info:eu-repo/semantics/preprint
cat
dc.subject.udc
51
cat
dc.embargo.terms
none
cat
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Documents

A61-zipfs_scalingMaRcAt.pdf

564.9Kb PDF
