TY - JOUR
T1 - Lognormality of the distribution of Japanese sentence lengths
AU - Furuhashi, Sho
AU - Hayakawa, Yoshinori
PY - 2012/3
Y1 - 2012/3
N2 - The lengths of sentences in written texts have been reported to exhibit characteristic distributions that resemble lognormal distributions. However, the mechanism responsible for such lognormality is unclear. In this quantitative study, we analyze over 10,000 Japanese sentences from out-of-copyright Japanese texts stored on Aozora Bunko. We first confirm that sentence length distributions can be better represented by the lognormal function than by other functions (e.g., the gamma distribution). Next, under the assumption that each sentence is generated by a hierarchical branching process in terms of dependency trees, we test whether the composition of sentences can be explained by a simple multiplicative process by utilizing the Japanese dependency analyzer CaboCha. The results imply that the lognormality of sentence length distributions originates from the dependency tree depth and that a simple multiplicative model cannot accurately model the processes involved in generating sentences.
KW - Dependency tree
KW - Japanese dependency structure
KW - Lognormal distribution
KW - Multiplicative process
KW - Sentence length
UR - http://www.scopus.com/inward/record.url?scp=84858037745&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858037745&partnerID=8YFLogxK
U2 - 10.1143/JPSJ.81.034004
DO - 10.1143/JPSJ.81.034004
M3 - Article
AN - SCOPUS:84858037745
SN - 0031-9015
VL - 81
JO - Journal of the Physical Society of Japan
JF - Journal of the Physical Society of Japan
IS - 3
M1 - 034004
ER -