Power laws, Pareto distributions and Zipf's law
Journal article
Abstract: When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, forest fires, solar flares, moon craters and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
Acknowledgments: The author would like to thank Petter Holme, Cris Moore and Erik van Nimwegen for useful conversations, and Lada Adamic for the Web site hit data. This work was funded in part by the National Science Foundation under grant number DMS-0405348.
Notes:
*Power laws also occur in many situations other than the statistical distributions of quantities. For instance, Newton's famous $1/r^2$ law for gravity has a power-law form with exponent $\alpha = 2$. While such laws are certainly interesting in their own way, they are not the topic of this paper. Thus, for instance, there has in recent years been some discussion of the ‘allometric’ scaling laws seen in the physiognomy and physiology of biological organisms [17], but since these are not statistical distributions they will not be discussed here.
†http://linkage.rockefeller.edu/wli/zipf/.
‡This can be done using the so-called transformation method. If we can generate a random real number $r$ uniformly distributed in the range $0 \le r < 1$, then $x = x_{\min}(1 - r)^{-1/(\alpha - 1)}$ is a random power-law-distributed real number in the range $x_{\min} \le x < \infty$ with exponent $\alpha$.
Note that there has to be a lower limit $x_{\min}$ on the range; the power-law distribution diverges as $x \to 0$ (see section 2.1).
*See http://www.hpl.hp.com/research/idl/papers/ranking/ for a useful discussion of these and related points.
†The most common words in this case are, in order, ‘the’, ‘of’, ‘and’, ‘a’ and ‘to’, and the same is true for most written English texts. Interestingly, however, it is not true for spoken English. The most common words in spoken English are, in order, ‘I’, ‘and’, ‘the’, ‘to’ and ‘that’ [22].
*Sometimes the tail is also cut off because there is, for one reason or another, a limit on the largest value that may occur. An example is the finite-size effects found in critical phenomena (see section 4.5). In this case, equation (5) must be modified [20].
†Significantly more tenuous claims to power-law behaviour for other quantities have appeared elsewhere in the literature, for instance in the discussion of the distribution of the sizes of electrical blackouts [31, 32]. These, however, I consider insufficiently substantiated for inclusion in the present work.
*Also called the Eulerian integral of the first kind.
†This can be demonstrated by approximating the Γ-functions of equation (19) using Stirling's formula.
*This argument is sometimes called the ‘monkeys with typewriters’ argument, the monkey being the traditional exemplar of a random typist.
*Gambler's ruin is so called because a gambler's night of betting ends when his or her supply of money hits zero (assuming the gambling establishment declines to offer him or her a line of credit).
†The enthusiastic reader can easily derive this result for him or herself by expanding using the binomial theorem.
*Modern phylogenetic analysis, the quantitative comparison of species' genetic material, can provide a picture of the evolutionary tree and hence allow the accurate ‘cladistic’ assignment of species to taxa.
For prehistoric species, however, whose genetic material is not usually available, determination of evolutionary ancestry is difficult, so classification into taxa is based instead on morphology, i.e. on the shapes of organisms. It is widely accepted that such classifications are subjective and that the taxonomic assignments of fossil species are probably riddled with errors.
†To be fair, I consider the power law for the distribution of genus lifetimes to fall in the category of ‘tenuous’ identifications to which I alluded in the second footnote on p. 9. This theory should be taken with a pinch of salt.
*Yule's analysis of the process was considerably more involved than the one presented here, essentially because the theory of stochastic processes as we now know it did not yet exist in his time. The master equation method we employ is a relatively modern innovation, introduced in this context by Simon [35].
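Illustration (not part of the original record): the transformation method described in the notes above can be sketched in a few lines of Python. This is a minimal sketch assuming a continuous power law with exponent α > 1 and lower limit xmin > 0. The function name `sample_power_law` and its parameters are hypothetical, chosen here for illustration only.

```python
import random


def sample_power_law(alpha, x_min, rng=random):
    """Draw one sample from p(x) proportional to x**(-alpha) on x >= x_min.

    Uses the transformation method from the notes:
    x = x_min * (1 - r) ** (-1 / (alpha - 1)), with r uniform in [0, 1).
    """
    if alpha <= 1 or x_min <= 0:
        # The distribution is only normalizable for alpha > 1 and x_min > 0.
        raise ValueError("requires alpha > 1 and x_min > 0")
    r = rng.random()  # uniform in [0, 1), so 1 - r lies in (0, 1]
    return x_min * (1.0 - r) ** (-1.0 / (alpha - 1.0))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded generator, for reproducibility only
    samples = [sample_power_law(2.5, 1.0, rng) for _ in range(5)]
    print(samples)
```

Every sample is at least xmin, because 1 − r never exceeds 1 and the exponent is negative. One sanity check is that the sample median should approach xmin · 2^(1/(α−1)), the value of x at which the cumulative distribution reaches one half.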
Year of publication: 2005
Authors: MEJ Newman
Publisher: Taylor & Francis
Source: Contemporary Physics
Keywords: Complex Network Analysis Techniques, Complex Systems and Time Series Analysis, Statistical Mechanics and Entropy
Other links: Contemporary Physics (HTML)
arXiv (Cornell University) (PDF)
arXiv (Cornell University) (HTML)
CiteSeer X (The Pennsylvania State University) (PDF)
CiteSeer X (The Pennsylvania State University) (HTML)
DataCite API (HTML)
Open access: green
Volume: 46
Issue: 5
Pages: 323–351