Information about Zipf Mandelbrot Law

Zipf-Mandelbrot
(Figures: probability mass function; cumulative distribution function)
Parameters: $N \in \{1, 2, 3, \ldots\}$ (integer), $q \in [0, \infty)$ (real), $s > 0$ (real)
Support: $k \in \{1, 2, \ldots, N\}$
Probability mass function (pmf): $\frac{1/(k+q)^s}{H_{N,q,s}}$
Cumulative distribution function (cdf): $\frac{H_{k,q,s}}{H_{N,q,s}}$
Mean: $\frac{H_{N,q,s-1}}{H_{N,q,s}} - q$
Mode: $1$
In probability theory and statistics, the Zipf-Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who suggested a simpler distribution called Zipf's law, and the mathematician Benoît Mandelbrot (1924-2010), who subsequently generalized it.

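For rank $k$ among $N$ ranked elements, the probability mass function is given by:

$$f(k; N, q, s) = \frac{1/(k+q)^s}{H_{N,q,s}}, \qquad k = 1, 2, \ldots, N,$$

where $H_{N,q,s}$ is given by:

$$H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i+q)^s},$$

which may be thought of as a generalization of a harmonic number. In the limit as $N$ approaches infinity (which requires $s > 1$ for the sum to converge), this becomes the Hurwitz zeta function $\zeta(s, q+1)$. For finite $N$ and $q = 0$ the Zipf-Mandelbrot law becomes Zipf's law. For infinite $N$ and $q = 0$ it becomes a zeta distribution.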
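
As a numerical check of these definitions, the following Python sketch evaluates $H_{N,q,s} = \sum_{i=1}^{N} (i+q)^{-s}$, the pmf $(k+q)^{-s}/H_{N,q,s}$, the cdf $H_{k,q,s}/H_{N,q,s}$ and the mean by direct summation. The function names (`generalized_harmonic`, `zm_pmf`, `zm_cdf`, `zm_mean`) are hypothetical, and plain floating-point summation is assumed to be accurate enough for moderate $N$. The mean uses the identity $E[k] = H_{N,q,s-1}/H_{N,q,s} - q$, which follows from writing $k = (k+q) - q$ inside the sum.

```python
def generalized_harmonic(n: int, q: float, s: float) -> float:
    """H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)**s, summed directly."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zm_pmf(k: int, n: int, q: float, s: float) -> float:
    """Zipf-Mandelbrot pmf; zero outside the support {1, ..., n}."""
    if not 1 <= k <= n:
        return 0.0
    return (1.0 / (k + q) ** s) / generalized_harmonic(n, q, s)


def zm_cdf(k: int, n: int, q: float, s: float) -> float:
    """P(K <= k) = H_{k,q,s} / H_{n,q,s}, clamped to the support."""
    if k < 1:
        return 0.0
    k = min(k, n)
    return generalized_harmonic(k, q, s) / generalized_harmonic(n, q, s)


def zm_mean(n: int, q: float, s: float) -> float:
    """Closed-form mean: H_{n,q,s-1} / H_{n,q,s} - q."""
    return generalized_harmonic(n, q, s - 1) / generalized_harmonic(n, q, s) - q


if __name__ == "__main__":
    N, q, s = 100, 2.7, 1.1  # illustrative parameter values (assumed)
    total = sum(zm_pmf(k, N, q, s) for k in range(1, N + 1))
    direct_mean = sum(k * zm_pmf(k, N, q, s) for k in range(1, N + 1))
    print(total, direct_mean, zm_mean(N, q, s))
```

With $q = 0$ and $s = 1$ the sketch reduces to Zipf's law, so `zm_pmf(2, N, 0.0, 1.0)` is half of `zm_pmf(1, N, 0.0, 1.0)`. Since the pmf is decreasing in $k$ for $q \ge 0$ and $s > 0$, the mode is always $1$.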

Applications

The distribution of words ranked by their frequency in a typical corpus of natural-language text generally follows a power law, known as Zipf's law.

If one plots the frequency rank of words in a large corpus of text against their number of occurrences (or relative frequencies) on logarithmic axes, the points fall approximately on a straight line, that is, a power-law distribution with exponent close to one (but see Gelbukh and Sidorov 2001).
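
The rank-frequency plot described above can be reproduced with a short Python sketch that counts words, sorts the counts, and fits a straight line to log(count) against log(rank). The tokenizer (lowercased runs of letters and apostrophes) and the function names `rank_frequency` and `loglog_slope` are illustrative assumptions. An ordinary least-squares fit on log-log axes is only a rough estimator of the exponent; maximum-likelihood fitting is generally preferred for careful work.

```python
import math
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, int]]:
    """Return (rank, count) pairs, rank 1 being the most frequent word."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(pairs) -> float:
    """Least-squares slope of log(count) versus log(rank).

    Needs at least two distinct ranks, otherwise the x-variance is zero.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic counts that follow an exact 1/rank law give a slope of about -1.
    ideal = [(r, 1000.0 / r) for r in range(1, 51)]
    print(loglog_slope(ideal))
```

On a large natural-language corpus the estimated slope is typically close to −1. A positive Mandelbrot shift $q$ appears as a flattening of the curve at the lowest ranks, since $(k+q)^{-s}$ varies slowly while $k$ is small compared with $q$.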

References and links

This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.