Likelihood function
Likelihood as a solitary term is a shorthand for likelihood function. In non-technical usage, "likelihood" is a synonym for "probability", but throughout this article only the technical definition is used. Informally, if "probability" allows us to predict unknown outcomes based on known parameters, then "likelihood" allows us to determine unknown parameters based on known outcomes.
In a sense, likelihood works backwards from probability: given B, we use the conditional probability Pr(A|B) to reason about A, and, given A, we use the likelihood function L(B|A) to reason about B. This mode of reasoning is formalized in Bayes' theorem:
    Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)
In statistics, a likelihood function is a conditional probability function considered as a function of its second argument with its first argument held fixed, thus:
    b ↦ Pr(A | B = b)
and also any other function proportional to such a function. That is, the likelihood function for B is the equivalence class of functions
    L(b|A) = α Pr(A | B = b)
for any constant of proportionality α > 0. Thus the numerical value L(b|A) is immaterial; all that matters are ratios of the form
    L(b2|A) / L(b1|A)
since these are invariant with respect to the constant of proportionality.
For more about making inferences via likelihood functions, see also the method of maximum likelihood, and likelihood-ratio testing.
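The invariance of likelihood ratios under the constant of proportionality can be checked numerically. The sketch below is only an illustration under assumed data: the event A is taken to be "7 heads in 10 tosses" (a made-up observation), b is a candidate heads probability, and the function names are hypothetical.

```python
def binomial_kernel(p, heads, tosses):
    """Pr(data | p) without the binomial coefficient, which does not depend on p."""
    return p ** heads * (1 - p) ** (tosses - heads)


def likelihood_ratio(likelihood, b2, b1):
    """Ratio L(b2 | A) / L(b1 | A) for any representative of the likelihood class."""
    return likelihood(b2) / likelihood(b1)


# Hypothetical observation A: 7 heads in 10 tosses.
heads, tosses = 7, 10


def unscaled(p):
    # One representative of the equivalence class (alpha = 1).
    return binomial_kernel(p, heads, tosses)


def rescaled(p):
    # Another representative: alpha = 120, the binomial coefficient C(10, 7).
    return 120.0 * binomial_kernel(p, heads, tosses)


ratio_unscaled = likelihood_ratio(unscaled, 0.7, 0.5)
ratio_rescaled = likelihood_ratio(rescaled, 0.7, 0.5)
print(ratio_unscaled, ratio_rescaled)
```

Both representatives give the same ratio up to floating-point rounding, which is why only such ratios carry information.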
Concentrated likelihood
For a likelihood function of more than one parameter, it is sometimes possible to write some parameters as functions of other parameters, thereby reducing the number of independent parameters. (Each such function gives the value of its parameter that maximises the likelihood for given values of the other parameters.) This procedure is called concentration of the parameters and results in the concentrated likelihood function.
For example, consider a regression analysis model with normally distributed errors. For given values of the other parameters, the most likely value of the error variance is the variance of the residuals, that is, the mean of the squared residuals. The residuals depend on all other parameters. Hence the variance parameter can be written as a function of the other parameters.
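The regression case can be sketched in a few lines. This is a minimal illustration, not a general implementation: it assumes a hypothetical one-parameter model y = slope * x + error (no intercept), made-up data, and illustrative function names.

```python
import math


def residual_sum_of_squares(slope, xs, ys):
    """Sum of squared residuals for the assumed model y = slope * x + error."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))


def log_likelihood(slope, variance, xs, ys):
    """Full normal log-likelihood in both parameters (slope and error variance)."""
    n = len(xs)
    rss = residual_sum_of_squares(slope, xs, ys)
    return -0.5 * n * math.log(2 * math.pi * variance) - rss / (2 * variance)


def concentrated_variance(slope, xs, ys):
    """Variance that maximises the likelihood for a given slope: mean squared residual."""
    return residual_sum_of_squares(slope, xs, ys) / len(xs)


def concentrated_log_likelihood(slope, xs, ys):
    """Log-likelihood as a function of the slope alone, with the variance concentrated out."""
    return log_likelihood(slope, concentrated_variance(slope, xs, ys), xs, ys)


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope = 2.0

best_variance = concentrated_variance(slope, xs, ys)
profile_value = concentrated_log_likelihood(slope, xs, ys)
print(best_variance, profile_value)
```

For a fixed slope, any other variance gives a lower value of `log_likelihood` than `profile_value`, so a search over the slope alone is enough to locate the overall maximum.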
Historical remarks
Some early thoughts on likelihood appeared in a book by Thorvald N. Thiele published in 1889[1]. The first paper where the full idea of the "likelihood" appears was written by R.A. Fisher in 1922[2]: "On the mathematical foundations of theoretical statistics". In that paper, Fisher also uses the term "method of maximum likelihood". Fisher argues against inverse probability as a basis for statistical inferences, and instead proposes inferences based on likelihood functions.
Likelihood function of a parameterized model
Among many applications, we consider here one of broad theoretical and practical importance. Given a parameterized family of probability density functions
    x ↦ f(x | θ),
where θ is the parameter (in the case of discrete distributions, the probability density functions are probability "mass" functions), the likelihood function is
    L(θ | x) = f(x | θ),
where x is the observed outcome of an experiment. In other words, when f(x | θ) is viewed as a function of x with θ fixed, it is a probability density function, and when viewed as a function of θ with x fixed, it is a likelihood function.
Note: This is not the same as the probability that those parameters are the right ones, given the observed sample. Attempting to interpret the likelihood of a hypothesis given observed evidence as the probability of the hypothesis is a common error, with potentially disastrous real-world consequences in medicine, engineering or jurisprudence. See prosecutor's fallacy for an example of this.
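The two ways of viewing f(x | θ) can be illustrated with a discrete family. The sketch below assumes a binomial model with 10 trials and a made-up observation of 3 successes; the family, the numbers and the names are chosen for illustration and are not part of the text above.

```python
import math


def binomial_pmf(x, theta, n):
    """f(x | theta): probability of x successes in n trials with success probability theta."""
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)


n = 10

# View 1: theta fixed, x varies. This is a probability mass function, so it sums to 1.
total_over_outcomes = sum(binomial_pmf(x, 0.3, n) for x in range(n + 1))

# View 2: x fixed at the (hypothetical) observed value, theta varies. This is a likelihood.
observed = 3
grid = [i / 100 for i in range(101)]
likelihood_on_grid = [binomial_pmf(observed, theta, n) for theta in grid]
best_theta = grid[likelihood_on_grid.index(max(likelihood_on_grid))]

print(total_over_outcomes, best_theta)
```

The same function is evaluated in both views; only the argument that is allowed to vary changes.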
Example
For example, if I toss a coin with a probability pH of landing heads up ('H'), the probability of getting two heads in two trials ('HH') is pH^2. If pH = 0.5, then the probability of seeing two heads is 0.25. In symbols, we can say the above as
    Pr(HH | pH = 0.5) = 0.25.
Another way of saying this is to reverse it and say that "the likelihood of pH = 0.5, given the observation 'HH', is 0.25", i.e.,
    L(pH = 0.5 | HH) = Pr(HH | pH = 0.5) = 0.25.
But this is not the same as saying that the probability of pH = 0.5, given the observation, is 0.25.
To take an extreme case, on this basis we can say "the likelihood of pH = 1 given the observation 'HH' is 1". But it is clearly not the case that the probability of pH = 1 given the observation is 1: the event 'HH' can occur for any pH > 0 (and often does, in reality, for pH roughly 0.5). If the probability of pH = 1 given the observation were 1, the observation 'HH' would rule out every other value of pH, which is obviously not true.
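The gap between likelihood and probability can be made concrete by applying Bayes' theorem with an explicit prior. The prior below is purely an assumption for illustration (99% of coins fair, 1% two-headed) and does not come from the text; the function names are likewise hypothetical.

```python
def likelihood_two_heads(p_heads):
    """L(pH | 'HH') = Pr('HH' | pH) = pH^2."""
    return p_heads ** 2


def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of hypotheses: prior times likelihood, normalised."""
    unnormalised = {h: prior[h] * likelihood(h) for h in prior}
    evidence = sum(unnormalised.values())
    return {h: weight / evidence for h, weight in unnormalised.items()}


# Assumed prior (not from the source): 99% fair coins, 1% two-headed coins.
prior = {0.5: 0.99, 1.0: 0.01}
post = posterior(prior, likelihood_two_heads)

print(likelihood_two_heads(1.0), post[1.0])
```

Under this assumed prior the likelihood of pH = 1 is 1, while the probability of pH = 1 given 'HH' stays small; a different prior would give a different probability but the same likelihood.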
The likelihood function is not a probability density function – for example, the integral of a likelihood function is not in general 1. In this example, the integral of the likelihood over the interval [0, 1] in pH is 1/3, demonstrating again that the likelihood function cannot be interpreted as a probability density function for pH. On the other hand, given any particular value of pH, e.g. pH = 0.5, the integral of the probability density function over the domain of the random variables (here, the sum of the probabilities of all possible outcomes) is 1.
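Both statements can be checked numerically. The sketch below uses a simple midpoint rule for the integral over pH and enumerates the four outcomes of two tosses; the helper names and the number of integration steps are illustrative choices.

```python
from itertools import product


def sequence_probability(sequence, p_heads):
    """Probability of a specific sequence of 'H'/'T' tosses for a given pH."""
    prob = 1.0
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob


def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


# Integral of the likelihood L(pH | 'HH') = pH^2 over [0, 1]: close to 1/3, not 1.
likelihood_area = midpoint_integral(lambda p: sequence_probability("HH", p), 0.0, 1.0)

# For fixed pH = 0.5, the probabilities of HH, HT, TH and TT sum to 1.
outcome_total = sum(
    sequence_probability("".join(tosses), 0.5) for tosses in product("HT", repeat=2)
)

print(likelihood_area, outcome_total)
```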
See also
- Bayes factor
- Bayesian inference
- conditional probability
- likelihood principle
- likelihood-ratio test
- maximum likelihood
- principle of maximum entropy
- score (statistics)
Notes
1. ^ Steffen L. Lauritzen, Aspects of T. N. Thiele's Contributions to Statistics (1999).
2. ^ Ronald A. Fisher. "On the mathematical foundations of theoretical statistics". Philosophical Transactions of the Royal Society, A, 222:309-368 (1922). ("Likelihood" is discussed in section 6.)
References
- A. W. F. Edwards (1972). Likelihood: An account of the statistical concept of likelihood and its application to scientific inference, Cambridge University Press. Reprinted in 1992, expanded edition, Johns Hopkins University Press.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.