Information about Coalescent Theory
In genetics, coalescent theory is a retrospective model of population genetics that traces all alleles of a gene in a sample from a population to a single ancestral copy shared by all members of the population, known as the most recent common ancestor (MRCA; sometimes also termed the coancestor to emphasize the coalescent relationship[1]). The inheritance relationships between alleles are typically represented as a gene genealogy, similar in form to a phylogenetic tree. This gene genealogy is also known as the coalescent; understanding the statistical properties of the coalescent under different assumptions forms the basis of coalescent theory. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure. Advances in coalescent theory, however, allow extensions to the basic coalescent that can include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by John Kingman[2].
Theory
Consider two distinct haploid organisms that differ at a single nucleotide. Tracing the ancestry of these two individuals backwards, there will be a point in time at which their most recent common ancestor (MRCA) is encountered and the two lineages coalesce.
Probability of fixation
Under conditions of genetic drift alone, every finite set of genes or alleles has a "coalescent point" at which all descendants converge to a single ancestor (i.e. they 'coalesce'). This fact can be used to derive the rate of gene fixation of a neutral allele (that is, one not under any form of selection) for a population of varying size (provided that it is finite and nonzero). Because the effect of natural selection is stipulated to be negligible, the probability at any given time of an allele becoming fixed is just its frequency $p$ in the population at that time. For a diploid population of size $N$ and (neutral) mutation rate $\mu$, the initial frequency of a novel mutation is simply $1/(2N)$, and the number of new mutations per generation is $2N\mu$. Since the fixation rate is the rate of novel neutral mutation multiplied by their probability of fixation, the overall fixation rate is $2N\mu \times \frac{1}{2N} = \mu$. Thus the rate of fixation for a mutation not subject to selection is simply the rate of introduction of such mutations.
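The claim that a new neutral mutation fixes with probability equal to its initial frequency can be checked numerically. The following is a minimal sketch, assuming a Wright-Fisher population in which each of the 2N gene copies in a generation picks its parent copy uniformly at random; the function name, parameters, and seed are illustrative choices, not part of the original article.

```python
import random


def neutral_fixation_probability(n_diploid, trials, seed=0):
    """Estimate the fixation probability of one new neutral mutation.

    Assumes Wright-Fisher resampling of 2 * n_diploid gene copies.
    The expected result under neutrality is 1 / (2 * n_diploid).
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid  # gene copies in a diploid population
    fixed = 0
    for _ in range(trials):
        count = 1  # a novel mutation starts as a single copy
        # Run until the mutant allele is lost (0) or fixed (all copies).
        while 0 < count < copies:
            p = count / copies
            # Each copy in the next generation is mutant with probability p.
            count = sum(rng.random() < p for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    n = 5
    estimate = neutral_fixation_probability(n, trials=20000, seed=1)
    print("simulated:", estimate, "expected:", 1 / (2 * n))
```

Multiplying the estimated probability by the number of new mutations per generation, 2N times the mutation rate, gives back approximately the mutation rate itself, which is the result stated above.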
Time to coalescence
A useful analysis based on coalescent theory seeks to predict the amount of time elapsed between the introduction of a mutation and a particular allele or gene distribution in a population. This time period is equal to how long ago the most recent common ancestor existed.
The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parent. In a diploid population of constant size with $2N_e$ copies of each locus (where $N_e$ is the effective population size), there are $2N_e$ "potential parents" in the previous generation, so the probability that two alleles share a parent is $1/(2N_e)$ and, correspondingly, the probability that they do not coalesce is $1 - 1/(2N_e)$.
At each successive preceding generation, the probability of coalescence is geometrically distributed; that is, it is the probability of noncoalescence at the $t - 1$ preceding generations multiplied by the probability of coalescence at the generation of interest:
$$P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \frac{1}{2N_e}$$
For sufficiently large values of $N_e$, this distribution is well approximated by the continuously defined exponential distribution
$$P_c(t) = \frac{1}{2N_e} e^{-\frac{t-1}{2N_e}}$$
The exponential distribution has both its expected value and its standard deviation equal to $2N_e$; therefore, although the expected time to coalescence is $2N_e$ generations, actual coalescence times have a wide range of variation.
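The geometric waiting time and its exponential approximation can be compared directly. The sketch below assumes two lineages in a population with 2·N_e gene copies, so that the per-generation coalescence probability is 1/(2·N_e); the function names and the example values are illustrative assumptions.

```python
import math
import random


def coalescence_pmf(t, n_e):
    """Geometric probability that two lineages first coalesce t generations back."""
    p = 1.0 / (2.0 * n_e)
    return (1.0 - p) ** (t - 1) * p


def exponential_approx(t, n_e):
    """Exponential approximation to coalescence_pmf, accurate for large n_e."""
    p = 1.0 / (2.0 * n_e)
    return p * math.exp(-(t - 1) * p)


def sample_coalescence_time(n_e, rng):
    """Draw one coalescence time by stepping back generation by generation."""
    p = 1.0 / (2.0 * n_e)
    t = 1
    # Keep going back while the two lineages fail to share a parent.
    while rng.random() >= p:
        t += 1
    return t


if __name__ == "__main__":
    n_e = 50
    rng = random.Random(42)
    times = [sample_coalescence_time(n_e, rng) for _ in range(20000)]
    mean = sum(times) / len(times)
    sd = math.sqrt(sum((x - mean) ** 2 for x in times) / len(times))
    print("mean:", mean, "sd:", sd, "2*N_e:", 2 * n_e)
```

In such a simulation both the sample mean and the sample standard deviation should come out close to 2·N_e, which illustrates why individual coalescence times vary so widely around their expectation.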
Neutral variation
Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift alone. This value is termed the mean heterozygosity, represented as $\bar{H}$. Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence). The probability of a mutation in either of the two lineages is $2\mu$, and the probability of a coalescence is $1/(2N_e)$, where $\mu$ is the neutral mutation rate and $N_e$ the effective population size. Thus the mean heterozygosity is equal to
$$\bar{H} = \frac{2\mu}{2\mu + \frac{1}{2N_e}} = \frac{4N_e\mu}{1 + 4N_e\mu} = \frac{\theta}{1 + \theta}, \qquad \theta = 4N_e\mu$$
For $4N_e\mu \gg 1$, the vast majority of allele pairs have at least one difference in nucleotide sequence.
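The heterozygosity calculation can be written out as a short numerical sketch. It assumes a per-generation neutral mutation rate mu per lineage and a coalescence probability of 1/(2·N_e) for a pair of lineages; the function names and the example parameter values are illustrative only.

```python
import math


def mean_heterozygosity(n_e, mu):
    """Mean heterozygosity theta / (1 + theta), with theta = 4 * n_e * mu."""
    theta = 4.0 * n_e * mu
    return theta / (1.0 + theta)


def mean_heterozygosity_from_events(n_e, mu):
    """Same quantity as P(mutation) / (P(mutation) + P(coalescence))."""
    p_mutation = 2.0 * mu             # mutation in either of the two lineages
    p_coalescence = 1.0 / (2.0 * n_e)  # the two lineages share a parent
    return p_mutation / (p_mutation + p_coalescence)


if __name__ == "__main__":
    for n_e, mu in [(1_000, 1e-6), (10_000, 2.5e-5), (1_000_000, 1e-5)]:
        print(n_e, mu, mean_heterozygosity(n_e, mu),
              mean_heterozygosity_from_events(n_e, mu))
```

The two functions are algebraically identical; the second mirrors the "mutation versus any event" argument in the text, while the first uses the compact form. As 4·N_e·mu grows well above 1 the value approaches 1, matching the statement that most allele pairs then differ at one or more sites.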
Graphical representation
Coalescents can be visualised using dendrograms, which show the relationship of branches of the population to each other. The point where two branches meet indicates the most recent common ancestor (MRCA) of those branches.
Applications
Disease gene mapping
The utility of coalescent theory in the mapping of disease genes is slowly gaining appreciation. Although the application of the theory is still in its infancy, a number of researchers are actively developing algorithms that use coalescent theory for the analysis of human genetic data[3][4][5].
History
Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher-Wright (or Wright-Fisher) model for large populations. It was 'discovered' independently by several researchers in the 1980s[6][7][8][9], but the definitive formalisation is attributed to Kingman[6]. Major contributions to the development of coalescent theory have been made by Peter Donnelly[10], Robert Griffiths, Richard R Hudson[11] and Simon Tavaré[10]; these have included incorporating variations in population size[12], recombination and selection[13][14].
Software
A large body of software exists for simulating data sets under the coalescent process, and gradually software is emerging that allows the analysis of human genetics data for the mapping of disease susceptibility loci.
- BEAST - Bayesian MCMC inference package with a wide range of coalescent models including the use of temporally sampled sequences.
- CoaSim - software for simulating genetic data under the coalescent model.
- GeneRecon - software for the fine-scale linkage disequilibrium mapping of disease genes using coalescent theory, based on a Bayesian MCMC framework.
- genetree - software for estimation of population genetics parameters using coalescent theory and simulation (the R package popgen). See also Oxford Mathematical Genetics and Bioinformatics Group.
- GENOME - rapid coalescent-based whole-genome simulation[15].
- Migrate - maximum likelihood and Bayesian inference of migration rates under the n-coalescent. The inference is implemented using MCMC.
- Lamarc - software for estimation of rates of population growth, migration, and recombination.
- MS & MShot - Richard Hudson's original program for generating samples under neutral models[16] and an extension which allows recombination hotspots[17].
- SARG - Structure Ancestral Recombination Graph by Magnus Nordborg.
References and notes
Articles
- ^ Browning, S.R. (2006) Multilocus association mapping using variable-length markov chains. http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1474089&blobtype=pdf American Journal of Human Genetics 78:903-913
- ^ Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29:401-421
- ^ Hellenthal, G., Stephens M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl622v1 Bioinformatics AOP
- ^ Hudson RR (1983a) Testing the constant-rate neutral allele model with protein sequence data. Evolution 37: 203-207 JSTOR copy
- ^ Hudson RR (1983b) Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23:183 - 201.
- ^ Hudson RR (1991) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7: 1-44
- ^ Hudson RR (2002) Generating samples under a Wright-Fisher neutral model. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/18/2/337 Bioinformatics 18:337-338
- Hein, J. , Schierup, M., Wiuf C. (2004) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory Oxford University Press ISBN 978-0198529965
- ^ Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. Genetics 120:819-829
- ^ Kingman, J.F.C. (1982) On the Genealogy of Large Populations. Journal of Applied Probability 19A:27-43 JSTOR copy
- ^ Kingman, J.F.C. (2000) Origins of the coalescent 1974-1982. http://www.genetics.org/cgi/content/full/156/4/1461 Genetics 156:1461-1463
- ^ Liang L., Zöllner S., Abecasis G.R. (2007) GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/12/1565 23: 1565-1567
- ^ Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P.J.M., Madsen, J.N., Schauser, L. (2005) CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models BMC Bioinformatics 6:252
- ^ Morris, A. P., Whittaker, J. C., Balding, D. J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=384946&blobtype=pdf American Journal of Human Genetics 70:686-707
- ^ Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection. Genetics 145:519-534
- ^ Rosenberg, N.A., Nordborg, M. (2002) Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. Nature Reviews Genetics 3:380-390
- ^ Slatkin, M. (2001) Simulating genealogies of selected alleles in populations of variable size. Genetical Research 78:49-57
- ^ Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in finite populations. Genetics 105:437-460
- ^ Zöllner S. and Pritchard J.K. (2005) Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci http://217.160.247.249:8000/cgi-bin/nph-proxy2.cgi/010110A/http/pritch.bsd.uchicago.edu/publications/ZollnerAndPritchard05.pdf Genetics 169:1071–1092
Books
- Hein, J; Schierup, M. H., and Wiuf, C. Gene Genealogies, Variation and Evolution – A Primer in Coalescent Theory. Oxford University Press, 2005. ISBN 0-19-852996-1.
- Nordborg, M. (2001) Introduction to Coalescent Theory. Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, Handbook of Statistical Genetics. Wiley ISBN 978-0471860945
- Wakeley J. (2006) An Introduction to Coalescent Theory Roberts & Co ISBN: 0-9747077-5-9 Accompanying website with sample chapters
- ^ Rice SH. (2004). Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
External links
- EvoMath 3: Genetic Drift and Coalescence, Briefly - overview, with probability equations for genetic drift, and simulation graphs
- Coalescent Likelihood Methods - page with lecture notes by Mary Kuhner on using likelihood approaches to the coalescent. Presented at the annual Workshop on Molecular Evolution at the Marine Biological Laboratory in Woods Hole, Massachusetts.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.



