Confounding variable

A confounding variable (also called a confounding factor, lurking variable, confound, or confounder) is an extraneous variable in a statistical or research model that should have been experimentally controlled but was not. Failing to take a confounding variable into account can lead to the false conclusion that the dependent variable is causally related to the independent variable. Such an association between two observed variables is termed a spurious relationship.

For example, assume that a child's weight and a country's gross domestic product (GDP) both rise with time. Someone who measures only weight and GDP could conclude that a higher GDP causes children to gain weight. However, the confounding variable, time, was not accounted for, and it, not GDP, explains both rises.
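The weight and GDP example can be illustrated with a small simulation. This is only a sketch with made-up numbers: the growth rates, noise levels, and the helper names `pearson` and `residuals` are assumptions for illustration, not part of any cited study. Both series are generated from time alone, so any correlation between them is spurious; removing a straight-line time trend from each series is one simple way to account for the confounder.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ts, ys):
    """What is left of y after subtracting a least-squares line fitted on t."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return [y - (my + slope * (t - mt)) for t, y in zip(ts, ys)]

rng = random.Random(0)            # fixed seed so the run is repeatable
years = list(range(1, 16))        # hypothetical observation years 1..15

# Both series depend on time only; neither depends on the other (made-up values).
weight = [10 + 3.0 * t + rng.gauss(0, 1.5) for t in years]
gdp = [500 + 20.0 * t + rng.gauss(0, 15.0) for t in years]

crude = pearson(weight, gdp)
adjusted = pearson(residuals(years, weight), residuals(years, gdp))

print(f"correlation ignoring time:      {crude:.2f}")
print(f"correlation after removing time: {adjusted:.2f}")
```

The crude correlation is close to 1 because both series share the time trend. Once the trend is removed, only the correlation between two independent noise series remains, which is much weaker.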

By definition, a confounding variable is associated with both the probable cause and the outcome. The confounder must not lie on the causal pathway between the cause and the outcome: if A is thought to be the cause of disease C, the confounding variable B must not be solely caused by A, and B must not always lead to C. For example, being female does not always lead to smoking tobacco, and smoking tobacco does not always lead to cancer. Therefore, any study that tries to elucidate the relation between being female and cancer should take smoking into account as a possible confounder. In addition, a confounder is always a risk factor that has a different prevalence in the two risk groups (e.g. females and males) (Hennekens, Buring & Mayrent, 1987).

In statistical experimental design, attempts are made to remove lurking variables such as the placebo effect from the experiment. Because one can never be certain that observational data are not hiding a confounding variable, it is never safe to conclude with complete certainty that a regression model demonstrates a causal relationship, no matter how strong the association.

Though criteria for causality in statistical studies have been researched intensely, Pearl has shown that confounding variables cannot be defined in terms of statistical notions alone; some causal assumptions are necessary.[1] In a 1965 paper, Austin Bradford Hill proposed a set of causal criteria.[2] Many working epidemiologists take these as a good place to start when considering confounding and causation. However, they are of heuristic value at best. When causal assumptions are articulated in the form of a causal graph, a simple criterion, called the back-door criterion, is available to identify sets of confounding variables.
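As a minimal sketch of how a back-door set is used once it has been identified, assume a causal graph in which a binary variable Z affects both the exposure X and the outcome Y (Z → X, Z → Y), so that Z satisfies the back-door criterion. Pearl's adjustment formula then gives the effect of X on Y as the sum over z of P(Y | X = x, Z = z) P(Z = z). All probabilities and function names below are hypothetical, chosen so that X has no effect on Y at all.

```python
# Hypothetical distributions, chosen only for illustration.
p_z = {0: 0.5, 1: 0.5}              # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}     # P(X = 1 | Z = z)
# P(Y = 1 | X = x, Z = z), keyed by (x, z); it depends on z only,
# so X has no causal effect on Y in this made-up model.
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}

def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]

def observed(x):
    """P(Y = 1 | X = x): what a naive comparison of the two groups sees."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x

def backdoor_adjusted(x):
    """Back-door adjustment: sum over z of P(Y = 1 | X = x, Z = z) * P(Z = z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

naive_diff = observed(1) - observed(0)
adjusted_diff = backdoor_adjusted(1) - backdoor_adjusted(0)

print(f"naive risk difference:    {naive_diff:.2f}")
print(f"adjusted risk difference: {adjusted_diff:.2f}")
```

The naive comparison shows a difference between the X = 1 and X = 0 groups even though X does nothing, because Z is more common among those with X = 1. Averaging over Z with its population weights removes that difference. The adjustment is only valid under the assumed graph; the data alone cannot confirm it.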

Anecdotal evidence does not take account of confounding variables.

How to remove confounding in a study setup

There are several ways to exclude or control confounding variables in a study. "Epidemiology in Medicine" by Hennekens/Buring/Mayrent (1987) gives an overview of this topic.
  • Case-control studies: confounders are distributed equally between both groups, cases and controls. For example, if somebody wants to study the cause of myocardial infarction and thinks that age is a probable confounding variable, each 67-year-old infarct patient is matched with a healthy 67-year-old "control" person. In case-control studies, the variables most often matched are age and sex.
  • Cohort studies: a degree of matching is also possible, often by admitting only certain age groups or one sex into the study population, so that all cohorts are comparable with regard to the possible confounding variable. For example, if age and sex are thought to be confounders, only 40- to 50-year-old males would be included in a cohort study that assesses the risk of myocardial infarction in cohorts that are either physically active or inactive.
  • Stratification: as in the example above, physical activity is thought to be a behaviour that protects against myocardial infarction, and age is assumed to be a possible confounder. The sampled data are then stratified by age group, meaning that the association between activity and infarction is analyzed within each age group. If the age groups (or age strata) yield risk ratios that differ markedly from the crude risk ratio, age must be viewed as a confounding variable. Statistical tools such as Mantel-Haenszel methods deal with stratified data.
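The stratification approach can be sketched with invented counts for the activity and infarct example. The age labels, the counts, and the function names are assumptions made up for illustration. The pooled estimate uses the Mantel-Haenszel risk ratio for cohort data: the sum over strata of a·n0/n, divided by the sum over strata of c·n1/n, where a and c are the cases among the n1 active and n0 inactive people and n = n1 + n0.

```python
# Each stratum: (cases_active, n_active, cases_inactive, n_inactive).
# Made-up counts: younger people are mostly active, older people mostly inactive.
strata = {
    "40-49": (8, 800, 4, 200),
    "60-69": (10, 200, 80, 800),
}

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed (active) group divided by risk in the unexposed group."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata into a single table."""
    totals = [sum(column) for column in zip(*strata.values())]
    return risk_ratio(*totals)

def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = den = 0.0
    for a, n1, c, n0 in strata.values():
        n = n1 + n0
        num += a * n0 / n
        den += c * n1 / n
    return num / den

crude = crude_risk_ratio(strata)
per_stratum = {age: risk_ratio(*counts) for age, counts in strata.items()}
pooled = mantel_haenszel_risk_ratio(strata)

print(f"crude risk ratio:  {crude:.2f}")
for age, rr in per_stratum.items():
    print(f"  age {age}:       {rr:.2f}")
print(f"Mantel-Haenszel:   {pooled:.2f}")
```

In these invented data the risk ratio is the same within each age group, while the crude ratio makes activity look far more protective than it is, because the active group is mostly young. That gap between the stratum-specific and crude ratios is the signal that age is confounding the comparison.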
All these methods have their drawbacks, as the following example shows: a 45-year-old African American from Alaska, an avid football player and vegetarian who works in education, suffers from a disease and is enrolled in a case-control study. Proper matching would call for a person with the same characteristics whose sole difference is being healthy, but finding such a person would be an enormous task. Additionally, there is always the risk of over- and undermatching the study population. In cohort studies, too many people can be excluded; and in stratification, single strata can become too thin and thus contain too few samples to support statistically significant conclusions.
  • Multivariate analysis is also possible. Multinomial and logistic regression models exist; the latter are especially suited to binary variables such as "Vaccinated against polio: Yes/No". A drawback of these regression models, compared with stratification methods, is that they give little information about the strength of the confounding variable.

References

1. ^ Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. ISBN 0-521-77362-8. 
2. ^ Hill, Austin Bradford (1965). "The environment or disease: association or causation?". Proc R Soc Med 58 (May): 295-300. PMID 14283879. 

This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.