Reliability (statistics)
In statistics, reliability is the consistency of a set of measurements or of a measuring instrument. Consistency can mean that the same instrument gives, or is likely to give, the same measurement on repeated use (test-retest reliability) or, for more subjective instruments, that two independent assessors give similar scores (inter-rater reliability). Reliability does not imply validity: a reliable measure measures something consistently, but not necessarily what it is supposed to measure. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance.
In experimental sciences, reliability is the extent to which the measurements of a test remain consistent over repeated tests of the same subject under identical conditions. An experiment is reliable if it yields consistent results of the same measure. It is unreliable if repeated measurements give different results.
In engineering, reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time. It is often reported in terms of a probability. Evaluations of reliability involve the use of many statistical tools. See Reliability engineering for further discussion.
Estimation
Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration. Multiple-administration methods require that two assessments be administered. In the test-retest method, reliability is estimated as the Pearson product-moment correlation coefficient between two administrations of the same measure. In the alternate forms method, reliability is estimated as the Pearson product-moment correlation coefficient between two different forms of a measure, usually administered together.
Single-administration methods include split-half and internal consistency. The split-half method treats the two halves of a measure as alternate forms. This "halves reliability" estimate is then stepped up to the full test length using the Spearman-Brown prediction formula. The most common internal consistency measure is Cronbach's alpha, which is usually interpreted as the mean of all possible split-half coefficients.
Each of these estimation methods is sensitive to different sources of error, so the estimates should not be expected to be equal. Also, reliability is a property of the scores of a measure rather than of the measure itself, and reliability estimates are thus said to be sample dependent. Estimates from one sample might differ from those of a second sample (beyond what sampling variation alone would produce) if the second sample is drawn from a different population, because the true reliability is different in that population. (This is true of measures of all types: yardsticks might measure houses well yet have poor reliability when used to measure the lengths of insects.)
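The estimators named above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names and the small `scores` matrix (rows are respondents, columns are items scored 0 or 1) are hypothetical, and the split is an odd-even split, which is one of several possible ways to form the two halves.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown(r, factor=2.0):
    """Predicted reliability when the test length is multiplied by `factor`."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def split_half(items):
    """Odd-even split-half estimate, stepped up to the full test length."""
    half_a = [sum(row[0::2]) for row in items]  # items 1, 3, 5, ...
    half_b = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    return spearman_brown(pearson(half_a, half_b))


def cronbach_alpha(items):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical data: 6 respondents, 4 dichotomous items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print("split-half:", split_half(scores))
    print("alpha:", cronbach_alpha(scores))
```

A test-retest or alternate-forms estimate would simply be `pearson(first_scores, second_scores)` applied to the total scores from the two administrations or forms.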
Reliability may be improved by clarity of expression (for written assessments), lengthening the measure, and other informal means. However, formal psychometric analysis, called item analysis, is considered the most effective way to increase reliability. This analysis consists of computing item difficulties and item discrimination indices, the latter involving the correlation between each item and the sum of the item scores of the entire test.
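As a rough sketch of the item analysis described above, the following Python computes, for each item, a difficulty index (here taken to be the proportion of respondents answering correctly, which assumes items scored 0 or 1) and a discrimination index (the correlation between the item and the total score). The function names and the `scores` data are hypothetical, and the item-total correlation is the uncorrected one, meaning the item itself is included in the total.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_analysis(items):
    """Return difficulty and item-total discrimination for every item.

    `items` has one row per respondent and one 0/1 column per item.
    """
    totals = [sum(row) for row in items]
    report = []
    for j in range(len(items[0])):
        column = [row[j] for row in items]
        report.append({
            "item": j,
            "difficulty": mean(column),                  # proportion correct
            "discrimination": pearson(column, totals),   # item-total correlation
        })
    return report


# Hypothetical data: 6 respondents, 4 dichotomous items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    for entry in item_analysis(scores):
        print(entry)
```

Items with low or negative discrimination are the usual candidates for revision or removal when trying to raise reliability.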
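In the engineering sense, the reliability of a component over time t is commonly written as:
- $R(t) = 1 - F(t)$, where $F(t)$ is the probability of failure by time $t$.
- $R(t) = e^{-\lambda t}$, where $\lambda$ is the failure rate, assumed constant.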
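For the engineering sense of reliability, a minimal sketch under the assumption of a constant failure rate is the exponential model, in which the probability of surviving to time t is exp(-λt). The function names below are hypothetical, and the constant-rate assumption is an illustration rather than something that holds for every system.

```python
import math


def reliability(t, failure_rate):
    """Probability of no failure up to time t, assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def unreliability(t, failure_rate):
    """Probability of at least one failure by time t (the complement of reliability)."""
    return 1.0 - reliability(t, failure_rate)


if __name__ == "__main__":
    # Hypothetical component with 0.01 failures per hour, evaluated at 100 hours.
    print(reliability(100.0, 0.01))
```

The time unit of `t` must match the unit in which the failure rate is expressed (for example, hours and failures per hour).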
Classical test theory
In classical test theory, reliability is defined mathematically as the ratio of the variance of the true score to the variance of the observed score or, equivalently, one minus the ratio of the variance of the error score to the variance of the observed score:
$$\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}$$
where $\rho_{XX'}$ is the symbol for the reliability of the observed score, X; $\sigma^2_X$, $\sigma^2_T$, and $\sigma^2_E$ are the variances of the measured, true and error scores respectively. Unfortunately, there is no way to directly observe or calculate the true score, so a variety of methods are used to estimate the reliability of a test.
Some examples of the methods to estimate reliability include test-retest reliability, internal consistency reliability, and parallel-test reliability. Each method comes at the problem of figuring out the source of error in the test somewhat differently.
Item response theory
It was well known to classical test theorists that measurement precision is not uniform across the scale of measurement. Tests tend to distinguish better among test-takers with moderate trait levels and worse among high- and low-scoring test-takers. Item response theory extends the concept of reliability from a single index to a function called the information function. The IRT information function is the inverse of the squared conditional standard error of the observed score at any given test score. Higher levels of IRT information indicate higher precision and thus greater reliability.
See also
- Accuracy
- Censoring (statistics)
- Coefficient of variation
- Homogeneity (statistics)
- Levels of measurement
- Precision
- Reliability theory
- Reliability engineering
- Scientific method
- Statistics
- Validity (statistics)
External links
- Reliability
- Uncertainty models, uncertainty quantification, and uncertainty processing in engineering
- The relationships between correlational and internal consistency concepts of test reliability
- The problem of negative reliabilities
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.