Correlation
Totally Explained



Everything about Correlate totally explained

» This article is about the correlation coefficient between two variables. The term correlation can also mean the cross-correlation of two functions or electron correlation in molecular systems.

In probability theory and statistics, correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence. In this broad sense there are several coefficients that measure the degree of correlation, each adapted to the nature of the data.
   A number of different coefficients are used for different situations. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Despite its name, it was first introduced by Francis Galton.

Pearson's product-moment coefficient

Mathematical properties

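The correlation coefficient ρX,Y between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as:
» $\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$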
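As a rough illustration of this definition, the following Python sketch estimates the coefficient from paired samples by dividing the covariance by the product of the standard deviations. The function name `pearson` and the sample data are hypothetical, chosen only for illustration. Population (divide-by-N) moments are assumed; the factors of N cancel in the ratio, so sample moments would give the same result.

```python
import math

def pearson(xs, ys):
    """Estimate the Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance,
    in which case the coefficient is undefined.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: the mean product of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of each variable.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Illustrative data: exact increasing and decreasing linear relationships.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson(xs, [2.0 * x + 1.0 for x in xs])
r_down = pearson(xs, [-3.0 * x for x in xs])
print(r_up, r_down)
```

An exact increasing linear relationship should give a value at (or within rounding error of) +1, and an exact decreasing one a value at -1.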

Common misconceptions about correlation

Correlation and causality

The conventional dictum that "correlation doesn't imply causation" means that correlation alone can't be validly used to infer a causal relationship between the variables. This dictum shouldn't be taken to mean that correlations can't indicate causal relations: a correlation may well reflect a causal link, but the causes underlying it, if any, may be indirect and unknown. Consequently, establishing a correlation between two variables isn't a sufficient condition to establish a causal relationship (in either direction).
   Here is a simple example: hot weather may cause both a reduction in purchases of warm clothing and an increase in ice-cream purchases. Warm clothing purchases are therefore (negatively) correlated with ice-cream purchases. But a reduction in warm clothing purchases doesn't cause ice-cream purchases, and ice-cream purchases don't cause a reduction in warm clothing purchases.
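The hot-weather example can be sketched as a small simulation. Everything here is an assumption made for illustration: the variable names, the temperature range, the linear dependence on temperature, and the noise levels are invented, not taken from any real sales data. Both purchase series depend only on temperature plus independent noise, yet they come out strongly correlated with each other.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation from population moments (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius.
temperature = [rng.uniform(0.0, 35.0) for _ in range(2000)]

# Each series is driven by temperature alone; neither influences the other.
ice_cream = [2.0 * t + rng.gauss(0.0, 10.0) for t in temperature]
warm_clothing = [100.0 - 2.0 * t + rng.gauss(0.0, 10.0) for t in temperature]

r_all = pearson(ice_cream, warm_clothing)

# Restrict to days with nearly the same temperature: the common cause is
# (almost) held fixed, so little correlation should remain.
band = [i for i, t in enumerate(temperature) if 20.0 <= t < 22.0]
r_band = pearson([ice_cream[i] for i in band],
                 [warm_clothing[i] for i in band])

print(f"all days: {r_all:.2f}, 20-22 degree days only: {r_band:.2f}")
```

Under these assumptions the overall coefficient is strongly negative, while within a narrow temperature band it should be close to zero, which is what one would expect when the association is produced entirely by a common cause.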
   A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health? Or does good health lead to good mood? Or does some other factor underlie both? Or is it pure coincidence? In other words, a correlation can be taken as evidence for a possible causal relationship, but can't indicate what the causal relationship, if any, might be.

Correlation and linearity

While Pearson correlation indicates the strength of a linear relationship between two variables, its value alone may not be sufficient to evaluate this relationship, especially in the case where the assumption of normality is incorrect.
   Scatterplots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe, illustrate this. The four y variables have the same mean (7.5), variance (4.12), correlation with x (0.816) and regression line (y = 3 + 0.5x). However, as the plots show, the distributions of the variables are very different.
   The first one (top left) seems to be distributed normally, and corresponds to what one would expect when considering two variables correlated and following the assumption of normality. The second one (top right) isn't distributed normally; while an obvious relationship between the two variables can be observed, it isn't linear, and the Pearson correlation coefficient isn't relevant. In the third case (bottom left), the linear relationship is perfect, except for one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816. Finally, the fourth example (bottom right) shows a case in which one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables isn't linear.
   These examples indicate that the correlation coefficient, as a summary statistic, can't replace the individual examination of the data.
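The point can be checked numerically. The sketch below uses the quartet's values as published by Anscombe (1973); the helper `pearson` and the variable names are hypothetical, introduced only for this illustration. It prints the mean of y and the correlation for each of the four sets.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from population moments (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Sets I to III share the same x values; set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

means = [sum(ys) / len(ys) for _, ys in quartet]
correlations = [pearson(xs, ys) for xs, ys in quartet]

for k, (m, r) in enumerate(zip(means, correlations), start=1):
    print(f"set {k}: mean(y) = {m:.2f}, r = {r:.3f}")
```

The four coefficients agree to about three decimal places even though the scatterplots look nothing alike, so the number by itself cannot distinguish these cases.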

Computing correlation accurately in a single pass

The following algorithm (in pseudocode) will calculate Pearson correlation with good numerical stability.
    sum_sq_x = 0
    sum_sq_y = 0
    sum_coproduct = 0
    mean_x = x[1]
    mean_y = y[1]
    for i in 2 to N:
        sweep = (i - 1.0) / i
        delta_x = x[i] - mean_x
        delta_y = y[i] - mean_y
        sum_sq_x += delta_x * delta_x * sweep
        sum_sq_y += delta_y * delta_y * sweep
        sum_coproduct += delta_x * delta_y * sweep
        mean_x += delta_x / i
        mean_y += delta_y / i
    pop_sd_x = sqrt(sum_sq_x / N)
    pop_sd_y = sqrt(sum_sq_y / N)
    cov_x_y = sum_coproduct / N
    correlation = cov_x_y / (pop_sd_x * pop_sd_y)
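One possible Python rendering of the pseudocode above is sketched here. The function name `pearson_single_pass` and the choice to accept any iterable of (x, y) pairs are assumptions of this sketch, not part of the original algorithm; the update steps follow the pseudocode line by line.

```python
import math

def pearson_single_pass(pairs):
    """Pearson correlation of an iterable of (x, y) pairs in one pass.

    Running means and sums of squared deviations are updated as each pair
    arrives, so the data need not be stored or traversed twice.
    """
    it = iter(pairs)
    try:
        mean_x, mean_y = next(it)  # the first pair seeds the running means
    except StopIteration:
        raise ValueError("need at least two pairs") from None

    sum_sq_x = sum_sq_y = sum_coproduct = 0.0
    n = 1
    for x, y in it:
        n += 1
        sweep = (n - 1.0) / n
        delta_x = x - mean_x
        delta_y = y - mean_y
        sum_sq_x += delta_x * delta_x * sweep
        sum_sq_y += delta_y * delta_y * sweep
        sum_coproduct += delta_x * delta_y * sweep
        mean_x += delta_x / n
        mean_y += delta_y / n

    if n < 2:
        raise ValueError("need at least two pairs")

    pop_sd_x = math.sqrt(sum_sq_x / n)
    pop_sd_y = math.sqrt(sum_sq_y / n)
    cov_x_y = sum_coproduct / n
    # Division by zero here means one variable is constant (undefined r).
    return cov_x_y / (pop_sd_x * pop_sd_y)

# Illustrative data with a large offset, where the naive formula based on
# sum(x*x) - sum(x)**2 / n tends to lose precision.
data = [(1e9 + i, 3.0 * i + (-1) ** i) for i in range(50)]
r_stream = pearson_single_pass(data)
print(r_stream)
```

Because the function consumes an iterator, it can also be fed a generator that reads pairs from a file or stream, which is the main practical reason to prefer a single-pass formulation.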




Copyright © 2007-8 totallyexplained.com | Licensed under the GNU Free Documentation License | Site Map
This article contains text from the Wikipedia article Correlation (History) and is released under the GFDL | RSS Version