
A fresh look at the correlation coefficient

In this review, we’re going to talk about the correlation coefficient, or CC for short. In short, the correlation coefficient measures the strength of the linear relationship between y and x. However, the reliability of the linear model also depends on the number of observed data points in the sample, so we need to look at both the correlation coefficient r and the sample size n.

We need to carry out a hypothesis test of the significance of the correlation coefficient in order to decide whether the linear relationship in the sample data is strong enough to use for modeling the relationship in the population.

The sample data is used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the exact population correlation coefficient. However, because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is “ρ”, the Greek letter “rho”.
  • ρ is the population correlation coefficient, which is unknown.
  • r is the sample correlation coefficient, which is calculated from sample data.
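As a minimal sketch of how r can be computed from paired sample data, the snippet below uses the usual sums of squared deviations and cross-products. The function name `sample_correlation` and the five data points are invented purely for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample, not taken from any real study.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))  # 0.7746
```

The value printed here is only the sample estimate r; it says nothing yet about whether ρ differs from 0.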

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is close to 0 or significantly different from 0.

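If the test concludes that the correlation coefficient is significantly different from 0, we say that the correlation coefficient is significant. There is then sufficient evidence of a significant linear relationship between y and x, and we can use the regression line to model that relationship in the population.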

If the test concludes that the correlation coefficient is not significantly different from 0, we say that the correlation coefficient is not significant. There is then insufficient evidence to conclude that there is a significant linear relationship between y and x, so we cannot use the regression line to model a linear relationship between y and x in the population.

However, you should keep in mind the following things:

  • If r is not significant OR the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that lie within the domain of observed x values.
  • Even if r is significant and the scatter plot shows a linear trend, the line may not be appropriate or reliable for prediction outside the domain of observed x values in the data.
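The domain rule above can be sketched in code. This is only an illustration, assuming r has already been judged significant and the scatter plot looks linear; the helper names `fit_line` and `predict_within_domain` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_within_domain(x_new, xs, ys):
    """Predict y only when x_new lies inside the observed range of x."""
    if not min(xs) <= x_new <= max(xs):
        raise ValueError("x_new is outside the observed domain; do not extrapolate")
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * x_new

# Hypothetical sample, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = predict_within_domain(3.5, x, y)  # allowed: 1 <= 3.5 <= 5
print(round(y_hat, 2))
```

Raising an error for values outside the observed range is one possible design choice; a warning would serve the same purpose.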

Making a conclusion

There are two methods for making the decision. They are equivalent, so they give the same result. The first method uses the p-value, while the second relies on a table of critical values.

With the p-value method, we can choose any appropriate significance level, so we are not limited to α = 0.05. The table of critical values offered in the given textbook, however, assumes a significance level of 5%, α = 0.05. If you want to use a different significance level with the critical value method, you need different tables of critical values, which can’t be found in this textbook.
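When no table is available for the chosen α, a critical value of r can be approximated numerically. The sketch below is one possible approach, not the textbook's method: it assumes the two-tailed test based on a t-distribution with n − 2 degrees of freedom and searches by bisection for the smallest |r| whose p-value drops to α. The function names are hypothetical.

```python
import math

def two_tailed_p_from_r(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0, given sample r and sample size n.

    Equivalent to a t-test with df = n - 2: for t = r*sqrt(df)/sqrt(1 - r^2),
    atan(t / sqrt(df)) equals asin(r), so each tail is the bounded integral
    of cos(theta) ** (df - 1), evaluated here with Simpson's rule.
    """
    df = n - 2
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a = math.asin(min(abs(r), 1.0))
    h = (math.pi / 2 - a) / steps  # steps must be even

    def f(theta):
        return max(math.cos(theta), 0.0) ** (df - 1)

    total = f(a) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return 2 * const * total * h / 3

def critical_r(n, alpha=0.05):
    """Approximate the smallest |r| that is significant at level alpha."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the p-value decreases as |r| grows
        mid = (lo + hi) / 2
        if two_tailed_p_from_r(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

print(round(critical_r(11), 3))        # about 0.602 for n = 11 at the 5% level
print(round(critical_r(11, 0.01), 3))  # a stricter level gives a larger critical value
```

A sample r is then significant when its absolute value exceeds the critical value for the given n and α.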

The use of p-value for making a decision

Here, the p-value is calculated with a linear regression tool such as LinRegTTest on the TI-83+ or TI-84+. On the LinRegTTest input screen, on the line prompt for β or ρ, highlight “≠ 0”. The output screen shows the p-value on the line that reads “p=”. Most statistical software tools can also calculate the p-value.

If the p-value is less than the significance level (α = 0.05), we have:

  • Our decision: Reject the null hypothesis.
  • Our conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between y and x, because the correlation coefficient is significantly different from 0.

If the p-value is not less than the significance level (α = 0.05), we have:

  • Our decision: Do not reject the null hypothesis.
  • Our conclusion: There is insufficient evidence to conclude that there is a significant linear relationship between y and x, because the correlation coefficient is not significantly different from 0.
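The decision rule can be written as a short function. This is a sketch only; the name `decide` and the wording of the returned strings are illustrative assumptions, and the p-value is taken as already computed by a calculator or software.

```python
def decide(p_value, alpha=0.05):
    """Return (decision, conclusion) for H0: rho = 0 against Ha: rho != 0."""
    if p_value < alpha:
        return ("reject the null hypothesis",
                "sufficient evidence of a significant linear relationship")
    return ("do not reject the null hypothesis",
            "insufficient evidence of a significant linear relationship")

# Hypothetical p-values, chosen only to show both branches.
print(decide(0.026))
print(decide(0.26))
```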

Crucial calculation notes

You will normally use technology to calculate the p-value. The following notes describe how the p-value and the test statistic are computed.

The p-value is calculated using a t-distribution with n − 2 degrees of freedom. The test statistic is t = r√(n − 2) / √(1 − r²), and the p-value is the combined area in both tails.
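For readers without a calculator at hand, the computation can be sketched with the standard library alone. The sketch assumes the usual test statistic t = r√(n − 2) / √(1 − r²) with |r| < 1 and a two-tailed test; the tail area is integrated numerically with Simpson's rule rather than looked up. The function names and the values r = 0.6631, n = 11 are illustrative assumptions.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), for |r| < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails of a t-distribution with df degrees of freedom.

    Substituting t = sqrt(df) * tan(theta) turns each tail into the bounded
    integral of cos(theta) ** (df - 1), evaluated with Simpson's rule.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a = math.atan(abs(t) / math.sqrt(df))
    h = (math.pi / 2 - a) / steps  # steps must be even

    def f(theta):
        return max(math.cos(theta), 0.0) ** (df - 1)

    total = f(a) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return 2 * const * total * h / 3

# Hypothetical values for illustration.
r, n = 0.6631, 11
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(round(t, 3), round(p, 3))
```

The test statistic has the same sign as r, and a p-value below the chosen α leads to rejecting the null hypothesis.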

Consider a scatter plot in which the correlation is 0 within the bulk of the data, clustered in the lower left corner, while an outlier sits in the upper right corner. The outlier increases both means and makes the data lie predominantly in quadrants 1 and 3. Check with the source of the data to find out whether the outlier is an error. Errors of this kind often occur when a decimal point in the measurements is accidentally shifted to the right. Even if you fail to find an explanation for the outlier, it should be set aside and the correlation coefficient of the remaining data should be calculated. The report must also include a statement of the outlier’s existence. Keep in mind that it would be misleading to report the correlation based on all the available data, because it would not represent the behavior of the bulk of the data.
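A small numerical illustration of this effect, using invented data: five points with exactly zero correlation, plus one point far away in the upper right. The helper `corr` is a plain implementation of the sample correlation coefficient, written out so the snippet is self-contained.

```python
import math

def corr(points):
    """Sample correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: an uncorrelated cluster and a single outlier.
bulk = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2)]
outlier = (20, 20)

r_bulk = corr(bulk)
r_all = corr(bulk + [outlier])
print(round(r_bulk, 3), round(r_all, 3))
```

With these numbers the cluster alone gives a correlation of 0, while adding the single outlier pushes r close to 1, which is why the correlation based on all points would misrepresent the bulk of the data.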

Correlation coefficients are appropriate only when the data are obtained by drawing a random sample from a larger population. Sometimes, however, correlation coefficients are mistakenly computed when the values of one of the variables are determined in advance by the investigator. In such cases, the message carried by the outlier may be quite real: over the full range of values, the two variables tend to increase and decrease together. Unfortunately, once the study has been carried out there is little more we can do, and the final outcome depends on a single observation.

Check the outlier for possible errors. If there is no error, report the CC for all points except the outlier, with a warning that the outlier occurred. In this particular case, the outlier has a reasonable Y value and only a slightly unreasonable X value. Observations can be two-dimensional outliers even though they are unremarkable when each variable is examined individually.

This sort of picture arises when one variable is a component of the other. In most cases the CC is positive, because increasing the total tends to increase every component.

The two nearly straight lines in the display might result from plotting the combined data from two identifiable groups. For instance, one line might correspond to women and the other to men. Do not report the single correlation coefficient without comment.

The correlation is zero within each of the two groups. If the separation between the groups is large, the comments from the first case apply too. The data might not be a simple random sample from a larger population: the division between the two groups might result from a conscious decision to exclude values in the middle of the range of Y or X. In that case the CC is an inappropriate summary of the data, because its value is strongly affected by the choice of Y or X values.
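The two-group situation can be reproduced with invented numbers: each group has zero correlation on its own, yet pooling them yields a large overall coefficient that comes entirely from the gap between the groups. The data and the helper `corr` are hypothetical and serve only as an illustration.

```python
import math

def corr(points):
    """Sample correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups with the same shape, shifted apart by 8 units.
group_a = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2)]
group_b = [(x + 8, y + 8) for x, y in group_a]

r_a = corr(group_a)
r_b = corr(group_b)
r_pooled = corr(group_a + group_b)
print(round(r_a, 3), round(r_b, 3), round(r_pooled, 3))
```

Reporting only the pooled value would hide the fact that there is no association within either group.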

The CC is a numerical summary, so it can be reported as a measure of association for any group of numbers, regardless of their origin. As with any other statistic, its proper interpretation depends on the sampling scheme used to generate the data.

The CC is most appropriate when both measurements are made on a simple random sample from some population. The sample correlation then estimates the corresponding quantity in the population. It makes sense to compare sample CCs for samples from different populations to find out whether the association differs between the populations. For instance, we can compare the association between bone density and calcium intake for black and white postmenopausal women.

If the data do not constitute a simple random sample from some population, it is unclear how to interpret the CC other than as a numerical measure for this particular group of numbers. Suppose, for example, that you measure the bone density of a certain number of women at each of several levels of calcium intake: the CC will then change depending on the choice of intake levels.

The history of the correlation coefficient

As you know, the correlation coefficient gives us an idea of how well our data fit a line. Pearson was not the original inventor of the CC, though his use of it became one of the most common ways of measuring correlation.

Francis Galton was the first person to measure correlation, originally termed co-relation, a notion that makes sense only when you study the relationship between different variables. In his work Co-relations and Their Measurement, he pointed out that the statures of kinsmen are co-related variables: the father’s stature is correlated with that of his adult son, but the index of co-relation differs in the different cases.

Galton himself borrowed the term from biology, where it was already in use at the time.

In 1892, the British statistician Francis Ysidro Edgeworth published a paper called Correlated Averages, in which he first used the term coefficient of correlation. The product-moment correlation formula for estimating correlation was introduced later by Karl Pearson.

