DATA PREPARATION
Description of the Correlation Coefficient
A correlation is a relationship between two variables.
Typically, we take x to be the independent variable and y to be the dependent variable. The data are represented as a collection of ordered pairs (x, y).
Mathematically, the strength and direction of a linear relationship between two variables are measured by the correlation coefficient. Suppose that there are n ordered pairs (x, y) that make up a sample from a population. The (Pearson) correlation coefficient r is given by:
$$ r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \, \sum (y - m_y)^2}} $$
where mx and my are the means of the x and y variables.
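The formula for r can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `pearson_r` and the sample data are hypothetical and are not taken from the problems below.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of the paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)  # mean of x
    my = sum(ys) / len(ys)  # mean of y
    # Numerator: sum of products of the deviations from the means
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to rise with x, but not perfectly linearly
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # → 0.7746
```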
The p-value of the correlation can be determined in one of two ways:
1) by comparing r with a table of critical values of the correlation coefficient for df = n − 2 degrees of freedom, where n is the number of observations in the x and y variables; or
2) by calculating the t value as follows:
$$ t = \frac{r}{\sqrt{1 - r^2}} \sqrt{n - 2} $$
In case 2), the corresponding p-value is read from a t distribution table for df = n − 2.
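Case 2) can be sketched in Python. Because the standard library has no t distribution, this sketch assumes a numerical approach: it integrates the Student's t density with Simpson's rule to obtain the two-sided p-value, where a statistics library would normally be used instead. The function names are hypothetical; the inputs r = 0.6475511 and n = 6 (df = 4) are the values reported for Problem 1.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); requires |r| < 1 and n > 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t: float, df: int, steps: int = 10_000) -> float:
    """P(|T| >= |t|), via Simpson's rule on the density from 0 to |t|."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - area)

r, n = 0.6475511, 6  # Problem 1: df = n - 2 = 4
t = t_statistic(r, n)
p = two_sided_p_value(t, n - 2)
print(round(t, 4), round(p, 4))  # → 1.6996 0.1644
```

Since df = n − 2, a reported df of 4 corresponds to n = 6 pairs of observations.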
The correlation coefficient r is always a number between -1 and 1 (inclusive).
• If r is close to 1, we say that the variables are positively correlated. This means there is likely a strong linear relationship between the two variables, with a positive slope.
• If r is close to -1, we say that the variables are negatively correlated. This means there is likely a strong linear relationship between the two variables, with a negative slope.
• If r is close to 0, we say that the variables are not correlated. This means that there is likely no linear relationship between the two variables; however, the variables may still be related in some other way.
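The last point is easy to miss: r only measures linear association. In this hypothetical example, y is completely determined by x (y = x²), yet r is 0 because the relationship is symmetric rather than linear. The helper `pearson_r` is a hypothetical name; it repeats the formula for r given earlier so the snippet runs on its own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (same formula as given earlier)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive line → 1.0
print(pearson_r(x, [-v for v in x]))         # perfect negative line → -1.0
print(pearson_r(x, [v * v for v in x]))      # y = x², related but not linearly → 0.0
```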
Role / Importance
(i) Correlation helps us determine the degree of relationship between variables, which supports decisions about future courses of action.
(ii) Correlation analysis helps us understand the nature and degree of a relationship, which can be used for future planning and forecasting.
PROBLEM - 1
Source Code
Output
Interpretation of Result
t is the t-test statistic value (t = 1.6996),
df is the degrees of freedom (df = 4),
p-value is the probability of getting a t value at least this extreme if the true correlation were zero (p-value = 0.1644),
sample estimates is the estimated correlation coefficient (Cor.coeff = 0.6475511).
Conclusion
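Hours of sleep and test score are moderately and positively correlated (r ≈ 0.65): students who slept more hours tended to get higher test scores.
However, because the p-value (0.1644) is above the usual 0.05 level, this correlation is not statistically significant for this small sample (n = 6), and correlation alone does not show that more sleep causes higher scores.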
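Words like "moderately" and "highly" depend on cut-offs that the text does not state. The sketch below assumes one common rule of thumb (|r| below 0.3 weak, 0.3 up to 0.7 moderate, 0.7 and above strong) and a 0.05 significance level; both are assumptions, and other sources use different cut-offs. The function name `describe_correlation` is hypothetical. Applied to the Problem 1 values (r = 0.6475511, p = 0.1644), it reports a moderate positive correlation that is not significant at the 5% level.

```python
def describe_correlation(r: float, p_value: float, alpha: float = 0.05) -> str:
    """Describe r in words, using assumed (not universal) strength cut-offs."""
    verdict = "significant" if p_value < alpha else "not significant"
    if r == 0:
        return f"no linear correlation, {verdict} at alpha = {alpha}"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation, {verdict} at alpha = {alpha}"

print(describe_correlation(0.6475511, 0.1644))
# → moderate positive correlation, not significant at alpha = 0.05
```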
PROBLEM - 2
Source Code
Output
Interpretation of Result
t is the t-test statistic value (t),
df is the degrees of freedom (df),
p-value is the probability of getting a t value at least this extreme if the true correlation were zero (p-value),
sample estimates is the estimated correlation coefficient (Cor.coeff).
Conclusion
Chemistry and Physics scores are highly and positively correlated.
A student who scores high in Chemistry is likely to score well in Physics too.
PROBLEM - 3
Source Code
Output
Interpretation of Result
t is the t-test statistic value (t),
df is the degrees of freedom (df),
p-value is the probability of getting a t value at least this extreme if the true correlation were zero (p-value),
sample estimates is the estimated correlation coefficient (Cor.coeff).
Conclusion
There is a strong negative correlation between the number of absences and the final exam grade, since r is very close to −1. Thus, as the number of absences increases, the final exam grade tends to decrease.
In other words, a student's number of absences is closely associated with the final exam grade.