CHI SQUARE TEST IN R - Are student choices dependent on grade?

DATA PREPARATION - CHI SQUARE TEST



What is Chi Square Test ?

The Chi-Squared test is a statistical hypothesis test whose null hypothesis is that the observed frequencies for a categorical variable match the expected frequencies for that variable. The test calculates a statistic that approximately follows a chi-squared distribution, named for the Greek letter chi (χ), pronounced “ki” as in kite.

FORMULA
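
The standard Pearson chi-square statistic can be written as below. The symbols are notation chosen here for illustration: O_i is the observed count in cell i, E_i is the count expected under the null hypothesis, and for a contingency table E_ij is computed from the row total R_i, the column total C_j, and the grand total N.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```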



Types of Chi Square Test

There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:

  • A chi-square goodness of fit test determines whether sample data match a hypothesized population distribution. For more details on this type, see: Goodness of Fit Test.
  • A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether distributions of categorical variables differ from one another.
    • A very small chi square test statistic means that your observed data fit the expected data (the counts expected if the null hypothesis is true) extremely well. In a test for independence, this means there is little evidence of a relationship.
    • A very large chi square test statistic means that the data do not fit the expected counts well. In a test for independence, this is evidence that the variables are related.

Role and Importance of Chi Square Test

The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables. The test procedure described in this lesson is appropriate when the following conditions are met:

  • The sampling method is simple random sampling.
  • The variables under study are each categorical.
  • If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
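
The expected-count condition can be checked directly in R. The following is a minimal sketch that uses the counts from the "Popular Kids" problem in this post. The variable names (observed, expected, all_cells_ok) are chosen here for illustration.

```r
# Observed counts (rows: Grades, Popular, Sports; columns: grade 4, 5, 6)
observed <- matrix(c(49, 50, 69,
                     24, 36, 38,
                     19, 22, 28),
                   nrow = 3, byrow = TRUE)

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))

# TRUE when every expected count is at least 5
all_cells_ok <- all(expected >= 5)
print(all_cells_ok)
```

The same expected counts are also available as the expected component of the object returned by chisq.test().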

PROBLEM

In the dataset "Popular Kids," students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below:

Goals      Grade 4   Grade 5   Grade 6   Total
Grades        49        50        69      168
Popular       24        36        38       98
Sports        19        22        28       69
Total         92       108       135      335

Question: Investigate whether the students' choices depend on grade, using a significance level of 0.05.

HYPOTHESES

H0: Student choices are independent of grade
H1: Student choices are dependent on grade

SOURCE CODE
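
The original listing is not available here, so what follows is only a sketch of how the test could be run with base R, not necessarily the code used for this post. The object names goals and result are assumptions. chisq.test() applies no continuity correction for a table larger than 2 x 2.

```r
# Two-way table of counts: rows are the goals, columns are the grades
goals <- matrix(c(49, 50, 69,
                  24, 36, 38,
                  19, 22, 28),
                nrow = 3, byrow = TRUE)

# Label the rows and columns so the printed table is readable
dimnames(goals) <- list(Goal  = c("Grades", "Popular", "Sports"),
                        Grade = c("4", "5", "6"))
print(goals)

# Pearson chi-square test of independence
result <- chisq.test(goals)
print(result)

# Individual pieces of the result
result$statistic  # X-squared value
result$parameter  # degrees of freedom
result$p.value    # p-value
```

Comparing result$p.value with the 0.05 significance level gives the decision discussed in the Conclusion.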





OUTPUT





Interpretation of Result

P-value(Probability value)

The p-value is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is correct.
The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected.


df (degrees of freedom)

The degrees of freedom are the number of cell counts that can vary freely once the row and column totals are fixed.
For a contingency table with r rows and c columns, df = (r - 1)(c - 1). The table here has 3 rows and 3 columns, so df = (3 - 1)(3 - 1) = 4.


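X-squared value (chi-square value)

The X-squared value is the test statistic: the sum, over all cells of the table, of the squared difference between the observed and expected counts divided by the expected count.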
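
As a rough illustration, the X-squared value, the degrees of freedom, and the p-value can be computed by hand in R from the counts in the problem table. The variable names (observed, expected, x_squared, df, p_value) are chosen here for illustration.

```r
# Observed counts (rows: Grades, Popular, Sports; columns: grade 4, 5, 6)
observed <- matrix(c(49, 50, 69,
                     24, 36, 38,
                     19, 22, 28),
                   nrow = 3, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

print(c(x_squared = x_squared, df = df, p_value = p_value))
```

These values should agree with what chisq.test() reports for the same table, since no continuity correction is applied to a 3 x 3 table.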

Dimnames

The dimnames() command can set or query the row and column names of a matrix.
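
A small sketch of both uses of dimnames(), assuming a matrix named goals that holds the counts from the problem table (the name is chosen here for illustration):

```r
goals <- matrix(c(49, 50, 69,
                  24, 36, 38,
                  19, 22, 28),
                nrow = 3, byrow = TRUE)

# Query: a matrix created without names returns NULL
print(dimnames(goals))

# Set: supply a list with the row names first, then the column names
dimnames(goals) <- list(c("Grades", "Popular", "Sports"),
                        c("4", "5", "6"))

# Query again: the names are now attached to the matrix
print(dimnames(goals))
print(goals["Popular", "5"])
```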

Conclusion

p-value = 0.8244
The p-value (0.8244) is greater than the 0.05 significance level, so there is no significant evidence that student choices depend on grade.
In other words, the data show no significant association between the goal a student considers most important and the student's grade level. The variables (Goals and Grade) appear to be independent of each other.
Hence, we fail to reject the null hypothesis of independence.
