Chi-Square Test in R and Interpretation – R tutorial


R provides several methods for testing the independence of categorical variables. Three common ones are the chi-square test of independence, the Fisher exact test, and the Cochran-Mantel-Haenszel test. This tutorial covers the chi-square test of independence.

The chi-square test of independence is a statistical method used to determine whether two categorical variables are significantly associated. Both variables are measured on the same sample from one population, and each observation falls into exactly one category per variable, such as Male/Female or True/False.

The function chisq.test() performs this test. I will show an example using the Arthritis dataset that ships with the vcd package. You can also import your own data into R from a CSV, Excel or SPSS file. We will also see how to interpret the results of the chi-square test.
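As a rough sketch of the CSV route, the snippet below writes the Arthritis data to a temporary file, reads it back with read.csv(), and runs the same test on the imported data frame. The temporary file only stands in for a CSV of your own, and the names csv_path, arthritis_csv and csv_result are illustrative. It assumes the vcd package is already installed.

```r
library(vcd)  # provides the Arthritis data frame

# A temporary file stands in for your own CSV file (hypothetical path)
csv_path <- tempfile(fileext = ".csv")
write.csv(Arthritis, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE turns text columns into factors (categorical variables)
arthritis_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

# Cross-tabulate the two categorical columns to see the contingency table
table(arthritis_csv$Treatment, arthritis_csv$Improved)

# Run the chi-square test of independence on the imported columns
csv_result <- chisq.test(arthritis_csv$Treatment, arthritis_csv$Improved)
csv_result
```

The order of the factor levels may change after the round trip through CSV, but the chi-square statistic does not depend on the order of rows or columns.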

Hypotheses of Chi-Square test

Null hypothesis – Assumes that there is no association between the two variables.

Alternative hypothesis – Assumes that there is an association between the two variables.

Let us see an example now.

Example

To install the vcd package, use the command install.packages("vcd"). Then use the following code to perform the chi-square test in R on two different pairs of variables and to see when to reject the null hypothesis and when not to.

> library(vcd)
> chisq.test(Arthritis$Treatment,Arthritis$Improved)

        Pearson's Chi-squared test

data:  Arthritis$Treatment and Arthritis$Improved
X-squared = 13.055, df = 2, p-value = 0.001463
> chisq.test(Arthritis$Improved,Arthritis$Sex)

        Pearson's Chi-squared test

data:  Arthritis$Improved and Arthritis$Sex
X-squared = 4.8407, df = 2, p-value = 0.08889
Warning message:
In chisq.test(Arthritis$Improved, Arthritis$Sex) :
  Chi-squared approximation may be incorrect

From the result of chisq.test(Arthritis$Treatment,Arthritis$Improved), there appears to be a relationship between treatment received and level of improvement. We come to this conclusion because the p-value (0.001463) is less than 0.01. Hence, we reject the null hypothesis in favour of the alternative hypothesis.
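To see where these numbers come from, here is a minimal sketch that reproduces the Pearson statistic by hand for the treatment and improvement table. The variable names (observed, expected, x_squared, deg_free, p_value) are my own. The result should agree with the X-squared, df and p-value printed by chisq.test() for the same pair of variables.

```r
library(vcd)  # provides the Arthritis data frame

# Observed counts: rows are treatments, columns are improvement levels
observed <- table(Arthritis$Treatment, Arthritis$Improved)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = deg_free, lower.tail = FALSE)

c(X_squared = x_squared, df = deg_free, p_value = p_value)
```

A large statistic means the observed counts are far from what independence would predict, which is why it corresponds to a small p-value.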

But the result of chisq.test(Arthritis$Improved,Arthritis$Sex) shows no evidence of a relationship between patient sex and improvement, because the p-value (0.08889) is greater than 0.05. Hence, we fail to reject the null hypothesis. Note that this does not prove the two variables are independent; it only means the data do not provide enough evidence of an association.

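The warning message is produced because one of the six cells in the table (male-some improvement) has an expected count of less than five, which may invalidate the chi-square approximation. To check this, inspect the expected counts stored in the test result, for example with chisq.test(Arthritis$Improved,Arthritis$Sex)$expected. Note that head(Arthritis) only shows the first rows of the raw data, not the expected counts.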
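A short sketch of that check, followed by one possible alternative: the Fisher exact test, available in base R as fisher.test(), does not rely on the large-sample approximation. The names sex_table, result and exact_result are illustrative, and suppressWarnings() is used here only to keep the output clean.

```r
library(vcd)  # provides the Arthritis data frame

# Contingency table of improvement level by patient sex
sex_table <- table(Arthritis$Improved, Arthritis$Sex)

# Run the test; suppressWarnings() hides the approximation warning
result <- suppressWarnings(chisq.test(sex_table))

# Expected counts under independence, one per cell
result$expected

# Row and column positions of cells whose expected count is below 5
which(result$expected < 5, arr.ind = TRUE)

# Fisher's exact test on the same table as an alternative
exact_result <- fisher.test(sex_table)
exact_result
```

If the exact test leads to the same conclusion as the chi-square test, the warning is less of a concern for this table.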

So, this is how you can perform a Chi-Square test in R and interpret the result.