R provides several methods for testing the independence of categorical variables. In my tutorial, I will show three tests: the chi-square test of independence, the Fisher exact test, and the Cochran-Mantel-Haenszel test.
The Chi-Square test of independence is a statistical method used to determine whether two categorical variables are significantly associated. Both variables are measured on the same sample, drawn from a single population, and each one takes a small set of categories such as Male/Female or True/False.
chisq.test() is used to perform this test. I will show an example with the built-in Arthritis data set from the
vcd package. You can also import your own data into R from a CSV, Excel or SPSS file. We will also see how to interpret the results of the Chi-Square test.
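As a rough sketch of the import route, the snippet below writes a tiny made-up CSV to a temporary file, reads it back with read.csv(), and cross-tabulates two columns. The file, the column names group and outcome, and all the values are hypothetical placeholders, not data from this tutorial. Excel and SPSS files need add-on packages (readxl and haven are common choices), which are not shown here.

```r
# Hypothetical data: the column names and values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "group,outcome",
  "A,yes", "A,no", "A,yes", "A,yes",
  "B,no",  "B,no", "B,yes", "B,no"
), csv_path)

# Read the file; stringsAsFactors = TRUE turns text columns into factors.
survey <- read.csv(csv_path, stringsAsFactors = TRUE)

# Cross-tabulate the two categorical columns into a contingency table.
counts <- table(survey$group, survey$outcome)
print(counts)
```

A table built this way can be passed straight to chisq.test(), although a sample this small is only good for showing the mechanics.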
Hypotheses of Chi-Square test
Null hypothesis – Assumes that there is no association between the two variables.
Alternative hypothesis – Assumes that there is an association between the two variables.
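To make the null hypothesis concrete, here is a minimal sketch that computes the counts we would expect if there were no association, and then the Pearson statistic, by hand. The 2x2 table and all variable names are made up for illustration. The last line compares the hand calculation with chisq.test(), with the continuity correction that R applies to 2x2 tables switched off.

```r
# Hypothetical 2x2 table of counts (invented for illustration).
observed <- matrix(c(30, 10,
                     20, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Group = c("A", "B"),
                                   Answer = c("Yes", "No")))

# Expected counts if the two variables were independent:
# (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Same test with the built-in function, continuity correction off.
builtin <- chisq.test(observed, correct = FALSE)
```

The further the observed counts are from the expected ones, the larger the statistic and the smaller the p-value.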
Let us see an example now.
To install the vcd package, use the command
install.packages("vcd"). Then use the following code to perform the Chi-Square test in R on two different pairs of variables and to see when to reject the null hypothesis and when not to.
> library(vcd)
> chisq.test(Arthritis$Treatment,Arthritis$Improved)

        Pearson's Chi-squared test

data:  Arthritis$Treatment and Arthritis$Improved
X-squared = 13.055, df = 2, p-value = 0.001463

> chisq.test(Arthritis$Improved,Arthritis$Sex)

        Pearson's Chi-squared test

data:  Arthritis$Improved and Arthritis$Sex
X-squared = 4.8407, df = 2, p-value = 0.08889

Warning message:
In chisq.test(Arthritis$Improved, Arthritis$Sex) :
  Chi-squared approximation may be incorrect
From the result of
chisq.test(Arthritis$Treatment,Arthritis$Improved), there appears to be a relationship between treatment received and level of improvement. We come to this conclusion because the p-value (0.001463) is less than 0.01. Hence, we reject the null hypothesis in favour of the alternative hypothesis.
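The decision can also be read off the test object in code. The sketch below is one way to do it, under a few assumptions: the counts are typed in by hand so that it runs without vcd, and they are meant to mirror table(Arthritis$Treatment, Arthritis$Improved), which is worth verifying on your own machine. The names treat_improved, alpha and decision are mine.

```r
# Counts typed in by hand; assumed to match
# table(Arthritis$Treatment, Arthritis$Improved) from the vcd package.
treat_improved <- matrix(c(29, 7,  7,
                           13, 7, 21),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(Treatment = c("Placebo", "Treated"),
                                         Improved  = c("None", "Some", "Marked")))

# chisq.test() accepts a matrix of counts as well as two factors.
result <- chisq.test(treat_improved)

alpha <- 0.05  # significance level, chosen before looking at the data

# The result is a list; the p-value is stored in result$p.value.
if (result$p.value < alpha) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "fail to reject the null hypothesis"
}
cat("p-value:", signif(result$p.value, 4), "->", decision, "\n")
```

Other components of the result, such as result$statistic and result$parameter (the degrees of freedom), can be pulled out in the same way.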
But the result of
chisq.test(Arthritis$Improved,Arthritis$Sex) shows no evidence of a relationship between patient sex and improvement, because the p-value (0.08889) is greater than 0.05. Hence, we fail to reject the null hypothesis. This does not prove that the two variables are independent; it only means the data do not show a significant association.
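The warning message is produced because one of the six cells in the table (male, some improvement) has an expected count of less than five, which may invalidate the chi-square approximation. Note that
head(Arthritis) only shows the first six rows of the data. To check the expected counts themselves, look at the expected component of the test result, for example chisq.test(Arthritis$Improved,Arthritis$Sex)$expected.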
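Here is a minimal sketch of how the expected counts behind that warning can be inspected. As an assumption, the counts are typed in by hand to mirror table(Arthritis$Improved, Arthritis$Sex), so the code runs without vcd. The last lines run the Fisher exact test mentioned at the start of this tutorial, which is one possible alternative when expected counts are small.

```r
# Counts typed in by hand; assumed to match
# table(Arthritis$Improved, Arthritis$Sex) from the vcd package.
improved_sex <- matrix(c(25, 17,
                         12,  2,
                         22,  6),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(Improved = c("None", "Some", "Marked"),
                                       Sex      = c("Female", "Male")))

# suppressWarnings() hides the approximation warning discussed above.
result <- suppressWarnings(chisq.test(improved_sex))

# Expected counts under independence; look for cells below 5.
print(round(result$expected, 2))
n_small <- sum(result$expected < 5)
cat("Cells with expected count below 5:", n_small, "\n")

# Fisher's exact test does not rely on the large-sample approximation.
exact <- fisher.test(improved_sex)
print(exact$p.value)
```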
So, this is how you can perform a Chi-Square test in R and interpret the result.