Boston University School of Public Health

Introduction

This module continues the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive.
Two-Way Tables and the Chi-Square Test

When analysis of categorical data is concerned with more than one variable, two-way tables (also known as contingency tables) are employed. These tables provide a foundation for statistical inference, where statistical tests question the relationship between the variables on the basis of the data observed.
Example

In the dataset "Popular Kids," students in grades 4 through 6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below:

            Grade
Goals       4      5      6      Total
Grades      49     50     69     168
Popular     24     36     38     98
Sports      19     22     28     69
Total       92     108    135    335

To investigate possible differences among the students' choices by grade, it is useful to compute the column percentages for each choice, as follows:

            Grade
Goals       4      5      6
Grades      53     46     51
Popular     26     33     28
Sports      21     20     21
Total       100    99     100

The percentages in the second column sum to 99 rather than 100 because of rounding.
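The column percentages can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `column_percentages` are illustrative choices, not part of the original dataset or module.

```python
# Observed counts from the "Popular Kids" table.
# Keys are the goals (rows); list positions are grades 4, 5 and 6 (columns).
observed = {
    "Grades":  [49, 50, 69],
    "Popular": [24, 36, 38],
    "Sports":  [19, 22, 28],
}


def column_percentages(table):
    """Return each cell as a whole-number percentage of its column total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [round(100 * row[j] / col_totals[j]) for j in range(n_cols)]
        for label, row in table.items()
    }


percentages = column_percentages(observed)
for label, row in percentages.items():
    print(label, row)

# Column sums of the rounded percentages; rounding can make a column miss 100.
print([sum(col) for col in zip(*percentages.values())])
```

Rounding each cell independently is what allows a column of percentages to add up to slightly more or less than 100.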
The column percentages suggest little variation in preference across the three grades. The chi-square test provides a method for testing the association between the row and column variables in a two-way table. The null hypothesis H0 assumes that there is no association between the variables (in other words, one variable does not vary according to the other variable), while the alternative hypothesis Ha claims that some association does exist.
The alternative hypothesis does not specify the type of association, so close attention to the data is required to interpret the information provided by the test. The chi-square test is based on a test statistic that measures the divergence of the observed data from the values that would be expected under the null hypothesis of no association.
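This requires calculation of the expected values based on the data. The expected value for each cell in a two-way table is (row total × column total) / n, where n is the total number of observations in the table.

Example

Continuing from the above example with the two-way table for students' choice of grades, athletic ability, or popularity by grade, the expected count of fourth graders choosing grades is 168 × 92 / 335 ≈ 46.1, and the remaining cells are calculated in the same way. Once the expected values have been computed (done automatically in most software packages), the chi-square test statistic is computed as

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}},$$

where the squared difference between the observed and expected values in each cell, divided by the expected value, is added across all of the cells in the table.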
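A minimal sketch of the calculation for the grade table, using the standard expected-count rule (row total × column total / n). The function names `expected_counts` and `chi_square_statistic` are illustrative, not taken from any particular package.

```python
# Observed counts: rows are Grades, Popular, Sports; columns are grades 4, 5, 6.
observed = [
    [49, 50, 69],
    [24, 36, 38],
    [19, 22, 28],
]


def expected_counts(table):
    """Expected count per cell under no association: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_square_statistic(observed)

for row in expected:
    print([round(e, 1) for e in row])
print(round(statistic, 2))
```

The expected counts always add up to the same total n as the observed counts, which is a quick sanity check on the arithmetic.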
The distribution of the statistic X² is chi-square with (r-1)(c-1) degrees of freedom, where r represents the number of rows in the two-way table and c represents the number of columns. The distribution is denoted χ²(df), where df is the number of degrees of freedom.
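The chi-square distribution is defined for all non-negative values.

Example

The chi-square statistic for the above example is computed as follows:

X² = (49 - 46.1)²/46.1 + (50 - 54.2)²/54.2 + ... + (28 - 27.8)²/27.8 ≈ 1.51

With (3-1)(3-1) = 4 degrees of freedom, P(χ² > 1.51) ≈ 0.82. This large p-value gives no evidence of an association between the choice of most important factor and the grade of the student: the differences between the observed values and those expected under the null hypothesis are small enough to be attributed to chance.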
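The upper-tail probability for a 3 × 3 table can be checked without a statistics package, because the chi-square survival function has a closed form when the degrees of freedom are even. The sketch below relies on that identity; the helper name `chi_square_sf_even_df` is hypothetical, and the input 1.51 is the rounded statistic for the grade table.

```python
import math


def chi_square_sf_even_df(x, df):
    """P(chi-square > x) for an even number of degrees of freedom.

    For df = 2m the upper tail equals exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


rows, cols = 3, 3
df = (rows - 1) * (cols - 1)           # degrees of freedom for an r x c table
p_value = chi_square_sf_even_df(1.51, df)

print(df, round(p_value, 2))
```

For odd degrees of freedom this shortcut does not apply, and a library routine for the chi-square distribution would be the usual choice.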
Example The "Popular Kids" dataset also divided the students' responses into "Urban," "Suburban," and "Rural" school areas. Is there an association between the type of school area and the students' choice of good grades, athletic ability, or popularity as most important?
A two-way table for student goals and school area appears as follows:

            School Area
Goals       Rural    Suburban    Urban    Total
Grades      57       87          24       168
Popular     50       42          6        98
Sports      42       22          5        69
Total       149      151         35       335

The corresponding column percentages are the following:

            School Area
Goals       Rural    Suburban    Urban
Grades      38       58          69
Popular     34       28          17
Sports      28       14          14
Total       100      100         100

[Figure: barplots comparing the percentages of students' choices by school area]

From the table and corresponding graphs, it appears that the emphasis on grades increases as the school areas become more urban, while the emphasis on popularity decreases.
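Is this association significant? Computing the chi-square statistic for this table gives X² ≈ 18.56 on (3-1)(3-1) = 4 degrees of freedom, with a p-value of about 0.001. The null hypothesis of no association is therefore rejected at the usual significance levels, and we can conclude that the urban students' increased emphasis on grades is unlikely to be due to random variation.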
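Because the alternative hypothesis does not say where an association lies, it can help to look at each cell's contribution to the statistic. The following sketch does this for the school-area table; the variable names are illustrative, and the p-value uses the closed form exp(-x/2)(1 + x/2), which is valid only for 4 degrees of freedom.

```python
import math

goals = ["Grades", "Popular", "Sports"]
areas = ["Rural", "Suburban", "Urban"]

# Observed counts: rows follow `goals`, columns follow `areas`.
observed = [
    [57, 87, 24],
    [50, 42, 6],
    [42, 22, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = []
for i, row in enumerate(observed):
    contributions.append([])
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        contributions[i].append((o - e) ** 2 / e)

statistic = sum(sum(row) for row in contributions)

# Upper-tail probability for exactly 4 degrees of freedom.
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

# Cell that departs most from the no-association expectation.
largest = max(
    (contributions[i][j], goals[i], areas[j])
    for i in range(len(goals))
    for j in range(len(areas))
)

print(round(statistic, 2), round(p_value, 4))
print(largest[1], largest[2], round(largest[0], 2))
```

Cells with large contributions are the ones where the observed count is furthest, in relative terms, from what independence would predict.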
The chi-square index in the Statlib Data and Story Library (DASL) provides several other examples of the use of the chi-square test in categorical data analysis.

Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem.
A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent. A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.
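Without other qualification, 'chi-squared test' often is used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories.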
A chi-square test is used to compare observed data with the data expected under a specific hypothesis. For instance, if you were crossbreeding two heterozygous pea plants, you would expect to see a 3:1 phenotypic ratio in the offspring (assuming a single gene with complete dominance).
G–tests are a subclass of likelihood ratio tests, a general category of tests that have many uses for testing the fit of data to mathematical models; the more elaborate versions of likelihood ratio tests don't have equivalent tests using the Pearson chi-square statistic.
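For comparison, the G statistic for a two-way table is commonly written as G = 2 Σ observed × ln(observed / expected), with the same expected counts as Pearson's statistic. The sketch below computes both for the "Popular Kids" grade table; it assumes every observed count is positive, and the variable names are illustrative.

```python
import math

# Observed counts: rows are Grades, Popular, Sports; columns are grades 4, 5, 6.
observed = [
    [49, 50, 69],
    [24, 36, 38],
    [19, 22, 28],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pair each observed count with its expected count under no association.
cells = [
    (o, row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, o in enumerate(row)
]

pearson = sum((o - e) ** 2 / e for o, e in cells)
g_statistic = 2 * sum(o * math.log(o / e) for o, e in cells)  # requires o > 0

print(round(pearson, 2), round(g_statistic, 2))
```

When observed and expected counts are close, as they are here, the two statistics tend to be nearly equal; both are referred to the same chi-square distribution.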
Pearson's chi-squared test is used to assess three types of comparison: goodness of fit, homogeneity, and independence. A test of goodness of fit establishes whether an observed frequency distribution differs from a theoretical distribution.
Chi Square Goodness of Fit (One Sample Test)

This test allows us to compare a collection of categorical data with some theoretical expected distribution. This test is often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory.
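As an illustration, the sketch below runs a goodness-of-fit test on a made-up cross. The counts (290 and 110 offspring) are hypothetical, and the 3:1 expected ratio is an assumption corresponding to a single gene with complete dominance; the function name `goodness_of_fit` is likewise illustrative.

```python
import math


def goodness_of_fit(observed, expected_proportions):
    """Return the chi-square statistic and expected counts for one sample."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


# Hypothetical counts: dominant phenotype, recessive phenotype.
observed = [290, 110]
statistic, expected = goodness_of_fit(observed, [0.75, 0.25])

# With k categories and no estimated parameters, df = k - 1.
df = len(observed) - 1

# For exactly 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(expected, round(statistic, 3), df, round(p_value, 3))
```

With these invented counts the p-value is well above 0.05, so the data would be judged consistent with the 3:1 ratio.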