Chi-Square and Nonparametric Statistics
in Dissertation & Thesis Research
There are two types of statistics normally used in dissertations and theses: parametric and nonparametric. A parametric statistic makes a key assumption that your sample was drawn from a normally distributed population. If the scores in your dissertation data are strongly skewed, for example, or your sample is too small to judge how the scores are distributed, then your sample might be violating that assumption. A nonparametric test does not make an assumption about the distribution of scores underlying your sample, and should be used for your analysis in those cases.
Another instance in which you might require a nonparametric test for analysis is when your dependent variable is measured on either a nominal scale (e.g. "yes," "no") or an ordinal scale (e.g. "low," "medium," "high"). If, in your research, you want to look at the relationship between two discrete variables, the appropriate nonparametric statistic is the chi-square test of independence. As part of your dissertation, you may hypothesize that your variable "A" is related to your variable "B". Or, in your analyses, you may find that one of your variables fluctuates depending on a different variable than the one you expected.
For example, let's say that you are trying to determine whether people will buy ("yes" or "no") a particular product. Let's also say you have collected data from people of different ethnicities (e.g. 40 Caucasian, 40 African-American, and 40 Hispanic). One of your analyses could examine whether your sample's ethnicity is related to whether or not they would buy your product.
For your analysis, a chi-square test of independence would provide you with "expected" frequencies of how often persons in your sample of different ethnicities (variable "A") would buy your product (variable "B"), if those two variables were NOT related. When looking at your data, you notice that about half of your Hispanic subjects (21 out of 40) and the majority of your African-American subjects (36 out of 40) would buy your product. Further inspection of your data also reveals that almost none of your Caucasian subjects (3 out of 40) would buy your product.
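Your data would look like this:

                    Would buy   Would not buy   Total
African-American        36            4           40
Hispanic                21           19           40
Caucasian                3           37           40
Total                   60           60          120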
These numbers are the "observed" frequencies for your sample. A chi-square analysis determines whether your "observed" frequencies are sufficiently different from the "expected" frequencies to say that these two variables are, in fact, related. To determine the "expected" frequency for each cell (Fe = "expected frequency"), you multiply the sum of that cell's row by the sum of its column and divide this by the total number of subjects (N): Fe = (row total x column total) / N.
Taking the first cell of the example table above, the sum of the row of African-American subjects is 40. The sum of the column of subjects who would buy your product is 60. Multiply those numbers (40 x 60 = 2,400), and divide the result by the total number of subjects (120) to get an expected frequency of 20. (Note that all of the "expected frequencies" for this example will come out the same, because every ethnic group has 40 subjects and each column sums to 60.) In other words, if ethnicity and buying were not related, you would expect half of the subjects in each ethnic group (20 each, 60 total subjects) to buy your product, and the other half (20 each, 60 total subjects) not to buy it.
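The expected-frequency rule can be sketched in a few lines of Python. This is only an illustration using the counts from the example; the names `observed` and `expected_frequencies` are made up for this sketch and do not come from any statistics package.

```python
# Observed counts from the example: (would buy, would not buy) for each group.
observed = {
    "African-American": (36, 4),
    "Hispanic": (21, 19),
    "Caucasian": (3, 37),
}


def expected_frequencies(table):
    """Return Fe = (row total * column total) / N for every cell."""
    row_totals = {group: sum(counts) for group, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)  # total number of subjects (N)
    return {
        group: tuple(row_totals[group] * col_total / n for col_total in col_totals)
        for group in table
    }


expected = expected_frequencies(observed)
for group, counts in expected.items():
    print(group, counts)
```

Each group should come out with an expected frequency of 20 in both columns, matching the hand calculation of 2,400 divided by 120.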
To find out whether or not your "observed" data are significantly different from what you should expect, and thus provide evidence that your variables are related, you subtract the expected frequency from the observed frequency in each cell, square that difference, and divide it by the expected frequency: ((Fo-Fe)^2)/Fe. You then add up these values across all of the cells in the table to get the chi-square statistic. That statistic is compared with a critical value from a chi-square table, using (number of rows - 1) x (number of columns - 1) degrees of freedom. If the resulting chi-square is smaller than the critical value, that means your observed data are not significantly different from what you would expect your sample data to be. That is, you have no evidence of a relationship between your sample's ethnicity and their decision about buying your product. If your chi-square is larger than the critical value, that means there is evidence of a relationship between your sample's ethnicity and whether or not they would buy your product.
Going back to the example data, 36 African-American subjects said they would buy your product (Fo=36), but only 20 African-American subjects would be expected to buy your product (Fe=20). Plugging these example numbers into the formula gives (36-20)^2/20 = 12.8 for that cell. Repeating the calculation for the other five cells (12.8, 0.05, 0.05, 14.45, and 14.45) and adding everything up, we find that the chi-square value is 54.6. With (3-1) x (2-1) = 2 degrees of freedom, the critical value at the .05 level is 5.99. Compared with that, 54.6 is a large number, so your two variables -- the sample's ethnicity and the sample's decision whether or not to buy your product -- are related. Your dissertation results are significant!
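As a check on the arithmetic, here is a minimal Python sketch of the whole test. It uses the standard Pearson form of the statistic, in which each (Fo-Fe) difference is squared, divided by Fe, and summed over every cell. The function name `chi_square_independence` is made up for this illustration, and the critical value of 5.991 (2 degrees of freedom, .05 level) is copied from a standard chi-square table, not computed.

```python
# Observed counts: rows are African-American, Hispanic, Caucasian;
# columns are "would buy" and "would not buy".
observed = [
    [36, 4],
    [21, 19],
    [3, 37],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (fo - fe) ** 2 / fe        # squared difference over Fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


chi2, df = chi_square_independence(observed)

# Assumption: table value for df = 2 at the .05 level.
CRITICAL_VALUE_05_DF2 = 5.991
significant = chi2 > CRITICAL_VALUE_05_DF2

print(f"chi-square = {chi2:.2f}, df = {df}, significant at .05: {significant}")
```

For the example data this should give a chi-square of about 54.6 with 2 degrees of freedom. In a real analysis you would more likely rely on a statistics package, which also reports an exact p-value.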