POL242 LAB MANUAL:  Lab 11

Chi-Square

PURPOSE

  • Introduce statistical significance.
  • Learn how to perform the Chi-square test for significance.
  • Learn how sample size and standard deviation affect the level of significance.

MAIN POINTS

Significance

  • Our studies are typically based on samples for reasons of economy.
  • All of the public opinion studies used in our labs and assignments thus far represent only a sample of a population.
  • Generally, we use sample findings as a basis for inferences about some population.
  • The sample (n) consists only of those individuals who were directly surveyed for the study.
  • The population (N) consists of the wider collection of individuals or cases about which we want to generalize.
  • To judge whether a result found in a sample can be generalized to the broader population, we test for statistical significance.
  • In calculating significance, we ask whether the result obtained in the sample reflects a real pattern in the population or could simply be due to chance (sampling error).
  • Inferential statistics such as Chi-square and ANOVA address sampling error only, not errors in question construction or errors in the coding and weighting of the data.
  • Although based on different sampling distributions, Chi-square and ANOVA use the same standards for interpretation.
  • This lab discusses Chi-square; Lab 12 takes up ANOVA. Lab 13, on t-tests, will not be covered in class, but you may work through it on your own.

Chi-Square

  • To determine whether a relationship in a cross tabulation is significant (not due to sampling error), use the Chi-square (χ²) test.
  • χ² works by comparing the observed frequencies in a table with the frequencies we would expect to find if there were no association between the variables (that is, if the percentages were the same across the columns). The greater the difference between what we observe and what we would expect under no association, the less likely it is that the difference is due to sampling error.
  • Cells should not have an expected count of fewer than 5 cases.
  • χ² indicates only whether an observed relationship may be due to sampling error. As such, χ² is not a measure of the strength of a relationship. Nor does it indicate the direction of a relationship.
  • If the probability level is less than .05, we conventionally infer that the observed relationship may be generalized to the population.
  • Statistical significance depends in part on sample size: as the number of cases (n) grows, statistical significance is easier to achieve, and as the sample size shrinks, significance is more difficult to find.
  • A significant Chi-square tells us only that there is a significant difference in the table, not specifically where that difference is.
  • As we will see in the next lab, we can determine which specific categories on the independent variable have significantly different values on the dependent variable by using ANOVA.
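
The comparison of observed and expected frequencies described above is usually written as the Pearson Chi-square statistic. As a sketch in standard notation (the symbols O, E, r, and c are introduced here for illustration and do not appear in the SPSS output):

```latex
% O_{ij}: observed count in row i, column j
% E_{ij}: count expected in that cell if the variables were unrelated
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad
df = (r - 1)(c - 1)
```

A table with 3 rows and 4 columns, for example, has (3 - 1)(4 - 1) = 6 degrees of freedom. Because each cell's squared difference is divided by its expected count, cells with very small expected counts can dominate the statistic, which is one reason for the minimum of 5.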

CONVENTIONAL STANDARDS FOR STATISTICAL SIGNIFICANCE

Probability level | Description | Acceptability | General comments | Chi-square interpretation
.00 to .01 | Highly significant | Generally acceptable | The difference in the sample is very likely representative of the population, not due to chance. | The differences in the dependent variable across the categories of the independent variable are very unlikely to be due to chance.
.01 to .05 | Significant | Conventionally acceptable | The difference in the sample is likely representative of the population. | The differences in the dependent variable across the categories of the independent variable are unlikely to be due to chance.
.05 to .10 | Marginally significant | Acceptable in some circumstances | The difference may well be due to chance. | The differences in the dependent variable across the categories of the independent variable may be due to chance.
.10 to 1.00 | Non-significant | Not acceptable | The measurement in the sample is likely due to chance. | The differences in the dependent variable across the categories of the independent variable are likely due to chance.

EXAMPLE

  • Dataset:
    • CES2011
  • Independent Variable:
    • Party ID
  • Dependent Variable:
    • Egalitarian Attitudes
  • Hypothesis Arrow Diagram:
    • Party ID → Egalitarian

STEP ONE: CHI-SQUARE TEST FOR SIGNIFICANCE

  • Syntax
*Weighting the Data*.
weight by WGTSamp.

*Preparing indicators of Attitudes re Inequality*.
*declare missing values on pes11_41*.
missing values pes11_41 (8,9).

*reverse scoring on pes11_41 and make it range from 0-1*.
recode PES11_41 (1=1) (2=.75) (3=.5) (4= .25) (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow' .75 'somemore' 1 'muchmore'.

*rescale mbs11_k2 from 0-10 to 0-1 and reverse its scoring*.
missing values mbs11_k2 (-99).
compute govact = (((mbs11_k2 * -1) +10)/10).
value labels govact 0 'not act' 1 'gov act'.

*recode and re-label mbs11_b3*.
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.

*create an indexed variable (alpha=.66)*.
compute rawegal = undogap + govact + goveqch.

*recode the new index into three categories*.
recode rawegal (0 thru 2.10=0)(2.15 thru 2.50=.5)
  (2.55 thru 3= 1) into egal3.
value labels egal3 0 'low' .5 'med' 1 'hi'.

*Preparing X indicator-party identification*.
recode cps11_71 (2=1) (1=2) (4=3) (3=4) into PID4.
value labels PID4 1 'Cons' 2 'Lib' 3 'BQ' 4 'NDP'.

*Crosstabular analysis*.
crosstabs tables = egal3 by PID4
  /cells = column count
  /statistics = phi chisq.

  • Syntax Legend
    • Note that data have been weighted
    • Syntax has been edited to exclude reliability and frequency analyses completed in previous labs
    • The DV has been renamed egal3
    • Crosstabulation is a necessary prelude to Chi-Square
    • The chisq specification is added to the /statistics subcommand following the measure of association.

 

  • Output

egal3 by PID4 (column percentages)
            Cons     Lib      BQ       NDP
low         56.8%    28.8%    11.3%    17.8%
med         25.8%    38.2%    40.8%    29.9%
hi          17.4%    33.0%    47.9%    52.3%
Total (n)   236      212      71       107

Chi-Square Tests
                     Value      df   Sig. (2-sided)
Pearson Chi-Square   96.345a    6    .000
N of Valid Cases     626
  a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 22.80.
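
The minimum expected count in the footnote can be checked by hand. This is a rough reconstruction, not SPSS output: the crosstab reports only column percentages and column totals, so the size of the 'hi' row (about 201 weighted cases, the smallest row by this reconstruction) is inferred here rather than read from the table.

```latex
% 'hi' row total, rebuilt from the column percentages and column totals
n_{hi} \approx 0.174(236) + 0.330(212) + 0.479(71) + 0.523(107) \approx 201

% smallest expected count: smallest row (hi) times smallest column (BQ), over n
E_{hi,BQ} = \frac{201 \times 71}{626} \approx 22.80
```

The result agrees with the 22.80 that SPSS reports, well above the minimum of 5.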

Interpretation

    • Glancing at the crosstab, we can see that at least some of the cell-percentages differ as we scan across the rows. This suggests that there may be a relationship between the two variables.
    • The value of Chi-square (96.3) and degrees of freedom (6) are essential for calculating statistical significance but not for its interpretation.
    • In interpreting the results, the most important figure in the output is the significance (Sig. 2-sided), which is .000. SPSS rounds to three decimals, so .000 means the probability is below .0005: if PID4 and egal3 were unrelated in the population, there would be less than a .05% chance of finding a relationship this strong in the sample through sampling error alone.
    • We can conclude that there is a statistically significant relationship between party identification and egalitarian attitudes.
    • We do not know, however, which columns differ significantly (beyond what one would expect due to chance).
    • We can push our analysis further using ANOVA to identify which specific differences are significant. This will be discussed in Lab 12.
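
One way to look at a single pair of columns is to rerun the crosstab on just two parties. This is a sketch, assuming the PID4 codes from the syntax above (1 = Cons, 4 = NDP). The choice of these two parties is only an illustration.

```spss
*Compare only Conservative (1) and NDP (4) identifiers*.
*temporary limits the selection to the next procedure only*.
temporary.
select if (PID4 = 1 or PID4 = 4).
crosstabs tables = egal3 by PID4
  /cells = column count
  /statistics = phi chisq.
```

With only two columns in the table, a significant Chi-square does indicate that these two groups differ. Repeating this for every pair of parties quickly becomes tedious.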

INSTRUCTIONS (stage one)

  1. Begin by selecting one of the available datasets and hypothesize a relationship between two variables. Either variable can be measured at the nominal, ordinal, or interval level.
  2. Identify missing values and essential recodes using Frequency runs.
  3. Prepare a crosstab analysis as usual.
  4. Include chisq on the /statistics subcommand along with the appropriate measure of association.
  5. Determine whether you can infer that a relationship between the two variables exists in the population based on what you observe in the sample by referring to the Significance for the Pearson’s Chi-square. Use the same guidelines as in the previous labs.
  6. Repeat the steps above until you find a pair of variables that yield a significant relationship for the Chi-Square test.

QUESTIONS FOR REFLECTION

  • Even when the relationship in a cross tabulation is clearly significant, is it possible for the variables not to be strongly related?
  • Since the level of significance improves as we increase the sample size, why do surveys usually limit sample size?
  • The significance of Chi-square applies to the table as a whole, but do we know which specific columns differ significantly from one another?
  • What further analyses can we conduct which will enable us to do so?

DISCUSSION

  • Statistical significance and strength (or predictability) of association are two different things. One can have a weak relationship that is statistically significant or a strong relationship that is not significant. Notice too that sample size affects Chi-square.
  • Beyond a certain sample size, adding more cases does not much improve the significance level. At that point, the marginal benefit of increasing the sample size has to be weighed against the cost of gathering more data. The tipping point is somewhere between 1500 and 2000 cases in most surveys.
  • A Chi-square analysis does not tell us which columns differ significantly unless, of course, the independent variable has only two columns.
  • To determine which specific columns in a multi-column crosstabulation differ from one another requires another approach.
  • An inefficient approach would be to construct a series of two column tables.
  • A more efficient way to proceed is to turn to Analysis of Variance, the topic of Lab 12.
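
The effect of sample size on Chi-square can be seen with a classroom trick: artificially doubling every case weight. This is a sketch, assuming the WGTSamp weight and the egal3 and PID4 variables from the example. The name wgt2 is hypothetical, and the doubled weights are for demonstration only, never for reporting results.

```spss
*Pretend the sample is twice as large by doubling each weight*.
compute wgt2 = WGTSamp * 2.
weight by wgt2.
crosstabs tables = egal3 by PID4
  /cells = column count
  /statistics = phi chisq.

*Restore the original weight before any further analysis*.
weight by WGTSamp.
```

The column percentages and Cramer's V should stay essentially the same, while the Chi-square value should roughly double. The same pattern in the data looks more significant simply because n is larger, which is why significance and strength have to be judged separately.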