Lab19 | Data Art

POL242 LAB 19

Control Tables

PURPOSE

To learn how to control for a third variable with crosstabulations
To learn how to compare partial relationships against the original and to distinguish among types of relationships. Part 1 will discuss replication and specification, the most common results with control tables. Part 2 will discuss understanding spurious relationships as either explanation or interpretation.

MAIN POINTS

Control tables allow us to examine the relationship between two variables while controlling for a third variable. We do this in order to determine whether the control variable has an effect on the original relationship.
When we use a control variable the original relationship is separated into partial relationships. The number of partial relationships obtained corresponds to the number of categories of the control variable.
- For example, if we crosstabulate egalitarian attitudes (the dependent variable) with the independent variable Party Identification while controlling for French (language group), we will obtain two partial tables.
- Assuming language group is coded with French as the high score (1) and English as the low score (0), the first partial table would show the relationship between egalitarianism and Party Identification among English respondents.
- The second partial would show the relationship between the two variables among French respondents.
Partial tables can reveal several different types of results. The most common results are replication and specification. They are discussed in this part of the lab. Explanation and Interpretation will be discussed in the next part.
- With Replication: The partial relationships are roughly the same as in the original crosstabulation. This means that the original relationship holds true even when taking into account the control variable.
  - For example, if the partial relationships between Egalitarianism and PartyID for both English and French respondents are essentially the same as the original relationship, then we would conclude that the result is replication.
  - Thus we might find that PartyID predicts egalitarianism equally well for both French and English speakers.
- With Specification: One (or more) of the partial relationships becomes stronger and the other(s) become(s) weaker. This means that we have specified the conditions under which the original relationship occurs.
  - For an example, let’s return to the relationship between Egalitarianism and PartyID. If we control for the effect of education and find that the relationship is more pronounced among the more educated than among the less educated, then we have found specification.
  - This means that we have ‘specified’ certain conditions under which the relationship is stronger.
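
The way a control variable splits the original relationship into partial tables can be sketched in a few lines. Python is used here purely for illustration (the lab itself uses SPSS), and the six records below are invented toy cases, not CES 2011 data. The function name partial_tables is hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy records: (egalitarianism, party, french). NOT CES 2011 data.
records = [
    ("hi", "NDP", 0), ("low", "Cons", 0), ("med", "Lib", 0),
    ("hi", "BQ", 1), ("med", "BQ", 1), ("low", "Cons", 1),
]

def partial_tables(rows):
    """Return one table of (dependent, independent) counts per control category."""
    tables = defaultdict(Counter)
    for dependent, independent, control in rows:
        tables[control][(dependent, independent)] += 1
    return dict(tables)

partials = partial_tables(records)
for control_value in sorted(partials):
    # One partial table per category of the control variable
    print("french =", control_value, dict(partials[control_value]))
```

Because french has two categories (0 and 1), the sketch produces two partial tables, one for English and one for French respondents.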

EXAMPLE 1: replication

Variables, Missing Values & Recodes

Dataset: CES 2011
Independent Variable: PartyID
Dependent Variable: Egalitarianism
Control Variable: French (language group)

Control Table Syntax

weight by WGTSamp.
*Preparing indicators of Attitudes re Inequality*.
*declare missing values on pes11_41*.
missing values pes11_41 (8,9).
*reverse scoring on pes11_41 and make it range from 0-1*.
recode pes11_41 (1=1) (2=.75) (3=.5) (4= .25)
   (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow'
   .75 'somemore' 1 'muchmore'.
*rescale mbs11_k2 from 0-10 to 0-1 and reverse its scoring*.
missing values mbs11_k2 (-99).
compute govact = (((mbs11_k2 * -1) +10)/10).
value labels govact 0 'not act' 1 'gov act'.
*recode and re-label mbs11_b3*.
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.

*create an indexed variable (alpha=.66).
compute rawegal = undogap + govact + goveqch.
*recode index into 3 values for crosstabs*.
recode rawegal (0 thru 2.10=0)(2.15 thru 2.50=.5)
   (2.55 thru 3= 1) into egal3.
value labels egal3 0 'low' .5 'med' 1 'hi'.

*Preparing IV indicator-party identification from Lab 13*.
recode cps11_71 (2=1) (1=2) (4=3) (3=4) into PID4.
value labels PID4 1 'Cons' 2 'Lib' 3 'BQ' 4 'NDP'.

*Create Language indicator*.
recode cps_intlang11 (1=0) (5=1) into french.

*Crosstab and Control Tabs*.
crosstabs tables = egal3 by PID4
   /egal3 by PID4 by French
   /cells = column COUNT
   /statistics = phi chisq.

Selected Output

egal3 by PID4 (column percentages)

                 PID4
egal3      Cons     Lib      BQ       NDP
low        56.8%    28.8%    11.3%    17.8%
med        25.8%    38.2%    40.8%    29.9%
hi         17.4%    33.0%    47.9%    52.3%
Total (N)  236      212      71       107

Summary Statistics

All respondents
Cramer’s V = .277; p = .000

English respondents
Cramer’s V = .266; p = .000

French respondents
Cramer’s V = .241; p = .000

Interpretation of Results
- The resulting measures of association differ from the original relationship of .277, but only very slightly. Both partial relationships are a little weaker than the original: .266 for English respondents and .241 for French respondents.
- Some authors suggest using a ‘rule of thirds’ as a rough and ready technique for determining whether or not the result is replication. This rule of thumb asks whether a partial relationship differs from the original by one third (or more) of the original value. In this case 1/3 of .277 = .092. Neither the relationship for French respondents (a difference of .036) nor that for English respondents (a difference of .011) differs from the original by this much.
- Moreover, the relationship remains statistically significant among both language groups despite the smaller sample size among the French.
- A more statistically rigorous technique is to use the standard error (reported by SPSS as ASE1) that accompanies ordinal and interval measures of association. Although these measures are technically inappropriate here, calculating them suggests that the coefficients do not differ. As you will recall from our discussion of statistical significance, 95% of the cases on a normal distribution fall within about two (1.96) standard errors on each side of a point estimate. Therefore, multiplying the standard error by two, then adding the result to and subtracting it from the coefficient, gives an interval that can be used to judge whether measures of association differ significantly.
- By any of these standards, this is a case of replication. So we can conclude that the original relationship between Egalitarianism and PartyID holds among both French and English respondents.
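
The ‘rule of thirds’ arithmetic for this example can be checked with a short sketch. Python is used here only as a calculator (the lab itself uses SPSS), the function name rule_of_thirds is hypothetical, and the coefficients are the Cramer’s V values reported in the summary statistics above.

```python
def rule_of_thirds(original, partials):
    """Flag each partial that differs from the original by one third or more of it."""
    threshold = abs(original) / 3
    return {name: abs(value - original) >= threshold
            for name, value in partials.items()}

original_v = 0.277  # Cramer's V, all respondents
flags = rule_of_thirds(original_v, {"English": 0.266, "French": 0.241})

print(round(original_v / 3, 3))  # the one-third threshold
print(flags)                     # True would mean "differs by a third or more"
```

Neither partial is flagged, which is consistent with reading this result as replication. The rule remains a rough guide rather than a statistical test.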

EXAMPLE 2: specification

Preliminary Hypothesis: PartyID is more strongly related to Egalitarianism among the more educated.

Partial Syntax

*Create education measure*.
recode cps11_79 (1 thru 5 = 0) (6 thru 8 = .5) (9 thru 11 =1) into ed3.
value labels ed3 0 'lo' .5 'med' 1 'hi'.

crosstabs tables = egal3 by PID4
   /egal3 by PID4 by ed3
   /cells = column COUNT
   /statistics = phi ctau chisq.

Control Table Statistical Output

Summary Statistics

All respondents
Cramer’s V= .277; p =.000; Tau_c =.336 (.033)

low ed respondents
Cramer’s V= .255; p =.003; Tau_c =.292 (.069)

med ed respondents
Cramer’s V= .213; p =.004; Tau_c =.210 (.062)

hi ed respondents
Cramer’s V= .371; p =.000; Tau_c =.454 (.044)

Interpreting the Control Table Results

The resulting measures of association for each subgroup differ somewhat from the original relationship. In particular, as indicated by its Cramer’s V value of .371, the relationship between PartyID and Egalitarianism is substantially stronger among respondents in the best educated third of the sample than among those in the lower and middle thirds.
Applying the rough and ready ‘rule of thirds’ to Cramer’s V suggests that the relationship among the most educated group differs from the original by more than one third of the original value. In this case 1/3 of .277 = .092, and the relationship for the most educated respondents exceeds the original by slightly more than this (.371 - .277 = .094).
Although the relationship remains significant among respondents with low and medium education (p = .003 and p = .004), it reaches a higher level of significance among highly educated respondents (p = .000).
The ordinal measure of association makes another, more rigorous technique available. The estimated standard errors reported with Tau_c can be used to calculate confidence intervals for ordinal measures of association. In this case, multiplying the standard error of .044 for the most educated group by 1.96 (or roughly two) and subtracting this from (or adding it to) the Tau_c value of .454 gives an interval of roughly .368 to .540, which suggests that .454 likely differs from both .210 and .292.
By all these standards, this appears to be a case of specification. So we can conclude that the original relationship between Egalitarianism and PartyID is stronger among the most educated respondents.
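
The confidence interval check described above can be sketched as follows. Python is used only as a calculator here (the lab itself uses SPSS). The sketch assumes that the values in parentheses in the summary statistics are the standard errors of Tau_c, and the function name confidence_interval is hypothetical.

```python
Z_95 = 1.96  # standard errors on each side of the estimate for 95% coverage

def confidence_interval(estimate, std_error, z=Z_95):
    """Return (lower, upper) bounds of an approximate confidence interval."""
    margin = z * std_error
    return (estimate - margin, estimate + margin)

# Tau_c and its standard error for the most educated group (from the output above)
hi_lower, hi_upper = confidence_interval(0.454, 0.044)
print(f"hi ed: {hi_lower:.3f} to {hi_upper:.3f}")

# Tau_c point estimates for the other two education groups
for group, tau_c in {"low ed": 0.292, "med ed": 0.210}.items():
    outside = not (hi_lower <= tau_c <= hi_upper)
    print(group, "falls outside the hi ed interval:", outside)
```

Both of the other coefficients fall below the lower bound for the most educated group, which matches the reading of this result as specification.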

INSTRUCTIONS – Part 1: attempting control

Use SPSS to access an appropriate dataset and run a crosstabulation between a Dependent and Independent variable, selecting the appropriate measure of association and statistical significance.
Note the strength of the original relationship by looking at the measure of association and check the significance using either the p-value for the measure of association or chi-square.
Edit the syntax of your crosstabs tables = command to add a control variable: add a second “by” followed by the name of the control variable. This is the only change required to create control tables.
Examine the strength of the new partial relationships by comparing the measure of association (like Kendall’s Tau or the Correlation Coefficient) for each partial relationship to the original measure of association.
Determine whether the results indicate replication or specification.
Carefully explain what factors led you to your conclusion.
Repeat with a new control variable until you find an instance of replication and one of specification.
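
To make the comparison step concrete, here is a minimal sketch of how Cramer’s V can be computed for an original table and for each partial table. SPSS reports this statistic for you. The Python version below is only an illustration, the counts are invented, and the function name cramers_v is hypothetical. It assumes no row or column of a table is empty.

```python
import math

def cramers_v(table):
    """Cramer's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square / (n * smaller_dim))

# Invented counts: rows are DV categories, columns are IV categories.
original = [[60, 40], [40, 60]]
partials = {
    "control = 0": [[30, 20], [20, 30]],
    "control = 1": [[30, 20], [20, 30]],
}

print("original:", round(cramers_v(original), 3))
for name, table in partials.items():
    print(name + ":", round(cramers_v(table), 3))
```

In this invented case the two partial tables add up to the original table and each partial coefficient equals the original one, so the pattern would be read as replication.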

Part 2:

USING CONTROL VARIABLES TO UNDERSTAND SPURIOUS RELATIONSHIPS: Explanation and Interpretation

While the most common results using control tables are replication and specification, partial tables can also be used to examine more complex three-variable relationships.
If all of the partial relationships are substantially weaker than the original relationship, then the relationship may be either partly or wholly spurious. Recall that in a spurious relationship, the original relationship is revealed to be due to the influence of a third variable (Z) used as a control. We can often better understand statistically spurious relationships by analyzing the theoretical relations among the three variables.
In order to understand a spurious relationship it is often useful to determine theoretically whether the control variable is antecedent to the other two variables or intervening between them.
- An antecedent variable is logically (or temporally) prior to both of the original X and Y variables. Where the control variable is antecedent to both the independent and dependent variables, the finding of a spurious relationship is termed “explanation.” Symbolically, Z –> X,Y. The idea is that the control variable explains why X and Y are related. X and Y are not related because they are a cause and an effect; they are related because both are affected by Z.
- An intervening variable is one that logically (or temporally) comes after one of the original variables and before the other, so it is prior to one of them but not both. Where the control variable is intervening, a finding of a spurious relationship is called “interpretation” because it clarifies (in whole or part) the process through which the relationship between X and Y functions. This is represented symbolically as: X –> Z –> Y.
  - In some cases, certain variables could not plausibly be considered to be antecedent. In the case of a spurious relationship between gender and income, the control variable education cannot possibly be antecedent to both gender and income, because education cannot be the cause of gender.
  - In other cases, you can determine whether the control variable is intervening or antecedent by observing which variable(s) the control variable influences.
- There are, of course, other possible outcomes in using control tables such as suppression and distortion as mentioned in lecture.
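
A small numerical sketch may help show what ‘all partial relationships are substantially weaker’ looks like. The counts below are invented so that X and Y are unrelated within each category of Z but related once the categories are pooled. Python is used only for illustration and the function name phi is hypothetical. Note that the numbers alone cannot tell us whether Z is antecedent or intervening; that judgement is theoretical.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient for a 2x2 table with rows (a, b) and (c, d)."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Invented counts. Rows: X = 0, 1. Columns: Y = 0, 1. One table per value of Z.
z0 = (64, 16, 16, 4)
z1 = (4, 16, 16, 64)
pooled = tuple(m + n for m, n in zip(z0, z1))  # the original (uncontrolled) table

print("original:", phi(*pooled))
print("Z = 0:", phi(*z0))
print("Z = 1:", phi(*z1))
```

Here the original relationship disappears in both partial tables, which is the statistical signature of a spurious relationship. Whether it would be called explanation or interpretation depends on where Z sits relative to X and Y.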

EXAMPLE 3: Interpretation and Explanation

Several attempts to uncover an instance of interpretation or explanation for the PartyID and Egalitarianism relationship were unsuccessful. Drawing on the literature and theory, these attempts assumed that the relationship might rest on some form of individualism (pes11_87), economic conservatism (cps11_30), authoritarianism (pes11_84, pes11_85), attitudes toward minorities (cps11_8, cps11_38), attitudes regarding welfare (cps11_3, cps_33), attitudes regarding taxation (cps11_30, cps11_31) or the role of government (pes11_22). In each instance either replication or specification was found.

Efforts to find a suitable example will continue.

INSTRUCTIONS – Part 2: using additional control variables

Using a data set of interest, select a dependent variable, an independent variable, and a control variable that may help you to explain or interpret the original IV-DV relationship.
Select appropriate measures of association and statistical significance.
Perform the simple crosstab and note the strength of the original relationship by looking at the measure of association, and check the significance using the p-value for the measure of association or chi-square.
Add a control variable to the analysis by declaring the appropriate missing values and making the relevant recodes. This is important because the coding of the control variable determines the number and composition of the control tables. Remember that it is also a good idea to add value labels for newly recoded variables in order to facilitate reading your tables.
Examine the strength of the new partial relationships. Determine whether the results indicate replication, specification, or a spurious relationship.
Carefully explain what factors led you to your conclusion.
Repeat with a second control variable.

QUESTIONS FOR REFLECTION

Try to understand each set of control tables you run: do the results indicate replication, specification, interpretation, or explanation?
Why are replication and specification more common than interpretation and explanation?