UCSC LAB MANUAL: Lab 7
Crosstabulation with Ordinal Variables
PURPOSE
 To learn how to interpret the results of an ordinal X ordinal crosstabulation.
 To learn how to apply and interpret Kendall’s Tau-b and Tau-c.
MAIN POINTS
Crosstabulation
 In the case where at least one of the variables in the crosstab is nominal (Lab 6), a relationship can be evident in the differences in column percentages in one or two rows of a table. However, such differences are not sufficient to draw any conclusions for an ordinal X ordinal crosstab.
 For ordinal X ordinal crosstabs, a relationship is demonstrated only when the value of the dependent variable either increases or decreases in step with increases in the value of the independent variable. When the dependent variable increases in value as the independent variable increases in value, the relationship is positive. For example, if income level rises as the level of education rises, there is a positive relationship. On the other hand, when the dependent variable decreases in value as the independent variable increases in value, the relationship is negative.
 A practical technique that students sometimes employ to identify whether a relationship exists in an ordinal X ordinal crosstab begins by finding the bulge in each row. The bulge is the cell (or cells) in each row with a disproportionately high column percentage. The column percentage for the bulge should differ substantially from that of the other cells in the row.
 For the relationship to be properly identified, your ordinal variables must be properly coded (from lowest to highest). If there is a positive relationship, the bulges will follow a pattern that leads diagonally downward (top left to bottom right). In the case of a negative relationship, the bulges will follow a pattern that leads diagonally upward (bottom left to top right).
POSITIVE RELATIONSHIP
 Dependent         Independent Variable
 Variable        Low      Mid      High
 Low             XXX
 Mid                      XXX
 High                              XXX
NEGATIVE RELATIONSHIP
 Dependent         Independent Variable
 Variable        Low      Mid      High
 Low                               XXX
 Mid                      XXX
 High            XXX
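 The bulge technique can also be checked mechanically. The following is a minimal Python sketch, not part of the SPSS workflow used in this lab; the function names and the 3 X 3 table of column percentages are hypothetical and chosen only for illustration. It takes the bulge to be the single largest percentage in each row and reports whether the bulges drift toward higher or lower categories of the independent variable.

```python
def find_bulges(col_pct):
    """For each row, return the index of the column with the largest column percentage."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in col_pct]


def bulge_direction(bulges):
    """Classify the bulge pattern as 'positive', 'negative' or 'unclear'."""
    steps = [b - a for a, b in zip(bulges, bulges[1:])]
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "positive"   # bulges move left to right going down the rows
    if all(s <= 0 for s in steps) and any(s < 0 for s in steps):
        return "negative"   # bulges move right to left going down the rows
    return "unclear"


# Hypothetical column percentages: rows are the dependent variable (Low, Mid, High),
# columns are the independent variable (Low, Mid, High). Each column sums to 100.
example = [
    [60.0, 30.0, 10.0],
    [25.0, 45.0, 30.0],
    [15.0, 25.0, 60.0],
]

bulges = find_bulges(example)
print(bulges, bulge_direction(bulges))
```

 A result of "unclear" does not prove that no relationship exists; it only means the bulges do not line up along one diagonal, so the column percentages should be read more closely.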
Kendall’s Tau
 Kendall’s Tau is built on the notion of like-ordered (concordant) and differently ordered (discordant) pairs of cases. It is a measure of association that gauges the strength of the relationship between two ordinal variables.
 Unlike Cramer’s V, Kendall’s Tau also indicates the direction of the relationship. A positive Tau value means that as the first variable increases in value, the second variable also increases in value. A negative Tau value means that as the first variable increases, the second decreases, and vice versa.
 Use Tau-b for square crosstabs, where the number of categories is the same for both variables (e.g., a 3 X 3, 4 X 4, or 5 X 5 table). Use Tau-c for rectangular crosstabs, where the variables have different numbers of categories (e.g., a 3 X 5 table).
 Once again, the Tau value can be interpreted in the same way as the other measures of association.
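 To see how like and different ordered pairs turn into a coefficient, the sketch below computes Tau-b and Tau-c by hand from a table of cell counts. It is written in Python rather than SPSS syntax, the function names and the counts are hypothetical, and it uses the standard textbook formulas: Tau-b divides the surplus of concordant over discordant pairs by the geometric mean of the pairs not tied on each variable, and Tau-c divides 2m times that surplus by n squared times (m - 1), where m is the smaller of the number of rows and columns. The results should correspond to what btau and ctau report, although they have not been checked against SPSS output here.

```python
from math import sqrt


def concordant_discordant(table):
    """Count concordant and discordant pairs of cases in a table of cell counts."""
    rows, cols = len(table), len(table[0])
    conc = disc = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # only cells in lower rows
                for l in range(cols):
                    if l > j:                  # lower and to the right: like ordered
                        conc += table[i][j] * table[k][l]
                    elif l < j:                # lower and to the left: differently ordered
                        disc += table[i][j] * table[k][l]
    return conc, disc


def kendall_tau(table):
    """Return (tau_b, tau_c) for a table of counts with ordered rows and columns."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(r) for r in table)
    conc, disc = concordant_discordant(table)
    pairs = n * (n - 1) / 2
    row_ties = sum(t * (t - 1) / 2 for t in (sum(r) for r in table))
    col_ties = sum(t * (t - 1) / 2 for t in (sum(c) for c in zip(*table)))
    tau_b = (conc - disc) / sqrt((pairs - row_ties) * (pairs - col_ties))
    m = min(rows, cols)
    tau_c = 2 * m * (conc - disc) / (n ** 2 * (m - 1))
    return tau_b, tau_c


# Hypothetical 3 X 3 table of counts with most cases on the main diagonal.
counts = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
tau_b, tau_c = kendall_tau(counts)
print(round(tau_b, 3), round(tau_c, 3))
```

 Because most cases sit on the main diagonal, both coefficients come out positive; swapping the first and last columns of the table would make them negative.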
INSTRUCTIONS
 Using the codebook for your chosen dataset, hypothesize a relationship between two indicators measured at the ordinal level. One variable should be the dependent variable and the other the independent variable. Indicators measured at the interval level may be used as long as you recode them into ordinal variables with three to five categories; this makes the crosstab easier to interpret.
 For example, education (independent) explains the variation in reported income (dependent): the more educated a person is, the higher his or her income.
 Using SPSS, perform separate trial runs of the frequency distribution for each variable (i.e., run the frequencies without entering any recodes or missing values). Based on the frequency output, decide how to recode each variable (if necessary) and identify the missing values. If a variable has a large number of categories, you may also want to combine them into three to five categories to make the crosstabulation easier to interpret.
 Again using SPSS, perform a crosstabulation, making certain to enter the dependent variable first, followed by the independent variable. Remember to account for the missing values.
 On the /statistics subcommand, specify either btau or ctau to generate “Kendall’s Tau-b” or “Kendall’s Tau-c”, respectively.
 Based on the column percentages in the crosstabular output, determine whether there is a relationship between the variables. Identify the relationship as either positive or negative. Also, judge the strength of the relationship to determine whether it meets the standards set out in the previous lab (Lab 6).
 Repeat the process until you find an acceptable relationship that meets or exceeds the standards set out above. Continue to find other interesting crosstab relationships between variables.
EXAMPLE #1
 Dataset
 PPIC 2016 October Statewide Survey
 Dependent Variable

D10. Which of the following categories best describes your total annual household income, before taxes, from all sources?

 Independent Variable
 D6. What was the last grade of school that you completed?
 Arrow Diagram:
 Education → Income
 Syntax:
*Income by Educ.
missing values d6 d10 (9).
crosstabs d10 by d6
  /cells = column count
  /statistics = ctau d.
 Syntax Legend
 Missing values and recodes are decided upon based on the frequencies trial runs. In this case the indicators for education and income have the same missing value, so both can be declared on a single command.
 The crosstabs command lists the dependent variable first, then the independent variable.
 /cells subcommand tells SPSS to put column percentages and frequencies in each cell.
 /statistics is the syntax subcommand that needs to be included in order to calculate the measures of association. In this case we want to calculate Kendall’s Tau.
 With a square table specify btau. If the table is rectangular specify ctau. Since the table is 7 X 5 (seven income rows by five education columns), Tau-c is preferred. When one is relatively certain as to which variable is dependent, Somers’ d can also be specified. While we can’t be certain of the causal order, it seems reasonable here to say Education → Income.
Output
 Income by Education
                              Education Level
             Some HS   HS grad   Some Col   Col Grad   Post Grad
 <$20k        49.3%     31.8%     16.9%       8.7%       3.1%
 $20-39       27.9%     29.5%     23.6%      13.7%       9.0%
 $40-59       14.4%     11.0%     18.9%      15.5%      13.7%
 $60-79        5.6%      7.9%     12.5%      11.5%      10.5%
 $80-100       0.9%     12.3%     11.4%      18.2%      15.6%
 $100-200      0.9%      5.8%     11.1%      21.2%      27.0%
 $200+         0.9%      1.7%      5.6%      11.2%      21.1%
 Total          215       292       360        401        256
 Kendall’s tau-c = .421; Somers’ d = .426
 PPIC October 2016 Statewide Survey
 Interpretation of Crosstab:
 First, look at the top and bottom rows of the table and identify the pattern of results. In the top row the column percentages decrease from 49.3% to 3.1%. In the bottom row the percentages increase from 0.9% to 21.1%.
 There is clearly a noticeable pattern to the results. Low income (top row) is associated with lower education levels, and high income (bottom row) is associated with higher levels of education. The rows second from the top and second from the bottom show similar results. The pattern in the middle three rows is somewhat less evident.
 Students often find it useful to identify the cell or cells in each row that contain a disproportionately high column percentage (bulges). The bulge is the cell with the highest column percentage for its row.
 For the “<$20k” row, the bulge (49.3%) is under the “Some HS” category of the independent variable. Bulges can also be identified in the other rows: 29.5% (HS grad) for $20-39, 18.9% and 12.5% (Some Col) for $40-59 and $60-79, 18.2% (Col Grad) for $80-100, and 27.0% and 21.1% (Post Grad) for $100-200 and $200+.
 Since the bulges essentially follow a pattern running diagonally downward along the main diagonal, there is a positive relationship between the independent and dependent variables.
 We can conclude that the more education a respondent has, the greater their household income.
 Interpretation of Tau
 Since the crosstab is rectangular (7 X 5), we use the tau-c measure of .421 rather than the tau-b measure.
 The tau-c value is positive, meaning that the more years of schooling a respondent has (independent variable), the greater their household income (dependent variable). This confirms the conclusion reached by the column percentage analysis performed above.
 Using the interpretative standards from the table in Lab 6, we see, however, that the relationship is worrisomely strong (.421 exceeds .4). This suggests that the two indicators may be measuring the same thing.
EXAMPLE #2
 Dataset
 PPIC 2016 October Statewide Survey
 Dependent Variable
 Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
 Independent Variable
 Q40c. Regardless of how you may be registered, in politics today, do you consider yourself a Republican, Democrat or Independent?
 Q40d. Do you consider yourself a strong Democrat…?
 Q40e. Do you consider yourself a strong Republican…?
 Arrow Diagram
 Democratic Identification → Intended ‘Yes’ Vote on Marijuana Proposal
 Syntax
*recode MJ measure into 0-1 values*.
recode q21 (1=1) (2=0) into MJprop.
value labels MJprop 1 'yes' 0 'no'.
*pure pid w/o leaners*.
if (q40c = 1) and (q40e = 1) ppid = 1.
if (q40c = 1) and (q40e = 2) ppid = 2.
if (q40c = 3) ppid = 3.
if (q40c = 2) and (q40d = 2) ppid = 4.
if (q40c = 2) and (q40d = 1) ppid = 5.
value labels ppid 1 'strRep' 2 'Rep' 3 'Indep' 4 'Dem' 5 'strDem'.
missing values q21 (8,9).
crosstabs tables = MJprop by ppid
  /cells = column count
  /statistics = ctau.
 Syntax Legend
 Note that the recode used to create MJprop reorders and relabels the values for the item asking about intended vote on the recreational marijuana proposal.
 The series of “if” statements is used to create a new measure of party identification which incorporates strength of partisanship into the measure. This creates an ordinal level measure of partisan identification. Note that the ppid created here leaves independents who lean toward one of the major parties still scored as independents. Q40c can be used on its own as a nominal measure of party identification.
 Output
Support for MJ Initiative by Partisanship
                   strRep    Rep     Indep    Dem     strDem
 Vote Intention
 No                68.9%    62.2%    38.9%    46.1%    34.2%
 Yes               31.1%    37.8%    61.1%    53.9%    65.8%
 Total              177       98      427      152      386
 Tau-c = .225
 PPIC Oct 2016 Statewide Survey
 Interpretation of Output
 Reading across the first row of the table the column percentages decrease from 68.9% to 34.2%.
 In the bottom row the percentages increase from 31.1% to 65.8%.
 There is clearly a noticeable pattern to the results. Stronger Democratic identification is associated with greater intention to vote ‘yes’. Weaker Democratic identification is associated with lesser intention to vote ‘yes’. This can be seen by comparing the first and last columns.
 For the ‘no’ row, the bulge (68.9%) is under the ‘strRep’ column of the independent variable. For the ‘yes’ row, the bulge (65.8%) is under the ‘strDem’ column.
 Since the bulges follow a pattern running diagonally downward along the main diagonal (top left to bottom right), there is a positive relationship between the independent and dependent variables.
 We can conclude that the more strongly Democratic a respondent feels, the more likely they are to intend to vote ‘yes’.
 Interpretation of Tau
 Since both variables are ordinal and the table is rectangular, the appropriate measure of association is Tau-c.
 The tau-c value is positive, meaning that the more strongly a respondent identifies as Democratic, the more likely they are to intend to vote ‘yes’. This confirms the conclusion reached by the column percentage analysis performed above.
 Using the interpretative standards from the table in Lab 6, we see that the relationship is moderate and regarded as acceptable.
 Note that this relationship is not as strong as the one between income and education.
EXAMPLE #3
 Dataset:
 PPIC October 2016
 Dependent Variable

 Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
 Independent Variable(s)
 Age
 Education
 Income
 Interest
 Participation (Vote frequency)
 Democratic Identification
 Arrow Diagram
 Age → Intended ‘Yes’ Vote on Marijuana Proposal
 Educ → Intended ‘Yes’ Vote on Marijuana Proposal
 Income → Intended ‘Yes’ Vote on Marijuana Proposal
 Interest → Intended ‘Yes’ Vote on Marijuana Proposal
 Vote → Intended ‘Yes’ Vote on Marijuana Proposal
 Democrat → Intended ‘Yes’ Vote on Marijuana Proposal
 Syntax
recode d1a (1=0) (2=.2) (3=.4) (4=.6) (5=.8) (6=1) into age.
value labels age 0 '18+' .2 '25+' .4 '35+' .6 '45+' .8 '55+' 1 '65+'.
recode d6 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into educ.
value labels educ 0 '<hs' .25 'hs' .5 'col' .75 'grad' 1 'post'.
recode d10 (1=0) (2=.17) (3=.34) (4=.5) (5=.66) (6=.83) (7=1) into income.
value labels income 0 '<$20k' .17 '$20k+' .34 '$40k+' .5 '$60k+' .66 '$80k+' .83 '$100k+' 1 '$200k+'.
recode q38 (1=1) (2=.66) (3=.33) (4=0) into interest.
value labels interest 0 'none' .33 'only a little' .66 'fair amount' 1 'great deal'.
recode q39 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into vote.
value labels vote 0 'never' .25 'seldom' .5 'part time' .75 'nearly' 1 'always'.
crosstabs MJprop by age educ income interest vote
  /cells = column count
  /statistics = ctau.
 Syntax Legend
 Age, Educ, Income, Interest and Vote are derived from available indicators in the PPIC data.
 They are all recoded to range from 0 to 1, with the high score coded as 1.
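 The 0 to 1 values in the recodes above follow a simple rule: subtract 1 from the original code and divide by the number of categories minus 1, reversing the result when the original item runs from high to low. The Python sketch below illustrates that rule; rescale_01 is a hypothetical helper, not an SPSS command. The lab's syntax writes some thirds and sixths as .34 or .66 where this rule gives .33 or .67; because Tau depends only on the ordering of the categories, such rounding differences do not affect the coefficient.

```python
def rescale_01(code, n_categories, reverse=False):
    """Map an integer code in 1..n_categories onto the 0-1 interval (rounded to 2 places)."""
    score = (code - 1) / (n_categories - 1)
    if reverse:
        score = 1 - score   # original item runs from high to low
    return round(score, 2)


# Age (d1a) has six categories already ordered low to high.
print([rescale_01(c, 6) for c in range(1, 7)])

# Vote frequency (q39) has five categories and is reversed in the recode above.
print([rescale_01(c, 5, reverse=True) for c in range(1, 6)])
```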
 Output
 The crosstabulations can be created by the reader using the above syntax.
 Summary measures can be reported in lieu of the full crosstabulation.
Intended Vote for MJ initiative by Selected Predictors
                 tau-c
 Age             -.198
 Educ            -.171
 Income           .095
 Pol interest     .057
 Vote Freq       -.039
 Democrat         .225
 PPIC Oct 2016 Statewide Survey
 Interpretation of Tau
 Since all the independent variables are ordinal and their respective tables are rectangular, the appropriate measure of association is Tau-c.
 The tau-c values for Age, Education and Vote Frequency are negative, meaning that as Age, Education or Vote Frequency increases, support for the marijuana initiative decreases. The coefficients for Age and Education are both weak, while that for Vote Frequency is negligible, scarcely differing from zero.
 The coefficients for Income, Interest and Democratic Identification are all positive. Thus as Income, Interest or Democratic Identification increases, so too does support for the initiative. The association between interest and intended vote is very weak, as is that for income. In contrast, the coefficient for Democratic Identification is moderate.
 All results can be confirmed by analyzing the column percentages in the crosstabulations.
QUESTIONS FOR REFLECTION
 Do the direction and the strength of the relationship depend on how you code the variables?
 Is a stronger relationship always better?
 How can measures of association help you determine the relative strength of the relationships?
DISCUSSION
 The direction of the relationship, and hence the interpretation of any ordinal X ordinal crosstab, depends upon the manner in which the variables were coded. If we took one variable from a pair of positively related variables and recoded it so that its categories ran in reverse order, the relationship would become negative. Moreover, declaring missing values or recoding variables may affect measures of association. Since the interpretation of the crosstab depends on the way you code the variables, be sure to label and code the variables carefully. If you have a variable called Education, for example, arrange the categories of its indicator from lowest to highest levels of education. Do not code it such that it runs from highest to lowest levels of education. Age variables can often present a problem when based on birth year, because a higher birth year means a younger respondent. Coding variables appropriately makes it easier for the reader of your work to understand your results.
 Remember that relations stronger than .4 may indicate that your two variables measure essentially the same thing.
 Summary measures of association enable you to compare the relative influence of several IVs on the same DV. For example, the following table was created using syntax at the end of this lab with the PPIC October 2016 data.
' 0 'no'.
*pure pid w/o leaners*.
if (q40c = 1) and (q40e = 1) ppid = 1.
if (q40c = 1) and (q40e = 2) ppid = 2.
if (q40c = 3) ppid = 3.
if (q40c = 2) and (q40d = 2) ppid = 4.
if (q40c = 2) and (q40d = 1) ppid = 5.
value labels ppid 1 'strRep' 2 'Rep' 3 'Indep' 4 'Dem' 5 'strDem'.
missing values q21 (8,9).
crosstabs tables = MJpropD by ppid
  /cells = column count
  /statistics = phi ctau d chisq.
Support for MJ Initiative by Partisanship
                   strRep    Rep     Indep    Dem     strDem
 Vote Intention
 No                68.9%    62.2%    38.9%    46.1%    34.2%
 Yes               31.1%    37.8%    61.1%    53.9%    65.8%
 Total              177       98      427      152      386
 Cramer’s V = .250
 PPIC Oct 2016 Statewide Survey
Intended Vote for MJ initiative by Selected Predictors
                 tau-c
 Age             -.198
 Educ            -.171
 Income           .095
 Pol interest     .057
 Vote Freq       -.039
 Democrat         .225
 PPIC Oct 2016 Statewide Survey
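 The Discussion's point that reversing the coding of one variable flips the sign of the relationship can be checked on a handful of cases. The Python sketch below is only an illustration: concordance_balance is a hypothetical helper and the six cases are invented. It counts like ordered pairs minus different ordered pairs, which is the numerator of Kendall's Tau, before and after the first variable is reverse coded.

```python
from itertools import combinations


def concordance_balance(cases):
    """Concordant minus discordant pairs of cases; tied pairs contribute nothing."""
    balance = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            balance += 1    # both variables order the pair the same way
        elif product < 0:
            balance -= 1    # the variables order the pair in opposite ways
    return balance


# Hypothetical cases: (independent, dependent), each coded 1 = low to 3 = high.
cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 2)]

# Reverse code the independent variable so that 1 = high and 3 = low.
reversed_cases = [(4 - x, y) for x, y in cases]

print(concordance_balance(cases), concordance_balance(reversed_cases))
```

 The two balances have the same size and opposite signs, so the strength of the relationship is unchanged while its direction is reversed.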