UCSC LAB MANUAL: Lab 7

Crosstabulation with Ordinal Variables

PURPOSE

  • To learn how to interpret the results of an ordinal X ordinal crosstabulation.
  • To learn how to apply and interpret Kendall’s Tau-b and Tau-c.

MAIN POINTS

Crosstabulation

  • In the case where at least one of the variables in the crosstab is nominal (Lab 6), a relationship can be evident in the differences in column-percentages in one or two rows of a table.  However, that evidence alone is not sufficient to draw any conclusions for an ordinal X ordinal crosstab.
  • For ordinal X ordinal crosstabs, a relationship is demonstrated only when the value of the dependent variable either increases or decreases in step with increases in the values of the independent variable.  When the dependent variable increases in value as the independent variable increases in value, the relationship is positive.  For example, if income level rises as the level of education rises, the relationship is positive.  On the other hand, when the dependent variable decreases in value as the independent variable increases in value, the relationship is negative.
  • A practical technique that students sometimes employ to identify whether a relationship exists for ordinal X ordinal crosstabs begins by finding the bulge in each row. The bulge is the cell (or cells) in each row with a disproportionately high column-percentage.  The column-percentage for the bulge should differ substantially from that of the other cells in the row.
  • For the relationship to be properly identified, your ordinal variables must be properly coded (from lowest to highest).  If there is a positive relationship, the bulges will follow a pattern that leads diagonally downward (top-left to bottom-right).  In the case of a negative relationship, the bulges will follow a pattern that leads diagonally upward (bottom-left to top-right).

POSITIVE RELATIONSHIP

Dependent           Independent Variable
Variable
                    Low     Mid     High

Low                 XXX

Mid                         XXX

High                                XXX

NEGATIVE RELATIONSHIP

Dependent           Independent Variable
Variable
                    Low     Mid     High

Low                                 XXX

Mid                         XXX

High                XXX
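
The bulge technique can also be sketched in a few lines of code. The Python below is only an illustration and not part of the SPSS workflow: the function names `bulge_columns` and `direction` and the 3 X 3 table of column-percentages are hypothetical, and the sketch assumes that each row has one clear bulge and that both variables are coded from lowest to highest.

```python
def bulge_columns(col_pct):
    """Return, for each row, the index of the column that holds the row's bulge.

    col_pct is a list of rows (dependent-variable categories, lowest first);
    each row lists the column-percentages across the independent-variable
    categories (lowest first).
    """
    return [max(range(len(row)), key=lambda j: row[j]) for row in col_pct]


def direction(bulges):
    """Classify a bulge pattern as 'positive', 'negative', or 'unclear'."""
    steps = [b - a for a, b in zip(bulges, bulges[1:])]
    if bulges[0] < bulges[-1] and all(s >= 0 for s in steps):
        return "positive"   # bulges run from top-left to bottom-right
    if bulges[0] > bulges[-1] and all(s <= 0 for s in steps):
        return "negative"   # bulges run from bottom-left to top-right
    return "unclear"


# Hypothetical column-percentages (each column sums to 100).
table = [
    [60, 30, 10],   # Low on the dependent variable
    [30, 45, 30],   # Mid
    [10, 25, 60],   # High
]
# The same table with the independent variable coded in reverse order.
reversed_table = [row[::-1] for row in table]

print(bulge_columns(table), direction(bulge_columns(table)))
print(bulge_columns(reversed_table), direction(bulge_columns(reversed_table)))
```

Reversing the column order turns the positive pattern into a negative one, which is why the coding of the variables has to be settled before reading the diagonal.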

Kendall’s Tau

  • Kendall’s Tau is built on the notion of like (concordant) and different (discordant) ordered pairs of cases. It is a measure of association that calculates the strength of the relationship between two ordinal variables.
  • Unlike Cramer’s V, Kendall’s Tau also indicates the direction of the relationship.  A positive Tau value means that as the first variable increases in value, the second variable also increases in value.  A negative Tau value means that as the first variable increases, the second decreases, and vice-versa.
  • Use Tau-b for square crosstabs, where the number of categories is the same for both variables (e.g., a 3 X 3, 4 X 4, or 5 X 5 table).  Use Tau-c for rectangular crosstabs, where the variables have a different number of categories (e.g., a 3 X 5 table).
  • Once again, the Tau value can be interpreted like all other measures of association, using the standards set out in Lab 6.
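
To make the idea of like and different pairs concrete, the following Python sketch counts them for a handful of cases and then computes Tau-b as (P - Q) / sqrt((P + Q + Tx)(P + Q + Ty)), where P and Q are the numbers of like and different pairs and Tx and Ty are the pairs tied on only one variable. The six cases and the function names `pair_counts` and `tau_b` are hypothetical and chosen only for illustration; SPSS does this work for you when btau is requested.

```python
from itertools import combinations
from math import sqrt


def pair_counts(cases):
    """Count pairs of cases as (like, different, tied on x only, tied on y only)."""
    like = different = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            like += 1          # both variables order the two cases the same way
        elif dx * dy < 0:
            different += 1     # the variables order the two cases in opposite ways
        elif dx == 0 and dy != 0:
            tied_x += 1
        elif dy == 0 and dx != 0:
            tied_y += 1
        # pairs tied on both variables are ignored
    return like, different, tied_x, tied_y


def tau_b(cases):
    """Kendall's Tau-b from case-level (x, y) ordinal codes."""
    p, q, tx, ty = pair_counts(cases)
    return (p - q) / sqrt((p + q + tx) * (p + q + ty))


# Hypothetical cases: (independent code, dependent code), 1 = low ... 3 = high.
cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]

print(pair_counts(cases))   # (6, 3, 3, 3)
print(tau_b(cases))         # 0.25
```

With six like pairs and three different pairs, the surplus of like pairs gives a positive Tau; a surplus of different pairs would give a negative one.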

INSTRUCTIONS

  1. Using the Codebook for your chosen dataset, hypothesize a relationship between two indicators measured at the ordinal level. One of the variables should be a dependent variable and the other should be the independent variable. Indicators measured at the interval level may be used as long as you recode them into ordinal variables with three to five categories. This will make the crosstab easier to interpret.
    • For example, Education (independent) explains the variation in reported income (dependent), i.e., the more educated a person is, the higher his or her income.
  2. Using SPSS, perform separate trial runs of the frequency distribution for each of the variables (i.e., perform frequency analyses without entering any recodes or missing values).  Based on the frequency output, decide how to recode each variable (if necessary) and identify the missing values.  If the number of categories of your variable is large, you may also want to combine them into 3 to 5 categories to make the crosstabulation easier to interpret.
  3. Again using SPSS, perform a Crosstabulation, making certain to enter the dependent variable first, followed by the independent. Remember to account for the missing values.
  4. On the /statistics subcommand, specify either btau or ctau to generate “Kendall’s Tau-b” or “Kendall’s Tau-c”.
  5. Based on the column-percentages in the crosstabular output, determine whether there is a relationship between the variables.  Identify a relationship as either positive or negative. Also, judge the strength of the relationship to determine whether it meets the standards set out in the previous lab (Lab 6).
  6. Repeat the process until you find an acceptable relationship that meets or exceeds the standards set out above. Continue to find other interesting crosstab relationships between variables.
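
Step 5 relies on column-percentages, which SPSS prints when /cells includes column. As a cross-check outside SPSS, the arithmetic can be sketched in Python as follows. The counts and the function name `column_percentages` are hypothetical; rows are categories of the dependent variable and columns are categories of the independent variable.

```python
def column_percentages(counts):
    """Convert a table of cell counts into column-percentages.

    counts is a list of rows; each cell is divided by the total of its
    column, so every column of the result sums to 100.
    """
    col_totals = [sum(col) for col in zip(*counts)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in counts
    ]


# Hypothetical 2 X 2 table of counts.
counts = [
    [30, 10],
    [20, 40],
]

for row in column_percentages(counts):
    print(row)
```

Percentaging down the columns of the independent variable is what allows the comparison across each row of the table.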

EXAMPLE #1

  • Dataset
    • PPIC 2016 October Statewide Survey
  • Dependent Variable
    • D10. Which of the following categories best describes your total annual household income, before taxes, from all sources?
  • Independent Variable
    • D6. What was the last grade of school that you completed?
  • Arrow Diagram
    • Education → Income
  • Syntax

        *Income by Educ.
        missing values d6 d10 (9).
        crosstabs d10 by d6
          /cells = column count
          /statistics = ctau d.
  • Syntax Legend
    • Missing values and recodes are decided upon based on the frequencies trial runs. In this case the indicators for both education and income have the same missing value, so they can both be declared on a single command.
    • The crosstabs command lists the Dependent Variable first, then the Independent Variable.
    • The /cells subcommand tells SPSS to put column percentages and frequencies in each cell.
    • The /statistics subcommand must be included in order to calculate the Measures of Association.  In this case we want to calculate Kendall’s Tau.
    • With a square table specify btau. If the table is rectangular specify ctau. Since the table is 7 X 5 (seven income rows by five education columns), ctau is preferred.  When one is relatively certain which variable is dependent, Somers’ d (keyword d) can also be specified. While we can’t be certain of the causal order, it seems reasonable to say Education → Income in this case.

Output

  • Income by Education
                                  Education Level
    Income       Some HS   HS grad   Some Col   Col Grad   Post Grad
    <$20k          49.3%     31.8%      16.9%       8.7%        3.1%
    $20-39         27.9%     29.5%      23.6%      13.7%        9.0%
    $40-59         14.4%     11.0%      18.9%      15.5%       13.7%
    $60-79          5.6%      7.9%      12.5%      11.5%       10.5%
    $80-100         0.9%     12.3%      11.4%      18.2%       15.6%
    $100-200        0.9%      5.8%      11.1%      21.2%       27.0%
    $200+           0.9%      1.7%       5.6%      11.2%       21.1%
    Total            215       292        360        401         256

    Kendall’s tau-c = .421; Somers’ d = .426
    PPIC October 2016 Statewide Survey

  • Interpretation of Crosstab:
    • First, look at the top and bottom rows of the table and identify the pattern of results. In the top row the column percentages decrease from 49.3% to 3.1%. In the bottom row the percentages increase from 0.9% to 21.1%.
    • There is clearly a noticeable pattern to the results. Low income (top row) is associated with lower education levels, and high income (bottom row) is associated with higher levels of education. The rows second from the top and bottom show similar results. The pattern in the middle three rows is somewhat less evident.
    • Students often find it useful to identify the cell or cells in each row that contain a disproportionately high column-percentage (bulges). The bulge is the cell with the highest column-percentage for its row.
    • For the “<$20k” row, the bulge (49.3%) is under the “Some HS” category of the independent variable.  Bulges can also clearly be identified in the other rows of the table, for example 29.5% under “HS grad” in the “$20-39” row and 27.0% under “Post Grad” in the “$100-200” row.
    • Since the bulges essentially follow a pattern running diagonally downward along the main diagonal, there is a positive relationship between the independent and dependent variable.
    • We can conclude that the more education a respondent has, the greater their household income.
  • Interpretation of Tau
    • Since the crosstab is rectangular (7 X 5), we use the tau-c measure of .421 rather than the tau-b measure.
    • The tau-c value is positive, meaning that the more years of schooling a respondent has (independent variable), the greater their household income (dependent variable). This confirms the conclusion reached by the column-percentage analysis performed above.
    • Using the interpretative standards from the table in Lab 6, however, we see that the relationship is worrisomely strong (above .4). This suggests that the two indicators may be measuring the same thing.

EXAMPLE #2

  • Dataset
    • PPIC 2016 October Statewide Survey
  • Dependent Variable
    • Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
  • Independent Variable
    • Q40c. Regardless of how you may be registered, in politics today, do you consider yourself a Republican, Democrat or Independent?
    • Q40d. Do you consider yourself a strong Democrat…?
    • Q40e. Do you consider yourself a strong Republican…?
  • Arrow Diagram 
    • Democratic Identification→ Intended ‘Yes’ Vote on Marijuana Proposal
  • Syntax
*recode MJ measure into 0-1 values*.
recode q21 (1=1) (2=0) into MJprop.
value labels MJprop 1 'yes' 0 'no'.

*pure pid-wo leaners*.

if (q40c = 1) and (q40e =1) ppid =1.
if (q40c = 1) and (q40e =2) ppid =2.
if (q40c = 3) ppid =3.
if (q40c =2) and (q40d =2) ppid =4.
if (q40c =2) and (q40d=1) ppid =5.
value labels ppid 1 'strRep' 2 'Rep' 3 'Indep' 4 'Dem' 5 'strDem'.

missing values q21 (8,9).
crosstabs tables = MJprop by ppid
  /cells = column count
  /statistics = ctau.
  • Syntax Legend
    • Note that the recode used to create MJprop reorders and relabels the values for the item asking about intended vote on the recreational marijuana proposal, so that ‘no’ is scored 0 and ‘yes’ is scored 1.
    • The series of “if” statements is used to create a new measure of party identification which incorporates strength of partisanship into the measure. This creates an ordinal level measure of partisan identification. Note that the ppid created here leaves independents who lean toward one of the major parties still scored as independents. Q40c can be used on its own as a nominal measure of party identification.
  • Output

    Support for MJ Initiative by Partisanship

    Vote
    Intention   strRep     Rep   Indep     Dem   strDem
    No           68.9%   62.2%   38.9%   46.1%    34.2%
    Yes          31.1%   37.8%   61.1%   53.9%    65.8%
    Total          177      98     427     152      386

    Tau-c = .225

    PPIC Oct 2016 Statewide Survey

  • Interpretation of Output
    • Reading across the first row of the table, the column percentages decrease from 68.9% to 34.2%.
    • In the bottom row the percentages increase from 31.1% to 65.8%.
    • There is clearly a noticeable pattern to the results. Stronger Democratic identification is associated with greater intention to vote ‘yes’.  Weaker Democratic identification is associated with lesser intention to vote ‘yes’. This can be seen by comparing the first and last columns.
    • For the ‘no’ row, the bulge (68.9%) is under the ‘strRep’ column of the independent variable.  A bulge can also clearly be identified in the ‘yes’ row (65.8% under ‘strDem’).
    • Since the bulges run diagonally downward (top-left to bottom-right), there is a positive relationship between the independent and dependent variables.
    • We can conclude that the more strongly Democratic a respondent feels, the more likely they are to intend to vote ‘yes’.
  • Interpretation of Tau
    • Since both variables are ordinal and the table is rectangular (2 X 5), the appropriate measure of association is Tau-c.
    • The tau-c value is positive, meaning that the more strongly a respondent identifies as Democratic, the more likely they are to intend to vote ‘yes’. This confirms the conclusion reached by the column-percentage analysis performed above.
    • Using the interpretative standards from the table in Lab 6, we see that the relationship is moderate and regarded as acceptable.
    • Note that this relationship is not as strong as the one between income and education.
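
As a cross-check on the SPSS output, the reported Tau-c can be approximately reproduced from the table itself using tau-c = 2m(P - Q) / (n²(m - 1)), where P and Q are the numbers of like and different pairs, n is the number of cases, and m is the smaller of the number of rows and columns. The Python sketch below is illustrative only. The cell counts are an assumption: they are reconstructed by applying the rounded column-percentages to the column totals shown above, so they may differ slightly from the actual PPIC cell counts. The function names are hypothetical.

```python
def like_and_different_pairs(table):
    """Count like (concordant) and different (discordant) pairs in a crosstab.

    table is a list of rows of cell counts, with rows and columns both
    ordered from the lowest to the highest category.
    """
    n_rows, n_cols = len(table), len(table[0])
    like = different = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # only rows below row i
                for l in range(n_cols):
                    if l > j:                    # below and to the right
                        like += table[i][j] * table[k][l]
                    elif l < j:                  # below and to the left
                        different += table[i][j] * table[k][l]
    return like, different


def tau_c(table):
    """Kendall's Tau-c for a crosstab of counts."""
    p, q = like_and_different_pairs(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n * n * (m - 1))


# Reconstructed counts (assumption): columns are strRep, Rep, Indep, Dem, strDem.
counts = [
    [122, 61, 166, 70, 132],   # No  (MJprop = 0)
    [55, 37, 261, 82, 254],    # Yes (MJprop = 1)
]

print(like_and_different_pairs(counts))   # (187321, 100757)
print(round(tau_c(counts), 3))            # 0.225
```

Like pairs outnumber different pairs, so the coefficient is positive, and the result agrees with the Tau-c of .225 shown in the output.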

EXAMPLE #3

  • Dataset:
    • PPIC 2016 October Statewide Survey
  • Dependent Variable
    • Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
  • Independent Variable(s)
    • Age
    • Education
    • Income
    • Interest
    • Participation (Vote frequency)
    • Democratic Identification
  • Arrow Diagram 
    • Age → Intended ‘Yes’ Vote on Marijuana Proposal
    • Educ → Intended ‘Yes’ Vote on Marijuana Proposal
    • Income → Intended ‘Yes’ Vote on Marijuana Proposal
    • Interest → Intended ‘Yes’ Vote on Marijuana Proposal
    • Vote → Intended ‘Yes’ Vote on Marijuana Proposal
    • Democrat → Intended ‘Yes’ Vote on Marijuana Proposal
  • Syntax
recode d1a (1=0) (2= .2) (3= .4) (4=.6) (5=.8) (6=1) into age.
value labels age 0 '18+' .2 '25+' .4 '35+' .6 '45+' .8 '55+' 1 '65+'.

recode d6 (1=0) (2=.25) (3=.5) (4=.75) (5=1) into educ.
value labels educ 0 '<hs' .25 'hs' .5 'col' .75 'grad' 1 'post'.

recode d10 (1 =0) (2=.17) (3=.34) (4=.5) (5=.66) (6=.83) (7=1) into income.
value labels income 0 '<$20k' .17 '$20k+' .34 '$40k+' .5 '$60k+' .66 '$80k+' .83 '$100k+' 1 '$200k+' .

recode q38 (1=1) (2=.66) (3=.33) (4=0) into interest.
value labels interest 0 'none' .33 'only a little' .66 'fair amount' 1 'great deal'.

recode q39 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into vote.
value labels vote 0 'never' .25 'seldom' .5 'part time' .75 'nearly' 1 'always'.

crosstabs MJprop by age educ income interest vote
  /cells = column count
  /statistics = ctau.
  • Syntax Legend
    • Age, Educ, Income, Interest and Vote are derived from available indicators in the PPIC data.
    • They are all recoded to range from 0-1 with the high score coded as 1.
  • Output
    • The crosstabulations can be created by using the above syntax.
    • Summary measures can be reported in lieu of the full crosstabulation.

    • Intended Vote for MJ initiative by Selected Ordinal Predictors

                      tau-c
      Age             -.198
      Educ            -.171
      Income           .095
      Pol interest     .057
      Vote Freq       -.039
      Democrat         .225

      PPIC Oct 2016 Statewide Survey

  • Interpretation of Tau
    • Since all the independent variables are ordinal and their respective tables are rectangular, the appropriate measure of association is Tau-c.
    • The tau-c values for Age, Education and Vote Frequency are negative, meaning that as they increase, support for the marijuana initiative decreases.
    • The coefficients for Income, Interest and Democratic Identification are all positive. Thus as Income, Interest or Democratic Identification increases, so too does support for the initiative.
    • The associations of Income, Interest and Vote Frequency with support for the initiative, however, are all essentially negligible, scarcely different from zero.
    • The coefficients for Age and Education are both weak.
    • In contrast, the coefficient for Democratic Identification is moderate.
    • These results can all be confirmed by analyzing the column-percentages in the crosstabulations.
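
When comparing predictors, strength is judged by the absolute value of Tau, while the sign gives only the direction. A minimal Python sketch, using the tau-c values reported in the summary table above (the variable names are hypothetical):

```python
# Tau-c values copied from the summary table for the MJ initiative.
tau_c_values = {
    "Age": -0.198,
    "Educ": -0.171,
    "Income": 0.095,
    "Pol interest": 0.057,
    "Vote Freq": -0.039,
    "Democrat": 0.225,
}

# Rank the predictors from strongest to weakest, ignoring the sign.
ranked = sorted(tau_c_values.items(), key=lambda item: abs(item[1]), reverse=True)

for name, value in ranked:
    sign = "positive" if value > 0 else "negative"
    print(f"{name:<13}{abs(value):.3f}  ({sign})")
```

Ranked this way, Democratic Identification comes first, followed by Age and Education, with Income, Interest and Vote Frequency trailing.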

QUESTIONS FOR REFLECTION

  • Do the direction and the strength of the relationship depend on how you code the variables?
  • Is a stronger relationship always better?
  • How can measures of association help you determine the relative strength of the relationships?

DISCUSSION

  • The direction of the relationship, and hence the interpretation of any ordinal X ordinal crosstab, depends upon the manner in which the variables were coded.  If we took a pair of positively related variables and recoded one of them so that its categories ran in reverse order, the relationship would become negative. Moreover, declaring missing values or recoding variables may affect measures of association. Since the interpretation of the crosstab depends on the way you code the variables, be sure to label and code the variables carefully. If you have a variable called Education, for example, arrange the categories of its indicator from lowest to highest levels of education.  Do not code it such that it runs from highest to lowest levels of education. Age variables can often present a problem when based on birth year, because a higher birth year means a younger respondent. Coding variables appropriately makes it easier for the reader of your work to understand your results.
  • Remember that relationships stronger than .4 may indicate that your two variables measure essentially the same thing.
  • Summary measures of association enable you to compare the relative influence of several IVs on the same DV. For example, the summary table in Example #3 was created with the PPIC October 2016 data, using the syntax shown in that example together with the syntax immediately below.
*pure pid-wo leaners*.

if (q40c = 1) and (q40e =1) ppid =1.
if (q40c = 1) and (q40e =2) ppid =2.
if (q40c = 3) ppid =3.
if (q40c =2) and (q40d =2) ppid =4.
if (q40c =2) and (q40d=1) ppid =5.
value labels ppid 1 'strRep' 2 'Rep' 3 'Indep' 4 'Dem' 5 'strDem'.

missing values q21 (8,9).
crosstabs tables = MJprop by ppid
  /cells = column count
  /statistics = phi ctau d chisq.

Support for MJ Initiative by Partisanship

Vote
Intention   strRep     Rep   Indep     Dem   strDem
No           68.9%   62.2%   38.9%   46.1%    34.2%
Yes          31.1%   37.8%   61.1%   53.9%    65.8%
Total          177      98     427     152      386

Cramer’s V = .250

PPIC Oct 2016 Statewide Survey
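
The point about coding made in the Discussion can be checked numerically. The sign of Tau is the sign of P - Q, the number of like pairs minus the number of different pairs. The Python sketch below uses six hypothetical cases and a hypothetical function name. It shows that reversing the coding of one variable flips the sign of P - Q, while an order-preserving recode (such as rescaling 1-3 to the 0-1 range, as in Example #3) leaves it unchanged.

```python
from itertools import combinations


def like_minus_different(cases):
    """Return P - Q: like (concordant) pairs minus different (discordant) pairs."""
    total = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        total += (product > 0) - (product < 0)   # +1 like, -1 different, 0 tied
    return total


# Hypothetical cases: (independent code, dependent code), 1 = low ... 3 = high.
cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]

# Reverse the coding of the independent variable (1 -> 3, 2 -> 2, 3 -> 1).
reversed_x = [(4 - x, y) for x, y in cases]

# Rescale the independent variable to the 0-1 range (1 -> 0, 2 -> .5, 3 -> 1).
rescaled_x = [((x - 1) / 2, y) for x, y in cases]

print(like_minus_different(cases))        # 3
print(like_minus_different(reversed_x))   # -3
print(like_minus_different(rescaled_x))   # 3
```

Only the order of the categories matters to Tau, so the choice between codes such as 1-5 and 0-1 is a matter of convenience, whereas the direction of the coding determines the sign.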