new lab 2


Crosstabulation with Non-Interval Variables


  • To learn how to perform a crosstabulation and practice formulating hypotheses.
  • To appreciate how crosstabulation allows us to make comparisons relevant to our hypotheses.
  • To introduce the logic of comparison.



  • Crosstabulation brings together the indicators for two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category in the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
  • Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
  • The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents for the column. Note: the cell-percentage values will be affected by whether or not we treat some categories of our indicators as missing values. Pay attention to the percentages in each cell rather than the number (n) of respondents in each cell.
  • To interpret a crosstab, compare the column-percentages across each row to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column-percentages of cells A-B-C, and/or D-E-F, and/or G-H-I differ markedly from one another, then you may have found a relationship.
  • Crosstabulation does not work effectively if either variable has a great many value categories.
(Columns: independent variable; rows: dependent variable)

                 Category I    Category II    Category III
Category I       A             B              C
Category II      D             E              F
Category III     G             H              I


Crosstabulating Variables

  1. Select an appropriate data set such as one of the PPIC 2016 or 2017 statewide surveys.
  2. Enter the codebook for the dataset you have chosen.
  3. Hypothesize a relationship between two variables in the dataset.
    • For example, you might think that attitudes toward inequality vary by partisanship.
  4. In order to avoid corrupting your data, lock your data set prior to beginning your analyses.
  5. To make certain there is some variation on the variables, use SPSS to perform a frequency analysis for each variable.
  6. In the Analyze menu of SPSS, select Descriptive Statistics and then Crosstabs. Place your dependent variable in the Row(s) box and your independent variable in the Column(s) box. Click on the “Cells” button and select column percentages.
  7. Consider whether recoding your variables would be desirable and do so as necessary.
  8. Click on the “Paste” button. Select the syntax and run it.
  9. Determine whether there is a relationship between the variables based on the column-percentages in the crosstab.
  10. Repeat the analysis until you find a set of variables with a relationship.


  • Dataset:
    • Statewide Survey October 2016
    • Y Variable
      • Marijuana Initiative
    • Indicator for Y
      Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
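    • Possible Explanation (X)
      • Gender
    • Indicator for X
      The respondent's gender as recorded in the dataset (the variable named gender in the syntax below).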
  • Arrow Diagrams:
    • X → Y
    • Gender → Voting Intention on Marijuana Initiative
  • Syntax:

    *Preparing the DV*.
    missing values q21 (8,9).
    *Running the Crosstabulation*.
    crosstabs
        /tables=q21 BY gender
        /cells=column count.
  • Output:

Crosstabulation of Initiative Vote intention by Gender

Q21. Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute.’ If the election were held today, would you vote yes or no on Proposition 64? * Gender Crosstabulation

                                  Gender
                                  Male      Female    Total
Q21     yes     Count             406       306       712
                % within Gender   62.1%     48.3%     55.3%
        no      Count             248       327       575
                % within Gender   37.9%     51.7%     44.7%
Total           Count             654       633       1287
                % within Gender   100.0%    100.0%    100.0%

Source: PPIC October 2016
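
The "% within Gender" rows in the output can be checked by hand: each column-percentage is the cell count divided by its column total. The short Python sketch below is only an illustration of that arithmetic and is not part of the SPSS workflow. The counts are copied from the output table; the names counts, column_percentages, and percentages are made up for this example.

```python
# Counts copied from the SPSS output above (Q21 by gender).
counts = {
    "Male": {"yes": 406, "no": 248},
    "Female": {"yes": 306, "no": 327},
}


def column_percentages(column):
    """Return each cell's share of its column total, as a percentage."""
    total = sum(column.values())
    return {row: 100 * n / total for row, n in column.items()}


percentages = {group: column_percentages(cells) for group, cells in counts.items()}

for group, cells in percentages.items():
    n = sum(counts[group].values())
    shares = ", ".join(f"{row} {pct:.1f}%" for row, pct in cells.items())
    print(f"{group} (n={n}): {shares}")
```

Rounded to one decimal place, the results should match the percentages SPSS reports for the Male and Female columns.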

Edited Version of Table:

Intended Vote on Marijuana Proposition by Gender

                                     Male      Female
vote yes or no on Proposition 64?
  yes                                62.1%     48.3%
  no                                 37.9%     51.7%
  Total (n)                          654       633


Interpretation of Crosstab:

    • The edited version of the table is easier to absorb.
    • The number in each cell is a column-percentage.  At the bottom of each column is the number of cases on which the column percentages are based. The column percentages are key in interpreting your findings.
    • Comparing the column-percentages for the cells across each row of the table we can see that there are differences between the gender groups.
    • It is often most useful to look at the top and bottom rows before looking at any middle rows.
    • In particular, looking across the top row, men are more likely than women to favor the initiative (62.1% versus 48.3%). Looking across the bottom row, women are more likely than men to oppose it (51.7% versus 37.9%).
    • Overall, there is a clear gender difference in vote intentions.
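
One way to make the row comparison concrete is to express each row's difference in percentage points. The Python sketch below is a rough illustration rather than a required step of the lab. It uses the column-percentages from the edited table; the names column_pct, row_gap, and gaps are invented for the example.

```python
# Column percentages copied from the edited table above.
column_pct = {
    "yes": {"Male": 62.1, "Female": 48.3},
    "no": {"Male": 37.9, "Female": 51.7},
}


def row_gap(row):
    """Percentage-point difference, Male minus Female, within one row."""
    return round(row["Male"] - row["Female"], 1)


gaps = {answer: row_gap(row) for answer, row in column_pct.items()}

for answer, gap in gaps.items():
    print(f"{answer}: {gap:+.1f} percentage points (Male minus Female)")
```

A positive gap means the male column-percentage is higher in that row, and a negative gap means the female column-percentage is higher.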


Based on the column-percentages in your crosstab, did you discover a relevant relationship? If so, was it evident in only one row of the table or in all rows?

Try another crosstabulation with another independent variable such as language of interview.


When you find a cell whose column-percentage differs substantially from the other cells in its row, other rows in the table usually show a difference as well. For example, if the column-percentages differ across cells A-B-C, they probably also differ across D-E-F or G-H-I. This happens because the column-percentages within each column must sum to 100%, so a higher percentage in one cell means a lower percentage in at least one other cell of the same column.

(Columns: independent variable; rows: dependent variable)

                 Category I    Category II    Category III
Category I       A             B              C
Category II      D             E              F
Category III     G             H              I
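
The link between rows can be written out as a short derivation. The notation is an assumption introduced here, not part of the original lab: p_{rc} stands for the column-percentage in row r and column c, and c and c' are any two columns being compared.

```latex
% Every column of column-percentages sums to 100:
\[
\sum_{r} p_{rc} = 100 \quad \text{for every column } c .
\]
% Subtracting one column from another, the row-by-row differences cancel:
\[
\sum_{r} \left( p_{rc} - p_{rc'} \right) = 100 - 100 = 0 .
\]
```

So if one row shows a positive difference between two columns, at least one other row must show a negative difference between the same two columns.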


Some more advanced Syntax for use with the ANES 2016 data

  • Syntax:

    *variations on Pres Vote*.
    recode v162034a (1=1) (2=2) (-9 thru -1 = -1) (3 thru 9 =9) into PresV1.
    missing values PresV1 (-1, 9).
    value labels PresV1 1 'Clinton' 2 'Trump'.
    frequencies variables= PresV1.
    recode v162034a (1=1) (2=2) (else = 0) into PresV2.
    missing values PresV2 (0).
    value labels PresV2 1 'Clinton' 2 'Trump'.
    frequencies variables= PresV2.
    recode v162034a (1=1) (2=2) (else = sysmis) into PresV3.
    value labels PresV3 1 'Clinton' 2 'Trump'.
    frequencies variables = PresV3.
    recode v162034a (1=1) (2=2) into PresVote.
    value labels PresVote 1 'Clinton' 2 'Trump'.
    fre var PresVote.
    FREQUENCIES VARIABLES=V161196 V161196a V161196x.
    missing values V161196 V161196a V161196x (-9, -8, -1).
    fre var V161196 V161196a V161196x.
    crosstabs tables = PresVote by V161196 V161196x
     /cells = column count.
    recode V161196 (1=1) (2=3) (3=2) into wall.
    value labels wall 1 'favor' 2 'neither' 3 'oppose'.
    crosstabs tables = PresVote by wall, V161196x
     /cells = column count
     /statistics = phi ctau d. 
    missing values V168017 V168018 V168019 (-8, -1).
    recode v168017 (1=1) (2=2) (3,4,5=3) into intell.
    value labels intell 1 'very high' 2 'fairly high' 3 'average or less'.
    crosstabs tables = PresVote by intell
     /cells = column count
     /statistics =phi ctau d.
    missing values V162368 (-9 thru -5).
    crosstabs tables = PresVote by V162368
     /cells = column count
     /statistics =phi ctau d.
    missing values v161114x (-1).
    crosstabs tables = PresVote by V161114x
     /cells = column count
     /statistics =phi ctau d.
    *creating ideology*.
    Fre var V161127.
    missing values v161127 (-9, -8, -1).
    recode V161127 (1=1) (2=3) (3=2) into Ideol.
    value labels Ideol 1 'liberal' 2 'moderate' 3 'conservative'.
    crosstabs tables = PresVote by Ideol
     /cells = column count
     /statistics =phi ctau d.
    *creating gender*.
    fre var = V161342.
    recode v161342 (1=0) (2=1) into female.
    value labels female 0 'male' 1 'female'.
    fre var female.
    crosstabs tables = PresVote by Ideol by female
     /cells = column count
     /statistics = phi ctau d.
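
The recode commands above all follow the same pattern: each listed code is mapped to a new code in a new variable, and any code that is not listed ends up as missing in that new variable. The Python sketch below only mimics this logic for illustration and is not a substitute for running the SPSS syntax. The mapping and labels are taken from the wall recode above; the function recode_into and the sample responses are hypothetical.

```python
def recode_into(values, mapping):
    """Mimic RECODE ... INTO: listed codes are mapped, all others become None."""
    return [mapping.get(v) for v in values]


# Mapping taken from: recode V161196 (1=1) (2=3) (3=2) into wall.
wall_mapping = {1: 1, 2: 3, 3: 2}
wall_labels = {1: "favor", 2: "neither", 3: "oppose"}

# Hypothetical responses, including one negative (non-response) code.
responses = [1, 2, 3, -9]

wall = recode_into(responses, wall_mapping)
print(wall)
print([wall_labels.get(code, "missing") for code in wall])
```

Swapping codes 2 and 3 puts "neither" between "favor" and "oppose", so the categories run in a meaningful order across the columns of the crosstab.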