UCSC LAB 2
Crosstabulation with Non-Interval Variables
PURPOSE
 To learn how to perform a crosstabulation and practice formulating hypotheses.
 To appreciate how crosstabulation allows us to make comparisons relevant to our hypotheses.
 To introduce the logic of comparison.
MAIN POINTS
Crosstabulation
 Crosstabulation brings together the indicators for two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category in the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
 Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
 The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents in that column. Note: the cell percentages will be affected by whether we treat some categories of our indicators as missing values, because excluding them changes the column totals. Pay attention to the percentage in each cell rather than the number (n) of respondents in each cell.
 To interpret a crosstab, compare the column percentages across each row to see whether they differ. For instance, in the table below, compare the percentages for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column percentages within A, B, and C, and/or D, E, and F, and/or G, H, and I differ substantially from one another, then you may have found a relationship.
 Crosstabulation does not work effectively if either variable has a great many value categories.
                                   INDEPENDENT VARIABLE
                                   Category I   Category II   Category III
DEPENDENT VARIABLE  Category I     A            B             C
                    Category II    D            E             F
                    Category III   G            H             I
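The column-percentage arithmetic described above can also be sketched outside SPSS. The following Python example is only an illustration and is not part of the lab's SPSS workflow. The function name `column_percentages`, the sample responses, and the use of `None` to stand for a missing value are all assumptions made for the example.

```python
from collections import Counter

def column_percentages(pairs):
    """Return {iv_category: {dv_category: percent}} from (iv, dv) pairs.

    Pairs in which either value is None are treated as missing and dropped,
    which changes the column totals and therefore the percentages.
    """
    valid = [(iv, dv) for iv, dv in pairs if iv is not None and dv is not None]
    cell_counts = Counter(valid)                    # n in each cell
    column_totals = Counter(iv for iv, _ in valid)  # n in each column
    table = {}
    for (iv, dv), n in cell_counts.items():
        table.setdefault(iv, {})[dv] = 100.0 * n / column_totals[iv]
    return table

# Hypothetical responses: (independent variable, dependent variable).
responses = [
    ("I", "yes"), ("I", "yes"), ("I", "yes"), ("I", "no"),
    ("II", "yes"), ("II", "no"), ("II", "no"), ("II", "no"),
    ("II", None),  # missing on the dependent variable, so excluded
]

table = column_percentages(responses)
for iv in sorted(table):
    for dv in sorted(table[iv]):
        print(f"column {iv}, row {dv}: {table[iv][dv]:.1f}%")
```

Reading across the "yes" row of this made-up table, the two columns differ, which is the kind of comparison the lab asks you to make.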
INSTRUCTIONS:
Crosstabulating Variables
 Select an appropriate data set such as one of the PPIC 2016 or 2017 statewide surveys.
 Enter the codebook for the dataset you have chosen.
 Hypothesize a relationship between two variables in the dataset.
 For example, you might think that attitudes toward inequality vary by partisanship.
 In order to avoid corrupting your data, lock your data set prior to beginning your analyses.
 To make certain there is some variation on the variables, use SPSS to perform a frequency analysis for each variable.
 In the Analyze menu of SPSS, select Descriptive Statistics and then Crosstabs. Place your dependent variable in the rows box and your independent variable in the columns box. Click on the “Cells” button and select column percentages.
 Consider whether recoding your variables would be desirable and do so as necessary.
 Click on the “Paste” button. Select the syntax and run it.
 Determine whether there is a relationship between the variables based on the column percentages in the crosstab.
 Repeat the analysis until you find a set of variables with a relationship.
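Two of the steps above, checking each variable for variation and collapsing categories by recoding, can be illustrated with a short Python sketch. It is an analogy to what a frequency analysis and a recode do in SPSS, not a replacement for them. The variable `party_id`, its codes, and the recode map are hypothetical.

```python
from collections import Counter

def frequencies(values):
    """Return {value: (count, percent)} for the non-missing values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return {v: (n, 100.0 * n / len(valid)) for v, n in counts.items()}

def recode(values, mapping):
    """Map old codes to new ones; codes absent from the map become None."""
    return [mapping.get(v) for v in values]

# Hypothetical 5-category party identification item.
party_id = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]

# Collapse five categories into three:
# 1 = Democrat, 2 = independent, 3 = Republican.
collapse = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
party3 = recode(party_id, collapse)

for code, (n, pct) in sorted(frequencies(party3).items()):
    print(f"category {code}: n={n}, {pct:.0f}%")
```

A variable on which nearly every case falls into one category has too little variation to be useful in a crosstab, and a variable with many sparse categories is usually easier to read after collapsing.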
EXAMPLE
 Dataset:
 Statewide Survey October 2016

 Y Variable:
 Marijuana Initiative
 Indicator for Y:
 Q21. “Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
 Possible Explanation (X):
 Gender
 Indicator for X:
 Gender
 Arrow Diagram:
 X → Y
 Gender → Voting Intention on Marijuana Initiative
 Syntax:
*Preparing the DV*.
missing values q21 (8,9).
*Running the Crosstabulation*.
crosstabs /tables=q21 BY gender /cells=column count.
 Output:
Crosstabulation of Initiative Vote Intention by Gender
Q21. Proposition 64 is called the ‘Marijuana Legalization. Initiative Statute.’ If the election were held today, would you vote yes or no on Proposition 64? * Gender Crosstabulation  
                                               Gender
                                               Male     Female   Total
Q21. Vote on      yes     Count                406      306      712
Proposition 64            % within Gender      62.1%    48.3%    55.3%
                  no      Count                248      327      575
                          % within Gender      37.9%    51.7%    44.7%
Total                     Count                654      633      1287
                          % within Gender      100.0%   100.0%   100.0%
Source: PPIC October 2016
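As a check on the output above, the "% within Gender" figures can be reproduced from the cell counts. The Python sketch below is an illustration only; the counts are copied from the SPSS output, while the dictionary layout and the helper name `percent_within` are assumptions made for the example.

```python
# Cell counts copied from the SPSS output above (rows: yes/no; columns: gender).
counts = {
    "Male": {"yes": 406, "no": 248},
    "Female": {"yes": 306, "no": 327},
}

def percent_within(column):
    """Convert one column of counts to column percentages (one decimal)."""
    total = sum(column.values())
    return {row: round(100.0 * n / total, 1) for row, n in column.items()}

percents = {gender: percent_within(col) for gender, col in counts.items()}

# The Total column pools both genders before percentaging.
pooled = {row: counts["Male"][row] + counts["Female"][row] for row in ("yes", "no")}
percents["Total"] = percent_within(pooled)

for group, col in percents.items():
    print(group, col)
```

Each percentage is a cell count divided by its own column total (654, 633, or 1287), never by the row total.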
Edited Version of Table:
Intended Vote on Marijuana Proposition by Gender
Q21. Vote yes or no on Proposition 64?
                 Gender
                 Male     Female
yes              62.1%    48.3%
no               37.9%    51.7%
Total (n)        654      633
Interpretation of Crosstab:
 The edited version of the table is easier to absorb.
 The number in each cell is a column percentage. At the bottom of each column is the number of cases on which the column percentages are based. The column percentages are key in interpreting your findings.
 Comparing the column percentages for the cells across each row of the table, we can see that there are differences between the gender groups.
 It is often most useful to look at the top and bottom rows before looking at any middle rows.
 In particular, looking across the top row, men are more likely than women to favour the initiative (62.1% versus 48.3%). Looking across the bottom row, women are more likely than men to oppose the initiative (51.7% versus 37.9%).
 Overall, there is a clear gender difference in vote intentions, a gap of roughly 14 percentage points.
QUESTIONS FOR REFLECTION
Based on the column percentages in your crosstab, did you discover a relevant relationship? If so, was it evident in only one row of the table or in all rows?
Try another crosstabulation with another independent variable such as language of interview.
DISCUSSION
When you find a cell that has a substantially different column percentage from the other cells in that row, there are usually other rows in the table that also show a difference. For example, if you find a difference in the column percentages among cells A, B, and C, then there is probably also a difference among D, E, and F, or among G, H, and I. This happens because the percentages in each column must sum to 100%, so the column percentage in any given cell constrains the column percentages of the other cells in that column.
                                   INDEPENDENT VARIABLE
                                   Category I   Category II   Category III
DEPENDENT VARIABLE  Category I     A            B             C
                    Category II    D            E             F
                    Category III   G            H             I
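The point that a difference in one row tends to bring a difference in another can be seen with the counts from the Proposition 64 example. The Python sketch below is an illustration under that assumption; the helper name `column_percent` is hypothetical.

```python
def column_percent(count, column_total):
    """Percentage of a column's cases that fall in one cell."""
    return 100.0 * count / column_total

# Counts from the Proposition 64 example: yes and no, by gender.
male_yes, male_no = 406, 248
female_yes, female_no = 306, 327
male_n = male_yes + male_no
female_n = female_yes + female_no

gap_yes = column_percent(male_yes, male_n) - column_percent(female_yes, female_n)
gap_no = column_percent(male_no, male_n) - column_percent(female_no, female_n)

# With only two rows, the gaps are equal in size and opposite in sign,
# because each column's percentages must sum to 100.
print(f"yes row: men minus women = {gap_yes:+.1f} points")
print(f"no row:  men minus women = {gap_no:+.1f} points")
```

In a table with three or more rows the row gaps still sum to zero, but they need not be mirror images, which is why a difference may show up in some rows and not in others.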
Some more advanced syntax for use with the ANES 2016 data
 Syntax:
*ANES codes missing data with negative values (for example -9, -8, -1).
FREQUENCIES VARIABLES=V162034a.

*variations on Pres Vote*.
recode v162034a (1=1) (2=2) (-9 thru -1 = -1) (3 thru 9 = 9) into PresV1.
missing values PresV1 (-1, 9).
value labels PresV1 1 'Clinton' 2 'Trump'.
frequencies variables = PresV1.

recode v162034a (1=1) (2=2) (else = 0) into PresV2.
missing values PresV2 (0).
value labels PresV2 1 'Clinton' 2 'Trump'.
frequencies variables = PresV2.

recode v162034a (1=1) (2=2) (else = sysmis) into PresV3.
value labels PresV3 1 'Clinton' 2 'Trump'.
frequencies variables = PresV3.

recode v162034a (1=1) (2=2) into PresVote.
value labels PresVote 1 'Clinton' 2 'Trump'.
fre var PresVote.

FREQUENCIES VARIABLES=V161196 V161196a V161196x.
missing values V161196 V161196a V161196x (-9, -8, -1).
fre var V161196 V161196a V161196x.
crosstabs tables = PresVote by V161196 V161196x /cells = column count.

*wall*.
recode V161196 (1=1) (2=3) (3=2) into wall.
value labels wall 1 'favor' 2 'neither' 3 'oppose'.
crosstabs tables = PresVote by wall, V161196x /cells = column count /statistics = phi ctau d.

*intelligence*.
FREQUENCIES VARIABLES=V168017.
missing values V168017 V168018 V168019 (-8, -1).
recode v168017 (1=1) (2=2) (3,4,5=3) into intell.
value labels intell 1 'very high' 2 'fairly high' 3 'average or less'.
crosstabs tables = PresVote by intell /cells = column count /statistics = phi ctau d.

*skintone*.
FREQUENCIES VARIABLES=V162368.
missing values V162368 (-9 thru -5).
crosstabs tables = PresVote by V162368 /cells = column count /statistics = phi ctau d.

*healthcare*.
FREQUENCIES VARIABLES=V161114x.
missing values v161114x (-1).
crosstabs tables = PresVote by V161114x /cells = column count /statistics = phi ctau d.

*creating ideology*.
fre var V161127.
missing values v161127 (-9, -8, -1).
recode V161127 (1=1) (2=3) (3=2) into Ideol.
value labels Ideol 1 'liberal' 2 'moderate' 3 'conservative'.
crosstabs tables = PresVote by Ideol /cells = column count /statistics = phi ctau d.
FREQUENCIES VARIABLES=V161114x /BARCHART PERCENT.

*creating gender*.
fre var = V161342.
recode v161342 (1=0) (2=1) into female.
value labels female 0 'male' 1 'female'.
fre var female.
crosstabs tables = PresVote by Ideol by female /cells = column count /statistics = phi ctau d.
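The recodes above that build a two-candidate vote variable all follow the same logic: keep codes 1 and 2 and set every other code aside as missing. The Python sketch below illustrates that logic only. The list `raw_votes` is invented data, and the function name `recode_two_party` is hypothetical.

```python
# Hypothetical raw codes for a vote-choice item: 1 and 2 are the two major
# candidates, other positive codes are other candidates, and negative codes
# are missing-data codes.
raw_votes = [1, 2, 2, -1, 1, 5, -9, 2, 3, 1]

LABELS = {1: "Clinton", 2: "Trump"}

def recode_two_party(code):
    """Keep codes 1 and 2; treat every other code as missing (None)."""
    return code if code in LABELS else None

pres_vote = [recode_two_party(code) for code in raw_votes]
valid = [LABELS[code] for code in pres_vote if code is not None]

n_valid = len(valid)
n_missing = len(pres_vote) - n_valid
print(f"valid: {n_valid}, missing: {n_missing}")
print({name: valid.count(name) for name in LABELS.values()})
```

Whichever way the other codes are set aside, the valid cases are the same, so the column percentages in the resulting crosstab should not depend on the variant chosen.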