UCSC LAB 2

Crosstabulation with Non-Interval Variables

PURPOSE

To learn how to perform a crosstabulation and practice formulating hypotheses.
To appreciate how crosstabulation allows us to make comparisons relevant to our hypotheses.
To introduce the logic of comparison.

MAIN POINTS

Crosstabulation

Crosstabulation brings together the indicators for two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category in the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents for the column. Note: the cell-percentage values will be affected by whether or not we treat some categories of our indicators as missing values. Pay attention to the percentages in each cell rather than the number (n) of respondents in each cell.
To interpret a crosstab, compare the column-percentages across each row to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column-percentages of cells A-B-C, and/or D-E-F, and/or G-H-I differ substantially from one another, then you may have found a relationship.
Crosstabulation does not work effectively if either variable has a large number of categories.

		INDEPENDENT VARIABLE
		Category I	Category II	Category III
DEPENDENT VARIABLE	Category I	A	B	C
	Category II	D	E	F
	Category III	G	H	I
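
The note above about missing values can be checked with simple arithmetic. The Python sketch below is an illustration outside SPSS, and the counts in it are made up rather than taken from any survey. It computes the column-percentage for one cell twice: once with "don't know" answers kept as a category, and once with them treated as missing.

```python
def column_percent(cell_count, column_counts):
    """Return the cell's share of its column total, as a percentage (one decimal)."""
    return round(100 * cell_count / sum(column_counts), 1)

# Hypothetical counts for one column (one category of the independent variable).
yes, no, dont_know = 300, 150, 50

# "Don't know" kept as a valid category: the column total is 500.
pct_with_dk = column_percent(yes, [yes, no, dont_know])

# "Don't know" declared missing: the column total shrinks to 450.
pct_without_dk = column_percent(yes, [yes, no])

print(pct_with_dk, pct_without_dk)
```

The number of "yes" respondents is the same in both cases; only the column total changes, and the percentage changes with it.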

INSTRUCTIONS:

Crosstabulating Variables

Select the October 2016 California statewide survey and data set.
Open the codebook for the dataset you have chosen.
Hypothesize a relationship between two variables in the dataset.
- For example, you might think that attitudes toward inequality vary by partisanship.
In order to avoid corrupting your data, lock your data set prior to beginning your analyses.
To make certain there is some variation on the variables, use SPSS to perform a frequency analysis for each variable.
In the Analyze menu of SPSS, select Descriptive Statistics and then Crosstabs. Place your dependent variable in the Row(s) box and your independent variable in the Column(s) box. Click on the “Cells” button and select Column under Percentages.
Consider whether recoding your variables would be desirable and do so as necessary.
Click on the “Paste” button. Select the syntax and run it.
Determine whether there is a relationship between the variables based on the column-percentages in the crosstab.
Repeat the analysis until you find a set of variables with a relationship.

EXAMPLE

Dataset:
- Statewide Survey October 2016
- Y Variable
  - Marijuana Initiative
- Indicator for Y
  Q21. “Proposition 64 is called the ‘Marijuana Legalization Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
- Possible Explanation (X)
  Gender
- Indicator for X
  Gender

Arrow Diagram:
- X → Y
- Gender → Voting Intention on Marijuana Initiative

Syntax:

*Preparing the DV*.
missing values q21 (8,9).

*Running the Crosstabulation*.
crosstabs
  /tables=q21 by gender
  /cells=column count.

Output:

Crosstabulation of Initiative Vote Intention by Gender

Q21. Proposition 64 is called the ‘Marijuana Legalization Initiative Statute.’ If the election were held today, would you vote yes or no on Proposition 64? * Gender Crosstabulation
			Gender		Total
			Male	Female	
Q21. Proposition 64 is called the ‘Marijuana Legalization Initiative Statute.’ If the election were held today, would you vote yes or no on Proposition 64?	yes	Count	406	306	712
	yes	% within Gender	62.1%	48.3%	55.3%
	no	Count	248	327	575
	no	% within Gender	37.9%	51.7%	44.7%
Total		Count	654	633	1287
Total		% within Gender	100.0%	100.0%	100.0%

Source: PPIC October 2016

Edited Version of Table:

Intended Vote on Marijuana Proposition by Gender

		Gender
		Male	Female
Q21. Vote yes or no on Proposition 64?	yes	62.1%	48.3%
	no	37.9%	51.7%
Total (n)		654	633

Interpretation of Crosstab:

- The edited version of the table is easier to absorb.
- The number in each cell is a column-percentage. At the bottom of each column is the number of cases on which the column percentages are based. The column percentages are key in interpreting your findings.
- Comparing the column-percentages for the cells across each row of the table, we can see that there are differences between the gender groups.
- It is often most useful to look at the top and bottom rows before looking at any middle rows.
- In particular, looking across the top row, men are more likely to favour the initiative than women (62.1% versus 48.3%). And looking across the bottom row, women are more likely to oppose the initiative (51.7% versus 37.9%).
- Overall, there is a clear gender difference in vote intentions.
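
As a cross-check outside SPSS, the column-percentages can be recomputed by hand from the counts in the output. The short Python sketch below does this; the counts are copied from the crosstab above, while the variable and function names are only illustrative.

```python
# Counts taken from the SPSS output above (PPIC October 2016, Q21 by gender).
counts = {
    "Male": {"yes": 406, "no": 248},
    "Female": {"yes": 306, "no": 327},
}

def column_percentages(column):
    """Convert one column of counts to percentages of that column's total."""
    total = sum(column.values())
    return {answer: round(100 * n / total, 1) for answer, n in column.items()}

percentages = {gender: column_percentages(col) for gender, col in counts.items()}

# Compare across each row: the gap between men and women, in percentage points.
gap_yes = round(percentages["Male"]["yes"] - percentages["Female"]["yes"], 1)
gap_no = round(percentages["Male"]["no"] - percentages["Female"]["no"], 1)

print(percentages)
print(gap_yes, gap_no)
```

The recomputed values match the "% within Gender" rows of the output, and the gap in the "yes" row is 13.8 percentage points.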

QUESTIONS FOR REFLECTION

Based on the column-percentages in your crosstab, did you discover a relevant relationship? If so, was it evident in only one row of the table or in all rows?

Try another crosstabulation with another independent variable such as language of interview.

DISCUSSION

When you find a cell whose column-percentage differs substantially from the other cells in its row, at least one other row in the table will also show a difference. For example, if you find a difference among the column-percentages for cells A-B-C, then there must also be a difference among D-E-F, or G-H-I, or both. This happens because the column-percentages in each column add up to 100%, so a higher percentage in one cell leaves less for the other cells in that column.

		INDEPENDENT VARIABLE
		Category I	Category II	Category III
DEPENDENT VARIABLE	Category I	A	B	C
	Category II	D	E	F
	Category III	G	H	I
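
The way one cell constrains the rest of its column can be made concrete with a small worked example. The Python sketch below uses made-up percentages for the first two columns of the table (cells A, D and B, E) and derives the third row (G and H) from the fact that each column's percentages add up to 100%.

```python
# Hypothetical column-percentages for two categories of the independent variable.
# Only the first two rows are filled in; the third row is whatever remains.
column_1 = {"Category I": 50.0, "Category II": 30.0}  # cells A and D
column_2 = {"Category I": 35.0, "Category II": 30.0}  # cells B and E

def remaining_percentage(column):
    """Each column must total 100%, so the last row is fixed by the others."""
    return 100.0 - sum(column.values())

g = remaining_percentage(column_1)  # cell G
h = remaining_percentage(column_2)  # cell H

# Row differences between the two columns (column 1 minus column 2).
diff_row_1 = column_1["Category I"] - column_2["Category I"]    # A - B
diff_row_2 = column_1["Category II"] - column_2["Category II"]  # D - E
diff_row_3 = g - h                                              # G - H

print(g, h)
print(diff_row_1, diff_row_2, diff_row_3)
```

In this example the 15-point difference in the first row reappears, with the opposite sign, in the third row, because the differences down any pair of columns must cancel out.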

Syntax for Regions Using any PPIC Data

*Regional Recodes.
*Regions used in PPIC reports.
Recode 
 county (4,6,9,10,11,15,16,20,24,31,34,39,45,50,51,52,54,57,58 = 1) 
 (1,7,21,28,38,41,43,48,49 =2) (19=3)(33,36 =4) (30,37=5) 
 (else = 6) into region.

Value labels region 1 'Central Valley' 2 'SF Bay Area' 3 'LA' 4 
 'Inland Empire' 5 'OrangeSD' 6 'other'.
*Coastal Recodes as used in PPIC reports.
Recode county
  (8, 12, 23, 49, 1, 7, 21, 28, 38, 41, 43, 48, 44, 27, 40 = 1)
  (42, 56, 3, 30, 37 = 2) (else = 3) into coastal.

Value labels coastal 1 'NorthCent Coast' 2 'South Coast' 3 'Inland'.

*crosstab below uses June 2023 data.

crosstabs tables = q34 by region coastal
  /cells = column count.
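
To see what the regional recode does to individual cases, the same mapping can be written as a lookup table. The Python sketch below is only an illustration of the logic and is not part of the SPSS workflow; the county codes and labels are copied from the region recode above, and the names REGION_COUNTIES and recode_region are invented for this example.

```python
# Labels copied from the VALUE LABELS command for region.
REGION_LABELS = {1: "Central Valley", 2: "SF Bay Area", 3: "LA",
                 4: "Inland Empire", 5: "OrangeSD", 6: "other"}

# County codes copied from the RECODE command for region.
REGION_COUNTIES = {
    1: [4, 6, 9, 10, 11, 15, 16, 20, 24, 31, 34, 39, 45, 50, 51, 52, 54, 57, 58],
    2: [1, 7, 21, 28, 38, 41, 43, 48, 49],
    3: [19],
    4: [33, 36],
    5: [30, 37],
}

# Invert the lists into a county -> region lookup.
COUNTY_TO_REGION = {county: region
                    for region, counties in REGION_COUNTIES.items()
                    for county in counties}

def recode_region(county):
    """Mimic (else = 6): any county code not listed falls into region 6."""
    return COUNTY_TO_REGION.get(county, 6)

print(REGION_LABELS[recode_region(19)])  # a listed code
print(REGION_LABELS[recode_region(2)])   # a code not listed in the recode
```

County code 19 is assigned to 'LA', while a code that appears in none of the lists, such as 2, ends up in 'other'.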