UCSC Lab 2 | Data Art

Crosstabulation with Non-Interval Variables

PURPOSE

To learn how to perform a crosstabulation and practice formulating hypotheses.
To appreciate how crosstabulation allows us to make comparisons relevant to our hypotheses.
To introduce the logic of comparison.

MAIN POINTS

Crosstabulation

Crosstabulation brings together the indicators for two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category of the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents for the column. Note: the cell-percentage values will be affected by whether or not we treat some categories of our indicators as missing values. Pay attention to the percentages in each cell rather than the number (n) of respondents in each cell.
To interpret a crosstab, compare the column-percentages across each row to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column-percentages within A-B-C, and/or D-E-F, and/or G-H-I differ markedly from one another, then you may have found a relationship.
Crosstabulation does not work effectively if either variable has a great many value categories.

		INDEPENDENT VARIABLE
		Category I	Category II	Category III
DEPENDENT VARIABLE	Category I	A	B	C
	Category II	D	E	F
	Category III	G	H	I
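
As a worked version of the column-percentage rule described above, here is the calculation written out for cell A of the table. The symbols n_A, n_D, and n_G (the respondent counts in cells A, D, and G) are notation introduced here for illustration and are not part of the original lab.

```latex
\[
\text{column \% for cell A} = \frac{n_A}{n_A + n_D + n_G} \times 100
\]
```

Cells D and G use the same denominator, so the three percentages in a column sum to 100%. Treating a category as missing removes its respondents from that denominator, which is why the percentages shift.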

INSTRUCTIONS:

Crosstabulating Variables

Select an appropriate data set such as one of the PPIC 2015 statewide surveys.
Enter the codebook for the dataset you have chosen.
Hypothesize a relationship between two variables in the dataset.
- For example, you might hypothesize that attitudes toward inequality vary by partisanship.
In order to avoid corrupting your data, lock your data set prior to beginning your analyses.
To make certain there is some variation in each variable, use SPSS to perform a frequency analysis for each variable.
In the Analyze menu of SPSS, select Descriptive Statistics and then Crosstabs. Place your dependent variable in the rows box and your independent variable in the columns box. Click on the “Cells” button and select column percentages.
Consider whether recoding your variables would be desirable and do so as necessary.
Click on the “Paste” button. Select the syntax and run it.
Determine whether there is a relationship between the variables based on the column-percentages in the crosstab.
Repeat the analysis until you find a set of variables with a relationship.
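
The frequency check in the steps above can also be run from a syntax window. The following is a minimal sketch, not part of the original lab; the names depvar and indepvar are placeholders, so substitute the variable names from your own codebook.

```spss
* Placeholder names: replace depvar and indepvar with variable names from your codebook.
frequencies variables=depvar indepvar.
```

If nearly all respondents fall into a single category of either variable, there is little variation to explain, and a different variable may be a better choice.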

EXAMPLE

Dataset:
- Statewide Survey May 2015
Dependent Variable:
- Q16. Governor Brown recently directed the State Water Resources Control Board to implement mandatory water reductions in cities and towns across California to reduce statewide water usage by 25 percent. Do you think this action does too much, the right amount, or not enough to respond to the current drought in California?
Independent Variable:
- Q36. Next, would you consider yourself to be politically:

Arrow Diagrams:
- X → Y
- Ideology → Attitude to drought reductions

Syntax:

*Preparing the DV: treat codes 8 and 9 as missing*.
missing values q16 (8,9).

*Preparing the IV: collapse the five ideology codes into three groups*.
missing values q35 (8,9).
recode q35 (1,2=1) (3=2) (4,5=3) into ideol.
value labels ideol 1 'liberal' 2 'mid-road' 3 'conserv'.

*Running the Crosstabulation*.
crosstabs
    /tables=q16 BY ideol
    /cells=column count.
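
Before interpreting the output, it can help to confirm that the recode grouped the original codes as intended. The following is an optional sketch that is not part of the original lab syntax; it assumes the q35 and ideol variables from the syntax above and simply tabulates one against the other.

```spss
* Optional check, not part of the original lab syntax.
* Each q35 code should fall in exactly one ideol column.
crosstabs
    /tables=q35 BY ideol
    /cells=count.
```

If the recode worked as written, codes 1 and 2 appear only under 'liberal', code 3 only under 'mid-road', and codes 4 and 5 only under 'conserv'.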

Output:

Crosstabulation of Attitudes toward Water Reductions by Ideology

		IDEOLOGY
		Liberal	Mid-Road	Conserv
Q16. Mandatory water reductions	too much	8.8%	14.3%	19.3%
	the right amount	51.1%	49.4%	48.8%
	not enough	40.1%	36.3%	31.8%
Total (n)		509	474	553

Source: PPIC May 2015

Interpretation of Crosstab:

The number in each cell is a column-percentage. At the bottom of each column is the number of cases on which the column percentages are based. The column percentages are key in interpreting your findings.

Comparing the column-percentages for the cells across each row of the table, we can see that there are differences among the ideological groups.
It is often most useful to look at the top and bottom rows before looking at any middle rows.
In particular, looking across the top row, Conservatives are the most likely to see the restrictions as too much (19.3%, versus 8.8% of Liberals). Looking across the bottom row, Conservatives are the least likely to view the restrictions as not enough (31.8%, versus 40.1% of Liberals). In the middle row, Conservatives are also the least likely to see the restrictions as the right amount, although the differences there are small (48.8% versus 51.1%).
Overall, Conservatives seem the least supportive of government action regarding the drought, whereas Liberals are the most supportive.
Middle-of-the-road respondents are, appropriately, in the middle.

QUESTIONS FOR REFLECTION

Based on the column-percentages in your crosstab, did you discover a relevant relationship? If so, was it evident in only one row of the table or in all rows?

Try another crosstabulation with another independent variable.

DISCUSSION

When you find a cell that has a substantially different column-percentage from the other cells in that row, there are usually other rows in the table that also show a difference. For example, if you find a difference in the column-percentages for cells A-B-C, then there is probably also a difference among D-E-F or G-H-I. This happens because the column-percentages within each column must sum to 100%, so a higher percentage in one cell of a column means lower percentages elsewhere in that column.

		INDEPENDENT VARIABLE
		Category I	Category II	Category III
DEPENDENT VARIABLE	Category I	A	B	C
	Category II	D	E	F
	Category III	G	H	I
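
The point above can be checked against the example output by taking the Conservative minus Liberal difference in each row of the water-reductions crosstab and adding the three differences together. This is a worked illustration using the percentages reported in the example, not an additional result.

```latex
\[
(19.3 - 8.8) + (48.8 - 51.1) + (31.8 - 40.1) = 10.5 - 2.3 - 8.3 = -0.1 \approx 0
\]
```

The differences cancel out almost exactly. The leftover -0.1 is a rounding residue, since the Conserv column as printed sums to 99.9% rather than 100%. A gap of 10.5 points in the top row therefore has to be offset by gaps in the opposite direction in the other rows.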