POL242 LAB MANUAL: Lab 6
Crosstabulation with Nominal Variables
PURPOSE
- To learn how to perform a crosstabulation and practice formulating hypotheses.
- To learn how to interpret crosstabs where at least one variable is nominal.
- To learn how to measure the strength of the relationship between two variables.
- To learn how to apply the basic measures of association: phi and Cramer’s V.
MAIN POINTS
Crosstabulation
- Crosstabulation brings together two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category in the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
- Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
- The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents for the column. Note: the cell-percentage values will be wrong if the missing values are not eliminated. Pay attention to the percentages in each cell rather than the number (n) of respondents in each cell.
- To interpret crosstabs, compare the column-percentages across the rows to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column-percentages of cells A-B-C, and/or D-E-F, and/or G-H-I markedly differ from one another, then you have found a relationship.
| | | INDEPENDENT VARIABLE | | |
| | | Category I | Category II | Category III |
| DEPENDENT VARIABLE | Category I | A | B | C |
| | Category II | D | E | F |
| | Category III | G | H | I |
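The column-percentage arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of the SPSS workflow: the counts are made up, and the function name `column_percentages` is hypothetical.

```python
def column_percentages(counts):
    """Convert cell counts (rows = DV categories, columns = IV categories)
    into column percentages."""
    n_cols = len(counts[0])
    # Column totals: the number of valid respondents in each IV category.
    col_totals = [sum(row[j] for row in counts) for j in range(n_cols)]
    return [
        [100.0 * row[j] / col_totals[j] for j in range(n_cols)]
        for row in counts
    ]


# Hypothetical counts laid out like the diagram above
# (cells A-B-C in the first row, D-E-F in the second, G-H-I in the third).
counts = [
    [30, 50, 70],  # A, B, C
    [40, 30, 20],  # D, E, F
    [30, 20, 10],  # G, H, I
]

percentages = column_percentages(counts)
for row in percentages:
    print("  ".join(f"{p:5.1f}%" for p in row))
```

With these made-up counts the first row reads 30%, 50%, 70% across the columns. A marked difference of this kind across a row is the pattern that signals a relationship.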
Measures of Association: Nominal data–Phi and Cramer’s V
- Measures of Association calculate the strength, and for ordinal variables the direction, of the relationship between two variables.
- PHI is used to measure the strength of the association between two variables, each of which has only two categories. (It applies to 2 X 2 nominal tables only).
- CRAMER’S V is used to measure the strength of the association between one nominal variable and either another nominal variable or an ordinal variable. Both of the variables can have more than 2 categories. (It applies to either nominal X nominal crosstabs, or ordinal X nominal crosstabs, with no restriction on the number of categories.)
- Interpreting the value of the Level of Association:
LEVEL OF ASSOCIATION | Verbal Description | COMMENTS |
0.00 | No Relationship | Knowing the independent variable does not help in predicting the dependent variable. |
.00 to .15 | Very Weak | Not generally acceptable |
.15 to .20 | Weak | Minimally acceptable |
.20 to .25 | Moderate | Acceptable |
.25 to .30 | Moderately Strong | Desirable |
.30 to .35 | Strong | Very Desirable |
.35 to .40 | Very Strong | Extremely Desirable |
.40 to .50 | Worrisomely Strong | Either an extremely good relationship or the two variables are measuring the same concept |
.50 to .99 | Redundant | The two variables are probably measuring the same concept. |
1.00 | Perfect Relationship | If we know the independent variable, we can perfectly predict the dependent variable. |
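As a rough aid for reading the guideline table, the lookup can be written as a small Python function. This is a sketch under two assumptions that the table itself does not state: a value falling exactly on a boundary (such as .20) is placed in the higher band, and the sign of the measure is ignored because only strength is being described. The name `describe_association` is hypothetical.

```python
# Upper bounds and labels copied from the guideline table above.
# Assumption: a boundary value (e.g. .20) belongs to the higher band,
# because the printed ranges overlap at their edges.
BANDS = [
    (0.15, "Very Weak"),
    (0.20, "Weak"),
    (0.25, "Moderate"),
    (0.30, "Moderately Strong"),
    (0.35, "Strong"),
    (0.40, "Very Strong"),
    (0.50, "Worrisomely Strong"),
    (1.00, "Redundant"),
]


def describe_association(value):
    """Return the verbal description for a phi or Cramer's V value."""
    strength = abs(value)  # assumption: only the magnitude matters here
    if strength > 1.0:
        raise ValueError("expected a value between -1 and 1")
    if strength == 0.0:
        return "No Relationship"
    if strength == 1.0:
        return "Perfect Relationship"
    for upper_bound, label in BANDS:
        if strength < upper_bound:
            return label
    raise AssertionError("unreachable: every value below 1.0 is in a band")


for value in (0.093, 0.134, 0.346):
    print(value, describe_association(value))
```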
INSTRUCTIONS
Crosstabulating Nominal Data
- Select an available Dataset for this exercise, perhaps from among the PPIC data available on the DataArt website.
- Enter the Codebook for the chosen dataset.
- Hypothesize a relationship between two indicators measured at the nominal level of measurement.
- For example, using the CES 2011 data one might suspect that support for economic equality might be related to gender or language with women and francophones being more supportive.
- Open the relevant data set with SPSS.
- Perform separate Frequency distributions for each of the variables. Based on the Frequency output, declare the appropriate missing values and recode each variable as needed.
- Conduct a Bivariate Crosstabulation relating your Dependent and Independent variables using the syntax structure demonstrated in the example shown below.
- Take care to enter the dependent variable first, followed by the independent variable. If the variables are placed appropriately, the DV will appear on the left of the crosstab and the IV will appear across the top (see diagram above).
- Specify the appropriate cell contents and summary statistics in the second and third lines of your syntax.
- When evaluating the measures of association, you should look at only Phi for 2 by 2 tables and Cramer’s V for other nominal tables.
- Determine whether there is a relationship between the variables based on the column-percentages in the crosstab. Then, looking at the value of the measure of association, use the above guidelines to interpret the strength of the relationship.
- Repeat the analysis until you find a set of variables with a relationship that has a moderate degree of association (> .20).
EXAMPLES
Example #1: Using phi with two dichotomous variables
- Dataset
- CES 2011
- Dependent Variable
MBS11_B3. The government should:
1. See to it that everyone has a decent standard of living;
2. Leave people to get ahead on their own.
- Independent Variables: X_{1} rgender, X_{2} intlang
rgender11. Respondent’s gender
1. Male
5. Female
cps_intlang11. Language of interview
1. English
5. French
- Arrow Diagrams
- X_{1} → Y
- Female → see to decent living
- X_{2} → Y
- French→ see to decent living
- Syntax
*Create gender indicator*.
recode rgender11 (1=0) (5=1) into female.
*Create language indicator*.
recode cps_intlang11 (1=0) (5=1) into french.
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.
crosstabs tables = goveqch by female french
  /cells = column count
  /statistics = phi.
- Syntax Legend
- Missing values and recodes: determined by the trial run of the Frequencies output.
- Crosstab command: This tells SPSS which variables to use in the table. Enter the dependent variable first, then the independent variable(s).
- /cells =: This tells SPSS to put column percentages and frequencies in each cell. Make sure to indent this continuation of the Crosstabs command.
- /statistics =: This line of syntax is included as part of the crosstab command in order to calculate the nominal Measures of Association (phi and Cramer’s V).
- Output
Case Processing Summary
| | Cases | | | | | |
| | Valid | | Missing | | Total | |
| | N | Percent | N | Percent | N | Percent |
| goveqch * female | 1381 | 32.1% | 2927 | 67.9% | 4308 | 100.0% |
| goveqch * french | 1381 | 32.1% | 2927 | 67.9% | 4308 | 100.0% |
goveqch * female Crosstab
| | | | female | |
| | | | .00 | 1.00 |
| goveqch | leave alone | Count | | |
| | | % within female | 22.8% | 15.5% |
| | decent living | Count | | |
| | | % within female | 77.2% | 84.5% |
| Total | | Count | 637 | 744 |
Symmetric Measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .093 | .001 |
| | Cramer’s V | .093 | .001 |
| N of Valid Cases | | 1381 | |
goveqch * french Crosstab
| | | | french | |
| | | | .00 | 1.00 |
| goveqch | leave alone | Count | | |
| | | % within french | 21.7% | 9.2% |
| | decent living | Count | | |
| | | % within french | 78.3% | 90.8% |
| Total | | Count | 1065 | 316 |
Symmetric Measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .134 | .000 |
| | Cramer’s V | .134 | .000 |
| N of Valid Cases | | 1381 | |
- Crosstab Legend:
- The number at the top of each cell is the number of cases (n), and the number at the bottom of each cell is the column percentage. (You may find that the row total figures will slightly differ from the figures you would get from individual Frequency analyses. This is because some of the people who responded to one variable did not respond to the second and hence are eliminated by the missing values statement. So you can expect that the number of missing cases will be slightly higher in the crosstab than it would be in the individual frequency analyses.) Amongst all of these figures in the output, the most important for your assessment will be the column-percentage for each cell.
- Measures of Association Legend:
- For the present time, the only aspect of the ‘symmetric measures’ output that you have to note is the ‘Value’ column.
- While you can ignore the ‘Approximate Significance’ column for the time being, this will become important after we learn its meaning later in the course.
- Interpretation of Crosstab:
- In the first (goveqch * female) table, comparing the column-percentages for the cells in the ‘leave alone’ row, we can see that there is a notable difference. These column percentages are 22.8% and 15.5%. A difference can also be observed in the ‘decent living’ row (77.2% and 84.5%).
- This indicates that male (female = 0) respondents are more likely to respond ‘leave alone’ to the question than are females (female = 1).
- And male respondents are less likely to choose the ‘decent living’ response than are females.
- Since the crosstab is a 2 X 2 table, we know that Phi is the appropriate measure of association. The value of Phi is .09, which means that this is a Very Weak relationship.
- The value of Phi may be negative if the variables are coded in a particular way. The meaning of a negative measure of association will be discussed below. For the time being, recognize that the positive phi value here means that most of the cases are on the main diagonal of the table.
- The Phi of .09 only very weakly supports the hypothesis (X_{1} → Y) that gender is related to attitudes toward inequality.
- Looking at the second (goveqch * french) table produces a similarly very weak relationship. So the (X_{2} → Y) hypothesis that language group is related to attitudes toward inequality also is only weakly supported.
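The Phi values in the output can be roughly cross-checked by hand. The sketch below is an illustration in Python, not part of the SPSS procedure. It uses the standard 2 X 2 formula phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). Because the cell counts are not shown in the output above, it rebuilds approximate counts from the printed column percentages and column totals, so the results can differ from SPSS in the third decimal. The function names are hypothetical.

```python
import math


def phi_2x2(a, b, c, d):
    """Phi for a 2 x 2 table with cells laid out as
    row 1 (DV = 0): a (IV = 0), b (IV = 1)
    row 2 (DV = 1): c (IV = 0), d (IV = 1)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def phi_from_output(first_row_pct, col_totals):
    """Rebuild approximate counts from the first row's column percentages
    and the column totals (an approximation), then compute phi."""
    a = first_row_pct[0] / 100 * col_totals[0]
    b = first_row_pct[1] / 100 * col_totals[1]
    c = col_totals[0] - a
    d = col_totals[1] - b
    return phi_2x2(a, b, c, d)


# 'leave alone' column percentages and column totals from the output above.
phi_female = phi_from_output([22.8, 15.5], [637, 744])
phi_french = phi_from_output([21.7, 9.2], [1065, 316])

print(f"goveqch * female: phi is about {phi_female:.3f}")
print(f"goveqch * french: phi is about {phi_french:.3f}")
```

The positive sign comes from the layout: with ‘leave alone’ and the 0 category listed first, a positive numerator (ad greater than bc) means the cases lean toward the main diagonal.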
Example #2: Cramer’s V
- Dataset
- CES 2011
- Dependent Variable:
MBS11_B3. The government should:
1. See to it that everyone has a decent standard of living;
2. Leave people to get ahead on their own.
- Independent Variable:
- [CPSQ1_b] In federal politics do you usually think of yourself as a: Liberal, Conservative, NDP, Bloc Quebecois, or none of these?
- Arrow Diagram
- X_{2} → Y
- Partisanship → see to decent living
- Syntax
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.
recode cps11_71 (2=1) (1=2) (4=3) (3=4) into PID.
value labels PID 1 'Cons' 2 'Lib' 3 'BQ' 4 'NDP' 5 'Green' 6 'None'.
crosstabs tables = goveqch by PID
  /cells = column count
  /statistics = phi.
- Syntax Legend
- Recode indicators for both DV and IV into new variable names, each with new value labels.
- In order for raw output to be displayed, the IV (PID) is recoded so as to eliminate the categories greater than 4.
- Crosstab Command with the DV first, followed by the IV.
- /statistics = phi is a syntax subcommand which instructs SPSS to calculate nominal measures of association, in this case phi and Cramer’s V.
- Output
Case Processing Summary
| | Cases | | | | | |
| | Valid | | Missing | | Total | |
| | N | Percent | N | Percent | N | Percent |
| goveqch * PID | 1038 | 24.1% | 3270 | 75.9% | 4308 | 100.0% |
| | | | PID | | | |
| | | | Cons | Lib | BQ | NDP |
| goveqch | leave alone | Count | | | | |
| | | % within PID | 38.1% | 12.6% | 3.8% | 7.6% |
| | decent living | Count | | | | |
| | | % within PID | 61.9% | 87.4% | 96.2% | 92.4% |
| Total | | Count | 386 | 374 | 106 | 172 |
Symmetric Measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .346 | .000 |
| | Cramer’s V | .346 | .000 |
| N of Valid Cases | | 1038 | |
- Interpretation of Crosstab:
- Start by looking at the column percentages and compare across the rows. In the ‘leave alone’ row, which represents the percentage of people in each partisan grouping who say that the government should leave people to get ahead on their own, we see for example that 38.1% of Conservatives favour this option compared to 12.6% of Liberals and even smaller percentages of respondents among the other parties. This indicates a greater support for this option among Conservatives than among other partisans. However, when we look at the ‘decent living’ row we can see that this difference is reversed; a smaller portion of the Conservatives than of any of the other party groupings favour seeing to it that everyone has a decent living.
- Keep in mind we are comparing column percentages, not row percentages. Hence it is, for example, incorrect to observe that 61.9% of those who favour the ‘decent living’ response are Conservatives.
- Taken together, the differences in the column percentages show substantial partisan differences in support for government providing a decent living, with the NDP and BQ being most supportive, the Conservatives least, and Liberals being in the middle. Moreover, one might say that the difference here is essentially between the Conservatives and the other party groupings. One might consider recoding the IV into two categories: Conservative and other.
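To see why the 61.9% figure cannot be read as a row percentage, the row percentage can be worked out separately. The Python sketch below is only an illustration. The cell counts are not shown in the output, so it rebuilds approximate counts from the printed column percentages and column totals; the variable names are hypothetical.

```python
parties = ["Cons", "Lib", "BQ", "NDP"]
col_totals = [386, 374, 106, 172]
decent_living_col_pct = [61.9, 87.4, 96.2, 92.4]  # % within PID

# Approximate 'decent living' counts, rebuilt from percentages and totals.
decent_living_counts = [
    pct / 100 * total
    for pct, total in zip(decent_living_col_pct, col_totals)
]
row_total = sum(decent_living_counts)

# Row percentages: the share of each party among 'decent living' respondents.
row_pct = {
    party: 100 * count / row_total
    for party, count in zip(parties, decent_living_counts)
}

for party in parties:
    print(f"{party}: {row_pct[party]:.1f}% of 'decent living' respondents")
```

Under this approximation Conservatives make up roughly 29% of those choosing ‘decent living’, which is a different quantity from the 61.9% of Conservatives who choose it.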
- Interpretation of Cramer’s V:
- Since this crosstab involves a nominal independent variable with several categories, the appropriate measure of association to use in summarizing the relationship is Cramer’s V.
- Generally we would not use Phi because it is only appropriate for 2 X 2 tables. In this case, however, the two measures are the same.
- The Cramer’s V value is .35. Using the standards above, this relationship is on the cusp between a ‘strong’ and a ‘very strong’ relationship.
- We have found that there are strong differences among partisans in their responses to this indicator of attitudes regarding inequality.
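The Cramer’s V reported above can be roughly reproduced by hand. The sketch below is an illustration in Python, not part of the SPSS procedure. It uses the standard definition V = sqrt(chi-square / (N * min(rows - 1, columns - 1))). The cell counts are approximations rebuilt from the printed column percentages and column totals, so the result should land close to, but not necessarily exactly on, the SPSS value. The function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(counts):
    """Cramer's V from a table of counts (rows = DV, columns = IV)."""
    n_rows, n_cols = len(counts), len(counts[0])
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(row[j] for row in counts) for j in range(n_cols)]
    n = sum(row_totals)
    # Pearson chi-square: compare observed counts with the counts expected
    # if the row and column variables were unrelated.
    chi_square = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (counts[i][j] - expected) ** 2 / expected
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


# Column totals and 'leave alone' column percentages from the output above
# (Cons, Lib, BQ, NDP).
col_totals = [386, 374, 106, 172]
leave_alone_pct = [38.1, 12.6, 3.8, 7.6]

leave_alone = [p / 100 * t for p, t in zip(leave_alone_pct, col_totals)]
decent_living = [t - c for t, c in zip(col_totals, leave_alone)]

v = cramers_v([leave_alone, decent_living])
print(f"Cramer's V is about {v:.3f}")
```

Because the dependent variable has only two rows, min(rows - 1, columns - 1) equals 1, so the formula reduces to sqrt(chi-square / N). This is why Phi and Cramer’s V coincide in this table.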
QUESTIONS FOR REFLECTION
- Did you discover a relevant relationship in your crosstab based on the column-percentages? If so, was it evident in only one row of the table or in all rows?
- Can you compare the magnitude of a Phi-value from one relationship to the magnitude of a Cramer’s V value for another relationship?
- Would the strength of the relationship be affected if you looked at the results for all the categories of the IV rather than only some of them?
DISCUSSION
- When you find a cell that has a substantially different column-percentage from the other cells in that row, there are usually other rows in the table that also have a difference. For example, if you find a difference in the column-percentage for cells A-B-C, then there is probably also a difference between D-E-F, or G-H-I. This happens because the column-percentage in any given cell influences the column-percentage of the other cells in that column.
| | | INDEPENDENT VARIABLE | | |
| | | Category I | Category II | Category III |
| DEPENDENT VARIABLE | Category I | A | B | C |
| | Category II | D | E | F |
| | Category III | G | H | I |
- We can compare two values of the same measure readily. But be cautious about comparing different measures of association to each other. E.g., you can compare two Phi-values to one another, but be cautious about comparing a Phi-value to a Cramer’s V value.
- Find out by including all the values on party identification (pid).