UCSC Lab 4

UCSC LAB MANUAL

PURPOSE

  • To identify the meaning of descriptive terms for each type of variable: nominal, ordinal, and interval.
  • To become acquainted with recoding.

MAIN POINTS

  • Types of Variables
    • Nominal: The categories of the variable have no inherent rank or order. The categories are nevertheless mutually exclusive and exhaustive. Examples are partisan preference and gender.
    • Ordinal: The categories of the variable are ordered or ranked, from less to more or more to less, but the distances between them are not necessarily equal. E.g., How much should be done to reduce the gap between the rich and poor? Much more, somewhat more, the same as now, somewhat less or much less?
    • Interval: The categories of the variable are ordered and have a uniform distance between them. E.g., income.
    • An interval variable can be transformed into an ordinal variable by recoding it. For example, we could divide income into non-overlapping groups such as $0-10,000, $10,001-20,000, etc.
  • Descriptive Statistics
    • Mean: Computed by adding all the values and dividing this sum by the number of cases.
    • Standard deviation: Expresses the degree of variation within a variable as the typical deviation of cases from the mean. It is the square root of the variance.
    • Variance: The average squared deviation from the mean, i.e., the square of the standard deviation.
    • Median: The value of the middle case, i.e., the one with the same number of cases above and below it.
    • Mode: The most frequent value.
    • Skew: Measures the asymmetry of the distribution. A value of 0 indicates a symmetric distribution; positive values indicate a longer right tail and negative values a longer left tail.
    • Kurtosis: Measures the peakedness of the distribution relative to a normal distribution, for which SPSS reports a value of 0.
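
The recoding of an interval variable into an ordinal one, described above, might look like the following sketch. The variable names income and incomeR, the cut points, and the labels are hypothetical, and the sketch assumes that any missing-value codes are negative so that they fall outside the listed ranges.

```spss
* Hypothetical example: income is assumed to hold annual income in dollars.
* RECODE uses the first specification a value matches, so exactly 10000 goes to group 1.
recode income
  (0 thru 10000 = 1)
  (10000 thru 20000 = 2)
  (20000 thru highest = 3)
  into incomeR.
variable labels incomeR 'Income group (recoded)'.
value labels incomeR 1 '$0-10,000' 2 '$10,001-20,000' 3 'Over $20,000'.
fre var incomeR.
```

Values not covered by any range (here, negative codes) become system-missing on incomeR, in the same way as the unlisted categories in the examples below.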

INSTRUCTIONS

  1. Enter the codebook for the data set on which you would like to work.
  2. Identify three indicators: one should be nominal, the second ordinal, and the last, if possible, measured at the interval level.
  3. Using SPSS syntax, run a frequency analysis for each of the three indicators.
  4. Based on the output of the trial run, identify which values should be declared as Missing Values and decide whether and how best to Recode the data. Until missing values are declared and the appropriate recodes are made, any summary measures may be misleading.
  5. Edit the syntax for missing values and recode as needed. Labs 1 & 2 include a number of examples. It is essential to re-label the recoded values as the old labels will not be automatically changed.
  6. Finally, where relevant, identify the MEANING of the summary measures for each type of variable.
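
Steps 3 to 5 might be sketched as follows. The variable name myvar, the codes 8 and 9 used for missing values, and the low/medium/high labels are hypothetical placeholders; substitute the names and codes from your own codebook.

```spss
* Step 3: trial run to inspect the raw codes (myvar is a hypothetical name).
fre var myvar.

* Steps 4 and 5: declare missing values, recode into a new variable, then re-label.
missing values myvar (8, 9).
recode myvar (1=3) (2=2) (3=1) into myvarR.
value labels myvarR 1 'low' 2 'medium' 3 'high'.

* Re-run the frequencies with summary statistics on the recoded variable.
fre var myvarR
  /statistics = mode median mean stddev variance skew kurtosis.
```

Recoding into a new variable rather than overwriting myvar keeps the original codes available for checking the result against the trial run.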

EXAMPLES

Variable: Attitudes regarding inequality

Example #1

  • Dataset:
    • ANES 2012
  • Indicator Type:
    • nominal
  • Indicator: pid_self

Generally speaking, do you usually think of yourself as a DEMOCRAT, a REPUBLICAN, an INDEPENDENT, or what?
0. no preference
1. Democrat
2. Republican
3. Independent
5. Other party
-8. Don’t know
-9. Refused

  • Syntax:
recode pid_self (1=1) (2=3) (3=2) into pid_selfR.
value labels pid_selfR 1 'Dem' 2 'Ind' 3 'Rep'.
fre var pid_selfR
   /statistics = mode median mean stddev variance skew kurtosis.

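Note that the 0, 5, -8 and -9 value categories are rendered as missing by the recode, because they are not included in the new variable. Alternatively,
they could be explicitly declared as missing by specifying
missing values pid_self (lowest thru 0, 5).
Note that SPSS permits no more than three discrete missing values per variable, or one range plus one discrete value. A second missing values command for the same variable replaces the first rather than adding to it, so all four codes are declared here in a single command that uses a range.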

  • Output:

 

pid_selfR
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Dem            2363      39.9            42.2                 42.2
          Ind            1845      31.2            33.0                 75.2
          Rep            1389      23.5            24.8                100.0
          Total          5597      94.6           100.0
Missing   System          319       5.4
Total                    5916     100.0

Statistics
pid_selfR
N   Valid          5597
    Missing         319
Mean             1.8260
Median           2.0000
Mode               1.00
Std. Deviation   .80012
Variance           .640
Skewness           .323
Kurtosis         -1.369
    • Note that the recode does several things. First, it reorders the categories so that Independent (2) falls between Democrat (1) and Republican (3). Second, it creates a new variable name, which leaves the original variable unchanged. Third, it indirectly renders the “no preference”, “other party”, “don’t know” and “refused” categories as missing by not including them in the new variable.

Example #2

  • Dataset:
    • CES2011
  • Indicator Type:
    • Ordinal
  • Indicator: PES11_41
  • How much should be done to reduce the gap between the rich and poor in Canada? Much more, somewhat more, the same as now, somewhat less or much less?
  • Syntax:
missing values pes11_41 (8,9).
recode pes11_41 (1=1) (2=.75) (3=.5) (4= .25) (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow'
   .75 'somemore' 1 'muchmore'.

fre var undogap
  /statistics = mode median mean stddev variance skew kurtosis.
  • Output:
undogap
                    Frequency   Percent   Valid Percent   Cumulative Percent
muchless                   63       1.5             2.0                  2.0
someless                   81       1.9             2.5                  4.5
asnow                     638      14.8            19.8                 24.2
somemore                 1252      29.1            38.8                 63.0
muchmore                 1193      27.7            37.0                100.0
Total                    3227      74.9           100.0
Missing System           1081      25.1
Total                    4308     100.0

Statistics
undogap
N   Valid          3227
    Missing        1081
Mean              .7658
Median            .7500
Mode                .75
Std. Deviation   .22910
Variance           .052
Skewness          -.930
Kurtosis           .850

Note that the recode makes the high score indicate support for the government doing much more, which is useful since we understand the indicator to be measuring support for action against inequality. It also rescales the variable to run from 0 to 1.
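
The mean, and the link between the variance and the standard deviation, can be checked by hand from the undogap output above. As a sketch, using only the recoded values and frequencies reported there:

```latex
\begin{align*}
\bar{x} &= \frac{(0)(63) + (0.25)(81) + (0.5)(638) + (0.75)(1252) + (1)(1193)}{3227}
         = \frac{2471.25}{3227} \approx 0.7658 \\
s^2 &= (0.22910)^2 \approx 0.052
\end{align*}
```

Both results agree with the Mean (.7658) and Variance (.052) rows of the Statistics output.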

Example #3

  • Dataset:
    • CES2011
  • Indicator Type:
    • interval
  • Indicator: MBS11_k2

Please place yourself on a scale of 0 to 10, where 0 means you strongly believe that the government SHOULD ACT to reduce differences in income and wealth, and 10 means that you strongly believe that the government SHOULD NOT ACT to reduce differences in income and wealth.

0 ‘Government should act’ thru 10 ‘Government should not act’.

  • Syntax:
missing values mbs11_k2 (-99).
compute govact = (((mbs11_k2 * -1) +10)/10).
value labels govact 0 'not act' 1 'gov act'.

fre var govact
  /statistics = mode median mean stddev variance skew kurtosis.
  • Output:
govact
                    Frequency   Percent   Valid Percent   Cumulative Percent
not act                    65       1.5             4.6                  4.6
.10                        37        .9             2.6                  7.2
.20                        93       2.2             6.6                 13.8
.30                       108       2.5             7.6                 21.4
.40                       111       2.6             7.9                 29.3
.50                       250       5.8            17.7                 47.0
.60                       191       4.4            13.5                 60.5
.70                       211       4.9            14.9                 75.4
.80                       147       3.4            10.4                 85.8
.90                        84       1.9             5.9                 91.7
gov act                   117       2.7             8.3                100.0
Total                    1414      32.8           100.0
Missing System           2894      67.2
Total                    4308     100.0

Statistics
govact
N   Valid          1414
    Missing        2894
Mean              .5634
Median            .6000
Mode                .50
Std. Deviation   .26142
Variance           .068
Skewness          -.272
Kurtosis          -.519
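
The compute in the syntax above reverses the 0 to 10 scale and rescales it to run from 0 to 1, so that 1 means the government should act. A minimal sketch of an equivalent and arguably more readable form follows, together with a histogram, which may help when interpreting the skew and kurtosis of an interval variable. The name govact2 is hypothetical and is used only to avoid overwriting govact.

```spss
* Same result as the original compute: 0 becomes 1, 5 becomes .5, 10 becomes 0.
* govact2 is a hypothetical name chosen so that govact is left intact.
compute govact2 = (10 - mbs11_k2) / 10.
execute.

* Histogram with a normal curve overlaid, to judge symmetry and peakedness by eye.
graph
  /histogram(normal)=govact2.
```

Cases declared missing on mbs11_k2 remain missing on govact2, so the valid N should match the 1414 cases reported for govact.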

QUESTIONS FOR REFLECTION

  • Why aren’t all of the descriptive statistics appropriate to describe all three variables?
  • Can we ever learn something from measures appropriate for another level of data?
  • Can graphics help us better understand our data?

Discussion

  • Not all summary measures are appropriate for every variable.
  • With nominal variables, the mode is the only truly useful descriptive statistic and the range can be used for dispersion. For dichotomous variables coded 0 and 1 (dummy variables), the mean is useful to indicate the proportions.
  • With ordinal variables, the mode, median and range are all useful.
  • With interval/ratio variables, the mean, median, range and standard deviation are useful. The mode can be used (as in our example), but often it is not very useful with interval data.
  • Skew and kurtosis can be particularly helpful with interval level data.
  • Graphics can often provide a better picture of our results. The relevant syntax is:
GRAPH
  /BAR(SIMPLE)=PCT BY undogap.

GRAPH
  /BAR(SIMPLE)=PCT BY govact.

GRAPH
  /BAR(SIMPLE)=PCT BY pid_selfR.
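
The point in the Discussion about dummy variables can be illustrated with the party identification variable from the first example. The following is a sketch only; the variable name dem and its labels are hypothetical.

```spss
* dem is a hypothetical dummy variable: 1 = Democrat, 0 = Independent or Republican.
* Cases that are system-missing on pid_selfR stay system-missing on dem.
recode pid_selfR (1=1) (2,3=0) into dem.
value labels dem 0 'not Dem' 1 'Dem'.
fre var dem
  /statistics = mean.
```

Because dem takes only the values 0 and 1, its mean is the proportion of valid cases coded 1, so it should come out at about .42, in line with the 42.2 valid percent reported for Dem in the pid_selfR output.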