
UCSC LAB MANUAL

***Note: Syntax from Week 2a lecture appears at the end of this lab.***

Lab 4

PURPOSE

  • To identify the meaning of descriptive terms for each type of variable: nominal, ordinal, and interval.
  • To become acquainted with recoding.

MAIN POINTS

  • Types of Variables
    • Nominal: The categories of the variable have no inherent rank or order.  The categories are nevertheless mutually exclusive and exhaustive. Examples are partisan preference or gender.
    • Ordinal: The categories of the variable are ordered or ranked, from less to more or more to less, but there is not an equivalent distance between them. E.g. How much should be done to reduce the gap between the rich and poor? Much more, somewhat more, the same as now, somewhat less or much less?
    • Interval: The categories of the variable are ordered and have a uniform distance between them. E.g., income.
    • An interval variable can be transformed into an ordinal variable by recoding it. For example, we could divide income into non-overlapping groups such as $0-9,999, $10,000-19,999, etc.
  • Descriptive Statistics
    • Mode: The most frequent value.
    • Median: The value of the middle case, i.e., the one with the same number of cases above and below it.
    • Mean: Computed by adding all the values and dividing this sum by the number of cases.
    • Standard deviation: Expresses the degree of variation within a variable as the typical deviation from the mean (the square root of the average squared deviation).
    • Variance: The squared value of the standard deviation. Hence the standard deviation is the square root of the variance.
    • Skew: Measures the asymmetry of the distribution; 0 indicates a symmetric distribution, positive values a longer right tail, and negative values a longer left tail.
    • Kurtosis: Measures the peakedness of the distribution relative to a normal distribution; positive values indicate a more peaked distribution and negative values a flatter one.
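
As an illustration only, the first five measures above can be computed for a small set of scores. The sketch below uses Python's standard statistics module rather than SPSS, and the eight scores are made up for the example. It uses the n-1 (sample) formulas for variance and standard deviation.

```python
import math
import statistics

# Made-up scores for illustration (not taken from any lab dataset).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(scores)          # most frequent value
median = statistics.median(scores)      # middle case (average of the two middle cases here)
mean = statistics.mean(scores)          # sum of the values / number of cases
variance = statistics.variance(scores)  # sum of squared deviations from the mean / (n - 1)
std_dev = statistics.stdev(scores)      # square root of the variance

print(mode, median, mean, round(variance, 3), round(std_dev, 3))
```

Squaring std_dev returns the variance, which is the relationship described in the Variance entry.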

INSTRUCTIONS

  1. Enter the codebook for the data set on which you would like to work.
  2. Identify three indicators: one nominal, one ordinal, and one that is, if possible, measured at the interval level.
  3. Using SPSS syntax, run a frequency analysis for each of the three indicators.
  4. Based on the output of the trial run, identify which values should be declared as missing values and decide whether and how best to recode the data. Until missing values are declared and the appropriate recodes are made, any summary measures may be misleading.
  5. Edit the syntax for missing values and recode as needed. Labs 1 & 2 include a number of examples. It is essential to re-label any recoded values as the old labels will not be automatically changed.
  6. Finally, where relevant, identify the MEANING of the summary measures for each type of variable.
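
To see why step 4 matters, consider what happens to a mean when missing-value codes are left in. The sketch below is in Python rather than SPSS. The responses and the codes 8 and 9 are made up for illustration, so check the codebook for the codes your indicator actually uses.

```python
import statistics

# Made-up responses on a 1-4 scale; 8 and 9 stand for "don't know" / "refused".
responses = [1, 2, 2, 3, 4, 8, 9, 9]
MISSING_CODES = {8, 9}  # assumed codes, not taken from a real codebook

# Drop the missing-value codes before summarizing.
valid = [r for r in responses if r not in MISSING_CODES]

mean_raw = statistics.mean(responses)  # treats 8 and 9 as if they were real scores
mean_valid = statistics.mean(valid)    # computed on valid cases only

print(mean_raw, mean_valid)
```

The raw mean falls outside the 1-4 range of substantive answers, which is the kind of misleading summary that declaring missing values prevents.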

EXAMPLES

Example #1

Variable: Measure of Ethnicity

  • Dataset:
    • PPIC Oct 2016
  • Indicator Type:
    • nominal
  • Indicator: D8a or D8comb

For classification purposes, we’d like to know what your racial background is. Are you white, black or African-American, Asian, Pacific Islander or Native Hawaiian, American Indian or an Alaskan native, a member of another race, or a combination of these?

  • Syntax:
missing values d8com (9).
recode D8com (3=1) (4=2) (else =3) into ethn.
value labels ethn 1 'Hisp' 2 'White' 3 'other'.
fre var ethn
   /statistics = mode.

  • Output:

ethn
Frequency Percent Valid Percent Cumulative Percent
Valid Hisp 576 33.8 33.8 33.8
White 785 46.1 46.1 79.9
other 343 20.1 20.1 100.0
Total 1704 100.0 100.0

Statistics
ethn
N Valid 1704
Missing 0
Mean 1.8633
Median 2.0000
Mode 2.00

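Note that the values 1, 2, 5, and 6 are recoded to 3 by the else specification. Because else matches every value not already listed in the recode, any 9s are also recoded to 3 even though 9 is declared above as missing for d8com, which is consistent with the output showing no missing cases. To keep the 9s out of the new variable, list the source values explicitly and omit 9 from the recode; values not mentioned in a recode into a new variable are left missing.
recode d8com (1, 2, 5, 6 =3) (3=1) (4=2) into ethn2.
value labels ethn2 1 'Hisp' 2 'White' 3 'other'.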
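
The difference between the two recodes can be sketched outside SPSS. The Python below is an illustration, not SPSS itself. It assumes that else matches every value not listed earlier in the recode, and that values not mentioned in a recode into a new variable end up missing. The raw codes are made up, and the function names are hypothetical.

```python
# Made-up raw codes for illustration; 9 stands for a refused/missing answer.
raw = [1, 2, 3, 4, 5, 6, 9]

def recode_with_else(value):
    """Mimics (3=1) (4=2) (else=3): every unlisted value, 9 included, becomes 3."""
    return {3: 1, 4: 2}.get(value, 3)

def recode_explicit(value):
    """Mimics (1,2,5,6=3) (3=1) (4=2): unlisted values (here 9) stay missing."""
    mapping = {1: 3, 2: 3, 5: 3, 6: 3, 3: 1, 4: 2}
    return mapping.get(value)  # None plays the role of a missing value

ethn = [recode_with_else(v) for v in raw]
ethn2 = [recode_explicit(v) for v in raw]

print(ethn)
print(ethn2)
```

Only the last element differs between the two lists: the 9 lands in the 'other' category under the else version and stays missing under the explicit version.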

Example #2

  • Dataset:
    • PPIC October 2016
  • Indicator Type:
    • Ordinal
  • Indicator(s):  Q38 & Q37
    • Q38 Generally speaking, how much interest would you say you have in politics—a great deal, a fair amount, only a little, or none?
    • Q37: Next, would you consider yourself to be politically:
  • Syntax:
missing values q37 q38 (8,9).

fre var q37 q38
  /statistics = mode median mean stddev variance skew kurtosis.
  • Output:
Q37. Would you consider yourself to be:
Freq Percent Valid Percent Cumulative Percent
very liberal 236 13.8 14.3 14.3
somewhat liberal 370 21.7 22.4 36.6
middle-of-the-road 474 27.8 28.7 65.3
somewhat conservative 338 19.8 20.4 85.7
very conservative 236 13.8 14.3 100.0
Total 1654 97.1 100.0
Q38. Generally speaking, how much interest would you say you have in politics?
Freq Percent Valid Percent Cumulative Percent
great deal 552 32.4 32.7 32.7
fair amount 607 35.6 35.9 68.6
only a little 429 25.2 25.4 94.0
none 102 6.0 6.0 100.0
Total 1690 99.2 100.0
Summary statistics
Q37 Q38
N Valid 1654 1690
Missing 50 14
Mean 2.98 2.05
Median 3.00 2.00
Mode 3 2
Std. Deviation 1.253 .906
Variance 1.570 .820
Skewness .037 .394
Kurtosis -.970 -.797

Note these frequency tables have been edited to remove extraneous elements. This can be accomplished by copying the tables in Rich Text Format from the output and pasting them into a word processor for editing. Saving the edited output as a .pdf file permits the creation of more attractive tables such as:
Summary Stats

Although many of the summary statistics are technically inappropriate for use with variables measured at the ordinal level, some useful information can nevertheless be gleaned. For example, the standard deviation and variance suggest greater variation in q37, while the skewness score suggests greater skew in q38. The skew on q38 can be reduced by recoding.

recode q38 (1=1) (2=2) (3,4=3) into interest.
value labels interest 1 'great deal' 2 'some' 3 'little or none'.
fre var q37 interest
  /statistics = mode median mean stddev variance skew kurtosis.
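
The effect of this recode on the skew can be checked directly from the valid counts in the Q38 frequency table. The Python sketch below is illustrative rather than part of the SPSS workflow. The helper freq_stats is a hypothetical name, and the sample-adjusted skewness formula is an assumption about how the reported figure is computed, although it does reproduce the reported value of .394 for q38.

```python
import math

def freq_stats(freqs):
    """Summary statistics from a {value: count} table, using n-1 style formulas."""
    n = sum(freqs.values())
    mean = sum(v * f for v, f in freqs.items()) / n
    ss = sum(f * (v - mean) ** 2 for v, f in freqs.items())  # sum of squared deviations
    variance = ss / (n - 1)
    sd = math.sqrt(variance)
    z3 = sum(f * ((v - mean) / sd) ** 3 for v, f in freqs.items())
    skew = n / ((n - 1) * (n - 2)) * z3  # sample-adjusted skewness (assumed formula)
    return {"n": n, "mean": mean, "sd": sd, "variance": variance, "skew": skew}

q38 = {1: 552, 2: 607, 3: 429, 4: 102}      # valid counts from the Q38 table
interest = {1: 552, 2: 607, 3: 429 + 102}   # categories 3 and 4 collapsed by the recode

before = freq_stats(q38)
after = freq_stats(interest)

print({k: round(v, 3) for k, v in before.items()})
print({k: round(v, 3) for k, v in after.items()})
```

Collapsing the two smallest categories pulls the skew much closer to zero, at the cost of no longer distinguishing 'only a little' from 'none'.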

Example #3

  • Dataset:
    • ANES 2016
  • Indicator Type:
    • interval
  • Indicator: V161184

Self Placement on Government vs Private Medical Insurance
Where would you place yourself on this scale, or haven’t you thought much about this?

Government Insurance Plan  1   2   3   4   5   6   7   Private Insurance Plan.

(additional interval measures are provided by V161178 thru v161186 and v161198-v161203)

  • Syntax:
missing values v161184 (-8, -9, 99).
fre var v161184 
    /statistics = mode median mean stddev variance skew kurtosis.
*Advanced syntax for placing v161184 on 0-1 scale*.
*Subtracting 1 and dividing by 6 maps 1 (govt plan) to 0 and 7 (private plan) to 1.
compute medpref = (v161184 - 1) / 6.
value labels medpref 0 'govplan' 1 'privplan'.
fre var medpref
    /statistics = mode median mean stddev variance skew kurtosis.
  • Output:
7pt scale govt-private medical insur scale: self
     Freq Percent Valid % Cumul %
1. Govt insur plan 640 15.0 17.0 17.0
2 389 9.1 10.3 27.3
3 393 9.2 10.4 37.8
4 745 17.4 19.8 57.5
5 478 11.2 12.7 70.2
6 497 11.6 13.2 83.4
7. Private insur plan 624 14.6 16.6 100.0
Total 3766 88.2 100.0
Statistics
Govt vs Private med ins
N Valid 3766
Missing 505
Mean 4.07
Median 4.00
Mode 4
Std. Deviation 2.047
Variance 4.190
Skewness -.083
Kurtosis -1.218
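
Placing a 7-point scale on a 0-1 range is a linear (min-max) rescaling. The Python sketch below illustrates the arithmetic with the valid counts from the frequency table above. The function name rescale_0_1 is hypothetical, and the endpoints 1 and 7 are assumed to map to 0 and 1 respectively.

```python
def rescale_0_1(value, low=1, high=7):
    """Linear (min-max) rescaling: low maps to 0.0 and high maps to 1.0."""
    return (value - low) / (high - low)

# Valid counts from the v161184 frequency table.
counts = {1: 640, 2: 389, 3: 393, 4: 745, 5: 478, 6: 497, 7: 624}
n = sum(counts.values())

# Weighted means: each scale value counted as many times as it occurs.
mean_raw = sum(v * f for v, f in counts.items()) / n
mean_scaled = sum(rescale_0_1(v) * f for v, f in counts.items()) / n

print(round(mean_raw, 2), round(mean_scaled, 3))
```

Because the transformation is linear, the rescaled mean is simply the original mean minus 1, divided by 6; the shape of the distribution (skew and kurtosis) is unchanged.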

QUESTIONS FOR REFLECTION

  • Why aren’t all of the descriptive statistics appropriate to describe all three variables?
  • Can we ever learn something from measures appropriate for another level of data?
  • Can graphics help us better understand our data?

DISCUSSION

  • Not all summary measures are appropriate for every variable.
  • With nominal variables, the mode is the only truly useful descriptive statistic and the range can be used for dispersion. For dichotomous variables coded 0 and 1 (dummy variables), the mean is also useful because it equals the proportion of cases coded 1.
  • With ordinal variables, the mode, median and range are all useful.
  • With interval/ratio variables, the mean, median, range and standard deviation are useful. The mode can be used (as in our example), but often it is not very helpful with interval data.
  • Skew and kurtosis can be particularly helpful with interval level data.
  • Graphics can often provide a better picture of our results. For example, the skew before and after recoding q38 can perhaps be better appreciated by using the following syntax. The first two commands below work with the PPIC October 2016 data while the third works with the ANES 2016 data.
GRAPH
  /BAR(SIMPLE)=PCT BY q38.
GRAPH
  /BAR(SIMPLE)=PCT BY interest.
GRAPH
  /BAR(SIMPLE)=PCT BY v161184.
*Optional Advanced syntax for placing v161184 on 0-1 scale*.
*Subtracting 1 and dividing by 6 maps 1 (govt plan) to 0 and 7 (private plan) to 1.
compute medpref = (v161184 - 1) / 6.
value labels medpref 0 'govplan' 1 'privplan'.
fre var medpref
    /statistics = mode median mean stddev variance skew kurtosis.
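
The discussion point about nominal and dummy variables can be checked with a little arithmetic. The Python sketch below is an illustration only: the yes/no responses are made up, while the ethn counts come from the frequency table in the first example.

```python
import statistics

# Made-up dummy variable: 1 = yes, 0 = no.
yes_no = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
share_yes = yes_no.count(1) / len(yes_no)
mean_dummy = statistics.mean(yes_no)  # equals the proportion of cases coded 1

# Counts from the ethn table: codes 1 = Hisp, 2 = White, 3 = other.
ethn = {1: 576, 2: 785, 3: 343}
n = sum(ethn.values())
mean_ethn = sum(code * f for code, f in ethn.items()) / n  # depends on arbitrary code numbers
mode_ethn = max(ethn, key=ethn.get)                         # code of the most frequent category

print(mean_dummy, share_yes, round(mean_ethn, 4), mode_ethn)
```

The mean of ethn matches the value in the SPSS output, but it would change if the categories were numbered differently, whereas the mode would still point to the same category.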
***Syntax used in 15 Jan Lecture***.
recode q21 (1=1) (2=0) into MJpropD.
value labels MJpropD 1 'yes' 0 'no'.

recode q22 (1=1) (2=.66) (3=.33) (4=0) into MJimp.
value labels MJimp 1 'very' .66 'somewhat' .33 'not too' 0 'notatall'.

recode q36 (1=1) (2=0) into MJlegalD.
value labels MJlegalD 1 'yes' 0 'no'.

recode q36a (1=1) (2=.5) (3=0) into MJtry.
value labels MJtry 1 'recent' .5 'not recent' 0 'no'.

fre var MJpropD MJimp MJlegalD MJtry
  /statistics = mode median mean stddev variance skew kurtosis
  /barchart percent.

Summary stats as presented in lecture  2a available here:
Univariate Statistics on MJ Indicators

The following syntax produces the 25th and 75th percentiles, from which the interquartile range is computed as the difference between the two.

EXAMINE VARIABLES=MJPropD MJimp MJLegalD MJtry
  /PLOT NONE
  /PERCENTILES(25,75) round.
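
As a rough illustration of the arithmetic, the interquartile range is the 75th percentile minus the 25th. The Python sketch below uses made-up scores and Python's default quantile method, which may not match the percentile method requested in the SPSS syntax above, so small differences are possible with other data.

```python
import statistics

# Made-up interval-level scores for illustration.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(n=4) returns three cut points: the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle half of the cases

print(q1, q3, iqr)
```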

This is how one produces a boxplot as discussed by K&W.

EXAMINE VARIABLES=MJPropD MJImp MJLegalD MJTry 
  /COMPARE VARIABLE
  /PLOT=BOXPLOT.