UCSC LAB MANUAL
Lab 4
PURPOSE
 To identify the meaning of descriptive terms for each type of variable: nominal, ordinal, and interval.
 To become acquainted with recoding.
MAIN POINTS
 Types of Variables
 Nominal: The categories of the variable have no inherent rank or order. The categories are nevertheless mutually exclusive and exhaustive. Examples are partisan preference or gender.
 Ordinal: The categories of the variable are ordered or ranked, from less to more or more to less, but the distances between them are not necessarily equal. E.g., How much should be done to reduce the gap between the rich and the poor? Much more, somewhat more, the same as now, somewhat less, or much less?
 Interval: The categories of the variable are ordered and have a uniform distance between them. E.g., income.
 An interval variable can be transformed into an ordinal variable by recoding it. For example, we could divide income into income groups such as $0-10,000, $10,000-20,000, etc.
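As a rough illustration of the income recode above, here is a Python sketch (not SPSS syntax) that assigns each income to a $10,000-wide group. The function name `income_group` and the sample incomes are hypothetical. The sketch also has to make a choice the text leaves open: a boundary value such as $10,000 goes into the higher group, so the groups do not overlap.

```python
def income_group(income):
    """Return an ordinal code for a $10,000-wide income band.

    1 = $0-9,999, 2 = $10,000-19,999, 3 = $20,000-29,999, and so on.
    """
    if income < 0:
        raise ValueError("income must be non-negative")
    # Integer division by the band width gives the zero-based band index
    return int(income // 10_000) + 1


incomes = [4_500, 10_000, 19_999, 37_250]  # hypothetical respondents
groups = [income_group(x) for x in incomes]
print(groups)  # [1, 2, 2, 4]
```

Once recoded this way, the codes keep the order of the original incomes but no longer record how far apart two respondents are, which is exactly the loss of information that moves the variable from interval to ordinal.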
 Descriptive Statistics
 Mode: The most frequent value.
 Median: The value of the middle case, i.e., the one with the same number of cases above and below it.
 Mean: Computed by adding all the values and dividing the sum by the number of cases.
 Standard deviation: Expresses the degree of variation within a variable, based on how far cases fall from the mean (it is the square root of the average squared deviation from the mean).
 Variance: The square of the standard deviation. Hence the standard deviation is the square root of the variance.
 Skew: This measures the symmetry (or asymmetry) of the distribution.
 Kurtosis: This measures the peakedness of the distribution
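To make the definitions above concrete, the Python sketch below computes each statistic from a frequency table given as {code: count}. It is an illustration, not SPSS's own code: variance and standard deviation use the n - 1 denominator, and skewness and kurtosis use the bias-adjusted formulas that, as far as we know, SPSS FREQUENCIES also reports. The function name `describe` is hypothetical.

```python
import math
from statistics import median


def describe(freqs):
    """Summary statistics for a variable given as {code: count} (valid cases only)."""
    # Expand the frequency table into one value per case
    data = [code for code, count in freqs.items() for _ in range(count)]
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    sd = math.sqrt(variance)
    # Sums of cubed and fourth-power z-scores feed the adjusted skew/kurtosis
    z3 = sum(((x - mean) / sd) ** 3 for x in data)
    z4 = sum(((x - mean) / sd) ** 4 for x in data)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mode": max(freqs, key=freqs.get),
        "median": median(data),
        "mean": mean,
        "std_dev": sd,
        "variance": variance,
        "skewness": skew,
        "kurtosis": kurt,
    }


q37 = {1: 236, 2: 370, 3: 474, 4: 338, 5: 236}  # valid Q37 counts, Example #2
for name, value in describe(q37).items():
    print(f"{name:<10} {value:.3f}")
```

Fed the valid Q37 counts from Example #2 below, it should reproduce that column's mode, median, mean (2.98), standard deviation (1.253), variance (1.570) and skewness (.037) up to rounding.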
INSTRUCTIONS
 Open the codebook for the data set you would like to work with.
 Identify three indicators: one nominal, one ordinal, and, if possible, one measured at the interval level.
 Using SPSS syntax, run a frequency analysis for each of the three indicators.
 Based on the output of the trial run, identify which values should be declared missing, and decide whether and how best to recode the data. Until missing values are declared and the appropriate recodes made, any summary measures may be misleading.
 Edit the syntax to declare missing values and recode as needed. Labs 1 & 2 include a number of examples. It is essential to relabel any recoded values, as the old labels will not be changed automatically.
 Finally, where relevant, identify the MEANING of the summary measures for each type of variable.
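Why undeclared missing codes mislead: the Python sketch below computes the Q38 mean from Example #2 twice, once treating a missing-value code as if it were a real answer and once excluding it. The entry of 14 cases coded 9 is an assumption standing in for the 14 missing cases, whose actual codes (8 or 9) the output does not show; `mean_from_counts` is a hypothetical helper.

```python
def mean_from_counts(freqs, missing=()):
    """Mean of coded responses given {code: count}, skipping codes listed in `missing`."""
    kept = {code: n for code, n in freqs.items() if code not in missing}
    return sum(code * n for code, n in kept.items()) / sum(kept.values())


# Valid Q38 counts from Example #2; the 9: 14 entry is an assumed missing code
q38_raw = {1: 552, 2: 607, 3: 429, 4: 102, 9: 14}

print(round(mean_from_counts(q38_raw), 3))                  # 2.105: 9s counted as answers
print(round(mean_from_counts(q38_raw, missing={8, 9}), 3))  # 2.048: 9s excluded
```

Even a handful of out-of-range codes pulls the mean upward here; with more missing cases or larger codes (such as 99) the distortion grows.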
EXAMPLES
Example #1
 Dataset: PPIC October 2016
 Indicator Type: nominal
 Indicator: D8a or D8comb
For classification purposes, we’d like to know what your racial background is. Are you white, black or African American, Asian, Pacific Islander or Native Hawaiian, American Indian or an Alaskan native, a member of another race, or a combination of these?
 Syntax:
missing values d8com (9).
recode D8com (3=1) (4=2) (else=3) into ethn.
value labels ethn 1 'Hisp' 2 'White' 3 'other'.
fre var ethn /statistics = mode median mean.
 Output:
ethn  
Frequency  Percent  Valid Percent  Cumulative Percent  
Valid  Hisp  576  33.8  33.8  33.8 
White  785  46.1  46.1  79.9  
other  343  20.1  20.1  100.0  
Total  1704  100.0  100.0 
Statistics
ethn
N  Valid  1704
Missing  0
Mean  1.8633
Median  2.0000
Mode  2.00
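Note that although 9 is explicitly declared missing above, the else specification catches every value not previously mentioned, not only 1, 2, 5 and 6. User-missing values such as 9 are therefore also recoded to 3 rather than left missing (consistent with the 0 missing cases reported above). Alternatively, the value of 9 can be implicitly treated as missing by omitting it from the recode, since values not mentioned in a recode into a new variable are set to system-missing: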
recode d8com (1, 2, 5, 6 =3) (3=1) (4=2) into ethn2.
value labels ethn2 1 'Hisp' 2 'White' 3 'other'.
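The difference between the two recodes can be mimicked in Python, with `None` standing in for system-missing. This is a sketch of our reading of RECODE's behavior (ELSE catches every value not previously listed, missing codes included; values not listed in a recode INTO a new variable become system-missing), not SPSS itself; the function names are hypothetical.

```python
SYSMIS = None  # stand-in for SPSS's system-missing value


def recode_with_else(code):
    """(3=1) (4=2) (else=3): every other code, 9 included, becomes 3."""
    return {3: 1, 4: 2}.get(code, 3)


def recode_listed_only(code):
    """(1,2,5,6=3) (3=1) (4=2) INTO a new variable: unlisted codes become missing."""
    return {1: 3, 2: 3, 5: 3, 6: 3, 3: 1, 4: 2}.get(code, SYSMIS)


for code in (3, 4, 5, 9):
    print(code, recode_with_else(code), recode_listed_only(code))
```

Only code 9 is treated differently: the first version files it under "other", the second leaves it missing.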
Example #2
 Dataset: PPIC October 2016
 Indicator Type: ordinal
 Indicator(s): Q38 & Q37
 Q38 Generally speaking, how much interest would you say you have in politics—a great deal, a fair amount, only a little, or none?
 Q37: Next, would you consider yourself to be politically:
 Syntax:
missing values q37 q38 (8,9).
fre var q37 q38 /statistics = mode median mean stddev variance skew kurtosis.
 Output:
Q37. Would you consider yourself to be:  
Freq  Percent  Valid Percent  Cumulative Percent  
very liberal  236  13.8  14.3  14.3  
somewhat liberal  370  21.7  22.4  36.6  
middle-of-the-road  474  27.8  28.7  65.3  
somewhat conservative  338  19.8  20.4  85.7  
very conservative  236  13.8  14.3  100.0  
Total  1654  97.1  100.0 
Q38. Generally speaking, how much interest would you say you have in politics?  
Freq  Percent  Valid Percent  Cumulative Percent  
great deal  552  32.4  32.7  32.7  
fair amount  607  35.6  35.9  68.6  
only a little  429  25.2  25.4  94.0  
none  102  6.0  6.0  100.0  
Total  1690  99.2  100.0 
Summary statistics  
Q37  Q38.  
N  Valid  1654  1690 
Missing  50  14  
Mean  2.98  2.05  
Median  3.00  2.00  
Mode  3  2  
Std. Deviation  1.253  .906  
Variance  1.570  .820  
Skewness  .037  .394  
Kurtosis  -.970  -.797 
Note that these frequency tables have been edited to remove extraneous elements. This can be done by copying the tables from the output in Rich Text Format and pasting them into a word processor for editing. Saving the edited output as a .pdf file permits the creation of more attractive tables, such as:
[Table image not reproduced: Summary Stats]
Although many of the summary statistics are technically inappropriate for variables measured at the ordinal level, some useful information can nevertheless be gleaned. For example, the standard deviation and variance suggest greater variation in q37, while the skewness statistic suggests greater skew in q38. The skew on q38 can be reduced by recoding.
recode q38 (1=1) (2=2) (3,4=3) into interest.
value labels interest 1 'great deal' 2 'some' 3 'little or none'.
fre var q37 interest /statistics = mode median mean stddev variance skew kurtosis.
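To check the claim that recoding reduces the skew, this Python sketch applies the bias-adjusted skewness formula to the valid Q38 counts in Example #2, before and after merging categories 3 and 4 as the recode above does. It is an illustration that assumes SPSS's skewness statistic follows this formula; `skewness` is a hypothetical helper.

```python
import math


def skewness(freqs):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = sum(freqs.values())
    mean = sum(code * k for code, k in freqs.items()) / n
    sd = math.sqrt(sum(k * (code - mean) ** 2 for code, k in freqs.items()) / (n - 1))
    z3 = sum(k * ((code - mean) / sd) ** 3 for code, k in freqs.items())
    return n / ((n - 1) * (n - 2)) * z3


q38 = {1: 552, 2: 607, 3: 429, 4: 102}     # valid Q38 counts, Example #2
interest = {1: 552, 2: 607, 3: 429 + 102}  # recode q38 (1=1) (2=2) (3,4=3)
print(f"q38: {skewness(q38):.3f}  interest: {skewness(interest):.3f}")
```

The first value should match the .394 reported for Q38; the second comes out much closer to zero, because the small "none" category no longer forms a separate tail.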
Example #3
 Dataset: ANES 2016
 Indicator Type: interval
 Indicator: V161184
Self Placement on Government vs Private Medical Insurance
Where would you place yourself on this scale, or haven’t you thought much about this?
Government Insurance Plan 1 2 3 4 5 6 7 Private Insurance Plan.
(additional interval measures are provided by V161178 through V161186 and V161198-V161203)
 Syntax:
missing values v161184 (-8, -9, 99).
fre var v161184 /statistics = mode median mean stddev variance skew kurtosis.
*Advanced syntax for placing v161184 on a 0-1 scale (0 = government plan, 1 = private plan)*.
compute medpref = (v161184 - 1)/6.
value labels medpref 0 'govplan' 1 'privplan'.
fre var medpref /statistics = mode median mean stddev variance skew kurtosis.
 Output:
7pt scale govt-private medical insur scale: self  
Freq  Percent  Valid %  Cumul %  
1. Govt insur plan  640  15.0  17.0  17.0  
2  389  9.1  10.3  27.3  
3  393  9.2  10.4  37.8  
4  745  17.4  19.8  57.5  
5  478  11.2  12.7  70.2  
6  497  11.6  13.2  83.4  
7. Private insur plan  624  14.6  16.6  100.0  
Total  3766  88.2  100.0 
Statistics  
Govt vs Private med ins  
N  Valid  3766 
Missing  505  
Mean  4.07  
Median  4.00  
Mode  4  
Std. Deviation  2.047  
Variance  4.190  
Skewness  -.083  
Kurtosis  -1.218 
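Placing a 1-7 scale on 0-1 amounts to subtracting the minimum and dividing by the range, (x - 1)/6, so that 1 (government plan) becomes 0 and 7 (private plan) becomes 1. A Python sketch for illustration; `to_unit_scale` is a hypothetical name.

```python
def to_unit_scale(x, low=1, high=7):
    """Map a score on [low, high] linearly onto [0, 1]."""
    return (x - low) / (high - low)


# Endpoints and midpoint of the 7-point medical insurance scale (V161184)
print([to_unit_scale(x) for x in (1, 4, 7)])  # [0.0, 0.5, 1.0]
# A linear rescaling moves the mean the same way: 4.07 on the 1-7 scale
print(round(to_unit_scale(4.07), 3))
```

Because the transformation is linear, the shape of the distribution (and so its skewness and kurtosis) is unchanged; only the mean, median and spread are expressed in the new units.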
QUESTIONS FOR REFLECTION
 Why aren’t all of the descriptive statistics appropriate to describe all three variables?
 Can we ever learn something from measures appropriate for another level of data?
 Can graphics help us better understand our data?
DISCUSSION
 Not all summary measures are appropriate for every variable.
 With nominal variables, the mode is the only truly useful descriptive statistic, and the range can be used for dispersion. For dichotomous variables coded 0 and 1 (dummy variables), the mean is useful because it equals the proportion of cases coded 1.
 With ordinal variables, the mode, median and range are all useful.
 With interval/ratio variables, the mean, median, range and standard deviation are useful. The mode can be used (as in our example), but often it is not very useful with interval data.
 Skew and kurtosis can be particularly helpful with interval level data.
 Graphics can often provide a better picture of our results. For example, the skew before and after recoding q38 may be easier to appreciate with the following syntax. The first two commands below work with the PPIC October 2016 data, while the third works with the ANES 2016 data.
GRAPH /BAR(SIMPLE)=PCT BY q38.
GRAPH /BAR(SIMPLE)=PCT BY interest.
GRAPH /BAR(SIMPLE)=PCT BY v161184.
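Where SPSS's GRAPH command is not at hand, a rough text version of a percentage bar chart makes the same before-and-after comparison. This Python sketch uses the valid Q38 counts from Example #2 and the corresponding counts after the interest recode (categories 3 and 4 merged); `percent_bars` is a hypothetical helper, not part of any library.

```python
def percent_bars(freqs, labels, width=40):
    """One text line per category: label, percent of valid cases, and a bar of '#'."""
    total = sum(freqs.values())
    lines = []
    for code in sorted(freqs):
        pct = 100 * freqs[code] / total
        lines.append(f"{labels[code]:<15}{pct:5.1f}% " + "#" * round(pct * width / 100))
    return lines


q38 = {1: 552, 2: 607, 3: 429, 4: 102}
q38_labels = {1: "great deal", 2: "fair amount", 3: "only a little", 4: "none"}
interest = {1: 552, 2: 607, 3: 531}
interest_labels = {1: "great deal", 2: "some", 3: "little or none"}

for title, counts, labels in (("q38", q38, q38_labels),
                              ("interest", interest, interest_labels)):
    print(title)
    print("\n".join(percent_bars(counts, labels)))
```

The short "none" bar that forms the tail of q38 disappears once it is merged into "little or none", which is the reduction in skew seen in the statistics.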
*Optional advanced syntax for placing v161184 on a 0-1 scale (0 = government plan, 1 = private plan)*.
compute medpref = (v161184 - 1)/6.
value labels medpref 0 'govplan' 1 'privplan'.
fre var medpref /statistics = mode median mean stddev variance skew kurtosis.
*Syntax used in Lecture on Univariate Statistics with PPIC October 2016 data*.
recode q21 (1=1) (2=0) into MJpropD.
value labels MJpropD 1 'yes' 0 'no'.
recode q22 (1=1) (2=.66) (3=.33) (4=0) into MJimp.
value labels MJimp 1 'very' .66 'somewhat' .33 'not too' 0 'not at all'.
recode q36 (1=1) (2=0) into MJlegalD.
value labels MJlegalD 1 'yes' 0 'no'.
recode q36a (1=1) (2=.5) (3=0) into MJtry.
value labels MJtry 1 'recent' .5 'not recent' 0 'no'.
fre var MJpropD MJimp MJlegalD MJtry
/statistics mode median mean stddev variance skew kurtosis
/barchart percent.
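Variables such as MJpropD and MJlegalD above are 0/1 dummies, and the mean of a dummy is simply the proportion of cases coded 1. A Python sketch using the ethn counts from Example #1 and a hypothetical White/non-White dummy (not a variable in the data set):

```python
ethn_counts = {1: 576, 2: 785, 3: 343}  # 1 = Hisp, 2 = White, 3 = other

# Hypothetical dummy: 1 if White, 0 otherwise (one entry per respondent)
white = [1 if code == 2 else 0 for code, n in ethn_counts.items() for _ in range(n)]
proportion_white = sum(white) / len(white)
print(round(proportion_white, 3))  # 0.461, the 46.1% White in the frequency table
```

This is why the mean is informative for a dummy even though it is meaningless for the three-category ethn variable itself.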