new lab 9

Index Construction

PURPOSE

  • To learn to construct an index.

MAIN POINTS

  • The Reliability procedure tests only the suitability of a set of variables for the construction of an index (internal validation).
  • To construct an index you use the compute command, e.g., COMPUTE NEWNAME=Q1+Q2+Q3+Q4.
  • Recoding your index into fewer categories is often essential to create interpretable crosstabs.
  • Using your index in crosstabulation enables you to examine its relations to other variables. This not only permits external validation of your index; it can also enhance your explanatory research.

EXAMPLE

  • Dataset:
    • PPIC October 2016
  • Concept:
    • Attitude toward recreational use of Marijuana (Alpha =.78)
  • Indicators:

Syntax

*Identifying MJ Index Items*.

recode q21 (1=1) (2=0) into MJPropD.
value labels MJPropD 1 'yes' 0 'no'.
recode q36 (1=1) (2=0) into MJLegalD.
value labels MJLegalD 1 'yes' 0 'no'.
recode q36a (1=1) (2=.5) (3=.0) into MJTry.
value labels MJTry 1 'recent' .5 'not recent' 0 'no'.

*Replicating the Reliability Analysis*.

reliability
 /variables=MJPropD MJLegalD MJTry
 /scale('MJ3') MJPropD MJLegalD MJTry
 /statistics=descriptive
 /summary=total.

*Constructing the Index*.
compute RawMJ3 = (MJPropD + MJLegalD + MJTry).
fre var RawMJ3
 /statistics = mean median mode stddev var skew kurtosis.

*Recoding the Index*.

recode RawMJ3 (0, .5=0) (1 thru 2= .5) (2.5, 3 =1) into MJ3.
value labels MJ3 0 'low' .5 'med' 1 'hi'.
fre var MJ3.
  • Syntax Legend
    • A comment begins with an asterisk and ends with a period, e.g., *Constructing the Index*. The closing asterisk is optional.
    • Most of the above syntax is familiar from the previous lab.
    • The compute command is where the index is constructed. However, a subsequent frequency command on the raw index is also necessary to see the index and calculate its summary measures.
    • Recoding an index is essential to produce effective tables. Here the recode places roughly one third of the cases in each category, using the cumulative percent column of the frequency analysis of the raw index as a guide.
    • Recode the index into a new name, because it is useful to retain both the complete raw form and the recoded form of an index.
  • Output for Raw Index

    RawMJ3

                        Frequency   Percent   Valid Percent   Cumulative Percent
    Valid     .00             341      20.0            27.7                 27.7
              .50             143       8.4            11.6                 39.3
              1.00             74       4.3             6.0                 45.3
              1.50             30       1.8             2.4                 47.7
              2.00            223      13.1            18.1                 65.8
              2.50            257      15.1            20.9                 86.7
              3.00            164       9.6            13.3                100.0
              Total          1232      72.3           100.0
    Missing   System          472      27.7
    Total                    1704     100.0

Summary Statistics
Mean = 1.44
Median = 2
Mode = 0
StdDev = 1.14
Variance = 1.30
Skew = -.09
Kurtosis = -1.62

The newly computed index variable has many categories, making a crosstab using it unwieldy. Therefore recode into fewer categories.
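
One way to find candidate cut points is to ask SPSS for them directly rather than reading them off the cumulative percent column. The sketch below assumes the raw index is named RawMJ3, as above, and uses the ntiles subcommand of the frequencies command to request the values that divide the cases into three roughly equal groups. It is offered as a cross-check, not as the method used to produce the recode in this lab.

```spss
* Sketch: request tercile cut points for the raw index.
* The notable keyword suppresses the frequency table itself.
frequencies variables = RawMJ3
 /format = notable
 /ntiles = 3.
```

Because the raw index takes only seven distinct values, the reported percentiles fall on or between existing scores, so equal thirds are not attainable. The cumulative percent column remains the practical guide for choosing where to cut.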

Frequency Distribution for Recoded Index

MJ3

                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     low             484      28.4            39.3                 39.3
          med             327      19.2            26.5                 65.8
          hi              421      24.7            34.2                100.0
          Total          1232      72.3           100.0
Missing   System          472      27.7
Total                    1704     100.0

The recoded index can be readily crosstabulated with independent variables.

Summary Statistics for Recoded Index
Mean = .47
Median = .5
Mode = .00
StdDev = .43
Variance = .18
Skew = .098
Kurtosis = -1.63

*Creating Indicators for Party Identification & Ideology*.

fre var q40c.
recode q40c (1=0) (3=.5) (2=1) into Democrat.
value labels Democrat 1 'Democ' .5 'Indep' 0 'Repub'.

fre var q37.
missing values q37 (8,9).
recode q37 (1,2=1) (3=.5) (4,5= 0) into liberal3.
value labels liberal3 1 'liberal' .5 'middle' 0 'conserv'.
fre var liberal3.

*Crosstabulation of MJ3 by Democrat & Liberal*.

crosstabs tables = MJ3 by Democrat,liberal3
  / cells = column count
  /statistics = btau.

Support for Recreational Marijuana by Partisanship

Support for Recreational MJ     Partisanship
                                Repub     Indep     Democ
Low                             61.4%     31.1%     33.5%
Medium                          19.1%     29.1%     28.5%
High                            19.5%     39.9%     37.9%
Total (N)                       272       409       522

Tau-b = .152
Source: PPIC October 2016

Support for Recreational Marijuana by Ideology

Support for Recreational MJ     Ideology
                                conserv   middle    liberal
Low                             62.0%     34.4%     21.3%
Medium                          18.1%     31.8%     30.4%
High                            20.0%     33.8%     48.3%
Total (N)                       421       337       451

Tau-b = .308
Source: PPIC October 2016

pdf file of tables: Tabs for Lab 9


Interpretation

    • The frequency analysis for the index shows that scores range from zero through three. This makes sense since the index is composed of three items each of which is scored between zero and one.
    • Summary measures of central tendency and variation can be calculated.
    • The index is recoded into three categories using the cumulative percentages as a guide in finding cut points roughly approximating 33% and 66%.
    • The recoded variable is more manageable.
    • The index is crosstabulated with indicators of political partisanship and ideology.
    • Crosstabs permit calculation of measures of association between the recoded index and other variables. This can be useful for both external validation and explanatory research.
    • The crosstabs and measures of association provide weak support for a partisan explanation of support for recreational marijuana (tau-b = .152); the relationship with ideology is stronger (tau-b = .308).

INSTRUCTIONS

  1. Use the data set and questions you worked with in Lab 8.
  2. Having found a combination of questions that produces an alpha greater than .60, ensure that the ranges of the questions are similar to one another. This ensures that none of the items is over- or under-represented in the index. For example, if the first question has a range from 1 to 3 and the second has a range from 1 to 100, then the second will be disproportionately over-represented. Recode all the questions so that their ranges are similar, though not necessarily identical.
  3. To create the index, combine all the different questions into a new measure using a compute command in the following form:
    • Compute rawindex=.
  4. Run a frequency distribution of the new indexed variable and determine whether it is suitable for further data analysis.
  5. Recode the index into appropriate categories as necessary.
  6. The new index can be used in crosstabulation like any other variable. This enables you both to investigate the external validity of your measure and to use it in explanatory research. For example, use your index with an independent or dependent variable and calculate the appropriate measures of association.
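
The steps above can be sketched in syntax as follows. Every variable name here (item1, item2, rawindex, index3, indepvar) is hypothetical, as are the codings and the cut points; substitute the questions you validated in Lab 8 and choose cut points from your own cumulative percents. The compute-based rescaling of item2 is one possible way to bring a wide range onto a 0 to 1 scale, shown as an alternative to recoding it by hand.

```spss
* Step 2: put hypothetical items on a common 0 to 1 range.
* Here item1 is assumed to be coded 1 to 3 and item2 coded 1 to 100.
recode item1 (1=0) (2=.5) (3=1) into item1r.
compute item2r = (item2 - 1) / 99.

* Step 3: sum the rescaled items into a raw index.
compute rawindex = item1r + item2r.

* Step 4: inspect the distribution and its cumulative percents.
frequencies variables = rawindex
 /statistics = mean median mode stddev.

* Step 5: collapse into three categories using placeholder cut points.
recode rawindex (lowest thru .66=0) (.66 thru 1.33=.5) (1.33 thru highest=1) into index3.
value labels index3 0 'low' .5 'med' 1 'hi'.

* Step 6: crosstabulate with a hypothetical independent variable.
crosstabs tables = index3 by indepvar
 /cells = column count
 /statistics = btau.
```

When recode ranges share an endpoint, as they do here, SPSS assigns the value to the first specification that matches. Note also that a sum computed with the plus sign is missing for any case that is missing on any one item.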

QUESTIONS FOR REFLECTION

  • How does the relationship between your index and an independent variable differ from what you would obtain using each element of the index to produce a crosstab?
  • How is the relationship produced with the index affected by the choices in recoding the indexed variable?

DISCUSSION

  • An index often produces stronger relationships because the measurement errors in its constituent indicators tend to cancel out. That is not the case here with partisanship, because the relationships between partisanship and the dependent variable's three indicators differ considerably.
  • Proper recoding of your index requires careful consideration of the possibilities and attention to the substantive meaning of your categories.
  • Depending on your coding choices, the strength of the relationship in your table may increase, decrease, or stay roughly the same.
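
Both reflection questions can be explored with a few extra commands. The sketch below reuses the variables created in the example (MJPropD, MJLegalD, MJTry, RawMJ3, Democrat). The two-category version of the index, MJ2, and its cut point are hypothetical choices made only for comparison.

```spss
* Crosstabulate each indicator separately to compare with the index.
crosstabs tables = MJPropD MJLegalD MJTry by Democrat
 /cells = column count
 /statistics = btau.

* Try an alternative, hypothetical two-category recode of the raw index.
recode RawMJ3 (0 thru 1.5=0) (2 thru 3=1) into MJ2.
value labels MJ2 0 'low' 1 'hi'.
crosstabs tables = MJ2 by Democrat
 /cells = column count
 /statistics = btau.
```

Comparing the tau-b values across these tables with the value for MJ3 shows how much the apparent strength of the relationship depends on the indicator used and on where the index is cut.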

Advanced Exercises

  • The ANES example from the previous lab can be continued here. An earlier example using the ANES 2012 data is available here. LINK
  • In this lab only three of the four indicators considered in Lab 8 are used to create an index.
  • One can create standardized scores (or z-scores) for the indicators used to create an index in this lab using the following SPSS syntax:
  •   /descriptives variables =  /save.
  • This will create three new standardized variables in the data set: z z z. Their existence can be confirmed by looking at the dataset or by running a frequency analysis on each of these variables. These new variables can be used to create a standardized index using the same procedures employed in this lab. Doing so will ensure that all variables are equally weighted in the index. For our purposes, coding our indicators on a common range of values will suffice.
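
As an illustration of the standardized approach, the sketch below applies it to the three indicators from this lab's example. The choice of those variables is an assumption made for illustration. The Z-prefixed names are what the descriptives command normally assigns when /save is used; confirm them in the output notes before running the compute. ZRawMJ3 is a hypothetical name for the resulting index.

```spss
* Save a z-score version of each indicator.
descriptives variables = MJPropD MJLegalD MJTry
 /save.

* Sum the standardized indicators into a standardized raw index.
compute ZRawMJ3 = ZMJPropD + ZMJLegalD + ZMJTry.

* Check the new index without printing its full frequency table.
frequencies variables = ZRawMJ3
 /format = notable
 /statistics = mean stddev.
```

Each z-score has a mean of zero and a standard deviation of one, which is why the indicators enter the sum with equal weight regardless of their original ranges.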