UCSC Lab 9 | Data Art

Index Construction

PURPOSE

To learn to construct an index.

MAIN POINTS

The Reliability procedure tests only the suitability of a set of variables for the construction of an index (internal validation).
To construct an index you use the compute command, e.g., COMPUTE NEWNAME=Q1+Q2+Q3+Q4.
Recoding your index into fewer categories is often essential to create interpretable crosstabs.
Using your index in crosstabulation enables you to examine its relations to other variables. This not only permits external validation of your index; it can also enhance your explanatory research.

EXAMPLE

Dataset:
- ANES 2012

Concept:
- Attitude toward reducing economic inequality (Alpha =.69)
Indicators:

Syntax

*Identifying EconEq Index Items*.

missing values cses_govtact (-9 thru -6).
fre var=cses_govtact
 /statistics stddev skew kurtosis.
recode cses_govtact (1=1) (2=.75) (3= .5) (4= .25) (5=0) into eceq1.
fre var eceq1.

missing values ineqinc_ineqreduc (-9 thru -6).
fre var ineqinc_ineqreduc.
recode ineqinc_ineqreduc (1=1) (2=0) (3= .5) into eceq3.
fre var eceq3.

missing values guarpr_self (-9 thru -2).
recode guarpr_self (1=1) (2=.832) (3= .666) (4= .5) (5= .332) (6= .166) (7=0) into eceq5.
fre var guarpr_self eceq5.

*Replicating the Reliability Analysis*.

reliability
   /variables= eceq1 eceq2 eceq3 eceq4 eceq5
   /scale (EcEq3) eceq1 eceq3 eceq5
   /summary = all.

*Constructing the Index*.
compute RawIndex = eceq1 + eceq3 + eceq5.
fre var RawIndex
   /statistics = mean median stddev skew kurtosis.

*Recoding the Index*.
recode RawIndex (0 thru 1.00 =1) (1.01 thru 1.85 =2) (1.86 thru 3 = 3) into IEcEq3.
fre var IEcEq3
   /statistics mean median stddev skew kurtosis.

Syntax Legend
- Comments begin with an asterisk and end with a period, e.g., *Constructing the Index*.
- Most of the above syntax is familiar from the previous lab.
- The compute command is where the index is constructed. However, a frequencies command is needed to display the index and calculate its summary measures.
- Recoding an index is essential to produce effective tables. Here recodes place about 1/3 of the cases in each category, using the cumulative percent column of the frequency analysis as a guide.
- Recode the index into a new variable name, since it is useful to retain both the raw and the recoded forms of the index.
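
The cut points used in the recode can also be requested from SPSS rather than read off the cumulative percent column. The following is a minimal sketch, assuming RawIndex has already been computed as above. The /ntiles and /format subcommands are standard FREQUENCIES options, and IEcEq3r is a hypothetical variable name used only for illustration.

```spss
* Report the two cut points that divide RawIndex into three equal groups.
frequencies variables = RawIndex
   /format = notable
   /ntiles = 3.

* Optional: let SPSS assign the three groups itself (IEcEq3r is a hypothetical name).
rank variables = RawIndex
   /ntiles(3) into IEcEq3r
   /print = no.
fre var IEcEq3r.
```

Because many cases share the same index score, the three groups will rarely be exactly equal in size, and the groups assigned by RANK may differ slightly from a hand-chosen recode.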

Output for Raw Index


	Value	Frequency	Percent	Valid Percent	Cumulative Percent
	.00	371	6.3	7.3	7.3
	.17	361	6.1	7.1	14.5
	.25	24	.4	.5	14.9
	.32	160	2.7	3.2	18.1
	.42	67	1.1	1.3	19.4
	.50	153	2.6	3.0	22.4
	.58	74	1.3	1.5	23.9
	.67	113	1.9	2.2	26.1
	.75	86	1.5	1.7	27.8
	.82	72	1.2	1.4	29.3
	.83	20	.3	.4	29.6
	.92	80	1.4	1.6	31.2
	1.00	177	3.0	3.5	34.7
	1.07	12	.2	.2	35.0
	1.08	99	1.7	2.0	36.9
	1.17	118	2.0	2.3	39.2
	1.25	171	2.9	3.4	42.6
	1.32	43	.7	.8	43.5
	1.33	129	2.2	2.5	46.0
	1.42	133	2.2	2.6	48.6
	1.50	314	5.3	6.2	54.8
	1.57	35	.6	.7	55.5
	1.58	71	1.2	1.4	56.9
	1.67	150	2.5	3.0	59.9
	1.75	174	2.9	3.4	63.3
	1.82	67	1.1	1.3	64.7
	1.83	70	1.2	1.4	66.0
	1.92	118	2.0	2.3	68.4
	2.00	333	5.6	6.6	75.0
	2.07	75	1.3	1.5	76.4
	2.08	59	1.0	1.2	77.6
	2.17	107	1.8	2.1	79.7
	2.25	242	4.1	4.8	84.5
	2.32	18	.3	.4	84.9
	2.33	76	1.3	1.5	86.4
	2.42	139	2.3	2.7	89.1
	2.50	141	2.4	2.8	91.9
	2.58	128	2.2	2.5	94.4
	2.67	54	.9	1.1	95.5
	2.75	85	1.4	1.7	97.2
	2.83	53	.9	1.0	98.2
	3.00	91	1.5	1.8	100.0
	Total	5063	85.6	100.0
Missing	System	853	14.4
Total	5916	100.0

Summary Statistics
Mean = 1.39
Median = 1.50
StdDev = .84
Skew = -.12
Kurtosis = -1.07
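
The system-missing cases in the table above typically arise because an arithmetic expression in a compute command returns a missing result whenever any one of its operands is missing. The following short sketch checks how many items each respondent is missing, assuming eceq1, eceq3, and eceq5 exist as created above. The variable name nmiss is hypothetical.

```spss
* Count, for each case, how many of the three index items are missing.
count nmiss = eceq1 eceq3 eceq5 (missing).
fre var nmiss.
```

Cases with nmiss greater than zero are the ones that receive no RawIndex score.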

The newly computed index variable has so many categories that crosstabs will be unwieldy. Therefore, recode it into fewer categories.

*Recoding the Index*.

recode RawIndex (0 thru 1.00 =1) (1.01 thru 1.85 =2) (1.86 thru 3 = 3) into IEcEq3.
fre var IEcEq3
   /statistics mean median stddev skew kurtosis.

Frequency Distribution for Recoded Index

IEcEq3
	Value	Frequency	Percent	Valid Percent	Cumulative Percent
	1.00	1758	29.7	34.7	34.7
	2.00	1586	26.8	31.3	66.0
	3.00	1719	29.1	34.0	100.0
	Total	5063	85.6	100.0
Missing	System	853	14.4
Total	5916	100.0

Mean = 1.99
Median = 2.0
StdDev = .83
Skew = .014
Kurtosis = -1.54

The recoded index can be readily crosstabulated with independent variables.
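
The crosstab table in this example reports the index categories as low, med, and hi, but the syntax does not define these labels. The following is a minimal sketch of how they could be attached, assuming the label text is taken from that table.

```spss
* Label the recoded index so that crosstab rows read low, med, hi instead of 1, 2, 3.
variable labels IEcEq3 'Support for Government Action toward Income Equality'.
value labels IEcEq3 1 'low' 2 'med' 3 'hi'.
```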

*Creating an indicator of Party Identification*.
missing values pid_self (-9 thru 0, 5).
fre var pid_self.
missing values pid_x (-2).
recode pid_self (1=1) (3 = .5) (2=0) into pid.
value labels pid 1 'Dem' .5 'Ind' 0 'Rep'.

*Crosstabulation of IEcEq3 by pid*.
crosstabs tables = IEcEq3 by pid
   /cells = column count
   /statistics = phi btau.

Crosstabulation of support for action toward Income Equality by Partisan Identity

IEcEq3		Repub	Ind	Democ
Support for Government Action toward Income Equality
	low	65.7%	36.3%	13.9%
	med	22.2%	33.1%	35.9%
	hi	12.1%	30.6%	50.2%
	N =	(1221)	(1583)	(2021)

Cramer’s V = .318
Tau-b = .386

Interpretation

The recoded variable is more manageable.
Crosstabs permit calculation of measures of association between the recoded index and other variables. This can be useful for both external validation and explanatory research.
The frequency analysis for the index shows that scores range from zero through three. This makes sense since the index is composed of three items each of which is scored between zero and one.
Summary measures of central tendency and variation can be calculated.
The index is recoded into three categories, using the cumulative percentages as a guide to finding the 33% and 66% cut points.
The index is crosstabulated with an indicator of political partisanship.
The crosstabs provide support for a partisan explanation of policy preferences regarding Economic Equality.

INSTRUCTIONS

Use the data set and questions you worked with in Lab 8.
Having found a combination of questions that produces an alpha greater than .60, make sure the questions have similar ranges, so that no item is over- or under-represented in the index. For example, if the first question ranges from 1 to 3 and the second from 1 to 100, the second will dominate the index. Recode the questions so that their ranges are similar, though not necessarily identical.
To create the index, combine the questions into a new measure using a compute command in the following form:
- Compute rawindex=.
Run a frequency distribution of the new indexed variable and determine whether it is suitable for further data analysis.
Recode the index into appropriate categories as necessary.
The new index can be used in crosstabulation like any other variable. This enables you both to investigate the external validity of your measure and to use it in explanatory research. For example, crosstabulate your index with an independent or dependent variable and calculate the appropriate measures of association.
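
The rescaling and compute steps above can be sketched as follows. The variables q1 and q2 and their ranges are hypothetical, taken from the 1 to 3 and 1 to 100 example in these instructions. The sketch assumes that missing values have already been declared and that higher scores point in the same direction on both items.

```spss
* Hypothetical items: q1 ranges from 1 to 3, q2 ranges from 1 to 100.
* Rescale each to run from 0 to 1 using (value - minimum) / (maximum - minimum).
compute q1r = (q1 - 1) / (3 - 1).
compute q2r = (q2 - 1) / (100 - 1).

* Add the rescaled items and inspect the resulting index.
compute rawindex = q1r + q2r.
fre var rawindex
   /statistics = mean median stddev skewness kurtosis.
```

A recode command with explicit values, as in the example syntax, achieves the same result and is often easier when an item also needs to be reversed.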

QUESTIONS FOR REFLECTION

How does the relationship between your index and an independent variable differ from what you would obtain using each element of the index to produce a crosstab?
How is the relationship produced with the index affected by the choices in recoding the indexed variable?

DISCUSSION

An index often leads to stronger relationships because the measurement errors in each of the constituent indicators tend to balance out.
Proper recoding of your index requires careful consideration of the possibilities and attention to the substantive meaning of your categories.
Depending on your coding choices, the strength of the relationship in your table may increase, decrease, or stay roughly the same.
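
One way to explore the first reflection question is to crosstabulate each item separately and compare the results with those for the index. The following is a minimal sketch, assuming eceq1, eceq3, eceq5, and pid exist as created in the example.

```spss
* Crosstabulate each recoded item separately by party identification.
crosstabs tables = eceq1 eceq3 eceq5 by pid
   /cells = column count
   /statistics = phi btau.
```

Because the items have different numbers of categories, treat comparisons of the measures of association across these tables as rough guides only.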

Advanced Topics

In this lab only three of the five indicators considered in Lab 8 are used to create an index.
One can create standardized scores (or z-scores) for the indicators used to create an index in this lab using the following SPSS syntax:
descriptives variables = /save.
This will create three new standardized variables in the data set: z z z. Their existence can be confirmed by looking at the dataset or by running a frequency analysis on each of these variables. These new variables can be used to create a standardized index using the same procedures employed in this lab. Doing so will ensure that all variables are equally weighted in the index. For our purposes, coding our indicators on a common range of values will suffice.
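
The standardized index could be built as sketched below. This assumes that the three indicators are eceq1, eceq3, and eceq5 from the example. ZIndex is a hypothetical name for the new index.

```spss
* Assumption: the three indicators are eceq1, eceq3, and eceq5 from the example.
descriptives variables = eceq1 eceq3 eceq5
   /save.

* SPSS names the saved z-scores by prefixing Z to each variable name.
compute ZIndex = Zeceq1 + Zeceq3 + Zeceq5.
fre var ZIndex
   /format = notable
   /statistics = mean median stddev skewness kurtosis.
```

The standardized index can then be recoded into categories and crosstabulated in the same way as RawIndex.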