UCSC Lab 16 | Data Art

Poli 101 LAB MANUAL 16

Bivariate Regression

PURPOSE

To introduce regression analysis.
To learn how to perform a regression analysis and interpret the results.

Part I–MAIN POINTS

Regression is a technique that presents the relationship between two (or more) variables in the form of a simple linear function. The regression model finds the best-fitting equation by minimizing the sum of squared deviations between the observed and the predicted values of the dependent variable (the least squares criterion).

In bivariate analysis, regression takes the form:

y = a + bx. Where:
y is the dependent variable;
x is the independent variable;
b is the unstandardized regression coefficient;
a is an intercept or constant.
Translating the equation into words, we have:

Value of the Dependent Variable = Intercept (Constant) + Regression Coefficient times the Value of the Independent Variable

The regression equation allows us to predict the approximate value of the dependent variable given any value of the independent variable.
We interpret regression results in terms of the implications for the dependent variable (y) of a unit change (increase or decrease) in the independent variable (x).
For instance, assume a regression equation is Income = 10,000 + (5,000 X Education). Beginning from a constant of 10,000, every unit increase in education leads to a 5,000-unit increase in income.
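The Income example can be worked through outside SPSS. The following is a minimal Python sketch, assuming the hypothetical equation from the text; the function name `predict_income` and the education values are illustrative and not part of the lab data.

```python
# Hypothetical equation from the text: Income = 10,000 + (5,000 x Education).
INTERCEPT = 10_000   # a: predicted income when education equals 0
SLOPE = 5_000        # b: change in income for a one-unit increase in education


def predict_income(education):
    """Return the predicted income for a given value of education."""
    return INTERCEPT + SLOPE * education


# Predicted income at a few illustrative education values.
for education in (0, 1, 2, 12):
    print(education, predict_income(education))

# A one-unit increase in education changes the prediction by exactly b,
# no matter where on the scale the increase starts.
print(predict_income(13) - predict_income(12))
```

The last line shows the meaning of "unit change": the difference between any two adjacent predictions is the regression coefficient itself.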
- Significance is determined in two ways with regression. The first is for the equation as a whole; the second is for each individual independent variable. If the significance level is greater than 0.05 for either of these measures, then we cannot infer that the effect described by the equation, or reflected in the individual regression coefficient, is different from zero in the population.
- Also of particular interest in the regression output is the r-square value. Recall from our discussion of Pearson’s correlation coefficient (r) that r² is an estimate of the explained variance. The higher the value of r², the better. However, with only one independent variable, the r-square value will likely be relatively low even when the independent variable is significant.
- The third important statistic in regression is the b value, which measures the effect of the independent variable on the dependent variable in terms of unit change.
- The b value is an unstandardized coefficient, meaning that it is expressed in the original units of the variables: it gives the change in the dependent variable, in its own units, for a one-unit change in the independent variable. There is also a standardized version of b, called beta, which allows us to interpret regression in terms of standard deviation units. In this instance, a change of one standard deviation in the independent variable changes the dependent variable by beta standard deviations.
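The statistics described above (the constant, b, r-square and beta) can all be computed by hand from the least squares formulas. The following is a minimal Python sketch; the five data points are invented purely for illustration and do not come from the ANES.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]   # independent variable
y = [2, 4, 5, 4, 5]   # dependent variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sp_xy / ss_x          # unstandardized regression coefficient (slope)
a = mean_y - b * mean_x   # intercept (constant)

# Residual sum of squares: the quantity that least squares minimizes.
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_square = 1 - ss_resid / ss_y   # share of the variance in y that is explained

# Standardized coefficient: b rescaled by the two standard deviations.
beta = b * sqrt(ss_x / ss_y)

print("a =", round(a, 3))
print("b =", round(b, 3))
print("r-square =", round(r_square, 3))
print("beta =", round(beta, 3))
```

In a bivariate regression beta equals Pearson's r, so squaring beta reproduces r-square.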
Similar to correlation, regression should technically be used only when both variables are measured at the interval level, though researchers very often use ordinal variables with many categories in regression as well. Having more than a few possible categories provides greater variation for explanation. This is particularly true for the dependent variable. So in our own work it is usually desirable to use an index as the dependent variable.
As to independent variables, there is generally a bit more latitude with the level of measurement, insofar as researchers commonly use dichotomies (coded as zero or one) created from nominal data as independent variables.

EXAMPLE

Calculating Regression

Dataset:
- ANES 2012
Dependent Variable:
- Economic Equality RawEq (Alpha .70)
  - Indicators: EcEq1 (cses_govtact), EcEq2 (ineqinc_ineqreduc), EcEq3 (guarpr_self).
Independent Variables:
Hypothesis Arrow Diagram:

Syntax

weight by weight_full.
missing values cses_govtact (-9 thru -6).
recode cses_govtact (1=1) (2=.75) (3= .5) (4= .25) (5=0) into eceq1.
missing values ineqinc_ineqreduc (-9 thru -6).
recode ineqinc_ineqreduc (1=1) (2=0) (3= .5) into eceq3.
missing values guarpr_self (-9 thru -2).
recode guarpr_self (1=1) (2=.832)
    (3= .666) (4= .5) (5= .332) (6= .166) (7=0) into eceq5.

*Constructing the Index*.
compute RawEqIndex = eceq1 + eceq3 + eceq5.

*Creating Independent Variables*.
*partisan feeling thermometers*.
missing values ft_dem (-2, -8, -9).

*Economy-past & future*.
missing values econ_ecpast_x (-9 thru -1).

regression variables = RawEqIndex ft_dem
  /dependent = RawEqIndex
  /method = enter.

regression variables = RawEqIndex econ_ecpast_x
  /dependent = RawEqIndex
  /method = enter.

Syntax Legend
- Missing values and recodes are declared and the DV index is constructed.
- The regression command’s first line specifies the included variables.
- The second line specifies the dependent variable. Note the raw (unrecoded) index is used.
- The third line tells SPSS to enter the other variable as a predictor.
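To see what the recodes and the index construction do to a single respondent's answers, the same logic can be sketched in Python. The recode values are copied from the syntax above; the function name `raw_eq_index` is hypothetical. The sketch assumes that any code not listed in a recode is treated as missing, and that a respondent missing any one item gets no index score, which is how SPSS arithmetic with `+` handles missing values.

```python
# Recode maps copied from the RECODE commands in the lab syntax.
ECEQ1_MAP = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25, 5: 0.0}    # cses_govtact
ECEQ3_MAP = {1: 1.0, 2: 0.0, 3: 0.5}                      # ineqinc_ineqreduc
ECEQ5_MAP = {1: 1.0, 2: 0.832, 3: 0.666, 4: 0.5,
             5: 0.332, 6: 0.166, 7: 0.0}                  # guarpr_self


def raw_eq_index(cses_govtact, ineqinc_ineqreduc, guarpr_self):
    """Return the summed index, or None if any item is missing."""
    parts = (
        ECEQ1_MAP.get(cses_govtact),
        ECEQ3_MAP.get(ineqinc_ineqreduc),
        ECEQ5_MAP.get(guarpr_self),
    )
    # Codes outside the recode maps (such as the negative missing-value
    # codes) come back as None, so the whole index is treated as missing.
    if any(part is None for part in parts):
        return None
    return sum(parts)


print(raw_eq_index(1, 1, 1))    # highest score on every item
print(raw_eq_index(5, 2, 7))    # lowest score on every item
print(raw_eq_index(-9, 1, 1))   # one item missing
```

Because each recoded item runs from 0 to 1, the summed index runs from 0 to 3.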

Output

Model Summary^b
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.511^a	.262	.261	.725
a. Predictors: (Constant), Feeling Thermometer Democratic Party
b. Dependent Variable: RawEqIndex

X→Y

Feeling toward Democrats → (.51) → RawEgal

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	933.48	1	933.48	1775.77	.000^b
	Residual	2634.45	5012	.526
	Total	3567.92	5013
a. Dependent Variable: RawEqIndex
b. Predictors: (Constant), Feelings Democrats

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	.556	.021		26.44	.000
	DemFeel	.015	.000	.511	42.14	.000
a. Dependent Variable: RawEqIndex

y = a + bx

RawEqIndex = .556 + (.015)DemFeel

Interpretation
- To derive the first regression equation, we need the b coefficient and the constant from the Coefficients table. Written in its linear form, y = a + bx, the equation is:
  - RawEqIndex = .556 + (.015)DemFeel
- The regression coefficient is positive, indicating that the relationship between the variables is positive. Thus, as Democratic Feeling increases, scores on Economic Equality increase. Moreover, the equation tells us that for every one-unit increase in Democratic Feeling, Economic Equality increases by .015 units. The units referred to here are those in which each variable is measured.
- Next, we look at the significance of the equation and the significance of the regression coefficient for the independent variable. Since both are well below .05, we know the results of this regression analysis are unlikely to be due to sampling error. (In a bivariate regression the two tests are equivalent, because F equals the square of t: here 42.14² ≈ 1775.8. One cannot be significant if the other is not.)
- The magnitude of the beta value can be overlooked for now. It will become relevant when we have two or more independent variables using multivariate regression. Notice that the regression equation above incorporates the b, not the beta.
- The R² is .262, which means that the variation in Democratic Feelings explains roughly 26% of the variation in Economic Equality Attitudes.
- The interpretation of the second equation is up to you.
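The figures in the output tables can be cross-checked against one another. The following is a minimal Python sketch that uses only numbers copied from the SPSS output above. The function name `predict_raw_eq` is illustrative, and the thermometer values 0, 50 and 100 assume the usual 0 to 100 range of the ANES feeling thermometers.

```python
# Figures copied from the SPSS output tables above.
constant = 0.556        # (Constant), column B
b_demfeel = 0.015       # DemFeel, column B
beta = 0.511            # DemFeel, standardized coefficient
t_demfeel = 42.14       # DemFeel, column t
f_statistic = 1775.77   # ANOVA table, column F
ss_regression = 933.48  # ANOVA table, Regression sum of squares
ss_total = 3567.92      # ANOVA table, Total sum of squares


def predict_raw_eq(dem_feel):
    """Predicted RawEqIndex for a given Democratic thermometer score."""
    return constant + b_demfeel * dem_feel


# Predicted index at the bottom, middle and top of the assumed 0-100 scale.
for score in (0, 50, 100):
    print(score, round(predict_raw_eq(score), 3))

# R-square is the regression sum of squares over the total sum of squares.
r_square = ss_regression / ss_total
print("R-square from ANOVA table:", round(r_square, 3))

# With one predictor, beta squared is also (approximately) R-square.
print("beta squared:", round(beta ** 2, 3))

# With one predictor, F is the square of t, up to rounding in the output.
print("t squared:", round(t_demfeel ** 2, 2), "F:", f_statistic)
```

Small differences between the hand calculations and the printed output arise because SPSS reports rounded values.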

INSTRUCTIONS

In regression analysis, as with every other method of explanatory analysis, we begin by hypothesizing a relationship between an independent and a dependent variable. For example, continuing to work with the ANES 2012, we may hypothesize that a respondent's Democratic Partisan Feeling (independent) affects attitudes toward economic equality (dependent).
It is also essential to recode each variable and identify missing values for both variables based upon their respective frequency analysis.
Run the appropriate regression syntax in SPSS.
In viewing your output, consider first the ANOVA table to see whether the equation as a whole meets the standard of statistical significance.
Next find the unstandardized coefficient (the column labeled “B”). This is the slope of the line and should be interpreted as the predicted effect on the dependent variable by a one unit increase in the value of the independent variable. Be sure to note the direction of the relationship.
Then check to see whether we can be confident that the results are not due to chance by checking the significance of that coefficient.
Finally, assess the magnitude of the r-square to determine the percent of variance in the DV explained by variation in the IV. In a bivariate regression, the multiple R equals the absolute value of Pearson's correlation coefficient.
Repeat the analysis using another independent variable from the data set, such as Finances.
Write out the regression equation for the relationship and interpret the meaning of your results in terms of the effect of the independent variable on the dependent variable, with reference to both b and r-square.

QUESTIONS FOR REFLECTION

Do the magnitudes of the regression coefficient b and the constant depend on how your variables are coded?
What is the regression equation for your results and what is the meaning of each of the components?
How do we visualize regression results?

DISCUSSION

The values of the b coefficient and the constant do depend on how the variables are coded. For example, if the variables are recoded into a smaller number of categories, both the b values and r-square will be affected. Generally speaking, when we use regression it is often preferable to use variables with as much of their original variation as possible.
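The effect of coding can be demonstrated with a small experiment. The following is a minimal Python sketch with invented data: the thermometer scores and index values are hypothetical, and the helper `fit_line` is not an SPSS procedure. It fits the same relationship three times, using the original 0 to 100 coding, a rescaled 0 to 1 coding, and a two-category collapse at 50.

```python
def fit_line(x, y):
    """Return (a, b, r_square) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    r_square = sp_xy ** 2 / (ss_x * ss_y)
    return a, b, r_square


# Hypothetical thermometer scores and index values, invented for illustration.
thermometer = [0, 15, 30, 50, 60, 70, 85, 100]
index = [0.5, 1.0, 0.8, 1.5, 1.2, 2.0, 1.8, 2.5]

# The same independent variable under three different codings.
codings = {
    "original (0-100)": thermometer,
    "rescaled (0-1)": [score / 100 for score in thermometer],
    "collapsed (0/1)": [0 if score < 50 else 1 for score in thermometer],
}

for label, coded_x in codings.items():
    a, b, r_square = fit_line(coded_x, index)
    print(label, "a =", round(a, 3), "b =", round(b, 3),
          "r-square =", round(r_square, 3))
```

Rescaling the thermometer multiplies b by 100 but leaves r-square unchanged, because only the unit of measurement has changed. Collapsing it into two categories changes b and, with these invented numbers, lowers r-square, because variation in the independent variable has been discarded.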
It is up to you to calculate the effect of Economic Conditions on Economic Equality and decide whether it is more or less important than partisan feelings.
Regression results can be visualized in two ways.
- The first way is to run a Graph command in SPSS.
- GRAPH /scatterplot = ft_dem with RawEqIndex.

Note that the independent variable appears first in this procedure.

An alternative approach is to add a scatterplot subcommand to an existing regression procedure, after the /method= enter subcommand.
- It takes the form:

```
  /scatterplot = (RawEqIndex ft_dem).
```

Note the DV precedes the IV for this.

Especially when working with categorical variables, interested students may wish to create a “jittered scatter plot” by asking SPSS to add a small random number to each data value. To do so, compute new “jittered” variables and use them to visualize your results. For instance:

COMPUTE ft_demj = ft_dem + RV.UNIFORM(-0.3, +0.3).
COMPUTE RawEqIndexj = RawEqIndex + RV.UNIFORM(-0.3, +0.3).
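The idea behind jittering can also be sketched outside SPSS. The following minimal Python example mirrors the COMPUTE statements above by adding uniform noise between -0.3 and +0.3 to each value. The function name `jitter`, the `seed` argument and the sample scores are illustrative assumptions, not part of the lab.

```python
import random


def jitter(values, width=0.3, seed=None):
    """Return a copy of values with uniform noise in [-width, +width] added."""
    rng = random.Random(seed)   # a fixed seed makes the jitter reproducible
    return [value + rng.uniform(-width, width) for value in values]


# Hypothetical thermometer scores with many tied values.
scores = [50, 50, 50, 70, 70, 85]
jittered = jitter(scores, seed=1)

# Tied scores now sit at slightly different positions, so overlapping
# points become visible in a scatterplot.
for original, moved in zip(scores, jittered):
    print(original, round(moved, 3))
```

The jittered variables are for plotting only; the regression itself should still be run on the original variables.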