Lab 16

POL242 LAB MANUAL 16

Bivariate Regression

PURPOSE

  • To introduce regression analysis.
  • To learn how to perform a regression analysis and interpret the results.

Part I–MAIN POINTS

In bivariate analysis, regression takes the form:

    • y = a + bx, where:
    • y is the dependent variable;
    • x is the independent variable;
    • b is the unstandardized regression coefficient;
    • a is the intercept or constant.

Translating the equation into words, we have:

Value of the Dependent Variable = Intercept (Constant) + (Regression Coefficient times the Value of the Independent Variable)

  • The regression equation allows us to predict the approximate value of the dependent variable given any value of the independent variable.
  • We interpret regression results in terms of the implications for the dependent variable (y) of a unit increase in the independent variable (x).
  • For instance, assume a regression equation is Income = 10000 + (5000 × Education). Beginning from a constant of 10000 (the predicted income when education is zero), every one-unit increase in education is associated with a predicted 5000-unit increase in income.
  • Significance is assessed in two ways with regression: first for the equation as a whole, and second for each individual independent variable. If the significance level is greater than 0.05 for either of these measures, then we cannot be confident that the relationship described by the equation, or reflected in the individual regression coefficient, is different from zero.
  • Also of particular interest in the regression output is the r-square value. Recall from our discussion of Pearson’s correlation coefficient (r) that r² estimates the proportion of variance in the dependent variable that is explained. The higher the value of r², the better. However, with only one independent variable, the r-square value will likely be relatively low even when the independent variable is significant.
  • The third important statistic in regression is the b value, which measures the effect of the independent variable on the dependent variable in terms of unit change.
  • The b value is an unstandardized coefficient, meaning that it is expressed in the original units of the variables: it gives the change in the dependent variable, in its own units, for a one-unit change in the independent variable. There is also a standardized version of b, called beta, which allows us to interpret regression in terms of standard deviation units. In this instance, every change of one standard deviation in the independent variable changes the dependent variable by beta standard deviations.
  • Similar to correlation, regression should technically be used only when both variables are measured at the interval level, though researchers very often use ordinal variables with many categories in regression as well. Having more than a few possible categories provides greater variation for explanation.  This is particularly true for the dependent variable. So in our own work it is usually desirable to use an index as the dependent variable.
  • As for independent variables, there is generally a bit more latitude with the level of measurement, insofar as researchers commonly use dichotomies (coded as zero or one) created from nominal data as independent variables.
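
As a worked illustration of the main points, the hypothetical income equation can be evaluated at two adjacent education values. The values 12 and 13 are chosen here only for illustration, and the symbols s_x and s_y (the standard deviations of x and y) are notation added for this sketch. The last line is the standard relationship between b and beta.

```latex
\begin{align*}
\hat{y} &= a + bx = 10000 + 5000x \\
x = 12:\quad \hat{y} &= 10000 + 5000(12) = 70000 \\
x = 13:\quad \hat{y} &= 10000 + 5000(13) = 75000 \\
\hat{y}(13) - \hat{y}(12) &= 75000 - 70000 = 5000 = b \\[6pt]
\beta &= b \cdot \frac{s_x}{s_y}
\end{align*}
```

The difference between the two predictions equals b, which is what "a one-unit increase in x" means in practice. With a single independent variable, beta equals Pearson's r.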

EXAMPLE 

Calculating Regression

  • Dataset:
    • CES 2011
  • Dependent Variable:
    • Egal (Alpha =.67)
      • Indicators: PES11_41; mbs11_k2; mbs11_b3.
  • Independent Variables:
    • IV1: ConFeel (cps11_18)
    • IV2: Personal Financial Situation (cps11_66)
  • Hypothesis Arrow Diagram:
    • H1: ConFeel → ~Egal (Conservative Partisan Feeling ⇒ Less Egalitarian)
    • H2: Pers Finance → Egal (Improved Finances ⇒ More Egalitarian)
  • Syntax
weight by WGTSamp.
*Preparing indicators of Attitudes re Inequality*.
*declare missing values on pes11_41*.
missing values pes11_41 (8,9).

*reverse scoring on pes11_41 and make it range from 0-1*.
recode PES11_41 (1=1) (2=.75) (3=.5) (4= .25) (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow'
   .75 'somemore' 1 'muchmore'.

*rescale mbs11_k2 from 0-10 to 0-1 and reverse its scoring*.
missing values mbs11_k2 (-99).
compute govact = (((mbs11_k2 * -1) +10)/10).
value labels govact 0'not act' 1 'gov act'.

*recode and re-label mbs11_b3.
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.

*create an indexed variable (alpha=.66).
compute rawegal = undogap + govact + goveqch.
fre var = rawegal.

*interval measure of partisan feeling from Lab 7*.
fre var cps11_18.
recode cps11_18 (0=0) (else = copy) into ConFeel.
missing values Confeel (996, 998, 999).
fre var Confeel.

*create finance measures (from Lab 7)*.
missing values cps11_66 (8,9).
recode cps11_66 (1=1) (3=0) (5=.5) into finances.
variable labels finances 'personal finances'.
value labels finances 0 'worse' .5 'same' 1 'better'.

*Regression Analyses for H1 AND H2*.
regression variables = rawegal ConFeel
   /dependent = rawegal
   /method = enter.

regression variables = rawegal Finances
   /dependent = rawegal
   /method = enter.
  • Syntax Legend
    • Missing values are declared, the indicators are recoded, and the DV index is constructed.
    • The regression command’s first line specifies the included variables.
    • The second line specifies the dependent variable.
    • The third line says to enter the other variable as a predictor.
  • Output
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .384(a)   .148       .147                .63641

a. Predictors: (Constant), ConFeel
b. Dependent Variable: rawegal

X → Y

ConFeel --(-.38)--> RawEgal

ANOVA(a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   58.238           1     58.238        143.789   .000(b)
  Residual     336.032          830   .405
  Total        394.270          831

a. Dependent Variable: rawegal
b. Predictors: (Constant), ConFeel

Coefficients(a)

               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t         Sig.
1 (Constant)   2.584    .041                                             63.077    .000
  ConFeel      -.009    .001                 -.384                       -11.991   .000

a. Dependent Variable: rawegal

y = a + bx

Rawegal = 2.58 + (-.009)ConFeel

or

Rawegal = 2.58 - .009ConFeel

  • Interpretation
    • To derive the first regression equation, we need the b coefficient and the constant from the Coefficients table. The equation can be written in its linear form, y = a + bx, as:
      • Rawegal = 2.58 + (-.009)ConFeel
        or
      • Rawegal = 2.58 -.009ConFeel
    • The regression coefficient is negative, indicating that the relationship between the variables is negative. Thus as Conservative Feeling increases, Egalitarian attitude decreases. Moreover, the equation tells us that for every one-unit increase in Conservative Feeling, Egalitarianism decreases by .009 units. The units referred to here are those in which each variable is measured.
    • Next, we look at the significance of the equation and the significance of the regression coefficient for the independent variable. Since both are well below .05, we know the results of this regression analysis are unlikely to be due to sampling error. (In a bivariate regression the two tests are equivalent, so they always agree.)
    • The magnitude of the beta value can be overlooked for now. It will become relevant when we have two or more independent variables in a multivariate regression. Notice that the regression equation above incorporates the b, not the beta.
    • The R² is .148, which means that variation in Conservative Feeling explains roughly 15% of the variation in Egalitarian Attitudes.
    • The interpretation of the second equation is up to you.
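
The figures in the three output tables can be cross-checked against one another, and the equation can be used to generate predicted values. The sketch below uses only numbers from the output above. The ConFeel values 0, 50 and 100 are illustrative and assume that the feeling measure runs from 0 to 100. The index rawegal sums three items that each run from 0 to 1, so it ranges from 0 to 3.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{regression}}}{SS_{\text{total}}} = \frac{58.238}{394.270} \approx .148 \\
R &= \sqrt{R^2} \approx \sqrt{.1477} \approx .384 = |\beta| \\
t^2 &= (-11.991)^2 \approx 143.78 \approx F \\[6pt]
\text{ConFeel} = 0:\quad \widehat{\text{Rawegal}} &= 2.58 - .009(0) = 2.58 \\
\text{ConFeel} = 50:\quad \widehat{\text{Rawegal}} &= 2.58 - .009(50) = 2.13 \\
\text{ConFeel} = 100:\quad \widehat{\text{Rawegal}} &= 2.58 - .009(100) = 1.68
\end{align*}
```

Under that assumption, a b of -.009 per point adds up to a predicted drop of about .9 across the full range of ConFeel, on an index that spans only 3 points. A small b does not by itself mean a small effect.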

INSTRUCTIONS

  1. In regression analysis, as with every other method of explanatory analysis, we begin by hypothesizing a relationship between an independent and a dependent variable. For example, continuing to work with the CES 2011, we may hypothesize that respondents' Conservative Partisan Feeling (independent) affects Egalitarianism (dependent).
  2. It is also essential to recode each variable and to declare missing values for both variables, based upon their respective frequency analyses.
  3. Run the appropriate regression syntax in SPSS.
  4. In viewing your output, consider first the ANOVA table to see whether the equation as a whole meets the standard of statistical significance.
  5. Next find the unstandardized coefficient (the column labeled “B”). This is the slope of the line and should be interpreted as the predicted change in the dependent variable for a one-unit increase in the value of the independent variable. Be sure to note the direction of the relationship.
  6. Then check to see whether we can be confident that the results are not due to chance by checking the significance of that coefficient in the Coefficients table.
  7. Finally, assess the magnitude of the r-square to determine the percent of variance in the DV explained by variation in the IV. In a bivariate regression, the multiple R equals the absolute value of the correlation coefficient.
  8. Repeat the analysis using another independent variable from the data set, such as Finances.
  9. Write out the regression equation for the relationship and interpret the meaning of your results in terms of the effect of the independent variable on the dependent variable, with reference to both b and r-square.

QUESTIONS FOR REFLECTION

  • Do the magnitudes of the regression coefficient b and the constant depend on how your variables are coded?
  • What is the regression equation for your results and what is the meaning of each of the components?
  • How do we visualize regression results?

DISCUSSION

  • The values of the b coefficient and the constant do depend on how the variables are coded. For example, if the variables are recoded into a smaller number of categories, both the b value and the r-square will be affected. Generally speaking, when we use regression it is preferable to use variables that retain as much of their original variation as possible.
  • It is up to you to calculate the effect of Finances on Egalitarianism and decide whether it is more or less important than partisan feelings.
  • Regression results can be visualized in two ways.
    • The first way is to run a GRAPH command in SPSS. The relevant syntax is:

GRAPH /scatterplot = finances with rawegal.

Note that the independent variable appears first in this procedure.

    • An alternative approach is to add a scatterplot subcommand to an existing regression procedure, after the /method = enter subcommand. It takes the form:

   /scatterplot = (rawegal finances).

Note that the DV precedes the IV in this subcommand.
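
Put together, the regression for H2 with the scatterplot requested in the same run might look like the sketch below. It assumes that rawegal and finances have already been created by the syntax in the Example section. The comma between the two variable names is optional.

```spss
* Sketch: regression for H2 with a scatterplot requested in the same run.
* Assumes rawegal and finances already exist in the active dataset.
regression variables = rawegal finances
   /dependent = rawegal
   /method = enter
   /scatterplot = (rawegal, finances).
```

The scatterplot subcommand follows the method subcommand, and the first variable in the parentheses (the DV) is plotted on the vertical axis.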

With categorical variables, interested students may wish to create a “jittered scatter plot” by asking SPSS to add a small random number to each data value. To do so, compute new “jittered” variables and use them to visualize your results. For instance:

COMPUTE financesj = finances + RV.UNIFORM(-0.3, +0.3).
COMPUTE rawegalj = rawegal + RV.UNIFORM(-0.3, +0.3).
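
The jittered copies can then be plotted with the same GRAPH command shown earlier in this discussion. A minimal sketch, assuming the two COMPUTE statements above have been run:

```spss
* Sketch: plot the jittered copies, IV first and DV second.
* The jittered variables are for display only, so keep finances and rawegal in the regression itself.
GRAPH /scatterplot = financesj with rawegalj.
```

Because the jitter is random, the plot will look slightly different each time the COMPUTE statements are re-run.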