UCSC Lab 16

Poli 101 LAB MANUAL 16

Bivariate Regression

PURPOSE

  • To introduce regression analysis
  • To learn how to perform a regression analysis and interpret the results.

Part I–MAIN POINTS

  • Regression is a technique that presents the relationship between two (or more) variables in the form of a simple linear function.  The regression model finds the best-fitting line by minimizing the sum of squared deviations between the observed and predicted values of the dependent variable (the least-squares criterion).
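The least-squares idea can be sketched in a few lines of Python. This is only an illustration: the five data points are invented (they are not ANES data), and the helper name `least_squares` is hypothetical. It computes the intercept and slope of the line that minimizes the squared deviations.

```python
# Hypothetical data, invented for illustration only (not from the ANES).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x that minimizes squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope (unstandardized coefficient)
    a = mean_y - b * mean_x    # intercept (constant)
    return a, b

a, b = least_squares(x, y)
print(f"y = {a:.2f} + {b:.2f}x")
```

For these made-up points the slope works out to 0.6 and the intercept to 2.2.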

In bivariate analysis, regression takes the form:

    • y = a + bx.    Where:
    • y is the dependent variable;
    • x is the independent variable;
    • b is the unstandardized regression coefficient;
    • a is the intercept or constant.
    • Translating the equation into words, we have:

Value of the Dependent Variable = Intercept (Constant) + Regression Coefficient times the Value of the Independent Variable

  • The regression equation allows us to predict the approximate value of the dependent variable given any value of the independent variable.
  • We interpret regression results in terms of the implications for the dependent variable (y) of a unit change (increase or decrease) in the independent variable (x).
  • For instance, assume a regression equation is Income = 10,000 + (5,000 X Education).  Beginning from a constant of 10,000, every unit increase in education leads to a 5,000-unit increase in income.
    • Significance is determined in two ways with regression. The first is for the equation as a whole, the second is for each particular independent variable. If the significance level is greater than 0.05 for either of these measures, then we cannot infer that the relationship described by the equation, or reflected in the individual regression coefficient, is different from zero in the population.
    • Also of particular interest in the regression output is the r-square value. Recall from our discussion of Pearson’s correlation coefficient (r) that r² is an estimate of the proportion of explained variance. The higher the value of r², the better the equation fits the data. However, with only one independent variable, the r-square value will likely be relatively low even when the independent variable is significant.
    • The third important statistic in regression is the b value, which measures the effect of the independent variable on the dependent variable in terms of unit change.
    • The b value is an unstandardized coefficient, meaning that it is expressed in the original units of the variables: it gives the change in the dependent variable (in its own units) for a one-unit change in the independent variable. There is also a standardized version of b, called beta, which allows us to interpret regression in terms of standard deviation units. In this instance, every change of one standard deviation in the independent variable changes the dependent variable by beta standard deviations.
  • Similar to correlation, regression should technically be used only when both variables are measured at the interval level, though researchers very often use ordinal variables with many categories in regression as well. Having more than a few possible categories provides greater variation for explanation.  This is particularly true for the dependent variable. So in our own work it is usually desirable to use an index as the dependent variable.
  • As to independent variables, there is generally a bit more latitude with the level of measurement, insofar as researchers commonly use dichotomies (coded as zero or one) created from nominal data as independent variables.
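Returning to the income example above, a short Python sketch shows how the equation produces predictions and how b relates to beta. The constant and b come from the example equation; the two standard deviations are hypothetical values chosen only to make the arithmetic easy, and the conversion used is the usual one, beta = b × (standard deviation of x ÷ standard deviation of y).

```python
# Coefficients from the income example in the text.
CONSTANT = 10_000   # a: predicted income when education is 0
B = 5_000           # b: change in income per one-unit change in education

def predict_income(education):
    """Predicted income from Income = 10,000 + (5,000 X Education)."""
    return CONSTANT + B * education

# A one-unit increase in education always changes the prediction by b.
print(predict_income(12))                        # 70000
print(predict_income(13) - predict_income(12))   # 5000

# Hypothetical standard deviations, invented for illustration only.
SD_EDUCATION = 3.0
SD_INCOME = 30_000.0

# Standardized coefficient: b rescaled into standard deviation units.
beta = B * SD_EDUCATION / SD_INCOME
print(beta)   # 0.5
```

Under these assumed standard deviations, a one standard deviation increase in education would go with a half standard deviation increase in income.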

EXAMPLE 

Calculating Regression

      • Dataset:
        • ANES 2012
      • Dependent Variable:
        • Economic Equality RawEq (Alpha .70)
          • Indicators: EcEq1(cses_govtact),
          • EcEq2 (ineqinc_ineqreduc)
          • EcEq3 (guarpr_self).
      • Independent Variables:
          • Feeling toward Democratic Party (ft_dem),
          • Improved Econ (econ_ecpast_x).
      • Hypothesis Arrow Diagram:
          • Positive Feeling toward  Democrats –>EcEq
          • Improved Finances –> Egal
          • Improved Econ –> Egal

Syntax

weight by weight_full.
missing values cses_govtact (-9 thru -6).
recode cses_govtact (1=1) (2=.75) (3= .5) (4= .25) (5=0) into eceq1.
missing values ineqinc_ineqreduc (-9 thru -6).
recode ineqinc_ineqreduc (1=1) (2=0) (3= .5) into eceq3.
missing values guarpr_self (-9 thru -2).
recode guarpr_self (1=1) (2=.832)
    (3= .666) (4= .5) (5= .332) (6= .166) (7=0) into eceq5.

*Constructing the Index*.
compute RawEqIndex = eceq1 + eceq3 + eceq5.

*Creating Independent Variables*.
*partisan feeling thermometers*.
missing values ft_dem (-2, -8, -9).

*Economy-past & future*.
missing values econ_ecpast_x (-9 thru -1).

regression variables = RawEqIndex ft_dem
  /dependent = RawEqIndex
  /method = enter.

regression variables = RawEqIndex econ_ecpast_x
  /dependent = RawEqIndex
  /method = enter.
  • Syntax Legend
    • Missing values and recodes are declared & DV index constructed
    • The regression command’s first line specifies the included variables.
    • The second line specifies the dependent variable. Note the raw (unrecoded) index is used.
    • The third line tells SPSS to enter the remaining variable as the predictor.
    • Output
    Model Summary(b)

    Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
    1       .511(a)   .262       .261                .725

    a. Predictors: (Constant), Feeling Thermometer Democratic Party
    b. Dependent Variable: RawEqIndex

    X→Y

    Feeling toward Democrats <– .51 –> RawEgal

    ANOVA(a)

    Model          Sum of Squares   df     Mean Square   F         Sig.
    1 Regression   933.48           1      933.48        1775.77   .000(b)
      Residual     2634.45          5012   .526
      Total        3567.92          5013

    a. Dependent Variable: RawEqIndex
    b. Predictors: (Constant), Feelings Democrats

    Coefficients(a)

                     Unstandardized Coefficients   Standardized Coefficients
    Model            B       Std. Error            Beta                        t       Sig.
    1 (Constant)     .556    .021                                              26.44   .000
      DemFeel        .015    .000                  .511                        42.14   .000

    a. Dependent Variable: RawEqIndex

    Y= a+ bx

    RawEqIndex = .556 + (.015)DemFeel
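
The figures in the output tables are linked by simple arithmetic. As a rough check, the Python sketch below recomputes several of them from the sums of squares and degrees of freedom in the ANOVA table. The variable names are ours, and small differences from the printed output are expected because the table values are rounded.

```python
import math

# Values copied from the ANOVA table above.
ss_regression = 933.48
ss_residual = 2634.45
ss_total = 3567.92
df_regression = 1
df_residual = 5012

# R Square is the share of total variation accounted for by the regression.
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares explained to unexplained variation per degree of freedom.
f_stat = ms_regression / ms_residual

# The standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

# With a single predictor, the t for that predictor is the square root of F.
t_from_f = math.sqrt(f_stat)

print(f"R Square = {r_square:.3f}, R = {multiple_r:.3f}")
print(f"F = {f_stat:.1f}, t = {t_from_f:.2f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The recomputed values land close to the printed R Square (.262), F (1775.77), t (42.14), and standard error of the estimate (.725).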

  • Interpretation
    • To derive the first regression equation, we need the b coefficient and the constant from the Coefficients table.  The equation can be written in its linear form, [y] = a + b[x], as:
      • RawEqIndex = .556 + (.015)DemFeel
    • The regression coefficient is positive, indicating that the relationship between the variables is positive. Thus as Democratic Feeling increases, support for Economic Equality increases. Moreover, the equation tells us that for every one-unit increase in Democratic Feeling, Economic Equality increases by .015 units. The units referred to here are those in which each variable is measured.
    • Next, we look at the significance of the equation and the significance of the regression coefficient for the independent variable.  Since both are well below .05, the results of this regression analysis are unlikely to be due to sampling error. (In a bivariate regression the two tests are equivalent, since F equals the square of t, so they always agree.)
    • The magnitude of the beta value can be overlooked for now. It will become relevant when we have two or more independent variables using multivariate regression. Notice that the regression equation above incorporates the b, not the beta.
    • The R² is .262, which means that variation in Democratic Feeling explains roughly 26% of the variation in Economic Equality attitudes.
    • The interpretation of the second equation is up to you.
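
To make the first equation concrete, the Python sketch below plugs a few thermometer scores into it. It assumes the feeling thermometer runs from 0 to 100 (the usual ANES range, not stated in this lab) and uses a hypothetical helper name, `predict_raw_eq_index`.

```python
# Coefficients taken from the Coefficients table in the output.
CONSTANT = 0.556     # a
B_DEM_FEEL = 0.015   # b for DemFeel

def predict_raw_eq_index(dem_feel):
    """Predicted RawEqIndex from RawEqIndex = .556 + (.015)DemFeel."""
    return CONSTANT + B_DEM_FEEL * dem_feel

# Assumed thermometer scores: coldest, neutral, warmest.
for dem_feel in (0, 50, 100):
    predicted = predict_raw_eq_index(dem_feel)
    print(f"DemFeel = {dem_feel:>3}: predicted RawEqIndex = {predicted:.3f}")
```

Moving from the coldest to the warmest score raises the predicted index by 100 × .015 = 1.5 points, which is a sizable shift on an index built from three items that each range from 0 to 1.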

INSTRUCTIONS

  1. In regression analysis, as with every other method of explanatory analysis, we begin by hypothesizing a relationship between an independent and a dependent variable. For example, continuing to work with the ANES 2012, we may hypothesize that respondents' feelings toward the Democratic Party (independent) affect attitudes toward economic equality (dependent).
  2. It is also essential to recode each variable and identify missing values for both variables based upon their respective frequency analysis.
  3. Run the appropriate regression syntax in SPSS.
  4. In viewing your output, consider first whether the relationship meets the standards of statistical significance. The ANOVA table reports significance for the equation as a whole, and the Coefficients table reports it for the independent variable.
  5. Next find the unstandardized coefficient (the column labeled “B”). This is the slope of the line and should be interpreted as the predicted effect on the dependent variable by a one unit increase in the value of the independent variable. Be sure to note the direction of the relationship.
  6. Then check to see whether we can be confident that the results are not due to chance by checking the significance of that coefficient.
  7. Finally, assess the magnitude of the r-square to determine the percent of variance in the DV explained by variation in the IV. In bivariate regression, the multiple R equals the absolute value of the Pearson correlation coefficient (r).
  8. Repeat the analysis using another independent variable from the data set such as Finances.
  9. Write out the regression equation for the relationship and interpret the meaning of your results in terms of the effect of the independent variable on the dependent variable with reference to both b and r-square.
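
The quantities named in steps 5 and 7 can be traced by hand on a tiny example. The Python sketch below uses six invented data points (not ANES data) with a negative relationship, so that the difference between the sign of b and the always-positive multiple R is visible.

```python
import math

# Hypothetical data, invented for illustration only (not from the ANES).
x = [1, 2, 3, 4, 5, 6]
y = [9, 7, 8, 5, 4, 3]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                    # step 5: slope, with its direction
a = mean_y - b * mean_x          # constant
r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient

# Step 7: r-square as the share of variation in y explained by the line.
predicted = [a + b * xi for xi in x]
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_square = 1 - ss_residual / syy
multiple_r = math.sqrt(r_square)

print(f"y = {a:.1f} + ({b:.1f})x")
print(f"r = {r:.3f}, multiple R = {multiple_r:.3f}, r-square = {r_square:.3f}")
```

Here b and r are both negative, while the multiple R is positive and equal in size to r, and r-square equals r squared.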

QUESTIONS FOR REFLECTION

  • Do the magnitudes of the regression coefficient b and the constant depend on how your variables are coded?
  • What is the regression equation for your results and what is the meaning of each of the components?
  • How do we visualize regression results?

DISCUSSION

  • The values of the b coefficient and the constant do depend on how the variables are coded.  For example, if the variables are recoded into a few categories, both the b value and the r-square will be affected. Generally speaking, in regression it is preferable to use variables that retain as much of their original variation as possible.
  • It is up to you to calculate the effect of Economic Conditions on Economic Equality and decide whether it is more or less important than partisan feelings.
  • Regression results can be visualized in two ways.
    • The first way is to run a Graph command in SPSS.
        • The relevant syntax is:
    • GRAPH /scatterplot = ft_dem with RawEqIndex.

Note that the independent variable appears first in this procedure.

  • An alternative approach is to add a scatterplot subcommand to an existing regression procedure, after the /method= enter subcommand.
    • It takes the form:
      • /scatterplot = (RawEqIndex ft_dem).

Note the DV precedes the IV for this.

Especially when working with categorical variables, interested students may wish to create a “jittered scatter plot” by asking SPSS to add a small random number to each data value. To do so, compute new “jittered” variables and use them to visualize your results. For instance:

COMPUTE ft_demj = ft_dem + RV.UNIFORM(-0.3, +0.3).
COMPUTE RawEqIndexj = RawEqIndex + RV.UNIFORM(-0.3, +0.3).
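
For readers who want to see what the jittering does to the numbers, here is the same idea as a Python sketch. The data values and the helper name `jitter` are hypothetical; the only thing carried over from the SPSS syntax is the uniform random offset between -0.3 and +0.3.

```python
import random

random.seed(101)  # fixed seed so the sketch is repeatable

# Hypothetical values, invented for illustration only.
ft_dem = [0, 15, 50, 50, 85, 100]
raw_eq_index = [0.5, 1.0, 1.5, 1.5, 2.0, 3.0]

def jitter(values, amount=0.3):
    """Add a small uniform random offset in [-amount, +amount] to each value."""
    return [v + random.uniform(-amount, amount) for v in values]

ft_dem_j = jitter(ft_dem)
raw_eq_index_j = jitter(raw_eq_index)

# Identical cases (the two respondents at 50 and 1.5) no longer overlap exactly.
for original, jittered in zip(ft_dem, ft_dem_j):
    print(f"{original:>3} -> {jittered:.2f}")
```

The jittered variables are meant for plotting only; the regression itself should still be run on the original variables.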