POL242 LAB MANUAL: LAB 16
Bivariate Regression
PURPOSE
 To introduce regression analysis.
 To learn how to perform a regression analysis and interpret the results.
Part I–MAIN POINTS
 Regression is a technique that presents the relationship between two (or more) variables in the form of a simple linear function. The regression model finds the best-fitting equation by minimizing the sum of squared deviations between the observed and predicted values of the dependent variable. See, for example, the demonstration at: http://www.keypress.com/sketchpad/javasketchpad/gallery/pages/least_squares.php.
In bivariate analysis, regression takes the form:
 y = a + bx, where:
 y is the dependent variable;
 x is the independent variable;
 b is the unstandardized regression coefficient;
 a is an intercept or constant.
 Translating the equation into words, we have:
Value of the Dependent Variable = Intercept (Constant) + (Regression Coefficient times the Value of the Independent Variable)
 The regression equation allows us to predict the approximate value of the dependent variable given any value of the independent variable.
 We interpret regression results in terms of the implications for the dependent variable (y) of a unit increase in the independent variable (x).
 For instance, assume a regression equation is Income = 10000 + (5000 × Education). Beginning from a constant of 10000, every one-unit increase in education is associated with a predicted 5000-unit increase in income.
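As a hedged illustration of the prediction rule above, the Income equation can be evaluated at two education values. The values 12 and 13 are hypothetical, chosen only to show a one-unit step; they do not come from any dataset.

```latex
\begin{align*}
\text{Income} &= 10000 + 5000 \times \text{Education} \\
\text{Education} = 12: \quad \text{Income} &= 10000 + 5000(12) = 70000 \\
\text{Education} = 13: \quad \text{Income} &= 10000 + 5000(13) = 75000 \\
\text{Difference} &= 75000 - 70000 = 5000 = b
\end{align*}
```

The difference between the two predictions is exactly b, whichever pair of adjacent values is chosen.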
 Significance is assessed in two ways with regression: first for the equation as a whole, and second for each individual independent variable. If the significance level is greater than 0.05 for either of these measures, then we cannot be confident that the relationship described by the equation, or reflected in the individual regression coefficient, is different from zero.
 Also of particular interest in the regression output is the r-square value. Recall from our discussion of Pearson’s correlation coefficient (r) that r^{2} estimates the proportion of variance in the dependent variable that is explained. The higher the value of r^{2}, the better. However, with only one independent variable, the r-square value will likely be relatively low even when the independent variable is significant.
 The third important statistic in regression is the b value, which measures the effect of the independent variable on the dependent variable in terms of unit change.
 The b value is an unstandardized coefficient, meaning that it is expressed in the original units of measurement: the change in the dependent variable, in its own units, for each one-unit change in the independent variable. There is also a standardized version of b, called beta, which allows us to interpret regression in terms of standard deviation units. In this instance, every change of one standard deviation in the independent variable changes the dependent variable by beta standard deviations.
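The link between b and beta can be stated as a formula. This is the standard textbook relationship rather than something taken from this manual; s_x and s_y are symbols introduced here for the standard deviations of the independent and dependent variables.

```latex
\[
\beta = b \cdot \frac{s_x}{s_y}
\qquad\text{and, with a single independent variable,}\qquad
\beta = r
\]
```

With only one independent variable, beta therefore equals Pearson's r, which is why its magnitude matches the R reported in the model summary of a bivariate regression.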
 Similar to correlation, regression should technically be used only when both variables are measured at the interval level, though researchers very often use ordinal variables with many categories in regression as well. Having more than a few possible categories provides greater variation for explanation. This is particularly true for the dependent variable. So in our own work it is usually desirable to use an index as the dependent variable.
 As to independent variables, there is generally a bit more latitude with the level of measurement, insofar as researchers commonly use dichotomies (coded as zero or one) created from nominal data as independent variables.
EXAMPLE
Calculating Regression
 Dataset:
 CES 2011
 Dependent Variable:
 Egal (Alpha =.67)
 Indicators: PES11_41; mbs11_k2; mbs11_b3.
 Independent Variable:
 IV1: ConFeel (cps11_18)
 IV2: Personal Financial Situation (cps11_66)
 Hypothesis Arrow Diagram:
 H1: ConFeel → ~Egal (Cons Partisan Feeling ⇒ Less Egalitarian)
 H2: Pers Finance → Egal (Improved Finances ⇒ More Egalitarian)
 Syntax
weight by WGTSamp.
*Preparing indicators of Attitudes re Inequality*.
*declare missing values on pes11_41*.
missing values pes11_41 (8,9).
*reverse scoring on pes11_41 and make it range from 0-1*.
recode PES11_41 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow' .75 'somemore' 1 'muchmore'.
*rescale mbs11_k2 from 0-10 to 0-1 and reverse its scoring*.
missing values mbs11_k2 (99).
compute govact = (((mbs11_k2 * -1) + 10)/10).
value labels govact 0 'not act' 1 'gov act'.
*recode and relabel mbs11_b3.
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.
*create an indexed variable (alpha=.66).
compute rawegal = undogap + govact + goveqch.
fre var = rawegal.
*interval measure of partisan feeling from Lab 7*.
fre var cps11_18.
recode cps11_18 (0=0) (else = copy) into ConFeel.
missing values Confeel (996, 998, 999).
fre var Confeel.
*create finance measures (from Lab 7).
missing values cps11_66 (8,9).
recode cps11_66 (1=1) (3=0) (5=.5) into finances.
variable labels finances 'personal finances'.
value labels finances 0 'worse' .5 'same' 1 'better'.
*Regression Analyses for H1 AND H2*.
regression variables = rawegal ConFeel
  /dependent = rawegal
  /method = enter.
regression variables = rawegal Finances
  /dependent = rawegal
  /method = enter.
 Syntax Legend
 Missing values and recodes are declared and the DV index is constructed.
 The regression command’s first line specifies the included variables.
 The second line specifies the dependent variable.
 The third line says to enter the other variable as a predictor.
 Output
Model Summary^{b}
Model  R          R Square  Adjusted R Square  Std. Error of the Estimate
1      .384^{a}   .148      .147               .63641
a. Predictors: (Constant), ConFeel
b. Dependent Variable: rawegal
X → Y
ConFeel → RawEgal (beta = -.38)
ANOVA^{a}
Model          Sum of Squares  df   Mean Square  F        Sig.
1  Regression  58.238          1    58.238       143.789  .000^{b}
   Residual    336.032         830  .405
   Total       394.270         831
a. Dependent Variable: rawegal
b. Predictors: (Constant), ConFeel
Coefficients^{a}
                Unstandardized Coefficients  Standardized Coefficients
Model           B       Std. Error           Beta                       t        Sig.
1  (Constant)   2.584   .041                                            63.077   .000
   ConFeel      -.009   .001                 -.384                      -11.991  .000
a. Dependent Variable: rawegal
Y = a + bx
Rawegal = 2.58 + (-.009)ConFeel
or
Rawegal = 2.58 - .009ConFeel
 Interpretation
 To derive the first regression equation, we need the information about the b coefficient and the constant. The equation can be written in its linear form as [y] = a + b[x]:
 Rawegal = 2.58 + (-.009)ConFeel
or  Rawegal = 2.58 - .009ConFeel
 The regression coefficient is negative, indicating that the relationship between the variables is negative. Thus, as Conservative Feeling increases, Egalitarian attitude decreases. Moreover, the equation tells us that for every one-unit increase in Conservative Feeling, Egalitarianism decreases by .009 units. The units referred to here are those in which each variable is measured.
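A short worked example may help in reading the equation. It assumes that ConFeel runs from 0 to 100, which the syntax does not state explicitly; the values 0, 50 and 100 are illustrative only.

```latex
\begin{align*}
\widehat{\text{Rawegal}} &= 2.58 - .009 \times \text{ConFeel} \\
\text{ConFeel} = 0: \quad \widehat{\text{Rawegal}} &= 2.58 - .009(0) = 2.58 \\
\text{ConFeel} = 50: \quad \widehat{\text{Rawegal}} &= 2.58 - .009(50) = 2.13 \\
\text{ConFeel} = 100: \quad \widehat{\text{Rawegal}} &= 2.58 - .009(100) = 1.68
\end{align*}
```

Under that assumption, moving across the full range of ConFeel lowers predicted Rawegal by about .9 on an index that, as the sum of three items each scored from 0 to 1, can range from 0 to 3.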
 Next, we look at the significance of the equation and the significance of the regression coefficient for the independent variable. Since both are well below .05, we know the results of this regression analysis are unlikely to be due to sampling error. (In a bivariate regression, the two tests are equivalent, so one will not be significant if the other is not.)
 The magnitude of the beta value can be overlooked for now; it will become relevant when we have two or more independent variables in multivariate regression. Notice that the regression equation above incorporates the b, not the beta.
 The R^{2} is .148, which means that the variation in Conservative Feelings explains roughly 15% of the variation in Egalitarian Attitudes.
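The summary statistics can be cross-checked against the ANOVA table by hand. The sketch below uses only the reported, rounded figures, so the results agree with the SPSS output only approximately.

```latex
\begin{align*}
R^{2} &= \frac{SS_{\text{regression}}}{SS_{\text{total}}} = \frac{58.238}{394.270} \approx .148 \\
F &= \frac{MS_{\text{regression}}}{MS_{\text{residual}}} = \frac{58.238}{.405} \approx 143.8 \\
t^{2} &= (11.991)^{2} \approx 143.8 \approx F
\end{align*}
```

The last line shows why, with one independent variable, the test of the equation and the test of the coefficient lead to the same conclusion.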
 The interpretation of the second equation is up to you.
INSTRUCTIONS
 In regression analysis, as with every other method of explanatory analysis, we begin by hypothesizing a relationship between an independent and a dependent variable. For example, continuing to work with the CES 2011, we may hypothesize that respondents' Conservative Partisan Feeling (independent) affects Egalitarianism (dependent).
 It is also essential to recode each variable and identify missing values for both variables based upon their respective frequency distributions.
 Run the appropriate regression syntax in SPSS.
 In viewing your output, consider first the ANOVA table to see whether the equation as a whole meets the standard of statistical significance; the significance of the independent variable is reported in the Coefficients table.
 Next find the unstandardized coefficient (the column labeled “B”). This is the slope of the line and should be interpreted as the predicted change in the dependent variable for a one-unit increase in the value of the independent variable. Be sure to note the direction of the relationship.
 Then check to see whether we can be confident that the results are not due to chance by checking the significance of that coefficient.
 Finally, assess the magnitude of the r-square to determine the percent of variance in the DV explained by variation in the IV. In bivariate regression, the multiple R is equal to the absolute value of the correlation coefficient.
 Repeat the analysis using another independent variable from the data set such as Finances.
 Write out the regression equation for the relationship and interpret the meaning of your results in terms of the effect of the independent variable on the dependent variable with reference to both b and r-square.
QUESTIONS FOR REFLECTION
 Do the magnitudes of the regression coefficient b and the constant depend on how your variables are coded?
 What is the regression equation for your results and what is the meaning of each of the components?
 How do we visualize regression results?
DISCUSSION
 The values of the b coefficient and the constant do depend on how the variables are coded. For example, if the variables are recoded into a smaller number of categories, both the b values and r-square will be affected. Generally speaking, when we use regression it is preferable to use variables that retain as much of their original variation as possible.
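One way to see the dependence on coding is to rescale the independent variable and rerun the regression. The sketch below assumes ConFeel runs from 0 to 100; ConFeel01 is a hypothetical variable name introduced only for this illustration.

```spss
*rescale ConFeel to run from 0 to 1 (assumes an original 0-100 range).
compute ConFeel01 = ConFeel / 100.
*rerun the H1 regression with the rescaled predictor.
regression variables = rawegal ConFeel01
  /dependent = rawegal
  /method = enter.
```

Because one unit of ConFeel01 equals 100 units of ConFeel, b should come out 100 times larger (roughly -.9 rather than -.009), while the constant, beta, and r-square should be unchanged. A linear rescaling of this kind changes only the units of b; collapsing a variable into fewer categories discards variation, which is what alters r-square.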
 It is up to you to calculate the effect of Finances on Egalitarianism and decide whether it is more or less important than partisan feelings.
 Regression results can be visualized in two ways.
 The first way is to run a Graph command in SPSS.
 The relevant syntax is:
 GRAPH /scatterplot = finances with rawegal.
Note that the independent variable appears first in this procedure.
 An alternative approach is to add a scatterplot subcommand to an existing regression procedure, after the /method= enter subcommand.
 It takes the form:

/scatterplot = (rawegal finances).

Note the DV precedes the IV for this.
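Put together with the regression command from the example, the subcommand might be placed as follows. This is a sketch assembled from the pieces shown in this lab, not separately tested output.

```spss
*H2 regression with a scatterplot requested, DV listed before IV.
regression variables = rawegal finances
  /dependent = rawegal
  /method = enter
  /scatterplot = (rawegal finances).
```

Only the final subcommand ends with a period, since the period closes the whole command.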
With categorical variables, interested students may wish to create a “jittered scatter plot” by asking SPSS to add a small random number to each data value. To do so, compute and use new “jittered” variables to visualize your results. For instance:
COMPUTE financesj = finances + RV.UNIFORM(-0.3, +0.3).
COMPUTE rawegalj = rawegal + RV.UNIFORM(-0.3, +0.3).
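The jittered variables can then be plotted with the same Graph command used earlier. The sketch below simply substitutes the new variable names; because RV.UNIFORM draws random numbers, the plot will look slightly different on each run.

```spss
*scatterplot of the jittered variables, independent variable first.
graph /scatterplot = financesj with rawegalj.
```

The jittered variables are meant for display only; the regression itself should still be run on finances and rawegal.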