Poli 101 LAB MANUAL 16
Bivariate Regression
PURPOSE
 To introduce regression analysis.
 To learn how to perform a regression and interpret its results.
Part I – MAIN POINTS
 Regression is a technique that presents the relationship between two (or more) variables in the form of a simple linear function. The regression model finds the best-fitting equation by calculating the line from which the data points have the least (squared) deviations.
In bivariate analysis, regression takes the form:
 y = a + bx, where:
 y is the dependent variable;
 x is the independent variable;
 b is the unstandardized regression coefficient;
 a is an intercept or constant.
 Translating the equation into words, we have:
Value of the Dependent Variable = Intercept (Constant) + Regression Coefficient times the Value of the Independent Variable
 The regression equation allows us to predict the approximate value of the dependent variable given any value of the independent variable.
 We interpret regression results in terms of the implications for the dependent variable (y) of a unit change (increase or decrease) in the independent variable (x).
 For instance, assume a regression equation is Income = 10,000 + (5,000 × Education). Beginning from a constant of 10,000, every one-unit increase in education leads to a 5,000-unit increase in income.
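The arithmetic of this hypothetical equation can be sketched in a few lines of Python (the numbers come from the made-up example above, not from any dataset):

```python
# Hypothetical regression equation from the text: Income = 10,000 + 5,000 x Education
a = 10_000  # intercept (constant)
b = 5_000   # unstandardized regression coefficient

def predicted_income(education):
    """Predicted value of the DV for a given value of the IV."""
    return a + b * education

print(predicted_income(0))  # 10000: with zero education, only the constant remains
print(predicted_income(4))  # 30000: the constant plus four unit increases of 5,000 each
```

Each one-unit step in education moves the prediction by exactly b, which is the sense in which b measures unit change.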
 Significance is determined in two ways with regression: first for the equation as a whole, and second for each particular independent variable. If the significance level is greater than 0.05 for either of these measures, then we cannot infer that the increase described by the equation, or reflected in the individual regression coefficient, is different from zero in the population.
 Also of particular interest in the regression output is the r-square value. Recall from our discussion of Pearson's correlation coefficient (r) that r^{2} is an estimate of the explained variance. The higher the value of r^{2}, the better. However, with only one independent variable, the r-square value will likely be relatively low even when the independent variable is significant.
 The third important statistic in regression is the b value, which measures the effect of the independent variable on the dependent variable in terms of unit change.
 The b value is an unstandardized coefficient, meaning that it is measured in the units used to describe the independent variable. There is also a standardized version of b, called beta, which allows us to interpret regression in terms of standard deviation units. In this instance, every change of one standard deviation in the independent variable changes the dependent variable by beta standard deviations.
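As a concrete illustration of the relationship between b and beta, the following Python sketch computes b from a tiny made-up five-case dataset and then rescales it into beta using the two standard deviations (none of these numbers come from the lab's survey data):

```python
import statistics as st

x = [0, 1, 2, 3, 4]  # independent variable (illustrative values)
y = [1, 3, 2, 5, 4]  # dependent variable (illustrative values)

mx, my = st.mean(x), st.mean(y)
# Unstandardized b: covariation of x and y divided by the variation of x
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx  # intercept (constant)
# Beta re-expresses b in standard deviation units
beta = b * (st.stdev(x) / st.stdev(y))

print(round(b, 3), round(a, 3), round(beta, 3))
```

In bivariate regression, beta equals Pearson's r, which is why the Beta column in the output below matches the R of the corresponding Model Summary.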
 Similar to correlation, regression should technically be used only when both dependent and independent variables are measured at the interval level, though researchers very often use ordinal variables with many categories in regression as well. This is particularly true for the dependent variable, where having more than a few possible categories provides greater variation to explain. So in our own work it is usually desirable to use an index as the dependent variable.
 As to independent variables, there is generally a bit more latitude with the level of measurement, insofar as researchers commonly use dichotomies (coded as zero or one) created from nominal data as independent variables. This will be the subject of Lab 18.
EXAMPLE
Calculating Regression
 Dataset:
 PPIC October 2016
 Dependent Variable: Index of Support for Recreational Marijuana (RawMJ3)
 Index of Support for Recreational Marijuana (7 categories; Alpha = .777)
 Indicators: q21 recoded as MJPropD
 q36 recoded as MJLegalD
 q36a recoded as MJTry.
 Independent Variables:
 Partisan Identification
 Political Ideology
 Hypotheses Arrow Diagrams:
 H1: Democratic Party ID (5 categories coded 0–1) → Support for Recreational Marijuana
 H2: Liberal Ideology (5 categories coded 0–1) → Support for Recreational Marijuana (7 categories coded 0–3)
 Syntax
*Weighting the Data*.
weight by weight.

*Recoding MJ Index Items*.
recode q21 (1=1) (2=0) into MJPropD.
value labels MJPropD 1 'yes' 0 'no'.
recode q36 (1=1) (2=0) into MJLegalD.
value labels MJLegalD 1 'yes' 0 'no'.
recode q36a (1=1) (2=.5) (3=0) into MJTry.
value labels MJTry 1 'recent' .5 'not recent' 0 'no'.

*Constructing an Index with alpha = .777*.
compute RawMJ3 = (MJPropD + MJLegalD + MJTry).

*Creating IV Indicators of Party Identification & Ideology*.
recode q40c (1=0) (3=.5) (2=1) into Democrat.
value labels Democrat 1 'Democ' .5 'Indep' 0 'Repub'.
*Democrat5 (adapted from Lab 7)*.
if (q40c = 1) and (q40e = 1) Democrat5 = 0.
if (q40c = 1) and (q40e = 2) Democrat5 = .25.
if (q40c = 3) Democrat5 = .5.
if (q40c = 2) and (q40d = 2) Democrat5 = .75.
if (q40c = 2) and (q40d = 1) Democrat5 = 1.
value labels Democrat5 0 'strRep' .25 'Rep' .5 'Indep' .75 'Dem' 1 'strDem'.
recode q37 (1,2=1) (3=.5) (4,5=0) into liberal.
value labels liberal 1 'liberal' .5 'middle' 0 'conserv'.
recode q37 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into liberal5.
value labels liberal5 1 'vlib' .75 'liberal' .5 'middle' .25 'conserv' 0 'vcons'.

regression variables = RawMJ3 Democrat5
 /dependent = RawMJ3
 /method = enter.
regression variables = RawMJ3 liberal5
 /dependent = RawMJ3
 /method = enter.
 Syntax Legend
 Missing values and recodes are declared & DV index constructed.
 Two separate regressions are programmed to run.
 The regression commands’ first line specifies the included variables.
 The second line specifies the dependent variable. Note the raw (unrecoded) index is used.
 The third line says to enter the other variable specified as a predictor.
Output for First Regression 
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .209^{a}   .044       .043                1.12629
a. Predictors: (Constant), Democrat5
ANOVA^{a}
Model            Sum of Squares   df    Mean Square   F        Sig.
1  Regression    55.490           1     55.490        43.744   .000^{b}
   Residual      1218.702         961   1.269
   Total         1274.191         962
a. Dependent Variable: RawMJ3
b. Predictors: (Constant), Democrat5
Coefficients^{a}
                  Unstandardized Coefficients   Standardized Coefficients
Model             B        Std. Error           Beta        t        Sig.
1  (Constant)     1.108    .072                             15.355   .000
   Democrat5      .734     .111                 .209        6.614    .000
a. Dependent Variable: RawMJ3
 Interpretation
 The first table summarizes the regression model. The important interpretive elements here are the values of r and r-square. Note that these are both capitalized in the output but not in writing about them. The value of r can be rounded from .209 to .21, as we generally use two digits in reporting r. We can summarize this in an X→Y diagram as: Democratic ID .21→ RawMJ3.
 The value of r^{2} is .044, which means that the variation in Democratic identification explains just over 4% of the variation in attitudes on recreational marijuana as measured by our index.
 The second table reports information used in calculating the regression model. The most relevant part for us is the overall significance of the model. At less than one in a thousand (.000), this is well below .05, indicating that the results of the regression model as a whole are very unlikely to be due to sampling error.
 At a somewhat more technical level, the second table's sum of squares column contains information on the total variance of the DV (1274.2) as well as the unexplained or residual variance (1218.7) not accounted for by the model. Using the Total Variance − Unexplained Variance calculation explained in class, the Explained Variance = 55.5 (1274.2 − 1218.7 = 55.5). Moreover, the ratio of Explained Variance to Total Variance gives us the r^{2} value of .044 (55.5/1274.2 = .044).
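This decomposition can be verified directly from the ANOVA figures; here is a short Python check using only the numbers reported in the table above:

```python
# Sums of squares from the first regression's ANOVA table
total_ss = 1274.191     # total variation of the DV
residual_ss = 1218.702  # variation left unexplained by the model

explained_ss = total_ss - residual_ss  # regression (explained) sum of squares
r_squared = explained_ss / total_ss    # share of DV variance explained

print(round(explained_ss, 1))  # 55.5
print(round(r_squared, 3))     # 0.044
```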
 The third table provides the information needed to derive the regression equation. It contains the constant and the b coefficient. Both are significant, indicating that neither is likely due to sampling error. The equation using this information can be written in linear form as:
y = a + b(x). Hence: RawMJ3 = 1.108 + .734(Democrat5)
 The constant (1.108) tells us that when Democrat5 is scored as zero, support for RawMJ3 is approximately 1 on an index running from 0–3.
 The regression coefficient is positive, indicating that the relationship between the variables is positive: as Democratic identification increases, support for recreational marijuana increases. Moreover, the equation tells us that for every one-unit increase in Democrat5, RawMJ3 increases .734 units. The units referred to here are those in which each variable is measured.
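Plugging the endpoints of the Democrat5 scale into the estimated equation shows what the constant and the b coefficient jointly imply (coefficients taken from the output above):

```python
# Estimated equation from the first regression: RawMJ3 = 1.108 + .734(Democrat5)
a = 1.108  # constant
b = 0.734  # unstandardized coefficient for Democrat5

def predicted_rawmj3(democrat5):
    """Predicted support index (0-3 scale) for a Democrat5 score between 0 and 1."""
    return a + b * democrat5

print(round(predicted_rawmj3(0.0), 3))  # 1.108: predicted score for strong Republicans
print(round(predicted_rawmj3(1.0), 3))  # 1.842: predicted score for strong Democrats
```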
 In a bivariate regression, it is rare for the regression coefficient not to be significant when the overall model is significant.
 The magnitude of the beta value can be overlooked for now. It will become relevant when we have two or more independent variables in multivariate regression. Notice that the regression equation above incorporates b, not beta.
 The interpretation of the second equation is up to you.
Output for Second Regression
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .361^{a}   .130       .129                1.07650
a. Predictors: (Constant), liberal5
ANOVA^{a}
Model            Sum of Squares   df    Mean Square   F         Sig.
1  Regression    169.650          1     169.650       146.395   .000^{b}
   Residual      1134.606         979   1.159
   Total         1304.256         980
a. Dependent Variable: RawMJ3
b. Predictors: (Constant), liberal5
Coefficients^{a}
                  Unstandardized Coefficients   Standardized Coefficients
Model             B        Std. Error           Beta        t         Sig.
1  (Constant)     .832     .067                             12.476    .000
   liberal5       1.351    .112                 .361        12.099    .000
a. Dependent Variable: RawMJ3
INSTRUCTIONS
 In regression analysis, as with every other method of explanatory analysis, we begin by hypothesizing a relationship between an independent and a dependent variable. For example, continuing to work with the PPIC data, we may hypothesize that income or political interest affects attitudes about marijuana.
 It is also essential to recode each variable and identify missing values for both variables based upon their respective frequency analyses.
 Run the appropriate regression syntax in SPSS.
 In viewing your output, consider first the ANOVA table to see whether the relationship meets the standards of statistical significance for the equation as a whole and for the independent variable.
 Next find the unstandardized coefficient (the column labeled “B”). This is the slope of the line and should be interpreted as the predicted effect on the dependent variable of a one-unit increase in the value of the independent variable. Be sure to note the direction of the relationship.
 Then check to see whether we can be confident that the results are not due to chance by checking the significance of that coefficient.
 Finally, assess the magnitude of the r-square to determine the percent of variance in the DV explained by variation in the IV. The multiple r is equivalent to the correlation coefficient.
 Repeat the analysis using another independent variable from the data set such as Finances.
 Write out the regression equation for the relationship and interpret the meaning of your results in terms of the effect of the independent variable on the dependent variable with reference to both b and rsquare.
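For the curious, the core computations behind these steps can be sketched in Python with a small made-up sample standing in for the PPIC data (all values below are illustrative); SPSS's Sig. column performs the final step of comparing the F statistic to its sampling distribution:

```python
import statistics as st

iv = [0, 0, .25, .5, .5, .75, 1, 1]  # e.g., a 0-1 party identification score
dv = [0, 1, 1, 1.5, 2, 2, 2.5, 3]    # e.g., a 0-3 support index

n = len(iv)
mx, my = st.mean(iv), st.mean(dv)
# Slope b: covariation of IV and DV divided by the variation of the IV
b = sum((x - mx) * (y - my) for x, y in zip(iv, dv)) / sum((x - mx) ** 2 for x in iv)
a = my - b * mx  # constant

pred = [a + b * x for x in iv]                          # predicted DV values
ss_total = sum((y - my) ** 2 for y in dv)               # total variation
ss_resid = sum((y - p) ** 2 for y, p in zip(dv, pred))  # unexplained variation
r_squared = 1 - ss_resid / ss_total                     # explained share
f_stat = (ss_total - ss_resid) / (ss_resid / (n - 2))   # overall model test

print(round(b, 3), round(a, 3), round(r_squared, 3), round(f_stat, 1))
```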
QUESTIONS FOR REFLECTION
 Do the magnitudes of the regression coefficient b and the constant depend on how your variables are coded?
 What is the regression equation for your results and what is the meaning of each of the components?
 How do we visualize regression results?
DISCUSSION
 The value of the b coefficient and the constant do depend on how the variables are coded. For example, if the variables are recoded into fewer categories, both the b value and r-square will be affected. Generally speaking, when we use regression it is often preferable to use variables with as much of their original variation as possible.
 It is up to you to calculate the effect of ideology or income and decide whether it is more or less important than partisan identification.
 Regression results can be visualized in two ways.
 The first way is to run a Graph command in SPSS. The relevant syntax is:
GRAPH /scatterplot = IV with DV.
Note that the independent variable appears first in this procedure.
 An alternative approach is to add a scatterplot subcommand to an existing regression procedure, after the /method = enter subcommand. It takes the form:
/scatterplot = (DV IV).
Note that the DV precedes the IV for this.
In either case a regression line can be added by double-clicking on the scatterplot, which opens a Chart Editor page in SPSS. Immediately above the scatterplot, the fifth symbol from the left adds a fit line. (Hover your cursor over the symbols until you find the right one.)
Especially when working with categorical variables, interested students may wish to create a “jittered” scatterplot by asking SPSS to add a small random number to each data value. To do so, compute and use new “jittered” variables to visualize your results. For instance:
COMPUTE democrat5j = democrat5 + RV.UNIFORM(-0.15, +0.15).
COMPUTE RawMJ3j = RawMJ3 + RV.UNIFORM(-0.15, +0.15).
Note that the amount of jitter can be adjusted by altering the -/+ figures.
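The same jittering idea can be sketched in Python (the data values below are made up for illustration); each point gets a small uniform random offset so that overlapping category points separate visually:

```python
import random

random.seed(42)  # fixed seed so the jitter is reproducible
democrat5 = [0, 0, .5, .5, 1, 1]  # illustrative category values
rawmj3 = [0, 1, 1, 2, 2, 3]

jitter = 0.15  # adjust this figure to control the amount of jitter
democrat5_j = [x + random.uniform(-jitter, +jitter) for x in democrat5]
rawmj3_j = [y + random.uniform(-jitter, +jitter) for y in rawmj3]

# Every jittered value stays within `jitter` of its original category value,
# so the overall pattern of the data is preserved.
print(all(abs(jv - ov) <= jitter for jv, ov in zip(democrat5_j, democrat5)))
```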