Poli 101
Winter 2018
Homework 3:

Correlation and Regression (HW3)

Date Assigned: Week 7

Due Date: Week 8 


In this assignment, you will investigate the relationships between a dependent variable and at least three independent variables using correlation and multiple regression analyses. At this point in the course, your work should focus not only on the technical aspects of your analysis but also on providing a thoughtful account of your findings in the form of a coherent narrative that explains the relative impact of your independent variables on your dependent variable. Accordingly, you should provide three (or more) properly formulated hypotheses.

Some Tips and Reminders

Correlation coefficients are designed to summarize the association between two variables measured at the interval level. As such, they are ideal for working with aggregate data, where interval-level measures are the norm. However, correlations can also be useful in analyzing survey data, where ordinal data are common.

The dependent variable in regression analysis should ideally be measured at the interval level. However, an ordinal measure with multiple values and an approximately normal distribution is generally acceptable. Accordingly, when working with public opinion data, using an index as the dependent variable is strongly recommended. When using indices, please report standard distribution and reliability measures, as well as any recodes. Independent variables can be measured at either the interval or ordinal level, and nominal-level variables can be used if they are recoded as dummy variables (scored as 0 or 1).
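
The logic of dummy coding can be illustrated outside SPSS with a short Python sketch. It is only an illustration and does not replace the SPSS recodes required for this assignment. The function name `make_dummies`, the party-identification variable and its categories are hypothetical. Leaving one category out as the reference (baseline) group is standard practice for regression, although it is not spelled out above.

```python
def make_dummies(values, reference):
    """Recode a nominal variable into 0/1 dummy variables.

    One dummy is created for every category except `reference`, which
    serves as the baseline the other categories are compared against.
    Missing values (None) remain missing in every dummy.
    """
    categories = sorted(
        {v for v in values if v is not None and v != reference}
    )
    dummies = {}
    for category in categories:
        dummies[category] = [
            None if v is None else int(v == category) for v in values
        ]
    return dummies


# Hypothetical three-category nominal variable with one missing case.
party = ["Dem", "Rep", "Ind", "Dem", None, "Rep"]
dummies = make_dummies(party, reference="Ind")
for name, column in dummies.items():
    print(name, column)
```

Here the three-category variable yields two dummies, and independents are the group against which the other two are compared.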

In using SPSS to complete this assignment, it will be necessary to create (or edit and paste) correlation or regression input files in the syntax window. For correlation you may wish to calculate both ordinary and non-parametric coefficients. For regression, it is important to include both statistics and descriptives subcommands (/statistics coeff outs r tol; /descriptives = n).
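
To see what distinguishes an ordinary (Pearson) coefficient from a non-parametric (Spearman) one, the following Python sketch may help. It is not SPSS syntax and is not a substitute for the syntax you must submit. The variable names and data are hypothetical. Spearman's rho is computed here as Pearson's r applied to ranks, with tied values given the average of their ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: a 5-point ordinal item and an interval-level measure.
trust = [1, 2, 2, 3, 4, 5]
income = [10, 20, 25, 30, 80, 200]
print(round(pearson_r(trust, income), 3))
print(round(spearman_rho(trust, income), 3))
```

Because Spearman's rho uses only the rank order of the cases, it is less sensitive than Pearson's r to extreme values such as the largest income above.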

Keep in mind that high-quality tables are required, not raw output. Also be sure to include your syntax.

For this assignment you should use the 2016-17 PPIC survey data, the ANES2016, the PEW data, or the World or State data sets available on the course website, unless you have specific permission from the instructor to use another data set. As in all assignments, you may not use examples provided in the lectures or labs.

This assignment requires you to:

  1. Select or construct a dependent variable of interest and develop at least three hypotheses (identified as H1, H2, H3, etc.). As always, be sure to include a rationale for each hypothesis. At least one predictor (that is, one group of variables) in your analysis should consist of dummy variables derived from a nominal-level variable that has three or more categories or values. Please note that dummies derived from the same original variable count as a single predictor variable.
  2. Making any necessary recodes and declaring all missing values, use the proper syntax to produce a correlation matrix including your dependent variable and three (or more) independent variables. Report your results in a properly formatted table. An example is presented below.
  3. Referring to the correlation matrix and attending carefully to the variables' codings, describe your findings regarding each of the three (or more) hypothesized relationships in terms of their relative strength, direction and substantive meaning. Make reference to the explained variance and statistical significance of each relationship. Also briefly consider what can be learned from the portion of the correlation matrix formed by the pairings of the independent variables. For example, the independent variables may be relatively unrelated to one another, or they may be only slightly different indicators of the same underlying dimension. The latter can be particularly common with aggregate data.
  4. Examine the same relationships with multiple regression analysis, using at least three independent variables to explain variation in the dependent variable. Report your results in a properly formatted table (see below) and state the substantive conclusions you can draw from them. Please note the statistical significance (or insignificance) of both the regression equation as a whole and each of the three (or more) regression coefficients.
  5. Considering both unstandardized and standardized regression coefficients, discuss the relative influence of your independent variables in explaining variation in the dependent variable. Note whether your regression results confirm or differ from those of your correlation analysis.
  6. With specific reference to adjusted R-square, consider how well the combination of your independent variables explains variation in your dependent variable. Discuss any indications of collinearity in your model and how they were assessed. Do the inter-relationships in your correlation analysis provide any insight into possible issues of collinearity?
  7. Include properly formatted tables of both your correlation and regression results (see below).
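
Several of the quantities named in the steps above follow from simple formulas, sketched here in Python for reference. This is an illustration under stated assumptions, not part of the required SPSS workflow. The function names and input values are hypothetical and are not drawn from any course data set. Tolerance is the statistic requested by the `tol` keyword; its reciprocal is the variance inflation factor (VIF).

```python
def explained_variance(r):
    """Share of variance two variables have in common: r squared."""
    return r ** 2


def adjusted_r2(r2, n, k):
    """Adjusted R-square for n cases and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient B into a standardized Beta."""
    return b * sd_x / sd_y


def vif(tolerance):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1 / tolerance


# Hypothetical inputs chosen only to illustrate the arithmetic.
print(round(explained_variance(0.30), 2))
print(round(adjusted_r2(0.52, 101, 4), 3))
print(standardized_beta(2.0, 3.0, 12.0))  # 0.5
print(vif(0.25))                          # 4.0
```

Adjusted R-square is always somewhat lower than R-square when at least one predictor is included, and the gap grows as predictors are added relative to the number of cases. A tolerance near zero (a large VIF) indicates that a predictor is largely redundant with the others in the model.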

Examples of properly formatted tables:

Table 1. Correlations among Political Tolerance and Three Demographic Variables

             Tolerance      Age             Income
Age          .042 (.000)
Income       .070 (.000)    -.081 (.156)
Education    .086 (.043)    -.208 (.000)    .332 (.000)

Note: Cell entries are Pearson correlation coefficients with statistical significance indicated in parentheses.

Table 2. Prediction of Tolerance with Age, Income and Education

               B         Std. Error    Beta
Age            .153**    (.010)        .061
Income         .010**    (.000)        .047
Education      .017*     (.001)        .084
Constant       .353**    (.006)
N              2,368
R2             .013
Adjusted R2    .013

*** p < .001; ** p < .01; * p < .05

Please note: Although presenting specific significance levels is often preferred, in order to reduce clutter the statistical significance of relationships summarized in correlation and regression tables is sometimes indicated by affixing asterisks or stars (*) to the relevant coefficients. In the social sciences three particular thresholds are common:

  a) If the significance level (p-value) is .001 or less, three stars (***) are used;
  b) If the significance level (p-value) is between .001 and .01, two stars (**) are used;
  c) If the significance level (p-value) is between .01 and .05, one star (*) is used.
  If the relationship fails to reach the threshold of .05, no stars are used.
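
The star convention can be expressed as a small Python helper, shown here as a sketch rather than a required tool. The function names are hypothetical. One assumption is made explicit: each threshold is treated as inclusive (for example, p = .05 earns one star), which is one reasonable reading of the rules above.

```python
def significance_stars(p):
    """Return the star notation for a p-value.

    Assumption: each cutoff is inclusive (p <= cutoff earns the stars).
    """
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


def format_coefficient(value, p):
    """Format a coefficient for a table: three decimals, no leading zero."""
    text = f"{value:.3f}"
    if text.startswith("0."):
        text = text[1:]
    elif text.startswith("-0."):
        text = "-" + text[2:]
    return text + significance_stars(p)


# Hypothetical coefficient and p-value pairs.
for value, p in [(0.086, 0.043), (-0.208, 0.0004), (-0.081, 0.156)]:
    print(format_coefficient(value, p))
```

Whichever convention you adopt at the boundaries, state it in the note beneath the table so that readers can interpret the stars.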