Identifying Multicollinearity in Multiple Regression
Statistics Help for Dissertation Students & Researchers
How to Identify Multicollinearity
You can assess multicollinearity by examining tolerance and the Variance Inflation Factor (VIF), two collinearity diagnostics that can help you identify it. Tolerance is a
measure of collinearity reported by most statistical programs, such as SPSS; a
variable's tolerance is 1 - R², where R² comes from regressing that variable on
all the other independent variables. A small tolerance value indicates that the
variable under consideration is almost a perfect linear combination of the
independent variables already in the equation and that it should not be added to
the regression equation. All variables involved in the linear relationship will
have a small tolerance. Some suggest that a tolerance value less than 0.1 should
be investigated further. If a low tolerance value is accompanied by large
standard errors and nonsignificance, multicollinearity may be an issue.
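As a quick check outside SPSS, you can compute tolerance directly. Below is a minimal Python sketch, assuming pandas and statsmodels are available; the predictors x1, x2, and x3 are hypothetical simulated data, with x2 built to be nearly collinear with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: x2 is constructed to be nearly collinear with x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Tolerance for each predictor: 1 - R² from regressing it on the others
for col in X.columns:
    others = sm.add_constant(X.drop(columns=col))
    r2 = sm.OLS(X[col], others).fit().rsquared
    print(f"{col}: tolerance = {1 - r2:.4f}")
```

With data like this, x1 and x2 should show tolerances near zero, well below the 0.1 rule of thumb mentioned above, while x3 stays close to 1.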
The Variance Inflation Factor (VIF)
measures the impact of collinearity among the variables in a regression model.
The VIF is 1/Tolerance, so it is always greater than or
equal to 1. There is no formal VIF cutoff for determining the presence of
multicollinearity. Values of VIF that exceed 10 are often regarded as indicating
multicollinearity, but in weaker models values above 2.5 may be a cause for
concern. In many statistics programs, the results are shown both as an
individual R² value (distinct from the overall R² of the model) and a Variance
Inflation Factor (VIF). When those R² and VIF values are high for any of the
variables in your model, multicollinearity is probably an issue. When VIF is
high, there is high multicollinearity and instability of the b and beta
coefficients. It is often difficult to sort this out.
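Because VIF is just 1/Tolerance, it can be computed the same way, or with the helper that statsmodels provides. Here is a minimal sketch on the same kind of hypothetical simulated data as above (x2 nearly collinear with x1):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors; x2 is nearly collinear with x1
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=100), "x3": rng.normal(size=100)})
X["x2"] = X["x1"] + rng.normal(scale=0.1, size=100)

exog = sm.add_constant(X)  # include an intercept, as in the fitted model
vif = pd.Series(
    [variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])],
    index=exog.columns,
).drop("const")  # the intercept's VIF is not of interest
print(vif)  # x1 and x2 should far exceed the VIF = 10 rule of thumb
```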
You can also assess multicollinearity in regression in the
following ways:
1. Examine the correlations (and, for nominal variables, associations) between
independent variables to detect a high level of association. High bivariate
correlations are easy to spot by running correlations among your variables; see
the sketch after this list. If high bivariate correlations are present, you can
delete one of the two variables. However, this may not always be sufficient.
2. Regression coefficients will change dramatically depending on whether other
variables are included in or excluded from the model. Play around with this by
adding and then removing variables from your regression model.
3. The standard errors of the regression coefficients will be large if
multicollinearity is an issue.
4. Predictor variables known to have strong relationships with the outcome
variable fail to reach statistical significance. With two collinear predictors,
for example, neither may contribute significantly to the model after the other
one is included, yet together they contribute a lot; if you removed both
variables from the model, the fit would be much worse. So the overall model
fits the data well, but neither X variable makes a significant contribution
when it is added to your model last. When this happens, multicollinearity may
be present.
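The following sketch ties these checks together on hypothetical simulated data (not output from any particular study): two strongly correlated predictors show a high bivariate correlation, unstable coefficients, and inflated standard errors, even though the overall model fits well:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: x2 is nearly a copy of x1, and y depends on x1
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
y = 2 * x1 + rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2})

print(X.corr())  # check 1: high bivariate correlation between x1 and x2

full = sm.OLS(y, sm.add_constant(X)).fit()
reduced = sm.OLS(y, sm.add_constant(X[["x1"]])).fit()

# Checks 2 and 3: coefficients shift and standard errors shrink
# dramatically when the collinear variable is dropped
print(full.params, full.bse)
print(reduced.params, reduced.bse)

# Check 4: the overall F-test is highly significant even when the
# individual t-tests for x1 and x2 may not be
print(full.f_pvalue, full.pvalues)
```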
Request Research & Statistics Help Today!