

R-squared does not indicate if a regression model provides an adequate fit to your data. A biased model, for instance, can have a high R-squared value!

Are Low R-squared Values Always a Problem?

No! Regression models with low R-squared values can be perfectly good models for several reasons.

Some fields of study have an inherently greater amount of unexplainable variation. In these areas, your R-squared values are bound to be lower. For example, studies that try to explain human behavior generally have R-squared values less than 50%. People are just harder to predict than things like physical processes.

Fortunately, if you have a low R-squared value but the independent variables are statistically significant, you can still draw important conclusions about the relationships between the variables. Statistically significant coefficients continue to represent the mean change in the dependent variable given a one-unit shift in the independent variable. Clearly, being able to draw conclusions like this is vital.

Related post: How to Interpret Regression Models that have Significant Variables but a Low R-squared

There is a scenario where small R-squared values can cause problems. If you need to generate predictions that are relatively precise (narrow prediction intervals), a low R-squared can be a showstopper.

How high does R-squared need to be for the model to produce useful predictions? That depends on the precision that you require and the amount of variation present in your data. A high R-squared is necessary for precise predictions, but it is not sufficient by itself, as we’ll uncover in the next section.

Related posts: Understand Precision in Applied Regression to Avoid Costly Mistakes and Mean Squared Error (MSE)

Are High R-squared Values Always Great?

No! A regression model with a high R-squared value can have a multitude of problems. You probably expect that a high R-squared indicates a good model, but examine the graphs below. The fitted line plot models the association between electron mobility and density.

The data in the fitted line plot follow a very low noise relationship, and the R-squared is 98.5%, which seems fantastic. However, the regression line consistently under-predicts and over-predicts the data along the curve, which is bias. The Residuals versus Fits plot emphasizes this unwanted pattern. An unbiased model has residuals that are randomly scattered around zero. Non-random residual patterns indicate a bad fit despite a high R-squared.

This type of specification bias occurs when your linear model is underspecified. In other words, it is missing significant independent variables, polynomial terms, and interaction terms.
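To make the idea that a biased model can still have a high R-squared concrete, here is a minimal sketch in Python with NumPy. It uses synthetic curved data, not the electron mobility data discussed in this post, and the helper names (r_squared, fit_polynomial, segment_means) are made up for illustration. A straight line fitted to the curved data earns a high R-squared, yet its residuals are systematically positive at both ends and negative in the middle. Adding the missing squared term removes that pattern.

```python
import numpy as np


def r_squared(y, fitted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns the fitted values at x."""
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x)


def segment_means(residuals, n_segments=5):
    """Mean residual in consecutive slices of the x-range (x is sorted)."""
    return [float(np.mean(seg)) for seg in np.array_split(residuals, n_segments)]


# Synthetic data (an assumption for illustration): a gently curved,
# low-noise relationship between x and y.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# Underspecified model: straight line only.
linear_fit = fit_polynomial(x, y, 1)
# Model that includes the missing polynomial (squared) term.
quadratic_fit = fit_polynomial(x, y, 2)

r2_linear = r_squared(y, linear_fit)
r2_quadratic = r_squared(y, quadratic_fit)

# A crude numeric stand-in for a Residuals versus Fits plot:
# average residual in each fifth of the x-range.
linear_segments = segment_means(y - linear_fit)
quadratic_segments = segment_means(y - quadratic_fit)

print(f"Linear fit R-squared:    {r2_linear:.3f}")
print(f"Quadratic fit R-squared: {r2_quadratic:.3f}")
print("Mean residual by segment (linear):   ", np.round(linear_segments, 2))
print("Mean residual by segment (quadratic):", np.round(quadratic_segments, 2))
```

If the segment means for the linear fit swing from positive to negative and back while those for the quadratic fit hover near zero, the straight line is biased even though its R-squared looks impressive. In practice you would plot the residuals against the fitted values rather than summarize them by segment; the segment means are only a compact way to show the same pattern in text.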
