Why Machine Learning
This lesson will focus on why we need machine learning models for predictions.
We'll cover the following
Issues with Regression
In the previous chapter, we learned how to use linear and logistic regression models for making predictions from data. But there are some issues with the regression framework. We will look at these issues one by one.
Non-linear relationships
By design, linear regression explores linear relationships between the dependent and independent variables. It assumes that there is a straight-line relationship between the variables and tries to find the line that best fits the data. Sometimes, it is not the case that variables follow a linear relationship. For instance, the relationship between age and income is not linear. Income rises exponentially during the early years and then grows almost linearly at the later stages.
Linear Regression parameters converge on the mean of the predicted variable
Linear regression finds the mean of the dependent variable since the error at the mean is relatively less to all points as compared to some other value. That is why in our tips example, the parameter chosen was close to the mean of the percentage tip. However, looking at extremes is also important.
Sensitivity to outliers
Linear regression is sensitive to outliers since it looks at the mean of the data. The best fit line can change direction to try to fit outliers.
Independent data assumption
Linear regression assumes that there is no significant relationship between the dependent variables. However, that is not always the case. Correlated independent variables affect the performance of linear regression models. This problem is also known as multi collinearity in statistics. Although there are ways to handle this, the performance of linear regression is not satisfactory when the dependent variables have relationships among themselves.
Because of these issues present in typical data, the performance of linear regression is often not very good. Linear regression works best when there are linear relationships between the dependent and independent variables, and the data contains no outliers. Therefore, we need another framework of predictive models that perform better. This is where machine learning comes in.
Create a free account to view this lesson.
Continue your learning journey with a 14-day free trial.
By signing up, you agree to Educative's Terms of Service and Privacy Policy