Multiple perspectives of Linear Regression

Vikas Jha
3 min read · May 12, 2021

Introduction

As a student of econometrics, data science, or statistics, one of the first and most fundamental algorithms one comes across is Linear Regression. Linear Regression is a supervised learning approach for a quantitative response or output. As the name suggests, linear regression approximates a quantitative response Y based on a set of inputs or features, represented by X, assuming that a linear relationship exists between X and Y.

Fig 1: Example of Simple Linear Regression (source: G. James et al., An Introduction to Statistical Learning: with Applications in R)

For instance, Fig 1 plots sales as a function of advertising budget. Individual data points are shown as red points. The grey line represents sales as a linear function of the TV advertising budget. This linear function is the best representation of sales as a function of the TV advertising budget in the sense that it minimizes the sum of squared errors (represented by the light grey vertical lines).
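As a rough illustration, a fit like the one in Fig 1 can be reproduced with a few lines of Python. The actual Advertising dataset is not included here, so the numbers below are synthetic stand-ins:

```python
import numpy as np

# Synthetic stand-in for the Advertising data: sales as a noisy
# linear function of the TV advertising budget.
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=200)                  # TV budget
sales = 7.0 + 0.05 * tv + rng.normal(0, 3, 200)     # "true" line plus noise

# Fit the least-squares line: the slope and intercept that minimize
# the sum of squared vertical distances (the grey lines in Fig 1).
slope, intercept = np.polyfit(tv, sales, deg=1)
residuals = sales - (intercept + slope * tv)
print(slope, intercept, np.sum(residuals ** 2))
```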

Why multiple perspectives?

A concept can be derived perfectly well by mathematical manipulation of formulae. However, that does not necessarily lead to a conceptual understanding of its intricacies. Visiting a problem from multiple perspectives helps in understanding the fundamental idea behind the concept and broadens the scope in which it can be applied.

Multiple perspectives

1. Minimization of Mean Square Error

The minimization is performed analytically using calculus: the sum of squared errors is differentiated with respect to the coefficients and the derivatives are set to zero. This yields the normal equations XᵀXβ = Xᵀy and hence the closed-form solution β = (XᵀX)⁻¹Xᵀy.
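A minimal sketch in Python (on synthetic data, so the variable names and numbers are only illustrative) solves these normal equations directly:

```python
import numpy as np

# Synthetic data: two inputs plus an intercept column.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n),             # intercept
                     rng.normal(size=n),     # x1
                     rng.normal(size=n)])    # x2
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Setting the gradient of the squared error to zero gives the normal
# equations X'X beta = X'y; solve them for the coefficient vector.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true
```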

2. Using Projection

Another perspective on linear regression is through the projection of the output vector onto the hyperplane spanned by the input vectors. For simplicity, let us assume that we have two input variables (x1 & x2) and an output variable y.

Fig 2: Projection of y onto the hyperplane spanned by X (source: G. James et al., An Introduction to Statistical Learning: with Applications in R)

The best possible estimate of y, using x, is the projection of y onto the hyperplane spanned by x. In the current case, it is the projection of y onto the 2-d plane spanned by x1 & x2.

The error term is the normal (perpendicular) dropped from y onto the 2-d plane (x1-x2), and the idea generalizes to higher dimensions. The coefficient vector is the set of weights with which the input variables in x are combined to form this projection.
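To make the geometry concrete, the sketch below (again on synthetic, illustrative data) constructs the projection explicitly, checks that the residual is orthogonal to the inputs, and confirms that the implied coefficients match the least-squares ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Projection (hat) matrix onto the column space of X.
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y                 # projection of y onto the plane spanned by X
residual = y - y_hat          # the "normal" from the plane to y

# The residual is orthogonal to every column of X (up to numerical error).
print(X.T @ residual)

# The coefficient vector expresses y_hat as a combination of X's columns;
# it coincides with the least-squares solution.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta_hat, y_hat))   # True
```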

3. As an approximate Normal distribution

The OLS model can also be written as:

y ~ N(𝑋𝛽, σ²)

Linear regression can thus be thought of as modelling y as a normal distribution with mean 𝑋𝛽 and variance σ², where σ² is the variance of the error term e.

For the univariate case, if the inputs and outputs are centered by subtracting their means, y is approximated by a univariate normal distribution.
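As a sketch of this equivalence (using synthetic data and SciPy's general-purpose optimizer, purely for illustration), maximizing the Gaussian log-likelihood of y given 𝑋𝛽 recovers the same coefficients as least squares:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

def neg_log_likelihood(params):
    # Model y ~ N(X beta, sigma^2 I); params = (beta..., log_sigma).
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

result = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1] + 1))
beta_mle = result.x[:-1]

# The maximum-likelihood coefficients match the least-squares ones.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta_mle, 3), np.round(beta_ols, 3))
```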

Conclusion

Although the three perspectives on Linear Regression might seem different on the surface, the underlying concept of approximating the output as a linear combination of the inputs is the same, and the coefficients obtained are identical in all three cases. In fact, if we derive the formulae for the coefficients analytically from methodologies 1 & 2 (minimization of MSE & projection), we arrive at the same result. The same can be verified for methodology 3 as well.
