For example, consider what happens when $d=2$ and $x_2$ is highly correlated with $x_1$, so that the data lie close to a line, as shown in the left panel of the figure below. Many different hyperplanes then fit the data almost equally well, so there isn't a unique best hyperplane. Such correlations occur often in real-life data because of underlying common causes; for example, across a population, people's height may depend on both age and amount of food intake in the same way. This is especially the case when many feature dimensions are used in the regression. Mathematically, this leads to $X^TX$ being close to singular, so that $(X^TX)^{-1}$ is undefined or has huge entries, resulting in unstable models (see the middle panel of the figure and note the range of the $y$ values: the slope is huge!):
remove the redundant feature
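The near-singularity described above can be checked numerically. The following is a minimal NumPy sketch, not the method used to produce the figure: the synthetic data, the noise levels, and the helper name `fit_ols` are all assumptions chosen for illustration. It builds two almost identical features, compares the condition number of the resulting matrix with and without the second feature, and refits after slightly perturbing the targets.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical helper: solve the normal equations (X^T X) theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


rng = np.random.default_rng(0)
n = 50

# Assumed synthetic data: x2 is x1 plus a tiny amount of noise,
# so the two features are almost perfectly correlated.
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)
y = x1 + 0.1 * rng.normal(size=n)

# Two-feature design matrix (n rows, d = 2 columns, no offset term).
X_full = np.column_stack([x1, x2])
cond_full = np.linalg.cond(X_full.T @ X_full)
theta_full = fit_ols(X_full, y)

# Refit on slightly perturbed targets to see how much the weights move.
y_perturbed = y + 0.1 * rng.normal(size=n)
theta_full_perturbed = fit_ols(X_full, y_perturbed)

# Same regression using only x1, i.e. with the near-duplicate feature dropped.
X_single = x1.reshape(-1, 1)
cond_single = np.linalg.cond(X_single.T @ X_single)
theta_single = fit_ols(X_single, y)
theta_single_perturbed = fit_ols(X_single, y_perturbed)

print("condition number, two features:", cond_full)
print("condition number, one feature: ", cond_single)
print("weights, two features:", theta_full, "->", theta_full_perturbed)
print("weights, one feature: ", theta_single, "->", theta_single_perturbed)
```

In this setup the two-feature matrix is badly conditioned, so its weights can change a great deal when the targets are perturbed, while the single-feature fit stays close to the slope used to generate the data.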