The residual for each observation is the difference between the observed value of $y$ (the dependent variable) and the value of $y$ predicted by the regression line. \begin{align} \text{Residual}&=\text{actual } y \text{ value} - \text{predicted }y \text{ value} \text{,}\\ r_i&=y_i-\hat{y_i} . \end{align}
A negative residual means that the predicted value is too high; similarly, a positive residual means that the predicted value is too low. The aim of a regression line fitted by least squares is to minimise the sum of the squared residuals, $\sum r_i^2$, not the plain sum of residuals, in which positive and negative residuals would cancel each other out.
Knowing that \[r_i=y_i-\hat{y_i}\] and knowing that the regression line has the equation \[\displaystyle \hat{y_i}=a+b{x_i}\] we calculate the residual of an observation as follows: \[r_i=y_i-\hat{y_i}=y_i-(a+bx_i).\]
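The formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `predict` and `residual`, and the example line $\hat{y}=1+2x$, are made up here and are not part of the original material.

```python
def predict(a, b, x):
    """Predicted value on the regression line y_hat = a + b*x."""
    return a + b * x


def residual(a, b, x, y):
    """Residual r = y - y_hat, i.e. observed value minus predicted value."""
    return y - predict(a, b, x)


# Hypothetical line y_hat = 1 + 2x and a hypothetical observation (x, y) = (3, 10).
# The prediction is 1 + 2*3 = 7, so the residual is 10 - 7 = 3.
print(residual(1.0, 2.0, 3.0, 10.0))  # 3.0
```

A positive result, as here, means the line predicted too low a value; a negative result means it predicted too high a value.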
To see how students' physical ability has increased over a four-year period, ten students completed an obstacle course and then four years later they took the same course again. Here are their times:
Student | Debbie | Edna | Jerry | Norman | Joseph | Betty | Susan | Marilyn | Bert | Alice |
---|---|---|---|---|---|---|---|---|---|---|
First Test, $x$ (seconds) | $67$ | $53$ | $68$ | $57$ | $71$ | $74$ | $63$ | $75$ | $66$ | $66$ |
Second Test, $y$ (seconds) | $46$ | $29$ | $37$ | $44$ | $41$ | $35$ | $41$ | $43$ | $33$ | $36$ |
The equation of our regression line is $\hat{y}=23.91+0.22x$. What is Betty's predicted time for the second test, and what is her residual?
Using our regression line equation, we can calculate the predicted value, $\hat{y}$, by substituting in our value for $x$ (Betty's time on the first test, $74$ seconds).
\begin{align} \hat{y_i}&=a+b{x_i}\\ &=23.91+0.22x_i\\ &=23.91+0.22\times74\\ &=40.19 \end{align}
Betty's observed time on the second test is $y_i=35$ seconds, so the residual is
\begin{align} r_i&=y_i-\hat{y_i}\\ &=35-40.19\\ &=-5.19 \end{align}
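The same calculation can be repeated for every student in the table. The following Python sketch assumes the regression line $\hat{y}=23.91+0.22x$ given above and uses the times from the table; the variable names are chosen here purely for illustration.

```python
# Times from the table (seconds)
names = ["Debbie", "Edna", "Jerry", "Norman", "Joseph",
         "Betty", "Susan", "Marilyn", "Bert", "Alice"]
first_test = [67, 53, 68, 57, 71, 74, 63, 75, 66, 66]   # x
second_test = [46, 29, 37, 44, 41, 35, 41, 43, 33, 36]  # y

# Coefficients of the regression line y_hat = a + b*x, as stated in the text
a, b = 23.91, 0.22

residuals = {}
for name, x, y in zip(names, first_test, second_test):
    y_hat = a + b * x                    # predicted second-test time
    residuals[name] = round(y - y_hat, 2)  # observed minus predicted
    print(f"{name:8s} predicted = {y_hat:.2f}  residual = {residuals[name]:+.2f}")

# Betty's row reproduces the worked example: predicted 40.19, residual -5.19
```

Betty's negative residual means the line predicted a slower time than she actually achieved.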
This is a video example involving calculating residuals produced by Alissa Grant-Walker.