Tuesday, July 7, 2015

Trend line and regression analysis. How to do it easily

Today I came across a thesis os a student who used this and it was a long forgotten topic for me. nevertheless, it was refreshing to read and revisit this very simple and statistically inclined topic that we encounter in day to day life. So what is a trend line?

We all knowingly or unknowingly try to predict future from our past experience. For example if a student is scoring good grades in previous years, we tend to predict what his/her grades are going to be in coming years. There could be many more such examples as predicting the stock market, predicting weather and so on. So, what we do internally is generate a trend and try to fit in the future based on that trend.

So, let take a simple example of scores of a student:

1      300
2      340
3      320
4      400
5      420
9      500

So, here these are some values/scores the student had in the months in  the left column. Eye balling this data what we see is, there is a gradual improvement in scores. And we also observe that there are some missing points here (6,7,8). So, can we predict the score for the 10th month? Can we say what the score would have been on 6th, 7th and 8th month?

So, in this case, we may like to fit this into a trendline. The easiest way to do it is to use excel. So lets fill in the data in an excel sheet as shown below: [figure 1]

So, now the formula is Y = 25.5X + 278 at R2=0.928. Using this equation, we can predict the scores on 6th month e.g; Y = 25.5 X 6 + 278 = 309.5;
On 7th month it is = 25.5 X 7 + 278 = 454.4
On 8th month it is = 25.5 X 8 + 278 = 479.6
On 10th month, it is going to be 25.5 X 10 + 278 = 533

Now putting these values back into excel we get the following figure (Figure - 2)
So, what we see there? The equation and R2 has changed!! Slightly though.. So, what does it tell us? Probably the linear trendline is not correct for this data series.
Figure 1: Excel screen shot of how to select trendline in excel and display the formula.

Figure -2: Changed R value as well as trend line formula