Excel Tutorial: Statistical Functions

These are some of the common statistical functions in Excel that are useful in investing. Again, these are not the only ones to be found in Excel, I’m just filtering the onses most useful to investors.

AVERAGE(number1,number2...)             Returns the average or arithmatic mean of the series.
GEOMEAN(number1,number2...)             Returns the geometric mean of the series.  See Note 1.
                                                           for difference between geometric and arithmatic mean.
MEDIAN(number1,number2...)              Returns the median (i.e. midpoint) of the series
MAX(number1,number2...)                   Returns the maximum value in the series
MIN(number1,number2...)                    Returns the minimum value in the series
COUNT(number1,nubmer2...)               Returns the number of values in the series.
FORECAST(X,known_X,known_Y)             Extrapolates Y given a series of known X and Y values.
                                                             Uses linear regression (i.e. fitting a straight line
                                                             through the data) to project forward results.
TREND(known_y,known_x,X,const)          Extrapolates Y given a series of known X and Y values.
                                                            Similar to FORECAST(), TREND() uses linear regression.
GROWTH(known_y,known_x,x,const)     Extrapolates Y given a series of known X and Y values.
                                                          Similar to TREND(), but GROWTH() fits an exponential
                                                          curve through the data to project forward results.
LINEST(known_y,known_x,const,stats)     Returns the statistical information on the line that
                                                            best fits the data.  This function must be entered as
                                                            an array formula.  See Note 2.
LOGEST(known_y,known_x,const,stats)    Returns the statistical information on the exponential
                                                             curve that best fits the data.  This function must be
                                                             entered as an array formula.  See Note 2.

number1, number2, etc. are variables that can be entered directly into the equation or can be replace with a cell reference or range of cells.

known_X, known_Y are variables (usually a range of cells) that are input into the regression analysis.

X is the next X variable. Typically TREND(), GROWTH() etc. solve for an extrapolated Y value given the known_Xs & known_Ys and an assumed X. For example the annual returns on a portfolio are 3%, 17%, 23%, & -15%. These would be the known_Y values, the known_X values are the years 1, 2, 3, & 4. X would be the year for which you want TREND() to project a corresponding Y value.

Const is an optional TRUE/FALSE variable which can be excluded. It is used in the line/curve fitting function to force the y-interect to go through 0. If the flag is set to TRUE, the function will return a slope for a line/curve through the origin.

Stats is an optional TRUE/FALSE variable which can be excluded. If it is set to TRUE, additional statistical information (mn,mn-1,…,m1,b;sen,sen-1,…,se1,seb;r 2,sey) will be provided by the function.

Note 1:

When talking about average growth rates there are two types, arithmatic mean and geometric mean. Arithmatic mean is what is commonly known as the average, and is calculated from summing all the terms and dividing by the number of terms (i.e. (X + Y + Z) / 3). Geometric mean is calculated by multiplying all the terms and taking the n_th root of the product (i.e. (X * Y * Z) ^ (1/3) ).

The significance of this is in growth rates. Growth rates fluctuates from year to year and trying to nail down an average growth rate over a number years is a bit difficult if you use arithmatic mean. Take the following example of a portfolio’s 10 year performance.

Year	Growth	 Value
0 		     $10,000
1	5%	  $10,500
2	8%	  $11,340
3	12%	 $12,701
4	23%	 $15,622
5	-2%	 $15,310
6	-6%	 $14,391
7	35%	 $19,428
8	10%	 $21,371
9	16%	 $24,790
10	1%	 $25,038

Average Growth (arithmatic mean):

(5% + 8% + 12% + 23% – 2% – 6% + 35% + 10% + 16% + 1%) / 10 = 10.2%

Now, according to the law of compounding, the ending value of the portfolio could be calculated given the starting value, growth rate and time period. Since there were no deposits or withdrawls to the portfolio, the compound interest equation can be use. Starting_Value * (1 * Growth_Rate) ^ Time. But if you plug in the values from this example you get

$10k * (1 + 10.2%) ^ 10 = $26,413.

$1,375 higher than the actual ending value.

To get the true average growth rate, also known as the compounded annual growth rate (CAGR), you need to use the geometric mean. It is calculated a bit differently because only positive numbers can be input into the equation.

(1.05 * 1.08 * 1.12 * 1*23 * 0.98 * 0.94 * 1.35 * 1.1 * 1.16 * 1.01) ^ (1 / 10) = 9.6%

Using the geometric mean in the compound interest equation we get

$10k * (1 + 9.6%) ^ 10 = $25,038

which matches the actual results. For further discussion on the difference and applicability of the geometric mean, see this article

Note 2:

Array formulas are for functions that return several pieces of data. LINEST() return two pieces of information, the slope and y-intercept of the best fit line. Since two separate numbers cannot occupy the same cell, the function needs to be spread across two cells. Example:

            A       B
Row 1   1	1
Row 2   2	2
Row 3   3	3
Row 4   4	5
Row 5   5	8
Row 6   6	13
Row 7   7	21
Row 8   8	34
Row 9   9	55
Row 10  10	89

To enter the LINEST() function, highlight cells A11 & B11. Then type

=LINEST(B1:B10,A1:A10)

but instead of hitting ENTER at the end, hold down SHIFT + CTRL + ENTER.

The results should be 8.31 and -22.60.

Similarly, =LOGEST(B1:B10,A1:A10) should return 1.63 and 0.69.

To project values of Column B for values of 11, 12, 13, 14 & 15 in Column A, either the TREND() or FORECAST() functions can be used for linear regression or the GROWTH() function can be used for exponential growth projections.

Example of TREND():

            A       B
Row 1   1	1
Row 2   2	2
Row 3   3	3
Row 4   4	5
Row 5   5	8
Row 6   6	13
Row 7   7	21
Row 8   8	34
Row 9   9	55
Row 10  10	89
Row 11  11      =TREND(B1:B10,A1:A10,A11)
Row 12  12      =TREND(B1:B10,A1:A10,A12)
Row 13  13      =TREND(B1:B10,A1:A10,A13)
Row 14  14      =TREND(B1:B10,A1:A10,A14)
Row 15  15      =TREND(B1:B10,A1:A10,A15)

Results:

        A       B
Row 1   1	1
Row 2   2	2
Row 3   3	3
Row 4   4	5
Row 5   5	8
Row 6   6	13
Row 7   7	21
Row 8   8	34
Row 9   9	55
Row 10  10	89
Row 11  11	68.80
Row 12  12	77.11
Row 13  13	85.42
Row 14  14	93.73
Row 15  15	102.04

If the GROWTH() function was used in place of TREND(), the results would be:

            A       B
Row 1   1	1
Row 2   2	2
Row 3   3	3
Row 4   4	5
Row 5   5	8
Row 6   6	13
Row 7   7	21
Row 8   8	34
Row 9   9	55
Row 10  10	89
Row 11  11	147.74
Row 12  12	240.69
Row 13  13	392.11
Row 14  14	638.79
Row 15  15	1040.67

To get an idea of how well the TREND() and GROWTH() projections work, I created some charts plotting the projected values against the real values. In this example I used Fibonacci numbers as a pattern to fit a regression line.

You can see from the charts that the exponential regression works much better than linear regression for fitting this data set. If LOGEST() had some more data points to work with, the slope of the regression line would go to phi.

Discussion Area - Leave a Comment




blog comments powered by Disqus