Stock Market Valuation by Shawn P Emhe II

In Irrational Exuberance, Robert Shiller compiled over 100 years of data to demonstrate how market dynamics create cycles in which stock market prices become disconnected from valuations. One criticism[1] of his work is that prices are set by the market, not by valuations, and because of this no one can say when the market is “overvalued” or “undervalued.” I intend to use the dataset to explore the link between value and price.

Data Wrangling

The data is already in tidy format, but still requires some wrangling prior to exploration.

## 'data.frame':    1769 obs. of  11 variables:
##  $ Date        : num  1871 1871 1871 1871 1871 ...
##  $ Price       : num  4.44 4.5 4.61 4.74 4.86 4.82 4.73 4.79 4.84 4.59 ...
##  $ Dividend    : num  0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 ...
##  $ Earnings    : num  0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 ...
##  $ CPI         : num  12.5 12.8 13 12.6 12.3 ...
##  $ DateFraction: num  1871 1871 1871 1871 1871 ...
##  $ GS10        : num  5.32 5.32 5.33 5.33 5.33 5.34 5.34 5.34 5.35 5.35 ...
##  $ RealPrice   : num  89 87.6 88.4 94.3 99 ...
##  $ RealDividend: num  5.21 5.06 4.99 5.17 5.3 5.38 5.38 5.46 5.34 5.25 ...
##  $ RealEarnings: num  8.02 7.78 7.67 7.96 8.15 8.27 8.27 8.41 8.21 8.08 ...
##  $ CAPE        : num  NA NA NA NA NA NA NA NA NA NA ...

The str command shows that there are two date fields.

  1. Date: 1871.01, 1871.02, 1871.03 …
  2. DateFraction: 1871.04, 1871.13, 1871.21 …

Both use a YYYY.MM format, with the first numbering the months 01 through 12 and the second representing months as a fraction of the year. I reviewed the background information[2] on the dataset and the original Excel file. The most logical explanation I could find was that the second format was created for use as the x-axis of the charts in the Excel file. The first format was likely an output of another program or data source and is not a format interpretable by Excel.

I converted the values into an R-friendly Date format and dropped the DateFraction column.

The dataset contains the 10-Year Treasury Constant Maturity Rate labeled as GS10. The rates are in percentage form, most likely to make them easier to plot alongside the other features in the original Excel file. I converted them to decimal form to prepare for analysis.
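The two conversions described above could look roughly like the sketch below. The helper name yyyymm_to_date and the three sample rows are assumptions made for illustration; the report does not show the original code.

```r
# Hypothetical helper: convert numeric YYYY.MM dates to R Date objects.
yyyymm_to_date <- function(x) {
  # Format with two decimals so that 1871.1 becomes "1871.10" (October),
  # not January.
  txt <- sprintf("%.2f", x)
  as.Date(paste0(txt, ".01"), format = "%Y.%m.%d")
}

# Made-up rows in the shape of the raw data.
raw <- data.frame(
  Date         = c(1871.01, 1871.02, 1871.10),
  DateFraction = c(1871.04, 1871.13, 1871.79),
  GS10         = c(5.32, 5.32, 5.35)
)

clean <- raw
clean$Date <- yyyymm_to_date(raw$Date)  # numeric -> Date (first of the month)
clean$DateFraction <- NULL              # drop the redundant date column
clean$GS10 <- raw$GS10 / 100            # percent -> decimal
```

Formatting to two decimals before parsing matters because the numeric value 1871.1 would otherwise lose its trailing zero and be indistinguishable from a one-digit month.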

Feature Extraction

As an additional preparation step I created several new features from the data:

  1. Inflation
    • Measured as the annual % change in CPI
    • As a categorical value: Rising or Falling
  2. Momentum
    • % Change in price over the past year
    • As a categorical value: Rising or Falling
  3. Forward Returns
    • Measured as a % over 1, 3, 5 and 10 years
    • Categorical value representing if the 1 year “Outlook” was Bullish or Bearish

Inflation was captured out of curiosity about its effect on future returns. Momentum was measured because studies have shown that price advances and declines can persist.[3] I am interested to see if it will make a good complement to value in predicting returns.

I created a pct_change function to extract the above features from the price and CPI columns. I also created a function to shift the results of the pct_change for building the future returns values.

Furthermore, I used categorical variations of the inflation and momentum to assist in exploring positive and negative conditions of each.
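A minimal sketch of what the two helpers might look like is shown below. Only the name pct_change comes from the text; shift_forward, the argument names and the toy prices are assumptions, and the original implementation may differ.

```r
# Percentage change of x versus its value n periods earlier.
# The first n results are NA because there is nothing to compare against.
pct_change <- function(x, n = 12) {
  len <- length(x)
  if (n >= len) return(rep(NA_real_, len))
  c(rep(NA_real_, n), x[(n + 1):len] / x[1:(len - n)] - 1)
}

# Hypothetical helper: pull values n periods back in time so that each row
# holds the result observed n periods later. The last n results are NA.
shift_forward <- function(x, n = 12) {
  len <- length(x)
  if (n >= len) return(rep(NA_real_, len))
  c(x[(n + 1):len], rep(NA_real_, n))
}

price <- c(100, 110, 121, 133.1)            # made-up prices
momentum <- pct_change(price, n = 1)        # NA, then roughly 0.10 each period
fwd_return <- shift_forward(momentum, 1)    # roughly 0.10 each period, then NA

# Categorical version of the forward return.
outlook <- factor(ifelse(fwd_return > 0, "Bullish", "Bearish"),
                  levels = c("Bearish", "Bullish"))
```

With monthly data and n = 12, this pattern leaves 12 missing values at the start of the momentum series and 12 at the end of the 1 year forward returns.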

Univariate Plots Section

## 'data.frame':    1769 obs. of  19 variables:
##  $ Date          : Date, format: "1871-01-01" "1871-02-01" ...
##  $ Price         : num  4.44 4.5 4.61 4.74 4.86 4.82 4.73 4.79 4.84 4.59 ...
##  $ Dividend      : num  0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 ...
##  $ Earnings      : num  0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 ...
##  $ CPI           : num  12.5 12.8 13 12.6 12.3 ...
##  $ GS10          : num  0.0532 0.0532 0.0533 0.0533 0.0533 0.0534 0.0534 0.0534 0.0535 0.0535 ...
##  $ RealPrice     : num  89 87.6 88.4 94.3 99 ...
##  $ RealDividend  : num  5.21 5.06 4.99 5.17 5.3 5.38 5.38 5.46 5.34 5.25 ...
##  $ RealEarnings  : num  8.02 7.78 7.67 7.96 8.15 8.27 8.27 8.41 8.21 8.08 ...
##  $ CAPE          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Momentum      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CPIChange     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Growth        : Factor w/ 2 levels "Expansion","Recession": NA NA NA NA NA NA NA NA NA NA ...
##  $ Inflation     : Factor w/ 2 levels "Deflation","Inflation": NA NA NA NA NA NA NA NA NA NA ...
##  $ Fwd1yrReturns : num  0.0946 0.0844 0.0933 0.0928 0.0658 ...
##  $ Fwd3yrReturns : num  0.0495 0.0667 0.026 -0.0295 -0.0782 ...
##  $ Fwd5yrReturns : num  0.0045 0.00444 -0.02169 -0.08439 -0.13992 ...
##  $ Fwd10yrReturns: num  0.394 0.371 0.354 0.312 0.337 ...
##  $ Outlook       : Factor w/ 2 levels "Bearish","Bullish": 2 2 2 2 2 2 2 2 2 2 ...
##       Date                Price            Dividend         Earnings      
##  Min.   :1871-01-01   Min.   :   2.73   Min.   : 0.180   Min.   :  0.160  
##  1st Qu.:1907-11-01   1st Qu.:   7.74   1st Qu.: 0.410   1st Qu.:  0.540  
##  Median :1944-09-01   Median :  16.42   Median : 0.830   Median :  1.325  
##  Mean   :1944-08-31   Mean   : 259.74   Mean   : 5.637   Mean   : 12.810  
##  3rd Qu.:1981-07-01   3rd Qu.: 122.90   3rd Qu.: 6.370   3rd Qu.: 13.607  
##  Max.   :2018-05-01   Max.   :2789.80   Max.   :50.000   Max.   :109.880  
##                                         NA's   :2        NA's   :5        
##       CPI              GS10           RealPrice        RealDividend  
##  Min.   :  6.28   Min.   :0.01500   Min.   :  67.67   Min.   : 4.99  
##  1st Qu.: 10.10   1st Qu.:0.03290   1st Qu.: 170.13   1st Qu.: 8.53  
##  Median : 18.20   Median :0.03860   Median : 253.23   Median :12.73  
##  Mean   : 57.84   Mean   :0.04569   Mean   : 509.86   Mean   :15.17  
##  3rd Qu.: 91.60   3rd Qu.:0.05220   3rd Qu.: 613.87   3rd Qu.:19.27  
##  Max.   :249.98   Max.   :0.15320   Max.   :2813.54   Max.   :50.08  
##                                                       NA's   :2      
##   RealEarnings         CAPE          Momentum          CPIChange       
##  Min.   :  4.19   Min.   : 4.78   Min.   :-0.65609   Min.   :-0.19700  
##  1st Qu.: 12.66   1st Qu.:11.79   1st Qu.:-0.06067   1st Qu.: 0.00000  
##  Median : 20.45   Median :16.17   Median : 0.06691   Median : 0.02264  
##  Mean   : 29.87   Mean   :16.86   Mean   : 0.06119   Mean   : 0.02236  
##  3rd Qu.: 39.34   3rd Qu.:20.47   3rd Qu.: 0.18493   3rd Qu.: 0.04615  
##  Max.   :111.42   Max.   :44.20   Max.   : 1.24152   Max.   : 0.23669  
##  NA's   :5        NA's   :120     NA's   :12         NA's   :12        
##        Growth         Inflation    Fwd1yrReturns      Fwd3yrReturns     
##  Expansion:1113   Deflation: 474   Min.   :-0.65609   Min.   :-0.82409  
##  Recession: 644   Inflation:1283   1st Qu.:-0.06067   1st Qu.:-0.05511  
##  NA's     :  12   NA's     :  12   Median : 0.06691   Median : 0.16223  
##                                    Mean   : 0.06119   Mean   : 0.18797  
##                                    3rd Qu.: 0.18493   3rd Qu.: 0.40225  
##                                    Max.   : 1.24152   Max.   : 1.38523  
##                                    NA's   :12         NA's   :36        
##  Fwd5yrReturns      Fwd10yrReturns        Outlook    
##  Min.   :-0.71629   Min.   :-0.61661   Bearish: 644  
##  1st Qu.:-0.05574   1st Qu.: 0.07159   Bullish:1113  
##  Median : 0.24275   Median : 0.49322   NA's   :  12  
##  Mean   : 0.33082   Mean   : 0.72440                 
##  3rd Qu.: 0.64044   3rd Qu.: 1.29522                 
##  Max.   : 2.38378   Max.   : 3.65442                 
##  NA's   :60         NA's   :120

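The output above lists 1769 observations of 19 variables. The final dataset reaches 22 variables once the three annualized forward return columns created later in the analysis are stored.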

Price

Because price is measured over time it is best plotted as a time series. The scale of the first plot makes it difficult to compare the data starting in the late 1800s to more recent years. The dataset already contains a transformed version in the form of RealPrice, which has been adjusted for inflation. Taking the log10 of the prices makes them even more readable, and it is a natural transformation because price growth is geometric. The final inflation-adjusted and log-transformed graph makes it easy to compare the growth and recessions over the entire history of the dataset.

Dividends and Earnings

Logically, the dividends and earnings should benefit from the same transformations.

Dividends show the same steady growth over time as price. But the second half of the data appears to have much lower variance than the first half. I wonder if this is due to the decision to start taxing dividends in 1954.[4]

An interesting feature of the earnings data is the pair of significant dips that occurred during the Great Depression of the 1930s and the Great Recession of 2008. Earnings appear to have been hit even harder than stock prices.

Interest Rates

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01500 0.03290 0.03860 0.04569 0.05220 0.15320

The 10-Year Treasury rates (GS10) show a huge spike in the late 1970s and early 1980s as the Fed and policy makers sought to curb runaway inflation. This creates a long right tail in the distribution of rates. Transforming the scale shows multiple peaks in the distribution. The GS10 rate is heavily influenced by the Fed Funds rate. Could these peaks be the default speeds of the Fed for boosting and taming the economy?

CAPE

CAPE, the Cyclically Adjusted Price to Earnings Ratio, is the heart of the dataset. Robert Shiller created the ratio to smooth out inflation and business cycle effects by dividing the inflation-adjusted price by the 10-year average of inflation-adjusted earnings.
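As a rough illustration of the definition, the sketch below divides the real price by the mean of the preceding window of real earnings. The function name cape_ratio and the exact window alignment are assumptions. The CAPE column here ships with the dataset, so this is not the code used to produce it.

```r
# Sketch: real price divided by the mean of the prior `window` months of
# real earnings. Rows without a full window of history stay NA.
cape_ratio <- function(real_price, real_earnings, window = 120) {
  n <- length(real_price)
  out <- rep(NA_real_, n)
  if (n <= window) return(out)
  for (i in (window + 1):n) {
    out[i] <- real_price[i] / mean(real_earnings[(i - window):(i - 1)])
  }
  out
}

# Toy example with a 3-month window and made-up numbers.
toy_cape <- cape_ratio(real_price    = c(90, 95, 100, 120, 130),
                       real_earnings = c(8, 10, 12, 11, 9),
                       window        = 3)
toy_cape   # NA NA NA 12 11.81818
```

With monthly data and the default window of 120, this alignment leaves the first 120 rows empty, which is consistent with the 120 missing CAPE values in the summary output.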

Transforming to log scale makes it easier to see an important characteristic. The tails of the distribution represent the cheapest and most expensive readings for the market. However, the linear scale of the x-axis obscures the fact that changes in the lower end represent proportionately larger percentage differences in price per earnings. For example, the difference between a value of 4 and 5 means paying 25% more for earnings, whereas 40 to 41 is only an increase of 2.5%. The log scale balances the emphasis placed on the rare occurrence of valuations at both ends of the spectrum.
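The arithmetic behind that example can be checked directly. The variable names are only for illustration.

```r
# Relative cost of moving one CAPE point at each end of the range.
low_change  <- 5 / 4 - 1      # 0.25, i.e. 25% more per unit of earnings
high_change <- 41 / 40 - 1    # 0.025, i.e. 2.5% more

# On a log10 axis the distance between two readings depends only on their
# ratio, so the one-point step at the low end is drawn much wider.
low_step  <- log10(5)  - log10(4)
high_step <- log10(41) - log10(40)
low_step / high_step          # roughly 9
```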

Momentum

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.65609 -0.06067  0.06691  0.06119  0.18493  1.24152       12

The Momentum distribution has long tails, with outlier years returning over 100% and losing more than 50%. The boxplot makes these extreme occurrences easy to see. The mean annual return was 6.12%.

Inflation

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.19700  0.00000  0.02264  0.02236  0.04615  0.23669       12

The distribution of CPIChange shows very high kurtosis, with a mean of 2.24%. This aligns with the Fed’s target, while the tails show just how far inflation and deflation can go when they get out of hand.

The bar charts show that the US has spent most of the period in a state of rising inflation and growth.

Forward Returns

##  Fwd1yrReturns      Fwd3yrReturns      Fwd5yrReturns     
##  Min.   :-0.65609   Min.   :-0.82409   Min.   :-0.71629  
##  1st Qu.:-0.06067   1st Qu.:-0.05511   1st Qu.:-0.05574  
##  Median : 0.06691   Median : 0.16223   Median : 0.24275  
##  Mean   : 0.06119   Mean   : 0.18797   Mean   : 0.33082  
##  3rd Qu.: 0.18493   3rd Qu.: 0.40225   3rd Qu.: 0.64044  
##  Max.   : 1.24152   Max.   : 1.38523   Max.   : 2.38378  
##  NA's   :12         NA's   :36         NA's   :60        
##  Fwd10yrReturns    
##  Min.   :-0.61661  
##  1st Qu.: 0.07159  
##  Median : 0.49322  
##  Mean   : 0.72440  
##  3rd Qu.: 1.29522  
##  Max.   : 3.65442  
##  NA's   :120

Note that the Fwd1yrReturns are the same values as Momentum, but shifted 1 year.

An interesting feature of the forward returns is that they become more positively skewed the longer the duration. The skew is the effect of compounding: gains are unbounded, but losses are capped at -100% because prices have never been below 0.

A logical transformation is to convert all of the returns to their compound annual growth rate (CAGR). This will also make it easier to compare them to each other.
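The conversion is the standard compound annual growth rate formula. A small sketch follows, with the function name cagr chosen for illustration, checked against two extremes from the forward return summary above.

```r
# Convert a cumulative return over `years` into a compound annual growth rate.
cagr <- function(total_return, years) {
  (1 + total_return)^(1 / years) - 1
}

# Worst 3 year forward return in the summary above.
worst_3yr <- cagr(-0.82409, years = 3)     # about -0.44 per year

# Best 10 year forward return in the summary above.
best_10yr <- cagr(3.65442, years = 10)     # about 0.166 per year
```

Because the transformation is monotonic, the minimum and maximum cumulative returns map to the minimum and maximum annualized returns.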

## [1] "1 year forward returns"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.65609 -0.06067  0.06691  0.06119  0.18493  1.24152       12
## [1] "3 year forward returns"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.43968 -0.01872  0.05139  0.04896  0.11929  0.33611       36
## [1] "5 year forward returns"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.22273 -0.01141  0.04442  0.04690  0.10406  0.27609       60
## [1] "10 year forward returns"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.09142  0.00694  0.04091  0.04517  0.08663  0.16624      120

The oft-cited fat tails of the returns are still present and a slight negative skew is visible. This is characteristic of the risks investors face: extreme returns occur more often than they would with a normal distribution, with a greater than normal occurrence of negative returns.

The mean annual returns decrease as the forward-looking periods increase. This is a result of the negative skew and its effect on compounding. Large losses require even larger gains for recovery, causing realized CAGRs to be lower than the mean Fwd1yrReturns.
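The point about large losses needing even larger gains can be checked with a little arithmetic. The two annual returns below are made up purely to illustrate the effect.

```r
# Two made-up years: a 50% loss followed by a 50% gain.
annual_returns <- c(-0.50, 0.50)

arithmetic_mean <- mean(annual_returns)      # 0
growth <- prod(1 + annual_returns)           # 0.5 * 1.5 = 0.75
realized_cagr <- growth^(1 / length(annual_returns)) - 1   # about -0.134

# Gain needed to get back to even after a 50% loss.
recovery_gain <- 1 / (1 - 0.50) - 1          # 1, i.e. 100%
```

The average of the yearly returns is zero, yet the investor ends the two years down 25%, which is why the annualized multi-year figures sit below the simple mean of 1 year returns.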

After seeing the transformed values I decided to store them for easier access during the rest of the analysis.

Univariate Analysis

What is the structure of your dataset?

The dataset contains 1769 months of data covering 22 features, one of which is the date. The data covers the price, dividends and earnings for the US stock market since 1871. Also included are the Consumer Price Index (CPI) and the 10-Year Treasury rates. The CPI allows conversion of values from nominal to inflation-adjusted levels. The Treasury rates are commonly used as a risk-free benchmark.

There are also many NA values in the data because of the numerous values created from running calculations. CAPE, for example, requires 10 years of earnings values to smooth before it begins showing values.

What is/are the main feature(s) of interest in your dataset?

Shiller’s CAPE ratio is the main feature of the dataset, allowing the analysis of market valuations over the extended time period. I also introduced momentum measures so that I can attempt to build a model for forecasting future returns.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

CPI is crucial in transforming the data over time to account for inflation. In addition, I expect to see that inflation itself will have effects on stock returns. I believe this will also be the case for interest rates, which are often used by governments to boost economic growth.

Did you create any new variables from existing variables in the dataset?

Measures of momentum and inflation were created to show their running 12-month rates of change. Forward returns were also calculated and then transformed into compound annual growth rates (CAGRs) during the analysis.

Categorical values were created to allow partitioning the data into periods of Rising and Falling states of Growth and Inflation. An Outlook categorical variable was also created to test if predicting bullish and bearish periods was more feasible than predicting the specific level of return.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

Skewed distributions are the norm in financial data, starting with the long tails in returns and carried through to prices with the effect of compounding over time. Interest rates and inflation also showed long tails.

Log transformations were applied to prices, dividends and earnings to reduce positive skew. They are also a natural transformation to apply because they convert the exponential growth of continuous compounding into an additive one. The CAPE ratio is generated from those variables and benefited from the same transformation.

I also took advantage of the inflation adjustments that were already in the data set to help visualize changes in the variables over time.

Last, I transformed the different forward return measures to annual measures to reduce skew and so that they could be directly compared.

Bivariate Plots Section

Stock Features

No surprises here: all of the stock features are highly correlated. Applying the log transform and transparency to reduce overplotting shows that they have strong linear relationships and have varied together over time.

CAPE and Forward Returns

There appears to be a negative correlation between CAPE and forward returns, but the effect is less prominent in the Fwd1yrReturns. It looks as though CAPE might be less valuable in near-term forecasting than for making long-term predictions.

The plots also look like they could benefit from using the Log10 transformed version of CAPE.

The cor.test function showed that the Pearson R correlations were -0.19, -0.30, -0.37 and -0.40, in order of increasing forward-looking periods.

The following plots break CAPE into quintiles and show the mean returns for each quintile.
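One way to build such quintiles is sketched below. The data frame toy is simulated for illustration only and does not reproduce the report's numbers, and the column name CAPEQuintile is an assumption.

```r
set.seed(42)   # simulated data, not the report's dataset
toy <- data.frame(CAPE = exp(rnorm(500, log(16), 0.4)))
toy$Fwd10yrCAGR <- 0.14 - 0.08 * log10(toy$CAPE) + rnorm(500, sd = 0.04)

# Cut CAPE at its 20th, 40th, 60th and 80th percentiles to form five
# equal-sized groups.
breaks <- quantile(toy$CAPE, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
toy$CAPEQuintile <- cut(toy$CAPE, breaks = breaks,
                        labels = paste0("Q", 1:5), include.lowest = TRUE)

# Mean and median forward return within each quintile.
quintile_summary <- aggregate(
  Fwd10yrCAGR ~ CAPEQuintile, data = toy,
  FUN = function(r) c(mean = mean(r), median = median(r))
)
quintile_summary
```

Reporting the median next to the mean is a cheap way to see whether a few extreme periods are driving a quintile's average.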

Average returns are higher starting from points when the market was selling at a discount. 1 yr returns show the highest average for the lowest CAPE reading. But just looking at the mean does not say anything about the distribution of those returns.

The boxplots show that the means are affected by skewed distributions. The median returns are more of an indication of what investors would earn during a typical year.

Momentum and Forward Returns

There doesn’t appear to be a strong correlation between Momentum and forward returns. It looks like there are outliers. It might help to zoom in.

There still doesn’t appear to be a correlation, which surprises me. Momentum is the tool of choice of trend followers. Maybe the categorical Growth variable will shed some light.

Separating the data into Rising and Falling Growth periods does not appear to provide any benefit at all in predicting higher returns. However, it does tighten the distribution and reduce the tails of the forward 1 year returns, a particular benefit in reducing risk. This is confirmation of something I have learned from studying trend following in the past: its main benefit is not in finding higher returns, but in avoiding the worst periods. However, I was expecting at least some correlation to higher future returns. If momentum were the focus of this study I would consider looking at shorter forward looking periods.

One thing to note is that this connection reverses after the first year, with the 3 and 5 year plots showing the opposite connection and lower average returns after Rising Growth readings.

Interest Rates

It looks like there could be correlation over longer time periods. For 1 year the Pearson R is only .05, but it increases to .14, .21 and .39 over the respective longer periods.

## 
##  Pearson's product-moment correlation
## 
## data:  cape$GS10 and cape$CAPE
## t = -6.4187, df = 1647, p-value = 1.793e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2029604 -0.1087671
## sample estimates:
##        cor 
## -0.1562189
## 
##  Pearson's product-moment correlation
## 
## data:  cape$GS10 and log10(cape$CAPE)
## t = -9.0448, df = 1647, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2630431 -0.1710574
## sample estimates:
##        cor 
## -0.2175332

It does look like there could be some negative correlation between GS10 rates and CAPE. But the cor.test function returned -.16 indicating that it is very weak. The correlation is slightly stronger at -.22 when using the log10 transformed values of CAPE.

Inflation

Although it is not readily apparent in the plot, the cor.test function found a weak correlation of .36 between CPIChange and Fwd10yrCAGR. It looks like inflation only correlates with returns over long periods.

Inflation does not show correlation to Momentum either.

No correlation was visible between Inflation (as CPIChange) and the GS10 rates. I was expecting to find one because of the Federal Reserve’s use of rate policy to manage the inflation rate. It’s possible that there is a delay between market changes and policy reactions, and that lagging one against the other would reveal correlation.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Forward returns were negatively correlated with CAPE. They showed little correlation with Momentum, and only weak positive correlations with Inflation and the GS10 rates that emerged over the longer horizons. However, their distribution did appear tighter in the 1 year forward returns following rising momentum.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

Price, Dividend and Earnings were all highly correlated. This was expected as it was visible in the time series plots that they have varied together over the duration of the dataset.

CAPE and GS10 had a weak negative correlation. This could be the effect of investors being willing to pay a higher price for returns during low interest rate periods, and vice versa.

What was the strongest relationship you found?

The strongest relationship was between CAPE and forward returns, with the correlation increasing over the longer periods tested.

Multivariate Plots Section

Growth and CAPE vs Forward Returns

It’s difficult to see in this plot, but it does look like there is a stronger trend between CAPE and forward 1 year returns during periods of recession.

Using facet_wrap to separate the conditions makes the difference easier to see. Periods of recession in the stock market exhibit a stronger negative correlation between CAPE and forward returns than periods of expansion.

The feature combination doesn’t appear to have the same benefit over longer time periods.

CAPE and Inflation to Returns

Inflation and Deflation also appear to complement CAPE’s correlation to returns.

The numeric measures of Inflation and Growth did not show correlation with each other during bivariate analysis, suggesting that they should work well together to enhance the power of CAPE.

CAPE looks strongest in predicting forward 1 year returns during deflationary recession years.

The search for yield

The inverse of CAPE can be interpreted as a yield similar to interest rates. Are investors willing to pay more for earnings when interest rates are low?

There does appear to be a positive correlation between the excess yield of stocks and forward returns.
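The report does not show how the excess yield was computed. A plausible reading, assumed here, is the earnings yield (the inverse of CAPE) minus the 10-Year Treasury rate, with both expressed as decimals. The function name and inputs are hypothetical.

```r
# Assumed definition: earnings yield (1 / CAPE) minus the 10-year rate.
excess_yield <- function(cape, gs10) {
  1 / cape - gs10
}

# Made-up inputs: a CAPE of 20 implies a 5% earnings yield, which leaves
# a 2% excess over a 3% Treasury rate.
example_excess <- excess_yield(cape = 20, gs10 = 0.03)
```

Under this definition a negative value would mean that Treasuries yield more than the smoothed earnings on stocks.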

Using the log10 inverse of CAPE makes the relationship look even stronger. However, the Pearson R correlations between the new value and forward returns are .18, .28, .34 and .35, slightly lower in magnitude than those from using CAPE alone, indicating that this combination does not provide any benefit.

Predicting the future

I will use lm to fit a linear model to predict future returns.

## 
## Calls:
## Model 1: lm(formula = Fwd1yrReturns ~ log10(CAPE) + Growth + Inflation + 
##     GS10, data = cape)
## Model 2: lm(formula = Fwd3yrCAGR ~ log10(CAPE) + Growth + Inflation + 
##     GS10, data = cape)
## Model 3: lm(formula = Fwd5yrCAGR ~ log10(CAPE) + Growth + Inflation + 
##     GS10, data = cape)
## Model 4: lm(formula = Fwd10yrCAGR ~ log10(CAPE) + Growth + Inflation + 
##     GS10, data = cape)
## 
## ===========================================================================================
##                                      Model 1       Model 2       Model 3       Model 4     
##                                  -------------- ------------- ------------- -------------   
##                                   Fwd1yrReturns   Fwd3yrCAGR    Fwd5yrCAGR   Fwd10yrCAGR   
## -------------------------------------------------------------------------------------------
##   (Intercept)                         0.290***       0.233***      0.198***      0.138***  
##                                      (0.038)        (0.020)       (0.015)       (0.009)    
##   log10(CAPE)                        -0.193***      -0.183***     -0.161***     -0.118***  
##                                      (0.028)        (0.015)       (0.011)       (0.006)    
##   Growth: Recession/Expansion         0.001         -0.008         0.007        -0.006*    
##                                      (0.010)        (0.005)       (0.004)       (0.002)    
##   Inflation: Inflation/Deflation     -0.012          0.034***      0.024***      0.034***  
##                                      (0.011)        (0.006)       (0.004)       (0.003)    
##   GS10                                0.238          0.251*        0.465***      0.522***  
##                                      (0.206)        (0.110)       (0.082)       (0.048)    
## -------------------------------------------------------------------------------------------
##   R-squared                           0.036          0.117         0.182         0.352     
##   adj. R-squared                      0.034          0.115         0.180         0.351     
##   sigma                               0.184          0.097         0.072         0.041     
##   F                                  15.355         53.491        88.231       207.330     
##   p                                   0.000          0.000         0.000         0.000     
##   Log-likelihood                    452.680       1474.437      1924.220      2699.048     
##   Deviance                           55.129         15.177         8.257         2.622     
##   AIC                              -893.360      -2936.874     -3836.440     -5386.096     
##   BIC                              -860.957      -2904.558     -3804.215     -5354.101     
##   N                                1637           1613          1589          1529         
## ===========================================================================================

The strongest predictions were for 10 year forward returns, but with only 35% of the variance in returns being explained by the model.
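For readers who want to reproduce the shape of the fit, the sketch below runs the Model 4 formula from the output above on simulated data. The data frame sim and every coefficient used to generate it are invented, so the resulting numbers will not match the table.

```r
set.seed(1)   # simulated stand-in for the cape data frame
n <- 300
sim <- data.frame(
  CAPE      = exp(rnorm(n, log(16), 0.4)),
  Growth    = factor(sample(c("Expansion", "Recession"), n, replace = TRUE)),
  Inflation = factor(sample(c("Deflation", "Inflation"), n, replace = TRUE)),
  GS10      = runif(n, 0.015, 0.15)
)
sim$Fwd10yrCAGR <- 0.14 - 0.12 * log10(sim$CAPE) + 0.5 * sim$GS10 +
  rnorm(n, sd = 0.04)

# Same formula as Model 4 in the table above.
fit <- lm(Fwd10yrCAGR ~ log10(CAPE) + Growth + Inflation + GS10, data = sim)

r_squared  <- summary(fit)$r.squared       # share of variance explained
cape_slope <- coef(fit)[["log10(CAPE)"]]   # negative: higher CAPE, lower return
```

Swapping the response for the 1, 3 or 5 year column gives the other three models, since the predictors are identical across them.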

A classification model can also be used to predict the Outlook variable. I used rpart to create a tree-based model to predict if the next year would be bullish or bearish.

##          pred
##           Bearish Bullish
##   Bearish     372     272
##   Bullish     114     999

The model was 78% accurate in labeling the dataset. One way of measuring the power of a classification model is to compare it to a majority classifier, which would have labelled all of the data as bullish with 63% accuracy.
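Both figures follow directly from the confusion matrix printed above. The short check below re-enters those counts by hand; the object names are only for illustration.

```r
# Confusion matrix copied from the output above
# (rows = actual outlook, columns = predicted outlook).
conf <- matrix(c(372, 272,
                 114, 999),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("Bearish", "Bullish"),
                               pred   = c("Bearish", "Bullish")))

accuracy <- sum(diag(conf)) / sum(conf)       # (372 + 999) / 1757, about 0.78
baseline <- max(rowSums(conf)) / sum(conf)    # always "Bullish": 1113 / 1757, about 0.63

# Share of each actual class that was labelled correctly.
recall <- diag(conf) / rowSums(conf)          # Bearish about 0.58, Bullish about 0.90
```

The per-class view shows that the tree is noticeably better at recognizing bullish years than bearish ones, which the single accuracy figure hides.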

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

The Growth and Inflation features both strengthened the correlation between CAPE and 1 year forward returns. CAPE appears to have the strongest correlation to forward returns after periods of recession and deflation. One possible explanation is that periods of expanding growth and inflation push all prices up, without discrimination to the fundamental value of the market. In the reverse situation, a benefit of this relationship is that CAPE can be used to help identify good opportunities after stock and consumer prices have been falling.

Were there any interesting or surprising interactions between features?

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

Linear models were used to predict future returns. The models showed that returns are easier to predict over longer time horizons, but even the 10 year model was only able to explain 35% of the variation in returns.

A classification model was able to predict the 1 year outlook with 78% accuracy. This showed some strength over the majority classifier.

To settle curiosity, the models are 71% confident that the next year will be bullish, but are only expecting a 1% return over the next 10 years (as of May 2018).

One caveat to keep in mind is that I did not split the data into training and test sets to control for over-fitting. These models were purely for exploring relationships in the data.


Final Plots and Summary

Plot One

Description One

This plot shows the log adjusted change in stock prices since 1871. Stocks have been a good investment, but the ride has been anything but smooth.

Plot Two

Description Two

Market valuations have power in predicting future returns. The average 10 year returns for the stock market are higher and with less variance starting from low CAPE readings. Coincidentally, the CAPE ratio is currently at 31 as of 2018-05-01. Current valuations suggest that the 10 year outlook for stocks is grim, which runs contrary to public opinion.

Plot Three

Description Three

Periods of expansion have less variance in returns than those of recession. This is a subtle, but important effect. Most investors have a hard time dealing with volatility. They can benefit from an indicator letting them know when to step aside or increase diversification (i.e. tilt portfolio allocations to bonds).


Reflection

This project gave me the opportunity to explore an interesting dataset covering over 100 years of stock market data. Some wrangling was required to prepare the data and create additional features of interest. I began by exploring all of the variables, creating a few interesting ones along the way. Then I analyzed the interactions between the features and finally created a few models for predicting stock market action.

Several variables were found to be correlated to future returns. These included the CAPE valuation ratio, interest rates and inflation. However, none of the correlations were very strong. This made building an accurate model for predicting the future returns difficult, with the strongest model only explaining 35% of the variance of returns. On the other hand, predicting whether the next year was bullish or bearish proved easier. A classification tree model was able to predict this feature with 78% accuracy.

This analysis could be enhanced by introducing additional asset classes to the dataset and building a model that created actionable predictions. For example, a model could be created to predict which asset would have the best performance and used to guide portfolio allocation.


  1. StockCharts Irrational Exuberance Review

  2. Robert Shiller’s Irrational Exuberance Data

  3. Returns to Buying Winners and Selling Losers

  4. A brief history of dividend tax rates