 Home Analytics Part 3: Volatility

Introduction

In the previous two articles of the home analytics series, we focused on the returns of homes.  In part 1, we estimated the long-run return of single-family homes, which provides a robust anchor for forecasting future returns.  In part 2, we developed an index of period-by-period returns and showed how the index can be used to estimate the price of a home some period of time after a sale.  However, investors are interested not only in the returns of a particular investment but also in its risks (i.e. the volatility of returns).  In this article, we propose a model for estimating the volatility of single-family home real estate.

Volatility of Assets

Predictability of an asset’s performance is a virtue.  Knowing the long-run expected return isn’t enough to compel an investment.  We must also determine how much variability of performance exists around that expectation.  The following chart shows frequencies of different monthly returns of the S&P 500, the Russell 2000 and the Case-Shiller National Real Estate index from 1987-2017.  The indices consistently cluster around their respective means, but clearly the Russell 2000 shows much more variability, and therefore less predictability, in monthly returns than the Case-Shiller.

Figure 1: Histogram of monthly returns for S&P 500, Russell 2000 and Case-Shiller (1987-2018)

We can express this variability as a volatility of returns around the average:

$\mu = \frac{1}{n}\sum r$

$\sigma = \sqrt{\frac{1}{n}\sum (r-\mu)^2}$

Where:

• $r$ is the period (log-)return
• $n$ is the total number of periods
• $\mu$ is the average return
• $\sigma$ is the sample volatility of returns
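The two formulas above translate directly into a few lines of code.  The following is a minimal sketch using only the Python standard library; the function name `mean_and_volatility` and the list of monthly returns are made up for illustration and are not taken from the data behind Figure 1.

```python
import math

def mean_and_volatility(returns):
    """Return (mu, sigma) using the 1/n formulas given above."""
    n = len(returns)
    mu = sum(returns) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in returns) / n)
    return mu, sigma

# Hypothetical monthly log-returns, for illustration only.
monthly_returns = [0.01, -0.02, 0.03, 0.00, 0.015, -0.005]
mu, sigma = mean_and_volatility(monthly_returns)
print(f"mean = {mu:.4f}, volatility = {sigma:.4f}")
```

The same function works for any return frequency; the result is expressed per period of the input series.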

The historical annualized return and volatility for these assets are provided in the following table:

| Asset | Average Return | Volatility |
| Russell 1000 | 6.2% | 15.7% |
| Russell 2000 | 7.1% | 20.7% |
| Real Estate Index | 3.5% | 2.9% |
| Individual Home | 3.5% | 12.7% |

Figure 2: Table showing long run expected returns and corresponding volatility for Russell 1000, Russell 2000, Case-Shiller and an individual home.

Looking at the values, we immediately notice an interesting relationship between risk and reward for the indices.  Namely, the more variability of the returns, the higher the long run average return achieved.  Intuitively, an investor would require a greater return to compensate for a lower predictability (higher risk), so this result makes sense.

However, this relationship breaks down when we compare the Case-Shiller index with an individual home in the index: the individual home is far more volatile (12.7% versus 2.9%) yet earns the same average return (3.5%).  This is consistent with capital asset pricing theory, under which an investor is only rewarded for taking systematic (un-diversifiable) risk and not idiosyncratic (diversifiable) risk.

A Home is not an Index

The S&P 500 and Russell 2000 indices are investable assets comprising a diverse range of stocks of publicly listed companies.  In other words, one could actually purchase exchange-traded funds (ETFs) that almost perfectly match the returns of those equity indices.  Due to its convenience, index-based investing is effective and increasingly popular.

On the other hand, the Case-Shiller index is not investable.  Theoretically, one could imagine a world where a small investment is made in every single home in the country.  The performance of this portfolio would closely replicate that of the Case-Shiller index.  Until such a portfolio can be constructed, investors and homeowners are constrained to invest wholly in individual homes.  These individual assets are unique and are exposed to idiosyncratic risks which vanish in a diversified portfolio (individual market risks, natural disaster risk, homeowner credit events, etc.).  The goal of this post is to estimate the risk of an individual home.

This distinction is critical.  The majority of single-family home investments in the United States are levered (via mortgages and HELOCs) and un-diversified (typically a homeowner owns a single home).  It is imperative that we understand the risks of real estate at the individual level, not just in aggregate.
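To get a feel for how quickly idiosyncratic risk could diversify away, here is a rough sketch.  It rests on assumptions that are not part of the analysis above: each home's return is taken to be the index return plus an independent home-specific shock, and the portfolio is equal-weighted.  Under those assumptions the variance of a portfolio of $N$ homes is the index variance plus the idiosyncratic variance divided by $N$.  The function names are hypothetical; the two volatility inputs are the Real Estate Index and Individual Home figures from the table in Figure 2.

```python
import math

INDEX_VOL = 0.029  # Real Estate Index volatility from the table in Figure 2
HOME_VOL = 0.127   # Individual Home volatility from the table in Figure 2

def idiosyncratic_vol(home_vol, index_vol):
    """Home-specific volatility implied by: home = index + independent shock."""
    return math.sqrt(home_vol ** 2 - index_vol ** 2)

def portfolio_vol(n_homes, home_vol=HOME_VOL, index_vol=INDEX_VOL):
    """Volatility of an equal-weighted portfolio of n_homes homes (assumed model)."""
    idio = idiosyncratic_vol(home_vol, index_vol)
    return math.sqrt(index_vol ** 2 + idio ** 2 / n_homes)

for n in (1, 10, 100, 1000):
    print(f"{n:>5} homes: {portfolio_vol(n):.1%}")
```

With a single home the function returns the individual-home volatility by construction, and as the number of homes grows it approaches the index volatility.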

Historical Risk of Single Family Homes

Before we begin modeling, let’s take a look at the historical variance (squared volatility) of individual homes.  Each dot in the following chart represents a grouping of homes that are both purchased in the same year ($t$) and subsequently sold $\tau$ years later.  For example, one particular dot represents every single home purchased in 2000 and then later sold in 2015.  This dot will have a holding period of $\tau=15$ years.  Dots are colour coded by purchase year.

Figure 3: Chart shows the sample variance (squared volatility) of homes purchased in year $t$ and sold $\tau$ years later.  Data reflects 1996-2017 time period.

Like many investment assets, homes exhibit an increasing relationship between the variance of returns and holding period.  In other words, the longer you hold your home, the higher the variability of outcomes.

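The sample volatility $s_{t,T}$ of each grouping is calculated as follows; the sample variance plotted in Figure 3 is its square, $s_{t,T}^2$:

$\mu_{t,T} = \frac{1}{n_{t,T}}\sum_{i} r_{i,t,T}$

$s_{t,T} = \sqrt{\frac{1}{n_{t,T}-1}\sum_{i} (r_{i,t,T}-\mu_{t,T})^2}$

$\tau = T-t$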

Where, for a given sample of homes purchased in year $t$ and sold in year $T$:

• $s_{t,T}$ is the sample volatility
• $n_{t,T}$ is the number of homes in the sample
• $r_{i,t,T}$ is the total (log-)return of home $i$ in the sample
• $\mu_{t,T}$ is the average return
• $\tau$ is the holding period
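The grouping step can be sketched as follows.  This is an illustrative outline only: the record layout `(purchase year, sale year, total log-return)`, the function name `group_statistics` and the sample records are assumptions, not the format of the transaction data used for Figure 3.

```python
import math
from collections import defaultdict

def group_statistics(sales):
    """Group (t, T, log_return) records by (t, T) and compute mu, s and s^2."""
    groups = defaultdict(list)
    for t, T, r in sales:
        groups[(t, T)].append(r)

    stats = {}
    for (t, T), rs in groups.items():
        n = len(rs)
        if n < 2:
            continue  # sample variance is undefined for a single home
        mu = sum(rs) / n
        s2 = sum((r - mu) ** 2 for r in rs) / (n - 1)  # note the n - 1 divisor
        stats[(t, T)] = {"tau": T - t, "n": n, "mu": mu, "s": math.sqrt(s2), "s2": s2}
    return stats

# Hypothetical records: (purchase year, sale year, total log-return).
sales = [
    (2000, 2015, 0.60), (2000, 2015, 0.20), (2000, 2015, 0.70),
    (2005, 2010, -0.10), (2005, 2010, 0.30),
    (2010, 2012, 0.05),
]
stats = group_statistics(sales)
for key, value in sorted(stats.items()):
    print(key, value)
```

Each entry of `stats` corresponds to one dot in a chart like Figure 3, with `tau` on the horizontal axis and `s2` on the vertical axis.  Groups containing a single home are skipped because the $n-1$ divisor would be zero.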

A Review of Variance of Random Processes

Before we propose a model, let’s review some facts:

1. In our previous articles we showed that (log-)returns of homes tend to increase linearly with time: $r(t) \sim \mu t$

2. In this article we showed graphically that the variance (squared volatility) of (log-)returns of homes tends to increase linearly with time: $(r(t)-\mu t)^2 \sim \sigma^2 t$

These results are intuitive.  If we assume that the (log-)return in each holding period is normally distributed with average $\mu$ and volatility $\sigma$, we can back out facts 1 & 2 with the properties of sums of independent and identically distributed random variables.

Let’s take two consecutive return periods of a single home, $r_{t,t+1}$ and $r_{t+1,t+2}$:

$r_{t,t+1} \sim N(\mu,\sigma^2)$

$r_{t+1,t+2} \sim N(\mu,\sigma^2)$

If we want to determine the average and variance of the return over the entire period, $r_{t,t+2} = r_{t,t+1} + r_{t+1,t+2}$, we can simply add up the averages and variances of the individual periods, because the two returns are independent: $r_{t,t+2} \sim N(\mu+\mu,\sigma^2+\sigma^2) = N(2\mu,2\sigma^2)$

This can be further generalized: $r_{t,T} \sim N(\mu (T-t), \sigma^2 (T-t))$

With this relationship, we have enough information to build a simple and robust model for the volatility $\sigma$ of individual homes.
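The additive property can be checked with a small simulation.  The sketch below sums independent normal yearly returns and compares the simulated mean and variance with $\mu\tau$ and $\sigma^2\tau$.  The parameter values, seed and function name are chosen purely for illustration.

```python
import random
import statistics

def simulate_total_returns(mu, sigma, periods, n_paths, rng):
    """Sum `periods` i.i.d. N(mu, sigma^2) returns for each of n_paths homes."""
    return [
        sum(rng.gauss(mu, sigma) for _ in range(periods))
        for _ in range(n_paths)
    ]

rng = random.Random(0)     # fixed seed so the run is repeatable
mu, sigma = 0.035, 0.127   # illustrative per-year mean and volatility

results = {}
for tau in (1, 2, 5, 10):
    totals = simulate_total_returns(mu, sigma, tau, 20000, rng)
    results[tau] = (statistics.fmean(totals), statistics.variance(totals))
    mean, var = results[tau]
    print(f"tau={tau:>2}  mean={mean:.4f} (theory {mu * tau:.4f})  "
          f"variance={var:.4f} (theory {sigma ** 2 * tau:.4f})")
```

Both columns should track their theoretical values up to sampling noise, which is the linear growth described in facts 1 and 2.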

Avoiding a Trap

We might be tempted to just fit a line through Figure 3 and call it a day.  This is equivalent to fitting a linear regression model to our sample variance vs. holding period data.  In fact, this is exactly what was done in the original Case-Shiller paper.

The model implies the following structure of sample variance: $\sigma^2_{t,T} = \alpha (T-t) + \tilde{\epsilon}$

Where $\tilde{\epsilon}$ is a normally distributed random variable.  This model is dangerous as it permits a negative variance.  In other words, if we sample from this distribution, we might draw a series of negative values of variance.

A much more suitable model for variance is the Chi-squared distribution, which is defined only for non-negative values and properly accounts for the distribution of observed variance in our data-set:

$r_i-\mu \sim N(0,\sigma^2)$

$\sum_{i=1}^n \frac{(r_i-\mu)^2}{\sigma^2} \sim \chi^2 (n-1)$

Where:

• $r_i$ is the $i$-th return in the sample of returns
• $\mu$ is the average return
• $\sigma$ is the volatility of returns
• $n$ is the number of returns in the sample
• $N(.,.)$ is the normal distribution
• $\chi^2$ is the Chi-squared distribution
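A quick simulation can illustrate the Chi-squared relationship.  The sketch below repeatedly draws $n$ normal returns, computes the scaled sample variance $(n-1)s^2/\sigma^2$, and compares its mean and variance with those of a $\chi^2(n-1)$ distribution, which are $n-1$ and $2(n-1)$.  The sample size, volatility and seed are arbitrary choices for illustration.

```python
import random

def scaled_sample_variance(n, sigma, rng):
    """Draw n returns from N(0, sigma^2) and return (n - 1) * s^2 / sigma^2."""
    returns = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(returns) / n
    s2 = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return (n - 1) * s2 / sigma ** 2

rng = random.Random(1)   # fixed seed so the run is repeatable
n, sigma = 10, 0.127     # illustrative sample size and volatility
draws = [scaled_sample_variance(n, sigma, rng) for _ in range(20000)]

draw_mean = sum(draws) / len(draws)
draw_var = sum((d - draw_mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean     = {draw_mean:.2f} (Chi-squared: {n - 1})")
print(f"variance = {draw_var:.2f} (Chi-squared: {2 * (n - 1)})")
print(f"smallest draw = {min(draws):.3f}")
```

Unlike draws from the linear model with normal noise, every simulated value here is positive.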

A Robust Model for Variance

Restating our goal: we want to determine the annualized volatility of real estate returns.  In other words, we want to determine the rate $\alpha^2$ at which variance (the square of volatility) increases with time: $\sigma(\tau)^2 = \alpha^2 \tau$

Where

• $\sigma(\tau)$ is the volatility of returns
• $\alpha$ is the annualized volatility that we are trying to solve for
• $\tau$ is the holding period

We have all the ingredients to solve for $\alpha$.  First, recall the Chi-squared relationship for a single grouping of homes: $\sum_{i=1}^n \frac{(r_i-\mu)^2}{\sigma^2} \sim \chi^2(n-1)$

Since the sample variance satisfies $s^2 (n-1) = \sum_{i=1}^n (r_i-\mu)^2$ and our volatility model gives $\sigma^2 = \alpha^2 \tau$, this can be rewritten as a function of the sample variance and our volatility model: $\frac{s^2 (n-1)}{\alpha^2 \tau} \sim \chi^2(n-1)$

Where:

• $s^2$ is the sample variance for a given grouping of sale pairs
• $n$ is the number of data points in the grouping
• $\chi^2 (n-1)$ is the Chi-squared distribution with $n-1$ degrees of freedom

The Chi-squared distribution has the following probability density function: $p(x|\chi^2(k=n-1),\alpha) = \frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}$

Let’s take an initial guess that $\alpha = 10\%$.  We can now compute the total probability $p_{total}$ of observing all of the sample variance points shown in Figure 3.  Assuming the samples are independent of one another, it is simply the product of the probabilities of seeing each individual sample variance $s_i^2$, for samples $i = 1, \ldots, l$: $p_{total|\alpha=10\%} = p(x_1|\alpha=10\%) \cdot p(x_2|\alpha=10\%) \cdots p(x_l|\alpha=10\%)$

and $x_i = \frac{s_i^2 (n_i-1)}{\alpha^2 \tau_i}$
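In practice the product of many small densities underflows quickly, so it is common to work with the sum of log-densities instead.  The sketch below evaluates that sum for a few guesses of $\alpha$.  The function names and the three sample groups are hypothetical, not values read off Figure 3; the Chi-squared log-density is written out by hand from the formula above using `math.lgamma`.

```python
import math

def chi2_logpdf(x, k):
    """Log of the Chi-squared density with k degrees of freedom, for x > 0."""
    return (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)

def log_likelihood(alpha, samples):
    """Log of p_total: sum of log-densities of x_i = s_i^2 (n_i - 1) / (alpha^2 tau_i)."""
    total = 0.0
    for s2, n, tau in samples:
        x = s2 * (n - 1) / (alpha ** 2 * tau)
        total += chi2_logpdf(x, n - 1)
    return total

# Hypothetical groups: (sample variance s_i^2, number of homes n_i, holding period tau_i).
samples = [(0.017, 120, 1), (0.080, 200, 5), (0.160, 150, 10)]

for alpha in (0.05, 0.10, 0.15):
    print(f"alpha = {alpha:.0%}: log p_total = {log_likelihood(alpha, samples):.2f}")
```

A higher (less negative) value means the guess for $\alpha$ is more consistent with the observed sample variances.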

Solving for the Volatility Rate

We can now flip the analysis around and ask ourselves the following question: “What is the value of $\alpha$ that maximizes the probability of experiencing the sample variances in our data-set?”  Using home sale data from 1996-2017, we find that the annualized volatility $\alpha = 12.7\%$ best describes our data-set of home sales.
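One simple way to carry out this maximization is a grid search over candidate values of $\alpha$.  The sketch below is not the procedure used to obtain the 12.7% figure; it generates synthetic groups with a known volatility rate and checks that the search recovers it.  The grid, group sizes, seed and function names are all assumptions made for illustration, and terms of the log-density that do not depend on $\alpha$ are dropped since they do not affect the maximizer.

```python
import math
import random

def simulate_groups(true_alpha, rng, n_groups=20, homes_per_group=200):
    """Synthetic (s2, n, tau) groups whose returns have variance true_alpha^2 * tau."""
    groups = []
    for _ in range(n_groups):
        tau = rng.randint(1, 20)
        sd = true_alpha * math.sqrt(tau)
        returns = [rng.gauss(0.0, sd) for _ in range(homes_per_group)]
        mean = sum(returns) / homes_per_group
        s2 = sum((r - mean) ** 2 for r in returns) / (homes_per_group - 1)
        groups.append((s2, homes_per_group, tau))
    return groups

def objective(alpha, groups):
    """Log of p_total, omitting terms that do not depend on alpha."""
    total = 0.0
    for s2, n, tau in groups:
        k = n - 1
        x = s2 * k / (alpha ** 2 * tau)
        total += (k / 2 - 1) * math.log(x) - x / 2
    return total

def fit_alpha(groups, grid=None):
    """Return the grid value of alpha with the highest objective."""
    if grid is None:
        grid = [i / 1000 for i in range(10, 501)]  # 1.0% to 50.0% in 0.1% steps
    return max(grid, key=lambda a: objective(a, groups))

rng = random.Random(42)  # fixed seed so the run is repeatable
groups = simulate_groups(true_alpha=0.127, rng=rng)
alpha_hat = fit_alpha(groups)
print(f"estimated alpha = {alpha_hat:.3f}")
```

Strictly speaking, a likelihood written in terms of $s_i^2$ rather than $x_i$ would carry an extra change-of-variables factor that depends on $\alpha$; with a few hundred homes per group the two maximizers are nearly identical.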

Conclusion

In this post we have investigated the volatility of an individual home in the United States.  Using a historical data-set of single family home transactions and fitting a Chi-squared model we determined that the maximum likelihood estimate for volatility is $12.7\%$ per year, substantially higher than that of the index of homes.

This result has two substantial implications.  First, homeowners with large mortgages (i.e. highly levered) are taking substantially more risk than index-level volatility would suggest.  Second, there are immense diversification benefits to building a portfolio of single-family homes, since the majority of single-family home risk is idiosyncratic (i.e. diversifiable).

Data Sources

Federal Reserve Economic Data (FRED) – Case-Shiller National Real Estate Index, S&P 500 and Russell 2000 monthly returns (1987-2018)

Recorded transaction data obtained from First American Mortgage Solutions.  Data reflects 1996-2018 time period.