Home Analytics Part 2: Home Price Index

Brodie Gay, MFE – Quantitative Strategist


In the previous article, Home Analytics Part 1: Long Run Returns, we developed a method for estimating long-run average real estate returns. While this is a useful anchor for our expectation of future performance, we may be interested in a higher-resolution, period-by-period view of the real estate market. Specifically, what if we wanted to know how a home performs in a given year, quarter or month? In this article, we will describe how a home price index can answer this question. We will then provide index returns for select markets and walk through the math involved in generating them.

Real Estate Return Index

Analogous to the S&P 500 and Russell 2000 indices of stock market returns, a real estate return index provides a benchmark for real estate returns. Homes are unique and thinly traded. Unless a home is repeatedly sold on the market, it is very difficult to determine its value, and therefore to track that value over time. By fitting an index of returns to homes that have actually sold, we can estimate the evolution of prices for homes that are off the market, since returns across the asset class are highly correlated.

Figure 1: S&P 500, Russell 2000 and Case-Shiller National Indices over time (indexed to a value of 100 on January 1st, 2000)

One of the best-known indices is the Case-Shiller weighted repeat sales model. The article describing its construction was published in the late 1980s, and the model has since been a standard tool for real estate owners looking to track the value of their assets. The Case-Shiller weighted repeat sales model is very powerful but has some drawbacks:

  1. When new sales occur, especially of homes that have been held for a long time, those sale pairs tend to shed new light on past returns. Therefore, the index will retroactively update and past return estimates may change as new transactions occur.
  2. The model only looks at purchase and sale prices and cannot tell if a home has been remodeled/renovated or if a considerable amount of depreciation has occurred.  However, efforts are made to remove outliers (such as homes that were clearly flipped).
  3. The method of estimating the variance of home price returns is computationally unstable. It models variance under a normal distribution prior rather than the more correct (and efficient) Chi-Squared distribution. Fortunately, both methods produce the same results asymptotically (i.e. with a large number of data points) because the mean of Chi-Squared random variables is asymptotically normally distributed. But if we wanted to determine the variance structure of a county or zip code (with fewer data points), our model may break.

A few variants of the model exist. From here on, we will focus on a slightly simplified version of the Case-Shiller model which ignores transaction shocks (i.e. jumps in variance). Below, we show the index of home price returns for the country, states, and Core Based Statistical Areas (CBSAs), which include both larger metropolitan and smaller micropolitan areas, based on the simplified model. The following data cleaning steps have been taken:

  • States and CBSAs with fewer than 50,000 sale pairs have been removed.
  • Homes with more than +/-50% returns per year are removed (these are typically flips or are not arm's-length transactions).
  • Only homes purchased for between $50,000 and $5,000,000 are included.
  • Only arm's-length residential repeat sales are included (no foreclosures or short sales).
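The per-transaction filters above can be sketched in a few lines of Python. This is only an illustration: the `SalePair` record and its field names are hypothetical, and "return per year" is assumed here to mean the simple annualized return, which the article does not spell out. The 50,000 sale pair threshold per state or CBSA is a group-level count and is left out of the sketch.

```python
from dataclasses import dataclass


@dataclass
class SalePair:
    # Hypothetical record layout, chosen for illustration only.
    purchase_price: float  # dollars
    sale_price: float      # dollars
    years_held: float      # holding period in years
    arms_length: bool      # True for an arm's-length transaction
    distressed: bool       # True for a foreclosure or short sale


def annualized_return(pair):
    """Simple annualized return over the holding period (an assumed definition)."""
    return (pair.sale_price / pair.purchase_price) ** (1.0 / pair.years_held) - 1.0


def keep_sale_pair(pair, min_price=50_000, max_price=5_000_000, max_abs_annual=0.50):
    """Apply the per-transaction cleaning rules to one sale pair."""
    if pair.years_held <= 0 or pair.purchase_price <= 0 or pair.sale_price <= 0:
        return False
    if not (min_price <= pair.purchase_price <= max_price):
        return False
    if not pair.arms_length or pair.distressed:
        return False
    return abs(annualized_return(pair)) <= max_abs_annual


pairs = [
    SalePair(200_000, 260_000, 5.0, True, False),  # passes every rule
    SalePair(200_000, 400_000, 1.0, True, False),  # +100% in one year, likely a flip
    SalePair(30_000, 45_000, 4.0, True, False),    # purchase price below $50,000
    SalePair(300_000, 330_000, 3.0, True, True),   # foreclosure or short sale
]
cleaned = [p for p in pairs if keep_sale_pair(p)]
```

Only the first of the four made-up records survives the filter.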

Figure 2: Real Estate Index returns for the country, states and CBSAs.

In the following sections, we will review how to prepare our dataset and present a very simple 2-period example.  We will then describe a general method for building an index.

Preparing Data for Construction of the Index

Before we can create our index, we need a dataset to train it on.  Just like in the previous article, we are interested in homes that have sold at least twice (i.e. sale pairs).  With both a purchase and sale price, we can obtain a return.  We will also need to keep track of purchase and sale dates since we want a particular home’s return to influence those periods of the index that line up with its holding period.

In the last article, we argued that home returns tend to be log-normally distributed so it is beneficial to transform our returns into log-returns:

r = \ln(R+1) = \ln(P_{sale})-\ln(P_{purchase})

r \sim N(\mu\tau,\sigma^2\tau)

  • r is the log-return of a home
  • P_{sale} is the sale price
  • P_{purchase} is the purchase price
  • R is the total holding period return
  • N(\mu\tau,\sigma^2\tau) represents a normal distribution with mean \mu\tau and variance \sigma^2\tau

This return occurs over a holding period \tau:

\tau = T - t

  • T is the sale date
  • t is the purchase date
  • \tau is the holding period
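As a small illustration of these definitions, the log-return and holding period of a single sale pair could be computed as below. The prices and dates are made up, and measuring \tau in years of 365.25 days is an assumption rather than something the article specifies.

```python
import math
from datetime import date


def log_return(purchase_price, sale_price):
    """r = ln(P_sale) - ln(P_purchase)."""
    return math.log(sale_price) - math.log(purchase_price)


def holding_period_years(purchase_date, sale_date):
    """tau = T - t, in years (assumes 365.25 days per year)."""
    return (sale_date - purchase_date).days / 365.25


# Made-up sale pair: bought for $250,000 and sold for $300,000 five years later.
r = log_return(250_000, 300_000)
tau = holding_period_years(date(2010, 6, 1), date(2015, 6, 1))

# The total holding period return R is recovered from r = ln(R + 1).
R = math.exp(r) - 1.0
```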

This distribution of r implies the following model for generating real estate log-returns:

r_i = \mu\tau_i + \sigma\sqrt{\tau_i}\epsilon_i

\epsilon_i \sim N(0,1)

  • r_i is the return of sale pair i
  • \tau_i is the holding period of sale pair i
  • \epsilon_i is the white noise random error of sale pair i
  • \mu is the true long-run average rate of return
  • \sigma is the true volatility of returns (so \sigma^2 is the variance rate)
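To make the generating model concrete, here is a minimal simulation sketch. The values of \mu and \sigma are illustrative assumptions and are not estimates from the article.

```python
import math
import random


def simulate_log_return(mu, sigma, tau, rng):
    """Draw one log-return r = mu*tau + sigma*sqrt(tau)*epsilon, with epsilon ~ N(0, 1)."""
    epsilon = rng.gauss(0.0, 1.0)
    return mu * tau + sigma * math.sqrt(tau) * epsilon


rng = random.Random(0)             # fixed seed so the draws are reproducible
mu, sigma = 0.04, 0.10             # illustrative annual drift and volatility
holding_periods = [1.0, 2.0, 5.0, 10.0]
simulated = [simulate_log_return(mu, sigma, tau, rng) for tau in holding_periods]
```

Both the mean and the variance of the simulated return grow linearly with the holding period, which is why longer-held homes are noisier observations of the market.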

In the last article, we endeavored to obtain an estimate of \mu, the long run average return of real estate.  In this case, we want to estimate period-by-period average “short-run” returns. With our model, returns, holding periods and sale/purchase dates, we can begin estimating our index of returns.  But before presenting the general model, it may help to walk through a simple example.

A 2-Period Example

To begin, let’s consider a 2-period model with 3 sale pairs (assume the returns are log-returns):

  • Home 1 is purchased in year 0 and sold in year 1, returns 20%
  • Home 2 is purchased in year 1 and sold in year 2, returns -10%
  • Home 3 is purchased in year 0 and sold in year 2, returns 5%

Plugging this data into our model:

20\% = \beta_{0,1} + \sigma\epsilon_1

-10\% = \beta_{1,2} + \sigma\epsilon_2

5\% = \beta_{0,1} + \beta_{1,2} + \sqrt{2}\sigma\epsilon_3

Intuitively, for Home 1, only the first period return factor \beta_{0,1} will contribute to the return of 20\%.  The same is true for Home 2, but this time only the second factor \beta_{1,2} affects its return.  Finally, Home 3’s return comprises both factors and exhibits twice the variance (or \sqrt{2} times the volatility) due to being held over two years.

Figure 3: Fitting a home price index to 3 sale pairs.

Our goal is to estimate the period-by-period average log-returns \beta_{0,1},\beta_{1,2} that best explain the log-returns in our data set (i.e. minimize the squared errors \epsilon_i).  Those estimates solve the following convex optimization problem:

\underset{\hat{\beta}_{0,1},\hat{\beta}_{1,2}}{\text{minimize}} \ \epsilon_1^2 + \epsilon_2^2 + \epsilon_3^2

\underset{\hat{\beta}_{0,1},\hat{\beta}_{1,2}}{\text{minimize}} \ (20\%-\hat{\beta}_{0,1})^2 + (-10\%-\hat{\beta}_{1,2})^2 + (\frac{5\%-\hat{\beta}_{0,1}-\hat{\beta}_{1,2}}{\sqrt{2}})^2

Note that for each term, only the relevant \beta_{t,t+1} is picked out to explain that home’s return.  Additionally, the weighting on the final term ensures equal variance in error terms across all three homes.  Finally, \sigma can be factored out of the objective: it appears in the denominator of every \epsilon_i, so it scales each term equally and does not change the minimizing \hat{\beta}.
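The objective is easy to evaluate directly. The sketch below (the function name is a hypothetical choice) compares two candidate pairs of period returns: one that fits the two one-year homes exactly, and one that spreads the error across all three homes.

```python
def weighted_sse(beta_01, beta_12):
    """Weighted sum of squared errors for the three sale pairs (sigma dropped)."""
    home_1 = (0.20 - beta_01) ** 2
    home_2 = (-0.10 - beta_12) ** 2
    home_3 = (0.05 - beta_01 - beta_12) ** 2 / 2.0  # two-year hold, so weight 1/2
    return home_1 + home_2 + home_3


# Candidate A: fit Homes 1 and 2 exactly and leave all of the error on Home 3.
fit_short_holds = weighted_sse(0.20, -0.10)

# Candidate B: shift both period returns down slightly to share the error.
shared_error = weighted_sse(0.1875, -0.1125)
```

Candidate B gives the smaller objective value, showing that the best fit does not simply copy the one-year returns.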

This minimization can be written in vector notation:

\underset{\hat{\beta}}{\text{minimize}} \ (r-\textbf{1}\hat{\beta})^T \Omega (r-\textbf{1}\hat{\beta})


\Omega = \mbox{diag}(\vec{\tau})^{-1}

  • \vec{\tau} is a vector of holding periods \tau_i
  • \Omega is a weighting matrix

Solution to the 2-Period Example

By taking first order conditions we get closed-form solutions for estimates \hat{\beta}_{0,1},\hat{\beta}_{1,2}.  This is simply the solution to the Weighted Least Squares Regression.

\hat{\beta} = \begin{pmatrix} \hat{\beta}_{0,1} \\ \hat{\beta}_{1,2} \end{pmatrix} = (\textbf{1}^T\Omega\textbf{1})^{-1}(\textbf{1}^T\Omega r)


\textbf{1} =\begin{pmatrix}1&0\\ 0&1\\ 1&1\end{pmatrix}    \Omega =\begin{pmatrix}1&0&0\\ 0&1&0\\ 0&0&\frac{1}{2}\end{pmatrix}    r =\begin{pmatrix}20\%\\ -10\%\\ 5\%\end{pmatrix}
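As an intermediate check, the two matrix products in the formula can be written out for these three sale pairs, along with the 2x2 inverse:

\textbf{1}^T\Omega\textbf{1} = \begin{pmatrix} \frac{3}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{3}{2} \end{pmatrix}    \textbf{1}^T\Omega r = \begin{pmatrix} 22.5\% \\ -7.5\% \end{pmatrix}

(\textbf{1}^T\Omega\textbf{1})^{-1} = \begin{pmatrix} \frac{3}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} \end{pmatrix}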

Which is solved by:

\hat{\beta} = \begin{pmatrix} 18.75\% \\ -11.25\% \end{pmatrix}
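The same numbers can be reproduced with a few lines of plain Python. This sketch forms the 2x2 normal equations by hand and solves them with Cramer's rule. The variable names are illustrative, and a linear algebra library would normally be used instead.

```python
# Rows of the indicator matrix: which periods each home was held over.
X = [(1, 0), (0, 1), (1, 1)]
tau = [1.0, 1.0, 2.0]          # holding periods in years
r = [0.20, -0.10, 0.05]        # log-returns of the three sale pairs
w = [1.0 / t for t in tau]     # diagonal of Omega = diag(tau)^-1

# Entries of the symmetric 2x2 matrix X^T Omega X.
a11 = sum(wi * x[0] * x[0] for wi, x in zip(w, X))
a12 = sum(wi * x[0] * x[1] for wi, x in zip(w, X))
a22 = sum(wi * x[1] * x[1] for wi, x in zip(w, X))

# Entries of the vector X^T Omega r.
b1 = sum(wi * x[0] * ri for wi, x, ri in zip(w, X, r))
b2 = sum(wi * x[1] * ri for wi, x, ri in zip(w, X, r))

# Solve the 2x2 system with Cramer's rule.
det = a11 * a22 - a12 * a12
beta_01 = (a22 * b1 - a12 * b2) / det
beta_12 = (a11 * b2 - a12 * b1) / det
```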

Application of the 2-Period Index to Price a Home

An example application of this output is to value a 4th home that was purchased in year 0 for $100k.  In year 1, our expected cumulative log-return for that home is 18.75%.  In year 2, the market has declined and the cumulative log-return is only 7.5% (18.75% - 11.25%).

P_{t=0} = \$100,000

P_{t=1} = \$100,000 e^{0.1875} = \$120,623

P_{t=2} = \$100,000 e^{0.075} = \$107,788

Figure 4: Estimating a home’s price using the fitted home price index returns.
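The price path for this 4th home can be reproduced by accumulating the fitted index returns and exponentiating. This is a minimal sketch with illustrative variable names.

```python
import math
from itertools import accumulate

purchase_price = 100_000.0
betas = [0.1875, -0.1125]  # fitted index log-returns for years 0-1 and 1-2

# Running sum of index returns gives the cumulative log-return at each year end.
cumulative = list(accumulate(betas))

# Price at t = 0, 1, 2.
prices = [purchase_price] + [purchase_price * math.exp(c) for c in cumulative]
rounded = [round(p) for p in prices]  # [100000, 120623, 107788]
```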

General Solution to Estimating Index Returns

We now present the general method for estimating period-by-period returns \beta_{0,1}, \beta_{1,2}, \hdots,\beta_{T-1,T} that best explain the sale pair returns in our data set.

r_{i,t,T} = \beta_{t,t+1} + \hdots + \beta_{T-1,T} + \sigma\sqrt{\tau_i}\epsilon_i

\epsilon_i \sim N(0,1)

The model can be expressed more succinctly:

r_{i,t,T} = \sum_{j\in[t,T-1]}\beta_{j,j+1}+ \sigma\sqrt{\tau_i}\epsilon_i

Or using vector notation and an indicator function \textbf{1} (which picks out the \beta_{t,t+1}s relevant to the holding period for the sale pair and sets the remaining ones to 0):

\vec{r} = \vec{\textbf{1}}_{j\in[t,T-1]}\vec{\beta}+ \vec{e}

\vec{r} = \begin{pmatrix} r_0 \\ r_1 \\ \vdots \\ r_n \end{pmatrix}    \vec{\beta} = \begin{pmatrix} \beta_{0,1} \\ \beta_{1,2} \\ \vdots \\ \beta_{T-1,T} \end{pmatrix}    \vec{e} = \begin{pmatrix} \sigma\sqrt{\tau_0}\epsilon_0 \\ \sigma\sqrt{\tau_1}\epsilon_1 \\ \vdots \\ \sigma\sqrt{\tau_n}\epsilon_n \end{pmatrix}

  • \vec{r} is a vector of returns r_{i,t,T}
  • \beta_{t,t+1} is the index return between t and t+1
  • \sigma is the volatility of returns
  • \tau_i is the holding period of the return
  • \epsilon_i is the white noise random error of sale pair i
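In practice the indicator function becomes a matrix with one row per sale pair and one column per index period. A minimal sketch of how it might be built follows, assuming purchase and sale dates have already been mapped to integer period indices (the function name is hypothetical).

```python
def build_indicator_matrix(sale_pairs, n_periods):
    """Return one 0/1 row per sale pair.

    sale_pairs: list of (t, T) integer period indices with 0 <= t < T <= n_periods.
    Column j is 1 when the home was held over the period from j to j+1.
    """
    rows = []
    for t, T in sale_pairs:
        if not (0 <= t < T <= n_periods):
            raise ValueError(f"invalid holding period: ({t}, {T})")
        rows.append([1 if t <= j < T else 0 for j in range(n_periods)])
    return rows


# The three sale pairs from the 2-period example.
pairs = [(0, 1), (1, 2), (0, 2)]
X = build_indicator_matrix(pairs, n_periods=2)
tau = [T - t for t, T in pairs]  # holding periods, used for the weights
```

For the three homes in the 2-period example this reproduces the rows (1, 0), (0, 1) and (1, 1).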

We can solve for the vector of returns \hat{\beta} with the following equation:

\hat{\beta} = (\textbf{1}^T\Omega\textbf{1})^{-1}(\textbf{1}^T\Omega r)


\Omega = \mbox{diag}(\vec{\tau})^{-1}

  • \vec{\tau} is a vector of holding periods \tau_i
  • \Omega is a weighting matrix
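For an arbitrary number of periods, the same estimator can be sketched in plain Python by forming the normal equations and solving them with Gaussian elimination. The function names and the 3-period data below are made up for illustration. A production implementation would more likely rely on a numerical library and sparse matrices.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: periods are not identified by the data")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_index(X, tau, r):
    """Weighted least squares: beta = (X^T W X)^-1 X^T W r, with W = diag(1/tau)."""
    n_obs, n_periods = len(X), len(X[0])
    w = [1.0 / t for t in tau]
    A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n_obs))
          for b in range(n_periods)] for a in range(n_periods)]
    c = [sum(w[i] * X[i][a] * r[i] for i in range(n_obs)) for a in range(n_periods)]
    return solve_linear_system(A, c)


# Made-up 3-period data set: each row marks the periods a home was held over.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 1]]
tau = [1.0, 1.0, 1.0, 3.0, 2.0]
r = [0.05, 0.02, -0.03, 0.06, -0.02]
betas = fit_index(X, tau, r)
```

Calling `fit_index` on the three sale pairs of the 2-period example returns the same estimates as the closed-form solution, 18.75% and -11.25%.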

General Application for Estimating the Price of a Home

With our estimates of \beta_{t,t+1} we can approximately value homes using the following asset pricing formula:

P_T = P_t e^{\beta_{t,t+1} + \hdots + \beta_{T-1,T}}


  • P_T is the approximate price in period T
  • P_t is the purchase price in period t
  • \beta_{t,t+1} + \hdots + \beta_{T-1,T} is the sum of index returns between periods t & T
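The pricing formula translates into a short helper that works for any purchase and valuation period. In this sketch the function name is hypothetical, and `betas[j]` is assumed to hold \beta_{j,j+1}.

```python
import math


def estimate_price(purchase_price, t, T, betas):
    """Approximate P_T = P_t * exp(beta_{t,t+1} + ... + beta_{T-1,T}).

    betas[j] is the index log-return between periods j and j+1.
    """
    if not (0 <= t <= T <= len(betas)):
        raise ValueError(f"periods out of range: t={t}, T={T}")
    return purchase_price * math.exp(sum(betas[t:T]))


betas = [0.1875, -0.1125]  # index returns from the 2-period example

# A home bought in period 1 for $250,000 (made-up figure), valued in period 2.
value_in_period_2 = estimate_price(250_000, 1, 2, betas)
```

Because only the index returns between t and T enter the sum, a home bought in period 1 is unaffected by the market's gain in the first period.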


In this article, we have described a simplified version of the weighted repeat sales method for constructing an index of returns.  With the index, a homeowner can gain insight into the value of their home over time by using information from the broader market of real estate transactions.

In the following article, we will describe a model for estimating the variance of home returns.  With both of these estimates, expected return and expected variance, we can begin to think about how to optimize a homeowner, retail or institutional portfolio comprising real estate assets.


Case, K.E., Shiller, R.J. (1987). Prices of single-family homes since 1970: new indexes for four cities


First American National Transaction Data 1996-Present.


Google Finance
