Brad Lookabaugh, MFE
I am a big fan of blueberries. In particular, I enjoy blueberries in yogurt for breakfast. One annoyance is that blueberries can be hit-or-miss, depending on the season or the luck of the draw at the grocery store. To that end, I often find myself needing to pick out ones from the bottom of my bowl that look a little rough around the edges. I don’t inspect all of them, but excluding ones that show clear features of a “bad” blueberry makes a big difference. A single bite into a soft, mushy, and malformed blueberry can ruin breakfast.
Applying this approach to investing can have a dramatic effect on portfolio performance. A strategy focused on removing the bad apples (or blueberries) among a set of investment opportunities is a simple concept: a portfolio that is broadly invested across a universe of assets except for, say, the worst 10% will naturally outperform the full set. It differs in a subtle, yet important, way from an active selection investment strategy, whereby the assets from a given universe that are expected to perform the best are chosen (screening each blueberry and putting only the best in the bowl). Perhaps this variety of strategy receives greater attention since, if successfully done, it will lead to greater out-performance than a deselection approach. But history has shown that active selection is terribly difficult to do consistently over time. Now, this is not to say that deselection is any easier. In addition, the risk-adjusted performance of deselection can vary across time and asset classes.
Take equities. Positive skew and kurtosis (i.e., fat tails) in equity returns make deselection quite attractive. Two pieces of literature from 2017 provide a powerful illustration of this skewness: a short report by BlackStar Funds and a paper by Hendrik Bessembinder, “Do Stocks Outperform Treasury Bills?”. The numbers are fascinating. Nearly 40% of all stocks in the study had negative lifetime returns, with almost one in five losing over 75% of their value. In addition, only 25% of all stocks accounted for all stock market gains. This is a poignant reminder of the inherent challenges in active selection of individual equities. Finding these big winners is one thing; holding onto them through intense market cycles is another. Obviously, identifying those that are bound to go to zero is no less difficult. Yet, creating a portfolio by starting with broad market exposure and focusing on removing the duds has advantages. For one, it guarantees you exposure to the high flyers – some of the 25% or so that account for the majority of the market gains. Attempting to hand pick the biggest winners affords you no such guarantee. In addition, deselection likely leaves you with a less volatile portfolio. By its nature, the portfolio is well diversified. Diversification can be achieved in active selection, but it must be intentional.
Can deselection be applied to residential real estate?
The remainder of this post will explore answers to this question. We will attempt to answer:
- How would a deselection strategy manifest itself in residential real estate equity investing?
- What are the expected portfolio performance gains from this strategy?
- How does real estate as an asset class inform a deselection approach? How does this differ from other asset classes?
- What factors should we consider when attempting to identify the expected worst performers?
To help answer these, we conduct an experiment (a back-test, really) on two portfolios of properties. The two portfolios of single-family residences (SFRs) are created by taking a sample of home sales in two separate metros. We examine the return distribution of the homes in the portfolios and the overall return and volatility of the portfolio, with and without a deselection strategy. Differences across the two metros are also analyzed. To be sure, the deselection is done ex post to investigate what, if any, performance boost is created through deselection. While back-testing strategies on equity markets can be done in 5-10 lines of code and even fewer simplifying assumptions, back-testing a residential real estate investment strategy involves crafting a more careful setup. Curating this controlled experiment does give it a slightly artificial feel, but it serves the purpose of assessing whether a deselection strategy could work in theory. The next step would be to determine how to implement the strategy in various housing markets, ex ante. We conclude the post with ideas on ways to identify laggards to exclude from residential real estate portfolios, and a subsequent post will more thoroughly vet those ideas.
Building the back-test
Data is abundant in the world of equities, allowing for some creative simulations over various time frames and asset characteristics. Residential real estate does not offer that luxury. Nonetheless, we make our best effort to conduct this experiment in an unbiased manner and to control any undesired influences imposed by wonky data. Here, we provide a detailed overview of the inputs to this research, including the data used, sampling approach, assumptions, and limitations.
Which markets are we looking at?
Residential real estate performance can vary wildly across metros. Some are more prone to booms and busts, resulting in higher volatility, while others appreciate at much slower, steadier rates. Some have more pronounced seasonality, with an annual cycle of prices peaking during summer and softening in winter. To gain insight as to how a deselection strategy may vary under these different conditions, we run the experiment for two individual metros that have historically behaved quite differently.
Figure 1: Select non-seasonally adjusted Case-Shiller home price indices. Included are the Phoenix and Dallas metro area series, along with the Case-Shiller National and 20-City indices.
Figure 1 illustrates the extremes of these differences. Phoenix is a market that experienced a dramatic boom-bust through the crisis. In addition, it displays muted seasonality. In contrast, market gains in Dallas were much tamer leading up to the crisis. Similarly, the subsequent decline was less dramatic. But seasonality plays a much larger role. In recent years, both markets have performed well and have had gains comparable to the overall market, as measured by both the National Case-Shiller and the 20-City indices. Case-Shiller indices are often used to gauge the performance of the US residential real estate markets. These indices do a decent job of summarizing the broad movements of the market, but it is dangerous to assume the Case-Shiller movements directly reflect the performance of individual homes.
Which homes are we buying? What is our holding period?
Individual homes are sold infrequently. This presents challenges when creating a portfolio of homes intended to represent a given market. It also complicates tracking performance over time. A metro area may have a million or more SFRs, but only a fraction will sell in a specific time period. Thus, we are forced to make a trade-off between the size of the portfolio (in terms of number of homes) and the time window from which we sample transactions. For this exercise, we determined that getting as close as possible to a uniform t=0 for all homes in the portfolio was more important than a larger sample. Thus, we sample all the portfolios’ assets from sales in a specific month and call this month t=0. This highlights a rather obvious, yet important, difference between equities and residential real estate. It is quite straightforward to simulate or execute a deselection strategy (or any portfolio strategy, for that matter) for equities. Since stocks trade by the fraction of a second, initiating a strategy in a back-test (creating the portfolio at t=0) can be assumed to be quite similar to execution in practice. Homes trade on the order of years, and this assumption breaks down.
The “investible universe” is defined as homes sold in December 1999 in each metro. This month was chosen because it allows us to set an effective t=0 at the beginning of 2000, which is the year the Case-Shiller indices are based to 100. It also allows for a reasonable time sample. The analysis tracks the turnover of the homes from January 2000 through December 2017, a 17-year time frame that includes home sales before, during, and after the Great Recession.
The PHX and DAL portfolios are constructed such that we invest in 2,500 SFRs in Phoenix and Dallas, respectively, from this investible universe of homes that sold in December 1999 and subsequently sold prior to January 2018. The following summarizes relevant details about the samples:
- Condos, multi-family, and other property types are excluded. This removes any differences in individual performance due to property type.
- Both newly constructed SFRs (sold for the first time) and resales were eligible.
- All sales were arm’s-length and were assumed to reflect proper market value for the home. No distressed sales from December 1999 were eligible for purchase into the portfolios. Note, this does not imply the home did not exit the portfolio in a distressed manner. REOs, foreclosures, and short sales are a reality for investors. The portfolios in this experiment were not sheltered from those events.
- Homes subsequently sold within 12 months of purchase and those with an annualized return of over 50% were excluded. These filters are intended to remove “flipped” and substantially remodeled homes. They also catch transactions that have inaccurate data (e.g., the purchase of land and subsequent sale of property with a structure). These are well-accepted filters often used in constructing residential real estate indices, such as the Case-Shiller.
- We limited our scope to only sample homes that have sold at some point in the 17-year time period. In fact, a fair portion of homes that sold in December 1999 have not yet sold again. Had we included these in the scope, we would have had to “liquidate” the remaining homes at some estimated value, resulting in a large cash flow at the end of the back-test. This would result in a strong influence on the measured performance of the portfolio. Therefore, we shrank the sample population as a control.
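The two return-based filters above can be sketched in a few lines. This is a minimal illustration, not the code used for the study: the `HomeSale` record, its field names, and the treatment of a home held for exactly 12 months are all assumptions made here, and the annualized return is taken to be the usual compound annual growth rate.

```python
from dataclasses import dataclass


@dataclass
class HomeSale:
    """Hypothetical record for one purchase/resale pair (names are assumptions)."""
    purchase_price: float
    sale_price: float
    months_held: int  # holding period in whole months


def annualized_return(home: HomeSale) -> float:
    """Compound annual growth rate implied by the purchase and resale prices."""
    gross = home.sale_price / home.purchase_price
    return gross ** (12.0 / home.months_held) - 1.0


def is_eligible(home: HomeSale,
                min_months: int = 12,
                max_annual_return: float = 0.50) -> bool:
    """Apply the flip filter and the 50% annualized-return filter.

    Assumption: a resale at exactly `min_months` counts as "within 12 months"
    and is excluded.
    """
    if home.months_held <= min_months:
        return False
    return annualized_return(home) <= max_annual_return


# Toy examples (made-up prices, not data from the study).
steady = HomeSale(100_000, 150_000, 60)    # about 8.4% per year over 5 years
flipped = HomeSale(100_000, 130_000, 8)    # resold after 8 months
land_deal = HomeSale(20_000, 200_000, 36)  # 10x in 3 years, likely bad data

eligible = [h for h in (steady, flipped, land_deal) if is_eligible(h)]
```

Only the first toy home survives both filters; the other two are the kinds of transactions the filters are meant to catch.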
The table below describes the resulting sample for each portfolio. The samples are similar.
How are we measuring performance?
From our data, we have the raw ingredients – the purchase date and price and the subsequent sale date and price – to compute a return on each asset in the portfolios. These are the building blocks to compute the aggregate portfolio return and volatility. To simplify things, we assume that the proceeds from a home sale are not reinvested and are just held until December 2017. In practice, this capital would likely be recycled or, at a minimum, placed in interest-bearing securities. This feature could be added in future work. Since these sales yield cash flows, an annual IRR framework is a natural approach to assess the portfolio’s return. Specifically, the process for the back-test was run as follows:
- We assume $1 is invested into each of the 2,500 homes. This means our portfolio is equal-weighted across the sample, rather than dollar-weighted, which would be the case had we simply purchased all the homes outright. An equal-weighted test was preferred since it controls for any performance skew influenced by home price tier. In the IRR framework, this means we incur a cash outflow of $2,500 at t=0.
- Cash flows are realized based on the next sale dates and prices. For example, if a home appreciated 50% and sold, the cash inflow from that home would be $1.50. Gross and annualized returns are also computed. Cash flows from the sales are grouped by year. Recall that we designed this back-test to only consider homes that were bought and sold within the 17-year time frame.
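The process above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the study's implementation: the function names are hypothetical, each sale is assumed to land at the end of its year (so a sale in year k is discounted k periods), and the IRR is found with a plain bisection search that assumes the net present value changes sign once between the bounds.

```python
def portfolio_cash_flows(homes, n_years):
    """Build annual cash flows for an equal-weighted portfolio.

    `homes` is a list of (gross_return, sale_year) pairs, where gross_return
    is sale price / purchase price and sale_year runs from 1 to n_years.
    $1 goes into every home at t=0, and the proceeds are not reinvested.
    """
    flows = [0.0] * (n_years + 1)
    flows[0] = -float(len(homes))   # $1 per home paid out at t=0
    for gross_return, sale_year in homes:
        flows[sale_year] += gross_return
    return flows


def npv(rate, cash_flows):
    """Net present value with one discount period per list position."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10):
    """Annual IRR by bisection; assumes a single sign change on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid                # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # root lies in the upper half
    return (lo + hi) / 2.0


# Toy portfolio: two homes, each up 50% and sold in year 5.
toy_flows = portfolio_cash_flows([(1.5, 5), (1.5, 5)], n_years=5)
toy_irr = irr(toy_flows)
```

In this toy case every dollar grows by 50% over five years, so the IRR matches the compound annual growth rate of a single home, 1.5^(1/5) - 1.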
As an example, Figure 2 displays the annual cash flows along with the cumulative turnover for the 2,500-home PHX portfolio.
Figure 2: Cash flows for the PHX portfolio used to compute the IRR. Note the large cash outflow at t=0 is omitted from the graph.
The cash flows reflect a normal turnover for a pool of homes. Note there is a little blip in 2010, where sales were relatively higher. This year marked the bottom of the housing market following the crisis, and the increase in sales was likely attributable to forced sales. About 80% of the portfolio turns over by the end of 2010.
Volatility of these portfolios is very important as well. A previous post of ours provides an overview of our research on volatility in residential real estate. In short, the volatility measured by indices, such as the Case-Shiller, drastically understates the volatility of an individual home. Here, we compute the volatility of a portfolio as follows:
The portfolio volatility is annualized, and it is built from the monthly volatility of each individual home, which is defined in terms of:
- The log-return of the home, $\ln(V_T / V_0)$, where $V_T$ and $V_0$ are the value of the home at termination and purchase, respectively.
- The holding period of the home, measured in months.
This approach was used for both the full and deselected portfolios.
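As a rough illustration only, and not necessarily the estimator used in this study, the two ingredients above can be combined as follows. The sketch assumes that a home's monthly volatility is the absolute log-return divided by the square root of the holding period in months, that the portfolio figure is the root mean square of the home-level values, and that monthly volatility is annualized by multiplying by the square root of 12. All three choices, and the function names, are assumptions made here.

```python
import math


def monthly_home_volatility(purchase_value, terminal_value, months_held):
    """Assumed estimator: |log-return| scaled by sqrt(holding period in months)."""
    log_return = math.log(terminal_value / purchase_value)
    return abs(log_return) / math.sqrt(months_held)


def annualized_portfolio_volatility(homes):
    """Assumed aggregation: root mean square of home volatilities, times sqrt(12).

    `homes` is a list of (purchase_value, terminal_value, months_held) tuples.
    """
    monthly = [monthly_home_volatility(*home) for home in homes]
    rms = math.sqrt(sum(v * v for v in monthly) / len(monthly))
    return rms * math.sqrt(12.0)


# Sanity check: a home whose log-return is exactly 1 over 12 months
# has an annualized volatility of 1 under these assumptions.
single_home = [(1.0, math.e, 12)]
single_vol = annualized_portfolio_volatility(single_home)
```

Whatever the exact formula, the same function would be applied to both the full list of homes and the deselected list so that the two figures are comparable.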
How does the deselection work?
Now for the fun part. We’ve outlined how to measure the portfolios. Next, we see how the portfolios would have performed had we identified and removed the 10% worst performing homes. Again, we know we are cheating here. The point is to see what the impact would be if we had perfect insight into which homes would perform the worst. To do this, we simply rank-order the homes by annualized return and remove the bottom 250, yielding a portfolio of 2,250 homes. The performance is assessed in the same manner as outlined above.
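The rank-and-remove step is simple enough to sketch directly. The function name and the integer `percent` parameter are choices made here for illustration; with 2,500 homes and `percent=10` the sketch removes 250 and keeps 2,250, matching the description above. Ties are broken by original position, which is an arbitrary assumption.

```python
def deselect(annualized_returns, percent=10):
    """Split home indices into (kept, removed) by dropping the worst performers.

    Homes are rank-ordered by annualized return and the bottom `percent`
    percent are removed. This uses realized returns, so it is an ex post
    (perfect-foresight) exercise.
    """
    n_remove = len(annualized_returns) * percent // 100
    ranked = sorted(range(len(annualized_returns)),
                    key=lambda i: annualized_returns[i])
    return ranked[n_remove:], ranked[:n_remove]


# Toy example with ten homes (made-up returns): the worst one is dropped.
toy_returns = [0.05, -0.20, 0.03, 0.10, -0.01, 0.07, 0.02, 0.04, 0.06, 0.08]
kept, removed = deselect(toy_returns)
```

The kept indices can then be fed through the same cash-flow and volatility calculations as the full portfolio.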
Did this even work?
The table below summarizes the results for both metros. The boost in performance is admittedly less dramatic than what we might expect had we done the same exercise in equities. But the results are nonetheless interesting, particularly if you focus on the difference between the two metros.
Cutting out the 250 worst performing homes adds roughly 50-100 bps to the IRR of the portfolios. Naturally, volatility is reduced (since we are removing 250 left-tail observations), improving the risk-adjusted returns as well. The reduction in volatility is quite pronounced in DAL, and we will explore that below. Intuitively, distressed sales made up a healthy portion of the 250 removed assets. Yet, it is interesting that the best possible performing portfolios in this controlled universe still had at least 1 in 10 homes terminate under distressed circumstances.
If these returns seem a little light, keep in mind no leverage is being assumed in this test. It’s pure one-to-one equity exposure in SFRs. Adding a bit of leverage magnifies the returns.
Comparing the metros
The initial portfolio characteristics seemed quite similar, but PHX clearly was the better portfolio, with double the IRR and lower annual volatility. To understand how this out-performance materialized, we need to dig into the individual asset performance.
Figure 3: Return comparison for all assets in the PHX and DAL portfolios.
These two portfolios demonstrate that residential real estate exhibits its own positive return skew and kurtosis. However, compared to equities, a far greater percentage of assets have a positive return. This makes sense since a stock can easily go to zero, but it would take a lot for land with a structure to become completely worthless. By examining the above plot, which shows the distribution of annualized returns for the homes in the portfolios, we can see some reasoning for the resulting portfolio performance.
The DAL portfolio experienced overall fatter tails, with negative skew dominating the impact of the few big winners. Interestingly, while the PHX portfolio average home return (6.1%) was larger than the median (5.4%), which is evidence of positive skew, the DAL portfolio average and median were equal (2.2%). This suggests that deselecting the worst performing homes in the DAL cohort would have a stronger impact. This did not fully play out. The volatility reduction is larger in the deselected DAL portfolio (-7.3%), which primarily comes from wiping out the far left tail. But the impact on return was muted compared to PHX. A greater portion of the total DAL portfolio had a negative return, so, despite removing 250 losers, many remained. Whereas over 25% of the DAL portfolio had a negative return, fewer than 240 PHX homes did. This means the deselection CAGR cutoff was actually positive for PHX (i.e., homes with a positive return were removed). Overall, excluding the worst 10% had a meaningful impact on the portfolio risk-adjusted returns. The PHX Sharpe ratio improved by about 50% while the DAL portfolio’s nearly doubled.
Figure 3 illustrates some important features of the distribution of returns across the entire holding period, but there are striking differences between the metros when looking at realized returns in a given year. Figure 4 below shows the distribution of returns of each portfolio for homes that sold in 2005. The DAL distribution looks relatively similar to its distribution in Figure 3. Yet, nearly all homes in the PHX portfolio sold in 2005 experienced a positive return. These types of return characteristics in the years leading up to the crisis, plus the fact that a large portion of the assets turned over during that time period, help explain the IRR differences between the two portfolios. One takeaway is that more volatile markets, or those susceptible to strong market cycles, are better candidates for deselection strategies. This could be more rigorously tested in later studies.
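Two of the observations above, the mean-versus-median comparison and the sign of the deselection cutoff, can be checked mechanically. The sketch below uses a made-up list of 20 annualized returns, not data from the study, and defines the cutoff as the highest return among the removed homes, which is an assumption about how the term is used here.

```python
from statistics import mean, median


def deselection_cutoff(returns, percent=10):
    """Highest annualized return among the homes that get removed."""
    ranked = sorted(returns)
    n_remove = len(ranked) * percent // 100
    return ranked[n_remove - 1]


def share_negative(returns):
    """Fraction of homes with a negative annualized return."""
    return sum(1 for r in returns if r < 0) / len(returns)


# Toy portfolio: only 1 of 20 homes (5%) lost money, so removing the worst
# 10% necessarily removes a home with a positive return.
toy = [-0.03, 0.01, 0.02, 0.03, 0.04, 0.04, 0.05, 0.05, 0.05, 0.06,
       0.06, 0.06, 0.07, 0.07, 0.08, 0.09, 0.10, 0.12, 0.15, 0.20]

cutoff = deselection_cutoff(toy)
mean_above_median = mean(toy) > median(toy)   # a rough sign of positive skew
```

Whenever the share of negative-return homes is below the deselection percentage, as in the PHX case described above, the cutoff lands in positive territory.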
Figure 4: Return comparison for assets sold in 2005 from the PHX and DAL portfolios.
Do the Case-Shiller indices provide any insight?
Let’s look back at the metro Case-Shiller indices. Comparing the two during the boom-bust offers some information regarding portfolio performance differences. The Phoenix metro had a huge run-up before the crisis, whereas Dallas remained relatively flat. Many homes in the portfolios did turn over during this time period. All else equal, the home sales in the increasing Phoenix market produced stronger cash flows in the early years that likely contributed to the higher overall IRR.
But can we simply take the index performance as a direct proxy for the individual homes? Figure 5 below shows how the individual homes fared compared to their respective metro index.
Figure 5: Performance of the individual homes in the portfolios compared to the respective metro index over the same holding period.
Again, we see the presence of extremely fat tails, hinting that the Case-Shiller index may not be a good proxy for a portfolio, even one with a few thousand assets. The distribution characteristics are similar to what we would find if we did a comparable analysis with equities (e.g. individual stock performances vs. S&P 500). We are frequently told that an individual stock carries much greater risks, but this is rarely mentioned in the context of an individual home.
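A comparison of this kind can be sketched as the home's total return minus the index's total return over the same holding period. Whether Figure 5 uses a difference, a ratio, or annualized figures is not stated, so the difference used here is an assumption, as are the function name and the example numbers.

```python
def excess_return(purchase_price, sale_price, index_at_purchase, index_at_sale):
    """Home's total return minus the metro index return over the same period."""
    home_return = sale_price / purchase_price - 1.0
    index_return = index_at_sale / index_at_purchase - 1.0
    return home_return - index_return


# Made-up example: the home gains 80% while the index rises from 100 to 150.
example_excess = excess_return(100_000, 180_000, 100.0, 150.0)
```

A fat-tailed distribution of this quantity across a portfolio is what would indicate that the index is a poor stand-in for any single home.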
How can we identify ones to exclude?
So far, this post has focused on the theoretical performance gain of a deselection strategy. To implement this in practice, we need to figure out how to identify the ~10% of homes to remove from the investment universe. Doing this a priori is the tricky part. This exercise is quite trivial when throwing out bad blueberries; houses are another story.
One factor to explore is location. Location undoubtedly explains some portion of price movement, as evidenced by the difference in the metro Case-Shiller indices above. There are geographic characteristics that would intuitively lead to out-performance, but can these alone indicate whether an individual home should be excluded from a portfolio?
The map below helps us visually evaluate this. Each dot represents an individual home in the portfolios. Gray dots represent homes excluded from the deselected portfolios (i.e. ones in the bottom performance decile of each metro).
Figure 6: Interactive map showing all homes held in the portfolio. Resulting deselection candidates are indicated as dark gray dots. Click the “Focus Dallas/Phoenix” button to switch the view to the other metro. Hover over the dots to show the total and compound annual growth rates of the home.
Inspecting the map, we see there is only slight visual correlation between location and likelihood of being in the bottom 10%. In addition, for areas in which there are several gray dots clustered together, you can find other neighboring homes that performed strongly. This implies that geographic location, and by extension neighborhood characteristics, cannot be relied on as the lone factor in determining properties to exclude. What else could we explore?
There is a long list of factors that we could evaluate. In addition to location and neighborhood characteristics, others that come to mind include property-specific attributes – bed, bath, gross living area, lot size, etc. – on an absolute and/or relative basis. For instance, do 6bd/4ba homes under-perform the market? Or, do homes with two or more bedrooms above or below the median bedroom count for an area under-perform? Is there a performance difference between new construction and resale homes? What about home condition and construction quality? Do amenities, such as pools and landscaping, add significant value? How do the location influences interact with these other features?
Building a proper model
This toy back-test needs to be repeated in a more rigorous manner in order to build a strategy that could be implemented in practice. This means challenging the simplifying assumptions and thinking through how we would properly “censor” data, such that the back-test does not use future data to make investment decisions. Some things we will consider:
- What are the appropriate models to use when evaluating the explanatory deselection factors? Our model(s) should prioritize intuition and feasibility given the discussed data constraints.
- Here, we looked at two metros in isolation. How do the results change with a geographically diversified portfolio? Do the important deselection factors differ, or have different levels of impact, across metros?
- We limited our scope to SFRs. What happens when we introduce other property types, such as condos and multi-families?
- How will we handle missing data? Current residential real estate data is spotty as it is. It gets sparser as you go further back in time.
- How do our results vary across different historical time frames and holding period lengths? This study did cover the housing boom-bust, but other market cycles warrant attention. It is well known that simulations in equities can be quite sensitive to the starting day/month/year. Is this the case in residential real estate? To what extent?
- Investing $1 in each home is impractical. How do we create an equal-weighted portfolio? If we must buy the homes outright, how does a market-weighted portfolio perform in the deselection framework? Is there another contract structure that we could utilize or create?
- What happens if we add leverage?
All of these could be considered in future research.
This post explored the feasibility of a deselection strategy in residential real estate. We created two toy portfolios composed of December 1999 home sales in Phoenix and Dallas, respectively, and assessed the portfolio performance boost when we excluded the bottom decile of homes, measured by annualized return. We found that a perfectly executed strategy, one in which we identify every home that is in the bottom 10%, can lead to a ~50-100 bps increase in IRR and a meaningful reduction in volatility for an un-levered portfolio. These results are less dramatic than those achieved by a similar strategy for equities, but that is not necessarily a surprise given the differences in market structure and historical individual asset price behavior between equities and residential real estate.
This study has yielded more questions than answers, but that is perhaps the point. In general, a simple toy study is useful to explore the broad strokes of a strategy before fully diving in. We now have a benchmark for how a deselection strategy would perform and a path for future research.
 It is understood that the exact date the homes are sold does not factor into the holding period. Holding periods are computed on a monthly time frequency, and any differences in returns from using a daily time frequency are deemed negligible.
 Property-level data and transactions used throughout this analysis are tax assessor and county recorder data, provided by First American Data Tree.
 Buyer and seller are acting in their own self-interest. Non-arm’s length sales may result in a sale price that is materially different than the true market price.
 You can find some clustering in each city, but no obvious correlation exists. Interestingly, some of that clustering is in new developments, where a large number of homes sold in December 1999.