
Andrew Knight

Global Data and Technology Lead, RICS

In part 2 of this series, we provide a cautionary tale about human-algorithm relations, told through the promise and pitfalls of the iBuyer real estate purchasing model.

Zillow: a cautionary tale in predicting house prices

To build its state-of-the-art AI models for property tech, Zillow attracted top talent through a multi-year competition that awarded one million dollars to the three-person winning team. Nima Shahbazi, a member of that winning team, later observed, rightly, that getting good accuracy from an AI model on a website is one thing, but making it work in the real world, and at scale, is quite another.

Predicting a house price can be difficult. In the US, the average price of a house runs to hundreds of thousands of dollars. At those prices, even a small error by an AI system, perhaps a few percentage points, can lead to losses of tens of thousands of dollars per house, and replicating the process at scale multiplies the losses accordingly.
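To make the scale of that risk concrete, here is a minimal back-of-the-envelope sketch in Python. The average price, error rate and purchase volume are hypothetical figures chosen for illustration, not Zillow’s actual numbers.

```python
# Illustrative arithmetic only: the price, error rate and volume are hypothetical.
average_price = 350_000           # assumed average house price in dollars
valuation_error = 0.03            # a 3% overestimate by the model
houses_bought_per_month = 1_000   # assumed purchase volume at scale

loss_per_house = average_price * valuation_error
monthly_loss = loss_per_house * houses_bought_per_month

print(f"Loss per house: ${loss_per_house:,.0f}")   # $10,500
print(f"Loss per month: ${monthly_loss:,.0f}")     # $10,500,000
```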

Predicting house prices is also a hard problem for AI to address, considering the number of direct and indirect factors that may affect them. As Zillow’s CEO Richard Barton later stated, ‘We’ve determined the unpredictability in forecasting home prices far exceeds what we anticipated’.


The renovation riddle

What Zillow was doing with Zestimate and iBuying was a lot harder and more complex than simply buying and selling houses: it was also renovating them.

Renovation is a significantly harder task for an AI to grasp than predicting a house price, because it involves a variety of industries, products (such as paint, flooring, ceilings, woodwork, bathroom fittings, electrics and lighting), labour costs and prices (which vary, for example, with range and quality). That complexity makes renovation a very hard problem for AI to solve: there is so much uncertainty in the data, and so many assumptions stacked on top of one another, that predictions struggle to hold. If any assumption fails (for example, that the renovation cost of a particular house is predictable, or that the house price following renovation is predictable), or any prediction by an AI system is wrong (for example, the final cost of the renovation), the entire system fails.
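One way to see why stacked assumptions are fragile is to treat each one as holding only with some probability. The probabilities below are invented purely for illustration, and treating them as independent is itself an assumption, but the compounding effect is the point: individually reasonable assumptions combine into a fragile whole.

```python
# Hypothetical probabilities that each modelling assumption holds; the end-to-end
# prediction is only reliable if every assumption holds, so (treating them as
# independent) the combined chance is the product of the individual ones.
assumptions = {
    "purchase valuation is accurate":     0.90,
    "renovation cost is predictable":     0.85,
    "renovation timeline holds":          0.85,
    "post-renovation resale price holds": 0.90,
}

combined = 1.0
for name, probability in assumptions.items():
    combined *= probability

print(f"Chance that every assumption holds: {combined:.1%}")  # about 58.5%
```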

Valuation, volatility and variability

A persistent criticism levelled at AVMs concerns their ability to predict future values. The greater the time interval between offering a price to a seller and achieving a sale to a buyer, the greater the risk that market conditions change; when market volatility is high, this risk increases. Although there’s an opportunity for higher margins in a rising market, margins can degrade or disappear very quickly in a down market.

iBuyer models therefore need to be stress-tested for volatile conditions and must have datasets that can help predict market movements with a reasonable level of confidence. Without these, using algorithms for anything other than ‘mark-to-market’ valuation for a specific date and time would be a high-risk strategy.

The ability of an AI model to accurately factor in the condition of the property is another area of risk. AVM processes typically do not include a physical inspection of the property. Data on the property’s condition can, to some degree, be gained from external images interpreted using AI. However, condition affects both the initial valuation and the cost to remedy or retrofit the property for resale, and an AVM’s ability to consider these factors accurately is far from assured.

With varying degrees of work required to prepare properties for resale, any modelling would also need to factor in the effects of supply and demand on the cost and time of renovation. The availability of labour, materials and machinery could all cause delays and cost overruns.

Four lessons

AVMs have the potential to deliver a high degree of accuracy at an aggregate level, while having considerably reduced accuracy at the individual property level. In the case of Zillow, the AVM may have predicted accurately in aggregate, but seller behaviour appears to have clustered on properties that were not favourable from Zillow’s perspective.
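A toy simulation can illustrate this adverse-selection effect. The figures below are invented and the behavioural rule is deliberately crude (sellers accept only offers above their home’s true worth), but it shows how a model that is unbiased in aggregate can still systematically overpay on the homes it actually buys.

```python
import random

random.seed(0)
n_homes = 100_000
errors_all, overpayments = [], []

for _ in range(n_homes):
    true_value = random.uniform(200_000, 600_000)
    # Unbiased model: zero-mean error of a few percent either way.
    offer = true_value * (1 + random.gauss(0, 0.05))
    errors_all.append(offer - true_value)
    # Crude behavioural rule: sellers accept only offers above true worth.
    if offer > true_value:
        overpayments.append(offer - true_value)

print(f"Average error across all homes:   ${sum(errors_all) / len(errors_all):,.0f}")
print(f"Average overpayment on purchases: ${sum(overpayments) / len(overpayments):,.0f}")
```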

What does this story tell us, and how can we use the undoubted power and utility of algorithms while also ensuring we manage these risks? We present four lessons in human-algorithm relations:

  1. Understand the capabilities, and crucially the limitations, of every algorithm. This includes understanding the models on which they are built, and the data that is used to train, drive, and calibrate them.
  2. Don’t allow ‘scope creep’, where algorithms and models are used for purposes for which they were not designed or developed.
  3. Keep challenging, benchmarking, and back-testing with real world data and outcomes.
  4. Make sure that the appropriate domain experts, in this case property professionals, work closely with the developers of algorithms and models right across the full lifecycle of initiation, design, development, operation, and governance.

If the relationship is not managed well, having a human in the loop can be problematic: the human may override the algorithm’s decision and so introduce human error and bias. Managed well, a human in the loop provides the necessary vigilance and assurance that the algorithms are delivering the desired outcomes, and that risks are being managed and mitigated. The human-algorithm relationship should involve delegation, rather than abdication, of decisions and predictions to an algorithm.

[1] Ali is a doctoral researcher in Artificial Intelligence (AI) at the UCL Institute of Education (IOE). His research focuses on the transparency of machine learning development pipelines for AI-powered products in different real-world contexts. He specialises in auditing and evaluating the stages of the AI development process, including data collection and engineering, machine learning modelling and testing, and deployment and iterative improvement. He has also worked as an AI consultant, helping companies develop, prototype and deploy AI tools in sectors including financial services, education, healthcare and agri-tech.