Zillow Uses Analytics, Machine Learning To Disrupt With Data


Industry disruptor Zillow leverages data about residential real estate and makes it available to the general public. The company's senior director of data science and engineering shares the secrets behind Zillow's data stack.

Residential real estate site Zillow stormed onto the market in the 2000s, letting consumers check the estimated value of their own homes, and those of their friends, family members, and acquaintances, much to the dismay of real estate professionals.

Zillow was founded by a couple of former Microsoft executives who had earlier started travel site Expedia. When it debuted in 2006, the site threatened to disrupt the real estate market by giving people access to information that had previously been available only through real estate pros.

Ten years later, Zillow has proven it has staying power. Built on the idea of ingesting, processing, and serving data from multiple sources to consumers, the company has made a name for itself with its "Zestimate" -- its secret data-driven formula for predicting the value of a piece of real estate. But none of this happens without a sophisticated IT department and data operation behind the scenes.

Jasjeet Thind, senior director of data science and engineering at Zillow, says that the Zestimate is one of the ways Zillow uses machine learning. Thind described it as the first available home valuation model. Behind the scenes it is composed of hundreds of models -- linear models, decision trees, deep learning, and more -- that together predict values for every single home in the country, he said.

Thind gave IT and data professionals an inside view of what is under the hood at Zillow during a presentation at September's Strata + Hadoop event in New York.

"Zillow Group's mission is to build the largest, most trusted, and vibrant home-related marketplace," he said during the session. Zillow Group refers to the company that Zillow has grown into in the decade since its launch. Now a publicly held company, Zillow owns several brands, including Trulia, HotPads, StreetEasy, Naked Apartments, Mortech, dotloop, and Retsly.

Thind said that Zillow operates a data lake composed of data from all those brands. It also gets data from counties, the MLS, real estate brokers, and directly from users via the "Claim Your Home" feature. Thind said that Zillow's ability to get updated information directly from homeowners is one of its key competitive edges.

Data obtained from government records can be tricky and not very glamorous to ingest. Some of this property data arrives as JPG images, while other data is typed text. Thind said that Zillow leverages OCR technology in its ingestion process to help optimize costs. Because the data can be input faster, the system also improves user experience.

Ensuring data quality is a big topic at Zillow, Thind said. Public records data comes in many different formats, and the company employs a data analyst whose full-time job is to ensure data quality. Zillow uses trend detection to look for anomalies in the number of sales transactions.
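As a rough illustration of what trend detection on transaction counts might look like, the Python sketch below flags any period whose sales count strays far from the median of the periods just before it. The function name, window size, threshold, and sample figures are all assumptions made for illustration; Thind did not describe Zillow's actual method.

```python
from statistics import median

def flag_count_anomalies(counts, window=6, threshold=3.5):
    """Return indices whose count deviates sharply from the trailing window.

    Hypothetical helper: compares each count with the median of the
    previous `window` counts, scaled by the median absolute deviation.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        center = median(history)
        # Median absolute deviation: a spread estimate that outliers barely move.
        mad = median(abs(c - center) for c in history)
        # Guard against a zero spread when the history is perfectly flat.
        scale = max(mad, 1.0)
        if abs(counts[i] - center) / scale > threshold:
            flagged.append(i)
    return flagged

# Made-up monthly sales counts for one county; index 6 looks like a bad feed.
monthly_sales = [410, 395, 402, 420, 398, 405, 12, 407, 415]
print(flag_count_anomalies(monthly_sales))
```

A median-based baseline is used here so that one bad month does not distort the comparison for the months that follow it.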

There are also checks at the data field level, looking for listings that have, for example, 30,000 bedrooms. Zillow also flags certain types of transactions, such as foreclosures, because these deals are not used in the Zestimate calculations.
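A field-level check of this kind can be sketched in a few lines of Python. The record layout, the bedroom ceiling, and the issue labels below are hypothetical and chosen only for illustration; the article does not say how Zillow's own rules are written.

```python
from dataclasses import dataclass

# Hypothetical plausibility ceiling; an illustrative value, not Zillow's.
MAX_BEDROOMS = 50
# Hypothetical labels for transaction types held out of valuation.
EXCLUDED_SALE_TYPES = {"foreclosure"}

@dataclass
class SaleRecord:
    property_id: str
    bedrooms: int
    sale_type: str

def check_record(record):
    """Return a list of issue labels for one record (empty if it looks clean)."""
    issues = []
    # Field-level sanity check: reject negative or absurdly large counts.
    if not 0 <= record.bedrooms <= MAX_BEDROOMS:
        issues.append("implausible_bedrooms")
    # Flag transaction types that should not feed the valuation models.
    if record.sale_type.lower() in EXCLUDED_SALE_TYPES:
        issues.append("excluded_from_valuation")
    return issues

print(check_record(SaleRecord("A-1", 30000, "standard")))
print(check_record(SaleRecord("B-2", 3, "Foreclosure")))
```

Returning labels rather than dropping records outright keeps the flagged rows available for review by a data analyst.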

Zillow's technology platform includes Apache Spark. The company also uses Redis and Python for real-time scoring. Zillow taps AWS S3 for cloud storage and relies on AWS Redshift and Presto for its data warehouse. Thind said Zillow specifically turns to Presto when looking at historical data.

Beyond the Zestimate, Zillow provides other numbers to its audience, such as a Turbo Zestimate and a "hot homes" designation (which predicts how fast a home will sell). Many of these figures are based on Zillow's Zestimate calculation.

Zillow has also invested in predicting the preferences of its consumer users through personalization and search. Thind said Zillow uses different kinds of user vectors depending upon how sparse the signals are for a particular user.  
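One way to picture the idea of switching user vectors based on signal sparsity is the Python sketch below. It assumes, purely for illustration, that a user with few recorded home views falls back to a shared segment-level vector, while a user with enough views gets the average of the feature vectors of the homes they viewed. The cutoff, function names, and feature layout are hypothetical; Thind did not detail the vectors Zillow actually uses.

```python
# Hypothetical cutoff below which a user's own signals count as too sparse.
MIN_EVENTS = 5

def mean_vector(vectors):
    """Element-wise mean of equal-length numeric vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def user_vector(viewed_home_vectors, segment_vector, min_events=MIN_EVENTS):
    """Pick a vector for a user and report which kind was chosen."""
    if len(viewed_home_vectors) < min_events:
        # Sparse signals: fall back to a vector shared by similar users.
        return list(segment_vector), "segment"
    # Dense signals: summarize the homes this user actually viewed.
    return mean_vector(viewed_home_vectors), "personal"

# Made-up features: [bedrooms, school rating]
segment = [3.0, 7.0]
print(user_vector([[2.0, 9.0]], segment))
print(user_vector([[4.0, 8.0]] * 5, segment))
```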

Users who share their email address with Zillow can get recommendations for homes they would like, based on what they've searched for in the past. Zillow may also send these users personalized collections of homes based on what factors seem important to the users, such as good school districts.

For the data pros in the audience, Zillow offers a special gift. The company publishes a small selection of data sets on its website that users can download. They are at Zillow.com/data.
