Zillow Uses Analytics, Machine Learning To Disrupt With Data


Industry disruptor Zillow leverages data about residential real estate and makes it available to the general public. The company's senior director of data science and engineering shares the secrets behind Zillow's data stack.

Residential real estate site Zillow stormed onto the market in the 2000s, letting consumers check the property value of their own homes as well as those of friends, family members, and acquaintances, much to the dismay of real estate professionals.

Founded by a couple of former Microsoft executives who had earlier started travel site Expedia, Zillow threatened to disrupt the real estate market when it debuted in 2006. It gave people access to information that had previously been available only through real estate pros.

Ten years later, Zillow has proven it has staying power. Built on the idea of ingesting, processing, and serving data from multiple sources to consumers, the company has made a name for itself with its "Zestimate" -- its secret data-driven formula for predicting the value of a piece of real estate. But none of this happens without a sophisticated IT department and data operation behind the scenes.

Jasjeet Thind, senior director of data science and engineering at Zillow, says that the Zestimate is one of the ways Zillow uses machine learning. According to Thind, it was the first available home valuation model, and it is composed of hundreds of models behind the scenes -- linear models, decision trees, deep learning, and more -- that predict values for every single home in the country.

Thind gave IT and data professionals an inside view of what is under the hood at Zillow during a presentation at September's Strata + Hadoop event in New York.

"Zillow Group's mission is to build the largest, most trusted, and vibrant home-related marketplace," he said during the session. Zillow Group refers to the company that Zillow has grown into in the decade since its launch. Now a publicly held company, Zillow owns several brands, including Trulia, HotPads, StreetEasy, Naked Apartments, Mortech, dotloop, and Retsly.

Thind said that Zillow operates a data lake composed of data from all those brands. It also gets data from counties, the MLS, real estate brokers, and directly from users via the "Claim Your Home" feature. Thind said that Zillow's ability to get updated information directly from homeowners is one of its key competitive edges.

Data obtained from government records can be tricky and not very glamorous to ingest. Some of this property data is in JPG form, while other data is typed text. Thind said that Zillow leverages OCR technology in its ingestion process to help optimize costs. Because the data can be input faster, the system also improves user experience.

Ensuring data quality is a big topic at Zillow, Thind said. Public records data comes in many different formats, and the company employs a data analyst whose full-time job is to ensure data quality. Zillow uses trend detection to look for anomalies in the number of sales transactions.
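The article does not say how Zillow's trend detection works. As a rough illustration of the general idea only, the sketch below flags a period whose sales count deviates sharply from a trailing window, using a median and median absolute deviation (MAD) score. The function name, the window size, the threshold, and the sample counts are all assumptions made for this example.

```python
from statistics import median


def flag_anomalous_counts(counts, window=6, threshold=3.5):
    """Return indices whose count deviates sharply from the trailing window.

    Illustrative only: window and threshold are assumed values,
    not anything described by Zillow.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        center = median(history)
        mad = median([abs(x - center) for x in history])
        # Guard against a zero MAD when the history is perfectly flat.
        scale = mad if mad > 0 else 1.0
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(counts[i] - center) / scale
        if score > threshold:
            flagged.append(i)
    return flagged


# Made-up monthly sales counts for one county; index 7 is a sudden drop.
monthly_sales = [410, 395, 402, 420, 415, 408, 412, 35, 405, 418]
print(flag_anomalous_counts(monthly_sales))  # prints [7]
```

A median-based score is used here because a single bad month in the trailing window barely shifts the baseline, so the months after the drop are not flagged by mistake.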

There are also checks at the data field level, looking for listings that have, for example, 30,000 bedrooms. Zillow also flags certain types of transactions, such as foreclosures, because these deals are not used in the Zestimate calculations.
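A minimal sketch of what field-level checks and transaction flagging of this kind might look like follows. The field names, the plausibility bounds, and the `sale_type` label are hypothetical, chosen for illustration; the article does not describe Zillow's actual rules or schema.

```python
# Hypothetical plausibility bounds per field (assumed, not Zillow's real limits).
FIELD_BOUNDS = {
    "bedrooms": (0, 50),
    "bathrooms": (0, 50),
    "sqft": (50, 200_000),
}

# Transaction types kept out of valuation inputs; the article names foreclosures.
EXCLUDED_SALE_TYPES = {"foreclosure"}


def validate_record(record):
    """Return a list of human-readable issues found in one listing record."""
    issues = []
    for field, (low, high) in FIELD_BOUNDS.items():
        value = record.get(field)
        if value is None:
            continue  # A missing field is not treated as an error in this sketch.
        if not low <= value <= high:
            issues.append(f"{field}={value} outside [{low}, {high}]")
    return issues


def usable_for_valuation(record):
    """True if the record passes field checks and is not an excluded sale type."""
    return not validate_record(record) and record.get("sale_type") not in EXCLUDED_SALE_TYPES


listing = {"bedrooms": 30000, "bathrooms": 2, "sqft": 1800, "sale_type": "standard"}
print(validate_record(listing))       # prints ['bedrooms=30000 outside [0, 50]']
print(usable_for_valuation(listing))  # prints False
```

Keeping the field checks separate from the sale-type flag lets a foreclosure with perfectly valid fields still be stored and displayed while being left out of valuation inputs.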

Zillow's technology platform includes Apache Spark. The company also uses Redis and Python for real-time scoring. Zillow taps AWS S3 for cloud storage and relies on AWS Redshift and Presto for its data warehouse. Thind said Zillow specifically turns to Presto when looking at historical data.

Beyond the Zestimate, Zillow provides other numbers to its audience, such as a Turbo Zestimate and a "hot homes" designation (which predicts how fast a home will sell). Many of these figures are based on Zillow's Zestimate calculation.

Zillow has also invested in predicting the preferences of its consumer users through personalization and search. Thind said Zillow uses different kinds of user vectors depending upon how sparse the signals are for a particular user.  
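Thind did not detail how these user vectors are built. One common way to handle sparse signals, shown here purely as an assumption, is to average the feature vectors of homes a user has viewed when there are enough of them, and to blend toward a fallback vector (for example, an average over similar users) when there are few. The function names, the `min_signals` cutoff, and the two-element feature vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_user_vector(viewed_home_vectors, fallback_vector, min_signals=5):
    """Pick a user representation based on how much signal is available.

    Illustrative only: min_signals and the linear blend are assumptions.
    """
    count = len(viewed_home_vectors)
    if count == 0:
        # No signal at all: use the fallback (e.g. a segment-level average).
        return list(fallback_vector)
    own = mean_vector(viewed_home_vectors)
    if count >= min_signals:
        # Enough signal: rely on the user's own behavior.
        return own
    # Sparse signal: weight the user's own vector by how much data exists.
    weight = count / min_signals
    return [weight * o + (1 - weight) * f for o, f in zip(own, fallback_vector)]


# One viewed home, four needed for full trust: 25% own vector, 75% fallback.
print(build_user_vector([[4.0, 0.0]], [0.0, 4.0], min_signals=4))  # prints [1.0, 3.0]
```

The blend means a new user's recommendations shift gradually from the generic fallback toward their own behavior as more signals arrive, rather than switching abruptly at a cutoff.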

Users who share their email address with Zillow can get recommendations for homes they would like, based on what they've searched for in the past. Zillow may also send these users personalized collections of homes based on what factors seem important to the users, such as good school districts.

For the data pros in the audience, Zillow offers a special gift. The company publishes a small selection of data sets on its website that users can download. They are at Zillow.com/data.
