Data Science Challenge: Predict Restaurant Health Scores with Yelp Data

Yelp connects people with local businesses, and along the way we’ve gathered rich data about customers’ experiences at those businesses via reviews, tips, check-ins and business attributes. We are constantly asking ourselves how the collective wisdom of Yelpers can be used to help society. A couple of years ago we began working with cities to share restaurant health scores on Yelp, but that’s just the beginning. Could we use Yelp’s reviews and business information to make the process of sending health inspectors to restaurants more efficient? We think so, and we are challenging data scientists worldwide to design a health inspection prediction algorithm using Yelp data.

Yelp is co-sponsoring a new data science contest, “Keeping it Fresh,” in collaboration with the City of Boston and Harvard University economists (Ed, Andrew, Scott, and Mike). Using Yelp’s data for restaurants, food and nightlife businesses in Boston, together with the history of past health inspections, we are asking contestants to predict the health score that will be assigned to a business at its next health inspection.


According to the Centers for Disease Control, more than 48 million Americans per year become sick from food, and an estimated 75 percent of the outbreaks came from food prepared by caterers, delis, and restaurants. Currently, inspectors are sent to restaurants in a mostly random fashion. Because cities have only a limited number of health inspectors, their time is quite often spent on spot checks at clean, rule-abiding restaurants. It also means that restaurants with poor health and safety records are sometimes discovered too late.

It turns out that with Yelp’s data, cities can drastically improve the process of assigning health inspectors. A research study by Professor Michael Luca from Harvard Business School, Professor Yejin Choi from Stony Brook University, and their graduate students found that a model built using Yelp’s review data and past health inspection records is able to successfully predict future inspection scores for restaurants 82 percent of the time.

So the gauntlet has been thrown. Data scientists of the world – can you beat 82 percent?

Winning algorithms will be awarded financial prizes, but the real prize is the opportunity to help the City of Boston, which is committed to examining ways to integrate the winning algorithm into its day-to-day inspection operations.

Read about how Yelp engineers have tried to crack this case on the Yelp Engineering Blog, and check out the contest page for all the rules and juicy details. Submissions close on July 7, 2015.