Finding the Best Restaurants in Mumbai & Bangalore (Using Statistics)
Backstory
Recently, I came across an intriguing article by Matt Sayar, where he ranked all the restaurants in Colorado. Inspired by his approach, I decided to create a similar ranking for Mumbai and Bangalore.
To keep the task manageable, I introduced a small twist—I excluded bakeries and fast-food joints due to the sheer number of restaurants in both cities.
Let’s get Data
The first and most crucial step was gathering data. I considered scraping Zomato and Swiggy, but that required too much effort. Scraping Google was another option, but the risk of getting banned made it unappealing. So, I took the easy route—entered my credit card details on Google Cloud Platform (GCP) and gained access to the APIs.
After skimming the documentation for five minutes, I got a basic understanding of how to use the Places API (New). Yes, that’s the actual name—Google didn’t go with "V2," just "New."
Within the Places API (New), the piece we need is Nearby Search, which returns the places inside a circle defined by a center coordinate and a radius.
My first API request returned overwhelming data—far more than I needed. A bit more reading revealed that I could filter results using the X-Goog-FieldMask header, allowing me to request only the necessary fields.
shell
curl --location 'https://places.googleapis.com/v1/places:searchNearby' \
--header 'Content-Type: application/json' \
--header 'X-Goog-Api-Key:Your_API_KEY' \
--header 'X-Goog-FieldMask: places.displayName,places.primaryTypeDisplayName.text,places.rating,places.id,places.shortFormattedAddress,places.userRatingCount,places.location,places.googleMapsUri' \
--data '{
  "includedTypes": ["restaurant"],
  "maxResultCount": 9999,
  "locationRestriction": {
    "circle": {
      "center": {
        "latitude": 12.971599,
        "longitude": 77.594566
      },
      "radius": 500.0
    }
  }
}'
The Relevant Fields
places.displayName
places.primaryTypeDisplayName.text
places.rating
places.id
places.shortFormattedAddress
places.userRatingCount
places.location
places.googleMapsUri
I also tried setting maxResultCount to 9999, thinking I had found a shortcut. But Google promptly responded with:
"Max number of place results to return must be between 1 and 20 inclusively."
So, I had to put in some real work. But in a world full of LLMs—where ChatGPT can code better than the average new grad—this "hard work" was just about crafting the right prompt. Back in 2020, I would’ve spent an entire day writing an algorithm. Thanks to GPT, it took just 20 minutes.
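For anyone scripting the request rather than pasting curl commands, here is a minimal Python sketch of the same call using only the standard library. The endpoint, headers, and body fields are the ones from the curl example; the function name `build_nearby_request` and the decision to clamp `maxResultCount` into the 1 to 20 range are assumptions made for illustration, not something Google's tooling does for you.

```python
import json
import urllib.request

NEARBY_URL = "https://places.googleapis.com/v1/places:searchNearby"
FIELD_MASK = ",".join([
    "places.displayName",
    "places.primaryTypeDisplayName.text",
    "places.rating",
    "places.id",
    "places.shortFormattedAddress",
    "places.userRatingCount",
    "places.location",
    "places.googleMapsUri",
])


def build_nearby_request(api_key, lat, lng, radius_m=500.0, max_results=20):
    """Build (but do not send) a Nearby Search request for one circle.

    Hypothetical helper: the name and the clamping are illustrative choices.
    """
    # The API rejects values outside 1..20, so clamp instead of failing.
    max_results = max(1, min(20, max_results))
    body = {
        "includedTypes": ["restaurant"],
        "maxResultCount": max_results,
        "locationRestriction": {
            "circle": {
                "center": {"latitude": lat, "longitude": lng},
                "radius": radius_m,
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        "X-Goog-FieldMask": FIELD_MASK,
    }
    return urllib.request.Request(
        NEARBY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_nearby_request("YOUR_API_KEY", 12.971599, 77.594566, max_results=9999)
print(json.loads(request.data)["maxResultCount"])  # 20, because of the clamp
# Sending it would be: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the payload before spending any API credits.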
How I Collected the Data
To systematically gather restaurant data across the city, I used a grid-based search approach. Since the Places API caps results at 20 per query, I drew a bounding box around the city (min/max latitude and longitude) and generated grid points inside it, roughly 1000 m apart. Querying the API at each point breaks the city into many small search circles, so far fewer restaurants get cut off by the 20-result cap than with a single large search, and the whole box gets covered in one pass.
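The grid idea can be sketched in a few lines of Python. This is not the script I ran, just a minimal version under stated assumptions: the names `generate_grid` and `covering_radius` are hypothetical, and the metres-per-degree constant is a common approximation that is good enough at city scale.

```python
import math

# Approximate length of one degree of latitude in metres (assumption).
METERS_PER_DEGREE_LAT = 111_320


def generate_grid(min_lat, max_lat, min_lng, max_lng, spacing_m=1000.0):
    """Return (lat, lng) points roughly spacing_m apart inside the box."""
    mid_lat = (min_lat + max_lat) / 2
    lat_step = spacing_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with latitude, so scale by cos(latitude).
    lng_step = spacing_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(mid_lat)))
    n_rows = int((max_lat - min_lat) / lat_step) + 1
    n_cols = int((max_lng - min_lng) / lng_step) + 1
    return [
        (round(min_lat + i * lat_step, 6), round(min_lng + j * lng_step, 6))
        for i in range(n_rows)
        for j in range(n_cols)
    ]


def covering_radius(spacing_m):
    """Smallest circle radius that leaves no gaps on a square grid."""
    return spacing_m / math.sqrt(2)


# Example box roughly around central Bangalore (illustrative coordinates).
points = generate_grid(12.90, 13.00, 77.55, 77.65)
print(len(points), "grid points; first:", points[0])
print("gap-free radius for 1000 m spacing:", round(covering_radius(1000), 1))
```

One thing worth checking with a square grid: circles of radius r only leave no gaps when r is at least the spacing divided by √2, so a 1000 m grid needs a search radius of about 707 m or more.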
However, this method introduced another challenge—overlapping search areas led to duplicate entries. Since multiple grid points covered the same restaurants, some results appeared more than once. I had to remove duplicate restaurants based on their unique Google Place ID to clean this up.
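Deduplicating on the place ID is a one-pass job. A minimal sketch, assuming each result is a dict carrying the `id` field requested through `places.id` (the function name is hypothetical, and the sample records are made up):

```python
def dedupe_by_place_id(places):
    """Keep the first occurrence of each Google Place ID, preserving order."""
    unique = {}
    for place in places:
        # setdefault only stores the place if this ID has not been seen yet.
        unique.setdefault(place["id"], place)
    return list(unique.values())


# Made-up results from two overlapping grid cells.
cell_a = [{"id": "abc", "rating": 4.5}, {"id": "def", "rating": 4.1}]
cell_b = [{"id": "def", "rating": 4.1}, {"id": "ghi", "rating": 3.9}]
merged = dedupe_by_place_id(cell_a + cell_b)
print([place["id"] for place in merged])
```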
With this refined approach, I ran the script. Within 10 minutes, I had data on 4,325 restaurants in Mumbai and 6,341 restaurants in Bangalore.
Data Cleaning:
Removed bakeries and fast-food joints
Filtered out places with 0 reviews
Eliminated duplicate entries caused by overlapping search areas
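The first two filters can be sketched as follows. The field names match the field mask used in the request, but the exact type labels to exclude ("Bakery", "Fast food restaurant") are assumptions; check the labels that actually appear in your own data before relying on them. The function name and sample records are made up.

```python
# Assumed display labels, compared in lower case.
EXCLUDED_TYPES = {"bakery", "fast food restaurant"}


def clean_places(places):
    """Drop excluded place types and places without any reviews."""
    cleaned = []
    for place in places:
        type_label = place.get("primaryTypeDisplayName", {}).get("text", "")
        if type_label.lower() in EXCLUDED_TYPES:
            continue
        # A missing count is treated the same as zero reviews.
        if place.get("userRatingCount", 0) == 0:
            continue
        cleaned.append(place)
    return cleaned


sample = [
    {"id": "1", "primaryTypeDisplayName": {"text": "Bakery"}, "userRatingCount": 120},
    {"id": "2", "primaryTypeDisplayName": {"text": "Indian restaurant"}, "userRatingCount": 0},
    {"id": "3", "primaryTypeDisplayName": {"text": "Indian restaurant"}, "userRatingCount": 85},
    {"id": "4", "userRatingCount": 12},
]
print([place["id"] for place in clean_places(sample)])
```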
After all these efforts (made by GPT), I was left with this:
Mumbai | Bangalore | Total
1,886 restaurants | 2,990 restaurants | 4,876 restaurants
The entire data collection process cost me ₹1,150, but it was fully waived since it fell within my free credits.
Ranking the Restaurants
Now comes the fun part: how do you rank restaurants based on star ratings?
Most people would prefer a restaurant with 2,350 reviews and a 4.9 rating over one with 25 reviews and a perfect 5.0. Clearly, raw ratings alone don’t tell the whole story.
After some research, I came across the Wilson score interval. Instead of simply sorting restaurants by their average rating (which can be misleading), ranking by the lower bound of this interval helps identify the most reliably loved restaurants: those that are not just highly rated but also backed by a significant number of reviews.
How Does It Work?
Unlike simple averages, Wilson's scoring penalizes items with fewer ratings. This prevents situations where a 5-star restaurant with only 3 reviews ranks higher than a 4.8-star restaurant with 10,000 reviews.
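The Wilson Score Formula:
$$\text{lower bound} = \frac{p + \frac{z^2}{2n} - z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$$
p = positive ratings / total ratings
n = total number of ratings
z = z-score for the chosen confidence level (1.96 for 95% confidence)
This lower bound acts as the Wilson score and is used to rank items.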
If you are interested in how Wilson arrived at this formula, you can read up on its derivation.
Automating the Ranking Process
Once again, my best friend (GPT) helped me whip up a Python script. I simply passed the JSON data, and it ranked the restaurants using Wilson's formula.
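A ranking script along these lines might look like the sketch below. It is not the exact script GPT produced: the function names, the `wilsonScore` key, and the `rating / 5` mapping are assumptions, and the sample JSON is made up. The field names (`rating`, `userRatingCount`, `displayName.text`) follow the field mask used in the API request, and the scoring function is restated so the block runs on its own.

```python
import json
import math


def wilson_lower_bound(p, n, z=1.96):
    """Wilson score lower bound for proportion p over n ratings."""
    if n == 0:
        return 0.0
    z2 = z * z
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (p + z2 / (2 * n) - margin) / (1 + z2 / n)


def rank_places(places, max_stars=5.0):
    """Attach a score to every place and sort from best to worst."""
    scored = []
    for place in places:
        n = place.get("userRatingCount", 0)
        p = place.get("rating", 0.0) / max_stars  # assumed star-to-proportion mapping
        scored.append({**place, "wilsonScore": wilson_lower_bound(p, n)})
    return sorted(scored, key=lambda item: item["wilsonScore"], reverse=True)


# Made-up sample in the shape returned by the API.
sample_json = """[
  {"id": "a", "displayName": {"text": "Tiny Perfect Cafe"}, "rating": 5.0, "userRatingCount": 3},
  {"id": "b", "displayName": {"text": "Crowd Favourite"}, "rating": 4.8, "userRatingCount": 10000},
  {"id": "c", "displayName": {"text": "Solid Local Spot"}, "rating": 4.4, "userRatingCount": 400}
]"""

ranked = rank_places(json.loads(sample_json))
for position, place in enumerate(ranked, start=1):
    print(position, place["displayName"]["text"], round(place["wilsonScore"], 3))
```

In this sample the 4.8-star place with 10,000 reviews comes out on top and the 5-star place with only 3 reviews drops to the bottom.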
That’s it! My dataset was ready. Now, all I needed was a decent interface where everyone could explore what I had built, so I put one together in React.
Conclusion
Surprisingly, many of the top-ranked restaurants were ones I had never even heard of, while some of my personal favourites didn’t even make the top 50. This just goes to show how many gems are still waiting to be discovered.
I’m not familiar with the restaurant scene in Bangalore, so I’d love to hear your thoughts—how well did the algorithm rank them? Drop a comment and let me know!
Thanks to LLMs, building things has become ridiculously easy. With a bit of debugging and problem-solving, you can create just about anything today.