Considered one of the most popular domains for business directories in the US, Yelp contains valuable company details, including addresses, emails, phone numbers, and reviews. But what's the most efficient way to extract these data?
In this guide, we'll take an extensive look at the Yelp API, its key features, pricing, and limitations. Furthermore, we'll discuss potential alternatives. Let's dig in!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect and here's a good summary of what not to do:
Scrapfly does not offer legal advice but these are good general rules to follow in web scraping
and for more you should consult a lawyer.
The Yelp API is a product provided by Yelp for developers to automate certain actions or extract specific business data. Yelp APIs come with different endpoints, covering a wide range of workflows and resources.
In terms of data extraction, the Yelp API offers the following popular features:
Note that the Yelp API capabilities aren't limited to the above features. Refer to the official Yelp API documentation for more.
Yelp includes thousands of business details and millions of related reviews across different sectors and industries. This makes the queries required to find specific businesses quite complex. Therefore, Yelp has introduced the Fusion API.
The Fusion API provides an easier business search experience to find the best matching results powered by an AI chat interface. It answers prompted queries and categorizes them into business questions or search results. Then, it retrieves the relevant results with related information, reviews, photos, and more.
For further details on Yelp Fusion API, refer to the official introductory tutorial.
Yelp has information and review data for millions of businesses across various sectors and industries. Extracting this data using the Yelp API enables different use cases, including:
Market Research
Navigating business services on Yelp allows business owners to evaluate their offerings based on the current market trend, which supports decision-making and helps businesses remain competitive.
Lead Generation
Yelp is considered one of the largest business directories in the US, and its API enables easy retrieval of contact information. Details like names, addresses, phone numbers, and emails make building outreach and marketing campaigns easier.
Sentiment Analysis
Using third-party tools and software has made it much easier to utilize LLMs for RAG applications and sentiment analysis models. Hence, extracting data from Yelp reviews is an excellent way to train these language models for context-aware applications.
For further details, have a look at our introduction to web scraping use cases.
In the following sections, we'll explore using the Yelp API for data extraction. We'll cover each related resource endpoint and the core parameters.
One of the most popular Yelp API endpoints is used to search for businesses. The generic search endpoint retrieves basic business data based on the provided search query. Below is the search API endpoint schema:
curl --request GET \
--url 'https://api.yelp.com/v3/businesses/search?sort_by=best_match&limit=20' \
--header 'Authorization: Your Yelp API key' \
--header 'accept: application/json'
The business search endpoint accepts different URL parameters to refine and narrow down the retrieved results. Below are the most common query parameters:
Parameter | Type | Description |
---|---|---|
term | string | Search term to use |
sort_by | string | Sorting algorithm to use |
categories | []string | Categories to filter search results by |
location | string | Geographic area to filter search results by |
latitude | number | Latitude of the location to search from |
longitude | number | Longitude of the location to search from |
price | []integer | Pricing levels to filter the search results by |
attributes | []string | Business attributes to filter by |
limit | integer | Number of results to retrieve |
offset | integer | Pagination cursor to start from |
The table above lists the commonly used query parameters for business search. Below is an example of the business details retrieved:
{
"businesses": [
{
"alias": "golden-boy-pizza-hamburg",
"categories": [
{
"alias": "pizza",
"title": "Pizza"
},
{
"alias": "food",
"title": "Food"
}
],
"coordinates": {
"latitude": 41.7873382568359,
"longitude": -123.051551818848
},
"display_phone": "(415) 982-9738",
"distance": 4992.437696561071,
"id": "QPOI0dYeAl3U8iPM_IYWnA",
"image_url": "https://yelp-photos.yelpcorp.com/bphoto/b0mx7p6x9Z1ivb8yzaU3dg/o.jpg",
"is_closed": true,
"location": {
"address1": "James",
"address2": "Street",
"address3": "68M",
"city": "Los Angeles",
"country": "US",
"display_address": ["James", "Street", "68M", "Los Angeles, CA 22399"],
"state": "CA",
"zip_code": "22399"
},
"name": "Golden Boy Pizza",
"phone": "+14159829738",
"price": "$",
"rating": 4,
"review_count": 903,
"transactions": ["restaurant_reservation"],
"url": "https://www.yelp.com/biz/golden-boy-pizza-hamburg?adjust_creative=XsIsNkqpLmHqfJ51zfRn3A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=XsIsNkqpLmHqfJ51zfRn3A",
"business_hours": {
"open": [
{
"is_overnight": false,
"start": 15,
"end": 130,
"day": 0
},
{
"is_overnight": false,
"start": 630,
"end": 1730,
"day": 1
},
{
"is_overnight": false,
"start": 45,
"end": 500,
"day": 2
}
],
"hours_type": "REGULAR",
"is_open_now": false
}
}
],
"region": {
"center": {
"latitude": 37.76089938976322,
"longitude": -122.43644714355469
}
},
"total": 6800
}
Each request to the Yelp API for business search retrieves the matching businesses, with a maximum of 240 entities in each call. However, the results don't include business reviews.
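As a rough sketch of how the query parameters listed earlier combine into a request, the snippet below assembles the search URL and headers without sending anything. The helper names `build_search_request` and `page_offsets` are illustrative assumptions, and the `Authorization` value is passed in as-is, mirroring the placeholder used in the curl example rather than asserting a specific token format.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# endpoint taken from the curl example above
SEARCH_ENDPOINT = "https://api.yelp.com/v3/businesses/search"


def build_search_request(
    auth_value: str,
    term: str,
    location: str,
    sort_by: str = "best_match",
    limit: int = 20,
    offset: int = 0,
) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair for one business search call."""
    params = {
        "term": term,
        "location": location,
        "sort_by": sort_by,
        "limit": limit,
        "offset": offset,
    }
    headers = {"Authorization": auth_value, "accept": "application/json"}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}", headers


def page_offsets(total: int, limit: int) -> List[int]:
    """List the offset value for each page needed to cover `total` results."""
    return list(range(0, total, limit))


url, headers = build_search_request("Your Yelp API key", "pizza", "San Francisco, CA")
print(url)
print(page_offsets(total=50, limit=20))
```

The returned pair can be handed to any HTTP client. Walking through `page_offsets` and changing only `offset` between calls is one simple way to paginate with the `limit` and `offset` parameters.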
The business details endpoint retrieves the detailed business content. It follows the below schema:
curl --request GET \
--url https://api.yelp.com/v3/businesses/business_id_or_alias \
--header 'Authorization: Your Yelp API key' \
--header 'accept: application/json'
To retrieve a specific business's details, this API endpoint accepts either the business ID or its alias as the business_id_or_alias path parameter.
As for the endpoint query parameters, they are limited to the below properties:
Parameter | Type | Description |
---|---|---|
locale | string | Locale code in the language and country code format |
device_platform | string | The platform to use for the mobile_link property |
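To make the path and query parameters concrete, here is a minimal sketch that builds the business details URL. The function name `build_business_url` is hypothetical, and the `en_US` locale value is only an illustrative guess at the language and country code format.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# base path taken from the curl example above
BUSINESS_ENDPOINT = "https://api.yelp.com/v3/businesses"


def build_business_url(
    business_id_or_alias: str,
    locale: Optional[str] = None,
    device_platform: Optional[str] = None,
) -> str:
    """Build the details URL for a business ID or alias."""
    # escape the identifier so it is safe to use as a single path segment
    path = quote(business_id_or_alias, safe="")
    # only include the optional query parameters that were provided
    optional = {"locale": locale, "device_platform": device_platform}
    params = {key: value for key, value in optional.items() if value}
    url = f"{BUSINESS_ENDPOINT}/{path}"
    return f"{url}?{urlencode(params)}" if params else url


print(build_business_url("golden-boy-pizza-hamburg"))
print(build_business_url("QPOI0dYeAl3U8iPM_IYWnA", locale="en_US"))
```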
Here's a sample output of details retrieved by the business API:
{
"alias": "golden-boy-pizza-hamburg",
"categories": [
{
"alias": "pizza",
"title": "Pizza"
},
{
"alias": "food",
"title": "Food"
}
],
"coordinates": {
"latitude": 41.7873382568359,
"longitude": -123.051551818848
},
"display_phone": "(415) 982-9738",
"distance": 4992.437696561071,
"id": "QPOI0dYeAl3U8iPM_IYWnA",
"image_url": "https://yelp-photos.yelpcorp.com/bphoto/b0mx7p6x9Z1ivb8yzaU3dg/o.jpg",
"is_claimed": false,
"is_closed": true,
"date_opened": "",
"date_closed": "",
"location": {
"address1": "James",
"address2": "Street",
"address3": "68M",
"city": "Los Angeles",
"country": "US",
"display_address": [
"James",
"Street",
"68M",
"Los Angeles, CA 22399"
],
"state": "CA",
"zip_code": "22399"
},
"name": "Golden Boy Pizza",
"phone": "+14159829738",
"photos": [
"https://s3-media2.fl.yelpcdn.com/bphoto/CPc91bGzKBe95aM5edjhhQ/o.jpg",
"https://s3-media4.fl.yelpcdn.com/bphoto/FmXn6cYO1Mm03UNO5cbOqw/o.jpg",
"https://s3-media4.fl.yelpcdn.com/bphoto/HZVDyYaghwPl2kVbvHuHjA/o.jpg"
],
"photo_details": [
{
"photo_id": "CPc91bGzKBe95aM5edjhhQ",
"url": "https://s3-media2.fl.yelpcdn.com/bphoto/CPc91bGzKBe95aM5edjhhQ/o.jpg",
"caption": "Meat",
"width": 710,
"height": 47,
"is_user_submitted": false,
"user_id": null,
"label": "food"
},
{
"photo_id": "FmXn6cYO1Mm03UNO5cbOqw",
"url": "https://s3-media4.fl.yelpcdn.com/bphoto/FmXn6cYO1Mm03UNO5cbOqw/o.jpg",
"caption": "Dessert",
"width": 585,
"height": 78,
"is_user_submitted": false,
"user_id": null,
"label": "food"
},
{
"photo_id": "HZVDyYaghwPl2kVbvHuHjA",
"url": "https://s3-media4.fl.yelpcdn.com/bphoto/HZVDyYaghwPl2kVbvHuHjA/o.jpg",
"caption": "Dessert_2",
"width": 710,
"height": 53,
"is_user_submitted": false,
"user_id": null,
"label": "food"
}
],
"photo_count": 50,
"price": "$",
"rating": 4,
"review_count": 903,
"hours": {
"open": [
{
"is_overnight": false,
"start": 15,
"end": 130,
"day": 0
},
{
"is_overnight": false,
"start": 630,
"end": 1730,
"day": 1
},
{
"is_overnight": false,
"start": 45,
"end": 500,
"day": 2
}
],
"hours_type": "REGULAR",
"is_open_now": false
},
"special_hours": [
{
"date": "2019-02-07",
"end": "2000",
"is_closed": null,
"is_overnight": false,
"start": "1600"
}
],
"transactions": [
"restaurant_reservation"
],
"url": "https://www.yelp.com/biz/golden-boy-pizza-hamburg?adjust_creative=XsIsNkqpLmHqfJ51zfRn3A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=XsIsNkqpLmHqfJ51zfRn3A",
"attributes": {
"business_temp_closed": 1657868400,
"outdoor_seating": false,
"liked_by_vegans": false,
"liked_by_vegetarians": true,
"hot_and_new": "2022-12-10"
},
"messaging": {
"url": "https://www.yelp.com/raq/AA5cAADa-F9f5DPqZ-PADA?adjust_creative=5374ujususZtKiSNEg7uhg&utm_campaign=yelp_api_v3&utm_medium=api_v3_graphql&utm_source=5374upadasZtCvMLBg7uhg#popup%3Araq",
"use_case_text": "Request a Quote",
"response_rate": 1,
"response_time": 791,
"is_enabled": true
},
"yelp_menu_url": "https://www.yelp.com/menu/golden-boy-pizza-hamburg",
"rapc": {
"is_enabled": true,
"is_eligible": true
}
}
Despite having the full business details retrieved by this Yelp API endpoint, the review data aren't included.
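For a use case such as lead generation, only a handful of these fields are usually needed. The sketch below reduces a business details response to a flat contact record. The function name `to_lead` and the chosen output keys are assumptions made for illustration, and the input is a trimmed copy of the sample response above.

```python
from typing import Any, Dict


def to_lead(business: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a business details response to a flat contact record."""
    location = business.get("location") or {}
    return {
        "id": business.get("id"),
        "name": business.get("name"),
        "phone": business.get("phone"),
        # display_address is a list of address lines, joined here for readability
        "address": ", ".join(location.get("display_address", [])),
        "rating": business.get("rating"),
        "review_count": business.get("review_count"),
        "url": business.get("url"),
    }


# trimmed copy of the sample response shown above
sample_business = {
    "id": "QPOI0dYeAl3U8iPM_IYWnA",
    "name": "Golden Boy Pizza",
    "phone": "+14159829738",
    "rating": 4,
    "review_count": 903,
    "location": {
        "display_address": ["James", "Street", "68M", "Los Angeles, CA 22399"]
    },
}
print(to_lead(sample_business))
```

Using `.get()` throughout keeps the helper tolerant of fields that a given plan or business may not return.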
We have explored the endpoints responsible for retrieving business details. However, the services provided are retrieved through a dedicated endpoint for service offerings.
Below is the service offerings API endpoint schema:
curl --request GET \
--url https://api.yelp.com/v3/businesses/business_id_or_alias/service_offerings \
--header 'Authorization: Your Yelp API key' \
--header 'accept: application/json'
The above Yelp API endpoint requires a business_id_or_alias path parameter to identify the related business.
The related query parameters are used to define localization settings:
Parameter | Type | Description |
---|---|---|
locale | string | Locale code in the language and country code format |
Here's an example of what the service offering results look like:
{
"active": [
"bathtub_shower_installation",
"drain_repair",
"emergency_services",
"garbage_disposal_repair",
"gas_line_services",
"offers_electric_water_heater_installation"
],
"eligible": [
"backflow_services",
"bathtub_shower_installation",
"bathtub_shower_repair",
"drain_installation",
"drain_repair",
"emergency_services"
]
}
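One simple way to work with this response is to compare the two lists. The sketch below finds services a business is eligible for but has not activated. The function name is hypothetical, and the interpretation of `active` versus `eligible` is inferred from the field names in the sample rather than from official documentation.

```python
from typing import Dict, List


def inactive_eligible_services(offerings: Dict[str, List[str]]) -> List[str]:
    """Return eligible services that do not appear in the active list."""
    active = set(offerings.get("active", []))
    # keep the original order of the eligible list
    return [service for service in offerings.get("eligible", []) if service not in active]


# copy of the sample response shown above
sample_offerings = {
    "active": [
        "bathtub_shower_installation",
        "drain_repair",
        "emergency_services",
        "garbage_disposal_repair",
        "gas_line_services",
        "offers_electric_water_heater_installation",
    ],
    "eligible": [
        "backflow_services",
        "bathtub_shower_installation",
        "bathtub_shower_repair",
        "drain_installation",
        "drain_repair",
        "emergency_services",
    ],
}
print(inactive_eligible_services(sample_offerings))
```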
The business review data is among the most frequently requested information on Yelp. For this, Yelp provides a dedicated review API endpoint with the below schema:
curl --request GET \
--url 'https://api.yelp.com/v3/businesses/business_id_or_alias/reviews?limit=20&sort_by=yelp_sort' \
--header 'Authorization: Your Yelp API key' \
--header 'accept: application/json'
Similar to the previous business-related API endpoints, passing the business ID or its alias as a business_id_or_alias path parameter is required to identify the business entity for retrieving reviews.
The Yelp review API provides a few query parameters for locality settings and pagination:
Parameter | Type | Description |
---|---|---|
locale | string | Locale code in the language and country code format |
offset | integer | Pagination cursor to start from |
limit | integer | Number of results to retrieve |
sort_by | string | Sorting algorithm to use |
The review snippet results include details about the review text, rating, and user details:
{
"possible_languages": [
"en"
],
"reviews": [
{
"id": "xAG4O7l-t1ubbwVAlPnDKg",
"url": "https://www.yelp.com/biz/la-palma-mexicatessen-san-francisco?hrid=hp8hAJ-AnlpqxCCu7kyCWA&adjust_creative=0sidDfoTIHle5vvHEBvF0w&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=0sidDfoTIHle5vvHEBvF0w",
"text": "Went back again to this place since the last time i visited the bay area 5 months ago, and nothing has changed. Still the sketchy Mission, Still the cashier...",
"rating": 5,
"time_created": "2016-08-29 00:41:13",
"user": {
"id": "W8UK02IDdRS2GL_66fuq6w",
"profile_url": "https://www.yelp.com/user_details?userid=W8UK02IDdRS2GL_66fuq6w",
"image_url": "https://s3-media3.fl.yelpcdn.com/photo/iwoAD12zkONZxJ94ChAaMg/o.jpg",
"name": "Ella A."
}
},
....
],
"total": 3
}
The above example response includes complete review data for each snippet. However, each review API call limits the number of reviews retrieved to only three.
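As a minimal sketch of consuming this response, the snippet below flattens each review and parses its timestamp. The function name `summarize_reviews` and the output keys are assumptions for illustration, and the timestamp format string is inferred from the single `time_created` value in the sample.

```python
from datetime import datetime
from statistics import mean
from typing import Any, Dict


def summarize_reviews(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the reviews in a response and compute their average rating."""
    rows = []
    for review in payload.get("reviews", []):
        rows.append({
            "id": review["id"],
            "rating": review["rating"],
            "author": review["user"]["name"],
            # format inferred from the sample value "2016-08-29 00:41:13"
            "created": datetime.strptime(review["time_created"], "%Y-%m-%d %H:%M:%S"),
            "text": review["text"],
        })
    average = mean(row["rating"] for row in rows) if rows else None
    return {"reviews": rows, "average_rating": average, "total": payload.get("total")}


# trimmed copy of the sample response shown above
sample_reviews = {
    "reviews": [
        {
            "id": "xAG4O7l-t1ubbwVAlPnDKg",
            "text": "Went back again to this place since the last time i visited the bay area 5 months ago, and nothing has changed. Still the sketchy Mission, Still the cashier...",
            "rating": 5,
            "time_created": "2016-08-29 00:41:13",
            "user": {"id": "W8UK02IDdRS2GL_66fuq6w", "name": "Ella A."},
        }
    ],
    "total": 3,
}
summary = summarize_reviews(sample_reviews)
print(summary["average_rating"], summary["reviews"][0]["created"].year)
```

Note that the average only reflects the reviews present in the response, not every review of the business.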
So far, we've explored the technical aspects of the Yelp API. However, a common question arises: is using the Yelp API suitable for extracting business and review data? For this, we must explore two crucial factors: pricing and limitations.
Yelp offers different subscription tiers, each varying in pricing and the data fields that can be retrieved. Let's consider the minimum plans required to retrieve both basic business and review data:
To better explore this, let's estimate the cost of extracting 10,000 business and review results using the Yelp API.
Data | Plan | Pricing per 1,000 calls | Max results per API call | Cost per 10,000 results |
---|---|---|---|---|
Business | Starter | $7.99 | 1 | $79.90 |
Reviews | Plus | $9.99 | 3 | $33.30 |
Above is a rough estimation of the Yelp API costs for extracting 10,000 business and review entities. However, the subscription plans considered for this cost estimation only cover very basic attributes. The full data attributes are only available under the enterprise plan, which comes at a much higher cost of $14.99 per 1,000 API calls.
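The arithmetic behind such an estimate can be sketched as follows. The prices and results-per-call values are the ones quoted in the table, while the function name `cost_for_results` is hypothetical and the calculation is a pro-rata approximation that ignores rounding up to whole API calls.

```python
def cost_for_results(results: int, price_per_1000_calls: float, results_per_call: int) -> float:
    """Estimate the cost in USD of retrieving `results` entities."""
    # pro-rata number of API calls needed (may be fractional)
    calls_needed = results / results_per_call
    return round(calls_needed * price_per_1000_calls / 1000, 2)


# values taken from the pricing table above
business_cost = cost_for_results(10_000, price_per_1000_calls=7.99, results_per_call=1)
reviews_cost = cost_for_results(10_000, price_per_1000_calls=9.99, results_per_call=3)
print(f"Business: ${business_cost:.2f}, Reviews: ${reviews_cost:.2f}")
```

Passing the enterprise price of $14.99 per 1,000 calls to the same function gives a quick comparison for the full data attributes.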
For further details on Yelp API pricing, refer to the official documentation.
So far, we have explored the available Yelp APIs for business and review data extraction, including their specifications and result schemas. However, we have identified the below limitations:
An alternative to using Yelp API for data extraction is using web scraping. This approach enables extracting data from Yelp's public web pages. Instead of requesting the API endpoints for direct JSON data retrieval, we can parse the HTML or replicate background API calls to extract what we are looking for!
For example, let's replicate and parse Yelp's public search pages as an alternative to the paid search API endpoint:
import json
import asyncio
from typing import Dict, List
from urllib.parse import urlencode

from httpx import AsyncClient, Response
from parsel import Selector

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser-like headers to reduce the chance of getting blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Cookie": "intl_splash=false",
    },
    follow_redirects=True,
)


def parse_search(response: Response) -> Dict:
    """parse listing data from the hidden web data of a search page"""
    assert response.status_code == 200, "Request is blocked, use ScrapFly to bypass Yelp's blocking"
    search_data: List[Dict] = []
    # stays None if the page doesn't include a total results component
    total_results = None
    selector = Selector(text=response.text)
    script = selector.xpath("//script[@data-id='react-root-props']/text()").get()
    data = json.loads(script.split("react_root_props = ")[-1].rsplit(";", 1)[0])
    for item in data["legacyProps"]["searchAppProps"]["searchPageProps"]["mainContentComponentsListProps"]:
        # filter search data cards
        if "bizId" in item:
            search_data.append(item)
        # filter the max results count
        elif "totalResults" in (item.get("props") or {}):
            total_results = item["props"]["totalResults"]
    return {"search_data": search_data, "total_results": total_results}


async def scrape_search(keyword: str, location: str) -> Dict:
    """scrape single page of yelp search"""

    def make_search_url(offset: int) -> str:
        base_url = "https://www.yelp.com/search?"
        params = {"find_desc": keyword, "find_loc": location, "start": offset}
        return base_url + urlencode(params)
        # final url example:
        # https://www.yelp.com/search?find_desc=plumbers&find_loc=Seattle%2C+WA&start=1

    first_page = await client.get(make_search_url(1))
    data = parse_search(first_page)
    return data


async def run():
    search_data = await scrape_search(
        keyword="plumbers", location="Seattle, WA"
    )
    # save the results to a JSON file
    with open("data.json", "w", encoding="utf-8") as f:
        f.write(json.dumps(search_data, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    asyncio.run(run())
Here, we request Yelp search pages given a search term and location. Then, we extract all business results from the page's hidden web data. Here is what the results look like:
{
"total_results": 240,
"search_data": [
{
"bizId": "_Wv9uLrzQ1dZ6fgMYjgygg",
"searchResultBusiness": {
"ranking": null,
"isAd": true,
"renderAdInfo": false,
"name": "Rooter-Man",
"alternateNames": [],
"businessUrl": "/adredir?ad_business_id=_Wv9uLrzQ1dZ6fgMYjgygg&campaign_id=jpQhVJGCG8ILi71Et0XDdQ&click_origin=search_results&placement=vertical_0&placement_slot=1&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Frooter-man-orting%3Foverride_cta%3DGet%2Bpricing%2B%2526%2Bavailability&request_id=9d07f67caab0b9cc&signature=919646537897431750a9ec41b1240262d0e8596eed098a3d2776e9c9afe1898f&slot=0",
"categories": [
{
"title": "Plumbing",
"url": "/search?find_desc=Plumbing&find_loc=Seattle%2C+WA"
}
],
"priceRange": "",
"rating": 0.0,
"isClickableReview": false,
"reviewCount": 0,
"formattedAddress": "",
"neighborhoods": [],
"phone": "",
"serviceArea": null,
"parentBusiness": null,
"servicePricing": null,
"bizSiteUrl": "https://biz.yelp.com",
"serviceOfferings": [],
"businessAttributes": {
"licenses": [
{
"license_number": "ROOTE**792MT",
"license_expiration_date": "2025-08-12",
"license_verification_url": "https://secure.lni.wa.gov/verify/Detail.aspx?UBI=602584774&LIC=ROOTE**792MT&SAW=",
"license_verification_status": "verified",
"license_verification_date": "2023-11-17",
"license_issuing_authority": "WA DLI ",
"license_type": "Journey Level",
"license_source": "biz_owner",
"licensee": null
}
]
},
"alias": "rooter-man-orting",
"website": {
"href": "/adredir?ad_business_id=_Wv9uLrzQ1dZ6fgMYjgygg&campaign_id=jpQhVJGCG8ILi71Et0XDdQ&click_origin=search_results_visit_website&placement=vertical_0&placement_slot=1&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz_redir%3Fcachebuster%3D1701335261%26s%3D853c7f42baedaddb12d3a47cbf0c7c30e7bb3cf5d0408740f1e1ee56ec69c2a7%26src_bizid%3D_Wv9uLrzQ1dZ6fgMYjgygg%26url%3Dhttp%253A%252F%252Fwww.rooterman.com%26website_link_type%3Dwebsite&request_id=9d07f67caab0b9cc&signature=8842df091a5b79be57b4bf6122644039b1f6c07fac0c2fdb235c8ff076ce2520&slot=0",
"rel": "noopener nofollow"
},
"city": "Orting"
},
"scrollablePhotos": {
"isScrollable": false,
"photoList": [
{
"src": "https://s3-media0.fl.yelpcdn.com/bphoto/-31eN7ypNCIJCHRO0Xjf3g/ls.jpg",
"srcset": "https://s3-media0.fl.yelpcdn.com/bphoto/-31eN7ypNCIJCHRO0Xjf3g/258s.jpg 1.03x,https://s3-media0.fl.yelpcdn.com/bphoto/-31eN7ypNCIJCHRO0Xjf3g/300s.jpg 1.20x,https://s3-media0.fl.yelpcdn.com/bphoto/-31eN7ypNCIJCHRO0Xjf3g/348s.jpg 1.39x"
}
],
"photoHref": "/adredir?ad_business_id=_Wv9uLrzQ1dZ6fgMYjgygg&campaign_id=jpQhVJGCG8ILi71Et0XDdQ&click_origin=search_results&placement=vertical_0&placement_slot=1&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Frooter-man-orting%3Foverride_cta%3DGet%2Bpricing%2B%2526%2Bavailability&request_id=9d07f67caab0b9cc&signature=919646537897431750a9ec41b1240262d0e8596eed098a3d2776e9c9afe1898f&slot=0",
"allPhotosHref": "/biz_photos/_Wv9uLrzQ1dZ6fgMYjgygg",
"isResponsive": false
},
"childrenBusinessInfo": null,
"searchResultBusinessPortfolioProjects": null,
"searchResultBusinessHighlights": {
"bizSiteUrl": "https://biz.yelp.com/business_highlights?utm_source=disclaimer_www_searchresults",
"businessHighlights": [
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_locally_owned_v2",
"iconName": "18x18_locally_owned",
"id": "LOCALLY_OWNED_OPERATED",
"title": "Locally owned & operated"
},
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_family_owned_v2",
"iconName": "18x18_family_owned",
"id": "FAMILY_OWNED_OPERATED",
"title": "Family-owned & operated"
},
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_workmanship_guaranteed_v2",
"iconName": "18x18_workmanship_guaranteed",
"id": "WORKMANSHIP_GUARANTEED",
"title": "Workmanship guaranteed"
},
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_years_in_business_v2",
"iconName": "18x18_years_in_business",
"id": "YEARS_IN_BUSINESS",
"title": "20 years in business"
},
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_veteran_owned_v2",
"iconName": "18x18_veteran_owned",
"id": "VETERAN_OWNED_OPERATED",
"title": "Veteran-owned & operated"
},
{
"bizPageIconName": "",
"group": {},
"bizPageIconV2Name": "40x40_free_estimates_v2",
"iconName": "18x18_free_estimates",
"id": "FREE_ESTIMATES",
"title": "Free estimates"
}
],
"numGemsAllowed": 2
},
"tags": [],
"serviceOfferings": [],
"snippet": {
"readMoreText": "more",
"readMoreUrl": "/adredir?ad_business_id=_Wv9uLrzQ1dZ6fgMYjgygg&campaign_id=jpQhVJGCG8ILi71Et0XDdQ&click_origin=read_more&placement=vertical_0&placement_slot=1&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Frooter-man-orting%3Foverride_cta%3DGet%2Bpricing%2B%2526%2Bavailability&request_id=9d07f67caab0b9cc&signature=919646537897431750a9ec41b1240262d0e8596eed098a3d2776e9c9afe1898f&slot=0",
"text": "Give us a call for a free consultatiom.",
"thumbnail": {
"src": "https://s3-media0.fl.yelpcdn.com/bphoto/A0e8SoYZthSqITMDTjB0sA/30s.jpg",
"srcset": "https://s3-media0.fl.yelpcdn.com/bphoto/A0e8SoYZthSqITMDTjB0sA/ss.jpg 1.33x,https://s3-media0.fl.yelpcdn.com/bphoto/A0e8SoYZthSqITMDTjB0sA/60s.jpg 2.00x,https://s3-media0.fl.yelpcdn.com/bphoto/A0e8SoYZthSqITMDTjB0sA/90s.jpg 3.00x"
},
"id": "",
"type": "specialty"
},
"searchActions": [],
"markerKey": "ad_business:below_organic:U5LNtOZST6_9gpNAbqw8Lg",
"searchResultLayoutType": "scrollablePhotos",
"verifiedLicenseInfo": {
"licenses": [
{
"licensee": null,
"licenseNumber": "ROOTE**792MT",
"issuedBy": "WA DLI ",
"trade": "Journey Level",
"verifiedDate": "2023-11-17",
"expiryDate": "2025-08-12"
}
],
"bizSiteUrl": "https://biz.yelp.com/verified_license?utm_source=legal_disclaimer_www"
},
"verifiedLicenseLayout": "BadgeAndTextBelowBizName",
"yelpGuaranteedInfo": {
"yelp_guaranteed_status": false,
"yg_info_modal_url": "https://www.yelp.com/yelp-guaranteed"
},
"adLoggingInfo": {
"placement": "vertical_0",
"slot": 0,
"placementSlot": 1,
"opportunityId": "9d07f67caab0b9cc",
"adCampaignId": "jpQhVJGCG8ILi71Et0XDdQ",
"flow": "search",
"isShowcaseAd": false
},
"offerCampaignDetails": null
},
....
]
}
From the sample output above, we can see that we have retrieved full business data from search pages directly in JSON. Some attributes returned are even only available for the Yelp API through the enterprise plan!
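Since each scraped item is deeply nested, a small post-processing step can make the results easier to store or analyze. The sketch below flattens the scraper output into simple rows. The function name `flatten_search_results`, the `include_ads` flag, and the output keys are assumptions for illustration, and the business URL pattern follows the `https://www.yelp.com/biz/<alias>` links visible in the sample data.

```python
from typing import Any, Dict, List


def flatten_search_results(scraped: Dict[str, Any], include_ads: bool = False) -> List[Dict[str, Any]]:
    """Reduce each scraped search card to a flat row of core fields."""
    rows = []
    for item in scraped.get("search_data", []):
        business = item.get("searchResultBusiness") or {}
        # sponsored cards are flagged with isAd in the scraped data
        if business.get("isAd") and not include_ads:
            continue
        alias = business.get("alias")
        rows.append({
            "id": item.get("bizId"),
            "name": business.get("name"),
            "alias": alias,
            "rating": business.get("rating"),
            "review_count": business.get("reviewCount"),
            "categories": [category.get("title") for category in business.get("categories", [])],
            "url": f"https://www.yelp.com/biz/{alias}" if alias else None,
        })
    return rows


# trimmed copy of the sample output shown above
sample_scraped = {
    "total_results": 240,
    "search_data": [
        {
            "bizId": "_Wv9uLrzQ1dZ6fgMYjgygg",
            "searchResultBusiness": {
                "isAd": True,
                "name": "Rooter-Man",
                "alias": "rooter-man-orting",
                "rating": 0.0,
                "reviewCount": 0,
                "categories": [{"title": "Plumbing"}],
            },
        }
    ],
}
print(flatten_search_results(sample_scraped, include_ads=True))
```

Skipping ads by default is a design choice: sponsored cards, like the one in the sample, can carry empty ratings and addresses.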
For further tips and tricks on Yelp web scraping, refer to our dedicated guide.
We have seen that web scraping is a much better alternative to using the Yelp API. However, there's a catch: scraping blocking.
Yelp can identify our web scraping requests as automated and then require us to solve CAPTCHA challenges, or even block us entirely.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
Using ScrapFly to bypass Yelp scraping blocking is fairly straightforward. All we have to do is replace our HTTP client with ScrapFly client:
# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some yelp.com URL")
selector = Selector(text=response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="website URL",
    asp=True,  # enable the anti scraping protection to bypass blocking
    country="US",  # set the proxy location to a specific country
    render_js=True  # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']
Refer to our official Yelp scraper on GitHub for ready-to-use data extraction scripts for various datasets.
To wrap up this guide, let's look at some of the frequently asked questions about using Yelp API for data extraction.
No, Yelp API is provided under paid subscription tiers. Each tier supports specific data resources.
Yes, web scraping Yelp is a competitive alternative to the Yelp API. This approach extracts Yelp data using HTML parsing or by replicating background requests. Refer to our Yelp scraping guide for more.
Yelp requires API tokens to authorize its endpoints. To get your Yelp API key, subscribe to any of the available plans.
In this guide, we took an in-depth look at the Yelp API. We started by exploring the available API endpoints, their schemas, parameters, and outputs.
We have seen that using Yelp API for data extraction has limitations, including expensive subscription plans and limited data attributes supported.
As an alternative, we have explored ScrapFly web scraping API. It provides antibot bypass capabilities, enabling Yelp data extraction at scale.