In this tutorial, we'll explain how to scrape YellowPages.com - an online directory of various US-based businesses.
YellowPages.com is the digital version of telephone directories called yellow pages. It contains business information such as phone numbers, websites, and addresses as well as business reviews.
In this tutorial, we'll be using Python to scrape all of that business and review information. We'll also apply a few HTML parsing tricks to extract the data from its pages effectively. Let's dive in!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
Do not scrape at rates that could damage the website.
Do not scrape data that's not available publicly.
Do not store PII of EU citizens who are protected by GDPR.
Do not repurpose entire public datasets, which can be illegal in some countries.
ScrapFly does not offer legal advice, but these are good general rules to follow in web scraping; for more, you should consult a lawyer.
Why Scrape YellowPages.com?
YellowPages contains thousands of businesses and their details, such as phone numbers, websites and locations. We can therefore utilize YellowPages web scraping for various use cases, from market and business analytics to lead generation and competitive intelligence.
Furthermore, YellowPages features user reviews of each business. Scraping YellowPages lets us retrieve this data quickly, and it can then be fed into machine learning techniques for analyzing and gaining insights into users' experiences and opinions.
Before we start, keep in mind that YellowPages is only accessible from US-based IP addresses. So, if you are located outside the US, you will need a US-based proxy or VPN to access the website. Alternatively, you can run the ScrapFly version of the full YellowPages scraper code, available on GitHub.
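For example, here's a minimal sketch of routing httpx traffic through a US proxy; the proxy URL below is a placeholder you'd replace with your own provider's credentials:
import httpx

# hypothetical proxy URL - replace with your own US-based proxy credentials
US_PROXY = "http://username:password@us.proxy.example.com:8000"

# older httpx releases accept a proxies= mapping; newer ones use the proxy= argument instead
client = httpx.AsyncClient(proxies={"all://": US_PROXY})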
To scrape YellowPages, we'll use Python with a few community packages:
httpx - An HTTP client library we'll use to request the YellowPages server.
parsel - An HTML parsing library we'll use to parse the HTML we get using selectors like CSS and XPath.
loguru - A logging library we'll use to monitor our YellowPages scraper.
Note that asyncio comes pre-installed in Python, so you only have to install the other packages using the following pip command:
$ pip install httpx parsel loguru
Alternatively, feel free to swap httpx out with any other HTTP client package, such as requests. As for parsel, another great alternative is beautifulsoup.
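For illustration, here's roughly what a single fetch-and-parse step would look like with requests and beautifulsoup instead; this is just a sketch and isn't used in the rest of the tutorial:
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.yellowpages.com/search?search_terms=Japanese+Restaurants&geo_location_terms=San+Francisco%2C+CA",
    # a browser-like user agent helps avoid being blocked instantly
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
soup = BeautifulSoup(response.text, "html.parser")
# the same CSS selectors we'll use with parsel also work with beautifulsoup
business_names = [a.get_text(strip=True) for a in soup.select("a.business-name")]
print(business_names)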
How to Find Companies on YellowPages
Before we scrape YellowPages for company data, we need to find the companies first. For that, we can use either of two approaches. The first one is using the YellowPages sitemap, which contains links for all categories and pages on the website, as sketched below.
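For reference, sitemap-based discovery would look roughly like this; the exact sitemap URL is an assumption here - the real entries are usually listed in the site's robots.txt file:
import httpx
from parsel import Selector

# hypothetical sitemap URL - check https://www.yellowpages.com/robots.txt for the real entries
response = httpx.get("https://www.yellowpages.com/sitemap.xml")
sel = Selector(text=response.text, type="xml")
# sitemap XML files list page and category URLs in <loc> elements
urls = sel.xpath("//*[local-name()='loc']/text()").getall()
print(f"found {len(urls)} sitemap URLs")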
However, we'll use a more flexible approach: the search pages.
We can see that upon submitting a search request, YellowPages redirects us to a new URL containing pages of results. Let's scrape these results in the following section.
How to Scrape YellowPages Search
To scrape YellowPages, we need to form a search URL using a search query and a few parameters. Below is an example of using the base search URL with the minimum parameters:
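https://www.yellowpages.com/search?search_terms=<query>&geo_location_terms=<location>&page=<page number>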
The above URL includes the search query, the location and the search page number. Let's apply this URL structure to an example search query. We'll search for Japanese restaurants in San Francisco, California:
https://www.yellowpages.com/search?search_terms=Japanese+Restaurants&geo_location_terms=San+Francisco%2C+CA
Here is the page we got by requesting this URL:
We'll scrape YellowPages search page data from the marked fields above. Let's start by defining our parsing logic:
def parse_search(response) -> List[Preview]:
    """parse yellowpages.com search page for business preview data"""
    sel = Selector(text=response.text)
    parsed = []
    for result in sel.css(".organic div.result"):
        links = {}
        for link in result.css("div.links>a"):
            name = link.xpath("text()").get()
            url = link.xpath("@href").get()
            links[name] = url
        first = lambda css: result.css(css).get("").strip()
        many = lambda css: [value.strip() for value in result.css(css).getall()]
        parsed.append(
            {
                "name": first("a.business-name ::text"),
                "url": urljoin("https://www.yellowpages.com/", first("a.business-name::attr(href)")),
                "links": links,
                "phone": first("div.phone::text"),
                "categories": many(".categories>a::text"),
                "address": first(".adr .street-address::text"),
                "location": first(".adr .locality::text"),
                "rating": first(".ratings .rating div::attr(class)").split(" ", 1)[-1],
                "rating_count": first(".ratings .rating span::text").strip("()"),
            }
        )
    return parsed
Here, we define a parse_search function. It iterates over the result boxes and uses CSS selectors to extract business preview information, such as the phone number, rating, name and, most importantly, the link to the full business information page.
Next, we'll utilize the parsing logic while requesting the search pages to scrape the data:
import asyncio
import json
from urllib.parse import urljoin

import httpx
from parsel import Selector
from loguru import logger as log
from typing_extensions import TypedDict
from typing import Dict, List


class Preview(TypedDict):
    """Type hint container for business preview data. This object just helps us to keep track of what results we'll be getting"""

    name: str
    url: str
    links: Dict[str, str]
    phone: str
    categories: List[str]
    address: str
    location: str
    rating: str
    rating_count: str


def parse_search(response) -> List[Preview]:
    """parse yellowpages.com search page for business preview data"""
    sel = Selector(text=response.text)
    parsed = []
    for result in sel.css(".organic div.result"):
        links = {}
        for link in result.css("div.links>a"):
            name = link.xpath("text()").get()
            url = link.xpath("@href").get()
            links[name] = url
        first = lambda css: result.css(css).get("").strip()
        many = lambda css: [value.strip() for value in result.css(css).getall()]
        parsed.append(
            {
                "name": first("a.business-name ::text"),
                "url": urljoin("https://www.yellowpages.com/", first("a.business-name::attr(href)")),
                "links": links,
                "phone": first("div.phone::text"),
                "categories": many(".categories>a::text"),
                "address": first(".adr .street-address::text"),
                "location": first(".adr .locality::text"),
                "rating": first(".ratings .rating div::attr(class)").split(" ", 1)[-1],
                "rating_count": first(".ratings .rating span::text").strip("()"),
            }
        )
    return parsed


# to avoid being instantly blocked we should use request headers of a common web browser:
BASE_HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
}


# to run our scraper we need to start an httpx session:
async def run():
    limits = httpx.Limits(max_connections=5)
    async with httpx.AsyncClient(limits=limits, timeout=httpx.Timeout(15.0), headers=BASE_HEADERS) as session:
        response = await session.get("https://www.yellowpages.com/search?search_terms=Japanese+Restaurants&geo_location_terms=San+Francisco%2C+CA")
        result_search = parse_search(response)
        # print the results in JSON format
        print(json.dumps(result_search, indent=2))


if __name__ == "__main__":
    asyncio.run(run())
The above code can scrape a single search page. Let's modify it to crawl over other search pages:
import asyncio
import json
import math
from urllib.parse import urlencode, urljoin

import httpx
from parsel import Selector
from loguru import logger as log
from typing_extensions import TypedDict
from typing import Dict, List, Optional


class Preview(TypedDict):
    """Type hint container for business preview data. This object just helps us to keep track of what results we'll be getting"""

    name: str
    url: str
    links: Dict[str, str]
    phone: str
    categories: List[str]
    address: str
    location: str
    rating: str
    rating_count: str


def parse_search(response) -> List[Preview]:
    """parse yellowpages.com search page for business preview data"""
    sel = Selector(text=response.text)
    parsed = []
    for result in sel.css(".organic div.result"):
        links = {}
        for link in result.css("div.links>a"):
            name = link.xpath("text()").get()
            url = link.xpath("@href").get()
            links[name] = url
        first = lambda css: result.css(css).get("").strip()
        many = lambda css: [value.strip() for value in result.css(css).getall()]
        parsed.append(
            {
                "name": first("a.business-name ::text"),
                "url": urljoin("https://www.yellowpages.com/", first("a.business-name::attr(href)")),
                "links": links,
                "phone": first("div.phone::text"),
                "categories": many(".categories>a::text"),
                "address": first(".adr .street-address::text"),
                "location": first(".adr .locality::text"),
                "rating": first(".ratings .rating div::attr(class)").split(" ", 1)[-1],
                "rating_count": first(".ratings .rating span::text").strip("()"),
            }
        )
    return parsed


async def search(query: str, session: httpx.AsyncClient, location: Optional[str] = None) -> List[Preview]:
    """search yellowpages.com for business preview information scraping all of the pages"""

    def make_search_url(page):
        base_url = "https://www.yellowpages.com/search?"
        parameters = {"search_terms": query, "geo_location_terms": location, "page": page}
        return base_url + urlencode(parameters)

    log.info(f'scraping "{query}" in "{location}"')
    first_page = await session.get(make_search_url(1))
    sel = Selector(text=first_page.text)
    total_results = int(sel.css(".pagination>span::text").re(r"of (\d+)")[0])
    total_pages = int(math.ceil(total_results / 30))
    log.info(f"{query} in {location}: scraping {total_pages} pages of business previews")
    previews = parse_search(first_page)
    for result in await asyncio.gather(*[session.get(make_search_url(page)) for page in range(2, total_pages + 1)]):
        previews.extend(parse_search(result))
    log.success(f"{query} in {location}: scraped {len(previews)} business previews in total")
    return previews
The above function implements a complete scraping loop. We generate search URLs from the given query and location parameters. Then, we scrape the first results page to extract the total result count and scrape the remaining pages concurrently. This is a common pagination idiom in web scraping.
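Stripped of the YellowPages specifics, the idiom looks roughly like this sketch, where fetch_page and parse_page are hypothetical stand-ins for the request and parsing steps:
import asyncio
import math


async def paginate(fetch_page, parse_page, results_per_page: int):
    """generic pagination idiom: scrape page 1, derive the page count, then fetch the rest concurrently"""
    first_page = await fetch_page(1)
    results, total_results = parse_page(first_page)
    total_pages = math.ceil(total_results / results_per_page)
    # the remaining pages are independent of each other, so we can request them concurrently
    other_pages = await asyncio.gather(*[fetch_page(page) for page in range(2, total_pages + 1)])
    for page in other_pages:
        page_results, _ = parse_page(page)
        results.extend(page_results)
    return results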
Our YellowPages scraper can find and scrape business data from search pages. Next, we'll scrape the dedicated business pages.
How to Scrape YellowPages Company Data
To scrape company data, we need to request each company URL that we found previously. Let's start with an example URL of a restaurant business, Ozumo Japanese Restaurant:
We'll scrape the marked fields in the above image. First, let's start with the scraping logic:
class Company(TypedDict):
    """type hint container for company data found on yellowpages.com"""

    name: str
    categories: List[str]
    rating: str
    rating_count: str
    phone: str
    website: str
    address: str
    work_hours: Dict[str, str]


def parse_company(response) -> Company:
    """extract company details from yellowpages.com company's page"""
    sel = Selector(text=response.text)
    # here we define some lambda shortcuts for parsing common data:
    # selecting the first element, selecting many elements and joining all elements together
    first = lambda css: sel.css(css).get("").strip()
    many = lambda css: [value.strip() for value in sel.css(css).getall()]
    together = lambda css, sep=" ": sep.join(sel.css(css).getall())

    # to parse working hours we need to do a bit of complex string parsing
    def _parse_datetime(values: List[str]):
        """
        parse datetime from yellow pages datetime strings
        >>> _parse_datetime(["Fr-Sa 12:00-22:00"])
        {'Fr': '12:00-22:00', 'Sa': '12:00-22:00'}
        >>> _parse_datetime(["Fr 12:00-22:00"])
        {'Fr': '12:00-22:00'}
        >>> _parse_datetime(["Fr-Sa 12:00-22:00", "We 10:00-18:00"])
        {'Fr': '12:00-22:00', 'Sa': '12:00-22:00', 'We': '10:00-18:00'}
        """
        WEEKDAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
        results = {}
        for text in values:
            days, hours = text.split(" ")
            if "-" in days:
                day_start, day_end = days.split("-")
                for day in WEEKDAYS[WEEKDAYS.index(day_start) : WEEKDAYS.index(day_end) + 1]:
                    results[day] = hours
            else:
                results[days] = hours
        return results

    return {
        "name": first("h1.business-name::text"),
        "categories": many(".categories>a::text"),
        "rating": first(".ratings div::attr(class)").split(" ", 1)[-1],
        "rating_count": first(".ratings .count::text").strip("()"),
        "phone": first(".phone::attr(href)").replace("(", "").replace(")", ""),
        "website": first(".website-link::attr(href)"),
        "address": together(".address::text"),
        "work_hours": _parse_datetime(many(".open-details tr time::attr(datetime)")),
    }
Here, we use CSS selectors to extract the specific company fields we marked earlier. We also process and clean a few fields, such as the phone number, and unpack the work days from ranges like Mo-We into individual values like Mo, Tu, We.
Next, let's use the parse_company function we defined while requesting the company pages:
import asyncio
import json

import httpx
from parsel import Selector
from typing_extensions import TypedDict
from typing import Dict, List


class Company(TypedDict):
    """type hint container for company data found on yellowpages.com"""

    name: str
    categories: List[str]
    rating: str
    rating_count: str
    phone: str
    website: str
    address: str
    work_hours: Dict[str, str]


def parse_company(response) -> Company:
    """extract company details from yellowpages.com company's page"""
    sel = Selector(text=response.text)
    # here we define some lambda shortcuts for parsing common data:
    # selecting the first element, selecting many elements and joining all elements together
    first = lambda css: sel.css(css).get("").strip()
    many = lambda css: [value.strip() for value in sel.css(css).getall()]
    together = lambda css, sep=" ": sep.join(sel.css(css).getall())

    # to parse working hours we need to do a bit of complex string parsing
    def _parse_datetime(values: List[str]):
        """
        parse datetime from yellow pages datetime strings
        >>> _parse_datetime(["Fr-Sa 12:00-22:00"])
        {'Fr': '12:00-22:00', 'Sa': '12:00-22:00'}
        >>> _parse_datetime(["Fr 12:00-22:00"])
        {'Fr': '12:00-22:00'}
        >>> _parse_datetime(["Fr-Sa 12:00-22:00", "We 10:00-18:00"])
        {'Fr': '12:00-22:00', 'Sa': '12:00-22:00', 'We': '10:00-18:00'}
        """
        WEEKDAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
        results = {}
        for text in values:
            days, hours = text.split(" ")
            if "-" in days:
                day_start, day_end = days.split("-")
                for day in WEEKDAYS[WEEKDAYS.index(day_start) : WEEKDAYS.index(day_end) + 1]:
                    results[day] = hours
            else:
                results[days] = hours
        return results

    return {
        "name": first("h1.business-name::text"),
        "categories": many(".categories>a::text"),
        "rating": first(".ratings div::attr(class)").split(" ", 1)[-1],
        "rating_count": first(".ratings .count::text").strip("()"),
        "phone": first(".phone::attr(href)").replace("(", "").replace(")", ""),
        "website": first(".website-link::attr(href)"),
        "address": together(".address::text"),
        "work_hours": _parse_datetime(many(".open-details tr time::attr(datetime)")),
    }


async def scrape_company(url: str, session: httpx.AsyncClient) -> Company:
    """scrape yellowpages.com company page details"""
    first_page = await session.get(url)
    return parse_company(first_page)


# to avoid being instantly blocked we should use request headers of a common web browser:
BASE_HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
}


async def run():
    limits = httpx.Limits(max_connections=5)
    async with httpx.AsyncClient(limits=limits, timeout=httpx.Timeout(15.0), headers=BASE_HEADERS) as session:
        result_company = await scrape_company(
            "https://www.yellowpages.com/san-francisco-ca/mip/ozumo-japanese-restaurant-8083027",
            session=session,
        )
        print(json.dumps(result_company, indent=2))


if __name__ == "__main__":
    asyncio.run(run())
Cool! With just a few lines of code, our YellowPages scraper was able to get all the essential business details. Next, we'll scrape the business reviews!
How to Scrape YellowPages Reviews
To scrape business reviews, we'll have to send additional requests to the review pages. For example, if we go back to our Japanese restaurant listing and scroll to the bottom, we can find the review paging URL format:
From the above image, we can see that we can paginate over reviews using the page parameter. And since we know the total number of reviews, we can crawl over review pages to extract all the reviews:
import asyncio
import json
import math
from urllib.parse import urlencode

import httpx
from typing import List
from typing_extensions import TypedDict
from parsel import Selector


class Company(TypedDict):
    """type hint container for company data found on yellowpages.com"""
    # ... rest of the Company fields we defined earlier


def parse_company(response) -> Company:
    """extract company details from yellowpages.com company's page"""
    # ... the parse_company logic we defined earlier


class Review(TypedDict):
    """type hint for yellowpages.com scraped review"""

    id: str
    author: str
    source: str
    date: str
    stars: int
    title: str
    text: str


def parse_reviews(response) -> List[Review]:
    """parse company page for visible reviews"""
    sel = Selector(text=response.text)
    reviews = []
    for box in sel.css("#reviews-container>article"):
        first = lambda css: box.css(css).get("").strip()
        many = lambda css: [value.strip() for value in box.css(css).getall()]
        reviews.append(
            {
                "id": box.attrib.get("id"),
                "author": first("div.author::text"),
                "source": first("span.attribution>a::text"),
                "date": first("p.date-posted>span::text"),
                "stars": len(many(".result-ratings ul>li.rating-star")),
                "title": first(".review-title::text"),
                "text": first(".review-response p::text"),
            }
        )
    return reviews


class CompanyData(TypedDict):
    info: Company
    reviews: List[Review]


# Now we can extend our company scraper to pick up reviews as well!
async def scrape_company(url: str, session: httpx.AsyncClient, get_reviews=True) -> CompanyData:
    """scrape yellowpages.com company page details"""
    first_page = await session.get(url)
    sel = Selector(text=first_page.text)
    if not get_reviews:
        return parse_company(first_page)
    reviews = parse_reviews(first_page)
    if reviews:
        total_reviews = int(sel.css(".pagination-stats::text").re(r"of (\d+)")[0])
        total_pages = int(math.ceil(total_reviews / 20))
        for response in await asyncio.gather(
            *[session.get(url + "?" + urlencode({"page": page})) for page in range(2, total_pages + 1)]
        ):
            reviews.extend(parse_reviews(response))
    return {
        "info": parse_company(first_page),
        "reviews": reviews,
    }
In the above code, we apply the same pagination approach we used in the search scraping logic. We also utilize the company parsing logic to extract the company information alongside the reviews.
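To run this YellowPages scraping code, we can reuse the BASE_HEADERS and httpx session setup from the earlier sections; here's a minimal sketch:
async def run():
    limits = httpx.Limits(max_connections=5)
    async with httpx.AsyncClient(limits=limits, timeout=httpx.Timeout(15.0), headers=BASE_HEADERS) as session:
        # scrape the company page together with all of its reviews
        result = await scrape_company(
            "https://www.yellowpages.com/san-francisco-ca/mip/ozumo-japanese-restaurant-8083027",
            session=session,
        )
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    asyncio.run(run())
Running it produces results like the following: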
{
  "info": {
    "name": "Ozumo Japanese Restaurant",
    "categories": [
      "Japanese Restaurants",
      "Asian Restaurants",
      "Caterers",
      "Japanese Restaurants",
      "Asian Restaurants",
      "Caterers",
      "Family Style Restaurants",
      "Restaurants",
      "Sushi Bars"
    ],
    "rating": "three half",
    "rating_count": "72",
    "phone": "(415) 882-1333",
    "website": "http://www.ozumo.com",
    "address": "161 Steuart St San Francisco, CA 94105",
    "work_hours": {
      "Mo": "16:00-22:00",
      "Tu": "16:00-22:00",
      "We": "16:00-22:00",
      "Th": "16:00-22:00",
      "Fr": "12:00-22:00",
      "Sa": "12:00-22:00",
      "Su": "12:00-21:00"
    }
  },
  "reviews": [
    {
      "id": "<redacted for blog use>",
      "author": "<redacted for blog use>",
      "source": "Citysearch",
      "date": "03/18/2010",
      "stars": 5,
      "title": "Mindblowing Japanese!",
      "text": "Wow what a dinner! I went to Ozumo last night with a friend for a complimentary meal I had won by being a Citysearch Dictator. It was AMAZING! We ordered the Hanabi (halibut) and Dohyo (ahi tuna) small plates as well as the Gindara (black cod) and Burikama (roasted yellowtail). Everything was absolutely delicious. They paired our meal with a variety of unique wines and sakes. The manager, Hiro, and our waitress were extremely knowledgeable about the food and how it was prepared. We started to tease the manager that he had a story for everything. His most boring story, he said, was about edamame. It was a great experience!"
    },
    ...
  ]
}
With this last feature, we can scrape YellowPages business data from company, search and review pages. However, our YellowPages scraper is very likely to get blocked after sending a few additional requests. Let's explore how we can scale it!
Bypass YellowPages Scraping Blocking
Scraping YellowPages isn't very complicated, but scaling up such scraping operations can be difficult, and this is where ScrapFly can lend a hand!
To take advantage of ScrapFly's API in our YellowPages web scraper, all we need to do is replace our httpx session code with scrapfly-sdk client requests:
# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some yellowpages.com URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="some yellowpages.com URL",
    asp=True,  # enable the anti scraping protection to bypass blocking
    country="US",  # set the proxy location to a specific country
    render_js=True,  # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built-in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']
FAQ
To wrap this guide up let's take a look at some frequently asked questions about web scraping YellowPages.
Is it legal to scrape YellowPages.com?
Yes, YellowPages's data is publicly available, and it's legal to scrape it. Scraping YellowPages.com at slow, respectful rates would fall under the ethical scraping definition. For more details, refer to our Is Web Scraping Legal? article.
Is there an API for YellowPages?
No, unfortunately, YellowPages.com doesn't offer APIs for public use. However, as we've covered in this tutorial - scraping YellowPages using Python is straightforward.
Are there alternatives for scraping YellowPages?
Yes, Yelp.com is another public website for business directories. We have covered how to scrape Yelp in a previous guide.
In this article, we explained how to scrape YellowPages in Python. We started by reverse engineering the website behavior to understand its search system and find company pages on the website. Then, we used CSS selectors to parse the HTML pages and extract business details.
Finally, we have explained how to bypass YellowPages scraping blocking using ScrapFly's web scraping API.