
How to Scrape Bing Search with Python

Bing.com is the second most popular search engine out there, and its search results contain tons of valuable data. However, it's challenging to scrape due to its obfuscation techniques and high blocking rate.

In this article, we'll explain how to scrape Bing using Python. We'll be scraping valuable data fields, such as keywords and search ranking results. Let's dive in!

Key Takeaways

Learn how to scrape Bing search with Python, from SERP data extraction to keyword research, for SEO monitoring and comprehensive search engine analysis.

  • Parse Bing's search result HTML by matching stable element attributes instead of dynamic class names
  • Extract structured search data including results, keywords, and ranking information from search engine results
  • Implement pagination handling and search parameter management for comprehensive SERP data collection
  • Configure proxy rotation and fingerprint management to avoid detection and rate limiting
  • Use specialized tools like ScrapFly for automated Bing scraping with anti-blocking features
  • Implement data validation and error handling for reliable search engine information extraction

Latest Bing Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

Bing indexes a large portion of the public internet, including some websites that aren't indexed by other search engines, such as Google. So, by scraping Bing, we can access data sources and insights that other engines miss.

Web scraping Bing is also a popular use case for SEO practices. Businesses can scrape Bing for search results to know their competitors' ranks and what keywords they are using.

Bing also features results as AI-answered snippets or summary snippets from popular websites, such as Wikipedia. These snippets can be directly scraped from the search results instead of extracting them from the origin website.

Project Setup

To scrape Bing, we'll be using a few Python packages:

  • httpx: For requesting Bing search pages and getting HTML pages.
  • playwright: For scraping dynamically loaded parts of the search pages.
  • parsel: For parsing the HTML using web selectors like XPath and CSS.
  • loguru: For monitoring our scraper behavior.
  • asyncio: For running the scraping code asynchronously, increasing our web scraping speed.

Since asyncio comes included with Python, we only have to install the other packages using the following pip command:

$ pip install httpx playwright parsel loguru

After running the above command, install the playwright headless browser binaries using the following command:

$ playwright install
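Before moving on, you can verify the setup with a quick import check (a minimal sketch, nothing Bing-specific):

# quick sanity check: all packages should import without errors
import asyncio

import httpx
import parsel
import loguru
from playwright.async_api import async_playwright

print("all scraping packages imported successfully")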

This guide focuses on scraping Bing's search. However, the same concepts can be applied to other search engines, like Google, DuckDuckGo, Kagi, etc.

How to Scrape Google Search Results in 2025

In this scrape guide we'll be taking a look at how to scrape Google Search - the biggest index of the public web. We'll cover dynamic HTML parsing and SERP collection itself.

How to Scrape Bing Search Results

Let's start our guide by scraping Bing search result rankings (SERPs).

Search for any keyword, such as "web scraping emails". The SERPs on the search page should look like this:

Bing search page results

This search page contains other data snippets about the search keyword. However, we are only interested in the SERP results in this section. These results look like this in the HTML:

<main aria-label="Search Results">
    ......
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="8">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="9">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>    
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="10">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>
    ....
</main>

Bing's search page HTML is dynamic, meaning class names change often, which can break our parsing selectors. Therefore, we'll match elements against distinct, stable attributes and avoid dynamic class names:

Python
ScrapFly
def parse_serps(response: Response) -> List[Dict]:
    """parse SERPs from bing search pages"""
    selector = Selector(response.text)
    data = []
    # the "first" URL parameter holds the result offset of paginated pages
    url = str(response.url)
    if "first" not in url:
        position = 0
    else:
        position = int(url.split("first=")[-1])
    for result in selector.xpath("//li[@class='b_algo']"):
        link = result.xpath(".//h2/a/@href").get()
        description = result.xpath("normalize-space(.//div/p)").get()
        date = result.xpath(".//span[@class='news_dt']/text()").get()
        if date is not None and len(date) > 12:
            # long strings embed the date in surrounding text, extract it with regex
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
        position += 1
        data.append(
            {
                "position": position,
                "title": "".join(result.xpath(".//h2/a//text()").getall()),
                "url": link,
                "origin": result.xpath(".//div[@class='tptt']/text()").get(),
                "domain": link.split("https://")[-1].split("/")[0].replace("www.", "")
                if link
                else None,
                "description": description,
                "date": date,
            }
        )
    return data
def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
    """parse SERPs from bing search pages"""
    selector = response.selector
    data = []
    # the "first" URL parameter holds the result offset of paginated pages
    if "first" not in response.context["url"]:
        position = 0
    else:
        position = int(response.context["url"].split("first=")[-1])
    for result in selector.xpath("//li[@class='b_algo']"):
        url = result.xpath(".//h2/a/@href").get()
        description = result.xpath("normalize-space(.//div/p)").get()
        date = result.xpath(".//span[@class='news_dt']/text()").get()
        if date is not None and len(date) > 12:
            # long strings embed the date in surrounding text, extract it with regex
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
        position += 1
        data.append(
            {
                "position": position,
                "title": "".join(result.xpath(".//h2/a//text()").getall()),
                "url": url,
                "origin": result.xpath(".//div[@class='tptt']/text()").get(),
                "domain": url.split("https://")[-1].split("/")[0].replace("www.", "")
                if url
                else None,
                "description": description,
                "date": date,
            }
        )
    return data

In the above code, we use XPath selectors to parse the SERP data from the HTML: the rank position, title, description, link and website.
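To see why matching on stable attributes matters, here's a tiny self-contained sketch (the HTML is hypothetical, mimicking Bing's markup, where generated class names rotate between deployments while semantic ones like b_algo stay put):

from parsel import Selector

# hypothetical markup: "b_algo" is a stable semantic marker,
# while generated classes like "xyz123" change between deployments
html = """
<li class="b_algo"><h2><a href="https://example.com/a">Result A</a></h2></li>
<li class="xyz123 b_algo"><h2><a href="https://example.com/b">Result B</a></h2></li>
"""
selector = Selector(text=html)
# contains() is a defensive variant: it matches the stable token
# even when extra generated classes are present on the element
for link in selector.xpath("//li[contains(@class, 'b_algo')]/h2/a/@href").getall():
    print(link)

The next step is utilizing this function while sending requests to scrape the data: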

Python
ScrapFly
import re
import asyncio
import json
from typing import List, Dict
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9", # get the search results in English
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def parse_serps(response: Response) -> List[Dict]:
    """parse SERPs from bing search pages"""
    # rest of the function code


async def scrape_search(query: str):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await client.get(url)
    serp_data = parse_serps(response)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
import re
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from urllib.parse import urlencode
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
    """parse SERPs from bing search pages"""
    # rest of the function code


async def scrape_search(query: str):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    serp_data = parse_serps(response)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
Run the code
async def run():
    serp_data = await scrape_search(query="web scraping emails")
    # save the result to a JSON file
    with open("serps.json", "w", encoding="utf-8") as file:
        json.dump(serp_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Let's break down the above code. First, we start by initializing an async httpx client with basic browser headers to minimize the chances of getting our scraper blocked. Since Bing supports different languages, we define an Accept-Language header to set the web scraping language to English. We also define a scrape_search() function, which requests the search pages and then parses the page HTML using the parse_serps() function we defined earlier.
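As a side note, if you need results in a different language or market, swap the Accept-Language value; Bing also honors the setlang and mkt URL parameters (a quick sketch; the German values are just an example):

from urllib.parse import urlencode

# example: request German results instead of English
params = {"q": "web scraping emails", "setlang": "de", "mkt": "de-DE"}
url = f"https://www.bing.com/search?{urlencode(params)}"
headers = {"Accept-Language": "de-DE,de;q=0.9"}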

Our scraper so far can only fetch the first search page. Let's extend it to crawl the remaining pages. To do that, we can use the first parameter, which sets the result offset: if the first search page ends at index 9, the second page starts at index 10.
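The offset arithmetic is simple with ten results per page (a quick sketch; page_to_offset is a hypothetical helper for illustration only):

# map a 1-indexed page number to Bing's "first" offset (10 results per page)
def page_to_offset(page: int, page_size: int = 10) -> int:
    return (page - 1) * page_size

for page in range(2, 5):
    print(f"page {page} -> &first={page_to_offset(page)}")
# page 2 -> &first=10
# page 3 -> &first=20
# page 4 -> &first=30

Let's apply this to our code: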

Python
ScrapFly
# previous code remains the same
    
async def scrape_search(query: str, max_pages: int = 1):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await client.get(url)
    serp_data = parse_serps(response)

    # new code starts from here
    log.info(f"scraping search pagination ({max_pages - 1} more pages)")
    total_results = (max_pages - 1) * 10  # each page contains 10 results
    other_pages = [
        client.get(url + f"&first={start}")
        for start in range(10, total_results + 10, 10)
    ]

    # scrape the remaining search pages concurrently
    for response in asyncio.as_completed(other_pages):
        response = await response
        data = parse_serps(response)
        serp_data.extend(data)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
# previous code remains the same
    
async def scrape_search(query: str, max_pages: int = 1):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    serp_data = parse_serps(response)

    # new code starts from here
    log.info(f"scraping search pagination ({max_pages - 1} more pages)")
    total_results = (max_pages - 1) * 10  # each page contains 10 results
    other_pages = [
        ScrapeConfig(url + f"&first={start}", asp=True, country="US")
        for start in range(10, total_results + 10, 10)
    ]

    # scrape the remaining search pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_serps(response)
        serp_data.extend(data)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
Run the code
async def run():
    serp_data = await scrape_search(
        query="web scraping emails",
        max_pages=3 # new, max search pages to scrape
    )
    # save the result to a JSON file
    with open("serps.json", "w", encoding="utf-8") as file:
        json.dump(serp_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Here, we use the first parameter to build a request list for the remaining search pages. Then, we scrape them concurrently and extend the data we parsed from the first page.

Here is a sample output of the result we got:

Sample output
[
  {
    "position": 1,
    "title": "email-scraper · GitHub Topics · GitHub",
    "url": "https://github.com/topics/email-scraper",
    "origin": "Github",
    "domain": "github.com",
    "description": "WebNov 24, 2023 · An email scraper that finds email addresses located on a website. Made with Python Django. Emails are scraped using the requests, BeautifulSoup and regex …",
    "date": "Nov 24, 2023"
  },
  {
    "position": 2,
    "title": "Web Scraping Emails with Python - scrapfly.io",
    "url": "https://scrapfly.io/blog/posts/how-to-scrape-emails-using-python/",
    "origin": "Scrapfly",
    "domain": "scrapfly.io",
    "description": "WebOct 16, 2023 (Updated 2 months ago) Have you wondered how businesses seem to have an endless list of email contacts? Email scraping can do that! In this article, we'll explore …",
    "date": null
  },
  {
    "position": 3,
    "title": "Email scraping: Use cases, challenges & best practices in 2023",
    "url": "https://research.aimultiple.com/email-scraping/",
    "origin": "AIMultiple",
    "domain": "research.aimultiple.com",
    "description": "WebOct 13, 2023 · What is Email Scraping? Email scraping is the technique of extracting email addresses in bulk from websites using email scrapers. Top 3 benefits of email …",
    "date": "Oct 13, 2023"
  },
  {
    "position": 4,
    "title": "Scrape Email Addresses From Websites using Python …",
    "url": "https://www.scrapingdog.com/blog/scrape-email-addresses-from-website/",
    "origin": "Scrapingdog",
    "domain": "scrapingdog.com",
    "description": "Web13-01-2023 Email Scraping has become a popular and efficient method for obtaining valuable contact information from the internet. By learning how to scrape emails, businesses and individuals can expand their networks, …",
    "date": "13-01-2023"
  },
  {
    "position": 5,
    "title": "How to Scrape Emails on the Web? [8 Easy Steps and Tools]",
    "url": "https://techjury.net/blog/how-to-scrape-emails-on-the-web/",
    "origin": "Techjury",
    "domain": "techjury.net",
    "description": "WebNov 21, 2023 · Email scraping (or email address harvesting) is the process of gathering email addresses of potential clients from the Internet using automated tools. This method …",
    "date": "Nov 21, 2023"
  }
]

Our Bing scraper can successfully scrape search pages for SERP data. Next, we'll scrape keyword data.

How to Scrape Bing Keyword Data

Knowing what users search for or ask about is an essential part of SEO keyword research. This keyword data can be found on Bing search pages under the related queries section:

Keyword data on Bing search pages

The first part of scraping this data is defining the parsing logic. As we did before, we'll use XPath selectors and match against stable element attributes:

Python
ScrapFly
def parse_keywords(response: Response) -> Dict:
    """parse keyword data from bing search pages"""
    selector = Selector(response.text)

    related_keywords = []
    for keyword in selector.xpath("//li[@class='b_ans']/div/ul/li"):
        related_keywords.append("".join(keyword.xpath(".//a/div//text()").getall()))

    return {"related_keywords": related_keywords}
def parse_keywords(response: ScrapeApiResponse) -> Dict:
    """parse keyword data from bing search pages"""
    selector = response.selector

    related_keywords = []
    for keyword in selector.xpath("//li[@class='b_ans']/div/ul/li"):
        related_keywords.append("".join(keyword.xpath(".//a/div//text()").getall()))

    return {"related_keywords": related_keywords}

Here, we define a parse_keywords function. It extracts the related query suggestions using XPath selectors and returns them in a dictionary. Next, we'll use this function after requesting the search pages to scrape the data. This data is usually found on the first search page, so pagination isn't required for this Bing scraping section:

Python
ScrapFly
import asyncio
import json
from typing import Dict
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9", # get the search results in English
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    },
)

def parse_keywords(response: Response) -> Dict:
    """parse keyword data from bing search pages"""
    # rest of the function code    


async def scrape_keywords(query: str):
    """scrape bing search pages for keyword data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")
    response = await client.get(url)
    keyword_data = parse_keywords(response)
    log.success(
        f"scraped {len(keyword_data['related_keywords'])} keywords from Bing search"
    )
    return keyword_data
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict
from urllib.parse import urlencode
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    # bypass Bing web scraping blocking
    "asp": True,
    # set the proxy location to the US to get results in English
    "country": "US",
    # route requests through residential proxies
    "proxy_pool": "public_residential_pool",
    # log extra debugging details
    "debug": True,
    # set the browser fingerprint operating system
    "os": "linux",
    # scroll down the page to load dynamic content
    "auto_scroll": True,
}

def parse_keywords(response: ScrapeApiResponse) -> Dict:
    """parse keyword data from bing search pages"""
    # rest of the function code


async def scrape_keywords(query: str):
    """scrape bing search pages for keyword data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, **BASE_CONFIG, render_js=True))
    keyword_data = parse_keywords(response)
    log.success(
        f"scraped {len(keyword_data['related_keywords'])} keywords from Bing search"
    )
    return keyword_data
Run the code
async def run():
    keyword_data = await scrape_keywords(
        query="web scraping emails",
    )
    # save the result to a JSON file
    with open("keywords.json", "w", encoding="utf-8") as file:
        json.dump(keyword_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Here, we use the same httpx client we defined before and define a scrape_keywords function. It requests the search page and then parses the keyword data using the parse_keywords function we defined earlier. Finally, we run the code using asyncio and save the result to a JSON file. Here is the result we got:

Output
{
  "related_keywords": [
    "extract email address from website",
    "extract email from website free",
    "extract email from website",
    "scraping email addresses from websites",
    "scrape emails from website free",
    "capture emails from websites",
    "crawl website for email addresses",
    "extract email from webpage"
  ]
}

With this last piece, our Bing scraper is complete!
It scrapes SERPs, keywords and rich snippet data from the search page HTML. However, once we start sending requests at scale, our scraper is very likely to get blocked.
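As a stopgap, a simple retry wrapper with exponential backoff can absorb transient failures (a minimal sketch using the httpx client from earlier; the helper name is ours, and it won't defeat dedicated anti-bot systems):

import asyncio
from httpx import AsyncClient, Response, HTTPError

async def get_with_retries(client: AsyncClient, url: str, retries: int = 3) -> Response:
    """request a URL, retrying with exponential backoff on failure"""
    for attempt in range(retries):
        try:
            response = await client.get(url)
            # Bing block pages typically come back as non-200 responses
            if response.status_code == 200:
                return response
        except HTTPError:
            pass  # network error, fall through to the backoff sleep
        await asyncio.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError(f"all {retries} attempts for {url} failed")

Retries only go so far against dedicated blocking, though. Let's have a look at a proper solution!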

Avoid Bing Scraping Blocking With ScrapFly

To avoid Bing web scraping blocking, we'll use ScrapFly - a web scraping API that bypasses any website scraping blocking.

scrapfly middleware

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

For scraping Bing with ScrapFly, all we have to do is replace our HTTP client with the ScrapFly client:

# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some bing.com URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="website URL",
    asp=True, # enable the anti scraping protection to bypass blocking
    country="US", # set the proxy location to a specfic country
    render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']

FAQ

To wrap up this guide on web scraping Bing, let's take a look at some frequently asked questions.

Is there an API for Bing search?

Yes, Microsoft offers a subscription-based API for Bing search.
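For comparison, here is a minimal sketch of calling the official Bing Web Search API v7 (assuming you have an Azure subscription key; check Microsoft's docs for current availability, as the API's status may change):

import httpx

# hypothetical key: replace with your own Azure subscription key
SUBSCRIPTION_KEY = "your API key"

response = httpx.get(
    "https://api.bing.microsoft.com/v7.0/search",
    params={"q": "web scraping emails"},
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
)
# the v7 API returns organic results under webPages -> value
for result in response.json()["webPages"]["value"]:
    print(result["name"], result["url"])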

Is it legal to scrape Bing?

Yes, all the data on Bing search pages is publicly available, and it's legal to scrape it as long as you don't harm the website by keeping your scraping rate reasonable.

Are there alternatives for scraping Bing?

Yes, Google is the most popular alternative to the Bing search engine. We have explained how to scrape Google in a previous article. Many other search engines use Bing's data (like DuckDuckGo and Kagi), so scraping Bing covers these targets as well!

Latest Bing Scraper Code
https://github.com/scrapfly/scrapfly-scrapers/

Web Scraping Bing - Summary

In this article, we explained how to scrape Bing search. We went through a step-by-step guide on creating a Bing scraper to scrape SERPs, keywords and rich snippet data. We also explained how to overcome the Bing scraping challenges:

  • Complex and dynamic HTML structure.
    Solved by matching against distinct, stable element attributes and avoiding dynamic class names when parsing the HTML.

  • Scraping blocking and localized searches.
    Solved by adding explicit language headers and using ScrapFly to avoid Bing web scraping blocking.
