How to Scrape Bing Search with Python

Bing.com is the second most popular search engine out there, and its result pages hold tons of valuable data. However, Bing is challenging to scrape due to its obfuscation techniques and high blocking rate.

In this article, we'll explain how to scrape Bing using Python. We'll be scraping valuable data fields, such as keywords and search ranking results. Let's dive in!

Latest Bing Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

Bing indexes a large portion of the public internet, including some websites that aren't indexed by other search engines, such as Google. So, by scraping Bing, we can access different data sources and gain numerous data insights.

Web scraping Bing is also a popular SEO use case. Businesses can scrape Bing search results to see how their competitors rank and which keywords they use.

Bing also features results as AI-answered snippets or summary snippets from popular websites, such as Wikipedia. These snippets can be directly scraped from the search results instead of extracting them from the origin website.

Project Setup

To scrape Bing, we'll be using a few Python packages:

  • httpx: For requesting Bing search pages and getting HTML pages.
  • playwright: For scraping dynamically loaded parts of the search pages.
  • parsel: For parsing the HTML using web selectors like XPath and CSS.
  • loguru: For monitoring our scraper behavior.
  • asyncio: For running the scraping code asynchronously, increasing our web scraping speed.

Since asyncio comes included with Python, we only have to install the other packages using the following pip command:

$ pip install httpx playwright parsel loguru

After running the above command, install the playwright headless browser binaries using the following command:

$ playwright install

This guide focuses on scraping Bing's search. However, the concepts can be applied to other search engines, such as Google, DuckDuckGo, Kagi, etc.

How to Scrape Google Search with Python

For scraping Google, see our introduction to scraping Google using Python.

How to Scrape Bing Search Results

Let's start our guide by scraping Bing search result rankings (SERPs).

Search for any keyword, such as "web scraping emails". The SERPs on the search page should look like this:

Bing search page results

This search page contains other data snippets about the search keyword. However, we are only interested in the SERP results in this section. These results look like this in the HTML:

<main aria-label="Search Results">
    ......
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="8">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="9">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>    
    <li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="10">
        ....
        <h2><a> .... SERP title .... </a></h2>
    </li>
    ....
</main>

Bing's search page HTML is dynamic, meaning that class names change often, which can break our parsing selectors. Therefore, we'll match elements against distinct class attributes and avoid dynamic class names:

Python
ScrapFly
def parse_serps(response: Response) -> List[Dict]:
    """parse SERPs from bing search pages"""
    selector = Selector(response.text)
    data = []
    if "first" not in response.context["url"]:
        position = 0
    else:
        position = int(str(response.url).split("first=")[-1])
    for result in selector.xpath("//li[@class='b_algo']"):
        url = result.xpath(".//h2/a/@href").get()
        description = result.xpath("normalize-space(.//div/p)").extract_first()
        date = result.xpath(".//span[@class='news_dt']/text()").get()
        # the date can come in different formats
        if date is not None and len(date) > 12:
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
        position += 1
        data.append(
            {
                "position": position,
                "title": "".join(result.xpath(".//h2/a//text()").extract()),
                "url": url,
                "origin": result.xpath(".//div[@class='tptt']/text()").get(),
                "domain": url.split("https://")[-1].split("/")[0].replace("www.", "")
                if url
                else None,
                "description": description,
                "date": date,
            }
        )
    return data
def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
    """parse SERPs from bing search pages"""
    selector = response.selector
    data = []
    if "first" not in response.context["url"]:
        position = 0
    else:
        position = int(response.context["url"].split("first=")[-1])
    for result in selector.xpath("//li[@class='b_algo']"):
        url = result.xpath(".//h2/a/@href").get()
        description = result.xpath("normalize-space(.//div/p)").extract_first()
        date = result.xpath(".//span[@class='news_dt']/text()").get()
        # the date can come in different formats
        if date is not None and len(date) > 12:
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
        position += 1
        data.append(
            {
                "position": position,
                "title": "".join(result.xpath(".//h2/a//text()").extract()),
                "url": url,
                "origin": result.xpath(".//div[@class='tptt']/text()").get(),
                "domain": url.split("https://")[-1].split("/")[0].replace("www.", "")
                if url
                else None,
                "description": description,
                "date": date,
            }
        )
    return data

In the above code, we use XPath selectors to parse the SERP data from the HTML, such as the rank position, title, description, link, and website. The next step is utilizing this function while sending requests to scrape the data:

Python
ScrapFly
import re
import asyncio
import json
from typing import List, Dict
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9", # get the search results in English
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def parse_serps(response: Response) -> List[Dict]:
    """parse SERPs from bing search pages"""
    # rest of the function code


async def scrape_search(query: str):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await client.get(url)
    serp_data = parse_serps(response)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
import re
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from urllib.parse import urlencode
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
    """parse SERPs from bing search pages"""
    # rest of the function code


async def scrape_search(query: str):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    serp_data = parse_serps(response)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
Run the code
async def run():
    serp_data = await scrape_search(
        query="web scraping emails"
    )
    # save the result to a JSON file
    with open("serps.json", "w", encoding="utf-8") as file:
        json.dump(serp_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Let's break down the above code. First, we start by initializing an async httpx client with basic browser headers to minimize the chances of getting our scraper blocked. Since Bing supports different languages, we define an Accept-Language header to set the web scraping language to English. We also define a scrape_search() function, which requests the search pages and then parses the page HTML using the parse_serps() function we defined earlier.
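
Besides the Accept-Language header, the search language can also be hinted in the URL itself. Here's a minimal sketch assuming the setlang parameter, which is commonly observed on bing.com URLs but is not officially documented for the HTML search pages:

from urllib.parse import urlencode

# `setlang` is a commonly observed bing.com URL parameter
# (an assumption; it is not officially documented for the HTML search pages)
params = {"q": "web scraping emails", "setlang": "en"}
url = f"https://www.bing.com/search?{urlencode(params)}"
print(url)  # https://www.bing.com/search?q=web+scraping+emails&setlang=en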

The above code can only scrape the first search page. Let's extend it to crawl the remaining pages. To do this, we can use the first parameter, which offsets the results to start from a specific index. For example, if the first search page ends at index 9, the second page starts at index 10. Let's apply this to our code:

Python
ScrapFly
# previous code remains the same
    
async def scrape_search(query: str, max_pages: int = 1):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await client.get(url)
    serp_data = parse_serps(response)

    # new code starts from here
    log.info(f"scraping search pagination ({max_pages - 1} more pages)")
    total_results = (max_pages - 1) * 10  # each page contains 10 results
    other_pages = [
        client.get(url + f"&first={start}")
        for start in range(10, total_results + 10, 10)
    ]

    # scrape the remaining search pages concurrently
    for response in asyncio.as_completed(other_pages):
        response = await response
        data = parse_serps(response)
        serp_data.extend(data)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
# previous code remains the same
    
async def scrape_search(query: str, max_pages: int = 1):
    """scrape bing search pages"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping the first search page")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    serp_data = parse_serps(response)

    # new code starts from here
    log.info(f"scraping search pagination ({max_pages - 1} more pages)")
    total_results = (max_pages - 1) * 10  # each page contains 10 results
    other_pages = [
        ScrapeConfig(url + f"&first={start}", asp=True, country="US")
        for start in range(10, total_results + 10, 10)
    ]

    # scrape the remaining search pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_serps(response)
        serp_data.extend(data)
    log.success(f"scraped {len(serp_data)} search results from Bing search")
    return serp_data
Run the code
async def run():
    serp_data = await scrape_search(
        query="web scraping emails",
        max_pages=3 # new, max search pages to scrape
    )
    # save the result to a JSON file
    with open("serps.json", "w", encoding="utf-8") as file:
        json.dump(serp_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Here, we use the first parameter to build the request list for the remaining search pages. Then, we scrape them concurrently, just as we scraped the first page.
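
To make the offset math explicit, here is a tiny standalone sketch of how those pagination URLs are generated (a hypothetical helper mirroring the list comprehension above):

def pagination_urls(base_url: str, max_pages: int) -> list:
    """build URLs for pages 2..max_pages: each page holds 10 results,
    so page 2 starts at first=10, page 3 at first=20, and so on"""
    total_results = (max_pages - 1) * 10
    return [f"{base_url}&first={start}" for start in range(10, total_results + 10, 10)]

# pagination_urls("https://www.bing.com/search?q=web+scraping+emails", 3)
# -> ['...&first=10', '...&first=20']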

Here is a sample output of the result we got:

Sample output
[
  {
    "position": 1,
    "title": "email-scraper · GitHub Topics · GitHub",
    "url": "https://github.com/topics/email-scraper",
    "origin": "Github",
    "domain": "github.com",
    "description": "WebNov 24, 2023 · An email scraper that finds email addresses located on a website. Made with Python Django. Emails are scraped using the requests, BeautifulSoup and regex …",
    "date": "Nov 24, 2023"
  },
  {
    "position": 2,
    "title": "Web Scraping Emails with Python - scrapfly.io",
    "url": "https://scrapfly.io/blog/how-to-scrape-emails-using-python/",
    "origin": "Scrapfly",
    "domain": "scrapfly.io",
    "description": "WebOct 16, 2023 (Updated 2 months ago) Have you wondered how businesses seem to have an endless list of email contacts? Email scraping can do that! In this article, we'll explore …",
    "date": null
  },
  {
    "position": 3,
    "title": "Email scraping: Use cases, challenges & best practices in 2023",
    "url": "https://research.aimultiple.com/email-scraping/",
    "origin": "AIMultiple",
    "domain": "research.aimultiple.com",
    "description": "WebOct 13, 2023 · What is Email Scraping? Email scraping is the technique of extracting email addresses in bulk from websites using email scrapers. Top 3 benefits of email …",
    "date": "Oct 13, 2023"
  },
  {
    "position": 4,
    "title": "Scrape Email Addresses From Websites using Python …",
    "url": "https://www.scrapingdog.com/blog/scrape-email-addresses-from-website/",
    "origin": "Scrapingdog",
    "domain": "scrapingdog.com",
    "description": "Web13-01-2023 Email Scraping has become a popular and efficient method for obtaining valuable contact information from the internet. By learning how to scrape emails, businesses and individuals can expand their networks, …",
    "date": "13-01-2023"
  },
  {
    "position": 5,
    "title": "How to Scrape Emails on the Web? [8 Easy Steps and Tools]",
    "url": "https://techjury.net/blog/how-to-scrape-emails-on-the-web/",
    "origin": "Techjury",
    "domain": "techjury.net",
    "description": "WebNov 21, 2023 · Email scraping (or email address harvesting) is the process of gathering email addresses of potential clients from the Internet using automated tools. This method …",
    "date": "Nov 21, 2023"
  }
]
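
One caveat in the parser above: the domain field is derived by splitting the URL on "https://", which misses plain http:// links. Here's a more robust sketch using urllib.parse (an optional alternative, not used in the scraper above):

from urllib.parse import urlparse

def extract_domain(url: str):
    """return the bare domain of a result URL, handling both http:// and https://"""
    if not url:
        return None
    return urlparse(url).netloc.replace("www.", "")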

Our Bing scraper can successfully scrape search pages for SERP data. Next, we'll scrape keyword data.

How to Scrape Bing Keyword Data

Knowing what users search for or ask about is an essential part of SEO keyword research. This keyword data can be found on Bing search pages under the FAQ and related queries sections:

Keyword data on Bing search pages

The first part of scraping this data is defining the parsing logic. Like we did before, we'll use XPath selectors and match against elements' attributes:

Python
ScrapFly
def parse_keywords(response: Response) -> Dict:
    """parse keyword data from bing search pages"""
    selector = Selector(response.text)
    faqs = []
    for faq in selector.xpath("//*[*[div[contains(@data-tag, 'RelatedQnA.Item')]]]"):
        url = faq.xpath(".//a/@href").get()
        faqs.append(
            {
                "query": faq.xpath(".//div[contains(@data-tag, 'RelatedQnA.Item')]/@data-query").get(),
                "answer": faq.xpath(".//span[contains(@data-tag, 'QnA')]/text()").get(),
                "title": "".join(faq.xpath(".//div[@class='b_algo']/h2/*//text()").extract()),
                "domain": url.split("https://")[-1].split("/")[0].replace("www.", "")if url else None,
                "url": url,
            }
        )
    related_keywords = []
    for keyword in selector.xpath(".//li[@class='b_ans']/div/ul/li"):
        related_keywords.append("".join(keyword.xpath(".//a/div//text()").extract()))

    return {"FAQs": faqs, "related_keywords": related_keywords}
def parse_keywords(response: ScrapeApiResponse) -> Dict:
    """parse keyword data from bing search pages"""
    selector = response.selector
    faqs = []
    for faq in selector.xpath("//*[*[div[contains(@data-tag, 'RelatedQnA.Item')]]]"):
        url = faq.xpath(".//a/@href").get()
        faqs.append(
            {
                "query": faq.xpath(".//div[contains(@data-tag, 'RelatedQnA.Item')]/@data-query").get(),
                "answer": faq.xpath(".//span[contains(@data-tag, 'QnA')]/text()").get(),
                "title": "".join(faq.xpath(".//div[@class='b_algo']/h2/*//text()").extract()),
                "domain": url.split("https://")[-1].split("/")[0].replace("www.", "")if url else None,
                "url": url,
            }
        )
    related_keywords = []
    for keyword in selector.xpath(".//li[@class='b_ans']/div/ul/li"):
        related_keywords.append("".join(keyword.xpath(".//a/div//text()").extract()))

    return {"FAQs": faqs, "related_keywords": related_keywords}

Here, we define a parse_keywords() function that extracts the FAQ and related query data using XPath selectors. Next, we'll use this function after requesting the search pages to scrape the data. Since this data is usually found on the first search page, pagination isn't required for this section:

Python
ScrapFly
import asyncio
import json
from typing import Dict
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9", # get the search results in English
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    },
)

def parse_keywords(response: Response) -> Dict:
    """parse keyword data from bing search pages"""
    # rest of the function code    


async def scrape_keywords(query: str):
    """scrape bing search pages for keyword data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")
    response = await client.get(url)
    keyword_data = parse_keywords(response)
    log.success(
        f"scraped {len(keyword_data['related_keywords'])} keywords and {len(keyword_data['FAQs'])} FAQs from Bing search"
    )
    return keyword_data
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict
from urllib.parse import urlencode
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_keywords(response: ScrapeApiResponse) -> Dict:
    """parse keyword data from bing search pages"""
    # rest of the function code   


async def scrape_keywords(query: str):
    """scrape bing search pages for keyword data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US", render_js=True))
    keyword_data = parse_keywords(response)
    log.success(
        f"scraped {len(keyword_data['related_keywords'])} keywords and {len(keyword_data['FAQs'])} FAQs from Bing search"
    )
    return keyword_data
Run the code
async def run():
    keyword_data = await scrape_keywords(
        query="web scraping emails",
    )
    # save the result to a JSON file
    with open("keywords.json", "w", encoding="utf-8") as file:
        json.dump(keyword_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

Here, we use the same httpx client we defined before and define a scrape_keywords() function. It requests the search page and then parses the keyword data using the parse_keywords() function we defined earlier. Finally, we run the code with asyncio and save the result to a JSON file. Here is the result we got:

Output
{
  "FAQs": [
    {
      "query": "How to scrape emails from a website?",
      "answer": "Next, you will need a web scraper that can scrape emails from any website. For this example, we will download and install ParseHub, a free and powerful web scraper that works with any website. Now it’s time to get scraping. Open ParseHub and click on “New Project”. Then enter the URL of the page you will want to scrape.",
      "title": "How to Scrape Emails from any Website | ParseHub",
      "domain": "parsehub.com",
      "url": "https://www.parsehub.com/blog/scrape-emails/"
    },
    {
      "query": "How can web scraping help a business?",
      "answer": "To reach out to these new clients, you can extract contact information, such as email addresses, social media accounts, and phone numbers. Web scraping allows companies to ",
      "title": "Email scraping: Use cases, challenges & best practices in 2023 - AIMult…",
      "domain": "research.aimultiple.com",
      "url": "https://research.aimultiple.com/email-scraping/"
    },
    {
      "query": "What is email scraper?",
      "answer": "An email scraper that finds email addresses located on a website. Made with Python Django. Emails are scraped using the requests, BeautifulSoup and regex modules. Python script to extract unique email addresses from a list of domains using regular expression. Load more…",
      "title": "email-scraper · GitHub Topics · GitHub",
      "domain": "github.com",
      "url": "https://github.com/topics/email-scraper"
    },
    {
      "query": "Why is email scraping important?",
      "answer": "Email Scraping has become a popular and efficient method for ",
      "title": "Scrape Email Addresses From Websites using Python",
      "domain": "scrapingdog.com",
      "url": "https://www.scrapingdog.com/blog/scrape-email-addresses-from-website/"
    }
  ],
  "related_keywords": [
    "extract email address from website",
    "extract email from website free",
    "extract email from website",
    "scraping email addresses from websites",
    "scrape emails from website free",
    "capture emails from websites",
    "crawl website for email addresses",
    "extract email from webpage"
  ]
}

Our Bing scraper can now successfully scrape search result rankings and keyword data. Let's proceed to the last part: scraping Bing's rich snippets.

How to Scrape Bing Rich Snippets

In this section, we'll scrape Bing search pages to get the rich snippet data. These snippets are summaries gathered from popular data sources and featured on the search pages. Rich snippets on Bing search look like this:

Rich snippets' data on Bing search pages

Rich snippets require JavaScript rendering to load. Therefore, we'll use the Playwright headless browser to scrape their data. But first, let's start with the HTML parsing logic:

Python
ScrapFly
def parse_rich_snippet(html) -> Dict:
    """parse rich snippets from Bing search"""
    selector = Selector(html)
    data = {}
    data["title"] = selector.xpath("//div[@class='l_ecrd_hero_ttl']/div/a/h2/span/text()").get()
    data["link"] = selector.xpath("//div[@class='l_ecrd_hero_ttl']/div/a/@href").get()
    data["heading"] = " ".join(selector.xpath("//a[@title]/h2/span/text()").getall())
    data["links"] = {}
    for item in selector.xpath("//div[contains(@class, 'webicons')]/div"):
        name = item.xpath(".//a/@title").get()
        link = item.xpath(".//a/@href").get()
        data["links"][name] = link

    data["info"] = {}
    for row in selector.xpath("//div[contains(@class, 'expansion')]/div[contains(@class, 'row')]"):
        key = row.xpath(".//div/div/a[1]/text()").get().strip()
        value = row.xpath("string(.//div[not(contains(@class, 'title'))])").get().strip().replace(key, "")
        data["info"][key] = value

    all_text = ""
    for div_element in selector.xpath("//div[@class='lite-entcard-blk l_ecrd_bkg_hlt']"):
        div_text = div_element.xpath("string(.)").get().strip()
        all_text += div_text + "\n"
    data["descrption"] = all_text
    return data
def parse_rich_snippet(response: ScrapeApiResponse) -> Dict:
    """parse rich snippets from Bing search"""
    selector = response.selector
    data = {}
    data["title"] = selector.xpath("//div[@class='l_ecrd_hero_ttl']/div/a/h2/span/text()").get()
    data["link"] = selector.xpath("//div[@class='l_ecrd_hero_ttl']/div/a/@href").get()
    data["heading"] = " ".join(selector.xpath("//a[@title]/h2/span/text()").getall())
    data["links"] = {}
    for item in selector.xpath("//div[contains(@class, 'webicons')]/div"):
        name = item.xpath(".//a/@title").get()
        link = item.xpath(".//a/@href").get()
        data["links"][name] = link

    data["info"] = {}
    for row in selector.xpath("//div[contains(@class, 'expansion')]/div[contains(@class, 'row')]"):
        key = row.xpath(".//div/div/a[1]/text()").get().strip()
        value = row.xpath("string(.//div[not(contains(@class, 'title'))])").get().strip().replace(key, "")
        data["info"][key] = value

    all_text = ""
    for div_element in selector.xpath("//div[@class='lite-entcard-blk l_ecrd_bkg_hlt']"):
        div_text = div_element.xpath("string(.)").get().strip()
        all_text += div_text + "\n"
    data["descrption"] = all_text
    return data

🙋‍ Note that the rich snippets' HTML is constantly changing, and it differs based on the submitted search query. Hence, inspecting the rich snippets' HTML in the browser is crucial to ensure the correct parsing behavior.
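
One defensive pattern for such volatile markup is to try several candidate XPaths and fall back gracefully. Here's a minimal sketch (a hypothetical helper, not used in the parser above):

def first_match(selector, *xpaths, default=None):
    """return the first non-empty XPath match so that a layout change
    degrades to the default value instead of raising an exception"""
    for xpath in xpaths:
        value = selector.xpath(xpath).get()
        if value:
            return value.strip()
    return default

# e.g. title = first_match(
#     selector,
#     "//div[@class='l_ecrd_hero_ttl']/div/a/h2/span/text()",
#     "//div[contains(@class, 'l_ecrd')]//h2//text()",
# )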

Now that our parsing function is ready, we'll use Playwright to request the search page and scrape the data:

Python
ScrapFly
import asyncio
import json
from typing import Dict
from urllib.parse import urlencode
from playwright.async_api import async_playwright
from parsel import Selector
from loguru import logger as log

def parse_rich_snippet(html) -> Dict:
    """parse rich snippets from Bing search"""
    # rest of the function code


async def scrape_rich_snippets(query: str):
    """scrape bing search for rich snippets data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")

    async with async_playwright() as playwright:
        # launch a chrome headless browser instance
        browser = await playwright.chromium.launch(headless=True)
        # add HTTP headers
        context = await browser.new_context()
        await context.set_extra_http_headers({
        "Accept-Language": "en-US,en;q=0.9"
        })        
        page = await context.new_page()
        # go to the search page and wait for the page to load
        await page.goto(url, wait_until="domcontentloaded")
        # get the page HTML content
        page_content = await page.content()

    rich_snippet_data = parse_rich_snippet(page_content)
    log.success(f"scraped {len(rich_snippet_data)} rich snippets fields from Bing search")
    return rich_snippet_data
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict
from urllib.parse import urlencode
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_rich_snippet(response: ScrapeApiResponse) -> Dict:
    """parse rich snippets from Bing search"""
    # rest of the function code

async def scrape_rich_snippets(query: str):
    """scrape bing search pages for rich snippets data"""
    url = f"https://www.bing.com/search?{urlencode({'q': query})}"
    log.info("scraping Bing search for keyword data")
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="GB", render_js=True))
    rich_snippet_data = parse_rich_snippet(response)
    log.success(f"scraped {len(rich_snippet_data)} rich snippets fields from Bing search")
    return rich_snippet_data
Run the code
async def run():
    rich_snippet_data = await scrape_rich_snippets(
        query="Google Chrome",
    )
    # save the result to a JSON file
    with open("rich_snippets.json", "w", encoding="utf-8") as file:
        json.dump(rich_snippet_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())

We define a scrape_rich_snippets() function, which starts a Playwright headless browser instance and requests the search page URL. Next, it parses the rich snippet data using the parse_rich_snippet() function we defined earlier.
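
Note that wait_until="domcontentloaded" doesn't guarantee the rich snippet card itself has rendered. If the results come back empty, waiting for a known element before reading the page HTML can help. Here's a short sketch that reuses the l_ecrd_hero_ttl class from our parser (an assumption; adjust it to whatever markup your query's card actually uses):

# inside scrape_rich_snippets(), right after page.goto(...):
try:
    # wait up to 10 seconds for the rich snippet card to render
    await page.wait_for_selector("div.l_ecrd_hero_ttl", timeout=10_000)
except Exception:
    # the card may be absent for some queries; proceed with whatever loaded
    pass
page_content = await page.content()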

Here is the result we got:

{
  "title": "Google Chrome",
  "link": "https://bing.com/alink/link?url=https%3a%2f%2fwww.google.com%2fchrome%2f&source=serp-rr&h=0SS6iQXs6bC8dov4LbCztjsdUlldoiy7EapBUCzhG7I%3d&p=kcoffcialwebsite",
  "heading": "Cross-platform web browser",
  "links": {
    "Wikipedia": "https://en.wikipedia.org/wiki/Google_Chrome",
    "Facebook": "https://www.facebook.com/googlechrome/",
    "YouTube": "https://www.youtube.com/googlechrome",
    "Twitter": "https://twitter.com/googlechrome"
  },
  "info": {
    "Developer(s)": "Google",
    "Written in": "C, C++, Assembly, HTML, Java (Android app only), JavaScript, Python",
    "Engines": "Blink (WebKit on iOS), V8 JavaScript engine",
    "Operating system": "Android Oreo or later · ChromeOS · iOS 15 or later · Linux · macOS 10.15 or later · Windows 10 or later"
  },
  "descrption": "Microsoft Edge has a built-in edge, but Chrome beats it out in everyday use and benchmark tests. So, which browser should you choose?computertechnicians.com.auChrome provides quick access to Google’s services. It also uses the Google search engine and displays videos on other devices. It even connects to a Chromecast device for video output.\n[84] [85] They are supported by the browser's desktop edition. [86] [86] [91] In 2014, Google started preventing some Windows users from installing extensions not hosted on the Chrome Web Store. [94]en.wikipedia.orgThe JavaScript virtual machine used by Chrome, the V8 JavaScript engine, has features such as dynamic code generation, hidden class transitions, and precise garbage collection.\nLearn more about Chrome and speed. Tabs help you stay organized, keep track of multiple pages, and multi-task.techspot.comChrome is designed to be fast in every possible way: It's quick to start up from your desktop, loads web pages in a snap, and runs complex web applications fast.\n"
}

With this last piece, our Bing scraper is complete!
It scrapes SERPs, keywords, and rich snippet data from the search pages' HTML. However, our scraper is very likely to get blocked once we scale up and send more requests. Let's have a look at a solution!

Avoid Bing Scraping Blocking With ScrapFly

To avoid Bing web scraping blocking, we'll use ScrapFly - a web scraping API that bypasses any website scraping blocking.

scrapfly middleware

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

For scraping Bing with ScrapFly, all we have to do is replace our HTTP client with the ScrapFly client:

# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some bing.com URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="website URL",
    asp=True, # enable the anti scraping protection to bypass blocking
    country="US", # set the proxy location to a specfic country
    render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']

FAQ

To wrap up this guide on web scraping Bing, let's take a look at some frequently asked questions.

Is there a public API for Bing search?

Yes, Microsoft offers a subscription-based API for Bing search.

Is it legal to scrape Bing?

Yes, all the data on Bing search pages is publicly available, and it's legal to scrape it as long as you keep your scraping rate reasonable and don't harm the website.
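
A simple way to keep the request rate reasonable is capping concurrency with a semaphore. Here's a minimal sketch (the limit and delay values are arbitrary illustrations):

import asyncio
import httpx

semaphore = asyncio.Semaphore(3)  # allow at most 3 requests in flight (arbitrary cap)

async def polite_get(client: httpx.AsyncClient, url: str) -> httpx.Response:
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(1)  # small pause to spread requests out
        return response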

Are there alternatives for scraping Bing?

Yes, Google is the most popular alternative to the Bing search engine. We have explained how to scrape Google in a previous article. Many other search engines (like DuckDuckGo and Kagi) use Bing's data, so scraping Bing covers these targets as well!

Latest Bing Scraper Code
https://github.com/scrapfly/scrapfly-scrapers/

Web Scraping Bing - Summary

In this article, we explained how to scrape Bing search. We went through a step-by-step guide on creating a Bing scraper to scrape SERPs, keywords and rich snippet data. We also explained how to overcome the Bing scraping challenges:

  • Complex and dynamic HTML structure.
    Solved by matching elements against distinct attributes and avoiding dynamic class names.

  • Scraping blocking and localized searches.
    Solved by adding explicit language headers and using ScrapFly to avoid Bing web scraping blocking.
