Bing.com is the second most popular search engine on the web, and its search result pages contain a wealth of valuable data. However, Bing is challenging to scrape due to its obfuscated, dynamic HTML and high blocking rate.
In this article, we'll explain how to scrape Bing using Python. We'll be scraping valuable data fields, such as keywords and search ranking results. Let's dive in!
Key Takeaways
Learn how to scrape Bing search with Python, from SERP data extraction to keyword research and SEO monitoring:
- Extract structured SERP data (position, title, URL, description, date) with XPath selectors
- Avoid Bing's dynamic class names by matching elements against distinct, stable attributes
- Handle pagination with the first search parameter and scrape pages concurrently with asyncio
- Scrape related keyword data for SEO research and rank tracking
- Configure browser-like headers, proxy rotation and fingerprint management to avoid detection and rate limiting
- Use specialized tools like ScrapFly for automated Bing scraping with anti-blocking features
Why Scrape Bing Search?
Bing indexes a large portion of the public internet, including some websites that aren't indexed by other search engines, such as Google. So, by scraping Bing, we can access different data sources and additional data insights.
Web scraping Bing is also a popular SEO use case. Businesses can scrape Bing search results to track their competitors' rankings and the keywords they rank for.
Bing also features AI-answered snippets and summary snippets from popular websites, such as Wikipedia. These snippets can be scraped directly from the search results instead of being extracted from the origin websites.
Project Setup
To scrape Bing, we'll be using a few Python packages:
- httpx: For requesting Bing search pages and retrieving the HTML.
- playwright: For scraping dynamically loaded parts of the search pages.
- parsel: For parsing the HTML using web selectors, such as XPath and CSS.
- loguru: For monitoring our scraper's behavior.
- asyncio: For running the scraping code asynchronously, increasing our web scraping speed.
Since asyncio comes included with Python, we only have to install the other packages using the following pip command:
$ pip install httpx playwright parsel loguru
After running the above command, install the playwright headless browser binaries using the following command:
$ playwright install
This guide focuses on scraping Bing search. However, the concepts can be applied to other search engines, like Google, DuckDuckGo, and Kagi.
How to Scrape Google Search Results in 2025
In this scrape guide we'll be taking a look at how to scrape Google Search - the biggest index of public web. We'll cover dynamic HTML parsing and SERP collection itself.
How to Scrape Bing Search Results
Let's start our guide by scraping Bing search result rankings (SERPs).
Search for any keyword, such as "web scraping emails". The SERPs on the search page should look like this:
This search page contains other data snippets about the search keyword. However, we are only interested in the SERP results in this section. These results look like this in the HTML:
<main aria-label="Search Results">
......
<li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="8">
....
<h2><a> .... SERP title .... </a></h2>
</li>
<li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="9">
....
<h2><a> .... SERP title .... </a></h2>
</li>
<li class="b_algo" data-tag="" data-partnertag="" data-id="" data-bm="10">
....
<h2><a> .... SERP title .... </a></h2>
</li>
....
</main>
Bing's search page HTML is dynamic, meaning that class names change often, which can break our parsing selectors. Therefore, we'll match elements against distinct, stable attributes and avoid dynamic class names:
def parse_serps(response: Response) -> List[Dict]:
"""parse SERPs from bing search pages"""
selector = Selector(response.text)
data = []
if "first" not in response.context["url"]:
position = 0
else:
position = int(response.context["url"].split("first=")[-1])
for result in selector.xpath("//li[@class='b_algo']"):
url = result.xpath(".//h2/a/@href").get()
description = result.xpath("normalize-space(.//div/p)").extract_first()
date = result.xpath(".//span[@class='news_dt']/text()").get()
        # long date values embed a DD-MM-YYYY date, extract it with a regex
        if date is not None and len(date) > 12:
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
position += 1
data.append(
{
"position": position,
"title": "".join(result.xpath(".//h2/a//text()").extract()),
"url": url,
"origin": result.xpath(".//div[@class='tptt']/text()").get(),
"domain": url.split("https://")[-1].split("/")[0].replace("www.", "")
if url
else None,
"description": description,
"date": date,
}
)
return data
def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
"""parse SERPs from bing search pages"""
selector = response.selector
data = []
if "first" not in response.context["url"]:
position = 0
else:
position = int(response.context["url"].split("first=")[-1])
for result in selector.xpath("//li[@class='b_algo']"):
url = result.xpath(".//h2/a/@href").get()
description = result.xpath("normalize-space(.//div/p)").extract_first()
date = result.xpath(".//span[@class='news_dt']/text()").get()
        # long date values embed a DD-MM-YYYY date, extract it with a regex
        if date is not None and len(date) > 12:
            date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")
            dates = date_pattern.findall(date)
            date = dates[0] if dates else None
position += 1
data.append(
{
"position": position,
"title": "".join(result.xpath(".//h2/a//text()").extract()),
"url": url,
"origin": result.xpath(".//div[@class='tptt']/text()").get(),
"domain": url.split("https://")[-1].split("/")[0].replace("www.", "")
if url
else None,
"description": description,
"date": date,
}
)
return data
In the above code, we use the XPath selector to parse the SERPs' data from the HTML, such as the rank position, title, description, link and website. The next step is utilizing this function while sending requests to scrape the data:
import re
import asyncio
import json
from typing import List, Dict
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log
# initialize an async httpx client
client = AsyncClient(
# enable http2
http2=True,
# add basic browser like headers to prevent being blocked
headers={
"Accept-Language": "en-US,en;q=0.9", # get the search results in English
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
},
)
def parse_serps(response: Response) -> List[Dict]:
"""parse SERPs from bing search pages"""
# rest of the function code
async def scrape_search(query: str):
"""scrape bing search pages"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping the first search page")
response = await client.get(url)
serp_data = parse_serps(response)
log.success(f"scraped {len(serp_data)} search results from Bing search")
return serp_data
import re
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from urllib.parse import urlencode
from loguru import logger as log
SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")
def parse_serps(response: ScrapeApiResponse) -> List[Dict]:
"""parse SERPs from bing search pages"""
# rest of the function code
async def scrape_search(query: str):
"""scrape bing search pages"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping the first search page")
response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
serp_data = parse_serps(response)
log.success(f"scraped {len(serp_data)} search results from Bing search")
return serp_data
Run the code
async def run():
    serp_data = await scrape_search(query="web scraping emails")
# save the result to a JSON file
with open("serps.json", "w", encoding="utf-8") as file:
json.dump(serp_data, file, indent=2, ensure_ascii=False)
if __name__ == "__main__":
asyncio.run(run())
Let's break down the above code. We start by initializing an async httpx client with basic browser-like headers to minimize the chances of getting our scraper blocked. Since Bing supports different languages, we define an Accept-Language
header to set the web scraping language to English. We also define a scrape_search()
function, which requests the search pages and then parses the page HTML using the parse_serps()
function we defined earlier.
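Note that the same Accept-Language header is also how you localize the results. As a quick, illustrative sketch, requesting German results only requires swapping that one header value (the client_de name is arbitrary; everything else matches the client above):
from httpx import AsyncClient

# sketch: a client configured for German search results;
# only the Accept-Language value differs from the client above
client_de = AsyncClient(
    http2=True,
    headers={
        "Accept-Language": "de-DE,de;q=0.9",  # German instead of English
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)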
The above code can only scrape the first search page. Let's extend it to crawl the remaining pages. For that, we can use the first
parameter, which starts the results at a specific index. For example, if the first search page ends at index 9, the second page starts at index 10. Let's apply this to our code:
# previous code remains the same
async def scrape_search(query: str, max_pages: int = 1):
"""scrape bing search pages"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping the first search page")
response = await client.get(url)
serp_data = parse_serps(response)
# new code starts from here
log.info(f"scraping search pagination ({max_pages - 1} more pages)")
total_results = (max_pages - 1) * 10 # each page contains 10 results
other_pages = [
client.get(url + f"&first={start}")
for start in range(10, total_results + 10, 10)
]
# scrape the remaining search pages concurrently
for response in asyncio.as_completed(other_pages):
response = await response
data = parse_serps(response)
serp_data.extend(data)
log.success(f"scraped {len(serp_data)} search results from Bing search")
return serp_data
# previous code remains the same
async def scrape_search(query: str, max_pages: int = 1):
"""scrape bing search pages"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping the first search page")
response = await SCRAPFLY.async_scrape(ScrapeConfig(url, asp=True, country="US"))
serp_data = parse_serps(response)
# new code starts from here
log.info(f"scraping search pagination ({max_pages - 1} more pages)")
total_results = (max_pages - 1) * 10 # each page contains 10 results
other_pages = [
ScrapeConfig(url + f"&first={start}", asp=True, country="US")
for start in range(10, total_results + 10, 10)
]
# scrape the remaining search pages concurrently
async for response in SCRAPFLY.concurrent_scrape(other_pages):
data = parse_serps(response)
serp_data.extend(data)
log.success(f"scraped {len(serp_data)} search results from Bing search")
return serp_data
Run the code
async def run():
serp_data = await scrape_search(
query="web scraping emails",
max_pages=3 # new, max search pages to scrape
)
# save the result to a JSON file
with open("serps.json", "w", encoding="utf-8") as file:
json.dump(serp_data, file, indent=2, ensure_ascii=False)
if __name__ == "__main__":
asyncio.run(run())
Here, we use the first
parameter to build a request list for the remaining search pages. Then, we scrape those pages concurrently and parse each response with the same parse_serps() function, just as we did for the first page.
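To make the pagination explicit, here's a small self-contained sketch that prints the URLs the scraper requests for max_pages=3 (the query value is just an example):
from urllib.parse import urlencode

query = "web scraping emails"
max_pages = 3
base_url = f"https://www.bing.com/search?{urlencode({'q': query})}"

# page 1 has no "first" parameter; each following page offsets by 10 results
urls = [base_url] + [
    base_url + f"&first={start}"
    for start in range(10, (max_pages - 1) * 10 + 10, 10)
]
for url in urls:
    print(url)
# https://www.bing.com/search?q=web+scraping+emails
# https://www.bing.com/search?q=web+scraping+emails&first=10
# https://www.bing.com/search?q=web+scraping+emails&first=20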
Here is a sample output of the result we got:
Sample output
[
{
"position": 1,
"title": "email-scraper · GitHub Topics · GitHub",
"url": "https://github.com/topics/email-scraper",
"origin": "Github",
"domain": "github.com",
"description": "WebNov 24, 2023 · An email scraper that finds email addresses located on a website. Made with Python Django. Emails are scraped using the requests, BeautifulSoup and regex …",
"date": "Nov 24, 2023"
},
{
"position": 2,
"title": "Web Scraping Emails with Python - scrapfly.io",
"url": "https://scrapfly.io/blog/posts/how-to-scrape-emails-using-python/",
"origin": "Scrapfly",
"domain": "scrapfly.io",
"description": "WebOct 16, 2023 (Updated 2 months ago) Have you wondered how businesses seem to have an endless list of email contacts? Email scraping can do that! In this article, we'll explore …",
"date": null
},
{
"position": 3,
"title": "Email scraping: Use cases, challenges & best practices in 2023",
"url": "https://research.aimultiple.com/email-scraping/",
"origin": "AIMultiple",
"domain": "research.aimultiple.com",
"description": "WebOct 13, 2023 · What is Email Scraping? Email scraping is the technique of extracting email addresses in bulk from websites using email scrapers. Top 3 benefits of email …",
"date": "Oct 13, 2023"
},
{
"position": 4,
"title": "Scrape Email Addresses From Websites using Python …",
"url": "https://www.scrapingdog.com/blog/scrape-email-addresses-from-website/",
"origin": "Scrapingdog",
"domain": "scrapingdog.com",
"description": "Web13-01-2023 Email Scraping has become a popular and efficient method for obtaining valuable contact information from the internet. By learning how to scrape emails, businesses and individuals can expand their networks, …",
"date": "13-01-2023"
},
{
"position": 5,
"title": "How to Scrape Emails on the Web? [8 Easy Steps and Tools]",
"url": "https://techjury.net/blog/how-to-scrape-emails-on-the-web/",
"origin": "Techjury",
"domain": "techjury.net",
"description": "WebNov 21, 2023 · Email scraping (or email address harvesting) is the process of gathering email addresses of potential clients from the Internet using automated tools. This method …",
"date": "Nov 21, 2023"
}
]
Our Bing scraper can successfully scrape search pages for SERP data. Next, we'll scrape keyword data.
How to Scrape Bing Keyword Data
Knowing what users search for or ask about is an essential part of SEO keyword research. This keyword data can be found on Bing search pages under the related queries section:
The first part of scraping this data is defining the parsing logic. Like we did before, we'll use XPath selectors and match against elements' attributes:
def parse_keywords(response: Response) -> List[str]:
"""parse keyword data from bing search pages"""
selector = Selector(response.text)
related_keywords = []
for keyword in selector.xpath(".//li[@class='b_ans']/div/ul/li"):
related_keywords.append("".join(keyword.xpath(".//a/div//text()").extract()))
return related_keywords
def parse_keywords(response: ScrapeApiResponse) -> List[str]:
"""parse keyword data from bing search pages"""
selector = response.selector
related_keywords = []
for keyword in selector.xpath(".//li[@class='b_ans']/div/ul/li"):
related_keywords.append("".join(keyword.xpath(".//a/div//text()").extract()))
return related_keywords
Here, we define a parse_keywords
function, which extracts the related query data using XPath selectors. Next, we'll use this function after requesting the search pages. Since this data is found on the first search page, pagination isn't required for this Bing scraping section:
import asyncio
import json
from typing import List
from urllib.parse import urlencode
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log
# initialize an async httpx client
client = AsyncClient(
# enable http2
http2=True,
# add basic browser like headers to prevent being blocked
headers={
"Accept-Language": "en-US,en;q=0.9", # get the search results in English
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-encoding": "gzip, deflate, br",
},
)
def parse_keywords(response: Response) -> List[str]:
"""parse keyword data from bing search pages"""
# rest of the function code
async def scrape_keywords(query: str):
"""scrape bing search pages for keyword data"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping Bing search for keyword data")
response = await client.get(url)
keyword_data = parse_keywords(response)
log.success(
f"scraped {len(keyword_data)} keywords from Bing search"
)
return keyword_data
import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import List
from urllib.parse import urlencode
from loguru import logger as log
SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")
BASE_CONFIG = {
    # bypass Bing web scraping blocking
    "asp": True,
    # set the proxy location to the US to get the results in English
    "country": "US",
    "proxy_pool": "public_residential_pool",
    "debug": True,
    "os": "linux",
    "auto_scroll": True,
}
def parse_keywords(response: ScrapeApiResponse) -> List[str]:
"""parse keyword data from bing search pages"""
# rest of the function code
async def scrape_keywords(query: str):
"""scrape bing search pages for keyword data"""
url = f"https://www.bing.com/search?{urlencode({'q': query})}"
log.info("scraping Bing search for keyword data")
response = await SCRAPFLY.async_scrape(ScrapeConfig(url, **BASE_CONFIG, render_js=True))
keyword_data = parse_keywords(response)
log.success(
f"scraped {len(keyword_data)} keywords from Bing search"
)
return keyword_data
Run the code
async def run():
keyword_data = await scrape_keywords(
query="web scraping emails",
)
# save the result to a JSON file
with open("keywords.json", "w", encoding="utf-8") as file:
json.dump(keyword_data, file, indent=2, ensure_ascii=False)
if __name__ == "__main__":
asyncio.run(run())
Here, we use the same httpx client we defined before and define a scrape_keywords
function. It requests the search page and then parses the keyword data using the parse_keywords
function we defined earlier. Finally, we run the code with asyncio
and save the result to a JSON file. Here is the result we got:
Output
[
  "extract email address from website",
  "extract email from website free",
  "extract email from website",
  "scraping email addresses from websites",
  "scrape emails from website free",
  "capture emails from websites",
  "crawl website for email addresses",
  "extract email from webpage"
]
With this last piece, our Bing scraper is complete!
It scrapes SERPs, keywords and rich snippet data from the search page HTML. However, our scraper is very likely to get blocked once it sends more than a handful of requests. A basic first line of defense is detecting block responses and backing off before retrying. Here's a minimal, hypothetical sketch of such a helper (the status codes and CAPTCHA check are rough heuristics, not a complete block detector):
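import asyncio
from httpx import AsyncClient, Response

# hypothetical helper, not part of the scraper above: retry with
# exponential backoff when a response looks like a block page
async def get_with_retries(client: AsyncClient, url: str, retries: int = 3) -> Response:
    """request a URL, backing off and retrying on likely block responses"""
    for attempt in range(retries):
        response = await client.get(url)
        blocked = response.status_code in (403, 429) or "captcha" in response.text.lower()
        if not blocked:
            return response
        await asyncio.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError(f"still blocked after {retries} attempts: {url}")
Backoff alone won't beat Bing's fingerprint-based blocking at scale, though. Let's have a look at a more robust solution!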
Avoid Bing Scraping Blocking With ScrapFly
To avoid Bing web scraping blocking, we'll use ScrapFly - a web scraping API that bypasses any website scraping blocking.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and Typescript SDKs, as well as Scrapy and no-code tool integrations.
For scraping Bing with Scrapfly all we have to do is replace our HTTP client with the ScrapFly client:
# standard web scraping code
import httpx
from parsel import Selector
response = httpx.get("some bing.com URL")
selector = Selector(response.text)
# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient
# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
response = scrapfly.scrape(ScrapeConfig(
url="website URL",
asp=True, # enable the anti scraping protection to bypass blocking
country="US", # set the proxy location to a specfic country
render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))
# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']
FAQ
To wrap up this guide on web scraping Bing, let's take a look at some frequently asked questions.
Is there a public API for Bing search?
Yes, Microsoft offers a subscription-based Bing Web Search API. As a minimal sketch (assuming an API key obtained from the Azure portal and the documented v7 endpoint), it can be queried with httpx like this:
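import httpx

# placeholder key - obtain a real one from the Azure portal
SUBSCRIPTION_KEY = "your Bing Search API key"

response = httpx.get(
    "https://api.bing.microsoft.com/v7.0/search",
    params={"q": "web scraping emails", "count": 10},
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
)
response.raise_for_status()
# organic results are returned as structured JSON under webPages.value
for result in response.json()["webPages"]["value"]:
    print(result["name"], result["url"])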
Is it legal to scrape Bing?
Yes. The data on Bing search pages is publicly available, and scraping it is generally legal as long as you don't harm the website and keep your scraping rate reasonable. One simple way to do that is to cap concurrency and pace requests, as in this illustrative sketch (the limits are arbitrary examples, not official guidance):
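import asyncio
from httpx import AsyncClient

semaphore = asyncio.Semaphore(3)  # allow at most 3 requests in flight

async def polite_get(client: AsyncClient, url: str):
    """fetch a URL while capping concurrency and pacing requests"""
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(1.0)  # hold the slot ~1s to pace requests
        return response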
Are there alternatives for scraping Bing?
Yes, Google is the most popular alternative to Bing. We have explained how to scrape Google in a previous article. Many other search engines (like DuckDuckGo and Kagi) use Bing's data, so scraping Bing covers these targets as well!
Web Scraping Bing - Summary
In this article, we explained how to scrape Bing search. We went through a step-by-step guide on creating a Bing scraper to scrape SERPs, keywords and rich snippet data. We also explained how to overcome the Bing scraping challenges:
- Complex and dynamic HTML structure: solved by parsing the HTML against distinct element attributes and avoiding dynamic class names.
- Scraping blocking and localized searches: solved by adding explicit language headers and using ScrapFly to avoid Bing web scraping blocking.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose the entire public datasets which can be illegal in some countries.