How to Scrape BestBuy Product, Offer and Review Data

In this article, we'll explain how to scrape BestBuy, one of the most popular electronics retailers in the United States. We'll scrape different data types from product, search, review, and sitemap pages. Additionally, we'll employ a wide range of web scraping tricks, such as hidden JSON data, hidden APIs, and HTML and XML parsing. So, this guide serves as a comprehensive web scraping introduction!

Latest BestBuy Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

Why Scrape BestBuy?

Web scraping BestBuy unlocks a wealth of data that can empower both businesses and retail buyers in different ways:

  • Competitive Analysis
    The market dynamics are aggressive and fast-changing, making it challenging for businesses to remain competitive. Scraping BestBuy allows businesses to compare their competitors' pricing, sales, and reviews. This provides a better understanding of the current trends and interests to remain up-to-date and attract new customers.

  • Customer Sentiment Analysis
    BestBuy includes thousands of review data for different products. Web scraping BestBuy's reviews can be used to run sentiment analysis research, which provides useful insights into the customers' satisfaction, preferences, and feedback.

  • Empowered Navigation
    Manually browsing the excessive number of similar products on BestBuy can be tedious. On the other hand, retailers can web scrape BestBuy to compare many products quickly, allowing them to identify niche markets and undervalued products.

For further details, refer to our introduction on web scraping use cases.

Setup

To web scrape BestBuy, we'll use Python with a few community libraries:

  • httpx: To request BestBuy pages and get the data as HTML, XML, or JSON.
  • parsel: To parse the HTML and XML data using selectors, such as XPath and CSS.
  • JMESPath: To refine and parse the BestBuy JSON datasets for the useful data only.
  • loguru: To monitor and log our BestBuy scraper in beautiful terminal outputs.
  • asyncio: To increase the web scraping speed by running the code asynchronously.

Since asyncio comes pre-installed in Python, we'll only have to install the other packages using the following pip command:

pip install httpx parsel jmespath loguru

How To Discover BestBuy Pages?

Scraping sitemaps is an efficient way to discover thousands of organized URLs. Websites provide sitemaps so search engine crawlers can index their pages, and we can use them to find web scraping targets on a website.

BestBuy's sitemaps can be found at bestbuy.com/robots.txt. It's a text file that provides crawling instructions along with the website's sitemap directory:

Sitemap: https://sitemaps.bestbuy.com/sitemaps_discover_learn.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_pdp.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_promos.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_qna.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_rnr.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_search_plps.xml
Sitemap: https://sitemaps.bestbuy.com/sitemaps_standalone_qa.xml
Sitemap: https://www.bestbuy.com/sitemap.xml
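These Sitemap entries can also be pulled out of robots.txt programmatically. Here is a minimal sketch that parses the sitemap declarations from robots.txt content (the sample content below is trimmed for illustration):

```python
# a minimal sketch: collect the sitemap URLs declared in a robots.txt file
robots_txt = """User-agent: *
Disallow: /cart
Sitemap: https://sitemaps.bestbuy.com/sitemaps_pdp.xml
Sitemap: https://www.bestbuy.com/sitemap.xml
"""

def extract_sitemaps(robots: str):
    """return the URLs declared on Sitemap: lines"""
    return [
        line.split(":", 1)[1].strip()
        for line in robots.splitlines()
        if line.lower().startswith("sitemap:")
    ]

print(extract_sitemaps(robots_txt))
# ['https://sitemaps.bestbuy.com/sitemaps_pdp.xml', 'https://www.bestbuy.com/sitemap.xml']
```

In a real scraper, the robots.txt body would come from an HTTP request to bestbuy.com/robots.txt.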

Each of the above sitemaps represents a group of related page URLs, stored in XML files compressed with gzip to reduce their size:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>https://sitemaps.bestbuy.com/sitemaps_pdp.0000.xml.gz</loc><lastmod>2024-03-08T10:16:14.901109+00:00</lastmod></sitemap>
<sitemap><loc>https://sitemaps.bestbuy.com/sitemaps_pdp.0001.xml.gz</loc><lastmod>2024-03-08T10:16:14.901109+00:00</lastmod></sitemap>
</sitemapindex>

The above gz file looks like the following after extracting:

<?xml version='1.0' encoding='utf-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>https://www.bestbuy.com/site/aventon-aventure-step-over-ebike-w-45-mile-max-operating-range-and-28-mph-max-speed-medium-fire-black/6487149.p?skuId=6487149</loc></url>
<url><loc>https://www.bestbuy.com/site/detective-story-1951/34804554.p?skuId=34804554</loc></url>
<url><loc>https://www.bestbuy.com/site/flowers-lp-vinyl/35944053.p?skuId=35944053</loc></url>
<url><loc>https://www.bestbuy.com/site/apple-iphone-15-pro-max-1tb-natural-titanium-verizon/6525500.p?skuId=6525500</loc></url>
<url><loc>https://www.bestbuy.com/site/geeni-dual-outlet-outdoor-wi-fi-smart-plug-gray/6388590.p?skuId=6388590</loc></url>
<url><loc>https://www.bestbuy.com/site/dynasty-the-sixth-season-vol-1-4-discs-dvd/20139655.p?skuId=20139655</loc></url>
</urlset>

To scrape BestBuy's sitemaps, we'll request the compressed XML file, decode it, and parse it for the URLs. For this example, we'll use the promotions sitemap:

Python
ScrapFly
import asyncio
import json
import gzip
from typing import List
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent getting blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def parse_sitemaps(response: Response) -> List[str]:
    """parse links for bestbuy sitemaps"""
    # decode the .gz file
    xml = str(gzip.decompress(response.content), 'utf-8')
    selector = Selector(xml)
    data = []
    for url in selector.xpath("//url/loc/text()"):
        data.append(url.get())
    return data


async def scrape_sitemaps(url: str) -> List[str]:
    """scrape link data from bestbuy sitemaps"""
    response = await client.get(url)
    promo_urls = parse_sitemaps(response)
    log.success(f"scraped {len(promo_urls)} urls from sitemaps")    
    return promo_urls
import asyncio
import json
import gzip
from typing import List
from parsel import Selector
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_sitemaps(response: ScrapeApiResponse) -> List[str]:
    """parse links for bestbuy sitemaps"""
    # decode the .gz file
    bytes_data = response.scrape_result['content'].getvalue()
    xml = str(gzip.decompress(bytes_data), 'utf-8')
    selector = Selector(xml)
    data = []
    for url in selector.xpath("//url/loc/text()"):
        data.append(url.get())
    return data


async def scrape_sitemaps(url: str) -> List[str]:
    """scrape link data from bestbuy sitemaps"""
    response = await SCRAPFLY.async_scrape(ScrapeConfig(url, country="US",))
    promo_urls = parse_sitemaps(response)
    log.success(f"scraped {len(promo_urls)} urls from sitemaps")
    return promo_urls
Run the code
async def run():
    promo_urls = await scrape_sitemaps(
        url="https://sitemaps.bestbuy.com/sitemaps_promos.0000.xml.gz"
    )
    # save the data to a JSON file
    with open("promos.json", "w", encoding="utf-8") as file:
        json.dump(promo_urls, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

In the above code, we define an httpx client with common browser headers to minimize the chances of getting blocked. Additionally, we define two functions; let's break them down:

  • scrape_sitemaps: To request the sitemap URL using the defined httpx client.
  • parse_sitemaps: To decode the gz file into its XML content and then parse the XML for the URLs using the XPath selector.

Here is a sample output of the results we got:

[
  "https://www.bestbuy.com/site/promo/4k-capable-memory-cards",
  "https://www.bestbuy.com/site/promo/all-total-by-verizon",
  "https://www.bestbuy.com/site/promo/shop-featured-intel-evo",
  "https://www.bestbuy.com/site/promo/laser-heat-therapy",
  "https://www.bestbuy.com/site/promo/save-on-select-grills",
  ....
]

For further details on scraping and discovering sitemaps, refer to our dedicated guide:

How to Scrape Sitemaps to Discover Scraping Targets

Introduction to scraping and discovering sitemaps. You will learn how to find, navigate, and use Python and JavaScript tools for XML parsing.

How to Scrape Sitemaps to Discover Scraping Targets

How To Scrape BestBuy Search Pages?

Let's start with the first part of our BestBuy scraper code: search pages. Search for any product on the website, like the "macbook" keyword, and you will get a page that looks like the following:

bestbuy search page with macbook product data
Products on search pages

To scrape BestBuy search pages, we'll request the search page URL and then parse the HTML. First, let's start with the parsing logic:

def parse_search(response: ScrapeApiResponse):
    """parse search data from search pages"""
    selector = response.selector
    data = []
    for item in selector.css("#main-results li"):
        name = item.css(".product-title::attr(title)").get()
        link = item.css("a.product-list-item-link::attr(href)").get()
        price = (item.css('div.customer-price::text').re(r'\d+\.\d{2}') or [None])[0]
        original_price = (item.css('div.regular-price::text').re(r'\d+\.\d{2}') or [None])[0]
        sku = item.xpath("@data-testid").get()
        _rating_data = item.css(".c-ratings-reviews p::text")
        rating = (_rating_data.re(r"\d+\.*\d*") or [None])[0]
        rating_count = int((_rating_data.re(r'(\d+) reviews') or [0])[0])
        images = item.css("img[data-testid='product-image']::attr(srcset)").getall()

        data.append({
            "name": name,
            "link": "https://www.bestbuy.com" + link if link else None,
            "images": images,
            "sku": sku,
            "price": price,
            "original_price": original_price,
            "rating": rating,
            "rating_count": rating_count,
        })
    if len(data):
        _total_count = selector.css("div.results-title span:nth-of-type(2)::text").re(r'\d+')[0]
        total_pages = int(_total_count) // len(data)
    else:
        total_pages = 1

    return {"data": data, "total_pages": total_pages}

Here, we define a parse_search function, which does the following:

  • Iterates over the product boxes on the HTML.
  • Parses each product's data, such as the name, price, link, etc.
  • Gets the total number of search pages available and returns the search data.

Next, we'll utilize the above parsing logic while sending requests to scrape and crawl the search pages:

import asyncio
import json

from typing import Union
from loguru import logger as log
from urllib.parse import urlencode, quote_plus
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your Scrapfly API key")

BASE_CONFIG = {
    # bypass bestbuy.com web scraping blocking
    "asp": True,
    # set the proxy country to US
    "country": "US",
    "headers": {
        "cookie": "intl_splash=false"
    }
}


def parse_search(response: ScrapeApiResponse):
    """parse search data from search pages"""
    # the same function logic


async def scrape_search(search_query: str, sort: Union[str, None] = None, max_pages=None):
    """scrape search data from bestbuy search; sort can be "-bestsellingsort" or "-Best-Discount" """

    def form_search_url(page_number: int):
        """form the search url"""
        base_url = "https://www.bestbuy.com/site/searchpage.jsp?"
        # search parameters
        params = {"st": quote_plus(search_query)}
        if page_number > 1:
            params["cp"] = page_number
        if sort:
            params["sp"] = sort
        return base_url + urlencode(params)
    
    first_page = await SCRAPFLY.async_scrape(
        ScrapeConfig(
            form_search_url(1), render_js=True, rendering_wait=5000, auto_scroll=True,
            wait_for_selector="#main-results li", **BASE_CONFIG
        )
    )
    data = parse_search(first_page)
    search_data = data["data"]
    total_pages = data["total_pages"]

    # get the number of total search pages to scrape
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    log.info(f"scraping search pagination, {total_pages - 1} more pages")
    # add the remaining pages to a scraping list to scrape them concurrently
    to_scrape = [
        ScrapeConfig(form_search_url(page_number), **BASE_CONFIG)
        for page_number in range(2, total_pages + 1)
    ]
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_search(response)["data"]
        search_data.extend(data)
    
    log.success(f"scraped {len(search_data)} products from search pages")
    return search_data
Run the code
async def run():
    search_data = await scrape_search(
        search_query="macbook",
        max_pages=3
    )
    # save the results to a JSON file
    with open("search.json", "w", encoding="utf-8") as file:
        json.dump(search_data, file, indent=2, ensure_ascii=False)    


if __name__ == "__main__":
    asyncio.run(run())

Let's break down the execution flow of the above scrape_search function:

  • Form a search URL based on the search keyword, sorting option, and page number.
  • Request the search URL and parse it with the parse_search function.
  • Get the number of pagination pages to scrape using the max_pages parameter.
  • Add the remaining pagination URLs to a list and request them concurrently.
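The URL-forming step can be sketched in isolation. This helper mirrors the parameters used above (st for the keyword, cp for the page, sp for sorting), slightly simplified to rely on urlencode alone:

```python
from typing import Optional
from urllib.parse import urlencode

def form_search_url(search_query: str, page_number: int = 1, sort: Optional[str] = None) -> str:
    """assemble a bestbuy search URL from the keyword, page, and sort parameters"""
    base_url = "https://www.bestbuy.com/site/searchpage.jsp?"
    params = {"st": search_query}
    if page_number > 1:
        params["cp"] = page_number
    if sort:
        params["sp"] = sort
    return base_url + urlencode(params)

print(form_search_url("macbook", page_number=2))
# https://www.bestbuy.com/site/searchpage.jsp?st=macbook&cp=2
```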

The above BestBuy scraping code will extract product data from three search pages. Here is what the results should look like:

[
  {
    "name": "MacBook Pro 13.3\" Laptop - Apple M2 chip - 24GB Memory - 1TB SSD (Latest Model) - Silver",
    "link": "https://www.bestbuy.com/site/macbook-pro-13-3-laptop-apple-m2-chip-24gb-memory-1tb-ssd-latest-model-silver/6382795.p?skuId=6382795",
    "image": "https://pisces.bbystatic.com/image2/BestBuy_US/images/products/6382/6382795_sd.jpg;maxHeight=200;maxWidth=300",
    "sku": "6382795",
    "model": "MNEX3LL/A",
    "price": 1499,
    "original_price": 2099,
    "save": "28.59%",
    "rating": 4.8,
    "rating_count": 4,
    "is_sold_out": false
  },
  ....
]

The above code can scrape the product data that is visible on the search pages. However, it can be extended with crawling logic to scrape the full details of each product from its respective URL. For further details on crawling while scraping, refer to our dedicated guide.

How to Crawl the Web with Python

Take a deep dive into building web crawlers with Python. We'll start by defining the common crawling concepts and challenges. Then, we'll go through a practical example of creating a web crawler for a target website.

How to Crawl the Web with Python

How To Scrape BestBuy Product Pages?

Let's add support for scraping product pages to our BestBuy scraper. But before we start, let's have a look at what product pages look like. Go to any product page on the website, like this one, and you will get a page similar to this:

bestbuy product page
Product pages on BestBuy

Data on product pages is comprehensive, and it's scattered across the page. Therefore, it's challenging to scrape it using selectors. Instead, we'll scrape them as JSON datasets from script tags. To locate these script tags, follow the below steps:

  • Open the browser developer tools by pressing the F12 key.
  • Search for the script tags using the selector //script[contains(text(),'productBySkuId')]/text().

After following the above steps, you will find several script tags that include JSON data. Each script tag contains a certain type of data about the product, such as pricing, shipping, reviews, etc. For example, here is what the product specification data looks like:

hidden JSON product specification data in a script tag

The above JSON data is the same data found on the page, but before it gets rendered into the HTML. This is often known as hidden web data.
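The concept is easy to see on a toy page. In the sketch below, `window.__DATA__` is a hypothetical variable name standing in for BestBuy's script data: the JSON in the script tag mirrors what the HTML renders, and can be extracted and parsed directly:

```python
import json
import re

# a toy page: the script tag carries the same data the HTML renders
# (window.__DATA__ is a hypothetical variable name for illustration)
html = """
<html>
  <body>
    <div id="price">$1,499.00</div>
    <script>
      window.__DATA__ = {"sku": "6534615", "price": 1499};
    </script>
  </body>
</html>
"""

# grab the script body, strip the JS assignment, and parse the JSON
script = re.search(r"<script>(.*?)</script>", html, re.S).group(1)
raw_json = script.split("=", 1)[1].strip().rstrip(";")
data = json.loads(raw_json)
print(data["price"])  # 1499
```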

How to Scrape Hidden Web Data

Learn what hidden data is through some common examples. You will also learn how to scrape it using regular expressions and other clever parsing algorithms.

How to Scrape Hidden Web Data

To scrape BestBuy product data, we will select the script tags containing the JSON data and parse them:

import os
import json
import asyncio

from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient("Your Scrapfly API key")

BASE_CONFIG = {
    # bypass bestbuy.com web scraping blocking
    "asp": True,
    # set the proxy country to US
    "country": "US",
    "headers": {
        "cookie": "intl_splash=false"
    }
}


def extract_json(script: str) -> Dict:
    """extract JSON data from a script tag content"""
    start_index = script.find('.push(')
    brace_start = script.find('{', start_index)

    # find the JSON block
    brace_count = 0
    for i in range(brace_start, len(script)):
        if script[i] == '{':
            brace_count += 1
        elif script[i] == '}':
            brace_count -= 1
            if brace_count == 0:
                brace_end = i + 1
                break

    raw_json = script[brace_start:brace_end]
    cleaned_json = raw_json.replace("undefined", "null")
    parsed_data = json.loads(cleaned_json)
    return parsed_data


def _extract_nested(data, keys, default=None):
    for key in keys:
        data = data.get(key, {})
    return data or default


def parse_product(response: ScrapeApiResponse) -> Dict:
    """parse product data from bestbuy product pages"""
    selector = response.selector
    data = {}
    
    product_info = extract_json( 
        selector.xpath("//script[contains(text(),'productBySkuId')]/text()").get()
    )
    product_features = extract_json(
        selector.xpath("//script[contains(text(),'R1eapefmjttrkq')]/text()").get()
    )
    buying_options = extract_json(
        selector.xpath("//script[contains(text(), 'R3vmipefmjttrkqH1')]/text()").get()
    )
    product_faq = extract_json(
        selector.xpath("//script[contains(text(), 'ProductQuestionConnection')]/text()").get()
    )

    data["product-info"] = _extract_nested(product_info, ["rehydrate", ":Rp9efmjttrkq:", "data", "productBySkuId"])
    data["product-features"] = _extract_nested(product_features, ["rehydrate", ":R1eapefmjttrkq:", "data", "productBySkuId", "features"])
    data["buying-options"] = _extract_nested(buying_options, ["rehydrate", ":R3vmipefmjttrkqH1:", "data", "productBySkuId", "buyingOptions"])
    data["product-faq"] = _extract_nested(product_faq, ["rehydrate", ":R1fapefmjttrkq:", "data", "productBySkuId", "questions"])

    return data


async def scrape_products(urls: List[str]) -> List[Dict]:
    """scrapy product data from bestbuy product pages"""
    to_scrape = [ScrapeConfig(url, **BASE_CONFIG, render_js=True) for url in urls]
    data = []
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        try:
            product_data = parse_product(response)
            data.append(product_data)
        except Exception:
            log.debug("expired product page")
    log.success(f"scraped {len(data)} products from product pages")
    return data
Run the code
async def run():
    product_data = await scrape_products(
        urls=[
            "https://www.bestbuy.com/site/apple-macbook-air-13-inch-apple-m4-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6565862.p",
            "https://www.bestbuy.com/site/apple-geek-squad-certified-refurbished-macbook-pro-16-display-intel-core-i7-16gb-memory-amd-radeon-pro-5300m-512gb-ssd-space-gray/6489615.p",
            "https://www.bestbuy.com/site/apple-macbook-pro-14-inch-apple-m4-chip-built-for-apple-intelligence-16gb-memory-512gb-ssd-space-black/6602741.p",
            "https://www.bestbuy.com/site/apple-macbook-pro-14-laptop-m3-pro-chip-18gb-memory-14-core-gpu-512gb-ssd-latest-model-space-black/6534615.p"
    ])

    # save the data to a JSON file
    with open("products.json", "w", encoding="utf-8") as file:
        json.dump(product_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

Let's break down the functions we use in the above BestBuy scraping code:

  • extract_json: To extract JSON datasets from a given script tag's content.
  • parse_product: To parse the script tags containing the product data from the page HTML.
  • scrape_products: To add the product page URLs into a scraping list while scraping them concurrently.

The output is a comprehensive JSON dataset that looks like the following:

Sample output
[
  {
    "product-info": {
      "__typename": "Product",
      "brand": "Apple",
      "skuId": "6534615",
      "name": {
        "__typename": "ProductName",
        "short": "Apple - MacBook Pro 14\" Laptop - M3 Pro chip Built for Apple Intelligence - 18GB Memory - 14-core GPU - 512GB SSD - Space Black"
      },
      "manufacturer": {
        "__typename": "Manufacturer",
        "modelNumber": "MRX33LL/A"
      },
      "hierarchy": {
        "__typename": "ProductHierarchy",
        "bbypres": [
          {
            "__typename": "ProductHierarchyLink",
            "id": "pcmcat247400050001",
            "primary": true,
            "href": "http://data.bestbuy.com/v2/hierarchy/bbypres/id/pcmcat247400050001",
            "categoryDetail": {
              "__typename": "CategoryDetail",
              "hierarchyId": "bbypres",
              "name": "MacBooks",
              "seoUrl": "https://www.bestbuy.com/site/all-laptops/macbooks/pcmcat247400050001.c?id=pcmcat247400050001",
              "startDate": "2011-07-13T05:00Z",
              "template": null,
              "broaderTerms": {
                "__typename": "HierarchyBroaderTerms",
                "primaryLineage": [
                  {
                    "__typename": "HierarchyLineage",
                    "id": "pcmcat138500050001",
                    "name": "All Laptops",
                    "seoUrl": "https://www.bestbuy.com/site/laptop-computers/all-laptops/pcmcat138500050001.c?id=pcmcat138500050001",
                    "sequence": 0,
                    "startDate": "2007-12-09T06:00Z"
                  },
                  ....
                ]
              }
            }
          },         
        ]
      },
      "esrbRating": null,
      "releaseDateDisplayValue": null,
      "dotComStreetDate": "2023-11-07T06:00Z",
      "inStoreServiceType": null,
      "badges": [],
      "openBoxCondition": null,
      "whatItIs": [
        "Laptop Computer",
        "MacBook"
      ],
      "specificationGroups": [
        {
          "__typename": "ProductSpecificationGroup",
          "name": "Key Specs",
          "specifications": [
            {
              "__typename": "ProductSpecification",
              "definition": null,
              "displayName": "Screen Type",
              "value": "Retina Display"
            },
            ....            
          ]
        }        
      ],
      "highlights": {
        "__typename": "Highlights",
        "entries": [
          {
            "__typename": "Highlight",
            "name": "Processor Model",
            "classification": "High",
            "description": "The CPU, or central processing unit, is essentially the brain of your computer. The faster your CPU, the faster your computer will run.",
            "link": "Why is the processor important?",
            "key": "d2c3dcc5-ac5e-411d-9bf1-5344c4ec9cf6",
            "classifications": [
              {
                "__typename": "Classification",
                "bullets": [
                  "Great portability",
                  "Budget friendly",
                  "Basic internet tasks"
                ],
                "description": "Works well for very basic Internet tasks, such as casual browsing. Commonly found in the most portable laptops, which tend to have smaller screens and less storage.",
                "icon": "https://pisces.bbystatic.com/image2/vector/BestBuy_US/dam/icon-highlight-cpu-budget-dd17b005-a44b-49e9-87d0-661ac5cefa5a.svg",
                "key": "14fab14b-e67b-42fd-aefb-e08def5464ee",
                "name": "Budget",
                "sampleValue": null
              }              
            ],
            "value": "Apple M3 Pro"
          }          
        ],
        "typeInfoDefinition": "Laptop_Computers",
        "highlightsCollectionId": "0bbc3112-2558-4a49-bc86-71b6de7b47af",
        "skuId": "6534615"
      },
      "operationalAttributes": [
        {
          "__typename": "ProductOperationalAttribute",
          "displayName": "Box_Contents",
          "values": [
            "14-inch MacBook Pro",
            "70W USB-C Power Adapter",
            "USB-C to MagSafe 3 Cable (2 m)"
          ]
        },
        ....        
      ],
      "productSelectorId": null
    },
    "product-features": [
      {
        "__typename": "ProductFeature",
        "description": "SUPERCHARGED BY M3 PRO OR M3 MAX—The Apple M3 Pro chip, with an up to 12-core CPU and up to 18-core GPU using hardware-accelerated ray tracing, delivers amazing.",
        "sequence": 0,
        "title": null
      },
      ....      
    ],
    "buying-options": [
      {
        "__typename": "InboundBuyingOption",
        "type": "New",
        "product": {
          "__typename": "Product",
          "brand": "Apple",
          "skuId": "6534615",
          "url": {
            "__typename": "ProductUrl",
            "pdp": "https://www.bestbuy.com/site/apple-macbook-pro-14-laptop-m3-pro-chip-built-for-apple-intelligence-18gb-memory-14-core-gpu-512gb-ssd-space-black/6534615.p?skuId=6534615",
            "relativePdp": "/site/apple-macbook-pro-14-laptop-m3-pro-chip-built-for-apple-intelligence-18gb-memory-14-core-gpu-512gb-ssd-space-black/6534615.p?skuId=6534615"
          },
          "price": {
            "__typename": "ItemPrice",
            "customerPrice": 1599,
            "skuId": "6534615"
          },
          "fulfillmentOptions": {
            "__typename": "FulfillmentOptionsList",
            "shippingDetails": [
              {
                "__typename": "FulfillmentShippingDetail",
                "shippingAvailability": [
                  {
                    "__typename": "FulfillmentShippingAvailability",
                    "shippingEligible": false,
                    "customerLOSGroup": null
                  }
                ]
              }
            ],
            "ispuDetails": [
              {
                "__typename": "InStorePickupDetail",
                "ispuAvailability": [
                  {
                    "__typename": "InStorePickupAvailability",
                    "pickupEligible": true,
                    "instoreInventoryAvailable": false,
                    "quantity": null,
                    "minPickupInHours": null,
                    "maxDate": null
                  }
                ]
              }
            ],
            "buttonStates": [
              {
                "__typename": "ButtonState",
                "buttonState": "SOLD_OUT"
              }
            ]
          },
          "openBoxCondition": null
        },
        "description": "New",
        "code": null,
        "skuId": "6534615"
      }      
    ],
    "product-faq": {
      "__typename": "ProductQuestionConnection",
      "results": [
        {
          "__typename": "ProductQuestion",
          "answerCount": 4,
          "bazaarvoiceId": "10325575",
          "id": "e5dae9a9-45d4-3da6-9c4d-55827db44478",
          "isAiGenerated": false,
          "negativeFeedbackCount": 0,
          "positiveFeedbackCount": 0,
          "submissionTime": "2023-11-21T07:42:05.000-06:00",
          "text": null,
          "title": "does it include apple guarantee?",
          "userNickname": "sofia",
          "answers": [
            {
              "__typename": "ProductQuestionAnswer",
              "brandImageUrl": null,
              "id": "bf7cdf34-e646-3e28-b86e-5cee0139e98d",
              "negativeFeedbackCount": 0,
              "positiveFeedbackCount": 10,
              "submissionTime": "2023-11-21T22:35:34.000-06:00",
              "text": "All Apple products come with a 60-day AppleCare warranty.  If you have the TotalTech package membership, they threw in 3 years of AppleCare+ for free.  If you want to get a TV mounted or have some other use of BestBuy's package, its very much worth it.  I got AppleCare+ for free with a recent MacBook Air because I had it from a TV purchase/install.  So, you should ask about that to see if it works for your situation.\n\nSide note: I got this MBA before the new MBP M3s were out.  I love my fan-less MBA, but i'd of probably paid extra for the MBP M3, if it was an option then.",
              "userNickname": "JustinL",
              "badges": [
                {
                  "__typename": "ProductQuestionBadge",
                  "code": "rewardZoneNumberV3",
                  "description": "My Best Buy members receive promotional considerations or entries into drawings for writing reviews.",
                  "name": "My Best Buy\\u00ae Member"
                },
                .....
              ],
              "images": []
            }
            ....            
          ],
          "images": []
        }        
      ],
      "pageInfo": {
        "__typename": "ProductQuestionPageInfo",
        "page": 1,
        "pageSize": 8,
        "totalResults": 50
      },
      "totalResults": 50
    }
  },
  ....
]

🙋‍ Note that the HTML structure of the BestBuy product pages differs based on product type and category. Therefore, the above product parsing logic should be adjusted for other product types.

Cool! The above BestBuy scraping code can extract the full details of each product. However, it lacks the product reviews - let's scrape them in the next section!

How to Scrape BestBuy Review Pages?

Reviews on BestBuy can be found on each product page:

bestbuy review page
Review data on BestBuy

The above review data are split into two categories:

  • Product ratings
    Aggregated review and rating data included in each product's specification, which we scraped earlier from the product page itself.

  • User reviews
    Detailed user reviews of the product, which we'll scrape in this section.

To scrape BestBuy reviews, we'll utilize the hidden reviews API. To locate this API, follow the below steps:

  • Open the browser developer tools by pressing the F12 key.
  • Select the network tab and filter by Fetch/XHR requests.
  • Filter the reviews using the sort option or click on the next review page to trigger a new API call.

After following the above steps, you will find the reviews API recorded on the browser:

reviews hidden api on browser developer tools
Reviews hidden API

The API above is called in the background using the browser and then rendered into HTML. The request can be copied as a cURL and imported into HTTP clients like Postman.
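Once located, the API is just a URL template we can paginate ourselves. A small sketch that builds the paginated request URLs (using the same query parameters as the captured request):

```python
from typing import List

def review_api_urls(sku: str, total_pages: int) -> List[str]:
    """build the paginated review API URLs for a product sku"""
    return [
        f"https://www.bestbuy.com/ugc/v2/reviews?page={page}&pageSize=20&sku={sku}&sort=MOST_RECENT"
        for page in range(1, total_pages + 1)
    ]

urls = review_api_urls("6565065", 3)
print(urls[0])
# https://www.bestbuy.com/ugc/v2/reviews?page=1&pageSize=20&sku=6565065&sort=MOST_RECENT
```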

How to Scrape Hidden APIs

Learn how to find hidden APIs, how to scrape them, and what are some common challenges faced when developing web scrapers for hidden APIs.

How to Scrape Hidden APIs

To scrape the product reviews, we'll request the above API and paginate it:

import asyncio
import json
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_reviews(response: ScrapeApiResponse) -> List[Dict]:
    """parse review data from the review API responses"""
    data = json.loads(response.scrape_result['content'])
    total_count = data["totalPages"]
    review_data = data["topics"]
    return {"data": review_data, "total_count": total_count}


async def scrape_reviews(skuid: str, max_pages: int=None) -> List[Dict]:
    """scrape review data from the reviews API"""
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(
        f"https://www.bestbuy.com/ugc/v2/reviews?page=1&pageSize=20&sku={skuid}&sort=MOST_RECENT",
        asp=True, country="US"
    ))
    data = parse_reviews(first_page)
    review_data = data["data"]
    total_count = data["total_count"]

    # get the number of total review pages to scrape
    if max_pages and max_pages < total_count:
        total_count = max_pages

    log.info(f"scraping reviews pagination, {total_count - 1} more pages")
    # add the remaining pages to a scraping list to scrape them concurrently
    to_scrape = [
        ScrapeConfig(
            f"https://www.bestbuy.com/ugc/v2/reviews?page={page_number}&pageSize=20&sku={skuid}&sort=MOST_RECENT",
            asp=True, country="US"
        )
        for page_number in range(2, total_count + 1)
    ]
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_reviews(response)["data"]
        review_data.extend(data)

    log.success(f"scraped {len(review_data)} reviews from the reviews API")
    return review_data

Run the code

async def run():
    review_data = await scrape_reviews(
        skuid="6565065",
        max_pages=3
    )
    with open("reviews.json", "w", encoding="utf-8") as file:
        json.dump(review_data, file, indent=2, ensure_ascii=False)        

if __name__ == "__main__":
    asyncio.run(run())

The above part of our BestBuy scraper is fairly straightforward. We only use two functions:

  • scrape_reviews: For requesting the reviews API, which accepts the product SKU ID, a sorting option, and a page number. It starts by requesting the first page and then adds the remaining API URLs to a scraping list to request them concurrently.
  • parse_reviews: For parsing the JSON response of the reviews API. The response contains various review data types, but the function only parses the user reviews.

Here is a sample output of the above BestBuy scraping code:

Sample output
[
  {
    "id": "6b88383f-3830-3c78-915c-d3cf9f16596d",
    "topicType": "review",
    "rating": 5,
    "recommended": true,
    "title": "Amazing!",
    "text": "An absolutly amazing console very fast and smooth.",
    "author": "CocaNoot",
    "positiveFeedbackCount": 0,
    "negativeFeedbackCount": 0,
    "commentCount": 0,
    "writeCommentUrl": "/site/reviews/submission/6565065/review/337294210?campaignid=RR_&return=",
    "submissionTime": "2024-03-02T10:52:07.000-06:00",
    "brandResponses": [],
    "badges": [
      {
        "badgeCode": "Incentivized",
        "badgeDescription": "This reviewer received promo considerations or sweepstakes entry for writing a review.",
        "badgeName": "Incentivized",
        "badgeType": "Custom",
        "fileName": null,
        "iconText": null,
        "iconPath": null,
        "index": 90900
      },
      {
        "badgeCode": "VerifiedPurchaser",
        "badgeDescription": "We’ve verified that this content was written by people who purchased this item at Best Buy.",
        "badgeName": "Verified Purchaser",
        "badgeType": "Custom",
        "fileName": "badgeContextual-verifiedPurchaser.jpg",
        "imageURL": "https://bestbuy.ugc.bazaarvoice.com/static/3545w/badgeContextual-verifiedPurchaser.jpg",
        "iconText": "Verified Purchase",
        "iconPath": "/ugc-raas/ugc-common-assets/ugc-badge-verified-check.svg",
        "index": 100000,
        "iconUrl": "https://www.bestbuy.com/~assets/bby/_com/ugc-raas/ugc-common-assets/ugc-badge-verified-check.svg"
      },
      {
        "badgeCode": "rewardZoneNumberV3",
        "badgeDescription": "My Best Buy members receive promotional considerations or entries into drawings for writing reviews.",
        "badgeName": "My Best Buy\\u00ae Member",
        "badgeType": "Custom",
        "fileName": "badgeRewardZoneStd.gif",
        "imageURL": "https://bestbuy.ugc.bazaarvoice.com/static/3545w/badgeRewardZoneStd.gif",
        "iconText": "",
        "iconPath": "/ugc-raas/ugc-common-assets/badge-my-bestbuy-core.svg",
        "index": 100500,
        "iconUrl": "https://www.bestbuy.com/~assets/bby/_com/ugc-raas/ugc-common-assets/badge-my-bestbuy-core.svg"
      }
    ],
    "photos": [
      {
        "photoId": "008b1a1e-ba1b-38ea-b86e-effb7c0ca162",
        "caption": null,
        "normalUrl": "https://photos-us.bazaarvoice.com/photo/2/cGhvdG86YmVzdGJ1eQ/e79a5ff1-e891-57fa-ae03-e9f52bb4d7c4",
        "piscesUrl": "https://pisces.bbystatic.com/image2/BestBuy_US/ugc/photos/thumbnail/8db68b60f7a60bcea8f6cd1470938da9.jpg",
        "thumbnailUrl": "https://photos-us.bazaarvoice.com/photo/2/cGhvdG86YmVzdGJ1eQ/bd287ee8-1c8b-52ae-9c12-4a379d7ecb24",
        "reviewId": "6b88383f-3830-3c78-915c-d3cf9f16596d"
      }
    ],
    "qualityRating": null,
    "valueRating": null,
    "easeOfUseRating": null,
    "daysOfOwnership": 70,
    "pros": null,
    "cons": null,
    "secondaryRatings": [
      {
        "attribute": "Performance",
        "value": 5,
        "attributeLabel": "Performance",
        "valueLabel": "Excellent"
      },
      {
        "attribute": "StorageCapacity",
        "value": 5,
        "attributeLabel": "Storage Capacity",
        "valueLabel": "Excellent"
      },
      {
        "attribute": "Controller",
        "value": 5,
        "attributeLabel": "Controller",
        "valueLabel": "Excellent"
      }
    ]
  },
  ....  
]

With this last feature, our BestBuy scraper is complete. It can scrape sitemaps, search, product, and review data.

Avoid BestBuy Scraping Blocking

We have successfully scraped BestBuy data from various pages. However, attempting to scale our scraping rate will lead the website to block our IP address.

scrapfly middleware

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

Here is how we can scrape without getting blocked with ScrapFly. All we have to do is replace the HTTP client with the ScrapFly client, enable the asp parameter, and select a proxy country:

# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some bestbuy.com URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="website URL",
    asp=True, # enable the anti scraping protection to bypass blocking
    country="US", # set the proxy location to a specific country
    render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']

FAQ

To wrap up this guide on web scraping BestBuy, let's have a look at some frequently asked questions.

Are there public APIs for BestBuy?

Yes, BestBuy offers official APIs for developers. In this guide, we scraped review data from a hidden BestBuy API instead, and the same approach can be applied to other data sources on the website.
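For reference, the official Products API is queried with a developer key. This is a hedged sketch: the URL shape follows BestBuy's public developer documentation as we understand it, and the key placeholder is an assumption — you would obtain a real key from BestBuy's developer portal:

```python
def product_api_url(sku: str, api_key: str) -> str:
    """Build an official BestBuy Products API query for a single SKU (JSON output).
    The api_key value is a placeholder - a real key is required for live requests."""
    return f"https://api.bestbuy.com/v1/products(sku={sku})?apiKey={api_key}&format=json"

print(product_api_url("6565065", "YOUR_API_KEY"))
```

With a real key, requesting this URL returns structured product data without any HTML parsing, though the official API is rate-limited and doesn't cover everything visible on the website.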

Are there alternatives for scraping BestBuy?

Yes, other popular e-commerce platforms include Amazon and Walmart. We have covered scraping Amazon and Walmart in previous tutorials. For more guides on similar scraping targets, refer to our #scrapeguide blog tag.

Latest BestBuy Scraper Code
https://github.com/scrapfly/scrapfly-scrapers/

Summary

In this guide, we have explained how to scrape BestBuy. We went through a step-by-step guide on scraping BestBuy with Python for different pages on the website, which are:

  • Sitemaps for BestBuy page URLs.
  • Search pages for product data on search results.
  • Product pages for various details, including specifications, pricing, and ratings.
  • Review pages for user reviews on products.

Related Posts

How to Scrape YouTube in 2025

Learn how to scrape YouTube, channel, video, and comment data using Python directly in JSON.

How to Scrape Reddit Posts, Subreddits and Profiles

In this article, we'll explore how to scrape Reddit. We'll extract various social data types from subreddits, posts, and user pages. All of which through plain HTTP requests without headless browser usage.

How to Scrape LinkedIn in 2025

In this scrape guide we'll be taking a look at one of the most popular web scraping targets - LinkedIn.com. We'll be scraping people profiles, company profiles as well as job listings and search.