How to Scrape Aliexpress (2023 Update)


Aliexpress is one of the biggest global e-commerce stores from China, as well as a popular web scraping target.

Aliexpress contains millions of products and product reviews that can be used in market analytics, business intelligence and dropshipping.

In this tutorial, we'll take a look at how to scrape Aliexpress. We'll start by finding products by scraping the search system. Then we'll scrape the found product data, pricing and customer reviews.

This will be a relatively easy scraper in just a few lines of Python code. Let's dive in!


Why Scrape Aliexpress?

There are many reasons to scrape Aliexpress data. For starters, because Aliexpress is one of the biggest e-commerce platforms in the world, it's a prime target for business intelligence and market analytics. Knowing the top products and their meta-information on Aliexpress can be a great advantage in business and market analysis.

Another common use is e-commerce, primarily dropshipping. Dropshipping means curating a list of products and reselling them directly rather than managing a warehouse, and it is one of the biggest emergent markets of this century. Many shop curators scrape Aliexpress products to generate curated product lists for their dropshipping shops.

Project Setup

In this tutorial we'll be using Python with two packages:

  • httpx - HTTP client library which will let us communicate with Aliexpress's servers
  • parsel - HTML parsing library which will help us parse our web scraped HTML files for product data.

Both packages can be easily installed via the pip command:

$ pip install httpx parsel

Alternatively, you're free to swap httpx out for any other HTTP client library, such as requests, as we'll only need basic HTTP functions that are almost interchangeable between libraries. As for parsel, another great alternative is the beautifulsoup package.

Hands on Python Web Scraping Tutorial and Example Project

While our Aliexpress scraper is pretty easy, if you're new to web scraping with Python, we recommend checking out our full introduction tutorial to web scraping with Python and common best practices.


Finding Aliexpress Products

There are many ways to discover products on Aliexpress.
We could use the search system to find the products we want to scrape, or we could explore the many product categories. Whichever approach we take, our key target is the same: scraping product previews and pagination.

Let's take a look at an Aliexpress listing page, as used in the search and category views:


If we take a look at the page source of either a search or a category page, we can see that all the product previews are stored in a JavaScript variable, window.runParams, tucked away in a <script> tag in the HTML source of the page:

We can see product preview data by exploring page source in our browser

This is a common web development pattern that enables dynamic data management using JavaScript.

This is good news for us, though, as we can pick this data up with a simple regex pattern and load it as JSON into a Python dictionary! This technique is generally called hidden web data scraping, and it's a common pattern in modern web scraping.
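To make the idea concrete, here is a minimal, self-contained sketch of hidden web data extraction that uses only the standard library. The HTML fragment is made up for illustration and is much simpler than a real Aliexpress page. It only imitates the pattern of JSON assigned inside a <script> tag. The regex is similar to the one the product parser uses later in this article.

```python
import json
import re

# A made-up page fragment imitating the "hidden web data" pattern:
# a JSON object assigned to a JavaScript variable inside a <script> tag.
html = """
<html><body>
<script>
  window.runParams = {
    data: {"items": [{"productId": "1", "title": "Drill"}]},
  };
</script>
</body></html>
"""

# Non-greedy {.+?} stops at the first "}" that is followed by ",\n".
# `.` does not match newlines, so this simple pattern expects the object on one line.
match = re.search(r"data:\s*({.+?}),\n", html)
hidden_data = json.loads(match.group(1))
print(hidden_data["items"][0]["title"])  # Drill
```

Once the JSON is loaded, the rest is plain dictionary access. That is why this approach is usually sturdier than parsing the visible HTML element by element.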

With this, we can write the first piece of our scraper code - the product preview parser. We'll be using it to extract product preview data from category or search result pages:

import json
from typing import Dict

from parsel import Selector

def extract_search(response) -> Dict:
    """extract json data from search page"""
    sel = Selector(text=response.text)
    # find the script tag that holds the page data
    script_with_data = sel.xpath('//script[contains(text(),"window.runParams")]')
    # select page data from javascript variable in script tag using regex
    data = json.loads(script_with_data.re(r'_init_data_\s*=\s*{\s*data:\s*({.+}) }')[0])
    return data['data']['root']['fields']

def parse_search(response):
    """Parse search page response for product preview results"""
    data = extract_search(response)
    parsed = []
    for result in data["mods"]["itemList"]["content"]:
        parsed.append(
            {
                "id": result["productId"],
                "url": f"{result['productId']}.html",
                "type": result["productType"],  # can be either natural or ad
                "title": result["title"]["displayTitle"],
                "price": result["prices"]["salePrice"]["minPrice"],
                "currency": result["prices"]["salePrice"]["currencyCode"],
                "trade": result.get("trade", {}).get("tradeDesc"),  # trade line is not always present
                "thumbnail": result["image"]["imgUrl"].lstrip("/"),
                "store": {
                    "url": result["store"]["storeUrl"],
                    "name": result["store"]["storeName"],
                    "id": result["store"]["storeId"],
                    "ali_id": result["store"]["aliMemberId"],
                },
            }
        )
    return parsed

Let's try our parser out by scraping a single Aliexpress listing page (category page or search results page):

import json

import httpx

if __name__ == "__main__":
    # for example, this category is for android phones:
    resp = httpx.get("", follow_redirects=True)
    print(json.dumps(parse_search(resp), indent=2))

Example output:

[
  {
    "id": "3256804075561256",
    "url": "",
    "type": "ad",
    "title": "2G/3G Smartphones Original 512MB RAM/1G RAM 4GB ROM android mobile phones new cheap celulares FM unlocked 4.0inch cell",
    "price": 21.99,
    "currency": "USD",
    "trade": "8 sold",
    "thumbnail": "",
    "store": {
      "url": "",
      "name": "New 123 Store",
      "id": 1101690689,
      "ali_id": 247497658
    }
  },
  ...
]

There's a lot of useful information, but we've limited our parser to bare essentials to keep things brief. Let's put this parser to use in actual scraping next.

Now that we have our product preview parser ready, we need a scraper loop that will iterate through search results to collect all available results - not just the first page:

import asyncio
import logging
import math

import httpx

log = logging.getLogger(__name__)

async def scrape_search(query: str, session: httpx.AsyncClient, sort_type="default"):
    """Scrape all search results and return parsed search result data"""
    query = query.replace(" ", "+")

    async def scrape_search_page(page):
        """Scrape a single aliexpress search page and return all embedded JSON search data"""
        print(f"scraping search query {query}:{page} sorted by {sort_type}")
        resp = await session.get(
            "",  # Aliexpress search page URL for this query, sort type and page
        )
        return resp

    # scrape first search page and find total result count
    first_page = await scrape_search_page(1)
    first_page_data = extract_search(first_page)
    page_size = first_page_data["pageInfo"]["pageSize"]
    total_pages = int(math.ceil(first_page_data["pageInfo"]["totalResults"] / page_size))
    if total_pages > 60:
        log.warning(f"query has {total_pages} pages; lowering to max allowed 60 pages")
        total_pages = 60

    # scrape remaining pages concurrently (page 1 is already scraped)
    print(f'scraping search "{query}" of total {total_pages} sorted by {sort_type}')
    other_pages = await asyncio.gather(*[scrape_search_page(page=i) for i in range(2, total_pages + 1)])

    # parse product previews from every scraped page
    product_previews = []
    for response in [first_page, *other_pages]:
        product_previews.extend(parse_search(response))
    return product_previews

Above, in our scrape_search function, we use a common web scraping idiom for known-length pagination:


We scrape the first page to extract the total number of pages, then scrape the remaining pages concurrently.
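The same idiom can be sketched independently of Aliexpress. In this minimal version, fetch_page, fake_fetch_page and scrape_all_pages are hypothetical names. The fake fetcher stands in for real HTTP requests so the control flow can run offline. The sketch assumes the first page reports total_results and page_size, much like the search data does through pageInfo.

```python
import asyncio
import math


async def scrape_all_pages(fetch_page, max_pages: int = 60) -> list:
    """Known-length pagination: fetch page 1, read the totals, then fetch the rest concurrently."""
    first = await fetch_page(1)
    # the first page tells us how many pages exist in total
    total_pages = math.ceil(first["total_results"] / first["page_size"])
    total_pages = min(total_pages, max_pages)  # cap, like the 60-page search limit
    # page 1 is already fetched, so start at page 2; gather keeps results in order
    others = await asyncio.gather(*(fetch_page(page) for page in range(2, total_pages + 1)))
    return [first, *others]


async def fake_fetch_page(page: int) -> dict:
    """Hypothetical stand-in for an HTTP request: 95 results, 20 per page."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"page": page, "total_results": 95, "page_size": 20}


pages = asyncio.run(scrape_all_pages(fake_fetch_page))
print([p["page"] for p in pages])  # [1, 2, 3, 4, 5]
```

Only the first request is sequential. Every later page is known in advance, so the remaining requests can all be in flight at once.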

Now that we can find products, let's take a look at how we can scrape product data, pricing info and reviews!

Scraping Aliexpress Products

To scrape Aliexpress products, all we need is a numeric product ID, which we already found in the previous section by scraping product previews from Aliexpress search. For example, this hand drill product has the numeric ID 4000927436411.

To parse product data, we can use the same technique we used in our search parser: the data is hidden in the HTML document under the data key of the window.runParams variable:

import asyncio
import json

import httpx
from parsel import Selector

def parse_product(response):
    """parse product HTML page for product data"""
    sel = Selector(text=response.text)
    # find the script tag containing our data:
    script_with_data = sel.xpath('//script[contains(text(),"window.runParams")]')
    # extract data using a regex pattern:
    data = json.loads(script_with_data.re(r"data: ({.+?}),\n")[0])
    product = {
        "name": data["titleModule"]["subject"],
        "total_orders": data["titleModule"]["formatTradeCount"],
        "feedback": data["titleModule"]["feedbackRating"],
        "variants": [],
    }
    # every product variant has its own price and ID number (sku):
    for sku in data["skuModule"]["skuPriceList"]:
        product["variants"].append(
            {
                "name": sku["skuAttr"].split("#", 1)[1].split(";")[0],
                "sku": sku["skuId"],
                "available": sku["skuVal"]["availQuantity"],
                "full_price": sku["skuVal"]["skuAmount"]["value"],
                "discount_price": sku["skuVal"]["skuActivityAmount"]["value"],
                "currency": sku["skuVal"]["skuAmount"]["currency"],
            }
        )
    # data variable contains much more information - so feel free to explore it,
    # but to keep things brief we focus on essentials in this article
    return product

async def scrape_products(ids, session: httpx.AsyncClient):
    """scrape aliexpress products by id"""
    print(f"scraping {len(ids)} products")
    responses = await asyncio.gather(*[session.get(f"{id_}.html") for id_ in ids])
    results = []
    for response in responses:
        results.append(parse_product(response))
    return results

Here, we defined our product scraping function. It takes in product IDs, scrapes the HTML contents and extracts the hidden product JSON of each product. If we run it for our drill product, we should see a nicely formatted response:

import asyncio
import json

import httpx

# Let's use browser-like request headers for this scrape to reduce the chance of being blocked or asked to solve a captcha
BASE_HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    # httpx decodes "br" (brotli) responses only with its optional brotli extra: pip install httpx[brotli]
    "accept-encoding": "gzip, deflate, br",
}

async def run():
    async with httpx.AsyncClient(headers=BASE_HEADERS) as session:
        print(json.dumps(await scrape_products(["4000927436411"], session), indent=2))

if __name__ == "__main__":
    asyncio.run(run())

Example output:

[
  {
    "name": "Mini Wireless Drill Electric Carving Pen Variable Speed USB Cordless Drill Rotary Tools Kit Engraver Pen for Grinding Polishing",
    "total_orders": "3824",
    "feedback": {
      "averageStar": "4.8",
      "averageStarRage": "96.4",
      "display": true,
      "evarageStar": "4.8",
      "evarageStarRage": "96.4",
      "fiveStarNum": 1724,
      "fiveStarRate": "88",
      "fourStarNum": 170,
      "fourStarRate": "9",
      "oneStarNum": 21,
      "oneStarRate": "1",
      "positiveRate": "87.6",
      "threeStarNum": 45,
      "threeStarRate": "2",
      "totalValidNum": 1967,
      "trialReviewNum": 0,
      "twoStarNum": 7,
      "twoStarRate": "0"
    },
    "variants": [
      {
        "name": "Red",
        "sku": 10000011265318724,
        "available": 1601,
        "full_price": 16.24,
        "discount_price": 12.99,
        "currency": "USD"
      },
      ...
    ]
  }
]
Using this approach, we scraped much more data than we could see in the visible HTML of the page. We got SKU numbers, stock availability, detailed pricing and review score meta-information. We're only missing the reviews themselves, so let's take a look at how we can retrieve the review data.
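As a small example of putting the variant data to use, the sketch below picks the cheapest in-stock variant of a parsed product and computes its discount. summarize_variants is a hypothetical helper and is not part of the scraper. It assumes only the variant dictionaries produced by parse_product above. The sample values are copied from the drill example output.

```python
def summarize_variants(product: dict) -> dict:
    """Hypothetical helper: cheapest in-stock variant plus its discount percentage."""
    in_stock = [v for v in product["variants"] if v["available"] > 0]
    if not in_stock:
        return {}
    cheapest = min(in_stock, key=lambda v: v["discount_price"])
    # discount relative to the full (non-promotional) price, rounded to one decimal
    discount = round(100 * (1 - cheapest["discount_price"] / cheapest["full_price"]), 1)
    return {
        "name": cheapest["name"],
        "price": cheapest["discount_price"],
        "currency": cheapest["currency"],
        "discount_percent": discount,
    }


# values copied from the "Red" variant in the example output
product = {
    "variants": [
        {"name": "Red", "sku": 10000011265318724, "available": 1601,
         "full_price": 16.24, "discount_price": 12.99, "currency": "USD"},
    ]
}
print(summarize_variants(product))
```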

Scraping Aliexpress Reviews

Aliexpress's product reviews require an additional request to its backend API. If we fire up the Network Inspector devtools (F12 in major browsers, then the "Network" tab), we can see a background request being made when we click on the next review page:

We can see a background request being made when we click on the page 2 link
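The background request is a form-encoded POST (Content-Type: application/x-www-form-urlencoded). Our scraper below writes this body by hand as one long string. As an alternative, the same kind of body can be built from a dict with Python's urllib.parse.urlencode, which handles the escaping: spaces become + and @ becomes %40. This sketch shows only a subset of the fields. The names and values are copied from the request body our scraper sends, and the IDs come from the example run.

```python
from urllib.parse import urlencode

page = 2
form = {
    "ownerMemberId": "220712488",  # seller ID used in the example run
    "memberType": "seller",
    "productId": "4000714658687",
    "evaStarFilterValue": "all Stars",
    "evaSortValue": "sortlarest@feedback",
    "page": page,
    "currentPage": page - 1,  # mirrors currentPage={page-1} in the scraper's body
}
body = urlencode(form)
print(body)
```

The printed body matches the escaped fragments in the hand-written string (all+Stars, sortlarest%40feedback). Building the body from a dict makes individual fields easier to change.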

Let's replicate this request in our scraper:

import asyncio
import logging
import math

import httpx
from parsel import Selector

log = logging.getLogger(__name__)

def parse_review_page(response):
    """parse single review page"""
    sel = Selector(text=response.text)
    parsed = []
    for review_box in sel.css(".feedback-item"):
        # to get star score we have to rely on styling where 1 star == 20% width, e.g. 4 stars is 80%
        stars = int(review_box.css(".star-view>span::attr(style)").re(r"width:(\d+)%")[0]) / 20
        # to get options we must iterate through every options container
        options = {}
        for option in review_box.css("div.user-order-info>span"):
            name = option.css("strong::text").get("").strip()
            value = "".join(option.xpath("text()").getall()).strip()
            options[name] = value
        # parse remaining fields
        parsed.append(
            {
                "country": review_box.css(".user-country>b::text").get("").strip(),
                "text": review_box.xpath('.//dt[contains(@class,"buyer-feedback")]/span[1]/text()').get("").strip(),
                "post_time": review_box.xpath('.//dt[contains(@class,"buyer-feedback")]/span[2]/text()').get("").strip(),
                "stars": stars,
                "order_info": options,
                "user_name": review_box.css(".user-name>a::text").get(),
                "user_url": review_box.css(".user-name>a::attr(href)").get(),
            }
        )
    return parsed

async def scrape_product_reviews(seller_id: str, product_id: str, session: httpx.AsyncClient):
    """scrape all reviews of aliexpress product"""

    async def scrape_page(page):
        log.debug(f"scraping review page {page} of product {product_id}")
        data = f"ownerMemberId={seller_id}&memberType=seller&productId={product_id}&companyId=&evaStarFilterValue=all+Stars&evaSortValue=sortlarest%40feedback&page={page}&currentPage={page-1}&startValidDate=&i18n=true&withPictures=false&withAdditionalFeedback=false&onlyFromMyCountry=false&version=&isOpened=true&translate=+Y+&jumpToTop=true&v=2"
        # send the already-encoded form body as a POST request
        resp = await session.post(
            "",  # Aliexpress review (feedback) endpoint URL
            content=data,
            headers={**session.headers, "Content-Type": "application/x-www-form-urlencoded"},
        )
        return resp

    # scrape first page of reviews and find total count of review pages
    first_page = await scrape_page(page=1)

    sel = Selector(text=first_page.text)
    total_reviews = sel.css("div.customer-reviews").re(r"\((\d+)\)")[0]
    total_pages = int(math.ceil(int(total_reviews) / 10))

    # then scrape remaining review pages concurrently (page 1 is already scraped)
    print(f"scraping reviews of product {product_id}, found {total_reviews} total reviews")
    other_pages = await asyncio.gather(*[scrape_page(page) for page in range(2, total_pages + 1)])
    reviews = []
    for resp in [first_page, *other_pages]:
        reviews.extend(parse_review_page(resp))
    return reviews

For scraping reviews, we're using the same paging idiom we learned earlier: we request the first page, find the total count and retrieve the rest concurrently.
Further, since reviews are only available in the HTML structure, we have to dig into HTML parsing a bit more. We iterated through each review box and extracted core details such as star rating, review text, post time, order details and user info, all with a few clever XPath and CSS selectors!
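The star-rating trick is easy to check on its own. Below is a minimal sketch of the same width-to-stars conversion. stars_from_style is a hypothetical helper. Unlike the inline version in parse_review_page, it tolerates a space after the colon and raises a clear error when no width is present.

```python
import re


def stars_from_style(style: str) -> float:
    """Hypothetical helper: convert a star bar's CSS width (e.g. "width:80%") into stars.

    Uses the same rule as parse_review_page: 1 star == 20% width.
    """
    match = re.search(r"width:\s*(\d+)%", style)
    if match is None:
        raise ValueError(f"no width percentage in style: {style!r}")
    return int(match.group(1)) / 20


print(stars_from_style("width:80%"))    # 4.0
print(stars_from_style("width: 100%"))  # 5.0
```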

Parsing HTML with Xpath

For more on parsing HTML using XPath, see our complete, interactive introduction course.


Now that we have our Aliexpress review scraper, let's take it for a spin. For that, we'll need the seller ID and product ID, which are available in the hidden product data we extracted earlier (fields sellerId and productId).

import asyncio
import json

import httpx

# BASE_HEADERS as defined in the product scraping example above
async def run():
    async with httpx.AsyncClient(headers=BASE_HEADERS) as session:
        print(json.dumps(await scrape_product_reviews("220712488", "4000714658687", session), indent=2))

if __name__ == "__main__":
    asyncio.run(run())

Example output:

[
  {
    "country": "BR",
    "text": "As requested and",
    "post_time": "31 May 2022 16:11",
    "stars": 5.0,
    "order_info": {
      "Color:": "DKCD20FU-Li SET2",
      "Ships From:": "China",
      "Logistics:": "Seller's Shipping Method"
    },
    "user_name": "S***s",
    "user_url": ""
  },
  ...
]

With this, we've covered the main scrape targets of Aliexpress: we scraped search to find products, product pages to get product data and product reviews to gather feedback intelligence. Finally, to scrape at scale, let's take a look at how we can avoid blocking and captchas.

ScrapFly Bypass Blocking and Captchas

Scraping Aliexpress product data seems easy. Unfortunately, when scraping at scale, we might get blocked or asked to solve captchas, which will hinder our web scraping process.

To get around this, let's take advantage of the ScrapFly API, which can avoid all of these blocks for us!


ScrapFly offers several powerful features that'll help us get around AliExpress's blocking.

For this, we'll be using scrapfly-sdk python package and ScrapFly's anti scraping protection bypass feature. First, let's install scrapfly-sdk using pip:

$ pip install scrapfly-sdk

To take advantage of ScrapFly's API in our AliExpress product scraper, all we need to do is replace our httpx session code with scrapfly-sdk requests.


To wrap this guide up, let's take a look at some frequently asked questions about web scraping Aliexpress:

Is it legal to scrape Aliexpress?

Yes. Aliexpress product data is publicly available, and we're not extracting anything personal or private. Scraping at slow, respectful rates would fall under the ethical scraping definition. See our Is Web Scraping Legal? article for more.

Is there an Aliexpress API?

No. Currently, there's no public API for retrieving product data from Aliexpress. Fortunately, as covered in this tutorial, web scraping Aliexpress is easy and can be done with a few lines of Python code!

Scraped Aliexpress data is not accurate, what can I do?

The main cause of data differences is geolocation. Aliexpress shows different prices and products based on the user's location, so the scraper needs to match the location of the desired data. If you're using the ScrapFly API, see our geolocation selection feature.


Aliexpress Scraping Summary

In this tutorial, we built an Aliexpress data scraper capable of using the search system to discover products and scraping product data and product reviews.

We used Python with the httpx and parsel packages. To avoid being blocked, we used ScrapFly's API, which smartly configures every web scraper connection to avoid being blocked. For more on ScrapFly, see our documentation and try it out for free!
