How to Scrape Nordstrom Fashion Product Data

Nordstrom is a leading fashion retailer based in the US with an equally popular e-commerce store that operates worldwide. It's a popular web scraping target because of the rich data it offers and its position in the fashion industry.

In this guide, we'll take a look at web scraping Nordstrom using Python. We'll cover:

  • Nordstrom product data scraping.
  • Product discovery and search.

For this, we'll be using two popular Python web scraping tools: httpx and parsel. To parse the data, we'll be using the hidden web data approach.
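To make the hidden web data idea concrete before we start, here is a minimal sketch. It assumes the page embeds its state as a JSON object assigned to a JavaScript variable inside a script tag. The sample HTML, the variable name `window.__EXAMPLE_STATE__` and the helper `extract_hidden_data` are hypothetical and for illustration only; they are not taken from Nordstrom's actual pages. To stay dependency-free, the sketch uses Python's standard `re` and `json` modules in place of httpx and parsel.

```python
import json
import re

# Hypothetical page source: a real page would be downloaded first,
# and the variable name below is an assumption for illustration.
SAMPLE_HTML = """
<html><body>
<h1>Example product</h1>
<script>window.__EXAMPLE_STATE__ = {"product": {"id": 1, "title": "Example T-Shirt"}};</script>
</body></html>
"""


def extract_hidden_data(html: str, variable: str):
    """Return the JSON value assigned to `variable` somewhere in the HTML."""
    # Find the "variable = " assignment and remember where the value starts.
    match = re.search(re.escape(variable) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{variable} not found in page")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (such as ";</script>").
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


hidden = extract_hidden_data(SAMPLE_HTML, "window.__EXAMPLE_STATE__")
print(hidden["product"]["title"])  # prints "Example T-Shirt"
```

The appeal of this approach is that the embedded JSON usually already holds the full dataset, so there is little need to parse the visible HTML element by element.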

Nordstrom is relatively easy to scrape, so let's dive in!

Latest Nordstrom.com Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

Why Scrape Nordstrom?

Nordstrom is a popular fashion retailer with a huge product catalog, which makes it a great target for web scraping. Its popularity and the size of its dataset make it a useful window into the fashion e-commerce market. This data can be used for business analytics, market analysis and competitive intelligence.

For more on web scraping uses, see our web scraping use case hub.

Nordstrom Scrape Preview

In this article, we'll focus on scraping Nordstrom product data and product reviews. Here are some examples of the datasets we'll be collecting:

Scraped Product Dataset
{
  "id": 5846438,
  "title": "SKIMS Stretch Cotton T-Shirt",
  "type": "T-shirt/Tee",
  "typeParent": "Tops",
  "ageGroups": [
    "ADULT"
  ],
  "reviewAverageRating": 4.5,
  "numberOfReviews": 652,
  "brand": {
    "brandName": "SKIMS",
    "brandUrl": "/brands/skims--21197?origin=productBrandLink",
    "hasBrandPage": false,
    "imsBrandId": 74974321
  },
  "description": "A tried-and-true classic, this fitted T-shirt made from stretch-cotton jersey is from Kim Kardashian's highly sought-out SKIMS.",
  "features": [
    "21 1/2\" length (size Medium)",
    "Crewneck",
    "Short sleeves",
    "90% cotton, 10% elastane",
    "Machine wash, tumble dry",
    "Imported",
    "Item #6194916"
  ],
  "gender": "Female",
  "isAvailable": true,
  "media": {
    "5847438": {
      "id": 5847438,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/e354aaf8-5865-431b-b8d8-3cbccc6a2d83.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5847448": {
      "id": 5847448,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/df191e8d-4f2c-48f4-9144-e6b9dbede775.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5847458": {
      "id": 5847458,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/bca96a41-af1b-4736-89e3-e2facb3ec8ed.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5847468": {
      "id": 5847468,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/1b0051f1-f60e-4b4b-8f79-3fabd077e91d.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5847478": {
      "id": 5847478,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/86510e70-589b-440a-b66a-98982ce59740.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5847488": {
      "id": 5847488,
      "colorId": "053",
      "name": "LIGHT HEATHER GREY",
      "url": "https://n.nordstrommedia.com/id/sr3/d6ae4e0c-3b22-4dff-b528-d428005d8cd8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5848438": {
      "id": 5848438,
      "colorId": "234",
      "name": "SEDONA",
      "url": "https://n.nordstrommedia.com/id/sr3/d64c4a4d-ca98-46af-8ff4-efd7460e3321.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5848448": {
      "id": 5848448,
      "colorId": "234",
      "name": "SEDONA",
      "url": "https://n.nordstrommedia.com/id/sr3/f1d6105b-9e75-49aa-bfdb-39ed6a0cd82a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5848458": {
      "id": 5848458,
      "colorId": "234",
      "name": "SEDONA",
      "url": "https://n.nordstrommedia.com/id/sr3/04936587-02d9-41c7-b36f-b7f90144df6e.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5849438": {
      "id": 5849438,
      "colorId": "242",
      "name": "UMBER",
      "url": "https://n.nordstrommedia.com/id/sr3/85f4e2d8-00de-41f9-b777-2169bb799970.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5849448": {
      "id": 5849448,
      "colorId": "242",
      "name": "UMBER",
      "url": "https://n.nordstrommedia.com/id/sr3/4e2bffa2-fb87-416c-8438-a922d593423f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5849458": {
      "id": 5849458,
      "colorId": "242",
      "name": "UMBER",
      "url": "https://n.nordstrommedia.com/id/sr3/ca5f4ff8-7587-48cc-8914-818ee6320b9c.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850438": {
      "id": 5850438,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/0762da9a-4326-46fd-9b84-6db33035c0ea.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850448": {
      "id": 5850448,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/9f20433a-3d03-4893-87f9-2fd90f05c2b5.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850458": {
      "id": 5850458,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/32d39f3b-88e8-4ee2-bb15-7723bed651c8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850468": {
      "id": 5850468,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/da666e38-7c2d-408e-9874-f30f094ccd9e.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850478": {
      "id": 5850478,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/97828599-558b-48a5-8e03-35aeec7f6dbe.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5850488": {
      "id": 5850488,
      "colorId": "251",
      "name": "CAMEL",
      "url": "https://n.nordstrommedia.com/id/sr3/ef50a5bb-8f20-428d-8d64-0c7f9dd80776.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5851438": {
      "id": 5851438,
      "colorId": "301",
      "name": "DEEP SEA",
      "url": "https://n.nordstrommedia.com/id/sr3/8a2ed339-427b-4f93-9a49-762a43145d42.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5851448": {
      "id": 5851448,
      "colorId": "301",
      "name": "DEEP SEA",
      "url": "https://n.nordstrommedia.com/id/sr3/406118cc-c17a-42a5-842c-c12a54c19b39.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5852438": {
      "id": 5852438,
      "colorId": "339",
      "name": "MINERAL",
      "url": "https://n.nordstrommedia.com/id/sr3/a6c49b4c-1849-4c9e-895e-2804c4a0d01b.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5852448": {
      "id": 5852448,
      "colorId": "339",
      "name": "MINERAL",
      "url": "https://n.nordstrommedia.com/id/sr3/3c3820b0-0fe3-4869-bfcb-040917a78276.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5852458": {
      "id": 5852458,
      "colorId": "339",
      "name": "MINERAL",
      "url": "https://n.nordstrommedia.com/id/sr3/0544d615-d912-4fed-8e35-95bd9fdf753f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5852468": {
      "id": 5852468,
      "colorId": "339",
      "name": "MINERAL",
      "url": "https://n.nordstrommedia.com/id/sr3/df96797a-9a3d-4070-83c5-cd7d94dd1260.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5853438": {
      "id": 5853438,
      "colorId": "400",
      "name": "COBALT",
      "url": "https://n.nordstrommedia.com/id/sr3/95c440cd-18ea-47e0-a48f-6f97f1e1c0fc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5854438": {
      "id": 5854438,
      "colorId": "446",
      "name": "KYANITE",
      "url": "https://n.nordstrommedia.com/id/sr3/b0359253-5e23-4619-9123-34dfb35063e6.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5854448": {
      "id": 5854448,
      "colorId": "446",
      "name": "KYANITE",
      "url": "https://n.nordstrommedia.com/id/sr3/81a2918e-5643-4d63-8850-d9d8654b62af.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5854458": {
      "id": 5854458,
      "colorId": "446",
      "name": "KYANITE",
      "url": "https://n.nordstrommedia.com/id/sr3/8e2b9d0f-8b8f-4835-9a57-bb197f95631d.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855438": {
      "id": 5855438,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/31cdee52-d41a-46a7-8691-3ae1e0c53fb7.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855448": {
      "id": 5855448,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/a1843dad-b30c-4031-8d36-42c47934572f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855458": {
      "id": 5855458,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/e2543102-670e-40e8-acb6-916ea91f1515.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855468": {
      "id": 5855468,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/3daefa94-9c8a-41f6-967e-f85b80ba3ebf.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855478": {
      "id": 5855478,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/95c440cd-18ea-47e0-a48f-6f97f1e1c0fc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5855488": {
      "id": 5855488,
      "colorId": "8",
      "name": "525",
      "url": "https://n.nordstrommedia.com/id/sr3/ad9d1fcc-a0a0-4856-8345-de54e3b6b54f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856438": {
      "id": 5856438,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/aaa6a78e-f7d8-46f3-b51e-533642b5ea02.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856448": {
      "id": 5856448,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/3d680521-dc9e-4f07-a634-e02043e78910.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856458": {
      "id": 5856458,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/fbc8e722-af04-403f-a8fa-e938d56da1f3.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856468": {
      "id": 5856468,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/4f2e699b-125f-484e-8873-09f72a2fa40a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856478": {
      "id": 5856478,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/1559d8ec-d03e-416c-9d27-c6d31151012f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5856488": {
      "id": 5856488,
      "colorId": "603",
      "name": "SANGRIA",
      "url": "https://n.nordstrommedia.com/id/sr3/44ffdcad-614c-4329-ba6b-65244873e200.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857438": {
      "id": 5857438,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/35a9863f-feda-463c-aedf-a988329754c8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857448": {
      "id": 5857448,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/2241bfc4-be0f-4645-a350-7d19aafce7ae.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857458": {
      "id": 5857458,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/3c76bafa-dda1-4069-9d55-4deddd58a70f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857468": {
      "id": 5857468,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/a3e5cf6b-0e43-455e-aba4-0b7093e0ac60.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857478": {
      "id": 5857478,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/357882e4-c176-4c98-9601-39ee0299452a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5857488": {
      "id": 5857488,
      "colorId": "690",
      "name": "ROSE CLAY",
      "url": "https://n.nordstrommedia.com/id/sr3/b9a9588e-a241-43b8-b907-0fc5d16d959c.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5858438": {
      "id": 5858438,
      "colorId": "900",
      "name": "BONE",
      "url": "https://n.nordstrommedia.com/id/sr3/eb5b0ed4-41b9-439b-a56d-a9f549892451.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5859438": {
      "id": 5859438,
      "colorId": "003",
      "name": "SOOT",
      "url": "https://n.nordstrommedia.com/id/sr3/2c5c5fd6-3df6-4e30-a5af-893041f219dc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    },
    "5860438": {
      "id": 5860438,
      "colorId": "203",
      "name": "GARNET",
      "url": "https://n.nordstrommedia.com/id/sr3/9b140781-5301-4137-b94e-fe10b7a674b4.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
    }
  },
  "variants": {
    "5871416": {
      "id": 5871416,
      "sizeId": "xx-small",
      "colorId": "339",
      "totalQuantityAvailable": 1,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871419": {
      "id": 5871419,
      "sizeId": "medium",
      "colorId": "339",
      "totalQuantityAvailable": 9,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871420": {
      "id": 5871420,
      "sizeId": "large",
      "colorId": "339",
      "totalQuantityAvailable": 10,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871421": {
      "id": 5871421,
      "sizeId": "x-large",
      "colorId": "339",
      "totalQuantityAvailable": 19,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871422": {
      "id": 5871422,
      "sizeId": "plus-2 x",
      "colorId": "339",
      "totalQuantityAvailable": 19,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871423": {
      "id": 5871423,
      "sizeId": "plus-3 x",
      "colorId": "339",
      "totalQuantityAvailable": 14,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "5871424": {
      "id": 5871424,
      "sizeId": "plus-4 x",
      "colorId": "339",
      "totalQuantityAvailable": 23,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "339",
        "value": "Mineral",
        "sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5852438,
          5852448,
          5852458,
          5852468
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855448": {
      "id": 33855448,
      "sizeId": "small",
      "colorId": "900",
      "totalQuantityAvailable": 319,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855449": {
      "id": 33855449,
      "sizeId": "medium",
      "colorId": "900",
      "totalQuantityAvailable": 437,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855450": {
      "id": 33855450,
      "sizeId": "large",
      "colorId": "900",
      "totalQuantityAvailable": 626,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855451": {
      "id": 33855451,
      "sizeId": "x-large",
      "colorId": "900",
      "totalQuantityAvailable": 273,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855452": {
      "id": 33855452,
      "sizeId": "plus-2 x",
      "colorId": "900",
      "totalQuantityAvailable": 105,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855454": {
      "id": 33855454,
      "sizeId": "xx-small",
      "colorId": "900",
      "totalQuantityAvailable": 38,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855455": {
      "id": 33855455,
      "sizeId": "plus-3 x",
      "colorId": "900",
      "totalQuantityAvailable": 56,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855456": {
      "id": 33855456,
      "sizeId": "plus-4 x",
      "colorId": "900",
      "totalQuantityAvailable": 67,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "900",
        "value": "Bone",
        "sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5858438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855464": {
      "id": 33855464,
      "sizeId": "x-small",
      "colorId": "003",
      "totalQuantityAvailable": 1,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855477": {
      "id": 33855477,
      "sizeId": "large",
      "colorId": "003",
      "totalQuantityAvailable": 720,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855478": {
      "id": 33855478,
      "sizeId": "x-large",
      "colorId": "003",
      "totalQuantityAvailable": 317,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855479": {
      "id": 33855479,
      "sizeId": "plus-2 x",
      "colorId": "003",
      "totalQuantityAvailable": 166,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855480": {
      "id": 33855480,
      "sizeId": "xx-small",
      "colorId": "003",
      "totalQuantityAvailable": 22,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855482": {
      "id": 33855482,
      "sizeId": "plus-3 x",
      "colorId": "003",
      "totalQuantityAvailable": 11,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "33855483": {
      "id": 33855483,
      "sizeId": "plus-4 x",
      "colorId": "003",
      "totalQuantityAvailable": 18,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "003",
        "value": "Soot",
        "sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5859438
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450158": {
      "id": 36450158,
      "sizeId": "medium",
      "colorId": "053",
      "totalQuantityAvailable": 241,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450160": {
      "id": 36450160,
      "sizeId": "large",
      "colorId": "053",
      "totalQuantityAvailable": 137,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450161": {
      "id": 36450161,
      "sizeId": "x-large",
      "colorId": "053",
      "totalQuantityAvailable": 69,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450162": {
      "id": 36450162,
      "sizeId": "plus-2 x",
      "colorId": "053",
      "totalQuantityAvailable": 40,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450163": {
      "id": 36450163,
      "sizeId": "xx-small",
      "colorId": "053",
      "totalQuantityAvailable": 16,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450164": {
      "id": 36450164,
      "sizeId": "plus-3 x",
      "colorId": "053",
      "totalQuantityAvailable": 23,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450165": {
      "id": 36450165,
      "sizeId": "plus-4 x",
      "colorId": "053",
      "totalQuantityAvailable": 27,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450185": {
      "id": 36450185,
      "sizeId": "x-small",
      "colorId": "053",
      "totalQuantityAvailable": 46,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "36450186": {
      "id": 36450186,
      "sizeId": "small",
      "colorId": "053",
      "totalQuantityAvailable": 197,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "053",
        "value": "Light Heather Grey",
        "sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5847438,
          5847448,
          5847458,
          5847468,
          5847478,
          5847488
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
      }
    },
    "38558224": {
      "id": 38558224,
      "sizeId": "plus-2 x",
      "colorId": "446",
      "totalQuantityAvailable": 22,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "446",
        "value": "Kyanite",
        "sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5854438,
          5854448,
          5854458
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
      }
    },
    "38558226": {
      "id": 38558226,
      "sizeId": "plus-3 x",
      "colorId": "446",
      "totalQuantityAvailable": 5,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "446",
        "value": "Kyanite",
        "sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5854438,
          5854448,
          5854458
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
      }
    },
    "38558227": {
      "id": 38558227,
      "sizeId": "plus-4 x",
      "colorId": "446",
      "totalQuantityAvailable": 7,
      "price": {
        "currencyCode": "USD",
        "units": 48,
        "nanos": 0
      },
      "color": {
        "id": "446",
        "value": "Kyanite",
        "sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
        "mediaIds": [
          5854438,
          5854448,
          5854458
        ],
        "swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
      }
    }
  }
}

Project Setup

For this scraper we'll be using the hidden web data scraping approach. We'll be collecting HTML pages and extracting hidden JSON datasets, then parsing them with JSON parsing tools:

  • httpx - powerful HTTP client which we'll be using to retrieve the HTML pages.
  • parsel - HTML parser which we'll be using to extract hidden JSON datasets.
  • nested-lookup - JSON/Dict parser which will help us find specific keys in large JSON datasets.
  • jmespath - JSON query engine which we'll be using to reduce JSON datasets to important bits like product prices, images etc. For more see our introduction to parsing JSON with JMESPath.

All of these packages can be installed using Python's pip console command:

$ pip install httpx parsel jmespath nested-lookup

For Scrapfly users there's also a Scrapfly SDK version of each code example. The SDK can be installed using pip as well:

$ pip install "scrapfly-sdk[all]"

Scrape Nordstrom Product Data

Let's start by scraping product data of a single product. For this, let's take a look at an example product page like:
nordstrom.com/s/nike-phoenix-fleece-crewneck-sweatshirt/

We could parse the HTML data using CSS selectors or XPath, but since Nordstrom uses the React JavaScript framework to power its website, we can extract the dataset directly from the page source:

hidden web data of nordstrom product

If we open up the page source and search (ctrl+f) for a unique product identifier (like the description or title), we can see there's a hidden JSON dataset. In web scraping, this is called hidden web data scraping. Let's take a look at how to do this in Python.

Our scraper process will look something like this:

  1. Retrieve HTML page of the product using httpx.
  2. Find the hidden JSON dataset from <script> tag using parsel and XPath.
  3. Load the JSON dataset using json.loads() and find product fields using nested-lookup.
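
Steps 2 and 3 can be tried offline before touching the live site. The following is a minimal sketch that uses only the standard library: a regular expression stands in for the parsel XPath query and a small recursive generator stands in for nested-lookup. The sample HTML, its JSON shape and the helper names (extract_initial_config, find_key) are made up for illustration and are much simpler than a real Nordstrom page:

```python
import json
import re

# hypothetical, heavily shortened stand-in for a Nordstrom product page
SAMPLE_HTML = """
<html><body>
<script>window.__INITIAL_CONFIG__ = {"viewData": {"stylesById": {"123": {"id": 123, "productTitle": "Example Tee"}}}};</script>
</body></html>
"""


def extract_initial_config(html: str) -> dict:
    """find the script body holding __INITIAL_CONFIG__ and decode its JSON"""
    # a regex stands in for the XPath query used by the real scraper
    match = re.search(r"<script>(.*?__INITIAL_CONFIG__.*?)</script>", html, re.DOTALL)
    if match is None:
        raise ValueError("hidden dataset not found")
    # keep everything after the first "=" and drop the trailing ";"
    raw_json = match.group(1).split("=", 1)[-1].strip().strip(";")
    return json.loads(raw_json)


def find_key(key, node):
    """yield every value stored under `key` anywhere in nested dicts and lists"""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(key, value)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(key, item)


config = extract_initial_config(SAMPLE_HTML)
styles = next(find_key("stylesById", config))
product = list(styles.values())[0]  # first entry is the current product
print(product["productTitle"])
```

Splitting on the first "=" only matters here: the JSON body itself may contain "=" characters (in URLs, for example), so a plain split would cut the dataset apart.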

In Python this scraper will look like this:

Python
ScrapFly
import asyncio
import json
import httpx
from parsel import Selector
from nested_lookup import nested_lookup

# setup httpx client with http2 enabled and browser-like headers to avoid being blocked:
client = httpx.AsyncClient(
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = Selector(html).xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


async def scrape_product(url: str):
    """scrape Nordstrom.com product page for product data"""
    response = await client.get(url)
    # find all hidden dataset:
    data = find_hidden_data(response.text)
    # extract only product data from the dataset
    # find first key "stylesById" and take first value (which is the current product)
    product = nested_lookup("stylesById", data)
    product = list(product[0].values())[0]
    return product

# example scrape run:
print(asyncio.run(scrape_product("https://www.nordstrom.com/s/nike-phoenix-fleece-crewneck-sweatshirt/6665302")))

import asyncio
import json
from nested_lookup import nested_lookup
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")

def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = result.selector.xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


async def scrape_product(url: str):
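    """scrape Nordstrom.com product page for product data"""
    # async_scrape is the awaitable counterpart of client.scrape
    result = await client.async_scrape(ScrapeConfig(
        url=url,
        asp=True,  # enable anti-scraping-protection bypass
        cache=True,  # enable cache while we develop
        debug=True,  # enable debug mode while we develop
    ))
    # find all hidden dataset (find_hidden_data expects the whole result object):
    data = find_hidden_data(result)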
    # extract only product data from the dataset
    # find first key "stylesById" and take first value (which is the current product)
    product = nested_lookup("stylesById", data)
    product = list(product[0].values())[0]
    return product

# example scrape run:
print(asyncio.run(scrape_product("https://www.nordstrom.com/s/nike-phoenix-fleece-crewneck-sweatshirt/6665302")))

In only a few lines of Python code, we got the entire product dataset on Nordstrom! However, this dataset is huge and can be difficult for a data pipeline to ingest if we were to do some analytics or data storage. So next, let's use JMESPath to reduce the dataset to the most important values like pricing, images and variant data.

Parsing Nordstrom Data with JMESPath

JMESPath is a JSON query language, and since Python dictionaries are equivalent to JSON objects, we can use JMESPath in our Nordstrom data parsing.

We'll be using JMESPath's data reshaping feature, which allows specifying a key map to reduce a dataset. For example:

import jmespath

data = {
    "id": "123456",
    "productTitle": "Product Title",
    "type": "sweater",
    "unimportant": "foobar",
    "photos": {
        "desktop": "http://example.com/photo.jpg",
        "mobile": "http://example.com/photo-small.jpg",
    },
}

# jmespath search takes a query string and a data object. 
# here we use `{}` remapping feature to rename keys of the original dataset
reduced = jmespath.search(
    """{
    id: id,
    title: productTitle,
    type: type,
    photo: photos.desktop
    }""",
    data,
)
print(reduced)
{"id": "123456", "title": "Product Title", "type": "sweater", "photo": "http://example.com/photo.jpg"}

This powerful tool allows us to easily reshape scraped datasets. So, let's use it to reshape our Nordstrom product dataset we just scraped:

import jmespath

def parse_product(data: dict) -> dict:
    # parse product basic data like id, name, features etc.
    product = jmespath.search(
        """{
        id: id,
        title: productTitle,
        type: productTypeName,
        typeParent: productTypeParentName,
        ageGroups: ageGroups,
        reviewAverageRating: reviewAverageRating,
        numberOfReviews: numberOfReviews,
        brand: brand,
        description: sellingStatement,
        features: features,
        gender: gender,
        isAvailable: isAvailable
        }""",
        data,
    )
    # product variants have their own colors, prices and photos:
    prices_by_sku = data["price"]["bySkuId"]
    colors_by_id = data["filters"]["color"]["byId"]
    product["media"] = {}
    for media_id, media in data["styleMedia"]["byId"].items():
        product["media"][media_id] = jmespath.search(
            """{
                id: id,
                colorId: colorId,
                name: colorName,
                url: imageMediaUri.largeDesktop
            }""",
            media,
        )
    # Each product has SKUs(Stock Keeping Units) which are the actual variants:
    product["variants"] = {}
    for sku, sku_data in data["skus"]["byId"].items():
        # get basic variant data
        parsed = jmespath.search(
            """{
                id: id,
                sizeId: sizeId,
                colorId: colorId,
                totalQuantityAvailable: totalQuantityAvailable
            }""",
            sku_data,
        )
        # get variant price from the per-SKU price dataset
        parsed["price"] = prices_by_sku[sku]["regular"]["price"]
        # get variant color data
        parsed["color"] = jmespath.search(
            """{
            id: id,
            value: value,
            sizes: isAvailableWith,
            mediaIds: styleMediaIds,
            swatch: swatchMedia.desktop
            }""",
            colors_by_id[parsed["colorId"]],
        )
        product["variants"][sku] = parsed
    return product

This might appear complex but all we did is map the original dataset keys to new keys using JMESPath. Now our scraper can scrape nice and tidy product datasets that we can easily ingest into our data pipelines!
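
Two of the parsed variant fields are still in a raw form: the price is split into units and nanos, and the available sizes are packed into a single string. The sketch below shows one way to normalize them. It assumes that nanos are billionths of one currency unit (a common convention, not something the Nordstrom dataset confirms) and that sizes are "|"-separated with a "_s:" prefix, as in the sample dataset shown earlier. The helper names are illustrative only:

```python
from decimal import Decimal


def price_to_decimal(price: dict) -> Decimal:
    """convert a {"units": 48, "nanos": 0} style price to a Decimal amount"""
    # assumption: nanos are billionths (10^-9) of one currency unit
    return Decimal(price["units"]) + Decimal(price["nanos"]) / Decimal(10**9)


def parse_sizes(sizes: str) -> list:
    """split "_s:small|_s:medium|" into ["small", "medium"]"""
    # assumption: entries are separated by "|" and prefixed with "_s:"
    parts = [part for part in sizes.split("|") if part]
    return [part[3:] if part.startswith("_s:") else part for part in parts]


# values copied from one variant of the sample dataset above
variant = {
    "price": {"currencyCode": "USD", "units": 48, "nanos": 0},
    "color": {"sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|"},
}
print(price_to_decimal(variant["price"]), variant["price"]["currencyCode"])
print(parse_sizes(variant["color"]["sizes"]))
```

Decimal is used instead of float so that prices survive storage and arithmetic without rounding surprises.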

Finding Nordstrom Products

Now that we can scrape individual Nordstrom products, we need to find the product URLs to scrape. We could find desired products and input their URLs manually, but to scale up our scraper we can scrape product category or search pages instead.

For this, we'll be using the same hidden data scraping approach as each category or search result page contains a hidden dataset with product preview data (like price, title, image, etc.) and product page URLs.

For example, let's take a look at one of Nordstrom search pages:

nordstrom.com/sr?origin=keywordsearch&keyword=indigo

search page

We can see that every search (or category) result is made up of several pages. So, we need to scrape pagination as well.

To scrape this we'll be using an approach very similar to the one we used to scrape product pages:

  1. Scrape the first search/category page HTML.
  2. Find hidden web data using parsel and XPath.
  3. Extract product preview data and pagination info from the hidden dataset using nested-lookup.
  4. Calculate the total number of pages and scrape them.
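
Step 4 comes down to simple URL arithmetic. As a minimal sketch using only the standard library, the snippet below builds the URLs of the remaining pages from the first page URL. The helper names (with_page, remaining_page_urls) and the total of 7 pages are made up for illustration; the only thing taken from Nordstrom is the page query parameter:

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_page(url: str, page: int) -> str:
    """return `url` with its `page` query parameter set to `page`"""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def remaining_page_urls(url: str, total_pages: int, max_pages: int = None) -> list:
    """build urls for page 2 up to the last page we are allowed to scrape"""
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    return [with_page(url, page) for page in range(2, total_pages + 1)]


first_page = "https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo"
# pretend the hidden dataset reported 7 pages, but we only want the first 3
urls = remaining_page_urls(first_page, total_pages=7, max_pages=3)
for page_url in urls:
    print(page_url)
```

Page 1 is skipped on purpose since it has already been downloaded to learn the page count.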

Let's see how this works in Python:

Python
ScrapFly
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

import httpx
from nested_lookup import nested_lookup
from parsel import Selector

# setup httpx client with http2 enabled and browser-like headers to avoid being blocked:
client = httpx.AsyncClient(
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = Selector(html).xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    # split() also handles urls without a query string (find() would return -1 there)
    return url.split("?")[0] + "?" + updated_query_params


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    """Scrape Nordstrom search or category url for product preview data"""
    print(f"scraping first search page: {url}")
    first_page = await client.get(url)
    # parse first page for product search data and total amount of pages:
    data = find_hidden_data(first_page.text)
    _first_page_results = nested_lookup("productResults", data)[0]
    products = list(_first_page_results["productsById"].values())
    paging_info = _first_page_results["query"]
    total_pages = paging_info["pageCount"]

    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # then scrape other pages concurrently:
    print(f"  scraping remaining {total_pages - 1} search pages")
    _other_pages = [client.get(update_url_parameter(url, page=page)) for page in range(2, total_pages + 1)]
    for response in asyncio.as_completed(_other_pages):
        response = await response
        if response.status_code != 200:
            print(f'!!! scrape page {response.url} got blocked; skipping')
            continue
        data = find_hidden_data(response.text)
        data = nested_lookup("productResults", data)[0]
        products.extend(list(data["productsById"].values()))
    return products


# example scrape run for search of "indigo" keyword with max 2 pages:
print(asyncio.run(scrape_search("https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo", max_pages=2)))

import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

from nested_lookup import nested_lookup
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = result.selector.xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    # split() also handles urls without a query string (find() would return -1 there)
    return url.split("?")[0] + "?" + updated_query_params


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    """Scrape Nordstrom search or category url for product preview data"""
    print(f"scraping first search page: {url}")
    first_page = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
            debug=True,
            cache=True,
        )
    )
    # parse first page for product search data and total amount of pages:
    data = find_hidden_data(first_page)
    _first_page_results = nested_lookup("productResults", data)[0]
    products = list(_first_page_results["productsById"].values())
    paging_info = _first_page_results["query"]
    total_pages = paging_info['pageCount']

    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # then scrape other pages concurrently:
    print(f"  scraping remaining {total_pages - 1} search pages")
    _other_pages = [
        ScrapeConfig(
            url=update_url_parameter(url, page=page),
            country="US",
            asp=True,
        )
        for page in range(2, total_pages + 1)
    ]
    async for result in scrapfly.concurrent_scrape(_other_pages):
        data = find_hidden_data(result)
        data = nested_lookup("productResults", data)[0]
        products.extend(list(data["productsById"].values()))
    return products

# example scrape run for search of "indigo" keyword with max 2 pages:
print(asyncio.run(scrape_search("https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo", max_pages=2)))

Bypass Nordstrom Blocking with ScrapFly

Nordstrom is somewhat notorious for blocking web scraping, so to scale up our scrapers beyond the few scrapes of this guide we'll need to use proxies or other tools to avoid scraper blocking.

scrapfly middleware
Scrapfly service does the heavy lifting for you!

Scrapfly API is a tool for scaling up web scrapers and avoiding being blocked. It's a drop-in replacement for the tools we used in this guide and comes with scraper power-up features like anti-scraping-protection bypass and javascript rendering using headless browsers.

All these tools can be easily accessed through Python SDK:

from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="")
result = client.scrape(ScrapeConfig(
    url="https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo",
    # enable scraper blocking service bypass
    asp=True,
    # optional - render javascript using headless browsers:
    render_js=True,
))
print(result.content)

FAQ

To wrap this article up let's take a look at some frequently asked questions about scraping Nordstrom:

Is it legal to scrape Nordstrom?

Yes. Public data on Nordstrom is perfectly legal to scrape. However, attention should be paid to scraping speeds and to the scraping of user reviews, as they might contain copyrighted data like images which might require permission to store depending on the country.

Can Nordstrom be crawled?

Yes. Like many e-commerce websites, Nordstrom lends itself to web crawling as it has many product references throughout the website. Note that crawling is significantly more resource intensive than the direct web scraping we've covered in this tutorial, so it's not recommended. Related: What's the difference between Web Scraping and Crawling?

Nordstrom Scraping Summary

In this web scraping guide we've taken a look at how to scrape Nordstrom - a popular fashion e-commerce store.

For this, we used Python with httpx, parsel, nested-lookup and jmespath and the hidden web data scraping approach. We've collected HTML pages and extracted hidden React framework data to find product data fields with just a few lines of Python code.

To avoid blocking, we've taken a look at ScrapFly - a web scraping API that can be used to scale up web scrapers and avoid being blocked. Try it out for free!
