Nordstrom is a React-driven fashion marketplace with real-time inventory, pricing experiments, and layered anti-bot controls that make it both lucrative and challenging to scrape. A single cURL request may get through, yet production monitoring requires resilient infrastructure that can rotate fingerprints, respect pagination state, and normalize deeply nested product JSON.
In this guide we act as both reverse engineer and production scraping engineer. We will prototype with hidden web data techniques so you can see exactly how Nordstrom ships product JSON to the browser.
Quick Start
If you just need a working scraper, clone the maintained Nordstrom scraper that ships with ScrapFly-ready settings:
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/nordstorm-scraper
This repository contains an up-to-date scraper with ScrapFly configuration, HTTP client best practices, and parsing helpers so you can run a production-ready crawl with minimal setup.
Latest Nordstrom.com Scraper Code
What Is Nordstrom?
Nordstrom is a major retailer with hundreds of physical stores and a vast online marketplace. The company offers a wide range of products, including apparel, beauty items, footwear, and accessories for women, men, and children.
One of the standout features of Nordstrom's website is the product detail page (PDP), which exposes highly structured JSON data. This embedded data includes product titles, descriptions, variants, pricing, inventory levels, images, and customer reviews. Such a format makes Nordstrom particularly attractive for anyone seeking to extract clean and comprehensive e-commerce data.
Why Scrape Nordstrom?
Scraping Nordstrom allows you to access a wealth of well-structured product and marketplace data, such as:
- Product Information: Names, IDs, detailed descriptions, product type, and parent categories.
- Variants: Availability and attributes for each size, color, and style, including fulfillment and inventory statuses.
- Pricing: Original and sale prices, markdown details, and full price ladders for each SKU.
- Images and Media: Main images, swatches, 360° views, alternative photos, and all related media metadata.
- Brand Details: Brand name, brand pages, and category associations.
- Customer Reviews: Number of reviews, star ratings, reviewer profiles, and review content.
- Attributes: Material, fit, care instructions, age group, and other technical specifications.
Scraping this data allows for in-depth product analytics, price tracking, product matching, catalog building, trend monitoring, and more.
For more ideas on data you can collect with web scraping, check out our web scraping use case hub.
Challenges of Scraping Nordstrom
Nordstrom invests heavily in traffic quality filters and advanced website mechanics, so even well-written prototypes can break when scaled.
Below, we break down the main technical and operational obstacles you’ll face, along with practical tips for overcoming each one.
Anti-Bot Defenses
Nordstrom’s anti-bot stack is sophisticated and multi-layered. In addition to standard defenses, they employ several advanced techniques:
Device Fingerprinting: Nordstrom profiles device fingerprints, so scrapers need to actively rotate them, including:
- Window dimensions
- Hardware profiling
- Language settings
- Browser quirks
Session Integrity Checks: Validate sessions frequently, which can catch scrapers using reused headers or cookie jars.
Surprise Challenges: Insert invisible or interactive CAPTCHAs at unpredictable times.
Detection Triggers: Automated clients can stand out if they:
- Reuse identical headers, cookies, or sessions
- Use uncommon patterns for headers such as User-Agent or Accept-Language
Even advanced HTTP libraries stand out under these conditions. In practice, blocks show up as sudden 403 errors, long page loads, or bot challenge overlays.
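The block symptoms listed above can be turned into a simple check that a scraper runs on every response before parsing it. The following is a minimal sketch, not a description of Nordstrom's actual challenge pages: the function name `looks_blocked`, the 15-second threshold, the extra 429 status, and the marker phrases are all assumptions that should be replaced with values observed in real traffic.

```python
from typing import Sequence

# Placeholder phrases (assumption): replace with text seen in real challenge pages.
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def looks_blocked(
    status_code: int,
    elapsed_seconds: float,
    body: str,
    slow_threshold: float = 15.0,
    markers: Sequence[str] = CHALLENGE_MARKERS,
) -> bool:
    """Return True when a response shows one of the common block symptoms."""
    # Sudden 403s are the symptom named in the text; 429 is added as an assumption.
    if status_code in (403, 429):
        return True
    # Unusually long page loads can indicate throttling or a silent challenge.
    if elapsed_seconds > slow_threshold:
        return True
    # A challenge overlay is often served with a 200 status, so inspect the body too.
    lowered = body.lower()
    return any(marker in lowered for marker in markers)
```

A response flagged by this check is a good candidate for a retry with a fresh session, new headers, and a different IP rather than an immediate repeat of the same request.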
Rate Limiting and IP Hygiene
Nordstrom watches for abnormal request rates and browsing behavior. Notable rate-limiting tactics include:
Strict Thresholds: Enforced not just at IP level, but also for device fingerprints or user IDs.
Detection of Robot Patterns: Requests fired at precise intervals or hitting APIs in unnatural sequences look suspicious.
To avoid detection while scraping, randomize your approach at every level: vary the timing between requests by adding jitter, and shuffle the order in which requests are made.
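As a rough illustration of the jitter and shuffling advice, the sketch below visits a list of URLs in random order and sleeps a random interval between requests. The helper names, the 2.0 to 3.5 second delay window, and the injectable `fetch` and `sleep` callables are illustrative assumptions, not tuned values for Nordstrom.

```python
import random
import time
from typing import Callable, Iterable, List


def jittered_delay(base: float = 2.0, spread: float = 1.5) -> float:
    """Return a delay in seconds drawn uniformly from [base, base + spread]."""
    return base + random.uniform(0.0, spread)


def crawl_in_random_order(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    base: float = 2.0,
    spread: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Fetch every URL once, in shuffled order, pausing a random time between requests."""
    queue = list(urls)
    random.shuffle(queue)  # avoid walking the catalog in a predictable sequence
    results = []
    for index, url in enumerate(queue):
        if index:  # no pause is needed before the very first request
            sleep(jittered_delay(base, spread))
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the function easy to test and makes it straightforward to swap in an async-friendly wait later.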
Deeply Nested Data Structures
Almost all valuable data on Nordstrom, such as products, prices, variants, and reviews, exists within highly nested JSON structures. These structures are typically organized by numeric IDs and rarely provide straightforward cross-references.
How Nordstrom Structures Its Data:
- Key datasets (SKUs, prices, media assets, reviews) are kept in separate dictionaries, each keyed by their own IDs.
- There isn’t a simple “relational table” or built-in mapping, so you can’t just merge everything together in one line of code.
- To assemble full product information, you have to match up related IDs across multiple dictionaries.
Strategies for Navigating Complex Data:
- Use specialized tools like nested-lookup to pull specific keys out of nested dictionaries.
- Try JMESPath for powerful, query-like searching within deeply nested JSON.
- Explicitly program joining logic, and make sure your code gracefully deals with missing or null values.
Clear, robust logic for joining and navigating these nested structures is essential to reliably extract all the data you need.
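To make the joining idea concrete, here is a minimal sketch that attaches image URLs to each variant by following `mediaIds` into the `media` dictionary. It assumes the field names used in the scraped product dataset shown in this article (`media`, `variants`, `color.mediaIds`, `price.units`); the function name and the output row layout are illustrative choices.

```python
from typing import Any, Dict, List


def join_variants_with_media(product: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten variants into rows and attach image URLs by following mediaIds."""
    media = product.get("media") or {}
    rows = []
    for variant in (product.get("variants") or {}).values():
        color = variant.get("color") or {}
        price = variant.get("price") or {}
        image_urls = []
        for media_id in color.get("mediaIds") or []:
            # media is keyed by string IDs, while mediaIds holds integers
            item = media.get(str(media_id))
            if item and item.get("url"):
                image_urls.append(item["url"])
        rows.append(
            {
                "sku": variant.get("id"),
                "size": variant.get("sizeId"),
                "color": color.get("value"),
                "quantity": variant.get("totalQuantityAvailable"),
                "price": price.get("units"),
                "currency": price.get("currencyCode"),
                "images": image_urls,
            }
        )
    return rows
```

Every lookup uses `.get()` with a fallback, so a variant with a missing color block or a dangling media ID yields an incomplete row instead of raising an exception.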
Understanding these constraints is key to building a reliable Nordstrom scraper. Leveraging smart session management, adaptive request logic, dynamic data extraction, and robust JSON joining will allow your scraper to recover from failures and keep pace as Nordstrom evolves its defenses.
Nordstrom Scrape Preview
In this article, we'll focus on scraping Nordstrom product data and product reviews. Here are some examples of the datasets we'll be collecting:
Scraped Product Dataset
{
"id": 5846438,
"title": "SKIMS Stretch Cotton T-Shirt",
"type": "T-shirt/Tee",
"typeParent": "Tops",
"ageGroups": [
"ADULT"
],
"reviewAverageRating": 4.5,
"numberOfReviews": 652,
"brand": {
"brandName": "SKIMS",
"brandUrl": "/brands/skims--21197?origin=productBrandLink",
"hasBrandPage": false,
"imsBrandId": 74974321
},
"description": "A tried-and-true classic, this fitted T-shirt made from stretch-cotton jersey is from Kim Kardashian's highly sought-out SKIMS.",
"features": [
"21 1/2\" length (size Medium)",
"Crewneck",
"Short sleeves",
"90% cotton, 10% elastane",
"Machine wash, tumble dry",
"Imported",
"Item #6194916"
],
"gender": "Female",
"isAvailable": true,
"media": {
"5847438": {
"id": 5847438,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/e354aaf8-5865-431b-b8d8-3cbccc6a2d83.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5847448": {
"id": 5847448,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/df191e8d-4f2c-48f4-9144-e6b9dbede775.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5847458": {
"id": 5847458,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/bca96a41-af1b-4736-89e3-e2facb3ec8ed.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5847468": {
"id": 5847468,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/1b0051f1-f60e-4b4b-8f79-3fabd077e91d.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5847478": {
"id": 5847478,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/86510e70-589b-440a-b66a-98982ce59740.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5847488": {
"id": 5847488,
"colorId": "053",
"name": "LIGHT HEATHER GREY",
"url": "https://n.nordstrommedia.com/id/sr3/d6ae4e0c-3b22-4dff-b528-d428005d8cd8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5848438": {
"id": 5848438,
"colorId": "234",
"name": "SEDONA",
"url": "https://n.nordstrommedia.com/id/sr3/d64c4a4d-ca98-46af-8ff4-efd7460e3321.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5848448": {
"id": 5848448,
"colorId": "234",
"name": "SEDONA",
"url": "https://n.nordstrommedia.com/id/sr3/f1d6105b-9e75-49aa-bfdb-39ed6a0cd82a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5848458": {
"id": 5848458,
"colorId": "234",
"name": "SEDONA",
"url": "https://n.nordstrommedia.com/id/sr3/04936587-02d9-41c7-b36f-b7f90144df6e.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5849438": {
"id": 5849438,
"colorId": "242",
"name": "UMBER",
"url": "https://n.nordstrommedia.com/id/sr3/85f4e2d8-00de-41f9-b777-2169bb799970.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5849448": {
"id": 5849448,
"colorId": "242",
"name": "UMBER",
"url": "https://n.nordstrommedia.com/id/sr3/4e2bffa2-fb87-416c-8438-a922d593423f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5849458": {
"id": 5849458,
"colorId": "242",
"name": "UMBER",
"url": "https://n.nordstrommedia.com/id/sr3/ca5f4ff8-7587-48cc-8914-818ee6320b9c.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850438": {
"id": 5850438,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/0762da9a-4326-46fd-9b84-6db33035c0ea.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850448": {
"id": 5850448,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/9f20433a-3d03-4893-87f9-2fd90f05c2b5.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850458": {
"id": 5850458,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/32d39f3b-88e8-4ee2-bb15-7723bed651c8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850468": {
"id": 5850468,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/da666e38-7c2d-408e-9874-f30f094ccd9e.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850478": {
"id": 5850478,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/97828599-558b-48a5-8e03-35aeec7f6dbe.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5850488": {
"id": 5850488,
"colorId": "251",
"name": "CAMEL",
"url": "https://n.nordstrommedia.com/id/sr3/ef50a5bb-8f20-428d-8d64-0c7f9dd80776.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5851438": {
"id": 5851438,
"colorId": "301",
"name": "DEEP SEA",
"url": "https://n.nordstrommedia.com/id/sr3/8a2ed339-427b-4f93-9a49-762a43145d42.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5851448": {
"id": 5851448,
"colorId": "301",
"name": "DEEP SEA",
"url": "https://n.nordstrommedia.com/id/sr3/406118cc-c17a-42a5-842c-c12a54c19b39.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5852438": {
"id": 5852438,
"colorId": "339",
"name": "MINERAL",
"url": "https://n.nordstrommedia.com/id/sr3/a6c49b4c-1849-4c9e-895e-2804c4a0d01b.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5852448": {
"id": 5852448,
"colorId": "339",
"name": "MINERAL",
"url": "https://n.nordstrommedia.com/id/sr3/3c3820b0-0fe3-4869-bfcb-040917a78276.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5852458": {
"id": 5852458,
"colorId": "339",
"name": "MINERAL",
"url": "https://n.nordstrommedia.com/id/sr3/0544d615-d912-4fed-8e35-95bd9fdf753f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5852468": {
"id": 5852468,
"colorId": "339",
"name": "MINERAL",
"url": "https://n.nordstrommedia.com/id/sr3/df96797a-9a3d-4070-83c5-cd7d94dd1260.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5853438": {
"id": 5853438,
"colorId": "400",
"name": "COBALT",
"url": "https://n.nordstrommedia.com/id/sr3/95c440cd-18ea-47e0-a48f-6f97f1e1c0fc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5854438": {
"id": 5854438,
"colorId": "446",
"name": "KYANITE",
"url": "https://n.nordstrommedia.com/id/sr3/b0359253-5e23-4619-9123-34dfb35063e6.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5854448": {
"id": 5854448,
"colorId": "446",
"name": "KYANITE",
"url": "https://n.nordstrommedia.com/id/sr3/81a2918e-5643-4d63-8850-d9d8654b62af.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5854458": {
"id": 5854458,
"colorId": "446",
"name": "KYANITE",
"url": "https://n.nordstrommedia.com/id/sr3/8e2b9d0f-8b8f-4835-9a57-bb197f95631d.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855438": {
"id": 5855438,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/31cdee52-d41a-46a7-8691-3ae1e0c53fb7.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855448": {
"id": 5855448,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/a1843dad-b30c-4031-8d36-42c47934572f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855458": {
"id": 5855458,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/e2543102-670e-40e8-acb6-916ea91f1515.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855468": {
"id": 5855468,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/3daefa94-9c8a-41f6-967e-f85b80ba3ebf.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855478": {
"id": 5855478,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/95c440cd-18ea-47e0-a48f-6f97f1e1c0fc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5855488": {
"id": 5855488,
"colorId": "8",
"name": "525",
"url": "https://n.nordstrommedia.com/id/sr3/ad9d1fcc-a0a0-4856-8345-de54e3b6b54f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856438": {
"id": 5856438,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/aaa6a78e-f7d8-46f3-b51e-533642b5ea02.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856448": {
"id": 5856448,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/3d680521-dc9e-4f07-a634-e02043e78910.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856458": {
"id": 5856458,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/fbc8e722-af04-403f-a8fa-e938d56da1f3.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856468": {
"id": 5856468,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/4f2e699b-125f-484e-8873-09f72a2fa40a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856478": {
"id": 5856478,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/1559d8ec-d03e-416c-9d27-c6d31151012f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5856488": {
"id": 5856488,
"colorId": "603",
"name": "SANGRIA",
"url": "https://n.nordstrommedia.com/id/sr3/44ffdcad-614c-4329-ba6b-65244873e200.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857438": {
"id": 5857438,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/35a9863f-feda-463c-aedf-a988329754c8.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857448": {
"id": 5857448,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/2241bfc4-be0f-4645-a350-7d19aafce7ae.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857458": {
"id": 5857458,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/3c76bafa-dda1-4069-9d55-4deddd58a70f.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857468": {
"id": 5857468,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/a3e5cf6b-0e43-455e-aba4-0b7093e0ac60.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857478": {
"id": 5857478,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/357882e4-c176-4c98-9601-39ee0299452a.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5857488": {
"id": 5857488,
"colorId": "690",
"name": "ROSE CLAY",
"url": "https://n.nordstrommedia.com/id/sr3/b9a9588e-a241-43b8-b907-0fc5d16d959c.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5858438": {
"id": 5858438,
"colorId": "900",
"name": "BONE",
"url": "https://n.nordstrommedia.com/id/sr3/eb5b0ed4-41b9-439b-a56d-a9f549892451.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5859438": {
"id": 5859438,
"colorId": "003",
"name": "SOOT",
"url": "https://n.nordstrommedia.com/id/sr3/2c5c5fd6-3df6-4e30-a5af-893041f219dc.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
},
"5860438": {
"id": 5860438,
"colorId": "203",
"name": "GARNET",
"url": "https://n.nordstrommedia.com/id/sr3/9b140781-5301-4137-b94e-fe10b7a674b4.jpeg?crop=pad&pad_color=FFF&format=jpeg&w=780&h=1196"
}
},
"variants": {
"5871416": {
"id": 5871416,
"sizeId": "xx-small",
"colorId": "339",
"totalQuantityAvailable": 1,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871419": {
"id": 5871419,
"sizeId": "medium",
"colorId": "339",
"totalQuantityAvailable": 9,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871420": {
"id": 5871420,
"sizeId": "large",
"colorId": "339",
"totalQuantityAvailable": 10,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871421": {
"id": 5871421,
"sizeId": "x-large",
"colorId": "339",
"totalQuantityAvailable": 19,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871422": {
"id": 5871422,
"sizeId": "plus-2 x",
"colorId": "339",
"totalQuantityAvailable": 19,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871423": {
"id": 5871423,
"sizeId": "plus-3 x",
"colorId": "339",
"totalQuantityAvailable": 14,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"5871424": {
"id": 5871424,
"sizeId": "plus-4 x",
"colorId": "339",
"totalQuantityAvailable": 23,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "339",
"value": "Mineral",
"sizes": "_s:xx-small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5852438,
5852448,
5852458,
5852468
],
"swatch": "https://n.nordstrommedia.com/id/sr3/8a3eb8e4-e660-42d9-af9e-41d9e85ecb99.jpeg?crop=fit&w=31&h=31"
}
},
"33855448": {
"id": 33855448,
"sizeId": "small",
"colorId": "900",
"totalQuantityAvailable": 319,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855449": {
"id": 33855449,
"sizeId": "medium",
"colorId": "900",
"totalQuantityAvailable": 437,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855450": {
"id": 33855450,
"sizeId": "large",
"colorId": "900",
"totalQuantityAvailable": 626,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855451": {
"id": 33855451,
"sizeId": "x-large",
"colorId": "900",
"totalQuantityAvailable": 273,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855452": {
"id": 33855452,
"sizeId": "plus-2 x",
"colorId": "900",
"totalQuantityAvailable": 105,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855454": {
"id": 33855454,
"sizeId": "xx-small",
"colorId": "900",
"totalQuantityAvailable": 38,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855455": {
"id": 33855455,
"sizeId": "plus-3 x",
"colorId": "900",
"totalQuantityAvailable": 56,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855456": {
"id": 33855456,
"sizeId": "plus-4 x",
"colorId": "900",
"totalQuantityAvailable": 67,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "900",
"value": "Bone",
"sizes": "_s:xx-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5858438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/5835a37b-e5c6-4bb9-9564-02f506ac745c.jpeg?crop=fit&w=31&h=31"
}
},
"33855464": {
"id": 33855464,
"sizeId": "x-small",
"colorId": "003",
"totalQuantityAvailable": 1,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855477": {
"id": 33855477,
"sizeId": "large",
"colorId": "003",
"totalQuantityAvailable": 720,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855478": {
"id": 33855478,
"sizeId": "x-large",
"colorId": "003",
"totalQuantityAvailable": 317,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855479": {
"id": 33855479,
"sizeId": "plus-2 x",
"colorId": "003",
"totalQuantityAvailable": 166,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855480": {
"id": 33855480,
"sizeId": "xx-small",
"colorId": "003",
"totalQuantityAvailable": 22,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855482": {
"id": 33855482,
"sizeId": "plus-3 x",
"colorId": "003",
"totalQuantityAvailable": 11,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"33855483": {
"id": 33855483,
"sizeId": "plus-4 x",
"colorId": "003",
"totalQuantityAvailable": 18,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "003",
"value": "Soot",
"sizes": "_s:xx-small|_s:x-small|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5859438
],
"swatch": "https://n.nordstrommedia.com/id/sr3/51d8a867-3627-4f76-88c2-5f3a6397ad2a.jpeg?crop=fit&w=31&h=31"
}
},
"36450158": {
"id": 36450158,
"sizeId": "medium",
"colorId": "053",
"totalQuantityAvailable": 241,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450160": {
"id": 36450160,
"sizeId": "large",
"colorId": "053",
"totalQuantityAvailable": 137,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450161": {
"id": 36450161,
"sizeId": "x-large",
"colorId": "053",
"totalQuantityAvailable": 69,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450162": {
"id": 36450162,
"sizeId": "plus-2 x",
"colorId": "053",
"totalQuantityAvailable": 40,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450163": {
"id": 36450163,
"sizeId": "xx-small",
"colorId": "053",
"totalQuantityAvailable": 16,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450164": {
"id": 36450164,
"sizeId": "plus-3 x",
"colorId": "053",
"totalQuantityAvailable": 23,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450165": {
"id": 36450165,
"sizeId": "plus-4 x",
"colorId": "053",
"totalQuantityAvailable": 27,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450185": {
"id": 36450185,
"sizeId": "x-small",
"colorId": "053",
"totalQuantityAvailable": 46,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"36450186": {
"id": 36450186,
"sizeId": "small",
"colorId": "053",
"totalQuantityAvailable": 197,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "053",
"value": "Light Heather Grey",
"sizes": "_s:xx-small|_s:x-small|_s:small|_s:medium|_s:large|_s:x-large|_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5847438,
5847448,
5847458,
5847468,
5847478,
5847488
],
"swatch": "https://n.nordstrommedia.com/id/sr3/9c98532a-adb9-4dad-b511-0ac149511a58.jpeg?crop=fit&w=31&h=31"
}
},
"38558224": {
"id": 38558224,
"sizeId": "plus-2 x",
"colorId": "446",
"totalQuantityAvailable": 22,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "446",
"value": "Kyanite",
"sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5854438,
5854448,
5854458
],
"swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
}
},
"38558226": {
"id": 38558226,
"sizeId": "plus-3 x",
"colorId": "446",
"totalQuantityAvailable": 5,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "446",
"value": "Kyanite",
"sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5854438,
5854448,
5854458
],
"swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
}
},
"38558227": {
"id": 38558227,
"sizeId": "plus-4 x",
"colorId": "446",
"totalQuantityAvailable": 7,
"price": {
"currencyCode": "USD",
"units": 48,
"nanos": 0
},
"color": {
"id": "446",
"value": "Kyanite",
"sizes": "_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|",
"mediaIds": [
5854438,
5854448,
5854458
],
"swatch": "https://n.nordstrommedia.com/id/sr3/2f637c12-349c-4506-9021-70e078f2ffe4.jpeg?crop=fit&w=31&h=31"
}
}
}
}
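The price objects in the output above store amounts as separate units and nanos fields. A minimal sketch of collapsing them into one decimal value follows. It assumes nanos counts billionths of a unit, which is the usual meaning of this units/nanos layout but cannot be confirmed from the sample, where every nanos value is 0. The money_to_decimal name is hypothetical.

```python
from decimal import Decimal

NANOS_PER_UNIT = Decimal(1_000_000_000)  # assumption: nanos are billionths of a unit


def money_to_decimal(price: dict) -> Decimal:
    """Combine whole `units` and fractional `nanos` into a single Decimal amount."""
    return Decimal(price["units"]) + Decimal(price["nanos"]) / NANOS_PER_UNIT


# price object copied from the example output above
sample = {"currencyCode": "USD", "units": 48, "nanos": 0}
print(sample["currencyCode"], money_to_decimal(sample))
```

Using Decimal rather than float avoids rounding drift when prices are later summed or compared.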
Scraping Nordstrom with Python
Now that we know why Nordstrom is valuable and how it delivers data, let’s put it into practice. We will bootstrap a Python project, extract hidden JSON from a single PDP, reshape it with JMESPath, then expand the same approach to search pagination. Every snippet can run with plain httpx during prototyping and with ScrapFly once you are ready for production monitoring.
Project Setup
For this scraper we'll be using the hidden web data scraping approach. We'll be collecting HTML pages and extracting hidden JSON datasets, then parsing them with JSON parsing tools:
- httpx - powerful HTTP client which we'll be using to retrieve the HTML pages.
- parsel - HTML parser which we'll be using to extract hidden JSON datasets.
- nested-lookup - JSON/Dict parser which will help us find specific keys in large JSON datasets.
- jmespath - JSON query engine which we'll be using to reduce JSON datasets to important bits like product prices, images etc. For more see our introduction to parsing JSON with JMESPath.
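All of these packages can be installed using Python's pip console command. The http2 and brotli extras of httpx are needed for the http2=True option and for decoding br-compressed responses used in the examples below:
$ pip install "httpx[http2,brotli]" parsel jmespath nested-lookup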
When instantiating httpx.AsyncClient, enable http2=True and send browser-grade headers such as User-Agent, Accept, and Accept-Language so Nordstrom treats the session like a real shopper. ScrapFly mirrors these headers automatically when you enable Anti Scraping Protection.
For Scrapfly users there's also a Scrapfly SDK version of each code example. The SDK can be installed using pip as well:
$ pip install "scrapfly-sdk[all]"
Scrape Nordstrom Product Data
Let's start by scraping product data of a single product. For this, let's take a look at an example product page like:
nordstrom.com/s/nike-phoenix-fleece-crewneck-sweatshirt/
We could parse the HTML data using CSS selectors or XPath, but since Nordstrom uses the React JavaScript framework to power its website, we can extract the dataset directly from the page source.
If we open the page source and search (Ctrl+F) for a unique piece of product text (like the description or title), we can see there's a hidden JSON dataset. In web scraping, this is called hidden web data scraping. Let's take a look at how to do this in Python.
Our scraper process will look something like this:
- Retrieve the HTML page of the product using httpx.
- Find the hidden JSON dataset in a <script> tag using parsel and XPath.
- Load the JSON dataset using json.loads() and find product fields using nested-lookup.
Here is a minimal async helper that performs those three steps and returns a cleaned dictionary for further processing:
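import json
from nested_lookup import nested_lookup
from parsel import Selector

async def fetch_product(session, url: str) -> dict:
    response = await session.get(url)
    data = Selector(response.text).xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    # strip the JavaScript assignment prefix and trailing semicolon, then load the JSON
    payload = json.loads(data.split("=", 1)[-1].strip().strip(";"))
    # the first "stylesById" entry holds the current product
    product = nested_lookup("stylesById", payload)[0]
    return next(iter(product.values()))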
This helper expects an httpx.AsyncClient (or any async client exposing a compatible get() method) and outputs the exact product object the later parsing step will reduce.
In Python this scraper will look like this:
import asyncio
import json

import httpx
from nested_lookup import nested_lookup
from parsel import Selector

# setup httpx client with http2 enabled and browser-like headers to avoid being blocked:
client = httpx.AsyncClient(
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = Selector(html).xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    # drop the JavaScript assignment prefix and trailing semicolon, leaving raw JSON
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


async def scrape_product(url: str):
    """scrape Nordstrom.com product page for product data"""
    response = await client.get(url)
    # find all hidden dataset:
    data = find_hidden_data(response.text)
    # extract only product data from the dataset
    # find first key "stylesById" and take first value (which is the current product)
    product = nested_lookup("stylesById", data)
    product = list(product[0].values())[0]
    return product


# example scrape run:
print(asyncio.run(scrape_product("https://www.nordstrom.com/s/phoenix-fleece-crewneck-sweatshirt/6665302")))
# ScrapFly SDK version of the same scraper:
import asyncio
import json

from nested_lookup import nested_lookup
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="YOUR SCRAPFLY API KEY")


def find_hidden_data(response: ScrapeApiResponse) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = response.selector.xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


async def scrape_product(url: str):
    response = await SCRAPFLY.async_scrape(ScrapeConfig(
        url=url,
        asp=True,  # enable anti-scraping-protection bypass
        cache=True,  # enable cache while we develop
        debug=True,  # enable debug mode while we develop
    ))
    # find all hidden dataset:
    data = find_hidden_data(response)
    # extract only product data from the dataset
    # find first key "stylesById" and take first value (which is the current product)
    product = nested_lookup("stylesById", data)
    product = list(product[0].values())[0]
    return product


# example scrape run:
print(asyncio.run(scrape_product("https://www.nordstrom.com/s/phoenix-fleece-crewneck-sweatshirt/6665302")))
🙋 Note that Nordstrom can detect the HTTP requests and return the wrong HTML, which breaks the parsing logic. We recommend running the Scrapfly code tabs to bypass Nordstrom scraping blocking.
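When a blocked response comes back, the hidden dataset is simply missing and the parser fails with an unhelpful error. The sketch below shows one way to fail loudly instead. To stay dependency-free it uses a standard-library regular expression rather than the parsel XPath from the tutorial, and both extract_initial_config and HiddenDataNotFound are hypothetical names.

```python
import json
import re


class HiddenDataNotFound(Exception):
    """Raised when the page lacks the expected hidden JSON dataset."""


# capture the body of every <script> tag in the document
_SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_initial_config(html: str) -> dict:
    """Return the JSON assigned to __INITIAL_CONFIG__, or raise a clear error."""
    for body in _SCRIPT_RE.findall(html):
        if "__INITIAL_CONFIG__" not in body:
            continue
        # same cleanup as in the tutorial: drop the assignment and trailing semicolon
        raw = body.split("=", 1)[-1].strip().strip(";")
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            raise HiddenDataNotFound("hidden dataset found but is not valid JSON") from exc
    raise HiddenDataNotFound("no __INITIAL_CONFIG__ script found; the response may be a block page")
```

Catching HiddenDataNotFound in the calling code makes it possible to retry or skip a page rather than crash the whole crawl.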
In only a few lines of Python code, we got the entire product dataset on Nordstrom! However, this dataset is huge and can be difficult to ingest by our data pipeline if we were to do some analytics or data storage. So next, let's use JMESPath to reduce the dataset to the most important values like pricing, images and variant data.
Parsing Nordstrom Data with JMESPath
JMESPath is a JSON query language, and since Python dictionaries are equivalent to JSON objects, we can use JMESPath in our Nordstrom data parsing.
We'll be using JMESPath data reshaping feature which allows specifying a key map to reduce a dataset. For example:
import jmespath
data = {
    "id": "123456",
    "productTitle": "Product Title",
    "type": "sweater",
    "unimportant": "foobar",
    "photos": {
        "desktop": "http://example.com/photo.jpg",
        "mobile": "http://example.com/photo-small.jpg",
    },
}
# jmespath search takes a query string and a data object.
# here we use `{}` remapping feature to rename keys of the original dataset
reduced = jmespath.search(
    """{
    id: id,
    title: productTitle,
    type: type,
    photo: photos.desktop
    }""",
    data,
)
print(reduced)
{"id": "123456", "title": "Product Title", "type": "sweater", "photo": "http://example.com/photo.jpg"}
This powerful tool allows us to easily reshape scraped datasets. So, let's use it to reshape our Nordstrom product dataset we just scraped:
import jmespath
def parse_product(data: dict) -> dict:
    # parse product basic data like id, name, features etc.
    product = jmespath.search(
        """{
        id: id,
        title: productTitle,
        type: productTypeName,
        typeParent: productTypeParentName,
        ageGroups: ageGroups,
        reviewAverageRating: reviewAverageRating,
        numberOfReviews: numberOfReviews,
        brand: brand,
        description: sellingStatement,
        features: features,
        gender: gender,
        isAvailable: isAvailable
        }""",
        data,
    )
    # product variants have their own colors, prices and photos:
    prices_by_sku = data["price"]["bySkuId"] if data["price"] else None
    colors_by_id = data["filters"]["color"]["byId"]
    product["media"] = []
    for media_item in data["mediaExperiences"]["carouselsByColor"]:
        item = jmespath.search(
            """{
            colorCode: colorCode,
            colorName: colorName
            }""",
            media_item,
        )
        item["urls"] = [i["url"] for i in media_item["orderedShots"]]
        product["media"].append(item)
    # Each product has SKUs (Stock Keeping Units) which are the actual variants:
    product["variants"] = {}
    for sku, sku_data in data["skus"]["byId"].items():
        # get basic variant data
        parsed = jmespath.search(
            """{
            id: id,
            sizeId: sizeId,
            colorId: colorId,
            totalQuantityAvailable: totalQuantityAvailable
            }""",
            sku_data,
        )
        # get variant price from the per-SKU price lookup (None when no price data exists)
        parsed["price"] = prices_by_sku[sku]["regular"]["price"] if prices_by_sku else None
        # get variant color data
        parsed["color"] = jmespath.search(
            """{
            id: id,
            value: value,
            sizes: isAvailableWith,
            mediaIds: styleMediaIds,
            swatch: swatchMedia.desktop
            }""",
            colors_by_id[parsed["colorId"]],
        )
        product["variants"][sku] = parsed
    return product
This might appear complex but all we did is map the original dataset keys to new keys using JMESPath. Now our scraper can scrape nice and tidy product datasets that we can easily ingest into our data pipelines!
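The parsed variants still carry sizes as a pipe-delimited string and report stock per SKU. As a rough sketch, the hypothetical helpers below (split_sizes and stock_by_color, neither part of the original scraper) turn the size string into a list and sum totalQuantityAvailable per color. They assume the variant shape shown in the example output earlier.

```python
from collections import defaultdict
from typing import Dict, List


def split_sizes(raw: str) -> List[str]:
    """Turn '_s:small|_s:medium|' into ['small', 'medium']."""
    sizes = []
    for part in raw.split("|"):
        if not part:
            continue  # the trailing "|" leaves an empty last element
        sizes.append(part[3:] if part.startswith("_s:") else part)
    return sizes


def stock_by_color(variants: Dict[str, dict]) -> Dict[str, int]:
    """Sum the available quantity of every SKU, grouped by color name."""
    totals: Dict[str, int] = defaultdict(int)
    for variant in variants.values():
        totals[variant["color"]["value"]] += variant["totalQuantityAvailable"] or 0
    return dict(totals)


# trimmed-down variants taken from the example output
variants = {
    "36450185": {"totalQuantityAvailable": 46, "color": {"value": "Light Heather Grey"}},
    "36450186": {"totalQuantityAvailable": 197, "color": {"value": "Light Heather Grey"}},
    "38558224": {"totalQuantityAvailable": 22, "color": {"value": "Kyanite"}},
}
print(split_sizes("_s:plus-2 x|_s:plus-3 x|_s:plus-4 x|"))
print(stock_by_color(variants))
```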
Finding Nordstrom Products
Now that we can scrape individual Nordstrom products, we need to find the product URLs to scrape. We could find desired products and input their URLs manually, but to scale up our scraper we can scrape product categories or search results instead.
For this, we'll be using the same hidden data scraping approach as each category or search result page contains a hidden dataset with product preview data (like price, title, image, etc.) and product page URLs.
For example, let's take a look at one of Nordstrom search pages:
nordstrom.com/sr?origin=keywordsearch&keyword=indigo
We can see that the results of every search or category are split across several pages. So, we need to scrape pagination as well.
To scrape this, we'll be using an approach very similar to the one we used for product pages:
- Scrape the first search/category page HTML.
- Find hidden web data using parsel and XPath.
- Extract product preview data and pagination info from the hidden dataset using nested-lookup.
- Calculate the total number of pages and scrape them.
Let's see how this works in Python:
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse
import httpx
from nested_lookup import nested_lookup
from parsel import Selector
# setup httpx client with http2 enabled and browser-like headers to avoid being blocked:
client = httpx.AsyncClient(
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = Selector(html).xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    parsed = urlparse(url)
    current_params = parse_qs(parsed.query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    # rebuild the URL from its parsed parts so URLs without a "?" are handled too
    return parsed._replace(query=updated_query_params).geturl()


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    """Scrape Nordstrom search or category url for product preview data"""
    print(f"scraping first search page: {url}")
    first_page = await client.get(url)
    # parse first page for product search data and total amount of pages:
    data = find_hidden_data(first_page.text)
    _first_page_results = nested_lookup("productResults", data)[0]
    products = list(_first_page_results["productsById"].values())
    paging_info = _first_page_results["query"]
    total_pages = paging_info["pageCount"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    # then scrape other pages concurrently:
    print(f" scraping remaining {total_pages - 1} search pages")
    _other_pages = [client.get(update_url_parameter(url, page=page)) for page in range(2, total_pages + 1)]
    for response in asyncio.as_completed(_other_pages):
        response = await response
        # skip pages that did not return a successful response
        if response.status_code != 200:
            print(f"!!! scrape page {response.url} got blocked; skipping")
            continue
        data = find_hidden_data(response.text)
        data = nested_lookup("productResults", data)[0]
        products.extend(list(data["productsById"].values()))
    return products


# example scrape run for search of "indigo" keyword with max 2 pages:
print(asyncio.run(scrape_search("https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo", max_pages=2)))
# ScrapFly SDK version of the same scraper:
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

from nested_lookup import nested_lookup
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="YOUR SCRAPFLY API KEY")


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden web cache from page html"""
    # use XPath to find script tag with data
    data = result.selector.xpath("//script[contains(.,'__INITIAL_CONFIG__')]/text()").get()
    data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data


def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    parsed = urlparse(url)
    current_params = parse_qs(parsed.query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    # rebuild the URL from its parsed parts so URLs without a "?" are handled too
    return parsed._replace(query=updated_query_params).geturl()


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    """Scrape Nordstrom search or category url for product preview data"""
    print(f"scraping first search page: {url}")
    first_page = await SCRAPFLY.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
            debug=True,
            cache=True,
        )
    )
    # parse first page for product search data and total amount of pages:
    data = find_hidden_data(first_page)
    _first_page_results = nested_lookup("productResults", data)[0]
    products = list(_first_page_results["productsById"].values())
    paging_info = _first_page_results["query"]
    total_pages = paging_info["pageCount"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    # then scrape other pages concurrently:
    print(f" scraping remaining {total_pages - 1} search pages")
    _other_pages = [
        ScrapeConfig(
            url=update_url_parameter(url, page=page),
            country="US",
            asp=True,
        )
        for page in range(2, total_pages + 1)
    ]
    async for result in SCRAPFLY.concurrent_scrape(_other_pages):
        data = find_hidden_data(result)
        data = nested_lookup("productResults", data)[0]
        products.extend(list(data["productsById"].values()))
    return products


# example scrape run for search of "indigo" keyword with max 2 pages:
print(asyncio.run(scrape_search("https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo", max_pages=2)))
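The page arithmetic used by the search scraper can be checked in isolation, without any network calls. The sketch below is a standard-library illustration; build_page_urls is a hypothetical helper, and it assumes, as the scraper above does, that the page number is passed through a page query parameter and that the first page is already fetched.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def build_page_urls(url: str, page_count: int, max_pages: Optional[int] = None) -> List[str]:
    """Return URLs for pages 2..N, where N is page_count capped by max_pages."""
    last_page = min(page_count, max_pages) if max_pages else page_count
    parsed = urlparse(url)
    urls = []
    for page in range(2, last_page + 1):
        params = parse_qs(parsed.query)  # existing parameters, values as lists
        params["page"] = [str(page)]  # add or overwrite the page number
        urls.append(parsed._replace(query=urlencode(params, doseq=True)).geturl())
    return urls


search_url = "https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo"
for page_url in build_page_urls(search_url, page_count=7, max_pages=3):
    print(page_url)
```

Capping with max_pages before building the list keeps a test run from scheduling every page that pageCount reports.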
Bypass Nordstrom Blocking with ScrapFly
Nordstrom is somewhat notorious for blocking web scraping, so to scale up check out Scrapfly!
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and Typescript SDKs, as well as Scrapy and no-code tool integrations.
For example, we can scrape Nordstrom using the Python SDK:
from scrapfly import ScrapeConfig, ScrapflyClient
client = ScrapflyClient(key="YOUR_SCRAPFLY_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://www.nordstrom.com/sr?origin=keywordsearch&keyword=indigo",
    # enable scraper blocking service bypass
    asp=True,
    # optional - render javascript using headless browsers:
    render_js=True,
))
print(result.content)
FAQs
Here are some frequently asked questions about scraping data from Nordstrom.
Why does Nordstrom return empty JSON data when scraping product pages?
Nordstrom may return empty JSON data due to anti-bot detection, rate limiting, or IP blocking. Use residential proxies, implement delays between requests, rotate user agents, and consider using anti-scraping protection services like ScrapFly to bypass these restrictions.
How do I handle Nordstrom's anti-bot protection when scraping at scale?
Use residential proxies with IP rotation, implement session management, add realistic delays between requests, use browser-like headers and user agents, and consider using ScrapFly's anti-scraping protection to handle CAPTCHAs and advanced bot detection automatically.
What's the difference between scraping Nordstrom's HTML vs hidden JSON data?
HTML scraping requires parsing rendered content with selectors and is more fragile to layout changes. Hidden JSON data contains structured product information directly from the backend, is more reliable, faster to parse, and provides more complete data including variants, pricing, and inventory information.
How often can I scrape Nordstrom without getting blocked or rate limited?
Start with 1-2 requests per second and monitor for blocking. Implement exponential backoff on errors, use proxy rotation, and respect robots.txt. For large-scale scraping, consider using ScrapFly's anti-scraping protection to handle rate limiting and blocking automatically.
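The backoff advice above can be sketched with the standard library alone. The hypothetical retry_with_backoff helper below retries any async callable, doubling the wait after each failure and adding jitter. The retry count and delays are illustrative defaults, not limits that Nordstrom documents.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Call `fetch` until it succeeds, waiting longer after each failure."""
    attempt = 0
    while True:
        try:
            return await fetch()
        except Exception:  # broad on purpose for the sketch; narrow this in real code
            if attempt >= retries:
                raise  # out of retries: surface the last error
            # exponential growth (1x, 2x, 4x, ...) capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # jitter spreads out retries from concurrent tasks
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1


# usage, assuming a scrape_product coroutine like the one in this guide:
# product = await retry_with_backoff(lambda: scrape_product(url))
```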
Do I need residential proxies to scrape Nordstrom product data reliably?
While not always required, residential proxies significantly improve success rates for Nordstrom scraping. They help avoid IP-based blocking and provide more realistic traffic patterns. For high-volume scraping, residential proxies or anti-scraping protection services are recommended.
Can Nordstrom be crawled?
Yes. Like many e-commerce websites, Nordstrom lends itself to web crawling as it has many product references throughout the website. Note that crawling is significantly more resource intensive than the direct web scraping we've covered in this tutorial, so it's not recommended. Related: What's the difference between Web Scraping and Crawling?
Summary
In this web scraping guide we've taken a look at how to scrape Nordstrom - a popular fashion e-commerce store.
For this, we used Python with httpx, parsel, nested-lookup and jmespath and the hidden web data scraping approach. We've collected HTML pages and extracted hidden React framework data to find product data fields with just a few lines of Python code.
To avoid blocking, we've taken a look at ScrapFly - a web scraping API that can be used to scale up web scrapers and avoid being blocked. Try it out for free!
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect. Here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose entire public datasets, which can be illegal in some countries.