Looking to unlock premium market insights from one of the world’s leading luxury resale platforms? Fashionphile blends meticulous curation with a modern storefront, placing a treasure trove of clean, structured product data right at your fingertips if you know where to look.
In this walkthrough we break down the Fashionphile scraping flow in plain language and include ready-to-run Python code so you can follow along without guessing.
Key Takeaways
- Capture Fashionphile data faster by reading the hidden JSON payload that powers each Next.js page
- Collect product titles, prices, conditions, measurements, and media in a single request per page
- Add pagination, query management, and rate controls to cover every sale or category page
- Avoid blocks with realistic headers, gentle concurrency, and ScrapFly's managed anti-bot features
- Validate results and handle errors early so large fashion data runs stay predictable
Quick Start
If you just need a working scraper, clone the maintained Fashionphile scraper that ships with ScrapFly ready settings:
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/fashionphile-scraper
This repository contains an up-to-date scraper with ScrapFly configuration, HTTP client best practices, and parsing helpers so you can run a production-ready crawl with minimal setup.
What is Fashionphile
Fashionphile is a reseller that focuses on luxury bags, shoes, jewelry, and accessories. Each product page contains a detailed write up, precise measurements, condition grades, and dozens of media assets. Because the storefront is built with Next.js, all of that content is delivered inside a hidden __NEXT_DATA__ script tag that is trivial to parse.
Why Fashionphile
Fashionphile covers thousands of premium items with accurate pricing history, discount tags, and availability notes, which makes it ideal for:
- Competitive tracking for luxury brands and retailers
- Market research on secondary pricing and demand
- Portfolio monitoring for investors in fashion goods
- Trend reports that need trustworthy descriptions and media
The site keeps a consistent layout and publishes data in JSON, so automation work stays light compared to marketplaces that require heavy HTML parsing or browser automation.
Challenges in Scraping Fashionphile
While Fashionphile is easier to scrape than many marketplaces, it still presents some unique challenges. Let’s dive into each one:
Extracting the Hidden JSON
Fashionphile pages rely on a hidden JSON blob contained in a __NEXT_DATA__ script tag. If you don’t extract and parse this correctly, you’ll end up with just the basic HTML shell, missing all product information. Always make sure your scraper selects the proper script tag and parses its JSON content to get the real data.
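A minimal sketch of that extraction step with parsel (the same approach the full scrapers below use) raises early when the tag is missing instead of silently returning empty results:
import json
from parsel import Selector

def find_hidden_data(html: str) -> dict:
    # Select the Next.js hydration script; if it is missing we only received the HTML shell
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    if data is None:
        raise ValueError("no __NEXT_DATA__ tag found - page was likely blocked or redesigned")
    return json.loads(data)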
Handling Pagination with Query Parameters
Pagination on Fashionphile uses query parameters, such as page=2. If you aren’t careful when updating these URLs, you may accidentally request duplicate pages or skip content. Attention to URL construction and tracking the current page are key to reliable crawling.
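A small helper built on urllib.parse keeps that bookkeeping reliable; this is the same update_url_parameter helper the full crawlers use later:
from urllib.parse import parse_qs, urlencode, urlparse

def update_url_parameter(url: str, **params) -> str:
    # Merge new parameters over the existing query string so page=2 replaces page=1
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

# e.g. update_url_parameter("https://www.fashionphile.com/shop/discounted/all?page=1", page=2)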
Managing Rate Limits
Sending too many requests in rapid succession can quickly hit Fashionphile’s rate limits. When this happens, you might receive HTTP 403 errors or empty responses. To avoid this, add small delays between requests and limit concurrent connections to stay under the radar.
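As a sketch, an asyncio semaphore plus a short sleep is enough to cap concurrency and pace requests (the one-second delay here is an assumption to tune, not a documented limit):
import asyncio
import httpx

client = httpx.AsyncClient(http2=True, follow_redirects=True)
semaphore = asyncio.Semaphore(3)  # allow at most three requests in flight

async def polite_get(url: str) -> httpx.Response:
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(1.0)  # brief pause keeps the request rate gentle
        return response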
Navigating Anti-Bot Systems
Fashionphile employs anti-bot protections that pay attention to unusual request headers and network fingerprints (like TLS signatures), especially on category pages. To reduce the risk of being blocked, use realistic browser headers, consider rotating proxies if you’re running at scale, and optionally use managed solutions like ScrapFly that handle these nuances automatically.
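In practice that starts with a coherent browser header set and HTTP/2, as in this sketch (the Chrome User-Agent string is just an example to keep current):
import httpx

client = httpx.AsyncClient(
    http2=True,  # HTTP/2 better matches modern browser network fingerprints
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
)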
By addressing these four areas, you’ll maximize your chances of scraping Fashionphile consistently and at scale.
Fashionphile Scrape Preview
The final dataset is a JSON document that mirrors the content users see on the site. Here is a trimmed example from a pair of sandals to highlight the structure:
Example Fashionphile Product Dataset
{
  "id": 1048096,
  "sku": "BW",
  "title": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
  "slug": "/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
  "price": 950,
  "retailPrice": 1650,
  "discountedPrice": 900,
  "condition": "Excellent",
  "conditions": [
    "scuffs",
    "imprints",
    "marks on sole(s)"
  ],
  "brand": [
    {
      "id": 89,
      "name": "Bottega Veneta"
    }
  ],
  "measurements": [
    {
      "type": "size",
      "unit": "EU",
      "value": 36
    },
    {
      "type": "heel",
      "unit": "in",
      "value": 4
    }
  ],
  "shipsWith": "2 dust bags, box",
  "color": "Black",
  "featuredImage": {
    "large": "https://prod-images.fashionphile.com/large/...jpg",
    "main": "https://prod-images.fashionphile.com/main/...jpg",
    "thumb": "https://prod-images.fashionphile.com/thumb/...jpg"
  },
  "conditionsText": "scuffs, imprints, marks on sole(s)",
  "url": "https://apigateway.fashionphile.com/product/1048096"
}
Project Setup
We only need a couple of Python packages to scrape the hidden JSON blobs: an HTTP client and a simple HTML selector library.
- httpx for HTTP/2 requests with browser-style headers
- parsel for CSS selectors that pull out the __NEXT_DATA__ element
Install them with pip:
$ pip install httpx parsel
ScrapFly users can also install the SDK to get anti-bot helpers and result caching in a single client:
$ pip install "scrapfly-sdk[all]"
Scrape Fashionphile Product Data
Let us start with a single product page inside the discount section:
fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096
Modern frameworks place the product payload inside the page HTML. You can open the source view, search for the SKU, and spot a script tag with id="__NEXT_DATA__". Parsing that JSON is faster than traversing the rendered HTML tree.
Here is a minimal Python version that pulls the product block in a few lines:
import asyncio
import json

import httpx
from parsel import Selector

# Browser-like headers and HTTP/2 reduce the chance of basic blocking
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html: str) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    response = await client.get(url)
    data = find_hidden_data(response.text)
    # The product block lives under the Next.js page state reducer
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    print(asyncio.run(scrape_product("https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096")))
ScrapFly users can run the same logic through the SDK, which adds anti-bot bypass and response caching:
import asyncio
import json

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            cache=True,  # cache responses while developing
            asp=True,  # enable the anti scraping protection bypass
        )
    )
    data = find_hidden_data(result)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    print(
        asyncio.run(
            scrape_product(
                "https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096"
            )
        )
    )
Scrape Fashionphile Search and Categories
Product discovery pages reuse the exact same hidden JSON idea. Every search, sale, or category page includes the results list, pagination info, and metadata. The only extra work is flipping through each page by updating the page query parameter.
Simple pagination plan:
- Fetch the first page and parse its hidden JSON
- Grab the current hits and find the total page count
- Queue page numbers two through nbPages
- Request the rest in small batches and extend the hits list
Below is a Python example that limits concurrency to three active HTTP connections. You can raise or lower that number depending on how friendly you want to be.
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

import httpx
from parsel import Selector

# Cap concurrency at three connections to stay gentle on the server
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    limits=httpx.Limits(max_connections=3),
)


def find_hidden_data(html: str) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    """Merge new query parameters into a URL, e.g. page=2."""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 3) -> List[Dict]:
    # Fetch the first page to collect the initial hits and the total page count
    first_page = await client.get(url)
    dataset = find_hidden_data(first_page.text)
    first_results = dataset["props"]["pageProps"]["serverState"]["initialResults"][
        "prod_ecom_products_date_desc"
    ]["results"][0]
    hits = first_results["hits"]
    total_pages = first_results["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    async def scrape_remaining(page: int):
        response = await client.get(update_url_parameter(url, page=page))
        data = find_hidden_data(response.text)
        block = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
        return block["hits"]

    # Scrape the remaining pages concurrently and merge their hits
    tasks = [scrape_remaining(page) for page in range(2, total_pages + 1)]
    for task in asyncio.as_completed(tasks):
        hits.extend(await task)
    return hits


if __name__ == "__main__":
    results = asyncio.run(scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3))
    print(len(results))
The same crawl through the ScrapFly SDK, which handles anti-bot bypass and concurrency for you:
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")
BASE_CONFIG = {"asp": True, "cache": True}  # anti-bot bypass plus response caching


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    """Merge new query parameters into a URL, e.g. page=2."""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    first_page = await scrapfly.async_scrape(ScrapeConfig(url, **BASE_CONFIG))
    dataset = find_hidden_data(first_page)
    block = dataset["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
    hits = block["hits"]
    total_pages = block["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    # Queue the remaining pages and let ScrapFly manage the concurrency
    to_scrape = [ScrapeConfig(update_url_parameter(url, page=page), **BASE_CONFIG) for page in range(2, total_pages + 1)]
    async for result in scrapfly.concurrent_scrape(to_scrape):
        data = find_hidden_data(result)
        page_hits = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]["hits"]
        hits.extend(page_hits)
    return hits


if __name__ == "__main__":
    results = asyncio.run(scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3))
    print(len(results))
Bypass Fashionphile Blocks with ScrapFly
To scale beyond a handful of pages you will need light anti-detection work. ScrapFly wraps these steps into the API so you can focus on parsing results while the platform rotates proxies, manages TLS fingerprints, and retries flaky responses.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and Typescript SDKs, as well as Scrapy and no-code tool integrations.
Quick SDK sample:
from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(
    ScrapeConfig(
        url="https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
        asp=True,  # enable the anti scraping protection bypass
        render_js=True,  # render the page through a cloud browser
    )
)
print(result.content)
FAQs
Now let's take a look at some frequently asked questions about Fashionphile.
How do I extract product data from Fashionphile's __NEXT_DATA__ script element?
Select the script#__NEXT_DATA__ tag with CSS or XPath, read its text, and parse the JSON with json.loads(). Product data usually lives under props.pageProps.initialState.productPageReducer.productData or a similar path.
What should I do if Fashionphile blocks my scraper or returns 403 errors?
Use rotating residential or mobile proxies, send real browser headers, add short delays, and fall back to headless browsers when needed. ScrapFly can also take care of anti-bot handling automatically.
How can I scrape Fashionphile at scale without getting rate limited?
Throttle requests to one or two per second, reuse HTTP/2 connections, rotate proxies, and respect robots.txt. ScrapFly provides adaptive throttling if you prefer managed limits.
Why is scraping hidden web data faster than using Selenium for Fashionphile?
Hidden web data parsing works with raw JSON and does not spin up a browser. That keeps memory low and speeds up each request by 10 to 100 times compared to running Selenium for static content.
How do I handle pagination when scraping all Fashionphile product listings?
Inspect the query string for a page parameter, loop through the range of nbPages, and call the same hidden data parser for every response. Merge the hits into a single list or push them into your database as you go.
Can Fashionphile be crawled?
Yes. You can walk through category pages, related products, and public sitemaps. Hidden data scraping keeps the crawl fast since each page returns machine-friendly JSON.
Summary
In this guide, we showed how Fashionphile makes detailed product data available in structured JSON on every page, and demonstrated how to extract it easily using Python or the ScrapFly SDK.
Once you know how to parse a single product, you can extend the same method to handle multiple pages by looping over the page query parameter for full pagination. We also discussed easy anti-blocking strategies and how ScrapFly can automate these protections, making large, stable scraping projects possible.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose the entire public datasets which can be illegal in some countries.