Looking to unlock premium market insights from one of the world’s leading luxury resale platforms? Fashionphile blends meticulous curation with a modern storefront, placing a treasure trove of clean, structured product data right at your fingertips if you know where to look.
In this walkthrough we break down the Fashionphile scraping flow in plain language and include ready-to-run Python code so you can follow along without guessing.
Key Takeaways
- Capture Fashionphile data faster by reading the hidden JSON payload that powers each Next.js page
- Collect product titles, prices, conditions, measurements, and media in a single request per page
- Add pagination, query management, and rate controls to cover every sale or category page
- Avoid blocks with realistic headers, gentle concurrency, and ScrapFly's managed anti-bot features
- Validate results and handle errors early so large fashion data runs stay predictable
Quick Start
If you just need a working scraper, clone the maintained Fashionphile scraper that ships with ScrapFly ready settings:
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/fashionphile-scraper
This repository contains an up-to-date scraper with ScrapFly configuration, HTTP client best practices, and parsing helpers so you can run a production-ready crawl with minimal setup.
What is Fashionphile
Fashionphile is a reseller that focuses on luxury bags, shoes, jewelry, and accessories. Each product page contains a detailed write up, precise measurements, condition grades, and dozens of media assets. Because the storefront is built with Next.js, all of that content is delivered inside a hidden __NEXT_DATA__ script tag that is trivial to parse.
Why Fashionphile
Fashionphile covers thousands of premium items with accurate pricing history, discount tags, and availability notes, which makes it ideal for:
- Competitive tracking for luxury brands and retailers
- Market research on secondary pricing and demand
- Portfolio monitoring for investors in fashion goods
- Trend reports that need trustworthy descriptions and media
The site keeps a consistent layout and publishes data in JSON, so automation work stays light compared to marketplaces that require heavy HTML parsing or browser automation.
Challenges in Scraping Fashionphile
While Fashionphile is easier to scrape than many marketplaces, it still presents some unique challenges. Let’s dive into each one:
Extracting the Hidden JSON
Fashionphile pages rely on a hidden JSON blob contained in a __NEXT_DATA__ script tag. If you don’t extract and parse this correctly, you’ll end up with just the basic HTML shell, missing all product information. Always make sure your scraper selects the proper script tag and parses its JSON content to get the real data.
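A minimal sketch of that extraction step with parsel (the same approach the full scrapers below use) raises early when the tag is missing instead of silently returning empty results:
import json
from parsel import Selector

def find_hidden_data(html: str) -> dict:
    # Select the Next.js hydration script; if it is missing we only received the HTML shell
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    if data is None:
        raise ValueError("no __NEXT_DATA__ tag found - page was likely blocked or redesigned")
    return json.loads(data)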
Handling Pagination with Query Parameters
Pagination on Fashionphile uses query parameters, such as page=2. If you aren’t careful when updating these URLs, you may accidentally request duplicate pages or skip content. Attention to URL construction and tracking the current page are key to reliable crawling.
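A small helper built on urllib.parse keeps that bookkeeping reliable; this is the same update_url_parameter helper the full crawlers use later:
from urllib.parse import parse_qs, urlencode, urlparse

def update_url_parameter(url: str, **params) -> str:
    # Merge new parameters over the existing query string so page=2 replaces page=1
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

# e.g. update_url_parameter("https://www.fashionphile.com/shop/discounted/all?page=1", page=2)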
Managing Rate Limits
Sending too many requests in rapid succession can quickly hit Fashionphile’s rate limits. When this happens, you might receive HTTP 403 errors or empty responses. To avoid this, add small delays between requests and limit concurrent connections to stay under the radar.
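As a sketch, an asyncio semaphore plus a short sleep is enough to cap concurrency and pace requests (the one-second delay here is an assumption to tune, not a documented limit):
import asyncio
import httpx

client = httpx.AsyncClient(http2=True, follow_redirects=True)
semaphore = asyncio.Semaphore(3)  # allow at most three requests in flight

async def polite_get(url: str) -> httpx.Response:
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(1.0)  # brief pause keeps the request rate gentle
        return response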
Navigating Anti-Bot Systems
Fashionphile employs anti-bot protections that pay attention to unusual request headers and network fingerprints (like TLS signatures), especially on category pages. To reduce the risk of being blocked, use realistic browser headers, consider rotating proxies if you’re running at scale, and optionally use managed solutions like ScrapFly that handle these nuances automatically.
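In practice that starts with a coherent browser header set and HTTP/2, as in this sketch (the Chrome User-Agent string is just an example to keep current):
import httpx

client = httpx.AsyncClient(
    http2=True,  # HTTP/2 better matches modern browser network fingerprints
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
)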
By addressing these four areas, you’ll maximize your chances of scraping Fashionphile consistently and at scale.
Fashionphile Scrape Preview
The final dataset is a JSON document that mirrors the content users see on the site. Here is a trimmed example from a pair of sandals to highlight the structure:
Example Fashionphile Product Dataset
{
  "id": 1048096,
  "sku": "BW",
  "title": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
  "slug": "/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
  "price": 950,
  "retailPrice": 1650,
  "discountedPrice": 900,
  "condition": "Excellent",
  "conditions": [
    "scuffs",
    "imprints",
    "marks on sole(s)"
  ],
  "brand": [
    {
      "id": 89,
      "name": "Bottega Veneta"
    }
  ],
  "measurements": [
    {
      "type": "size",
      "unit": "EU",
      "value": 36
    },
    {
      "type": "heel",
      "unit": "in",
      "value": 4
    }
  ],
  "shipsWith": "2 dust bags, box",
  "color": "Black",
  "featuredImage": {
    "large": "https://prod-images.fashionphile.com/large/...jpg",
    "main": "https://prod-images.fashionphile.com/main/...jpg",
    "thumb": "https://prod-images.fashionphile.com/thumb/...jpg"
  },
  "conditionsText": "scuffs, imprints, marks on sole(s)",
  "url": "https://apigateway.fashionphile.com/product/1048096"
}
Project Setup
We only need a couple of Python packages to scrape the hidden JSON blobs: an HTTP client and a simple HTML selector library.
- httpx for HTTP/2 requests with browser-style headers
- parsel for CSS selectors that pull out the __NEXT_DATA__ element
Install them with pip:
$ pip install httpx parsel
ScrapFly users can also install the SDK to get anti-bot helpers and result caching in a single client:
$ pip install "scrapfly-sdk[all]"
Scrape Fashionphile Product Data
Let us start with a single product page inside the discount section:
fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096
Modern frameworks place the product payload inside the page HTML. You can open the source view, search for the SKU, and spot a script tag with id="__NEXT_DATA__". Parsing that JSON is faster than traversing the rendered HTML tree.
Here is a minimal Python version that pulls the product block in a few lines:
import asyncio
import json

import httpx
from parsel import Selector

# Browser-like headers and HTTP/2 reduce the chance of basic blocking
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html: str) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    response = await client.get(url)
    data = find_hidden_data(response.text)
    # The product block lives under the Next.js page state reducer
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    print(asyncio.run(scrape_product("https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096")))
ScrapFly users can run the same logic through the SDK, which adds anti-bot bypass and response caching:
import asyncio
import json

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            cache=True,  # cache responses while developing
            asp=True,  # enable the anti scraping protection bypass
        )
    )
    data = find_hidden_data(result)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    print(
        asyncio.run(
            scrape_product(
                "https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096"
            )
        )
    )
Scrape Fashionphile Search and Categories
Product discovery pages reuse the exact same hidden JSON idea. Every search, sale, or category page includes the results list, pagination info, and metadata. The only extra work is flipping through each page by updating the page query parameter.
Simple pagination plan:
- Fetch the first page and parse its hidden JSON
- Grab the current hits and find the total page count
- Queue page numbers two through nbPages
- Request the rest in small batches and extend the hits list
Below is a Python example that limits concurrency to three active HTTP connections. You can raise or lower that number depending on how friendly you want to be.
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

import httpx
from parsel import Selector

# Cap concurrency at three connections to stay gentle on the server
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    limits=httpx.Limits(max_connections=3),
)


def find_hidden_data(html: str) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    """Merge new query parameters into a URL, e.g. page=2."""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 3) -> List[Dict]:
    # Fetch the first page to collect the initial hits and the total page count
    first_page = await client.get(url)
    dataset = find_hidden_data(first_page.text)
    first_results = dataset["props"]["pageProps"]["serverState"]["initialResults"][
        "prod_ecom_products_date_desc"
    ]["results"][0]
    hits = first_results["hits"]
    total_pages = first_results["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    async def scrape_remaining(page: int):
        response = await client.get(update_url_parameter(url, page=page))
        data = find_hidden_data(response.text)
        block = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
        return block["hits"]

    # Scrape the remaining pages concurrently and merge their hits
    tasks = [scrape_remaining(page) for page in range(2, total_pages + 1)]
    for task in asyncio.as_completed(tasks):
        hits.extend(await task)
    return hits


if __name__ == "__main__":
    results = asyncio.run(scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3))
    print(len(results))
The same crawl through the ScrapFly SDK, which handles anti-bot bypass and concurrency for you:
import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")
BASE_CONFIG = {"asp": True, "cache": True}  # anti-bot bypass plus response caching


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """Extract the hidden JSON payload from the __NEXT_DATA__ script tag."""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    """Merge new query parameters into a URL, e.g. page=2."""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    first_page = await scrapfly.async_scrape(ScrapeConfig(url, **BASE_CONFIG))
    dataset = find_hidden_data(first_page)
    block = dataset["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
    hits = block["hits"]
    total_pages = block["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    # Queue the remaining pages and let ScrapFly manage the concurrency
    to_scrape = [ScrapeConfig(update_url_parameter(url, page=page), **BASE_CONFIG) for page in range(2, total_pages + 1)]
    async for result in scrapfly.concurrent_scrape(to_scrape):
        data = find_hidden_data(result)
        page_hits = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]["hits"]
        hits.extend(page_hits)
    return hits


if __name__ == "__main__":
    results = asyncio.run(scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3))
    print(len(results))
Bypass Fashionphile Blocks with ScrapFly
To scale beyond a handful of pages you will need light anti-detection work. ScrapFly wraps these steps into the API so you can focus on parsing results while the platform rotates proxies, manages TLS fingerprints, and retries flaky responses.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and Typescript SDKs, as well as Scrapy and no-code tool integrations.
Quick SDK sample:
from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(
    ScrapeConfig(
        url="https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
        asp=True,  # enable the anti scraping protection bypass
        render_js=True,  # render the page through a cloud browser
    )
)
print(result.content)
FAQs
Now let's take a look at some frequently asked questions about Fashionphile.
How do I extract product data from Fashionphile's __NEXT_DATA__ script element?
Select the script#__NEXT_DATA__ tag with CSS or XPath, read its text, and parse the JSON with json.loads(). Product data usually lives under props.pageProps.initialState.productPageReducer.productData or a similar path.
What should I do if Fashionphile blocks my scraper or returns 403 errors?
Use rotating residential or mobile proxies, send real browser headers, add short delays, and fall back to headless browsers when needed. ScrapFly can also take care of anti-bot handling automatically.
How can I scrape Fashionphile at scale without getting rate limited?
Throttle requests to one or two per second, reuse HTTP/2 connections, rotate proxies, and respect robots.txt. ScrapFly provides adaptive throttling if you prefer managed limits.
Why is scraping hidden web data faster than using Selenium for Fashionphile?
Hidden web data parsing works with raw JSON and does not spin up a browser. That keeps memory low and speeds up each request by 10 to 100 times compared to running Selenium for static content.
How do I handle pagination when scraping all Fashionphile product listings?
Inspect the query string for a page parameter, loop through the range of nbPages, and call the same hidden data parser for every response. Merge the hits into a single list or push them into your database as you go.
Can Fashionphile be crawled?
Yes. You can walk through category pages, related products, and public sitemaps. Hidden data scraping keeps the crawl fast since each page returns machine-friendly JSON.
Summary
In this guide, we showed how Fashionphile makes detailed product data available in structured JSON on every page, and demonstrated how to extract it easily using Python or the ScrapFly SDK.
Once you know how to parse a single product, you can extend the same method to handle multiple pages by looping over the page query parameter for full pagination. We also discussed easy anti-blocking strategies and how ScrapFly can automate these protections, making large, stable scraping projects possible.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose the entire public datasets which can be illegal in some countries.