How to Scrape Fashionphile for Second Hand Fashion Data

Looking to unlock premium market insights from one of the world’s leading luxury resale platforms? Fashionphile blends meticulous curation with a modern storefront, placing a treasure trove of clean, structured product data right at your fingertips if you know where to look.

In this walkthrough we break the Fashionphile scraping flow down in simple language and drop in ready-to-run Python code so you can follow along without guessing.

Key Takeaways

  • Capture Fashionphile data faster by reading the hidden JSON payload that powers each Next.js page
  • Collect product titles, prices, conditions, measurements, and media in a single request per page
  • Add pagination, query management, and rate controls to cover every sale or category page
  • Avoid blocks with realistic headers, gentle concurrency, and ScrapFly's managed anti-bot features
  • Validate results and handle errors early so large fashion data runs stay predictable

Quick Start

If you just need a working scraper, clone the maintained Fashionphile scraper that ships with ScrapFly-ready settings:

git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/fashionphile-scraper

This repository contains an up-to-date scraper with ScrapFly configuration, HTTP client best practices, and parsing helpers so you can run a production-ready crawl with minimal setup.

Latest Fashionphile.com Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

What is Fashionphile

Fashionphile is a reseller that focuses on luxury bags, shoes, jewelry, and accessories. Each product page contains a detailed write up, precise measurements, condition grades, and dozens of media assets. Because the storefront is built with Next.js, all of that content is delivered inside a hidden __NEXT_DATA__ script tag that is trivial to parse.

Why Fashionphile

Fashionphile covers thousands of premium items with accurate pricing history, discount tags, and availability notes, which makes it ideal for:

  • Competitive tracking for luxury brands and retailers
  • Market research on secondary pricing and demand
  • Portfolio monitoring for investors in fashion goods
  • Trend reports that need trustworthy descriptions and media

The site keeps a consistent layout and publishes data in JSON, so automation work stays light compared to marketplaces that require heavy HTML parsing or browser automation.

Challenges in Scraping Fashionphile

While Fashionphile is easier to scrape than many marketplaces, it still presents some unique challenges. Let’s dive into each one:

Extracting the Hidden JSON

Fashionphile pages rely on a hidden JSON blob contained in a __NEXT_DATA__ script tag. If you don’t extract and parse this correctly, you’ll end up with just the basic HTML shell, missing all product information. Always make sure your scraper selects the proper script tag and parses its JSON content to get the real data.
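As a quick illustration, here is a minimal sketch of that extraction; the missing-tag guard is our addition, handy for spotting block pages that return a bare HTML shell:

import json

from parsel import Selector


def extract_next_data(html: str) -> dict:
    # the payload lives in <script id="__NEXT_DATA__" type="application/json">
    raw = Selector(text=html).css("script#__NEXT_DATA__::text").get()
    if raw is None:
        raise ValueError("no __NEXT_DATA__ tag found - likely a block page or bare HTML shell")
    return json.loads(raw)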

Handling Pagination with Query Parameters

Pagination on Fashionphile uses query parameters, such as page=2. If you aren’t careful when updating these URLs, you may accidentally request duplicate pages or skip content. Attention to URL construction and tracking the current page are key to reliable crawling.
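A reliable approach is to rebuild the query string rather than splicing text into the URL, as in this sketch; the same helper reappears in the full scrapers below:

from urllib.parse import parse_qs, urlencode, urlparse


def update_url_parameter(url, **params):
    # merge new values over the existing query so page=2 replaces page=1 instead of duplicating it
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


# each page URL is generated exactly once, with no duplicates or gaps
urls = [update_url_parameter("https://www.fashionphile.com/shop/discounted/all?page=1", page=n) for n in range(1, 4)]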

Managing Rate Limits

Sending too many requests in rapid succession can quickly hit Fashionphile’s rate limits. When this happens, you might receive HTTP 403 errors or empty responses. To avoid this, add small delays between requests and limit concurrent connections to stay under the radar.
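One lightweight pattern is a semaphore combined with a short pause, sketched below; the concurrency cap and delay are illustrative values, not tuned limits:

import asyncio

import httpx

semaphore = asyncio.Semaphore(3)  # keep at most 3 requests in flight


async def polite_get(client: httpx.AsyncClient, url: str) -> httpx.Response:
    async with semaphore:
        response = await client.get(url)
        await asyncio.sleep(0.5)  # brief pause before releasing the slot
        return response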

Dealing with Anti-Bot Protections

Fashionphile employs anti-bot protections that pay attention to unusual request headers and network fingerprints (like TLS signatures), especially on category pages. To reduce the risk of being blocked, use realistic browser headers, consider rotating proxies if you’re running at scale, and optionally use managed solutions like ScrapFly that handle these nuances automatically.

By addressing these four areas, you’ll maximize your chances of scraping Fashionphile consistently and at scale.

Fashionphile Scrape Preview

The final dataset is a JSON document that mirrors the content users see on the site. Here is a trimmed example from a pair of sandals to highlight the structure:

Example Fashionphile Product Dataset
{
  "id": 1048096,
  "sku": "BW",
  "title": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
  "slug": "/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
  "price": 950,
  "retailPrice": 1650,
  "discountedPrice": 900,
  "condition": "Excellent",
  "conditions": [
    "scuffs",
    "imprints",
    "marks on sole(s)"
  ],
  "brand": [
    {
      "id": 89,
      "name": "Bottega Veneta"
    }
  ],
  "measurements": [
    {
      "type": "size",
      "unit": "EU",
      "value": 36
    },
    {
      "type": "heel",
      "unit": "in",
      "value": 4
    }
  ],
  "shipsWith": "2 dust bags, box",
  "color": "Black",
  "featuredImage": {
    "large": "https://prod-images.fashionphile.com/large/...jpg",
    "main": "https://prod-images.fashionphile.com/main/...jpg",
    "thumb": "https://prod-images.fashionphile.com/thumb/...jpg"
  },
  "conditionsText": "scuffs, imprints, marks on sole(s)",
  "url": "https://apigateway.fashionphile.com/product/1048096"
}
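Because the structure is this predictable, it is worth validating each record before storing it so large runs fail fast on block pages or layout changes. Here is an illustrative check; the required-field list is our pick based on the preview above, not an official schema:

REQUIRED_FIELDS = ("id", "title", "price", "condition")


def validate_product(product: dict) -> dict:
    # fail fast instead of saving partial rows to the dataset
    missing = [field for field in REQUIRED_FIELDS if not product.get(field)]
    if missing:
        raise ValueError(f"product {product.get('id')} is missing fields: {missing}")
    return product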

Project Setup

We only need a couple of Python packages to scrape the hidden JSON blobs: an HTTP client and a simple HTML selector library.

  • httpx for HTTP/2 requests with browser-style headers
  • parsel for CSS selectors that pull out the __NEXT_DATA__ element

Install them with pip (the http2 extra pulls in the h2 package that our HTTP/2 client requires):

$ pip install "httpx[http2]" parsel

ScrapFly users can also install the SDK to get anti-bot helpers and result caching in a single client:

$ pip install "scrapfly-sdk[all]"

Scrape Fashionphile Product Data

Let's start with a single product page from the discount section:

[Screen capture of the Fashionphile product page at fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096]

Modern frameworks place the product payload inside the page HTML. You can open the source view, search for the SKU, and spot a script tag with id="__NEXT_DATA__". Parsing that JSON is faster than traversing the rendered HTML tree.

Here is a minimal Python version that pulls the product block in a few lines:

import asyncio
import json

import httpx
from parsel import Selector

client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html: str) -> dict:
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    response = await client.get(url)
    data = find_hidden_data(response.text)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    print(asyncio.run(scrape_product("https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096")))

And the same product scraper using the ScrapFly SDK, which adds anti-bot bypass (asp) and response caching:

import asyncio
import json
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str) -> dict:
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            cache=True,
            asp=True,
        )
    )
    data = find_hidden_data(result)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product


if __name__ == "__main__":
    example = scrape_product(
        "https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096"
    )
    print(asyncio.run(example))

Scrape Fashionphile Search and Categories

Product discovery pages reuse the exact same hidden JSON idea. Every search, sale, or category page includes the results list, pagination info, and metadata. The only extra work is flipping through each page by updating the page query parameter.

Simple pagination plan:

  1. Fetch the first page and parse its hidden JSON
  2. Grab the current hits and find the total page count
  3. Queue page numbers two through nbPages
  4. Request the rest in small batches and extend the hits list

Below is a Python example that limits concurrency to three active HTTP connections. You can raise or lower that number depending on how friendly you want to be.

import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

import httpx
from parsel import Selector

client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    limits=httpx.Limits(max_connections=3),
)


def find_hidden_data(html: str) -> dict:
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 3) -> List[Dict]:
    first_page = await client.get(url)
    dataset = find_hidden_data(first_page.text)
    first_results = dataset["props"]["pageProps"]["serverState"]["initialResults"][
        "prod_ecom_products_date_desc"
    ]["results"][0]
    hits = first_results["hits"]

    total_pages = first_results["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    async def scrape_remaining(page: int):
        response = await client.get(update_url_parameter(url, page=page))
        data = find_hidden_data(response.text)
        block = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
        return block["hits"]

    tasks = [scrape_remaining(page) for page in range(2, total_pages + 1)]
    for task in asyncio.as_completed(tasks):
        hits.extend(await task)

    return hits


if __name__ == "__main__":
    results = asyncio.run(scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3))
    print(len(results))

And the same search scraper using the ScrapFly SDK with concurrent scraping:

import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")
BASE_CONFIG = {"asp": True, "cache": True}


def find_hidden_data(result: ScrapeApiResponse) -> dict:
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


def update_url_parameter(url, **params):
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"


async def scrape_search(url: str, max_pages: int = 10) -> List[Dict]:
    first_page = await scrapfly.async_scrape(ScrapeConfig(url, **BASE_CONFIG))
    dataset = find_hidden_data(first_page)
    block = dataset["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]
    hits = block["hits"]

    total_pages = block["nbPages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    to_scrape = [ScrapeConfig(update_url_parameter(url, page=page), **BASE_CONFIG) for page in range(2, total_pages + 1)]
    async for result in scrapfly.concurrent_scrape(to_scrape):
        data = find_hidden_data(result)
        page_hits = data["props"]["pageProps"]["serverState"]["initialResults"]["prod_ecom_products_date_desc"]["results"][0]["hits"]
        hits.extend(page_hits)

    return hits


if __name__ == "__main__":
    example = scrape_search("https://www.fashionphile.com/shop/discounted/all", max_pages=3)
    print(asyncio.run(example))

Bypass Fashionphile Blocks with ScrapFly

To scale beyond a handful of pages you will need light anti-detection work. ScrapFly wraps these steps into the API so you can focus on parsing results while the platform rotates proxies, manages TLS fingerprints, and retries flaky responses.

[Illustration: the ScrapFly service does the heavy lifting for you]

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

Quick SDK sample:

from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(
    ScrapeConfig(
        url="https://www.vestiairecollective.com/women-clothing/knitwear/anine-bing/beige-cotton-anine-bing-knitwear-32147447.shtml",
        asp=True,
        render_js=True,
    )
)
print(result.content)

FAQs

Now let's take a look at some frequently asked questions about scraping Fashionphile.

How do I extract product data from Fashionphile's __NEXT_DATA__ script element?

Select the script#__NEXT_DATA__ tag with CSS or XPath, read its text, and parse the JSON with json.loads(). Product data usually lives under props.pageProps.initialState.productPageReducer.productData or a similar path.

What should I do if Fashionphile blocks my scraper or returns 403 errors?

Use rotating residential or mobile proxies, send real browser headers, add short delays, and fall back to headless browsers when needed. ScrapFly can also take care of anti-bot handling automatically.
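For transient blocks, a simple retry with exponential backoff often helps; here is an illustrative sketch (the status checks and delays are example values):

import asyncio

import httpx


async def get_with_retries(client: httpx.AsyncClient, url: str, attempts: int = 3) -> httpx.Response:
    for attempt in range(attempts):
        response = await client.get(url)
        if response.status_code == 200 and response.text:
            return response
        await asyncio.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    raise RuntimeError(f"still blocked after {attempts} attempts: {url}")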

How can I scrape Fashionphile at scale without getting rate limited?

Throttle requests to one or two per second, reuse HTTP/2 connections, rotate proxies, and respect robots.txt. ScrapFly provides adaptive throttling if you prefer managed limits.

Why is scraping hidden web data faster than using Selenium for Fashionphile?

Hidden web data parsing works with raw JSON and does not spin up a browser. That keeps memory low and speeds up each request by 10 to 100 times compared to running Selenium for static content.

How do I handle pagination when scraping all Fashionphile product listings?

Inspect the query string for a page parameter, loop through the range of nbPages, and call the same hidden data parser for every response. Merge the hits into a single list or push them into your database as you go.

Can Fashionphile be crawled?

Yes. You can walk through category pages, related products, and public sitemaps. Hidden data scraping keeps the crawl fast since each page returns machine-friendly JSON.

Summary

In this guide, we showed how Fashionphile makes detailed product data available in structured JSON on every page, and demonstrated how to extract it easily using Python or the ScrapFly SDK.

Once you know how to parse a single product, you can extend the same method to handle multiple pages by looping over the page query parameter for full pagination. We also discussed easy anti-blocking strategies and how ScrapFly can automate these protections, making large, stable scraping projects possible.
