Etsy.com is a global online marketplace where users can buy and sell handmade and vintage items. It's a valuable data target, though it can be challenging to scrape due to its high level of bot protection.
In this article on web scraping Etsy, we'll scrape item and review data from product, shop and search pages. We'll also explore how to avoid Etsy's scraper blocking. Let's get started!
This tutorial covers popular web scraping techniques for educational purposes. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
Do not scrape at rates that could damage the website.
Do not scrape data that's not available publicly.
Do not store PII of EU citizens who are protected by GDPR.
Do not repurpose entire public datasets, which can be illegal in some countries.
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping; for more, you should consult a lawyer.
Why Scrape Etsy.com?
If you are a buyer looking for certain items, manually exploring thousands of listings to find the best deal is tedious and time-consuming. By scraping etsy.com, we can retrieve thousands of listings and compare them in no time, allowing for better decision-making.
Web scraping etsy.com also allows businesses and sellers to analyze market trends and gain insights into consumer behavior.
Furthermore, scraping sellers' and shops' data from Etsy lets business owners analyze their competitors' and market peers' items, stock and prices, helping them make strategic moves and gain a competitive edge.
Project Setup
To scrape etsy.com, we'll use a few Python libraries.
httpx for sending HTTP requests to the website and getting the pages' HTML.
parsel for parsing data from the HTML using selectors like XPath and CSS.
loguru for monitoring and logging our scraper.
scrapfly-sdk for bypassing etsy.com web scraping blocking using ScrapFly's web scraping API.
asyncio for running our code in an asynchronous fashion, increasing our web scraping speed.
As asyncio comes included with Python, we only have to install the other libraries using the following pip command:
pip install httpx parsel loguru scrapfly-sdk
How to Scrape Etsy Listings
Let's start by scraping Etsy.com listing pages. Go to any listing page on the website and you will get a page similar to this:
To scrape listing pages' data, we'll extract all the data directly as JSON rather than parsing each data point from the HTML.
To view the hidden listing data, open the browser developer tools (by pressing the F12 key) to view the page HTML. Then scroll down until you find the script tag with the application/ld+json type.
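The data inside this tag is a schema.org Product object. Here's an abbreviated sketch for orientation; the field names are taken from the full example output later in this section:

{
  "@type": "Product",
  "@context": "https://schema.org",
  "url": "...",
  "name": "...",
  "sku": "...",
  "description": "...",
  "image": [ ... ],
  "aggregateRating": { "@type": "AggregateRating", "ratingValue": "...", "reviewCount": 0 },
  "offers": { "@type": "AggregateOffer", "lowPrice": "...", "highPrice": "...", "priceCurrency": "USD" },
  "review": [ ... ]
}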
This is the same data shown on the web page, but captured before it gets rendered into the HTML, commonly known as hidden web data. To scrape etsy.com listing pages, we'll select this script tag and extract the data inside it as JSON directly:
Python
ScrapFly
import asyncio
import json
from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# 1. Create HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)

def parse_product_page(response: Response) -> Dict:
    """parse hidden product data from product pages"""
    assert response.status_code == 200, "request is blocked"
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'offers')]/text()").get()
    data = json.loads(script)
    return data

async def scrape_product(urls: List[str]) -> List[Dict]:
    """scrape etsy product pages"""
    products = []
    # add the product page URLs to a scraping list
    to_scrape = [client.get(url) for url in urls]
    # scrape all the product pages concurrently
    for response in asyncio.as_completed(to_scrape):
        data = parse_product_page(await response)
        products.append(data)
    log.success(f"scraped {len(products)} product listings from product pages")
    return products
import json
import asyncio
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly key")

def parse_product_page(response: ScrapeApiResponse) -> Dict:
    """parse hidden product data from product pages"""
    selector = response.selector
    script = selector.xpath("//script[contains(text(),'offers')]/text()").get()
    data = json.loads(script)
    return data

async def scrape_product(urls: List[str]) -> List[Dict]:
    """scrape etsy product pages"""
    products = []
    # add the product page URLs to a scraping list
    to_scrape = [ScrapeConfig(url, asp=True, country="US") for url in urls]
    # scrape all the product pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_product_page(response)
        products.append(data)
    log.success(f"scraped {len(products)} product listings from product pages")
    return products
Run the code
async def run():
    product_data = await scrape_product(
        urls=[
            "https://www.etsy.com/listing/971370843",
            "https://www.etsy.com/listing/529765307",
            "https://www.etsy.com/listing/949905096"
        ]
    )
    # save the data to a JSON file
    with open("products.json", "w", encoding="utf-8") as file:
        json.dump(product_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
Here, we initialize an httpx client with basic browser headers and define two functions:
parse_product_page() for parsing the HTML and selecting the JSON data inside the script tag.
scrape_product() for scraping the product pages by adding the page URLs to a scraping list and scraping them concurrently.
Here is a sample output of the result we got:
Example output
[
{
"@type": "Product",
"@context": "https://schema.org",
"url": "https://www.etsy.com/listing/949905096/oakywood-dual-laptop-stand-wood-vertical",
"name": "Oakywood Dual Laptop Stand Wood, Vertical Stand for Desk, Adjustable Macbook Double Dock, Desk Organizer, Gift for Him, Work from Home",
"sku": "949905096",
"gtin": "n/a",
"description": "The Dual Laptop Dock is an innovative solution for your desk organization. The stand allows you to simultaneously store two devices: laptops or tablets up to 24mm (0.95 inch) thick. Streamline your work by securely storing your gadgets, organizing your desk space, and smoothly switching between the two devices.\n\nWant it PERSONALIZED? Add this - https://www.etsy.com/listing/662276951\n\nF E A T U R E S:\n• supports two devices up to 24mm (0.94 inch) thick and offers adjustable width for a secure fit\n• hand-polished solid oak or walnut wood\n• solid aluminum base - stable & safe\n• one-hand operation - micro-suction tape technology\n• soft wool felt and flock on the inside protects your devices\n• unique geometric design\n\nMake your desk organized and comfortable - choose a multifunctional dual vertical stand, which allows you to store all of your favorite devices! Handcrafted in solid walnut or oak wood, polished by true woodworking enthusiasts.\n\n1 PRODUCT = 1 TREE\nYes! It's that simple! You buy 1 product from us, and we plant 1 tree! You can do something good for the environment while buying products for yourself. Isn’t that great?!\n\nADDITIONAL INFORMATION:\n• Length x Width: 18 x 11.3 cm (7" x 4.45")\n• Height: 4 cm (1.6")\n• Wood is a natural material, thus each individual product may slightly vary in color\n• Handcrafted in Poland, EU\n\nASK US!\nIf you have any other questions about this phone case, please click the "Ask a Question" button next to the price and we’ll get right back to you!\nThank you for shopping at Oakywood!",
"image": [
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/ea7b26/4369079578/il_fullxfull.4369079578_p74k.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/c/1452/1154/149/0/il/ea7b26/4369079578/il_340x270.4369079578_p74k.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/2bce5f/4416475123/il_fullxfull.4416475123_p28q.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/2bce5f/4416475123/il_340x270.4416475123_p28q.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/638121/4369065696/il_fullxfull.4369065696_5r4k.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/638121/4369065696/il_340x270.4369065696_5r4k.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/7c320b/4416468699/il_fullxfull.4416468699_fb9a.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/7c320b/4416468699/il_340x270.4416468699_fb9a.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/c02124/4416465723/il_fullxfull.4416465723_ddy9.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/c02124/4416465723/il_340x270.4416465723_ddy9.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/47976e/5617640133/il_fullxfull.5617640133_f78z.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/47976e/5617640133/il_340x270.5617640133_f78z.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/05dac5/5617640089/il_fullxfull.5617640089_fm5y.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/05dac5/5617640089/il_340x270.5617640089_fm5y.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/1be16e/5617640099/il_fullxfull.5617640099_p3jt.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/1be16e/5617640099/il_340x270.5617640099_p3jt.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/c4eb36/5617640105/il_fullxfull.5617640105_om5v.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/c4eb36/5617640105/il_340x270.5617640105_om5v.jpg"
},
{
"@type": "ImageObject",
"@context": "https://schema.org",
"author": "Oakywood",
"contentURL": "https://i.etsystatic.com/13285848/r/il/1e4125/5617640107/il_fullxfull.5617640107_roa6.jpg",
"description": null,
"thumbnail": "https://i.etsystatic.com/13285848/r/il/1e4125/5617640107/il_340x270.5617640107_roa6.jpg"
}
],
"category": "Electronics & Accessories < Docking & Stands < Stands",
"brand": {
"@type": "Brand",
"@context": "https://schema.org",
"name": "Oakywood"
},
"logo": "https://i.etsystatic.com/isla/7cfa3d/58081234/isla_fullxfull.58081234_fqvaz995.jpg?version=0",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.9",
"reviewCount": 7695
},
"offers": {
"@type": "AggregateOffer",
"offerCount": 2959,
"lowPrice": "70.00",
"highPrice": "80.00",
"priceCurrency": "USD",
"availability": "https://schema.org/InStock"
},
"review": [
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": 5,
"bestRating": 5
},
"datePublished": "2023-11-30",
"reviewBody": "Absolutely beautiful laptop stand for my setup. I use it for my work laptop and my MacBook. It cleaned up my desk, which is mounted in my dining room, and added a refined look. The walnut almost perfectly matches the wood of my desk (differences in finishes aside; I can't expect the seller to use tung oil). The included woolen strips help to make each slot perfect for my laptops, both my chunky government one and my thin MBP. The stand hasn't moved, so whatever they put on the bottom to keep it in place is great. Highly recommend this.",
"author": {
"@type": "Person",
"name": "JC Palmer"
}
},
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": 5,
"bestRating": 5
},
"datePublished": "2022-11-29",
"reviewBody": "I was so impressed with my desk shelf from Oakywood that I decided to supplement my desk setup with a dual laptop stand to place my personal and work laptops. The stand is well built and looks very sturdy and elegant on the desk. Definitely a great purchase and would really recommend Oakywood to anyone else that wants to spice their desk/home office setup.",
"author": {
"@type": "Person",
"name": "Enrico"
}
},
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": 5,
"bestRating": 5
},
"datePublished": "2023-02-02",
"reviewBody": "Beautiful carved wood stand for my laptop and tablet. The walnut is true to color as the online photos, and it looks great on my glass top desk. It comes nicely packaged from a US-based shipping point (so it arrived quickly) and came with clear instructions and plenty of soft wool strips to ensure a proper fit. I'm now eyeing a wood monitor stand from Oakywood.",
"author": {
"@type": "Person",
"name": "Emily Kim"
}
},
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": 5,
"bestRating": 5
},
"datePublished": "2023-02-27",
"reviewBody": "It’s beautifully made and helps me reduce the clutter that was my desk! Love it",
"author": {
"@type": "Person",
"name": "Amy Thomas"
}
}
],
"material": "Solid Wood/Stainless Steel/Walnut Wood"
}
]
Pretty straightforward! Our Etsy scraper got all the product data and a few reviews with just a few lines of code. Before moving on to shop pages, here's an optional post-processing step.
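The schema.org payload is quite verbose, so it's often useful to reduce each product to just the fields you need before storing it. Below is a minimal sketch of that idea; summarize_product() is a hypothetical helper, and the field names come from the sample output above:

def summarize_product(data: dict) -> dict:
    """reduce the full schema.org product payload to a compact record (hypothetical helper)"""
    offers = data.get("offers", {})
    rating = data.get("aggregateRating", {})
    return {
        "name": data.get("name"),
        "sku": data.get("sku"),
        "url": data.get("url"),
        "lowPrice": offers.get("lowPrice"),
        "highPrice": offers.get("highPrice"),
        "currency": offers.get("priceCurrency"),
        "rating": rating.get("ratingValue"),
        "reviewCount": rating.get("reviewCount"),
    }

# usage with the earlier scraper results:
# summaries = [summarize_product(product) for product in product_data]

Next, we'll scrape shop data!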
How to Scrape Etsy Shops
Shop pages on etsy.com include data about the products sold by a shop alongside the shop's reviews. Similar to product listing pages, shop page data is also found under script tags:
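Here the tag contains a schema.org ItemList. The exact fields vary by shop page, but the shape follows the standard schema.org format, roughly along these lines (an abbreviated sketch, partly an assumption):

{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "item": { "@type": "Product", "name": "...", "url": "..." } },
    ...
  ]
}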
Just like in the previous section, we'll scrape etsy.com shop pages by extracting the data directly from the above script tag:
Python
ScrapFly
import asyncio
import json
from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# 1. Create HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)

def parse_shop_page(response: Response) -> Dict:
    """parse hidden shop data from shop pages"""
    assert response.status_code == 200, "request is blocked"
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'itemListElement')]/text()").get()
    data = json.loads(script)
    return data

async def scrape_shop(urls: List[str]) -> List[Dict]:
    """scrape etsy shop pages"""
    shops = []
    # add the shop page URLs to a scraping list
    to_scrape = [client.get(url) for url in urls]
    # scrape all the shop pages concurrently
    for response in asyncio.as_completed(to_scrape):
        data = parse_shop_page(await response)
        shops.append(data)
    log.success(f"scraped {len(shops)} shops from shop pages")
    return shops
import json
import asyncio
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly key")

def parse_shop_page(response: ScrapeApiResponse) -> Dict:
    """parse hidden shop data from shop pages"""
    selector = response.selector
    script = selector.xpath("//script[contains(text(),'itemListElement')]/text()").get()
    data = json.loads(script)
    return data

async def scrape_shop(urls: List[str]) -> List[Dict]:
    """scrape etsy shop pages"""
    shops = []
    # add the shop page URLs to a scraping list
    to_scrape = [ScrapeConfig(url, asp=True, country="US") for url in urls]
    # scrape all the shop pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_shop_page(response)
        shops.append(data)
    log.success(f"scraped {len(shops)} shops from shop pages")
    return shops
Run the code
async def run():
    shop_data = await scrape_shop(
        urls=[
            "https://www.etsy.com/shop/FalkelDesign",
            "https://www.etsy.com/shop/JoshuaHouseCrafts",
            "https://www.etsy.com/shop/Oakywood"
        ]
    )
    # save the data to a JSON file
    with open("shops.json", "w", encoding="utf-8") as file:
        json.dump(shop_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
The above code is the same as the Etsy scraper we wrote earlier. We have only changed the naming and the XPath selector.
Cool! We are able to scrape product and shop pages from etsy.com. The last piece of our Etsy scraper is the search pages. Let's jump into it!
How to Scrape Etsy Search
In this section, we'll scrape item listing data from search pages. But first, let's look at what the search pages on etsy.com look like. Search for any product on the website and you will get a page similar to this:
Unlike the product and shop pages, the hidden data on search pages doesn't provide all the search results. For example, here is the hidden data on a search page: it contains 8 product listings, while the actual page contains 64:
So, to scrape etsy.com search pages, we have to parse each listing's data from the HTML. Let's start with that:
Python
ScrapFly
def strip_text(text):
    """remove extra spaces while handling None values"""
    if text is not None:
        text = text.strip()
    return text

def parse_search(response: Response) -> Dict:
    """parse data from Etsy search pages"""
    assert response.status_code == 200, "request is blocked"
    selector = Selector(response.text)
    data = []
    script = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())
    # get the total number of pages
    total_listings = script["numberOfItems"]
    total_pages = math.ceil(total_listings / 48)
    for product in selector.xpath("//div[@data-search-results-lg]/ul/li[div[@data-appears-component-name]]"):
        link = product.xpath(".//a[contains(@class, 'listing-link')]/@href").get()
        rate = product.xpath(".//span[contains(@class, 'review_stars')]/span/text()").get()
        number_of_reviews = strip_text(product.xpath(".//div[contains(@aria-label,'star rating')]/p/text()").get())
        if number_of_reviews:
            number_of_reviews = number_of_reviews.replace("(", "").replace(")", "")
            # normalize counts like "1.2k" into 1200
            number_of_reviews = int(float(number_of_reviews.replace("k", "")) * 1000) if "k" in number_of_reviews else number_of_reviews
        price = product.xpath(".//span[@class='currency-value']/text()").get()
        original_price = product.xpath(".//span[contains(text(),'Original Price')]/text()").get()
        discount = strip_text(product.xpath(".//span[contains(text(),'off')]/text()").get())
        seller = product.xpath(".//span[contains(text(),'From shop')]/text()").get()
        currency = product.xpath(".//span[@class='currency-symbol']/text()").get()
        data.append({
            "productLink": '/'.join(link.split('/')[:5]) if link else None,
            "productTitle": strip_text(product.xpath(".//h3[contains(@class, 'v2-listing-card__titl')]/@title").get()),
            "productImage": product.xpath(".//img[@data-listing-card-listing-image]/@src").get(),
            "seller": seller.replace("From shop ", "") if seller else None,
            "listingType": "Paid listing" if product.xpath(".//span[@data-ad-label='Ad by Etsy seller']") else "Free listing",
            "productRate": float(rate.strip()) if rate else None,
            "numberOfReviews": int(number_of_reviews) if number_of_reviews else None,
            "freeShipping": "Yes" if product.xpath(".//span[contains(text(),'Free shipping')]/text()").get() else "No",
            "productPrice": float(price.replace(",", "")) if price else None,
            "priceCurrency": currency,
            "originalPrice": float(original_price.split(currency)[-1].strip()) if original_price else "No discount",
            "discount": discount if discount else "No discount",
        })
    return {
        "search_data": data,
        "total_pages": total_pages
    }
def strip_text(text):
    """remove extra spaces while handling None values"""
    if text is not None:
        text = text.strip()
    return text

def parse_search(response: ScrapeApiResponse) -> Dict:
    """parse data from Etsy search pages"""
    selector = response.selector
    data = []
    script = json.loads(selector.xpath("//script[@type='application/ld+json']/text()").get())
    # get the total number of pages
    total_listings = script["numberOfItems"]
    total_pages = math.ceil(total_listings / 48)
    for product in selector.xpath("//div[@data-search-results-lg]/ul/li[div[@data-appears-component-name]]"):
        link = product.xpath(".//a[contains(@class, 'listing-link')]/@href").get()
        rate = product.xpath(".//span[contains(@class, 'review_stars')]/span/text()").get()
        number_of_reviews = strip_text(product.xpath(".//div[contains(@aria-label,'star rating')]/p/text()").get())
        if number_of_reviews:
            number_of_reviews = number_of_reviews.replace("(", "").replace(")", "")
            # normalize counts like "1.2k" into 1200
            number_of_reviews = int(float(number_of_reviews.replace("k", "")) * 1000) if "k" in number_of_reviews else number_of_reviews
        price = product.xpath(".//span[@class='currency-value']/text()").get()
        original_price = product.xpath(".//span[contains(text(),'Original Price')]/text()").get()
        discount = strip_text(product.xpath(".//span[contains(text(),'off')]/text()").get())
        seller = product.xpath(".//span[contains(text(),'From shop')]/text()").get()
        currency = product.xpath(".//span[@class='currency-symbol']/text()").get()
        data.append({
            "productLink": '/'.join(link.split('/')[:5]) if link else None,
            "productTitle": strip_text(product.xpath(".//h3[contains(@class, 'v2-listing-card__titl')]/@title").get()),
            "productImage": product.xpath(".//img[@data-listing-card-listing-image]/@src").get(),
            "seller": seller.replace("From shop ", "") if seller else None,
            "listingType": "Paid listing" if product.xpath(".//span[@data-ad-label='Ad by Etsy seller']") else "Free listing",
            "productRate": float(rate.strip()) if rate else None,
            "numberOfReviews": int(number_of_reviews) if number_of_reviews else None,
            "freeShipping": "Yes" if product.xpath(".//span[contains(text(),'Free shipping')]/text()").get() else "No",
            "productPrice": float(price.replace(",", "")) if price else None,
            "priceCurrency": currency,
            "originalPrice": float(original_price.split(currency)[-1].strip()) if original_price else "No discount",
            "discount": discount if discount else "No discount",
        })
    return {
        "search_data": data,
        "total_pages": total_pages
    }
🙋 Note that some search page data can't be rendered without enabling JavaScript. We recommend running the ScrapFly code tab, as it has render_js enabled, allowing the page to load fully.
Now that we have our data selectors set up, let's use them with the rest of our scraping logic:
Python
ScrapFly
import asyncio
import json
import math
from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# 1. Create HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)

def strip_text(text):
    """remove extra spaces while handling None values"""
    if text is not None:
        text = text.strip()
    return text

def parse_search(response: Response) -> Dict:
    """parse data from Etsy search pages"""
    # the rest of the parsing logic from the previous snippet
    ...

async def scrape_search(url: str, max_pages: int = None) -> List[Dict]:
    """scrape product listing data from Etsy search pages"""
    log.info("scraping the first search page")
    # etsy search pages are dynamic; for full results enable JavaScript rendering (see the ScrapFly tab)
    first_page = await client.get(url)
    data = parse_search(first_page)
    search_data = data["search_data"]
    # get the number of total pages to scrape
    total_pages = data["total_pages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    log.info(f"scraping search pagination ({total_pages - 1} more pages)")
    # add the remaining search pages to a scraping list
    other_pages = [
        client.get(url + f"&page={page_number}")
        for page_number in range(2, total_pages + 1)
    ]
    # scrape the remaining search pages concurrently
    for response in asyncio.as_completed(other_pages):
        data = parse_search(await response)
        search_data.extend(data["search_data"])
    log.success(f"scraped {len(search_data)} product listings from search")
    return search_data
import json
import math
import asyncio
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
from typing import Dict, List
from loguru import logger as log

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def strip_text(text):
    """remove extra spaces while handling None values"""
    if text is not None:
        text = text.strip()
    return text

def parse_search(response: ScrapeApiResponse) -> Dict:
    """parse data from Etsy search pages"""
    # the rest of the parsing logic from the previous snippet
    ...

async def scrape_search(url: str, max_pages: int = None) -> List[Dict]:
    """scrape product listing data from Etsy search pages"""
    log.info("scraping the first search page")
    # etsy search pages are dynamic, requiring render_js enabled
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(
        url, wait_for_selector="//div[@data-search-pagination]",
        render_js=True, asp=True, country="US", cache=True
    ))
    data = parse_search(first_page)
    search_data = data["search_data"]
    # get the number of total pages to scrape
    total_pages = data["total_pages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    log.info(f"scraping search pagination ({total_pages - 1} more pages)")
    # add the remaining search pages to a scraping list
    other_pages = [
        ScrapeConfig(url + f"&page={page_number}",
                     wait_for_selector="//div[@data-search-pagination]",
                     render_js=True, asp=True, country="US")
        for page_number in range(2, total_pages + 1)
    ]
    # scrape the remaining search pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_search(response)
        search_data.extend(data["search_data"])
    log.success(f"scraped {len(search_data)} product listings from search")
    return search_data
Run the code
async def run():
    search_data = await scrape_search(
        url="https://www.etsy.com/search?q=wood+laptop+stand", max_pages=3
    )
    # save the data to a JSON file
    with open("search.json", "w", encoding="utf-8") as file:
        json.dump(search_data, file, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    asyncio.run(run())
🙋 The chances of getting blocked while requesting search pages are very high. Run the ScrapFly code tab to avoid getting blocked.
Here, we define a scrape_search() function that crawls the search results: it scrapes the first page to determine the total page count, then iterates over the remaining pages concurrently. Note that the pagination URLs are built by appending &page=N, which assumes the base URL already contains a query string (e.g. ?q=...).
The above etsy.com scraping code should scrape three search pages, a total of 192 product listings. Here is what the scraped data should look like:
Our scraping code is now complete: it can scrape product, shop and search pages. However, our Etsy scraper will likely get blocked after sending more requests to the website. Let's take a look at a solution!
Avoid Etsy.com Scraping Blocking
Etsy.com is a highly protected website that can detect and block bots such as web scrapers. For example, let's attempt to request a search page on etsy.com with a headless browser to minimize the chances of getting blocked:
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    # launch a Chrome browser
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # go to an etsy.com search page
    page.goto("https://www.etsy.com/search?q=personalized+gifts")
    # take a screenshot
    page.screenshot(path="screenshot.png")
Our request was detected as a bot, and we are required to solve a CAPTCHA challenge before proceeding to the web page:
To avoid etsy.com scraper blocking, we'll use ScrapFly. All we need to do is replace our HTTP client with the ScrapFly client:
import httpx
from parsel import Selector

response = httpx.get("some etsy.com url")
selector = Selector(response.text)

# in ScrapFly SDK becomes
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly_client = ScrapflyClient("Your ScrapFly API key")
result: ScrapeApiResponse = scrapfly_client.scrape(ScrapeConfig(
    # some etsy.com URL
    "https://www.etsy.com/search?q=personalized+gifts",
    # we can select a specific proxy country
    country="US",
    # and enable anti-scraping protection bypass
    asp=True,
    # allows JavaScript rendering, similar to headless browsers
    render_js=True
))

# get the HTML content
html = result.scrape_result['content']
# use the built-in parsel selector
selector = result.selector
To wrap up this guide on etsy.com web scraping, let's take a look at some frequently asked questions.
Is scraping etsy.com legal?
All the data on etsy.com is publicly available, and scraping it is legal as long as you keep your scraping rate reasonable and don't affect the website's performance. However, you should pay attention to GDPR compliance in the EU, which prohibits scraping personal data, such as Etsy sellers' personal details. Refer to our previous guide on web scraping legality for more details.
Is there a public API for etsy.com?
Currently, there is no Etsy API available for public use. However, etsy.com is easy and legal to scrape, and you can use the scraper code described in this tutorial to create your own web scraping API.
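For example, here's a minimal sketch of exposing the product scraper as a small HTTP service. FastAPI and uvicorn are assumptions here, not libraries used elsewhere in this tutorial:

# a hypothetical api.py, reusing scrape_product() from the listing-scraping section
from fastapi import FastAPI

app = FastAPI()

@app.get("/product")
async def product(url: str):
    # scrape a single Etsy listing and return its hidden product data
    results = await scrape_product(urls=[url])
    return results[0]

# run with: uvicorn api:app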
Are there alternatives for etsy.com?
Yes, for more scrape guides about websites similar to Etsy, refer to our #scrapeguide blog tag.
Summary
In this article, we explained how to scrape etsy.com, a popular website for handcrafted and gift products.
We went through a step-by-step process for scraping product and review data from Etsy product, shop and search pages using Python. We also saw that etsy.com can detect and block web scrapers, and used ScrapFly to avoid Etsy's web scraping blocking.