# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)
##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- [API Specification]()
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- [API Specification]()
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- [API Specification]()
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- [API Specification]()
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)
##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# Static Page Scraping

 Static HTML pages are the simplest kind of page encountered in web scraping. An easy way to confirm whether a page is static is to disable JavaScript in your browser and check whether the data is still present.

 For static page scraping, we only need an HTTP client to fetch the page and an HTML parser to extract the data fields we want.

 An example of a static HTML page is [web-scraping.dev/products](https://web-scraping.dev/products), which lists products without using any JavaScript. On the other hand, [web-scraping.dev/testimonials](https://web-scraping.dev/testimonials) is a dynamic page that uses JavaScript to load more entries as the user scrolls down.

 Let's focus on the static products listing page and take a look at how to scrape it using Python as a reference.

## Example Scraper

 For this example scraper we'll be using Python with `httpx` as our HTTP client, which can be installed using `pip install "httpx[http2]"`. Note that we install the `http2` extra, as it's good practice to scrape with the latest HTTP version available.

**Python (httpx):**

```python
import httpx

# 1. we can use httpx directly:
response = httpx.get(url="https://web-scraping.dev/products")

# 2. or create a configurable client (recommended):
client = httpx.Client(
  http2=True, # enable http2 support
  follow_redirects=True,  # automatically follow redirects (status codes 30x)
  headers={
    "user-agent": "scrapfly academy",
  }
)
response = client.get(url="https://web-scraping.dev/products")

print(response.status_code)
# 200
print(response.text)
# ...

```

**Scrapfly SDK:**

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
  url="https://web-scraping.dev/products"
))
print(result.upstream_status_code)
# 200
print(result.content)
# ...

```

 Above, we use the HTTP protocol to pull the page data from our example URL. We keep the configuration minimal by enabling `http2` and setting a few custom headers, which let the requesting client provide metadata about the request - who's sending it and from where?

 In return, we receive a response object indicating either success or an error. This is reflected by the `status_code` property: codes in the `200` range mean success, while other codes indicate an error.

 In the next section we'll look at how to parse the HTML data we received and wrap up our scraper, but before that we strongly recommend skimming over the types of HTTP challenges 👇

## Challenges

 Static page scraping introduces us to the first set of web scraping challenges, which relate to HTTP connections. We can divide them into three practical categories.

### Technical Challenges

 To successfully retrieve pages, our HTTP requests must be valid: the correct URL must be used, and request headers and even the HTTP version can have an effect.

##### FAQ

- [What case should headers be in? Lowercase?](https://scrapfly.io/blog/answers/what-case-should-http-headers-be/)
- [What's the difference between HTTP vs HTTPS?](https://scrapfly.io/blog/answers/http-vs-https-in-web-scraping/)
- [What are cookies in web scraping?](https://scrapfly.io/blog/answers/http-cookies-in-web-scraping/)



- [Intro to HTTPX for Web Scraping](https://scrapfly.io/blog/posts/web-scraping-with-python-httpx/#using-httpx) - an in-depth tutorial with more details on configuring httpx for web scraping.
- [Scraping in Different Languages and Currencies](https://scrapfly.io/blog/posts/how-to-scrape-in-another-language-or-currency/) - an example of how HTTP requests can be configured to access different localizations of a website.

### Handling Page State

 While HTTP itself is stateless (meaning requests 1 and 2 are independent of each other), web pages can build extra layers to track client state. Most commonly this is done through **cookies**.

 Cookies are just normal headers containing `key=value` data, though they have special standardized behavior: the web server expects the client to store `Set-Cookie` response header values and send them back using the `Cookie` request header. This is how cookies provide persistent state, and it can be a surprising challenge for new web scrapers.

##### Scrapeground Exercise: [Cookies in Web Scraping](https://scrapfly.io/scrapeground/cookies)

 See this in-depth exercise on cookies in web scraping on Scrapfly Scrapeground. It demonstrates how login systems use cookies to track user sessions.

### Blocking Challenges

 This also introduces us to web scraper blocking. Any unusual HTTP behavior can indicate that the request is not coming from a web browser user. So, it's important to replicate the HTTP behavior of a web browser as much as possible. This includes using the same HTTP version, headers, cookies, and even connection patterns.

 We cover blocking in great detail in [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking) section.

#### Related to HTTP Blocking:

- [User-Agent Header Explanation and Intro](https://scrapfly.io/blog/posts/user-agent-header-in-web-scraping/) - this particular header is one of the most important request headers in web scraping.
- [How Headers are used to Identify Web Scrapers](https://scrapfly.io/blog/posts/how-to-avoid-web-scraping-blocking-headers/) - headers play a major role in scrape blocking; here's how it's done.

## Next up - Parsing HTML

 We've now retrieved the page's HTML, which on its own isn't very useful. Next, let's take a look at how to parse it using HTML parsing tools.



[← Previous: Reverse Engineering](https://scrapfly.io/academy/reverse-engineering) | [Next: HTML Parsing →](https://scrapfly.io/academy/html-parsing)