# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)

##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- [API Specification]()
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- [API Specification]()
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- [API Specification]()
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- [API Specification]()
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)

##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# Scraper Blocking

Unfortunately, many websites do not want to be scraped despite serving their content publicly. To prevent scraping, countless technologies have been developed to detect and block web scrapers.

## Introduction

We can separate the scraper-blocking subject into two categories: unintentional blocking caused by scraper misconfiguration, and intentional detection and blocking.

### Scraper Misconfiguration

This is unintentional scraper blocking that happens when scrapers fail to replicate the request details a site expects. Most commonly, it is caused by missing headers, cookies or JavaScript functionality.

See these related Scrapeground exercises:

- **[Referer Header Exercise](https://scrapfly.io/scrapeground/headers/referer)**: The Referer header tells the website which page led to the current one, making it an easy way to identify web scrapers.
- **[CSRF and Similar Tokens Exercise](https://scrapfly.io/scrapeground/headers/csrf)**: Tokens like CSRF are required to scrape hidden web APIs without being blocked.

So, it's important to replicate all request details in the scraper configuration: headers, cookies, secret tokens and even connection patterns. This avoids detection through HTTP configuration alone.
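As a minimal sketch of this idea, using Python's standard library (the header values below are illustrative examples, not guaranteed to match any particular browser version):

```python
import urllib.request

# Headers a real browser would send; values here are illustrative examples.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # Referer tells the site which page led here - a common scraper giveaway
    # when it is missing (see the Referer exercise above).
    "Referer": "https://example.com/",
}

request = urllib.request.Request("https://example.com/page", headers=browser_headers)

# urllib normalizes header casing internally; inspect what will be sent:
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Real scraping requires matching far more than these few headers, but the principle is the same: every detail the browser sends and the scraper doesn't is a potential fingerprint.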

### Scraper Identification

Scrapers are also blocked intentionally. This is done by identifying whether incoming requests come from human-operated browsers or automated programs. As there are many differences between the two, identifying scrapers is often trivial unless the scraper is properly prepared.

Web scraper blocking is a major subject, and we recommend taking a look at the complete introduction on the Scrapfly blog 👇

- **[Intro to Web Scraper Blocking](https://scrapfly.io/blog/posts/how-to-scrape-without-getting-blocked-tutorial/)**: This introduction hub covers web scraper blocking in all of its multifaceted forms: IP addresses, proxies, fingerprinting and all the sneaky tech used to identify web scrapers for blocking.

### Honeypots

Honeypots are a common technique used to identify scrapers: hidden parts of the page that are invisible to human users but reachable by automated programs. When a scraper interacts with a honeypot, it can be easily identified and blocked.

This means that scrapers need to be strict in their scraping logic to avoid stumbling into any honeypots.
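To illustrate, here is a simplified sketch (Python standard library; the filtering heuristics are illustrative) of collecting only the links a human user could actually see:

```python
from html.parser import HTMLParser


class VisibleLinkParser(HTMLParser):
    """Collect <a href> targets, skipping links hidden from human users."""

    def __init__(self):
        super().__init__()
        self.visible_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        # Heuristic honeypot checks: hidden attribute or inline CSS hiding.
        if "hidden" in attrs or "display:none" in style or "visibility:hidden" in style:
            return
        if attrs.get("href"):
            self.visible_links.append(attrs["href"])


html = """
<a href="/products">Products</a>
<a href="/trap" style="display: none">secret</a>
<a href="/trap2" hidden>secret</a>
"""
parser = VisibleLinkParser()
parser.feed(html)
print(parser.visible_links)  # only the human-visible link survives
```

Note that real honeypots may also be hidden through external stylesheets or off-screen positioning, which a simple attribute check like this won't catch.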

- **[Intro to Honeypots in Scraping](https://scrapfly.io/blog/posts/what-are-honeypots-and-how-to-avoid-them/)**: What exactly honeypots are, how they are relevant in web scraping, and some popular examples with solutions on how to bypass them.

### Captchas

Captchas are a way for websites to validate whether there is a human at the other end of the connection: an interactive task that is difficult for robots to solve but easy for humans.

Fortunately, nobody likes captchas, so they are used as a last resort when blocking web scrapers. This is typically done through a trust score calculation: the connection is analyzed first, and only the least trustworthy connections are served captcha challenges.

This means that to bypass captchas we can either fortify our scraper's trust score so it never receives a challenge in the first place, or solve captchas using image recognition software and similar solvers.

- **[Intro to Captchas in Scraping](https://scrapfly.io/blog/posts/how-to-bypass-captcha-while-web-scraping-in-2024/)**: How web scrapers are fortified to bypass and avoid captcha challenges and an overview of captcha tech in scraping.

## Anti-bot Protection Services

Identifying and blocking scrapers is a complex process and, in turn, has become a major industry. This has brought dedicated services, commonly called Web Application Firewalls (WAFs), which shield the entire website from scrapers and other unwanted connections.

These services are very difficult to bypass, and each has its own unique way of identifying scrapers, so each should be approached as an individual challenge. Here are some of the most popular ones and our intro articles on them:

- [Cloudflare](https://scrapfly.io/blog/posts/how-to-bypass-cloudflare-anti-scraping/)
- [Akamai](https://scrapfly.io/blog/posts/how-to-bypass-akamai-anti-scraping/)
- [Kasada](https://scrapfly.io/blog/posts/how-to-bypass-kasada-anti-scraping-waf/)
- [Datadome](https://scrapfly.io/blog/posts/how-to-bypass-datadome-anti-scraping/)
- [Imperva Incapsula](https://scrapfly.io/blog/posts/how-to-bypass-imperva-incapsula-anti-scraping/)
- [PerimeterX](https://scrapfly.io/blog/posts/how-to-bypass-perimeterx-human-anti-scraping/)

If you're unsure which WAF you're dealing with, there are many tools like [wafw00f](https://github.com/EnableSecurity/wafw00f) that can detect which WAF a site is using.
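For illustration, a toy version of this kind of header-based detection might look like the following. The signature list is a small, simplified sample of well-known marker headers, not an exhaustive or authoritative set; real tools use many more probes:

```python
# Rough sketch of header-based WAF detection, in the spirit of tools
# like wafw00f. Signatures below are a simplified, illustrative sample.
WAF_SIGNATURES = {
    "Cloudflare": ["cf-ray", "cf-cache-status"],
    "Akamai": ["x-akamai-transformed", "akamai-grn"],
    "Imperva Incapsula": ["x-iinfo", "x-cdn"],
}


def detect_waf(response_headers):
    """Return the names of WAFs whose signature headers are present."""
    present = {name.lower() for name in response_headers}
    return [
        waf for waf, markers in WAF_SIGNATURES.items()
        if any(marker in present for marker in markers)
    ]


# Example response headers, as a scraper might see them:
headers = {"Content-Type": "text/html", "CF-RAY": "8a1b2c3d4e5f-AMS"}
print(detect_waf(headers))  # ['Cloudflare']
```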

> ## Easy Mode with Scrapfly
> 
>  One of the main features of Scrapfly is the brilliant blocking bypass offered by Scrapfly's [Anti Scraping Protection Bypass](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) which brings back the fun in web scraping as you can focus on the data you want to extract instead of committing time and resources to bypass blocking.
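As a sketch of what that looks like in practice, the bypass is enabled with a single request parameter on the Web Scraping API (the API key below is a placeholder; see the Anti Scraping Protection docs linked above for the full parameter reference):

```python
from urllib.parse import urlencode

# Placeholder credentials - substitute your own project key.
params = {
    "key": "YOUR_API_KEY",
    "url": "https://example.com/",
    "asp": "true",  # enable Anti Scraping Protection bypass
}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(api_url)
```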

## Community Tools and Extensions

As blocking is such a major issue in web scraping, many community tools exist to help bypass common blocking techniques. These range from patches for known identification leaks to external services that alter and manage connections. Here are some of the most popular ones:

- **[Curl Impersonate](https://scrapfly.io/blog/posts/curl-impersonate-scrape-chrome-firefox-tls-http2-fingerprint/)**: Extends the famous cURL HTTP client with TLS and HTTP fingerprints that mimic real Chrome and Firefox web browsers.
- **[FlareSolverr Server](https://scrapfly.io/blog/posts/how-to-bypass-cloudflare-with-flaresolverr/)**: A proxy server for solving Cloudflare JavaScript challenges.
- **[Undetected-Chromedriver](https://scrapfly.io/blog/posts/web-scraping-without-blocking-using-undetected-chromedriver/)**: A Selenium security plugin that prevents known scraper detection leaks.

## Next - Proxies

Next, let's take a look at one of the most important tools for bypassing scraper blocking: IP proxy use and rotation.
