# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)

##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- API Specification
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- API Specification
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- API Specification
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- API Specification
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)

##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# Scraping Tools and Languages

Web scraping can be done in almost any programming language, and we cover hands-on **introductions** to the most popular ones:

- [Python \[recommended\]](https://scrapfly.io/blog/posts/web-scraping-with-python/): The most popular and accessible language for web scraping. The best overall choice, with many great libraries and built-in tools.
- [Typescript \[recommended\]](https://scrapfly.io/blog/posts/ultimate-intro-to-web-scraping-with-typescript/): The second most popular choice for web scraping. Strong web development ecosystem but weaker web scraping packages.
- [PHP](https://scrapfly.io/blog/posts/web-scraping-with-php-101/): A classic web backend language with all the right tools for scraping, but it lacks data tooling and can be difficult to work with.
- [R](https://scrapfly.io/blog/posts/web-scraping-with-r/): A popular choice for statisticians and data scientists. Strong data processing ecosystem but weaker web scraping packages.
- [Ruby](https://scrapfly.io/blog/posts/web-scraping-with-ruby/): Similar to Python in many ways, with a strong web development ecosystem but weaker web scraping packages.

While web scraping can be done in almost any language, not every language is equally fit for this diverse niche. Existing library support, such as a modern HTTP client, a browser control client, and HTML and JSON parsers, is what matters most for successful scraping. On this front, **Python** and **Javascript** (Typescript) are generally considered the best overall options.

> Scrapfly's [Python](https://scrapfly.io/docs/sdk/python), [Typescript](https://scrapfly.io/docs/sdk/typescript) and [Golang](https://scrapfly.io/docs/sdk/golang) SDKs come with batteries included and handle all the hard parts for you!

## Web Scraping Libraries

Web scraping covers a variety of niches depending on the scraped target, but we can divide web scraping libraries into a few primary categories:

### HTTP Clients

HTTP clients are used to make HTTP requests to the target website. They are the most basic building block of web scraping and are used to fetch the HTML of the target page.

HTTP clients are also used to communicate with the Scrapfly API, though Scrapfly uses its own HTTP client optimized for scraping.

 

- [Introduction to HTTPX](https://scrapfly.io/blog/posts/web-scraping-with-python-httpx/): HTTPX is one of the most popular HTTP clients for Python. It's modern, fast, and asynchronous, which is ideal for web scraping.
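To make the idea concrete, here is a minimal fetch sketch using only Python's standard-library `urllib`; the `data:` URL is a stand-in page so the snippet runs offline, and in practice you would point a client like HTTPX at a real `https://` URL:

```python
from urllib.request import urlopen

# A data: URL stands in for a real page so this example runs offline;
# in practice you would fetch an https:// URL, ideally with a client
# like HTTPX for HTTP/2 and async support.
response = urlopen("data:text/html,<h1>Hello</h1>")
html = response.read().decode()
print(html)  # <h1>Hello</h1>
```

Everything else in web scraping builds on this step: the fetched `html` string is what the parsers below operate on.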

 

### Browser Control Clients

- [Scraping Using Browsers](https://scrapfly.io/blog/posts/scraping-using-browsers/): Web scraping with the help of web browsers is one of the easiest ways to scrape modern web pages, though it can get complex.

As an alternative to HTTP clients, browser automation clients allow controlling real web browsers for scraping. This is useful for scraping [dynamic web pages](https://scrapfly.io/academy/dynamic-scraping) that require Javascript to display the desired data.

It's not without its downsides, however. Browsers are extremely resource-intensive and complex, which can make them difficult to manage at scale.

 

 

There are multiple browser automation tools available, though these three are the most popular in web scraping:

- [Playwright](https://scrapfly.io/blog/posts/web-scraping-with-playwright-and-python/): The most modern client, accessible from Python and Javascript. Supports Chromium, Firefox, and WebKit browsers.
- [Selenium](https://scrapfly.io/blog/posts/web-scraping-with-selenium-and-python/): The classic choice with the biggest web scraping community.
- [Puppeteer](https://scrapfly.io/blog/posts/web-scraping-with-puppeteer-and-nodejs/): The predecessor to Playwright. Only available in NodeJS, with a sizable scraping community around it.

> Scrapfly's [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering) and [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario) features are the next evolution of browser automation.

### HTML Parsers

HTML parsers are used to extract the desired data from scraped HTML. Parsing is not only used to process results but also for parts of the web scraping logic itself, like finding page links to follow when crawling or indicating which elements the scraper should interact with when using interaction features (like clicking buttons).

There are two primary technologies used for HTML parsing, each with its own query language:

- [CSS Selectors](https://scrapfly.io/blog/posts/parsing-html-with-css/): Used to select HTML elements for style application, but can also be used to select elements for web automation.
- [XPath](https://scrapfly.io/blog/posts/parsing-html-with-xpath/): A more powerful alternative to CSS Selectors. Often used where more complex data selection is needed.

Alternatively, many HTML parsing clients implement native methods that perform very similarly to XPath or CSS Selectors:

- [Introduction to BeautifulSoup](https://scrapfly.io/blog/posts/web-scraping-with-python-beautifulsoup/): The most popular HTML parsing library for Python. It has its own native methods like `find` and `find_all` as well as CSS selector support.

> Scrapfly Python and Typescript SDKs include access to both XPath and CSS selectors through the `scrape_response.selector` property.
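To illustrate element selection without any third-party dependencies, here is a sketch using the standard library's `ElementTree`, which supports a limited XPath subset; real scrapers typically reach for BeautifulSoup, lxml, or parsel, which accept messy real-world HTML and full CSS/XPath queries:

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed HTML fragment standing in for a scraped page.
html = """<html><body>
  <div class="product"><a href="/item/1">Widget</a><span class="price">9.99</span></div>
  <div class="product"><a href="/item/2">Gadget</a><span class="price">19.99</span></div>
</body></html>"""

root = ET.fromstring(html)
# Limited XPath: find every product container, then drill into its children.
for product in root.findall(".//div[@class='product']"):
    name = product.find("a").text
    price = product.find("span[@class='price']").text
    print(name, price)
```

The loop prints `Widget 9.99` and `Gadget 19.99`; the same two-step pattern (select containers, then extract fields) is how most listing pages are parsed regardless of library.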

### JSON Parsers

JSON is an increasingly popular web data format, and many modern pages contain JSON data that is used to render the page. These scraped datasets are often big and complex, requiring a dedicated JSON parser to extract the desired data.

There are many ways to process JSON, but tools that mirror HTML parsing techniques like CSS Selectors and XPath are the most popular in web scraping:

- [JMESPath](https://scrapfly.io/blog/posts/parse-json-jmespath-python/): With client support for almost every programming language, JMESPath can reshape, clean up, and parse JSON.
- [JSONPath](https://scrapfly.io/blog/posts/parse-json-jsonpath-python/): Inspired by XPath, JSONPath mirrors many of the same features, but for JSON instead of HTML.
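As a sketch of what these query languages automate, here is a hand-rolled dot-path lookup using only the standard library; the data is hypothetical, and JMESPath/JSONPath generalize this idea with filters, slices, and projections:

```python
import json
from functools import reduce

# Nested JSON of the kind embedded in modern pages (hypothetical data).
raw = '{"product": {"name": "Widget", "offers": [{"price": 9.99}, {"price": 12.49}]}}'
data = json.loads(raw)

def search(path, obj):
    """Walk a dotted path through nested dicts and lists."""
    def step(current, key):
        return current[int(key)] if isinstance(current, list) else current[key]
    return reduce(step, path.split("."), obj)

print(search("product.name", data))            # Widget
print(search("product.offers.1.price", data))  # 12.49
```

A dedicated parser earns its keep once paths need wildcards or conditions, e.g. "all offers under 10", which this naive walker cannot express.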

### Utility

There are more powerful utility libraries that benefit web scrapers than we can list, but here are some categories to keep an eye out for:

- URL formatting: creating, mixing, and modifying URLs can get surprisingly complex.
- Regular expression helpers and extensions: regex is a powerful tool for text processing but can be difficult to use.
- Data parsing utilities: there's a lot of free-form data in web scraping, which can be difficult to navigate without proper tooling.
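The URL point is easy to underestimate; Python's standard-library `urllib.parse` already covers most of it, and a short sketch shows why hand-concatenating URLs goes wrong:

```python
from urllib.parse import urljoin, urlencode, urlparse

# Resolve a relative link found on a page against the page's own URL.
page_url = "https://example.com/catalog/page/2"
print(urljoin(page_url, "../../item/42"))  # https://example.com/item/42

# Build a query string safely (handles spaces and special characters).
query = urlencode({"q": "wireless mouse", "page": 3})
print(f"https://example.com/search?{query}")

# Decompose a URL into its parts for inspection or rewriting.
parts = urlparse("https://example.com/search?q=mouse")
print(parts.netloc, parts.path, parts.query)
```

Crawlers in particular lean on `urljoin`, since links found in HTML are usually relative and must be resolved before they can be fetched.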
 
 

- [Top Web Scraping Libraries for Python](https://scrapfly.io/blog/posts/top-10-web-scraping-libraries-in-python/): We list our top web scraping library selection, ranging from scraping itself to the many utilities around it.

 

### Scraping Frameworks

While web scraping frameworks are becoming less popular as the modern web grows more complex, they can still be a good choice for many web scraping projects.

- [Introduction to Scrapy](https://scrapfly.io/blog/posts/web-scraping-with-scrapy/): Scrapy is a big framework, but this tutorial is a good way to get started!

Scrapy is by far the most popular framework for web scraping at scale. Here are some things to look out for when evaluating any framework:

- Modern HTTP client: HTTP/2 support, async support, proxy support, etc.
- Active development: web scraping is a rapidly evolving subject.
- Concurrency: as web scraping is an IO-bound task, concurrency is a must for scaling any web scraping project.
 
 

> Scrapfly SDK [includes scrapy integration](https://scrapfly.io/docs/sdk/scrapy) powering up basic scrapy spiders with all of the Scrapfly functionality!



---

While there's a lot of research to do when choosing the right web scraping environment, we can pick it up as we go. Further on we'll be using **Python** and **Typescript** for our examples, but before that let's take a look at some common web scraping terms.

 
