Knowledge Base

Q

Web scraping - what is HTTP 499 status code?

Response error 499 is a non-standard status code (nginx uses it for "client closed request") and generally means the connection was closed before a response was delivered. This could mean the client is being blocked. Here's how to fix i...

blocking

Q

Web scraping - what is HTTP 503 status code?

Response error 503 generally means the server is temporarily unavailable; however, it could also mean the client is being blocked. Here's how to fix it.

blocking

Q

Web scraping - what is HTTP 520 status code?

Response error 520 generally means the server cannot create a valid response. This could also mean the client is being blocked. Here's how to fix it.

blocking

Q

What are Cloudflare Errors 1006, 1007, 1008?

Cloudflare is a popular anti-scraping service, and errors 1006, 1007 and 1008 are common signs that a web scraper is being blocked. Here's how to avoid them.

blocking

Q

What is Cloudflare Error 1009?

Cloudflare is a popular anti-scraping service, and error 1009 (access denied) is a common sign that a web scraper is being blocked. Here's how to avoid it...

blocking

Q

What is Cloudflare Error 1010?

Cloudflare is a popular anti-scraping service, and error 1010 (access denied) is a common sign that a web scraper is being blocked. Here's how to avoid it...

blocking

Q

What is Cloudflare Error 1020?

Cloudflare error 1020 (access denied) is a common error encountered when web scraping, caused by Cloudflare's anti-scraping service. Here's how to avoid it.

blocking

Q

3 ways to install Python Requests library

The Python requests library is a popular HTTP client. Here's how to install it using pip, poetry and pipenv.

requests

Q

How to scrape Perimeter X: Please verify you are human?

Perimeter X is a popular anti-scraping protection service - here's how to avoid it when web scraping.

blocking

Q

XPath vs CSS selectors: what's the difference?

CSS selectors and XPath are both path languages for HTML parsing. XPath is more powerful but CSS is more approachable - which one is better?

css-selectors xpath

Q

How to save and load cookies in Python requests?

To keep a session between script runs we can save requests session cookies to disk and load them back. Here's how to do it in Python requests.

http python requests
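
A minimal sketch of the idea, assuming the requests package is installed. The helper names and the JSON file layout are illustrative choices, not taken from the article, and storing cookies as a plain name-to-value mapping drops domain, path and expiry metadata.

```python
import json
from pathlib import Path

import requests


def save_cookies(session: requests.Session, path: str) -> None:
    """Write the session's cookies to disk as a JSON name -> value mapping."""
    # get_dict() flattens the cookie jar; domain/path/expiry details are lost
    Path(path).write_text(json.dumps(session.cookies.get_dict()))


def load_cookies(session: requests.Session, path: str) -> None:
    """Read a JSON name -> value mapping and add it to the session's cookie jar."""
    session.cookies.update(json.loads(Path(path).read_text()))
```

A later script run could create a fresh `requests.Session()`, call `load_cookies()` on it, and continue with the stored cookies.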

Q

How to download a file with Playwright and Python?

To download files using Playwright we can either simulate a click on the download button or extract the URL and download it over HTTP. Here's how.

playwright
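
The click-based approach might look roughly like the sketch below, which assumes Playwright's synchronous Python API. The function name, selector and save path are hypothetical placeholders.

```python
def download_via_click(page, selector: str, save_path: str) -> str:
    """Click an element that triggers a download and save the file to save_path.

    `page` is assumed to be a Playwright sync-API Page object.
    """
    # expect_download() waits for a download started inside the with-block
    with page.expect_download() as download_info:
        page.click(selector)
    download = download_info.value
    download.save_as(save_path)
    # The file name suggested by the server or the page
    return download.suggested_filename
```

The alternative mentioned above (reading the link's URL and fetching it with an HTTP client) avoids the browser download machinery entirely, at the cost of having to replicate any cookies or headers the site requires.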

Q

How to get file type of a URL in Python?

There are 2 ways to determine a URL's file type: guess from the URL's extension using the mimetypes module, or send an HTTP HEAD request and read the response headers. Here's how.

python crawling
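
Both approaches can be sketched with the standard library alone. The function names are illustrative, and using urllib for the HEAD request is a choice made here to avoid third-party dependencies; it is not necessarily the client the article uses.

```python
import mimetypes
import urllib.request
from typing import Optional


def guess_type_from_url(url: str) -> Optional[str]:
    """Guess the MIME type from the URL's file extension (no network access)."""
    mime_type, _encoding = mimetypes.guess_type(url)
    return mime_type


def type_from_head_request(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the server for its Content-Type header using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # May include parameters such as "; charset=utf-8"; None if absent
        return response.headers.get("Content-Type")
```

Guessing by extension is instant but fails for URLs without one; the HEAD request costs a round trip but reports what the server actually declares.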

Q

How to load local files in Playwright?

To load local files as page URLs in Playwright we can use the file:// protocol. Here's how to do it.

playwright
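
As a rough sketch, a local path can be turned into a file:// URL with pathlib and passed to `page.goto()`. The helper names are hypothetical, and the browser part assumes Playwright and a Chromium build are installed.

```python
from pathlib import Path


def to_file_url(path: str) -> str:
    """Convert a local path into an absolute file:// URL."""
    return Path(path).resolve().as_uri()


def load_local_page(path: str) -> str:
    """Open a local HTML file in headless Chromium and return the rendered HTML."""
    # Imported here so to_file_url() also works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch()
        page = browser.new_page()
        page.goto(to_file_url(path))
        html = page.content()
        browser.close()
    return html
```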

Q

How to save and load cookies in Playwright?

To persist a Playwright session between program runs we can save cookies to disk and load them back. Here's how.

playwright
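
One way to do this, sketched under the assumption that `context` is a Playwright BrowserContext: its `cookies()` and `add_cookies()` methods exchange plain lists of dictionaries, which serialize directly to JSON. The helper names and the JSON file are illustrative.

```python
import json
from pathlib import Path


def save_context_cookies(context, path: str) -> None:
    """Dump all cookies from a Playwright BrowserContext to a JSON file."""
    Path(path).write_text(json.dumps(context.cookies()))


def load_context_cookies(context, path: str) -> None:
    """Restore cookies from a JSON file into a Playwright BrowserContext."""
    context.add_cookies(json.loads(Path(path).read_text()))
```

In a typical flow, `save_context_cookies()` would be called before closing the browser, and `load_context_cookies()` right after creating a new context with `browser.new_context()`.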

Q

How to take a screenshot with Playwright?

To take page screenshots in Playwright we can use the page.screenshot() method. Here's how to capture the full page or only selected areas.

python playwright
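
A short sketch of both variants, assuming `page` is a Playwright sync-API Page. The function name, selector and output paths are placeholders.

```python
def take_screenshots(page, selector: str, page_path: str, element_path: str) -> None:
    """Save a full-page screenshot and a screenshot of a single element."""
    # full_page=True captures the whole scrollable page, not just the viewport
    page.screenshot(path=page_path, full_page=True)
    # A locator screenshot is clipped to the element matched by the selector
    page.locator(selector).screenshot(path=element_path)
```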

Q

How to block image loading in Selenium?

To increase Selenium's performance we can block images. With the Chrome browser, this can be done through the "prefs" launch option. Here's how.

python selenium
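
A sketch of how the "prefs" option could be set, assuming Selenium 4 with Chrome and a matching driver available. The function names are hypothetical; the preference key shown is the one commonly used for this purpose.

```python
def image_blocking_prefs() -> dict:
    """Chrome preferences that disable image loading."""
    # In Chrome's content settings, 2 means "block" (1 would mean "allow")
    return {"profile.managed_default_content_settings.images": 2}


def make_driver():
    """Start Chrome with image loading disabled."""
    # Imported here so the prefs helper works without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", image_blocking_prefs())
    return webdriver.Chrome(options=options)
```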

Q

Scrapy vs Beautifulsoup - what's the difference?

Scrapy and BeautifulSoup are two popular but very different web scraping libraries. Scrapy is a framework while BeautifulSoup is an HTML parser.

beautifulsoup scrapy

Q

How to scroll to an element in Selenium?

In Selenium, the scrollIntoView JavaScript function can be used to scroll to a specific HTML element. Here's how to use it in Selenium.

python selenium
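
In Python this can be sketched as a small helper around `execute_script()`. The helper name is illustrative, and `driver` and `element` are assumed to be a Selenium WebDriver and a WebElement found earlier (for example with `driver.find_element()`).

```python
def scroll_to(driver, element) -> None:
    """Scroll the page until the given element is in view."""
    # Selenium passes `element` to the script as arguments[0]
    driver.execute_script("arguments[0].scrollIntoView();", element)
```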

Q

How to find elements by XPath selectors in Playwright?

To execute XPath selectors in Playwright the page.locator() method can be used. Here's how.

playwright xpath
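
A minimal sketch, assuming `page` is a Playwright sync-API Page; the helper name is hypothetical.

```python
def texts_by_xpath(page, xpath: str) -> list:
    """Return the text content of every element matched by an XPath expression."""
    # The "xpath=" prefix selects Playwright's XPath engine explicitly
    return page.locator(f"xpath={xpath}").all_text_contents()
```

For example, `texts_by_xpath(page, "//h2")` would collect the text of all second-level headings on the current page.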

Q

How to block resources in Playwright and Python?

Blocking non-vital resources can drastically speed up Playwright. To do that, the page request interception feature can be used. Here's how.

python playwright
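
A sketch of request interception with `page.route()`, assuming Playwright's sync API. Which resource types count as non-vital is an assumption made here and depends on the scraping target.

```python
# Assumed set of resource types that are not needed for data extraction
BLOCKED_RESOURCE_TYPES = {"image", "font", "media", "stylesheet"}


def block_unneeded(route) -> None:
    """Route handler: abort blocked resource types, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def enable_blocking(page) -> None:
    """Intercept every request made by the page and filter it."""
    page.route("**/*", block_unneeded)
```

`enable_blocking(page)` would need to be called before `page.goto()` so that the handler is in place when the first requests go out.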

Q

How to capture background requests and responses in Playwright?

To capture background requests and responses in Playwright we can use the request/response interception feature through the page.on() method. Here's how.

python playwright
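
The listener approach might be sketched as follows, assuming `page` is a Playwright sync-API Page. The helper name and the fields recorded are illustrative choices.

```python
def capture_responses(page) -> list:
    """Record basic details of every response the page receives.

    Returns a list that keeps filling up as responses arrive.
    """
    captured = []

    def on_response(response):
        captured.append({
            "url": response.url,
            "status": response.status,
            "resource_type": response.request.resource_type,
        })

    # Register the listener before navigating so early responses are not missed
    page.on("response", on_response)
    return captured
```

Background data requests could then be picked out of the returned list by filtering on `resource_type` values such as "xhr" or "fetch".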

Q

How to find elements by CSS selectors in Playwright?

To execute CSS selectors on current HTML data in Playwright the page.locator() method can be used. Here's how.

python playwright
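
As a brief sketch, assuming `page` is a Playwright sync-API Page, a CSS selector can be combined with `count()`, `nth()` and `get_attribute()` to read an attribute from every match. The helper name is hypothetical.

```python
def attributes_by_css(page, css: str, attribute: str) -> list:
    """Return the given attribute of every element matched by a CSS selector."""
    # The "css=" prefix selects Playwright's CSS engine explicitly
    locator = page.locator(f"css={css}")
    return [locator.nth(i).get_attribute(attribute) for i in range(locator.count())]
```

For example, `attributes_by_css(page, "a.product", "href")` would list the link targets of all anchors with the class `product`.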

Q

How to parse dynamic CSS classes when web scraping?

Dynamic CSS classes can be very difficult to scrape. There are a few tricks and common idioms to approach this though.

data-parsing
