
Knowledge Base

Quick answers to common web scraping questions (161 answers).

Answers

24 answers

How to select all elements between two elements in XPath?

To select all elements between two different elements, the preceding-sibling or following-sibling axis can be used. Here's how.

data-parsing xpath

How to check if element exists in Playwright?

To check whether an HTML element is present on the page using Playwright, the page.locator() method can be used. Here's how.

python playwright

What is Asynchronous Web Scraping?

Asynchronous programming is an accessible way to scale around IO blocking, which is especially powerful in web scraping, where most of the time is spent waiting for network responses. Here's why.

http
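A minimal sketch of why this helps, using only the standard library: `fake_fetch` and `scrape_all` are hypothetical names, and `fake_fetch` stands in for a real network request by simulating the I/O wait with `asyncio.sleep`. A real scraper would use an async HTTP client such as httpx's `AsyncClient` instead. Because the waits overlap, three requests take about as long as one.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float) -> str:
    """Hypothetical stand-in for an HTTP request that waits on simulated network I/O."""
    await asyncio.sleep(delay)  # while this task waits, the event loop runs the others
    return f"response from {url}"


async def scrape_all(urls: list[str], delay: float) -> list[str]:
    # Start every "request" at once; gather returns results in input order.
    return list(await asyncio.gather(*(fake_fetch(url, delay) for url in urls)))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(3)]
    start = time.perf_counter()
    results = asyncio.run(scrape_all(urls, delay=1.0))
    print(results)
    # Roughly 1 second in total rather than 3, because the three waits overlap.
    print(f"took {time.perf_counter() - start:.1f}s")
```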

What are devtools and how are they used in web scraping?

The developer tools suite is used in web development, but it can also be used in web scraping to understand how target websites work. Here's how to use it.

http data-parsing css-selectors

What is MITM and how is it used in web scraping?

MITM (man-in-the-middle) tools can be used during web scraper development to intercept and modify the HTTP traffic of applications such as web browsers or phone apps.

http

What is cURL and how is it used in web scraping?

cURL is one of the most popular HTTP clients, and its library (libcurl) implements most HTTP features, which makes it a powerful web scraping tool too.

http

How to use VPNs as proxies for web scraping?

VPNs can be used as IP proxies in web scraping. Here's how and what to keep an eye on when using this approach.

proxies

What is the difference between IPv4 and IPv6 in web scraping?

IPv4 and IPv6 are two Internet Protocol versions that have different advantages when it comes to web scraping. Here's what they are.

http proxies
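One concrete difference is the size of the address space, which affects how many distinct IP addresses a proxy pool can draw from. A quick comparison with Python's standard `ipaddress` module; the `/24` and `/64` subnets are illustrative choices taken from the reserved documentation ranges, not real proxy networks.

```python
import ipaddress

# Entire address spaces: 32-bit IPv4 vs 128-bit IPv6.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses

# Illustrative subnets from the reserved documentation ranges.
ipv4_block = ipaddress.ip_network("192.0.2.0/24")   # TEST-NET-1
ipv6_block = ipaddress.ip_network("2001:db8::/64")  # IPv6 documentation prefix

print(f"IPv4 address space: {ipv4_space:,}")
print(f"IPv6 address space: {ipv6_space:.3e}")
print(f"one IPv4 /24 block: {ipv4_block.num_addresses:,} addresses")
print(f"one IPv6 /64 block: {ipv6_block.num_addresses:,} addresses")
```

A single IPv6 `/64` block holds 2^64 addresses, far more than the whole IPv4 space of 2^32.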

HTTP vs HTTPS in web scraping?

HTTPS is the secure (TLS-encrypted) version of the HTTP protocol, which can complicate the web scraping process in several ways. Here's what it means.

http

What role do HTTP cookies play in web scraping?

HTTP cookies play a big role in web scraping: they can be used to configure website preferences, and they play an important role in scraper detection.

http
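As a minimal illustration of the mechanics, Python's standard `http.cookies` module can parse the `Set-Cookie` headers a server sends and rebuild the `Cookie` header a client would send back. The header values below are made-up examples, one session cookie and one preference cookie.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values a site might send on the first visit.
set_cookie_headers = [
    "session_id=abc123; Path=/; HttpOnly",
    "currency=USD; Path=/; Max-Age=3600",
]

jar = SimpleCookie()
for header in set_cookie_headers:
    jar.load(header)  # parses name=value plus attributes such as Path and HttpOnly

# The Cookie request header carries only the name=value pairs, not the attributes.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)  # session_id=abc123; currency=USD
```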

How to use cURL in Python?

cURL's library, libcurl, is widely used for HTTP connections and can be used from Python through wrapper libraries like pycurl.

http python

Selenium: geckodriver executable needs to be in PATH?

The Selenium error "geckodriver executable needs to be in PATH" means that geckodriver is not installed or not reachable. Here's how to fix it.

python selenium

Selenium: chromedriver executable needs to be in PATH?

The Selenium error "chromedriver executable needs to be in PATH" means that chromedriver is not installed or not reachable. Here's how to fix it.

python selenium
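Both the geckodriver and chromedriver PATH errors come down to the driver binary not being found. A quick way to diagnose this is the standard library's `shutil.which`, which searches `PATH` for an executable the way a shell does. `check_driver` is a hypothetical helper name.

```python
import shutil


def check_driver(name: str) -> str:
    """Hypothetical helper: report whether a WebDriver binary is reachable via PATH."""
    path = shutil.which(name)  # returns None when no executable with this name is on PATH
    if path is None:
        return f"{name} not found on PATH: install it or add its folder to PATH"
    return f"{name} found at {path}"


if __name__ == "__main__":
    for driver in ("geckodriver", "chromedriver"):
        print(check_driver(driver))
```

If the driver is missing, either install it and add its folder to `PATH`, or upgrade Selenium: recent Selenium 4 releases ship Selenium Manager, which can locate or download a matching driver automatically.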

How to configure Python requests to use a proxy?

Python requests supports many proxy types and options. Here's how to configure most proxy options for web scraping.

requests
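For reference, requests takes proxies as a dict that maps the target URL scheme (`"http"`, `"https"`) to a proxy URL, passed as `proxies=` to a request call. The sketch below builds that dict using only the standard library. `build_proxies` is a hypothetical helper, and the host, port and credentials are placeholders. Credentials are percent-encoded because characters like `@` or `:` would otherwise break the proxy URL.

```python
from urllib.parse import quote


def build_proxies(host: str, port: int, user: str = "", password: str = "") -> dict[str, str]:
    """Hypothetical helper: build a requests-style proxies dict for an HTTP proxy."""
    auth = ""
    if user:
        # Percent-encode so characters like '@' or ':' don't break the proxy URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    # Route both plain-HTTP and HTTPS target URLs through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    proxies = build_proxies("proxy.example.com", 8080, "user", "p@ss")
    print(proxies)
    # Usage with requests (not executed here):
    # requests.get("https://example.com", proxies=proxies, timeout=10)
```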

How to fix Python requests TooManyRedirects error?

Python's requests.TooManyRedirects exception is raised when the server keeps redirecting more than 30 times (the default redirect limit). Here's how to fix it.

python requests

How to fix Python requests SSLError?

Python's requests.SSLError is raised when the SSL/TLS certificate of an HTTPS URL fails verification, for example when it does not match the domain. Here's how to fix it.

python requests

How to fix Python requests ReadTimeout error?

Python's requests.ReadTimeout is raised when the server does not send data within the configured read timeout. Here's how to fix it.

python requests

How to fix Python requests MissingSchema error?

Python's "requests.MissingSchema" exception is usually caused by a missing protocol part (such as https://) in the URL, most commonly when a relative URL is used.

python requests
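The usual fix is to turn a relative URL into an absolute one before requesting it. A minimal sketch with the standard library's `urllib.parse.urljoin`, assuming the base is the URL of the page the link was scraped from. `make_absolute` is a hypothetical helper and the URLs are placeholders.

```python
from urllib.parse import urljoin, urlparse


def make_absolute(page_url: str, link: str) -> str:
    """Hypothetical helper: resolve a scraped link against the page it came from."""
    absolute = urljoin(page_url, link)  # already-absolute links are returned unchanged
    if urlparse(absolute).scheme not in ("http", "https"):
        raise ValueError(f"cannot build an http(s) URL from {link!r}")
    return absolute


if __name__ == "__main__":
    page_url = "https://example.com/products/shoes/"
    for link in ["/products/1", "details?id=2", "https://cdn.example.com/img.png"]:
        print(make_absolute(page_url, link))
```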

How to fix Python requests ConnectTimeout error?

Python requests' ConnectTimeout exception is raised when a connection to the server cannot be established within the configured connect timeout. Here's how to fix it.

python requests
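Common fixes for both ReadTimeout and ConnectTimeout are a longer timeout and retrying with a growing delay. In requests, `timeout` accepts a `(connect, read)` tuple, e.g. `requests.get(url, timeout=(5, 30))`. The retry part is sketched below with the standard library only: `retry_with_backoff` is a hypothetical helper, and `flaky_request` simulates a request that times out twice. With requests you would pass `retry_on=(requests.exceptions.Timeout,)`, the base class of both timeout errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Hypothetical helper: call func, retrying on the given exception types.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_request() -> str:
        # Simulated request that times out on the first two calls.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TimeoutError("simulated timeout")
        return "200 OK"

    print(retry_with_backoff(flaky_request, base_delay=0.1))  # 200 OK
```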

How to run Playwright in Jupyter notebooks?

Learn why Playwright's synchronous API is blocked in Jupyter notebooks and how to solve it using asyncio.

playwright jupyter

How to open Python HTTP responses in a web browser?

To preview Python HTTP responses, we can use temporary files and the built-in webbrowser module. Here's how.

python requests httpx
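A minimal sketch of the temporary-file approach using only the standard library. `open_in_browser` is a hypothetical helper name, and the `html` argument stands in for `response.text` from requests or httpx.

```python
import tempfile
import webbrowser
from pathlib import Path


def open_in_browser(html: str, launch: bool = True) -> Path:
    """Hypothetical helper: save HTML to a temp file and open it in the default browser."""
    # delete=False keeps the file after it is closed so the browser can still read it.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
        f.write(html)
        path = Path(f.name)
    if launch:
        webbrowser.open(path.as_uri())  # opens a file:///... URL in the default browser
    return path


# Usage with requests or httpx (not executed here):
# open_in_browser(response.text)
```

Note that relative links to stylesheets or images in the saved page resolve against the local file path, so the preview may look unstyled.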

Web scraping - what is the HTTP 520 status code?

Response error 520 generally means the server cannot create a valid response. This could also mean the client is being blocked. Here's how to fix it.

blocking

Web scraping - what is the HTTP 503 status code?

Response error 503 generally means the server is temporarily unavailable; however, it could also mean the client is being blocked. Here's how to fix it.

blocking
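When a 503 reflects genuine temporary unavailability, the server may include a `Retry-After` header, given either as a number of seconds or as an HTTP date. A minimal sketch of turning it into a wait time with the standard library. `retry_delay` is a hypothetical helper and the 5-second fallback is an arbitrary assumption. If retries keep returning 503 regardless, blocking is the more likely cause.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: Optional[str], default: float = 5.0) -> float:
    """Hypothetical helper: seconds to wait before retrying a 503 response."""
    if not retry_after:
        return default  # no hint from the server: use the fallback delay
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # malformed header: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # Dates in the past mean "retry now".
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```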

Web scraping - what is the HTTP 499 status code?

Response error 499 generally means the connection was closed before the server could send a response (nginx uses it for "client closed request"). This could mean the client is being blocked. Here's how to fix it.

blocking
