
Knowledge Base

Quick answers to common web scraping questions (161 answers)

Answers

24 answers
Q

Python httpx vs requests vs aiohttp - key differences

These three popular HTTP client packages have different strengths. Here's how to choose the right fit.

http python httpx
Q

How to scrape HTML table to Excel Spreadsheet (.xlsx)?

To scrape HTML tables into an Excel spreadsheet we can use the bs4, requests and xlsxwriter packages for Python. Here's how.

python data-parsing
Q

How to edit Local Storage data using browser Devtools

To edit Local Storage, open the browser's developer tools and go to Application tab -> Storage -> Local Storage, where each entry is shown as a key-value pair.

tools
Q

How to edit cookies in Chrome devtools?

To edit cookies in Chrome's devtools, the Application -> Cookies section can be used. Here's how.

tools
Q

How to click on cookie popups and modal alerts in Selenium?

To click on modal alerts like cookie popups in Selenium we can either find the button and click it or remove the modal elements. Here's how.

selenium
Q

How to click on cookie popups and modal alerts in Puppeteer?

To handle modal popups like cookie consents in Puppeteer the popup can be closed through a button click or removed entirely. Here's how.

puppeteer
Q

How to click on cookie popups and modal alerts in Playwright?

To click on modal popups like the infamous cookie consent alert we can either find and click the agree button or remove the popup entirely. Here's how.

playwright
Q

How to handle popup dialogs in Selenium?

To click on a pop-up alert using Selenium, the alert_is_present expected condition can be used to wait for and interact with alerts. Here's how.

selenium
Q

How to handle popup dialogs in Puppeteer?

To click on a popup dialog in Puppeteer, the dialog event can be captured and handled using the page.on("dialog") method. Here's how to do it.

puppeteer
Q

How to handle popup dialogs in Playwright?

To handle alert-type pop-ups in Playwright, the "dialog" event can be captured and handled in both the Python and NodeJS Playwright clients.

python playwright
Q

How to scroll to the bottom of the page with Selenium?

To scroll to the very bottom of the page the javascript evaluation feature can be used within a while loop. Here's how.

selenium
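
The while-loop approach described above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the function name scroll_to_bottom and the pause and max_rounds parameters are assumptions made for this example, and driver is expected to be any object that exposes Selenium's execute_script method.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the document.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazy-loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height did not change, so assume this is the real bottom
        last_height = new_height
    return rounds
```

The max_rounds cap is a safety net for endlessly growing feeds; without it the loop would only stop when the page height settles.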
Q

How to scroll to the bottom of the page with Puppeteer?

To scroll to the very bottom of the page with Puppeteer the javascript evaluation feature can be used within a while loop. Here's how.

puppeteer
Q

How to scroll to the bottom of the page with Playwright?

Learn how to scroll to the bottom of the page with Playwright using three distinct approaches for both Python and NodeJS clients.

playwright
Q

How to install mitmproxy certificate on Chrome and Chromium?

Here are 5 easy steps to install the SSL certificate that enables HTTPS traffic capture in mitmproxy, a tool used for intercepting and analyzing HTTP traffic.

http tools
Q

How to capture background requests and responses in Selenium?

To capture background requests and responses, Selenium needs to be extended with Selenium-wire. Here's how to do it.

selenium
Q

How to block resources in Selenium and Python?

To block HTTP resources in Selenium we need an external proxy. Here's how to set up mitmproxy to block requests and responses in Selenium.

selenium
Q

How to use proxies with Python httpx?

To use proxies with Python's httpx library the proxies parameter can be used for http, https and socks5 proxies. Here's how.

http python httpx
Q

How to use proxies with PHP Guzzle?

To use proxies with the PHP Guzzle library, the proxy parameter can be used, which mirrors the standard configuration patterns of the cURL library.

http php
Q

How to use proxies with NodeJS axios?

To use proxies with axios and nodejs the proxy parameter of get and post methods can be used. Here's how.

http nodejs
Q

What are scrapy middlewares and how to use them?

Scrapy downloader middlewares can be used to intercept and update outgoing requests and incoming responses. Here's how to use them.

scrapy
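
As a rough sketch of the idea above, a downloader middleware is a plain class with process_request and process_response hooks. The class name CustomHeaderMiddleware and the X-Example header are hypothetical, chosen only for illustration; the hook signatures follow the long-documented Scrapy downloader middleware interface.

```python
class CustomHeaderMiddleware:
    """Hypothetical downloader middleware: tags requests and logs responses."""

    def process_request(self, request, spider):
        # Update the outgoing request before it is downloaded.
        request.headers["X-Example"] = "knowledge-base"
        return None  # None tells Scrapy to keep processing this request

    def process_response(self, request, response, spider):
        # Inspect the incoming response before it reaches the spider callback.
        spider.logger.info("Got %s for %s", response.status, request.url)
        return response  # hand the response on to the next middleware
```

To take effect, such a class would be registered in the project's DOWNLOADER_MIDDLEWARES setting with an order number.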
Q

How to select elements by attribute value in XPath?

To select HTML elements by attribute value, the @ syntax can be used together with the = operator or the contains() function. Here's how.

xpath
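
A small sketch of exact attribute matching with the @ syntax, using Python's standard library xml.etree.ElementTree on a made-up HTML fragment (the markup and class names are assumptions for this example). ElementTree implements only a subset of XPath and does not support contains(), so only the = form is run here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
html = """
<div>
  <a class="product" href="/item/1">First</a>
  <a class="product featured" href="/item/2">Second</a>
  <a class="nav" href="/about">About</a>
</div>
""".strip()

root = ET.fromstring(html)

# [@class='product'] matches only when the whole attribute value is equal.
exact = root.findall(".//a[@class='product']")
print([a.text for a in exact])  # ['First']
```

With a full XPath engine such as lxml, an expression like //a[contains(@class, 'product')] would also match the second link, since contains() tests for a substring of the attribute value.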
Q

How to scrape images from a website?

To scrape all images from a given website, Python with beautifulsoup and httpx can be used. Here's an example.

python
Q

What are scrapy pipelines and how to use them?

Scrapy pipelines can be used to extend scraped result data with new fields or to validate whole datasets. Here's how.

scrapy
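
As a minimal sketch of the "new fields" case above, an item pipeline is a plain class with a process_item hook. The class name ScrapedAtPipeline and the scraped_at field are hypothetical names picked for this example.

```python
from datetime import datetime, timezone


class ScrapedAtPipeline:
    """Hypothetical item pipeline that stamps each item with a scrape time."""

    def process_item(self, item, spider):
        # Extend the scraped item with a new field (UTC, ISO 8601 format).
        item["scraped_at"] = datetime.now(timezone.utc).isoformat()
        return item  # pass the item on to the next pipeline
```

Such a class would be enabled through the project's ITEM_PIPELINES setting. A validating pipeline would follow the same shape but raise scrapy.exceptions.DropItem for items that fail its checks.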
Q

Getting started with Puppeteer Stealth

Puppeteer-stealth is a popular plugin for the Puppeteer browser automation library. It patches browsers to be less detectable. Here's how to get started.

puppeteer
