
Knowledge Base

Quick answers to common web scraping questions (161 answers)


24 answers
Q

How to use headless browsers with scrapy?

To use a headless browser with scrapy, a plugin like scrapy-playwright can be used. Here's how to use it and what some of the alternatives are.

headless-browser scrapy
Q

How to rotate proxies in scrapy spiders?

To rotate proxies in scrapy spiders, a request middleware can be used to select a proxy for each request, either randomly or by picking the most viable one. Here's how.

proxies scrapy
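
As a rough illustration of the random-selection case, here is a minimal sketch of a downloader middleware. It relies on the Scrapy convention that the proxy for a request is read from `request.meta["proxy"]`. The class name `RandomProxyMiddleware` and the `PROXY_POOL` setting are hypothetical names chosen for this example, not part of Scrapy itself.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request."""

    def __init__(self, proxies):
        # `proxies` is a list of proxy URLs supplied by the user
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_POOL is a hypothetical custom setting holding the proxy URLs
        return cls(crawler.settings.getlist("PROXY_POOL"))

    def process_request(self, request, spider=None):
        # Scrapy's built-in proxy handling reads the proxy from request.meta["proxy"]
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # None tells Scrapy to keep processing this request
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in the project settings. A smarter rotation strategy could replace `random.choice` with logic that tracks which proxies have failed recently.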
Q

How to pass custom parameters to scrapy spiders?

To pass custom parameters to a scrapy spider, the CLI argument -a can be used. Here's how and why it is such a useful feature.

scrapy
Q

How to add headers to every or some scrapy requests?

To add headers to scrapy's requests, the `DEFAULT_REQUEST_HEADERS` setting or a custom request middleware can be used. Here's how.

http scrapy
Q

What are scrapy Item and ItemLoader objects and how to use them?

Scrapy's Item and ItemLoader classes are a great way to structure dataset parsing logic. Here's how to use them.

scrapy
Q

Is it possible to select preceding siblings using CSS selectors?

It's not possible to select preceding siblings directly, but there are easy alternatives that can be used to achieve the same result.

css-selectors
Q

How to select following siblings using CSS selectors?

To select following sibling elements using CSS selectors, the + and ~ combinators can be used. Here's how.

css-selectors
Q

How to select elements by ID using CSS selectors?

To select elements by ID, the #id selector can be used. To match the exact value of the id attribute, the [id="some value"] attribute selector can be used. Here's how.

css-selectors
Q

How to select elements by class using CSS selectors?

To select elements by class, the .class selector can be used. To match the exact value of the class attribute, the [class="exact value"] attribute selector can be used instead. Here's how.

css-selectors
Q

How to select elements by attribute using CSS selectors?

To select elements by attribute, the attribute selector can be used, which offers several matching options. Here's how.

css-selectors
Q

How to pass data from start_requests to parse callbacks in scrapy?

To pass data between scrapy callbacks like start_requests and parse, the Request.meta attribute can be used. Here's how.

scrapy
Q

How to pass data between scrapy callbacks in Scrapy?

To pass data between scrapy callbacks when scraping multiple pages, the Request.meta attribute can be used to carry the item from one request to the next. Here's how.

scrapy
Q

How to select sibling elements in XPath?

To select sibling elements in XPath, the preceding-sibling and following-sibling axes can be used. Here's how and why it's so useful.

xpath
Q

How to select last element in XPath?

To select the last element in XPath, we cannot use indexing because a -1 index is not supported. Instead, the last() function can be used. Here's how.

data-parsing xpath
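
A small sketch of the last() predicate, using Python's standard-library ElementTree, which supports only a limited XPath subset. The sample document is made up for illustration. Full XPath engines accept the same last() expression.

```python
import xml.etree.ElementTree as ET

# Made-up sample document with three list items
doc = ET.fromstring("<ul><li>first</li><li>second</li><li>third</li></ul>")

# [last()] selects the final matching element; a -1 index is not valid XPath
last_item = doc.find("li[last()]")
print(last_item.text)  # third

# last()-1 steps back from the end to reach the second-to-last element
second_to_last = doc.find("li[last()-1]")
print(second_to_last.text)  # second
```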
Q

How to select elements of a specific position in XPath?

To select elements at a specific position, the position() function can be used in a selection predicate. Here's how.

xpath
Q

How to select any element using wildcard in XPath?

To select any element, the wildcard "*" can be used in place of an element name, which will match any HTML element of any name within the current context.

xpath
Q

How to select elements by ID in XPath?

To select elements by ID attribute in XPath, we can match it directly using the = operator in a predicate, or partially using the contains() function. Here's how.

xpath
Q

How to select element with one of many names in XPath?

To select an element whose name matches one of several names, the name() function can be used. Here's how.

xpath
Q

How to reverse expressions in XPath?

To reverse (negate) expressions and predicates in XPath, the not() function can be used. Here's how and why it's so useful.

xpath
Q

How to join values using XPath concat?

To join values in XPath, the concat() function can be used to concatenate several strings into one. Here's how.

xpath
Q

How to get the name of an HTML element in XPath?

To find the name of a selected HTML element with XPath, the name() function can be used. Here's how and why this is useful.

xpath
Q

How to count selections in XPath and why?

To count the number of elements selected by an XPath selector, the count() function can be used. Here's how to do it and why it's useful.

xpath
Q

What are some ways to parse JSON datasets in Python?

There are several popular options when it comes to JSON dataset parsing in Python. The most popular query languages are JMESPath and JSONPath.

python data-parsing
Q

How to select dictionary key recursively in Python?

To select dictionary keys recursively in Python, the "nested-lookup" package can be used, which implements the most popular nested key selection algorithms.

python data-parsing
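
To show what recursive key selection means in practice, here is a hand-written sketch in plain Python. It is not the API of the "nested-lookup" package. The function name `find_key` and the sample data are assumptions made for this example.

```python
def find_key(data, target):
    """Yield every value stored under `target` at any depth of nested dicts and lists."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target:
                yield value
            # Keep descending: the value itself may contain further matches
            yield from find_key(value, target)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, target)


# Made-up sample resembling a scraped product dataset
sample = {
    "product": {
        "price": 10,
        "variants": [{"price": 12}, {"name": "xl", "price": 15}],
    }
}
prices = list(find_key(sample, "price"))
print(prices)  # [10, 12, 15]
```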
