No, Python's BeautifulSoup doesn't support XPath selectors, even though it can use lxml as a parser backend and lxml itself can perform XPath queries.
To use XPath selectors, either the lxml or the parsel package must be used instead.
parsel is a modern wrapper around lxml that makes XPath selections very easy:
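from parsel import Selector

# Build a selector from an HTML string, then query it with XPath.
selector = Selector(text='<div class="price">22.85</div>')
# .get() returns the first match as a string (or None if nothing matches).
print(selector.xpath("//div[@class='price']/text()").get())
# prints: 22.85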
Alternatively, lxml can be used directly:
from lxml import html

tree = html.fromstring('<div class="price">22.85</div>')
# xpath() returns a list of all matches, not a single string.
print(tree.xpath("//div[@class='price']/text()"))
# prints: ['22.85']
This knowledgebase is provided by Scrapfly data APIs.