The most popular package that implements XPath selectors in Python is lxml. We can use the
xpath() method to find all matching values:
```python
from lxml import etree

tree = etree.fromstring("""
<div>
    <a>link 1</a>
    <a>link 2</a>
</div>
""")
# xpath() returns every element matching the expression
for result in tree.xpath("//a"):
    print(result.text)
# link 1
# link 2
```
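The example above parses its markup with etree.fromstring(), which is lxml's strict XML parser. The sketch below shows two related points. It assumes a small made-up snippet with an unclosed br tag and href attributes. First, lxml also ships a lenient HTML parser in lxml.html. Second, xpath() can return text nodes and attribute values directly instead of elements.

```python
from lxml import etree, html

# Illustrative markup (an assumption): the unclosed <br> is fine in HTML but not well-formed XML
markup = "<div><br><a href='/one'>link 1</a><a href='/two'>link 2</a></div>"

# etree.fromstring() uses the XML parser, so the unclosed tag raises an error
try:
    etree.fromstring(markup)
    xml_ok = True
except etree.XMLSyntaxError:
    xml_ok = False

# lxml.html uses a forgiving HTML parser and still builds a usable tree
doc = html.fromstring(markup)

# XPath can select text nodes and attribute values, not only elements
texts = doc.xpath("//a/text()")
hrefs = doc.xpath("//a/@href")

print(xml_ok)  # False
print(texts)   # ['link 1', 'link 2']
print(hrefs)   # ['/one', '/two']
```

Real-world pages are often not well-formed XML, so the HTML parser is usually the safer starting point for scraping.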
However, for web scraping we recommend the parsel package instead. It's built on
lxml and provides more consistent behavior when working with HTML content:
```python
from parsel import Selector

selector = Selector(text="""
<div>
    <a>link 1</a>
    <a>link 2</a>
</div>
""")
# getall() returns every match serialized as a string
print(selector.xpath("//a").getall())
# ['<a>link 1</a>', '<a>link 2</a>']
```