Python has several popular packages that can parse HTML using CSS selectors.
The most popular one is BeautifulSoup, which can execute CSS selectors through the select() and select_one() methods:
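from bs4 import BeautifulSoup

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup("""
<a>link 1</a>
<a>link 2</a>
""", "html.parser")
# select_one() returns the first matching element (or None)
print(soup.select_one('a'))
# <a>link 1</a>
# select() returns a list of all matching elements
print(soup.select('a'))
# [<a>link 1</a>, <a>link 2</a>]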
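Beyond plain tag names, select() and select_one() also accept class, compound, and attribute selectors. The following is a minimal sketch of that, assuming a made-up HTML snippet; the class names, URLs, and variable names are hypothetical and serve only as illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration
html = """
<div class="products">
  <a class="item" href="/product/1">link 1</a>
  <a class="item featured" href="/product/2">link 2</a>
  <a href="/about">about</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Class selector: every <a> that carries the "item" class
item_links = [a["href"] for a in soup.select("a.item")]

# Compound selector: the first <a> with both classes; get_text() returns its text
featured = soup.select_one("a.item.featured")
featured_text = featured.get_text() if featured is not None else None

# Attribute selector: every <a> whose href starts with "/product"
product_texts = [a.get_text() for a in soup.select('a[href^="/product"]')]

# select_one() returns None when nothing matches, so check before using the result
missing = soup.select_one("a.missing")

print(item_links)
print(featured_text)
print(product_texts)
print(missing)
```

Because select_one() returns None on a miss, checking the result before calling get_text() avoids an AttributeError on pages where the element is absent.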
Another popular package is parsel (also used by Scrapy), which can execute CSS selectors through the css() method:
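from parsel import Selector

selector = Selector(text="""
<a>link 1</a>
<a>link 2</a>
""")
# get() returns the first match as a string (or None)
print(selector.css('a').get())
# <a>link 1</a>
# getall() returns every match as a list of strings
print(selector.css('a').getall())
# ['<a>link 1</a>', '<a>link 2</a>']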