Web Scraping With Ruby
Introduction to web scraping with Ruby: how to handle HTTP connections, parse HTML files for data, best practices, tips and an example project.
CSS selectors allow selecting elements by any attribute such as class, id, href etc.
While class and id have special notation (. and #), any attribute can be selected using the attribute selector [], which supports multiple operators:
[attr=match] checks for exact equality.
[attr~=match] treats the attribute as a list of space-separated values and checks whether it contains the match (similar to the .class matcher).
[attr|=match] checks for exact equality but also accepts values that continue with a hyphen suffix (e.g. a language code such as en-US matches en).
[attr^=match] checks whether the attribute starts with the match.
[attr$=match] checks whether the attribute ends with the match.
[attr*=match] checks whether the attribute contains the match.
Note: to make any of these matches case insensitive, add i before the closing bracket, e.g. [attr=match i]
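To make the difference between the operators concrete, the sketch below models each comparison in plain Ruby. The helper attr_matches? and its ignore_case keyword are hypothetical names introduced only for illustration; this is not how any particular HTML parser implements selectors, just a rough model of the matching rules described above.

```ruby
# Hypothetical helper (not part of any library): models how each CSS
# attribute-selector operator compares an attribute value with the match.
def attr_matches?(value, operator, match, ignore_case: false)
  return false if value.nil? # the element does not have the attribute

  if ignore_case # models the trailing `i` flag, e.g. [attr=match i]
    value = value.downcase
    match = match.downcase
  end

  # Per the CSS rules, an empty match never satisfies the substring operators.
  return false if match.empty? && %w[^= $= *=].include?(operator)

  case operator
  when "="  then value == match                                  # exact equality
  when "~=" then value.split.include?(match)                     # one of the space-separated words
  when "|=" then value == match || value.start_with?("#{match}-") # exact, or followed by a hyphen
  when "^=" then value.start_with?(match)                        # prefix
  when "$=" then value.end_with?(match)                          # suffix
  when "*=" then value.include?(match)                           # substring
  else raise ArgumentError, "unknown operator: #{operator}"
  end
end

attr_matches?("btn btn-primary", "~=", "btn")   # => true
attr_matches?("btn-primary", "~=", "btn")       # => false
attr_matches?("en-US", "|=", "en")              # => true
attr_matches?("/product/1.html", "$=", ".html") # => true
attr_matches?("HTTPS://example.com", "^=", "https", ignore_case: true) # => true
```

The second call shows why ~= and *= are not interchangeable: btn-primary contains the text btn, but it is not a separate space-separated word, so only *= would match it.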