Knowledge Base

Quick answers to common web scraping questions (161 answers)

Answers (24)

How to wait for page to load in Playwright?

To wait for all content to load in Playwright, we can use several different options, but page.wait_for_selector() is the most reliable one. Here's how t...

python playwright

How to capture background requests and responses in Puppeteer?

To capture background requests and responses in Puppeteer, we can use the page.on() method to listen for every request and response. Here's how.

puppeteer

How to find HTML elements by text with Cheerio and NodeJS?

To find HTML elements by text in NodeJS, we can use the cheerio library and its special ":contains()" selector. Here's how to do it.

nodejs

How to ignore non-HTML URLs when web crawling?

When web crawling, we can avoid non-HTML pages by checking URL file extensions or by checking content types with HEAD requests. Here's how to do it.

crawling
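
A minimal Python sketch of both checks, using only the standard library. The extension set and the helper names (`looks_like_html_url`, `is_html_content_type`, `head_is_html`) are hypothetical and the list is not exhaustive. Running the cheap extension check first avoids a HEAD request for obvious cases; note that some servers reject or mishandle HEAD requests, so a real crawler may need a fallback.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Illustrative (not exhaustive) set of extensions that usually aren't HTML pages
NON_HTML_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp",
    ".pdf", ".zip", ".gz", ".mp3", ".mp4", ".css", ".js",
}


def looks_like_html_url(url: str) -> bool:
    """Cheap first pass: reject URLs whose path ends in a known non-HTML extension."""
    path = urlparse(url).path  # query string and fragment are excluded
    extension = posixpath.splitext(path)[1].lower()
    return extension not in NON_HTML_EXTENSIONS


def is_html_content_type(content_type: str) -> bool:
    """Check a Content-Type header value, ignoring parameters such as charset."""
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return mime_type in ("text/html", "application/xhtml+xml")


def head_is_html(url: str, timeout: float = 10.0) -> bool:
    """Second pass: send a HEAD request and inspect the Content-Type header."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return is_html_content_type(response.headers.get("Content-Type", ""))
```

In a crawler loop, a URL would typically be queued only if `looks_like_html_url(url)` passes and, for extension-less URLs, `head_is_html(url)` confirms it.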

How to select HTML elements by text using CSS Selectors?

The original CSS selectors specification has no way to select HTML elements by their text, but here are some alternative ways to do it.

data-parsing css-selectors
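
One such alternative in Python is BeautifulSoup, whose CSS engine (soupsieve) implements a non-standard `:-soup-contains()` pseudo-class. This sketch assumes the `beautifulsoup4` package is installed with a soupsieve version recent enough to support `:-soup-contains()`; the sample HTML is made up. The second approach, a plain selector followed by a text filter in Python, works with any CSS engine.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="/page/2">Next page</a></li>
  <li><a href="/page/0">Previous page</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Non-standard soupsieve extension: matches <a> elements whose text contains "Next"
next_links = soup.select('a:-soup-contains("Next")')

# Portable fallback: run a standard selector, then filter on text in Python
next_links_fallback = [a for a in soup.select("a") if "Next" in a.get_text()]

print([a["href"] for a in next_links])  # ['/page/2']
```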

How to turn HTML to text in Python?

To turn HTML into text in Python, we can use BeautifulSoup's get_text() method, which strips away the HTML markup and leaves the text as is. Here's how.

data-parsing beautifulsoup
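
A minimal sketch of get_text(), assuming the `beautifulsoup4` package is installed; the sample HTML is made up. The `separator` and `strip` arguments control how text from different elements is joined, and `<script>`/`<style>` tags are removed explicitly because, depending on the BeautifulSoup version and parser, their contents may otherwise end up in the output.

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><style>p { color: red; }</style></head>
  <body>
    <h1>Product</h1>
    <p>Price: <b>$10</b></p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Drop <script>/<style> explicitly so CSS or JavaScript never leaks into the text
for tag in soup(["script", "style"]):
    tag.decompose()

# separator puts a newline between text nodes; strip=True trims each one
# and skips whitespace-only nodes
text = soup.get_text(separator="\n", strip=True)
print(text)
```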

How to use CSS selectors in NodeJS when web scraping?

There are many ways to execute CSS selectors on HTML text in NodeJS, but the cheerio and osmosis libraries are the most popular ones. Here's how to use the...

nodejs data-parsing css-selectors

How to use CSS Selectors in Python?

To parse HTML using CSS selectors in Python, we can use either the BeautifulSoup or the Parsel package. Here's how.

python css-selectors
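
A short BeautifulSoup sketch, assuming the `beautifulsoup4` package is installed; the product markup is made up. `select()` returns every match and `select_one()` returns the first match (or `None`), which makes it easy to scope a query to one card at a time. Parsel offers a similar workflow through `Selector(text=html).css(...)`.

```python
from bs4 import BeautifulSoup

html = """
<div class="product">
  <h2 class="title">Blue Shirt</h2>
  <span class="price">$15</span>
</div>
<div class="product">
  <h2 class="title">Red Hat</h2>
  <span class="price">$9</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

products = []
# select() returns all matching elements in document order
for card in soup.select("div.product"):
    products.append({
        # select_one() searches only inside the current card
        "title": card.select_one("h2.title").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
    })
print(products)
```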

How to use XPath selectors in NodeJS when web scraping?

To parse HTML using XPath in NodeJS, we can use one of two popular libraries: osmosis or xmldom. Here's how.

nodejs data-parsing xpath

How to use XPath selectors in Python?

Python has several options for executing XPath selectors against HTML. The most popular ones are lxml and parsel. Here's how to use them.

python data-parsing xpath
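
A short lxml sketch, assuming the `lxml` package is installed; the sample markup is made up. Parsel is built on lxml and accepts the same XPath expressions through `Selector(text=...).xpath(...)`.

```python
from lxml import html as lxml_html

page = """
<html><body>
  <div class="product"><a href="/p/1">Blue Shirt</a></div>
  <div class="product"><a href="/p/2">Red Hat</a></div>
</body></html>
"""

tree = lxml_html.fromstring(page)

# For node-selecting expressions like these, xpath() returns a list;
# text() and @attribute steps yield strings
names = tree.xpath("//div[@class='product']/a/text()")
links = tree.xpath("//div[@class='product']/a/@href")

# Element results can be queried again with a relative (./) expression
for product in tree.xpath("//div[@class='product']"):
    print(product.xpath("./a/text()")[0], product.xpath("./a/@href")[0])
```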

Scraper doesn't see the data I see in the browser - why?

This usually means that the scraper is not rendering the JavaScript that changes the page contents. To verify this, disable JavaScript in your browser and reload the page.

headless-browser data-parsing

How to find elements by CSS selector in Puppeteer?

To find HTML elements using CSS selectors in Puppeteer, the $() and $eval() methods can be used. Here's how to use them.

headless-browser puppeteer data-parsing

How to find elements by XPath in Puppeteer?

To find elements by XPath using Puppeteer, the "$x()" method can be used, which executes an XPath selection on the current page DOM.

headless-browser puppeteer data-parsing

How to get page source in Puppeteer?

To retrieve the page source in Puppeteer, the page.content() method can be used. Here's how to use it and what options are available.

python headless-browser puppeteer

How to load local files in Puppeteer?

To load local files in Puppeteer, the file:// URL protocol can be used as a prefix, which loads the file from the given file path.

headless-browser puppeteer

How to save and load cookies in Puppeteer?

To save and load cookies in Puppeteer, the page.cookies() and page.setCookie() methods can be used. Here's how to do it.

headless-browser puppeteer

How to select elements by class in XPath?

To select HTML elements by class name in XPath, we can use the @ attribute selector and the contains() function. Here's how to do it.

data-parsing xpath
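
A sketch of the difference between a plain contains() check and a whole-token class match, run with lxml (assumed installed); the sample markup is made up. Because contains() is a substring test, `contains(@class, 'item')` also matches `class="item-note"`; padding the normalized class list with spaces and searching for `' item '` matches only the exact class token.

```python
from lxml import html as lxml_html

page = """
<div>
  <p class="item featured">A</p>
  <p class="item">B</p>
  <p class="item-note">C</p>
</div>
"""
tree = lxml_html.fromstring(page)

# Substring match: "item" also matches class="item-note"
naive = tree.xpath("//p[contains(@class, 'item')]/text()")

# Token match: " item featured " and " item " contain " item ", " item-note " does not
exact = tree.xpath(
    "//p[contains(concat(' ', normalize-space(@class), ' '), ' item ')]/text()"
)

print(naive)  # ['A', 'B', 'C']
print(exact)  # ['A', 'B']
```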

How to select elements by text in XPath?

To select elements by text using XPath, the contains() function can be used, or re:test() for selecting by regular expression patterns.

data-parsing xpath
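
A sketch of both approaches with lxml, assuming the `lxml` package is installed; the sample list is made up. Note that re:test() is an EXSLT extension rather than core XPath 1.0, so it only works when the EXSLT regular-expressions namespace is mapped to the `re` prefix, as done here through the `namespaces` argument.

```python
from lxml import html as lxml_html

# Map the "re" prefix to the EXSLT regular-expressions namespace
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

page = """
<ul>
  <li>Price: $10</li>
  <li>In stock</li>
  <li>SKU 12345</li>
</ul>
"""
tree = lxml_html.fromstring(page)

# contains(): case-sensitive substring match; text() here means the first
# direct text node, use "." instead to match the full text including children
in_stock = tree.xpath("//li[contains(text(), 'stock')]/text()")

# re:test(): regular-expression match, here "text has 5 consecutive digits"
sku = tree.xpath(r"//li[re:test(text(), '\d{5}')]/text()", namespaces=EXSLT_RE)

print(in_stock, sku)
```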

How to take a screenshot with Puppeteer?

Learn how to take Puppeteer screenshots in NodeJS, including how to customize them by changing the resolution and viewport.

headless-browser puppeteer

How to wait for a page to load in Puppeteer?

To wait for a page to load in Puppeteer, the best approach is to wait for a specific element to appear using the page.waitForSelector() method. Here's how ...

headless-browser puppeteer

How to find elements by CSS selector in Selenium?

To select HTML elements by CSS selectors in Selenium, the driver.find_element() method can be used with the By.CSS_SELECTOR option. Here's how to do it...

selenium data-parsing css-selectors

How to find elements by XPath in Selenium?

To select HTML elements by XPath in Selenium, the driver.find_element() method can be used with the By.XPATH option. Here's how to do it.

python headless-browser selenium

How to find elements without a specific attribute in BeautifulSoup?

To find HTML elements that do NOT contain a specific attribute, we can use regular expression matching or lambda functions. Here's how to do it.

python data-parsing beautifulsoup
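
A sketch of the lambda-function approach, plus a CSS `:not([attr])` equivalent, assuming the `beautifulsoup4` package is installed; the sample images are made up. An attribute that is present but empty (`alt=""`) still counts as present, so only the image with no `alt` at all is returned.

```python
from bs4 import BeautifulSoup

html = """
<img src="a.png" alt="Logo">
<img src="b.png">
<img src="c.png" alt="">
"""
soup = BeautifulSoup(html, "html.parser")

# Lambda filter: keep <img> tags that have no alt attribute at all
missing_alt = soup.find_all(lambda tag: tag.name == "img" and not tag.has_attr("alt"))

# CSS alternative: :not([alt]) matches elements lacking the attribute
missing_alt_css = soup.select("img:not([alt])")

print([img["src"] for img in missing_alt])  # ['b.png']
```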

How to find HTML elements by multiple tags with BeautifulSoup?

To find HTML elements by one of many different element names, we can pass a list of tag names to the find() methods or use CSS selectors. Here's how to do it.

data-parsing beautifulsoup css-selectors
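
A sketch of both options, assuming the `beautifulsoup4` package is installed; the sample headings are made up. Passing a list of names to find_all() matches any of them, and a comma-separated CSS selector list does the same; both return matches in document order.

```python
from bs4 import BeautifulSoup

html = """
<h1>Title</h1>
<p>Intro</p>
<h2>Section A</h2>
<h3>Detail</h3>
<h2>Section B</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# A list of tag names matches any of them (find() returns only the first match)
first = soup.find(["h1", "h2"]).get_text()
headings = [tag.get_text() for tag in soup.find_all(["h1", "h2"])]

# CSS equivalent: a comma-separated selector list
headings_css = [tag.get_text() for tag in soup.select("h1, h2")]

print(headings)  # ['Title', 'Section A', 'Section B']
```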
