Web Scraping Without Blocking With Undetected ChromeDriver
In this tutorial, we'll take a look at Undetected ChromeDriver, a popular new web scraping tool that extends Selenium and can bypass many scraper blocking techniques.
Response status code 429 generally means the client is making too many requests. In web scraping, this often happens when scraping too fast.
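A 429 response may also carry a `Retry-After` header telling the client how long to wait before trying again. The sketch below shows one way to turn that header into a delay in seconds. The function name `retry_delay`, the `default` fallback of 5 seconds, and the assumption that headers arrive as a plain dictionary keyed by `"Retry-After"` are all illustrative choices, not part of any particular library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status: int, headers: dict, default: float = 5.0) -> float:
    """Return how many seconds to wait before retrying (hypothetical helper).

    Assumes `headers` is a plain dict; real HTTP clients may expose
    case-insensitive header mappings instead.
    """
    if status != 429:
        return 0.0  # not rate limited, no need to wait

    value = headers.get("Retry-After")
    if value is None:
        return default  # server gave no hint, fall back to an assumed default

    value = value.strip()
    if value.isdigit():
        return float(value)  # header given as a number of seconds

    try:
        when = parsedate_to_datetime(value)  # header given as an HTTP date
    except (TypeError, ValueError):
        return default  # unparseable value, fall back to the default

    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    remaining = (when - datetime.now(timezone.utc)).total_seconds()
    return max(0.0, remaining)
```

A scraper could sleep for `retry_delay(response_status, response_headers)` seconds before repeating the request that was rejected.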
One way to avoid status code 429 is to slow down our connections using rate limiting. This error is especially common when using high-scale asynchronous scrapers built with Python's asyncio or scrapy. For that, see our guide on how to rate limit Python requests.
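As a rough illustration of rate limiting in an asyncio scraper, the sketch below spaces requests evenly so that no more than `rate` of them start per second. The `RateLimiter` class, the `scrape_all` function, and the `fetch` coroutine passed into it are hypothetical names introduced for this example; `fetch` stands in for whatever HTTP client call the scraper actually uses.

```python
import asyncio
import time


class RateLimiter:
    """Space out calls so at most `rate` of them start per second."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        now = time.monotonic()
        # Reserve a slot before awaiting, so concurrent tasks never share one.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def scrape_all(urls, fetch, rate: float = 2.0):
    """Run `fetch(url)` for every URL, throttled by a shared limiter.

    `fetch` is an assumed async callable supplied by the caller.
    """
    limiter = RateLimiter(rate)

    async def worker(url):
        await limiter.wait()
        return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))
```

With `rate=2.0`, ten URLs would take roughly four and a half seconds to start, regardless of how many tasks are scheduled at once. The right value depends on the target website and usually has to be found by experiment.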
Another way to avoid the 429 status code is to distribute connections across multiple agents using proxies and proxy rotation. For that, see our guide on how to rotate proxies.
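A minimal sketch of round-robin proxy rotation using only Python's standard library could look like the following. The proxy addresses in `PROXIES` are placeholders, and the helper names `make_rotation`, `opener_for`, and `fetch` are assumptions made for this example; a real pool would come from a proxy provider.

```python
import itertools
import urllib.request

# Placeholder proxy addresses, replace with a real proxy pool.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]


def make_rotation(proxies):
    """Return an endless round-robin iterator over the proxy list."""
    return itertools.cycle(proxies)


def opener_for(proxy: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP and HTTPS through `proxy`."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url: str, rotation, timeout: float = 10.0):
    """Fetch `url` through the next proxy in the rotation.

    Returns the HTTP status and the proxy that was used.
    """
    proxy = next(rotation)
    with opener_for(proxy).open(url, timeout=timeout) as response:
        return response.status, proxy
```

Each call to `fetch(url, rotation)` takes the next proxy in the list, so consecutive requests reach the website from different addresses and each address stays under the rate limit for longer.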
Alternatively, the ScrapFly web scraping API can be used to automatically distribute connections and avoid the low rate limits imposed by some websites.