The Playwright package is a popular web browser automation tool in Python, which can be run in Jupyter notebooks for quick web scraping scripts. However, since Jupyter notebooks run their own asyncio event loop, the synchronous Playwright client cannot be started there:
# in Jupyter:
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
"""
Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
"""
The reason behind the above error is that there's an already running event loop. To use Playwright in Jupyter notebooks, we should explicitly use the asynchronous Playwright client using the following code:
# in Jupyter:
import nest_asyncio
import asyncio
from playwright.async_api import async_playwright
import atexit
# Allow nested event loops
nest_asyncio.apply()
async def main():
    pw = await async_playwright().start()
    browser = await pw.chromium.launch(headless=True)
    page = await browser.new_page()
    # All methods are async (use the "await" keyword)
    await page.goto("https://web-scraping.dev")
    src = await page.content()
    print(src)

    # Function to close browser and stop Playwright
    async def shutdown_playwright():
        await browser.close()
        await pw.stop()

    # Register shutdown hook for when the program exits
    atexit.register(lambda: asyncio.run(shutdown_playwright()))

# Run the async main function
await main()  # Use await directly instead of asyncio.run()
Here, we use Playwright's async API and wrap it in the main function. We apply nest_asyncio so that nested event loop calls are allowed, then run main with a top-level await on the notebook's already running event loop. Note that the above snippet also works in Google Colab, since it is built on the same notebook concept as Jupyter.
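The "already running event loop" condition behind the error can be observed with the standard library alone. The snippet below is a minimal sketch, not Playwright's own check: the helper names loop_is_running, check_inside_coroutine and nested_run_fails are hypothetical and exist only for illustration. It shows that code executed inside a coroutine (which is roughly how a notebook cell runs) sees a running loop, and that a plain asyncio.run() call refuses to start a second loop from there unless something like nest_asyncio has patched asyncio.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running
        return False
    return True


async def check_inside_coroutine() -> bool:
    # A coroutine always executes on a running loop
    return loop_is_running()


async def nested_run_fails() -> bool:
    """Try to start a second event loop from inside a running one."""
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError:
        # Unpatched asyncio rejects the nested call; close the unused
        # coroutine to avoid a "never awaited" warning
        coro.close()
        return True
    return False


if __name__ == "__main__":
    if loop_is_running():
        # Notebook-like environment: asyncio.run() below would not be usable
        print("an event loop is already running")
    else:
        print("plain script, loop running:", loop_is_running())
        print("inside coroutine, loop running:", asyncio.run(check_inside_coroutine()))
        print("nested asyncio.run() rejected:", asyncio.run(nested_run_fails()))
```

Calling loop_is_running() in a notebook cell is a quick way to decide between the sync and async Playwright clients before importing either one.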
For further details on web scraping with Playwright, refer to our dedicated guide.