The Playwright package is a popular web browser automation tool in Python, which can be run in Jupyter notebooks for quick web scraping scripts. However, since Jupyter notebooks run their own asyncio event loop, we cannot start the synchronous Playwright client:
# in Jupyter:
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
"""
Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
"""
Since an event loop is already running, we should explicitly use Playwright's asynchronous client in Jupyter notebooks using the following code:
# in Jupyter:
import nest_asyncio
import asyncio
from playwright.async_api import async_playwright
import atexit

# Allow nested event loops
nest_asyncio.apply()

async def main():
    pw = await async_playwright().start()
    browser = await pw.chromium.launch(headless=True)
    page = await browser.new_page()

    # All methods are async (use the "await" keyword)
    await page.goto("https://web-scraping.dev")
    src = await page.content()
    print(src)

    # Function to close browser and stop Playwright
    async def shutdown_playwright():
        await browser.close()
        await pw.stop()

    # Register shutdown hook for when the program exits
    atexit.register(lambda: asyncio.run(shutdown_playwright()))

# Run the async main function
await main()  # Use await directly instead of asyncio.run()
Here, we use Playwright's async API and wrap the scraping logic in a main function, then execute it in a nested asynchronous event loop enabled by nest_asyncio. Note that the above snippet also runs in Google Colab, which shares the same execution model as Jupyter notebooks.
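If you'd rather avoid the explicit atexit hook, Playwright's async context manager can handle cleanup for you. Below is a minimal sketch assuming the same target page; the scrape helper name is just illustrative:
# in Jupyter:
from playwright.async_api import async_playwright

async def scrape(url: str) -> str:
    # The context manager stops Playwright automatically on exit,
    # so no shutdown hook is needed in this variant
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        src = await page.content()
        await browser.close()
        return src

# Jupyter cells support top-level await, so call the helper directly
html = await scrape("https://web-scraping.dev")
print(html[:500])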
For further details on web scraping with Playwright, refer to our dedicated guide.