When web scraping, we might want to pause a scraping session by saving its cookies and resume it later. In Playwright, cookies are managed through the browser context object, which provides the cookies() and add_cookies() methods:
import json
from pathlib import Path

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)

    # To save cookies to a file, first extract them from the browser context:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies/set/mycookie/myvalue")
    cookies = context.cookies()
    Path("cookies.json").write_text(json.dumps(cookies))

    # Then, we can restore cookies from the file into a new context:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    context.add_cookies(json.loads(Path("cookies.json").read_text()))
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies")
    print(context.cookies())  # we can test whether they were set correctly
# will print:
[
    {
        "sameSite": "Lax",
        "name": "mycookie",
        "value": "myvalue",
        "domain": "httpbin.dev",
        "path": "/",
        "expires": -1,
        "httpOnly": False,
        "secure": False,
    }
]
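Playwright can also persist the whole context state, which includes cookies as well as local storage, through the context's storage_state() method and the storage_state argument of new_context(). The snippet below is a minimal sketch of this approach, assuming the Chromium browser and an illustrative state.json file path:

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)

    # save the full context state (cookies + local storage) to a file:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies/set/mycookie/myvalue")
    context.storage_state(path="state.json")  # writes state.json to disk

    # later, restore the state by passing the file to a new context:
    context = browser.new_context(storage_state="state.json")
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies")
    print(context.cookies())  # the saved cookie should be present again
    browser.close()

This is convenient when the session depends on more than just cookies (e.g. tokens kept in local storage), at the cost of storing the entire state in one file.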
This knowledgebase is provided by Scrapfly — a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences. Check us out 👇