How to Scrape With Headless Firefox
Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.
To download files using Playwright, we can either find the download button or link with page.locator() and click it while Playwright captures the download, or we can fetch the file directly with an HTTP client such as httpx or requests in Python:
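```python
from pathlib import Path
from urllib.parse import urljoin

import httpx  # or: import requests
from playwright.sync_api import sync_playwright


def download_file_with_playwright():
    with sync_playwright() as pw:
        browser = pw.firefox.launch(headless=True)
        context = browser.new_context(viewport={"width": 1920, "height": 1080})
        page = context.new_page()
        page.goto("https://httpbin.dev/html")

        # option 1: click the download link and let Playwright capture the file
        link = page.locator("a").first
        with page.expect_download() as download_info:
            link.click()
        download_info.value.save_as("file_clicked.txt")

        # option 2: read the link's href and download the file manually,
        # which is more flexible and faster than driving the browser
        url = urljoin(page.url, link.get_attribute("href"))  # href may be relative
        response = httpx.get(url, follow_redirects=True)
        response.raise_for_status()
        Path("file.txt").write_bytes(response.content)

        browser.close()


if __name__ == "__main__":
    download_file_with_playwright()
```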
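The href returned by get_attribute() is the raw attribute value, so it is often relative (for example /files/report.csv) and must be resolved against the page URL before an HTTP client can fetch it. For large files it also helps to stream the body to disk instead of holding response.content in memory. Below is a minimal sketch of both steps, assuming only the standard library: it swaps in urllib.request for httpx so it runs without extra packages, and the helper names resolve_download_url, filename_from_url, and download_href are hypothetical. With httpx, the streaming equivalent is httpx.stream("GET", url) combined with iter_bytes().

```python
import shutil
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import unquote, urljoin, urlsplit
from urllib.request import urlopen


def resolve_download_url(page_url: str, href: Optional[str]) -> str:
    """Hypothetical helper: turn an <a href> value into an absolute URL."""
    if not href:
        raise ValueError("link has no href attribute")
    # urljoin leaves absolute hrefs untouched and resolves relative ones
    return urljoin(page_url, href)


def filename_from_url(url: str, default: str = "download") -> str:
    """Hypothetical helper: use the last path segment (query string dropped) as the file name."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    return name or default


def download_href(page_url: str, href: Optional[str], dest_dir: Path) -> Path:
    """Hypothetical helper: resolve the link, then stream the body to dest_dir."""
    url = resolve_download_url(page_url, href)
    dest = Path(dest_dir) / filename_from_url(url)
    with urlopen(url) as response, dest.open("wb") as out:
        # copyfileobj copies in fixed-size chunks, so the whole file never sits in memory
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Offline demo: local file:// URLs stand in for the page and the linked file
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "report.txt").write_bytes(b"hello")
        (base / "out").mkdir()
        saved = download_href((base / "page.html").as_uri(), "report.txt", base / "out")
        print(saved.name, saved.read_bytes())
```

In the Playwright function, page.url and link.get_attribute("href") would be passed as page_url and href. Keep in mind that a plain HTTP request does not carry the browser's cookies, so links behind a login may need the session cookies copied over from context.cookies().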