To download files using Playwright, we can either find the download button or link with the locator function and click it, or download the file directly with an HTTP client such as httpx or requests in Python:
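from pathlib import Path
from urllib.parse import urljoin

import httpx  # or import requests
from playwright.sync_api import sync_playwright


def download_file_with_playwright():
    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=False)
        context = browser.new_context(viewport={"width": 1920, "height": 1080})
        page = context.new_page()
        page.goto('https://httpbin.dev/html')
        # locate the download link; .first avoids an error when several links match
        file = page.locator('a').first
        # read the link target before clicking, since a click may navigate away;
        # urljoin turns a relative href into an absolute URL
        url = urljoin(page.url, file.get_attribute('href'))
        # we can either click the download link and wait for the browser download:
        with page.expect_download() as download_info:
            file.click()
        download_info.value.save_as('file.txt')
        # or we can download the file manually, which is more flexible and faster:
        response = httpx.get(url)
        Path('file.txt').write_bytes(response.content)
        browser.close()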
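When downloading manually, the HTTP client needs an absolute URL, and a hard-coded name such as file.txt only works for a single download. The following is a minimal sketch of one way to derive both from a link. The helper name `download_target` and its `default_name` parameter are hypothetical and introduced only for illustration; it assumes the file name can be taken from the last segment of the URL path, which will not hold for every site.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urljoin, urlsplit


def download_target(page_url, href, default_name="file.txt"):
    """Return (absolute_url, filename) for a link found on page_url.

    Hypothetical helper for illustration, not part of Playwright or httpx.
    """
    # urljoin leaves absolute hrefs untouched and resolves relative ones
    absolute_url = urljoin(page_url, href)
    # use only the path, so the query string and fragment are ignored,
    # and decode percent-escapes such as %20
    path = unquote(urlsplit(absolute_url).path)
    # the last path segment is the candidate file name
    name = PurePosixPath(path).name
    # fall back to the default when the URL ends in a slash
    return absolute_url, name or default_name
```

Inside the Playwright function this could be called as `url, name = download_target(page.url, file.get_attribute('href'))`, with the response body then written to `Path(name)` instead of a fixed file name.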