To scrape images from a website, we can use Python with an HTML parsing library like BeautifulSoup to select all `<img>` elements, then download and save the images they point to.
Here's an example using `httpx` and BeautifulSoup (install using `pip install httpx beautifulsoup4`):
```python
import asyncio
from pathlib import Path

import httpx
from bs4 import BeautifulSoup


async def download_image(url, filepath, client):
    response = await client.get(url)
    if response.status_code != 200:
        # don't save error pages (404, 403, ...) as if they were images
        print(f"Skipped {url}: HTTP {response.status_code}")
        return
    filepath.write_bytes(response.content)
    print(f"Downloaded {url} to {filepath}")


async def scrape_images(url):
    download_dir = Path("images")
    download_dir.mkdir(parents=True, exist_ok=True)
    # httpx does not follow redirects by default, so enable it explicitly
    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        download_tasks = []
        for img_tag in soup.find_all("img"):
            img_url = img_tag.get("src")  # get image url
            if not img_url or img_url.startswith("data:"):
                continue  # skip missing src and inline base64 images
            img_url = response.url.join(img_url)  # turn url absolute
            # build the file name from the URL path only, so ?query strings are dropped
            img_filename = download_dir / Path(img_url.path).name
            download_tasks.append(
                download_image(img_url, img_filename, client)
            )
        await asyncio.gather(*download_tasks)


# example - scrape all scrapfly blog images:
url = "https://scrapfly.io/blog/"
asyncio.run(scrape_images(url))
```
Above, we use `httpx.AsyncClient` to first retrieve the target page's HTML. Then, we extract the `src` attribute of every `<img>` element and resolve it to an absolute URL. Finally, we download all images concurrently and save them to the `./images` directory.
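Firing off every download at once with `asyncio.gather` is fine for a handful of images, but a page with hundreds of `<img>` tags can flood the target server with simultaneous requests. One common refinement, which is not part of the original example, is to cap concurrency with `asyncio.Semaphore`. The sketch below assumes a hypothetical helper named `gather_limited` and uses dummy coroutines that only sleep, so it runs offline; the limit of 5 is an arbitrary choice.

```python
import asyncio


async def gather_limited(coros, limit=10):
    """Hypothetical helper: run awaitables concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(coro):
        # each coroutine waits for a free slot before it starts running
        async with semaphore:
            return await coro

    # like asyncio.gather, results come back in the same order as the input
    return await asyncio.gather(*(run_one(c) for c in coros))


# Usage sketch with fake "downloads" that only sleep, tracking peak concurrency:
running = 0
peak = 0


async def fake_download(i):
    global running, peak
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)  # stand-in for network I/O
    running -= 1
    return i


results = asyncio.run(gather_limited([fake_download(i) for i in range(20)], limit=5))
print(results[:3], peak)  # → [0, 1, 2] 5
```

In the scraper, `await asyncio.gather(*download_tasks)` could then be swapped for `await gather_limited(download_tasks, limit=10)`, keeping the downloads concurrent while bounding how many requests are in flight at once.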
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.