How to scrape images from a website?

To scrape images from a website, we can use Python with an HTML parsing tool like BeautifulSoup to select all <img> elements and download the files they point to.

Here's an example using httpx and beautifulsoup (install using pip install httpx beautifulsoup4):

import asyncio
import httpx
from bs4 import BeautifulSoup
from pathlib import Path


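async def download_image(url, filepath, client):
    """Download a single image and write it to filepath."""
    response = await client.get(url)
    if not response.is_success:  # don't save error pages as image files
        print(f"Skipping {url}: HTTP {response.status_code}")
        return
    filepath.write_bytes(response.content)
    print(f"Downloaded {url} to {filepath}")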


async def scrape_images(url):
    download_dir = Path('images')
    download_dir.mkdir(parents=True, exist_ok=True)

    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        soup = BeautifulSoup(response.text, "html.parser")
        download_tasks = []
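        for img_tag in soup.find_all("img"):
            img_url = img_tag.get("src")  # get image url
            if not img_url or img_url.startswith("data:"):
                continue  # skip missing sources and inline data URIs
            img_url = response.url.join(img_url)  # turn url absolute
            # name the file after the URL path so query strings are dropped
            img_filename = download_dir / Path(img_url.path).name
            download_tasks.append(
                download_image(img_url, img_filename, client)
            )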
        await asyncio.gather(*download_tasks)

# example - scrape all Scrapfly blog images:
url = "https://scrapfly.io/blog/"
asyncio.run(scrape_images(url))

Above, we use httpx.AsyncClient to first retrieve the target page HTML. Then we extract the src attribute of every <img> element and resolve it against the page URL, since src values are often relative. Finally, we download all images concurrently and save them to the ./images directory.
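The URL handling step is easy to get wrong, so here is a minimal standalone sketch of it using only the standard library. The helper names resolve_image_url and image_filename, the "image" fallback name, and the example.com URLs are assumptions made for illustration and are not part of the scraper above. It shows how relative, root-relative, and protocol-relative src values resolve against a page URL, and how a filename can be taken from the URL path so that query strings are ignored.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url: str, src: str) -> str:
    """Resolve a possibly relative src value against the page URL (hypothetical helper)."""
    return urljoin(page_url, src)


def image_filename(img_url: str, fallback: str = "image") -> str:
    """Take the last segment of the URL path as the filename, ignoring the query string."""
    name = PurePosixPath(urlsplit(img_url).path).name
    # a path such as "/" has no final segment, so fall back to a placeholder name
    return name or fallback


if __name__ == "__main__":
    page = "https://example.com/blog/post/"  # placeholder page URL
    for src in ["/static/logo.png", "cover.jpg?v=2", "//cdn.example.com/a/b.webp"]:
        absolute = resolve_image_url(page, src)
        print(absolute, "->", image_filename(absolute))
```

Note that two different images can still share a filename (for example /a/logo.png and /b/logo.png), in which case the later download overwrites the earlier one. Adding a counter or a hash of the URL to the name is one way to avoid that.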
