How to scrape images from a website?

To scrape images from a website, we can use Python with an HTML parsing library such as BeautifulSoup to select all <img> elements, then download and save the files they point to.

Here's an example using httpx and BeautifulSoup (install both with pip install httpx beautifulsoup4):

import asyncio
import httpx
from bs4 import BeautifulSoup
from pathlib import Path


async def download_image(url, filepath, client):
    response = await client.get(url)
    response.raise_for_status()  # don't save error pages as image files
    filepath.write_bytes(response.content)
    print(f"Downloaded {url} to {filepath}")


async def scrape_images(url):
    download_dir = Path('images')
    download_dir.mkdir(parents=True, exist_ok=True)

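    # httpx does not follow redirects unless asked to
    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(url)
        response.raise_for_status()  # stop early if the page itself failed to load
        soup = BeautifulSoup(response.text, "html.parser")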
        download_tasks = []
        for img_tag in soup.find_all("img"):
            img_url = img_tag.get("src")  # get image url (may be relative)
            if img_url:
                img_url = response.url.join(img_url)  # make the url absolute
                # use only the url path so query strings stay out of the filename
                img_filename = download_dir / Path(img_url.path).name
                download_tasks.append(
                    download_image(img_url, img_filename, client)
                )
        await asyncio.gather(*download_tasks)

# example: scrape all images from the Scrapfly blog
url = "https://scrapfly.io/blog/"
asyncio.run(scrape_images(url))

Above, we use httpx.AsyncClient to first retrieve the target page's HTML. Then we extract the src attribute of every <img> element and resolve it to an absolute URL. Finally, we download all images concurrently with asyncio.gather and save them to the ./images directory.
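One detail worth sketching separately is how a src value becomes a local filename. Real pages often contain inline data: URIs, query strings such as ?v=2, and several images that share a basename, any of which can lead to odd or overwritten files. The helper below is a minimal sketch of one way to handle these cases. It uses the standard library's urljoin instead of httpx's URL.join so that it runs without extra dependencies, and the names image_filename and seen, along with the example.com URLs, are hypothetical and not part of the original example.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit


def image_filename(page_url, src, seen):
    """Return (absolute_url, filename) for an <img> src, or None to skip it.

    `seen` is a set of filenames already handed out; it is updated in place.
    """
    if not src or src.startswith("data:"):
        return None  # empty src or inline data: URI, nothing to download
    absolute = urljoin(page_url, src)  # resolve relative src against the page
    # take the name from the URL path only, so query strings are ignored;
    # fall back to "image" when the path has no final component
    name = PurePosixPath(urlsplit(absolute).path).name or "image"
    stem, suffix = PurePosixPath(name).stem, PurePosixPath(name).suffix
    # add a numeric suffix when another image already claimed this name
    candidate, n = name, 1
    while candidate in seen:
        candidate = f"{stem}_{n}{suffix}"
        n += 1
    seen.add(candidate)
    return absolute, candidate


seen = set()
first = image_filename("https://example.com/blog/", "/img/logo.png?v=2", seen)
second = image_filename("https://example.com/blog/", "assets/logo.png", seen)
skipped = image_filename("https://example.com/blog/", "data:image/png;base64,AAAA", seen)
```

Here first resolves to the absolute URL with the filename logo.png, second gets logo_1.png because the basename is already taken, and skipped is None. In the scraper above, the returned filename could be joined with download_dir in place of the Path(...).name expression.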
