To scrape images from a website, we can use Python with an HTML parsing tool like beautifulsoup to select all <img> elements and save them.
Here's an example using httpx and beautifulsoup (install using pip install httpx beautifulsoup4):
import asyncio
import httpx
from bs4 import BeautifulSoup
from pathlib import Path


async def download_image(url, filepath, client):
    # fetch a single image and write its bytes to disk
    response = await client.get(url)
    filepath.write_bytes(response.content)
    print(f"Downloaded {url} to {filepath}")


async def scrape_images(url):
    download_dir = Path('images')
    download_dir.mkdir(parents=True, exist_ok=True)

    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        soup = BeautifulSoup(response.text, "html.parser")
        download_tasks = []
        for img_tag in soup.find_all("img"):
            img_url = img_tag.get("src")  # get image url
            if img_url:
                img_url = response.url.join(img_url)  # turn url absolute
                # name the file after the URL path only, so query strings are dropped
                img_filename = download_dir / Path(img_url.path).name
                download_tasks.append(
                    download_image(img_url, img_filename, client)
                )
        # run all downloads concurrently while the client is still open
        await asyncio.gather(*download_tasks)


# example - scrape all scrapfly blog images:
url = "https://scrapfly.io/blog/"
asyncio.run(scrape_images(url))
Above, we use httpx.AsyncClient to first retrieve the target page HTML. Then, we extract the src attributes of all <img> elements. Finally, we download all images concurrently and save them to the ./images directory.
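One detail worth handling when saving files is the file name itself: image URLs often carry query strings, and two images in different folders can share the same name, so a later download may overwrite an earlier one. Below is a minimal sketch of one way to derive safe, unique names using only the standard library. The helper image_filename, its used_names set and the "image" fallback name are our own illustrative assumptions, not part of httpx or beautifulsoup:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def image_filename(img_url, used_names, default_name="image"):
    """Derive a local file name from an image URL (hypothetical helper).

    Only the URL path is used, so query strings such as ?w=300 are dropped.
    A counter is appended when the same name was already handed out.
    """
    # take the last path segment and decode percent-escapes like %20
    name = PurePosixPath(unquote(urlsplit(str(img_url)).path)).name
    if name in ("", ".."):
        # no usable file name in the URL, fall back to a generic one
        name = default_name
    stem, suffix = PurePosixPath(name).stem, PurePosixPath(name).suffix
    candidate, counter = name, 1
    # avoid overwriting an earlier download that shares the same name
    while candidate in used_names:
        candidate = f"{stem}_{counter}{suffix}"
        counter += 1
    used_names.add(candidate)
    return candidate


used = set()
urls = [
    "https://example.com/a/logo.png?w=300",
    "https://example.com/b/logo.png",
    "https://example.com/",
]
print([image_filename(u, used) for u in urls])
# ['logo.png', 'logo_1.png', 'image']
```

In the scraper, a helper like this could stand in for the line that builds img_filename, with a single used_names set shared across the loop.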
This knowledgebase is provided by Scrapfly data APIs.