How to get the file type of a URL in Python?

by scrapecrow Dec 05, 2022

To get the file type of a URL we have two options. We can check the URL string for a file extension, or we can send a HEAD request and read the Content-Type response header:

import mimetypes

# the mimetypes module can analyze a string for file extensions:
mimetypes.guess_type("http://example.com/file.pdf")
('application/pdf', None)
mimetypes.guess_type("http://example.com/song.mp3")
('audio/mpeg', None)


mimetypes.guess_type("http://example.com/file-without-extension")
(None, None)
# for URLs without an extension we can make a HEAD request, which only downloads the response headers:
import httpx
httpx.head("https://httpbin.dev/html").headers['Content-Type']
'text/html; charset=utf-8'
httpx.head("https://wiki.mozilla.org/images/3/37/Mozilla_MDN_Guide.pdf").headers['Content-Type']
'application/pdf'

When web scraping or crawling, knowing the content type before retrieving a URL's contents can save a lot of bandwidth and speed up the scraping process. For example, when crawling we usually only want to follow HTML pages and skip media files.
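
As a rough illustration of that crawl filter, the two checks can be combined into one helper: try the file extension first and only fall back to a network lookup when the URL gives no hint. This is a minimal sketch, not a definitive implementation. The names `media_type`, `url_file_type`, `should_follow` and the `fetch_content_type` hook are hypothetical and chosen for this example; the hook is assumed to be any callable that returns the Content-Type header for a URL (or None).

```python
import mimetypes
from urllib.parse import urlsplit

# Assumption for this sketch: these two media types count as "HTML pages".
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type.split(";", 1)[0].strip().lower()


def url_file_type(url, fetch_content_type=None):
    """Guess the MIME type of a URL, or return None if it can't be determined.

    fetch_content_type is a hypothetical hook: a callable that takes a URL
    and returns its Content-Type header (or None), e.g. via a HEAD request.
    """
    # Cheap check first: look at the URL path only, so query strings
    # and fragments don't hide the file extension.
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    if guessed:
        return guessed
    # Fall back to the network only when a fetcher was supplied.
    if fetch_content_type is not None:
        header = fetch_content_type(url)
        if header:
            return media_type(header)
    return None


def should_follow(url, fetch_content_type=None):
    """Return True only for URLs that resolve to an HTML media type."""
    return url_file_type(url, fetch_content_type) in HTML_TYPES
```

With httpx, the hook could be something like `lambda u: httpx.head(u, follow_redirects=True).headers.get("Content-Type")`. Note that in this sketch a URL with no extension and no hook is treated as unknown and not followed; a real crawler may prefer to follow such URLs and check the type on the actual response instead.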
