How to get file type of an URL in Python?

by scrapecrow Dec 05, 2022

To get the file type of an URL we have 2 options - check the URL string for file suffix or perform a HEAD request:

import mimetypes

# mimetypes module can analysize string for file extensions:
mimetypes.guess_type("http://example.com/file.pdf")
('application/pdf', None)
mimetypes.guess_type("http://example.com/song.mp3")
('audio/mpeg', None)


mimetypes.guess_type("http://example.com/file-without-extension")
(None, None)
# for files without extension we can make head request which only downloads the metadata
import httpx
response = httpx.head("https://httpbin.dev/html").headers['Content-Type']
'text/html; charset=utf-8'
httpx.head("https://wiki.mozilla.org/images/3/37/Mozilla_MDN_Guide.pdf").headers['Content-Type']
'application/pdf'

When web scraping and web crawling knowing content type before retrieving URL contents can save a lot of bandwidth and speed up the web scraping process. For example, when crawling we only want to follow HTML pages and avoid media files.

Related Articles

Guide to List Crawling: Everything You Need to Know

In-depth look at list crawling - how to extract valuable data from list-formatted content like tables, listicles and paginated pages.

CRAWLING
BEAUTIFULSOUP
PYTHON
Guide to List Crawling: Everything You Need to Know

How to Find All URLs on a Domain

Learn how to efficiently find all URLs on a domain using Python and web crawling. Guide on how to crawl entire domain to collect all website data

CRAWLING
PYTHON
How to Find All URLs on a Domain

Intro to Web Scraping Images with Python

In this guide, we’ll explore how to scrape images from websites using different methods. We'll also cover the most common image scraping challenges and how to overcome them. By the end of this article, you will be an image scraping master!

INTRO
CRAWLING
DATA-PARSING
PYTHON
Intro to Web Scraping Images with Python

How to Scrape Sitemaps to Discover Scraping Targets

Usually to find scrape targets we look at site search or category pages but there's a better way - sitemaps! In this tutorial, we'll be taking a look at how to find and scrape sitemaps for target locations.

CRAWLING
PYTHON
NODEJS
DATA-PARSING
How to Scrape Sitemaps to Discover Scraping Targets

How to Scrape YouTube in 2025

Learn how to scrape YouTube, channel, video, and comment data using Python directly in JSON.

SCRAPEGUIDE
PYTHON
HIDDEN-API
How to Scrape YouTube in 2025

What is Rate Limiting? Everything You Need to Know

Discover what rate limiting is, why it matters, how it works, and how developers can implement it to build stable, scalable applications.

BLOCKING
CRAWLING
HTTP
What is Rate Limiting? Everything You Need to Know