
How to fix Python requests MissingSchema error?

by scrapecrow Dec 19, 2022

The MissingSchema error is raised by the Python requests module when it is given a URL without a scheme (the http:// or https:// part), which commonly happens when scraping.

This usually happens when we accidentally pass the scraper relative URLs (like /product/10) instead of absolute URLs (like http://example.com/product/10):

import requests

requests.get("/product/10")  # relative URL: no http:// or https:// scheme
# will raise:
# MissingSchema: Invalid URL '/product/10': No scheme supplied. Perhaps you meant http:///product/10?
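To catch such URLs before they ever reach requests, one option is to inspect the scheme with the standard library's urlsplit(). Below is a minimal sketch: is_http_url is a hypothetical helper name, and accepting only http and https is an assumption that suits most scrapers.

```python
from urllib.parse import urlsplit


def is_http_url(url: str) -> bool:
    """Hypothetical helper: True if the URL has an http or https scheme."""
    # urlsplit() stores the scheme in .scheme; relative URLs such as
    # "/product/10" and protocol-relative ones such as "//example.com/x"
    # leave it as an empty string.
    return urlsplit(url).scheme in ("http", "https")


print(is_http_url("http://example.com/product/10"))  # True
print(is_http_url("/product/10"))  # False
```

URLs that fail this check can then be resolved with urljoin() instead of being passed straight to requests.get().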

When web scraping, it's best to always make scraped URLs absolute by joining them with the URL of the page they were found on, using the urljoin() function from urllib.parse:

from urllib.parse import urljoin
import requests

response = requests.get("http://example.com")
urls = [  # let's assume we scraped this batch of product URLs:
    "/product/1",
    "/product/2",
    "/product/3",
]

for relative_url in urls:
    # response.url is the final page URL (after any redirects), used as the base
    absolute_url = urljoin(response.url, relative_url)
    # e.g. "/product/1" becomes http://example.com/product/1
    item_response = requests.get(absolute_url)
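urljoin() resolves a link the same way a browser would, relative to the page it appeared on, so the result depends on the form of the link. The sketch below uses a made-up category page as the base URL to show the common cases:

```python
from urllib.parse import urljoin

# Hypothetical page the links were scraped from
base = "https://example.com/category/shoes/page-2"

# Root-relative path: replaces the whole path of the base URL
print(urljoin(base, "/product/1"))  # https://example.com/category/shoes/... -> https://example.com/product/1
# Document-relative path: resolved against the base's "directory"
print(urljoin(base, "product/1"))  # https://example.com/category/shoes/product/1
# Protocol-relative URL: only the scheme is taken from the base
print(urljoin(base, "//cdn.example.com/img.png"))  # https://cdn.example.com/img.png
# Already absolute URL: returned unchanged
print(urljoin(base, "http://other.com/x"))  # http://other.com/x
```

This is why the base should be the URL of the page the links came from, not just the site's domain: a document-relative link like product/1 resolves differently depending on the page's path.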
