How to fix Python requests MissingSchema error?

The MissingSchema error is raised when the Python requests module is asked to fetch a URL that has no protocol indicator (the http:// bit), which makes the URL invalid.

This usually happens when we accidentally supply the scraper with relative URLs instead of absolute URLs:

import requests

requests.get("/product/10")  # relative URL: no scheme, no host
# will raise:
# MissingSchema: Invalid URL '/product/10': No scheme supplied. Perhaps you meant http:///product/10?
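
When the URLs come from a source we don't fully control, the error can also be caught explicitly so that one bad link doesn't stop the whole scrape. This is a minimal sketch: `safe_get` is a hypothetical helper name for illustration and not part of requests, and the 10 second timeout is an arbitrary assumption.

```python
import requests
from requests.exceptions import MissingSchema


def safe_get(url):
    """Hypothetical helper: return the response, or None if the URL has no scheme."""
    try:
        return requests.get(url, timeout=10)
    except MissingSchema:
        # raised while the request is being prepared, before anything is sent
        print(f"skipping URL without a scheme: {url!r}")
        return None


safe_get("/product/10")  # prints a warning and returns None instead of raising
```

Catching the exception only hides the symptom, though. The relative URL still needs to be converted to an absolute one before it can be scraped.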

When web scraping, it's best to always convert scraped URLs to absolute ones using the urljoin() function, which resolves a relative URL against the URL of the page it was found on:

from urllib.parse import urljoin
import requests

response = requests.get("http://example.com")
urls = [  # let's assume we got this batch of product urls:
    "/product/1",
    "/product/2",
    "/product/3",
]

for relative_url in urls:
    absolute_url = urljoin(response.url, relative_url)
    # e.g. "/product/1" becomes http://example.com/product/1
    item_response = requests.get(absolute_url)
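
Scraped pages often mix relative, absolute and protocol-relative links, so it can help to see how urljoin() treats each kind. The sketch below uses only the standard library; `to_absolute` is a hypothetical helper name, and the base URL and links are made-up examples.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(base_url, link):
    """Hypothetical helper: resolve a scraped link against the page URL.

    Returns None when the result is still not an http(s) URL with a host.
    """
    absolute = urljoin(base_url, link)
    parts = urlparse(absolute)
    if parts.scheme in ("http", "https") and parts.netloc:
        return absolute
    return None


base = "http://example.com/catalog/"  # assumed URL of the scraped page
links = [
    "/product/1",                 # root-relative
    "product/2",                  # relative to the current path
    "http://other.example/item",  # already absolute, left untouched
    "//cdn.example.com/img.png",  # protocol-relative, inherits http
    "mailto:hi@example.com",      # not fetchable with requests
]

for link in links:
    print(link, "->", to_absolute(base, link))
```

Because urljoin() leaves absolute URLs unchanged, it is safe to apply it to every scraped link rather than only to the ones that look relative.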

Provided by Scrapfly
