Did you know that when you search for products, documentation, or reviews on your favorite sites, there's a good chance Algolia is working behind the scenes? With Algolia quietly powering lightning-fast searches for some of the biggest sites on the internet, unlocking its data can open up a world of web scraping opportunities.
In this web scraping tutorial, we'll learn how to scrape Algolia search using Python. We'll use a real-life example from alternativeto.net, which is a web database of software metadata and recommendations. Through this example, you'll see how Algolia works and learn how to write a web scraper that works with any website using Algolia.
Key Takeaways
Learn how to build an Algolia scraper that works across different websites. You'll discover how to find API keys, make search requests, and handle multiple pages efficiently.
- Find API keys from website code using simple patterns to identify application IDs and API keys
- Copy search requests by watching network traffic and recreating them in Python
- Use async programming to scrape multiple pages at the same time for faster results
- Set up proper headers and authentication to communicate with Algolia's API
- Write reusable code that works with any Algolia-powered website
- Use tools like ScrapFly to avoid being blocked while scraping
What is Algolia?
Algolia is a cloud-based search service that helps websites add search functionality quickly and easily. Instead of building a search system from scratch, websites can use Algolia to power their search features. Think of Algolia as a smart search engine that websites rent instead of building themselves. It handles all the complex work of indexing content, searching through it, and returning results fast.
What is Algolia Search?
Algolia Search is the main product that powers search bars on websites. When you type something into a search box on a site using Algolia, your query gets sent to Algolia's servers. Those servers quickly search through the website's content and send back matching results.
The great thing about Algolia Search is that it works the same way across different websites. Once you understand how to scrape one Algolia-powered website, you can apply the same techniques to scrape many others.
Why Algolia Search?
Websites choose Algolia Search for several reasons. It's fast, easy to set up, and works well on mobile devices. For web scrapers, Algolia is especially useful because it provides structured data in a consistent format.
When websites use Algolia, they send search queries to Algolia's API servers. These servers return results as JSON data, which is much easier to work with than parsing HTML. This makes Algolia a great target for web scraping because you get clean, organized data instead of messy web pages.
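For illustration, here's a trimmed sketch of what that JSON looks like. The top-level field names (hits, nbHits, page, nbPages, hitsPerPage) are standard Algolia response fields, but the values and hit contents below are made up:

# Hypothetical example of an Algolia search response shape
example_response = {
    "hits": [
        {"name": "Spotify", "objectID": "example-id"},  # one record per result
    ],
    "nbHits": 123,       # total matching records
    "page": 0,           # current page (Algolia pages are zero-indexed)
    "nbPages": 7,        # total number of result pages
    "hitsPerPage": 20,
}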
Many popular websites use Algolia, including:
- E-commerce sites for product searches
- Documentation sites for finding articles
- Software directories for discovering apps
- Company websites for searching content
Because so many websites use Algolia, learning to scrape it means you can extract data from a wide variety of sources.
Challenges When Scraping Algolia Search
When scraping Algolia Search, there are several key challenges that you may encounter. Each one can impact how successfully you can extract data, and it is important to understand and address each to build a robust scraper.
Finding the API Keys
Every website that uses Algolia implements its own unique Application ID and API key. These keys serve as credentials that allow the site to communicate with Algolia's servers. In order to make successful search requests, you need to obtain the appropriate keys for each website.
Typically, you can locate these keys by inspecting the JavaScript code or watching the network requests made by the website in your browser's developer tools. Identifying and extracting these values is a crucial first step to start scraping the data you need.
Understanding the Request Format
Algolia uses a very specific format for its search requests, including required headers and a particular JSON structure in the body of the request. The server expects headers like x-algolia-api-key and x-algolia-application-id, along with properly formatted POST data containing your search parameters.
If your requests do not match the format the server expects, your queries will fail. Therefore, carefully replicating both the headers and the JSON format of requests is essential for successful scraping.
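As a minimal sketch, a bare-bones Algolia query request looks like the following. The application ID, API key, and index name are placeholders you would replace with values discovered on your target site:

import httpx

# Placeholder credentials: replace with values found on your target site
APP_ID = "YOUR_APP_ID"       # 10-character application ID
API_KEY = "YOUR_SEARCH_KEY"  # 32-character search-only API key
INDEX = "your_index_name"

response = httpx.post(
    # the API host is the lowercased application ID plus "-dsn.algolia.net"
    f"https://{APP_ID.lower()}-dsn.algolia.net/1/indexes/{INDEX}/query",
    headers={
        "x-algolia-api-key": API_KEY,
        "x-algolia-application-id": APP_ID,
    },
    json={"query": "spotify", "hitsPerPage": 20},
)
print(response.json()["hits"])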
Handling Multiple Pages
For most Algolia implementations, search results are organized into multiple pages rather than all being returned in a single response. This means you will need to implement pagination handling in your code.
You may have to send repeated requests, incrementing the page number or providing a cursor value, in order to retrieve all records from your target query. Not handling pagination will likely result in missing a significant portion of the available data.
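A minimal sequential sketch of such a pagination loop looks like this (it assumes search_url already carries the x-algolia-* credentials, as shown later in this guide):

import httpx

def scrape_all_pages(search_url: str, search_data: dict) -> list:
    """Sequentially fetch every result page of an Algolia query."""
    results = []
    page = 0         # Algolia pages are zero-indexed
    total_pages = 1  # updated from the first response
    while page < total_pages:
        data = httpx.post(search_url, json={**search_data, "page": page}).json()
        results.extend(data["hits"])
        total_pages = data["nbPages"]  # total page count reported by Algolia
        page += 1
    return results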
Avoiding Blocks
Although Algolia generally allows requests from anywhere, the websites that use Algolia for their search features may employ strategies to prevent automated access. This could involve rate limiting, IP bans, or more advanced anti-bot systems. As a scraper, you should be prepared to use techniques such as rotating user agents, managing request timing, or leveraging proxy networks to avoid being blocked. We will discuss practical solutions for bypassing these kinds of obstacles later in the guide.
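As a rough sketch of what those techniques look like in practice (the user agent strings and the delay range here are arbitrary examples, not magic values):

import asyncio
import random
import httpx

USER_AGENTS = [
    # example browser strings; rotate a larger, up-to-date pool in practice
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
]

async def polite_post(url: str, body: dict) -> httpx.Response:
    """POST with a randomized User-Agent and a small random delay."""
    await asyncio.sleep(random.uniform(1, 2))  # spread requests out over time
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    async with httpx.AsyncClient(headers=headers) as client:
        return await client.post(url, json=body)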
Different Index Names
Within Algolia, each website can define its own set of "indexes," often one or more for different types of searchable data. The correct index name must be included in your requests for the API to return results.
Since index names vary from one implementation to another, you need to identify which index or indexes the website utilizes for the data you are interested in. This usually involves inspecting network requests or studying the site’s JavaScript code.
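One trick worth trying first: Algolia exposes a list-indices endpoint, so if the site's search key happens to include the listIndexes permission, you can enumerate index names directly. Many public search-only keys don't allow this, in which case you'll get an error and must fall back to the network inspector:

import httpx

def list_indexes(app_id: str, api_key: str) -> list:
    """Try to enumerate a site's Algolia indexes via the list-indices endpoint.

    Only works if the key carries the listIndexes permission; otherwise
    Algolia returns an error and the index name must be found manually.
    """
    response = httpx.get(
        f"https://{app_id.lower()}-dsn.algolia.net/1/indexes",
        headers={
            "x-algolia-api-key": api_key,
            "x-algolia-application-id": app_id,
        },
    )
    response.raise_for_status()
    return [item["name"] for item in response.json()["items"]]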
All these challenges can be addressed with a little detective work and attention to detail. In the following sections, we will walk you through proven solutions for each point, so you can scrape Algolia-powered searches with confidence.
Project Setup
We'll use a couple of Python packages for web scraping:
- httpx - a fast HTTP client for sending requests to Algolia's API
- parsel - an HTML/XML parsing library with CSS and XPath selector support
Install both packages using pip:
$ pip install httpx parsel
Understanding and Scraping Algolia Search
Now let's learn how to actually scrape Algolia. We'll use alternativeto.net as our example website.
If you visit this website, open your browser's network inspector, and type something into the search box, you can see that a request is sent to Algolia's API in the background.
The first thing we notice is that the URL contains secret keys: the application ID and API key. The request also sends a JSON document with our query details.
Now that we understand how it works, we can replicate this in Python:
from urllib.parse import urlencode

import httpx

params = {
    "x-algolia-agent": "Algolia for JavaScript (4.13.1); Browser (lite)",
    "x-algolia-api-key": "88489cdf3a8fbfe07a2f607bf1568330",
    "x-algolia-application-id": "ZIDPNS2VB0",
}
search_url = "https://zidpns2vb0-dsn.algolia.net/1/indexes/fullitems/query?" + urlencode(params)
search_data = {
    # See Algolia API docs for more parameters: https://www.algolia.com/doc/api-reference/search-api-parameters/
    "query": "Spotify",
    "page": 0,  # note: Algolia pages are zero-indexed, so 0 is the first page
    "distinct": True,
    "hitsPerPage": 20,
}
response = httpx.post(search_url, json=search_data)
print(response.json())
We got the first page of results plus pagination information we can use to get the remaining pages. Let's use asynchronous programming to download all pages at the same time for maximum speed:
import asyncio
from typing import List
from urllib.parse import urlencode

import httpx

params = {
    "x-algolia-agent": "Algolia for JavaScript (4.13.1); Browser (lite)",
    "x-algolia-api-key": "88489cdf3a8fbfe07a2f607bf1568330",
    "x-algolia-application-id": "ZIDPNS2VB0",
}
search_url = "https://zidpns2vb0-dsn.algolia.net/1/indexes/fullitems/query?" + urlencode(params)

async def scrape_search(query: str) -> List[dict]:
    search_data = {
        # See Algolia API docs for more parameters: https://www.algolia.com/doc/api-reference/search-api-parameters/
        "query": query,
        "page": 0,  # Algolia pages are zero-indexed
        "distinct": True,
        "hitsPerPage": 20,
    }
    async with httpx.AsyncClient(timeout=httpx.Timeout(30.0)) as session:
        # scrape the first page to learn the total number of pages
        response_first_page = await session.post(search_url, json=search_data)
        data_first_page = response_first_page.json()
        results = data_first_page["hits"]
        total_pages = data_first_page["nbPages"]
        # scrape remaining pages concurrently; as_completed yields responses in
        # completion order, not page order, so sort results afterwards if order matters
        other_pages = [
            session.post(search_url, json={**search_data, "page": i})
            for i in range(1, total_pages)
        ]
        for response_page in asyncio.as_completed(other_pages):
            page_data = (await response_page).json()
            results.extend(page_data["hits"])
    return results

print(asyncio.run(scrape_search("spotify")))
The scraper we just built will work with any website that uses Algolia search! You only need to swap in the target site's application ID, API key, and index name.
Bonus: Finding Tokens
In our scraper above, we hardcoded the Algolia API keys we found in the network inspector. These keys don't change often, but if you're building a scraper that needs to run continuously, you might want to find them automatically.
Since Algolia is a front-end service, all the required keys are somewhere in the website's HTML or JavaScript code. The keys might be in hidden input fields or stored as JavaScript variables.
With some simple parsing and pattern matching, we can extract these keys automatically:
import re
from urllib.parse import urljoin

import httpx
from parsel import Selector

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Connection": "keep-alive",
    "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
}
def search_keyword_variables(html: str):
    """Look for Algolia keys in javascript keyword variables"""
    variables = re.findall(r'(\w*algolia\w*?):"(.+?)"', html, re.I)
    api_key = None
    app_id = None
    for key, value in variables:
        key = key.lower()
        if len(value) == 32 and re.search("search_api_key|search_key|searchkey", key):
            api_key = value
        if len(value) == 10 and re.search("application_id|appid|app_id", key):
            app_id = value
        if api_key and app_id:
            print(f"found algolia details: {app_id=}, {api_key=}")
            return app_id, api_key
def search_positional_variables(html: str):
    """Look for Algolia keys in javascript positional variables"""
    found = re.findall(r'"(\w{10}|\w{32})"\s*,\s*"(\w{10}|\w{32})"', html)
    if not found:
        return None
    # sort by length so the 10-character app ID comes before the 32-character API key
    return sorted(found[0], key=len)
def find_algolia_keys(url):
    """Scrapes a URL and its JavaScript files to find Algolia application ID and API key"""
    response = httpx.get(url, headers=HEADERS)
    sel = Selector(response.text)
    # 1. Search in hidden input fields:
    search_key = sel.css("input[name*=search_api_key]::attr(value)").get()
    app_id = sel.css("input[name*=search_app_id]::attr(value)").get()
    if app_id and search_key:
        print(f"found algolia details in hidden inputs {app_id=} {search_key=}")
        return {
            "x-algolia-application-id": app_id,
            "x-algolia-api-key": search_key,
        }
    # 2. Search in website scripts:
    scripts = sel.xpath("//script/@src").getall()
    # Prioritize scripts with keywords like "app" or "settings" that are more likely to contain API keys
    _script_priorities = ["app", "settings"]
    scripts = sorted(scripts, key=lambda script: any(key in script for key in _script_priorities), reverse=True)
    print(f"found {len(scripts)} script files that could contain algolia details")
    for script in scripts:
        print(f"looking for algolia details in script: {script}")
        resp = httpx.get(urljoin(url, script), headers=HEADERS)
        if found := search_keyword_variables(resp.text):
            return {
                "x-algolia-application-id": found[0],
                "x-algolia-api-key": found[1],
            }
        if found := search_positional_variables(resp.text):
            return {
                "x-algolia-application-id": found[0],
                "x-algolia-api-key": found[1],
            }
    print(f"could not find algolia keys in {len(scripts)} script files")
# keys in hidden input fields:
find_algolia_keys("https://www.heroku.com/search")
# keys in javascript keyword variables:
find_algolia_keys("https://incidentdatabase.ai/apps/discover/")
find_algolia_keys("https://fontawesome.com/search")
# keys in positional javascript variables:
find_algolia_keys("https://alternativeto.net/")
The code above scans the page HTML and its script files for Algolia API keys. The keys might be located in:
- Hidden input fields in the page HTML
- JavaScript keyword variables in script files used by the website
- Positional JavaScript variables (Algolia application IDs are 10 characters long and API keys are 32 characters long)
This code should help you find Algolia keys automatically without manual searching!
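To tie everything together, here's a hypothetical sketch of feeding the discovered keys straight into the search scraper we built earlier. Note that the index name ("fullitems" for alternativeto.net) still has to be found manually in the network inspector:

from urllib.parse import urlencode

keys = find_algolia_keys("https://alternativeto.net/")
if keys:
    app_id = keys["x-algolia-application-id"]
    params = {
        "x-algolia-agent": "Algolia for JavaScript (4.13.1); Browser (lite)",
        "x-algolia-api-key": keys["x-algolia-api-key"],
        "x-algolia-application-id": app_id,
    }
    # "fullitems" is alternativeto.net's index name; other sites use different ones
    search_url = f"https://{app_id.lower()}-dsn.algolia.net/1/indexes/fullitems/query?" + urlencode(params)
    # search_url can now be used with the scrape_search() function from earlier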
Avoiding Blocking with ScrapFly
While Algolia itself is easy to scrape, the websites that use Algolia search may block web scrapers.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and TypeScript SDKs, as well as Scrapy and no-code tool integrations.
Here's what our alternativeto.net scraper looks like using ScrapFly SDK. Install it using pip:
$ pip install scrapfly-sdk
To use ScrapFly, simply replace the httpx calls with ScrapFly SDK:
import asyncio
import json
from typing import List
from urllib.parse import urlencode

from scrapfly import ScrapflyClient, ScrapeConfig

params = {
    "x-algolia-agent": "Algolia for JavaScript (4.13.1); Browser (lite)",
    "x-algolia-api-key": "88489cdf3a8fbfe07a2f607bf1568330",
    "x-algolia-application-id": "ZIDPNS2VB0",
}
search_url = "https://zidpns2vb0-dsn.algolia.net/1/indexes/fullitems/query?" + urlencode(params)

async def scrape_search(query: str) -> List[dict]:
    search_data = {
        # See Algolia API docs for more parameters: https://www.algolia.com/doc/api-reference/search-api-parameters/
        "query": query,
        "page": 0,  # Algolia pages are zero-indexed
        "distinct": True,
        "hitsPerPage": 20,
    }
    with ScrapflyClient(key="YOUR_SCRAPFLY_KEY", max_concurrency=2) as client:
        # scrape the first page to learn the total number of pages
        first_page = client.scrape(
            ScrapeConfig(
                url=search_url,
                method="POST",
                data=search_data,
                headers={"Content-Type": "application/json"},
                # Optional features:
                asp=True,  # Enable Anti Scraping Protection bypass
                country="US",  # Select any country for your IP address
            )
        )
        data_first_page = json.loads(first_page.content)
        results = data_first_page["hits"]
        total_pages = data_first_page["nbPages"]
        # scrape remaining pages concurrently
        async for result in client.concurrent_scrape(
            [
                ScrapeConfig(
                    url=search_url,
                    data={**search_data, "page": i},
                    method="POST",
                    headers={"Content-Type": "application/json"},
                    # Optional features:
                    asp=True,  # Enable Anti Scraping Protection bypass
                    country="US",  # Select any country for your IP address
                )
                for i in range(1, total_pages)
            ]
        ):
            data = json.loads(result.content)
            results.extend(data["hits"])
    return results

print(asyncio.run(scrape_search("spotify")))
By replacing httpx with ScrapFly SDK, we can scrape all pages without being blocked or throttled.
FAQs
How do I find Algolia API keys and application IDs for scraping?
Use browser developer tools (F12) to inspect network requests when searching. Look for requests to algolia.net domains and check the request headers for x-algolia-api-key and x-algolia-application-id. These can also be found in page source code or JavaScript files.
What's the difference between scraping Algolia vs regular HTML parsing?
Algolia scraping uses direct API calls to get structured JSON data. This is faster and more reliable than HTML parsing. It skips HTML rendering and gives you cleaner data, but you need to find the correct API endpoints and parameters first.
How do I handle rate limiting when scraping Algolia-powered websites?
Implement request delays (1-2 seconds between requests), use rotating proxies, and respect the website's rate limits. Algolia itself has generous limits, but the implementing websites may have their own restrictions.
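For example, a small semaphore plus a delay keeps a concurrent scraper under a site's limits (the concurrency of 2 and the 1-second pause are illustrative values, not limits published by Algolia):

import asyncio
import httpx

limiter = asyncio.Semaphore(2)  # at most 2 requests in flight at once

async def throttled_post(session: httpx.AsyncClient, url: str, body: dict):
    """POST through a shared semaphore with a pause between requests."""
    async with limiter:
        response = await session.post(url, json=body)
        await asyncio.sleep(1)  # pause before releasing the slot
        return response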
Can I scrape Algolia search results without knowing the API keys?
Yes, you can reverse engineer the keys by inspecting the website's JavaScript code, network requests, or HTML source. Look for patterns like 32-character API keys and 10-character application IDs in script tags or hidden form fields.
What are the most common Algolia scraping challenges?
Common challenges include: finding the correct API endpoints and parameters, handling authentication tokens, dealing with rate limiting, and adapting to changes in the website's Algolia implementation.
What causes "Expecting value (near 1:1)" error?
This error happens when the POST request body is not formatted correctly (it should be valid JSON) or when the Content-Type header is missing or wrong. Make sure it's set to application/json.
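With httpx, this usually comes down to which request parameter you use: the json= parameter serializes the body and sets the Content-Type header automatically:

import httpx

# Assuming search_url and search_data from the earlier examples:
# Wrong: sends form-encoded data, which Algolia fails to parse as JSON
response = httpx.post(search_url, data=search_data)
# Right: serializes the dict and sets Content-Type: application/json
response = httpx.post(search_url, json=search_data)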
What causes "{"message":"indexName is not valid","status":400}" error?
Some websites use multiple indexes for their search data, and you need to query the right one. For single-index queries the index name is part of the request URL path (e.g. /1/indexes/fullitems/query), while multi-index queries pass it as an indexName field in the request body. You can find this value in the browser developer tools network inspector (press F12). Once you find it, you can hardcode it into your scraper since it rarely changes.
Summary
In this tutorial, you learned how to scrape Algolia search like a pro: how to identify hidden API credentials, mimic real search requests, and build a fast asynchronous Python scraper that pulls all results from alternativeto.net.
Plus, you learned battle-tested techniques to bypass rate limits and blocks using ScrapFly SDK, putting scalable, reliable data extraction right at your fingertips. You're ready to take on any Algolia-powered website!