Scrapfly LlamaIndex Integration

Scrapfly integrates with LlamaIndex, a popular data framework for building LLM applications.

In LlamaIndex, Scrapfly is available as a web page reader object, ScrapflyReader, which uses the Scrapfly Web Scrape API to retrieve web page data for use within the LlamaIndex ecosystem.

If you're looking for a more streamlined LLM parsing solution, note that many LlamaIndex capabilities are already provided by Scrapfly's Extraction API through its LLM Extraction feature.

Usage

To start, get your Scrapfly API key from your dashboard. Then install the Scrapfly Python SDK and LlamaIndex:

Then, the ScrapflyReader is available for scraping any web page:
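As a minimal sketch of basic reader usage, the snippet below wraps ScrapflyReader in a small helper. The helper name scrape_pages and the example URL are illustrative assumptions; the import path and the load_data arguments follow the LlamaIndex web readers package as commonly documented and may differ between versions.

```python
def scrape_pages(urls, api_key):
    """Scrape each URL through Scrapfly and return LlamaIndex documents.

    scrape_pages is a hypothetical helper name used for illustration only.
    """
    # Imported lazily so this module loads even without the packages installed.
    from llama_index.readers.web import ScrapflyReader

    reader = ScrapflyReader(
        api_key=api_key,              # your Scrapfly API key from the dashboard
        ignore_scrape_failures=True,  # log pages that fail instead of raising
    )
    # Each scraped page becomes one document, here rendered as markdown.
    return reader.load_data(urls=urls, scrape_format="markdown")


# Example call (performs a real scrape, so it is left commented out):
# documents = scrape_pages(["https://web-scraping.dev/products"], "YOUR_API_KEY")
```

Passing the key as an argument keeps it out of the source file; it could, for example, be read from an environment variable of your choosing.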

For more advanced use, the integration supports all Scrapfly Web Scrape API options matching the Python SDK signature:
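A hedged sketch of the advanced form follows. It assumes the reader accepts a scrape_config dictionary whose keys mirror the Python SDK's ScrapeConfig parameters; the specific option values and the helper names build_scrape_config and scrape_with_options are illustrative assumptions, not recommendations.

```python
def build_scrape_config(country="us", render_js=True):
    """Return Web Scrape API options as a plain dictionary.

    The keys mirror ScrapeConfig parameters from the Scrapfly Python SDK;
    the values chosen here are examples only.
    """
    return {
        "asp": True,             # enable anti-scraping protection bypass
        "render_js": render_js,  # render the page in a cloud browser
        "country": country,      # proxy country code
        "cache": True,           # reuse cached results where available
    }


def scrape_with_options(urls, api_key, scrape_config):
    """Scrape URLs as markdown using custom Web Scrape API options."""
    # Imported lazily so this module loads even without the packages installed.
    from llama_index.readers.web import ScrapflyReader

    reader = ScrapflyReader(api_key=api_key, ignore_scrape_failures=True)
    return reader.load_data(
        urls=urls,
        scrape_format="markdown",
        scrape_config=scrape_config,
    )
```

Keeping the options in a dictionary makes it easy to share one configuration across several load_data calls.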

Example Use

LlamaIndex is a large, feature-rich framework that can be daunting at first, but the Scrapfly document loader can greatly simplify the experience by scraping the provided pages as markdown for easier processing.

For this example, let's explore a simple RAG scenario and use OpenAI to query it:

  1. Use the ScrapflyReader to scrape pages as markdown into a local index
  2. Load the documents into an OpenAI query engine
  3. Create a prompt template and execute some prompts on the scraped data

With LlamaIndex, this whole process can be as simple as the script below:
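The three steps above can be sketched roughly as follows. This is an illustrative outline rather than the original script: the function name run_rag is an assumption, and it relies on LlamaIndex's default OpenAI-backed embedding and LLM settings, which read the key from the OPENAI_API_KEY environment variable.

```python
def run_rag(urls, questions, scrapfly_key):
    """Scrape pages, index them, and answer each question against the index.

    run_rag is a hypothetical name. OPENAI_API_KEY must be set in the
    environment for LlamaIndex's default OpenAI models to work.
    """
    # Imported lazily so this module loads even without the packages installed.
    from llama_index.core import VectorStoreIndex
    from llama_index.readers.web import ScrapflyReader

    # 1. Scrape the pages as markdown documents.
    reader = ScrapflyReader(api_key=scrapfly_key, ignore_scrape_failures=True)
    documents = reader.load_data(urls=urls, scrape_format="markdown")

    # 2. Build a local vector index and a query engine on top of it.
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    # 3. Run each prompt and collect the answers as plain strings.
    return {question: str(query_engine.query(question)) for question in questions}
```

A call such as run_rag(["https://web-scraping.dev/products"], ["What products are listed?"], scrapfly_key) would return a dictionary mapping each question to the model's answer.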

Let's unpack the above example step-by-step. We first set up our OpenAI API key and the ScrapflyReader. Then, we execute a scrape command to generate a list of documents (in this case, one document).

Now, we can generate the index and query engine to prompt our scraped documents using LLMs:
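One possible shape for this step, including a custom prompt template, is sketched below. The template wording and the helper name build_query_engine are assumptions; the sketch relies on LlamaIndex's convention that a question-answering template uses the {context_str} and {query_str} placeholders, so check the prompt documentation for the version you have installed.

```python
# Illustrative template text; {context_str} receives the retrieved page
# content and {query_str} receives the user's question.
QA_TEMPLATE = (
    "You are answering questions about scraped web pages.\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Answer using only the context above."
)


def build_query_engine(documents, qa_template=QA_TEMPLATE):
    """Index the scraped documents and return a query engine using the template.

    build_query_engine is a hypothetical helper name.
    """
    # Imported lazily so this module loads even without the packages installed.
    from llama_index.core import PromptTemplate, VectorStoreIndex

    index = VectorStoreIndex.from_documents(documents)
    return index.as_query_engine(text_qa_template=PromptTemplate(qa_template))
```

The returned engine can then be queried with query_engine.query("your question"), and the response converted to text with str().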

This is a very basic example, but it is still a powerful tool for parsing data with LLMs. You can load more than one document and design your own scraping-based knowledge base that you can query with LLMs just as easily.

Errors

LlamaIndex displays Scrapfly API errors in the standard Scrapfly API error message format. For more, see the Scrapfly API error documentation.

Pricing

No additional costs.

Summary