Effortlessly Extract Data with Our AI-Powered Extraction API

Unlock the power of our AI-Powered Extraction API to transform your data extraction process.

  • Boost productivity with LLM prompts for faster, smarter extractions.
  • Automate and adapt with AI-driven tools for automatic structured data extraction.
  • Create and customize your own extraction rules to suit any document structure.
  • Seamlessly integrate with your platforms through our Python and TypeScript SDKs.

What Can It Do?

from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# product from https://web-scraping.dev/product/1
document_text = Path("product.html").read_text()  

api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        content_type="text/html",
        extraction_prompt="summarize the review sentiment"
    )
)
print(api_response.result)
import { 
    ScrapflyClient, ExtractionConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: Deno.env.get("SCRAPFLY_KEY") });
// product from https://web-scraping.dev/product/1
const document_text = Deno.readTextFileSync("./product.html").toString();

const api_result = await client.extract(
    new ExtractionConfig({
        body: document_text,
        url: "https://web-scraping.dev/product/1",
        content_type: "text/html",
        extraction_prompt: "summarize the review sentiment",
    })
);
console.log(JSON.stringify(api_result));
http POST https://api.scrapfly.io/extraction \
key==$SCRAPFLY_KEY \
content_type==text/html \
"extraction_prompt==summarize the review sentiment" \
url==https://web-scraping.dev/product/1 \
@product.html
from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# product from https://web-scraping.dev/product/1
document_text = Path("product.html").read_text()  

api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        content_type="text/html",
        url="https://web-scraping.dev/product/1",
        extraction_prompt="extract price as JSON",
    )
)
print(api_response.result)
import { 
    ScrapflyClient, ExtractionConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: Deno.env.get("SCRAPFLY_KEY") });
// product from https://web-scraping.dev/product/1
const document_text = Deno.readTextFileSync("./product.html").toString();

const api_result = await client.extract(
    new ExtractionConfig({
        body: document_text,
        url: "https://web-scraping.dev/product/1",
        content_type: "text/html",
        extraction_prompt: "extract price as json",
    })
);
console.log(JSON.stringify(api_result));
http POST https://api.scrapfly.io/extraction \
key==$SCRAPFLY_KEY \
content_type==text/html \
"extraction_prompt==extract price as JSON" \
url==https://web-scraping.dev/product/1 \
@product.html
from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# product from https://web-scraping.dev/product/1
document_text = Path("product.html").read_text()  

api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        extraction_prompt="find the product price",
        # use almost any file type identified through content_type
        content_type="text/html",
        # content_type="text/xml",
        # content_type="text/plain",
        # content_type="text/markdown",
        # content_type="application/json",
        # content_type="application/csv",
    )
)
print(api_response.result)
import { 
    ScrapflyClient, ExtractionConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: Deno.env.get("SCRAPFLY_KEY") });
// product from https://web-scraping.dev/product/1
const document_text = Deno.readTextFileSync("./product.html").toString();

const api_result = await client.extract(
    new ExtractionConfig({
        body: document_text,
        url: "https://web-scraping.dev/product/1",
        extraction_prompt: "find the product price",
        // use almost any file type identified through content_type
        content_type: "text/html",
        // content_type: "text/xml",
        // content_type: "text/plain",
        // content_type: "text/markdown",
        // content_type: "application/json",
        // content_type: "application/csv",
    })
);
console.log(JSON.stringify(api_result));
http POST https://api.scrapfly.io/extraction \
key==$SCRAPFLY_KEY \
content_type==text/html \
"extraction_prompt==find the product price" \
url==https://web-scraping.dev/product/1 \
@product.html
import json
from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# product from https://web-scraping.dev/product/1
document_text = Path("product.html").read_text()  

api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        content_type="text/html",
        # use one of dozens of defined data models:
        extraction_model="product",
        # optional: provide file's url for converting relative links to absolute
        url="https://web-scraping.dev/product/1",
    )
)
print(json.dumps(api_response.extraction_result))
import { 
    ScrapflyClient, ExtractionConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: Deno.env.get("SCRAPFLY_KEY") });
// product from https://web-scraping.dev/product/1
const document_text = Deno.readTextFileSync("./product.html").toString();

const api_result = await client.extract(
    new ExtractionConfig({
        body: document_text,
        content_type: "text/html",
        url: "https://web-scraping.dev/product/1",
        extraction_model: "product"
    })
);
console.log(JSON.stringify(api_result));
http POST https://api.scrapfly.io/extraction \
key==$SCRAPFLY_KEY \
content_type==text/html \
extraction_model==product \
url==https://web-scraping.dev/product/1 \
@product.html
import json
from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# product from https://web-scraping.dev/product/1
document_text = Path("product.html").read_text()  

api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        content_type="text/html",
        # use one of dozens of defined data models:
        extraction_model="product",
        # optional: provide file's url for converting relative links to absolute
        url="https://web-scraping.dev/product/1",
    )
)
# the data_quality field describes how much was found
print(json.dumps(api_response.extraction_result['data_quality']))
import json
from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient 

client = ScrapflyClient(key="API KEY")

# reviews from https://web-scraping.dev/reviews
document_text = Path("reviews.html").read_text()  
# define your JSON template
template = {  
  "source": "html",
  "selectors": [
    {
      "name": "date_posted",
      # use css selectors
      "type": "css",
      "query": "[data-testid='review-date']::text",
      "multiple": True,  # one or multiple?
      # post process results with formatters
      "formatters": [ {
        "name": "datetime",
        "args": {"format": "%Y, %b %d — %A"}
      } ]
    }
  ]
}
api_response = client.extract(
    ExtractionConfig(
        body=document_text,
        content_type="text/html",
        # pass your own template for one-off use:
        ephemeral_template=template
    )
)
print(json.dumps(api_response.extraction_result))
import { 
    ScrapflyClient, ExtractionConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: Deno.env.get("SCRAPFLY_KEY") });
// reviews from https://web-scraping.dev/reviews
const document_text = Deno.readTextFileSync("./reviews.html").toString();

// define your template as JSON 
const template = {  
  "source": "html",
  "selectors": [
    {
      "name": "date_posted",
      // use css selectors
      "type": "css",
      "query": "[data-testid='review-date']::text",
      "multiple": true,  // one or multiple?
      // post process results with formatters
      "formatters": [ {
        "name": "datetime",
        "args": {"format": "%Y, %b %d — %A"}
      } ]
    }
  ]
}
const api_result = await client.extract(
    new ExtractionConfig({
        body: document_text,
        url: "https://web-scraping.dev/reviews",
        content_type: "text/html",
        ephemeral_template: template,
    })
);
console.log(JSON.stringify(api_result));
http POST https://api.scrapfly.io/extraction \
key==$SCRAPFLY_KEY \
content_type==text/html \
@reviews.html

Developer-First Experience

We built Scrapfly for ourselves in 2017 and opened it to the public in 2020. Since then, we have focused on delivering the best developer experience possible.

Master Web Data with our Docs and Tools

Access a complete ecosystem of documentation, tools, and resources designed to accelerate your data journey and help you get the most out of Scrapfly.

Explore the Docs

Seamlessly Integrate with Frameworks & Platforms

Easily integrate Scrapfly with your favorite tools and platforms, or customize workflows with our Python and TypeScript SDKs.

Powerful Web UI

One-stop shop to configure, control and observe all of your Scrapfly activity.

Simple & Fair Pricing

Pay for the features you use

How Many Extractions per Month?

Get started with 1,000 Free Credits. Try Scrapfly for free
Plan        Price    Included API Credits  Extra API Credits  Concurrent Requests  Log Retention  Support
Custom      Custom   Custom                Custom             Custom               ∞              Premium
Enterprise  $500/mo  5,500,000             $1.20 per 10k      100                  4 weeks        Premium
Startup     $250/mo  2,500,000             $2.00 per 10k      50                   3 weeks        Standard
Pro         $100/mo  1,000,000             $3.50 per 10k      20                   2 weeks        Standard
Discovery   $30/mo   200,000               —                  5                    1 week         Basic

All plans include AI Auto Extract, LLM Prompting, and Template Extraction.

* Price may vary with tax

What Do Our Users Say?

"The Extraction API has transformed how we handle product data. We can now automatically extract detailed product information from hundreds of sites using just a few lines of code. The AI-powered auto extraction feature has saved us countless hours, and the schema-based outputs mean no more messy data. It’s incredibly reliable and easy to integrate!"

Jessica T. – Senior Data Engineer

"Scrapfly’s Extraction API is a game-changer for data parsing at scale. The ability to customize our extraction rules with templates or let the AI handle it for us makes it versatile for all our projects. We’ve been able to extract thousands of reviews and product details effortlessly, and the accuracy is simply outstanding. Highly recommended!"

Michael Bennett – CTO

"We’ve tried several data extraction tools, but Scrapfly’s Extraction API is by far the best. The LLM prompts and AI auto extraction made it simple to pull structured data from documents and webpages. Our team uses it daily to extract everything from articles to product details, and the accuracy and speed have exceeded our expectations!"

Olivia Martinez – Lead Developer


Frequently Asked Questions

What is an Extraction API?

An Extraction API is a service for automating data parsing tasks, such as parsing HTML into structured data, extracting details from PDFs, or generally making sense of text data. Extraction APIs pair well with web scraping and data harvesting; for that, see the Scrapfly Web Scraping API.

How can I access Extraction API?

The Extraction API is a standard HTTP API, so you can access it with any HTTP client such as curl or httpie, or with any HTTP client library in any programming language. For first-class support, we offer Python and TypeScript SDKs.
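
For example, the httpie calls shown above translate directly to any HTTP library. Here is a minimal sketch using Python's requests package, mirroring the prompt example from earlier (the key placeholder and file name are assumptions):

import requests
from pathlib import Path

# parameters travel in the query string and the document in the request body,
# just like the httpie examples above
response = requests.post(
    "https://api.scrapfly.io/extraction",
    params={
        "key": "YOUR_SCRAPFLY_KEY",  # placeholder: your API key
        "content_type": "text/html",
        "extraction_prompt": "summarize the review sentiment",
        "url": "https://web-scraping.dev/product/1",
    },
    data=Path("product.html").read_text(),
)
print(response.json())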

What types of documents does the Extraction API support?

The Extraction API currently supports most text-based documents for automated parsing. This includes HTML, XML, JSON, CSV, RSS, Markdown, and plain text. PDF support is coming soon.
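
For example, the same SDK call shown earlier works for any of these formats by switching content_type. A minimal sketch for a CSV document (the file name and prompt are hypothetical):

from pathlib import Path
from scrapfly import ExtractionConfig, ScrapflyClient

client = ScrapflyClient(key="API KEY")

api_response = client.extract(
    ExtractionConfig(
        # any supported text format, identified via content_type
        body=Path("products.csv").read_text(),
        content_type="application/csv",
        extraction_prompt="list the three cheapest products",
    )
)
print(api_response.result)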

Does Extraction API use AI to extract data from my documents?

Yes, AI is used in both the LLM prompt and AI Auto Extract features. Both features use proprietary AI implementations to either prompt documents or extract structured data objects.

Is my data safe when used with the Extraction API?

Yes, we follow strict data privacy and security guidelines. We do not store your data, we do not share it with third parties or use it in AI training. For more see our privacy policy and terms of service.

Can I define my own extraction rules?

Yes, the extraction templates feature lets you define your own parsing instructions as a JSON schema. It supports CSS and XPath selectors for selecting data, many different formatters for cleaning up the results, and much more!
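
As a sketch, the template from the example above could select the same review dates with XPath instead of CSS (the exact query is hypothetical, and we assume the selector type value is "xpath"):

template = {
  "source": "html",
  "selectors": [
    {
      "name": "date_posted",
      # same field as before, selected with an XPath query instead of CSS
      "type": "xpath",
      "query": "//*[@data-testid='review-date']/text()",
      "multiple": True,
    }
  ]
}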