AI Web Scraping API

Unlock AI-Powered Web Automation

Tired of fragile web scrapers that break on every website update, or of the sheer complexity of modern websites?

Let LLMs and machine learning models handle this for you:

  • Auto-scrape with AI to avoid fragile HTML parsing: specify a URL and a desired data type, and get predictable results.
  • Query scraped pages with free-form prompts using an LLM engine that understands HTML.
  • Overcome scraping blocks with automatic, advanced block bypass.
  • Use cloud-based browsers to scrape complex, dynamic pages exactly as they appear in a real web browser.
  • Seamlessly integrate with popular AI toolkits like LangChain and LlamaIndex in Python and TypeScript.
Try for free
Trusted by 30,000+ Developers
99.99% Uptime
1PB+/mo Data Transferred
5B+/mo Successful Requests

What Can It Do?

# pip install scrapfly-sdk[all]

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # what object to scrape? product, review, real estate listing etc.
        extraction_model="product",
    )
)
print(api_response.scrape_result['extracted_data']['data'])
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // what object to scrape? product, review, real estate listing etc.
        extraction_model: "product",
    })
);
console.log(api_result.result.extracted_data);
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
extraction_model==product
Name                             API name (pass as &extraction_model={model-name})
Article                          article
Event                            event
Food Recipe                      food_recipe
Hotel                            hotel
Hotel Listing                    hotel_listing
Job Listing                      job_listing
Job Posting                      job_posting
Organization                     organization
Product                          product
Product Listing                  product_listing
Real Estate Property             real_estate_property
Real Estate Property Listing     real_estate_property_listing
Review List                      review_list
Search Engine Results            search_engine_results
Social Media Post                social_media_post
Stock                            stock
Vehicle Ad                       vehicle_ad
Vehicle Ad Listing               vehicle_ad_listing
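For example, here is a minimal sketch swapping in the review_list model from the table above. render_js is enabled because, as the cloud browser examples below show, the reviews page loads its content dynamically:

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/reviews',
        # the reviews page renders client-side, so use a cloud browser
        render_js=True,
        # any API name from the table above works here
        extraction_model="review_list",
    )
)
print(api_response.scrape_result['extracted_data']['data'])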
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # Use any LLM prompt:
        extraction_prompt="What's price of the product?",
    )
)
print(api_response.scrape_result['extracted_data'])
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // Use any LLM prompt:
        extraction_prompt: "What's price of the product?",
    })
);
console.log(api_result.result.extracted_data);
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
"extraction_prompt=What's price of the product?"
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # Prompt for specific structured data formats:
        extraction_prompt="Extract product features in JSON format",
    )
)
print(api_response.scrape_result['extracted_data'])
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // Prompt for specific structured data formats:
        extraction_prompt: "Extract product features in JSON format",
    })
);
console.log(api_result.result.extracted_data);
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
"extraction_prompt=Extract product features in JSON format"
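The shape of a prompt result depends on what the LLM returns. Continuing from the Python example above, here is a minimal sketch of consuming it; the assumption that the payload sits under the same 'data' key as in the extraction_model examples is ours:

import json

extracted = api_response.scrape_result['extracted_data']
# 'data' key assumed to mirror the extraction_model examples above
features = extracted['data']
# the prompt asked for JSON, so parse the payload if it arrived as a string
if isinstance(features, str):
    features = json.loads(features)
print(features)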
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # add unique identifier to start a session
        session="mysession123",
    )
)

# resume session
api_response2: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        session="mysession123",
        # sessions can be shared between browser and http requests
        # render_js = True,   # enable browser for this session
    )
)
print(api_response2.result)
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });

let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // add unique identifier to start a session
        session: "mysession123",
    })
);

// resume session
let api_result2 = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        session: "mysession123",
        // sessions can be shared between browser and http requests
        // render_js: true,   // enable browser for this session
    })
);
console.log(JSON.stringify(api_result2.result));
# start session
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
session==mysession123

# resume session
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
session==mysession123
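Since a session persists state such as cookies between calls, the typical flow is to visit a page that sets cookies and then reuse them on later requests. A minimal sketch (the cookie behavior of these two particular pages is assumed for illustration):

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

# the first request creates the session and stores any cookies it receives
client.scrape(ScrapeConfig(
    url='https://web-scraping.dev/login',
    session="auth-session",
))

# later requests with the same identifier send those cookies back
api_response: ScrapeApiResponse = client.scrape(ScrapeConfig(
    url='https://web-scraping.dev/product/1',
    session="auth-session",
))
print(api_response.result)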
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/reviews',
        # enable the use of cloud browsers
        render_js=True,
        # wait for specific element to appear
        wait_for_selector=".review",
        # or wait a set amount of time
        rendering_wait=3_000,  # 3 seconds
    )
)


print(api_response.result)
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });

let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/reviews',
        // enable the use of cloud browsers
        render_js: true,
        // wait for specific element to appear
        wait_for_selector: ".review",
        // or wait a set amount of time
        rendering_wait: 3_000,  // 3 seconds
    })
);

console.log(JSON.stringify(api_result.result));
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/reviews \
render_js==true \
wait_for_selector==.review \
rendering_wait==3000
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/login',
        # enable browsers for this request
        render_js=True,
        # describe your control flow
        js_scenario=[
            {"fill": {"selector": "input[name=username]", "value":"user123"}},
            {"fill": {"selector": "input[name=password]", "value":"password"}},
            {"click": {"selector": "button[type='submit']"}},
            {"wait_for_navigation": {"timeout": 5000}}
        ]
    )
)


print(api_response.result)
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });

let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/login',
        // enable browsers for this request
        render_js: true,
        // describe your control flow
        js_scenario: [
            {"fill": {"selector": "input[name=username]", "value":"user123"}},
            {"fill": {"selector": "input[name=password]", "value":"password"}},
            {"click": {"selector": "button[type='submit']"}},
            {"wait_for_navigation": {"timeout": 5000}}
        ]
    })
);

console.log(JSON.stringify(api_result.result));
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/login \
render_js==true \
js_scenario==Ww0KCXsiZmlsbCI6IHsic2VsZWN0b3IiOiAiaW5wdXRbbmFtZT11c2VybmFtZV0iLCAidmFsdWUiOiJ1c2VyMTIzIn19LA0KCXsiZmlsbCI6IHsic2VsZWN0b3IiOiAiaW5wdXRbbmFtZT1wYXNzd29yZF0iLCAidmFsdWUiOiJwYXNzd29yZCJ9fSwNCgl7ImNsaWNrIjogeyJzZWxlY3RvciI6ICJidXR0b25bdHlwZT0nc3VibWl0J10ifX0sDQoJeyJ3YWl0X2Zvcl9uYXZpZ2F0aW9uIjogeyJ0aW1lb3V0IjogNTAwMH19DQpd

# note: the js_scenario value must be base64-encoded
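When calling the API directly, you can produce that value with Python's standard library. A minimal sketch encoding the same scenario as above (the URL-safe base64 variant is an assumption, chosen because the value travels in a query string):

import base64
import json

js_scenario = [
    {"fill": {"selector": "input[name=username]", "value": "user123"}},
    {"fill": {"selector": "input[name=password]", "value": "password"}},
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]

# serialize the scenario to JSON, then base64-encode it for the query string
encoded = base64.urlsafe_b64encode(json.dumps(js_scenario).encode()).decode()
print(encoded)  # pass this as the js_scenario parameter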
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/reviews',
        render_js=True,
        rendering_wait=3_000,
    )
)

# see the browser_data field
print(api_response.result['browser_data'])
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });

let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/reviews',
        render_js: true,
        rendering_wait: 3_000,  
    })
);

// see the browser_data field
console.log(JSON.stringify(api_result.result.browser_data));
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/reviews \
render_js==true | jq .result.browser_data
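Which keys appear under browser_data depends on what the page did while rendering, so a quick way to explore a response is to list the captured categories before drilling in. Continuing from the Python example above (no specific field names are assumed):

# enumerate the captured data categories before accessing specific fields
browser_data = api_response.result['browser_data']
for key, value in browser_data.items():
    print(key, type(value).__name__)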

Developer-First Experience

We built Scrapfly for ourselves in 2017 and opened it to the public in 2020. Since then, we have focused on delivering the best developer experience possible.

Master Web Data with our Docs and Tools

Access a complete ecosystem of documentation, tools, and resources designed to accelerate your data journey and help you get the most out of Scrapfly.

Explore the Docs

Seamlessly Integrate with Frameworks & Platforms

Easily integrate Scrapfly with your favorite tools and platforms, or customize workflows with our Python and TypeScript SDKs.
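As one illustration, LangChain's community package ships a Scrapfly document loader. Below is a minimal sketch; the ScrapflyLoader import path and arguments are assumptions based on langchain_community, so verify them against the current LangChain docs:

from langchain_community.document_loaders import ScrapflyLoader

# load pages through Scrapfly so LangChain receives clean, unblocked content
loader = ScrapflyLoader(
    ["https://web-scraping.dev/product/1"],
    api_key="API KEY",
)
documents = loader.load()
print(documents[0].page_content[:200])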

Powerful Web UI

One-stop shop to configure, control and observe all of your Scrapfly activity.

Predictable & Fair Pricing

Pay only for successful requests and the features you use.

How Many Scrapes per Month?

Get started with 1,000 Free Credits. Try Scrapfly for free
Plan                     Custom       Enterprise      Startup         Pro             Discovery
Price                    Custom /mo   $500/mo         $250/mo         $100/mo         $30/mo
Included API Credits     Custom       5,500,000       2,500,000       1,000,000       200,000
Extra API Credits        Custom       $1.20 per 10k   $2.00 per 10k   $3.50 per 10k   -
Concurrent Requests      Custom       100             50              20              5
Log Retention            Unlimited    4 weeks         3 weeks         2 weeks         1 week
Team Management          Yes          Yes             Yes             No              No
Support                  Premium      Premium         Standard        Standard        Basic

All plans include Anti Scraping Protection, Residential Proxies, Geo Targeting, and JavaScript Rendering.

* Prices may vary with tax

What Do Our Users Say?

"Scrapfly’s Web Scraping API has completely transformed our data collection process. The automatic proxy rotation and anti-bot bypass are game-changers. We no longer have to worry about scraping blocks, and the setup was incredibly easy. Within minutes, we had a reliable scraping system pulling the data we needed. I highly recommend this API for any serious developer!"

John M. – Senior Data Engineer

"We’ve tried multiple scraping tools, but Scrapfly’s Web Scraping API stands out for its reliability and speed. With the cloud browser functionality, we were able to scrape dynamic content from JavaScript-heavy websites seamlessly. The real-time data collection helped us make faster, more informed decisions, and the 99.99% uptime is just unmatched."

Samantha C. – CTO

"Scalability was a major concern for us as our data scraping needs grew. Scrapfly’s Web Scraping API not only handled our increased requests but did so without a hitch. The proxy rotation across 120+ countries ensured we could access data from any region, and their comprehensive documentation made implementation a breeze. It's the most robust API we’ve used."

Alex T. – Founder


Frequently Asked Questions

What is an AI Web Scraping API?

An AI Web Scraping API is a service that abstracts away the complexities and challenges of web scraping and data extraction using AI tools such as machine learning models and large language models (LLMs). This significantly simplifies web scraping and lets developers access web data through API calls, either for specific data objects or through free-form prompt queries.

How can I access the AI Web Scraping API?

The AI Web Scraping API is a standard HTTP API, so it can be accessed with any HTTP client such as curl or httpie, or with any HTTP client library in any programming language. For first-class support, we offer Python and TypeScript SDKs.
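For example, here is a minimal sketch using Python's requests library against the HTTP API directly, with the same query parameters as the httpie examples above (the top-level result key matches what those examples read back):

import requests

# call the scrape endpoint with plain query parameters
response = requests.get(
    "https://api.scrapfly.io/scrape",
    params={
        "key": "API KEY",
        "url": "https://web-scraping.dev/product/1",
        "extraction_model": "product",
    },
)
response.raise_for_status()
print(response.json()["result"])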

Is AI web scraping legal?

Yes, web scraping is generally legal in most jurisdictions around the world. This applies especially to AI web scraping, since it makes it easier to control the scope of the collected data. For more, see our in-depth article on web scraping laws.

What AI does AI Web Scraping API use?

The AI Web Scraping API uses a combination of machine learning and large language models (LLMs) to make web scraping as seamless as possible. This means data can be accessed either through predefined data models (like product, article, etc.) or through flexible LLM prompts.

How long does it take to get results from the AI Web Scraping API?

Scrape duration varies based on the features used, such as cloud browsers, browser actions, and the extraction method. Some scrapes complete almost instantaneously, while scrape requests with large LLM prompts can take longer to process.

How do I debug my AI web scrapers?

The AI Web Scraping API returns several details that help you debug your scrapers. For example, when a predefined model is used, the API returns a self-evaluation report showing how many of the schema fields were scraped. Each request is also logged and stored in the web dashboard, allowing easy inspection and replay of scrape commands.