# AI Web Scraping API

 Scrape and extract structured data in a single API call. Anti-bot bypass, JavaScript rendering, and AI-powered extraction - all baked into one endpoint.

## One call. Scraped HTML + structured JSON.

- **Combined pipeline.** No two-step scrape-then-extract workflow. One request returns both the raw page and AI-extracted data in the same response.
- **19+ pre-trained models built in.** Pass `extraction_model='product'` and get title, price, images, and more - no selectors to write.
 
 [ Get Free API Key ](https://scrapfly.io/register) [ Developer Docs ](https://scrapfly.io/docs/scrape-api/extraction) 

 1,000 free credits. No credit card required. 

 






 

 

---

- **1** API call returns scrape + extracted JSON
- **19+** pre-trained AI extraction models
- **98%** success on Cloudflare-protected targets
- **55k+** developers building on Scrapfly

 



 

 

 

---

## Scrape and Extract in One Pipeline

Anti-bot bypass, rendering, pre-trained models, LLM prompts, and template rules - all composable, all on one endpoint.

 

 ### From URL to Structured JSON - One Request

The traditional workflow requires two calls: one to fetch the page, one to parse it. This API collapses both into a single request. Anti-bot bypass, JavaScript rendering, and AI extraction run in sequence on the same endpoint. The response carries both the raw scraped content and the structured `extracted_data` object together.

1. **URL + Schema / Prompt** - one call: `url` plus `extraction_model`, `extraction_prompt`, or `extraction_template`
2. **ASP Fetch** - anti-bot bypass via Curlium + Scrapium, residential proxies, challenge solver
3. **Render and Parse** - optional JS rendering, screenshot capture, `browser_data` XHR capture
4. **LLM Extraction** - vision + DOM + content-aware reasoning over the fetched page
5. **Schema Validator** - output conforms to the model schema or your JSON schema; `data_quality` coverage report included
6. **Structured JSON** - `result.extracted_data` alongside raw HTML, screenshots, `log_url`, and cost headers
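The collapsed pipeline can be pictured as one request carrying both the scrape and extraction inputs. A minimal sketch, assuming only the documented parameter names (`url`, `key`, `asp`, `render_js`, `extraction_model`, `extraction_prompt`); the helper itself is illustrative, not part of the SDK:

```python
# Illustrative helper: assemble the query parameters for one combined
# scrape + extract request against the /scrape endpoint.
def build_scrape_params(api_key, url, extraction_model=None,
                        extraction_prompt=None, asp=True, render_js=False):
    params = {"key": api_key, "url": url}
    if asp:
        params["asp"] = "true"        # anti-bot bypass layer
    if render_js:
        params["render_js"] = "true"  # headless-browser rendering
    if extraction_model:
        params["extraction_model"] = extraction_model
    elif extraction_prompt:
        params["extraction_prompt"] = extraction_prompt
    return params

params = build_scrape_params("API KEY", "https://web-scraping.dev/product/1",
                             extraction_model="product")
```

One dict, one GET: the same call that fetches the page also names the extraction strategy.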

 

 

 [extraction\_model](https://scrapfly.io/docs/scrape-api/extraction) 

 [extraction\_prompt](https://scrapfly.io/docs/scrape-api/extraction) 

 [extraction\_template](https://scrapfly.io/docs/scrape-api/extraction) 

 [asp=true](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) 

 [render\_js](https://scrapfly.io/docs/scrape-api/javascript-rendering) 

 [proxy\_pool](https://scrapfly.io/docs/scrape-api/proxy) 

 `extracted_data` 

 `data_quality` 

 

[View extraction docs →](https://scrapfly.io/docs/scrape-api/extraction)

 



 

 

 ### Pre-Trained AI Models

Set `extraction_model` and get a predictable JSON schema for your page type. No selectors to write, no XPath to maintain. The model applies across any domain that matches the schema - same field names, same output shape, every time.

  **19+** preset models 

  **any domain** same schema 

  **data\_quality** coverage score 

 

 [article](https://scrapfly.io/docs/extraction-api/automatic-ai/models/article) 

 [event](https://scrapfly.io/docs/extraction-api/automatic-ai/models/event) 

 [food\_recipe](https://scrapfly.io/docs/extraction-api/automatic-ai/models/food_recipe) 

 [hotel](https://scrapfly.io/docs/extraction-api/automatic-ai/models/hotel) 

 [hotel\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/hotel_listing) 

 [job\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/job_listing) 

 [+ 13 more models](https://scrapfly.io/docs/extraction-api/automatic-ai) 

 [Browse all models](https://scrapfly.io/docs/extraction-api/automatic-ai) 
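Because every preset returns the same named fields for its page type, downstream validation stays trivial. A sketch, assuming the field names from the headline above (`title`, `price`, `images`); each preset's authoritative schema is in its model docs:

```python
# Illustrative check: confirm a preset-model record carries the fields
# your pipeline depends on before it moves downstream.
def missing_fields(record, required):
    return [f for f in required if record.get(f) in (None, "", [])]

record = {"title": "Box of Chocolate Candy", "price": "9.99",
          "images": ["https://web-scraping.dev/assets/..."]}

assert missing_fields(record, ["title", "price", "images"]) == []
assert missing_fields({"title": "x"}, ["title", "price"]) == ["price"]
```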

 

[View pre-trained model docs →](https://scrapfly.io/docs/extraction-api/automatic-ai)

 



 

 ### LLM and Schema Extraction

Two LLM-powered modes beyond pre-trained models. Pass a JSON Schema with `extraction_prompt` and the LLM shapes its output to that structure. Or pass a free-form natural-language instruction and get back a JSON object matching your question. Both modes use vision, DOM, and content signals together.

  **JSON Schema** structured output 

  **Freeform** any question 

  **Vision** + DOM aware 

 

- [**extraction_prompt + schema** - pass a JSON Schema alongside your prompt for a guaranteed output shape](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [**extraction_prompt (plain text)** - natural-language instruction, any layout, ideal for ad-hoc analysis](https://scrapfly.io/docs/extraction-api/llm-prompt)
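A sketch of the kind of JSON Schema the structured mode accepts. The schema content below is illustrative (field names are examples, not a documented model); see the LLM prompt docs for exactly how the schema is attached to the request:

```python
import json

# Illustrative JSON Schema for the structured-output mode: the LLM's
# reply must conform to this shape. Field names are examples only.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

# Schemas travel as serialized JSON inside the request.
schema_json = json.dumps(product_schema)
```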

 

[View LLM prompt docs →](https://scrapfly.io/docs/extraction-api/llm-prompt)

 



 

 

 ### Template Extraction

Define your own CSS, XPath, or JMESPath selectors in a JSON template. Chain type extractors and formatters on any field for precise, repeatable output from a known site structure. Deterministic - same input always yields the same shape.

  **CSS** selectors 

  **XPath** expressions 

  **JMESPath** JSON queries 

 

type extractors

field formatters

nested fields

repeating rows
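A sketch of what a selector template can look like. The key names below (`selectors`, `css`, `type`, `formatters`) are illustrative, not the documented template grammar - the template docs linked below are authoritative:

```python
import json

# Illustrative template: CSS selectors plus a chained formatter.
# Key names here are examples; consult the template docs for the
# exact grammar and available extractors/formatters.
template = {
    "selectors": [
        {"name": "title", "css": "h3.product-title", "type": "text"},
        {"name": "price", "css": "span.price", "type": "text",
         "formatters": [{"name": "remove_symbols"}]},
    ]
}

encoded = json.dumps(template)  # templates are sent as JSON
```

Because the template is plain data, it can be versioned alongside the scraper code and diffed when the target site changes.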

 

[View template docs →](https://scrapfly.io/docs/extraction-api/rules-and-template)

 



 

 ### Anti-Bot and Proxy Built In

Every request inherits the full Web Scraping API bypass stack. Add `asp=true` and Scrapfly detects the active protection layer, assembles a coherent fingerprint, and solves any challenge before extraction runs. Failed challenge retries do not cost credits.

  **Curlium** HTTP fingerprint 

  **Scrapium** browser identity 

  **190+ countries** proxy coverage 

  **Free retries** on challenge fail 

 

 [Cloudflare](https://scrapfly.io/bypass/cloudflare) 

 [DataDome](https://scrapfly.io/bypass/datadome) 

 [Akamai](https://scrapfly.io/bypass/akamai) 

 [PerimeterX](https://scrapfly.io/bypass/perimeterx) 

 [Kasada](https://scrapfly.io/bypass/kasada) 

 [Imperva](https://scrapfly.io/bypass/incapsula) 

 [F5](https://scrapfly.io/bypass/f5) 

 [AWS WAF](https://scrapfly.io/bypass/aws-waf) 
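Enabling the bypass stack is a matter of adding the documented parameters to the same request. A sketch using the parameter names from the docs (`asp`, `render_js`, `proxy_pool`, `country`); the values shown are examples:

```python
# Illustrative: the combined request dict with the bypass stack on.
# asp=true lets Scrapfly detect the active protection and solve any
# challenge before extraction runs; render_js adds a full browser
# identity for JS-gated pages.
params = {
    "key": "API KEY",
    "url": "https://web-scraping.dev/product/1",
    "asp": "true",
    "render_js": "true",
    "proxy_pool": "public_residential_pool",
    "country": "us",
    "extraction_model": "product",
}
```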

 

[View ASP docs →](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)

 



 

 

 ### Extraction Result Cache

Extraction results are cached per (URL, schema) pair. Identical requests return the cached structured JSON without re-running the LLM. The raw scrape still runs fresh; only the extraction layer is served from cache when the input is unchanged.

  **Per (URL + schema)** cache key 

 

Faster repeat calls

Lower extraction cost on reruns
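The per-(URL, schema) pairing behaves like a derived cache key: identical inputs hit the cache, changing either the URL or the extraction input busts it. The sketch below is purely conceptual - an illustration of the idea, not Scrapfly's internal implementation:

```python
import hashlib
import json

# Conceptual cache key over (URL, extraction input): deterministic
# for identical inputs, distinct as soon as either part changes.
def extraction_cache_key(url, extraction):
    blob = json.dumps({"url": url, "extraction": extraction}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

k1 = extraction_cache_key("https://web-scraping.dev/product/1", "product")
k2 = extraction_cache_key("https://web-scraping.dev/product/1", "product")
k3 = extraction_cache_key("https://web-scraping.dev/product/1", "review_list")
assert k1 == k2 and k1 != k3
```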

 

 



 

 ### Observability Built In

Every response carries a `log_url`. Follow it to inspect the full request, response headers, rendered HTML, HAR waterfall, and screenshots. The `content_replay_url` lets you re-run extraction against the stored HTML without making another scrape call. The `data_quality` field on `extracted_data` reports field-level coverage so low-confidence extractions are visible before they reach your pipeline.

  **log\_url** full trace 

  **Replay** no extra scrape 

  **data\_quality** field coverage 

  **Cost headers** per request 

 

log\_url

content\_replay\_url

data\_quality

X-Scrapfly-Api-Cost
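In practice this means a response handler can surface cost and low-confidence fields before data enters the pipeline. A sketch, assuming the field names called out above (`log_url`, `data_quality`, `X-Scrapfly-Api-Cost`); the response shape and the per-field `data_quality` scores shown here are abbreviated illustrations, not the exact payload:

```python
# Illustrative response handling: pull cost from the header and flag
# low-coverage fields from the (abbreviated, illustrative) quality map.
sample = {
    "result": {
        "log_url": "https://scrapfly.io/dashboard/monitoring/log/example",
        "extracted_data": {
            "data": {"name": "Widget", "price": None},
            "data_quality": {"name": 1.0, "price": 0.0},
        },
    },
}
headers = {"X-Scrapfly-Api-Cost": "30"}

cost = int(headers["X-Scrapfly-Api-Cost"])
quality = sample["result"]["extracted_data"]["data_quality"]
low_confidence = [field for field, score in quality.items() if score < 0.5]
```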

 

[View debug and observability docs →](https://scrapfly.io/docs/scrape-api/debug)

 



 

 

 ### Works With Your AI Stack

The API returns plain JSON, which plugs into any framework that can consume a URL. LangChain and LlamaIndex wrappers are available in the official SDKs. For RAG pipelines, the structured response is already chunked by field - no further parsing required. Custom agent stacks call the endpoint directly with the same `api_key` used for all other Scrapfly products.

  **LangChain** SDK integration 

  **LlamaIndex** SDK integration 

  **RAG pipelines** pre-structured 

  **Agent stacks** plain HTTP 

 

Python SDK

TypeScript SDK

HTTP / cURL

Research assistants

Data pipelines

Training data prep
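"Already chunked by field" means RAG prep reduces to a flat walk over `extracted_data` - no HTML parsing step. A minimal sketch (the chunk shape is illustrative, not a LangChain or LlamaIndex type):

```python
# Illustrative RAG prep: turn field-structured extraction output into
# text chunks with source metadata, skipping empty fields.
def to_chunks(extracted, source_url):
    return [
        {"text": f"{field}: {value}",
         "metadata": {"source": source_url, "field": field}}
        for field, value in extracted.items()
        if value not in (None, "", [])
    ]

chunks = to_chunks(
    {"title": "Widget Pro", "price": "49.99", "images": []},
    "https://web-scraping.dev/product/1",
)
```

Each chunk keeps its field name and source URL, so retrieval hits can be traced back to the page they came from.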

 

 



 

 ### Related APIs

The AI Web Scraping API is the combined product. Each layer is also available standalone for tighter control.

- [**Web Scraping API** - anti-bot bypass and proxies without the extraction layer](https://scrapfly.io/products/web-scraping-api)
- [**Data Extraction API** - structured parsing from HTML you already have, no scrape needed](https://scrapfly.io/products/extraction-api)
- [**Crawler API** - run the same scrape-and-extract pipeline across an entire domain](https://scrapfly.io/products/crawler-api)

 

 



 

 

 ### Preset Models at a Glance

Each preset maps a page type to a documented JSON schema with named fields. Pass the model name as `extraction_model` and get back the same structure from any URL matching that type, regardless of domain or layout.

 [article](https://scrapfly.io/docs/extraction-api/automatic-ai/models/article) 

 [event](https://scrapfly.io/docs/extraction-api/automatic-ai/models/event) 

 [food\_recipe](https://scrapfly.io/docs/extraction-api/automatic-ai/models/food_recipe) 

 [hotel](https://scrapfly.io/docs/extraction-api/automatic-ai/models/hotel) 

 [hotel\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/hotel_listing) 

 [job\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/job_listing) 

 [job\_posting](https://scrapfly.io/docs/extraction-api/automatic-ai/models/job_posting) 

 [organization](https://scrapfly.io/docs/extraction-api/automatic-ai/models/organization) 

 [product](https://scrapfly.io/docs/extraction-api/automatic-ai/models/product) 

 [product\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/product_listing) 

 [real\_estate\_property](https://scrapfly.io/docs/extraction-api/automatic-ai/models/real_estate_property) 

 [real\_estate\_property\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/real_estate_property_listing) 

 [review\_list](https://scrapfly.io/docs/extraction-api/automatic-ai/models/review_list) 

 [search\_engine\_results](https://scrapfly.io/docs/extraction-api/automatic-ai/models/search_engine_results) 

 [social\_media\_post](https://scrapfly.io/docs/extraction-api/automatic-ai/models/social_media_post) 

 [software](https://scrapfly.io/docs/extraction-api/automatic-ai/models/software) 

 [stock](https://scrapfly.io/docs/extraction-api/automatic-ai/models/stock) 

 [vehicle\_ad](https://scrapfly.io/docs/extraction-api/automatic-ai/models/vehicle_ad) 

 [vehicle\_ad\_listing](https://scrapfly.io/docs/extraction-api/automatic-ai/models/vehicle_ad_listing) 

 

[View all preset model schemas →](https://scrapfly.io/docs/extraction-api/automatic-ai)

 



 

 

 ### Output Format

The extraction result lands in `result.extracted_data` alongside the scrape. The raw response body is still present - use markdown or screenshot format for LLM context, or HTML for downstream parsing.

JSON (extracted\_data)

Markdown body

HTML body

Screenshot

browser\_data

 

 



 

 ### Scale With the Crawler API

The single-URL endpoint handles individual requests. When you need the same extraction across an entire site - product catalog, news archive, property listings - pair it with the Crawler API. The crawler handles discovery, deduplication, and scheduling; each page passes through the same extraction pipeline.

[Crawler API →](https://scrapfly.io/products/crawler-api)

 



 

 ### Data Privacy

Document content is processed in memory and discarded after the response is returned. Scrapfly does not store, share, or use your extracted data or page content for training AI models. See the privacy policy for full details.

In-memory processing

Not used for AI training

 

 



 

 

 

---

## Scrape + Extract in One Call

Combine anti-bot bypass with AI extraction in a single request. Pick your extraction strategy.

 

 [ AI Auto Models ](#awsa-strat-auto) [ LLM + JSON Schema ](#awsa-strat-llm-structured) [ LLM Freeform Prompt ](#awsa-strat-llm-prompt) 

Pre-trained schemas for `product`, `article`, `review`, `job_posting`, and more.

     Python TypeScript HTTP / cURL  

    

 ```
# pip install scrapfly-sdk[all]

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # what object to scrape? product, review, real estate listing etc.
        extraction_model="product",
    )
)
print(api_response.scrape_result['extracted_data']['data'])
```

 ```
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // what object to scrape? product, review, real estate listing etc.
        extraction_model: "product",
    })
);
console.log(api_result.result.extracted_data);
```

 ```
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
extraction_model==product
```

 

 

 [ Python SDK docs → ](https://scrapfly.io/docs/sdk/python) [ TypeScript SDK docs → ](https://scrapfly.io/docs/sdk/typescript) [ HTTP API docs → ](https://scrapfly.io/docs) 

 

Give the LLM a schema, get that shape back. Guaranteed.

     Python TypeScript HTTP / cURL  

    

 ```
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # Prompt for specific structured data formats:
        extraction_prompt="Extract product features in JSON format",
    )
)
print(api_response.scrape_result['extracted_data'])
```

 ```
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // Prompt for specific structured data formats:
        extraction_prompt: "Extract product features in JSON format",
    })
);
console.log(api_result.result.extracted_data);
```

 ```
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
"extraction_prompt=Extract product features in JSON format"
```

 

 

 [ Python SDK docs → ](https://scrapfly.io/docs/sdk/python) [ TypeScript SDK docs → ](https://scrapfly.io/docs/sdk/typescript) [ HTTP API docs → ](https://scrapfly.io/docs) 

 

Natural-language extraction instructions. Use for any layout.

     Python TypeScript HTTP / cURL  

    

 ```
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
    ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        # Use any LLM prompt:
        extraction_prompt="What's the price of the product?",
    )
)
print(api_response.scrape_result['extracted_data'])
```

 ```
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });
let api_result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // Use any LLM prompt:
        extraction_prompt: "What's the price of the product?",
    })
);
console.log(api_result.result.extracted_data);
```

 ```
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://web-scraping.dev/product/1 \
"extraction_prompt=What's the price of the product?"
```

 

 

 [ Python SDK docs → ](https://scrapfly.io/docs/sdk/python) [ TypeScript SDK docs → ](https://scrapfly.io/docs/sdk/typescript) [ HTTP API docs → ](https://scrapfly.io/docs) 

 

 

 

---

## Docs, Tools, and Ready-Made Scrapers

Everything you need to go from a URL to a production data pipeline.

 

 ### API Reference

Every extraction parameter, every response field, with runnable examples for all three strategies.

 [ Developer Docs → ](https://scrapfly.io/docs/scrape-api/extraction) 



 

 ### Academy

Interactive courses on web scraping, anti-bot bypass, HTML parsing, and structured data extraction.

 [ Start learning → ](https://scrapfly.io/academy) 



 

 ### Open-Source Scrapers

40+ production-ready scrapers on GitHub. Each one uses the AI extraction pipeline for the parsing step.

 [ Explore repo → ](https://github.com/scrapfly/scrapfly-scrapers) 



 

 ### Developer Tools

CSS selector tester, cURL-to-Python, JA3 checker, HTTP/2 fingerprint, and more.

 [ Browse tools → ](https://scrapfly.io/web-scraping-tools) 



 

 

 

---

## Seamlessly integrate with frameworks & platforms

Plug Scrapfly into your favorite tools, or build custom workflows with our first-class SDKs.

 ### No-code automation

 [  Zapier ](https://scrapfly.io/integration/zapier) [  Make ](https://scrapfly.io/integration/make) [  n8n ](https://scrapfly.io/integration/n8n) 

 

### LLM & RAG frameworks

 [  LlamaIndex ](https://scrapfly.io/integration/llamaindex) [  LangChain ](https://scrapfly.io/integration/langchain) [  CrewAI ](https://scrapfly.io/integration/crewai) 

 

### First-class SDKs

 [  Python pip install scrapfly-sdk ](https://scrapfly.io/docs/sdk/python) [  TypeScript Node, Deno, Bun ](https://scrapfly.io/docs/sdk/typescript) [  Go go get scrapfly-sdk ](https://scrapfly.io/docs/sdk/golang) [  Rust cargo add scrapfly-sdk ](https://scrapfly.io/docs/sdk/rust) [  Scrapy Full-feature extension ](https://scrapfly.io/docs/sdk/scrapy) 

 

 

 [ See all integrations  ](https://scrapfly.io/integration) 

 

---

## Frequently Asked Questions

 

  ### What is the AI Web Scraping API?

 The AI Web Scraping API combines the Web Scraping API and the Extraction API into a single call. You send a URL and get back both the scraped page content and AI-extracted structured JSON in the same response. Anti-bot bypass, JavaScript rendering, residential proxies, and all three extraction strategies (pre-trained models, LLM prompts, CSS/XPath templates) are available on the same endpoint.

 

   ### How is this different from calling the Web Scraping API and Extraction API separately?

 The two-step approach requires two API calls, two sets of credentials to manage, and code to pipe the HTML from one response into the next request. The AI Web Scraping API does both in one call, returning `extracted_data` alongside the raw scrape result. It also inherits the full feature set of the Web Scraping API - `asp` bypass, `render_js`, sessions, webhooks - without any additional configuration.

 

   ### Which extraction strategy should I use?

 Use `extraction_model` when your page matches a standard schema - product pages, news articles, job listings, reviews. The model returns a predictable JSON shape across any domain. Use `extraction_template` when you need precise, repeatable extraction from a known site with a CSS or XPath template you define. Use `extraction_prompt` for ad-hoc questions, rapidly changing layouts, or anything that doesn't fit a fixed schema.

 

   ### How many pre-trained models are available?

 There are 19+ pre-trained models covering the most common page types: product, article, review, job posting, real estate listing, recipe, event, and more. Every model returns a documented JSON schema with a `data_quality` coverage report so you can detect low-confidence extractions before they reach your pipeline.

 

   ### Does anti-bot bypass work with AI extraction?

 Yes. Add `asp=true` to any call and the API handles Cloudflare, DataDome, Akamai, PerimeterX, Kasada, Imperva, F5, and AWS WAF - the same bypass stack as the Web Scraping API. Extraction runs on the successfully retrieved page content, so protected sites work exactly the same as unprotected ones.

 

   ### Is my data used for AI training?

 No. Scrapfly does not store, share, or use your document content for training AI models. Data is processed in memory and discarded after the response is returned. See the privacy policy for full details.

 

   ### Is web scraping legal?

 Scraping publicly accessible data is legal in most jurisdictions (Meta v. Bright Data and hiQ v. LinkedIn have established strong precedent). You are responsible for respecting robots.txt, rate limits, and target terms of service. See [our legal overview](https://scrapfly.io/is-web-scraping-legal) for details.

 

  

 

  ---

## Transparent, usage-based pricing

One plan covers the full Scrapfly platform. Pick a monthly credit budget; every API shares the same credit pool. No per-product lock-in, no surprise line items.

 

 **Free tier** - 1,000 free credits on signup. No credit card required.

 **Pay on success** - You only pay for successful requests. Failed calls are free.

 **No lock-in** - Upgrade, downgrade, or cancel anytime. No contract.

 

 

 

 [ See pricing  ](https://scrapfly.io/pricing) [ Start free ](https://scrapfly.io/register) 

 

 

### Need more control? We unbundle the stack.

 The AI Web Scraping API is the batteries-included product. Each layer is also available standalone: [Web Scraping API](https://scrapfly.io/products/web-scraping-api) for [anti-bot bypass](https://scrapfly.io/bypass) and proxies without extraction, [Extraction API](https://scrapfly.io/products/extraction-api) for structured parsing from HTML you already have, [Browser API](https://scrapfly.io/products/cloud-browser-api) for hosted Playwright / Puppeteer, [AI Browser Agent](https://scrapfly.io/products/ai-browser-agent) for autonomous agent loops, [Scrapium](https://scrapfly.io/scrapium) for stealth Chromium you drive directly, or [Curlium](https://scrapfly.io/curlium) for byte-perfect HTTP with Chrome TLS fingerprints.

 

 [Get Free API Key](https://scrapfly.io/register) - 1,000 free credits. No card.