# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)
##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- [API Specification]()
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- [API Specification]()
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- [API Specification]()
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- [API Specification]()
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)
##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# Retrieving Crawler Results


 Once your crawler has completed, you have multiple options for retrieving the results. Choose the method that best fits your use case: individual URLs, content queries, or complete artifacts.

> **Near-Realtime Results** Results become available in **near-realtime** as pages are crawled. You can query content immediately while the crawler is `RUNNING`. Artifacts (WARC/HAR) are only finalized when `is_finished: true`. Poll the `/crawl/{uuid}/status` endpoint to monitor progress and check `is_success` to determine the outcome.
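
For example, a minimal polling sketch with plain `requests` (it mirrors the full workflow example at the end of this page; the placeholder key and UUID are yours to fill in):

```python
import time
import requests

API_KEY = "YOUR_API_KEY"            # placeholder: your Scrapfly API key
crawler_uuid = "YOUR_CRAWLER_UUID"  # placeholder: returned when the crawl was created

while True:
    status = requests.get(
        f"https://api.scrapfly.io/crawl/{crawler_uuid}/status",
        params={"key": API_KEY},
    ).json()
    if status.get("is_finished"):
        print("succeeded" if status.get("is_success") else "failed")
        break
    time.sleep(5)  # content endpoints are already queryable while the crawler is RUNNING
```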

## Choosing the Right Method

 Select the retrieval method that best matches your use case. Consider your crawl size, processing needs, and infrastructure.

 

   

##### List URLs

**Best for:**

- URL discovery & mapping
- Failed URL analysis
- Sitemap generation
- Crawl auditing

**Scale:** Any size

##### Query Specific

**Best for:**

- Selective retrieval
- Real-time processing
- On-demand access
- API integration

**Scale:** Any size (per-page)

##### Get All Content

**Best for:**

- Small crawls
- Testing & development
- Quick prototyping
- Simple integration

**Scale:** Best for <100 pages

##### Download Artifacts (Recommended)

**Best for:**

- Large crawls (100s-1000s+)
- Long-term archival
- Offline processing
- Data pipelines

**Scale:** Unlimited

## Retrieval Methods  

 The Crawler API provides four complementary methods for accessing your crawled data.

### List Crawled URLs

 Get a comprehensive list of all URLs discovered and crawled during the job, with detailed metadata for each URL including status codes, depth, and timestamps.

- [  cURL ](#pane-results-urls-curl)
- [  Python ](#pane-results-urls-python)
- [  TypeScript ](#pane-results-urls-typescript)
- [  Go ](#pane-results-urls-go)
 
 ```bash
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/urls?key={{ YOUR_API_KEY }}"
```

 

 ```python
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="{{ YOUR_API_KEY }}")
for entry in client.get_crawl_urls("{crawler_uuid}", status="visited"):
    print(entry.url)
```

 

 ```javascript
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: '{{ YOUR_API_KEY }}' });
const urls = await client.crawlUrls('{crawler_uuid}', { status: 'visited' });
for (const entry of urls.urls) console.log(entry.url);
```

 

 ```go
client, _ := scrapfly.New("{{ YOUR_API_KEY }}")
urls, _ := client.CrawlURLs("{crawler_uuid}", &scrapfly.CrawlURLsOptions{Status: "visited"})
for _, entry := range urls.URLs {
    fmt.Println(entry.URL)
}
```

 **Filter by status:**

```bash
# Get all visited URLs
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/urls?key={{ YOUR_API_KEY }}&status=visited"

# Get all failed URLs
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/urls?key={{ YOUR_API_KEY }}&status=failed"
```

Response includes URL metadata:

```json
{
  "urls": [
    {
      "url": "https://web-scraping.dev",
      "status": "visited",
      "depth": 0,
      "status_code": 200,
      "crawled_at": "2025-01-15T10:30:20Z"
    },
    {
      "url": "https://web-scraping.dev/about",
      "status": "visited",
      "depth": 1,
      "status_code": 200,
      "crawled_at": "2025-01-15T10:30:45Z"
    }
  ],
  "total": 847,
  "page": 1,
  "per_page": 100
}
```

 **Use case:** Audit which pages were crawled, identify failed URLs, or build a sitemap.
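
For instance, building on the Python SDK call shown above, a small sketch that collects every failed URL for auditing or retry:

```python
from scrapfly import ScrapflyClient

client = ScrapflyClient(key="YOUR_API_KEY")

# Gather the URLs the crawler could not fetch (status="failed")
failed = [entry.url for entry in client.get_crawl_urls("YOUR_CRAWLER_UUID", status="failed")]
print(f"{len(failed)} URLs failed")
```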

> **HTTP Caching Optimization** For completed crawlers (`is_finished: true`), all retrieval endpoints return `Cache-Control: public, max-age=3600, immutable` headers. This enables:
> 
> - **Browser caching:** Automatically cache responses for 1 hour
> - **CDN acceleration:** Content can be cached by intermediate proxies
> - **Reduced API calls:** Repeat requests served from cache without counting against limits
> - **Immutable guarantee:** Content won't change, safe to cache aggressively
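
You can verify this behavior by inspecting the response headers yourself; a sketch reusing `API_KEY` and `crawler_uuid` from the polling example above:

```python
import requests

resp = requests.get(
    f"https://api.scrapfly.io/crawl/{crawler_uuid}/urls",
    params={"key": API_KEY},
)
# For finished crawlers this prints: public, max-age=3600, immutable
print(resp.headers.get("Cache-Control"))
```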

 

### Query Specific Page Content

 Retrieve extracted content for specific URLs from the crawl. Perfect for selective content retrieval without downloading the entire dataset.

#### Single URL Query

Retrieve content for one specific URL using the `url` query parameter:

- [  cURL ](#pane-results-contents-curl)
- [  Python ](#pane-results-contents-python)
- [  TypeScript ](#pane-results-contents-typescript)
- [  Go ](#pane-results-contents-go)
 
 ```bash
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&url=https://web-scraping.dev/products&format=markdown"
```

 

 ```python
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="{{ YOUR_API_KEY }}")
md = client.get_crawl_contents("{crawler_uuid}", format="markdown",
                               url="https://web-scraping.dev/products", plain=True)
print(md[:200])
```

 

 ```javascript
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: '{{ YOUR_API_KEY }}' });
const md = await client.crawlContents('{crawler_uuid}', {
    format: 'markdown',
    url: 'https://web-scraping.dev/products',
    plain: true,
});
console.log(typeof md === 'string' ? md.slice(0, 200) : md);
```

 

 ```go
client, _ := scrapfly.New("{{ YOUR_API_KEY }}")
md, _ := client.CrawlContentsPlain("{crawler_uuid}", "https://web-scraping.dev/products", scrapfly.CrawlerFormatMarkdown)
fmt.Println(md[:200])
```

Response contains the extracted content for the specified URL:

```markdown
# Homepage

Welcome to our site! We provide the best products and services for your needs.

## Our Services

- Web Development
- Mobile Apps
- Cloud Solutions

Contact us today to get started!
```

##### Plain Mode

Return raw content directly without a JSON wrapper by adding `plain=true`. Perfect for shell scripts and direct file piping:

```bash
# Get raw markdown content (no JSON wrapper)
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&url=https://web-scraping.dev&formats=markdown&plain=true"

# Direct output - pure markdown, no JSON parsing needed:
# Homepage
#
# Welcome to our site...

# Pipe directly to file
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&url=https://web-scraping.dev&formats=markdown&plain=true" > page.md
```

> **Plain Mode Requirements**
>
> - Must specify the `url` parameter (single URL only)
> - Must specify exactly one format in the `formats` parameter
> - Response Content-Type matches the format (e.g., `text/markdown`, `text/html`)
> - No JSON parsing needed; raw content is returned in the response body
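
The same idea in Python, as a sketch equivalent to the `curl` pipe above (again assuming `API_KEY` and `crawler_uuid`):

```python
import requests

resp = requests.get(
    f"https://api.scrapfly.io/crawl/{crawler_uuid}/contents",
    params={
        "key": API_KEY,
        "url": "https://web-scraping.dev",
        "formats": "markdown",  # plain mode requires exactly one format
        "plain": "true",
    },
)
# The Content-Type mirrors the requested format, e.g. text/markdown
with open("page.md", "w", encoding="utf-8") as f:
    f.write(resp.text)
```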

##### Multipart Response Format

Request a multipart response for single URLs by setting the `Accept` header. Same efficiency benefits as batch queries:

```bash
# Request multipart format for single URL
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&url=https://web-scraping.dev&formats=markdown,text" \
  -H "Accept: multipart/related; boundary=custom123"
```

Response returns multiple formats for the same URL as separate parts:

```http
HTTP/1.1 200 OK
Content-Type: multipart/related; boundary=custom123
Content-Location: https://web-scraping.dev

--custom123
Content-Type: text/markdown

# Homepage

Welcome to our site...
--custom123
Content-Type: text/plain

Homepage

Welcome to our site...
--custom123--
```

> **Use Cases for Single URL Multipart**
>
> - **Multiple formats efficiently:** Get markdown + text + HTML for the same URL without JSON escaping overhead
> - **Streaming processing:** Process formats as they arrive in the multipart stream
> - **Bandwidth savings:** ~50% smaller than JSON for text content due to no escaping

#### Batch URL Query

Retrieve content for multiple URLs in a single request. Maximum **100 URLs per request**.

- [  cURL ](#pane-results-batch-curl)
- [  Python ](#pane-results-batch-python)
- [  TypeScript ](#pane-results-batch-typescript)
- [  Go ](#pane-results-batch-go)
 
 ```bash
curl -X POST "https://api.scrapfly.io/crawl/{crawler_uuid}/contents/batch?key={{ YOUR_API_KEY }}&formats=markdown,text" \
  -H "Content-Type: text/plain" \
  -d "https://web-scraping.dev/products
https://web-scraping.dev/product/1
https://web-scraping.dev/product/2"
```

 

 ```python
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="{{ YOUR_API_KEY }}")
batch = client.get_crawl_contents_batch(
    "{crawler_uuid}",
    urls=[
        "https://web-scraping.dev/products",
        "https://web-scraping.dev/product/1",
        "https://web-scraping.dev/product/2",
    ],
    formats=["markdown", "text"],
)
for url, formats in batch.items():
    print(url, "->", len(formats["markdown"]), "chars")
```

 

 ```javascript
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: '{{ YOUR_API_KEY }}' });
const batch = await client.crawlContentsBatch(
    '{crawler_uuid}',
    [
        'https://web-scraping.dev/products',
        'https://web-scraping.dev/product/1',
        'https://web-scraping.dev/product/2',
    ],
    ['markdown', 'text'],
);
for (const [url, formats] of Object.entries(batch)) {
    console.log(url, '->', formats.markdown.length, 'chars');
}
```

 

 ```go
client, _ := scrapfly.New("{{ YOUR_API_KEY }}")
batch, _ := client.CrawlContentsBatch(
    "{crawler_uuid}",
    []string{
        "https://web-scraping.dev/products",
        "https://web-scraping.dev/product/1",
        "https://web-scraping.dev/product/2",
    },
    []scrapfly.CrawlerContentFormat{scrapfly.CrawlerFormatMarkdown, scrapfly.CrawlerFormatText},
)
for url, formats := range batch {
    fmt.Println(url, "->", len(formats[string(scrapfly.CrawlerFormatMarkdown)]), "chars")
}
```

**Response format:** `multipart/related` (RFC 2387) - Each URL's content is returned as a separate part in the multipart response.

```http
HTTP/1.1 200 OK
Content-Type: multipart/related; boundary=abc123
X-Scrapfly-Requested-URLs: 3
X-Scrapfly-Found-URLs: 3

--abc123
Content-Type: text/markdown
Content-Location: https://web-scraping.dev/page1

# Page 1

Content here...
--abc123
Content-Type: text/plain
Content-Location: https://web-scraping.dev/page1

Page 1 Content here...
--abc123
Content-Type: text/markdown
Content-Location: https://web-scraping.dev/page2

# Page 2

Different content...
--abc123--
```

> **Performance & Efficiency** The multipart format provides **~50% bandwidth savings** compared to JSON for text content by eliminating JSON escaping overhead. The response streams efficiently with constant memory usage, making it ideal for large content batches.
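
Because a single batch call accepts at most 100 URLs, larger lists need to be chunked client-side. A sketch built on the Python SDK call shown above (`my_urls` stands in for your own URL list):

```python
from scrapfly import ScrapflyClient

client = ScrapflyClient(key="YOUR_API_KEY")

def batch_contents(crawler_uuid, urls, formats, chunk_size=100):
    """Fetch contents for any number of URLs, 100 per request."""
    results = {}
    for i in range(0, len(urls), chunk_size):
        chunk = urls[i:i + chunk_size]
        results.update(client.get_crawl_contents_batch(crawler_uuid, urls=chunk, formats=formats))
    return results

contents = batch_contents("YOUR_CRAWLER_UUID", my_urls, ["markdown"])
```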

##### Parsing Multipart Responses

Use standard HTTP multipart libraries to parse the response:


```python
from email import message_from_bytes
from email.policy import HTTP
import requests

response = requests.post(
    f"https://api.scrapfly.io/crawl/{crawler_uuid}/contents/batch",
    params={"key": api_key, "formats": "markdown,text"},
    headers={"Content-Type": "text/plain"},
    data="https://web-scraping.dev/page1\nhttps://web-scraping.dev/page2"
)

# Rebuild a MIME message so the stdlib parser can split the multipart body
msg = message_from_bytes(
    f"Content-Type: {response.headers['Content-Type']}\r\n\r\n".encode() + response.content,
    policy=HTTP
)

# Iterate through parts
for part in msg.iter_parts():
    url = part['Content-Location']
    content_type = part.get_content_type()  # normalized MIME type, e.g. "text/markdown"
    content = part.get_content()

    print(f"{url} ({content_type}): {len(content)} bytes")

    # Store content by URL and format (save_markdown/save_text are your own helpers)
    if content_type == "text/markdown":
        save_markdown(url, content)
    elif content_type == "text/plain":
        save_text(url, content)
```

```javascript
// Node.js with node-fetch and mailparser
import fetch from 'node-fetch';
import { simpleParser } from 'mailparser';

const response = await fetch(
    `https://api.scrapfly.io/crawl/${crawlerUuid}/contents/batch?key=${apiKey}&formats=markdown,text`,
    {
        method: 'POST',
        headers: { 'Content-Type': 'text/plain' },
        body: 'https://web-scraping.dev/page1\nhttps://web-scraping.dev/page2'
    }
);

const contentType = response.headers.get('content-type');
const buffer = await response.buffer();

// Parse multipart by prepending the Content-Type header
const parsed = await simpleParser(
    `Content-Type: ${contentType}\r\n\r\n${buffer.toString('binary')}`
);

// Process each attachment (part)
for (const attachment of parsed.attachments) {
    const url = attachment.headers.get('content-location');
    const partType = attachment.contentType;
    const content = attachment.content.toString();

    console.log(`${url} (${partType}): ${content.length} bytes`);
}
```

```go
package main

import (
    "fmt"
    "io"
    "mime"
    "mime/multipart"
    "net/http"
    "strings"
)

func fetchBatchContents(crawlerUUID, apiKey string, urls []string) error {
    body := strings.Join(urls, "\n")

    resp, err := http.Post(
        "https://api.scrapfly.io/crawl/" + crawlerUUID + "/contents/batch?key=" + apiKey + "&formats=markdown,text",
        "text/plain",
        strings.NewReader(body),
    )
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Parse multipart boundary
    mediaType, params, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
    if err != nil {
        return err
    }
    if !strings.HasPrefix(mediaType, "multipart/") {
        return fmt.Errorf("unexpected content type: %s", mediaType)
    }

    // Read multipart parts
    mr := multipart.NewReader(resp.Body, params["boundary"])
    for {
        part, err := mr.NextPart()
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }

        url := part.Header.Get("Content-Location")
        contentType := part.Header.Get("Content-Type")
        content, _ := io.ReadAll(part)

        // Process content
        fmt.Println(url, contentType, len(content), "bytes")
    }

    return nil
}
```

#### Batch Query Parameters

 | Parameter | Type | Description |
|---|---|---|
| `key` | Query Param | Your API key (required) |
| `formats` | Query Param | Comma-separated list of formats for batch query (e.g., `markdown,text,html`) |
| Request Body | Plain Text | URLs separated by newlines (for batch query, max 100 URLs) |

##### Response Headers

 | Header | Description |
|---|---|
| `Content-Type` | `multipart/related; boundary=<random>` - Standard HTTP multipart format (RFC 2387) |
| `X-Scrapfly-Requested-URLs` | Number of URLs in your request |
| `X-Scrapfly-Found-URLs` | Number of URLs found in crawl results (may be less if some URLs were not crawled) |
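
Comparing the two headers is a quick way to detect URLs that were never crawled; a sketch with `requests` (assumes `API_KEY`, `crawler_uuid`, and a `urls` list):

```python
import requests

resp = requests.post(
    f"https://api.scrapfly.io/crawl/{crawler_uuid}/contents/batch",
    params={"key": API_KEY, "formats": "markdown"},
    headers={"Content-Type": "text/plain"},
    data="\n".join(urls),
)
requested = int(resp.headers["X-Scrapfly-Requested-URLs"])
found = int(resp.headers["X-Scrapfly-Found-URLs"])
if found < requested:
    print(f"{requested - found} requested URLs were not in the crawl results")
```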

##### Multipart Part Headers

Each part in the multipart response contains:

 | Header | Description |
|---|---|
| `Content-Type` | MIME type of the content (e.g., `text/markdown`, `text/plain`, `text/html`) |
| `Content-Location` | The URL this content belongs to |

 **Available formats:**

- `html` - Raw HTML content
- `clean_html` - HTML with boilerplate removed
- `markdown` - Markdown format (ideal for LLM training data)
- `text` - Plain text only
- `json` - Structured JSON representation
- `extracted_data` - AI-extracted structured data
- `page_metadata` - Page metadata (title, description, etc.)
 
 **Use cases:**

- **Single query:** Fetch content for individual pages via API for real-time processing
- **Batch query:** Efficiently retrieve content for multiple specific URLs (e.g., product pages, article URLs)
 
 

### Get All Crawled Contents

 Retrieve all extracted contents for the crawl. Returns a paginated JSON object mapping each URL to its content in your chosen format.

```bash
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&formats=markdown"
```

Response contains contents mapped by URL with pagination links:

```json
{
  "links": {
    "crawled_urls": "https://api.scrapfly.io/crawl/{crawler_uuid}/urls",
    "next": "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?limit=10&offset=10",
    "prev": null
  },
  "contents": {
    "https://web-scraping.dev": {"markdown": "# Homepage\n\nWelcome to our site..."},
    "https://web-scraping.dev/about": {"markdown": "# About Us\n\nWe are a company..."},
    "https://web-scraping.dev/contact": {"markdown": "# Contact\n\nReach us at..."}
  }
}
```

#### Pagination

 The endpoint returns results in pages. Use `limit` and `offset` parameters to navigate through results. The response includes `links.next` and `links.prev` URLs for easy navigation.

 | Parameter | Default | Max | Description |
|---|---|---|---|
| `limit` | 10 | 50 | Number of URLs to return per page |
| `offset` | 0 | - | Number of URLs to skip |

**Paginating through all results:**

```bash
# First page (default: 10 items)
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&formats=markdown"

# Second page
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&formats=markdown&limit=10&offset=10"

# Third page
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/contents?key={{ YOUR_API_KEY }}&formats=markdown&limit=10&offset=20"
```

#### Response Links

 | Field | Description |
|---|---|
| `links.next` | URL for the next page of results, or `null` if on the last page |
| `links.prev` | URL for the previous page of results, or `null` if on the first page |
| `links.crawled_urls` | URL to the `/urls` endpoint for this crawler |
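
Putting the links together, a sketch that collects every page by following `links.next` until it is `null` (re-sending `key` and `formats` on each request is an assumption; in the sample response the next link itself only carries `limit` and `offset`):

```python
import requests

url = f"https://api.scrapfly.io/crawl/{crawler_uuid}/contents"
params = {"key": API_KEY, "formats": "markdown"}
contents = {}

while url:
    data = requests.get(url, params=params).json()
    contents.update(data["contents"])
    url = data["links"]["next"]  # null (None) on the last page

print(f"collected {len(contents)} pages")
```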

#### Available Formats

- `html` - Raw HTML content
- `clean_html` - HTML with boilerplate removed
- `markdown` - Markdown format (ideal for LLM training data)
- `text` - Plain text only
- `json` - Structured JSON representation
- `extracted_data` - AI-extracted structured data
- `page_metadata` - Page metadata (title, description, etc.)
 
> **Large Crawls** For crawls with hundreds or thousands of pages, consider using **artifacts** (WARC/HAR) for more efficient bulk retrieval, or query specific URLs with the batch endpoint instead of paginating through all results.

 **Use case:** Small to medium crawls where you need all content via API, or testing/development.

 

### Download Artifacts (Recommended for Large Crawls)

 Download industry-standard archive formats containing all crawled data. This is the **most efficient method** for large crawls, avoiding multiple API calls and handling huge datasets with ease.

#### Why Use Artifacts?

- **Massive Scale** - Handle crawls with thousands or millions of pages efficiently
- **Single Download** - Get the entire crawl in one compressed file, avoiding pagination and rate limits
- **Offline Processing** - Query and analyze data locally without additional API calls
- **Cost Effective** - One-time download instead of per-page API requests
- **Flexible Storage** - Store artifacts in S3, object storage, or local disk for long-term archival
- **Industry Standard** - WARC and HAR formats are universally supported by analysis tools
 
#### Available Artifact Types

##### WARC (Web ARChive Format)

 Industry-standard format for web archiving. Contains complete HTTP request/response pairs, headers, and extracted content. Compressed with gzip for efficient storage.

- [  cURL ](#pane-results-warc-curl)
- [  Python ](#pane-results-warc-python)
- [  TypeScript ](#pane-results-warc-typescript)
- [  Go ](#pane-results-warc-go)
 
 ```bash
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/artifact?key={{ YOUR_API_KEY }}&type=warc" -o crawl.warc.gz
```

 

 ```python
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="{{ YOUR_API_KEY }}")
warc = client.get_crawl_artifact("{crawler_uuid}", artifact_type="warc")
warc.save("crawl.warc.gz")
for record in warc.iter_responses():
    print(record.status_code, record.url, len(record.content), "bytes")
```

 

 ```javascript
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: '{{ YOUR_API_KEY }}' });
const warc = await client.crawlArtifact('{crawler_uuid}', 'warc');
await warc.save('crawl.warc.gz');
console.log(`WARC: ${warc.data.byteLength} bytes`);
// TS SDK does not bundle a WARC parser; use `warcio` from npm if you need parsing.
```

 

 ```go
client, _ := scrapfly.New("{{ YOUR_API_KEY }}")
warc, _ := client.CrawlArtifact("{crawler_uuid}", scrapfly.ArtifactTypeWARC)
_ = warc.Save("crawl.warc.gz")

// Go SDK ships first-party WarcParser
parser, _ := scrapfly.ParseWARC(warc.Data)
parser.IterResponses(func(r *scrapfly.WarcRecord) bool {
    fmt.Printf("%d %s (%d bytes)\n", r.StatusCode, r.URL, len(r.Content))
    return true
})
```

 **Use case:** Long-term archival, offline analysis with standard tools, research datasets.

> **Learn More About WARC Format** See our [complete WARC format guide](https://scrapfly.io/docs/crawler-api/warc-format) for custom headers, reading libraries in multiple languages, and code examples.
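
Outside the SDKs, the downloaded archive can be read with any standard WARC library; for example, a sketch using the third-party `warcio` package (`pip install warcio`):

```python
from warcio.archiveiterator import ArchiveIterator

with open("crawl.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != "response":
            continue  # skip request/metadata records
        url = record.rec_headers.get_header("WARC-Target-URI")
        status = record.http_headers.get_statuscode()
        body = record.content_stream().read()
        print(status, url, len(body), "bytes")
```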

##### HAR (HTTP Archive Format)

 JSON-based format with detailed HTTP transaction data. Ideal for performance analysis, debugging, and browser replay tools.

- [  cURL ](#pane-results-har-curl)
- [  Python ](#pane-results-har-python)
- [  TypeScript ](#pane-results-har-typescript)
- [  Go ](#pane-results-har-go)
 
 ```bash
curl "https://api.scrapfly.io/crawl/{crawler_uuid}/artifact?key={{ YOUR_API_KEY }}&type=har" -o crawl.har
```

 

 ```python
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="{{ YOUR_API_KEY }}")
har = client.get_crawl_artifact("{crawler_uuid}", artifact_type="har")
for entry in har.parser.filter_by_status(200):
    print(entry.method, entry.url, entry.content_type)
```

 

 ```javascript
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: '{{ YOUR_API_KEY }}' });
const har = await client.crawlArtifact('{crawler_uuid}', 'har');
await har.save('crawl.har');
console.log(`HAR: ${har.data.byteLength} bytes`);
// TS SDK does not bundle a HAR parser; use `har-validator` from npm if you need parsing.
```

 

 ```go
client, _ := scrapfly.New("{{ YOUR_API_KEY }}")
har, _ := client.CrawlArtifact("{crawler_uuid}", scrapfly.ArtifactTypeHAR)

// Go SDK ships first-party HarArchive with high-level filters
archive, _ := scrapfly.ParseHAR(har.Data)
for _, entry := range archive.FilterByStatus(200) {
    fmt.Println(entry.Method(), entry.URL(), entry.ContentType())
}
```

 **Use case:** Performance analysis, browser DevTools import, debugging HTTP transactions.
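
Since HAR is plain JSON (HAR 1.2 keeps its entries under `log.entries`), the archive can also be inspected with nothing but the standard library:

```python
import json

with open("crawl.har", encoding="utf-8") as f:
    har = json.load(f)

for entry in har["log"]["entries"]:
    req, res = entry["request"], entry["response"]
    print(req["method"], req["url"], "->", res["status"])
```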

 

  

## Complete Retrieval Workflow

 Here's a complete example showing how to wait for completion and retrieve results:


```bash
#!/bin/bash

API_KEY=""

# Step 1: Create crawler
RESPONSE=$(curl -X POST "https://api.scrapfly.io/crawl?key=$API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://web-scraping.dev/products",
    "page_limit": 25
  }')

# Extract crawler UUID
UUID=$(echo "$RESPONSE" | jq -r '.crawler_uuid')
echo "Crawler UUID: $UUID"

# Step 2: Poll status until complete
while true; do
  RESPONSE=$(curl -s "https://api.scrapfly.io/crawl/$UUID/status?key=$API_KEY")
  IS_FINISHED=$(echo "$RESPONSE" | jq -r '.is_finished')
  IS_SUCCESS=$(echo "$RESPONSE" | jq -r '.is_success')

  echo "Status check: is_finished=$IS_FINISHED, is_success=$IS_SUCCESS"

  if [ "$IS_FINISHED" = "true" ]; then
    if [ "$IS_SUCCESS" = "true" ]; then
      echo "Crawler completed successfully!"
      break
    else
      echo "Crawler failed!"
      exit 1
    fi
  fi

  sleep 5
done

# Step 3: Download results (quote URLs so "&" is not treated as a shell operator)
echo "Downloading WARC artifact..."
curl "https://api.scrapfly.io/crawl/$UUID/artifact?key=$API_KEY&type=warc" -o crawl.warc.gz

echo "Getting markdown content..."
curl "https://api.scrapfly.io/crawl/$UUID/contents?key=$API_KEY&formats=markdown" > content.json

echo "Done!"
```

```python
import requests
import time

API_KEY = ""
BASE_URL = "https://api.scrapfly.io"

# Step 1: Create crawler
response = requests.post(
    f"{BASE_URL}/crawl",
    params={"key": API_KEY},
    json={
        "url": "https://web-scraping.dev/products",
        "page_limit": 25
    }
)
crawler_data = response.json()
uuid = crawler_data["crawler_uuid"]
print(f"Crawler UUID: {uuid}")

# Step 2: Poll status until complete
while True:
    response = requests.get(
        f"{BASE_URL}/crawl/{uuid}/status",
        params={"key": API_KEY}
    )
    status = response.json()

    is_finished = status.get("is_finished", False)
    is_success = status.get("is_success", False)

    print(f"Status check: is_finished={is_finished}, is_success={is_success}")

    if is_finished:
        if is_success:
            print("Crawler completed successfully!")
            break
        else:
            print("Crawler failed!")
            exit(1)

    time.sleep(5)

# Step 3: Download results
print("Downloading WARC artifact...")
warc_response = requests.get(
    f"{BASE_URL}/crawl/{uuid}/artifact",
    params={"key": API_KEY, "type": "warc"}
)
with open("crawl.warc.gz", "wb") as f:
    f.write(warc_response.content)

print("Getting markdown content...")
content_response = requests.get(
    f"{BASE_URL}/crawl/{uuid}/contents",
    params={"key": API_KEY, "format": "markdown"}
)
with open("content.json", "w") as f:
    f.write(content_response.text)

print("Done!")
```

```javascript
const API_KEY = "";
const BASE_URL = "https://api.scrapfly.io";

async function runCrawler() {
    // Step 1: Create crawler
    const createResponse = await fetch(`${BASE_URL}/crawl?key=${API_KEY}`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json"
        },
        body: JSON.stringify({
            url: "https://web-scraping.dev/products",
            page_limit: 25
        })
    });

    const crawlerData = await createResponse.json();
    const uuid = crawlerData.crawler_uuid;
    console.log(`Crawler UUID: ${uuid}`);

    // Step 2: Poll status until complete
    while (true) {
        const statusResponse = await fetch(
            `${BASE_URL}/crawl/${uuid}/status?key=${API_KEY}`
        );
        const status = await statusResponse.json();

        const isFinished = status.is_finished || false;
        const isSuccess = status.is_success || false;

        console.log(`Status check: is_finished=${isFinished}, is_success=${isSuccess}`);

        if (isFinished) {
            if (isSuccess) {
                console.log("Crawler completed successfully!");
                break;
            } else {
                console.log("Crawler failed!");
                process.exit(1);
            }
        }

        await new Promise(resolve => setTimeout(resolve, 5000));
    }

    // Step 3: Download results
    console.log("Downloading WARC artifact...");
    const warcResponse = await fetch(
        `${BASE_URL}/crawl/${uuid}/artifact?key=${API_KEY}&type=warc`
    );
    const warcBlob = await warcResponse.blob();
    // In Node.js, use fs.writeFileSync to save
    // In browser, use URL.createObjectURL to download

    console.log("Getting markdown content...");
    const contentResponse = await fetch(
        `${BASE_URL}/crawl/${uuid}/contents?key=${API_KEY}&formats=markdown`
    );
    const content = await contentResponse.json();
    // Save content.json to file

    console.log("Done!");
}

runCrawler().catch(console.error);
```

## Next Steps

- Learn about [webhook integration](https://scrapfly.io/docs/crawler-api/webhook) for real-time notifications
- Understand [billing and costs](https://scrapfly.io/docs/crawler-api/billing)
- Review the [full API specification](https://scrapfly.io/docs/crawler-api/getting-started#spec)