AI Powered Data Extraction


Scrapfly's AI-powered automatic parser converts unstructured content into predefined, structured data models. It accepts any text-based input format, such as HTML, plain text, Markdown, or JSON.

A minimal API call is a POST request with the `key` and `extraction_model` query parameters. The document to extract data from is sent as the request body:

https://api.scrapfly.io/extraction?key=<API KEY>&extraction_model=<MODEL NAME>

See the available models in the Models section below.

Usage

We will use the https://web-scraping.dev/product/1 page as an example. Save its content as product.html in the directory where you will run the curl command below (for example with `curl -o product.html https://web-scraping.dev/product/1`).
We will use the product model to extract the product information.
Call the extraction API:
```
curl -X POST \
-H "content-type: text/html" \
"https://api.scrapfly.io/extraction?key=<API KEY>&url=https%3A%2F%2Fweb-scraping.dev&extraction_model=product" \
--data-binary @product.html
```
If you have jq installed, you can pretty-print the output JSON by piping the result into it: append `| jq` after `--data-binary @product.html`.
Command Explanation
- curl -X POST:
  - curl is a command-line tool for transferring data with URLs.
  - -X POST specifies the HTTP method to use, POST in this case.
- -H "content-type: text/html":
  - -H adds an HTTP header to the request.
  - "content-type: text/html" sets the Content-Type header to text/html, indicating that the data being sent is HTML.
- URL:
  - The URL of the API endpoint, including the query parameters for authentication, the source page URL, and the extraction model.
  - key: Your API key, used for authentication.
  - url: The URL of the page the HTML was taken from, URL-encoded.
  - extraction_model: The AI model to use for extraction.
- --data-binary @product.html:
  - --data-binary sends the data as the POST request body exactly as given. Unlike -d, it does not strip newlines and carriage returns from the file content.
  - @product.html indicates that the data should be read from a file named product.html.
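For reference, the same request can be sketched in Ruby with the standard `net/http` library. This is an illustrative equivalent of the curl command above, not an official client. The helper names `build_extraction_request` and `send_extraction_request` are hypothetical, and the HTML is assumed to be read from the same `product.html` file.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: mirrors the curl call. The query string carries the key,
# the source page URL and the model; the HTML goes unmodified into the POST body.
def build_extraction_request(api_key:, model:, source_url:, html:)
  uri = URI("https://api.scrapfly.io/extraction")
  uri.query = URI.encode_www_form(key: api_key, url: source_url, extraction_model: model)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "text/html"
  request.body = html
  request
end

# Hypothetical helper: sends the prepared request over HTTPS and returns the
# raw Net::HTTPResponse (its body is the JSON document shown below).
def send_extraction_request(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
end

# Usage (performs a real API call, so it is left commented out):
#   request = build_extraction_request(api_key: "__API_KEY__", model: "product",
#                                      source_url: "https://web-scraping.dev/product/1",
#                                      html: File.read("product.html"))
#   puts send_extraction_request(request).body
```

Building the request separately from sending it keeps the URL encoding and headers easy to inspect without touching the network.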

Result:

{
    "config" : {
        ...
    },
    "context": {
        ...
    },
    "result": {
        ...
        "content": ".... html content ... too long for the example",
        "content_encoding": "utf-8",
        "content_format": "raw",
        "content_type": "text/html; charset=utf-8",
        "duration": 3.7,
        "error": null,
        "extracted_data": {
            "content_type": "application/json",
            "data": {
                "additional_property": [],
                "aggregate_rating": null,
                "brand": "ChocoDelight",
                "breadcrumbs": null,
                "canonical_url": null,
                "color": null,
                "description": "Indulge your sweet tooth with our Box of Chocolate Candy. Each box contains an assortment of rich, flavorful chocolates with a smooth, creamy filling. Choose from a variety of flavors including zesty orange and sweet cherry. Whether you're looking for the perfect gift or just want to treat yourself, our Box of Chocolate Candy is sure to satisfy.",
                "identifiers": {
                    "ean13": null,
                    "gtin14": null,
                    "gtin8": null,
                    "isbn10": null,
                    "isbn13": null,
                    "ismn": null,
                    "issn": null,
                    "mpn": null,
                    "sku": null,
                    "upc": null
                },
                "images": [],
                "main_category": "Products",
                "main_image": null,
                "name": "Box of Chocolate Candy",
                "offers": [
                    {
                        "availability": "available",
                        "currency": "$",
                        "price": 9.99,
                        "regular_price": 12.99
                    }
                ],
                "related_products": [
                    {
                        "availability": "available",
                        "description": null,
                        "images": [
                            {
                                "url": "https://web-scraping.dev/assets/products/red-potion.webp"
                            }
                        ],
                        "link": "https://web-scraping.dev/product/28",
                        "name": "Red Energy Potion",
                        "price": {
                            "amount": 4.99,
                            "currency": null,
                            "raw": "4.99"
                        }
                    },
                    {
                        "availability": "available",
                        "description": null,
                        "images": [
                            {
                                "url": "https://web-scraping.dev/assets/products/darkred-potion.webp"
                            }
                        ],
                        "link": "https://web-scraping.dev/product/2",
                        "name": "Dark Red Energy Potion",
                        "price": {
                            "amount": 4.99,
                            "currency": null,
                            "raw": "4.99"
                        }
                    },
                    {
                        "availability": "available",
                        "description": null,
                        "images": [
                            {
                                "url": "https://web-scraping.dev/assets/products/women-sandals-beige-1.webp"
                            }
                        ],
                        "link": "https://web-scraping.dev/product/8",
                        "name": "Women's High Heel Sandals",
                        "price": {
                            "amount": 59.99,
                            "currency": null,
                            "raw": "59.99"
                        }
                    },
                    {
                        "availability": "available",
                        "description": null,
                        "images": [
                            {
                                "url": "https://web-scraping.dev/assets/products/red-potion.webp"
                            }
                        ],
                        "link": "https://web-scraping.dev/product/4",
                        "name": "Red Energy Potion",
                        "price": {
                            "amount": 4.99,
                            "currency": null,
                            "raw": "4.99"
                        }
                    }
                ],
                "secondary_category": null,
                "size": null,
                "specifications": [
                    {
                        "name": "material",
                        "value": "Premium quality chocolate"
                    },
                    {
                        "name": "flavors",
                        "value": "Available in Orange and Cherry flavors"
                    },
                    {
                        "name": "sizes",
                        "value": "Available in small, medium, and large boxes"
                    },
                    {
                        "name": "brand",
                        "value": "ChocoDelight"
                    },
                    {
                        "name": "care instructions",
                        "value": "Store in a cool, dry place"
                    },
                    {
                        "name": "purpose",
                        "value": "Ideal for gifting or self-indulgence"
                    }
                ],
                "style": null,
                "url": "https://web-scraping.dev/",
                "variants": [
                    {
                        "color": "orange",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=orange-small"
                    },
                    {
                        "color": "orange",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=orange-medium"
                    },
                    {
                        "color": "orange",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=orange-large"
                    },
                    {
                        "color": "cherry",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=cherry-small"
                    },
                    {
                        "color": "cherry",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=cherry-medium"
                    },
                    {
                        "color": "cherry",
                        "offers": [
                            {
                                "availability": "available",
                                "price": {
                                    "amount": null,
                                    "currency": null,
                                    "raw": null
                                }
                            }
                        ],
                        "sku": "https://web-scraping.dev/product/1?variant=cherry-large"
                    }
                ]
            },
            "data_quality": {
                "errors": [
                    "identifiers.sku: Input should be a valid string"
                ],
                "fulfilled": false,
                "fulfillment_percent": 45
            }
        },
        "format": "text",
        "reason": "OK",
        "request_headers": [],
        "response_headers": {
            ...
        },
        "status": "DONE",
        "status_code": 200,
        "success": true,
        "url": "https://web-scraping.dev/product/1"
    }
}
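Once the response is parsed, the structured fields are under `result.extracted_data.data`. The following is a minimal Ruby sketch that assumes the response body is available as a string. The sample is a trimmed copy of the result above, and the method name `product_summary` is hypothetical.

```ruby
require "json"

# Trimmed copy of the example response above (most fields omitted).
SAMPLE_RESPONSE = <<~JSON
  {
    "result": {
      "extracted_data": {
        "content_type": "application/json",
        "data": {
          "name": "Box of Chocolate Candy",
          "brand": "ChocoDelight",
          "offers": [{"availability": "available", "currency": "$", "price": 9.99, "regular_price": 12.99}],
          "variants": [
            {"color": "orange", "sku": "https://web-scraping.dev/product/1?variant=orange-small"},
            {"color": "cherry", "sku": "https://web-scraping.dev/product/1?variant=cherry-small"}
          ]
        }
      }
    }
  }
JSON

# Hypothetical helper: pulls a few product fields out of the extraction result.
# `dig` returns nil instead of raising when a level is missing, and every list
# falls back to [] because any field in the model may be null.
def product_summary(body)
  data = JSON.parse(body).dig("result", "extracted_data", "data") || {}
  offer = (data["offers"] || []).first || {}
  {
    name: data["name"],
    brand: data["brand"],
    price: offer["price"],
    colors: (data["variants"] || []).map { |v| v["color"] }.compact.uniq
  }
end

p product_summary(SAMPLE_RESPONSE)
```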

Models

These models have been tailored based on customer feedback and usage. If you need a general-purpose model that is not listed, contact us through the support link below. You can also request that important fields missing from an existing model be added.

Contact us

For automatic structured data extraction, choose a model from the list below. The AI will try to fill it with data from the web page you provide.

| Name | API name (`&extraction_model={model-name}`) |
|---|---|
| Article | `article` |
| Event | `event` |
| Food Recipe | `food_recipe` |
| Hotel | `hotel` |
| Hotel Listing | `hotel_listing` |
| Job Listing | `job_listing` |
| Job Posting | `job_posting` |
| Organization | `organization` |
| Product | `product` |
| Product Listing | `product_listing` |
| Real Estate Property | `real_estate_property` |
| Real Estate Property Listing | `real_estate_property_listing` |
| Review List | `review_list` |
| Search Engine Results | `search_engine_results` |
| Social Media Post | `social_media_post` |
| Software | `software` |
| Stock | `stock` |
| Vehicle Ad | `vehicle_ad` |
| Vehicle Ad Listing | `vehicle_ad_listing` |
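When you call the API from code, checking the model name against the table above before sending a request makes a typo fail locally. The following is a minimal Ruby sketch. The constant `EXTRACTION_MODELS` and the method `validate_extraction_model!` are hypothetical, and the list has to be kept in sync with the table by hand.

```ruby
# Model API names copied from the table above.
EXTRACTION_MODELS = %w[
  article event food_recipe hotel hotel_listing job_listing job_posting
  organization product product_listing real_estate_property
  real_estate_property_listing review_list search_engine_results
  social_media_post software stock vehicle_ad vehicle_ad_listing
].freeze

# Hypothetical helper: returns the name unchanged if it is a known model,
# otherwise raises ArgumentError. Names are case-sensitive.
def validate_extraction_model!(name)
  return name if EXTRACTION_MODELS.include?(name)

  raise ArgumentError, "unknown extraction_model #{name.inspect}"
end

validate_extraction_model!("product") # returns "product"
```

A hard-coded list can go stale when new models are added, so treat it as a convenience check rather than the source of truth.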

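Web Scraping API

The extraction_model parameter can also be passed to the Web Scraping API (/scrape endpoint), which scrapes the target page and extracts structured data from it in the same request:

```ruby
require "uri"
require "net/http"

# Scrape https://web-scraping.dev/product/1 with anti-scraping protection (asp),
# JavaScript rendering and auto-scrolling enabled, then apply the `product` model.
# Replace __API_KEY__ with your Scrapfly API key.
url = URI("https://api.scrapfly.io/scrape?tags=player%2Cproject%3Adefault&extraction_model=product&asp=true&render_js=true&auto_scroll=true&key=__API_KEY__&url=https%3A%2F%2Fweb-scraping.dev%2Fproduct%2F1")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
```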
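The query string above is percent-encoded by hand. As an alternative sketch, the same Web Scraping API URL can be assembled from a Ruby hash with `URI.encode_www_form`. The parameter values are copied from the example above (the `tags` parameter is omitted), and the method name `scrape_url` is hypothetical.

```ruby
require "uri"

# Hypothetical helper: builds the /scrape URL from keyword options.
# URI.encode_www_form percent-encodes each value (e.g. the target URL)
# and converts non-string values such as `true` to strings.
def scrape_url(api_key, target_url, **options)
  params = { key: api_key, url: target_url }.merge(options)
  uri = URI("https://api.scrapfly.io/scrape")
  uri.query = URI.encode_www_form(params)
  uri
end

url = scrape_url("__API_KEY__", "https://web-scraping.dev/product/1",
                 extraction_model: "product", asp: true, render_js: true, auto_scroll: true)
puts url
```

The resulting URI can be passed to `Net::HTTP::Get.new` exactly like the hand-written one.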

API Response

The extraction results are returned in the result.extracted_data object of the API response:

- result.extracted_data.content_type: Always application/json.
- result.extracted_data.data: The structured extracted data.
- result.extracted_data.data_quality:
  - errors: The list of data violations that do not follow the validation schema.
  - fulfilled: A boolean indicating whether the schema is fully satisfied.
  - fulfillment_percent: The percentage of fulfillment, where 0 means empty and 100 means perfect.
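Because a result can be only partially fulfilled, a client will usually check `data_quality` before trusting `data`. The following is a minimal Ruby sketch that assumes the JSON body has already been fetched. The method name `check_extraction_quality` is hypothetical, and the 80% default threshold is an arbitrary choice for illustration.

```ruby
require "json"

# Hypothetical helper: returns the extracted data together with its quality report.
# Partially fulfilled results are still returned; the caller decides what is acceptable.
def check_extraction_quality(body, min_percent: 80)
  extracted = JSON.parse(body).dig("result", "extracted_data") || {}
  quality = extracted["data_quality"] || {}
  percent = quality["fulfillment_percent"].to_i # nil becomes 0
  {
    data: extracted["data"],
    fulfilled: quality["fulfilled"] == true,
    acceptable: percent >= min_percent,
    errors: quality["errors"] || []
  }
end

# Trimmed copy of the example response (same data_quality values).
body = <<~JSON
  {"result": {"extracted_data": {
    "content_type": "application/json",
    "data": {"name": "Box of Chocolate Candy"},
    "data_quality": {"errors": ["identifiers.sku: Input should be a valid string"],
                     "fulfilled": false, "fulfillment_percent": 45}}}}
JSON

report = check_extraction_quality(body)
report[:errors].each { |e| warn "schema violation: #{e}" } unless report[:fulfilled]
```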

When combined with the cache feature, the raw data from the website is cached, so you can re-extract data from it in multiple extraction passes, much faster and at a lower cost. This applies to the following extraction types:

- Extraction Template
- Extraction Model
- LLM Extraction

Learn more about the Cache feature.

API Specification

Error Handling

All related errors are listed below. See the Errors section for a full description and an example error response for each.

- ERR::EXTRACTION::CONFIG_ERROR - The parameters sent to the API are not valid.
- ERR::EXTRACTION::CONTENT_TYPE_NOT_SUPPORTED - The content type of the response is not supported for extraction.
- ERR::EXTRACTION::DATA_ERROR - The extracted data is invalid or has an issue.
- ERR::EXTRACTION::INVALID_RULE - The extraction rule is invalid.
- ERR::EXTRACTION::INVALID_TEMPLATE - The template used for extraction is invalid.
- ERR::EXTRACTION::NO_CONTENT - The target response is empty.
- ERR::EXTRACTION::OPERATION_TIMEOUT - The extraction operation timed out.
- ERR::EXTRACTION::OUT_OF_CAPACITY - Unable to extract more data because the backend is out of capacity. Retry later.
- ERR::EXTRACTION::TEMPLATE_NOT_FOUND - The provided template does not exist.
- ERR::EXTRACTION::TIMEOUT - The extraction took too long (maximum 25s) or did not have enough time to complete.
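Of these errors, only ERR::EXTRACTION::OUT_OF_CAPACITY explicitly asks you to retry later. The following is a minimal Ruby sketch of a retry loop with exponential backoff for that case. It assumes you have already read the error code string out of the response (the exact error response format is described in the Errors section). The method name, attempt count and delays are hypothetical choices.

```ruby
# Error codes worth retrying. Only OUT_OF_CAPACITY is documented as "retry later";
# extend this set at your own discretion.
RETRYABLE_EXTRACTION_ERRORS = ["ERR::EXTRACTION::OUT_OF_CAPACITY"].freeze

# Hypothetical helper: calls the block (which must return [result, error_code])
# until it succeeds, fails with a non-retryable code, or runs out of attempts.
# Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
def with_extraction_retries(max_attempts: 4, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  loop do
    attempt += 1
    result, error_code = yield(attempt)
    return [result, nil] if error_code.nil?
    return [result, error_code] unless RETRYABLE_EXTRACTION_ERRORS.include?(error_code)
    return [result, error_code] if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
  end
end
```

Injecting `sleeper` keeps the loop testable without real waiting. In production code the default `sleep` is used.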

Pricing

Extraction with a model is billed 5 API credits.

For more information about pricing, see the dedicated pricing section.