Scrapfly Product Release Notes

2024-12-28

Dashboard

FEATURE

The Team feature is now available in the dashboard. You can invite team members to collaborate on your projects and configure their access rights.

See the documentation, or explore the feature in your dashboard.

2024-06-10

Web Scraping API

CHANGED

The Web Scraping API now exposes a debug replay URL. When you use the debug parameter, the response contains a content_replay_url in context.debug that lets you replay a scrape against the exact same content.

This URL must be authenticated with the same API key that was used to perform the scrape.

    {
      "context": {
        "debug": {
          "screenshot_url": "https://api.scrapfly.io/11cd6abe-5061-4dce-8d37-5d50e667a071/scrape/screenshot/ee8484c6-ee5f-4775-a665-0a2b57631c1c/debug",
          "response_url": "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c",
          "content_replay_url": "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c/replay"
        }
      }
    }

For more information, refer to the Web Scraping API documentation.
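As a minimal sketch of the authentication step, the helper below attaches an API key to a replay URL taken from context.debug. It assumes the key is sent as a `key` query parameter, which these notes do not state. `with_api_key` and the placeholder key are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, api_key: str) -> str:
    """Return `url` with a `key` query parameter set to `api_key`.

    Assumption: the API key is sent as the `key` query parameter.
    """
    parts = urlsplit(url)
    # Keep existing query parameters, dropping a stale `key` if present.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "key"
    ]
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# content_replay_url copied from the example response above.
replay_url = "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c/replay"
authenticated_url = with_api_key(replay_url, "YOUR_API_KEY")
print(authenticated_url)
```

The resulting URL can then be requested with any HTTP client. Use the same key that performed the original scrape.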

2024-06-04

Screenshot API

ANNOUNCEMENT

The Screenshot API has been released. It lets you take screenshots of web pages and is much simpler to use than the Web Scraping API: all presets come preconfigured (image loading, high quality, rendering wait).

The Screenshot API provides some unique features:

  • Multiple image formats (jpg, png, webp, gif)
  • Multiple capture modes (custom viewport, fullpage, vertical, elements)
  • Custom resolution
  • Caching
  • Page options (Dark mode, block banners, block ads, print format)

You can now discover the Screenshot API in the API documentation.

Extraction API

ANNOUNCEMENT

Extraction API released in BETA. This API allows you to extract structured data from web pages. It comes with 3 modes of extraction:

  • Custom rules with extraction template: define your own extraction rules, formatters, and filters
  • LLM Prompt Extraction: Extract data or ask questions about the document using our pre-trained LLM dedicated to web scraping
  • Automatic extraction: Choose an extraction model based on the type of page (product, job, article, etc.) and retrieve the structured data along with metadata to evaluate the quality of the extraction

You can now discover the Extraction API in the API documentation.

Web Scraping API

FEATURE

The Web Scraping API now integrates data extraction from the scraped pages. Refer to the documentation of the new parameters: extraction_template, extraction_prompt, and extraction_model.

FIXED

Fixed an issue where the Web Scraping API screenshot returned the image with an invalid IANA content type, image/jpg, instead of image/jpeg.

CHANGED

The proxified_response parameter, when used with extraction_template, extraction_prompt, or extraction_model, now returns the content-type of the extracted data instead of the original response content-type.

More information about the proxified_response parameter is available in the Web Scraping API documentation.

CHANGED

The format parameter now accepts options to configure the output format of the scraped page.

The markdown format now lets you:

  • Disable images with no_images and use the alt text instead
  • Disable links with no_links and use the anchor text instead

Options use the following notation: {format}:{option1},{optionN}, for example markdown:no_links,no_images

To learn more about those formats, refer to the Web Scraping API documentation.
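The notation above can be assembled programmatically. The following is a minimal sketch; `build_format` is a hypothetical helper, not part of any Scrapfly SDK.

```python
from typing import Iterable


def build_format(name: str, options: Iterable[str] = ()) -> str:
    """Build a format parameter value: {format}:{option1},{optionN}."""
    # Ignore empty or whitespace-only option names.
    opts = [opt.strip() for opt in options if opt.strip()]
    # Without options the value is just the format name, e.g. "markdown".
    return f"{name}:{','.join(opts)}" if opts else name


print(build_format("markdown", ["no_links", "no_images"]))  # markdown:no_links,no_images
print(build_format("text"))  # text
```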

2024-04-24

Python SDK

RELEASE

Python SDK 0.8.17 released. This version introduces support for:

  • Web Scraping API format parameter
  • Web Scraping API screenshot_flags parameter

You can now install the new version with pip install scrapfly-sdk==0.8.17 or upgrade with pip install --upgrade scrapfly-sdk

See the Python SDK documentation and the PyPI package.

Javascript SDK

RELEASE

Javascript SDK 0.5.0 released. This version introduces support for:

  • Web Scraping API format parameter
  • Web Scraping API screenshot_flags parameter

You can now install the new version with npm install scrapfly-sdk@0.5.0 or upgrade with npm install scrapfly-sdk@latest

See the Javascript SDK documentation and the npm package.

2024-04-22

Python SDK

ANNOUNCEMENT

Scrapfly now has an official integration with LlamaIndex to help you extract data.

See the LlamaIndex documentation.

ANNOUNCEMENT

Scrapfly now has an official integration with LangChain to help you extract data.

See the LangChain documentation.

Web Scraping API

FEATURE

A new format parameter has been added to the Web Scraping API to convert the scraped page to a specific format. With the rise of LLM usage, you can now convert pages into LLM-friendly formats and more.

You can now convert the scraped page to:

  • markdown
  • text
  • json (auto parse)
  • clean_html

If you are using proxified_response to directly retrieve the content, the announced content-type will follow the format you choose.

To learn more about those formats, refer to the Web Scraping API documentation.
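As a rough illustration, a request URL that asks for a converted page might be built as follows. The endpoint and the `key` and `url` query parameter names are assumptions not stated in these notes, as is the "true" serialization of proxified_response. `build_scrape_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: base endpoint of the Web Scraping API (not stated in these notes).
SCRAPE_ENDPOINT = "https://api.scrapfly.io/scrape"

# Formats listed in this release note.
SUPPORTED_FORMATS = ("markdown", "text", "json", "clean_html")


def build_scrape_url(api_key: str, target_url: str, fmt: str, proxified: bool = False) -> str:
    """Build a scrape request URL that asks for a converted output format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Assumption: `key` and `url` are the query parameter names for the
    # API key and the page to scrape.
    params = {"key": api_key, "url": target_url, "format": fmt}
    if proxified:
        # Assumption: the boolean is serialized as the string "true".
        params["proxified_response"] = "true"
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


request_url = build_scrape_url("YOUR_API_KEY", "https://example.com/", "markdown", proxified=True)
print(request_url)
```

With proxified_response enabled, the response body is the converted content itself, and its content-type follows the chosen format.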

FEATURE

You can now pass flags to configure screenshot options directly from the Web Scraping API.

Available flags:

  • load_images
  • dark_mode
  • block_banners
  • high_quality
  • print_media_format

To learn more about those flags, refer to the Web Scraping API documentation.
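A small sketch of how a client might validate flags before sending them. It assumes the flags are passed as a single comma-separated value, which these notes do not specify. `build_screenshot_flags` is a hypothetical helper.

```python
# Flags listed in this release note.
KNOWN_FLAGS = (
    "load_images",
    "dark_mode",
    "block_banners",
    "high_quality",
    "print_media_format",
)


def build_screenshot_flags(*flags: str) -> str:
    """Validate flag names and join them into one parameter value."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown screenshot flag(s): {', '.join(unknown)}")
    # Assumption: flags are sent as one comma-separated value.
    # dict.fromkeys removes duplicates while keeping the original order.
    return ",".join(dict.fromkeys(flags))


print(build_screenshot_flags("dark_mode", "block_banners"))  # dark_mode,block_banners
```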

Summary