Scrapfly Product Release Notes
2024-12-28
Dashboard
The Team feature is now available in the dashboard. You can invite your team members to collaborate on your projects and configure their access rights.
See the documentation for details, or try the feature directly in your dashboard.
2024-06-10
Web Scraping API
The Web Scraping API now exposes a debug replay URL. When you use the debug parameter, the response contains a content_replay_url in context.debug that lets you replay a scrape against the exact same content.
This URL must be authenticated with the same API key used to perform the scrape.
context {
  debug: {
    screenshot_url: "https://api.scrapfly.io/11cd6abe-5061-4dce-8d37-5d50e667a071/scrape/screenshot/ee8484c6-ee5f-4775-a665-0a2b57631c1c/debug",
    response_url: "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c",
    content_replay_url: "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c/replay",
  }
}
For more information, refer to the Web Scraping API documentation
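As a rough sketch of how a client might pick up the replay URL, the snippet below reads content_replay_url from context.debug in an already-parsed response and attaches the API key. The helper name replay_url_with_key and the key query parameter are assumptions made for illustration and are not stated in these notes; check the Web Scraping API documentation for the exact authentication scheme.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replay_url_with_key(response: dict, api_key: str) -> Optional[str]:
    """Return the debug replay URL with the API key attached, or None.

    Assumption: the API key is passed as a `key` query parameter.
    """
    debug = (response.get("context") or {}).get("debug") or {}
    replay_url = debug.get("content_replay_url")
    if not replay_url:
        # No replay URL: the scrape was probably not run with the debug parameter.
        return None
    parts = urlsplit(replay_url)
    query = parse_qsl(parts.query)
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The function returns None rather than raising, so callers can treat a missing replay URL as "debug was not enabled" without extra error handling.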
2024-06-04
Screenshot API
Screenshot API released. This API lets you take screenshots of web pages. It is much simpler than the Web Scraping API, and everything comes preconfigured (image loading, high quality, rendering wait).
The Screenshot API provides some unique features:
- Multiple image formats (jpg, png, webp, gif)
- Multiple capture modes (custom viewport, fullpage, vertical, elements)
- Custom resolution
- Caching
- Page options (Dark mode, block banners, block ads, print format)
You can now discover the Screenshot API in the API documentation.
Extraction API
Extraction API released in BETA. This API allows you to extract structured data from web pages. It comes with 3 modes of extraction:
- Custom rules with extraction template: define your own extraction rules, formatters, and filters
- LLM prompt extraction: extract data or ask questions about the document using our pre-trained LLM model dedicated to web scraping
- Automatic extraction: choose an extraction model based on the type of page (product, job, article, etc.) and retrieve the structured data along with metadata to evaluate the quality of the extraction
You can now discover the Extraction API in the API documentation.
Web Scraping API
The Web Scraping API now integrates data extraction from scraped pages. Refer to the documentation of these new parameters:
- extraction_template: Use your own extraction rules
- extraction_prompt: Use an LLM prompt to retrieve data
- extraction_model: Use automatic extraction mode
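To illustrate how one of these parameters might be attached to a scrape request, here is a minimal sketch that only builds the request URL and sends nothing over the network. The endpoint https://api.scrapfly.io/scrape, the key and url parameter names, the helper build_scrape_url, and the example value "product" are assumptions for illustration; only the three extraction parameter names come from these notes.

```python
from urllib.parse import urlencode

# Assumption: base endpoint and the `key` / `url` parameter names.
SCRAPE_ENDPOINT = "https://api.scrapfly.io/scrape"

# Parameter names taken from the release notes.
EXTRACTION_PARAMS = ("extraction_template", "extraction_prompt", "extraction_model")


def build_scrape_url(api_key: str, target_url: str, **extraction: str) -> str:
    """Build a scrape URL carrying the given extraction parameter(s)."""
    unknown = sorted(set(extraction) - set(EXTRACTION_PARAMS))
    if unknown:
        raise ValueError(f"unknown extraction parameter(s): {', '.join(unknown)}")
    params = {"key": api_key, "url": target_url, **extraction}
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_scrape_url("YOUR_API_KEY", "https://example.com/p", extraction_model="product"))
```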
Fixed an issue where Web Scraping API screenshots were returned with the invalid IANA content type image/jpg instead of image/jpeg.
When used together with extraction_template, extraction_prompt, or extraction_model, the proxified_response parameter now returns the content-type of the extracted data instead of the original response content-type.
More information about the proxified_response parameter is available in the Web Scraping API documentation.
The format parameter now accepts options to configure the output format of the scraped page.
The markdown format now lets you:
- Disable images (no_images) and use the alt text instead
- Disable links (no_links) and use the anchor text instead
Options use the notation {format}:{option1},{optionN}, for example markdown:no_links,no_images.
To learn more about those formats, refer to the Web Scraping API documentation.
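The {format}:{option1},{optionN} notation can be assembled with a small helper like the one below. The function name format_value is hypothetical, and the sketch performs no validation of format or option names.

```python
def format_value(fmt: str, *options: str) -> str:
    """Build a `format` parameter value such as 'markdown:no_links,no_images'."""
    if not options:
        # No options: the bare format name is the whole value.
        return fmt
    return f"{fmt}:{','.join(options)}"


if __name__ == "__main__":
    print(format_value("markdown", "no_links", "no_images"))
```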
2024-04-24
Python SDK
Python SDK 0.8.17 released. This version introduces support for:
- Web Scraping API format parameter
- Web Scraping API screenshot_flags parameter
You can install the new version with pip install scrapfly-sdk==0.8.17 or upgrade with pip install --upgrade scrapfly-sdk
Javascript SDK
Javascript SDK 0.5.0 released. This version introduces support for:
- Web Scraping API format parameter
- Web Scraping API screenshot_flags parameter
You can install the new version with npm install scrapfly-sdk@0.5.0 or upgrade with npm install scrapfly-sdk@latest
2024-04-22
Python SDK
Scrapfly now has an official integration with LlamaIndex to help you extract data.
Scrapfly now has an official integration with LangChain to help you extract data.
Web Scraping API
Introduced a new format parameter in the Web Scraping API that converts the scraped page to a specific format.
With the rise of LLM usage, you can now convert pages into LLM-friendly formats and more.
You can now convert the scraped page to:
- markdown
- text
- json (auto parse)
- clean_html
If you use proxified_response to retrieve the content directly, the announced content-type will follow the format you choose.
To learn more about those formats, refer to the Web Scraping API documentation.
You can now pass flags to configure screenshot options directly from the Web Scraping API.
Available flags:
- load_images
- dark_mode
- block_banners
- high_quality
- print_media_format
To learn more about those flags, refer to the Web Scraping API documentation.
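A client could guard against typos in flag names before sending a request, along the lines of the sketch below. The helper screenshot_flags_value is hypothetical, and joining the flags with commas is an assumption about how the screenshot_flags parameter is encoded; only the flag names themselves come from these notes.

```python
# Flag names as listed in the release notes.
KNOWN_FLAGS = (
    "load_images",
    "dark_mode",
    "block_banners",
    "high_quality",
    "print_media_format",
)


def screenshot_flags_value(*flags: str) -> str:
    """Validate flag names and join them into a single parameter value.

    Assumption: flags are sent as a comma-separated string.
    """
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown screenshot flag(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the original order.
    return ",".join(dict.fromkeys(flags))


if __name__ == "__main__":
    print(screenshot_flags_value("dark_mode", "block_banners"))
```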