Typescript SDK
Introduction
The Typescript SDK is the easiest way to access the Scrapfly API in Typescript, Javascript and NodeJS.
It provides a client that streamlines the scraping process by:
- Handling common errors
- Automatically encoding and decoding sensitive API parameters
- Handling and simplifying concurrency
- Implementing CSS selector engine for result HTML
Installation
The source code of the Typescript SDK is available on Github, and the scrapfly-sdk package is available through NPM.
npm install scrapfly-sdk
The SDK is also compatible with other Typescript runtimes, like Bun.
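For example, installing with Bun's package manager:
bun add scrapfly-sdk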
Quick Use
Here's a quick preview of what Typescript SDK can do:
import { ScrapflyClient, ScrapeConfig, ScrapeResult, log } from "scrapfly-sdk";
// Optional: set log level to debug to see all details
log.setLevel("DEBUG");
// 1. Create a scrapfly client with your API key
const scrapfly = new ScrapflyClient({ key: "" })
// 2. Start scraping!
const result: ScrapeResult = await scrapfly.scrape(new ScrapeConfig({
url: "https://web-scraping.dev/product/1",
// optional configuration:
asp: true, // enable scraper blocking bypass
country: "US", // set proxy country
render_js: true, // enable headless web browser
// ... and much more
}))
// 3. access scraped result data
console.log(result.result.content);
// 3.1 and even process it with CSS selectors:
console.log(result.selector("h3").text())
In short, we first create a ScrapflyClient object with our scrapfly key.
Then, we can use the .scrape() method to issue scraping commands, which are defined by a ScrapeConfig object.
The returned ScrapeResult object contains the result data (like page HTML), request metadata and convenience extensions like the CSS selector engine .selector(), which can further parse the HTML result into specific details.
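For example, the selector engine can be used to pull specific details out of the scraped HTML. A minimal sketch, assuming .selector() exposes a cheerio-style API so methods like .text(), .first() and .attr() are available:
import { ScrapflyClient, ScrapeConfig } from "scrapfly-sdk";
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(new ScrapeConfig({
    url: "https://web-scraping.dev/product/1",
}));
// extract the product title text:
console.log(result.selector("h3").text());
// cheerio-style traversal and attribute access (assumed API):
console.log(result.selector("a").first().attr("href"));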
Configuring Scrape
The SDK supports all features of the Scrapfly API, which can be configured through the ScrapeConfig object:
For scraping websites protected against web scraping, make sure to enable the Anti Scraping Protection bypass using the asp: true option.
const result: ScrapeResult = await scrapfly.scrape(new ScrapeConfig({
url: "https://web-scraping.dev/product/1",
// Request details
method: "GET", // GET, POST, PUT etc.
headers: {
"X-Csrf-Token": "1234",
},
// enable scraper blocking bypass (recommended)
asp: true,
// set proxy countries
country: "US,CA,FR",
// enable cache (recommended when developing)
cache: true,
cache_ttl: 3600, // expire cache in 1 hour (default 24h)
// enable debug to see more details in scrapfly web dashboard
debug: true,
// enable javascript rendering
render_js: true,
// wait for element to load when using js rendering:
wait_for_selector: ".review",
// or explicit amount of time
rendering_wait: 5000, // 5 seconds
// run custom javascript code
js: "return document.title",
// scroll to the bottom of the page (for loading details)
auto_scroll: true,
// ...
}))
For more on available options, see the API specification, which is matched in the SDK where applicable.
Handling Result
The ScrapeResult object contains all data returned by the Scrapfly API, such as response data, API usage information, scrape metadata and more:
const apiResult: ScrapeResult = await scrapfly.scrape(new ScrapeConfig({
url: "https://web-scraping.dev/product/1",
}))
// get response body (HTML) and status code:
apiResult.result.content
apiResult.result.status_code
// response headers:
apiResult.result.response_headers
// log url for accessing this scrape in scrapfly dashboard:
apiResult.result.log_url
// if render_js is used then browser context is available as well
// get data from javascript execution:
apiResult.result.browser_data.javascript_evaluation_result
// javascript scenario results:
apiResult.result.browser_data.js_scenario
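Failed scrapes throw errors rather than returning a result, so the error handling mentioned in the introduction comes down to an ordinary try/catch. A minimal sketch (the exact error classes thrown vary by failure type and SDK version):
import { ScrapflyClient, ScrapeConfig } from "scrapfly-sdk";
const client = new ScrapflyClient({ key: "" });
try {
    const result = await client.scrape(new ScrapeConfig({
        url: "https://httpbin.dev/status/403",
    }));
    console.log(result.result.status_code);
} catch (error) {
    // the SDK throws typed errors for API and scrape failures;
    // inspect the error to decide whether to retry or give up:
    console.error(`scrape failed: ${error}`);
}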
Concurrent Scraping
The main scraping method .scrape() is asynchronous, meaning it can be used in javascript idioms like Promise.all() and .then() callbacks.
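For example, a fixed batch of pages can be scraped in parallel by starting all scrapes and awaiting them together. A minimal sketch:
import { ScrapflyClient, ScrapeConfig } from "scrapfly-sdk";
const client = new ScrapflyClient({ key: "" });
// both scrapes start immediately and are awaited together:
const [first, second] = await Promise.all([
    client.scrape(new ScrapeConfig({ url: "https://web-scraping.dev/product/1" })),
    client.scrape(new ScrapeConfig({ url: "https://web-scraping.dev/product/2" })),
]);
console.log(first.result.content);
console.log(second.result.content);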
Additionally, the SDK provides the .concurrentScrape() async generator, which can be used to scrape concurrently at your scrapfly plan's concurrency limit:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const configs = [
// these two will succeed:
...new Array(2).fill(null).map(() => new ScrapeConfig({ url: 'https://httpbin.dev/status/200' })),
// these two will fail:
...new Array(2).fill(null).map(() => new ScrapeConfig({ url: 'https://httpbin.dev/status/403' })),
];
const results = [];
const errors = [];
for await (const resultOrError of client.concurrentScrape(configs)) {
if (resultOrError instanceof Error) {
errors.push(resultOrError);
} else {
results.push(resultOrError);
}
}
console.log(`got ${results.length} results:`);
console.log(results);
console.log(`got ${errors.length} errors:`);
console.log(errors);
Getting Account Details
To access Scrapfly account information, the `.account()` method can be used:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
console.log(await client.account());
Examples
Custom Headers
To provide additional headers, use the headers option of ScrapeConfig. Note that when using asp: true, Scrapfly can add additional headers automatically to prevent scraper blocking.
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://httpbin.dev/headers',
headers: { 'X-My-Header': 'foo' },
}),
);
console.log(JSON.parse(result.result.content));
Post Form
To post FormData, use the data option:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://httpbin.dev/post',
method: 'POST',
data: { foo: 'bar' },
}),
);
console.log(JSON.parse(result.result.content));
Post JSON
To post JSON data, use the data option with a 'Content-Type': 'application/json' header in ScrapeConfig:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://httpbin.dev/post',
// set method to POST
method: 'POST',
// set appropriate header
headers: { 'content-type': 'application/json' },
data: { foo: 'bar' },
}),
);
console.log(JSON.parse(result.result.content));
Javascript Rendering
To render pages with headless browsers using the Javascript Rendering feature, use the render_js: true option of ScrapeConfig:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://web-scraping.dev/product/1',
render_js: true,
// additionally we can wait for specific element to appear on the page:
wait_for_selector: '.review',
// or wait for a set amount of time:
rendering_wait: 5000, // 5 seconds
}),
);
console.log(result.result.content);
Javascript Scenario
To execute a Javascript Scenario, use the js_scenario option of ScrapeConfig:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
    new ScrapeConfig({
        url: 'https://web-scraping.dev/product/1',
        // scenarios are executed by the headless browser:
        render_js: true,
        // a scenario is an ordered list of browser actions;
        // the selector below is hypothetical, for illustration:
        js_scenario: [
            { click: { selector: '#load-more-reviews' } },
            { wait: 2000 }, // milliseconds
        ],
    }),
);
console.log(result.result.content);
// scenario execution details are also available:
console.log(result.result.browser_data.js_scenario);
Capturing Screenshots
To capture screenshots, the render_js: true and screenshots options of ScrapeConfig can be used:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://web-scraping.dev/product/1',
// enable headless browsers for screenshots
render_js: true,
// optional: you can wait for page to load before capturing
wait_for_selector: '.review',
screenshots: {
// name: what-to-capture
// fullpage - will capture everything
// css selector (e.g. #reviews) - will capture just that element
everything: 'fullpage',
reviews: '#reviews',
},
}),
);
console.log(result.result.screenshots);
/*
{
everything: {
css_selector: null,
extension: 'jpg',
format: 'fullpage',
size: 63803,
url: 'https://api.scrapfly.io/scrape/screenshot/01H5S96DFN48V5RH32ZM9WM8WQ/everything'
},
reviews: {
css_selector: '#reviews',
extension: 'jpg',
format: 'element',
size: 12602,
url: 'https://api.scrapfly.io/scrape/screenshot/01H5S96DFN48V5RH32ZM9WM8WQ/reviews'
}
}
*/
// To save a screenshot to a file you can download the screenshot from the result URLs
import axios from 'axios';
import fs from 'fs';
for (let [name, screenshot] of Object.entries(result.result.screenshots)) {
let response = await axios.get(screenshot.url, {
// note: don't forget to add your API key parameter:
params: { key: "" },
// this indicates that response is binary data:
responseType: 'arraybuffer',
});
// write the screenshot data to a file in the current directory:
fs.writeFileSync(`example-screenshot-${name}.${screenshot.extension}`, response.data);
}
Scraping Binary Data
Binary data can be scraped like any other page; however, it's returned base64 encoded. To decode it, the Buffer.from() method can be used:
import { ScrapflyClient, ScrapeConfig } from 'scrapfly-sdk';
import fs from "fs";
const client = new ScrapflyClient({ key: "" });
const result = await client.scrape(
new ScrapeConfig({
url: 'https://web-scraping.dev/assets/products/orange-chocolate-box-small-1.png',
}),
);
const data = Buffer.from(result.result.content, 'base64');
fs.writeFileSync("image.png", data);