Go SDK
The Go SDK is the easiest way to access the Scrapfly API in Go (Golang).
It provides a client that streamlines the scraping process by:
- Handling common errors
- Automatically encoding and decoding sensitive API parameters
- Handling and simplifying concurrency
- Providing an HTML selector engine via goquery
For more on using the Go SDK with Scrapfly, select the "Go SDK" option in the Scrapfly docs top bar.
Step by Step Introduction
For a hands-on introduction and example projects, see our Scrapfly SDK introduction page!
Installation
Install the Go SDK using go get:
go get github.com/scrapfly/go-scrapfly
Quick Use
Here's a quick preview of what the Go SDK can do:
package main

import (
    "fmt"
    "log"

    "github.com/scrapfly/go-scrapfly"
)

func main() {
    key := "{{ YOUR_API_KEY }}"
    client, err := scrapfly.New(key)
    if err != nil {
        log.Fatalf("failed to create client: %v", err)
    }

    result, err := client.Scrape(&scrapfly.ScrapeConfig{
        URL: "https://web-scraping.dev/product/1",
        ASP: true, // enable scraper blocking bypass
        Country: "US", // set proxy country
        RenderJS: true, // enable headless browser
        ProxyPool: scrapfly.PublicResidentialPool,
    })
    if err != nil {
        log.Fatalf("scrape failed: %v", err)
    }

    // 1) access scraped HTML content
    fmt.Println(result.Result.Content)

    // 2) or parse it with CSS selectors via goquery
    selector, _ := result.Selector()
    fmt.Println(selector.Find("h3").First().Text())
}
In short, we first create a scrapfly.Client with our Scrapfly key. Then, we use client.Scrape()
with a ScrapeConfig to issue our scraping commands.
The returned ScrapeResult contains result data (like the page HTML), request metadata, and a convenient
HTML selector via .Selector() for further parsing.
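For example, assuming .Selector() returns a goquery document, multiple matches can be iterated with goquery's Each method (a minimal sketch; the ".review p" selector is illustrative):
import "github.com/PuerkitoBio/goquery"

// iterate every matching element with goquery (selector is illustrative)
doc, err := result.Selector()
if err != nil {
    log.Fatalf("failed to build selector: %v", err)
}
doc.Find(".review p").Each(func(i int, s *goquery.Selection) {
    fmt.Println(i, s.Text())
})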
Configuring Scrape
The SDK supports all features of the Scrapfly API, which can be configured through the
ScrapeConfig struct:
When scraping websites protected against web scraping, make sure to enable the
Anti Scraping Protection bypass
using ASP: true.
result, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    // Request details
    Method: "GET", // GET, POST, PUT, PATCH
    Headers: map[string]string{
        "X-Csrf-Token": "1234",
    },
    // enable scraper blocking bypass (recommended)
    ASP: true,
    Country: "US,CA,FR", // set proxy countries
    // enable cache (recommended when developing)
    Cache: true,
    CacheTTL: 3600, // expire cache in 1 hour (default 24h)
    Debug: true, // enable debug info in dashboard
    // enable javascript rendering
    RenderJS: true,
    WaitForSelector: ".review",
    RenderingWait: 5000, // 5 seconds
    JS: "return document.title",
    AutoScroll: true,
})
if err != nil { /* handle error */ }
For more on available options, see the API specification,
which the SDK mirrors where applicable.
Handling Result
The ScrapeResult object contains all data returned by the Scrapfly API, such as response data,
API usage information, scrape metadata, and more:
apiResult, _ := client.Scrape(&scrapfly.ScrapeConfig{URL: "https://web-scraping.dev/product/1"})
// get response body (HTML) and status code:
_ = apiResult.Result.Content
_ = apiResult.Result.StatusCode
// response headers:
_ = apiResult.Result.ResponseHeaders
// log url for accessing this scrape in Scrapfly dashboard:
_ = apiResult.Result.LogURL
// if RenderJS is used then browser context is available as well
// get data from javascript execution:
_ = apiResult.Result.BrowserData.JSEvaluationResult
// javascript scenario results:
_ = apiResult.Result.BrowserData.JSScenario
Concurrent Scraping
Use client.ConcurrentScrape() to scrape concurrently at your plan's concurrency limit or a provided limit:
configs := []*scrapfly.ScrapeConfig{
    {URL: "https://httpbin.dev/status/200"},
    {URL: "https://httpbin.dev/status/403"},
    {URL: "https://httpbin.dev/status/200"},
    {URL: "https://httpbin.dev/status/403"},
}
results := 0
errors := 0
for res := range client.ConcurrentScrape(configs, 0) { // 0 uses account concurrency
    if res.Error != nil {
        errors++
        continue
    }
    results++
}
fmt.Printf("got %d results and %d errors\n", results, errors)
Getting Account Details
To access Scrapfly account information, use client.Account():
account, err := client.Account()
if err != nil { /* handle error */ }
fmt.Println(account.Subscription.PlanName)
Examples
Headers
Provide additional headers using Headers in ScrapeConfig.
Note that when using ASP: true, Scrapfly may automatically add extra headers to prevent scraper blocking.
res, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/headers",
    Headers: map[string]string{"X-My-Header": "foo"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Post Form
To post form data, set Method: "POST" and provide Data. By default, the request body is encoded as application/x-www-form-urlencoded.
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/post",
    Method: "POST",
    Data: map[string]interface{}{"foo": "bar"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Post JSON
To post JSON data, set Headers["content-type"] = "application/json" and provide Data.
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/post",
    Method: "POST",
    Headers: map[string]string{"content-type": "application/json"},
    Data: map[string]interface{}{"foo": "bar"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Javascript Rendering
To render pages with headless browsers using the
Javascript Rendering
feature, set RenderJS: true in ScrapeConfig:
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true,
    WaitForSelector: ".review", // wait for element to appear
    RenderingWait: 5000, // or wait for a set amount of time
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
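The JS option can also run custom javascript during rendering; its return value is then available under BrowserData, as shown in the Handling Result section above. A minimal sketch combining the two:
// run custom javascript in the browser and read back its result
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true,
    JS: "return document.title",
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.BrowserData.JSEvaluationResult)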
Javascript Scenario
To execute a Javascript Scenario,
use JSScenario in ScrapeConfig and enable RenderJS:
import (
    "fmt"
    "log"

    "github.com/scrapfly/go-scrapfly"
    "github.com/scrapfly/go-scrapfly/js_scenario"
)
// [...]
scenario, err := js_scenario.New().
    WaitForSelector(".review").
    Execute("return navigator.userAgent", js_scenario.WithExecuteTimeout(1000)).
    Click("#load-more-reviews").
    WaitForNavigation().
    Execute("return [...document.querySelectorAll('.review p')].map(p=>p.outerText)", js_scenario.WithExecuteTimeout(1000)).
    Build()
if err != nil {
    log.Fatal(err)
}

res, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    Debug: true,
    RenderJS: true,
    JSScenario: scenario,
})
if err != nil {
    log.Fatal(err)
}
fmt.Println(res.Result.BrowserData.JSScenario)
Capturing Screenshots
To capture screenshots, set RenderJS: true and Screenshots in ScrapeConfig:
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true, // enable headless browsers for screenshots
    WaitForSelector: ".review",
    Screenshots: map[string]string{
        "everything": "fullpage",
        "reviews": "#reviews",
    },
})
if err != nil { /* handle error */ }
for name, sc := range res.Result.Screenshots {
    fmt.Println(name, sc.URL)
}
// To save a screenshot, download from the result URLs and provide your API key:
// (example only)
/*
import (
    "fmt"
    "io"
    "net/http"
    "os"
    "path/filepath"
)

for name, sc := range res.Result.Screenshots {
    url := sc.URL + "?key={{ YOUR_API_KEY }}"
    resp, err := http.Get(url)
    if err != nil {
        continue
    }
    data, _ := io.ReadAll(resp.Body)
    resp.Body.Close() // close explicitly; defer inside a loop would delay cleanup
    filename := fmt.Sprintf("example-screenshot-%s.%s", name, sc.Extension)
    os.WriteFile(filepath.Join(".", filename), data, 0644)
}
*/
Scraping Binary Data
Binary data is returned base64 encoded. Decode it with encoding/base64:
import (
    "encoding/base64"
    "os"
)

res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/assets/products/orange-chocolate-box-small-1.png",
})
if err != nil { /* handle error */ }
data, _ := base64.StdEncoding.DecodeString(res.Result.Content)
os.WriteFile("image.png", data, 0644)
Full Documentation
For full documentation of the Go SDK, see the Go SDK documentation.