Go SDK
The Go SDK is the easiest way to access the Scrapfly API in Go (Golang).
It provides a client that streamlines the scraping process by:
- Handling common errors
- Automatically encoding and decoding sensitive API parameters
- Handling and simplifying concurrency
- Providing an HTML selector engine via goquery
For more on using the Go SDK with Scrapfly, select the "Go SDK" option in the Scrapfly docs top bar.
Step by Step Introduction
For a hands-on introduction and example projects, see our Scrapfly SDK introduction page!
Installation
Install the Go SDK using go get:
go get github.com/scrapfly/go-scrapfly
Quick Use
Here's a quick preview of what the Go SDK can do:
package main

import (
    "fmt"
    "log"

    "github.com/scrapfly/go-scrapfly"
)

func main() {
    key := "{{ YOUR_API_KEY }}"
    client, err := scrapfly.New(key)
    if err != nil {
        log.Fatalf("failed to create client: %v", err)
    }

    result, err := client.Scrape(&scrapfly.ScrapeConfig{
        URL: "https://web-scraping.dev/product/1",
        ASP: true, // enable scraper blocking bypass
        Country: "US", // set proxy country
        RenderJS: true, // enable headless browser
        ProxyPool: scrapfly.PublicResidentialPool,
    })
    if err != nil {
        log.Fatalf("scrape failed: %v", err)
    }

    // 1) access scraped HTML content
    fmt.Println(result.Result.Content)

    // 2) or parse it with CSS selectors via goquery
    selector, _ := result.Selector()
    fmt.Println(selector.Find("h3").First().Text())
}
In short, we first create a scrapfly.Client with our Scrapfly key. Then, we use client.Scrape()
with a ScrapeConfig to issue our scraping commands.
The returned ScrapeResult contains result data (like the page HTML), request metadata, and a convenient
HTML selector via .Selector() for further parsing.
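For example, assuming .Selector() returns a goquery document, multiple matches can be iterated with goquery's Each method (a minimal sketch; the ".review p" selector is illustrative):
import "github.com/PuerkitoBio/goquery"

// iterate every matching element with goquery (selector is illustrative)
doc, err := result.Selector()
if err != nil {
    log.Fatalf("failed to build selector: %v", err)
}
doc.Find(".review p").Each(func(i int, s *goquery.Selection) {
    fmt.Println(i, s.Text())
})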
Configuring Scrape
The SDK supports all features of the Scrapfly API, which can be configured through the
ScrapeConfig struct:
When scraping websites protected against web scraping, make sure to enable the
Anti Scraping Protection bypass
using ASP: true.
result, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    // Request details
    Method: "GET", // GET, POST, PUT, PATCH
    Headers: map[string]string{
        "X-Csrf-Token": "1234",
    },
    // enable scraper blocking bypass (recommended)
    ASP: true,
    Country: "US,CA,FR", // set proxy countries
    // enable cache (recommended when developing)
    Cache: true,
    CacheTTL: 3600, // expire cache in 1 hour (default 24h)
    Debug: true, // enable debug info in dashboard
    // enable javascript rendering
    RenderJS: true,
    WaitForSelector: ".review",
    RenderingWait: 5000, // 5 seconds
    JS: "return document.title",
    AutoScroll: true,
})
if err != nil { /* handle error */ }
For more on available options, see the API specification,
which the SDK mirrors where applicable.
Handling Result
The ScrapeResult object contains all data returned by the Scrapfly API, such as response data,
API usage information, scrape metadata, and more:
apiResult, _ := client.Scrape(&scrapfly.ScrapeConfig{URL: "https://web-scraping.dev/product/1"})
// get response body (HTML) and status code:
_ = apiResult.Result.Content
_ = apiResult.Result.StatusCode
// response headers:
_ = apiResult.Result.ResponseHeaders
// log url for accessing this scrape in Scrapfly dashboard:
_ = apiResult.Result.LogURL
// if RenderJS is used then browser context is available as well
// get data from javascript execution:
_ = apiResult.Result.BrowserData.JSEvaluationResult
// javascript scenario results:
_ = apiResult.Result.BrowserData.JSScenario
Concurrent Scraping
Use client.ConcurrentScrape() to scrape concurrently at your plan's concurrency limit or a provided limit:
configs := []*scrapfly.ScrapeConfig{
    {URL: "https://httpbin.dev/status/200"},
    {URL: "https://httpbin.dev/status/403"},
    {URL: "https://httpbin.dev/status/200"},
    {URL: "https://httpbin.dev/status/403"},
}
results := 0
errors := 0
for res := range client.ConcurrentScrape(configs, 0) { // 0 uses account concurrency
    if res.Error != nil {
        errors++
        continue
    }
    results++
}
fmt.Printf("got %d results and %d errors\n", results, errors)
Getting Account Details
To access Scrapfly account information, use client.Account():
account, err := client.Account()
if err != nil { /* handle error */ }
fmt.Println(account.Subscription.PlanName)
Examples
Headers
Provide additional headers using Headers in ScrapeConfig.
Note that when using ASP: true, Scrapfly may automatically add extra headers to prevent scraper blocking.
res, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/headers",
    Headers: map[string]string{"X-My-Header": "foo"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Post Form
To post form data, set Method: "POST" and provide Data. By default, the request body is encoded as application/x-www-form-urlencoded.
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/post",
    Method: "POST",
    Data: map[string]interface{}{"foo": "bar"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Post JSON
To post JSON data, set Headers["content-type"] = "application/json" and provide Data.
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://httpbin.dev/post",
    Method: "POST",
    Headers: map[string]string{"content-type": "application/json"},
    Data: map[string]interface{}{"foo": "bar"},
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
Javascript Rendering
To render pages with headless browsers using the
Javascript Rendering
feature, set RenderJS: true in ScrapeConfig:
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true,
    WaitForSelector: ".review", // wait for element to appear
    RenderingWait: 5000, // or wait for a set amount of time
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.Content)
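The JS option can also run custom javascript during rendering; its return value is then available under BrowserData, as shown in the Handling Result section above. A minimal sketch combining the two:
// run custom javascript in the browser and read back its result
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true,
    JS: "return document.title",
})
if err != nil { /* handle error */ }
fmt.Println(res.Result.BrowserData.JSEvaluationResult)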
Javascript Scenario
To execute a Javascript Scenario,
use JSScenario in ScrapeConfig and enable RenderJS:
import (
    "fmt"
    "log"

    "github.com/scrapfly/go-scrapfly"
    "github.com/scrapfly/go-scrapfly/js_scenario"
)
// [...]
scenario, err := js_scenario.New().
    WaitForSelector(".review").
    Execute("return navigator.userAgent", js_scenario.WithExecuteTimeout(1000)).
    Click("#load-more-reviews").
    WaitForNavigation().
    Execute("return [...document.querySelectorAll('.review p')].map(p=>p.outerText)", js_scenario.WithExecuteTimeout(1000)).
    Build()
if err != nil {
    log.Fatal(err)
}

res, err := client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    Debug: true,
    RenderJS: true,
    JSScenario: scenario,
})
if err != nil {
    log.Fatal(err)
}
fmt.Println(res.Result.BrowserData.JSScenario)
Capturing Screenshots
To capture screenshots, set RenderJS: true and Screenshots in ScrapeConfig:
res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/product/1",
    RenderJS: true, // enable headless browsers for screenshots
    WaitForSelector: ".review",
    Screenshots: map[string]string{
        "everything": "fullpage",
        "reviews": "#reviews",
    },
})
if err != nil { /* handle error */ }
for name, sc := range res.Result.Screenshots {
    fmt.Println(name, sc.URL)
}
// To save a screenshot, download from the result URLs and provide your API key:
// (example only)
/*
import (
    "fmt"
    "io"
    "net/http"
    "os"
    "path/filepath"
)

for name, sc := range res.Result.Screenshots {
    url := sc.URL + "?key={{ YOUR_API_KEY }}"
    resp, err := http.Get(url)
    if err != nil {
        continue
    }
    data, _ := io.ReadAll(resp.Body)
    resp.Body.Close() // close explicitly; defer inside a loop would delay cleanup
    filename := fmt.Sprintf("example-screenshot-%s.%s", name, sc.Extension)
    os.WriteFile(filepath.Join(".", filename), data, 0644)
}
*/
Scraping Binary Data
Binary data is returned base64 encoded. Decode it with encoding/base64:
import (
    "encoding/base64"
    "os"
)

res, err = client.Scrape(&scrapfly.ScrapeConfig{
    URL: "https://web-scraping.dev/assets/products/orange-chocolate-box-small-1.png",
})
if err != nil { /* handle error */ }
data, _ := base64.StdEncoding.DecodeString(res.Result.Content)
os.WriteFile("image.png", data, 0644)
Full Documentation
For full documentation of the Go SDK, see the Go SDK documentation.