Puppeteer Integration

Puppeteer is the Chrome team's official Node.js library for controlling Chrome over the DevTools Protocol. Connect it to Scrapfly Cloud Browser for scalable automation with built-in proxies and browser fingerprinting.

Beta Feature: Cloud Browser is currently in beta.

Installation

Install puppeteer-core, the variant of the library that ships without a bundled Chromium download:
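Using npm (substitute your package manager of choice):

```shell
npm install puppeteer-core
```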

Use puppeteer-core instead of puppeteer since Cloud Browser provides the browser instance.

Quick Start

Connection Parameters

Configure your Cloud Browser connection with these WebSocket URL parameters:

Parameter    Required  Description
api_key      Required  Your Scrapfly API key
proxy_pool   Optional  Proxy pool: datacenter (default) or residential
os           Optional  OS fingerprint: linux, windows, macos
session      Optional  Session ID for persistent browser state
country      Optional  Proxy country code (ISO 3166-1 alpha-2)
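These parameters are passed as query-string arguments on the connection URL. A small helper can assemble them; the host name below is an assumed placeholder for your actual Cloud Browser endpoint:

```javascript
// Build the Cloud Browser WebSocket URL from connection parameters.
// Host is an assumption — use the endpoint shown in your dashboard.
function buildEndpoint(apiKey, options = {}) {
  const params = new URLSearchParams({ api_key: apiKey, ...options });
  return `wss://browser.scrapfly.io?${params.toString()}`;
}

const endpoint = buildEndpoint('YOUR_API_KEY', {
  proxy_pool: 'residential', // datacenter (default) or residential
  os: 'windows',             // linux, windows, macos
  country: 'us',             // ISO 3166-1 alpha-2 proxy country
});
console.log(endpoint);
```

Using URLSearchParams keeps the values percent-encoded, which matters if a session ID or country list contains reserved characters.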

Data Extraction

Extract data from a dynamic page:

Form Interaction

Fill forms and handle login flows:

Session Persistence

Maintain browser state across connections using the session parameter:

Proxy Options

Proxy Pool   Use Case                                         Cost
datacenter   General scraping, high speed, lower cost         1 credit/30s + 2 credits/MB
residential  Protected sites, geo-targeting, anti-bot bypass  1 credit/30s + 10 credits/MB

Best Practices

  • Use puppeteer-core - Don't download bundled Chrome
  • Handle disconnects - Wrap connections in try/catch/finally blocks
  • Close browsers - Always call browser.close() to stop billing (use finally block)
  • Use sessions wisely - Reuse sessions for multi-step flows
  • Block unnecessary resources - Use request interception to reduce bandwidth
  • Set timeouts - Add reasonable timeouts to prevent hanging connections (e.g., timeout: 30000 in page.goto())
  • Error handling - Catch specific errors like timeout or WebSocket connection failures for better debugging

Summary