 # Lead Generation Web Scraping

##  Build your pipeline from public, structured data. 

 Public business directories, review platforms, and company data sites publish rich lead signals. Scrapfly fetches, unblocks, and structures that data so your pipeline stays current without manual effort.

 [ Get Free API Key ](https://scrapfly.io/register) [ Web Scraping API ](https://scrapfly.io/products/web-scraping-api) 

 1,000 free credits. No credit card required. 

 

  

 

 

 

---

## 6+

B2B sources (LinkedIn Public, Crunchbase, G2, BuiltWith, Yelp, Google Business)

 



 

## 5B+

scrapes/month platform-wide

 



 

## 99%+

success rate on protected targets

 



 

## JSON or CSV

structured output, every call

 



 

 

 

---

## Turn public signals into qualified pipeline.

 `Company URL` + `Schema` = Enriched Lead 

Fetch any public business profile. Extract structured fields. Push to CRM. Repeat at scale.
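The fetch, extract, push loop can be sketched as three pluggable stages. This is a minimal illustration, not Scrapfly's implementation: `run_pipeline` and the stub stages are hypothetical names, and the stubs stand in for a real scrape call, a real extraction step, and a real CRM client.

```python
from typing import Callable, Iterable

Fetch = Callable[[str], str]      # company URL -> raw HTML
Extract = Callable[[str], dict]   # raw HTML -> structured fields
Push = Callable[[dict], None]     # structured lead -> CRM


def run_pipeline(urls: Iterable[str], fetch: Fetch, extract: Extract, push: Push) -> list:
    """Fetch, extract, and push each URL; return the leads that were pushed."""
    leads = []
    for url in urls:
        html = fetch(url)
        lead = extract(html)
        lead["source_url"] = url  # keep provenance for later deduplication
        push(lead)
        leads.append(lead)
    return leads


# Stub stages (assumptions) so the sketch runs without network access.
crm = []
leads = run_pipeline(
    ["https://example.com/acme"],
    fetch=lambda url: "<h1>Acme</h1>",
    extract=lambda html: {"company_name": html.removeprefix("<h1>").removesuffix("</h1>")},
    push=crm.append,
)
print(leads)
```

Keeping the stages as plain callables means each one can be swapped for a real client without changing the loop.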

 

 

---

## Every lead signal. One API.

From company enrichment to contact discovery and tech-stack detection.

 

### Company Enrichment

Scrape public company profiles to fill fields like funding stage, employee count, industry, tech stack, and founding year. Works on any public business intelligence platform.

 [Crunchbase](https://scrapfly.io/blog/posts/how-to-scrape-crunchbase/) 

 [G2](https://scrapfly.io/blog/posts/how-to-scrape-g2-company-data-and-reviews/) 

BuiltWith

 

 

 



 

 

 ### Directory Harvesting

Pull business listings from local and vertical directories. Name, address, phone, hours, category, and review summary in one pass.

 [Yelp](https://scrapfly.io/blog/posts/how-to-scrape-yelpcom/) 

Google Business

 

Local directories

 

 

 



 

 ### Contact Discovery

Follow the chain from a company domain to publicly listed emails and phone numbers. No authenticated sessions, no private data.

- **Company Domain**: starting point from any enrichment source
- **Site Crawl**: Crawler API follows internal links and respects depth limits
- **Contact Page**: targeted extraction on /about, /contact, and /team pages
- **Emails + Phones**: structured output, deduplicated, ready for CRM import
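The last step of that chain, pulling deduplicated emails and phone numbers out of a fetched contact page, might look roughly like this. It is a sketch using only the standard library; `extract_contacts` is a hypothetical helper, and both patterns are deliberately loose (real phone formats vary widely by country).

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose pattern (assumption): a digit, 7+ digits or separators, then a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(html: str) -> dict:
    """Return unique, normalized emails and phone numbers found in the page."""
    emails = sorted({m.lower() for m in EMAIL_RE.findall(html)})
    # Strip everything but digits so differently formatted numbers collapse.
    phones = sorted({re.sub(r"\D", "", m) for m in PHONE_RE.findall(html)})
    return {"emails": emails, "phones": phones}


page = """
<a href="mailto:Sales@Example.com">sales@example.com</a>
<p>Call (555) 010-2000 or 555.010.2000</p>
"""
contacts = extract_contacts(page)
print(contacts)
```

Lowercasing emails and reducing phones to digits is what makes the set-based deduplication work; a production pipeline would likely need stricter validation than these patterns provide.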

 

 

 



 

 

 ### Tech-stack Detection

Identify which technologies a prospect uses before reaching out. Score leads by stack fit, competitor usage, or technology adoption signals.

**HTML source**: script tags, meta, headers

**BuiltWith**: public profiles

**G2**: integrations listed
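Detection from HTML source can be approximated by scanning script URLs, the `generator` meta tag, and a couple of response headers. The sketch below assumes a tiny, illustrative signature table (`SCRIPT_SIGNATURES`) and hypothetical helpers `detect_stack` and `stack_fit`; a real detector would need a far larger rule set.

```python
from html.parser import HTMLParser

# Illustrative signature table (assumption): script URL substring -> technology.
SCRIPT_SIGNATURES = {
    "googletagmanager.com": "Google Tag Manager",
    "js.hs-scripts.com": "HubSpot",
    "cdn.segment.com": "Segment",
}


class StackParser(HTMLParser):
    """Collects technology hints from <script src> and <meta name=generator>."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            src = attrs.get("src") or ""
            for needle, tech in SCRIPT_SIGNATURES.items():
                if needle in src:
                    self.found.add(tech)
        elif tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            if attrs.get("content"):
                self.found.add(attrs["content"])


def detect_stack(html, headers=None):
    parser = StackParser()
    parser.feed(html)
    found = set(parser.found)
    for name, value in (headers or {}).items():
        if name.lower() in ("server", "x-powered-by"):
            found.add(value)
    return found


def stack_fit(found, wanted):
    """Share of wanted technologies that the prospect already uses."""
    return len(found & wanted) / len(wanted)


html = """
<meta name="generator" content="WordPress 6.4">
<script src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>
<script src="https://js.hs-scripts.com/123.js"></script>
"""
stack = detect_stack(html, {"Server": "nginx"})
fit = stack_fit(stack, {"HubSpot", "Segment"})
print(sorted(stack), fit)
```

A simple overlap ratio like `stack_fit` is one way to turn detected technologies into a lead score; weighting specific tools higher is an obvious extension.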

 

 



 

 ### CRM Sync and Pipelines

Structured JSON output plugs directly into any CRM or data pipeline. Schedule recurring scrapes to keep records fresh.

**JSON / CSV**: stable output

**daily**: freshness cadence

**dedup**: via unique domain key
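Deduplicating on a domain key before export could be sketched as follows. The helper names (`domain_key`, `dedupe`, `to_csv`) and the field names are assumptions for illustration; the merge rule here (first non-empty value wins) is one reasonable choice among several.

```python
import csv
import io
from urllib.parse import urlparse


def domain_key(url_or_domain: str) -> str:
    """Normalize a URL or bare domain to a lowercase key without 'www.'."""
    value = url_or_domain.strip().lower()
    if "://" not in value:
        value = "//" + value  # let urlparse treat the input as a network location
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def dedupe(records: list) -> list:
    """Keep one record per domain; later records only fill in missing fields."""
    merged = {}
    for record in records:
        key = domain_key(record["domain"])
        current = merged.setdefault(key, {"domain": key})
        for field, value in record.items():
            if field != "domain" and value not in (None, "") and field not in current:
                current[field] = value
    return list(merged.values())


def to_csv(records: list, fields: list) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, restval="", extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"domain": "https://www.Acme.com/about", "company_name": "Acme"},
    {"domain": "acme.com", "employee_count": "51-200"},
    {"domain": "globex.io", "company_name": "Globex"},
]
unique = dedupe(records)
export = to_csv(unique, ["domain", "company_name", "employee_count"])
print(export)
```

Using the normalized domain as the key means a daily re-scrape updates existing rows instead of creating duplicates in the CRM.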

 

 



 

 

 ### Anti-bot Bypass - Built In

Lead data sources deploy heavy bot protection. Scrapfly routes around it automatically - no config required. Tested in production at scale.

 [Cloudflare](https://scrapfly.io/bypass/cloudflare) 

 [DataDome](https://scrapfly.io/bypass/datadome) 

 [Akamai](https://scrapfly.io/bypass/akamai) 

 [PerimeterX](https://scrapfly.io/bypass/perimeterx) 

 

 [See full bypass coverage](https://scrapfly.io/bypass) 



 

 

 

---


## Every tool your lead pipeline needs.

From raw HTML fetch to structured extraction and full-site crawl - all behind one API key.

### Web Scraping API

Fetch any public business profile with anti-bot bypass, proxy rotation, and optional JS rendering. Clean HTML or markdown, stable JSON envelope.

 [ Landing page ](https://scrapfly.io/products/web-scraping-api) [ Documentation ](https://scrapfly.io/docs/scrape-api/getting-started) 

 

### Extraction API

Turn scraped HTML into typed lead records. Pass a prompt or a JSON schema - the API returns structured fields: company name, domain, funding, employee range, tech stack.

 [ Landing page ](https://scrapfly.io/products/extraction-api) [ Documentation ](https://scrapfly.io/docs/extraction-api/getting-started) 

 

### Screenshot API

Capture full-page screenshots of company pages and business profiles. Useful for audit trails and visual verification of scraped data.

 [ Landing page ](https://scrapfly.io/products/screenshot-api) [ Documentation ](https://scrapfly.io/docs/screenshot-api/getting-started) 

 

### Crawler API

Traverse an entire company site to discover /about, /team, and /contact pages. Streams discovered URLs; every page runs through the Web Scraping API automatically.

 [ Landing page ](https://scrapfly.io/products/crawler-api) [ Documentation ](https://scrapfly.io/docs/crawler-api/getting-started) 
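To make the idea of a depth-limited traversal concrete, here is a local sketch of a breadth-first walk that stays on one host and then filters for likely contact pages. It does not call the Crawler API; `crawl`, `contact_candidates`, and the in-memory `SITE` map are hypothetical stand-ins for real link discovery.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

CONTACT_PATHS = ("/about", "/contact", "/team")


def crawl(start_url, get_links, max_depth=2):
    """Breadth-first walk of same-host links, stopping at max_depth."""
    host = urlparse(start_url).hostname
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for href in get_links(url):
            link = urljoin(url, href)
            if urlparse(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


def contact_candidates(urls):
    """Keep URLs whose path starts with a known contact-style prefix."""
    return [u for u in urls if urlparse(u).path.rstrip("/").startswith(CONTACT_PATHS)]


# In-memory link map (assumption) standing in for fetched pages.
SITE = {
    "https://acme.com/": ["/about", "/blog", "https://twitter.com/acme"],
    "https://acme.com/about": ["/team", "/contact"],
    "https://acme.com/blog": ["/blog/post-1"],
}
pages = crawl("https://acme.com/", lambda url: SITE.get(url, []), max_depth=2)
targets = contact_candidates(pages)
print(targets)
```

The depth limit keeps the crawl bounded on large sites, and the host check prevents it from wandering onto external domains such as social profiles.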

 

   Cloud Browser

Drive a real stealth Chromium over CDP for JavaScript-heavy lead sources. Full Playwright and Puppeteer compatibility, hosted and scaled by Scrapfly.

 [ Landing page ](https://scrapfly.io/products/cloud-browser-api) [ Documentation ](https://scrapfly.io/docs/cloud-browser-api/getting-started) 

 

 

 [Get Free API Key](https://scrapfly.io/register) 

 



 

---

## Real targets. Real requests.

Scrape a public company page in Python, TypeScript, or plain HTTP.

The examples below fetch a public LinkedIn company page with anti-bot bypass, JS rendering, and AI extraction.

**Python**

```python
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

client = ScrapflyClient(key="API KEY")

api_response: ScrapeApiResponse = client.scrape(
  ScrapeConfig(
    # add a page to scrape
    url='https://www.linkedin.com/company/red-hat/',
    asp=True,  # enable bypass of anti-scraping protection
    render_js=True,  # enable headless browser (if necessary)
    country="US",  # set location for region specific data
    # use AI to extract data
    extraction_model='organization' 
  )
)
# use AI extracted data
print(api_response.scrape_result['extracted_data']['data'])
# or parse the html yourself 
print(api_response.content)
```

**TypeScript**

```typescript
import { 
    ScrapflyClient, ScrapeConfig 
} from 'jsr:@scrapfly/scrapfly-sdk';

const client = new ScrapflyClient({ key: "API KEY" });

let api_response = await client.scrape(
    new ScrapeConfig({
        // add a scrape url
        url: 'https://www.linkedin.com/company/red-hat/',
        asp: true, // enable bypass of anti-scraping protection
        render_js: true,  // enable headless browser (if necessary)
        // use AI to extract data
        extraction_model: 'organization' 
    })
);
// use AI extracted data
console.log(api_response.result['extracted_data']['data'])
// or parse the HTML yourself
console.log(api_response.result['content'])
```

**HTTP (HTTPie)**

```shell
# "==" sends each item as a URL query parameter
http https://api.scrapfly.io/scrape \
key==$SCRAPFLY_KEY \
url==https://www.linkedin.com/company/red-hat/ \
asp==true \
render_js==true \
country==US \
extraction_model==organization
```
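For teams that prefer no SDK, the same request URL can be assembled with Python's standard library. This is a sketch: `build_scrape_url` is a hypothetical helper, the parameter names are the ones used in the HTTP example above, and the request itself is left unsent so the snippet runs offline.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_url(api_key: str, target_url: str, **options) -> str:
    """Build a GET URL with the key, the target, and any extra query options."""
    params = {"key": api_key, "url": target_url}
    for name, value in options.items():
        # Booleans are written as lowercase "true"/"false", as in the HTTP example.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{API_ENDPOINT}?{urlencode(params)}"


request_url = build_scrape_url(
    "YOUR_KEY",  # placeholder, not a real key
    "https://www.linkedin.com/company/red-hat/",
    asp=True,
    render_js=True,
    country="US",
    extraction_model="organization",
)
print(request_url)
```

`urlencode` percent-encodes the target URL, which matters because the target itself contains `:` and `/` characters that would otherwise break the query string.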

 

 

 [ Python SDK docs → ](https://scrapfly.io/docs/sdk/python) [ TypeScript SDK docs → ](https://scrapfly.io/docs/sdk/typescript) [ HTTP API docs → ](https://scrapfly.io/docs) 

 

 

 

---

## Automate with AI and Workflows

Connect Scrapfly to the tools your team already uses for data pipelines and AI enrichment.

 

 ### LLM-Powered Extraction

Pass scraped HTML directly to the Extraction API with a prompt describing the fields you need. No regex, no CSS selectors - describe the output schema and get typed JSON back.

**Prompt**: plain language schema

**JSON Schema**: strict typed output

**Templates**: company, person, job
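A JSON Schema for a lead record might look like the sketch below. The field names follow the fields mentioned on this page but are otherwise assumptions, and how the schema is passed to the Extraction API is not shown here (see the Extraction API documentation). `check_lead` is a hypothetical local sanity check that covers only a small subset of JSON Schema.

```python
import json

# Hypothetical lead schema (assumption); adjust fields to your CRM.
LEAD_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "domain": {"type": "string"},
        "funding_stage": {"type": "string"},
        "employee_range": {"type": "string"},
        "tech_stack": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["company_name", "domain"],
}

_TYPES = {"string": str, "array": list, "object": dict}


def check_lead(record: dict, schema: dict = LEAD_SCHEMA) -> list:
    """Return a list of problems; an empty list means the record fits."""
    problems = [f"missing: {name}" for name in schema["required"] if name not in record]
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected: {name}")
        elif not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"wrong type: {name}")
    return problems


good = check_lead({"company_name": "Acme", "domain": "acme.com", "tech_stack": ["HubSpot"]})
bad = check_lead({"domain": 5})
print(json.dumps(LEAD_SCHEMA, indent=2))
print(good, bad)
```

Checking returned records locally before CRM import catches schema drift early, whichever extraction method produced them.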

 

 



 

 ### Works with Your Stack

Scrapfly has official SDKs and integrations for the tools teams use to run data pipelines and AI workflows.

- Python SDK
- TypeScript SDK
- REST API
- MCP Server
- n8n
- Zapier



 

 



 

 

 

---

## Frequently Asked Questions

 

  ### IS WEB SCRAPING LEAD DATA LEGAL?

 Generally yes - scraping publicly visible data is legal in most jurisdictions. Courts in the US and EU have consistently held that accessing data that is publicly visible without authentication does not constitute unauthorized access. Extra care should be taken around PII (personally identifiable information), which is subject to GDPR, CCPA, and similar laws depending on jurisdiction. Scrapfly is designed for public, non-authenticated data. For a detailed breakdown, see our [web scraping laws](https://scrapfly.io/is-web-scraping-legal) guide.

 

   ### HOW DO I UNBLOCK ACCESS TO LEAD DATA SOURCES?

 Most B2B data sites deploy anti-bot protection (Cloudflare, DataDome, Akamai, PerimeterX). Scrapfly bypasses these automatically via its built-in anti-bot engine. You pass a URL; we handle fingerprinting, proxy selection, and retry logic. No custom browser setup, no proxy management, no maintenance on your side.

 

   ### WHAT LEAD DATA CAN BE SCRAPED FROM PUBLIC SOURCES?

 Public lead sources are rich with B2B signal: company name, domain, industry, employee count, funding stage, tech stack, product ratings, review count, business address, phone number, and category tags. All of this is published openly on platforms like Crunchbase, G2, Yelp, BuiltWith, and Google Business - no login required.

 

   ### HOW DO I EXTRACT STRUCTURED FIELDS FROM SCRAPED PAGES?

 Use the [Extraction API](https://scrapfly.io/products/extraction-api). Pass the scraped HTML and a prompt describing the fields you need (or a JSON Schema for strict output). The API returns typed JSON. No selectors to write, no maintenance when the target site's layout changes.

 

   ### ARE PROXIES ENOUGH TO SCRAPE LEAD DATA SITES?

 No. Modern bot protection identifies proxies by TLS fingerprint, behavioral patterns, and IP reputation - not just by IP address. Rotating residential proxies alone get blocked quickly on Crunchbase, G2, and similar sites. Scrapfly combines proxy rotation with byte-perfect Chrome TLS fingerprinting (via Curlium) and a stealth Chromium browser (via Scrapium) to achieve reliable pass rates.

 

   ### HOW DO I SCRAPE LINKEDIN FOR LEAD DATA?

 Scrapfly is designed for public, non-authenticated data. LinkedIn's public company pages (visible without login) can be scraped for company description, employee count range, and industry. Authenticated profile data requires a logged-in session, which is outside the scope of Scrapfly's intended use and raises legal and ToS considerations. For B2B lead enrichment, Crunchbase, G2, and BuiltWith typically provide richer, cleaner structured data than LinkedIn's public pages.

 

   ### WHAT IS SCRAPFLY'S WEB SCRAPING API?

 The [Web Scraping API](https://scrapfly.io/products/web-scraping-api) is a single endpoint that abstracts proxy rotation, anti-bot bypass, browser rendering, and retry logic. You send a URL; you get back clean HTML or markdown. Official SDKs are available for [Python](https://scrapfly.io/docs/sdk/python) and [TypeScript](https://scrapfly.io/docs/sdk/typescript). Any HTTP client works too.

 

  

 

  ---

### Start building your lead pipeline today.

Free account, 1,000 credits, no credit card. Fetch, unblock, extract, and export - all from one API key.

 

 [ Get Free API Key ](https://scrapfly.io/register) [See all use cases](https://scrapfly.io/use-case/web-scraping)