# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)

##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- API Specification
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- API Specification
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- API Specification
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- API Specification
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)
- [Native Browser MCP](https://scrapfly.io/docs/cloud-browser-api/mcp)
- [DevTools Protocol](https://scrapfly.io/docs/cloud-browser-api/cdp-reference)

##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# MCP Examples & Use Cases


 Real-world examples of what you can build with the Scrapfly MCP Server, from simple data extraction to complex multi-step workflows. You don't write code; you just ask your AI in natural language, and it figures out which tools to use and how to chain them together.

 

##   Detailed Examples 

Let's dive deeper into specific scenarios with full workflows.

- **Job Aggregation:** multi-source job search
- **Price Monitoring:** track prices & alerts
- **Content Research:** multi-source analysis
- **Data Extraction:** LLM-powered parsing
- **Competitive Intelligence:** track competitors
- **Real Estate Analysis:** property market research

####   Scenario: Job Aggregation 

You want to find all remote Python developer jobs posted today on multiple job boards.

####   Prompt 

    

"Search for remote Python developer jobs on LinkedIn, Indeed, and AngelList. Filter for positions posted in the last 24 hours. Create a summary table with company name, position, salary range, and application link."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` to understand best practices
2. AI uses `web_get_page` to scrape LinkedIn jobs page
3. AI uses `web_scrape` with [`extraction_model: "job_listing"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/job_listing) for Indeed
4. AI uses `web_get_page` for AngelList
5. AI parses all results and filters by date
6. AI creates a formatted table with all matching positions
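The tool calls above ultimately translate into requests against the Web Scraping API. A minimal Python sketch of how the step-3 request could be assembled (the endpoint shape, the placeholder key, and the Indeed URL are assumptions for illustration, not the exact MCP wire format):

```python
from urllib.parse import urlencode

# Hypothetical request assembly for step 3; parameter names follow the
# Web Scraping API docs, values are placeholders.
params = {
    "key": "YOUR_API_KEY",              # placeholder credential
    "url": "https://www.indeed.com/jobs?q=python+developer&l=remote",
    "extraction_model": "job_listing",  # pre-trained job-posting schema
}
request_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(request_url)
```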
 
 

 

 

####   Scenario: Price Monitoring 

Build an automated price tracking system that monitors products across multiple retailers, tracks historical trends, and identifies the best time to buy. Perfect for deal hunters, price comparison apps, or dynamic pricing strategies.

####   Prompt 

    

"I want to buy Sony WH-1000XM5 headphones but I'm looking for the best deal. Check current prices on Amazon, Best Buy, Target, and Walmart. For each retailer, get the price, stock status, shipping options, and any current promotions or discounts. Tell me which retailer has the best overall value considering price, shipping, and availability. Also check if there are any bundle deals or extended warranties included."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` to get optimal scraping parameters
2. AI uses `web_scrape` with [`extraction_model: "product"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/product) for each retailer's product page
3. AI extracts comprehensive data: price, original price, discount %, stock status, shipping cost, delivery time, and warranty info
4. AI checks for promotional codes, bundle offers, and financing options
5. AI calculates total cost (price + shipping + taxes) for accurate comparison
6. AI compares value propositions: free shipping, faster delivery, return policies
7. AI generates detailed comparison with recommendation based on best overall value
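The total-cost comparison in steps 5-7 is plain arithmetic once the fields are extracted. A sketch using the illustrative figures from the example response as hypothetical inputs:

```python
# Total-cost comparison sketch; prices and shipping are the illustrative
# figures from the example response, not live data.
offers = {
    "Amazon":   {"price": 349.99, "shipping": 0.00},
    "Best Buy": {"price": 399.99, "shipping": 0.00},
    "Target":   {"price": 379.99, "shipping": 5.99},
    "Walmart":  {"price": 364.99, "shipping": 0.00},
}
totals = {name: round(o["price"] + o["shipping"], 2) for name, o in offers.items()}
best = min(totals, key=totals.get)
print(best, totals[best])  # Amazon 349.99
```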
 
####   Example Response 

###### Price Comparison for Sony WH-1000XM5 Headphones

 | Retailer | Price | Shipping | Total | Stock | Delivery |
|---|---|---|---|---|---|
| **Amazon** | $349.99 (-13%) | Free | **$349.99** | In Stock | 2-day Prime |
| **Best Buy** | $399.99 | Free | **$399.99** | In Stock | Same-day pickup |
| **Target** | $379.99 | $5.99 | **$385.98** | In Stock | 3-5 days |
| **Walmart** | $364.99 | Free | **$364.99** | Low Stock | 4-7 days |

**Best Overall Value: Amazon - $349.99**

- **Lowest price:** $50 off MSRP (13% discount)
- **Fast shipping:** Free 2-day Prime delivery
- **Good stock:** Available now, no delays
- **Easy returns:** 30-day return policy
- **Total savings:** $50 vs. Best Buy, $36 vs. Target
 
 

Additional Findings:

- **Best Buy:** Includes free 2-year Geek Squad protection ($79 value) with purchase
- **Amazon:** Currently has 15% off coupon available (clip coupon on product page)
- **Target:** Extra 5% off with RedCard ($19 additional savings)
- **Walmart:** Offers 24-month financing at 0% APR
 
**Price History Insight:**

Based on current pricing, this is the lowest price in the last 90 days. Historical average: $389. **Great time to buy!**

 

 

  **Pro Tip:** Set up automated monitoring by scheduling this prompt to run daily. Use `cache: true` with `cache_ttl: 3600` (1 hour) to track price changes efficiently. Combine with webhooks to get instant notifications when prices drop below your target. 

 [  Explore E-commerce &amp; Price Monitoring Use Case ](https://scrapfly.io/use-case/ecommerce-web-scraping) 

 

 

 

####   Scenario: Content Research 

Conduct comprehensive research by gathering, analyzing, and synthesizing information from multiple authoritative sources. Perfect for market research, academic literature reviews, trend analysis, or competitive intelligence.

####   Prompt 

    

"I'm writing a whitepaper on quantum computing breakthroughs in 2024. Research the latest developments from MIT News, Nature.com, Quanta Magazine, and ArXiv from the past 3 months. For each breakthrough, extract: the discovery/advancement, lead researchers and institutions, publication date, practical applications, and any benchmarks or performance metrics mentioned. Identify common themes, compare different approaches (superconducting vs. photonic vs. topological qubits), and highlight which institutions are leading the field. Create a timeline of major announcements and summarize the most promising developments."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` for best practices
2. AI constructs search queries for each source (filtering by topic and date range)
3. AI uses `web_scrape` with [`extraction_model: "article"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/article) for article listing pages
4. AI visits individual articles and extracts: title, authors, publication date, abstract, key findings, methodologies, and citations
5. AI filters articles by publication date (last 90 days) and relevance score
6. AI identifies recurring themes, breakthrough categories, and research trends
7. AI maps researchers to institutions and tracks collaboration networks
8. AI extracts quantitative metrics: qubit counts, error rates, coherence times, gate fidelities
9. AI synthesizes findings into structured report with timeline, thematic analysis, and institutional rankings
 
####   Example Response 

###### Quantum Computing Research Summary (Q1 2024)

Research Timeline:

**March 15, 2024** - Major Breakthrough

**IBM Achieves 1,000+ Qubit Processor**

IBM Research announced "Condor," a 1,121 superconducting qubit processor with 20% improvement in error rates. Lead: Dr. Jay Gambetta (IBM Quantum). Source: Nature Physics

 

**March 8, 2024**

**Google Demonstrates Error Correction Milestone**

Achieved logical error rate below physical error rate using surface codes on 72 qubits. Lead: Dr. Hartmut Neven (Google Quantum AI). Source: Nature

 

**February 22, 2024**

**QuEra Demonstrates Neutral-Atom Advantage**

280-qubit neutral-atom system solves optimization problems 100x faster than classical computers. Lead: Dr. Mikhail Lukin (Harvard). Source: Science

 

**January 30, 2024**

**PsiQuantum Secures $620M for Photonic Approach**

Funding round to build fault-tolerant photonic quantum computer by 2027. Room-temperature operation advantage. Source: MIT News

 

 

Key Technology Comparisons:

 | Approach | Leading Institutions | Qubit Count | Key Advantage | Challenge |
|---|---|---|---|---|
| **Superconducting** | IBM, Google, Rigetti | 1,000+ | Scalability & speed | Requires cryogenic cooling |
| **Neutral Atom** | Harvard, QuEra, Pasqal | 280+ | Long coherence times | Gate fidelity improvements needed |
| **Photonic** | PsiQuantum, Xanadu | Theoretical | Room temperature operation | Manufacturing complexity |
| **Trapped Ion** | IonQ, Honeywell | 32+ | Highest gate fidelity (99.9%) | Slower operations |

Leading Institutions by Impact:

1. **IBM Research** - 12 publications, focus on superconducting scalability
2. **Google Quantum AI** - 8 publications, error correction breakthroughs
3. **Harvard University** - 7 publications, neutral-atom innovations
4. **MIT** - 6 publications, algorithm development and materials science
5. **Stanford** - 5 publications, quantum networking protocols
 
Emerging Themes:

- **Error Correction Focus:** 65% of papers emphasize quantum error correction as critical path to fault tolerance
- **Hybrid Approaches:** Growing interest in combining classical and quantum processors for practical advantage
- **Application-Driven:** Shift from "quantum supremacy" to solving real problems (drug discovery, optimization, cryptography)
- **Collaborations:** 40% of breakthroughs involved multi-institutional partnerships
 
**Most Promising Development:**

Google's error correction milestone represents a turning point - demonstrating that logical qubits can be more reliable than physical qubits. This validates the path to fault-tolerant quantum computing within the next 5-10 years. Combined with IBM's 1,000+ qubit processors, we're entering the era of "utility-scale" quantum computing for practical applications.

 

 

  **Pro Tip:** For ongoing research monitoring, use `format: "markdown"` to get clean, AI-friendly content. Chain this with sentiment analysis or citation tracking by asking the AI to extract reference networks. You can also use `screenshots` to capture figures, charts, and diagrams from papers for visual analysis. 

 

 

 

####   Scenario: Data Extraction 

Extract complex, structured data from dynamic websites using AI-powered parsing. Perfect for building datasets, enriching CRM data, or scraping sites with inconsistent layouts that would be difficult to parse with traditional selectors.

####   Prompt 

    

"I need a comprehensive dataset of top Italian restaurants in San Francisco for a food delivery partnership. Go to Yelp and extract the top 20 Italian restaurants with ratings above 4 stars. For each restaurant, get: name, exact rating (out of 5), number of reviews, price range ($-$$$$), cuisine tags, full address with zip code, phone number, business hours, popular dishes mentioned in reviews, delivery/takeout availability, health score if visible, and the restaurant's website URL. Also note if they have outdoor seating or take reservations. Export as a structured JSON that I can import into our database."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` to get optimal parameters
2. AI uses `web_scrape` with `render_js: true` (Yelp uses dynamic content)
3. AI provides detailed `extraction_prompt` specifying exact fields and data types
4. Scrapfly's LLM analyzes the page structure and intelligently extracts data across varying HTML layouts
5. AI handles edge cases: missing phone numbers, varied address formats, inconsistent pricing
6. AI may visit individual restaurant pages for additional details (hours, menu, reviews)
7. AI filters results for ratings ≥ 4.0 stars and sorts by relevance
8. AI validates data quality: checks phone format, ensures addresses are complete, normalizes price ranges
9. AI returns structured JSON with consistent schema, ready for database import
 
####   Extraction Tool Usage 

 ```
{
  "tool": "web_scrape",
  "parameters": {
    "url": "https://www.yelp.com/search?find_desc=Italian&find_loc=San+Francisco",
    "pow": "...",
    "render_js": true,
    "extraction_prompt": "Extract detailed information for each Italian restaurant listed. For each restaurant, return a JSON object with these fields: restaurant_name (string), rating (float, out of 5.0), review_count (integer), price_range (string: $ to $$$$), cuisine_tags (array of strings), address (object with street, city, state, zip), phone (string, formatted), business_hours (object with day: hours), popular_dishes (array of strings from reviews), offers_delivery (boolean), offers_takeout (boolean), health_score (string if visible), website_url (string), has_outdoor_seating (boolean), accepts_reservations (boolean). Return as a JSON array of restaurant objects."
  }
}
```

 

   

 

####   Example Response (Excerpt) 

###### Top Italian Restaurants - San Francisco

 ```
[
  {
    "restaurant_name": "Flour + Water",
    "rating": 4.5,
    "review_count": 3847,
    "price_range": "$$",
    "cuisine_tags": ["Italian", "Pasta Shop", "Wine Bar"],
    "address": {
      "street": "2401 Harrison St",
      "city": "San Francisco",
      "state": "CA",
      "zip": "94110"
    },
    "phone": "(415) 826-7000",
    "business_hours": {
      "Monday": "Closed",
      "Tuesday-Thursday": "5:30 PM - 10:00 PM",
      "Friday-Saturday": "5:30 PM - 11:00 PM",
      "Sunday": "5:30 PM - 10:00 PM"
    },
    "popular_dishes": [
      "Handmade Pasta",
      "Margherita Pizza",
      "Burrata",
      "Tiramisu"
    ],
    "offers_delivery": true,
    "offers_takeout": true,
    "health_score": "A",
    "website_url": "https://flourandwater.com",
    "has_outdoor_seating": true,
    "accepts_reservations": true
  },
  {
    "restaurant_name": "Delfina",
    "rating": 4.3,
    "review_count": 2912,
    "price_range": "$$",
    "cuisine_tags": ["Italian", "Californian"],
    "address": {
      "street": "3621 18th St",
      "city": "San Francisco",
      "state": "CA",
      "zip": "94110"
    },
    "phone": "(415) 552-4055",
    "business_hours": {
      "Monday-Sunday": "5:00 PM - 10:00 PM"
    },
    "popular_dishes": [
      "Spaghetti Carbonara",
      "Roasted Chicken",
      "Panna Cotta"
    ],
    "offers_delivery": false,
    "offers_takeout": true,
    "health_score": "A",
    "website_url": "https://delfinasf.com",
    "has_outdoor_seating": false,
    "accepts_reservations": true
  }
  // ... 18 more restaurants
]
```

 

   

 

**Data Quality Summary:**

- ✅ 20 restaurants extracted
- ✅ All ratings ≥ 4.0 stars (filtered as requested)
- ✅ 100% have valid addresses with zip codes
- ✅ 95% have phone numbers (1 missing phone not listed on Yelp)
- ✅ 85% have business hours (3 restaurants don't list hours)
- ✅ All data validated and normalized for database import
 
 

 

######   When to Use Extraction Prompt vs. Extraction Model 

- **Use `extraction_model`:** For standard schemas (products, articles, jobs) - faster and more cost-effective
- **Use `extraction_prompt`:** For custom fields, complex nested data, or when you need specific filtering/validation logic
- **Combine both:** Start with `extraction_model` for base data, then use `extraction_prompt` to enrich with custom fields
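The two approaches differ only in which parameter carries the schema. A hypothetical side-by-side of the tool-call shapes (the URLs and prompt text are made up; the field names follow the tool-usage example earlier on this page):

```python
# Hypothetical tool-call payloads contrasting the two extraction modes.
model_call = {
    "tool": "web_scrape",
    "parameters": {
        "url": "https://example.com/products",
        "extraction_model": "product",  # pre-trained schema: faster, cheaper
    },
}
prompt_call = {
    "tool": "web_scrape",
    "parameters": {
        "url": "https://example.com/products",
        # free-form instructions: flexible, but pricier per page
        "extraction_prompt": "For each product return name, price (float), "
                             "and in_stock (boolean) as a JSON array.",
    },
}
```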
 
 

  **Pro Tip:** LLM-powered extraction excels at handling inconsistent HTML structures, but it's more expensive than CSS selectors. Use it for complex, one-time scraping tasks or when sites frequently change their markup. For production scraping at scale, consider using `extraction_model` with pre-trained schemas or traditional selector-based extraction for consistent sites. 

 

 

 

####   Scenario: Competitive Intelligence 

Track competitor features, pricing changes, and marketing strategies across multiple SaaS platforms to inform your product roadmap.

####   Prompt 

    

"Analyze my top 3 SaaS competitors: Ahrefs, SEMrush, and Moz. For each one, extract their pricing tiers, features included in each tier, and any special promotions on their pricing pages. Also check their blogs for recent product announcements from the last month. Create a comparison table showing feature gaps and pricing strategies."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` to get optimal scraping parameters
2. AI uses `web_scrape` with `render_js: true` to load each competitor's pricing page (dynamic content)
3. AI uses `extraction_prompt` to extract: pricing tiers, features per tier, discounts, and trial terms
4. AI visits each blog/news page and uses [`extraction_model: "article"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/article) to get recent announcements
5. AI filters articles by publication date (last 30 days)
6. AI analyzes and compares pricing structures, feature availability, and positioning
7. AI generates a competitive analysis report with feature gaps, pricing insights, and strategic recommendations
 
####   Example Response 

###### Competitive Analysis Summary

Pricing Comparison:

 | Competitor | Entry Tier | Mid Tier | Enterprise |
|---|---|---|---|
| **Ahrefs** | $99/mo | $179/mo | $399/mo |
| **SEMrush** | $119/mo | $229/mo | $449/mo |
| **Moz** | $79/mo | $149/mo | $249/mo |

Key Feature Gaps:

- **Missing:** Real-time rank tracking (Ahrefs & SEMrush have it)
- **Missing:** Content optimization AI (SEMrush exclusive)
- **Advantage:** More generous API rate limits vs. Moz
 
Recent Product Updates (Last 30 Days):

- **Ahrefs:** Launched AI-powered content brief generator (March 15)
- **SEMrush:** Added YouTube keyword research tool (March 20)
- **Moz:** Updated domain authority algorithm (March 10)
 
 

  **Pro Tip:** Schedule this prompt to run weekly to track competitor pricing changes and new feature releases automatically. Use `cache: true` with appropriate `cache_ttl` to reduce costs for frequently monitored pages. 

 

 

 

####   Scenario: Real Estate Analysis 

Research the real estate market in a specific area by aggregating property listings, analyzing price trends, and comparing neighborhoods.

####   Prompt 

    

"I'm looking to invest in rental properties in Austin, Texas. Search Zillow, Redfin, and Realtor.com for 3-bedroom properties under $500k in the following zip codes: 78704, 78702, and 78751. For each listing, extract: address, price, square footage, bedrooms/bathrooms, estimated rental income, year built, and listing URL. Create a summary showing which neighborhood has the best rental yield and price per square foot."

 

    

####   What Happens Behind the Scenes 

1. AI calls `scraping_instruction_enhanced` to get best practices for real estate scraping
2. AI constructs search URLs for each platform with filters (location, price, bedrooms)
3. AI uses `web_scrape` with [`extraction_model: "real_estate_property_listing"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/real_estate_property_listing) for listing pages
4. AI visits individual property detail pages using [`extraction_model: "real_estate_property"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/real_estate_property) to get full details
5. AI extracts key data: price, size, features, rental estimates, HOA fees, property tax
6. AI calculates rental yield: (annual rental income / property price) × 100
7. AI calculates price per square foot for each property
8. AI groups by neighborhood (zip code) and generates comparative analysis with investment recommendations
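The yield and price-per-square-foot math in steps 6-7, sketched with the first listing from the example response as hypothetical input:

```python
# Rental yield = (annual rental income / property price) * 100.
# Figures come from the illustrative 1402 E 6th St listing.
price = 399_000
monthly_rent = 2_800
sqft = 1_350

rental_yield = monthly_rent * 12 / price * 100
price_per_sqft = price / sqft

print(round(rental_yield, 1), round(price_per_sqft))  # 8.4 296
```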
 
####   Example Response 

###### Austin Real Estate Investment Analysis

Neighborhood Comparison:

 | Zip Code | Avg Price | Avg $/sqft | Est. Rental Yield | Properties Found |
|---|---|---|---|---|
| **78702** (East Austin) | $425,000 | $298 | **6.8%** | 12 |
| **78704** (South Austin) | $485,000 | $342 | 5.2% | 8 |
| **78751** (Hyde Park) | $495,000 | $365 | 4.9% | 5 |

Top Investment Opportunities:

**1. 1402 E 6th St, Austin TX 78702** - Best Value

- **Price:** $399,000 | **Size:** 1,350 sqft | **$/sqft:** $295
- **Est. Monthly Rent:** $2,800 | **Rental Yield:** 8.4%
- **Year Built:** 2018 | **HOA:** None
- [View on Zillow →](https://zillow.com/...)
 
 

**2. 3312 Govalle Ave, Austin TX 78702**

- **Price:** $415,000 | **Size:** 1,420 sqft | **$/sqft:** $292
- **Est. Monthly Rent:** $2,650 | **Rental Yield:** 7.7%
- **Year Built:** 2020 | **HOA:** $50/mo
- [View on Redfin →](https://redfin.com/...)
 
 

 

Investment Recommendation:

**Best Area: 78702 (East Austin)**

- **Highest rental yield:** 6.8% average (vs. 5.2% in 78704)
- **Best value:** $298/sqft (16% cheaper than 78751)
- **Strong rental demand:** Near downtown, UT campus, and tech offices
- **Market trend:** Appreciating 8.2% YoY based on recent sales data
 
 

 

  **Pro Tip:** Use the `screenshots` parameter to capture property photos for visual comparison. You can also chain this with review scraping using [`extraction_model: "review_list"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/review_list) to research neighborhood safety and amenities on platforms like Nextdoor or Google Maps. 

 [  Explore Real Estate Use Case ](https://scrapfly.io/use-case/real-estate-web-scraping) 

 

 

 

 

##   Advanced Workflows 

Complex scenarios that chain multiple tools and steps.

- **Market Research:** multi-step analysis
- **Lead Generation:** automated prospecting
- **Content Monitoring:** change tracking & alerts

 **Market Research:** Comprehensive product analysis across multiple e-commerce sites with sentiment analysis.

1. **Gather data** - Scrape product listings from 5 e-commerce sites
2. **Extract features** - Use [`extraction_model: "product_listing"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/product_listing)
3. **Analyze pricing** - Identify pricing patterns and outliers
4. **Check reviews** - Scrape reviews using [`extraction_model: "review_list"`](https://scrapfly.io/docs/extraction-api/automatic-ai/models/review_list)
5. **Sentiment analysis** - Analyze review sentiment
6. **Generate report** - Create comprehensive market analysis
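Step 3 (flagging pricing outliers) is more robust with a median/MAD test than with mean and standard deviation, since the outlier itself inflates the standard deviation. A toy sketch with hypothetical prices:

```python
import statistics

# Flag prices whose deviation from the median exceeds ~3 robust standard
# deviations (1.4826 scales MAD to match the stdev of normal data).
prices = [19.99, 21.50, 20.75, 22.00, 49.99]
med = statistics.median(prices)
mad = statistics.median(abs(p - med) for p in prices)
outliers = [p for p in prices if abs(p - med) > 3 * 1.4826 * mad]
print(outliers)  # [49.99]
```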
 
 

 

 

**Lead Generation:** Build and enrich contact lists automatically with data validation and scoring.

1. **Search directories** - Find companies matching criteria
2. **Extract contact info** - Get company details, founders, emails
3. **Enrich data** - Look up founders on LinkedIn
4. **Validate** - Check company websites for relevance
5. **Score leads** - Rank by fit and priority
6. **Export** - Format as CSV for CRM import
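Steps 5-6 end in ordinary data wrangling. A minimal sketch of scoring and CSV export, with made-up leads and fields:

```python
import csv
import io

# Hypothetical scored leads; in the real workflow these come from the
# directory, LinkedIn, and website-validation steps above.
leads = [
    {"company": "Acme Corp", "contact": "jane@acme.example", "score": 87},
    {"company": "Globex", "contact": "info@globex.example", "score": 64},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["company", "contact", "score"])
writer.writeheader()
# Highest-priority leads first, ready for CRM import.
writer.writerows(sorted(leads, key=lambda l: l["score"], reverse=True))
print(buf.getvalue())
```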
 
 

 

 

**Content Monitoring:** Track website changes over time with automated alerts and version archiving.

1. **Initial capture** - Take screenshot and save HTML
2. **Schedule checks** - Scrape page periodically
3. **Compare** - Detect changes in content or layout
4. **Alert** - Notify when changes detected
5. **Archive** - Store historical versions
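The comparison in step 3 can be as simple as hashing the scraped content and diffing fingerprints between runs. A minimal sketch (the persistence layer between scheduled runs is left out):

```python
import hashlib

def content_fingerprint(html: str) -> str:
    """Stable fingerprint of scraped content for change detection."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

# Hypothetical snapshots from two scheduled scrapes.
previous = content_fingerprint("<h1>Pricing: $99/mo</h1>")
current = content_fingerprint("<h1>Pricing: $89/mo</h1>")

changed = current != previous
if changed:
    print("change detected")  # hand off to the alert step
```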
 
 

 

 

 

##   Industry-Specific Use Cases 

Real-world applications across different industries.

- **E-commerce:** pricing & reviews
- **Finance:** market data
- **Recruiting:** job boards & candidates
- **Media:** news & research
- **Real Estate:** property listings
- **Travel:** hotels & flights

 - Price monitoring and competitive intelligence
- Product availability tracking
- Review aggregation and sentiment analysis
- Trend identification and market research
 
 [  Explore E-commerce Use Case ](https://scrapfly.io/use-case/ecommerce-web-scraping) 

 

 

 

- Financial news aggregation
- Stock data collection from multiple sources
- Economic indicator tracking
- Real estate listing analysis
 
 [  Explore Finance Use Case ](https://scrapfly.io/use-case/finance-web-scraping) 

 

 

 

- Job posting aggregation
- Candidate research (LinkedIn, GitHub, portfolios)
- Salary benchmarking
- Company culture research
 
 [  Explore Jobs Use Case ](https://scrapfly.io/use-case/jobs-web-scraping) 

 

 

 

- News monitoring and aggregation
- Academic paper tracking
- Social media sentiment analysis
- Event and conference tracking
 
 [  Explore Media &amp; News Use Case ](https://scrapfly.io/use-case/media-and-news-web-scraping) 

 

 

 

- Property listing aggregation
- Price trend analysis
- Neighborhood research
- Rental market analysis
 
 [  Explore Real Estate Use Case ](https://scrapfly.io/use-case/real-estate-web-scraping) 

 

 

 

- Hotel price comparison
- Flight deal monitoring
- Review aggregation for destinations
- Event and attraction research
 
 [  Explore Travel Use Case ](https://scrapfly.io/use-case/travel-web-scraping) 

 

 

 

 

##   Tips &amp; Best Practices 

Optimize your scraping workflows with these proven strategies.

#####   Best Practices 

- **Call `scraping_instruction_enhanced` first** - Get latest POW parameter
- **Be specific in prompts** - "Get product prices" beats "check website"
- **Use extraction models** - Pre-trained models are faster
- **Handle errors gracefully** - Retry with different parameters
 
 

 

 

#####   Cost Optimization 

- Use `web_get_page` for simple pages
- Disable `render_js` for static content
- Use datacenter proxies by default
- Cache frequently accessed pages
 
 

 

 

#####   Performance 

- Request multiple pages in parallel
- Use `format: "markdown"` for AI
- Set appropriate `rendering_wait`
- Use `format_options: ["only_content"]`
 
 

 

 

 

###   Ready to Build? 

 Start with a simple prompt like "Get me the top posts from Hacker News" and watch your AI use Scrapfly MCP tools to make it happen!

 

 [  Get Started ](https://scrapfly.io/docs/mcp/getting-started) 

 

 

##   Next Steps 

 [   **Get started** with your first MCP integration   ](https://scrapfly.io/docs/mcp/getting-started) [   **Explore all available tools** and their parameters   ](https://scrapfly.io/docs/mcp/tools) [   **Set up authentication** for production use   ](https://scrapfly.io/docs/mcp/authentication) [   **Read the FAQ** for common questions   ](https://scrapfly.io/docs/mcp/faq)