FAQ


Here are some of the most common issues and questions that come up when using the Scrapfly API.

How to send POST requests?

This can be done by calling the Scrapfly API with a POST HTTP request instead of the default GET. Scrapfly scrapes with the same request method it is called with. See the request method customization page for more info.

For the Scrapfly TypeScript and Python SDKs, the method="POST" parameter can be used instead.

How to send cookies or custom request headers?

This can be done through the headers API parameter. Cookies can also be set using the cookies shortcut parameter in the TypeScript and Python SDKs. See the request cookie customization page for more info.

Note that header values have to be URL-encoded.

Why are URL parameters missing in my Scrapfly requests?

If your Scrapfly scrape requests end up missing URL parameters, the most likely cause is that the target URL is not URL-encoded properly. The value of the url parameter has to be URL-encoded:

Valid: url=https%3A%2F%2Fhttpbin.org%2Fget%3Ffoo%3D1%26bar%3D2
Invalid: url=https://httpbin.org/get?foo=1&bar=2. Here, bar=2 is interpreted as a Scrapfly parameter, not as a scrape URL parameter.
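The difference can be reproduced with a minimal Python sketch that uses only the standard library. The parse_qs step only imitates how a generic query-string parser splits the unencoded form; it is an illustration and not Scrapfly's actual parsing code.

```python
from urllib.parse import parse_qs, urlencode

target = "https://httpbin.org/get?foo=1&bar=2"

# urlencode() percent-encodes the whole target URL, so it travels
# as a single value of the `url` parameter.
encoded = urlencode({"url": target})
print(encoded)

# Without encoding, a generic query-string parser splits on "&",
# so bar=2 becomes a separate top-level parameter.
unencoded = parse_qs("url=" + target)
print(unencoded)
```

In the second case the url value ends at foo=1, which is why the parameter appears to go missing from the scrape.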

Why is my scraped HTML result missing data?

There are several reasons why the Scrapfly scraper might see a page differently than your test web browser. The main ones are:

The website could be dynamically loaded through browser JavaScript. If you're scraping without the render_js parameter, Scrapfly does not execute JavaScript, which can cause the difference. If you are using render_js, ensure the scraper waits for the website to fully load using the wait_for_selector or rendering_wait parameters. See this Dynamic Scraping Academy tutorial for more on dynamic pages.

Another cause could be anti-bot measures which are designed to block scrapers. For this, make sure Anti Scraping Protection bypass is enabled.

Finally, data could be missing because of a different proxy country. Try changing the proxy country parameter to match your region.

How to bypass anti-bot protection?

To scrape websites protected by anti-scraping shields like Cloudflare, PerimeterX, etc., use Scrapfly's Anti Scraping Protection bypass feature.

Why does ASP fail to bypass protection?

Scrapfly's Anti Scraping Protection (ASP) bypass is powerful but not a silver bullet for all cases. There are a few things you can do to give ASP the best chance at bypassing anti-bot systems:

The first step is to ensure that the sent headers and URL parameters match what the website expects. Scrapfly does manage the fingerprint and natural-looking headers, but websites can set custom values through JavaScript or other scripting. For some examples, refer to our Scrapeground CSRF and CSRF exercise tutorials.

When scraping with POST, ensure that the request body also matches the expected formatting and data values.

See the Anti Scraping Protection bypass page for more.

How to get the scraped response directly?

By default, Scrapfly returns scrape results as JSON datasets. However, the proxified_response parameter can be used to get the raw page response directly.

How to scrape with headless browsers?

Scrapfly's render_js parameter can be used to scrape with headless browsers. See the Javascript Rendering page for more.

How to download images, PDFs and other binary files?

Scrapfly can be used to scrape binary data like images, PDFs, etc. The scraped result type is indicated in the result.format field, which can be TEXT or BINARY.

Binary responses are base64-encoded and have to be decoded manually. Our online tools include some code examples.
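As a rough sketch of the manual decoding step in Python: the function below takes the result object of a parsed API response and returns the body as bytes. The content field name and the sample payload are assumptions made for illustration; only result.format and its TEXT and BINARY values come from the text above.

```python
import base64


def decode_result_content(result: dict) -> bytes:
    """Return the scraped body as bytes, decoding base64 for BINARY results."""
    content = result["content"]  # assumed field name for the scraped body
    if result.get("format", "").upper() == "BINARY":
        return base64.b64decode(content)
    # TEXT results are already plain text; encode them for a uniform return type.
    return content.encode("utf-8")


# Hypothetical stand-in for the `result` object of a real API response.
sample = {
    "format": "BINARY",
    "content": base64.b64encode(b"%PDF-1.7").decode("ascii"),
}
pdf_bytes = decode_result_content(sample)
```

The returned bytes can then be written to disk with a file opened in binary mode.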

Binary and text responses are billed differently. See the Billing by Response Type page for more.

What does HTTP status code 400 mean?

Response status code 400 means that the request made to the Scrapfly API is malformed. Ensure that the request parameters match the expected values and formatting described in the API specification. See the Errors page for all applicable error codes related to status code 400.

To debug this, check the response JSON content, which describes which parameter is configured incorrectly.

Examples

Missing required url parameter:

    {
        "error_id": "26490351-0ff4-4b13-b885-3fbf115e2a89",
        "http_code": 400,
        "links": null,
        "message": "Missing mandatory `url` parameter e.g: url=https://google.com",
        "reason": "Bad Request"
    }

Invalid protocol in the url parameter:

    {
        "error_id": "c318989d-6928-48e8-be4f-07a3da549e3d",
        "http_code": 400,
        "links": null,
        "message": "Invalid uri protocol `url` parameter must begin with http:// or https://, given httpz://httpbin.dev/get",
        "reason": "Bad Request"
    }

What does HTTP status code 401 mean?

Response status code 401 means an authorization error. You most likely forgot to add the key parameter to your Scrapfly API calls. See the Errors page for all applicable error codes related to status code 401.

Examples

    {
        "status": "error",
        "http_code": 401,
        "reason": "Unauthorized",
        "error_id": "301e2d9e-b4f5-4289-85ea-e452143338df",
        "message": "Invalid API key"
    }

What does HTTP status code 404 mean?

Response status code 404 means the API path was not found. This is most likely caused by a typo in the Scrapfly API URL.

Why am I getting Read Timeout errors?

As complex scraping operations can take a long time to execute, Scrapfly sets its read timeout to 155 seconds. Most HTTP clients have lower defaults of 30 seconds or less. To prevent read timeouts, set your HTTP client's read timeout to at least 155 seconds.
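A minimal sketch of raising the timeout with Python's standard library HTTP client. The fetch helper and the constant name are made up for this example; other clients expose an equivalent timeout setting under their own names.

```python
import urllib.request

# 155 seconds matches the Scrapfly read timeout described above.
SCRAPFLY_READ_TIMEOUT = 155


def fetch(api_url: str, timeout: float = SCRAPFLY_READ_TIMEOUT) -> bytes:
    """Fetch a fully built Scrapfly API URL.

    `timeout` is passed to urlopen(), which applies it to blocking
    socket operations such as connecting and reading the response.
    """
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        return response.read()
```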

How do I know how many API credits my scrapes use?

The credit use is calculated based on scraping details and enabled features.

From the API response: The cost is available in the X-Scrapfly-Api-Cost API response header and the context.cost data field. Alternatively, the cost breakdown can also be found in the Monitoring dashboard under the cost field.
From the dashboard: The cost per call and the overview are available in the dashboard monitoring section, where you can inspect the cost of each scrape. On each scrape log, you can see the cost breakdown on the right side under the cost tab.
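For the API response route, a small Python sketch of reading the cost header from a response's headers. It assumes the header value is a plain integer string, which is not confirmed here; the api_cost helper name is hypothetical.

```python
from typing import Mapping, Optional


def api_cost(headers: Mapping[str, str]) -> Optional[int]:
    """Return the credit cost reported in X-Scrapfly-Api-Cost, if present."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-scrapfly-api-cost")
    # Assumption: the value is an integer number of credits.
    return int(raw) if raw is not None else None
```

Summing this value over all responses gives a running total of credits spent by a scraper.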

What is concurrency?

Concurrency measures the number of requests that are currently in flight through Scrapfly. Each Scrapfly plan has a different concurrency quota and exceeding it will result in failed scrape requests.

The remaining concurrency is decreased by 1 for each scheduled scrape and is restored as soon as the scrape request finishes.

Related Error: ERR::SCRAPE::TOO_MANY_CONCURRENT_REQUEST

If your concurrency usage is not going down as expected, ensure that your HTTP client's read timeout is set to at least 155 seconds.

How to check concurrency use?

To program with concurrency in mind, you can check the following API response headers:

X-Scrapfly-Account-Concurrent-Usage: Indicates the current number of requests in flight (scrapes waiting for a response). Global, applied to the current Scrapfly account.
X-Scrapfly-Account-Remaining-Concurrent-Usage: Indicates the remaining concurrency limit. Global, applied to the current Scrapfly account.
X-Scrapfly-Project-Concurrent-Usage: Indicates the current number of requests in flight (scrapes waiting for a response). Scoped, applied to the selected Scrapfly project.
X-Scrapfly-Project-Remaining-Concurrent-Usage: Indicates the remaining concurrency limit. Scoped, applied to the selected Scrapfly project.

Related Error: ERR::SCRAPE::TOO_MANY_CONCURRENT_REQUEST
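One way to use the four headers listed above is to collect them into a small snapshot after each response and check it before scheduling more work. The sketch below assumes the header values are integer strings; the function and field names are hypothetical.

```python
from typing import Dict, Mapping

# Map short field names (made up for this sketch) to the documented headers.
HEADER_FIELDS = {
    "account_in_flight": "X-Scrapfly-Account-Concurrent-Usage",
    "account_remaining": "X-Scrapfly-Account-Remaining-Concurrent-Usage",
    "project_in_flight": "X-Scrapfly-Project-Concurrent-Usage",
    "project_remaining": "X-Scrapfly-Project-Remaining-Concurrent-Usage",
}


def concurrency_snapshot(headers: Mapping[str, str]) -> Dict[str, int]:
    """Extract whichever concurrency headers are present as integers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        field: int(normalized[header.lower()])
        for field, header in HEADER_FIELDS.items()
        if header.lower() in normalized
    }


def has_free_slot(snapshot: Mapping[str, int]) -> bool:
    """A new scrape needs spare capacity at both account and project level."""
    return (
        snapshot.get("account_remaining", 0) > 0
        and snapshot.get("project_remaining", 0) > 0
    )
```

Treating a missing header as zero remaining capacity is a deliberately conservative choice in this sketch.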

What is asynchronous programming?

Languages like JavaScript, Python, etc. support asynchronous programming, which makes it easy to run many connections concurrently. This is one of the easiest ways to scale web scrapers, and both the Scrapfly Python SDK and TypeScript SDK support easy concurrency through async.
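To illustrate the idea without depending on any SDK, here is a generic asyncio sketch that runs several scrapes concurrently while capping how many are in flight at once. The fake_scrape coroutine is a placeholder for a real SDK or HTTP call, and the URLs and the limit of 2 are arbitrary example values.

```python
import asyncio
from typing import List


async def fake_scrape(url: str) -> str:
    """Placeholder for a real scrape call; just waits briefly."""
    await asyncio.sleep(0.01)
    return f"scraped {url}"


async def scrape_all(urls: List[str], limit: int) -> List[str]:
    # The semaphore lets at most `limit` scrapes run at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_scrape(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


urls = [f"https://web-scraping.dev/product/{i}" for i in range(1, 6)]
results = asyncio.run(scrape_all(urls, limit=2))
```

Setting the limit at or below the plan's concurrency quota keeps a scraper from exceeding it.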

What happens if I run out of Scrapfly credits?

Scrapfly Pay As You Go (PAG) allows you to continue scraping beyond the subscription credit limit to prevent scrapers from breaking and potentially losing data. Note that PAG usage is generally more expensive than subscription credits, so it's best to avoid it if the scraping load can be predicted. See the Billing page for more info.

In case of an upgrade or downgrade, any ongoing PAG usage is billed at the rate of the plan it was recorded under and is not refundable.

How to prevent PAG?

Starting from the PRO plan, you can go over your quota and be billed on the PAG model. You can completely disable PAG on a per-project basis:

In your dashboard, in the left menu, click on project
Select a project
Unselect "Allow PAG"
Click update

By default, all accounts include a hard limit for Pay As You Go to avoid any major issues. Pay As You Go usage is capped at 125% of your monthly quota. You will receive a warning notification when you reach 100% of your quota.

Example: If you have 1,000,000 API Credits per month, you can use up to 1,250,000 API Credits on PAG, for a total of 2,250,000 credits.
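The same arithmetic as a short Python sketch, in case you want to compute the cap for a different quota. The pag_limits helper is hypothetical; the 125% figure is the default cap stated above.

```python
from typing import Tuple


def pag_limits(monthly_quota: int, cap_percent: int = 125) -> Tuple[int, int]:
    """Return (PAG cap, total usable credits) for a monthly quota."""
    # Integer arithmetic avoids floating point rounding on large quotas.
    pag_cap = monthly_quota * cap_percent // 100
    return pag_cap, monthly_quota + pag_cap


pag_cap, total = pag_limits(1_000_000)
print(pag_cap, total)  # 1250000 2250000
```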

If you have an ongoing operation that will reach this limit, you can contact us to increase the limit exceptionally. If you are reaching this limit on your ENTERPRISE plan, you can contact us to create a custom plan.

How to estimate API Credit cost of a scrape?

Scrapfly API credit cost is calculated based on the scrape details and enabled features. For that refer to the Billing Table page.

However, the easiest way to estimate scrape cost is to try it in our Web API Player, which displays the estimate for a given configuration on the right side.
Each scrape request made in the web player also shows up in the monitoring dashboard, where the scrape cost breakdown can be found.

How to cancel Scrapfly subscription?

You can cancel your subscription from your dashboard: click on the account settings located on the right side of the top bar, then on billing. On the right side, the "plan" card has a cancel button. Alternatively, here's the direct link to the dashboard billing page. After cancellation, your current subscription stays active until the renewal date and is then downgraded to the free plan.

How to control Scrapfly credit spending?

Projects can be configured with spending, concurrency and PAG use limits.
Throttling can be customized to define a budget limit per hour, day, week or month. Note that too low a budget can block all scraping, so ensure a proper minimum is set.
The cost_budget parameter can be used to define the maximum budget per scrape. Note that too low a budget can block all scraping, so ensure a proper minimum is set.

What proxy types does Scrapfly support?

Scrapfly supports millions of datacenter and residential proxies which can be configured through the proxy_pool parameter. See more in the proxy documentation page.

What proxy countries does Scrapfly support?

Scrapfly supports over 50 countries for its residential and datacenter proxies. The full list can be found in the proxy documentation page, and the desired country can be configured through the country parameter.

How to set Scrapfly proxy country?

The country parameter can be used to set the proxy country for Scrapfly scrape requests. This parameter can also take a list of countries (separated by commas), and Scrapfly will choose a random proxy from the list for each scrape. If the parameter is not set, Scrapfly will choose a semi-random proxy from all available countries. For more, see the proxy documentation page.
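A sketch of building such a request URL in Python with a comma-separated country list. The endpoint address, the lowercase two-letter country codes, and the build_scrape_url helper are assumptions made for illustration; check the API specification and the proxy documentation page for the exact values.

```python
from typing import Sequence
from urllib.parse import urlencode

# Assumed endpoint, not stated in this FAQ; verify against the API specification.
API_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_url(key: str, url: str, countries: Sequence[str] = ()) -> str:
    """Build a scrape request URL with an optional list of proxy countries."""
    params = {"key": key, "url": url}
    if countries:
        # Multiple countries are passed as one comma-separated value.
        params["country"] = ",".join(countries)
    # urlencode() also takes care of URL-encoding the target URL.
    return f"{API_ENDPOINT}?{urlencode(params)}"


request_url = build_scrape_url("YOUR_API_KEY", "https://httpbin.dev/get", ["us", "ca"])
print(request_url)
```

Leaving the countries argument empty omits the parameter, so Scrapfly picks the proxy country itself.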

Can Scrapfly parse HTML results?

Yes, Scrapfly's Extraction API can parse results using AI models, LLM engines or predefined JSON templates.

For more on HTML parsing in web scraping see HTML Parsing Academy tutorial page.

Can Scrapfly parse JSON results?

Yes, Scrapfly's Extraction API can parse results using AI models, LLM engines or predefined JSON templates.

For more on JSON parsing in web scraping see JSON Parsing Academy tutorial page.

Does Scrapfly support sessions?

Yes, Scrapfly sessions can be used to persist cookies and other details between scrapes. Sessions can even be transferred between render_js and non-render_js scrapes making it an ideal optimization tool. For more see the Session page.

Where can I practice and test Scrapfly scrapers?

Scrapfly provides several tools for testing and developing web scrapers:

httpbin.dev is an HTTP testing service that can be used to simulate various HTTP responses.

web-scraping.dev is a mock e-commerce website designed to demonstrate common web scraping patterns.

Scrapfly Academy features web scraping lessons and exercises.

What programming languages does Scrapfly support?

Scrapfly can be accessed from any programming language with HTTP client support. Alternatively, Scrapfly also provides TypeScript and Python libraries with many convenience features.

Can Scrapfly's headless browsers be controlled to click buttons etc?

Yes, Scrapfly supports browser automation scenarios that can be used to click around websites, fill forms, etc. See the Javascript Scenario page for more.

Does Scrapfly support no-code integrations?

Yes, Scrapfly supports several no-code integrations like Zapier, Make and N8N.

Does Scrapfly integrate with LLM frameworks?

Yes, Scrapfly integrates with LLM development frameworks like LangChain and LlamaIndex, which allows creating LLM agents capable of using Scrapfly APIs and services.

Can Scrapfly's headless browsers execute custom javascript?

Yes, Scrapfly headless browsers can execute custom user javascript code using the js parameter. See the Javascript Rendering page for more. If you're looking to automate browser actions see the Javascript Scenario page instead.

Can I use Scrapfly as an HTTP Proxy?

No, the HTTP proxy protocol does not offer enough capability to enable Scrapfly features. However, the proxified_response feature can be used to imitate HTTP proxy behavior.

Can I take screenshots with Scrapfly?

Yes, Scrapfly can take screenshots of the whole page or specific page areas indicated by CSS or XPath selectors. For more see the Screenshot page.

Where can I see history of my scrape requests?

Scrapfly provides a Monitoring dashboard where all scrape requests are logged.

Why do my scrape request details differ from my scrape config?

Scrapfly can automatically adjust scrape details like request headers to avoid anti-bot detection when Anti Scraping Protection bypass is enabled. These details shouldn't affect the web scraping results and just ensure that the scrape request is not blocked.

Can I throttle and slow down my Scrapfly scrape requests?

Yes, Scrapfly supports Throttling, which allows setting scraping speed and budget scoped by Scrapfly project or scrape URL patterns.

Can I cache Scrapfly scrape requests?

Yes, Scrapfly supports Cache which can cache scrape results on Scrapfly servers. Any subsequent scrape requests will be served from the cache for a set amount of time or until the cache is explicitly cleared. In short, see the cache and cache_ttl parameters.

How much does cache cost and how long does it last?

Scrapfly cache is provided free of charge and stores data for the amount of time defined by Scrapfly plan's log retention policy. Alternatively, the cache_ttl parameter can be used to control cache TTL.

See the Cache page for more info.

Does Scrapfly support pre-built scrapers for popular targets?

Scrapfly does not provide prebuilt scrapers. The web scraping process with Scrapfly is easy enough that Scrapfly instead provides reference implementations on Github for popular targets like Amazon, Google, Instagram, real estate listing websites, etc. This keeps the Scrapfly API simple and easy to use and gives developers the flexibility they need.

The tutorials for these targets can also be found on our blog with tag #scrapeguide.

Where can I learn about web scraping?

Scrapfly provides many educational resources for learning web scraping. The best place to start is our Scrapfly Academy which follows a step-by-step roadmap for learning everything about web scraping.

Additionally, we publish a lot of tutorials, guides and industry highlights on Scrapfly Blog. We also cover common issues and questions in our Scrapfly Knowledgebase.

What AI extraction object models does Scrapfly support?

Scrapfly's Extraction API supports the most common data objects encountered on the web, like products, articles, reviews, etc. For the full list of supported objects, see the available extraction models.
