FAQ
Here are some of the most common issues and questions that come up when using the Scrapfly API.
How to send POST-type requests?
This can be done by calling the Scrapfly API through a POST HTTP request instead of the default GET. Scrapfly uses the same request method for scraping as it's called with. See the request method customization page for more info.
For the Scrapfly Typescript and Python SDKs, the method="POST" parameter can be used instead.
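A minimal sketch with the Python SDK (assuming the standard ScrapflyClient and ScrapeConfig interface; the target URL is just an example):

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://httpbin.dev/post",  # example target that echoes POST data
    method="POST",                   # Scrapfly forwards this request method
    body='{"query": "example"}',     # request body sent to the target
    headers={"content-type": "application/json"},
))
print(result.content)                # the scraped page content
```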
How to send cookies or custom request headers?
This can be done through the headers API parameter. Cookies can also be set using the cookies shortcut parameter in the Typescript and Python SDKs. See the request cookie customization page for more info. Note that header values have to be URL-encoded when passed as API query parameters.
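A sketch with the Python SDK, where the headers and cookies parameters are plain dictionaries and the SDK handles the encoding for you:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://httpbin.dev/headers",           # echoes received headers
    headers={"referer": "https://example.com"},  # custom request header
    cookies={"session_id": "abc123"},            # shortcut for the Cookie header
))
```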
Why are URL parameters missing in my Scrapfly requests?
If your Scrapfly scrape requests end up missing URL parameters, it's likely that you are not encoding the URL properly, as the url parameter has to be URL-encoded:
- Valid: url=https%3A%2F%2Fhttpbin.org%2Fget%3Ffoo%3D1%26bar%3D2
- Invalid: url=https://httpbin.org/get?foo=1&bar=2; here, bar=2 will be interpreted as a Scrapfly parameter, not a scrape URL parameter.
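For example, with Python's standard library (the API key is a placeholder):

```python
from urllib.parse import quote

target = "https://httpbin.org/get?foo=1&bar=2"
encoded = quote(target, safe="")  # percent-encode everything, including :/?&=
api_url = f"https://api.scrapfly.io/scrape?key=YOUR_API_KEY&url={encoded}"
# -> ...&url=https%3A%2F%2Fhttpbin.org%2Fget%3Ffoo%3D1%26bar%3D2
```

Note that most HTTP clients and the Scrapfly SDKs handle this encoding automatically when parameters are passed as a dictionary rather than concatenated into the URL string.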
Why is my scraped HTML result missing data?
There are many reasons why Scrapfly scraper might see results differently compared to your test web browser. Here are the main points.
The website could be dynamically loaded through browser javascript. If you're scraping without the render_js parameter, Scrapfly does not execute javascript, which can cause this data difference. If you are using render_js, ensure the scraper waits for the website to fully load using the wait_for_selector or rendering_wait parameters.
See this Dynamic Scraping Academy tutorial for more on dynamic pages.
Another cause could be anti-bot measures which are designed to block scrapers. For this, make sure Anti Scraping Protection bypass is enabled.
Finally, data could be missing because of a different proxy country. Try changing the proxy country parameter to match your region.
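For example, a sketch with the Python SDK combining these points (the URL and selector are illustrative):

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/reviews",
    render_js=True,               # execute javascript in a headless browser
    wait_for_selector=".review",  # wait until dynamic content appears
    country="us",                 # match the region you tested in
))
```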
How to bypass anti-bot protection?
To scrape websites protected by anti-scraping shields like Cloudflare, PerimeterX, etc., Scrapfly's Anti Scraping Protection bypass feature can be used.
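Enabling it is a single parameter; a minimal sketch with the Python SDK:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com",  # replace with the protected target
    asp=True,                   # enable Anti Scraping Protection bypass
))
```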
Why does ASP fail to bypass protection?
Scrapfly's Anti Scraping Protection bypass is powerful, but it is not a silver bullet for all cases. That said, there are a few things you can do to ensure ASP has the best chance of bypassing anti-bot systems:
The first step is to ensure the sent headers and URL parameters match what the website expects. Scrapfly manages fingerprints and all of the natural headers, but websites can set custom values through javascript or other scripting. For some examples, refer to our Scrapeground CSRF exercise tutorials.
When scraping with POST, ensure that the request body also matches the expected formatting and data values.
See the Anti Scraping Protection bypass page for more.
How to get the scraped response directly?
By default, Scrapfly returns scrape results as JSON datasets; however, the proxified_response parameter can be used to get the raw page response directly.
How to scrape with headless browsers?
Scrapfly's render_js parameter can be used to scrape with headless browsers. See Javascript Rendering page for more.
How to download images, PDFs and other binary files?
Scrapfly can be used to scrape binary data like images, PDFs, etc. The scraped result type is indicated in the result.format field, which can be TEXT or BINARY. Binary responses are base64-encoded and have to be decoded manually; you can find code examples in our online tools.
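A sketch of decoding a binary result when calling the API directly (the target URL is a placeholder; this assumes the page content sits in the result.content field alongside result.format):

```python
import base64
import requests

resp = requests.get("https://api.scrapfly.io/scrape", params={
    "key": "YOUR_API_KEY",
    "url": "https://example.com/document.pdf",  # placeholder binary target
})
result = resp.json()["result"]
if result["format"] == "BINARY":
    with open("document.pdf", "wb") as f:
        f.write(base64.b64decode(result["content"]))  # decode base64 manually
```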
Binary and text responses are billed differently. See the Billing by Response Type page for more.
What does HTTP status code 400 mean?
Response status code 400 means that the request made to the Scrapfly API is malformed. Ensure that the request parameters match the expected values and formatting described in the API specification. For more, see the Errors page for all applicable error codes related to status code 400.
To debug this, inspect the response JSON content, which describes which parameter is configured incorrectly.
Examples:
- Missing required url parameter.
- Invalid protocol in the url parameter.
What does HTTP status code 401 mean?
Response status code 401 means an authorization error. You most likely forgot to add the key parameter to your Scrapfly API calls. For more, see the Errors page for all applicable error codes related to status code 401.
What does HTTP status code 404 mean?
Response status code 404 means the API path was not found. This is most likely caused by a typo in the Scrapfly API URL.
Why am I getting Read Timeout errors?
As complex scraping operations can take a long time to execute, Scrapfly sets its read timeout to 155 seconds. Most HTTP clients have lower defaults of 30 seconds or less, so to prevent read timeout errors, set your HTTP client's read timeout to at least 155 seconds.
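For example, with the popular Python requests library:

```python
import requests

resp = requests.get(
    "https://api.scrapfly.io/scrape",
    # params values are URL-encoded automatically:
    params={"key": "YOUR_API_KEY", "url": "https://example.com"},
    timeout=(30, 155),  # (connect timeout, read timeout) in seconds
)
```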
How do I know how many API credits my scrapes use?
The credit use is calculated based on scraping details and enabled features.
- From the API response: the results are available in the X-Scrapfly-Api-Cost API response header and the context.cost data field (see the sketch after this list). Alternatively, the cost breakdown can also be found in the Monitoring dashboard under the cost field.
- From the dashboard: the cost per call and the overview are available in the dashboard monitoring section, where you can inspect the cost of each scrape. On each scrape log, you can see the cost breakdown on the right side under the cost tab.
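A sketch of reading the cost from a direct API call (header and field names as listed above):

```python
import requests

resp = requests.get(
    "https://api.scrapfly.io/scrape",
    params={"key": "YOUR_API_KEY", "url": "https://example.com"},
    timeout=(30, 155),
)
print(resp.headers["X-Scrapfly-Api-Cost"])  # credits consumed by this call
print(resp.json()["context"]["cost"])       # the same cost data in the JSON body
```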
What is concurrency?
Concurrency measures the number of requests that are currently in flight through Scrapfly. Each Scrapfly plan has a different concurrency quota and exceeding it will result in failed scrape requests.
Your available concurrency is decreased by 1 for each scheduled scrape and restored as soon as the scrape request finishes.
Related Error: ERR::SCRAPE::TOO_MANY_CONCURRENT_REQUEST
If your concurrency usage does not seem to decrease, ensure that your HTTP client's read timeout is set to at least 155 seconds.
How to check concurrency use?
To program with concurrency in mind, you can check the following API response headers (a usage sketch follows the list):
- X-Scrapfly-Account-Concurrent-Usage: indicates the current number of requests in flight (scrapes waiting for a response). Global; applies to the current Scrapfly account.
- X-Scrapfly-Account-Remaining-Concurrent-Usage: indicates the remaining concurrency limit. Global; applies to the current Scrapfly account.
- X-Scrapfly-Project-Concurrent-Usage: indicates the current number of requests in flight (scrapes waiting for a response). Scoped; applies to the selected Scrapfly project.
- X-Scrapfly-Project-Remaining-Concurrent-Usage: indicates the remaining concurrency limit. Scoped; applies to the selected Scrapfly project.
Related Error: ERR::SCRAPE::TOO_MANY_CONCURRENT_REQUEST
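A sketch of backing off when the remaining account concurrency runs low (header names as listed above):

```python
import time
import requests

def scrape(url: str) -> requests.Response:
    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params={"key": "YOUR_API_KEY", "url": url},
        timeout=(30, 155),
    )
    remaining = int(resp.headers.get("X-Scrapfly-Account-Remaining-Concurrent-Usage", "1"))
    if remaining == 0:
        time.sleep(1)  # let in-flight scrapes finish before scheduling more
    return resp
```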
What is asynchronous programming?
Languages like Javascript, Python, etc. support asynchronous programming, which allows many scrape requests to be in flight at once. This is one of the easiest ways to scale web scrapers, and both the Scrapfly Python SDK and Typescript SDK support easy concurrency through async.
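Both SDKs offer their own async helpers; as an illustration of the general pattern, here's a sketch with asyncio and httpx calling the API directly:

```python
import asyncio
import httpx

async def scrape(client: httpx.AsyncClient, url: str) -> str:
    resp = await client.get(
        "https://api.scrapfly.io/scrape",
        params={"key": "YOUR_API_KEY", "url": url},
    )
    return resp.json()["result"]["content"]

async def main():
    urls = ["https://example.com/page/1", "https://example.com/page/2"]
    # both requests are in flight at the same time:
    async with httpx.AsyncClient(timeout=155) as client:
        pages = await asyncio.gather(*(scrape(client, url) for url in urls))
    print(f"scraped {len(pages)} pages")

asyncio.run(main())
```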
What happens if I run out of Scrapfly credits?
Scrapfly's Extra Usage feature allows you to continue scraping beyond the subscription credit limit to prevent scrapers from breaking and potentially losing data. Note that Extra Usage credits are generally more expensive than subscription credits, so it's best to avoid them when the scraping load can be predicted. See the Billing page for more info.
In case of an upgrade or downgrade, any Extra API Credits billed are not retroactively recomputed against the new plan and are not refundable.
How to prevent extra usage?
From the PRO plan onward, you can go over your quota and be billed on a pay-as-you-go model; this is called extra usage. You can completely disable extra usage on a per-project basis:
- In your dashboard, in the left menu, click on project
- Select a project
- Unselect "Allow Extra Usage"
- Click update
By default, all accounts have a hard limit on extra usage to avoid any major issues: you can't exceed 125% of your quota in extra usage. That means for a quota of 1M API credits, you can perform 1.25M API credits in extra usage, for a total of 2.25M API credits.
If you reach this limit, the account is suspended and an account manager will reach out to you to figure out the situation.
How to estimate credit cost of a scrape?
Scrapfly API credit cost is calculated based on the scrape details and enabled features; for that, refer to the Billing Table page.
However, the easiest way to estimate scrape cost is to try it in our Web API Player! Each scrape request made in the web player also shows up in the monitoring dashboard, where the scrape cost breakdown can be found.
How to cancel Scrapfly subscription?
You can cancel your subscription from your dashboard: click on the account settings located on the right side of the top bar, then Billing; on the right side, in the "plan" card, there is a cancel button. Alternatively, here's the direct link to the dashboard billing page. Cancellation keeps your current subscription active until the renewal date, then downgrades you to the free plan.
How to control Scrapfly credit spending?
- Projects can be configured with spending, concurrency and extra credit use limits.
- Throttling can be customized to define a budget limit per hour, day, week or month. Note that too low a budget can block all scraping, so ensure a proper minimum is set.
- The cost_budget parameter can be used to define the maximum budget per scrape, as sketched below. Note that too low a budget can block all scraping, so ensure a proper minimum is set.
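A sketch of the last option, assuming the Python SDK mirrors the cost_budget API parameter:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com",
    asp=True,          # features like ASP can raise the per-scrape cost
    cost_budget=30,    # give up rather than spend more than 30 API credits
))
```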
What proxy types does Scrapfly support?
Scrapfly supports millions of datacenter and residential proxies which can be configured through the proxy_pool parameter. See more in the proxy documentation page.
What proxy countries does Scrapfly support?
Scrapfly supports over 50 countries for its residential and datacenter proxies. The full list can be found in the proxy documentation page and desired country can be configured through the country parameter.
How to set Scrapfly proxy country?
The country parameter can be used to set the proxy country for Scrapfly scrape requests. This parameter can also take a list of countries (separated by commas), and Scrapfly will choose a random proxy from the list for each scrape. If the parameter is not set, Scrapfly will choose a semi-random proxy from all available countries. For more, see the proxy documentation page.
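For example, a sketch with the Python SDK:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com",
    country="us,ca,gb",  # a random proxy from this list is picked per scrape
))
```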
Can Scrapfly parse HTML results?
Scrapfly focuses on scraping and leaves HTML parsing to the user's creativity. That being said, both the Scrapfly Typescript and Python SDKs have HTML parsing utilities through result.selector, which implements XPath and CSS selectors for HTML parsing.
For more on HTML parsing in web scraping see HTML Parsing Academy tutorial page.
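A sketch with the Python SDK (the target page and selectors are illustrative):

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(url="https://web-scraping.dev/products"))
# result.selector supports both CSS and XPath expressions:
titles = result.selector.css("h3 a::text").getall()
links = result.selector.xpath("//h3/a/@href").getall()
```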
Can Scrapfly parse JSON results?
Scrapfly focuses on scraping and leaves JSON parsing to the user's creativity. That said, JSON is naturally an easy datatype to parse; for more on JSON parsing in web scraping, see the JSON Parsing Academy tutorial page.
Does Scrapfly support sessions?
Yes, Scrapfly sessions can be used to persist cookies and other details between scrapes. Sessions can even be transferred between render_js and non-render_js scrapes making it an ideal optimization tool. For more see the Session page.
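A sketch with the Python SDK, assuming it mirrors the session API parameter; the URLs are placeholders:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
# the first scrape establishes the session (e.g. login cookies are stored):
client.scrape(ScrapeConfig(url="https://example.com/login", session="user-1"))
# later scrapes with the same session name reuse those cookies:
client.scrape(ScrapeConfig(url="https://example.com/account", session="user-1"))
```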
Where can I practice and test Scrapfly scrapers?
Scrapfly provides several tools for testing and developing web scrapers, such as the Web API Player and the Scrapeground exercise website.
What programming languages does Scrapfly support?
Scrapfly can be accessed in any programming language with HTTP client support. Alternatively, Scrapfly also provides Typescript and Python libraries with many convenience features.
Can Scrapfly's headless browsers be controlled to click buttons etc?
Yes, Scrapfly supports browser automation scenarios that can be used to click around websites, fill forms, etc. See the Javascript Scenario page for more.
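A sketch of what a scenario can look like; the step names below are illustrative, so refer to the Javascript Scenario page for the exact schema:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/login",
    render_js=True,  # scenarios run in the headless browser
    js_scenario=[    # illustrative steps: fill a form, submit, wait
        {"fill": {"selector": "input[name=username]", "value": "user"}},
        {"fill": {"selector": "input[name=password]", "value": "secret"}},
        {"click": {"selector": "button[type=submit]"}},
        {"wait": 2000},
    ],
))
```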
Can Scrapfly's headless browsers execute custom javascript?
Yes, Scrapfly headless browsers can execute custom user javascript code using the js parameter. See the Javascript Rendering page for more. If you're looking to automate browser actions see the Javascript Scenario page instead.
Can I use Scrapfly as a HTTP Proxy?
No, the HTTP proxy protocol does not offer enough capability to enable Scrapfly features; however, the proxified_response feature can be used to imitate HTTP proxy behavior.
Can I take screenshots with Scrapfly?
Yes, Scrapfly can take screenshots of the whole page or specific page areas indicated by CSS or XPath selectors. For more see the Screenshot page.
Where can I see history of my scrape requests?
Scrapfly provides a Monitoring dashboard where all scrape requests are logged.
Why do my scrape request details differ from my scrape config?
Scrapfly can automatically adjust scrape details like request headers to avoid anti-bot detection when Anti Scraping Protection bypass is enabled. These details shouldn't affect the web scraping results and just ensure that the scrape request is not blocked.
Can I throttle and slow down my Scrapfly scrape requests?
Yes, Scrapfly supports Throttling, which allows you to set scraping speed and budget limits scoped by Scrapfly project or scrape URL patterns.
Can I cache Scrapfly scrape requests?
Yes, Scrapfly supports Cache which can cache scrape results on Scrapfly servers. Any subsequent scrape requests will be served from the cache for a set amount of time or until the cache is explicitly cleared. In short, see the cache and cache_ttl parameters.
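A minimal sketch with the Python SDK:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com",
    cache=True,       # store this result on Scrapfly servers
    cache_ttl=3600,   # serve it from cache for up to 1 hour
))
```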
How much does cache cost and how long does it last?
Scrapfly cache is provided free of charge and stores data for the amount of time defined by the Scrapfly plan's log retention policy. Alternatively, the cache_ttl parameter can be used to control the cache TTL.
See the Cache page for more info.
Does Scrapfly support pre-built scrapers for popular targets?
The web scraping process with Scrapfly is easy enough that, instead of pre-built scrapers, Scrapfly provides reference implementations on Github for popular targets like Amazon, Google, Instagram, real estate listing websites, etc. This keeps the Scrapfly API simple and easy to use while providing developers with the needed flexibility.
The tutorials for these targets can also be found on our blog with tag #scrapeguide.
Where can I learn about web scraping?
Scrapfly provides many educational resources for learning web scraping. The best place to start is our Scrapfly Academy which follows a step-by-step roadmap for learning everything about web scraping.
Additionally, we publish a lot of tutorials, guides and industry highlights on Scrapfly Blog. We also cover common issues and questions in our Scrapfly Knowledgebase.