Unblocking the Insites spider
In rare circumstances, Insites may be blocked from downloading the pages of a website by an anti-bot filter.
This is usually because Insites’s requests originate in the AWS (Amazon Web Services) cloud, which some filtering systems block. This article gives the necessary technical information to unblock Insites’s requests in your filtering system. Note – it is only possible to unblock websites for which you control the filtering.
The best way to unblock Insites requests is to look for the following HTTP request header which is part of every request Insites makes:
The issue with the above is that Prospect cannot overcome this, so will likely fail the challenge and be blocked by Cloudflare’s firewall rules. This typically results in a 503 error in Prospect.
To resolve this, the website owner needs to add an exception to their Cloudflare firewall settings to allow requests from AWS through the firewall. This can be done using the Autonomous System Number (ASN) for AWS, which is currently 14618.
There are a number of ways to do this within Cloudflare, and we recommend following their official firewall documentation for the most up-to-date guidance. However, as of the time of writing, this could be done as follows:
- Log in to your Cloudflare account
- Select the website you want to unblock AWS for
- Select “Firewall”
- Go to the “Firewall Rules” tab
- Select “Create a Firewall rule”
- Provide a suitable rule name – e.g. Allow AWS for Prospect
- From the “Field” drop-down box, choose the AS number field (AS Num)
- From the “Operator” drop-down box, choose equals
- In the “Value” field, enter the ASN for AWS, which is 14618
- Then in the “Choose an action” drop-down, choose Allow before hitting the “Deploy” button.
Alternatively, if you prefer to add the expression manually (rather than using the expression builder described above), enter:
(ip.geoip.asnum eq 14618)
Then choose Allow from the action drop-down menu.
This will then allow any request from AWS through Cloudflare’s firewall and should allow Prospect to analyse the website. Please note that this will also allow other AWS requests through the firewall, not just those from Prospect.
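The short Python sketch below illustrates what the rule does. It is not Cloudflare code: the function names are hypothetical, and the local check only mimics the behaviour of the expression shown above. It shows why the rule admits every request from the AWS ASN rather than only Prospect's.

```python
AWS_ASN = 14618  # ASN used in the firewall expression in this article


def build_asn_expression(asn: int) -> str:
    """Build a firewall expression string that matches a single ASN."""
    if isinstance(asn, bool) or not isinstance(asn, int) or asn <= 0:
        raise ValueError("ASN must be a positive integer")
    return f"(ip.geoip.asnum eq {asn})"


def rule_allows(request_asn: int, allowed_asn: int = AWS_ASN) -> bool:
    """Local stand-in for the rule: allow when the request's ASN matches."""
    return request_asn == allowed_asn


if __name__ == "__main__":
    print(build_asn_expression(AWS_ASN))
    # Any request from the AWS ASN passes, whichever service sent it.
    print(rule_allows(14618))
    # A request from a different network (hypothetical ASN) is not covered.
    print(rule_allows(64512))
```

Because the match is on the network alone, the rule cannot distinguish Prospect from any other service hosted on the same ASN.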
What doesn’t work
Insites’s server IPs are not static and change daily, so unblocking by IP address does not work reliably. The best way to unblock Insites is via request data.
Unfortunately, matching on the user agent is also not ideal. To circumvent some filtering tools, Insites uses a user agent copied from a real browser, and this value is subject to change. The current user agent for Insites is:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
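To show why a user agent rule is fragile, here is a minimal Python sketch of an exact-match check. The function name and the way headers are passed in are assumptions made for illustration. This is not how any particular filtering product works.

```python
from typing import Mapping

# User agent quoted in this article; it may change without notice.
INSITES_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36"
)


def matches_insites_user_agent(headers: Mapping[str, str]) -> bool:
    """Return True when the User-Agent header equals the quoted string."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "user-agent":
            return value.strip() == INSITES_USER_AGENT
    return False


if __name__ == "__main__":
    current = {"User-Agent": INSITES_USER_AGENT}
    changed = {"User-Agent": INSITES_USER_AGENT.replace("Chrome/42", "Chrome/43")}
    print(matches_insites_user_agent(current))
    print(matches_insites_user_agent(changed))
```

The check stops matching as soon as the string changes, and it also matches any real browser that sends the same string, so it cannot identify Insites on its own.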
If you need further assistance, please contact your Insites account manager.