Unblocking the Insites spider

In rare circumstances, Insites may be blocked from downloading the pages of a website by an anti-bot filter.

This is usually because Insites’s requests originate in the AWS (Amazon Web Services) cloud, which some filtering systems block. This article gives the technical information needed to unblock Insites’s requests in your filtering system. Note that you can only unblock Insites on websites whose filtering you control.

Unblocking Insites

The best way to unblock Insites requests is to configure your filtering system to allow any request carrying the following HTTP request header, which Insites sends with every request it makes:

X-BUSINESS-ANALYSER: Insites
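If your filtering is done in your own application code rather than in a third-party product, the header check can be sketched as below. This is a minimal illustration, not an official Insites integration: the function name `is_insites_request` and the plain dictionary of headers are assumptions made for the example, and the value is compared as an exact match because the header is documented above as "Insites". Header names are compared case-insensitively, as HTTP requires.

```python
from typing import Mapping

# Header name and value taken from this article.
INSITES_HEADER = "X-BUSINESS-ANALYSER"
INSITES_VALUE = "Insites"


def is_insites_request(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers carry the Insites header.

    `headers` is a hypothetical mapping of header names to values, as most
    web frameworks can provide. Header names are matched case-insensitively;
    the value must equal "Insites" once surrounding whitespace is removed.
    """
    wanted = INSITES_HEADER.lower()
    return any(
        name.lower() == wanted and value.strip() == INSITES_VALUE
        for name, value in headers.items()
    )


if __name__ == "__main__":
    print(is_insites_request({"x-business-analyser": "Insites"}))
    print(is_insites_request({"User-Agent": "Mozilla/5.0"}))
```

A filter could call a check like this before applying its anti-bot rules and let matching requests through.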

Using Cloudflare

If your website is proxied via Cloudflare, there may be rules in place that “challenge” visitors based on criteria such as IP address or country. Challenges include a JavaScript challenge, where the browser shows a Cloudflare loading page while it checks that the browser is “real” before letting the visitor proceed to the website, and a CAPTCHA-style challenge, where the visitor is asked to check a box or complete a verification task first.

Prospect cannot complete these challenges, so it will likely fail them and be blocked by Cloudflare’s firewall rules. This typically results in a 503 error in Prospect.

To resolve this, the website owner needs to add an exception to their Cloudflare firewall settings to allow requests from AWS through the firewall. This can be done using the Autonomous System Number (ASN) for AWS, which is currently:

AS14618

There are a number of ways to do this within Cloudflare, and we recommend following their official firewall documentation for the most up-to-date guidance. However, at the time of writing, it can be done as follows:

  1. Log in to your Cloudflare account
  2. Select the website you want to unblock AWS for
  3. Select “Firewall”
  4. Go to the “Firewall Rules” tab
  5. Select “Create a Firewall rule”
  6. Provide a suitable rule name – e.g. Allow AWS for Prospect
  7. From the “Field” drop-down box, choose AS Num
  8. From the “Operator” drop-down box, choose equals
  9. In the “Value” field, enter the ASN for AWS, which is AS14618
  10. In the “Choose an action” drop-down, choose Allow, then click the “Deploy” button.

Alternatively, if you prefer to enter the expression manually (rather than using the expression builder described above), add:

(ip.geoip.asnum eq 14618)

And choose Allow from the action drop-down menu.

This will then allow any request from AWS through Cloudflare’s firewall and should allow Prospect to analyse the website. Please note that this will also allow other AWS requests through the firewall, not just those from Prospect.
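The relationship between the ASN written as AS14618 and the manual expression shown above can be sketched in a few lines of Python. This is only an illustration: the helper name `asn_allow_expression` is hypothetical, and the assumption (based on the expression in this article) is that the expression takes the numeric part of the ASN without the "AS" prefix.

```python
def asn_allow_expression(asn: str) -> str:
    """Build a firewall expression of the form shown in this article.

    Accepts an ASN written either as "AS14618" or as "14618" and returns
    the expression string with the numeric part only. This is a
    hypothetical helper for illustration, not part of any Cloudflare API.
    """
    text = asn.strip().upper()
    if text.startswith("AS"):
        text = text[2:]  # drop the "AS" prefix, keep the number
    if not text.isdigit():
        raise ValueError(f"not a valid ASN: {asn!r}")
    return f"(ip.geoip.asnum eq {int(text)})"


if __name__ == "__main__":
    print(asn_allow_expression("AS14618"))
```

If the ASN given in this article changes in future, only the number inside the expression should need updating.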

What doesn’t work

Insites’s server IPs are not static and change daily, so allowing specific IP addresses is not reliable. The best way to unblock Insites is via request data, such as the HTTP header described above.

Matching on the user agent is not ideal either. To get past some filtering tools, Insites uses a user agent copied from a real browser, so it cannot be told apart from ordinary browser traffic, and it is subject to change. The current user agent for Insites is:

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36

If you need further assistance, please contact your Insites account manager.
