Spamhaus Intelligence API

The Spamhaus Intelligence API (SIA) is a framework that gives users convenient access to different datasets, so that the data can be used flexibly across multiple applications, services and products.

The framework comprises several API endpoints. One authenticates users and issues a token; this token is then used to query the other endpoints.

The following is a list of all available endpoints and how to use them.

Login

To access the API, each request must have a proper Authorization header.

The Authorization header must be in the form:

Authorization: Bearer <AUTH TOKEN>

<AUTH TOKEN> is a string obtained through the following login API call.

To obtain a token, send a POST request to the Login API at the URL path /api/v1/login. The request body must be a JSON object containing the username, password and realm fields with the correct values, as the following example shows:

curl -s -d '{"username":"user@example.com", "password":"m4g1c", "realm":"intel"}' \
	https://api.spamhaus.org/api/v1/login

username - the email address associated with your SIA account
password - the SIA password assigned to that user
realm - always intel as far as SIA is concerned

In the case of success, the HTTP status code will be 200 and the body will contain a JSON object similar to the following one:

{
  "code": 200,
  "token": "eyJ0eXAi[......]dx2UTSGcyEKvU",
  "expires": 1583252180
}

A successful response includes an “expires” integer field holding the unix timestamp at which the token expires. Tokens are usually valid for 24 hours.

In the case of authentication failure, the API will return a different status code and an associated error message as part of the JSON object:

{
  "code": 401,
  "message": "Authentication failed"
}
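
The login exchange above can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names build_login_request and parse_login_response are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

LOGIN_URL = "https://api.spamhaus.org/api/v1/login"


def build_login_request(username, password, realm="intel"):
    """Build (but do not send) the POST request for the login endpoint."""
    body = json.dumps(
        {"username": username, "password": password, "realm": realm}
    ).encode("utf-8")
    # No Content-Type header is set explicitly, mirroring the curl example.
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


def parse_login_response(raw_body):
    """Return (token, expires) on success; raise if the body reports an error."""
    payload = json.loads(raw_body)
    if payload.get("code") != 200:
        raise PermissionError(payload.get("message", "login failed"))
    return payload["token"], payload["expires"]
```

To send the request, pass it to urllib.request.urlopen and feed the response body to parse_login_response. Note that urlopen raises urllib.error.HTTPError for non-2xx responses, so in the failure case the JSON error object has to be read from the exception instead.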

When an auth token has expired, any subsequent API request will result in a 401 Unauthorized HTTP response code. In this case, a new auth token is needed before sending further API requests.

However, there is no need to wait until a token has expired before requesting a new one. We recommend refreshing the token when the current one is close to expiration.

NOTE: Please do NOT request an auth token for each API request. To protect against brute force attacks this action is penalized and may result in API access being temporarily blocked. Additionally, it will make the service extremely slow due to the added latency introduced by the repeated authentication sessions.
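
One way to follow this advice is to cache the token and request a new one only when the cached one is close to expiration. The sketch below is an assumption-laden illustration: TokenCache is a hypothetical class, fetch_token stands for whatever function performs the login call and returns (token, expires), and the 300-second refresh margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Reuse one auth token until it is close to its 'expires' timestamp."""

    def __init__(self, fetch_token, refresh_margin=300, clock=time.time):
        self._fetch_token = fetch_token      # callable returning (token, expires)
        self._refresh_margin = refresh_margin  # seconds before expiry to refresh
        self._clock = clock                  # injectable for testing
        self._token = None
        self._expires = 0

    def get(self):
        """Return the cached token, logging in again only when needed."""
        if self._token is None or self._clock() >= self._expires - self._refresh_margin:
            self._token, self._expires = self._fetch_token()
        return self._token
```

Every API request then calls get() for its Authorization header, so a login happens roughly once per token lifetime instead of once per request.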

Limits and counters

Access to SIA comes with limits to the number of queries that can be run over specified time frames.
The limits API endpoint reports the maximum query limits applied to the account in use and the number of queries used so far against each limit, e.g. 24 hours and 30 days.

Usage example:

curl -sH 'Authorization: Bearer <AUTH TOKEN>' \
        https://api.spamhaus.org/api/intel/v1/limits

This query returns a JSON object like this:

{
  "status": 200,
  "account": {
    "sub": "3534543",
    "usr": "user@example.com"
  },
  "limits": {
    "ads": "XBL,BCL,CSS",
    "trs": "base",
    "qms": 1000,
    "qmh": 1500,
    "rl_qph": 3600,
    "rl_qpm": 60,
    "rl_qps": 1
  },
  "current": {
    "qpm": 18,
    "qpd": 18,
    "rl_qph": 5,
    "rl_qpm": 0,
    "rl_qps": 0
  }
}

Field explanations:

code - will be 200 in the case of success; otherwise an error occurred
account - an object which shows the account properties:
- sub - shows the account subscription identifier
- usr - shows the account username
limits - an object containing the global limits of the account
- ads - allowed queries datasets (comma separated list)
- trs - identifier of the access level, defaults to “base”
- qms - an integer showing the max number of queries per month allowed (soft limit)
- qmh - an integer showing the max number of queries per month allowed (hard limit)
- rl_qph - an integer showing the rate limit applied (queries per hour)
- rl_qpm - an integer showing the rate limit applied (queries per minute)
- rl_qps - an integer showing the rate limit applied (queries per second)
current - an object containing the current counters of the account
- qpm - shows the number of actual queries performed during the current month
- qpd - shows the number of actual queries performed during the current day
- rl_qph - an integer showing the rate limit current counter (queries per hour)
- rl_qpm - an integer showing the rate limit current counter (queries per minute)
- rl_qps - an integer showing the rate limit current counter (queries per second)

The rate limit values are primarily intended to prevent abuse that could cripple the system. The contractual tier of the user is defined by qms and qmh.
The soft limit (qms) is the number of queries per month contractually allowed. If this limit is exceeded on a regular basis, the query volumes on the account need reviewing.

If the hard limit (qmh) is reached, all subsequent queries will be refused with a 429 - Too many requests HTTP error code.
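
As an illustration, the remaining monthly allowance can be derived from a limits response by subtracting the monthly counter from the soft and hard limits. The helper name remaining_monthly_queries is hypothetical, and the sketch assumes the response has the shape shown in the example above.

```python
def remaining_monthly_queries(limits_response):
    """Return how many queries remain this month before each limit is hit."""
    limits = limits_response["limits"]
    # Note: current.qpm is the monthly query counter, whereas
    # limits.rl_qpm is the per-minute rate limit.
    used_this_month = limits_response["current"]["qpm"]
    return {
        "soft": max(0, limits["qms"] - used_this_month),
        "hard": max(0, limits["qmh"] - used_this_month),
    }
```

With the example response above (qms 1000, qmh 1500, qpm 18), this yields 982 queries before the soft limit and 1482 before the hard limit.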

The same error is returned if your query rate is too high and hits one of the rate limit values (the various rl_qpX entries).
In this case, add some delay between API calls to slow down your query rate.
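
A simple way to add such a delay is to retry with an increasing pause whenever a 429 is received. The sketch below is one possible approach, not a documented recommendation: call_with_backoff is a hypothetical helper, send stands for any function that performs one API call and returns (status, body), and the doubling delay starting at one second is an assumption.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with a doubling delay while it returns HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on any non-429 answer, or give up after the last attempt.
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(delay)
        delay *= 2
```

Because a 429 is also returned once the monthly hard limit is reached, retrying only helps in the rate limit case; the limits endpoint can be used to tell the two situations apart.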

The rate limiting engine thresholds are proportional to the maximum number of queries per day that your account can perform based on account type.

NOTE: Not all queries are counted as 1. Queries for CIDR resources have a multiplier applied that depends on the size of the requested CIDR:

a query for a /32 IPv4 (or a /64 IPv6) is counted as 1

a query for a /31 IPv4 (or a /63 IPv6) is counted as 2

…

a query for a /24 IPv4 (or a /56 IPv6) is counted as 8

To calculate the “count” of a query for a CIDR, apply log₂X where X is the number of distinct resources (addresses in the case of IPv4, /64’s in the case of IPv6) contained in the CIDR.
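
The counting rule can be sketched as follows. This is an illustration only: query_cost is a hypothetical helper, and the floor of 1 is an assumption made so that a single address counts as 1, as in the list above. Applied literally, log₂ gives 1 for a /31, whereas the list states 2, so treat the counters returned by the limits endpoint as authoritative.

```python
import ipaddress
import math


def query_cost(cidr):
    """Estimate the query count for a CIDR using the log2 rule, floored at 1."""
    network = ipaddress.ip_network(cidr, strict=False)
    # Resources are single addresses for IPv4 and /64 blocks for IPv6.
    unit_bits = 32 if network.version == 4 else 64
    resources = 2 ** max(0, unit_bits - network.prefixlen)
    return max(1, int(math.log2(resources)))
```

For example, query_cost("45.150.206.0/24") and query_cost("2001:db8::/56") both cover 256 resources and evaluate to 8.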

IP reputation data

Search by CIDR

This API endpoint allows the request of either current or historical listing information for a specific network block.

The query URL is:

GET /api/intel/v1/byobject/cidr/<DATASET>/<MODE>/<TYPE>/<IPADDRESS>[/<MASK>][?get_arguments..]

Arguments:

DATASET - identifies the dataset in which the search is performed. For example: XBL
MODE - CIDR search mode. Can have two different values:
- listed - search all submissions contained within the specified netmask
- listings - search all submissions that contain the specified netmask (this should only be used when querying a dataset that allows listings of variable sizes)
TYPE - can be either “live” or “history”:
- live - only returns listings that are still “active”, i.e. they are currently listed and have not yet expired from the Spamhaus datasets
- history - returns any record seen within the (implicit or explicit) time window, including expired data
IPADDRESS - IP address to look for
MASK - optional netmask to use. It defaults to 32 for IPv4 or 64 for IPv6; the largest block that can be queried is a /24 for IPv4 or a /56 for IPv6

Available GET arguments:

limit - constrain the number of rows returned by the query
since - extract results with a timestamp greater than or equal to ‘since’ (unix timestamp); defaults to 12 months before ‘until’ if not passed
until - extract results with a timestamp less than or equal to ‘until’ (unix timestamp); defaults to the current timestamp if not passed

NOTE: When querying for historical data, the maximum timeframe allowed is 12 months; passing a larger interval will result in an error code. If a larger timespan is required, multiple queries are needed.
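
When a longer timespan is needed, the range can be cut into consecutive windows and queried one window at a time. The sketch below is hedged on two assumptions: that "12 months" can be approximated as 365 days, and that a window of exactly that length is accepted. The helper name split_time_range is hypothetical. Since both ‘since’ and ‘until’ are inclusive, each window starts one second after the previous one ends to avoid duplicate records.

```python
MAX_SPAN_SECONDS = 365 * 86400  # assumed stand-in for "12 months"


def split_time_range(since, until, max_span=MAX_SPAN_SECONDS):
    """Split [since, until] into inclusive windows no longer than max_span."""
    if since > until:
        raise ValueError("since must not be later than until")
    windows = []
    start = since
    while start <= until:
        end = min(start + max_span, until)
        windows.append((start, end))
        start = end + 1  # bounds are inclusive, so skip one second ahead
    return windows
```

Each (since, until) pair can then be passed as the GET arguments of a separate history query.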

Some usage examples:

# get active listings for 45.150.206.114 in XBL
curl -s https://api.spamhaus.org/api/intel/v1/byobject/cidr/XBL/listed/live/45.150.206.114 \
	-H 'Authorization: Bearer <AUTH TOKEN>'

# get last 10 listing events for IPs in 45.150.206.0/24 from XBL
curl -s 'https://api.spamhaus.org/api/intel/v1/byobject/cidr/XBL/listed/history/45.150.206.0/24?limit=10' \
	-H 'Authorization: Bearer <AUTH TOKEN>'

# get submissions from a specified time range for 45.150.206.0/24
curl -s 'https://api.spamhaus.org/api/intel/v1/byobject/cidr/XBL/listed/history/45.150.206.0/24?since=1606600000&until=1606750000' \
	-H 'Authorization: Bearer <AUTH TOKEN>'
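
The same URLs can be assembled programmatically. The following sketch only builds the query URL from the arguments described above; build_cidr_query_url is a hypothetical helper name, and leaving colons unescaped (for IPv6 addresses) is an assumption.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.spamhaus.org/api/intel/v1/byobject/cidr"


def build_cidr_query_url(dataset, mode, qtype, ipaddress, mask=None, **get_args):
    """Assemble a search-by-CIDR URL from its path and GET arguments."""
    if mode not in ("listed", "listings"):
        raise ValueError("mode must be 'listed' or 'listings'")
    if qtype not in ("live", "history"):
        raise ValueError("type must be 'live' or 'history'")
    # Percent-encode each path segment; colons are left as-is (assumption).
    parts = (dataset, mode, qtype, ipaddress)
    url = BASE_URL + "/" + "/".join(quote(str(p), safe=":") for p in parts)
    if mask is not None:
        url += "/" + str(int(mask))
    if get_args:
        # GET arguments such as limit, since and until
        url += "?" + urlencode(get_args)
    return url
```

For instance, build_cidr_query_url("XBL", "listed", "history", "45.150.206.0", 24, limit=10) reproduces the URL of the second example above.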

Output

Regardless of the type of query, all successful queries return a JSON object that includes a code field with the same value as the HTTP status code.

Here is an example of a query returning no data:

{"code":404}

If the query results in data being returned, the JSON object provides an array named results with all the records returned by the query.

For example:
https://api.spamhaus.org/api/intel/v1/byobject/cidr/XBL/listed/history/74.77.66.227?limit=2 would return an object like the following:

{
  "code": 200,
  "results": [
    {
      "dataset": "XBL",
      "ipaddress": "74.77.66.227",
      "asn": "11351",
      "cc": "US",
      "listed": 1606757120,
      "seen": 1606757113,
      "valid_until": 1607361913,
      "rule": "01a400d5",
      "botname": "unknown",
      "detection": "SMTP impersonation",
      "dstport": 25,
      "helo": "outlook.com",
      "heuristic": "IMPERSONATE",
      "lat": 43.0505,
      "lon": -78.853,
      "srcip": "74.77.66.227"
    },
    {
      "dataset": "XBL",
      "ipaddress": "74.77.66.227",
      "asn": "11351",
      "cc": "US",
      "listed": 1606063971,
      "seen": 1606063960,
      "valid_until": 1606668760,
      "rule": "01a400d5",
      "botname": "unknown",
      "detection": "SMTP impersonation",
      "dstport": 25,
      "helo": "outlook.com",
      "heuristic": "IMPERSONATE",
      "lat": 43.0505,
      "lon": -78.853,
      "srcip": "74.77.66.227"
    }
  ]
}

For details of the fields included in each record, please refer to the relevant dataset’s documentation.
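
A response like the one above can be post-processed along these lines. This is a sketch under assumptions: extract_records is a hypothetical helper, a 404 is treated as an empty result set, and the listed, seen and valid_until fields are taken to be unix timestamps, as their values in the example suggest.

```python
from datetime import datetime, timezone

TIMESTAMP_FIELDS = ("listed", "seen", "valid_until")


def extract_records(payload):
    """Return the records of a response, with timestamps as UTC datetimes."""
    code = payload.get("code")
    if code == 404:
        return []  # the query returned no data
    if code != 200:
        raise RuntimeError("unexpected response code: %r" % (code,))
    records = []
    for record in payload.get("results", []):
        converted = dict(record)  # leave the original record untouched
        for field in TIMESTAMP_FIELDS:
            if field in converted:
                converted[field] = datetime.fromtimestamp(
                    converted[field], tz=timezone.utc
                )
        records.append(converted)
    return records
```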

Domain reputation data

Search by domain

This API endpoint allows the request of the reputation data regarding a given domain.

The query URL is:

GET /api/intel/v1/byobject/domain/rep/<DOMAIN>

Arguments:

DOMAIN - identifies the domain for which reputation data is being requested. This needs to be the bare domain (such as example.com) and not a hostname inside it (such as www.example.com).

Output

All successful queries return a JSON object that includes a code field with the same value as the HTTP status code.

Here is an example of a query returning no data:

{"code":404}

If the query results in data being returned, the JSON object provides a result object containing all the data made available.

For example:
https://api.spamhaus.org/api/intel/v1/byobject/domain/rep/example.com would return an object like the following:

{
  "code": 200,
  "result": {
    "domain": "example.com",
    "reputation": "great",
    "registrar": "RESERVED-Internet Assigned Numbers Authority",
    "date_created": 808358400,
    "first_seen": 1248469080,
    "last_seen": 1661863800,
    "trusted_tld": false,
    "corporate_registrar": false,
    "ns": [
      {
        "hostname": "a.iana-servers.net",
        "first_seen": 1250643120,
        "last_seen": 1661863800,
        "reputation": "great"
      },
      {
        "hostname": "b.iana-servers.net",
        "first_seen": 1250643120,
        "last_seen": 1661863800,
        "reputation": "great"
      }
    ],
    "senders": [
      {
        "ip": "93.95.228.211",
        "last_seen": 1661863800
      },
      {
        "ip": "95.111.251.196",
        "last_seen": 1661863800
      },
      {
        "ip": "103.149.120.10",
        "last_seen": 1661863800
      },
      {
        "ip": "107.191.56.52",
        "last_seen": 1661863800
      },
      {
        "ip": "108.170.43.243",
        "last_seen": 1661863800
      },
      {
        "ip": "111.90.148.163",
        "last_seen": 1661863800
      },
      {
        "ip": "123.231.243.132",
        "last_seen": 1661863800
      },
      {
        "ip": "151.236.57.12",
        "last_seen": 1661863800
      },
      {
        "ip": "200.7.39.182",
        "last_seen": 1661863800
      },
      {
        "ip": "204.15.146.3",
        "last_seen": 1661863800
      }
    ]
  }
}

For details of the fields included in each record, please refer to the domain reputation data under the “Available datasets” documentation.
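
As a small illustration of consuming this response, the sketch below condenses it into the domain's reputation, the reputation of each name server and the list of sender IPs. The helper name summarize_domain is hypothetical, and the code assumes the response shape shown in the example above.

```python
def summarize_domain(payload):
    """Condense a domain reputation response; return None if no data came back."""
    if payload.get("code") != 200:
        return None
    result = payload["result"]
    return {
        "domain": result["domain"],
        "reputation": result["reputation"],
        # map each name server hostname to its own reputation
        "nameservers": {ns["hostname"]: ns["reputation"] for ns in result.get("ns", [])},
        "sender_ips": [sender["ip"] for sender in result.get("senders", [])],
    }
```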

Dataset Download

This API endpoint allows the download of an entire current dataset, in compressed format.
Access to this API endpoint is only granted to customers who have subscribed for complete access to the dataset.

The query URL is:

GET /api/intel/v1/download/ext/<DATASET>

Arguments list:

DATASET - identifies the dataset to download. Currently supported dataset values are: bcl, xbl, css

Usage example:

# get eBCL full dataset export file
wget --header="Authorization: Bearer <AUTH TOKEN>" \
   --output-document=bcl.tgz \
   https://api.spamhaus.org/api/intel/v1/download/ext/bcl

Return Codes

The dataset download API will return the following HTTP codes:

200 - Download OK
401 - User not allowed to access the functionality
404 - Specified dataset file not found

Other return codes in the 4xx range are possible and generally indicate a problem with the user's access to the functionality.
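
The download request and the return codes above can be handled along these lines. This is a minimal sketch: build_download_request and describe_download_status are hypothetical helper names, and the request is only constructed here, not sent.

```python
import urllib.request

DOWNLOAD_URL = "https://api.spamhaus.org/api/intel/v1/download/ext/"

STATUS_MEANINGS = {
    200: "Download OK",
    401: "User not allowed to access the functionality",
    404: "Specified dataset file not found",
}


def build_download_request(dataset, token):
    """Build (but do not send) the GET request for a dataset download."""
    return urllib.request.Request(
        DOWNLOAD_URL + dataset,
        headers={"Authorization": "Bearer " + token},
    )


def describe_download_status(status):
    """Map an HTTP status code to the meaning listed above."""
    if status in STATUS_MEANINGS:
        return STATUS_MEANINGS[status]
    if 400 <= status < 500:
        return "Problem accessing the functionality"
    return "Unexpected status"
```

When sending the request with urllib.request.urlopen, the compressed body can be copied to a file; non-2xx responses surface as urllib.error.HTTPError, whose code attribute can be passed to describe_download_status.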