Anatomy of the data

The Spamhaus Intelligence API (SIA) enables users to retrieve reputation data relating to IP addresses and domains.

The structured data is provided in JSON format enabling users to access and integrate the data easily, helping inform and highlight areas of potentially malicious activity relating to IPs and domains.

Here is a detailed inspection of the data, including how to interpret each property.

In URLhaus records, payload.malware_family describes the malware family label applied to the payload.

IP address records

The API will return one or more records for each IP address you query, for example:

    {
      "ipaddress": "179.108.187.53",
      "dataset": "XBL",
      "listed": 1663178006,
      "seen": 1663178004,
      "valid_until": 1663782804,
      "rule": "07e001d2",
      "heuristic": "SPAMBOT",
      "botname": "gamut",
      "dstip": "164.90.197.3",
      "dstport": 25,
      "protocol": "TCP",
      "srcip": "179.108.187.53",
      "srcport": 26023,
      "helo": "[179.108.187.53]",
      "subject": "Payment from your account.",
      "asn": "263271",
      "cc": "BR",
      "lat": -22.9201,
      "lon": -43.0811
    },
    {
      "ipaddress": "109.102.255.230",
      "dataset": "XBL",
      "listed": 1638015103,
      "seen": 1638015103,
      "valid_until": 1638619903,
      "rule": "025f0130",
      "heuristic": "LEGACY",
      "botname": "apk.flubot",
      "protocol": "tcp",
      "srcip": "109.102.255.230",
      "srcport": 80,
      "asn": "9050",
      "cc": "RO",
      "lat": 44.4117,
      "lon": 26.0422
    },
    {
      "ipaddress": "109.102.255.230",
      "dataset": "BCL",
      "listed": 1660657372,
      "seen": 1660657372,
      "valid_until": 1660678972,
      "dstport": 80,
      "botname": "apk.flubot",
      "botname_malpedia": "apk.flubot",
      "abused": true,
      "shared": false,
      "asn": "9050",
      "cc": "RO",
      "lat": 44.4117,
      "lon": 26.0422
    }

Context indicators

Context indicators tell you why an IP address has been included in a dataset and allow you to analyze or filter on this information in your decision tree.

The ipaddress field is the IP address all the returned records refer to.
The dataset property names the dataset the IP address is listed in: Exploits Blocklist (XBL), Combined Spam Sources (CSS), or Botnet Controller List (BCL). Each dataset focuses on a different area of abuse, so this value gives context about the issue affecting the IP address.
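As a minimal sketch of how these two fields might feed a decision tree, the snippet below groups already-decoded records by their dataset value. The function name group_by_dataset and the trimmed sample records are illustrative assumptions, not part of the API.

```python
from collections import defaultdict


def group_by_dataset(records):
    """Map each dataset name (for example "XBL") to the records listed in it."""
    grouped = defaultdict(list)
    for record in records:
        # Fall back to a placeholder key if a record carries no dataset
        # (an assumption made for robustness, not documented behaviour).
        grouped[record.get("dataset", "UNKNOWN")].append(record)
    return dict(grouped)


# Trimmed copies of two of the example records above.
records = [
    {"ipaddress": "109.102.255.230", "dataset": "XBL", "botname": "apk.flubot"},
    {"ipaddress": "109.102.255.230", "dataset": "BCL", "botname": "apk.flubot"},
]

by_dataset = group_by_dataset(records)
for name in sorted(by_dataset):
    print(name, len(by_dataset[name]))
```

Grouping first keeps per-dataset policy (for example, treating BCL listings more strictly than XBL ones) in one place.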

Listing event indicators

These properties identify when the listing happened, why it happened and provide essential intelligence.

The listed property is the Unix timestamp of the listing event.
The seen property is the Unix timestamp of the first signal that triggered the listing, for example, a spamtrap hit, a sandbox analysis, or a honeypot signal.
The valid_until property is the Unix time after which this listing should be considered invalid. The listing may be renewed if more signals are observed, so query the API again for this object before the expiry time to find out whether the listing is still valid.
The remove_timestamp is the Unix time when the team manually removed the listing. Manual removals occur following an external removal request or if the team discovers a false positive entry, which rarely happens.
The rule is the internal ID pointing to the rule triggering the detection. Detections triggered by different means or rules will show different IDs, even when they refer to the same detection. Please note that this is for internal use only, and we don’t provide a complete list of rules. However, users can use this field to cross-reference different events.
The botname is the bot name associated with the detected activity. Where a clear association isn’t possible, “unknown” will be returned.
The botname_malpedia is the bot name associated with the detected activity, as named by Malpedia. Where a clear association isn’t possible, “unknown” will be returned. This field isn’t always provided, particularly for historic listings.
The heuristic represents the parameter contributing to the listing decision.
The detection is a string in a human-readable form, briefly describing how the data was collected. This field only appears when the heuristic involves multiple data collection methods.
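The timestamp fields above can be checked with a few lines of code. The following is a sketch only: the helper names are hypothetical, and treating any record that carries remove_timestamp as inactive is an assumption drawn from the description above rather than documented behaviour.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def is_listing_active(record, now=None):
    """Return True if the listing was not manually removed and has not expired."""
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    # Assumption: a manual removal overrides the expiry time.
    if "remove_timestamp" in record:
        return False
    return now < record["valid_until"]


# Timestamps taken from the first example record.
record = {"listed": 1663178006, "seen": 1663178004, "valid_until": 1663782804}

print(to_utc(record["listed"]).isoformat())
print(is_listing_active(record, now=1663200000))
```

Passing now explicitly makes the check easy to test; in normal use it can be omitted so the current time is used.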

Context-specific indicators

We have multiple context-specific indicators. They vary depending on the action that triggered the event, for example, spam sent through email or a website, trojans, or botnet C&C activity. Not all properties are mandatory; therefore, data will not always be returned for every field.

The dstip is the destination IP address of the connection that triggered the detection.
The dstport is the destination port of the connection that triggered the detection.
The protocol is the protocol of the connection that triggered the detection, either TCP or UDP.
The srcip is the source IP address of the connection that triggered the detection.
The srcport is the source port for the connection generating the listing, when available.
The helo is the HELO/EHLO string used in the traffic contributing to the listing.
The helos is an array of strings and contains several helo domains (or strings) sent by the client.
The subject is the Subject header line of messages that contributed to the listing. It is not always available or published, as it is specific to listings that refer to email spam or phishing events.
The abused attribute is a boolean flag. “true” indicates the IP address is a legitimate asset that has been compromised, and “false” indicates that the perpetrators are directly responsible for this asset.
The shared attribute is a boolean flag. “true” indicates multiple actors are using the IP address in question (for example, a fraudulent domain name hosted on shared hosting or Cloudflare), and “false” indicates only the single offending entity is using the IP. A value of “true” also implies that the resource is not compromised.
The domains property is an array of strings that may contain several domain names involved in the listings. It is often used to indicate the domains used for a spam operation or by a spam gang.
The domain property is like domains but contains only one entry.
The uri property shows information about one specific URI involved in the listing, for example, a phishing domain URI.
The urls property lists multiple URLs associated with the listing, for example, phishing domain URLs.
The useragent is the user agent string in the event of a honeypot hit.
The samples property is an array of elements that contain hashes referring to specific malware samples. Users can look up the original file for each hash at https://urlhaus.abuse.ch.
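Because none of these properties is guaranteed to be present, client code should read them defensively. The sketch below shows one possible approach; the function names, the label strings, and the choice to check abused before shared are all illustrative assumptions.

```python
def collect_domains(record):
    """Merge the singular "domain" and plural "domains" fields into one list."""
    domains = list(record.get("domains", []))
    single = record.get("domain")
    if single is not None and single not in domains:
        domains.append(single)
    return domains


def classify_asset(record):
    """Summarise the optional abused/shared flags as a short label."""
    abused = record.get("abused")
    shared = record.get("shared")
    if abused is None and shared is None:
        return "unspecified"  # neither flag was returned
    if abused:
        return "compromised legitimate asset"
    if shared:
        return "shared resource"
    return "operated by the offender"


# Flags taken from the BCL example record; the domain values are made up.
print(classify_asset({"abused": True, "shared": False}))
print(collect_domains({"domain": "a.example", "domains": ["b.example"]}))
```

Using dict.get with a default avoids a KeyError when a field is absent, which is the normal case for most of these properties.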

Additional Information

Where possible, we enrich our intelligence data with ASN, country code, and geolocation, including the following properties:

The asn is the Autonomous System Number responsible for the IP address.
The cc is the country code, as defined by ISO 3166. The country code is attributed to the IP address using complex algorithms and, in rare cases, might not be what you would expect.
The lat and the lon properties are the IP address’s latitude and longitude. We extract this data from commercial geolocation services.
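The enrichment properties can be gathered into a small typed structure, as in the sketch below. The Enrichment class and parse_enrichment function are hypothetical names. The conversion of asn to an integer is based on the example records, where it appears as a string; treat that as an observation from the samples rather than a guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enrichment:
    asn: Optional[int]
    cc: Optional[str]
    lat: Optional[float]
    lon: Optional[float]


def parse_enrichment(record):
    """Extract the optional enrichment fields from a decoded record."""
    asn = record.get("asn")
    return Enrichment(
        # The example records carry the ASN as a string such as "263271".
        asn=int(asn) if asn is not None else None,
        cc=record.get("cc"),
        lat=record.get("lat"),
        lon=record.get("lon"),
    )


# Values taken from the first example record.
sample = {"asn": "263271", "cc": "BR", "lat": -22.9201, "lon": -43.0811}
print(parse_enrichment(sample))
```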

Domain Address records

Spamhaus provides domain reputation data via a REST API in JSON format. Before detailing the technical aspects of the dataset, we need to be explicit about the terminology relating to domain names.

What is a top-level domain (TLD)?

A TLD is the highest level in the hierarchical Domain Name System (DNS) after the root domain, i.e., everything that follows the final dot in a domain name, before the “path”. For example, in the URL https://www.accountname.example.com/siteaddress, the top-level domain is .com.

Spamhaus expands this definition and includes in its own TLD list registered domains that meet any of the following criteria:

Domains that offer the delegation of control in the DNS to third parties. In other words, our TLD list includes domains that publish nameserver (NS) records pointing to authoritative nameservers for the domains underneath them. This can be verified by following the recursive resolution process using dig +trace.
Domains that offer a WHOIS or Registration Data Access Protocol (RDAP) service.
Domains under which you can register additional domains of the same hierarchy, either paid or free.

Spamhaus reserves the right to include additional domains in this list at our discretion.

What is a domain?

Having defined a TLD, the domain definition follows easily: a domain name is the second-level domain. For example, in the URL https://www.accountname.example.com/siteaddress “example.com” is the second-level domain.

In the SIA REST API chapter, specifically in the TLD list section, you will learn how to fetch the TLD list and the other specifics of the domain data that we provide.
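To illustrate the definitions above, the sketch below derives the domain from a hostname by finding the longest matching entry in a TLD list and keeping one extra label to its left. It assumes the TLD list has already been fetched and loaded into a set; the function name and the sample sets are hypothetical.

```python
def registered_domain(hostname, tlds):
    """Return the label directly below the longest matching TLD, plus that TLD.

    Returns None if no entry in `tlds` matches or if the hostname is itself a TLD.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so that a longer entry
    # in the list wins over a shorter one such as "com".
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in tlds:
            return ".".join(labels[i - 1:])
    return None


host = "www.accountname.example.com"
print(registered_domain(host, {"com"}))
# If example.com itself met the expanded TLD criteria, the result would shift left.
print(registered_domain(host, {"com", "example.com"}))
```

With only com in the list the result is example.com, matching the example in the text. Adding example.com to the list moves the result to accountname.example.com, which is how the expanded TLD definition changes what counts as a domain.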

URLhaus data (beta release)

Here are examples of returned values for different calls. Note that not all fields will be filled in all cases.

{
    "id": 1234567,
    "url": "http://example.com/i",
    "status": {
        "ts": 1715764645,
        "status": "offline"
    },
    "payload": {
        "file_type": "ELF 32-bit LSB executable, ARM, EABI4 version 1 (SYSV), statically linked, stripped",
        "file_ext": "elf",
        "file_size": 307960,
        "file_name": "na",
        "sha256_hash": "d373396a0aa7f85abc9d9fad381ca97e9cbf95ec70ef8711d612333378116370",
        "malware_family": "Mirai"
    }
}

Fields:

id is the ID of the URL in the URLhaus database.
url is the subject that was queried.
status.ts is the UNIX timestamp when the returned record was created by abuse.ch.
status.status is the accessibility information at the time the record was created. Possible values are:
- unknown: The URL has not yet been checked by URLhaus
- online: The resource is accessible
- offline: The resource is not accessible
- removed: The URL has been removed from the URLhaus database
- reported: Deprecated. Treat as equivalent to unknown; found only in some very old data
payload.ts is the UNIX timestamp of the time the payload was ingested into SIA.
payload.mime_type is the MIME format of the payload.
payload.file_type is the file format of the payload.
payload.file_size is the file size of the payload.
payload.file_name is the file name of the payload.
payload.sha256_hash is the SHA-256 hash of the payload.
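The field descriptions above can be applied as in the following sketch, which decodes a record, folds the deprecated reported status into unknown, and tolerates missing payload fields. The function names are hypothetical, and defaulting to unknown when no status is present is an assumption, not documented behaviour.

```python
import json


def normalized_status(record):
    """Return the record's status, mapping the deprecated "reported" to "unknown"."""
    # Assumption: a record without any status is treated as "unknown".
    status = (record.get("status") or {}).get("status", "unknown")
    return "unknown" if status == "reported" else status


def payload_summary(record):
    """Pick a few payload fields, using None for any that are absent."""
    payload = record.get("payload") or {}
    keys = ("sha256_hash", "file_name", "file_size", "malware_family")
    return {key: payload.get(key) for key in keys}


# A shortened version of the example record above.
raw = """
{
    "id": 1234567,
    "url": "http://example.com/i",
    "status": {"ts": 1715764645, "status": "offline"},
    "payload": {"file_size": 307960, "file_name": "na", "malware_family": "Mirai"}
}
"""
record = json.loads(raw)
print(normalized_status(record))
print(payload_summary(record))
```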