Anatomy of the data

The Spamhaus Intelligence API (SIA) enables users to retrieve reputation data relating to IP addresses and domains.

The structured data is provided in JSON format, enabling users to access and integrate it easily and helping to highlight potentially malicious activity relating to IPs and domains.

Here is a detailed inspection of the data, including how to interpret each property.

IP address records

The API will return one or more records for each IP address you query, for example:

    [
      {
        "ipaddress": "179.108.187.53",
        "dataset": "XBL",
        "listed": 1663178006,
        "seen": 1663178004,
        "valid_until": 1663782804,
        "rule": "07e001d2",
        "heuristic": "SPAMBOT",
        "botname": "gamut",
        "dstip": "164.90.197.3",
        "dstport": 25,
        "protocol": "TCP",
        "srcip": "179.108.187.53",
        "srcport": 26023,
        "helo": "[179.108.187.53]",
        "subject": "Payment from your account.",
        "asn": "263271",
        "cc": "BR",
        "lat": -22.9201,
        "lon": -43.0811
      },
      {
        "ipaddress": "109.102.255.230",
        "dataset": "XBL",
        "listed": 1638015103,
        "seen": 1638015103,
        "valid_until": 1638619903,
        "rule": "025f0130",
        "heuristic": "LEGACY",
        "botname": "apk.flubot",
        "protocol": "tcp",
        "srcip": "109.102.255.230",
        "srcport": 80,
        "asn": "9050",
        "cc": "RO",
        "lat": 44.4117,
        "lon": 26.0422
      },
      {
        "ipaddress": "109.102.255.230",
        "dataset": "BCL",
        "listed": 1660657372,
        "seen": 1660657372,
        "valid_until": 1660678972,
        "dstport": 80,
        "botname": "apk.flubot",
        "botname_malpedia": "apk.flubot",
        "abused": true,
        "shared": false,
        "asn": "9050",
        "cc": "RO",
        "lat": 44.4117,
        "lon": 26.0422
      }
    ]

Context indicators

Context indicators tell you why this IP has been included in the dataset and allow you to analyze or filter this information in your decision tree.

  • The ipaddress field is the IP address all the returned records refer to.

  • The dataset property identifies the dataset in which the IP address is listed: Exploits Blocklist (XBL), Combined Spam Sources (CSS), or Botnet Controller List (BCL). Each dataset focuses on a different area of abuse, so it provides context about the issue(s) this IP is encountering. See [insert a link to IP reputation data]
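As a minimal sketch of filtering on context, the Python snippet below decodes a trimmed JSON list of IP records and groups them by their dataset property, optionally keeping only the datasets you care about. It assumes the response has already been reduced to a JSON array of record objects; the function name records_by_dataset and the trimmed sample are illustrative only and are not part of the API.

```python
import json
from collections import defaultdict

# A trimmed copy of the sample IP records, as a JSON string.
# (Only a few fields are kept; real records carry many more.)
raw = """
[
  {"ipaddress": "179.108.187.53", "dataset": "XBL", "heuristic": "SPAMBOT"},
  {"ipaddress": "109.102.255.230", "dataset": "XBL", "heuristic": "LEGACY"},
  {"ipaddress": "109.102.255.230", "dataset": "BCL", "abused": true}
]
"""


def records_by_dataset(records, wanted=None):
    """Group decoded records by their "dataset" value.

    If `wanted` is given, keep only the datasets it contains.
    """
    groups = defaultdict(list)
    for record in records:
        dataset = record.get("dataset")
        if wanted is None or dataset in wanted:
            groups[dataset].append(record)
    return dict(groups)


records = json.loads(raw)
for dataset, items in sorted(records_by_dataset(records).items()):
    print(dataset, len(items))  # prints "BCL 1", then "XBL 2"

# Example decision: only act on botnet controller listings.
bcl_only = records_by_dataset(records, wanted={"BCL"})
```

Because one IP address can appear in several datasets, grouping per dataset rather than per IP keeps each listing's context separate.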

Listing event indicators

These properties describe when and why the listing happened.

  • The listed property is the Unix timestamp of the listing event.

  • The seen property is the Unix timestamp of the first signal that triggered the listing, for example a spamtrap hit, a sandbox analysis, or a honeypot signal.

  • The valid_until property is the Unix time at which this listing should be considered invalid. The listing may be renewed if more signals are observed, so send another API query for this object before the expiry time to find out whether the listing is still valid.

  • The remove_timestamp property is the Unix time at which the team manually removed the listing. Manual removals follow either an external removal request or the team’s discovery of a false positive entry, which rarely happens.

  • The rule property is the internal ID of the rule that triggered the detection. Detections triggered by different means or rules show different IDs, even when they refer to the same detection. This ID is for internal use only, and we don’t provide a complete list of rules; however, you can use this field to cross-reference different events.

  • The botname property is the bot name associated with the detected activity. Where a clear association isn’t possible, “unknown” is returned.

  • The botname_malpedia property is the bot name associated with the detected activity, as named by Malpedia. Where a clear association isn’t possible, “unknown” is returned. This field isn’t always provided, particularly for historic listings.

  • The heuristic property identifies the heuristic that contributed to the listing decision, for example SPAMBOT or LEGACY in the sample records above.

  • The detection property is a human-readable string briefly describing how the data was collected. It only appears when the heuristic involves multiple data collection methods.
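The timestamps above lend themselves to a simple freshness check. The following Python sketch converts a listing's Unix timestamps to UTC and decides whether the listing should still be acted on. It assumes, as an interpretation rather than documented behavior, that a record carrying remove_timestamp should be treated as no longer valid; the helper names to_utc and is_still_valid are hypothetical.

```python
from datetime import datetime, timezone

# Listing-event fields from the first sample XBL record.
record = {
    "listed": 1663178006,
    "seen": 1663178004,
    "valid_until": 1663782804,
    "rule": "07e001d2",
}


def to_utc(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def is_still_valid(record, now=None):
    """Return True if the listing has not expired.

    Assumption: a record with remove_timestamp was manually removed and is
    therefore treated as invalid regardless of valid_until.
    """
    if "remove_timestamp" in record:
        return False
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return now < record["valid_until"]


print(to_utc(record["listed"]).isoformat())   # 2022-09-14T17:53:26+00:00
print(is_still_valid(record, now=1663500000))  # True: before valid_until
print(is_still_valid(record, now=1663800000))  # False: expired, query again
```

Passing now explicitly keeps the check deterministic for testing; in production you would omit it and re-query the API once a listing expires.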

Context-specific indicators

We have multiple context-specific indicators. They vary depending on the action that triggered the event, for example spam sent through email or a website, trojans, or botnet C&C activity. Not all properties are mandatory, so not every field is returned in every record.

  • The dstip property is the destination IP address of the connection that triggered the detection.

  • The dstport property is the destination port of the connection that triggered the detection.

  • The protocol property is the transport protocol of that connection: TCP or UDP.

  • The srcip property is the source IP address of the connection that triggered the detection.

  • The srcport property is the source port of the connection that generated the listing, when available.

  • The helo property is the HELO/EHLO string used in the traffic contributing to the listing.

  • The helos property is an array of strings containing several HELO domains (or strings) sent by the client.

  • The subject property is the Subject header line of messages that contributed to the listing. It is specific to listings related to email spam or phishing events, and even then it is not always available or published.

  • The abused attribute is a boolean flag. “true” indicates the IP address is a legitimate asset that has been compromised, and “false” indicates that the perpetrators are directly responsible for this asset.

  • The shared attribute is a boolean flag. “true” indicates that multiple actors use the IP address in question (for example, a fraudulent domain name hosted on shared hosting or behind Cloudflare), and “false” indicates that only the single offending entity uses the IP. This property also implies that the resource is not compromised.

  • The domains property is an array of strings that may contain several domain names involved in the listing. It is often used to indicate the domains used by a spam operation or spam gang.

  • The domain property is similar to domains but contains only one entry.

  • The uri property shows information about one specific URI involved in the listing, for example a phishing domain URI.

  • The urls property lists multiple URLs associated with the listing, for example phishing domain URLs.

  • The useragent property is the user agent string recorded in the event of a honeypot hit.

  • The samples property is an array of elements containing hashes that refer to specific malware samples. Users can look up the original files for these hashes at https://urlhaus.abuse.ch.
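Since none of these fields is guaranteed, code that reads them should tolerate gaps. The Python sketch below builds a one-line connection summary using dict.get defaults and normalizes the protocol case (the sample records contain both "TCP" and "tcp"). The ownership_hint function is a hypothetical triage rule built on the abused and shared flags, shown only to illustrate how they might feed a decision tree; it is not a Spamhaus classification.

```python
# Context-specific fields from the sample records.
xbl_record = {
    "dstip": "164.90.197.3",
    "dstport": 25,
    "protocol": "TCP",
    "srcip": "179.108.187.53",
    "srcport": 26023,
    "helo": "[179.108.187.53]",
    "subject": "Payment from your account.",
}
bcl_record = {"dstport": 80, "abused": True, "shared": False}


def describe_connection(record):
    """Build a 'PROTO src -> dst' summary, using '?' for missing fields."""
    # Protocol case differs between records ("TCP" vs "tcp"), so normalize it.
    protocol = (record.get("protocol") or "?").upper()
    src = f"{record.get('srcip', '?')}:{record.get('srcport', '?')}"
    dst = f"{record.get('dstip', '?')}:{record.get('dstport', '?')}"
    return f"{protocol} {src} -> {dst}"


def ownership_hint(record):
    """Hypothetical triage label derived from the abused and shared flags."""
    if record.get("abused") is True:
        return "compromised legitimate asset"
    if record.get("shared") is True:
        return "shared infrastructure"
    if record.get("abused") is False:
        return "operated by the offending party"
    return "unknown"


print(describe_connection(xbl_record))  # TCP 179.108.187.53:26023 -> 164.90.197.3:25
print(describe_connection(bcl_record))  # ? ?:? -> ?:80
print(ownership_hint(bcl_record))       # compromised legitimate asset
```

Checking abused before shared is a design choice here: a compromised asset usually calls for notifying its owner, whereas shared infrastructure calls for more targeted filtering.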

Additional Information

Where possible, we enrich our intelligence data with ASN, country code, and geolocation, including the following properties:

  • The asn property is the Autonomous System Number (ASN) responsible for the IP address.

  • The cc property is the country code, as defined by ISO 3166. The country code is attributed to the IP address by complex algorithms and, in rare cases, might not be what you would expect.

  • The lat and lon properties are the IP address’s latitude and longitude, which we extract from commercial geolocation services.
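The enrichment fields make it easy to see which networks your listed IPs come from. This Python sketch counts distinct listed IP addresses per ASN and country code pair. Because one IP can return several records (as 109.102.255.230 does in the sample), it first maps each IP to its network so that IP is counted once. The "unknown" fallback is an assumption for records where enrichment is unavailable.

```python
from collections import Counter

# Enrichment fields from the sample IP records (asn is a string in the samples).
records = [
    {"ipaddress": "179.108.187.53", "dataset": "XBL", "asn": "263271", "cc": "BR"},
    {"ipaddress": "109.102.255.230", "dataset": "XBL", "asn": "9050", "cc": "RO"},
    {"ipaddress": "109.102.255.230", "dataset": "BCL", "asn": "9050", "cc": "RO"},
]

# One IP can return several records, so map each IP to its network first.
# Enrichment is best effort, so fall back to "unknown" when a field is absent.
network_by_ip = {
    r["ipaddress"]: (r.get("asn", "unknown"), r.get("cc", "unknown"))
    for r in records
}

# Count distinct listed IPs per (ASN, country code) pair.
per_network = Counter(network_by_ip.values())
for (asn, cc), count in per_network.most_common():
    print(f"AS{asn} ({cc}): {count} listed IP(s)")
```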

Domain records

A domain record is entirely different from an IP record and carries different types of information. The Spamhaus research team provides reputation data for every domain it observes.

The following is a sample record for the google.com domain:

  {
    "domain": "google.com",
    "reputation": "great",
    "registrar": "Mark Monitor",
    "date_created": 874281600,
    "first_seen": 1248982542,
    "last_seen": 1663185184,
    "trusted_tld": false,
    "corporate_registrar": true,
    "ns": [
      {
        "hostname": "ns4.google.com",
        "first_seen": 1249739579,
        "last_seen": 1663185188,
        "reputation": "great"
      },
      {
        "hostname": "ns3.google.com",
        "first_seen": 1249739579,
        "last_seen": 1663185188,
        "reputation": "great"
      },
      {
        "hostname": "ns1.google.com",
        "first_seen": 1249739579,
        "last_seen": 1663185188,
        "reputation": "great"
      },
      {
        "hostname": "ns2.google.com",
        "first_seen": 1249739579,
        "last_seen": 1663185188,
        "reputation": "great"
      }
    ],
    "senders": [
      {
        "ip": "113.89.11.15",
        "last_seen": 1663184586
      },
      {
        "ip": "209.85.128.41",
        "last_seen": 1663184915
      },
      {
        "ip": "209.85.128.46",
        "last_seen": 1663184254
      },
      {
        "ip": "209.85.128.51",
        "last_seen": 1663184674
      },
      {
        "ip": "209.85.128.54",
        "last_seen": 1663184434
      },
      {
        "ip": "209.85.128.67",
        "last_seen": 1663184374
      },
      {
        "ip": "209.85.160.66",
        "last_seen": 1663184194
      },
      {
        "ip": "209.85.218.49",
        "last_seen": 1663185154
      },
      {
        "ip": "209.85.221.49",
        "last_seen": 1663184254
      },
      {
        "ip": "209.85.221.67",
        "last_seen": 1663184434
      }
    ]
  }

The properties are as follows:

Common properties

  • domain is the domain name the record is associated with.

  • reputation is the reputation that Spamhaus associates with the domain. Potential values are:

      • malicious
      • bad
      • neutral
      • good
      • great

  • registrar is the registrar through which the domain is managed, as available in registration data. The field is missing if the information is unavailable for any reason. Because a registrar’s name can appear in varying formats, the name we provide is normalized for consistency.

  • date_created is the Unix timestamp of the date and time the domain was registered, as extracted from registration data. Where this data can’t be extracted, the field is missing.

  • first_seen is the Unix timestamp of the first time Spamhaus researchers saw the domain in use, independent of the context of its use.

  • last_seen is the Unix timestamp of the most recent time Spamhaus saw the domain in use, independent of the context of its use.

  • type indicates the type of threat the domain is associated with, if it has been observed to be a threat vector. If no threat is associated with the domain, the field is missing. Potential values are:

      • phish
      • malware
      • botnetcc
      • snowshoe
      • redirector
      • adware
      • sinkhole

  • trusted_tld is a boolean value. Where true, it indicates that the top-level domain (TLD) part of the domain is restricted to verified and limited entries, for example .gov, .mil, or .bank. Where false, no such restrictions apply to the TLD.

  • corporate_registrar is a boolean value. Where true, the domain’s registrar is a “corporate-type” registrar offering additional security services, for example MarkMonitor, ComLaude, or CSC. Where false, the domain’s registrar is a standard type of registrar.

  • history is an array of entries listing every reputation change the domain has gone through since Spamhaus began monitoring it. Each entry is composed of three fields:

      • from_reputation is the reputation value before the change.
      • to_reputation is the reputation value the domain moved to.
      • time is the Unix timestamp of when the reputation change took place.

  • ns is an array of entries enumerating the hostnames indicated as nameserver delegations for the given domain. Note that this information is taken from the parent domain (in most cases, the TLD) and can therefore be subject to forgery: a domain can point its delegation to a host that does not actually provide nameserver services for that domain, or at all. Each array entry is composed of four fields:

      • hostname is the hostname indicated as the target of the NS delegation.
      • first_seen is the Unix timestamp of when the delegation was first observed.
      • last_seen is the Unix timestamp of when the delegation was most recently observed.
      • reputation is the reputation associated with the given nameserver. Researchers calculate it as a weighted average of the reputations of the domains pointing to it.
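The reputation values and the ns and history arrays can be combined into simple policy checks. The Python sketch below assumes the five reputation values are ordered from worst ("malicious") to best ("great"), as their listing suggests. It then flags nameservers below a threshold and picks the most recent history entry. The sample google.com record has no history field, so the history data here is hypothetical, and the helper names are illustrative only.

```python
# Assumed ordering of the reputation values, from worst to best.
REPUTATION_ORDER = ["malicious", "bad", "neutral", "good", "great"]
RANK = {name: i for i, name in enumerate(REPUTATION_ORDER)}

# Trimmed version of the google.com sample record.
domain_record = {
    "domain": "google.com",
    "reputation": "great",
    "ns": [
        {"hostname": "ns1.google.com", "first_seen": 1249739579,
         "last_seen": 1663185188, "reputation": "great"},
        {"hostname": "ns2.google.com", "first_seen": 1249739579,
         "last_seen": 1663185188, "reputation": "great"},
    ],
}

# Hypothetical history data; the sample record has no history field.
hypothetical_history = [
    {"from_reputation": "neutral", "to_reputation": "bad", "time": 1600000000},
    {"from_reputation": "bad", "to_reputation": "malicious", "time": 1650000000},
]


def at_least(reputation, threshold):
    """True if `reputation` ranks at or above `threshold` (KeyError if unknown)."""
    return RANK[reputation] >= RANK[threshold]


def weak_nameservers(record, threshold="neutral"):
    """Hostnames of nameservers whose reputation ranks below `threshold`."""
    return [ns["hostname"] for ns in record.get("ns", [])
            if not at_least(ns["reputation"], threshold)]


def latest_change(record):
    """Most recent history entry, or None if history is missing or empty."""
    history = record.get("history") or []
    return max(history, key=lambda entry: entry["time"], default=None)


print(at_least(domain_record["reputation"], "good"))  # True
print(weak_nameservers(domain_record))                # []
print(latest_change(domain_record))                   # None
print(latest_change({"history": hypothetical_history})["to_reputation"])  # malicious
```

Raising KeyError on an unrecognized reputation value is deliberate in this sketch: silently ranking an unexpected value could let it pass a threshold check.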

Context-specific indicators

  • senders is an array of entries, each containing an ip field with an IP address and a last_seen timestamp. It tells you where and when the domain has been observed in spamtrap data, if at all.
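One way to use senders is to check whether the IP that delivered a message mentioning the domain has also been seen sending it to spamtraps. This Python sketch indexes a few entries from the google.com sample by IP and finds the most recently observed sender. It assumes each IP appears at most once in the array; if an IP were repeated, the later entry would overwrite the earlier one in the index. 192.0.2.1 is a reserved documentation address used as a miss.

```python
# A few entries from the "senders" array of the google.com sample.
senders = [
    {"ip": "209.85.128.41", "last_seen": 1663184915},
    {"ip": "209.85.218.49", "last_seen": 1663185154},
    {"ip": "113.89.11.15", "last_seen": 1663184586},
]

# Index by IP so repeated lookups are constant time.
# (If an IP were repeated, the later entry would win.)
last_seen_by_ip = {entry["ip"]: entry["last_seen"] for entry in senders}

# The sender most recently observed in spamtrap data for this domain.
most_recent = max(senders, key=lambda entry: entry["last_seen"])

print(most_recent["ip"])                    # 209.85.218.49
print(last_seen_by_ip.get("113.89.11.15"))  # 1663184586
print(last_seen_by_ip.get("192.0.2.1"))     # None
```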