Anatomy of the data
The Spamhaus Intelligence API (SIA) enables users to retrieve reputation data relating to IP addresses and domains.
The structured data is provided in JSON format, enabling users to access and integrate it easily and helping to highlight areas of potentially malicious activity relating to IPs and domains.
Here is a detailed inspection of the data, including how to interpret each property.
payload.malware_family
describes the malware family label applied to the payload (see URLhaus data below).
IP address records
The API will return one or more records for each IP address you query, for example:
[
  {
    "ipaddress": "179.108.187.53",
    "dataset": "XBL",
    "listed": 1663178006,
    "seen": 1663178004,
    "valid_until": 1663782804,
    "rule": "07e001d2",
    "heuristic": "SPAMBOT",
    "botname": "gamut",
    "dstip": "164.90.197.3",
    "dstport": 25,
    "protocol": "TCP",
    "srcip": "179.108.187.53",
    "srcport": 26023,
    "helo": "[179.108.187.53]",
    "subject": "Payment from your account.",
    "asn": "263271",
    "cc": "BR",
    "lat": -22.9201,
    "lon": -43.0811
  },
  {
    "ipaddress": "109.102.255.230",
    "dataset": "XBL",
    "listed": 1638015103,
    "seen": 1638015103,
    "valid_until": 1638619903,
    "rule": "025f0130",
    "heuristic": "LEGACY",
    "botname": "apk.flubot",
    "protocol": "tcp",
    "srcip": "109.102.255.230",
    "srcport": 80,
    "asn": "9050",
    "cc": "RO",
    "lat": 44.4117,
    "lon": 26.0422
  },
  {
    "ipaddress": "109.102.255.230",
    "dataset": "BCL",
    "listed": 1660657372,
    "seen": 1660657372,
    "valid_until": 1660678972,
    "dstport": 80,
    "botname": "apk.flubot",
    "botname_malpedia": "apk.flubot",
    "abused": true,
    "shared": false,
    "asn": "9050",
    "cc": "RO",
    "lat": 44.4117,
    "lon": 26.0422
  }
]
Context indicators
Context indicators tell you why this IP has been included in the dataset and allow you to analyze or filter this information in your decision tree.
The ipaddress field is the IP address that all the returned records refer to.
The dataset property identifies the dataset(s) the IP address is listed in: Exploits Blocklist (XBL), Combined Spam Sources (CSS), or Botnet Controller List (BCL). An IP address listed in more than one dataset returns a separate record for each, as with 109.102.255.230 above. Each dataset focuses on a different area of abuse, so this property provides context around the issue(s) affecting the IP. See [insert a link to IP reputation data]
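As a minimal sketch, assuming the records have already been fetched and decoded into a Python list of dictionaries (the request and any response envelope are not shown here), the following groups records by IP address and filters them by dataset. The helper names are hypothetical, and the sample records are trimmed copies of the examples above.

```python
from collections import defaultdict

# Trimmed copies of the sample IP address records shown above.
records = [
    {"ipaddress": "179.108.187.53", "dataset": "XBL", "heuristic": "SPAMBOT"},
    {"ipaddress": "109.102.255.230", "dataset": "XBL", "heuristic": "LEGACY"},
    {"ipaddress": "109.102.255.230", "dataset": "BCL", "botname": "apk.flubot"},
]


def datasets_by_ip(records):
    """Map each queried IP address to the set of datasets it is listed in (hypothetical helper)."""
    result = defaultdict(set)
    for record in records:
        result[record["ipaddress"]].add(record["dataset"])
    return dict(result)


def filter_by_dataset(records, wanted):
    """Keep only records from the datasets your decision tree cares about (hypothetical helper)."""
    return [r for r in records if r["dataset"] in wanted]


listed = datasets_by_ip(records)
print(sorted(listed["109.102.255.230"]))         # ['BCL', 'XBL']
print(len(filter_by_dataset(records, {"BCL"})))  # 1
```

Grouping per IP makes it easy to apply different policies per dataset, for example treating a BCL listing more strictly than an XBL one.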
Listing event indicators
These properties identify when the listing happened, why it happened and provide essential intelligence.
The listed property is the Unix timestamp of the listing event.
The seen property is the Unix timestamp of the first signal that triggered the listing, for example, a spamtrap hit, a sandbox analysis, or a honeypot signal.
The valid_until property is the Unix time at which this listing should be considered invalid. The listing may be renewed if more signals are observed, so please send another API query for this object before the expiry time to find out whether the listing is still valid.
The remove_timestamp property is the Unix time when the team manually removed the listing. Manual removals occur following an external removal request or if the team discovers a false positive entry, which rarely happens.
The rule property is the internal ID of the rule that triggered the detection. Detections triggered by different means or rules will show different IDs, even when they refer to the same detection. Please note that this is for internal use only, and we don’t provide a complete list of rules. However, you can use this field to cross-reference different events.
The botname property is the bot name associated with the detected activity. Where a clear association isn’t possible, “unknown” is returned.
The botname_malpedia property is the bot name associated with the detected activity, as named by Malpedia. Where a clear association isn’t possible, “unknown” is returned. This field isn’t always provided, particularly for historic listings.
The heuristic property represents the parameter that contributed to the listing decision (for example, SPAMBOT or LEGACY in the records above).
The detection property is a short, human-readable string describing how the data was collected. This field only appears when the heuristic involves multiple data collection methods.
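The timestamps above can be combined into a simple validity check. This is a sketch under stated assumptions: timestamps are Unix seconds, as in the sample records, and the helper names and the "removed"/"expired"/"active" labels are hypothetical rather than part of the API.

```python
import time
from datetime import datetime, timezone


def to_utc(ts):
    """Render a Unix timestamp (seconds) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def listing_state(record, now=None):
    """Classify a listing as 'removed', 'expired' or 'active' (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    if "remove_timestamp" in record:
        # Present only when the team manually removed the listing.
        return "removed"
    if now >= record["valid_until"]:
        # Past valid_until the listing should be considered invalid;
        # re-query before this point to catch renewals.
        return "expired"
    return "active"


record = {"listed": 1663178006, "seen": 1663178004, "valid_until": 1663782804}
print(to_utc(record["listed"]))               # 2022-09-14T17:53:26+00:00
print(listing_state(record, now=1663500000))  # active
print(listing_state(record, now=1663800000))  # expired
```

Passing `now` explicitly keeps the function deterministic and easy to test; in production you would usually omit it.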
Context-specific indicators
We have multiple context-specific indicators. They vary depending on the action that triggered the event, for example, spam sent through email or a website, trojans, or botnet C&C activity. Not all properties are mandatory, so data will not always be returned for every field.
The dstip property is the destination IP address of the connection that triggered the detection.
The dstport property is the destination port of the connection that triggered the detection.
The protocol property is the protocol of that connection and can be TCP or UDP. Its case may vary: the examples above contain both “TCP” and “tcp”.
The srcip property is the source IP address of the connection that triggered the detection.
The srcport property is the source port of the connection that generated the listing, when available.
The helo property is the HELO/EHLO string used in the traffic that contributed to the listing.
The helos property is an array of strings containing several HELO domains (or strings) sent by the client.
The subject property is the Subject header line of messages that contributed to the listing. It is not always available or published, as it only applies to listings referring to email spam or phishing events.
The abused attribute is a boolean flag. “true” indicates the IP address is a legitimate asset that has been compromised, and “false” indicates that the perpetrators are directly responsible for this asset.
The shared attribute is a boolean flag. “true” indicates that multiple actors are using the IP address in question (for example, a fraudulent domain name hosted on shared hosting or behind Cloudflare), and “false” indicates that only the single offending entity is using the IP. This property also implies that the resource is not compromised.
The domains property is an array of strings that may contain several domain names involved in the listing. It is often used to indicate the domains used for a spam operation or by a spam gang.
The domain property is like domains but contains only one entry.
The uri property shows information about one specific URI involved in the listing, for example, a phishing domain URI.
The urls property lists multiple URLs associated with the listing, for example, phishing domain URLs.
The useragent property is the user agent string recorded in the event of a honeypot hit.
The samples property is an array of elements containing hashes that refer to specific malware samples. You can look up the original files for these hashes at https://urlhaus.abuse.ch.
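Because every context-specific field is optional, client code should read them defensively. The following is a minimal sketch, not a definitive parser: `describe_context` is a hypothetical helper, the summary wording is our own, and protocol values are upper-cased because the samples above use both cases.

```python
def describe_context(record):
    """Build a one-line summary from the optional context-specific fields (hypothetical helper)."""
    parts = []
    if "protocol" in record:
        # Samples contain both "TCP" and "tcp"; normalise before display or comparison.
        proto = record["protocol"].upper()
        src = f"{record.get('srcip', '?')}:{record.get('srcport', '?')}"
        dst = f"{record.get('dstip', '?')}:{record.get('dstport', '?')}"
        parts.append(f"{proto} {src} -> {dst}")
    if "abused" in record:
        # true: compromised legitimate asset; false: perpetrators directly responsible.
        parts.append("compromised" if record["abused"] else "operated by the perpetrators")
    if record.get("shared"):
        parts.append("shared by multiple actors")
    # helo is a single string and helos an array; merge them for display.
    helos = ([record["helo"]] if "helo" in record else []) + list(record.get("helos", []))
    if helos:
        parts.append("HELO " + ", ".join(helos))
    return "; ".join(parts)


spambot = {"protocol": "TCP", "srcip": "179.108.187.53", "srcport": 26023,
           "dstip": "164.90.197.3", "dstport": 25, "helo": "[179.108.187.53]"}
controller = {"dstport": 80, "abused": True, "shared": False}
print(describe_context(spambot))
# TCP 179.108.187.53:26023 -> 164.90.197.3:25; HELO [179.108.187.53]
print(describe_context(controller))  # compromised
```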
Additional Information
Where possible, we enrich our intelligence data with ASN, country code, and geolocation, including the following properties:
The asn property is the Autonomous System Number responsible for the IP address.
The cc property is the country code, as defined by ISO 3166. The country code is associated with the IP address using complex algorithms and, in rare cases, might not be what you would expect.
The lat and lon properties are the IP address’s latitude and longitude. We extract this data from commercial geolocation services.
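Since enrichment is applied only where possible, a reporting step should tolerate missing values. As a sketch, assuming records shaped like the samples above (where asn is a string), this hypothetical helper counts records per country code and ASN, bucketing unenriched records under `None`. The last record uses a documentation-range address and is purely illustrative.

```python
from collections import Counter


def by_network(records):
    """Count records per (cc, asn) pair; missing enrichment becomes None (hypothetical helper)."""
    return Counter((r.get("cc"), r.get("asn")) for r in records)


records = [
    {"ipaddress": "179.108.187.53", "asn": "263271", "cc": "BR", "lat": -22.9201, "lon": -43.0811},
    {"ipaddress": "109.102.255.230", "asn": "9050", "cc": "RO", "lat": 44.4117, "lon": 26.0422},
    {"ipaddress": "109.102.255.230", "asn": "9050", "cc": "RO", "lat": 44.4117, "lon": 26.0422},
    {"ipaddress": "192.0.2.1"},  # hypothetical record without enrichment
]
print(by_network(records).most_common(1))  # [(('RO', '9050'), 2)]
```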
Domain Address records
Spamhaus provides domain reputation data via a REST API in JSON format. Before detailing the technical aspects of the dataset, we need to be explicit about the terminology relating to domain names.
What is a top-level domain (TLD)?
A TLD is the highest level in the hierarchical Domain Name System (DNS) after the root domain, i.e., everything that follows the final stop in a domain name before the “path”. For example, in the URL https://www.accountname.example.com/siteaddress, the top-level domain is .com.
Spamhaus expands this definition and includes in its own TLD list registered domains that meet any of the following criteria:
Domains that offer the delegation of control in the DNS to third parties. In other words, our TLD list includes domains that publish nameserver (NS) records pointing to authoritative nameservers for domains underneath them. You can verify this by following the recursive resolution process with dig +trace.
Domains that offer a WHOIS or Registration Data Access Protocol (RDAP) service.
Domains under which you can register additional domains of the same hierarchy, either paid or free.
Spamhaus reserves the right to include additional domains in this list at our discretion.
What is a domain?
Having defined a TLD, the domain definition follows easily: a domain name is the second-level domain. For example, in the URL https://www.accountname.example.com/siteaddress “example.com” is the second-level domain.
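Given a TLD list, the domain is the label directly below the longest matching TLD. The sketch below assumes the list is available as a Python set of suffixes without a leading dot; the small sets used here are hypothetical stand-ins, not the actual Spamhaus TLD list. The second call shows how Spamhaus’s expanded definition changes the result: if example.com itself qualified as a TLD, the domain would move one label down.

```python
from urllib.parse import urlsplit


def registered_domain(hostname, tlds):
    """Return the label directly below the longest matching TLD, or None (hypothetical helper).

    `tlds` stands in for the Spamhaus TLD list, e.g. {"com", "co.uk"}.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Scan suffixes from longest to shortest so multi-label TLDs take priority.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in tlds:
            return ".".join(labels[i - 1:])
    return None  # the hostname is itself a TLD, or no TLD matched


host = urlsplit("https://www.accountname.example.com/siteaddress").hostname
print(registered_domain(host, {"com"}))                 # example.com
print(registered_domain(host, {"com", "example.com"}))  # accountname.example.com
```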
In the SIA REST API chapter, specifically in the TLD list section, you will learn how to fetch the TLD list and the other specifics of the domain data that we provide.
URLhaus data (beta release)
Here are examples of returned values for different calls. Note that not all fields will be filled in all cases.
{
"id": 1234567,
"url": "http://example.com/i",
"status": {
"ts": 1715764645,
"status": "offline"
},
"payload": {
"file_type": "ELF 32-bit LSB executable, ARM, EABI4 version 1 (SYSV), statically linked, stripped",
"file_ext": "elf",
"file_size": 307960,
"file_name": "na",
"sha256_hash": "d373396a0aa7f85abc9d9fad381ca97e9cbf95ec70ef8711d612333378116370",
"malware_family": "Mirai"
}
}
Fields:
id is the ID of the URL in the URLhaus database.
url is the subject that was queried.
status.ts is the UNIX timestamp when the returned record was created by abuse.ch.
status.status is the accessibility information at the time the record was created. Possible values are:
unknown: The URL has not yet been checked by URLhaus.
online: The resource is accessible.
offline: The resource is not accessible.
removed: The URL has been removed from the URLhaus database.
reported: Deprecated. It should be considered equivalent to unknown and will only be found in some very old data.
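A client can fold the deprecated value into unknown when reading the status. This is a minimal sketch: `normalized_status` is a hypothetical helper, and treating undocumented or missing values as unknown is our own assumption rather than documented behavior. The JSON string is a trimmed copy of the example record above.

```python
import json

# Trimmed copy of the URLhaus example record above.
raw = """{
  "id": 1234567,
  "url": "http://example.com/i",
  "status": {"ts": 1715764645, "status": "offline"},
  "payload": {"file_ext": "elf", "malware_family": "Mirai"}
}"""

DOCUMENTED = {"unknown", "online", "offline", "removed"}


def normalized_status(record):
    """Return the URL status, folding deprecated and missing values into 'unknown'."""
    status = record.get("status", {}).get("status", "unknown")
    if status == "reported":  # deprecated, documented as equivalent to unknown
        return "unknown"
    # Assumption: any value not listed above is also treated as unknown.
    return status if status in DOCUMENTED else "unknown"


record = json.loads(raw)
print(record["url"], normalized_status(record))  # http://example.com/i offline
```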
payload.ts is the UNIX timestamp of the time the payload was ingested into SIA.
payload.mime_type is the MIME type of the payload.
payload.file_type is the file format of the payload.
payload.file_size is the file size of the payload.
payload.file_name is the file name of the payload.
payload.sha256_hash is the SHA-256 hash of the payload.
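To check that a local copy of a file matches a payload record, you can recompute its SHA-256 digest and compare it with payload.sha256_hash. The sketch below uses Python’s standard hashlib; the helper names are hypothetical, and an in-memory stream stands in for a real file so the example runs anywhere.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary file object in chunks so large payloads need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_payload(stream, payload):
    """Compare a local copy against payload.sha256_hash (hex, compared case-insensitively)."""
    return sha256_of_stream(stream) == payload["sha256_hash"].lower()


# With a real file, pass open(path, "rb") instead of the in-memory stream.
empty = {"sha256_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
print(matches_payload(io.BytesIO(b""), empty))  # True
```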