Anatomy of the data
The Spamhaus Intelligence API (SIA) enables users to retrieve reputation data relating to IP addresses and domains.
The data is provided as structured JSON, so it is easy to access and integrate, and it helps highlight potentially malicious activity relating to IPs and domains.
Here is a detailed inspection of the data, including how to interpret each property.
payload.malware_family describes the malware family label applied to the payload in URLhaus records.
IP address records
The API will return one or more records for each IP address you query, for example:
{
  "ipaddress": "179.108.187.53",
  "dataset": "XBL",
  "listed": 1663178006,
  "seen": 1663178004,
  "valid_until": 1663782804,
  "rule": "07e001d2",
  "heuristic": "SPAMBOT",
  "botname": "gamut",
  "dstip": "164.90.197.3",
  "dstport": 25,
  "protocol": "TCP",
  "srcip": "179.108.187.53",
  "srcport": 26023,
  "helo": "[179.108.187.53]",
  "subject": "Payment from your account.",
  "asn": "263271",
  "cc": "BR",
  "lat": -22.9201,
  "lon": -43.0811
},
{
  "ipaddress": "109.102.255.230",
  "dataset": "XBL",
  "listed": 1638015103,
  "seen": 1638015103,
  "valid_until": 1638619903,
  "rule": "025f0130",
  "heuristic": "LEGACY",
  "botname": "apk.flubot",
  "protocol": "tcp",
  "srcip": "109.102.255.230",
  "srcport": 80,
  "asn": "9050",
  "cc": "RO",
  "lat": 44.4117,
  "lon": 26.0422
},
{
  "ipaddress": "109.102.255.230",
  "dataset": "BCL",
  "listed": 1660657372,
  "seen": 1660657372,
  "valid_until": 1660678972,
  "dstport": 80,
  "botname": "apk.flubot",
  "botname_malpedia": "apk.flubot",
  "abused": true,
  "shared": false,
  "asn": "9050",
  "cc": "RO",
  "lat": 44.4117,
  "lon": 26.0422
}
Context indicators
Context indicators tell you why this IP has been included in the dataset and allow you to analyze or filter out this information in your decision tree.
The ipaddress field is the IP address all the returned records refer to.
The dataset property details the dataset(s) the IP address is listed in: Exploits Blocklist (XBL), Combined Spam Sources (CSS), or Botnet Controller List (BCL). These datasets provide context around the issue(s) this IP is encountering, as each focuses on a different area of abuse. See [insert a link to IP reputation data]
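As a minimal sketch of how the dataset property might feed a decision tree, the snippet below groups records by IP address and collects the datasets each one appears in. The records are trimmed copies of the example above; the function name datasets_by_ip and the "treat BCL as most severe" policy are illustrative assumptions, not part of the API.

```python
from collections import defaultdict

# Trimmed copies of the example records; only the fields used here.
records = [
    {"ipaddress": "179.108.187.53", "dataset": "XBL", "heuristic": "SPAMBOT"},
    {"ipaddress": "109.102.255.230", "dataset": "XBL", "heuristic": "LEGACY"},
    {"ipaddress": "109.102.255.230", "dataset": "BCL", "botname": "apk.flubot"},
]


def datasets_by_ip(records):
    """Map each IP address to the set of datasets it is listed in."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["ipaddress"]].add(record["dataset"])
    return dict(grouped)


listings = datasets_by_ip(records)

# Hypothetical policy: flag any IP that appears in the Botnet Controller List.
is_botnet_controller = {ip: "BCL" in names for ip, names in listings.items()}
print(is_botnet_controller)
```

Keeping the datasets as a set per IP makes it straightforward to apply a different rule to each dataset, since one IP can be listed in several at once.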
Listing event indicators
These properties identify when the listing happened, why it happened and provide essential intelligence.
The listed property is the Unix timestamp of the listing event.
The seen property is the Unix timestamp of the first signal that triggered the listing, for example, a spamtrap hit, a sandbox analysis, or a honeypot signal.
The valid_until property is the Unix time at which this listing should be considered invalid. The listing may be renewed if more signals are observed. In this case, please send another API query for this object before the expiry date to find out if this listing is still valid.
The remove_timestamp property is the Unix time when the team manually removed the listing. Manual removals occur following an external removal request or if the team discovers a false positive entry, which rarely happens.
The rule property is the internal ID pointing to the rule that triggered the detection. Detections triggered by different means or rules will show different IDs, even when they refer to the same detection. Please note that this is for internal use only, and we don’t provide a complete list of rules. However, users can use this field to cross-reference different events.
The botname property is the bot name associated with the detected activity. Where a clear association isn’t possible, “unknown” will be returned.
The botname_malpedia property is the bot name associated with the detected activity, as named by Malpedia. Where a clear association isn’t possible, “unknown” will be returned. This field isn’t always provided, particularly for historic listings.
The heuristic property represents the parameter contributing to the listing decision.
The detection property is a human-readable string briefly describing how the data was collected. This field only appears when the heuristic involves multiple data collection methods.
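The timestamp properties above can be handled with a few lines of Python. This is a sketch under stated assumptions: the helper names to_utc and is_active are hypothetical, and treating any record that carries remove_timestamp as inactive is an interpretation of the description above rather than documented behavior.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def is_active(record, now=None):
    """Return True if the listing has neither expired nor been manually removed.

    Assumption: a record that carries remove_timestamp is no longer active.
    """
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    if "remove_timestamp" in record:
        return False
    return now < record["valid_until"]


# Timestamps taken from the first example record.
record = {"listed": 1663178006, "seen": 1663178004, "valid_until": 1663782804}

print(to_utc(record["listed"]).isoformat())
print(record["valid_until"] - record["seen"])  # validity window in seconds
```

Passing now explicitly keeps the check deterministic, which is convenient when replaying historic records.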
Context-specific indicators
We have multiple context-specific indicators. They vary depending on the action that triggered the event, for example, spam sent through email or a website, trojans, or botnet C&C activity. Not all properties are mandatory, so data will not always be returned for every field.
The dstip property is the destination IP address of the connection that triggered the detection.
The dstport property is the destination port of the connection that triggered the detection.
The protocol property can be TCP or UDP.
The srcip property is the source IP address of the connection that triggered the detection.
The srcport property is the source port of the connection generating the listing, when available.
The helo property is the HELO/EHLO string used in the traffic contributing to the listing.
The helos property is an array of strings containing several HELO domains (or strings) sent by the client.
The subject property is the Subject header line for messages that contributed to the listing. It is not always available or published, as it is specific to listings referring to email spam/phish events.
The abused attribute is a boolean flag. “true” indicates the IP address is a legitimate asset that has been compromised, and “false” indicates that the perpetrators are directly responsible for this asset.
The shared attribute is a boolean flag. “true” indicates multiple actors are using the IP address in question (for example, a fraudulent domain name hosted on shared hosting or Cloudflare), and “false” indicates only the single offending entity is using the IP. This property also implies that the resource is not compromised.
The domains property is an array of strings that may contain several domain names involved in the listings. It is often used to indicate the domains used for a spam operation or by a spam gang.
The domain property is like domains but contains only one entry.
The uri property shows information about one specific URI involved in the listing, for example, a phishing domain URI.
The urls property lists multiple URLs associated with the listing, for example, phishing domain URLs.
The useragent property is the user agent string in the event of a honeypot hit.
The samples property is an array of elements that contain hashes referring to specific malware samples. Users can look up the original files for each hash at https://urlhaus.abuse.ch
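Because these properties are optional, client code should not assume any of them is present. The sketch below reads the abused and shared flags and the helo/helos properties defensively; the function names and the wording of the summaries are illustrative assumptions.

```python
def describe_hosting(record):
    """Summarize the abused/shared flags; return None when neither is present."""
    abused = record.get("abused")
    shared = record.get("shared")
    if abused is None and shared is None:
        return None
    parts = []
    if abused is not None:
        parts.append(
            "compromised legitimate asset" if abused else "operated by the perpetrators"
        )
    if shared is not None:
        parts.append(
            "shared by multiple actors" if shared else "used by a single entity"
        )
    return ", ".join(parts)


def helo_strings(record):
    """Collect HELO/EHLO values from the helos array and the single helo field."""
    values = list(record.get("helos", []))
    if "helo" in record:
        values.append(record["helo"])
    return values


# Flags taken from the BCL example record.
print(describe_hosting({"abused": True, "shared": False}))
print(helo_strings({"helo": "[179.108.187.53]"}))
```

Using dict.get with no default distinguishes "flag is false" from "flag was not returned", which matters because a missing flag carries no information.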
Additional Information
Where possible, we enrich our intelligence data with ASN, country code, and geolocation, including the following properties:
The asn property is the Autonomous System Number responsible for the IP address.
The cc property is the country code, as indicated by ISO 3166. The country code is attributed to the IP address using complex algorithms and, in rare cases, might not be what you would expect.
The lat and lon properties are the IP address’s latitude and longitude. We extract this data from commercial geolocation services.
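The enrichment properties can be normalized before aggregation. In the example records asn is a string, so this sketch converts it to an integer; the function name enrichment and the choice to return None for missing values are assumptions made for illustration.

```python
from collections import Counter

# Enrichment fields copied from the example records.
records = [
    {"ipaddress": "179.108.187.53", "asn": "263271", "cc": "BR", "lat": -22.9201, "lon": -43.0811},
    {"ipaddress": "109.102.255.230", "asn": "9050", "cc": "RO", "lat": 44.4117, "lon": 26.0422},
    {"ipaddress": "109.102.255.230", "asn": "9050", "cc": "RO", "lat": 44.4117, "lon": 26.0422},
]


def enrichment(record):
    """Return (asn, cc, (lat, lon)), using None for anything that is missing."""
    asn = record.get("asn")
    lat, lon = record.get("lat"), record.get("lon")
    location = (lat, lon) if lat is not None and lon is not None else None
    return (int(asn) if asn is not None else None, record.get("cc"), location)


# Count records per country code, skipping records without enrichment.
per_country = Counter(r["cc"] for r in records if "cc" in r)
print(enrichment(records[0]))
print(per_country)
```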
Domain Address records
Spamhaus provides domain reputation data via a REST API in JSON format. Before detailing the technical aspects of the dataset, we need to be explicit about the terminology relating to domain names.
What is a top-level domain (TLD)?
A TLD is the highest level in the hierarchical Domain Name System (DNS) after the root domain, i.e., everything that follows the final dot in a domain name before the “path”. For example, in the URL https://www.accountname.example.com/siteaddress, the top-level domain is .com.
Spamhaus expands this definition and includes in its own TLD list registered domains that meet any of the following criteria:
Domains that offer the delegation of control in the DNS to third parties. In other words, we include domains that publish nameserver (NS) records pointing to authoritative nameservers for domains underneath them in our TLD list. This can be verified by following the recursive process using dig +trace.
Domains that offer a WHOIS or Registration Data Access Protocol (RDAP) service.
Domains under which you can register additional domains of the same hierarchy, either paid or free.
Spamhaus reserves the right to include additional domains in this list at our discretion.
What is a domain?
Having defined a TLD, the domain definition follows easily: a domain name is the second-level domain, that is, the TLD plus the label immediately to its left. For example, in the URL https://www.accountname.example.com/siteaddress, “example.com” is the second-level domain.
In the SIA REST API chapter, specifically in the TLD list section, you will learn how to fetch the TLD list and the other specifics of the domain data that we provide.
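The TLD and domain definitions above can be sketched as a longest-suffix match against a TLD list. This is only an illustration: split_domain is a hypothetical helper, and SAMPLE_TLDS is a made-up stand-in for the list Spamhaus publishes.

```python
def split_domain(hostname, tlds):
    """Return (domain, tld) using the longest suffix of hostname found in tlds.

    Returns None when no suffix matches or when the hostname is itself a TLD.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if ".".join(labels) in tlds:
        return None
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        tld = ".".join(labels[i:])
        if tld in tlds:
            # The domain is the TLD plus the single label to its left.
            return ".".join(labels[i - 1:]), tld
    return None


# Hypothetical sample; the real list is the one Spamhaus publishes.
SAMPLE_TLDS = {"com", "uk", "co.uk"}

print(split_domain("www.accountname.example.com", SAMPLE_TLDS))
print(split_domain("shop.example.co.uk", SAMPLE_TLDS))
```

Matching the longest suffix first matters because the expanded TLD list can contain multi-label entries, and a shorter match would attribute the wrong domain.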
URLhaus data (beta release)
Here are examples of returned values for different calls. Note that not all fields will be filled in all cases.
{
  "id": 1234567,
  "url": "http://example.com/i",
  "status": {
    "ts": 1715764645,
    "status": "offline"
  },
  "payload": {
    "file_type": "ELF 32-bit LSB executable, ARM, EABI4 version 1 (SYSV), statically linked, stripped",
    "file_ext": "elf",
    "file_size": 307960,
    "file_name": "na",
    "sha256_hash": "d373396a0aa7f85abc9d9fad381ca97e9cbf95ec70ef8711d612333378116370",
    "malware_family": "Mirai"
  }
}
Fields:
id is the ID of the URL in the URLhaus database.
url is the subject that was queried.
status.ts is the Unix timestamp when the returned record was created by abuse.ch.
status.status is the accessibility information at the time the record was created. Possible values are:
  unknown: The URL has not yet been checked by URLhaus.
  online: The resource is accessible.
  offline: The resource is not accessible.
  removed: The URL has been removed from the URLhaus database.
  reported: Deprecated. It should be considered equivalent to unknown and will only be found in some very old data.
payload.ts is the Unix timestamp of the time the payload was ingested into SIA.
payload.mime_type is the MIME type of the payload.
payload.file_type is the file format of the payload.
payload.file_size is the file size of the payload.
payload.file_name is the file name of the payload.
payload.sha256_hash is the SHA-256 hash of the payload.
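A short sketch of reading a URLhaus record follows. It maps the deprecated reported status to unknown, as described above, and tolerates missing fields. The helper names are hypothetical, and defaulting a missing status to "unknown" is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone

# Trimmed copy of the example record.
record = {
    "id": 1234567,
    "url": "http://example.com/i",
    "status": {"ts": 1715764645, "status": "offline"},
    "payload": {
        "file_size": 307960,
        "sha256_hash": "d373396a0aa7f85abc9d9fad381ca97e9cbf95ec70ef8711d612333378116370",
        "malware_family": "Mirai",
    },
}


def normalized_status(record):
    """Return the status string, mapping deprecated "reported" to "unknown".

    Assumption: a record without a status is treated as "unknown".
    """
    status = record.get("status", {}).get("status", "unknown")
    return "unknown" if status == "reported" else status


def summarize(record):
    """Pick out the fields most useful for triage; missing fields become None."""
    payload = record.get("payload", {})
    checked = record.get("status", {}).get("ts")
    return {
        "url": record.get("url"),
        "status": normalized_status(record),
        "checked_at": (
            datetime.fromtimestamp(checked, tz=timezone.utc)
            if checked is not None
            else None
        ),
        "sha256": payload.get("sha256_hash"),
        "malware_family": payload.get("malware_family"),
    }


print(summarize(record))
```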