What is the Spamhaus Intelligence API?
The Spamhaus Intelligence API (SIA) is a structured REST interface that allows easy access to the unique reputation data that Spamhaus produces. This actionable information is provided for single IP addresses and domain names.
The majority of Spamhaus’s datasets come from intelligence gathered by a global network of probes and industry partnerships. Dedicated teams of researchers use various techniques, including machine learning, heuristics, and manual investigations, to identify malicious behavior relating to the signals presented via SIA. Before the production data is assembled, this data is deduplicated, spurious signals are filtered out, and false positives are removed.
IP reputation data
Our researchers may deem an IP address potentially malicious for a number of reasons. Depending on which of these contributing factors apply, IP addresses will appear in different datasets.
The following three IP datasets all provide reputation data for SIA:
Exploits Blocklist (XBL)
Combined Spam Sources (CSS)
Botnet Controller List (BCL)
You can use the intelligence returned from all of these datasets together for a detailed inspection of a specific IP address. Alternatively, because the datasets have different focuses (see below), you can use them individually for a particular purpose. One example is using the BCL dataset to assist with the timely identification of botnet-compromised machines, enabling immediate remediation before the compromise causes further damage.
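As a rough illustration of using the datasets together or individually, the sketch below combines per-dataset lookup results for one IP address into a single summary. The `results_by_dataset` mapping, the `IpSummary` type, and both function names are assumptions made for this example; they are not part of SIA, and the real response format is described in Anatomy of Data.

```python
from dataclasses import dataclass

# The three IP datasets named in this document.
IP_DATASETS = ("XBL", "CSS", "BCL")


@dataclass(frozen=True)
class IpSummary:
    ip: str
    listed_in: tuple  # dataset names that returned at least one record
    is_listed: bool


def summarize_ip(ip, results_by_dataset):
    """Combine per-dataset lookup results for one IP address.

    `results_by_dataset` is a hypothetical mapping of dataset name to the
    list of records returned for `ip`; an empty or missing list means the
    dataset has no listing for that address.
    """
    listed_in = tuple(
        name for name in IP_DATASETS if results_by_dataset.get(name)
    )
    return IpSummary(ip=ip, listed_in=listed_in, is_listed=bool(listed_in))


def listed_in_dataset(summary, dataset):
    """Single-purpose check against one dataset, for example only the BCL."""
    return dataset in summary.listed_in
```

Here `summarize_ip` covers the combined view, while `listed_in_dataset` covers the case where only one dataset matters for a given purpose.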
Here is an overview of what is contained within each of the above real-time datasets. For a detailed breakdown of the metadata fields returned by the API, please refer to Anatomy of Data [insert link].
Exploits Blocklist (XBL)
This dataset contains IP addresses that are exhibiting any of the following behaviors:
Hijacked PCs infected by illegal third-party exploits.
Abused open proxies (HTTP, SOCKS, AnalogX, WinGate, etc.).
Worms/viruses with built-in spam engines.
Other types of trojan-horse exploits.
Brute-force attempts against accounts on public internet services.
Combined Spam Sources (CSS)
This dataset includes IP addresses that are involved in sending low-reputation email or in phishing attacks, and that therefore represent a security risk.
Behaviors that can lead to an IP address being included in this dataset include, but are not limited to:
Sending email that shows indications of being unsolicited.
Broad-spectrum aggregated views of email deliveries.
Having poor list hygiene.
Sending out malicious emails due to a compromise (compromised account, webform, or content management system).
Showing indicators of phishing or other negative behaviors.
Other indicators of low reputation or abuse.
Multiple events and heuristics lead researchers to list IPs in the CSS.
Botnet Controller List (BCL)
This is a list of public IP addresses identified as being used by cybercriminals to control infected machines. These hosts are also referred to as botnet command and control servers (C&C or C2).
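One way to apply the BCL to the identification of compromised machines is to compare outbound connections from your own network against BCL-listed addresses. The following is a minimal sketch under stated assumptions: the `(source_ip, destination_ip)` log format, the `bcl_addresses` input, and the function name are all hypothetical, and how you obtain the listed addresses is outside the scope of this example.

```python
import ipaddress


def find_c2_contacts(connections, bcl_addresses):
    """Return internal hosts that contacted an address on the BCL.

    `connections` is an iterable of (source_ip, destination_ip) string
    pairs taken from local network logs; `bcl_addresses` is an iterable of
    BCL-listed IP strings. Both shapes are assumptions for this sketch.
    """
    # Normalize once so equivalent textual forms of an address compare equal.
    listed = {ipaddress.ip_address(addr) for addr in bcl_addresses}
    hits = {}
    for source, destination in connections:
        if ipaddress.ip_address(destination) in listed:
            # Group the contacted C&C addresses by the internal source host.
            hits.setdefault(source, set()).add(destination)
    return hits
```

Each key in the returned mapping is a machine worth investigating, with the set of listed addresses it contacted as the value.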
Domain reputation data
Every domain observed by the Spamhaus research team is accessible via SIA, which returns multiple data points relating to its reputation. Even newly observed domains with zero reputation are included.
Where threat hunters observe malicious behavior, the threat type is detailed, including phishing, malware, and botnet command and control servers (C&Cs), among other threats.
URLhaus (beta release)
abuse.ch is a globally recognized platform focused on identifying, tracking, and fighting malware. The URLhaus dataset, from abuse.ch, provides malicious URLs that are being used for malware distribution. Malicious URLs form the literal link between users and a malicious payload. It is crucial for businesses and end users to understand whether a link they are clicking on, including one that has been obfuscated, leads to malicious activity, be it a malicious payload or a phishing website.
URLhaus data is collected and shared by researchers, the cyber threat intelligence (CTI) community, and a number of large CTI vendors. The data is constantly tested to identify which URLs are active and which have been taken down, so it remains up to date and relevant.
The following fields are queryable/searchable through SIA:
URL
Host/IP
Malware family
Hash (SHA256 or MD5)
ASN
Tags (community-generated identifiers)
This list will be updated as more searchable fields are added.
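To illustrate how a client might validate a search before sending it, here is a small sketch covering the fields listed above. The lowercase field keys, the returned dictionary shape, and the function names are assumptions for this example, not SIA parameter names. The hash check relies only on the fact that an MD5 hex digest is 32 characters long and a SHA256 hex digest is 64.

```python
import re

# Searchable fields listed in this document; the lowercase keys are
# hypothetical names chosen for this sketch, not SIA parameter names.
SEARCH_FIELDS = ("url", "host", "malware_family", "hash", "asn", "tags")

_HEX = re.compile(r"[0-9a-fA-F]+")


def hash_type(value):
    """Return "md5" or "sha256" based on hex-digest length, else None."""
    if not _HEX.fullmatch(value):
        return None
    return {32: "md5", 64: "sha256"}.get(len(value))


def build_search(field, value):
    """Validate a field/value pair and return it as a plain dict."""
    if field not in SEARCH_FIELDS:
        raise ValueError(f"unsupported search field: {field!r}")
    query = {"field": field, "value": value}
    if field == "hash":
        kind = hash_type(value)
        if kind is None:
            raise ValueError("hash must be an MD5 or SHA256 hex digest")
        query["hash_type"] = kind
    return query
```

Keeping the allowed fields in one tuple means the sketch only needs a one-line change when more searchable fields become available.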