Extended XBL (eXBL)

This is the metadata-enriched version of the eXploit BlockList (XBL): a list of public IP addresses on which our behavioral heuristics have detected indicators of compromised machines.
It is distributed as a single JSON file containing every listing live at the time the file was generated. This file is named eXBL to distinguish it from the plain DNSBL list, named XBL, which carries no metadata. The same data can also be queried through the API as the XBL dataset, which additionally provides historical data. In both cases, the record format is exactly the same.
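For consumers of the JSON file, a minimal Python loader is sketched below. The exact on-disk layout is an assumption here, not something stated above: the code accepts either a single JSON array of records or newline-delimited JSON with one record object per line. The function name load_exbl is hypothetical.

```python
import json
from typing import Any


def load_exbl(text: str) -> list[dict[str, Any]]:
    """Parse eXBL JSON text into a list of record dicts.

    Assumption: the file is either one JSON array of records or
    newline-delimited JSON (one record object per line).
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Not a single JSON document: fall back to one object per line.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        # A newline-delimited file with a single record parses as one object.
        return [data]
    raise ValueError("unexpected top-level JSON type")
```

Typical use, with a hypothetical local filename, would be to open "exbl.json" with encoding="utf-8" and pass its contents to load_exbl.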
Each record represents a single “detection” for the given IP. The same IP can appear in multiple records when multiple bots (or what our analysts believe to be multiple bots) have been identified on it.
Each record is composed of the following fields:
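Because one IP can carry several detections, code consuming the data should not assume one record per IP. A minimal sketch of grouping already-parsed records by their ipaddress field follows; the function name detections_by_ip is hypothetical.

```python
from collections import defaultdict
from typing import Any


def detections_by_ip(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group eXBL records by their ipaddress field.

    One IP may have several records (e.g. one per bot identified on it),
    so each IP maps to a list of records kept in their original order.
    """
    grouped: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[record["ipaddress"]].append(record)
    return dict(grouped)
```

From the grouped result, the set of bot names seen on each IP can be derived by collecting the botname value of every record in each list.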

  • ipaddress The IP address identified as the source of the bot-generated traffic. Always provided.

  • botname The bot name associated with the detected activity. Where a clear association is not possible, “unknown” will be returned. Always provided.

  • botnam_malpedia The bot name associated with the detected activity, as named by Malpedia. Where a clear association is not possible, “unknown” will be returned. Not always provided, particularly for historical listings.

  • seen The Unix timestamp (rounded to the minute) of the last detected event for the given IP and the given botname. Always provided.

  • firstseen The Unix timestamp (rounded to the minute) of the first detection event for this IP+botname combination. It matches the value of seen on the first sighting of this combination on this IP. If the combination has shown no activity for a month, the field is reset the next time it is observed. Always provided.

  • listed The Unix timestamp (rounded to the minute) of when the entry reached our database. This is usually very close to the value of seen, unless the data comes from batch processes. Always provided.

  • valid_until The Unix timestamp (rounded to the minute) at which the entry will be considered “expired” in our dataset. Always provided.

  • detection A brief human-readable description of how the data was collected. This field appears only when the heuristic can collect data in more than one way.

  • rule An internal ID identifying the rule that produced the detection. Detections produced by different means or rules carry different IDs, even when they refer to the same detection. Always provided.

  • dstport The destination port of the traffic that triggered the detection. Not always disclosed/available.

  • helo For detections derived from SMTP traffic, the HELO string used in the SMTP session that triggered the detection.

  • helos An array enumerating all the HELO strings involved in the detection. Appears only in records for the MPD heuristic.

  • heuristic The heuristic applied to generate the detection. It takes one of a limited set of values.

  • asn The Autonomous System Number (ASN) announcing the IP, predominantly obtained from RouteViews data.

  • lat Geographic latitude of the IP. Only provided when geolocation data is available.

  • lon Geographic longitude of the IP. Only provided when geolocation data is available.

  • cc The ISO country code of the country where the IP resides. Only provided when geolocation data is available.

  • protocol The IP protocol of the traffic that triggered the detection, usually UDP or TCP.

  • srcip The source IP of the traffic that triggered the detection. For IPv4, this matches the listed IP except in rare cases. For IPv6, where XBL listings have /64 granularity, this gives the specific address (/128) that caused the listing.

  • uri Specific to the “SINKHOLE” heuristic, and to HTTP sinkhole detections in particular. The URI of the HTTP request that triggered the listing. Not always available.

  • useragent Specific to the “SINKHOLE” heuristic, and to HTTP sinkhole detections in particular. It is the User-Agent header of the HTTP request triggering the listing. Not always available.

  • domain Mostly specific to the “SINKHOLE” heuristic, and to HTTP sinkholes in particular. The domain or hostname that the traffic triggering the detection was trying to reach, i.e., the sinkholed domain. Often obtained from the Host header of the HTTP request that triggered the listing. Not always available.
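The field semantics above can be captured in a small typed wrapper. This is a sketch under stated assumptions, not an official client: the class and method names are hypothetical, the required set mirrors the fields marked “Always provided” (ipaddress, botname, seen, firstseen, listed, valid_until, rule), the type of rule is left open because it is not documented, and ipaddress is assumed to be a bare address, optionally followed by a prefix length. All other fields (asn, cc, srcip, helo and so on) may be absent, so they are kept in a dictionary.

```python
import ipaddress as ipaddr
import time
from dataclasses import dataclass, field
from typing import Any, Optional

# Fields documented as "Always provided".
REQUIRED = ("ipaddress", "botname", "seen", "firstseen", "listed", "valid_until", "rule")


@dataclass
class XblRecord:
    ipaddress: str
    botname: str
    seen: int          # Unix time of the last event for this IP+botname
    firstseen: int     # Unix time of the first event for this IP+botname
    listed: int        # Unix time the entry reached the database
    valid_until: int   # Unix time the entry expires
    rule: Any          # internal rule ID; its type is an assumption
    optional: dict[str, Any] = field(default_factory=dict)  # asn, cc, srcip, ...

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "XblRecord":
        missing = [k for k in REQUIRED if k not in raw]
        if missing:
            raise ValueError(f"record lacks required fields: {missing}")
        rest = {k: v for k, v in raw.items() if k not in REQUIRED}
        return cls(
            ipaddress=raw["ipaddress"],
            botname=raw["botname"],
            seen=int(raw["seen"]),
            firstseen=int(raw["firstseen"]),
            listed=int(raw["listed"]),
            valid_until=int(raw["valid_until"]),
            rule=raw["rule"],
            optional=rest,
        )

    def is_expired(self, now: Optional[float] = None) -> bool:
        """True once the given (or current) time has reached valid_until."""
        current = time.time() if now is None else now
        return current >= self.valid_until

    def looks_like_first_sighting(self) -> bool:
        """True when firstseen equals seen, as on the first sighting of this
        IP+botname (or the first after a month of inactivity reset it)."""
        return self.seen == self.firstseen

    def srcip_in_listing(self) -> Optional[bool]:
        """Compare srcip with the listed IP: exact match for IPv4, same /64
        for IPv6. Returns None when srcip is absent."""
        src = self.optional.get("srcip")
        if src is None:
            return None
        # Assumption: drop any prefix length carried by the listed value.
        listed = ipaddr.ip_address(self.ipaddress.split("/")[0])
        source = ipaddr.ip_address(src)
        if listed.version != source.version:
            return False
        if listed.version == 4:
            return listed == source
        return source in ipaddr.ip_network(f"{listed}/64", strict=False)
```

The IPv6 branch tests whether srcip falls inside the /64 containing the listed address, mirroring the /64 granularity described for srcip. The IPv4 branch simply tests equality, so it returns False in the rare cases where the two addresses differ.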