The Protocol - How to access the data

The Spamhaus Real Time Feed infrastructure is based on ‘websockets’.

A websocket is a Transmission Control Protocol (TCP) communication channel that provides full-duplex capabilities over an upgraded HTTP(S) connection. It is established like a normal HTTP(S) request and then upgraded to full-duplex in place.

Websockets are easy to deploy: they do not require dedicated ports to be opened on firewalls and security devices, because the standard HTTP and HTTPS ports usually suffice.

If you want to learn more about websockets, see RFC 6455.

Authentication

As mentioned above, the connection starts over HTTPS.

The authentication method follows the Spamhaus Intelligence API (SIA) authentication - documented here.

The login API call should be performed using the abusert realm. An example login call would look like this:

curl -s -d '
    {
    "username":"user@example.com",
    "password":"the-secret-password",
    "realm":"abusert"
    }' https://api.spamhaus.org/api/v1/login

You will receive a JSON object containing a token (a long base64-encoded string).
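As a rough illustration of the login step, the Python sketch below builds the same request body as the curl call and extracts the token from the response. The response key name `token` and the helper names (`build_login_request`, `extract_token`, `login`) are assumptions made for this example; the SIA authentication documentation remains the reference for the exact response layout.

```python
import json
import urllib.request

LOGIN_URL = "https://api.spamhaus.org/api/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request for the login call, using the abusert realm."""
    body = json.dumps(
        {"username": username, "password": password, "realm": "abusert"}
    ).encode("utf-8")
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


def extract_token(response_body: str) -> str:
    """Return the token found in the login response.

    Assumption: the JSON object stores the token under the key "token".
    """
    payload = json.loads(response_body)
    token = payload.get("token")
    if not token:
        raise ValueError("login response did not contain a token")
    return token


def login(username: str, password: str) -> str:
    """Perform the login call over HTTPS and return the token."""
    request = build_login_request(username, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_token(response.read().decode("utf-8"))
```

Calling `login("user@example.com", "the-secret-password")` would perform the network request; the two helper functions can be exercised offline.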

To connect to the websocket, the HTTP(S) GET request (sent before the protocol upgrade described in RFC 6455) must contain an Authentication header formatted as outlined here in our technical documentation.

Websocket

The websockets are exposed at this URL:

wss://rt.spamhaus.net/streams/v1/abuse.ch/<FEEDNAME>

where <FEEDNAME> is a placeholder that should be replaced with a proper Feed name.

The supported Feed names are:

  • urlhaus

  • malwarebazaar

  • threatfox

  • yaraify

  • feodotracker

  • sandnet
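The URL construction can be sketched as follows. This is a minimal example, assuming Feed names are used in lowercase in the URL path; `FEED_NAMES` and `feed_url` are hypothetical names introduced here for illustration.

```python
BASE_URL = "wss://rt.spamhaus.net/streams/v1/abuse.ch/"

# Feed names from the list above (assumed to be lowercase in the URL).
FEED_NAMES = (
    "urlhaus",
    "malwarebazaar",
    "threatfox",
    "yaraify",
    "feodotracker",
    "sandnet",
)


def feed_url(feed_name: str) -> str:
    """Return the websocket URL for a supported Feed, or raise ValueError."""
    name = feed_name.strip().lower()
    if name not in FEED_NAMES:
        raise ValueError(f"unsupported Feed name: {feed_name!r}")
    return BASE_URL + name
```

Validating the name before connecting turns a typo into an immediate local error rather than a failed handshake.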

After connecting to the websocket and sending the proper Authentication header, you will need to instruct the server to start the real time stream flow.

All commands and responses are exchanged with the server as websocket text frames. See RFC 6455, section 5.2, for additional information.

The server sends ping frames every few seconds: your client must respond with a pong frame. This is a keepalive mechanism and is required to keep the connection alive. If the client doesn’t promptly send a pong frame after receiving a ping, then it will be disconnected.

The server accepts the few simple commands explained below.

stop command

The client can send a text frame containing the stop command. The server will respond to a stop command by sending a close frame containing the bye string.

status command

The client can send a text frame containing a status command. The server will reply with a text frame containing the status of the server in this format: status <startindex> <endindex>

The <startindex> and <endindex> values are 64-bit unsigned integers indicating the index of the first message and the index of the last message available on the server.

This is useful because:

  • the server stores a few minutes of backlog that can be replayed upon request.

  • the client can ask the server to replay the backlog starting from a specific index. See the resume command below.
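A client could parse the status reply along these lines. This is only a sketch: `ServerStatus`, `parse_status` and `in_backlog` are hypothetical names, and the reply is assumed to be exactly three whitespace-separated fields as in the format shown above.

```python
from typing import NamedTuple

U64_MAX = 2**64 - 1  # largest value of a 64-bit unsigned integer


class ServerStatus(NamedTuple):
    start_index: int  # index of the first message available on the server
    end_index: int    # index of the last message available on the server


def parse_status(frame: str) -> ServerStatus:
    """Parse a 'status <startindex> <endindex>' text frame."""
    parts = frame.split()
    if len(parts) != 3 or parts[0] != "status":
        raise ValueError(f"not a status reply: {frame!r}")
    start, end = int(parts[1]), int(parts[2])
    for value in (start, end):
        if not 0 <= value <= U64_MAX:
            raise ValueError(f"index out of 64-bit unsigned range: {value}")
    return ServerStatus(start, end)


def in_backlog(status: ServerStatus, index: int) -> bool:
    """Tell whether a given index is still available for replay."""
    return status.start_index <= index <= status.end_index
```

For example, `parse_status("status 100 250")` yields a `ServerStatus` with `start_index` 100 and `end_index` 250, and `in_backlog` then reports whether a saved index can still be replayed.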

start command

The client can send a text frame containing a start command. Upon receiving that command, the server will then start sending the live feed in real time, starting from the next message available.

The resume command should not be used with the start command.

resume command

The client can send a text frame containing a resume <index> command. The <index> placeholder must be replaced with an unsigned 64-bit integer representing the index of the message from which the client wants the server to replay the backlog. Once the command has been received, the server replays the backlog and then continues with the real time flow.

The start command should not be used with the resume command.
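Since start and resume are mutually exclusive, a client has to pick one of them when it connects. The sketch below shows one possible policy, not a prescribed one. It assumes that the replay includes the message at the requested index (so the client asks for the last index it saw plus one), and that the optional backlog bounds come from a previous status reply. `opening_command` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def opening_command(
    last_idx: Optional[int],
    backlog: Optional[Tuple[int, int]] = None,
) -> str:
    """Choose the single command to send after connecting.

    last_idx: last index received before the disconnection, or None.
    backlog:  optional (startindex, endindex) pair from a status reply.
    """
    if last_idx is None:
        # Nothing received so far: just follow the live feed.
        return "start"
    next_idx = last_idx + 1  # assumption: the replay includes <index> itself
    if backlog is None:
        return f"resume {next_idx}"
    start, end = backlog
    if next_idx > end:
        # Nothing was missed, so the live feed is enough.
        return "start"
    # If the wanted message already left the backlog, replay from the
    # oldest message still available.
    return f"resume {max(next_idx, start)}"
```

With this policy, a client that last saw index 41 sends `resume 42`; if the backlog now starts at 100, it sends `resume 100` instead and knows that some messages were lost.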

Records format

The records are JSON formatted. No other format is allowed.

Each record can have several properties, described in the appropriate documentation chapter. In addition, the real time backend adds two properties to every message:

  • _idx contains the index of that specific record. Each index is a monotonically increasing 64-bit unsigned integer that identifies the message.

  • _ts is the Unix timestamp of the record as it was received by the server.

The _idx number is essential for resuming the real time stream after a disconnection. The client can keep track of the last _idx received and send a resume command to fetch the messages that it may have lost in the meantime.

The _ts property helps detect whether the real time feed has an issue and is lagging behind.

Please note that the server stores only a few minutes of backlog. The real time feed cannot replay full days of messages.
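The bookkeeping described in this section might look like the following sketch. It assumes that _ts is expressed in seconds, that consecutive records carry consecutive _idx values (so a jump indicates missed messages), and that only record frames are passed to it. `StreamTracker` is a hypothetical class name.

```python
import json
import time
from typing import Optional


class StreamTracker:
    """Remember the last _idx seen and measure how far the feed lags."""

    def __init__(self) -> None:
        self.last_idx: Optional[int] = None

    def handle(self, frame: str, now: Optional[float] = None) -> dict:
        """Decode one JSON record and update the tracking state."""
        record = json.loads(frame)
        idx = int(record["_idx"])
        ts = float(record["_ts"])  # assumption: Unix time in seconds
        # Assumption: indexes are consecutive, so a jump means lost records.
        missed = 0 if self.last_idx is None else max(idx - self.last_idx - 1, 0)
        self.last_idx = idx
        current = time.time() if now is None else now
        return {"record": record, "missed": missed, "lag_seconds": current - ts}
```

After a disconnection, `tracker.last_idx` is the value to use when building the resume command, and a steadily growing `lag_seconds` suggests the feed is falling behind.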