The Protocol - How to access the data
The Spamhaus Real Time Feed infrastructure is based on ‘websockets’.
A websocket is a full-duplex communication channel that runs over a single Transmission Control Protocol (TCP) connection. It is established like a normal HTTP(S) connection and then upgraded to full-duplex in place.
Websockets can be handled easily and do not require specific ports to be opened on firewalls and security devices. HTTP and HTTPS ports will usually suffice.
If you want to learn more about websockets, you can read RFC 6455.
Authentication
As mentioned above, the connection starts over HTTPS.
The authentication method follows the Spamhaus Intelligence API (SIA) authentication - documented here.
The login API call should be performed using the abusert realm. An example login call would look like this:
curl -s -d '
{
"username":"[email protected]",
"password":"the-secret-password",
"realm":"abusert"
}' https://api.spamhaus.org/api/v1/login
The response is a JSON object containing a token (a long base64-encoded string).
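The same login call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_login_request` and `extract_token` are hypothetical, and the assumption that the token sits under a top-level `"token"` key is not confirmed by the text above.

```python
import json
import urllib.request

# URL taken from the curl example above.
LOGIN_URL = "https://api.spamhaus.org/api/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the login POST for the abusert realm."""
    body = json.dumps(
        {"username": username, "password": password, "realm": "abusert"}
    ).encode("utf-8")
    # Passing data makes this a POST with a request body, like `curl -d`.
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


def extract_token(response_body: bytes) -> str:
    """Return the token from the login response.

    Assumption: the token is stored under a top-level "token" key.
    """
    return json.loads(response_body)["token"]
```

To perform the call, pass the request to `urllib.request.urlopen(...)` and hand the bytes returned by `.read()` to `extract_token`.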
To connect to the websocket, the HTTP(S) GET request (sent before the protocol upgrade described in RFC 6455) must contain an Authentication header formatted as outlined here in our technical documentation.
Websocket
The websockets are exposed at this URL:
wss://rt.spamhaus.net/streams/v1/abuse.ch/<FEEDNAME>
where <FEEDNAME> is a placeholder that should be replaced with a valid feed name.
The supported feed names are:
urlhaus
malwarebazaar
threatfox
yaraify
Feodotracker
sandnet
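As a small illustration, the feed URL can be assembled and validated before connecting. The helper `feed_url` is hypothetical; the feed names are copied verbatim from the list above, and whether the server treats them as case sensitive is not stated, so this sketch simply requires an exact match.

```python
BASE_URL = "wss://rt.spamhaus.net/streams/v1/abuse.ch/"

# Feed names copied verbatim from the documentation list.
FEED_NAMES = (
    "urlhaus",
    "malwarebazaar",
    "threatfox",
    "yaraify",
    "Feodotracker",
    "sandnet",
)


def feed_url(feed_name: str) -> str:
    """Return the websocket URL for a supported feed name."""
    if feed_name not in FEED_NAMES:
        raise ValueError(f"unknown feed name: {feed_name!r}")
    return BASE_URL + feed_name
```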
After connecting to the websocket and sending the proper Authentication header, you will need to instruct the server to start the real time stream flow.
All commands and responses are exchanged with the server over websocket text frames. See RFC 6455, section 5.2, for additional information.
The server sends ping frames every few seconds: your client must respond with a pong frame. This is a keepalive mechanism and is required to keep the connection alive. If the client doesn’t promptly send a pong frame after receiving a ping, then it will be disconnected.
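Most websocket client libraries answer ping frames automatically, so this rarely needs to be written by hand. For readers working at the frame level, the sketch below shows what a pong frame looks like on the wire according to RFC 6455: opcode 0xA, the FIN bit set, the client-side mask applied, and the ping's payload echoed back. The function name `build_pong_frame` is hypothetical.

```python
import os
from typing import Optional

OPCODE_PONG = 0xA  # RFC 6455, section 5.5.3


def build_pong_frame(ping_payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Encode a masked client-to-server pong frame echoing the ping payload."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame they send
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    # XOR each payload byte with the mask key, cycling through its 4 bytes.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(ping_payload))
    # First byte: FIN bit + opcode. Second byte: MASK bit + payload length.
    header = bytes([0x80 | OPCODE_PONG, 0x80 | len(ping_payload)])
    return header + mask_key + masked
```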
The server accepts the few simple commands explained below.
stop command
The client can send a text frame containing the stop command.
The server will respond to a stop command by sending a close frame containing the bye string.
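In RFC 6455 a close frame payload, when present, starts with a 2-byte status code in network byte order, followed by a UTF-8 reason. The sketch below decodes such a payload. It assumes the bye string is carried as the close reason, which the text above suggests but does not spell out; the helper name `parse_close_payload` and the status code 1000 used in the example are illustrative only.

```python
import struct
from typing import Optional, Tuple


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Split a close frame payload into (status code, reason text)."""
    if not payload:
        return None, ""  # a close frame may carry no payload at all
    if len(payload) < 2:
        raise ValueError("close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])  # big-endian unsigned 16-bit
    reason = payload[2:].decode("utf-8")
    return code, reason


# Example payload: status code 1000 (normal closure) plus the reason "bye".
code, reason = parse_close_payload(struct.pack("!H", 1000) + b"bye")
```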
status command
The client can send a text frame containing a status command.
The server will reply with a text frame containing the status of the server in this format: status <startindex> <endindex>
The <startindex> and <endindex> values are 64-bit unsigned integers indicating the index of the first message and the index of the last message available on the server.
This is useful because:
the server stores a few minutes of backlog that can be replayed upon request.
the client can ask the server to replay the backlog starting from a specific index. See the resume command below.
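A status reply can be parsed with a few lines of code. The sketch below assumes the two indexes are written as decimal integers separated by whitespace, which the format string implies but does not state; `Status` and `parse_status` are hypothetical names.

```python
from typing import NamedTuple

U64_MAX = 2**64 - 1  # indexes are 64-bit unsigned integers


class Status(NamedTuple):
    start_index: int  # first message available in the backlog
    end_index: int    # last message available in the backlog


def parse_status(frame_text: str) -> Status:
    """Parse a 'status <startindex> <endindex>' text frame."""
    parts = frame_text.split()
    if len(parts) != 3 or parts[0] != "status":
        raise ValueError(f"not a status reply: {frame_text!r}")
    start, end = int(parts[1]), int(parts[2])
    for value in (start, end):
        if not 0 <= value <= U64_MAX:
            raise ValueError(f"index out of 64-bit unsigned range: {value}")
    return Status(start, end)
```

For example, `parse_status("status 100 250")` yields a `Status` whose `start_index` is 100 and whose `end_index` is 250.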
start command
The client can send a text frame containing a start command.
Upon receiving that command, the server will then start sending the live feed in real time, starting from the next message available.
The resume command should not be used with the start command.
resume command
The client can send a text frame containing a resume <index> command.
The placeholder <index> must be replaced by an unsigned 64-bit integer representing the index of the message from which the client wants the server to replay the backlog.
Once the command has been sent, the server will start replaying the backlog and then continue with the real time flow.
The start command should not be used with the resume command.
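One possible way to choose between the two commands after a reconnection is sketched below. It relies on assumptions the text does not confirm: that `resume <index>` replays starting from the given index inclusively, and that asking for the oldest available index is the sensible fallback when the requested messages have already dropped out of the backlog. The function `choose_command` is hypothetical; its `start_index` and `end_index` arguments are the values reported by the status command.

```python
from typing import Optional


def choose_command(last_idx: Optional[int], start_index: int, end_index: int) -> str:
    """Pick the text frame to send: 'start' or 'resume <index>'.

    last_idx is the last _idx the client processed, or None on first run.
    """
    if last_idx is None or last_idx >= end_index:
        # Nothing to replay: go straight to the live feed.
        return "start"
    # Assumption: resume is inclusive, so ask for the first unseen message.
    # If it has already been dropped from the backlog, fall back to the
    # oldest index the server still holds (some messages are then lost).
    next_idx = max(last_idx + 1, start_index)
    return f"resume {next_idx}"
```

Exactly one of the two commands is sent per connection, consistent with the rule that start and resume should not be combined.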
Records format
The records are JSON formatted. No other format is allowed.
Each record can have several properties, described in the appropriate documentation chapter. It should be noted though, that each message has two additional properties that are added by the realtime backend to all the messages:
_idx contains the index of that specific record. Each index is a monotonically incrementing 64-bit unsigned integer that identifies each message.
_ts is the Unix timestamp of the record as it was received by the server.
The _idx number is essential for resuming the real time stream after a disconnection. The client can keep track of the last _idx received and send a resume command to fetch the messages that it may have lost in the meantime.
The _ts property is useful for detecting whether the real time feed has an issue and is lagging behind.
Please note that the server stores only a few minutes of backlog. The real time feed cannot replay full days of messages.
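The bookkeeping described above can be sketched as a small tracker that remembers the last `_idx` and reports how far each record lags behind the local clock. It assumes `_ts` is expressed in seconds, the usual meaning of a Unix timestamp but not stated explicitly here; `StreamTracker` and `observe` are hypothetical names, and the `url` field in the usage example is made up.

```python
import json
import time
from typing import Optional


class StreamTracker:
    """Remember the last _idx seen and measure feed lag from _ts."""

    def __init__(self) -> None:
        self.last_idx: Optional[int] = None

    def observe(self, frame_text: str, now: Optional[float] = None) -> float:
        """Record one JSON message and return its lag in seconds."""
        record = json.loads(frame_text)  # Python ints hold 64-bit values exactly
        self.last_idx = record["_idx"]
        if now is None:
            now = time.time()
        # Assumption: _ts is a Unix timestamp in seconds.
        return now - record["_ts"]


tracker = StreamTracker()
lag = tracker.observe('{"_idx": 42, "_ts": 1700000000, "url": "x"}', now=1700000005.0)
```

After a disconnection, `tracker.last_idx` is the value to base a resume command on, and a steadily growing lag suggests the feed is falling behind.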