Continuous update


Hello, I’m trying to understand the differences between the data fetching methods. I'm wondering about data loss during connection interruptions, whether caused by a network failure or by reaching account limits.

Suppose I would like my filtered data to be really up to date. If I use a streaming connection over a WebSocket and resume the stream after a pause, will I lose the interactions generated between the start of the pause and the resume? Or will those interactions be buffered for a while?

Should I use the push method for this instead of streaming?



Your two options are either using the Live Streaming API, or Push Delivery.

The Live Streaming API is as close to real-time delivery as you can get, but it is not fault tolerant. If your connection is interrupted and dropped, any interactions generated between the connection dropping and you reconnecting will be lost.

Push Delivery is the preferred delivery method. You can set your delivery frequency to '0' for continuous delivery; this typically adds at most a couple of seconds of latency compared with the Live Streaming API. Unlike the Live Streaming API, Push Delivery is fault tolerant: DataSift will buffer your data for at least an hour before it is discarded, so if you are unable to receive data for a few minutes, or you need to take your endpoint down for maintenance, it's not a problem! DataSift will simply keep trying to send your data until your endpoint can accept data again. We also offer the ability to push your data straight into a number of different destinations, including custom HTTP endpoints, (S)FTP, S3, MongoDB, and many more.
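To make the difference concrete, here is a toy model of the two behaviours during an outage. It is only a sketch and not DataSift's implementation: the function names, the interaction dictionaries, and the fixed 3600-second retention window are assumptions chosen for illustration (the answer above only says "at least an hour").

```python
RETENTION_SECONDS = 3600  # assumed value; the thread only states "at least an hour"


def stream_delivery(interactions, outage):
    """Live streaming model: anything generated while disconnected is lost."""
    start, end = outage
    return [i for i in interactions if not (start <= i["t"] < end)]


def push_delivery(interactions, outage, retention=RETENTION_SECONDS):
    """Push model: items generated during the outage are buffered and sent
    once the endpoint recovers, unless they waited longer than `retention`."""
    start, end = outage
    delivered, buffered = [], []
    for item in interactions:
        if start <= item["t"] < end:
            buffered.append(item)    # endpoint is down: hold the item
        else:
            delivered.append(item)   # endpoint is up: deliver right away
    # The endpoint recovers at `end`; drop anything older than the retention window.
    recovered = [i for i in buffered if end - i["t"] <= retention]
    return sorted(delivered + recovered, key=lambda i: i["t"])


# Six interactions, one every 100 seconds, with an outage from t=150 to t=450.
interactions = [{"id": n, "t": n * 100} for n in range(6)]
streamed = stream_delivery(interactions, (150, 450))
pushed = push_delivery(interactions, (150, 450))
print(len(streamed), len(pushed))
```

In this model the streaming consumer misses the three interactions generated during the outage, while the push consumer receives all six after recovery. Only an outage longer than the retention window causes loss on the push side.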


Thank you, it is clearer now. One thing though: the data destination seems to be optional. When it is not specified, is the data delivered over a pull connection? Also, is the WebSocket endpoint only for live streaming, or can it be used with push as well?


When you create a new Push Subscription, you must specify where you want your data to be pushed to. This can be a specific destination, such as a MongoDB instance or an S3 bucket, or you can specify 'pull' and then pull your data from anywhere you like.
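As a rough illustration of choosing 'pull' as the destination, the sketch below only assembles the request parameters and does not call any API. The helper name `build_pull_subscription` is hypothetical, and the parameter names (`name`, `hash`, `output_type`, `output_params.format`) and the `json_meta` format value are assumptions that should be checked against the Push API documentation before use.

```python
def build_pull_subscription(name, stream_hash, fmt="json_meta"):
    """Return form parameters for a Push Subscription whose destination is 'pull'.

    Hypothetical helper: parameter names are assumed, not confirmed by this thread.
    """
    if not name or not stream_hash:
        raise ValueError("name and stream_hash are required")
    return {
        "name": name,                  # label for the subscription
        "hash": stream_hash,           # hash of the compiled stream to deliver
        "output_type": "pull",         # the destination must always be given; here it is 'pull'
        "output_params.format": fmt,   # assumed name of the output format option
    }


# "<stream hash>" is a placeholder, not a real value.
params = build_pull_subscription("my filtered stream", "<stream hash>")
print(params["output_type"])
```

Authentication, sending the request, and the later pull requests are left out on purpose, since none of those details appear in this thread.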

Regarding WebSocket streaming, this is only available for the Live Streaming API; it is currently not possible to push data over a WebSocket connection.


Thank you, it is clear now.