FAQ: How best to consume data via Push, quickly and reliably



This FAQ describes some quick and easy methods to ensure your Push subscriptions deliver data quickly and reliably. All these techniques are achievable using DataSift’s standard Push Delivery configuration options and Push API.

Compress your data

DataSift makes it easy to receive compressed data at your data destinations: simply select the relevant compression option when setting up your Push Destination. Compression can shrink a 50MB JSON file to ~10MB, which saves storage space on your machine and leaves less data to transfer, so your files are delivered as quickly as possible.
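
As a rough illustration of the receiving side, the sketch below decompresses a delivered payload and parses it. It assumes gzip was the compression option chosen and that the payload is newline-delimited JSON (one interaction per line); both are assumptions made for this example and depend on how your destination is configured. The function name read_interactions is hypothetical.

```python
import gzip
import json


def read_interactions(payload: bytes) -> list:
    """Decompress a gzip payload and parse one JSON object per non-empty line.

    Assumes gzip compression and newline-delimited JSON; adjust both to
    match the options selected for your Push Destination.
    """
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Build a small compressed sample in place of a real delivery.
    sample = b'{"interaction": {"id": "a1"}}\n{"interaction": {"id": "b2"}}\n'
    compressed = gzip.compress(sample)
    for item in read_interactions(compressed):
        print(item["interaction"]["id"])
```

If your destination writes files rather than posting to an HTTP endpoint, the same function can be fed the bytes read from each delivered file.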

High Delivery Frequency

Continuous delivery is our recommended delivery frequency; select it unless you have a good reason not to.

Large Max File Sizes

You should select the largest max file size your application can handle. Delivering fewer, larger files carries slightly lower overhead than delivering many smaller files, which leads to faster delivery times.

Using the API to monitor your Push subscriptions

We have a number of API endpoints you should use to monitor your Push subscriptions:

  • /push/get - allows you to check the status of your Push subscription. The response includes the status of the job, the delivery count (when using API v1.1), the lost_data flag, which indicates whether this subscription has lost data, and, most importantly, the remaining_bytes value. This is the number of bytes currently in DataSift’s Push buffer, waiting to be delivered. If this value is constantly increasing, the subscription may be matching data faster than you are able to consume it.
    If you are consuming a Historics query when this happens, you can call the /historics/pause endpoint to pause the query, giving your Push buffer time to clear before you resume it.
    If this happens while you are consuming a live Push subscription, it may be a sign that you need to increase the resources or bandwidth available to your Push endpoint, or that you need to break your filter up into a number of smaller filters that each return less data.
  • /push/log - shows any error log messages associated with your Push subscription. This may include error messages returned from your Push endpoint. For example, DataSift may be temporarily unable to reach your endpoint; an event like this would be logged to your /push/log.
  • /historics/get - allows you to check the progress of your Historics jobs. If you are using v1.1 of the API, you will receive a delivery_count field in your response which will tell you how many interactions this Historics query has matched so far.
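
The remaining_bytes check described above can be sketched as a small helper that you call each time you poll /push/get. This is a minimal illustration, not DataSift client code: it works on an already-parsed response dictionary and reads only the lost_data and remaining_bytes fields named in this FAQ. The function names, the three-poll window, and the warning texts are all hypothetical choices for the example; making the HTTP request itself is left to whichever client you already use.

```python
from typing import List, Sequence


def buffer_is_growing(samples: Sequence[int], window: int = 3) -> bool:
    """Return True if remaining_bytes rose on every one of the last `window` polls."""
    recent = list(samples)[-window:]
    if len(recent) < window:
        return False  # not enough polls yet to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def review_subscription(response: dict, history: List[int],
                        is_historics: bool = False) -> List[str]:
    """Record the latest remaining_bytes in `history` and return any warnings.

    `response` is an already-parsed /push/get reply (hypothetical shape:
    a dict with `lost_data` and `remaining_bytes` keys).
    """
    warnings = []
    if response.get("lost_data"):
        warnings.append("subscription has lost data")
    history.append(int(response.get("remaining_bytes", 0)))
    if buffer_is_growing(history):
        if is_historics:
            warnings.append("buffer growing: consider /historics/pause")
        else:
            warnings.append("buffer growing: add capacity or split the filter")
    return warnings


if __name__ == "__main__":
    # Simulated poll results standing in for real /push/get responses.
    history: List[int] = []
    for size in (1_000, 5_000, 20_000):
        reply = {"lost_data": False, "remaining_bytes": size}
        for warning in review_subscription(reply, history, is_historics=True):
            print(warning)
```

Looking only at the most recent polls means a single dip in the buffer resets the trend, so a brief burst of data does not trigger a warning on its own.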

I hope this has been informative; if you have questions, please feel free to comment on this post, or raise other topics in the Data Delivery and Destinations forum.