Interaction rate by provider type


#1

Greetings,

While using the DataSift streaming API, we’ve noticed somewhat puzzling patterns in the rate at which we receive data from various providers. It appears that, at least for some providers, data comes in large batches, with long pauses in between. For example, when we ask for all content from YouTube, we sometimes get many comments at once, and sometimes get none at all for an hour or more.

In fact, even now, a stream with the CSDL interaction.type == "youtube" has been running for more than an hour with no data. We’re also not receiving any data from one of your own sample streams (http://datasift.com/stream/19344/youtubetitle#app1-preview), so while there may be issues with the CSDL we’re using, that’s definitely not the only issue.

We’ve also noticed the same behavior with Facebook, where we sometimes get multiple results per second, and sometimes nothing at all for several minutes at a time. Twitter, however, appears to generate results all the time.

With that in mind, we’d like to know whether:

  1. This is expected behavior (perhaps caused by the batching of data performed by the data providers themselves?) or the result of us doing something incorrectly; and

  2. If it’s how it should be, do you have approximate expected values for the length of time between consecutive batches for Facebook and YouTube (and other providers, if you have them)?
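One way to put numbers on these gaps from the client side is to log each interaction as it arrives and measure the idle periods per provider. The sketch below assumes you record a (provider, local arrival time) pair for every interaction you receive; the function name batch_gaps and the 30-second threshold that separates "same batch" from "new batch" are hypothetical choices for illustration, not part of the DataSift API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def batch_gaps(arrivals, max_gap=timedelta(seconds=30)):
    """Return, per provider, the idle gaps between consecutive batches.

    arrivals: iterable of (provider, arrival_time) pairs recorded locally
    as each interaction is received. Arrivals no more than max_gap apart
    are treated as one batch (max_gap is an assumed threshold).
    """
    by_provider = defaultdict(list)
    for provider, ts in arrivals:
        by_provider[provider].append(ts)

    gaps = {}
    for provider, times in by_provider.items():
        times.sort()
        # A gap larger than max_gap marks the boundary between two batches:
        # it runs from the last arrival of one batch to the first of the next.
        gaps[provider] = [
            later - earlier
            for earlier, later in zip(times, times[1:])
            if later - earlier > max_gap
        ]
    return gaps


# Hypothetical arrival log: two quick YouTube comments, then a long pause.
t0 = datetime(2012, 1, 1, 12, 0, 0)
arrivals = [
    ("youtube", t0),
    ("youtube", t0 + timedelta(seconds=2)),
    ("youtube", t0 + timedelta(hours=1, minutes=10)),
    ("twitter", t0),
    ("twitter", t0 + timedelta(seconds=1)),
]
print(batch_gaps(arrivals))
```

Collecting these gaps over a day or two would give a rough empirical distribution per provider, which can then be compared against whatever figures the provider side can share.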

Thank you.


#2

This is expected behaviour. Twitter provides us with a dedicated, real-time, Firehose delivery system, so you typically receive every Tweet less than two seconds after it is sent by the Twitter user.

We are actively working on improving our Facebook delivery model. We rely on polling the Facebook Open-Graph API to receive our Facebook interactions, so these interactions may sometimes appear to arrive in batches.
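To illustrate why polling produces bursts, here is a minimal simulation, assuming a fixed polling interval starting at time zero: each post only becomes visible at the first poll at or after it was made, so posts spread evenly over time are delivered together at poll times. The function name and the 60-second interval are hypothetical; the actual polling schedule used for Facebook is not stated here, and real polling also involves pagination and rate limits that this sketch ignores.

```python
from collections import Counter


def delivery_times(post_times, poll_interval):
    """Simulate fixed-interval polling (polls at 0, poll_interval, 2*poll_interval, ...).

    Each item is delivered at the first poll at or after its post time,
    so items posted between two polls arrive together in one burst.
    """
    delivered = []
    for t in post_times:
        polls_needed = -(-t // poll_interval)  # ceiling division
        delivered.append(polls_needed * poll_interval)
    return delivered


# Hypothetical post times in seconds, polled every 60 seconds.
posts = [1, 7, 42, 65, 118]
delivered = delivery_times(posts, 60)
print(delivered)
print(Counter(delivered))  # batch size per poll time
```

Five posts spread over two minutes arrive as two bursts, one at each poll, with nothing in between; longer or irregular polling intervals stretch the quiet periods accordingly.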

We receive our data for the YouTube source in large batches, so our results are also delivered in batches as we process them. Unfortunately, the time between batches does vary, and I cannot reliably give you an estimate of how long it usually is.