While using the DataSift streaming API, we’ve noticed somewhat puzzling patterns in the rate at which we receive data from various providers. It appears that, at least for some providers, data arrives in large batches with long pauses in between. For example, when we ask for all content from YouTube, we sometimes get many comments at once and sometimes get none at all for an hour or more.
In fact, even now, a stream with the CSDL of
interaction.type == "youtube"
has been running for more than an hour with no data. We’re also not receiving any data from one of your own sample streams (http://datasift.com/stream/19344/youtubetitle#app1-preview), so while it’s possible that there are issues with the CSDL we’re using, that can’t be the only cause.
We’ve also noticed the same behavior with Facebook, where we sometimes get multiple results per second, and sometimes nothing at all for several minutes at a time. Twitter, however, appears to generate results all the time.
With that in mind, we’d like to know:
Whether this is expected behavior (perhaps caused by the data providers themselves batching their data?) or the result of us doing something incorrectly; and
If it is expected, whether you have approximate values for the typical length of time between two consecutive batches for Facebook and YouTube (and other providers, if you have them).
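For anyone who wants to put numbers on the gaps described above, here is a minimal sketch in Python. It does not call the DataSift client at all. It assumes you record, for each interaction received, a local arrival time in seconds and the value of its interaction.type field. The function names arrival_gaps and summarize are hypothetical helpers, not part of any DataSift library.

```python
import statistics
from collections import defaultdict


def arrival_gaps(events):
    """Compute gaps between consecutive arrivals of the same interaction type.

    `events` is an iterable of (arrival_time_seconds, interaction_type) pairs
    in arrival order. Both fields are assumed to be recorded locally by the
    consumer; nothing here comes from a DataSift API.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for arrived_at, source in events:
        if source in last_seen:
            gaps[source].append(arrived_at - last_seen[source])
        last_seen[source] = arrived_at
    return dict(gaps)


def summarize(gaps):
    """Return the longest and median gap (in seconds) per interaction type."""
    return {
        source: {
            "max_gap": max(values),
            "median_gap": statistics.median(values),
        }
        for source, values in gaps.items()
        if values  # skip types seen only once
    }


# Made-up arrival times, purely for illustration.
sample_events = [
    (0.0, "twitter"),
    (1.0, "twitter"),
    (1.5, "facebook"),
    (2.0, "twitter"),
    (241.5, "facebook"),
    (242.0, "facebook"),
]
sample_summary = summarize(arrival_gaps(sample_events))
```

With the made-up sample above, the Facebook entries show one long pause followed by a quick pair of arrivals, which is the batch-then-silence shape we are asking about. A large difference between the maximum and the median gap for a provider is a simple signal of that pattern.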