Volume difference between Twitter Streaming API and DataSift Streaming API


#1

Hi,

We started testing the DataSift Streaming API to replace our current use of Twitter’s Streaming API (the free 1% pipe), and we noticed that the volume of tweets is nowhere near 100 times higher than what we get from the free 1% stream.
For example, our filter on Twitter would be “X,Y”, meaning that we look for tweets containing both keywords X and Y. On DataSift we use:

twitter.text contains_all “X,Y”

We know that the DataSift version is not exactly equivalent, but we would still expect a much higher volume from DataSift. Right now we get about 200 tweets/hour from the Twitter Streaming API and only 450 tweets/hour from DataSift.
Could someone help us understand this?


#2

Hi Ted,

Twitter's search functionality and DataSift's CSDL behave a bit differently. In a nutshell, Twitter's keyword search looks at the body of the tweet/retweet, any @mentions, and the URL of any link posted with the tweet/retweet. twitter.text looks only at the body of the tweet (not the retweet). There are other targets that you can include in your CSDL to broaden your results.
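
To make the difference concrete, here is a rough Python sketch of how matching on the body alone can miss tweets that a broader keyword search would catch. It is only a toy model: the tweet dictionaries, the helper functions and the keywords "coffee" and "paris" (standing in for X and Y) are all hypothetical, and it does not reproduce Twitter's or DataSift's actual matching logic.

```python
# Toy model only: the data and helpers below are hypothetical and do not
# reproduce Twitter's or DataSift's real matching behaviour.

def contains_all(haystack, keywords):
    """Return True if every keyword occurs in haystack (case-insensitive)."""
    lowered = haystack.lower()
    return all(k.lower() in lowered for k in keywords)

def matches_body_only(tweet, keywords):
    # Loosely analogous to filtering on the tweet body alone.
    return contains_all(tweet["text"], keywords)

def matches_broad(tweet, keywords):
    # Loosely analogous to a search that also sees @mentions and link URLs.
    searchable = " ".join([tweet["text"], *tweet["mentions"], *tweet["urls"]])
    return contains_all(searchable, keywords)

tweets = [
    # Both keywords in the body: matched either way.
    {"text": "coffee in paris", "mentions": [], "urls": []},
    # Second keyword appears only in the link URL: missed by body-only matching.
    {"text": "great coffee this morning", "mentions": [],
     "urls": ["http://example.com/paris-guide"]},
    # Only one keyword, and only in a mention: matched by neither.
    {"text": "nothing relevant", "mentions": ["coffee_fan"], "urls": []},
]

keywords = ["coffee", "paris"]  # placeholders for X and Y
body_only = [t for t in tweets if matches_body_only(t, keywords)]
broad = [t for t in tweets if matches_broad(t, keywords)]

print(len(body_only), "matched on body only;", len(broad), "matched broadly")
```

In this sketch the second tweet is the kind of result that would only show up once additional targets are added alongside twitter.text.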

Additional information can be found in this blog post: Migrating from the Twitter Streaming API

Regards,
Victor


#3

Hi Victor,

We did try to use the following targets (from http://dev.datasift.com/blog/migrating-from-the-twitter-streaming-api):
twitter.text
twitter.domains
twitter.mentions
twitter.retweet.text
twitter.retweet.domains

but the volume of tweets is still only 2 to 3 times higher than the free stream from the Twitter Streaming API, instead of being 100 times higher.


#4

Are you able to provide us with the CSDL you are using, along with the corresponding Twitter API query? If you do not wish for it to be public, please feel free to submit a Support ticket and we will continue looking into it.