Too little data from the DataSift streaming API and Recordings


#1

I am getting far less data from the DataSift streaming API and "Recordings" than from Twitter's official streaming API with the same tracking words. My filter uses the following CSDL: twitter.text contains "nhk" or twitter.text contains "nhk_kouhaku". By my estimate, your API delivers only 20%-30% of the volume that Twitter's free streaming API generally does. Is there something I forgot to do?


#2

Thanks for spotting this. We will do some research into this, and get back to you as soon as we can.


#3

After looking into this issue, we have found that Twitter's track parameter actually searches additional parts of the Tweet object, not just the text.

For example, I ran the following query on the Twitter API to look for the word "techcrunch":

curl -d "track=techcrunch" "https://stream.twitter.com/1/statuses/filter.json" -uMyScreenName

which returned the following JSON (I have removed some irrelevant fields for brevity):

 

 

The search term "techcrunch" does not appear in the Tweet's text field; however, it does appear in the URL attached to the Tweet.

The CSDL you were using to filter your stream within DataSift uses the twitter.text target, which only searches the Tweet's text element. If you want to extend the filter to URLs as well, you could add some of our links targets, for example: links.title contains "techcrunch" or links.url contains "techcrunch"
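
To illustrate why the two filters return different volumes, here is a minimal Python sketch. It is only an approximation under stated assumptions: the function names are hypothetical, the sample Tweet is made up (it is not real API output), and plain case-insensitive substring matching stands in for the actual matching rules of both Twitter's track parameter and the CSDL contains operator. The entities.urls / expanded_url layout follows the structure of Twitter's Tweet object.

```python
def matches_text_only(tweet, term):
    """Roughly mimic a filter that looks only at the Tweet's text field."""
    return term.lower() in tweet.get("text", "").lower()


def matches_text_or_urls(tweet, term):
    """Roughly mimic a broader filter that also looks at attached URLs."""
    if matches_text_only(tweet, term):
        return True
    urls = tweet.get("entities", {}).get("urls", [])
    return any(term.lower() in (u.get("expanded_url") or "").lower() for u in urls)


# Hypothetical sample Tweet for illustration only.
sample_tweet = {
    "text": "Interesting read on startup funding http://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "http://t.co/abc123",
                "expanded_url": "http://techcrunch.com/example-article",
            }
        ]
    },
}

text_only = matches_text_only(sample_tweet, "techcrunch")
text_or_urls = matches_text_or_urls(sample_tweet, "techcrunch")
print(text_only, text_or_urls)
```

With this sample, the text-only check misses the Tweet while the broader check catches it, which is the same kind of gap you would see between a twitter.text filter and Twitter's track.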


#4

For more information on this subject, please see the blog post "Migrating from the Twitter Streaming API".