Including retweet information in searches


#1

I use DataSift to track politics. A stripped-down example of a feed I might be interested in is tweets involving the current front-runners in the 2012 presidential race. The CSDL for it follows:

tag "Barack Obama" {interaction.content any "@barackobama,the president,obama"}
tag "Mitt Romney" {interaction.content any "mitt romney,@mittromney,governor romney"}
tag "Newt Gingrich" {interaction.content any "@newtgingrich,newt,gingrich"}
tag "Ron Paul" {interaction.content any "ron paul,@ronpaul"}
return {twitter.text any "@barackobama,the president,obama,mitt romney,@mittromney,governor romney,@newtgingrich,newt,gingrich,ron paul,@ronpaul"}

Using the streaming API, I would expect to see data about tweets mentioning any of those aliases for the four men, and I do. I have also activated the retweet target of the Twitter stream, so I would also expect to see information about retweets, but I never do. In fact, here is a sample item I received from the stream:

{ u'interaction': { u'author': { u'avatar': u'http://a0.twimg.com/profile_images/1622424085/rcmtwitter_normal.png', u'id': 18309196, u'link': u'http://twitter.com/rcmahoney', u'name': u'rcmahoney', u'username': u'rcmahoney'}, u'content': u'RT @lynn_bartels: Former #CU president Hank Brown gives President Obama an "F." http://t.co/vJtgMRAy #copolitics', u'created_at': u'Tue, 24 Apr 2012 16:40:15 +0000', u'id': u'1e18e2c2b100a180e074feaaa2936376', u'link': u'http://twitter.com/rcmahoney/statuses/194828109625823232', u'source': u'TweetDeck', u'tags': [u'Barack Obama'], u'type': u'twitter'}, u'klout': { u'score': 40}, u'language': { u'tag': u'sv'}, u'twitter': { u'created_at': u'Tue, 24 Apr 2012 16:40:15 +0000', u'domains': [u'bit.ly'], u'id': u'194828109625823232', u'links': [u'http://bit.ly/I6HVqQ'], u'mentions': [u'lynn_bartels'], u'source': u'TweetDeck', u'text': u'RT @lynn_bartels: Former #CU president Hank Brown gives President Obama an "F." http://t.co/vJtgMRAy #copolitics', u'user': { u'created_at': u'Mon, 22 Dec 2008 15:51:26 +0000', u'description': u'Regional Press Secretary for @RNC. Tweets/opinions are my own.', u'followers_count': 970, u'friends_count': 941, u'id': 18309196, u'id_str': u'18309196', u'lang': u'en', u'listed_count': 41, u'location': u'Washington D.C.', u'name': u'rcmahoney', u'screen_name': u'rcmahoney', u'statuses_count': 2652, u'time_zone': u'Central Time (US & Canada)', u'url': u'http://www.gop.com', u'utc_offset': -21600}}}

I would actually expect to see something more like the example twitter retweet data shown here:

"twitter": { "retweet": { "user": { "name": "Nick Halstead", "url": "http://about.me/nickhalstead", "description": "Founder of DataSift Inc ", "location": "Reading, UK", "statuses_count": 10158, "followers_count": 6572, "friends_count": 299, "screen_name": "nik", "lang": "en", "time_zone": "London", "listed_count": 561, "id": 3364401, "id_str": "3364401", "geo_enabled": true }, "links": [ "http://rww.to/rKy7PQ" ], "domains": [ "rww.to" ], "text": "Infographic: The Ever-Expanding Data Center http://t.co/hhlJeylg", "id": "149584730956906496", "source": "Flipboard", "count": 4, "created_at": "Wed, 21 Dec 2011 20:19:13 +0000" }, "retweeted": { "user": { "name": "ReadWriteWeb", "url": "http://www.readwriteweb.com", "description": "The latest news, analysis and conversation in all things web, tech and social media from the ReadWriteWeb.com team.", "location": "World Wide Web", "statuses_count": 24720, "followers_count": 1127797, "friends_count": 2250, "screen_name": "RWW", "lang": "en", "time_zone": "Pacific Time (US & Canada)", "listed_count": 18894, "id": 4641021, "id_str": "4641021", "geo_enabled": true }, "id": "149580029620269056", "source": "Tools Plugin for Movable Type", "created_at": "Wed, 21 Dec 2011 20:00:32 +0000" }, "id": "149584730956906496" },

I have never seen any tweets come through in that format, whether they are retweets or not. How do I have to modify my search so that a single CSDL query returns both the original tweets containing the words I'm interested in and information on retweets of those originals as they happen?


#2

If anyone has any advice on how to make my CSDL/JSON show up correctly there, let me know. I tried wrapping it in blocks, to no avail.


#3

Regarding your Retweet question, if I were to copy or "quote" someone's Tweet and prefix it with "RT", Twitter does not consider this to be a Retweet, so neither does DataSift. A Retweet only occurs when you explicitly click a "Retweet" button in Twitter, or your Twitter client.

 

Looking at the way you are searching for @mentions of Twitter users, I can see it is not entirely correct:

  tag "Barack Obama" {interaction.content any "@barackobama,the president,obama"}

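Mentions should be treated as separate entities: match them with the twitter.mentions target, without the leading "@", rather than as text inside interaction.content. This is explained in our CSDL Engine blog post.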

 

And unfortunately, there is no great way to format CSDL / JSON in the discussion forum at the moment - this is something we are working on.


#4

Thanks for the speedy reply, Jason. After fixing the above example to be:

tag "Barack Obama" {interaction.content any "the president,obama" or twitter.mentions any "barackobama"}
tag "Mitt Romney" {interaction.content any "mitt romney,governor romney" or twitter.mentions any "mittromney"}
tag "Newt Gingrich" {interaction.content any "newt,gingrich" or twitter.mentions any "newtgingrich"}
tag "Ron Paul" {interaction.content any "ron paul" or twitter.mentions any "ronpaul"}
return {twitter.text any "the president,obama,mitt romney,governor romney,newt,gingrich,ron paul"
or twitter.mentions any "barackobama,mittromney,newtgingrich,ronpaul"}

I can now see that it is properly including mentions, so thank you for that tip. As a point of clarification: do you believe I should now be seeing both informal retweets of original tweets (which come through this stream as the original message prefixed with "RT") and formal retweets (which come through this stream with retweet information structured into the JSON object)? It seems odd to me that, in the time I've been observing a larger stream than this example, I have yet to see any formal retweets.
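
The tag rules above follow a regular pattern (keywords go to interaction.content, screen names go to twitter.mentions), so they could be generated rather than typed by hand. Below is a minimal Python sketch of that idea. The function name build_csdl and the shape of its input are hypothetical and not part of any DataSift client library; it only assembles a CSDL string and does not validate or compile it.

```python
def build_csdl(candidates):
    """Build tag rules plus a return rule as one CSDL string.

    `candidates` is a hypothetical structure: it maps a tag name to a
    (keywords, mentions) pair, where mentions are screen names without "@".
    """
    tag_lines = []
    all_keywords = []
    all_mentions = []
    for name, (keywords, mentions) in candidates.items():
        # Keywords are matched as text; mentions use their own target.
        tag_lines.append(
            'tag "%s" {interaction.content any "%s" or twitter.mentions any "%s"}'
            % (name, ",".join(keywords), ",".join(mentions))
        )
        all_keywords.extend(keywords)
        all_mentions.extend(mentions)
    # The return rule covers the union of every keyword and every mention.
    return_line = (
        'return {twitter.text any "%s" or twitter.mentions any "%s"}'
        % (",".join(all_keywords), ",".join(all_mentions))
    )
    return "\n".join(tag_lines + [return_line])


if __name__ == "__main__":
    print(build_csdl({
        "Barack Obama": (["the president", "obama"], ["barackobama"]),
        "Ron Paul": (["ron paul"], ["ronpaul"]),
    }))
```

Keeping keywords and mentions in separate lists makes it harder to slip an "@name" back into a content match by accident.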


#5

If you are just looking at the preview stream in the web UI, you will not see any "formal Retweets" - they will all appear as "@New_Tweeter RT: @Original_Tweeter Text text text..."

Taking a look at the JSON or CSV interactions returned by DataSift, you will see these Retweets have been returned in the style you mentioned in your first post.
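
To tell the two kinds of retweet apart when consuming the stream, a check along the following lines may help. This is a rough Python sketch based only on the two samples in the first post: it assumes a formal Retweet carries a "retweet" object under "twitter", and that an informal one is an ordinary tweet whose text starts with "RT @". The function name classify_retweet is hypothetical, and the prefix test is a heuristic rather than anything DataSift defines.

```python
def classify_retweet(interaction):
    """Return "formal", "informal", or "original" for one decoded interaction.

    Assumes the interaction is a dict shaped like the samples in post #1.
    """
    twitter = interaction.get("twitter", {})
    # Formal Retweets nest the data under twitter.retweet / twitter.retweeted.
    if "retweet" in twitter:
        return "formal"
    # Heuristic (assumption): a quoted tweet prefixed with "RT @user".
    text = twitter.get("text", "")
    if text.startswith("RT @"):
        return "informal"
    return "original"


if __name__ == "__main__":
    sample = {"twitter": {"text": "RT @lynn_bartels: Former #CU president ..."}}
    print(classify_retweet(sample))
```

With this check, the sample item from the first post would be counted as an informal retweet, since it has no "retweet" object and its text begins with "RT @".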