DataSift Retweets drop geo data?


#1

It appears that DataSift does not return geo information for retweets, even though the Twitter API provides it. Specifically, Twitter’s retweet JSON contains [geo] and [place] fields for both the tweet and the retweet, while DataSift’s Twitter object contains neither of these fields.

Can anyone confirm that DataSift fails to provide per-interaction location information for retweets?

If so, that would mean:

  • Geospatial queries will never return Retweets
  • To get geospatial data for a Retweet, you must query the Twitter API directly

I want to collect tweet-retweet pairs where the original tweet was geo-located to a specific country. The goal is to be able to show both the original tweet and all subsequent retweets on a map (ignoring the many non-geo-enabled retweets).

I tried to do this with a simple query on twitter.location.country_code. After I had collected ~150k interactions, the dataset contained no retweets. I then reviewed another stream that used a simple keyword search. That dataset contained many retweets, but again none of them had geo information.


#2

Here are some related discussions:


#3

Filtering on twitter.place.country_code will never return retweets, because that target filters on Tweet objects only, not retweets. To filter on retweets, you would need to filter on twitter.retweeted.place.country_code. Please note that this filter looks at the geo location of the original Tweet that was retweeted. I do not think it is possible to add location data to a retweet.
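
As a rough sketch of the two targets mentioned above, a small Python helper could build a CSDL filter that matches geo-located tweets from a country as well as retweets of such tweets. This assumes CSDL's == string comparison applies to both targets; the helper name country_filter and the country code "GB" are purely illustrative.

```python
def country_filter(country_code):
    """Build a CSDL string matching tweets placed in a country, plus
    retweets whose ORIGINAL tweet was placed in that country.

    Assumption: both targets accept an == comparison against a
    two-letter country code.
    """
    # Reject anything that is not two letters, so the value cannot
    # break out of the quoted CSDL string.
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("expected a two-letter country code")
    return (
        f'twitter.place.country_code == "{country_code}" '
        f'OR twitter.retweeted.place.country_code == "{country_code}"'
    )


if __name__ == "__main__":
    print(country_filter("GB"))
```

The second clause only tells you where the original tweet was posted; it says nothing about where the retweeting user was.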


#4

Jason,

Just to clarify on your last statement.

The interaction['twitter']['retweet'] contains the original tweet of the retweet and I can’t seem to find the 'geo' field in any of those.

However, while using the Twitter API, the ['retweeted_status'] field contains the entire original tweet including 'geo' in it.

Are you saying that in the response received from DataSift, 'geo' will not be present for the original tweet?

My CSDL is just this: twitter.retweet.text contains "#HashTag" OR twitter.text contains "#HashTag". When I run it, I would like each retweet in the results to include the 'geo' of the original tweet, if the original user tweeted with geo coordinates.

Do I need to modify my CSDL to achieve this?


#5

interaction['twitter']['retweet'] contains the retweet information. interaction['twitter']['retweeted'] contains information about the retweeted Tweet.

Any existing geo data will appear in the interaction['twitter']['retweeted'] fields. A retweet itself cannot be geo-tagged.
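
A minimal sketch of reading that data in Python, assuming the interaction has already been parsed into a dict and that the original tweet's location sits under 'geo' and 'place' keys inside ['twitter']['retweeted']. Those two key names, the latitude/longitude layout, and the sample values below are assumptions for illustration, not confirmed output. Since many tweets carry no location at all, the helper returns None instead of raising.

```python
def original_tweet_location(interaction):
    """Return (geo, place) of the original tweet behind a retweet.

    Either value is None when the interaction is not a retweet or the
    original tweet had no location attached.
    """
    twitter = interaction.get("twitter") or {}
    retweeted = twitter.get("retweeted") or {}
    return retweeted.get("geo"), retweeted.get("place")


if __name__ == "__main__":
    # Hypothetical interactions; real payloads contain many more fields.
    with_geo = {
        "twitter": {
            "retweet": {"text": "RT @someone: hello"},
            "retweeted": {
                "geo": {"latitude": 51.5, "longitude": -0.12},
                "place": {"country_code": "GB"},
            },
        }
    }
    without_geo = {
        "twitter": {"retweet": {"text": "RT @someone: hi"}, "retweeted": {}}
    }

    for item in (with_geo, without_geo):
        geo, place = original_tweet_location(item)
        if geo is None and place is None:
            print("no location on the original tweet")
        else:
            print("original tweet location:", geo, place)
```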


#6

Sorry about the confusion. Yes, interaction['twitter']['retweeted'] does contain the information about the original tweet that was retweeted.

And thanks. I did not see any geo information in the interaction['twitter']['retweeted'] part for the sample of tweets I fetched from DataSift. Maybe that was because the original tweets had no geo information associated with them. I will write my code to take that into account.