I belong to an academic team that purchased access to the Twitter gardenhose feed (DataSift) from 2014 to 2015 in France. We are beginning to work with geotagged tweets, taking the GPS coordinates (latitude and longitude) from the .twitter.geo.coordinates field. With this field, the percentage of geotagged tweets is around 2%, as suggested in most Twitter studies covering that period. However, many of the coordinates seem to correspond to places rather than exact GPS positions, as the same pair of coordinates tends to appear repeatedly throughout the dataset. When we downloaded some of the tweets from the public API, we noticed that in that format the .geo.coordinates field only contains exact GPS coordinates, which doesn’t seem to be the case in the DataSift format. Is there a specific field that would allow us to distinguish between exact GPS coordinates and places?
This issue has been raised in similar threads (Sample output from twitter that includes geo and place?), but they didn’t fully answer our question.
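The repetition described in the question can be checked with a quick count of coordinate pairs. The following is only a minimal sketch: the function name, the rounding to six decimals, the min_count threshold, and the sample values are all assumptions made for illustration, not part of the DataSift format. Repetition is also only a heuristic, since a user tweeting from a fixed spot would produce repeated exact coordinates as well.

```python
from collections import Counter


def repeated_coordinate_pairs(pairs, min_count=2):
    """Return (pair, count) tuples for pairs seen at least min_count times.

    `pairs` is an iterable of (latitude, longitude) tuples that have already
    been extracted from the dataset. Rounding to 6 decimals is an assumption
    to smooth out trivial float noise.
    """
    counts = Counter((round(lat, 6), round(lon, 6)) for lat, lon in pairs)
    # most_common() sorts by descending count, so frequent pairs come first.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Made-up sample: one pair repeated three times, two pairs seen once.
sample = [
    (48.8566, 2.3522),
    (48.8566, 2.3522),
    (48.8566, 2.3522),
    (43.296482, 5.36978),
    (45.764043, 4.835659),
]
repeated = repeated_coordinate_pairs(sample)
print(repeated)
```

Pairs with very high counts are candidates for place centroids rather than exact GPS fixes, but they would still need to be confirmed against a field that records how the tweet was tagged.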
I believe this difference is caused by the way in which the Tweets were geotagged. When Tweeting, Twitter gives you the option to tag a place (such as “Times Square”) or to include your specific GPS coordinates.
The field that will allow you to distinguish between exact GPS coordinates and places is in the Twitter object. Specifically, the .coordinates object is included only when the user has shared their exact location, whereas the .place object is always present when a location has been added to a Tweet.
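As a rough illustration of that rule, the check could look like the sketch below. It assumes the Twitter object has already been parsed into a Python dict with "coordinates" and "place" keys; the exact key paths in your DataSift output may differ, and the function name and sample records are hypothetical.

```python
def classify_location(twitter_obj):
    """Classify a tweet's location as 'exact', 'place', or 'none'.

    Assumes `twitter_obj` is a dict with optional "coordinates" and "place"
    keys (an assumption about the parsed record layout).
    """
    # Coordinates are present only when the user shared an exact location.
    if twitter_obj.get("coordinates"):
        return "exact"
    # A place without coordinates means the user tagged a named place only.
    if twitter_obj.get("place"):
        return "place"
    return "none"


# Hypothetical records for illustration only.
exact = {
    "coordinates": {"type": "Point", "coordinates": [2.3522, 48.8566]},
    "place": {"full_name": "Paris, France"},
}
place_only = {"coordinates": None, "place": {"full_name": "Paris, France"}}
untagged = {}

for record in (exact, place_only, untagged):
    print(classify_location(record))
```

Checking for coordinates first matters because, per the answer above, a Tweet with an exact location carries a place object too.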
Thanks Jason for your last answer. Actually the fields
interaction.geo.coordinates seem to be redundant, as we get the same number of locations overall with both queries. So, if I understand correctly, both of these fields represent exact GPS coordinates. Is that the case?
Yes, these two fields are mapped to each other, so they should always return the same results.