Is Interaction.id globally unique?


#1

I have large sets of interactions with substantially similar interaction ids. For example, hundreds of interactions in a row have an interaction.id starting with 1e2e.

Is the interaction.id guaranteed to be globally unique? How is it calculated? If the same tweet is collected through two different streams, will both copies have the same interaction.id?


#2

Yes, the interaction.id is globally unique on a per-interaction basis. The interaction.id is added to the data as it is received by DataSift, and is independent of which stream it is delivered through.


#3

The interaction.id provided by DataSift is a 32-character hex string that looks like a UUID. However, in thousands of consecutive messages, the first four hex digits are 1e2e, and about a third of the messages share a common fifth character. In addition, many other interaction.id values start with 1e2f. According to my research, a UUID is guaranteed to have 122 random bits, meaning that only about one and a half hex characters may be non-random. Thus, your interaction.id does not appear to conform to the requirements of the ISO standard for UUIDs.

We have already collected a lot of data from DataSift and we have read a lot of your documentation online. I’m not clear whether your interaction.id is intended to be different from the ISO-specified UUID, and if so, what your interaction.id is, what standard it conforms to, and how it is constructed. I’d be glad to learn that my understanding of UUIDs is wrong, and I realize DataSift is a major data provider. Could you please clarify this point? Thank you.


#4

Our Chief Technical Architect, Lorenzo Alberton, has written about our UUID structure in his blog post "A Journey into Optimizing Hadoop Jobs".
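The blog post itself is not reproduced in this thread, so the following is only a general illustration, not DataSift's actual ID layout. It shows why IDs built from a timestamp can share long prefixes: a version 1 (time-based) UUID embeds a 60-bit timestamp, and if those bits are written most significant first, IDs created close together in time start with the same hex digits. Only version 4 UUIDs have 122 random bits. The helper names `uuid1_at`, `timestamp_hex` and `shared_prefix_len` are hypothetical.

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
GREGORIAN_OFFSET = 0x01B21DD213814000


def uuid1_at(unix_seconds: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 UUID for a fixed time (deterministic, for illustration only)."""
    ts = unix_seconds * 10_000_000 + GREGORIAN_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000          # version nibble = 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant bits
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def timestamp_hex(u: uuid.UUID) -> str:
    """Return the 60-bit timestamp as 15 hex digits, most significant digit first."""
    return f"{u.time:015x}"


def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Two time-based UUIDs generated 59 seconds apart (December 2014).
first = uuid1_at(1418551202)
second = uuid1_at(1418551261)
print(timestamp_hex(first), timestamp_hex(second))
print("shared leading hex digits:",
      shared_prefix_len(timestamp_hex(first), timestamp_hex(second)))

# A random UUID, by contrast, reports version 4 and carries no timestamp.
print("random UUID version:", uuid.uuid4().version)
```

A shared prefix across consecutive IDs is therefore expected for any time-ordered scheme and does not by itself indicate a uniqueness problem.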


#5

How do we get that ID? From Facebook? Can I have an example ID?


#6

The interaction ID is found in every interaction you receive from DataSift. You can find example output data (including interaction IDs) on our Understanding the Output Data pages.


#7

We’ve received two non-identical versions of an interaction in a single push, which is causing all kinds of trouble. The sample data is from a Facebook source, but we’ve received these from Twitter sources too. The interaction.id is the same and so is most of the data, but some timestamps differ, and in my opinion this should not happen. If we’ve received a version A with id X, there should not exist a version B of the data with the same id X. Without this guarantee we cannot make any sane decisions about how to handle the incoming data.

The 2 versions:

{'language': {'tag': 'en', 'tag_extended': 'en', 'confidence': 97}, 'links': {'normalized_url': ['https://facebook.com/314467614927/posts/10153077209459928'], 'title': ['Angry Birds - We love cake! Share an Angry Birds cake with... | Facebook'], 'code': [200], 'meta': {'description': ['We love cake! Share an Angry Birds cake with us! Chirrrp!'], 'lang': ['en'], 'charset': ['CP1252']}, 'created_at': ['Sat, 13 Dec 2014 13:18:04 +0000'], 'url': ['https://www.facebook.com/314467614927/posts/10153077209459928']}, 'facebook_page': {'from': {'name': 'Ian Velinski', 'id': '100005368279025'}, 'page': {'username': 'angrybirds', 'name': 'Angry Birds', 'id': '314467614927', 'link': 'https://www.facebook.com/angrybirds', 'category': 'App page'}, 'post': {'from': {'name': 'Angry Birds', 'id': '314467614927', 'category': 'App page'}, 'type': 'photo', 'id': '314467614927_10153077209459928', 'link': 'https://www.facebook.com/314467614927/posts/10153077209459928', 'content': 'Angry Birds added 5 new photos to the album Birdalicious cake!', 'created_time': 'Thu, 11 Dec 2014 13:09:54 +0000'}, 'type': 'like'}, 'source': {'id': 'cc35ff9caeb047b695b38df0c6e27984'}, 'demographic': {'gender': 'male'}, 'interaction': {'type': 'facebook_page', 'created_at': 'Sun, 14 Dec 2014 10:01:01 +0000', 'id': '1e48136ff55fad00e019ffff9709ef62', 'link': 'https://www.facebook.com/314467614927/posts/10153077209459928', 'content': "Ian Velinski likes Angry Birds's photo", 'tag_tree': {'subscription': {'id': ['rovio:fb_angrybirds']}}, 'author': {'name': 'Ian Velinski', 'id': '100005368279025', 'avatar': 'https://graph.facebook.com/100005368279025/picture', 'link': 'http://www.facebook.com/profile.php?id=100005368279025'}, 'subtype': 'like', 'received_at': 1418551261.414}}

{'language': {'tag': 'en', 'tag_extended': 'en', 'confidence': 97}, 'links': {'normalized_url': ['https://facebook.com/314467614927/posts/10153077209459928'], 'title': ['Angry Birds - We love cake! Share an Angry Birds cake with... | Facebook'], 'code': [200], 'meta': {'description': ['We love cake! Share an Angry Birds cake with us! Chirrrp!'], 'lang': ['en'], 'charset': ['CP1252']}, 'created_at': ['Sat, 13 Dec 2014 13:18:04 +0000'], 'url': ['https://www.facebook.com/314467614927/posts/10153077209459928']}, 'facebook_page': {'from': {'name': 'Ian Velinski', 'id': '100005368279025'}, 'page': {'username': 'angrybirds', 'name': 'Angry Birds', 'id': '314467614927', 'link': 'https://www.facebook.com/angrybirds', 'category': 'App page'}, 'post': {'from': {'name': 'Angry Birds', 'id': '314467614927', 'category': 'App page'}, 'type': 'photo', 'id': '314467614927_10153077209459928', 'link': 'https://www.facebook.com/314467614927/posts/10153077209459928', 'content': 'Angry Birds added 5 new photos to the album Birdalicious cake!', 'created_time': 'Thu, 11 Dec 2014 13:09:54 +0000'}, 'type': 'like'}, 'source': {'id': 'cc35ff9caeb047b695b38df0c6e27984'}, 'demographic': {'gender': 'male'}, 'interaction': {'type': 'facebook_page', 'created_at': 'Sun, 14 Dec 2014 10:00:02 +0000', 'id': '1e48136ff55fad00e019ffff9709ef62', 'link': 'https://www.facebook.com/314467614927/posts/10153077209459928', 'content': "Ian Velinski likes Angry Birds's photo", 'tag_tree': {'subscription': {'id': ['rovio:fb_angrybirds']}}, 'author': {'name': 'Ian Velinski', 'id': '100005368279025', 'avatar': 'https://graph.facebook.com/100005368279025/picture', 'link': 'http://www.facebook.com/profile.php?id=100005368279025'}, 'subtype': 'like', 'received_at': 1418551202.923}}
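To pin down exactly which fields differ between two such versions, a small recursive comparison can help. The sketch below is only an illustration: `diff_paths` is a hypothetical helper, and the two dictionaries are trimmed down to the interaction fields from the sample above rather than the full payloads.

```python
def diff_paths(a, b, path=""):
    """Yield (path, value_in_a, value_in_b) for every leaf where a and b differ.

    A key missing on one side is reported with None for that side.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            yield from diff_paths(a.get(key), b.get(key), child)
    elif a != b:
        yield path, a, b


# Trimmed copies of the two versions (only the 'interaction' fields shown here).
version_a = {"interaction": {"id": "1e48136ff55fad00e019ffff9709ef62",
                             "created_at": "Sun, 14 Dec 2014 10:01:01 +0000",
                             "received_at": 1418551261.414}}
version_b = {"interaction": {"id": "1e48136ff55fad00e019ffff9709ef62",
                             "created_at": "Sun, 14 Dec 2014 10:00:02 +0000",
                             "received_at": 1418551202.923}}

for path, left, right in diff_paths(version_a, version_b):
    print(f"{path}: {left!r} != {right!r}")
```

Run against the full payloads, the same helper would list every differing field, which makes it easy to confirm whether anything other than the timestamps has changed.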

#8

Thanks for raising this. We are aware this can happen in some edge cases, and we will look into what we can do to prevent it from happening in the future.

In the meantime, it may be worth implementing code that simply drops any interaction whose interaction ID you have already seen. If you receive two interactions with the same ID, they are the same interaction. In this case, we simply pulled the same Like from Facebook twice, which is why every detail is the same other than the timestamps. It is worth noting that Facebook does not provide a timestamp for Like engagements, so any timestamp is the time at which DataSift received that Like.
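A minimal sketch of that kind of filter is shown below. It assumes each interaction has been parsed into a dictionary with the ID at `interaction.id`, as in the sample data above; the function name `drop_seen` and the in-memory set are illustrative choices, not part of any DataSift client library.

```python
from typing import Dict, Iterable, Iterator, Set


def drop_seen(interactions: Iterable[Dict], seen: Set[str]) -> Iterator[Dict]:
    """Yield only interactions whose interaction.id has not been seen before.

    `seen` is updated in place so it can be reused across pushes.
    """
    for item in interactions:
        interaction_id = item["interaction"]["id"]
        if interaction_id in seen:
            continue  # same ID means same interaction: keep the first copy only
        seen.add(interaction_id)
        yield item


# Example: one push containing the same interaction twice.
seen_ids: Set[str] = set()
push = [
    {"interaction": {"id": "1e48136ff55fad00e019ffff9709ef62",
                     "received_at": 1418551261.414}},
    {"interaction": {"id": "1e48136ff55fad00e019ffff9709ef62",
                     "received_at": 1418551202.923}},
]
unique = list(drop_seen(push, seen_ids))
print(len(unique))
```

This keeps whichever copy arrives first. An unbounded in-memory set grows forever, so a long-running consumer would likely want a persistent store or a cache with expiry instead.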