Different Upload Date


#1

I’m using both DataSift and the Facebook API to collect public posts, but there is something wrong with the upload date of the DataSift posts. For example, this post:

I collected it from both sources and the results were:

Facebook API - ISODate("2013-02-26T19:11:10Z")
DataSift - ISODate("2013-02-26T19:11:54Z")

This is occurring with some frequency. My database uses the upload date to divide the data into shards, and this kind of bug will force me to stop using DataSift to collect Facebook posts.

Other posts that came with a different upload date:

Thank you


#2

The interaction.created_at time is the time we process the interaction, and should be the time you receive the interaction. We process the Facebook data source close to real-time (within a couple of minutes), so the interaction.created_at time may not always match the facebook.created_at time. I have noticed that in some cases, the facebook.created_at field is not being returned. I have raised an issue internally to get this resolved as soon as possible.


#3

Thank you for the reply, Jason.
So in this case I will have to use the facebook.created_at field instead of interaction.created_at, correct? Will this field return the same date as the Facebook API?
Will I have the same problem with other sources like Twitter if I use interaction.created_at?


#4

The facebook.created_at value should always match the time provided in the interaction from Facebook, so this is the field you are looking for. Our engineering team is looking into why this field was not always appearing today.  

This is less of an issue when dealing with Twitter, as our Twitter Firehose delivery ensures we receive everything in real-time, though there may occasionally be a variance of a couple of seconds between the twitter.created_at and interaction.created_at times. 

In general, if you want to record the time the original post was created, you should always refer to the <data_source>.created_at time, and fall back to interaction.created_at if this is not available for some reason (for example, if the created_at time was not supplied as part of the interaction originally sent to us).
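As a rough illustration of the fallback described above, here is a minimal Python sketch. It assumes each interaction arrives as a nested dictionary with a top-level key per data source (such as "facebook") alongside "interaction"; the helper name original_created_at is hypothetical, and the timestamp is treated as an opaque string in whatever format the stream delivers.

```python
from typing import Any, Dict, Optional


def original_created_at(interaction: Dict[str, Any], source: str) -> Optional[str]:
    """Return the post's creation time, preferring the source-specific field.

    Hypothetical helper: `source` is the data source key, e.g. "facebook".
    """
    # Prefer <data_source>.created_at, e.g. interaction["facebook"]["created_at"].
    source_time = (interaction.get(source) or {}).get("created_at")
    if source_time:
        return source_time
    # Fall back to interaction.created_at (the time the interaction was processed).
    return (interaction.get("interaction") or {}).get("created_at")
```

Returning None when neither field is present lets the caller decide whether to skip the record or log it, rather than storing a guessed date.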


#5

Thank you for the fast response Jason. I will use <data_source>.created_at from now on.


#6

Hello Jason,

I was making the change that you suggested (using facebook.created_at), but this field is not present in all of the posts that I checked.

example:

"facebook": {
    "message": "Last minute trip today to take a bite out of the Big Apple… NYC what will I encounter on this journey?",
    "id": "641497012_10151603771467013",
    "author": {
        "id": "641497012",
        "link": "http://www.facebook.com/profile.php?id=641497012",
        "name": "Charlie Lapson",
        "avatar": "https://graph.facebook.com/641497012/picture"
    },
    "source": "web",
    "likes": {
        "count": 1,
        "ids": [
            "100000800595349"
        ],
        "names": [
            "Izzibag Anne Agoren"
        ]
    },
    "type": "status"
},

Are you guys working on this?


#7

Yes - we realise there is an issue here, and are working on pushing out a fix as soon as possible.


#8

Thanks

I will be waiting.


#9

This should be resolved now. Please let us know if you experience anything like this again.


#10

Hello Jason,

I just tested your correction and everything seems to be ok now =D

Thank you very much for the fast fix.

PS: I will catch any exceptions and let you know if it happens again.