It appears that DataSift is not delivering 100% of the matching posts from my Sina Weibo stream. My streaming collector (using the Python libraries) runs without error and retrieves posts intermittently. However, when I search for one of my collection terms via the Sina Weibo search interface (e.g. http://s.weibo.com/weibo/term1), I see that I’m only receiving ~50% of the posts.

For reference, my CSDL is:

interaction.type == "sinaweibo" AND interaction.content contains_any [language(zh)] "term1, term2, ..., term12"

I’ve tried the query with and without the [language(zh)] clause, and with English terms, Mandarin terms, and a combination of both.
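
One way to put a firmer number on the ~50% estimate is to compare the post IDs visible in the search interface with the post IDs delivered by the stream over the same time window. The following is only a minimal sketch, assuming both sets of IDs have already been gathered into lists; the function name and the sample IDs are hypothetical, and it does not use the DataSift client library:

```python
def coverage(collected_ids, reference_ids):
    """Return the fraction of reference posts that also appear in the collected stream."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference_ids is empty; nothing to compare against")
    # Posts present in both the reference set and the collected stream.
    matched = reference & set(collected_ids)
    return len(matched) / len(reference)


# Hypothetical post IDs, for illustration only.
seen_in_search = ["a1", "a2", "a3", "a4"]   # IDs noted from the search interface
seen_in_stream = ["a2", "a4", "b9"]         # IDs received by the streaming collector

print(f"coverage: {coverage(seen_in_stream, seen_in_search):.0%}")
```

Comparing by ID rather than by eye avoids counting the same post twice, and IDs that appear only in the stream (like "b9" here) do not affect the ratio.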


We have heard that Sina Weibo has recently been required to tighten its censorship; this is likely why you are not receiving 100% of the posts you expect to see.
I’m afraid this issue is down to what is being sent to DataSift by Sina Weibo; there is nothing we can do from a technical standpoint to improve the situation.