How can we limit the number of tweets per user in a stream?


#1

We are considering running some historic queries in which certain users who tweet a lot about the topic will appear very often. Ultimately we are only interested in giving “one user, one vote” in our dataset: we would like to receive at most one tweet per Twitter user when we run the stream. (Which tweet we get doesn’t matter, though it would be handy if it were chosen uniformly at random.)

Or, barring that, is there some way to get, say, only each user’s first tweet of the day? Any sort of per-user volume control would help, so that we minimize how many tweets we pay for and then just throw away.

Any advice?


#2

In an Historic query, this is not currently possible. We can only return the full data set that matches the query; while an Historic is running, we do not take into account which Tweets we have already returned.
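
Since the full data set comes back either way, the "one user, one vote" rule would have to be applied on your side after delivery. This does not reduce what you pay for, but it does give the uniform random pick asked about in #1. A minimal sketch, assuming each interaction is a parsed JSON dict with the user id at `["twitter"]["user"]["id"]` (check that layout against your own output); `one_tweet_per_user` is a hypothetical helper name, not part of any client library:

```python
import random


def one_tweet_per_user(interactions, rng=None):
    """Keep one tweet per user, chosen uniformly at random, in a single pass.

    Assumes each interaction is a dict with the user id under
    ["twitter"]["user"]["id"] (an assumption about the output layout).
    """
    rng = rng or random.Random()
    chosen = {}  # user id -> interaction currently kept for that user
    seen = {}    # user id -> number of tweets seen so far for that user

    for item in interactions:
        user_id = item["twitter"]["user"]["id"]
        seen[user_id] = seen.get(user_id, 0) + 1
        # Size-1 reservoir sampling: the n-th tweet from a user replaces
        # the kept one with probability 1/n, so every tweet from that
        # user ends up equally likely to be the survivor.
        if rng.randrange(seen[user_id]) == 0:
            chosen[user_id] = item

    return list(chosen.values())
```

Because it works in one pass, this can run over the delivered data as it arrives without holding every tweet in memory, only one per user.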

It is possible to achieve this behaviour when running live streams by updating your CSDL to exclude a user as soon as you have received a Tweet from them.
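
As a rough illustration of the live-stream approach, the sketch below tracks which users have already been seen and rebuilds the filter to exclude them. The `twitter.user.id` target and the `AND NOT ... in [...]` form are assumptions about CSDL syntax and should be checked against the CSDL documentation; `build_csdl` and `OnePerUserFilter` are hypothetical names, and pushing the new CSDL to the platform is left to whichever client library you use.

```python
def build_csdl(base_filter, seen_user_ids):
    """Return base_filter extended to exclude users we already have.

    The exclusion clause syntax is an assumption about CSDL, not
    something confirmed in this thread.
    """
    if not seen_user_ids:
        return base_filter
    id_list = ", ".join(str(uid) for uid in sorted(seen_user_ids))
    return f"({base_filter}) AND NOT (twitter.user.id in [{id_list}])"


class OnePerUserFilter:
    """Tracks seen users and produces an updated filter when needed."""

    def __init__(self, base_filter):
        self.base_filter = base_filter
        self.seen = set()

    def accept(self, interaction):
        """Return updated CSDL for a user's first tweet, else None.

        None means the tweet is a duplicate that slipped through before
        the updated filter took effect, and can be dropped.
        """
        user_id = interaction["twitter"]["user"]["id"]  # assumed layout
        if user_id in self.seen:
            return None
        self.seen.add(user_id)
        return build_csdl(self.base_filter, self.seen)
```

The exclusion list grows with every new user, so for a busy stream it is probably more practical to update the filter in batches and discard the few duplicates that arrive in between.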


#3

Is putting a results limit on Historic queries something that you are looking into implementing?


#4

No. We are working on a service to allow you to more accurately estimate the number of interactions that an Historic query will return. You could then use that estimate to apply an interaction.sample term to your CSDL filter to limit the number of interactions you receive.
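
To make the arithmetic concrete: if the estimate says a query would return 2,000,000 interactions and you want roughly 100,000, the sample rate is 100,000 / 2,000,000 = 5%. A small sketch of that calculation follows; it assumes `interaction.sample` takes a percentage and is written as `interaction.sample < N`, which should be confirmed against the CSDL documentation. The function names and the numbers are illustrative only.

```python
def sample_percentage(estimated_interactions, target_interactions):
    """Percentage of the stream to keep to land near the target volume."""
    if estimated_interactions <= 0:
        raise ValueError("estimated_interactions must be positive")
    pct = 100.0 * target_interactions / estimated_interactions
    # Never ask for more than the whole stream.
    return min(100.0, pct)


def with_sample(base_filter, percentage):
    """Append a sampling term to an existing filter.

    The 'interaction.sample < N' form is an assumption about CSDL syntax.
    """
    return f"({base_filter}) AND interaction.sample < {percentage:g}"


pct = sample_percentage(2_000_000, 100_000)
csdl = with_sample('interaction.content contains "example"', pct)
```

Note that a sample like this thins out interactions overall rather than per user, so prolific users would still be over-represented in what comes back, just at a lower total volume.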