It seems like DPU usage shoots up sharply if you run a query like twitter.user.screen_name in "usera, userb, userc, userd". In our use case, we might want to follow tweets from hundreds of people. We could easily add them all to a Twitter list and have DataSift follow only tweets from that list. Is that possible? If not, what are the best practices for following a large number of Twitter users while keeping DPU usage in check?
The cheapest way to achieve this is the way you are already doing it, using the IN operator:
twitter.user.screen_name in "userA, userB, userC"
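When the list runs to hundreds of names, it can be easier to generate this filter from a file or array than to type it by hand. Below is a minimal Python sketch that only builds the CSDL string; it does not call any DataSift API. The helper name build_screen_name_filter is hypothetical, and the validation rule (screen names are 1 to 15 letters, digits or underscores, so they never contain a comma or quote that would break the list) is an assumption about Twitter's naming rules.

```python
import re

# Assumption: Twitter screen names are 1-15 letters, digits or underscores,
# so a valid name can never contain the comma or double quote that would
# break the quoted IN list.
_SCREEN_NAME = re.compile(r"^[A-Za-z0-9_]{1,15}$")


def build_screen_name_filter(names):
    """Hypothetical helper: return a CSDL filter matching any of the given screen names."""
    cleaned = []
    seen = set()
    for raw in names:
        name = raw.strip().lstrip("@")  # tolerate "@user" and stray whitespace
        if not name:
            continue  # skip blank entries, e.g. empty lines in a source file
        if not _SCREEN_NAME.match(name):
            raise ValueError(f"not a valid screen name: {raw!r}")
        if name not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(name)
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one screen name is required")
    # Same shape as the filter above: comma-separated names inside one quoted string
    return 'twitter.user.screen_name in "' + ", ".join(cleaned) + '"'


if __name__ == "__main__":
    print(build_screen_name_filter(["userA", "@userB", "userC", "userA"]))
```

Run as a script, this prints twitter.user.screen_name in "userA, userB, userC", which you can paste into your stream definition.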
The cost of the IN operator is calculated on a sliding scale, as detailed on our Understanding Billing page.
We do plan to support subscribing to a list of Twitter users in the future, but we have no immediate plans to implement it.