Does running a filter on a month of historical data cost the same as running a filter for a month on real-time data?


#1

In other words, I understand that I can start a filter now and run it for a day on the real-time stream(s), and that this will cost me a number of DPUs that is a function of the number of "real hours" the filter was running (for example, 24 hours). I also understand that there is a cost related to the volume of results I get, but let's put that aside.

My question is: if I run the same filter on a day in the past (let's say 1/1/2012), would I pay the same amount (the same DPUs) because I ran my filter on 24 hours of data? Or would I pay for the time it takes DataSift to execute the filter (hopefully only a few minutes)?

If the answer is 24 hours for historical data too, then I would say that it's a little too much for DataSift to ask.
The real power of Twitter is in real time, and the only reason I would run a filter on data from two years ago is to research the filter results, so that I have some understanding of what to expect from real-time data.


#2

For simpler queries (under 20 DPUs), it is cheaper to run filters on live streams, as the processing overheads are far lower. Please bear in mind that when you run a Historics query, we have to search an archive almost a petabyte in size, containing hundreds of billions of Tweets.

For more complex queries (20 DPUs and above), it becomes cheaper to run them as Historics rather than in real time, as CPU processing becomes cheaper when used at a larger scale.

Historics queries are charged in the same way as real-time queries, just at a slightly different rate. That is, both are charged a multiple of your filter's DPU cost for each hour of data you wish to query. Details of how we bill streams can be found on our Understanding Billing page.
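A minimal sketch of the billing rule described above, assuming the charge is simply filter DPU cost × hours of data queried × a per-stream-type rate multiplier. The helper name `stream_cost_dpu` and the multipliers 1.0 (live) and 1.2 (Historics) are hypothetical placeholders, not DataSift's actual rates, and the result-volume charge the question sets aside is ignored.

```python
# Hypothetical sketch: the rate multipliers below are placeholders,
# not DataSift's published prices.

def stream_cost_dpu(filter_dpu: float, hours_queried: float, rate_multiplier: float) -> float:
    """Billable amount: filter DPU cost x hours of data queried x rate multiplier.

    Note that the job's wall-clock execution time is deliberately not an input.
    """
    if filter_dpu < 0 or hours_queried < 0 or rate_multiplier < 0:
        raise ValueError("inputs must be non-negative")
    return filter_dpu * hours_queried * rate_multiplier


# A simple 5 DPU filter over one day (24 hours) of data.
FILTER_DPU = 5.0
HOURS = 24.0

live = stream_cost_dpu(FILTER_DPU, HOURS, rate_multiplier=1.0)      # hypothetical live rate
historic = stream_cost_dpu(FILTER_DPU, HOURS, rate_multiplier=1.2)  # hypothetical Historics rate

print(f"live:     {live:.1f}")
print(f"historic: {historic:.1f}")
```

Under this model, a Historics query over 1/1/2012 is billed for the 24 hours of data it covers, not for the few minutes the job takes to run. The real pricing is evidently not a single flat multiplier, since the answer describes a crossover around 20 DPUs; the sketch does not attempt to model that.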