DPU cost comparisons


#1

I’ve written a couple of filters to compare the DPU costs of two different approaches.

The first produces rules like:

tumblr.body contains "a" and tumblr.body contains_any "b,c,d,e"

whereas the second produces:

tumblr.body contains_all "a,b"
or tumblr.body contains_all "a,c"
or tumblr.body contains_all "a,d"
or tumblr.body contains_all "a,e"

What surprised me was that the first approach, which reduced 394 "or" lines down to 62 lines, had a higher overall DPU cost!

Can you comment on these two approaches and the costs?


#2

Simply put, one version is easier for DataSift to process, so it carries a lower DPU cost.

Full details about how DPU costing works can be found in our Understanding Billing documentation.

You can also check the DPU cost breakdown of your filter using the /dpu API endpoint, or by looking in the UI if you created the filter there.
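
If you want to compare the two forms side by side, here is a minimal Python sketch that builds both rule sets from the same keyword lists, so each can be compiled and its DPU breakdown checked separately. The helper names (combined_rule, expanded_rule) are hypothetical and not part of any DataSift library; the sketch only builds CSDL strings, does not call the API, and assumes the keywords contain no commas or quotes that would need escaping.

```python
def combined_rule(target, required, alternatives):
    """First approach: one `contains` plus one `contains_any` (hypothetical helper)."""
    return '{t} contains "{r}" and {t} contains_any "{a}"'.format(
        t=target, r=required, a=",".join(alternatives)
    )


def expanded_rule(target, required, alternatives):
    """Second approach: one `contains_all` clause per alternative, joined by `or`."""
    clauses = [
        '{t} contains_all "{r},{a}"'.format(t=target, r=required, a=alt)
        for alt in alternatives
    ]
    return "\nor ".join(clauses)


# Keyword lists taken from the example in the question.
first = combined_rule("tumblr.body", "a", ["b", "c", "d", "e"])
second = expanded_rule("tumblr.body", "a", ["b", "c", "d", "e"])

print(first)
print()
print(second)
```

Both strings reproduce the rules from the question, so the only difference between the two compiled filters should be the rule structure itself.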