I want to create CSDL for a huge number of keywords


#1

Hi,

  • This question is about the Twitter stream: I want to get relevant data for my search terms.
  • I wrote a test app and saw that Datasift limits the number of logical operators to 500. Is that true?
  • I have 10,000 search terms, and each one looks something like: "best" AND "reel", or "best" AND "rod".
  • What would you recommend for building the CSDL for those terms?
  • I tried converting my terms to CSDL and joining the pieces with the OR operator, but I ran into the 500-operator limit above. Then I tried a different way: I split the CSDL into smaller groups, compiled each group, and got a stream hash for each one. I then tried to create a master stream from those streams using the "stream" keyword. I had approximately 1000 streams to combine, but I couldn't compile the master CSDL that is supposed to give me the master stream. I always get the error "Bad request" or something similar.
  • Other questions:
  • - What is the maximum DPU you return for a CSDL?
  • - If I make a master stream, what would the master DPU be? Is it the sum of the child streams' DPU?
  • I'm stuck. Please help.
  • Thanks.

#2

The limit of 500 operators may well be a bug - I am looking into that now.

 

It is possible to optimize your CSDL. For example, instead of:

( interaction.content contains "best" and interaction.content contains "rod" ) or
( interaction.content contains "best" and interaction.content contains "reel" )

You could cut out any keywords you repeat:

interaction.content contains "best" and
interaction.content any "rod, reel"
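
If your terms are stored as keyword pairs, the grouping above could be generated rather than written by hand. The following is only a rough sketch in Python: the function name build_csdl and the (common, variant) pair layout are made up for illustration, and it assumes none of your keywords contain commas or double quotes, which would need escaping.

```python
from collections import defaultdict


def build_csdl(pairs, target="interaction.content"):
    """Group (common, variant) keyword pairs by the shared word and emit CSDL.

    Hypothetical helper: assumes keywords contain no commas or double quotes.
    """
    groups = defaultdict(list)
    for common, variant in pairs:
        # Skip exact duplicates so each variant appears once per group.
        if variant not in groups[common]:
            groups[common].append(variant)

    clauses = []
    for common, variants in groups.items():
        # One "contains" for the shared word, one "any" for all of its variants.
        clauses.append(
            '({t} contains "{c}" and {t} any "{v}")'.format(
                t=target, c=common, v=", ".join(variants)
            )
        )
    # Join the groups with "or", one group per line.
    return "\nor ".join(clauses)


if __name__ == "__main__":
    print(build_csdl([("best", "rod"), ("best", "reel"), ("cheap", "rod")]))
```

Each distinct shared word then costs two operators plus one "or", no matter how many variants it has, so the operator count depends on the number of shared words rather than on the total number of terms.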

 

Could you give me more details about the "Bad Request" you received? Did it appear in our list of Error Messages or Response Codes?

 

Your DPU questions should be explained in our Understanding Billing page.

 

I hope this helps


#3

Thanks for the answer. I received the "Bad request" error when using the Datasift client library, and I found that the problem was a duplicated TAG name. Problem solved.

I can finally compile my master stream of 350 sub-streams, but the DPU is very high: 400. I’m working on reducing that value. Does the DPU change if I use twitter.text instead of interaction.content?

Thanks


#4

The DPU cost for each target is the same - only the number of search keywords and the operators you use will make a difference to your DPU cost. This is all explained on our Understanding Billing page.

Take a look at our Optimization page if you need some help reducing your CSDL cost.