contains_any DPU costs are calculated incorrectly for hashtags


#1

Following the example in the billing docs, this filter costs 0.2 DPUs:

interaction.content contains_any "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook"

The DPU breakdown correctly shows 10 keywords.

Now when I add the same keywords as hashtags:

interaction.content contains_any "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook, #apple, #microsoft, #hp, #dell, #oracle, #google, #yahoo, #ebay, #amazon, #facebook"

Suddenly the DPU breakdown incorrectly shows 30 keywords (it should be 20), costing 0.5 DPUs.


#2

Because hashtag symbols are classed as punctuation, they are tokenized as separate words, so searching for "#apple" is charged as two words. That is why the 10 plain keywords plus the 10 hashtags are counted as 10 + 20 = 30. Take a look at Tokenization and the CSDL Engine for more.
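To make the count concrete, here is a rough Python sketch of how a keyword list could come out as 30 words. The regular expression and the helper name `count_keyword_tokens` are my own assumptions for illustration, not the actual CSDL engine tokenizer; it simply treats each punctuation character as its own token.

```python
import re

# Assumption: a simplified stand-in for the CSDL tokenizer. Each run of word
# characters is one token and each punctuation character is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def count_keyword_tokens(keyword_list: str) -> int:
    """Count tokens across a comma-separated contains_any argument."""
    keywords = [k.strip() for k in keyword_list.split(",") if k.strip()]
    return sum(len(TOKEN_RE.findall(k)) for k in keywords)

plain = "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook"
# Build the "#apple, #microsoft, ..." half of the second filter.
tagged = ", ".join("#" + k.strip() for k in plain.split(","))

print(count_keyword_tokens(plain))                  # 10
print(count_keyword_tokens(plain + ", " + tagged))  # 30
```

Under this assumption each "#keyword" contributes two tokens, which lines up with the 30 shown in the DPU breakdown.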

Filtering on hashtags can be done in one of three ways:

interaction.content contains_any "#hashtag"

This will match any interactions containing "#hashtag", or "# <whitespace> hashtag", as the "#" symbol is tokenized as a separate word.

interaction.content contains_any "hashtag"

This will match any interactions containing "#hashtag" or "hashtag". As in the above example, the "#" symbol is tokenized as a separate word, so filtering for your hashtag without the "#" symbol will still match this hashtag.

twitter.hashtags contains_any "hashtag"

This will filter only on hashtags within your Tweets. See the twitter.hashtags documentation for more.
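The matching behaviour of the first two options can be sketched in Python as below. This is only an approximation under my own assumptions: the regular expression stands in for the real tokenizer, `contains_phrase` is a hypothetical helper, and case handling is ignored. It does not model twitter.hashtags, which works on the hashtag list rather than the content.

```python
import re

# Assumption: simplified tokenizer, punctuation characters become separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text)

def contains_phrase(content: str, keyword: str) -> bool:
    """True if the keyword's tokens appear consecutively in the content's tokens."""
    hay, needle = tokenize(content), tokenize(keyword)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))

# Filtering on "#hashtag" needs both the "#" token and the word token.
print(contains_phrase("I love #hashtag", "#hashtag"))   # True
print(contains_phrase("I love # hashtag", "#hashtag"))  # True
print(contains_phrase("I love hashtag", "#hashtag"))    # False

# Filtering on "hashtag" matches with or without the "#".
print(contains_phrase("I love #hashtag", "hashtag"))    # True
print(contains_phrase("I love hashtag", "hashtag"))     # True
```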