Filter Twitter spam in demographics


#1
twitter.user.profile_age > 1 and
twitter.user.follower_ratio < 0.01 and
not twitter.source contains_any "twittbot.net,EasyBotter,hellotxt.com,dlvr.it" //and
not interaction.raw_content regex_partial "(#\\w+[^#]+){5,}" and
not interaction.content contains_any "rt and follow,rt & follow,rt+follow,follow and rt,follow & rt,follow+rt"

I tried this filter for demographics, as suggested on the Twitter spam docs page. I tried it in different combinations, turning some of the lines on and off, and it always returns empty results. I’m querying the demographics data source, not the Twitter data source, but I think it should be possible to filter spam out of demographics.


#2

Compiling this CSDL (I had to remove the commented out “//and” from line four), I instantly receive plenty of interactions. Can you please confirm the CSDL filter hash you receive when you compile this CSDL and how you are testing this filter?


#3

Yes, you are right, sorry. It seems the problem was in the follower_ratio filter value.
However, when it comes to filtering out spam, shouldn’t it be:
twitter.user.follower_ratio > 0.01
and not:
twitter.user.follower_ratio < 0.01
as in the Twitter spam documentation?
According to the docs, it’s calculated as num_of_followers_of_user / num_of_followed_by_user, so for a bot it should be something like 1 / 1000, a small number. To filter bots out, I need to specify that I’m interested in high follower_ratio values only, so twitter.user.follower_ratio > 0.01.

Thanks,


#4

First off, please bear in mind that the examples shown in our documentation are often just used to demonstrate what is possible, and will not necessarily represent what is best for your use case.
The CSDL twitter.user.follower_ratio > 0.01 will match Twitter users whose follower count is more than 0.01 times the number of users they follow. Once the ratio goes above 1, it represents users who are followed by more users than they follow. One thing we typically see with spam accounts is that they follow many times more users than the number of users who follow them.
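
To make the arithmetic concrete, here is a minimal Python sketch of the ratio as it is described in this thread (followers divided by accounts followed). The function names, the sample counts, and the handling of accounts that follow nobody are assumptions for illustration only; this is not how the platform computes the target internally.

```python
def follower_ratio(followers: int, following: int) -> float:
    """Followers divided by accounts followed, as described in this thread."""
    if following == 0:
        # Assumption: the thread does not say how this case is handled.
        raise ValueError("ratio is undefined when the account follows nobody")
    return followers / following


def matches_spam_condition(followers: int, following: int,
                           threshold: float = 0.01) -> bool:
    """Mirror of `twitter.user.follower_ratio < 0.01`: True for spam-like accounts."""
    return follower_ratio(followers, following) < threshold


# Hypothetical bot-like account: 1 follower, follows 1000 users (ratio 0.001).
bot_like = matches_spam_condition(1, 1000)

# Hypothetical ordinary account: 300 followers, follows 200 users (ratio 1.5).
ordinary = matches_spam_condition(300, 200)
```

With these sample counts, the `< 0.01` condition selects the bot-like account and skips the ordinary one, which is why that condition matches spam rather than excluding it.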


#5

Yes, sure. What I mean is that the other examples in that document show how to filter out spam, while this one shows how to keep it. That is a bit confusing to me, and really not obvious from the surrounding text.

Thanks for the quick and informative response!


#6

Ah, I understand. We tend to write filters like this to match spam posts, and use this to exclude spam results from the filter we care about. For example, if I compile a CSDL filter that matches a load of spam, and take the filter’s hash, I’ll then include this as part of a new filter like so:

interaction.content contains_any "keywords, I, care, about" AND
NOT stream "<hash_of_spam_filter>"

I can then take this single spam filter, and include it in all my individual filters to ensure the same spam filter is applied across all my CSDL filters.
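
If the filters are assembled in code, the same pattern can be sketched as plain string building. The helper name `with_spam_exclusion` and the sample filters below are hypothetical, and the hash stays a placeholder; the sketch only produces CSDL text and does not call any API. Wrapping the original filter in parentheses is an assumption made so that the exclusion applies to the whole filter rather than to its last condition.

```python
def with_spam_exclusion(csdl: str, spam_hash: str) -> str:
    """Append `NOT stream "<hash>"` to a CSDL filter.

    The original filter is parenthesised so the exclusion covers all of it.
    """
    return f'({csdl}) AND\nNOT stream "{spam_hash}"'


# Placeholder: substitute the hash returned when the spam filter is compiled.
SPAM_HASH = "<hash_of_spam_filter>"

# Hypothetical filters that should all share the same spam exclusion.
filters = [
    'interaction.content contains_any "keywords, I, care, about"',
    'interaction.content contains_any "other, keywords"',
]

combined = [with_spam_exclusion(f, SPAM_HASH) for f in filters]
```

Keeping the spam hash in one place means a change to the spam filter only requires updating a single value before recompiling the filters that depend on it.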