Accented characters


#1

Say we want to catch tweets that contain the Spanish word “manifestación”, but also its misspelled cousin “manifestacion”, where the accented character “ó” has been replaced by an ordinary “o”. Would we have to add both words to an ANY list, or does DataSift contain some magic that lets us specify only one of the words and still find tweets containing either?


#2

You will need to add both words to a contains_any statement.

Take "cafe" and "café", for example: one is a place to buy food and drink, the other is French for "coffee". DataSift treats these as two different words.
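
If you have many accented keywords, you could generate the unaccented variants rather than typing each pair by hand. The following is only a rough sketch in Python, not anything DataSift provides: the helper names strip_accents and contains_any_argument are made up for illustration, and the comma-separated output assumes that is how you write the word list in your contains_any statement.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with combining accent marks removed."""
    # NFD splits "ó" into a plain "o" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def contains_any_argument(words):
    """Build a comma-separated list holding each word and its unaccented form.

    Hypothetical helper: the output format is an assumption about how the
    list is written in the filter.
    """
    variants = []
    for word in words:
        for candidate in (word, strip_accents(word)):
            if candidate not in variants:  # skip duplicates, keep order
                variants.append(candidate)
    return ", ".join(variants)


print(contains_any_argument(["manifestación"]))
# prints: manifestación, manifestacion
```

You would then paste the printed list into the contains_any statement. Note that this only covers the case where the accent is dropped entirely; other misspellings would still need to be added by hand.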