Language Tagging problem in "board" content


#1

Hi! I don’t think language tagging is working correctly in “board” content…

See below – all of the content was tagged as “en”.

I expect some to slip through, but I think there's far more non-English than English content in the “board” case.

(And in my reply “ugly” refers to how the copy-paste looked in this original message :slight_smile: )

–Seth


#2

Ugh. Ugly - I set up a page on my site so you can see it: http://www.lexalytics.com/page-datasift


#3

We are currently working on a number of improvements to our language detection, which we plan to roll out over the next few weeks. The first of these changes is currently being tested, and will hopefully be ready for release at some point next week.

You will find that most of the language mis-categorisations occur when a post contains two languages, such as English and Japanese. This is one of the key issues we are addressing.
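
To illustrate the mixed-language case, here is a rough sketch of how one might flag posts that mix Latin and Japanese/CJK characters before trusting a single language tag. This is not the detection pipeline discussed in this thread; the function names and the 0.2 threshold are hypothetical, and script is only a coarse proxy for language.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Return a coarse script label for one character, or None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA")):
        return "japanese_or_cjk"
    return "other"


def script_shares(text):
    """Map each script label to its fraction of the letters in text."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


def is_mixed_script(text, min_share=0.2):
    """True if at least two scripts each make up min_share of the letters.

    min_share is an assumed cut-off, not a tuned value.
    """
    shares = script_shares(text)
    return sum(1 for share in shares.values() if share >= min_share) >= 2
```

A post flagged this way could be split by script and each part tagged separately, rather than assigning “en” to the whole thing.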