It shouldn't be too hard to create an algorithm to identify an author's native language in English text.
At least for those with a lower English level, it should be very easy to spot signature mistakes. For example, if a question begins with "what for [noun] ...", then the author's native language is very probably Dutch or German. (it's a literal translation of the Dutch way of saying "what kind of [noun] ...")
I wonder if a generic machine-learning technique would discover this pattern when fed with a corpus of texts labelled with the author's native language.
It should be easier than identifying the author's gender, in any case. Apparently, no one claims to guess the author's gender with more than 80% accuracy. I find this unsatisfactory.
At least for those with a lower English level, it should be very easy to spot signature mistakes. For example, if a question begins with "what for [noun] ...", then the author's native language is very probably Dutch or German. (it's a literal translation of the Dutch way of saying "what kind of [noun] ...")
I wonder if a generic machine-learning technique would discover this pattern when fed with a corpus of texts labelled with the author's native language.
It should be easier than identifying the author's gender, in any case. Apparently, no one claims to guess the author's gender with more than 80% accuracy. I find this unsatisfactory.