It shouldn't be too hard to create an algorithm to identify an author's native language in English text.
At least for authors with a lower level of English, it should be very easy to spot signature mistakes. For example, if a question begins with "what for [noun] ...", then the author's native language is very probably Dutch or German. (It's a literal translation of the Dutch way of saying "what kind of [noun] ...", "wat voor [noun]"; German has the parallel "was für [noun]".)
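A minimal sketch of that rule-based idea. The pattern table and the name `native_language_hints` are hypothetical, and the only rule in the table is the "what for [noun]" example above. Requiring another word after "for" is an assumption meant to skip the perfectly correct English "What for?".

```python
import re

# Hypothetical table of "signature mistakes": each regex maps to the native
# languages it hints at. Only the "what for [noun]" rule comes from the post.
SIGNATURE_PATTERNS = {
    # Require a following word so the correct English "What for?" is ignored.
    re.compile(r"^\s*what\s+for\s+\w", re.IGNORECASE): ["Dutch", "German"],
}


def native_language_hints(text):
    """Return candidate native languages suggested by signature mistakes."""
    hints = []
    # Crude sentence split: break after ., ? or ! followed by whitespace.
    for sentence in re.split(r"(?<=[.?!])\s+", text):
        for pattern, languages in SIGNATURE_PATTERNS.items():
            if pattern.search(sentence):
                for lang in languages:
                    if lang not in hints:
                        hints.append(lang)
    return hints


print(native_language_hints("Hello. What for car do you drive?"))  # → ['Dutch', 'German']
print(native_language_hints("What kind of car do you drive?"))     # → []
```

A hand-written table like this only catches the mistakes someone already thought of, which is exactly why the next question is interesting.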
I wonder if a generic machine-learning technique would discover this pattern when fed a corpus of texts labelled with each author's native language.
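As a rough illustration of that question, here is a sketch using one generic technique, multinomial naive Bayes over word bigrams. The choice of technique is an assumption (the post doesn't name one), the class and function names are made up, and the four-sentence corpus is invented for illustration, not real learner data.

```python
import math
import re
from collections import Counter, defaultdict


def bigrams(text):
    """Lower-cased word bigrams: "What for car?" -> ["what for", "for car"]."""
    words = re.findall(r"[a-z']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


class BigramNaiveBayes:
    """Multinomial naive Bayes over word bigrams, with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.docs = Counter(labels)         # label -> number of documents
        for text, label in zip(texts, labels):
            self.counts[label].update(bigrams(text))
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        # Sorted so that ties in top_features come out in a stable order.
        self.vocab = sorted({g for c in self.counts.values() for g in c})
        return self

    def log_prob(self, gram, label):
        # Add-one smoothing so an unseen bigram doesn't zero out a label.
        count = self.counts[label][gram]
        return math.log((count + 1) / (self.totals[label] + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.docs.values())

        def score(label):
            prior = math.log(self.docs[label] / n_docs)
            return prior + sum(self.log_prob(g, label) for g in bigrams(text))

        return max(self.docs, key=score)

    def top_features(self, label, other, n=3):
        """Bigrams whose smoothed likelihood most favours `label` over `other`."""
        def ratio(g):
            return self.log_prob(g, label) - self.log_prob(g, other)
        return sorted(self.vocab, key=ratio, reverse=True)[:n]


# Toy corpus, invented for illustration only.
texts = [
    "What for car do you drive?",
    "What for music do you like?",
    "What kind of car do you drive?",
    "What kind of music do you like?",
]
labels = ["Dutch", "Dutch", "English", "English"]

model = BigramNaiveBayes().fit(texts, labels)
print(model.predict("What for book is that?"))      # → Dutch
print(model.predict("What kind of book is that?"))  # → English
print(model.top_features("Dutch", "English", n=1))  # → ['what for']
```

On a real corpus the telling part would be `top_features`: if the pattern is as distinctive as it seems, "what for" should rank near the top for Dutch and German authors without anyone writing that rule by hand.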
It should be easier than identifying the author's gender, in any case. Apparently, no one claims to guess the author's gender with more than 80% accuracy. I find this unsatisfactory.
(no subject)
Date: 2005-05-03 04:40 pm (UTC)
"This allows to improve the efficiency"
(no subject)
Date: 2005-05-03 04:45 pm (UTC)
A lot of languages, really. I think English is exceptional in requiring an explicit subject. AFAIK, it could be Portuguese, French or Dutch... i.e. I can't rule out any language other than English.
(no subject)
Date: 2005-05-03 04:46 pm (UTC)
(no subject)
Date: 2005-05-03 08:02 pm (UTC)
(no subject)
Date: 2005-05-03 08:15 pm (UTC)
Again, it could be many languages... English is also unique in using auxiliaries in virtually every question.
All I'll say is that most Dutch people speak better English than that.
Btw, I don't have the data or the knowledge to make these judgements in general... I only know four languages well enough.
(no subject)
Date: 2005-05-03 08:17 pm (UTC)
(no subject)
Date: 2005-05-03 08:24 pm (UTC)
I can tell you that it corresponds to a frequent expression in Portuguese. For some reason, it's used as often as, or more often than, "I have a question".
It should be possible to ask Google whether a literal translation into French or Spanish occurs proportionately more often than the phrase does in English.
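That comparison can be sketched as a ratio of normalised hit counts. Normalising by a common baseline phrase in each language (for instance "I have a question" and its translations) is an assumption, not something the comment specifies; the function names are hypothetical, and the counts themselves would have to come from actual searches.

```python
import math


def relative_frequency(phrase_hits, baseline_hits):
    """Hit count for a phrase, normalised by the hit count of a common
    baseline phrase in the same language (a rough proxy for corpus size)."""
    if baseline_hits <= 0:
        raise ValueError("baseline_hits must be positive")
    return phrase_hits / baseline_hits


def overrepresentation(target_hits, target_baseline_hits,
                       english_hits, english_baseline_hits):
    """How many times more common the literal translation is in the target
    language than the English phrase is in English; > 1 means over-represented."""
    english = relative_frequency(english_hits, english_baseline_hits)
    target = relative_frequency(target_hits, target_baseline_hits)
    if english == 0:
        # Phrase never seen in English: infinitely over-represented, or undefined.
        return math.inf if target > 0 else float("nan")
    return target / english
```

A result well above 1 would suggest the expression is a likely carry-over from that language; raw hit counts alone would mostly reflect how much text exists in each language.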
(no subject)
Date: 2005-05-03 08:17 pm (UTC)
(no subject)
Date: 2005-05-04 05:21 pm (UTC)
(no subject)
Date: 2005-05-04 05:22 pm (UTC)
(no subject)
Date: 2005-05-04 05:10 pm (UTC)
--Sebastian
(no subject)
Date: 2005-05-04 05:20 pm (UTC)
Usage can also give away a foreigner. For example, Dutch people often say "is it high time for ...", which is unusual but not incorrect English (while the Dutch equivalent is usual and correct). It's very hard to break away from such patterns, especially when there is no equivalent English expression.
I tend to cringe when I hear the colloquial "dit keer" ("this time"; the standard form is "deze keer")... it's hard enough to tell de-words from het-words without exceptions to the rule.