NLP; native language classification
Oct. 19th, 2008 05:41 pm
NLP is a pretty cool research area, but just brainstorming projects makes my head hurt. Seriously, what hasn't been done? All the interesting and novel ideas I can come up with involve an expensive process of collecting and annotating data.
Here's another, which has been done (though only recently):
I'd like to make a classifier to identify the native language of the author of an English text.
A quick googling produced: Oren Tsur, Ari Rappoport - Using Classifier Features for Studying the Effect of Native Language on the Choice of Written Second Language Words
We apply machine learning techniques to study language transfer, a major topic in the theory of Second Language Acquisition (SLA). Using an SVM for the problem of native language classification, we show that a careful analysis of the effects of various features can lead to scientific insights. In particular, we demonstrate that character bigrams alone allow classification levels of about 66% for a 5-class task, even when content and function word differences are accounted for. This may show that native language has a strong effect on the word choice of people writing in a second language.
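To make the "SVM over character bigrams" idea concrete, here is a minimal stdlib-only Python sketch of that general pipeline. It is not Tsur and Rappoport's code, and several choices are my own assumptions: bigram counts are lowercased and L2-normalized, multi-class is handled one-vs-rest, each binary SVM is a linear hinge-loss model with no bias term trained by Pegasos-style stochastic subgradient steps, and the hyperparameters (lam=0.01, epochs=50) are arbitrary. The names char_bigrams, train_pegasos and OneVsRestSVM are hypothetical helpers, and the toy texts and labels are made up.

```python
import math
import random
from collections import Counter


def char_bigrams(text):
    """Count overlapping character bigrams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def l2_normalize(features):
    """Scale a sparse feature map to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in features.values()))
    return {k: v / norm for k, v in features.items()} if norm else {}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(xs, ys, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (hinge loss, no bias term) trained with
    Pegasos-style stochastic subgradient steps; ys must be +1 or -1.
    Hypothetical helper; lam and epochs are arbitrary assumptions."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * dot(w, xs[i])
            # Regularization: shrink every weight by (1 - eta * lam).
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1:
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


class OneVsRestSVM:
    """One binary SVM per native-language label; predict the top score."""

    def fit(self, texts, labels):
        xs = [l2_normalize(char_bigrams(t)) for t in texts]
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_pegasos(xs, [1 if y == c else -1 for y in labels])
            for c in self.classes
        }
        return self

    def predict(self, text):
        x = l2_normalize(char_bigrams(text))
        return max(self.classes, key=lambda c: dot(self.weights[c], x))
```

Usage on toy data: `OneVsRestSVM().fit(["aaaa aaa", "aaa aa", "bbbb bbb", "bbb bb"], ["A", "A", "B", "B"])` followed by `.predict("aaaaa")`. For real experiments a tuned SVM library would replace the hand-rolled trainer; the sketch only shows how bigram features feed a multi-class linear classifier.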
and to do it from audio: Bouselmi et al - Discriminative phoneme sequence extraction for non-native speaker’s origin classification
The existence of discriminative phone sequences in non-native speech is a significant result of this work. The system that we have developed achieved a significant correct classification rate of 96.3% and a significant error reduction compared to some other tested techniques.
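The abstract doesn't say how the discriminative phone sequences are extracted, so the Python sketch below swaps in a deliberately simple stand-in, not Bouselmi et al.'s method. It assumes each utterance has already been turned into a list of phone symbols by a recognizer. It then ranks phone n-grams for each speaker origin by an add-alpha smoothed likelihood ratio, P(n-gram | this origin) / P(n-gram | all other origins). The function names, the alpha and top_k defaults, and the toy phone symbols are all hypothetical.

```python
from collections import Counter, defaultdict


def phone_ngrams(phones, n):
    """All contiguous length-n phone sequences, as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def smoothed_ratio(gram, c_in, c_out, total_in, total_out, vocab_size, alpha):
    """Add-alpha smoothed P(gram | class) / P(gram | all other classes)."""
    p_in = (c_in[gram] + alpha) / (total_in + alpha * vocab_size)
    p_out = (c_out[gram] + alpha) / (total_out + alpha * vocab_size)
    return p_in / p_out


def discriminative_ngrams(data, n=2, alpha=1.0, top_k=3):
    """data: list of (phone_list, origin_label) pairs.
    Returns {label: top_k n-grams ranked by smoothed likelihood ratio}."""
    counts = defaultdict(Counter)
    for phones, label in data:
        counts[label].update(phone_ngrams(phones, n))
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    result = {}
    for label, c_in in counts.items():
        # Pool the n-gram counts of every other origin class.
        c_out = Counter()
        for other_label, c in counts.items():
            if other_label != label:
                c_out.update(c)
        total_in, total_out = sum(c_in.values()), sum(c_out.values())
        # Highest ratio first; ties broken by the n-gram itself.
        ranked = sorted(
            vocab,
            key=lambda g: (-smoothed_ratio(g, c_in, c_out, total_in,
                                           total_out, len(vocab), alpha), g),
        )
        result[label] = ranked[:top_k]
    return result
```

The smoothing keeps n-grams unseen in one group from producing a zero or infinite ratio. A real system would then use the selected sequences as classifier features rather than stopping at a ranked list.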