[personal profile] gusl
NLP is a pretty cool research area, but just brainstorming projects makes my head hurt. Seriously, what hasn't been done? All the interesting and novel ideas I can come up with involve an expensive process of collecting/annotating data.

Here's another, which has been done (though only recently):

I'd like to make a classifier to identify the native language of the author of an English text.

A quick Google search turned up: Oren Tsur, Ari Rappoport - Using Classifier Features for Studying the Effect of Native Language on the Choice of Written Second Language Words
We apply machine learning techniques to study language transfer, a major topic in
the theory of Second Language Acquisition (SLA). Using an SVM for the problem of
native language classification, we show that a careful analysis of the effects of various
features can lead to scientific insights. In particular, we demonstrate that character bigrams
alone allow classification levels of about 66% for a 5-class task, even when content
and function word differences are accounted for. This may show that native language
has a strong effect on the word choice of people writing in a second language.
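
To make the character-bigram idea concrete, here is a minimal sketch. It is not the authors' method: the abstract says they use an SVM, while this swaps in a much simpler nearest-profile comparison (cosine similarity against summed bigram counts per label) so it runs with only the standard library. The function names, the labels, and the toy strings are all made up for illustration; real use would need an actual learner corpus.

```python
from collections import Counter
import math


def char_bigrams(text):
    """Count overlapping character bigrams in the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_texts):
    """Sum bigram counts per label, giving one profile per label."""
    profiles = {}
    for label, text in labeled_texts:
        profiles.setdefault(label, Counter()).update(char_bigrams(text))
    return profiles


def predict(profiles, text):
    """Return the label whose profile is most similar to the text."""
    feats = char_bigrams(text)
    return max(profiles, key=lambda label: cosine(feats, profiles[label]))


# Hypothetical toy data: placeholder strings, not real learner text.
toy_corpus = [
    ("lang_a", "aaaa bbbb"),
    ("lang_b", "cccc dddd"),
]
toy_profiles = train(toy_corpus)
toy_guess = predict(toy_profiles, "aabb")
```

With real data, each training item would be an English essay paired with its author's native language, and the profile comparison would be replaced by a proper classifier such as the SVM the paper describes.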


And to do it from audio: Bouselmi et al - Discriminative phoneme sequence extraction for non-native speaker’s origin classification
The existence of discriminative phone sequences in non-native speech is a significant result of this work. The system that we have developed achieved a significant correct classification rate of 96.3% and a significant error reduction compared to some other tested techniques.