gusl: (Default)
I sometimes read a paper and think: how I wish I had someone around here to work on this stuff with.

If I could find someone at CMU who:
* cites this paper, or cites many of the same papers as this paper.
* has been a co-author or colleague of one of the authors
then this problem would be considerably simplified.

Some relevant resources:
* CiteSeer can be used to get the citation graph (although this can be problematic)
* CiteULike is a recommendation system. Might be useful.
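The first criterion (citing this paper, or citing many of the same papers) can be sketched in a few lines. This is only a rough illustration: the `papers` structure, the paper ids and the scoring rule (Jaccard overlap of reference lists, plus a bonus of 1 for citing the target directly) are my own assumptions, not something CiteSeer or CiteULike provides.

```python
def rank_by_shared_citations(target_id, target_refs, papers):
    """Rank authors by how closely their papers' references match a target paper.

    papers: list of dicts with 'authors' (list of names) and 'refs' (set of paper ids).
    The data layout and the scoring rule are illustrative assumptions.
    """
    target_refs = set(target_refs)
    scores = {}
    for paper in papers:
        refs = set(paper["refs"])
        union = target_refs | refs
        # Jaccard overlap between the two reference lists
        overlap = len(target_refs & refs) / len(union) if union else 0.0
        # Bonus when the paper cites the target itself
        score = overlap + (1.0 if target_id in refs else 0.0)
        for author in paper["authors"]:
            # Keep each author's best-matching paper
            scores[author] = max(scores.get(author, 0.0), score)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Restricting `papers` to those written by people at CMU, and adding a term for co-authorship, would cover the rest of the wish list.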
gusl: (Default)
A project idea I've written down in my notebooks: Shannon language tests, i.e. automatically generated tests that measure language ability. The prediction task is similar to the way Shannon measured the entropy of English, by having readers guess the next letter of a text.
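A minimal sketch of how such a test could be generated, assuming the simplest possible design: show the test-taker a fixed-length context and ask for the next character. The function names, the context length and the scoring rule (fraction of correct first guesses) are all my own choices for illustration.

```python
import random


def make_items(text, n_items, context_len=20, seed=0):
    """Build (context, answer) pairs: guess the character that follows the context."""
    rng = random.Random(seed)
    positions = rng.sample(range(context_len, len(text)), n_items)
    return [(text[p - context_len:p], text[p]) for p in sorted(positions)]


def score(items, guesses):
    """Fraction of items where the first guess matches the hidden character."""
    correct = sum(1 for (_, answer), guess in zip(items, guesses) if guess == answer)
    return correct / len(items)
```

Shannon's own procedure let the subject keep guessing until the right letter was found and recorded the number of guesses; counting only first guesses, as here, is a cruder measure.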
gusl: (Default)
Rain has been a part of (literally) everyday life for these 3 weeks in Amsterdam. Shouldn't there be a way to tell what time it will begin to rain and what time it will stop?

I wish weather sites would show me precipitation graphs with approximately hourly resolution.

Also, do you ever notice that while you can see forecasts up to a week ahead, you can't see aftercasts (backcasts?)? I'd like to see these forecasts evaluated against what actually happened. Maybe these websites are trying to keep up an illusion of infallibility.
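One standard way to do this kind of evaluation for rain forecasts is the Brier score: the mean squared difference between the forecast probability of rain and what actually happened (1 for rain, 0 for no rain). A small sketch, with the function name and input format being my own assumptions:

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always forecasting 50% scores 0.25.
    """
    if len(forecast_probs) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    if not outcomes:
        raise ValueError("need at least one forecast")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecast_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)
```

Lower is better, so a site that published this number next to its forecasts would be easy to compare against its competitors.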

In the spirit of Wikipedia, I'd like to see amateur weather stations sharing their measurements in a central place where we can compare them.
gusl: (Default)
It shouldn't be too hard to create an algorithm to identify an author's native language in English text.

At least for authors with a lower level of English, it should be very easy to spot signature mistakes. For example, if a question begins with "what for [noun] ...", then the author's native language is very probably Dutch or German (it's a literal translation of the Dutch way of saying "what kind of [noun] ...").

I wonder if a generic machine-learning technique would discover this pattern when fed with a corpus of texts labelled with the author's native language.
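As a toy version of that experiment, here is a naive Bayes classifier over word bigrams with add-one smoothing. It is a sketch under heavy assumptions: the feature choice, the smoothing and the function names are mine, and a real attempt would need a large labelled corpus rather than a handful of sentences.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Lower-cased word bigrams, e.g. ('what', 'for')."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train(labelled_texts):
    """labelled_texts: iterable of (text, native_language) pairs."""
    counts = defaultdict(Counter)  # label -> bigram counts
    docs = Counter()               # label -> number of training texts
    for text, label in labelled_texts:
        counts[label].update(bigrams(text))
        docs[label] += 1
    return counts, docs


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    counts, docs = model
    vocab = set().union(*counts.values())
    total_docs = sum(docs.values())
    best_label, best_logp = None, -math.inf
    for label in counts:
        total = sum(counts[label].values())
        logp = math.log(docs[label] / total_docs)
        for bigram in bigrams(text):
            # Add-one smoothing, with one extra slot for unseen bigrams
            logp += math.log((counts[label][bigram] + 1) / (total + len(vocab) + 1))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label
```

With a bigram feature like ('what', 'for') present in the Dutch-labelled texts and absent from the others, the pattern is picked up without being hand-coded, which is the point of the question above.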

It should be easier than identifying the author's gender, in any case. Apparently, no one claims to guess the author's gender with more than 80% accuracy. I find this unsatisfactory.
gusl: (Default)
One thing I enjoy doing is creating decision-making tools.

One such project has been itching in my mind for the past few months: a language-learning decision-maker: which language should I learn next?

Given a language L and a person P, we want to calculate P's costs and benefits in learning L.

Costs:
Effort: how much work is this language for P? (P's talent, linguistic flexibility, knowledge of related languages)
Money: how much money would P spend in learning this language?
(different combinations of effort and money may work, but it will be a trade-off in any case)
Time: P's opportunity cost: what is the cost of not doing other things he could be doing?

Benefits:
Economic: how much does knowing L improve P's job prospects? How much more business can P do by learning L? (what is P's area? what kind of person is P? where does P live? where does P intend to live?)
Social: would P make more friends by learning L? (where does P live? where does P intend to live?) Would P be more attractive to the opposite sex? Appease his partner's parents?
Entertainment: would P enjoy learning L? Would P enjoy the consequences of knowing L?
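A first cut at the calculation could be as simple as summing the benefits and subtracting the costs. This sketch assumes that P has already scored each factor on a common scale, and the optional per-factor weights are my own addition; getting those numbers is of course the hard part.

```python
COSTS = ("effort", "money", "time")
BENEFITS = ("economic", "social", "entertainment")


def net_benefit(estimate, weights=None):
    """Weighted benefits minus weighted costs for one (person, language) pair.

    estimate: dict with a number for each of the six factors, on a common scale.
    weights: optional dict of per-factor weights (default 1.0 each).
    """
    weights = weights or {}
    benefit = sum(weights.get(key, 1.0) * estimate[key] for key in BENEFITS)
    cost = sum(weights.get(key, 1.0) * estimate[key] for key in COSTS)
    return benefit - cost


def rank_languages(estimates, weights=None):
    """Sort language names from most to least worthwhile."""
    return sorted(estimates, key=lambda name: net_benefit(estimates[name], weights), reverse=True)
```

The weights let P say, for example, that money matters twice as much as effort, which captures the effort/money trade-off in a crude way.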

Are there any common reasons for learning a language that I am not covering here?

Here's a slightly bullshitty, yet comprehensive coverage of the whys and hows of learning a language. Here is a guide to the difficulty of learning a language for a native English speaker (measured in hours required to achieve proficiency).

Here is the ACTFL proficiency scale, a simple self-assessment guide. I am "Advanced" in Dutch; in English and Portuguese I am "Superior" or "S-4", depending on my mood; and in French I am somewhere between "Intermediate High" and "Advanced". I don't claim consistent S-4 level in any language, since there are times when I can't express myself very well at all.

In any case, it seems clear that I should learn Chinese. But first German, since I'm already halfway there. Having said that, it would only take me a couple of months to learn Spanish properly, so that might be the most sensible thing to do.

Here are the world's most important languages, as measured by GDP:

Languages on the Internet may be a better measure of how many highly-educated people a language has, so you might prefer this measure:

More measures...

An interesting tangent is that the four BRIC countries (Brazil, Russia, India, China) are predicted to economically surpass the G6 (UK, France, Germany, Italy, Japan, USA) before 2050.

