Dec. 29th, 2009

Avatar

Dec. 29th, 2009 02:40 am
gusl
Highly recommended. Very gripping. Worth every penny. It's been a long time since I sat through a movie this long (161 minutes + trailers + ads) without complaining.

gusl
The great thing about computational linguistics (in particular, text rather than speech) is that it's very easy to come up with research questions that can be answered with (often simple) statistics on large corpora, e.g.:

* in checklists, people don't always end their sentences with a period or other punctuation mark. What grammatical structures tend to be closed off with explicit punctuation?
* when do bloggers complain the most about their partners or their bosses? How does that correlate with company earnings or unemployment rates?
* how does one's writing reflect one's linguistic (or cognitive) impairments (e.g. in aphasics or L2 speakers)? How much insight can you get into someone's mind from their writing?
* what can you predict about other data sources (e.g. stock prices, movie ratings) based on newspaper text?
* what correlates with font choice?
(and if you're getting people to type for you, keylogger data can be cognitively much more interesting! Perhaps as interesting as eye-tracking data.)
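
As a rough illustration of how shallow the statistics behind the first question can be, here is a minimal Python sketch. It assumes checklist items arrive one per string, treats `. ! ? ; :` as "explicit" punctuation, and uses each item's lowercased first word as a crude stand-in for grammatical structure. The function names are hypothetical, and a real study would pass a POS tagger or parser through the `key` argument instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Characters counted as explicit final punctuation (an assumption, not a standard).
FINAL_PUNCT = {".", "!", "?", ";", ":"}


def ends_with_punct(item: str) -> bool:
    """True if the item, ignoring trailing whitespace, ends in explicit punctuation."""
    stripped = item.rstrip()
    return bool(stripped) and stripped[-1] in FINAL_PUNCT


def first_word(item: str) -> str:
    """Crude proxy for grammatical category: the item's lowercased first word."""
    words = item.split()
    return words[0].lower().strip(".,;:!?") if words else ""


def punctuation_rates(
    items: Iterable[str], key: Callable[[str], str] = first_word
) -> Dict[str, Tuple[int, int, float]]:
    """Map each group to (items punctuated, items total, proportion punctuated)."""
    counts = defaultdict(lambda: [0, 0])
    for item in items:
        if not item.strip():
            continue  # blank lines are not checklist items
        group = counts[key(item)]
        group[1] += 1
        if ends_with_punct(item):
            group[0] += 1
    return {k: (p, n, p / n) for k, (p, n) in counts.items()}


if __name__ == "__main__":
    # Toy items for illustration only, not real data.
    toy = ["Buy milk", "Buy eggs.", "Call mom", "The report is due Friday.", "Call the bank."]
    for group, (p, n, rate) in sorted(punctuation_rates(toy).items()):
        print(f"{group}: {p}/{n} punctuated ({rate:.2f})")
```

Grouping by first word separates imperative-looking items ("buy", "call") from items that start like full sentences ("the"), which is about as far as a shallow approach gets; any finer grammatical distinction needs one of the tools mentioned below.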

The not-so-great thing is that shallow approaches don't work for everything (although they can be surprisingly good!) and annotations can be expensive (though Mechanical Turk is making this a lot cheaper).

Having said that, I'm simply more interested in statistics: theory, methodology, modeling and algorithmics. And although engineering can be lots of fun, it can also be a pain to use other people's tools (lemmatizers, parsers, POS taggers, etc.) or to hack up your own.
