Jun. 28th, 2004

gusl: (Default)
One "should exist" thing whose absence continues to amaze me is free, downloadable, open-source translation dictionaries.

I just spent 10 minutes searching and I haven't found anything worth linking to. WordNet is apparently far behind in other languages, and an interlingual WordNet seems FAR, FAR AWAY. But WHY??

I find this strange because the net benefit (benefit minus cost) of such an enterprise would be enormous.

This project would cost very little because:
* We already have more than enough data in parallel corpora (for example, Canadian government or European Union data) to extract translations automatically and quite reliably. (This was my project in January.)
* Anyone who moves to a new language community learns, within a few years, a very significant fraction of what anyone could expect from a "complete dictionary" (no such thing can actually exist; look up "Zipf distribution"). And people do not learn that fast. Therefore, only a few thousand words are needed per language, plus another few thousand items to distinguish which word to use when. Assuming a person can write 30 items per hour, this is only about 1000 man-hours per language (without using any tools).
* There is already a lot of data around in the form of explanations of which word to use when, etc. See the LiveJournal community go_dutch and similar communities. Those people could simply formalize their contributions a little more.
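To give a feel for how translations can be pulled out of a parallel corpus automatically, here is a minimal sketch using a simple co-occurrence score (the Dice coefficient) over sentence-aligned pairs. This is not necessarily the method from my January project; real systems usually use proper alignment models (e.g. the IBM translation models). The function name `extract_translations`, its thresholds, and the tiny English-French corpus are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product


def extract_translations(pairs, min_count=2, min_dice=0.5):
    """Hypothetical sketch: for each source word, pick the target word it
    co-occurs with most strongly (Dice coefficient) across aligned sentences."""
    src_freq = Counter()  # number of sentence pairs containing each source word
    tgt_freq = Counter()  # number of sentence pairs containing each target word
    joint = Counter()     # number of sentence pairs containing both words
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))

    best = {}
    for (s, t), count in joint.items():
        if count < min_count:
            continue  # too little evidence to trust this pairing
        dice = 2 * count / (src_freq[s] + tgt_freq[t])
        if dice >= min_dice and dice > best.get(s, (None, 0.0))[1]:
            best[s] = (t, dice)
    return best


# Toy, made-up sentence-aligned corpus (English -> French).
pairs = [
    ("the house", "la maison"),
    ("the red house", "la maison rouge"),
    ("a red car", "une voiture rouge"),
    ("the car", "la voiture"),
]
result = extract_translations(pairs)
print(result)
```

With real parallel corpora, the same counting simply runs over millions of sentence pairs, which is why the abundance of such data makes the job cheap.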
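A quick back-of-the-envelope check of the effort estimate, as a sketch: "a few thousand" is vague, so the 5,000 words and 5,000 usage distinctions below are assumed figures, not numbers from the post. Only the rate of 30 items per hour comes from the text.

```python
def hours_needed(words, distinctions, items_per_hour=30):
    """Hours of writing needed, assuming every item takes the same time."""
    return (words + distinctions) / items_per_hour


# Assumed sizes: 5,000 words plus 5,000 "which word when" items.
print(round(hours_needed(5000, 5000)))  # 333

# Conversely, 1000 man-hours at 30 items/hour covers this many items:
print(1000 * 30)  # 30000
```

So under these assumptions, the 1000 man-hour figure leaves plenty of headroom.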

The benefits would be:
* access to dictionaries with the full power of computing behind them
* ordinary people never again having to buy dictionaries, thesauri, or language tools, or being limited by their non-digital or proprietary form
* never having to ask another person for help when you simply want to translate a word or find out which word to use in which context

I believe this net benefit is so big that I wouldn't mind seeing some government money used to finance such a valuable public good. But it's actually not really necessary! Someone with the leadership skills and the time, please step up.
gusl: (Default)
Whoever invents a way to prevent stapler cuts deserves an Ig Nobel Prize. I just bound a book which was never meant to be printed in the first place... (draft under revision)
