[personal profile] gusl
One of the "should exist" things that continues to amaze me is the lack of free, downloadable, open-source translation dictionaries.

I just spent 10 minutes searching and I haven't found anything worth linking to. WordNet is apparently far behind in other languages, and an interlingual WordNet seems FAR, FAR AWAY. But WHY??

I find this strange because the net benefit (benefit minus cost) of such an enterprise is enormous.

This project would cost very little because:
* We already have more than enough data in parallel corpora (for example, Canadian government or European Union data) to automatically extract translations quite reliably. (This was my project in January.)
* Anyone who moves to a new language community learns, within a few years, a very significant portion of what anyone could expect from a "complete dictionary" (no such thing can actually exist; look up "Zipf distribution"). And people do not learn that fast. Therefore, only a few thousand words are needed per language, and another few thousand items to distinguish. Assuming a person can write 30 items / hour, this is only about 1000 man-hours / language (when not using any tools).
* There is already a ton of data around in the form of explanations about which words to use when, etc. See the go_dutch community on LiveJournal and similar communities. Those people could simply formalize their contributions a little more.
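
To make the first point concrete, here is a minimal sketch of one way to pull candidate word translations out of sentence-aligned text. It is not the January project mentioned above (that method is not described here); it simply scores word pairs by the Dice coefficient over sentence-level co-occurrence. The function name, the `min_count` threshold, and the toy English/Dutch corpus are all made up for illustration, and a real system would more likely use a proper word-alignment model over far more data.

```python
from collections import Counter
from itertools import product


def extract_translations(pairs, min_count=2):
    """Guess one translation per source word from aligned sentence pairs.

    pairs: iterable of (source_sentence, target_sentence) strings.
    Returns {source_word: (target_word, dice_score)}.
    Illustrative only: whitespace tokenization, no morphology, no phrases.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word at most once per sentence pair.
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))

    best = {}
    for (s, t), c in joint.items():
        if c < min_count:
            continue  # too little evidence for this pair
        # Dice coefficient: 2 * |both| / (|source| + |target|)
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return best


# Hypothetical toy corpus (English / Dutch), just to exercise the function.
toy_corpus = [
    ("the dog sleeps", "de hond slaapt"),
    ("the cat sleeps", "de kat slaapt"),
    ("the dog eats", "de hond eet"),
    ("the cat eats", "de kat eet"),
    ("a dog", "een hond"),
]
lexicon = extract_translations(toy_corpus)
for word in sorted(lexicon):
    print(word, "->", lexicon[word][0])
```

Words seen too rarely (here, "a") are left out instead of being guessed at, which is where a human contributor would step in.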

The benefits would be:
* being able to access dictionaries programmatically, with all the power of computing
* ordinary people never again having to buy dictionaries, thesauri, or language tools, or be limited by their non-digital or proprietary form.
* never having to ask humans to help you when you simply want to translate a word or to know which word to use in which context.

I believe that this net benefit is so big that I wouldn't mind seeing some government money used to finance such a valuable public good. But it's actually not really necessary! Someone with the leadership and time, please step up.

(no subject)

Date: 2004-06-28 11:21 am (UTC)
From: [identity profile] parakleta.livejournal.com
Did you come across freedict, based on the dict protocol?

(no subject)

Date: 2004-06-28 11:24 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
no, but it looks promising. Thanks.

(no subject)

Date: 2004-06-28 11:37 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
actually, you can see the site is outdated: there's hardly any documentation, what you see is mostly in German, and the Windows version they support is 3.x. But maybe they have good data.

(no subject)

Date: 2004-06-28 01:27 pm (UTC)
From: [identity profile] parakleta.livejournal.com
For up-to-date information about clients and servers go to the dict link I gave you. They reference that first site as the source of multi-language data files.
