DOP

Aug. 1st, 2005 06:47 pm
[personal profile] gusl
My advisor Rens Bod is a computational linguist and the major proponent of a theory of language called "Data-Oriented Parsing" (DOP), due to Remko Scha, another colleague of ours. DOP says that people produce language by reusing chunks they've seen before, whether concrete word-sequences or abstract rules.

While grammars attempt to be minimalistic, DOP proposes "maximalism": a theory of language has to incorporate a learner's whole language experience. It takes a child ~1000 days' worth of information to learn a language, and the DOP philosophy is that children are not being informationally inefficient: we need most of that information in order to teach the same language to a computer. (OTOH, a rule-based grammar plus a dictionary will always take up less information.)

Rens likes to generalize DOP to all kinds of human cognitive artifacts. I claim that all human artifacts can be parsed: language, music, film... and scientific knowledge. The latter is the subject of my thesis. Information produced by non-intelligent processes, OTOH, cannot be parsed. Parsing may be an intelligence universal: beings that handle a lot of information need to be able to organize it somehow. It would be an interesting project to make an algorithm that distinguishes human artifacts from data produced by non-intelligent processes (weather, geological, astronomical) or nature-made designs (plants, animals). I wonder if gzip can tell the difference, since compression is a kind of universal learning.
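
Here is a rough sketch of what the gzip experiment could look like, just to make the idea concrete. It measures how much Python's `gzip` module shrinks a byte string and compares repetitive text against pseudo-random noise. The function names (`compression_ratio`, `random_bytes`) and the sample data are my own illustrative assumptions, not part of DOP or any existing tool.

```python
import gzip
import random


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more regularity found)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


def random_bytes(n: int, seed: int = 0) -> bytes:
    """Hypothetical stand-in for 'unstructured' data: seeded pseudo-random bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    # Toy "human artifact": two sentences repeated many times (an assumption for
    # illustration; real text is far less repetitive than this).
    text = ("I throw the ball to the dog. "
            "I bring my plate to the kitchen. " * 200).encode("utf-8")
    noise = random_bytes(len(text))

    print("text ratio: ", round(compression_ratio(text), 3))
    print("noise ratio:", round(compression_ratio(noise), 3))
```

A low ratio only says that gzip found repeated byte sequences. Strongly periodic natural data would also compress well, so this is at best a crude first proxy for "parsable", not a detector of human artifacts.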


Michael Tomasello is an influential cognitive scientist who has a complementary view. While we say that maximalism is necessary for language acquisition, he says that it's sufficient (contra Pinker):

Tomasello is a great advocate of the role of nurture in language
acquisition, as was evident in his review of Steven Pinker's
famous book 'The language instinct', which he entitled 'Language is
not an instinct'. Many linguists, Pinker amongst them, have argued
that the language input a child receives is insufficient to learn the
complex grammar of a language - thus, the language capacity must
mostly be innate. Others, including Tomasello, consider this a big
mistake. According to them, there is no reason to assume that the
abstract grammatical categories, which are being used to describe the
language of adults (for example categories like verb, noun,
preposition), actually have to be learned by infants. A child who
knows the abstract concepts should, after hearing the sentence (1) 'I
throw the ball to the dog', be able to produce the next sentence (2) 'I
bring my plate to the kitchen'. After all, both sentences are
instances of the same scheme.

A growing number of experimental results (also from Tomasello's
laboratory) show that this is not the case: a child could for example
use the first sentence and at the same time never combine the
expression 'to X' with the verb 'bring'. In the 'usage-based' account
of language acquisition, learning takes place by picking up language
expressions from the environment, without assuming an innate universal
grammar.
This interpretation has great consequences for the 'nature
vs. nurture' debate. After all, until now linguistics has mostly
advocated the first alternative, namely 'nature'.



(see also his "Understanding and sharing intentions: The origins of cultural cognition")

For the second time in my life, I am siding against Pinker. And I really like the guy!

(The other time was about logical reasoning, when he promoted Cosmides's "cheater detection" theory. See here for why)
