This idea is roughly what I most desire whenever I'm learning a new language.
* create a sentence pair with slots for the free words.
* define the class of words that can fit into each slot.
* (Optional: elicit labels for combinations that sound ordinary/unusual/nonsensical/ungrammatical. Do some machine learning.)
For example:
EN: The [Person] went to the [Place].
PT: [Person] foi para [Place].
Person: girl (a menina), pianist (a/o pianista), man (o homem), baker (o padeiro), real-estate agent (o corretor)
Place: house (a casa), shop (a loja), post office (o correio), kitchen (a cozinha)
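Filling the slots is mechanical: every sentence pair is one element of the Cartesian product of the word classes. Here is a minimal Python sketch, assuming each word class is stored as a dict from English filler to Portuguese filler (article included). Where the list gives both genders ("a/o pianista") it picks one, and the names `PERSON`, `PLACE` and `sentence_pairs` are made up for illustration.

```python
import itertools

# Word classes from the example: English filler -> Portuguese filler (with article).
PERSON = {
    "girl": "a menina",
    "pianist": "o pianista",  # the list allows "a/o pianista"; one gender picked here
    "man": "o homem",
    "baker": "o padeiro",
    "real-estate agent": "o corretor",
}
PLACE = {
    "house": "a casa",
    "shop": "a loja",
    "post office": "o correio",
    "kitchen": "a cozinha",
}


def capitalize_first(s):
    """Upper-case only the first character (str.capitalize would lower the rest)."""
    return s[:1].upper() + s[1:]


def sentence_pairs():
    """Yield every (EN, PT) pair obtained by filling both slots of the template."""
    for (p_en, p_pt), (l_en, l_pt) in itertools.product(PERSON.items(), PLACE.items()):
        en = f"The {p_en} went to the {l_en}."
        pt = capitalize_first(f"{p_pt} foi para {l_pt}.")
        yield en, pt


if __name__ == "__main__":
    for en, pt in sentence_pairs():
        print(en, "|", pt)
```

The full product here is only 5 × 4 = 20 pairs; for drilling, one could draw a few at random with `random.sample(list(sentence_pairs()), k)` instead of printing them all.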
Now we can automatically generate novel grammatical sentences for L2 learners to study. (I'd love to extend this to full generative grammars, but I suspect the rate of ungrammatical output would be too difficult to control.)
But here's, I think, a safe step in that direction. Define each word class inductively:
Place :- [Person]'s place (a casa de [Person])
Person :- owner of the [Place] (o dono/a dona de [Place])
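One way to read these inductive definitions is as a tiny synchronous grammar: each production is an (English, Portuguese) pair with the same slots on both sides, and each slot is expanded once and substituted into both languages. The sketch below assumes a `{Slot}` notation and a depth bound to keep the recursion finite; both are my own choices rather than part of the rules above, and only two base fillers per class are kept so the output stays small. All names are hypothetical.

```python
import itertools
import re

# Each production is an (English, Portuguese) pair; "{Person}"/"{Place}" mark slots.
GRAMMAR = {
    "Person": [
        ("baker", "o padeiro"),
        ("girl", "a menina"),
        ("owner of the {Place}", "o dono de {Place}"),  # Person :- owner of the [Place]
    ],
    "Place": [
        ("shop", "a loja"),
        ("house", "a casa"),
        ("{Person}'s place", "a casa de {Person}"),  # Place :- [Person]'s place
    ],
}

SLOT = re.compile(r"\{(\w+)\}")


def expand_all(cls, depth):
    """Yield every fully expanded (EN, PT) filler for `cls` using at most `depth`
    nested recursive productions."""
    for en, pt in GRAMMAR[cls]:
        slots = list(dict.fromkeys(SLOT.findall(en)))  # unique slot names, in order
        if slots and depth == 0:
            continue  # recursion budget used up: only slot-free fillers allowed
        fillers = [list(expand_all(name, depth - 1)) for name in slots]
        # With no slots, product() yields a single empty tuple, i.e. the base filler.
        for combo in itertools.product(*fillers):
            e, p = en, pt
            for name, (f_en, f_pt) in zip(slots, combo):
                # The same sub-expansion fills the slot in both languages.
                e = e.replace("{" + name + "}", f_en)
                p = p.replace("{" + name + "}", f_pt)
            yield e, p


def sentence_pairs(depth):
    """Fill the top-level template with every Person/Place expansion up to `depth`."""
    for (p_en, p_pt), (l_en, l_pt) in itertools.product(
        expand_all("Person", depth), expand_all("Place", depth)
    ):
        pt = f"{p_pt} foi para {l_pt}."
        yield f"The {p_en} went to the {l_en}.", pt[:1].upper() + pt[1:]


if __name__ == "__main__":
    for en, pt in sentence_pairs(2):
        print(en, "|", pt)
```

With a depth bound of 2 this yields 6 × 6 = 36 pairs, including the baker example below. Raising the bound makes the output grow quickly, which is one reason to cap it.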
So now, we can generate sentence-pairs like:
The baker went to the owner of the shop's place (O padeiro foi para a casa de o dono de a loja)
OK, it sounds slightly ungrammatical in both languages, but fixing that looks like easy surface-level post-processing (which existing NL generation systems already do), and hopefully it is machine-learnable: ideally the system would induce the rules itself, with programming by demonstration (PbyD) or inductive logic programming (ILP). In this case, the PT rules to be learned are "de+o --> do" and "de+a --> da", and the EN rule is something subtler about avoiding awkward or suboptimal word orders.
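To make the PT rules concrete, here is a toy stand-in for learning them; it is not PbyD or ILP. It aligns one raw sentence with a hand-corrected version using Python's standard `difflib.SequenceMatcher`, records every spot where several raw tokens collapsed into one corrected token, and then applies those rewrites to new output. The function names are hypothetical, and because it learns from a single example with no context, it would apply a rule wherever the token pattern appears.

```python
import difflib


def induce_merge_rules(raw, corrected):
    """Align two token sequences and return {raw token tuple: merged token} for every
    span where several raw tokens were replaced by exactly one corrected token."""
    a, b = raw.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    rules = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 > 1 and j2 - j1 == 1:
            rules[tuple(a[i1:i2])] = b[j1]
    return rules


def apply_rules(sentence, rules):
    """Scan left to right, replacing the first learned pattern that matches at each position."""
    tokens, out, i = sentence.split(), [], 0
    while i < len(tokens):
        for pattern, merged in rules.items():
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                out.append(merged)
                i += len(pattern)
                break
        else:  # no rule matched at position i
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    rules = induce_merge_rules("a casa de o dono de a loja", "a casa do dono da loja")
    print(rules)  # learned: de+o -> do, de+a -> da
    print(apply_rules("O padeiro foi para a casa de o dono de a loja", rules))
```

The EN side is harder to mimic this way, since fixing "the owner of the shop's place" means reordering words rather than merging adjacent tokens.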
If people actually used such a system, we would have deeper corpora, and you could do as much knowledge engineering as you wish. Now imagine Wikipedia-style collaboration: some people adding new words and constructions (expressivity), some focusing on keeping the rate of ungrammaticality under control, and some forking the system when their goals become incompatible.
Of course, I'm not even mentioning deeper issues like metonymy, semantic taxonomies, or multiple choices in translation (though that would be interesting to add).
But if nothing else, this could be a source of much linguistic amusement.