learning argumentative structures
Nov. 12th, 2006 12:32 am

Unified architectures for Machine Learning?
Our ontology is, as usual: observables (inputs) and the unobservables we are interested in (outputs).
The purpose of much machine learning is this: given some data, induce a function that, when given a new data point with partial information, lets us complete it.
Analogy: reconstructing an image
Supervised learning is when we learn from complete images, and then perform the task of filling in the missing area.
Unsupervised learning is when we learn from incomplete images to begin with. The goal may be to complete the square, or merely to find a classification of the incomplete images.
But in a more general context, the different variables associated with the data will have different units and types (e.g. two-valued, multi-valued, discrete, continuous, etc.). While such data points can still be encoded as images (everything can), we lose the constraints that made learning feasible (e.g. continuity).
My impression is that many learning systems are hard-coded for a specific learning function (i.e. a given set of inputs and outputs), and aren't robust to changes: if you add an input the system won't improve, and if you remove an input the system breaks. If we have a system that has learned an estimate of 1=>2, it should be easy to turn that learning into an estimate of 2=>1 (of course, if 1=>2 is very information-lossy, your standard for 2=>1 can't be very high).
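One concrete way to get that 1=>2 / 2=>1 symmetry is to learn the joint distribution rather than a directed function: both conditionals then fall out by normalization, with no retraining. A minimal sketch, using a made-up toy dataset of (variable 1, variable 2) pairs:

```python
from collections import Counter

# Hypothetical toy observations of (variable 1, variable 2).
pairs = [("a", 0), ("a", 0), ("a", 1), ("b", 1), ("b", 1), ("b", 0)]

joint = Counter(pairs)          # empirical estimate of the joint P(1, 2)
total = sum(joint.values())
xs = {x for x, _ in pairs}      # observed values of variable 1
ys = {y for _, y in pairs}      # observed values of variable 2

def p_joint(x, y):
    return joint[(x, y)] / total

def p_2_given_1(y, x):
    """The 'forward' estimate P(2 = y | 1 = x)."""
    return p_joint(x, y) / sum(p_joint(x, y2) for y2 in ys)

def p_1_given_2(x, y):
    """The reverse estimate P(1 = x | 2 = y), from the same joint."""
    return p_joint(x, y) / sum(p_joint(x2, y) for x2 in xs)
```

Both directions are just different normalizations of the same table, which is the sense in which the learning is reusable.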
-------------------------------
Learning argumentative structures
Anyway, I mention all this because in my current project on learning argumentative structures, our most ambitious goal (i.e. automating the process of argument-mapping) involves a multi-step process, and we may or may not want to add scaffolding (different levels of human-annotation) along the way.
Our "variables":
1 skeleton of the graph
2 text in nodes
3 raw source text
4 text segments ("quotes") to be used
5 links between text segments and nodes
The ambitious goal is to learn 3 => 1,2,5 (4 being a necessary intermediate step)
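For concreteness, here is one hypothetical way the five variables might be encoded as a single record; the names and types below are illustrative guesses, not a committed design:

```python
from dataclasses import dataclass, field

@dataclass
class ArgumentMap:
    skeleton: list[tuple[int, int]]   # 1: directed edges between node ids
    node_text: dict[int, str]         # 2: text in nodes
    source_text: str                  # 3: raw source text
    segments: list[str]               # 4: text segments ("quotes") to be used
    segment_links: dict[int, int] = field(default_factory=dict)  # 5: node id -> segment index
```

A learner for the ambitious goal would then be a function from `source_text` alone to a full `ArgumentMap`, with `segments` produced along the way.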
2 => 4 should be easy, as the text in nodes tends to be a close paraphrase of the source, and the target space is small (there are only so many quotes you can take from a short text).
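A crude sketch of 2 => 4 along these lines: score each candidate segment by word overlap with the node text and take the best match. The tokenization and scoring here are placeholder choices, not something we've settled on:

```python
import re

def words(text):
    # Crude tokenization: lowercase alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(node_text, quote):
    """Fraction of node-text words that also appear in the candidate quote."""
    w = words(node_text)
    return len(w & words(quote)) / max(len(w), 1)

def best_quote(node_text, candidate_segments):
    """Pick the source segment most likely to be the quote behind a node."""
    return max(candidate_segments, key=lambda q: overlap(node_text, q))
```

Because node text is a close paraphrase, even this bag-of-words score should separate the right segment from unrelated ones most of the time.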
1,3,4 => 5 should be easy to make perform reasonably well by using simple heuristics about ordering and textual cues (words like "therefore").
1,3 => 2 could benefit from this heuristic: if you see the text "we assume that" in 3, the sentence following that must be a leaf node (i.e. axiom node). Likewise, some cues may help us identify the root node: maybe "therefore" gets used in final conclusions whereas "thus" is used in intermediate nodes more often.
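These cue heuristics are easy to prototype. A sketch, with the cue lists as stand-in guesses rather than anything validated on data:

```python
# Hypothetical cue inventories, per the heuristic above; a real system
# would want to learn or at least tune these lists.
LEAF_CUES = ["we assume that", "suppose that"]
ROOT_CUES = ["therefore"]
INTERMEDIATE_CUES = ["thus"]

def guess_role(sentence):
    """Guess a sentence's role in the argument graph from surface cues."""
    s = sentence.lower()
    if any(c in s for c in LEAF_CUES):
        return "leaf"          # axiom node
    if any(c in s for c in ROOT_CUES):
        return "root"          # final conclusion
    if any(c in s for c in INTERMEDIATE_CUES):
        return "intermediate"
    return "unknown"
```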
1,2,3,4 => 5 is easy: 1,3,4 => 5 is feasible already, and now we have 2, which makes the problem almost trivial: just use string matching.
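A sketch of that string-matching step using the standard library's difflib; the similarity cutoff is an arbitrary placeholder:

```python
import difflib

def link_nodes_to_segments(node_texts, segments, cutoff=0.3):
    """For each node text, link it to the source segment with the highest
    character-level sequence similarity (a stand-in for 'just use string
    matching')."""
    links = {}
    for node in node_texts:
        score, seg = max(
            (difflib.SequenceMatcher(None, node.lower(), s.lower()).ratio(), s)
            for s in segments
        )
        if score >= cutoff:
            links[node] = seg
    return links
```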
---
Collecting data:
fixed text, different graphs: read & formalize assignments
fixed graph, different texts: read graph & write assignments (exposing the author's point of view)
(no subject)
Date: 2006-11-12 12:37 am (UTC)
What makes you think this? If I give Naive Bayes more training data, it does better. If I take away training data, it doesn't fail catastrophically, but rather its performance degrades gradually.
(no subject)
Date: 2006-11-12 08:56 am (UTC)
When I talk about "adding/removing an input", I mean extra variables about each data point, e.g. in biometrics, information about the neck width.
Is this what you meant?
(no subject)
Date: 2006-11-12 08:59 am (UTC)
Are you making the point that machine learning is more dynamic than "normal algorithms", i.e. statistical recipes?
(no subject)
Date: 2006-11-12 03:10 pm (UTC)
Also, like jcreed, I misinterpreted you to mean adding more points. If you mean adding more variables about each data point, then you're taking the problem into a higher-dimensional space, so it makes sense that this would make the problem harder. Finding almost any kind of structure is much, much harder in higher-dimensional spaces than in lower-dimensional ones. I'm not sure that learning algorithms are hand-coded for a specific dimension, but they certainly benefit from having lower dimension. There's been some cool work lately, though, on learning over more general manifolds: even if your training data appears to come from some incredibly high-dimensional space, you can try to find a low-dimensional manifold containing the training data and learn over that.
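A toy illustration of why high dimension hurts, assuming uniform data in a unit cube: as the dimension grows, nearest-neighbour distances blow up, so "local" structure gets harder and harder to exploit. The sizes below are arbitrary:

```python
import math
import random

def avg_nn_distance(n_points, dim, trials=200, seed=0):
    """Average distance from a random query point to its nearest neighbour
    among n_points uniformly random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(trials):
        q = [rng.random() for _ in range(dim)]
        total += min(math.dist(q, p) for p in pts)
    return total / trials
```

With 100 training points, the average nearest-neighbour distance in 2 dimensions is a small fraction of the cube's side; in 20 dimensions it is larger than the side itself, which is one face of the curse of dimensionality.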
(no subject)
Date: 2006-11-12 05:20 pm (UTC)
Does "add an input" sound more like "add a data point" than like "add a dimension"?
What would it mean for the learning to be hard-coded to a particular function? That you're just fitting parameters? If so, then you're still learning over a function "type" in a more restricted sense of the word (e.g. the type of linear functions touching (0,0)).
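For instance, fitting within the restricted "type" of linear functions touching (0,0) means the learner only ever chooses a slope. A one-line least-squares sketch of that restricted fit:

```python
def fit_through_origin(xs, ys):
    """Least-squares fit restricted to the 'type' of lines through (0, 0):
    y = a * x, so only the slope a is learned."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

The hypothesis space is one-dimensional, yet this is still learning over a function type in the restricted sense described above.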
(no subject)
Date: 2006-11-12 05:35 pm (UTC)
Yes, the former.
What would it mean for the learning to be hard-coded to a particular function?
To be honest, the thing I had in mind doesn't make a lot of sense: that one has an exact, single function in mind, and cooks up a learning algorithm to perform well at learning that function from instances of it. It is a somewhat extreme straw man, for in principle the learning algorithm could just be the function itself, paying no attention to its training data; but there are slightly less extreme versions of it.
I want to say that the human brain itself has a fixed "type" for its inputs and outputs: we are born with only so many eyes and ears on the input side, and only so many arms and mouths on the output side. Nonetheless, when people sustain injuries they "route around" missing inputs or outputs, and they seem to incorporate external objects into their sense of self as they learn to use tools.
I'm hesitant to accept the idea that algorithms now are "just inflexible" or "brittle" or something and that we need to figure out "the algorithm for flexibility", but there's something in what you're saying that is persuading me that there is some kind of generality that we might not have, and that we might need.
Let me try to rework your objection/claim/idea into a similar one:
Our current machine learning algorithms typically (but not always) learn functions whose type is something like R^n → R^m. Although we can twiddle the n's and m's, these are still rather unstructured vectors of reals. How can we learn functions of more interesting types? Specifically, how can we learn functions whose types, on both the input and output side, are structured with enough complexity to admit objects that start looking like programs and language utterances?