Feb. 21st, 2007

gusl: (Default)
One of my projects is about annotating the argumentative structure of texts.

My advisor is interested in sources like newspaper editorials, political blogs, etc. He likes to annotate expressions of agreement/disagreement, sentiment, etc. He's not interested in formalizing "normative arguments", such as those found in math/science textbooks or philosophy texts, let alone something formalistic like Landau's Grundlagen (available in an Automath version with the full text in LaTeX, and in simple type theory versions), because we have no reason to think that learning on such a corpus will transfer to real-world texts.

As a consequence, he is interested in doing coarser formalizations. It's perfectly possible to annotate logical steps (whether as valid steps: "modus ponens", "syllogism #5", etc., or as fallacies like "affirming the consequent", "straw man", etc.), but in practice this will need to be a bit subjective, because:
* arguments in real texts (and the sentences inside them) may be ambiguous.
* arguments in real texts almost always make use of tacit knowledge (including "common sense" knowledge).

I am uncomfortable with this subjectivity. I am more interested in normative arguments, like those found in textbooks and philosophy texts. My natural tendency is always towards fine formalizations: I want to dig out the exact logical structure of the argument. (I recognize that fine formalizations are subjective too: even the proofs inside Grundlagen might potentially be interpreted in non-equivalent ways, but somehow that doesn't bother me as much. I wonder why.)

However, my advisor and I do have some intersection in interests:
* annotating argumentative/explanatory triangles (2 explicit premises, 1 conclusion) and lines (1 explicit premise, 1 hidden premise, 1 conclusion).
* annotating discourse relations ("elaboration", "explanation", "illustration", etc.)
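
To make the two shapes concrete, here is a minimal sketch of how one annotated step might be stored. The class name, field names and the example sentences are all hypothetical, not part of any annotation scheme we have settled on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArgumentStep:
    """One annotated inference step (hypothetical structure)."""
    conclusion: str
    explicit_premises: List[str]
    hidden_premises: List[str] = field(default_factory=list)

    def shape(self) -> str:
        # Triangle: 2 explicit premises, 1 conclusion.
        if len(self.explicit_premises) == 2 and not self.hidden_premises:
            return "triangle"
        # Line: 1 explicit premise, 1 hidden premise, 1 conclusion.
        if len(self.explicit_premises) == 1 and len(self.hidden_premises) == 1:
            return "line"
        return "other"


triangle = ArgumentStep(
    conclusion="Socrates is mortal",
    explicit_premises=["All men are mortal", "Socrates is a man"],
)
line = ArgumentStep(
    conclusion="Socrates is mortal",
    explicit_premises=["Socrates is a man"],
    hidden_premises=["All men are mortal"],
)
```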

The current plan is to annotate the blog posts that I mirrored last week (for a different purpose). I'm expecting to get very simple, disconnected lines and maybe a few triangles. I'd be (pleasantly) surprised if there are any trees deeper than that. This is not my ideal corpus, but it's a start.

My ideal corpus would be a collection of philosophical texts (although a book like The Armchair Economist seems like a very rich source). Does anybody know of a source of philosophical texts in digital form?
gusl: (Default)
Most programming languages provide us with an independent identically-distributed (IID) source of random variables, distributed uniformly between 0 and 1.

To convert this uniform distribution to any arbitrary 1D distribution, we need the inverse of the CDF of the distribution. inv-cdf(random()) will generate the distribution we want.
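
As a small illustration of the inverse-CDF trick, here is a sketch for the exponential distribution, whose CDF 1 - exp(-rate * x) can be inverted by hand. The function names and the choice of distribution are mine, picked only because the inverse has a closed form.

```python
import math
import random


def exponential_inv_cdf(u, rate=1.0):
    """Inverse CDF of the exponential distribution.

    Solves 1 - exp(-rate * x) = u for x, with u in [0, 1).
    """
    return -math.log(1.0 - u) / rate


def sample_exponential(rng, rate=1.0):
    # rng.random() is uniform on [0, 1), so 1 - u is never zero.
    return exponential_inv_cdf(rng.random(), rate)


rng = random.Random(0)
samples = [sample_exponential(rng, rate=2.0) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate
```

When the CDF has no closed-form inverse, the same idea still works with a numerical inverse (for example a bisection search on the CDF).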

However, for arbitrary 2D distributions, it's not so clear. For one thing, it's not so clear what a 2D CDF would be, and whether it would take 1 or 2 arguments.

Still, I came up with a naive algorithm for a 2D generator:
(1) Find the distribution of the x component by integrating over y for each value in the x-axis (you could also choose some x's and interpolate: then you won't need to do this for *every* value), and generate an x_0 using the 1D generator.
(2) Find the distribution of the y component, conditioned on x=x_0 (computationally trivial). Generate y_0 using the 1D generator.
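
Here is a rough sketch of those two steps, assuming the PDF has already been discretized onto a grid of cells (that discretization is my simplification, and the names `sample_index` and `sample_2d` are made up). Summing a row plays the role of integrating over y, and the row itself, once normalized, is the conditional distribution of y given x.

```python
import bisect
import itertools
import random


def sample_index(weights, u):
    """Discrete inverse-CDF: pick index i with probability weights[i] / sum(weights)."""
    cumulative = list(itertools.accumulate(weights))
    idx = bisect.bisect_right(cumulative, u * cumulative[-1])
    return min(idx, len(weights) - 1)  # guard against rounding at the top end


def sample_2d(pdf_grid, rng):
    """pdf_grid[i][j] is the (unnormalized) density in cell (x_i, y_j)."""
    # Step 1: marginal over x, obtained by summing out y.
    marginal_x = [sum(row) for row in pdf_grid]
    i = sample_index(marginal_x, rng.random())
    # Step 2: conditional of y given x = x_i is just row i (normalized inside sample_index).
    j = sample_index(pdf_grid[i], rng.random())
    return i, j


rng = random.Random(1)
grid = [[1.0, 0.0],
        [0.0, 3.0]]
draws = [sample_2d(grid, rng) for _ in range(20_000)]
frac_11 = draws.count((1, 1)) / len(draws)  # should be close to 3/4
```

In real use the marginal and the cumulative sums would be computed once and cached rather than rebuilt for every draw.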

I really think I could do better here. How?

---

I also thought of a Las Vegas algorithm:

(1) Choose a geometric shape inside the support of your distribution, and integrate the PDF over it. Call the result I.
(2) If random() < I, you fall inside the shape. If the shape is small enough for the precision you need, you're done.
If it's not small enough, make a PDF on this region (set all values outside the shape to zero, and divide all values inside by I: best done lazily). Go back to 1.
(3) If random() >= I, you fall outside the shape.
If the region outside the shape is small enough, you're done.
If it's not small enough, make a PDF on this outside region (set all values inside the shape to zero, and divide all values outside by 1 - I: best done lazily). Go back to 1.
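
A minimal sketch of this subdivision idea, under several assumptions of mine: the domain is a rectangle, the "shape" is always one half of the current box, the integral is estimated with a crude midpoint rule, and the names `box_mass` and `sample_by_subdivision` are invented. The lazy renormalization shows up as dividing by the mass of the current box instead of rewriting the PDF.

```python
import random


def box_mass(pdf, box, n=4):
    """Midpoint-rule estimate of the integral of pdf over box = (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = box
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for a in range(n):
        for b in range(n):
            total += pdf(x0 + (a + 0.5) * dx, y0 + (b + 0.5) * dy)
    return total * dx * dy


def sample_by_subdivision(pdf, box, rng, tol=1 / 64):
    """Repeatedly halve the box, choosing a half with probability proportional to its mass."""
    x0, x1, y0, y1 = box
    while max(x1 - x0, y1 - y0) > tol:
        # Split along the longer side; 'first' plays the role of the shape.
        if x1 - x0 >= y1 - y0:
            mid = (x0 + x1) / 2
            first, second = (x0, mid, y0, y1), (mid, x1, y0, y1)
        else:
            mid = (y0 + y1) / 2
            first, second = (x0, x1, y0, mid), (x0, x1, mid, y1)
        m_first, m_second = box_mass(pdf, first), box_mass(pdf, second)
        total = m_first + m_second
        # Lazy renormalization: I = mass(shape) / mass(current region).
        p_first = m_first / total if total > 0 else 0.5
        x0, x1, y0, y1 = first if rng.random() < p_first else second
    # The box is now "precise enough"; report its center.
    return (x0 + x1) / 2, (y0 + y1) / 2


def linear_pdf(x, y):
    # Example density on the unit square; integrates to 1, with E[x] = 7/12.
    return x + y


rng = random.Random(2)
points = [sample_by_subdivision(linear_pdf, (0.0, 1.0, 0.0, 1.0), rng)
          for _ in range(2000)]
mean_x = sum(p[0] for p in points) / len(points)
```

Each halving costs one random number, so the number of draws per sample grows with the logarithm of the precision you ask for.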

Improvement
If you partition the space into more subsets, you will save random numbers. They are a resource after all (although maybe you can recycle them, using the residue).
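
One way the recycling could work, as a sketch: after a uniform number u selects subset i (whose probability interval is [low, low + p)), the rescaled leftover (u - low) / p is itself uniform on [0, 1) and can drive the next choice. The function name is hypothetical, and I am assuming every probability is positive. With floating-point numbers the residue loses bits at each step, so it cannot be reused forever.

```python
def choose_and_recycle(probs, u):
    """Pick index i with probability probs[i]; also return the rescaled residue of u.

    Assumes probs are positive and sum to 1, and 0 <= u < 1.
    """
    low = 0.0
    last = len(probs) - 1
    for i, p in enumerate(probs):
        if u < low + p or i == last:
            residue = (u - low) / p
            # Clamp into [0, 1) in case of rounding error.
            return i, min(max(residue, 0.0), 1.0 - 2.0 ** -53)
        low += p


# One uniform number, two decisions:
first_choice, leftover = choose_and_recycle([0.25, 0.25, 0.5], 0.125)
second_choice, _ = choose_and_recycle([0.5, 0.5], leftover)
```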
