gusl: (Default)
While marymcglo was driving us back from the Machine Learning picnic last Saturday, we somehow came up with some empirical questions that were difficult to answer objectively. For example:

"Are low-income people more likely to marry early?"

That is, the kind of demographic question that economists are interested in.

We have two kinds of data available:
* Census data
* Marriage records

Integrating these two to answer our question is not trivial.

For one thing, census data is anonymous. Also, if you don't have access to microdata (i.e. individual data points), then all you get are distributions conditioned on a single variable like "gender", "age group", "race" or "marital status". In particular, you can't condition on more than one variable at a time. In situations like this, one trick is to ask a different question:

"Are people in low-income counties more likely to marry early?"

whose answer can be used to answer our original question, but only if we buy an independence assumption, namely that people in low-income counties are representative of low-income people in general. In other words, we have to assume that the bias is small. Economists use such tricks all the time.
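
To see how much weight that assumption carries, here is a toy sketch in Python. Every number in it is invented for illustration (two hypothetical counties, "A" and "B", with made-up head counts), and it pretends we have the microdata that the aggregate tables would hide. It computes the early-marriage rate once per county and once per income group:

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One cell of a hypothetical county x income cross-tabulation."""
    county: str         # "A" is the low-income county, "B" the high-income one
    income: str         # "low" or "high" (individual income)
    people: int         # invented head count
    married_early: int  # invented count of early marriages


# Invented numbers, chosen so that the two views disagree.
CELLS = [
    Cell("A", "low", 80, 24),
    Cell("A", "high", 20, 16),
    Cell("B", "low", 20, 2),
    Cell("B", "high", 80, 20),
]


def rate(cells):
    """Share of people in these cells who married early."""
    return sum(c.married_early for c in cells) / sum(c.people for c in cells)


def rate_by(cells, key):
    """Early-marriage rate grouped by one attribute ("county" or "income")."""
    groups = {}
    for c in cells:
        groups.setdefault(getattr(c, key), []).append(c)
    return {k: rate(v) for k, v in groups.items()}


county_rates = rate_by(CELLS, "county")  # what aggregate data would show
income_rates = rate_by(CELLS, "income")  # what we actually want to know

print("by county:", county_rates)
print("by income:", income_rates)
```

In this made-up population the low-income county has the higher early-marriage rate (0.40 against 0.22), yet low-income individuals have the lower one (0.26 against 0.36). The county-level answer is only a good proxy when residents of low-income counties really do resemble low-income people in general, which is exactly the assumption one has to buy.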

The methodologist in me wants to create a formal language for querying all this demographic data, while making these economists' tricks explicit. Once we have such a language, some logical questions are:
* what class of questions can be answered by our data?
* what questions need extra assumptions to be answered by our data?

Using this language, you would ask the reasoning engine a particular question, and it would come back offering you a choice of assumptions that could be used to answer the question. It is up to you to decide whether and how much you believe each of these assumptions. The more often an assumption gets accepted, the higher its prior gets: this way, the system formalizes what assumptions are considered "common-sense".
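
A minimal sketch of what such an interaction could look like, in Python. Nothing here is an existing system: the class names (`Assumption`, `Engine`), the plain-string representation of questions, and the Laplace-smoothed acceptance rate used as the "prior" are all assumptions made for illustration. The only thing taken from the description above is that an assumption's prior rises the more often it is accepted.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    """A hypothetical bridge from a question we want to one the data answers."""
    name: str
    wanted: str        # the question the user asked
    proxy: str         # the question the data can actually answer
    offered: int = 0   # how many times users were offered this assumption
    accepted: int = 0  # how many times they accepted it

    @property
    def prior(self):
        # Laplace-smoothed acceptance rate: an illustrative choice,
        # not something the text prescribes.
        return (self.accepted + 1) / (self.offered + 2)


class Engine:
    """Toy reasoning engine: questions are opaque strings."""

    def __init__(self, answerable, assumptions):
        self.answerable = set(answerable)
        self.assumptions = list(assumptions)

    def ask(self, question):
        """Return the assumptions that would make `question` answerable.

        An empty list means the data answers the question directly.
        """
        if question in self.answerable:
            return []
        options = [
            a for a in self.assumptions
            if a.wanted == question and a.proxy in self.answerable
        ]
        if not options:
            raise LookupError(f"no known assumption bridges: {question}")
        # Offer the most commonly accepted assumptions first.
        return sorted(options, key=lambda a: a.prior, reverse=True)

    def record(self, assumption, accepted):
        """Update the counts after the user accepts or rejects an offer."""
        assumption.offered += 1
        if accepted:
            assumption.accepted += 1


representative = Assumption(
    name="people in low-income counties are representative of low-income people",
    wanted="P(marry early | low-income person)",
    proxy="P(marry early | low-income county)",
)
engine = Engine(
    answerable={"P(marry early | low-income county)"},
    assumptions=[representative],
)

options = engine.ask("P(marry early | low-income person)")
prior_before = representative.prior
engine.record(options[0], accepted=True)
prior_after = representative.prior
print(options[0].name, prior_before, prior_after)
```

A real version would need a structured query language rather than strings, and a way to chain several assumptions; the sketch only shows the offer, accept, update loop.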

This is also a semantic-web-ish idea. For example, your question might talk about concepts that are not explicitly talked about in the data, but only indirectly so (there is a gap between your question and the data). Or you might have semantic interoperability issues between your data sets (the gap is inside the data).

Finally, I would like to create a Library of Formalized Economic Arguments. I don't know if anyone else is interested in this. While many economists seem to be interested in methodological issues, I don't know any who would like to take this to a foundational level.

P.S.: I haven't even mentioned causal inference yet.

---

Census Microdata:

Uses of Microdata
Most population data - especially historical census data - have traditionally been available only in aggregated tabular form. The IPUMS is microdata, which means that it provides information about individual persons and households. This makes it possible for researchers to create tabulations tailored to their particular questions. Since the IPUMS includes nearly all the detail originally recorded by the census enumerations, users can construct a great variety of tabulations interrelating any desired set of variables. The flexibility offered by microdata is particularly important for historical research because the aggregate tabulations produced by the Census Bureau are often not comparable across time, and until recently the subject coverage of census publications was limited.
gusl: (Default)
Are there any Americans here who have visited Brazil since 2004? If so, were you fingerprinted?

The latest news on this is almost 2 years old. This is a serious information problem. It shouldn't be hard to ask an American who traveled to Brazil recently, and yet we don't know how to do this.

Why don't we have anything more recent? Because neither "Americans no longer fingerprinted in Brazil" nor "Americans still fingerprinted in Brazil" makes a good headline. With current technology, this is a very hard information retrieval problem.
gusl: (Default)
I often have wild ideas, and do Google searches on them. For example, I'll think of a research area that should exist, like "combinatorial proof theory".

Googling often leads me to:

* rbjones.com, especially if my search is about formalization. (see his "The Automation of Reasoning")
* The KLI (Konrad Lorenz Institute) Theory Lab, especially if my search is about evolution.
* assorted webpages containing the word "cybernetics", often The Center Leo Apostel (CLEA)
* Cosma's notebooks
* my own site, perhaps unsurprisingly.
