personality questionnaires & data mining
Oct. 11th, 2005 08:41 pm
How meaningful are the MBTI dimensions?
Wouldn't we be better off doing data mining on personality questionnaires, in order to find an optimal set of personality dimensions?
If we have a questionnaire with k questions, the space of possible answers is A^k, where A is the set of admissible answers for an individual question. We could simplify this and say that A = {0,1} (yes-or-no questions). The consequence is that the possible answers form a (discrete) hypercube.
A personality dimension would be a linear combination of these answers.
A collection of personality dimensions spans a linear subspace of R^k (into which A^k embeds).
The interesting question is:
Given an integer n, and a sample of completed questionnaires, how do you find the optimal linear subspace in n dimensions? In general, you would define a set of variables that you want to predict. But let's say we want to predict all the variables equally, i.e. we want the subspace that provides the least-lossy compression of the data (measured by, say, least-squares reconstruction error). Since we're maximizing meaningful information over all possible n-dimensional subspaces, I wonder if this corresponds to minimizing entropy. But it seems that maximizing entropy would just maximize noise.
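For what it's worth, under squared error this optimization is exactly what principal component analysis solves. A minimal numpy sketch on made-up yes/no data (the 200×12 questionnaire below is random, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake data: 200 completed questionnaires, k = 12 yes/no questions.
X = (rng.random((200, 12)) < 0.5).astype(float)

def best_subspace(X, n):
    """Orthonormal basis for the n-dimensional linear subspace that
    minimizes total squared reconstruction error (PCA via SVD)."""
    Xc = X - X.mean(axis=0)          # center the answers first
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n]                    # each row is one "personality dimension"

dims = best_subspace(X, 3)
scores = (X - X.mean(axis=0)) @ dims.T   # each person's 3 coordinates
print(dims.shape, scores.shape)          # (3, 12) (200, 3)
```

Each row of `dims` is a linear combination of the k questions, and projecting a person's centered answers onto those rows gives their coordinates along the recovered dimensions.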
Of course, the approach of linear combinations ignores complex interactions between the variables (e.g. given Q1, Q2 is positively correlated with Q3; but given ~Q1, Q2 and Q3 are negatively correlated). But we can always solve this problem by adding extra variables (e.g. a variable that equals "NOT (Q2 XOR Q3)", which measures their correlation): I wonder if all the logical relationships remain preserved when you do the linear regression (under the Boolean extension from {0,1} to [0,1] where "AND" becomes multiplication). Another interesting question is "which logical dependencies are expressible with a set of conjunctions?"
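That interaction example is easy to demonstrate: construct data where Q2 and Q3 agree given Q1 and disagree given ~Q1, and the pairwise correlation between Q2 and Q3 vanishes, while the added "NOT (Q2 XOR Q3)" variable recovers the relationship. A sketch (the data is synthetic, built to match the example above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Given Q1, Q2 and Q3 agree; given NOT Q1, they disagree.
q1 = rng.integers(0, 2, n)
q2 = rng.integers(0, 2, n)
q3 = np.where(q1 == 1, q2, 1 - q2)   # agreement flips with q1

agree = 1 - (q2 ^ q3)                # the extra variable "NOT (Q2 XOR Q3)"

corr = np.corrcoef(np.vstack([q1, q2, q3, agree]))
print(corr[1, 2])   # Q2 vs Q3: near 0 -- the interaction is invisible pairwise
print(corr[0, 3])   # Q1 vs "agree": 1.0 -- the extra variable exposes it
```

So a linear model over the raw questions misses this dependency entirely, but becomes able to use it once the derived variable is appended as a new column.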
I'm now imagining that a good algorithm would be to create a graph with questions as nodes, and edges as strongly positive pairwise correlations. Good dimensions will show up as clusters (dense subgraphs), i.e. just let the dimension be the sum of all questions in the cluster. Good subspaces can be found by finding partitions that cut through the fewest edges (sort of similar to min-cut) while still being more or less balanced in the size of the clusters.
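A toy version of that graph idea, with made-up data: two hypothetical latent traits each drive three noisy yes/no questions, edges are drawn between strongly positively correlated question pairs, and clusters are read off as connected components (a crude stand-in for the balanced min-cut partitioning described above):

```python
import numpy as np

rng = np.random.default_rng(2)
n_people = 500

# Hypothetical data: two latent traits, each driving three yes/no questions.
trait_a = rng.integers(0, 2, n_people)
trait_b = rng.integers(0, 2, n_people)

def noisy(trait, flip=0.1):
    """A question that copies its trait, with some answers flipped."""
    return np.where(rng.random(n_people) < flip, 1 - trait, trait)

X = np.column_stack([noisy(trait_a) for _ in range(3)] +
                    [noisy(trait_b) for _ in range(3)])

# Graph: nodes = questions, edges = strongly positive pairwise correlations.
corr = np.corrcoef(X, rowvar=False)
adj = corr > 0.5
np.fill_diagonal(adj, False)

def components(adj):
    """Connected components of the correlation graph, by depth-first search."""
    k = adj.shape[0]
    seen, clusters = set(), []
    for start in range(k):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            v = int(stack.pop())
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(np.flatnonzero(adj[v]))
        clusters.append(sorted(comp))
    return clusters

clusters = components(adj)
print(clusters)   # the two trait clusters separate cleanly here
dims = np.column_stack([X[:, c].sum(axis=1) for c in clusters])
```

Each recovered dimension is just the sum of the questions in its cluster, as proposed; real questionnaire data would of course need the more careful balanced-cut treatment rather than a hard correlation threshold.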
-
Conclusion: I need to take a machine learning class.
And by "class", I mean a good book.
And by "good", I mean "a book that answers my questions, without too much reading effort required".
(no subject)
Date: 2005-10-11 09:13 pm (UTC)Here's a link to a chap who applied this to answers to his online political-beliefs survey, with interesting results.
http://ex-parrot.com/~chris/wwwitter/20030731-a_little_knowledge.html
(no subject)
Date: 2005-10-11 09:44 pm (UTC)The Libertarians' "World's Smallest Political Quiz" has 2 axes: social freedom and economic freedom. But I believe that the pragmatism dimension is much more realistic.
Axes
Date: 2005-10-12 03:58 pm (UTC)Note that the labels he puts on the axes are just his interpretations of the data. The axes are really defined by the questions and their resultant weights.
Also of related interest is his 2005 version, which evidently got a larger and less insular sample of respondents. The axes turned out quite a bit different there.
http://www.politicalsurvey2005.com/
(no subject)
Date: 2005-10-12 07:44 am (UTC)I imagine that some of the components would correspond to the subculture that you're in (people learn to like the styles they are around) while others would correspond to a personal "cognitive/emotional profile".
(no subject)
Date: 2005-10-12 06:37 am (UTC)http://www.stanford.edu/class/cs229/materials.html
Chapter 10 is PCA, but it's pretty dense material.