My experience with applied statistics has been fun in large part because it involved making the Bayesian rubber meet the road of inherent computational complexity, and because it allowed me to define and explore theoretical concepts and tie them to our applied problem... I also had a good time formalizing and unifying ideas, making them clearer and more coherent. And it has been great to work with dedicated people. I'm still learning from my new experience in a mentoring role.
But perhaps the most formative thing is the periodic mindfuck that comes with realizing how wrong I am. It is the frustration of discovering that missing data reflects perfectly innocent biases (leaving me with little recourse, since I don't have a model of our colleagues who decide how to collect data, and what to report). It is questioning maximum likelihood. Last year, I saw how one can expect to make better predictions by deliberately plugging in a parameter value that one knows is smaller than the truth (in simulated data, with no misspecification). Now I am seeing how a better search algorithm leads to worse predictions. Maybe it is time to penalize the likelihood after all... But it is not clear where the overfitting is coming from, or what makes one block structure more complex than another.
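Here is a minimal sketch of that flavour of surprise, in a toy setting that has nothing to do with the actual problem above: a correctly specified linear regression with many weak coefficients and little data. Deliberately scaling the maximum-likelihood fit toward zero, so the plugged-in coefficients are on average smaller in magnitude than the truth, lowers the expected test error. All of the numbers below are arbitrary choices for illustration.

```python
# Toy illustration: a deliberately-too-small plug-in can beat the MLE on
# expected prediction error, even with no model misspecification.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 30, 1000, 20
beta_true = rng.normal(0, 0.3, size=p)   # many weak, nonzero effects
sigma = 1.0

def one_replication(shrink):
    # Simulate training data from the true model and fit by least squares (the MLE here).
    X = rng.normal(size=(n_train, p))
    y = X @ beta_true + sigma * rng.normal(size=n_train)
    beta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Evaluate predictive error of the (possibly shrunk) plug-in on fresh data.
    X_new = rng.normal(size=(n_test, p))
    y_new = X_new @ beta_true + sigma * rng.normal(size=n_test)
    return np.mean((y_new - X_new @ (shrink * beta_mle)) ** 2)

# shrink = 1.0 is the plain MLE; shrink = 0.5 plugs in coefficients that are
# deliberately too small, yet it wins on average test error.
for shrink in (1.0, 0.5):
    avg_mse = np.mean([one_replication(shrink) for _ in range(500)])
    print(f"shrink factor {shrink:.1f}: average test MSE = {avg_mse:.3f}")
```

The shrunk fit is biased, but the bias it buys is cheaper than the variance it removes, which is one way of stating why a "wrong" parameter value can win on prediction.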
In supervised classification problems, we can define complexity families (e.g. linear separators are very simple, quadratic ones are less simple, etc.) and assign penalties proportional to that complexity measure... but what would you do if there were no geometry on the space, or if you were ignorant of the geometry to begin with? And what makes one complexity measure (or one geometry) superior to another anyway? (It is not helpful to mention that Kolmogorov complexity provides a universal measure of complexity, although I do enjoy that brand of optimism.)
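For concreteness, here is what one arbitrary instance of that recipe looks like: nested classifier families indexed by the polynomial degree of the decision boundary, scored by training error plus a penalty proportional to the degree. Both the complexity measure (degree) and the penalty weight are choices with no obvious justification, which is exactly the point.

```python
# Sketch of a degree-indexed complexity penalty for classification.
# The data, the penalty weight lam, and 'complexity = degree' are all
# illustrative choices, not anything dictated by theory.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)   # a quadratic true boundary

def penalized_score(degree, lam=0.02):
    """Training error plus a penalty proportional to the chosen complexity index."""
    feats = PolynomialFeatures(degree).fit_transform(X)
    clf = LogisticRegression(max_iter=5000).fit(feats, y)
    train_error = 1.0 - clf.score(feats, y)
    return train_error + lam * degree   # 'complexity = degree' is a convention, not a law

for d in range(1, 6):
    print(f"degree {d}: penalized score = {penalized_score(d):.3f}")
```

Swap the degree for some other notion of complexity (or the Euclidean embedding for no embedding at all) and the ranking of families changes; nothing in the setup privileges one choice over another.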