Entry tags:
L1 regularization
L2 regularization is seen as a way to avoid overfitting when doing regression, nothing more.
L1 regularization tends to give sparse results (a small sketch after this list illustrates the contrast with L2). If the truth is sparse, this is seen as a way to get to the truth (although this is not always consistent, which is why we have Bolasso).
Even if the truth is not sparse, L1 may be seen as an Occam's razor. Is this a valid view?
Even if the truth is not sparse, L1 is a way to select a small number of variables, which can be useful for those of us concerned with scarce computational resources (although it's not clear why you'd choose L1 over PCA or Partial Least Squares)
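To make the sparsity point concrete, here is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data; the alpha values and problem sizes are arbitrary choices for illustration, not anything from the discussion below.

```python
# Minimal sketch: L1 (Lasso) vs L2 (Ridge) on data with a sparse "truth".
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))

# Sparse ground truth: only the first 3 of 20 coefficients are nonzero.
true_coef = np.zeros(p)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 typically drives many coefficients to exactly zero;
# L2 only shrinks them toward zero.
print("nonzero coefficients (L1):", int(np.sum(lasso.coef_ != 0)))
print("nonzero coefficients (L2):", int(np.sum(ridge.coef_ != 0)))
```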
Ooh. Tell me more.
I've been thinking about this too, because Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. The effect of L1 regularization is similar to the effect of OT assumptions, but less radical.
Re: Ooh. Tell me more.
While L1 returns the subset of the original variables it considers to be nonzero (you don't specify how many, though you could tweak the regularization parameter until it returns the desired number), PCA/PLS return a pre-specified number of linear mixtures of the original variables.
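A rough sketch of that contrast, assuming scikit-learn (the data, feature count, and alpha here are made up): Lasso hands back whichever original columns survive at a given penalty, while PCA hands back a pre-specified number of components, each of which is a mixture of all the original variables.

```python
# Sketch: L1 selects a subset of the original variables; PCA returns mixtures.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 4] + rng.normal(scale=0.3, size=150)

# L1: keeps whichever original variables are nonzero at this penalty;
# to hit a specific count you would sweep alpha until it matches.
lasso = Lasso(alpha=0.05).fit(X, y)
print("original variables kept by L1:", np.flatnonzero(lasso.coef_))

# PCA: you pre-specify the number of components; each component is a
# weighted mixture of all ten original variables.
pca = PCA(n_components=3).fit(X)
print("each PCA component mixes", pca.components_.shape[1], "variables")
```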
<< Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. >>
Please tell me more!
Re: Ooh. Tell me more.
Optimality Theory originally grew out of a perceptron-like formalism, with numerical weights on the variables and a categorical output, and the original analogies between perceptrons and OT kind of suggest an L1 exponential prior with discrete (discontinuous) support. The move to OT switches from comparison of numerical sums to a decision-tree-like comparison of candidate outputs, using the variables in order of importance. It essentially breaks down each case into a bunch of candidate decisions, and in each decision it only pays attention to the highest variable that distinguishes between the two candidates. This was motivated by an apparent scarcity in language of the kind of "ganging up" sum effects that you see in perceptrons. But there remain a few phenomena that look like "ganging up", and recently people have been looking beyond the categorical phenomena that are traditional in theoretical linguistics. The probabilistic phenomena seem to require some allowance for these ganging up effects. But they still seem less common than we would expect given a uniform prior, or even the L1 exponential prior.
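To make that contrast concrete, here is a toy sketch (my own reading of the comment, with invented constraint rankings and violation counts): a weighted-sum decision in the perceptron/Harmonic-Grammar style, where lower-ranked constraints can gang up, versus an OT-style decision where the highest-ranked constraint that distinguishes the candidates decides alone.

```python
def harmonic_choice(a, b, weights):
    """Weighted-sum style: sum weighted violations; the lower total wins.
    Lower-ranked constraints can 'gang up' on a higher-ranked one."""
    score_a = sum(w * v for w, v in zip(weights, a))
    score_b = sum(w * v for w, v in zip(weights, b))
    return "A" if score_a < score_b else "B"

def ot_choice(a, b):
    """OT style: constraints are in ranked order; the highest-ranked
    constraint on which the candidates differ decides by itself."""
    for va, vb in zip(a, b):
        if va != vb:
            return "A" if va < vb else "B"
    return "tie"

# Candidate A violates only the top-ranked constraint once; candidate B
# violates each of the three lower-ranked constraints once.
cand_a = [1, 0, 0, 0]
cand_b = [0, 1, 1, 1]
weights = [2.0, 1.0, 1.0, 1.0]  # ranked, but the top weight is not dominant

print(harmonic_choice(cand_a, cand_b, weights))  # "A": the lower constraints gang up against B
print(ot_choice(cand_a, cand_b))                 # "B": only the top constraint matters
```

In effect, OT treats the top constraint as infinitely weighted relative to everything below it; the weighted-sum version is what permits the ganging-up effects described above.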
I suspect that there is an interesting explanation for all of this, but I'm kind of stuck at this point about how to look for it. I wrote a long term paper about this 6 months ago, but then came back to China, so between the shortage of people who have a good understanding of both learning theory and language phenomena, and being away from my peeps in SD who are obligated to read the paper, I've not had any feedback on it. I'm going back to SD in a few weeks, so maybe I'll continue with it then.
Re: Ooh. Tell me more.
First of all, this is *L1* regularization.
Secondly, no, not *near* zero weight. L1 methods throw out the subset of variables whose exclusion least hurts (in terms of prediction error).
Re: Ooh. Tell me more.
I don't understand the question.
I'm not familiar with your terminology. I also don't know this stuff very well.