gusl: (Default)
Wouldn't you like to read my academic posts instead? Stats Computer Science
gusl: (Default)
Bickel, Klaassen, Ritov, Wellner
* p. 227, Re: estimating cell probabilities in a two-way contingency table with known marginals.  "Iterative Proportional Fitting converges to the minimum Kullback divergence estimator"

Casella & Berger (2nd Edition) says nothing

Gelman, Carlin, Stern, Rubin (2nd Edition)
* p.107, asymptotics of estimation under mis-specified model.
* p.586-588, details about this.

Hastie, Tibshirani, Friedman (2nd Edition)
* p. 561, KL "distance".  Used in definition of mutual information, on a section about ICA.

Hogg, McKean, Craig (6th edition) says nothing

* p.113-114, "Kullback-Leibler information number", on chapter titled "Strong Consistency of Maximum-Likelihood Estimates"

* p.59: proof question: show that KL > 0 unless the two distributions are equal.
* p.156: consistency of MLE.
* p.466: solution to the proof question.

Lehmann & Romano
* p. 432: optimal test rejects for large values of T_n, and T_n converges to -K(P0, P1).
* p. 672: in non-parametric test for the mean, KL is used to define the most favorable distribution in H0.

van der Vaart
* p. 56: asymptotics of estimation under mis-specified model.
* p. 62: viewing MLE as an M-estimator.

* p. 125: consistency of MLE. "By LLN, M_n converges to -D(θ*, θ)"
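
For readers who want something concrete to go with these page references, here is a minimal sketch of the discrete KL divergence and of Iterative Proportional Fitting on a two-way table. The function names and the example table are my own assumptions, not taken from any of the books above. As I understand the Bickel et al. statement, the IPF limit is the table with the required marginals that is closest in KL divergence to the starting table.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence sum_i p_i * log(p_i / q_i), natural log.

    Assumes p and q are probability vectors of equal length and that
    q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ipf(table, row_targets, col_targets, iterations=100):
    """Iterative Proportional Fitting on a table with positive cells.

    Alternately rescales rows and columns so that the margins match
    row_targets and col_targets. Returns a new table (list of lists).
    """
    cells = [row[:] for row in table]
    for _ in range(iterations):
        # Rescale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            cells[i] = [c * target / total for c in cells[i]]
        # Rescale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / total
    return cells

# Hypothetical starting table and target marginals.
start = [[0.4, 0.1], [0.2, 0.3]]
fitted = ipf(start, row_targets=[0.5, 0.5], col_targets=[0.6, 0.4])
```

Flattening `start` and `fitted` into vectors and passing them to `kl_divergence` gives the divergence that, under the reading above, IPF is minimizing over tables with the target marginals.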
gusl: (Default)

I welcome improvements and additions to this.

* words in ALLCAPS denote macros

* x denotes arbitrary variables

* a, b denote scalars

* f, g denote functions

* i,j denote indices

* mat denotes matrices

* obj denotes objects 

* v, w denote vectors

* z denotes a Boolean truth value


gusl: (Default)
Supply charges

Manhattan: 8.3333c/kWh
Brooklyn: 15.8377c/kWh

Delivery charges

Manhattan: 10.1321c/kWh
Brooklyn: 10.1789c/kWh

SBC/RPS charges

Manhattan: 0.3082c/kWh
Brooklyn: 0.3911c/kWh

This means that in Manhattan I used to pay 18.7736c/kWh, whereas in Brooklyn I'm paying 26.4077c/kWh.

On top of that, in Manhattan, my two-person household was using 5.12 kWh per day; in Brooklyn, my 3-person household has been using 9.87 kWh per day.

On the positive side, Con Edison has adjusted their ridiculous "estimated" charges, and the adjusted bill looks almost reasonable.
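
The arithmetic behind the per-kWh totals and the daily cost can be checked with a short sketch. The rates and usage figures are the ones quoted above; the function names are mine, and fixed monthly charges and taxes are ignored, which is an assumption.

```python
# Rates in cents per kWh, copied from the figures quoted in this post.
RATES = {
    "Manhattan": {"supply": 8.3333, "delivery": 10.1321, "sbc_rps": 0.3082},
    "Brooklyn": {"supply": 15.8377, "delivery": 10.1789, "sbc_rps": 0.3911},
}

# Average daily usage in kWh, also from this post.
DAILY_KWH = {"Manhattan": 5.12, "Brooklyn": 9.87}

def total_rate(borough):
    """Total cost per kWh in cents: supply + delivery + SBC/RPS."""
    return sum(RATES[borough].values())

def daily_cost_dollars(borough):
    """Average daily cost in dollars (assumes no fixed charges or taxes)."""
    return total_rate(borough) * DAILY_KWH[borough] / 100

for borough in RATES:
    print(f"{borough}: {total_rate(borough):.4f} c/kWh, "
          f"${daily_cost_dollars(borough):.2f}/day")
```

Under these assumptions the usage-based cost goes from roughly $0.96 per day in Manhattan to roughly $2.61 per day in Brooklyn.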
gusl: (Default)
This is my first Burn... 8 days from now.

== desert weather ==
* camelbak: BOUGHT
* goggles: BOUGHT
* dust mask: BOUGHT

== camping ==
* tent: BORROWED
* rebar: BOUGHT
* reflective material for cooling: BOUGHT

== sleeping ==
* ear plugs (gel): BOUGHT
* sleep mask: ORDERED
* sleeping bag: ORDERED
* self-inflating mattress (3" thick): ORDERED
* blanket: ALREADY HAVE

== ride arrangements ==
I need to ride with Victoria, since we are splitting a Will Call ticket.
We arranged a ride in a sedan transporting 4 Burners and our stuff, which I think is very tight.
There might not be room for most of my stuff, so I will try to find other people who can transport my bicycle.

== lights ==
* headlamp: ORDERED
* reflective tape: ORDERED
* blinky reflective vest: ORDERED
* solar-powered light: BOUGHT
* bicycle lights: NEED BATTERIES

== electricity ==
* solar-powered phone charger: THANKS,GOOGLE
* batteries:

== survival ==
* dish, mug, cutlery: BOUGHT
* water:
* food:
* clothing for cold:

== MOOP ==
* trash bags:

== other ==
* duct tape: BOUGHT
gusl: (Default)
I'm going to Rio for WWW2013 next week. Exciting!

I went on AirBnb to look for rooms near Barra. I found a studio here, for $150/night, "Enjoy Nature Studio in Lagoon RIO!":

The conference is near the Sheraton, which is a 20-minute walk from the center of the pink circle, and I'm otherwise not very picky, so I didn't mind the Strict cancellation policy. I accept the charge of $750 for 5 nights + $82 in AirBnb fees.

Once I book, I get the address: Av. Armando Lombardi 370. I tell my friend in Rio, who walks by and tells me that the address given is a gas station! I call the host, and her daughter explains to me that the studio is on an island: Ilha Primeira, which falls completely outside of the pink circle (the triangular island just North of the pink circle), and that this address is where the boat picks you up. I understand why they did this: AirBnb requires a street address, and Ilha Primeira doesn't have streets.

The place sounds very beautiful, but it would be extremely inconvenient to have to depend on boats all the time. After I make a few inquiries about the boat service, the host advises me to cancel. I agree, pending her reassurance that she will give me a full refund. I call AirBnb twice, and learn that their resolution procedure is essentially: "you guys find an agreement", and they tell me that I'm going to lose the fees ($82) regardless. They heard my complaint about the misleading address, and seemed to agree with me, but didn't want to take any action on it.

Anyway, I accept losing the $82, so just before midnight, I cancel... Today I spoke to the host again, and she told me that she will refund me all the money they give her, but that this is only $727. So it sounds like AirBnb is charging her $23, on top of the $82 from me! Unfortunately, this means that my total loss would be $105... which crosses the threshold for picking a fight with me. Maybe this means that I need to call AirBnb again and threaten them with reversing charges. I have a debit card, but my bank reassured me that they will give me the benefit of the doubt in such a dispute... but first I have to wait until the transaction posts.

For the future, I should probably use credit cards more often: my understanding is that they are better when it comes to dispute resolution.

I do worry about burning my bridges with AirBnb, but this is a matter of principle. Hopefully they wouldn't do anything to my San Francisco booking if I reverse charges on the Rio booking.

my taxes

Apr. 10th, 2013 02:13 pm
gusl: (Default)
I file taxes as a Resident Alien, which makes my tax situation pretty much identical to that of most American PhD students at Columbia. And yet, because of legal liability, the only qualified people who are willing to give us advice are tax professionals (whose time will cost at least $100).

So here are the basic calculations I do, before I start working on my tax forms. This is not tax advice. This is not legal advice.

Add up the income:
* TA Wages, see W2 form
* Stipend: look through bank account or MyColumbia, and add all the checks issued in 2012.
* Interest income: Chase sent me a form ("in lieu of 1099-INT"), informing me that I made $4.58 in interest, of which $0.00 was withheld. However, Chase charged an "Agent Admin Fee" of $4.58, which means that I'm going to pay tax on money that I never saw... So let's be thankful that Savings accounts have such crappy interest!

Add up the withholdings:
* TA Wages, see W2 form
* Stipend: look through checks at MyColumbia. Check whether they withheld anything. In my case, they didn't.

Confusing things:
* My scholarship exactly cancels out tuition+fees. This means I don't need to look at the 1098-T, even though the university is obligated to send it to me. I think this form concerns the university's taxes wrt me, not my taxes.
* Unlike most international students, I should not receive a 1042-S (it's only for Non-Resident Aliens)

Since no money is being withheld from my stipend checks, I expect to owe money to the IRS, on the order of a few thousand per year.

My stipend is taxable
gusl: (Default)
I'm subletting my apartment. It's a great deal. You will not find this much space in such a nice area for $900/month. See here.
gusl: (Default)
Earlier this week, another piece of statistical theory fell into place for me, this time inspired by reading Cox&Hinkley.

One of the key principles expounded in this book is known as the "conditionality principle": given your model, if you can find a statistic that is ancillary (i.e. its distribution does not depend on the parameter of interest), then your likelihood function should be conditional on it.

Now, if the minimal sufficient statistic is complete (as is the case in any full-rank exponential family), Basu's theorem tells us that any ancillary statistic will be independent of it, i.e. there is a clean separation between sufficient and ancillary. But in curved exponential families, it can happen that there is no maximal ancillary statistic, i.e. you may have multiple choices of ancillary statistic, but combining them yields a statistic that is no longer ancillary. This is a bit troubling to me, because it breaks the nice idea of a bijection between model and likelihood function.

Given a choice between two ancillaries, C&H advise selecting the one whose Conditional Fisher Information has the greater variance. It's not immediately obvious why one should do this, but I think this can be understood as the Conditional Fisher Information giving us a lens into the conditional likelihood function. For example, if the conditional Fisher Information has 0 variance, it may be because the ancillary statistic doesn't add any information (as is the case when the minimal sufficient statistic is complete). However, it still seems plausible to me that the Conditional Fisher Information can be constant (independent of the ancillary statistic) even while the likelihood function is sensitive to it.

C&H also hint at a notion of partial sufficiency/efficiency and how to measure it: just compute a Conditional Fisher Information, conditioning on the proposed statistic.

(Since Fisher Information is an expectation, Conditional Fisher Information is the same expectation taken under a conditional distribution; since the quantity on the LHS is a function of the sufficient statistic, conditioning on the sufficient statistic will not change anything, whereas conditioning on something insufficient can have the effect of making the log-likelihood smoother, and the Fisher Information smaller.) Conditioning on an ancillary, however, doesn't simply make the log-likelihood sharper: the average of the Conditional Fisher Information is just the Fisher Information.

[the last paragraph is probably wrong; please comment]
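
The one claim in that paragraph that I can sketch a derivation for is the averaging identity. Here A denotes an ancillary statistic that is a function of the data X, and I(θ | A) is the Fisher Information computed from the conditional density of X given A; this notation is mine, not necessarily C&H's, and the usual regularity conditions are assumed.

```latex
\begin{align*}
\log f(x;\theta) &= \log f(x \mid a;\theta) + \log f_A(a)
  && \text{($A$ ancillary, so $f_A$ is free of $\theta$)} \\
\frac{\partial}{\partial\theta}\log f(x;\theta)
  &= \frac{\partial}{\partial\theta}\log f(x \mid a;\theta) \\
I(\theta)
  &= E\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right]
   = E\!\left[\, E\!\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid A;\theta)\right)^{2} \,\middle|\, A \right]\right]
   = E\!\left[\, I(\theta \mid A) \,\right]
\end{align*}
```

So conditioning on an ancillary redistributes the information across values of A without changing its average.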
gusl: (Default)
A lot of people in the Data Science world seem to be on Twitter. And I think it's true of techy people in general, at least the sort of techy people who are into Free Culture type things.

Of course, I much prefer having a real blog, with space to develop my thoughts and a permanent record. But there is something to be said for Twitter, which is hard to put my finger on. Twitter gives a feeling of immediacy, that you're speaking to the whole world. Although real blogs are just as public and just as immediate as Twitter, (a) blog interaction suffers from interoperability issues, especially commenting: there are many competing standards, and unfortunately OpenID didn't take off (though maybe Facebook Connect will), whereas Twitter is unified and simple; and (b) the set of people who read you on real blogs somehow feels more limited*. This may be because Twitter users often browse by hashtags (#), rather than users (@).

I have a little problem now: I have "followed" way too many people on Twitter, so I need to split my friends into groups, i.e. reading lists.

* - I, however, have had the opposite experience, which is part of why I abandoned Twitter the first time.
gusl: (Default)
To my big surprise, the spam problem seems to be getting better, and it's not due to better spam filters or captchas, but rather to crackdowns on botnets. I do remember that it used to be worse.

Researcher: The End of Spam Is Closer Than You Think, July 2012

The End of Spam?, Jan 2011
gusl: (Default)
Perhaps one of the defining traits of "nerds" is a low level of body awareness, which comes with "spending too much time in the head". This may explain why yoga has been so revealing for me. I have been learning which sensations correspond to stretch, strain, and pain; and how to move muscles independently of other muscles (often my brain used to think of them as just one thing). Sometimes I need visual feedback to learn to control my muscles. I am lucky to have a teacher who understands how unintuitive this is for me.

I wish we had a standard language for naming specific sensations. I would like to convey precisely the twinge in my lower back, which might be a pinched nerve, but might just be soreness. If my teacher could feel what I feel, he would know what it was, but instead his judgement has to rely on my imperfect attempts at describing it.

When it comes to bodily sensations, we don't know how much subjectivity there is. Psychologists (psychophysicists) can often quantify the subjectivity of senses (say color), because even when words fail, they can do experiments to test whether subjects are able to detect tiny differences in stimuli (perhaps defining a metric on perceptual space, or more!), and then quantify how much people differ in this ability, in different regions of stimulus space. But when it comes to your body, it is much harder to stimulate a sensation to a precision worthy of being called "reproducible". And then there's habituation (which is also a problem for scientists trying to study smell).

Right now you could start a philosophical food fight by bringing up the label-switching problem (namely that, just because you and your teacher are in verbal agreement doesn't mean that your experiences agree), but I just want to be practical here: how can we develop a shared vocabulary that would allow me to better convey my sensation to my teacher, so that he may make a better guess about what is wrong with my back? Are there existing human cultures in which people can easily convey their bodily sensations to each other?

I think that the biggest obstacle here is establishing joint attention. It is easy to teach the names of visual stimuli to a seeing person. But when it comes to coining words to describe types of pain in the back, this becomes like two blind people trying to come up with words for categorizing shapes (they can experience shapes by touch, but without joint attention, i.e. let's say they are not allowed to pass shapes to each other).


Why are "the arts" traditionally visual and/or auditory? Because out of all our senses, vision and hearing are the only senses whose stimulus-response mapping is reliable enough. With the other senses, there is too much variation within and across people to have any control over their experience (which also explains why we have so few olfactory words/concepts). Smell and taste have very little spatiotemporal resolution. Touch may actually be a good candidate.
gusl: (Default)
Tonight I saw the Drunken Master of country music, Greg Garing, at the Treehouse a.k.a. 2A, just the man and his guitar (thanks Kathryn Minogue). It was the craziest performance I've ever seen, quickly switching between mellow and ultra-hard rock; between quiet&romantic and rude. His vocal range is amazing, and I was especially entertained by his shivering bass. One sees that he is improvising the entire time, and cannot resist changing things around or cracking a joke (Running joke: "Can't you guys let me do one song properly from beginning to end?"). As far as ridiculous stunts go, I was especially impressed when he dropped his left arm and used the microphone stand as a slide for a whole blues turnaround, while looking totally out of it.

The event felt like a musicians' party full of old timers, people who had opened for Bob Dylan, and written reviews for Rolling Stone. The shows were projected live (in black&white) onto the red-brick building across the street, for a very nice effect. The bar staff were very chill, and told me that since they didn't serve food, I could bring outside food(!!!).
gusl: (Default)
Tyler Cowen's 8 insights on human-computer collaboration in the context of chess
Insight #1, as one might imagine, is that human creativity is now worth more than it used to be, since most of the analysis is now automated.
gusl: (Default)
I've given up on compressing my videos. Most compression software doesn't compress batches of files, and those that do usually lose meta-data, like datetime. It also requires a ton of processing, which is slow and heats up my computer (this incidentally explains why my Canon digicams keep them in bulky formats, AVI or MOV: they are not nearly powerful enough to run compression).

So I've been moving my videos to my external G-Drive. But this 750GB drive, which also serves as a backup for my machine, is nearly full. This means that I have two problems:

* buy more space to store my videos.
* buy more space to have a backup of these videos.

I could buy 2x 2TB drives, but this is likely to be expensive.

Any suggestions?
gusl: (Default)
Skeptical friends: some people say that when remembering things, your eyes move in a certain direction; when inventing things, they move in a different direction. The source of these ideas seems to be a pseudo-scientific field known as "Neuro-Linguistic Programming" (NLP). Is there any empirical basis for this claim?

See here


Sep. 10th, 2012 02:41 am
gusl: (Default)
My yoga teacher, Joseph, is an atheist/skeptic who thinks very logically about what I need to work on. This seems to be very rare for yoga.

Today he told me that his "style" doesn't have a name, but that his teacher was Allan Bateman.

Finally, I asked him to name some materialistic schools/styles of yoga. He told me:
* Krishnamurthi (empiricist philosophy FTW)
* Strala
* Katonah

(beware, the marketing may be mystical, but that's just marketing)

I might go to a Katonah lesson next week.


I seem to be making steady progress (e.g. I can now touch my toes after just a few minutes of stretching). But I'm still a long ways from where I want to be. There's a lot of work ahead for strengthening my upper body (abdomen, chest, arms).

As of next week, I'm planning to do two lessons per week.
gusl: (Default)
Although I love graphical models, I suspect that the "potential outcomes approach" is a more proper treatment of causal inference. See new book by Hernán and Robins (h/t Michael Sobel).

Key concepts:
* counterfactuals: should be well-defined, but often aren't in the social science literature (e.g. to answer "what is the effect of marriage on health?", we'd have to imagine interventions that cause or prevent people from getting married; there are many non-equivalent ways to do this)
* potential outcomes: formalism in which all subjects are considered to have missing data for all but one experimental condition (i.e. the one that they were assigned to). This provides a direct way of thinking about token causation (a.k.a. causes-of-effects).
* ignorability: an important assumption that makes causal inference possible, similar to a missing-at-random assumption.
* propensity score matching: a way of coping when treatment assignment is not randomized; it still relies on ignorability conditional on the observed covariates. (See also: Inverse probability weighting)
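
To make the concepts above concrete, here is a minimal sketch of inverse probability weighting on a made-up data set. Everything in it is hypothetical: the data, the single binary confounder x, and the function names. It assumes ignorability holds once we condition on x, and it estimates propensities by the empirical treatment rate within each stratum of x.

```python
# Hypothetical toy data: (x, t, y) = (confounder, treatment, observed outcome).
# Within each stratum of x, treatment raises the outcome by exactly 2,
# but units with x = 1 are both more likely to be treated and have higher y.
DATA = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 3.0),
]

def naive_difference(data):
    """Difference of observed means, treated minus control (confounded)."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def propensity_by_stratum(data):
    """Empirical P(T = 1 | X = x) for each observed value of x."""
    scores = {}
    for x in {xi for xi, _, _ in data}:
        treatments = [t for xi, t, _ in data if xi == x]
        scores[x] = sum(treatments) / len(treatments)
    return scores

def ipw_effect(data):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    e = propensity_by_stratum(data)
    n = len(data)
    # Each observed outcome stands in for 1/propensity units of its arm.
    treated_mean = sum(t * y / e[x] for x, t, y in data) / n
    control_mean = sum((1 - t) * y / (1 - e[x]) for x, t, y in data) / n
    return treated_mean - control_mean

print("naive:", naive_difference(DATA))
print("IPW:  ", ipw_effect(DATA))
```

In this toy example the naive comparison overstates the effect, while the weighted estimate recovers the within-stratum effect of 2; that only works because x is the sole confounder here by construction.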

Directed graphical models perhaps provide something like a more concrete mechanism, allowing us to simulate the effects of interventions and propagate them downstream. But as far as real applications are concerned, papers in this tradition tend to make assumptions less explicit, and tend to mislead practitioners into thinking that the required assumptions are satisfied. (See Dawid - "Beware of the DAG")


UPDATE: Cosma Shalizi writes:
<< You've read Pearl's Statistics Surveys paper, right? I think the critique of the potential outcomes framework there, in section 4, is very strong. (Look at the stuff on ignorability, especially.) As for propensity matching, when the set of covariates you're using to calculate propensities doesn't meet the back door criterion, well, you get results like this. >>
gusl: (Default)
Attention Conservation Notice: ranting about a topic that I know very little about; accuracy is sacrificed for the sake of cute analogies.

Taxation happens when you have transactions between separate entities. For example, when you buy/sell something, or pay/receive money for services. It makes no difference whether the tax is charged to the buyer or the seller, the employer or the employee.

Income tax on services encourages do-it-yourself and informal transactions: if you get your child to do it, no one is going to come into your house and audit your child. And the bigger your house, the more taxation you can avoid. Analogously, corporate income taxes encourage mergers & acquisitions: by bringing your supplier inside the organization, you no longer have to "buy" their product, since it is now made in-house! (Kinda like erecting an eruv). This may explain why corporations, unlike people, are taxed on profits; if supplies couldn't be deducted as business expenses, we would have double taxation and a huge effect on efficiency. (But surely double taxation is alive and well, no?)

It would seem natural to want to acquire your most important supplier. (Kinda like signing up for a "best friend plan" on your mobile service). But are acquisitions just a simple way to dodge taxes? For one thing, now your big organization has to run a business that may be outside of its expertise; the newly-acquired business can end up insulated from market forces, and gradually lose its competitiveness, yadda yadda yadda. Now, this analysis conflates ownership with management. Of course, it is possible to buy the expertise required to run the sub-company (it will likely come with the package), but managers' interests won't necessarily align with the company's. It is also possible to simulate a competitive environment (some large corporations implement competitive markets for supplies, machines, workers, and even venture capital).

The common justification for stopping mergers (and breaking up monopolies) is that they would make it impossible(?) to enforce rules against price-fixing.

I would like to see an empirical study of mergers. If we have a scenario in which a supplier has a single customer, is there any reason not to merge?


UPDATE: I've just convinced myself that corporate income taxes, if flat, have the nice property that they can't be gamed by mergers (Proof: the total profit after the merger will be the sum of the profits; this assumes that profits are positive). However, sales taxes can be avoided this way, for products that the merged companies use themselves (rather than resell).
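
A minimal numerical sketch of that argument, with made-up profits and rates. The tax model is a deliberate simplification and an assumption on my part: a single flat rate, losses taxed at zero, and no loss carryforwards.

```python
def flat_corporate_tax(profit, rate=0.25):
    """Flat tax on profit; a loss is taxed at zero (no carryforward modeled)."""
    return rate * max(profit, 0.0)

def tax_separate(profits, rate=0.25):
    """Total tax when each company files on its own."""
    return sum(flat_corporate_tax(p, rate) for p in profits)

def tax_merged(profits, rate=0.25):
    """Tax when the companies merge and file on their combined profit."""
    return flat_corporate_tax(sum(profits), rate)

def sales_tax_saved(own_use_purchases, sales_tax_rate=0.08):
    """Sales tax no longer paid once the supplier of own-use goods is in-house."""
    return sales_tax_rate * own_use_purchases

# Both profitable: merging changes nothing under a flat rate.
print(tax_separate([100.0, 60.0]), tax_merged([100.0, 60.0]))

# One company at a loss: the positive-profits assumption fails, and the
# merged entity pays less because the loss offsets the other's profit.
print(tax_separate([100.0, -60.0]), tax_merged([100.0, -60.0]))
```

The second case shows why the proof needs the positive-profits assumption, and `sales_tax_saved` is the own-use effect described in the UPDATE, under the hypothetical 8% rate.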

Does a Major Company like Walmart have to Pay Sales Tax when Making Major Purchases From Another Business?
<< Businesses pay sales tax on items they purchase for their own use.
They don't pay sales tax on transactions in which they obtain products for resale in the store. This applies to all businesses of all sizes. Its also universal across state lines. >>

This means that merging along the production chain will not save on taxes. However, Walmart could save on taxes by buying a company that produces flooring or security cameras. Similarly, software companies could save on sales taxes by buying a coffee company.



* 5 tricks corporations use to avoid paying taxes

* When supply chains merge: 5 mistakes to avoid

* One Big Mutual Fund, or, The Ownership Society, by Cosma Shalizi
<< ... Ambitiously, Miller tries to explain why hierarchical corporations exist at all, why they take some of the forms they do, and how, in part, their form relates to their performance. ... >>
gusl: (Default)
5 days to the big exam! We get to use 10 pages of notes, double-sided. It's way too much for my taste, but it's an interesting exercise, since it encourages you to think about relations between topics.

Asymptotics of Estimators (asymptotic consistency, normality)
* MLE: for regular parameter, asymptotically normal, with rate 1/sqrt(n).
* MLE: for truncation parameter, asymptotically exponential, with rate 1/n or worse.
* If family has both types of parameters, we cannot(?) use the Fisher Information to find the asymptotic variance of the regular one. But can't we plug in the true value of the truncation one, and use the asymptotics of the regular subfamily?
* Consistency is guaranteed if n/p -> infinity, plus a few other conditions ("for all theta, the density is bounded" should suffice)
* UMVUE: when is it asymptotically equivalent to the MLE?
* Sample quantiles
* Estimating Equations, a.k.a. no closed form for the MLE (e.g. Beta, Gamma, GLMs). Van der Vaart proves consistency (5.10), normality (5.19).
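
A worked example of the 1/n rate for a truncation parameter, assuming X_1, ..., X_n are i.i.d. Uniform(0, θ) (my choice of example). The MLE is the sample maximum X_(n), and the limit uses the (1 + x/n)^n -> e^x fact:

```latex
\begin{align*}
P\!\left( n\,(\theta - X_{(n)}) > t \right)
  &= P\!\left( X_{(n)} < \theta - \tfrac{t}{n} \right)
   = \left( \frac{\theta - t/n}{\theta} \right)^{n}
   = \left( 1 - \frac{t/\theta}{n} \right)^{n}
   \;\longrightarrow\; e^{-t/\theta}, \qquad t > 0
\end{align*}
```

So n(θ - X_(n)) converges in distribution to an Exponential with mean θ, in contrast with the sqrt(n) scaling and normal limit for a regular parameter.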

Fisher Information
* Why the two formulas are equivalent.
* Delta Method (for R^n -> R^m functions, we can easily generalize it using Jacobians!)
* Why knowing the nuisance parameter decreases the asymptotic variance of the parameter of interest
* Why location-scale families of *symmetric* distrs have a diagonal information matrix.
* Cramér-Rao Inequality, about *unbiased* estimators, comes from Cauchy-Schwarz. Not asymptotic: holds for every n! Equality is attained when we have "linear dependence". In other words, I think this means that an unbiased estimator U will be efficient iff it can be written as: U = a*MLE + c.
* Compare: variance bounds for unbiased estimators vs other estimators.
* What if calculating the Fisher information is intractable?
* ATTENTION: is this for a single observation or for the whole sample?
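
A sketch of two of the items above, assuming a scalar θ and enough regularity to differentiate under the integral sign. Here f is the density of a single observation, I(θ) is the single-observation information, and S_n is the score of the whole i.i.d. sample (notation mine):

```latex
% Two formulas for the information: differentiate \int f(x;\theta)\,dx = 1 twice.
\begin{align*}
0 &= \int \frac{\partial f}{\partial\theta}\,dx
   = E\!\left[ \frac{\partial}{\partial\theta} \log f(X;\theta) \right], \\
0 &= \int \frac{\partial}{\partial\theta}\!\left( f\,\frac{\partial \log f}{\partial\theta} \right) dx
   = E\!\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta) \right]
   + E\!\left[ \left( \frac{\partial}{\partial\theta} \log f(X;\theta) \right)^{2} \right], \\
I(\theta) &= E\!\left[ \left( \frac{\partial}{\partial\theta} \log f \right)^{2} \right]
   = -\,E\!\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log f \right].
\end{align*}

% Cramer-Rao from Cauchy-Schwarz, for U unbiased for \theta.
\begin{align*}
S_n &= \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f(X_i;\theta),
  \qquad E[S_n] = 0, \qquad \operatorname{Var}(S_n) = n\,I(\theta), \\
\operatorname{Cov}(U, S_n) &= \frac{\partial}{\partial\theta}\, E_\theta[U] = 1, \\
1 = \operatorname{Cov}(U, S_n)^{2} &\le \operatorname{Var}(U)\; n\,I(\theta)
  \quad\Longrightarrow\quad \operatorname{Var}(U) \ge \frac{1}{n\,I(\theta)}.
\end{align*}
```

Equality in Cauchy-Schwarz requires U - θ to be a multiple a(θ) of S_n, which is the "linear dependence" condition. The bound also shows where the single-observation vs whole-sample distinction enters: the whole-sample information is n I(θ).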

Taylor Expansions
* How the asymptotic normality comes from one-step Newton-Raphson.
* Why asymptotics of likelihood ratio is Chi-Square.
* Delta Method, and how if the first derivative is zero, we get slower convergence to a Chi-Squared.
* Edgeworth Expansions
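
A heuristic sketch of the first and third items, in the scalar case and ignoring remainder terms. Here ℓ_n is the log-likelihood of the sample, θ_0 the true value, and T_n a generic estimator with sqrt(n)(T_n - θ) -> N(0, σ²); this notation is my own.

```latex
% Asymptotic normality of the MLE from a one-term expansion of the score.
\begin{align*}
0 = \ell_n'(\hat\theta)
  &\approx \ell_n'(\theta_0) + (\hat\theta - \theta_0)\,\ell_n''(\theta_0), \\
\sqrt{n}\,(\hat\theta - \theta_0)
  &\approx \frac{ n^{-1/2}\,\ell_n'(\theta_0) }{ -\,n^{-1}\,\ell_n''(\theta_0) }
  \;\xrightarrow{d}\; \frac{ N\!\left(0, I(\theta_0)\right) }{ I(\theta_0) }
  = N\!\left( 0, \frac{1}{I(\theta_0)} \right).
\end{align*}

% Second-order delta method: g'(\theta) = 0 but g''(\theta) \neq 0.
\begin{align*}
g(T_n) - g(\theta) &\approx \tfrac{1}{2}\, g''(\theta)\,(T_n - \theta)^{2}, \\
n\,\bigl( g(T_n) - g(\theta) \bigr)
  &\xrightarrow{d}\; \frac{ \sigma^{2}\, g''(\theta) }{ 2 }\;\chi^{2}_{1}.
\end{align*}
```

The numerator in the first display is a CLT for the score and the denominator is a LLN for the second derivative. In the second display the scaling is n rather than sqrt(n) and the limit is a scaled Chi-Squared, not a normal.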

* Simple vs Simple: Neyman-Pearson.
* Simple vs Composite: compute MLE of the alternative.
* Composite vs Composite: MLE of the null (a.k.a. least favorable distribution)
* UMP: Monotone Likelihood Ratio on the *sufficient* statistic implies that I{T>c} is UMP.
* UMPU: Power function has slope 0. Is it a mixture of two UMPs?
* LMP: maximize the derivative of the power function at the boundary.
* Asymptotic power under contiguous alternatives: projections, non-Central Chi-Squared (I might need more practice with basic power calculations first!)

* Do there exist simple conditions for existence or non-existence?? For location families, UMVUEs for the location parameter should always exist: U = MLE + constant.
* If U is unbiased for 0 and T is UMVUE, then Cov(U,T) = 0.
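
A sketch of why the last fact holds, assuming T is UMVUE for g(θ) and U has finite variance:

```latex
\begin{align*}
&\text{For every real } \lambda,\; T + \lambda U \text{ is unbiased for } g(\theta), \text{ so} \\
&\operatorname{Var}(T + \lambda U)
  = \operatorname{Var}(T) + 2\lambda \operatorname{Cov}(T, U) + \lambda^{2} \operatorname{Var}(U)
  \;\ge\; \operatorname{Var}(T) \\
&\Longrightarrow\quad 2\lambda \operatorname{Cov}(T, U) + \lambda^{2} \operatorname{Var}(U) \ge 0
  \quad \text{for all } \lambda
  \quad\Longrightarrow\quad \operatorname{Cov}(T, U) = 0 .
\end{align*}
```

If the covariance were nonzero, a small λ of the opposite sign would make the left side negative and give an unbiased estimator with smaller variance than T.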

Confidence Intervals
* Studentized Intervals
* Bootstrap intervals
* Many options for the Fisher information: Fisher Information at MLE, observed Fisher Information, etc. The observed Fisher Information may be biased. If the bias is positive, the resulting coverage probability will be below 1-alpha (but maybe the coverage probability converges to 1-alpha). If the bias is negative, the intervals will be conservative.
* Variance-stabilizing transformations. Do we get better intervals this way? These intervals will be asymmetrical.
* What if the first derivative of g is near zero at the MLE?

Linear Models
* Is S^2 always independent of beta-hat? Why?
* Why is the F-test equivalent to the T-test, and to the Likelihood Ratio Test?
* Review matrix calculus
* Simultaneous Confidence Intervals (studentized maximum modulus, studentized range distributions)

* Complete sufficient statistics for non-parametric families (e.g. all distrs, symmetric distrs, mean-zero distrs, etc)
* Kernel Regression
* Kernel Density Estimation (work out the bias!)
* U-statistics, and using projections to obtain asymptotic normality
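
A sketch of the bias calculation for Kernel Density Estimation, assuming a symmetric kernel K that integrates to 1, bandwidth h, and a density f with two continuous derivatives (standard assumptions, stated here by me):

```latex
\begin{align*}
\hat f_h(x) &= \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} \right), \\
E\bigl[ \hat f_h(x) \bigr]
  &= \int K(u)\, f(x - hu)\, du
   = \int K(u) \left[ f(x) - hu\, f'(x) + \tfrac{1}{2} h^{2} u^{2} f''(x) + o(h^{2}) \right] du, \\
\operatorname{Bias}\bigl( \hat f_h(x) \bigr)
  &= \frac{h^{2}}{2}\, f''(x) \int u^{2} K(u)\, du \;+\; o(h^{2}).
\end{align*}
```

The first-order term vanishes because K is symmetric, so the bias is of order h² and is driven by the curvature of f at x.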

* Review exam problem on Metropolis-Hastings
* Bayes Risk: minimized by the posterior mean (squared-error loss), median (absolute-error loss), or mode (0-1 loss), depending on the loss function

Probability Facts
* Distributions: pdfs, cdfs, means, variances
* Relations between distributions: conjugacy, convolutions, scaling
* Law of Total Covariance
* Joint distribution between minimum and maximum order statistics.
* Inequalities: Markov, Chebyshev, Jensen
* Dominated Convergence / Monotone Convergence: swap limit and integral.
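
For the notes page, three of the facts above written out. The order-statistic formula assumes X_1, ..., X_n i.i.d. with cdf F and density f; the symbols are my notation.

```latex
% Law of Total Covariance
\[
\operatorname{Cov}(X, Y)
  = E\bigl[ \operatorname{Cov}(X, Y \mid Z) \bigr]
  + \operatorname{Cov}\bigl( E[X \mid Z],\; E[Y \mid Z] \bigr)
\]

% Joint density of the minimum X_{(1)} and maximum X_{(n)}
\[
f_{X_{(1)}, X_{(n)}}(u, v)
  = n (n - 1)\, \bigl[ F(v) - F(u) \bigr]^{n-2} f(u)\, f(v),
  \qquad u < v
\]

% Inequalities
\begin{align*}
\text{Markov:}    \quad & P(X \ge a) \le \frac{E[X]}{a}, \qquad X \ge 0,\; a > 0 \\
\text{Chebyshev:} \quad & P\bigl( |X - E[X]| \ge a \bigr) \le \frac{\operatorname{Var}(X)}{a^{2}}, \qquad a > 0 \\
\text{Jensen:}    \quad & \varphi\bigl( E[X] \bigr) \le E\bigl[ \varphi(X) \bigr], \qquad \varphi \text{ convex}
\end{align*}
```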

Calculus facts
* (1 + x/n)^n -> e^x
* \sum_{k=0}^\infty x^k / k! = e^x
* \sum_{k=0}^n p^k = (1 - p^{n+1}) / (1 - p), for p != 1
* \sum_{k=0}^n k p^k =
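
One way to get the last sum, sketched for p != 1: differentiate the finite geometric sum with respect to p and multiply by p.

```latex
\begin{align*}
\sum_{k=0}^{n} k\, p^{k}
  &= p\, \frac{d}{dp} \sum_{k=0}^{n} p^{k}
   = p\, \frac{d}{dp} \left[ \frac{1 - p^{n+1}}{1 - p} \right] \\
  &= p\, \frac{ -(n+1)\, p^{n} (1 - p) + \bigl( 1 - p^{n+1} \bigr) }{ (1 - p)^{2} }
   = \frac{ p \left[ 1 - (n+1)\, p^{n} + n\, p^{n+1} \right] }{ (1 - p)^{2} }
\end{align*}
```

As a sanity check, n = 1 gives p(1 - 2p + p²)/(1 - p)² = p, which matches 0 + p.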

