### Things to put on my cheat sheet / Things to think about

01:55 am
5 days to the big exam! We get to use 10 pages of notes, double-sided. It's way too much for my taste, but it's an interesting exercise, since it encourages you to think about relations between topics.

Asymptotics of Estimators (asymptotic consistency, normality)
* MLE: for regular parameter, asymptotically normal, with rate 1/sqrt(n).
* MLE: for truncation parameter, asymptotically exponential, with rate 1/n or worse.
* If family has both types of parameters, we cannot(?) use the Fisher Information to find the asymptotic variance of the regular one. But can't we plug in the true value of the truncation one, and use the asymptotics of the regular subfamily?
* Consistency is guaranteed if n/p -> infinity, plus a few other conditions ("for all theta, the density is bounded" should suffice)
* UMVUE: when is it asymptotically equivalent to the MLE?
* Sample quantiles
* Estimating Equations, a.k.a. no closed form for the MLE (e.g. Beta, Gamma, GLMs). Van der Vaart proves consistency (5.10), normality (5.19).
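A quick simulation sketch of the truncation-parameter bullet. It assumes the textbook example X_i ~ Uniform(0, θ), where the MLE is the sample maximum. The scaled error n(θ - max)/θ should look like Exp(1), so the error shrinks like 1/n rather than 1/sqrt(n).

```r
# Sketch: MLE of a truncation parameter, assuming X_i ~ Uniform(0, theta).
# The MLE is max(X), and n * (theta - max(X)) / theta is approximately Exp(1).
set.seed(1)
theta <- 2
n <- 200
reps <- 20000

# One scaled error per simulated sample
scaledErr <- replicate(reps, {
  x <- runif(n, min = 0, max = theta)
  n * (theta - max(x)) / theta
})

# Compare with Exp(1), which has mean 1 and median log(2)
print(c(mean = mean(scaledErr), median = median(scaledErr), log2 = log(2)))
```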

Fisher Information
* Why the two formulas are equivalent.
* Delta Method (for R^n -> R^m functions, we can easily generalize it using Jacobians!)
* Why knowing the nuisance parameter decreases the asymptotic variance of the parameter of interest
* Why location-scale families of *symmetric* distrs have a diagonal information matrix.
* Cramér-Rao Inequality, about *unbiased* estimators, comes from Cauchy-Schwarz. Not asymptotic: it holds for every n! Equality is attained when we have "linear dependence" between U and the score, i.e. score(θ; X) = a(θ) (U - E_θ[U]) almost surely. In other words, I think this means that an unbiased estimator U will be efficient iff it can be written as: U = a*MLE + c.
* Compare: variance bounds for unbiased estimators vs other estimators.
* What if calculating the Fisher information is intractable?
* ATTENTION: is this for a single observation or for the whole sample?
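A numerical check of why the two formulas agree, for a single Poisson(λ) observation (an example I picked, not from the notes). Note that this is the information in *one* observation; for n iid observations it gets multiplied by n.

```r
# Sketch: E[score^2] and -E[d score / d lambda] agree for one Poisson(lambda) draw.
# log f(x; lambda) = x * log(lambda) - lambda - log(x!)
lambda <- 3
x <- 0:200                       # truncated support; the tail mass beyond 200 is negligible
p <- dpois(x, lambda)

score  <- x / lambda - 1         # first derivative of log f in lambda
dscore <- -x / lambda^2          # second derivative of log f in lambda

infoOuter   <- sum(p * score^2)  # E[score^2]
infoHessian <- -sum(p * dscore)  # -E[second derivative]

print(c(infoOuter, infoHessian, 1 / lambda))
```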

Taylor Expansions
* How the asymptotic normality comes from one-step Newton-Raphson.
* Why asymptotics of likelihood ratio is Chi-Square.
* Delta Method, and how if the first derivative is zero, we need a second-order expansion: the error is of order 1/n instead of 1/sqrt(n), and the limit is a scaled Chi-Squared.
* Edgeworth Expansions
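A simulation sketch of the delta-method case where g'(θ) = 0. It assumes Xbar ~ N(0, 1/n) and g = cos (my choice), so g'(0) = 0 and g''(0) = -1. The second-order expansion gives n(g(0) - g(Xbar)) ≈ (n/2) Xbar^2, so 2n(1 - cos Xbar) should look like ChiSq(1), with scaling n rather than sqrt(n).

```r
# Sketch: second-order delta method with g(x) = cos(x) at mu = 0.
set.seed(2)
n <- 500
reps <- 100000
xbar <- rnorm(reps, mean = 0, sd = 1 / sqrt(n))   # exact sampling distribution of Xbar
stat <- 2 * n * (cos(0) - cos(xbar))              # approximately ChiSq(1)

# Compare a few quantiles with ChiSq(1) quantiles
probs <- c(0.5, 0.9, 0.99)
print(rbind(simulated = quantile(stat, probs), chisq1 = qchisq(probs, df = 1)))
```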

Testing
* Simple vs Simple: Neyman-Pearson.
* Simple vs Composite: compute MLE of the alternative.
* Composite vs Composite: MLE of the null (a.k.a. least favorable distribution)
* UMP: Monotone Likelihood Ratio on the *sufficient* statistic implies that I{T>c} is UMP.
* UMPU: Power function has slope 0. Is it a mixture of two UMPs?
* LMP: maximize the derivative of the power function at the boundary.
* Asymptotic power under contiguous alternatives: projections, non-Central Chi-Squared (I might need more practice with basic power calculations first!)

UMVUE
* Do there exist simple conditions for existence or non-existence?? For location families, UMVUEs for the location parameter should always exist: U = MLE + constant.
* If U is unbiased for 0 and T is UMVUE, then Cov(U,T) = 0.

Confidence Intervals
* Studentized Intervals
* Bootstrap intervals
* Many options for the Fisher information: Fisher Information at MLE, observed Fisher Information, etc. The observed Fisher Information may be biased. If the bias is positive, the resulting coverage probability will be below 1-alpha (but maybe the coverage probability converges to 1-alpha). If the bias is negative, the intervals will be conservative.
* Variance-stabilizing transformations. Do we get better intervals this way? These intervals will be asymmetrical.
* What if the first derivative of g is near zero at the MLE?

Linear Models
* Is S^2 always independent of beta-hat? Why?
* Why is the F-test equivalent to the T-test, and to the Likelihood Ratio Test?
* Review matrix calculus
* Simultaneous Confidence Intervals (studentized maximum modulus, studentized range distributions)

Non-parametrics
* Complete sufficient statistics for non-parametric families (e.g. all distrs, symmetric distrs, mean-zero distrs, etc)
* Kernel Regression
* Kernel Density Estimation (work out the bias!)
* U-statistics, and using projections to obtain asymptotic normality
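Working out the KDE bias in one case where everything is in closed form. It assumes iid N(0,1) data and a Gaussian kernel with bandwidth h (both my choices). Then E[fhat_h(x)] is exactly the N(0, 1 + h^2) density, so the exact bias can be compared with the usual approximation (h^2/2) f''(x) ∫u²K(u)du. For a standard normal kernel, that approximation is just (h^2/2) f''(x).

```r
# Sketch: exact vs. approximate bias of a Gaussian-kernel density estimate
# when the true density is N(0,1).
exactBias  <- function(x, h) dnorm(x, sd = sqrt(1 + h^2)) - dnorm(x)
approxBias <- function(x, h) (h^2 / 2) * (x^2 - 1) * dnorm(x)  # f''(x) = (x^2 - 1) * phi(x)

h <- 0.1
xs <- c(0, 0.5, 2)
print(cbind(x = xs, exact = exactBias(xs, h), approx = approxBias(xs, h)))
```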

Bayesian
* Review exam problem on Metropolis-Hastings
* Bayes Risk: may be minimized by posterior mean, median, mode, depending on the loss function
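A numerical sketch of which estimator each loss gives, assuming a Beta(3, 7) posterior picked for illustration. Squared-error loss is minimized by the posterior mean, absolute-error loss by the posterior median, and 0-1 loss (in the limit of a vanishing window) by the posterior mode.

```r
# Sketch: minimize posterior expected loss numerically and compare with closed forms.
a <- 3; b <- 7
post <- function(t) dbeta(t, a, b)

# Posterior expected squared-error and absolute-error loss of action d
sqLoss  <- function(d) integrate(function(t) (t - d)^2 * post(t), 0, 1)$value
absLoss <- function(d) integrate(function(t) (d - t) * post(t), 0, d)$value +
                       integrate(function(t) (t - d) * post(t), d, 1)$value

bestSq   <- optimize(sqLoss,  c(0, 1))$minimum
bestAbs  <- optimize(absLoss, c(0, 1))$minimum
bestMode <- optimize(post, c(0, 1), maximum = TRUE)$maximum  # limit of 0-1 loss

print(rbind(numeric    = c(bestSq, bestAbs, bestMode),
            closedForm = c(a / (a + b), qbeta(0.5, a, b), (a - 1) / (a + b - 2))))
```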

Probability Facts
* Distributions: pdfs, cdfs, means, variances
* Relations between distributions: conjugacy, convolutions, scaling
* Law of Total Covariance
* Joint distribution between minimum and maximum order statistics.
* Inequalities: Markov, Chebyshev, Jensen
* Dominated Convergence / Monotone Convergence: swap limit and integral.

Calculus facts
* (1 + x/n)^n -> e^x
* \sum_k x^k / k! = e^x
* \sum_{k=0}^n p^k = (1 - p^{n+1}) / (1 - p)
* \sum_{k=0}^n k p^k = p (1 - (n+1) p^n + n p^{n+1}) / (1 - p)^2
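A numerical sanity check of these identities; p = 0.3, n = 10 and x = 1.5 are arbitrary values. The closed form for \sum_{k=0}^n k p^k comes from differentiating the finite geometric sum and multiplying by p, which gives p (1 - (n+1) p^n + n p^{n+1}) / (1 - p)^2.

```r
# Sketch: check the series identities at arbitrary values.
p <- 0.3; n <- 10; x <- 1.5
k <- 0:n

geomSum    <- sum(p^k)
geomClosed <- (1 - p^(n + 1)) / (1 - p)

kGeomSum    <- sum(k * p^k)
kGeomClosed <- p * (1 - (n + 1) * p^n + n * p^(n + 1)) / (1 - p)^2

expSeries <- sum(x^(0:30) / factorial(0:30))  # truncated exponential series
compound  <- (1 + x / 1e6)^1e6                # (1 + x/n)^n with n = 1e6

print(c(geomSum, geomClosed, kGeomSum, kGeomClosed, expSeries, compound, exp(x)))
```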

### productivity

07:27 pm
The most important (and possibly the hardest) exam of my life is in 3 weeks. I am looking for a good balance between:
* sleeping
* eating
* stimulants
* focused work
* exercise/yoga/etc
* leisure

... more specifically a daily schedule for the above. (Should "leisure" be absent?)

A good 20 minutes of cardio seems to make everything easier, but: when I am productive, I don't want to take breaks; and when I'm not productive, it's even harder to justify it to myself. I guess stimulants help me stay on task, but the effect is not big enough to be obviously better than placebo, and they can make me wake up too early the next day.

It would be nice if there were an atheist version of prayer.

### NYC subway fantasy map

05:20 pm
When I have nothing better to do in the subway, I often imagine extensions to the subway system:

* the 2nd Avenue Line (T line).
* extension of the 7 train to the Hudson Yards development.

(these two are already in the city's official plans)

* a line under 125th St in Harlem, to solve the horrible traffic jams.
* a way of crossing between UWS and UES, without taking a bus.
* better connectivity in Brooklyn (there are lots of line crossings without connections)
* better connectivity between Brooklyn and Queens. Currently, the G and JZ are the only direct connections... Actually, to be precise, many other trains cross over to Queens (AC, 23, etc.), but don't connect to the main Queens lines (namely, 7, E, F, M, NQR). You can imagine extending the Brooklyn M from Middle Village to Jackson Heights.

Somebody in Philadelphia made a PDF map with waaayy more stuff than I ever thought of.

### new WC?

11:05 am
I live in a 2BR apartment with one housemate, and I am tired of sharing a bathroom. The design is particularly bad because the toilet, sink and shower are all in the same space, ensuring that unless the housemates are willing to share that level of intimacy, it is impossible for one person to do X while the other does Y. (I try to have an open-door policy, but few housemates agree with that).

Amazingly for NYC, however, we have extra space in the apartment that nobody uses. In particular, there is a closet:

Length = 25in
Width = 28-30in

There is another closet of the same proportions adjacent to this one, in my bedroom. So if we broke that wall, we'd have a bigger space:

Length = 52in
Width = 28-30in

If we wanted to make a little WC (toilet + sink) the proper way, we would need to install new pipes, which would involve breaking the floor, and probably paying thousands of dollars to plumbers.

I am wondering if there is a cheaper solution, something like a portable toilet that fits in the 25in x 28-30in space... how easy would it be to clean it, and how often would I need to do it? Is there some sort of sewage pump that I could use from the portable toilet to the real toilet?

### Anglicisms in Brazil; multilingual etymological dictionary?

09:25 pm
I occasionally read Portuguese translations of the NYTimes. They tend to be very literal, so some of the meaning is surely lost on readers who don't know English expressions and mannerisms (Paul Krugman's puns come to mind).

Anglicisms have been penetrating Portuguese (and most big languages) for many decades now. Were the concepts of "marketing", "design" and "nerd" obscure in Brazil before they made their way from the anglophones? If not, why would these English words take over so successfully? Can cultural colonialism be successful when there isn't a vacuum to fill?

I am guilty of using words like "approach" and "standard", because "abordagem" and "padrão" are too inaccessible in my brain... but the latter are most likely still Anglicisms, only dressed up to look like locals. I am guilty of using "deadline" when a perfectly good translation exists ("prazo"). I come up short when I want to express:

* "unhealthy" ("insalubre"? WTF? Why not "in-saudável"? The lack of a word here is clear evidence that PT speakers don't feel the freedom to modify words through affixes)
* "unlike" ("ao contrário de" is too long!)
* "range" ("alcance" is far too narrow)
* "quack" ("charlatão" is too formal)

my notes on Portuguese

---

On different types of loans

Between semantic loans and loan translations, it is remarkable how often words, metaphors and expressions translate exactly between European languages, leaving us puzzled about the history of a word or phrase. An etymological dictionary can tell you the first mention of "responsibility" in the English language, but it usually won't tell you if it was (a) a straightforward borrowing from another language, e.g. "responsabilité" (French), (b) a structural borrowing/rederivation based on e.g. Dutch "ver-antwoord-elijk-heid" (-response-ly-hood). There is also the possibility that such rederivations are a coincidence, i.e. independent reinventions... the likelihood of which should depend on how much you believe in the universality of the human conceptual system.

I suspect that educated speech uses a greater proportion of type B loans, whereas informal speech uses more type A loans.

Another example (to borrow SAT analogy notation):
* community : common :: gemeenschap : gemeen

my notes on Dutch

### 3D model of human body; human flexibility; yoga

10:48 am
Dear LJ Genie,

I would like to play with a 3D model of the human body. You have probably seen kinetic sculptures of a human skeleton, in which rigid bones are connected by swiveling joints. What I want is a digital version of this, but with more constraints on flexibility. Namely:

* joints should not be hypermobile, but rather they should have a normal range of motion.
* a muscle layer would further constrain their motions, in such a way that you'd want to stretch the character in order to increase their range.

If we could also model muscle strength, energy, etc., you can imagine simulating workouts and figuring out the optimal workout for a given body state.

Gustavo

---

Layers of human flexibility

* muscle flexibility: temporarily increased by stretching, >1 minute to stretch fully, half-life of ~4 hours; breaking the adhesions between neighboring muscles (myofascial release)
* joint flexibility: temporarily increased by cracking, one swift motion, half-life of ~15 minutes
* skin flexibility

---

So 2 months after my first yoga lesson with Joseph, I've decided to start again. He often works by having me attempt a difficult goal (e.g. do a headstand), observing the limiting factors (e.g. weakness/tightness of back muscles, pains, etc), and then working on these more basic goals. The fact that he is willing to dispense a bit of professional massage is a big help when there is pain.

There is something very very satisfying about opening up areas of the body that have been closed for a long time. The only bad thing is that after the lesson, I'm tempted to crack my back for the next day or two. One theory is that most of the time, the muscles restrict my motion so that I don't feel the temptation to crack; but if we loosen that restriction, I'm just one delicious step away from increasing my flexibility. Another theory is that the yoga itself cracks my joints, providing immediate rewards, and thus restarting the addiction.

I suspect that strengthening my back muscles would go a long way towards curing my temptation to crack. For one, strong muscles would support good posture, making it harder for my vertebrae to fall out of alignment (I guess that most of my temptation to crack comes from perceived misalignment).

### insta-chair

05:32 pm
Dear LJ Genie,

I would love to get one of those portable instant chairs. They aren't full chairs, just a support for your butt when sitting on the ground, made of nylon (or similar material), with the rough shape and size of a laptop, ideal for reading a book while sitting on the grass. I remember seeing them around 2000, perhaps as one of the products sold by MGMT101 students.

Gustavo

### my iPad apps & reviews (mostly drawing and note-taking)

03:29 pm
ShowMe: FREE! Perfect for Khan Academy-style lectures.

DropText: $1. For editing plain text documents on the DropBox cloud, a feature that the DropBox app is lacking.

These 2 apps are mainly for annotating pre-existing documents:

iAnnotate PDF: $10. Highlight and draw on PDFs. Annoyance: pencil thickness is zoom-dependent, so every annotation has its own thickness.

GoodReader: $5. Basically a filesystem. It feels like many different apps bundled as one. You can edit text files. You can highlight and draw on PDFs, but the UI is too complicated and the tool icons are unpleasantly small. I suspect that there's a way to connect to DropBox, making DropText obsolete, but I haven't figured it out yet.

Notability ($5, now $1) and NoteShelf ($6): Very similar apps. They are both (a) paper-like note-taking apps that (b) support typing as well as handwriting, and (c) offer a dual zoom view, which is very useful for handwriting more precisely while viewing a bigger area of the "paper", and which comes with a nice line-wrapping feature. (d) They are organized into 3 levels: Notability calls them "categories > notes > pages", while NoteShelf calls them "bundles of notebooks > notebooks > pages".

Notability has text search; NoteShelf doesn't. On Notability, you move by using 2 fingers; on NoteShelf, you can use just one finger on the zoomed-out view.

On Notability, each page is associated with an audio recording (a feature I've never wanted). NoteShelf has a wrist protection feature, which works sometimes.

Notability has a silly use of colors by default (which you can change), while NoteShelf has a very tasteful look.

Noteshelf vs. Remarks vs. Notability: iPad handwriting app shootout!

Apparently Notability can also annotate PDFs via DropBox! Maybe I should move it to the category above.

Notability is lacking a textbox feature, but planning to add it in a future version. This means that typing can be clunky, since all text starts from the left margin.

Skitch: FREE. For annotating maps/pictures. Not very good. No scrolling.

Unlike the above, this one is for artists:

SketchBook Express: FREE. Lots of fun. Transparent layers let you overlay things. Unfortunately, it doesn't deal properly with the iPad's rotational symmetry, so many saved pictures end up upside down. The Pro version apparently gives you more drawing tools, more layers, etc.

Infinite Sketchpad: $1. A toy. Drawing app with unbounded zoom in both directions.

### armchair neuroscience

02:33 pm
Conjectures: (a) the temporal resolution of human vision is determined by the refractory period of sensory neurons; (b) this is the mechanism by which alcohol slows down one's reflexes.

Alex Sweet (1951) p.198:
<< At the fovea, for either the light- or dark-adapted eye, the just noticeable interval of time between two adjacent flashes was approximately 5m.sec >>

i.e. if you're going to have a flickering light, the flicker will tend to be noticeable if the frequency is lower than 200Hz (1 / 5 ms = 200 Hz).

### R: semantics and pragmatics / names vs values

11:53 am
One of the annoyances in R is dealing with functions that don't evaluate one or more of the arguments you pass, or that otherwise use the name of the variable passed. The problem appears when you try to write abstractly. e.g. with(data, ZQ/Total.Z) will compute data$ZQ / data$Total.Z. What 'with' is doing is parsing that expression, figuring out which variable tokens are already present in the current environment, and putting "data$" in front of the rest. (Strictly speaking, 'with' evaluates the expression in an environment built from data, so a column of data takes precedence over a variable of the same name.) Yesterday, in my naivety, I implemented just that (28 easy-to-read lines of R).

However, it's hard to do something more abstract, e.g. with(data, property) will try to get a property named "property". To circumvent this, one can make a call to eval:

```r
# build the call 'with(x, <property>)' as a string, then parse and evaluate it
withExpr <- paste0("with(x,", property, ")")
eval(parse(text=withExpr))
```

I am not happy with this, but there is NO OTHER WAY. I say this confidently because 'with' appears to completely discard the value of the variable passed, while only using its name, i.e. something like:

```r
property <- deparse(substitute(property))
```

Having to call eval is the price we pay for the convenience of not using quotes.
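For comparison, a minimal sketch of two ways to get the same effect without pasting "with(x, ...)" into a string. The data frame `d` and its values are made up for illustration, and `evalIn` is a hypothetical name. Both versions still call eval, so the trade-off above stands.

```r
# Made-up data standing in for one night-by-night dataset
d <- data.frame(ZQ = c(80, 90, 100), Total.Z = c(400, 450, 500))

# (1) Capture the unevaluated argument, the way 'with' itself does
#     (with.default is essentially eval(substitute(expr), data, enclos = parent.frame()))
evalIn <- function(data, expr) eval(substitute(expr), data, parent.frame())
evalIn(d, ZQ / Total.Z)

# (2) Start from a string and parse it, without building a 'with(...)' call
property <- "ZQ/Total.Z"
eval(parse(text = property)[[1]], d)
```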

And, guess what, I take the deal! Yesterday, I wrote 'violinPlot', which is like a 'boxplot' but with kernel density estimates instead of quantiles. The two basic arguments to violinPlot are 'datasets' and 'property': for each dataset, it extracts the property and plots a violin.

```r
l <- list(mon, tue, wed, thu, fri, sat, sun)
violinPlot(l, ZQ/Total.Z, col=c(rep("#AAAAFF",5), rep("orange", 2)), horizontal=FALSE)
```

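My code starts with:
```r
violinPlot <- function(datasets, property,
                       labels=c("M", "T", "W", "R", "F", "Sa", "Su"),
                       horizontal=TRUE, colors=NA){
  # turn the unevaluated 'property' argument into a string, e.g. "ZQ/Total.Z"
  property <- deparse(substitute(property))
  # recycle the colors so there is at least one per dataset
  colors <- rep(colors, length(datasets)/length(colors)+1)
  # one kernel density estimate per dataset
  densities <- lapply(datasets, function(x) density(with2(x,property)))
  ...
}

# like 'with', but takes the expression as a string
with2 <- function(data, expr, ...)
  with(data, eval(parse(text=expr)), ...)
```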

You can see above that I also wanted to pass 'property' without quotes. Having essentially reimplemented 'with', I am in a position to modify it so that the syntax becomes with(data, "ZQ/Total.Z"), and spare myself the eval next time... but I don't wanna.

But here's what I might do: instead of with(data,expr), make it with(data, exprLiteral=NULL, exprToEvaluate=NULL), and you would only pass one of these expr arguments. The difference is that 'exprToEvaluate' gets evaluated into a string (so it better be a string!); whereas 'exprLiteral' gets turned into a string directly, and corresponds to the current syntax of 'with'... and since 'exprLiteral' comes first (in the second position of the argument list), current calls to 'with' would continue working. Yay, backward-compatibility!
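A minimal sketch of that interface, under the hypothetical name `with3` so that it does not mask base R's `with`. One deviation from the plan: instead of giving `exprLiteral` a NULL default, it checks `missing(exprLiteral)`, which is the simpler way to tell whether the literal argument was supplied.

```r
# Sketch of the proposed with(data, exprLiteral, exprToEvaluate) interface.
with3 <- function(data, exprLiteral, exprToEvaluate = NULL) {
  if (!missing(exprLiteral)) {
    # current 'with' syntax: the argument itself is the expression
    e <- substitute(exprLiteral)
  } else if (is.character(exprToEvaluate) && length(exprToEvaluate) == 1) {
    # the argument is evaluated first and must yield a single string
    e <- parse(text = exprToEvaluate)[[1]]
  } else {
    stop("pass exprLiteral, or a single string as exprToEvaluate")
  }
  eval(e, data, parent.frame())
}

d <- data.frame(ZQ = c(80, 90), Total.Z = c(400, 300))
with3(d, ZQ / Total.Z)               # old-style call keeps working
property <- "ZQ/Total.Z"
with3(d, exprToEvaluate = property)  # no paste/eval dance at the call site
```

Since `exprLiteral` is still in the second position, positional calls written for `with` keep working.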

### “rape is ‘not about sex’” in the same way that “anorexia is ‘not about food.’”

04:04 pm
On the occasion of Take Back the Night, I have been seeing posters claiming that "rape is a hate crime, and not about sexual desire". I was naturally reminded of Maymay's blog post, which attributes this belief to the idea that sex and violence cannot go together (which is obviously contradicted in BDSM).

One might imagine that such a bold conclusion was surely arrived at after years of rigorous studies in behavioral science. And if so, it would be a remarkable finding: considering that animals rape, one should conclude that either humans are special, or that animals also rape for the sake of power alone (both of which seem unlikely to me).

What is so objectionable about the idea that rape is about both sex and power; and why do we need to accompany this with a disclaimer, that rape is an awful thing to inflict on anyone, regardless of the perpetrator's motivations?

Finally, a quote from Staci Newmahr "Playing on the Edge: Sadomasochism, Risk, and Intimacy":
<< Rape, which many of us would shudder to consider “intimacy,” is so heinous precisely because it is so intimate. >>

<< [this myth] encourages many young cisgender men (and others) to internalize the belief that having a penis makes them a rapist NO MATTER WHAT they do -- thus, they might as well just give up (either on sexual relationships entirely or on consent) and not even try. >>

It sounds odd that guilt can propagate this way, but I know a therapist who works here at Columbia, and he told me the exact same thing.

06:09 pm
I just made a big discovery from my sleep dataset (93 nights): my "Time in REM sleep" has a strong autocorrelation, i.e. I have streaks of high REM sleep nights, and low REM sleep nights.

Here are some statistics, 1-day auto-correlations for different quantities:
| | rho-hat | CI | p-value |
|---|---|---|---|
| Total Sleep time | | [-0.213, 0.197] | |
| ZQ | 0.22 | [0.0114, 0.4028] | 3.8e-02 |
| Time in Deep | 0.29 | [0.0915, 0.4678] | 4.9e-03 |
| Time in REM | 0.43 | [0.247, 0.584] | 1.87e-05 |
| Time in Light | | [-0.243, 0.166] | |

The data also suggests that 'Time in Deep' has a significant autocorrelation, but not as strong as 'Time in REM'.
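One way to compute numbers like these, not necessarily the way the table above was produced: correlate each night with the next one and let `cor.test` supply a Fisher-z confidence interval and p-value. The series below is simulated AR(1) data standing in for the Zeo export, and `lag1` is a hypothetical helper. Consecutive pairs overlap, so treating them as independent makes the interval only approximate.

```r
# Sketch: lag-1 autocorrelation with CI and p-value via cor.test.
# 'rem' is a simulated stand-in (AR(1) with coefficient 0.4), not real sleep data.
set.seed(3)
nNights <- 93
rem <- as.numeric(arima.sim(model = list(ar = 0.4), n = nNights))

lag1 <- function(x) {
  ct <- cor.test(x[-length(x)], x[-1])   # pairs (night t, night t+1)
  c(rho = unname(ct$estimate), lo = ct$conf.int[1], hi = ct$conf.int[2], p = ct$p.value)
}
print(lag1(rem))
```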

Here's a time series of 'Time in REM':

In the above plot, we see that I had a streak of 6 days in my top quartile, and a streak of 7 days in my bottom quartile... which would be unlikely without autocorrelation.

For comparison, see a series of 'Total sleep' (the really bad nights correspond to a nasty strep infection I had earlier this year):

The natural scientific question is: what factors predict (or better, cause) periods of high REM sleep?  I've computed a tiredness variable, as an exponential moving average of 'Total Sleep Time' (or 'ZQ'), and it suggests that the more tired I am, the less REM sleep I will have... but this effect is estimated at 0.216, which is more modest than the autocorrelation in REM Sleep, so it could be due to confounding (i.e. I am most tired when my previous night's REM Sleep was low, which predicts the next night's REM Sleep also being low).

Note that ZQ is defined as a linear combination of the different phases of sleep, so it's not all that surprising that it seems to have some degree of autocorrelation.  If Z=X+Y, can we decompose autocorrelation(Z) into components?
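A partial answer, assuming X and Y are jointly stationary. The lag-1 autocovariance decomposes exactly, with two cross terms. The autocorrelation only decomposes after dividing by the variance of the sum:

```latex
\begin{aligned}
\gamma_Z(1) &= \operatorname{Cov}(X_t + Y_t,\; X_{t+1} + Y_{t+1}) \\
            &= \gamma_X(1) + \gamma_Y(1) + \operatorname{Cov}(X_t, Y_{t+1}) + \operatorname{Cov}(Y_t, X_{t+1}), \\
\rho_Z(1)   &= \frac{\gamma_Z(1)}{\sigma_X^2 + \sigma_Y^2 + 2\operatorname{Cov}(X_t, Y_t)}.
\end{aligned}
```

If X and Y are uncorrelated at all lags, the cross terms vanish and ρ_Z(1) = (σ_X² ρ_X(1) + σ_Y² ρ_Y(1)) / (σ_X² + σ_Y²), a variance-weighted average of the two autocorrelations. A weighted sum aX + bY works the same way, with σ_X² and σ_Y² replaced by a²σ_X² and b²σ_Y².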

### what I love about R

11:36 pm
One thing I really love about R is how I can write improperly-scoped code, and everything still works.

```r
gSmooth <- function(x,y, kernelSd=1, kernel=function(z) dnorm(z,mean=x[i],sd=kernelSd)){
  v <- c()
  for (i in seq_len(length(x))){
    # the default kernel is a closure over this function's frame,
    # so x[i] uses the current loop index
    weights <- sapply(x, kernel)
    v[i] <- sum(weights*y)/sum(weights)
  }
  list(x=x,y=v)
}

plot(data$ZQ, type="l", ylim=c(0,130))
ss <- gSmooth(1:n,data$ZQ)
pplot(ss$x, ss$y, type="l", col="red")
ss <- gSmooth(1:n,data$ZQ, kernelSd=3)
pplot(ss$x, ss$y, type="l", col="blue")
```

---

This is much cleaner:

```r
gSmooth <- function(x,y, kernel=gaussKernel){
  v <- c()
  for (i in seq_len(length(x))){
    center <- x[i]
    weights <- sapply(x, function(z) kernel(z, center))
    v[i] <- sum(weights*y)/sum(weights)
  }
  list(x=x,y=v)
}

# both kernels read the bandwidth from the global variable 'kernelSd'
gaussKernel <- function(z, center) dnorm(z, mean=center, sd=kernelSd)
emaKernel <- function(z, center) if(z<=center) return(exp((z-center)/kernelSd)) else return(0) ## Exponential Moving Average

plot(data$dayNumber, data$ZQ, type="p")
pplot(data$dayNumber, data$ZQ, type="l")
kernelSd <- 3
ss <- gSmooth(data$dayNumber,data$ZQ)
pplot(ss$x,ss$y, type="l", col="red")
kernelSd <- 3
ss <- gSmooth(data$dayNumber,data$ZQ, kernel=emaKernel)
pplot(ss$x,ss$y, type="l", col="blue")
```

Note how the Exponential Moving Average (in blue) is backward-looking, and less smooth than the Gaussian one (in red), even though these kernels, when viewed as distributions, have the same standard deviation of 3.

I think that this is in part due to the Exponential kernel not being as smooth as the Gaussian one, but I also suspect that it weights the points less evenly.
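For comparison, a sketch of the same kernel smoother vectorized with `outer()`, with the bandwidth passed as an argument instead of read from the global `kernelSd`. `gSmoothVec`, `gaussK` and `emaK` are hypothetical names.

```r
# Sketch: vectorized kernel smoother. W[i, j] is the weight of point j
# when smoothing at x[i]; each smoothed value is a weighted average of y.
gaussK <- function(z, center, s) dnorm(z, mean = center, sd = s)
# One-sided exponential kernel: only points at or before 'center' get weight
emaK <- function(z, center, s) ifelse(z <= center, exp((z - center) / s), 0)

gSmoothVec <- function(x, y, kernel = gaussK, s = 1) {
  W <- outer(x, x, function(center, z) kernel(z, center, s))
  list(x = x, y = as.vector(W %*% y) / rowSums(W))
}
```

With the data from the post, this would be called as `gSmoothVec(data$dayNumber, data$ZQ, kernel = emaK, s = 3)`.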

### equivariant and invariant estimators

10:37 pm
When estimating the parameter of a location family, one can take the sample mean or the sample median. Or, more generally, any equivariant estimator. In a normal location family, the sample mean is efficient. In a double-exponential family, the median is better. (Solve for the MLE in each case. Under suitable regularity conditions, the MLE is asymptotically efficient.)

Now, note that these two families are disjoint subsets of the exponential power family. When p=1, you're in a double-exponential family (Laplace); when p=2 you're in a normal family.

Michael Sherman - Comparing the Sample Mean and the Sample Median: An Exploration in the Exponential Power Family shows that when p=1.407, the mean and the median are equally good.
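A quick check of that crossover. It assumes the comparison is by asymptotic variance, and it uses the unit-scale exponential power density f(x) = p / (2Γ(1/p)) exp(-|x|^p). With Var(median) ≈ 1/(4 n f(0)^2) and Var(mean) = σ²/n, the efficiency of the median relative to the mean works out to p² Γ(3/p) / Γ(1/p)³. Its root should land close to the 1.407 quoted above.

```r
# Asymptotic efficiency of the median relative to the mean in the exponential
# power family:  f(0) = p / (2 * gamma(1/p)),  Var(X) = gamma(3/p) / gamma(1/p),
#   ARE(median : mean) = 4 * f(0)^2 * Var(X) = p^2 * gamma(3/p) / gamma(1/p)^3
areMedianVsMean <- function(p) p^2 * gamma(3 / p) / gamma(1 / p)^3

areMedianVsMean(1)   # Laplace: 2, the median wins
areMedianVsMean(2)   # normal: 2/pi (about 0.64), the mean wins

# The value of p where both are equally efficient
crossover <- uniroot(function(p) areMedianVsMean(p) - 1, c(1, 2))$root
crossover
```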

Note that the mean and median are examples of linear combinations of the order statistics. We can imagine how different linear combinations of the order statistics would be optimal for different values of p.

---

This is what I saw in class:

Definition: an equivariant estimator T(X) is one that satisfies T(X + ε 1) = T(X) + ε. i.e. if you shift all the data by some amount ε, the estimator changes by ε. Examples: sample mean, sample median, sample max, sample min.

Definition: an invariant estimator T(X) is one that satisfies T(X + ε 1) = T(X). i.e. if you shift all the data by the same amount, the estimator does not change. Examples: sample variance, interquartile range.

Theorem: if T1 is equivariant and T2 is invariant, then T1 + T2 is equivariant.

Definition: the maximal invariant Y(X) is the (n-1)-dimensional vector (X2 - X1, X3 - X1, ..., Xn - X1).

Theorem: every invariant estimator is a function of the maximal invariant.
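A quick numerical check of these definitions on an arbitrary sample. Shifting every observation by eps should shift the equivariant estimators by eps and leave the invariant ones unchanged.

```r
# Sketch: check equivariance / invariance under a common shift eps.
set.seed(5)
x <- rexp(15)
eps <- 2.5

isEquivariant <- function(est) isTRUE(all.equal(est(x + eps), est(x) + eps))
isInvariant   <- function(est) isTRUE(all.equal(est(x + eps), est(x)))

estimators <- list(mean = mean, median = median, max = max, min = min,
                   var = var, IQR = IQR)
print(sapply(estimators, isEquivariant))  # TRUE for mean, median, max, min
print(sapply(estimators, isInvariant))    # TRUE for var and IQR
```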

---

This is how I made sense of it all.

Claim 1: Among linear combinations of the order statistics $\sum_i a_i X_{(i)}$, the equivariant estimators are precisely those satisfying $a_1 + \dots + a_n = 1$.
Claim 2: Among linear combinations of the order statistics, the invariant estimators are precisely those satisfying $a_1 + \dots + a_n = 0$.
Proof:
Since the $X_i$ are iid, the order statistics $X_{(i)}$ are sufficient, so it is natural to look at functions of them. Here we only consider linear ones, $T(X) = \sum_i a_i X_{(i)}$. (Non-linear functions can also be invariant: the sample variance is one.)
Shifting the data by $\epsilon$ shifts every order statistic by $\epsilon$, so $T(X + \epsilon 1) = \sum_i a_i (X_{(i)} + \epsilon) = \sum_i a_i X_{(i)} + \epsilon \sum_i a_i = T(X) + \epsilon \sum_i a_i$.
For equivariant estimators, this must be equal to $T(X) + \epsilon$ for every $\epsilon$, so it follows that $\sum_i a_i = 1$.
For invariant estimators, this must be equal to $T(X)$ for every $\epsilon$, so it follows that $\sum_i a_i = 0$.
Conversely, by the same identity, any weights with $\sum_i a_i = 1$ (resp. $0$) give an equivariant (resp. invariant) estimator.
QED.
---

For the record, I am beginning to use LyX. It is nice, but when I adjust the displayed font size, the math stays the same size, and so it looks comparatively tiny. This post was made by exporting to XHTML, copy-pasting onto DreamWidth, and then deleting the silly "magicparlabel" A-tags that were making everything look green.

### HVAC stupidity

03:28 pm
I've never understood why most buildings have such stupid and clunky HVAC systems, wasting energy while sacrificing your comfort.

Just got an email from the Vice President of Operations, Columbia University Facilities:
Dear Colleagues,
Unseasonably high temperatures are forecasted for our area over the next few days. As we are in the midst of heating season, and most university buildings are not able to supply both heating and cooling simultaneously, many areas of campus may feel warmer than usual. We appreciate your patience as we work to keep everyone on campus comfortable.

"most university buildings are not able to supply both heating and cooling simultaneously"

The implicit premise here is that they can't turn off the heat! It's unclear whether the blame lies with the engineers or the policy-makers.

### Probability II

11:25 am
The first class went pretty well, partly because I had prepared by reading up on Karatzas & Shreve during the holidays. This stuff is pretty neat: we defined Stochastic Processes on arbitrary state-spaces and index sets. "Time" can be defined in ℝ^n, and even in L^2! We defined Brownian Motion, and looked at sample paths as functions of time: for any fixed ω, the corresponding sample path is t ↦ X_t(ω).

We showed the existence of Brownian Motion by the Daniell-Kolmogorov Extension Theorem, which can be proven using the Carathéodory Extension Theorem (whatever that means).

We also defined notions of equality of stochastic processes. For ordinary random variables, the usual notions are:

* absolute equality (my coinage): for all ω . X(ω)=Y(ω)
* almost-sure equality: the set {ω | X(ω)=Y(ω)} has measure 1, also written P(X=Y) = 1.
* equality in distribution: for all A, P(X ∈ A) = P(Y ∈ A)

Now, stochastic processes have a time index.

The notion of absolute equality can be developed very easily:
* X and Y are "absolutely equal" (again, my coinage) if ∀ ω, t . X_t(ω) = Y_t(ω)

But, in the case of continuous-time stochastic processes, the notion of almost-sure equality breaks into two non-equivalent notions, which differ in the placement of the quantifier ∀:

* X and Y are said to be indistinguishable if almost all the sample paths agree, i.e. if P({ω | ∀ t ∈ T . X_t(ω) = Y_t(ω)}) = 1.

* X and Y are said to be modifications of each other if at every time, they are equal almost surely, i.e. ∀ t ∈ T, P({ω | X_t(ω) = Y_t(ω)}) = 1.
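A standard example showing that the two notions really differ, sketched with τ an (assumed) uniformly distributed random time:

```latex
\begin{aligned}
&\text{Let } \tau \sim \mathrm{Uniform}[0,1],\quad X_t(\omega) = 0,\quad Y_t(\omega) = \mathbf{1}\{t = \tau(\omega)\},\quad t \in [0,1]. \\
&\text{Modifications: for each fixed } t,\; P(X_t \neq Y_t) = P(\tau = t) = 0. \\
&\text{Not indistinguishable: } \{\omega \mid \forall t .\, X_t(\omega) = Y_t(\omega)\} = \varnothing,
  \text{ since } Y_{\tau(\omega)}(\omega) = 1, \text{ so its probability is } 0.
\end{aligned}
```

If both processes have right-continuous sample paths, being modifications does imply indistinguishability, so the distinction only bites for irregular paths like Y above.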

Equality in distribution develops into:
* if X and Y have the same finite-dimensional distributions (FDDs), they are said to be versions of each other.

(I suspect there is a compactness-type result somewhere, to the effect that having the same FDDs implies having the same infinite-dimensional distributions too)

---

Someone said the style of the lecture was very "French", in the sense that he only gave us a bunch of ideas, little detail, no proofs. I'm totally ok with that, since proofs are best done in the privacy of your own home. The issue, of course, is making sure that one's derivation skills don't fall hopelessly behind. Hopefully the TA will put us through exercises.

The mathematical level seems to be uncomfortably high for most statisticians. A few people were lost when he defined consistency in terms of a "pushforward" (myself included).

### spam in 2012

10:08 pm
I was just asking someone to not publish my email address inside a plain mailto link, lest spammers harvest my email. I asked that he use instead (a) ReCaptcha (b) an image (c) a JavaScript trick, or (d) a less important email address. He didn't want to bother, so I argued that this is an important email address for me, so that I want to keep the signal-to-noise ratio high; and that once spammers have your address, there's no going back.

Spam filtering is not perfect, and I occasionally see false negatives in my Inbox, and false positives in my Spam folder (when I feel brave enough to look through the junk). I always thought that the problem of spam would get worse and worse, given the economic incentives involved, but this doesn't seem to have happened: probabilistic spam filtering does indeed seem to have had a lasting impact; and I'm sure that we also have people who fight this battle at a higher level, occasionally blocking sections of the Internet that send a lot of spam.

My email experience has been quite reasonable in recent years, but I'm not sure how much of this is due to my own hygiene, and how much is due to good spam filtering.

Am I still justified in asking people to not expose my email in mailto links?

### Zeo sleep manager

04:21 pm
(cross-posted from G+)

So I started using a Zeo Sleep Manager last week. It seems to work really well, and I was able to make some pretty graphs with it... BUT:
* it allows naps to be overwritten if you don't upload the data right away
* it's hard to export your data without uploading it to MyZeo. The ZeoDecoderViewer app isn't able to read my DAT file. Maybe I should try the Zeo Raw Data Library.

My impression is that they cared about open data geeks up until early 2011, but now they are a mainstream product...

### liquid "r" in the Zona Sur of La Paz, and the reach of American influence

06:35 pm
Yesterday, my girlfriend showed me this video: Boliviano Fashions featuring two Bolivian male characters in their early 20s, with a boastful upper-class attitude: one from La Paz, one from Santa Cruz; making slightly exaggerated accents. The La Paz one (known as a "jailón"; or a guido if we presume he's a wannabe) has a liquid R, roughly ɻ, for "rr" or initial R (i.e. like the stereotypical American "r"):

"perro" = "peɻo", "ratón" = "ɻaton".

... which I hypothesized came from Indigenous languages (which I believe is the origin of the liquid R in Brazil's caipira dialect), but according to my girlfriend this R came from the Americans! Apparently, the American influence on the upper classes of La Paz is much bigger than I imagined.

The fact that many kids who go to the American School of La Paz would speak to each other in English during their off-time and celebrate Thanksgiving makes this theory more plausible. This, of course, makes them the target of other young people, especially those with anti-American sentiment, who see this behavior as pretentious.

In Amsterdam I had a housemate from Colombia who was struggling because he hadn't learned English: when he was younger, he didn't like the idea of being "one of those English speakers", because of its symbolic value. I think this is a pretty common mistake that teens make in Latin America.

### public service announcement - Oxy and sun damage

03:39 am
Over the last two years, my face has become rather freckly, more than anyone in my family. My first dramatic realization of this was during August 2010. I suspect this is due to many years of using anti-acne product Oxy (46% alcohol + 2% salicylic acid) making my skin prone to sun damage. My little patches of red hair during the last few summers are probably due to a similar mechanism.

<< You're right that exposing skin treated by salicylic acid to the sun will absolutely increase the likelihood of sun damage to your skin and a sunscreen should be used. I would imagine that the specific SPF you need will depend on things like the amount of sun exposure you'll be getting but here's one resource I like that helps consumers find sunscreens that really protect their skin rather than damage it: the Environmental Working Group's Sunscreen Guide at http://www.ewg.org/2010sunscreen/. >>

The instructions should probably tell you to wash your face with water after use (they don't!). Now I'm trying to use Oxy less often... and sunscreen more often (during the summer).

from the FDA website:
<< according to CIR Director Alan Andersen, products containing salicylic acid should either contain a sunscreen or bear directions advising consumers to use other sun protection. In order to comply with the CIR recommendations, cosmetic manufacturers should test their products to determine whether or not they cause an increase in sensitivity to the harmful ultraviolet radiation in sunlight. >>

... apparently this is not a regulation yet, or not one that is enforced.

http://truthinaging.com/ingredients/salicylic-acid
<< Salicylic Acid is also known to increase the skin’s sun sensitivity by 50%. All exfoliants that remove layers of the dermis expose the skin to more UV rays, therefore increasing UV radiation. >>

gusl
