gusl | triangle inequality for correlation distance

Given a dataset with three variables A, B and C, suppose you observe the correlation between A and B, and the correlation between B and C.

Question: what are the possible values of the correlation between A and C?

Answer (due to en_ki): if your dataset has n points, center each variable (subtract its mean) and consider A, B and C as vectors in ℜⁿ. The empirical correlation between two variables is then the cosine of the angle between the corresponding vectors.
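Here is a quick numerical check of en_ki's observation, sketched in Python with NumPy. The random data and the helper name cosine_of_centered are illustrative and do not come from the post. Pearson's r from np.corrcoef should match the cosine of the angle between the mean-centered vectors.

```python
import numpy as np


def cosine_of_centered(x, y):
    """Cosine of the angle between x and y after subtracting each one's mean."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = 0.8 * a + rng.normal(size=50)

pearson = float(np.corrcoef(a, b)[0, 1])
print(pearson, cosine_of_centered(a, b))  # the two values agree up to rounding
```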

Thus the question becomes easy: what are the possible values of the angle AC, given the angles AB and BC?
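The answer can be spelled out as follows. Write R = cos(theta), with every angle in [0, pi]. The three angles between unit vectors satisfy the spherical triangle inequalities |theta_AB - theta_BC| <= theta_AC <= min(theta_AB + theta_BC, 2 pi - theta_AB - theta_BC). Since cos is decreasing on [0, pi] and cos(2 pi - x) = cos(x), this gives

cos(theta_AB + theta_BC) <= R(A,C) <= cos(theta_AB - theta_BC)

Expanding with sin(theta) = sqrt(1 - R^2) gives the equivalent form:

R(A,B) R(B,C) - sqrt((1 - R(A,B)^2)(1 - R(B,C)^2)) <= R(A,C) <= R(A,B) R(B,C) + sqrt((1 - R(A,B)^2)(1 - R(B,C)^2))

The following Python sketch computes both forms. The function names are hypothetical.

```python
import math


def correlation_bounds(r_ab, r_bc):
    """Range of R(A,C) compatible with R(A,B) = r_ab and R(B,C) = r_bc.

    Converts correlations to angles, then applies
    cos(theta_AB + theta_BC) <= R(A,C) <= cos(theta_AB - theta_BC).
    cos(x) == cos(2*pi - x), so the sum also covers theta_AB + theta_BC > pi.
    """
    theta_ab = math.acos(r_ab)
    theta_bc = math.acos(r_bc)
    return math.cos(theta_ab + theta_bc), math.cos(theta_ab - theta_bc)


def correlation_bounds_closed_form(r_ab, r_bc):
    """Same range without trig: r_ab*r_bc -/+ sqrt((1 - r_ab^2)(1 - r_bc^2))."""
    s = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    return r_ab * r_bc - s, r_ab * r_bc + s


print(correlation_bounds(0.95, 0.95))
print(correlation_bounds_closed_form(0.95, 0.95))
```

Take the example from the original post, R(A,B) = R(B,C) = 0.95. Both calls give roughly (0.805, 1.0), since 0.95^2 - (1 - 0.95^2) = 0.805. So R(A,C) cannot drop below 0.5.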

---

Original post, for the record:

Is there a sort of triangle inequality for correlations? I'd like to get a lower bound on R(A,C) given R(A,B) and R(B,C). Imagine the latter two as being close to 0.95: is it possible for R(A,C) to be smaller than 0.5? I think not, but I don't have a proof.

Let me explore.

The simple estimator for the sample covariance (it converges to the population covariance as the sample size goes to infinity):

Cov(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
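The same estimator can be written as a short Python/NumPy sketch. The function sample_cov and the toy data are illustrative. Note that np.cov divides by n - 1 unless bias=True is passed, so the 1/n convention used here has to be requested explicitly.

```python
import numpy as np


def sample_cov(a, b):
    """Simple 1/n estimator: mean of the products of deviations from the means."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - a.mean()) * (b - b.mean())))


a = [1.0, 2.0, 4.0, 7.0]
b = [2.0, 1.0, 5.0, 6.0]

# np.cov divides by n - 1 by default; bias=True makes it divide by n instead.
print(sample_cov(a, b), np.cov(a, b, bias=True)[0, 1])  # both 4.25
```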

Correlations don't change if we rescale each variable to have variance 1, and then the covariance is the correlation:

R(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))

R(B,C) = 1/n (SUM_i (B_i - mu_B) (C_i - mu_C))

R(A,C) = 1/n (SUM_i (A_i - mu_A) (C_i - mu_C))

Correlations don't change if we recenter so that the means are 0:

R(A,B) = 1/n (SUM_i (A_i B_i))

R(B,C) = 1/n (SUM_i (B_i C_i))

R(A,C) = 1/n (SUM_i (A_i C_i))
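The following sketch combines the two normalization steps. Here standardize is a hypothetical helper and the data are random. Once each variable has mean 0 and (1/n) variance 1, the plain average of products should equal Pearson's correlation.

```python
import numpy as np


def standardize(x):
    """Center x to mean 0 and rescale it to (1/n) variance 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # np.std divides by n by default (ddof=0)


rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = a + rng.normal(size=200)

za, zb = standardize(a), standardize(b)
r_from_products = float(np.mean(za * zb))  # R(A,B) = 1/n SUM_i (A_i B_i)
print(r_from_products, np.corrcoef(a, b)[0, 1])  # agree up to rounding
```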

So how could I prove any sort of triangle inequality here?

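Let n = 1: after centering, every value is 0, so the variances are 0 and the correlations are undefined (every sum above is 0, and there is no way to rescale to variance 1).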

Let n = 2, then
R(A,B) = 1/2 (a1b1 + a2b2)
R(B,C) = 1/2 (b1c1 + b2c2)
R(A,C) = 1/2 (a1c1 + a2c2)

But since the means are 0:
a2 = -a1
b2 = -b1
c2 = -c1

So:
R(A,B) = 1/2 (a1b1 + a1b1) = a1b1
R(B,C) = 1/2 (b1c1 + b1c1) = b1c1
R(A,C) = 1/2 (a1c1 + a1c1) = a1c1

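Since the variance is 1, a1^2 = b1^2 = c1^2 = 1, so each of a1, b1, c1 is +1 or -1 and < R(A,B) , R(B,C) , R(A,C) > is in {1, -1}^3, i.e. each possible assignment is a corner of the cube. Not every corner is reachable, though: R(A,B) R(B,C) = a1 b1^2 c1 = a1c1 = R(A,C), which rules out the four corners with an odd number of -1s.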
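The n = 2 case is small enough to enumerate exhaustively. This sketch, in plain Python with no dependencies, lists which corners of {1, -1}^3 the triple < R(A,B), R(B,C), R(A,C) > can actually reach. The enumeration finds four reachable corners, and each satisfies R(A,C) = R(A,B) R(B,C).

```python
from itertools import product

# n = 2, centered, (1/n) variance 1: each variable is (x1, -x1) with x1 = +1 or -1.
reachable = set()
for a1, b1, c1 in product([1, -1], repeat=3):
    r_ab = a1 * b1  # 1/2 (a1 b1 + (-a1)(-b1))
    r_bc = b1 * c1
    r_ac = a1 * c1
    reachable.add((r_ab, r_bc, r_ac))

corners = set(product([1, -1], repeat=3))
print(sorted(reachable))            # corners that occur
print(sorted(corners - reachable))  # corners that never occur
```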

e.g.:
R(A,B) = +1
R(B,C) = +1
-----------------------------------------
a1=1, b1=1, c1=1 \/ a1=-1, b1=-1, c1=-1
----------------------------------------
R(A,C) = +1

But this is hardly a triangle inequality.

Let n = 3:

R(A,B) = 1/3 (a1b1 + a2b2 + a3b3)
R(B,C) = 1/3 (b1c1 + b2c2 + b3c3)
R(A,C) = 1/3 (a1c1 + a2c2 + a3c3)

Since the means are zero:
a3 = -a2-a1
b3 = -b2-b1
c3 = -c2-c1

So:
R(A,B) = 1/3 (a1b1 + a2b2 + (-a2-a1)(-b2-b1)) = 1/3 (a1b1 + a2b2 + a2b2 + a1b1 + a1b2 + a2b1) = 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)

Similarly:
R(B,C) = 1/3 (2 b1c1 + 2 b2c2 + b1c2 + b2c1)
R(A,C) = 1/3 (2 a1c1 + 2 a2c2 + a1c2 + a2c1)
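This is a numerical spot-check of the n = 3 expansion, sketched in plain Python. The function names are illustrative. The identity is purely algebraic, so it holds for any a1, a2, b1, b2, whether or not the variables have been rescaled to variance 1.

```python
import random


def r_direct(x, y):
    """1/3 SUM_i x_i y_i for n = 3 centered data, using x3 = -x1 - x2."""
    x3, y3 = -x[0] - x[1], -y[0] - y[1]
    return (x[0] * y[0] + x[1] * y[1] + x3 * y3) / 3


def r_expanded(x, y):
    """The expanded form 1/3 (2 x1y1 + 2 x2y2 + x1y2 + x2y1)."""
    return (2 * x[0] * y[0] + 2 * x[1] * y[1] + x[0] * y[1] + x[1] * y[0]) / 3


random.seed(0)
for _ in range(1000):
    a = (random.uniform(-2, 2), random.uniform(-2, 2))
    b = (random.uniform(-2, 2), random.uniform(-2, 2))
    assert abs(r_direct(a, b) - r_expanded(a, b)) < 1e-12
print("expansion agrees")
```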

Can we prove any interesting bounds algebraically from the above?

Each variable is also constrained by having variance 1. With the 1/n convention, (1/3)(a1^2 + a2^2 + a3^2) = 1, i.e. a1^2 + a2^2 + a3^2 = 3, so:
a1^2 + a2^2 + (-a2-a1)^2 = 3
a1^2 + a2^2 + a1^2 + a2^2 + 2a1a2 = 3
2 a1^2 + 2 a2^2 + 2a1a2 = 3
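This sketch checks the variance constraint for n = 3 numerically, using NumPy on random data. With the 1/n convention, variance 1 means the squares sum to n = 3. Substituting a3 = -a1 - a2 should leave that sum unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=3)
a = (x - x.mean()) / x.std()  # n = 3, mean 0, (1/n) variance 1
a1, a2, a3 = a

print(a1**2 + a2**2 + a3**2)                # n * variance, i.e. about 3
print(2 * a1**2 + 2 * a2**2 + 2 * a1 * a2)  # same sum after substituting a3 = -a1 - a2
```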

Coming up with a hypothesis, proving it for n=3, and the induction step are left as an exercise for the reader (and the writer too). Possibly a challenging one.
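A Monte Carlo check is one way to probe a candidate hypothesis before attempting a proof. This sketch tests, on many small random datasets, the bound that follows from the cosine picture in en_ki's answer: R(A,B) R(B,C) - sqrt((1 - R(A,B)^2)(1 - R(B,C)^2)) <= R(A,C) <= R(A,B) R(B,C) + sqrt((1 - R(A,B)^2)(1 - R(B,C)^2)). The helper bounds, the mixing scheme and the tolerance are assumptions made for illustration.

```python
import numpy as np


def bounds(r_ab, r_bc):
    """Hypothesised range for R(A,C): r_ab*r_bc -/+ sqrt((1 - r_ab^2)(1 - r_bc^2))."""
    s = np.sqrt((1 - r_ab**2) * (1 - r_bc**2))
    return r_ab * r_bc - s, r_ab * r_bc + s


rng = np.random.default_rng(3)
violations = 0
for _ in range(2000):
    n = rng.integers(3, 20)            # number of data points, 3..19
    data = rng.normal(size=(3, n))     # rows are A, B, C
    data[1] += rng.normal() * data[0]  # random linear mixing so the correlations vary
    data[2] += rng.normal() * data[1]
    r = np.corrcoef(data)              # 3 x 3 correlation matrix
    lo, hi = bounds(r[0, 1], r[1, 2])
    if not (lo - 1e-7 <= r[0, 2] <= hi + 1e-7):
        violations += 1
print("violations:", violations)
```

The bound should hold for every dataset, so the expected count is 0. For n = 3 the centered vectors lie in a 2-dimensional plane, so R(A,C) lands on one of the two endpoints. Values strictly between them need n >= 4.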


Gustavo Lacerda


February 2020
