Given a dataset with 3 variables A, B, C, you observe a value for the correlation between A and B, and another value for the correlation between B and C.
Question: what are the possible values of the correlation between A and C?
Answer (due to en_ki): if your dataset has n points, center it, and consider A, B and C as vectors in ℝ^n. The empirical correlation between two variables is the cosine of the angle between them.
Thus the question becomes easy: what are the possible values of the angle AC, given the angles AB and BC?
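A minimal sketch of where that leads, assuming only the angle addition formulas: if the angle AB is x and the angle BC is y, then R(A,C) lies between cos(x + y) and cos(x - y). Written in terms of the correlations, that is R(A,B) R(B,C) minus or plus sqrt((1 - R(A,B)^2)(1 - R(B,C)^2)). The function name `ac_correlation_range` is made up for illustration.

```python
import math

def ac_correlation_range(r_ab, r_bc):
    """Return (lowest, highest) possible R(A,C) given R(A,B) and R(B,C)."""
    # With cos x = r_ab and cos y = r_bc, the angles x and y lie in [0, pi],
    # so sin x and sin y are non-negative square roots.
    spread = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    # cos(x + y) = cos x cos y - sin x sin y, cos(x - y) = cos x cos y + sin x sin y
    return r_ab * r_bc - spread, r_ab * r_bc + spread

low, high = ac_correlation_range(0.95, 0.95)
print(round(low, 3), round(high, 3))  # prints 0.805 1.0
```

For the 0.95 example from the original post, R(A,C) therefore cannot drop below about 0.805. Every value in the interval is reachable only when the centered vectors have three dimensions to move in, which needs n >= 4; for n = 3 they lie in a plane and only the two endpoints occur.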
---
Original post, for the record:
Is there a sort of triangle inequality for correlations? I'd like to get a lower bound on R(A,C) given R(A,B) and R(B,C). Imagine the latter two as being close to 0.95: is it possible for R(A,C) to be smaller than 0.5? I think not, but don't have a proof.
Let me explore.
The simple estimator of the covariance from a sample (the population covariance is its limit as the sample size goes to infinity):
Cov(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
Correlations don't change if we rescale each variable so that its variance is 1, and the correlation is then just the covariance:
R(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
R(B,C) = 1/n (SUM_i (B_i - mu_B) (C_i - mu_C))
R(A,C) = 1/n (SUM_i (A_i - mu_A) (C_i - mu_C))
Correlations don't change if we recenter so that the means are 0:
R(A,B) = 1/n (SUM_i (A_i B_i))
R(B,C) = 1/n (SUM_i (B_i C_i))
R(A,C) = 1/n (SUM_i (A_i C_i))
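As a sanity check on these simplifications, here is a small sketch in plain Python. It computes R as 1/n SUM_i (A_i B_i) on standardized variables and compares it with the cosine of the angle between the centered vectors. The helper names and the sample data are made up for illustration.

```python
import math

def standardize(xs):
    """Center to mean 0 and rescale to variance 1 (1/n convention)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sd for x in xs]

def correlation(xs, ys):
    """R(X,Y) = 1/n SUM_i (X_i Y_i), computed on the standardized variables."""
    a, b = standardize(xs), standardize(ys)
    return sum(p * q for p, q in zip(a, b)) / len(a)

def cosine_of_centered(xs, ys):
    """Cosine of the angle between the centered data vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = [x - mx for x in xs]
    b = [y - my for y in ys]
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

# Arbitrary illustrative data, not from any real dataset.
A = [1.0, 2.0, 4.0, 7.0]
B = [2.0, 1.0, 5.0, 6.0]
print(correlation(A, B), cosine_of_centered(A, B))
```

The two printed numbers should agree up to floating-point rounding.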
So how could I prove any sort of triangle inequality here?
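Let n = 1: after centering, every value is 0, so all 3 covariances are 0 (and the correlations are degenerate, since the variances are 0 as well).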
Let n = 2, then
R(A,B) = 1/2 (a1b1 + a2b2)
R(B,C) = 1/2 (b1c1 + b2c2)
R(A,C) = 1/2 (a1c1 + a2c2)
But since the means are 0:
a2 = -a1
b2 = -b1
c2 = -c1
So:
R(A,B) = 1/2 (a1b1 + a1b1) = a1b1
R(B,C) = 1/2 (b1c1 + b1c1) = b1c1
R(A,C) = 1/2 (a1c1 + a1c1) = a1c1
Since the variance is 1, each of a1, b1, c1 is +1 or -1, so < R(A,B) , R(B,C) , R(A,C) > is in {1, -1}^3, i.e. each possible assignment is a corner of the cube. The constraint rules out some corners.
e.g.:
R(A,B) = +1
R(B,C) = +1
-----------------------------------------
a1=1, b1=1, c1=1 \/ a1=-1, b1=-1, c1=-1
----------------------------------------
R(A,C) = +1
But this is hardly a triangle inequality.
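The n = 2 case is small enough to enumerate. The sketch below lists every triple < R(A,B), R(B,C), R(A,C) > that a choice of a1, b1, c1 in {1, -1} can produce; the variable name `reachable` is just for illustration.

```python
from itertools import product

reachable = set()
for a1, b1, c1 in product([1, -1], repeat=3):
    # For n = 2 with mean 0 and variance 1: R(A,B) = a1b1, and so on.
    reachable.add((a1 * b1, b1 * c1, a1 * c1))

print(sorted(reachable))
# prints [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
```

Only 4 of the 8 corners survive, and in each of them R(A,C) = R(A,B) R(B,C).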
Let n = 3:
R(A,B) = 1/3 (a1b1 + a2b2 + a3b3)
R(B,C) = 1/3 (b1c1 + b2c2 + b3c3)
R(A,C) = 1/3 (a1c1 + a2c2 + a3c3)
Since the means are zero:
a3 = -a2-a1
b3 = -b2-b1
c3 = -c2-c1
So:
R(A,B) = 1/3 (a1b1 + a2b2 + (-a2-a1)(-b2-b1)) = 1/3 (a1b1 + a2b2 + a2b2 + a1b1 + a1b2 + a2b1) = 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)
Similarly:
R(B,C) = 1/3 (2 b1c1 + 2 b2c2 + b1c2 + b2c1)
R(A,C) = 1/3 (2 a1c1 + 2 a2c2 + a1c2 + a2c1)
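The expansion is easy to get wrong by hand, so here is a quick numerical check, assuming nothing beyond the zero-mean substitution a3 = -a2-a1, b3 = -b2-b1. The function names `direct` and `expanded` are invented for this sketch.

```python
import random

def direct(a1, a2, b1, b2):
    """1/3 (a1b1 + a2b2 + a3b3) with a3, b3 fixed by the zero-mean condition."""
    a3, b3 = -a2 - a1, -b2 - b1
    return (a1 * b1 + a2 * b2 + a3 * b3) / 3

def expanded(a1, a2, b1, b2):
    """The expanded form: 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)."""
    return (2 * a1 * b1 + 2 * a2 * b2 + a1 * b2 + a2 * b1) / 3

rng = random.Random(0)  # fixed seed so the check is repeatable
max_gap = 0.0
for _ in range(1000):
    a1, a2, b1, b2 = (rng.uniform(-2, 2) for _ in range(4))
    gap = abs(direct(a1, a2, b1, b2) - expanded(a1, a2, b1, b2))
    max_gap = max(max_gap, gap)

print(max_gap)  # should be at floating-point rounding level
```

The identity holds for any zero-mean values, whether or not the variances have been rescaled to 1.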
Can we prove any interesting bounds algebraically from the above?
Here's a constraint on each variable, since the variance is 1. With the 1/n normalization, 1/3 (a1^2 + a2^2 + a3^2) = 1, i.e. a1^2 + a2^2 + a3^2 = 3, so it follows that:
a1^2 + a2^2 + (-a2-a1)^2 = 3
a1^2 + a2^2 + a1^2 + a2^2 + 2a1a2 = 3
2 a1^2 + 2 a2^2 + 2a1a2 = 3
Coming up with a hypothesis, proving it for n=3, and the induction step are left as an exercise for the reader (and the writer too). Possibly a challenging one.