Given a dataset with 3 variables A, B, C, you observe a value for the correlation between A and B, and another value for the correlation between B and C.
Question: what are the possible values of the correlation between A and C?
Answer (due to en_ki): if your dataset has n points, center it, and consider A, B and C as vectors in ℝ^n. The empirical correlation between two variables is the cosine of the angle between them.
Thus the question becomes easy: what are the possible values of the angle AC, given the angles AB and BC?
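To make the "correlation is a cosine" claim concrete, here is a minimal sketch in plain Python. The helper names (centered, cosine, pearson) and the sample data are made up for illustration; nothing here comes from the original discussion.

```python
import math

def centered(xs):
    """Subtract the mean so the values sum to zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Textbook correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical sample data, chosen arbitrarily.
A = [1.0, 2.0, 4.0, 7.0]
B = [2.0, 1.0, 5.0, 6.0]

r = pearson(A, B)
c = cosine(centered(A), centered(B))
print(r, c)  # the two numbers should agree up to rounding
```

The 1/n factors cancel between numerator and denominator, which is why the cosine of the centered vectors matches the correlation.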
---
Original post, for the record:
Is there a sort of triangle inequality for correlations? I'd like to get a lower bound on R(A,C) given R(A,B) and R(B,C). Imagine the latter two as being close to 0.95: is it possible for R(A,C) to be smaller than 0.5? I think not, but don't have a proof.
Let me explore.
The simple estimator for the sample covariance (the population covariance is its limit as the sample size goes to infinity):
Cov(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
Correlations don't change if we rescale each variable to have variance 1, in which case the correlation equals the covariance:
R(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
R(B,C) = 1/n (SUM_i (B_i - mu_B) (C_i - mu_C))
R(A,C) = 1/n (SUM_i (A_i - mu_A) (C_i - mu_C))
Correlations don't change if we recenter so that the means are 0:
R(A,B) = 1/n (SUM_i (A_i B_i))
R(B,C) = 1/n (SUM_i (B_i C_i))
R(A,C) = 1/n (SUM_i (A_i C_i))
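A small sketch of the two normalization steps above, assuming the 1/n (population) variance. The function names standardize and r, and the data, are hypothetical. It checks that shifting and rescaling a variable leaves the correlation unchanged.

```python
import math

def standardize(xs):
    """Recenter to mean 0 and rescale to variance 1 (1/n normalization)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sd for x in xs]

def r(xs, ys):
    """R = 1/n SUM_i x_i y_i, valid only for already standardized inputs."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical sample data.
A = [1.0, 2.0, 4.0, 7.0]
B = [2.0, 1.0, 5.0, 6.0]

zA, zB = standardize(A), standardize(B)
r_ab = r(zA, zB)

# A shifted and (positively) rescaled copy of A gives the same correlation with B.
A2 = [3.0 * x + 10.0 for x in A]
r_ab2 = r(standardize(A2), zB)
print(r_ab, r_ab2)
```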
So how could I prove any sort of triangle inequality here?
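Let n = 1: after centering, every value is 0, so all 3 covariances are 0 (strictly speaking the correlations are undefined, since the variances are 0 as well).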
Let n = 2, then
R(A,B) = 1/2 (a1b1 + a2b2)
R(B,C) = 1/2 (b1c1 + b2c2)
R(A,C) = 1/2 (a1c1 + a2c2)
But since the means are 0:
a2 = -a1
b2 = -b1
c2 = -c1
So:
R(A,B) = 1/2 (a1b1 + a1b1) = a1b1
R(B,C) = 1/2 (b1c1 + b1c1) = b1c1
R(A,C) = 1/2 (a1c1 + a1c1) = a1c1
Since the variance is 1, each of a1, b1, c1 is +1 or -1, so < R(A,B) , R(B,C) , R(A,C) > is in {1, -1}^3, i.e. each possible assignment is a corner of the cube. The constraints rule out some possibilities.
e.g.:
R(A,B) = +1
R(B,C) = +1
-----------------------------------------
a1=1, b1=1, c1=1 \/ a1=-1, b1=-1, c1=-1
----------------------------------------
R(A,C) = +1
But this is hardly a triangle inequality.
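The n = 2 case is small enough to enumerate by brute force. A sketch (the variable names are mine): loop over all sign choices for a1, b1, c1 and collect the correlation triples that actually occur.

```python
from itertools import product

rows = []
for a1, b1, c1 in product([1, -1], repeat=3):
    # With n = 2, mean 0 and variance 1, each correlation is just a product of signs.
    r_ab, r_bc, r_ac = a1 * b1, b1 * c1, a1 * c1
    rows.append((r_ab, r_bc, r_ac))

reachable = sorted(set(rows))
for triple in reachable:
    print(triple)
```

Only 4 of the 8 corners of the cube show up, and in each of them R(A,C) = R(A,B) * R(B,C), because b1^2 = 1.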
Let n = 3:
R(A,B) = 1/3 (a1b1 + a2b2 + a3b3)
R(B,C) = 1/3 (b1c1 + b2c2 + b3c3)
R(A,C) = 1/3 (a1c1 + a2c2 + a3c3)
Since the means are zero:
a3 = -a2-a1
b3 = -b2-b1
c3 = -c2-c1
So:
R(A,B) = 1/3 (a1b1 + a2b2 + (-a2-a1)(-b2-b1)) = 1/3 (a1b1 + a2b2 + a2b2 + a1b1 + a1b2 + a2b1) = 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)
Similarly:
R(B,C) = 1/3 (2 b1c1 + 2 b2c2 + b1c2 + b2c1)
R(A,C) = 1/3 (2 a1c1 + 2 a2c2 + a1c2 + a2c1)
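As a sanity check on the algebra for n = 3, here is a quick numerical comparison of the direct formula against the expanded one. It only tests the algebraic identity on random inputs (the unit-variance constraint is not imposed here), and the function names are made up.

```python
import random

def r_direct(x, y):
    """1/3 (x1 y1 + x2 y2 + x3 y3), with x3 and y3 forced by the zero-mean constraint."""
    x3 = -x[0] - x[1]
    y3 = -y[0] - y[1]
    return (x[0] * y[0] + x[1] * y[1] + x3 * y3) / 3

def r_expanded(x, y):
    """1/3 (2 x1 y1 + 2 x2 y2 + x1 y2 + x2 y1)."""
    return (2 * x[0] * y[0] + 2 * x[1] * y[1] + x[0] * y[1] + x[1] * y[0]) / 3

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    a = (random.uniform(-1, 1), random.uniform(-1, 1))
    b = (random.uniform(-1, 1), random.uniform(-1, 1))
    max_gap = max(max_gap, abs(r_direct(a, b) - r_expanded(a, b)))

print(max_gap)  # should be at rounding-error level
```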
Can we prove any interesting bounds algebraically from the above?
Here's a constraint on each variable, since the variance is 1. With the 1/n normalization used above, 1/3 (a1^2 + a2^2 + a3^2) = 1, so it follows that:
a1^2 + a2^2 + (-a2-a1)^2 = 3
a1^2 + a2^2 + a1^2 + a2^2 + 2a1a2 = 3
2 a1^2 + 2 a2^2 + 2a1a2 = 3
Coming up with a hypothesis, proving it for n=3, and the induction step are left as an exercise for the reader (and the writer too). Possibly a challenging one.
(no subject)
Date: 2007-10-28 03:00 pm (UTC)
Supposing the mean to be zero and the variance one, and taking A, B, C as vectors in Euclidean space,
R(A,B) = A · B
so the property you are looking for will be a property of the dot product.
Further, this is cos(AB), where AB is the angle between the vectors A and B.
Fixing AB and BC, AC is maximized when A, B, and C are coplanar with A and C on opposite sides of B, so that AC = AB + BC. Take all the correlations to be positive; then, since cosine is decreasing on [0, pi], R(A,C) is minimized when AC is maximized. So by the cosine addition formula
R(A,C) >= R(A,B) R(B,C) - sqrt([1 - R(A,B)^2][1 - R(B,C)^2])
If you can make that pretty or enlightening, be my guest. :) But in the particular case of R(A,B) = R(B,C) = 0.95, you get R(A,C) >= 0.805.
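The bound is easy to evaluate numerically. A minimal sketch, with a made-up function name (corr_bounds). The lower bound is the formula from the comment above; the upper bound is my addition and is not in the comment: it comes from the mirror-image case where A and C are on the same side of B, so that AC = |AB - BC|.

```python
import math

def corr_bounds(r_ab, r_bc):
    """Range of R(A,C) compatible with the given R(A,B) and R(B,C)."""
    base = r_ab * r_bc
    slack = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    # Lower bound: cos(AB + BC). Upper bound (an addition, see above): cos(AB - BC).
    return base - slack, base + slack

lo, hi = corr_bounds(0.95, 0.95)
print(round(lo, 3), round(hi, 3))
```

For R(A,B) = R(B,C) = 0.95 the lower end works out to 0.9025 - 0.0975 = 0.805, matching the figure quoted above.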
(no subject)
Date: 2007-10-28 03:06 pm (UTC)
(no subject)
Date: 2007-10-28 03:10 pm (UTC)
(no subject)
Date: 2007-10-28 11:43 pm (UTC)
This is unclear to me. Can you restate it?
(no subject)
Date: 2007-10-28 11:58 pm (UTC)
(no subject)
Date: 2007-10-29 12:01 am (UTC)
(no subject)
Date: 2007-10-29 12:04 am (UTC)
(no subject)
Date: 2007-10-29 12:09 am (UTC)
Two vectors define a plane, no matter how big the space.
C may or may not be in the plane defined by A and B. If it is, and A and C lie on opposite sides of B, the angles add: AC = AB + BC; otherwise, AC < AB + BC. (No proof on tap, but it should be pretty simple linear algebra.) So AC <= AB + BC no matter what.
(no subject)
Date: 2007-10-29 12:10 am (UTC)
(no subject)
Date: 2007-10-29 12:16 am (UTC)
For simplicity, I am imagining this in 3 dimensions. We have 3 vectors, A, B, C.
Let A be a vector.
We fix a B, so that AB is the given angle. Now there's a triangle.
Now, there are many places to put C so that BC is the given angle. For unit vectors, these places form a circle around B.
As you said, AC is maximized when C is placed in the plane described by A and B, and on the opposite side of B from A. This procedure lets us achieve the bound, i.e. construct an example.
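The construction above can be sketched numerically in 3 dimensions. The coordinates are my own choice, not from the thread: B is the z-axis, A sits in the xz-plane, and C runs around the circle of directions at angle BC from B, parametrized by a hypothetical angle phi.

```python
import math

def angle_ac(ab, bc, phi):
    """Angle between A and C when C is at position phi on the circle around B."""
    # B is the z-axis; A lies in the xz-plane at angle ab from B.
    a = (math.sin(ab), 0.0, math.cos(ab))
    # C lies at angle bc from B; phi = 0 puts it on the same side as A.
    c = (math.sin(bc) * math.cos(phi), math.sin(bc) * math.sin(phi), math.cos(bc))
    dot = sum(x * y for x, y in zip(a, c))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

ab = bc = math.acos(0.95)
steps = 360
phis = [2 * math.pi * k / steps for k in range(steps)]
angles = [angle_ac(ab, bc, phi) for phi in phis]

best = max(range(steps), key=lambda k: angles[k])
max_angle = angles[best]
min_corr = math.cos(max_angle)
print(best, max_angle, ab + bc, min_corr)
```

The largest angle AC occurs at phi = 180 degrees, i.e. with C in the plane of A and B on the far side of B, where AC = AB + BC and the correlation bottoms out at about 0.805.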
(no subject)
Date: 2007-10-29 12:21 am (UTC)
Now, using that formula with R(A,B) = R(B,C) = r, how high can r be while R(A,C) is 0?
(no subject)
Date: 2007-10-29 12:24 am (UTC)
0 = r^2 - (1-r^2)
0 = 2r^2 - 1
r^2 = 1/2
r = 1/sqrt(2)
(no subject)
Date: 2007-10-29 12:27 am (UTC)
A. "45 degrees."
(no subject)
Date: 2007-11-02 06:36 pm (UTC)
C(b,c) + 1 >= |C(a,b)-C(a,c)|
(http://en.wikipedia.org/wiki/Bell's_Theorem)
although they're probably unrelated.