[personal profile] gusl
Given a dataset with 3 variables A, B, C, you observe a value for the correlation between A and B, and another value for the correlation between B and C.

Question: what are the possible values of the correlation between A and C?

Answer (due to [livejournal.com profile] en_ki): if your dataset has n points, center it, and consider A, B and C as vectors in ℝ^n. The empirical correlation between two variables is the cosine of the angle between them.

Thus the question becomes easy: what are the possible values of the angle AC, given the angles AB and BC?
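As a rough sketch of that answer (the function name `correlation_range` is illustrative, not from the thread): convert each correlation to an angle with acos, then take the cosine of the largest and smallest possible angle AC. These are AB + BC (A and C on opposite sides of B in one plane) and |AB - BC| (same side). Values in between come from swinging C out of that plane, which assumes the centered data has at least three free dimensions, i.e. n >= 4.

```python
import math

def correlation_range(r_ab, r_bc):
    """Interval of R(A,C) values compatible with R(A,B) and R(B,C).

    Each correlation is the cosine of an angle in [0, pi]. If AB + BC exceeds
    pi, the coplanar "opposite sides" angle is 2*pi - (AB + BC) instead, but
    that has the same cosine, so cos(AB + BC) is still the lower end.
    """
    ab = math.acos(r_ab)
    bc = math.acos(r_bc)
    lo = math.cos(ab + bc)  # A and C on opposite sides of B
    hi = math.cos(ab - bc)  # A and C on the same side of B
    return lo, hi
```

For R(A,B) = R(B,C) = 0.95 this gives roughly (0.805, 1.0); for two zero correlations it gives (-1, 1), i.e. anything goes.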


---

Original post, for the record:

Is there a sort of triangle inequality for correlations? I'd like to get a lower bound on R(A,C) given R(A,B) and R(B,C). Imagine the latter two as being close to 0.95: is it possible for R(A,C) to be smaller than 0.5? I think not, but I don't have a proof.

Let me explore.


The (simple estimator for) sample covariance (the population covariance is its limit as the sample size goes to infinity):

Cov(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))
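The estimator above, written out as a small Python function (the name `covariance` is illustrative). It divides by n as in the formula; the common unbiased variant divides by n - 1 instead, and either choice cancels out once we pass to correlations.

```python
def covariance(a, b):
    """Sample covariance with the 1/n normalization: 1/n SUM_i (a_i - mu_a)(b_i - mu_b)."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    return sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

print(covariance([1, 2, 3], [2, 4, 6]))  # 4/3
print(covariance([1, 2, 3], [1, 2, 3]))  # 2/3, the 1/n variance of [1, 2, 3]
```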


Correlations don't change if we rescale each variable to variance 1, and after that rescaling the correlation is just the covariance:

R(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))

R(B,C) = 1/n (SUM_i (B_i - mu_B) (C_i - mu_C))

R(A,C) = 1/n (SUM_i (A_i - mu_A) (C_i - mu_C))


Correlations don't change if we recenter so that the means are 0:

R(A,B) = 1/n (SUM_i (A_i B_i))

R(B,C) = 1/n (SUM_i (B_i C_i))

R(A,C) = 1/n (SUM_i (A_i C_i))
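A sketch checking these two steps numerically (the helper names `standardize` and `correlation` are illustrative): after centering each variable and rescaling it to variance 1 with the same 1/n convention, the correlation is just 1/n SUM_i A_i B_i, and it is unchanged by any positive rescaling or shift of the raw data.

```python
import math

def standardize(xs):
    """Center to mean 0, then rescale to variance 1 (1/n convention)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sd for x in xs]

def correlation(a, b):
    """R(A,B) = 1/n SUM_i A_i B_i after standardizing both variables."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)

a = [1.0, 2.0, 4.0, 7.0]
b = [3.0, 1.0, 4.0, 1.0]
print(correlation(a, b))
# Same value after shifting and positively rescaling both variables
print(correlation([2 * x + 1 for x in a], [0.5 * y - 3 for y in b]))
```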



So how could I prove any sort of triangle inequality here?


Let n = 1: then every centered value is 0, so the variances are 0 and the correlations are undefined (the sums above are all 0).

Let n = 2, then
R(A,B) = 1/2 (a1b1 + a2b2)
R(B,C) = 1/2 (b1c1 + b2c2)
R(A,C) = 1/2 (a1c1 + a2c2)

But since the means are 0:
a2 = -a1
b2 = -b1
c2 = -c1

So:
R(A,B) = 1/2 (a1b1 + a1b1) = a1b1
R(B,C) = 1/2 (b1c1 + b1c1) = b1c1
R(A,C) = 1/2 (a1c1 + a1c1) = a1c1

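Since the variance is 1, a1^2 = b1^2 = c1^2 = 1, so each of a1, b1, c1 is +1 or -1, and < R(A,B) , R(B,C) , R(A,C) > is in {1, -1}^3, i.e. each possible assignment is a corner of the cube. The constraints rule out some corners: since b1^2 = 1, R(A,C) = a1c1 = (a1b1)(b1c1) = R(A,B) R(B,C).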

e.g.:
R(A,B) = +1
R(B,C) = +1
-----------------------------------------
a1=1, b1=1, c1=1 \/ a1=-1, b1=-1, c1=-1
----------------------------------------
R(A,C) = +1
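A short sketch that enumerates all eight sign choices for (a1, b1, c1) in the n = 2 case and lists which corners of the cube are actually reachable (the variable name `reachable` is illustrative):

```python
from itertools import product

# For n = 2 with mean 0 and variance 1, each variable is (x, -x) with x = +1 or -1,
# and R(A,B) = a1*b1, R(B,C) = b1*c1, R(A,C) = a1*c1.
reachable = sorted({(a1 * b1, b1 * c1, a1 * c1)
                    for a1, b1, c1 in product((1, -1), repeat=3)})
for corner in reachable:
    print(corner)
```

Only four of the eight corners appear, exactly those with R(A,C) = R(A,B) R(B,C).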


But this is hardly a triangle inequality.


Let n = 3:

R(A,B) = 1/3 (a1b1 + a2b2 + a3b3)
R(B,C) = 1/3 (b1c1 + b2c2 + b3c3)
R(A,C) = 1/3 (a1c1 + a2c2 + a3c3)

Since the means are zero:
a3 = -a2-a1
b3 = -b2-b1
c3 = -c2-c1

So:
R(A,B) = 1/3 (a1b1 + a2b2 + (-a2-a1)(-b2-b1)) = 1/3 (a1b1 + a2b2 + a2b2 + a1b1 + a1b2 + a2b1) = 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)

Similarly:
R(B,C) = 1/3 (2 b1c1 + 2 b2c2 + b1c2 + b2c1)
R(A,C) = 1/3 (2 a1c1 + 2 a2c2 + a1c2 + a2c1)
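A quick numerical check of the expansion, as a sketch (the function names `direct` and `expanded` are illustrative): compute R(A,B) both from the original three-term sum, with a3 and b3 fixed by the zero-mean constraint, and from the simplified form. The identity is purely algebraic, so the random values need not be standardized.

```python
import random

def direct(a1, a2, b1, b2):
    # 1/3 SUM_i a_i b_i, with the third coordinates fixed by the zero-mean constraint
    a3, b3 = -a1 - a2, -b1 - b2
    return (a1 * b1 + a2 * b2 + a3 * b3) / 3

def expanded(a1, a2, b1, b2):
    # The simplified form 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)
    return (2 * a1 * b1 + 2 * a2 * b2 + a1 * b2 + a2 * b1) / 3

random.seed(0)
for _ in range(1000):
    a1, a2, b1, b2 = (random.uniform(-5, 5) for _ in range(4))
    assert abs(direct(a1, a2, b1, b2) - expanded(a1, a2, b1, b2)) < 1e-9
```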

Can we prove any interesting bounds algebraically from the above?

Here's a constraint on each variable, since the variance is 1: (a1^2 + a2^2 + a3^2)/3 = 1, i.e. a1^2 + a2^2 + a3^2 = 3. Substituting a3 = -a2-a1:
a1^2 + a2^2 + (-a2-a1)^2 = 3
a1^2 + a2^2 + a1^2 + a2^2 + 2a1a2 = 3
2 a1^2 + 2 a2^2 + 2a1a2 = 3
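A sketch that checks this constraint on one concrete standardized variable (the data [2, 7, 3] and the helper `standardize` are arbitrary illustrative choices). With the 1/n convention for variance, the squares sum to n = 3, so 2 a1^2 + 2 a2^2 + 2a1a2 comes out to 3.

```python
import math

def standardize(xs):
    # Center to mean 0, then rescale to variance 1 using the 1/n convention
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sd for x in xs]

a1, a2, a3 = standardize([2.0, 7.0, 3.0])
print(a1 + a2 + a3)                          # close to 0, so a3 = -a1 - a2
print(2 * a1**2 + 2 * a2**2 + 2 * a1 * a2)   # close to 3
```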

Coming up with a hypothesis, proving it for n=3, and the induction step are left as an exercise for the reader (and the writer too). Possibly a challenging one.

(no subject)

Date: 2007-10-28 03:00 pm (UTC)
From: [identity profile] en-ki.livejournal.com
(1) Looking at the other boundary condition, if A and B are uncorrelated and B and C are uncorrelated, then A and C can have arbitrary correlation. So it's not going to be something as simple as the triangle inequality.

(2) Supposing the mean to be zero, rescaling each variable to length one (variance one would give length sqrt(n), which changes no angles), and taking A, B, C as vectors in Euclidean space,

R(A,B) = A · B

so the property you are looking for will be a property of the dot product.

Further, this is cos(AB), where AB is the angle between the vectors A and B.

Fixing AB and BC, AC is maximized when A, B, and C are coplanar with A and C on opposite sides of B, so that AC = AB + BC. Take both correlations to be positive, so AB and BC are under 90 degrees and AB + BC is under 180 degrees; then R(A,C) = cos(AC) is minimized when AC is maximized. So by the cosine addition formula

R(A,C) >= R(A,B) R(B,C) - sqrt([1 - R(A,B)^2][1 - R(B,C)^2])

If you can make that pretty or enlightening, be my guest. :) But in the particular case of R(A,B) = R(B,C) = 0.95, you get R(A,C) >= 0.805.
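Following the construction described in this thread (put A, B, C in one plane with A and C on opposite sides of B), a 3-point dataset hits the 0.805 bound exactly. A sketch; the basis vectors `u`, `v` and the helpers `pearson` and `point_in_plane` are illustrative choices, not from the thread. Both basis vectors have entries summing to 0, so every point built from them is already centered.

```python
import math

def pearson(a, b):
    """Ordinary Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

# Two orthonormal vectors in R^3 whose entries sum to 0 (i.e. already centered)
u = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
v = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]

def point_in_plane(angle):
    """cos(angle) * u + sin(angle) * v: a centered unit vector at that angle from u."""
    return [math.cos(angle) * x + math.sin(angle) * y for x, y in zip(u, v)]

theta = math.acos(0.95)        # angle AB = angle BC
A = point_in_plane(0.0)
B = point_in_plane(theta)
C = point_in_plane(2 * theta)  # C on the far side of B from A, so AC = AB + BC

print(pearson(A, B), pearson(B, C), pearson(A, C))  # about 0.95, 0.95, 0.805
```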

(no subject)

Date: 2007-10-28 03:06 pm (UTC)
From: [identity profile] en-ki.livejournal.com
Now that I think of it, it's probably reasonable to call the sine, sqrt(1 - R(A,B)^2), the "uncorrelation" of A and B: so as long as the product of the uncorrelations of A with B and of B with C is less than the product of their correlations, A and C are positively correlated.

(no subject)

Date: 2007-10-28 03:10 pm (UTC)
From: [identity profile] en-ki.livejournal.com
[Note that my default icon says "wrong" in it, referring to the use of a wide stance. This is probably not the best icon to use in a math discussion, especially since the latter half of my first statement is fairly bogus.]

(no subject)

Date: 2007-10-28 11:43 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
<< The correlation coefficient can also be viewed as the cosine of the angle between the two vectors of samples drawn from the two random variables. >>

This is unclear to me. Can you restate it?

(no subject)

Date: 2007-10-28 11:58 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
What is the vector A? The vector of all samples from A?

(no subject)

Date: 2007-10-29 12:01 am (UTC)
From: [identity profile] en-ki.livejournal.com
Yes, we are treating the n samples from population A as a vector in Euclidean n-space and working with the dot product.

(no subject)

Date: 2007-10-29 12:04 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Do you mean co-hyper-planar? How many dimensions are these vectors in?

(no subject)

Date: 2007-10-29 12:09 am (UTC)
From: [identity profile] en-ki.livejournal.com
These are unit vectors (each variable rescaled to length 1; with variance 1 they would have length sqrt(n) instead, which changes no angles) at the origin (since the mean is 0) in n-space (since there are n samples).

Two vectors define a plane, no matter how big the space.

C may or may not be in the plane defined by A and B. If it is, with A and C on opposite sides of B, the angles add: AC = AB + BC; otherwise AC < AB + BC. (No proof on tap, but it should be pretty simple linear algebra.) So AC <= AB + BC no matter what.
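To check the claim AC <= AB + BC empirically rather than prove it, here is a small Monte Carlo sketch (the function names, dimension and trial counts are arbitrary choices): it draws random Gaussian vectors and reports the largest value of AC - (AB + BC) it sees. The vectors are not centered, which does not matter here, since the inequality is about angles between arbitrary vectors.

```python
import math
import random

def angle(x, y):
    """Angle between two vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def max_violation(n=10, trials=10_000, seed=1):
    """Largest value of AC - (AB + BC) over random Gaussian vectors in R^n."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        A, B, C = ([rng.gauss(0, 1) for _ in range(n)] for _ in range(3))
        worst = max(worst, angle(A, C) - (angle(A, B) + angle(B, C)))
    return worst

print(max_violation())  # should not exceed 0, up to rounding
```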

(no subject)

Date: 2007-10-29 12:10 am (UTC)
From: [identity profile] en-ki.livejournal.com
I'm on LJabber/AIM if you want to go real-time.

(no subject)

Date: 2007-10-29 12:16 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
I think I get it! Thank you!

For simplicity, I am imagining this in 3 dimensions. We have 3 vectors, A, B, C, and we .

Let A be a vector.
We fix a B, so that AB is the given angle. Now there's a triangle.
Now, there are many places to put C so that BC is the given angle: they form a circle around B.
As you said, AC is maximized when C is placed in the plane described by A and B, on the opposite side from A. This procedure lets us achieve the bound, i.e. construct an example.

(no subject)

Date: 2007-10-29 12:21 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
no access here, but I finally get it! Thank you!

Now, using that formula with R(A,B) = R(B,C) = r, how high can r be while R(A,C) is still allowed to be 0?

(no subject)

Date: 2007-10-29 12:24 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
0 = r^2 - sqrt((1-r^2)^2)
0 = r^2 - (1-r^2)
0 = 2r^2 - 1
r^2 = 1/2
r = 1/sqrt(2)

(no subject)

Date: 2007-10-29 12:27 am (UTC)
From: [identity profile] en-ki.livejournal.com
Q. "How small can the angle AB = BC be and still let the angle AC be a right angle?"

A. "45 degrees."
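A numerical check of the last two comments, as a sketch (the helper name `lower_bound` is illustrative): with R(A,B) = R(B,C) = r, en-ki's bound simplifies to 2r^2 - 1, which is 0 at r = 1/sqrt(2), the cosine of 45 degrees; any larger r forces R(A,C) > 0.

```python
import math

def lower_bound(r_ab, r_bc):
    """en-ki's lower bound on R(A,C) given R(A,B) and R(B,C)."""
    return r_ab * r_bc - math.sqrt((1 - r_ab**2) * (1 - r_bc**2))

r = 1 / math.sqrt(2)
print(lower_bound(r, r))             # 0, up to floating-point rounding
print(math.degrees(math.acos(r)))    # 45, up to floating-point rounding
print(lower_bound(0.75, 0.75))       # 0.125: R(A,C) is forced to be positive
```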

(no subject)

Date: 2007-11-02 06:36 pm (UTC)
From: [identity profile] spoonless.livejournal.com
This reminds me a lot of Bell's Inequality:

C(b,c) + 1 >= |C(a,b)-C(a,c)|

(http://en.wikipedia.org/wiki/Bell's_Theorem)

although they're probably unrelated.
