gusl | triangle inequality for correlation distance

Given a dataset with 3 variables A, B, C, you observe a value for the correlation between A and B, and another value for the correlation between B and C.

Question: what are the possible values of the correlation between A and C?

Answer (due to en_ki): if your dataset has n points, center it, and consider A, B and C as vectors in ℜⁿ. The empirical correlation between two variables is the cosine of the angle between them.

Thus the question becomes easy: what are the possible values of the angle AC, given the angles AB and BC?
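
To make the "correlation is a cosine" claim concrete, here is a minimal sketch in plain Python. The helper names (center, cosine, correlation) and the toy data are my own assumptions for illustration, not anything from the original discussion.

```python
import math

def center(xs):
    """Subtract the mean so the vector sums to zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cosine(u, v):
    """Cosine of the angle between two vectors in R^n."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def correlation(xs, ys):
    """Empirical correlation = cosine of the angle between centered vectors."""
    return cosine(center(xs), center(ys))

# Hypothetical toy data with n = 4 points.
A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 1.0, 4.0, 3.0]
print(correlation(A, B))
```

Note that the 1/n factors cancel between numerator and denominator, which is why no explicit variance appears here.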

---

Original post, for the record:

Is there a sort of triangle inequality for correlations? I'd like to get a lower bound on R(A,C) given R(A,B) and R(B,C). Imagine the latter two as being close to 0.95: is it possible for R(A,C) to be smaller than 0.5? I think not, but don't have a proof.

Let me explore.

The simple estimator of the covariance from a sample (the population covariance is its limit as the sample size goes to infinity):

Cov(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))

Correlations don't change if we rescale the variances to be 1:

R(A,B) = 1/n (SUM_i (A_i - mu_A) (B_i - mu_B))

R(B,C) = 1/n (SUM_i (B_i - mu_B) (C_i - mu_C))

R(A,C) = 1/n (SUM_i (A_i - mu_A) (C_i - mu_C))

Correlations don't change if we recenter so that the means are 0:

R(A,B) = 1/n (SUM_i (A_i B_i))

R(B,C) = 1/n (SUM_i (B_i C_i))

R(A,C) = 1/n (SUM_i (A_i C_i))
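
As a sanity check on the two normalization steps, here is a small sketch that rescales each variable to mean 0 and variance 1 (using the same 1/n convention as above) and then computes R as 1/n times the sum of products. The function names and the sample values are assumptions made up for this example.

```python
import math

def standardize(xs):
    """Shift to mean 0 and rescale to variance 1 (1/n convention)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in xs]

def corr(xs, ys):
    """R = 1/n * SUM_i (x_i * y_i) on the standardized variables."""
    a, b = standardize(xs), standardize(ys)
    return sum(x * y for x, y in zip(a, b)) / len(a)

# Hypothetical sample values.
A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 1.0, 4.0, 3.0]
print(corr(A, B))
```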

So how could I prove any sort of triangle inequality here?

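Let n = 1: after centering, every variable is 0, so all 3 covariances are 0 (and the correlations are strictly undefined, since the variances are 0 too).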

Let n = 2, then
R(A,B) = 1/2 (a1b1 + a2b2)
R(B,C) = 1/2 (b1c1 + b2c2)
R(A,C) = 1/2 (a1c1 + a2c2)

But since the means are 0:
a2 = -a1
b2 = -b1
c2 = -c1

So:
R(A,B) = 1/2 (a1b1 + a1b1) = a1b1
R(B,C) = 1/2 (b1c1 + b1c1) = b1c1
R(A,C) = 1/2 (a1c1 + a1c1) = a1c1

Since variance is 1, each of a1, b1, c1 is +1 or -1, so < R(A,B) , R(B,C) , R(A,C) > is in {1, -1}^3, i.e. each possible assignment is a corner of the cube. The constraint R(A,C) = R(A,B) R(B,C), which follows from b1^2 = 1, rules out some possibilities.

e.g.:
R(A,B) = +1
R(B,C) = +1
-----------------------------------------
a1=1, b1=1, c1=1 \/ a1=-1, b1=-1, c1=-1
----------------------------------------
R(A,C) = +1

But this is hardly a triangle inequality.
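
The n = 2 case is small enough to enumerate. The sketch below (the function name corr2 is made up for this example) lists every sign choice for a1, b1, c1 and collects the resulting correlation triples; only 4 of the 8 corners of the cube show up, exactly those with R(A,C) = R(A,B) R(B,C).

```python
from itertools import product

def corr2(x1, y1):
    """n = 2, zero mean, unit variance: x = (x1, -x1), y = (y1, -y1)."""
    return 0.5 * (x1 * y1 + (-x1) * (-y1))

rows = []
for a1, b1, c1 in product([1, -1], repeat=3):
    rows.append((corr2(a1, b1), corr2(b1, c1), corr2(a1, c1)))

# Distinct (R(A,B), R(B,C), R(A,C)) triples that can actually occur.
triples = sorted(set(rows))
for t in triples:
    print(t)
```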

Let n = 3:

R(A,B) = 1/3 (a1b1 + a2b2 + a3b3)
R(B,C) = 1/3 (b1c1 + b2c2 + b3c3)
R(A,C) = 1/3 (a1c1 + a2c2 + a3c3)

Since the means are zero:
a3 = -a2-a1
b3 = -b2-b1
c3 = -c2-c1

So:
R(A,B) = 1/3 (a1b1 + a2b2 + (-a2-a1)(-b2-b1)) = 1/3 (a1b1 + a2b2 + a2b2 + a1b1 + a1b2 + a2b1) = 1/3 (2 a1b1 + 2 a2b2 + a1b2 + a2b1)

Similarly:
R(B,C) = 1/3 (2 b1c1 + 2 b2c2 + b1c2 + b2c1)
R(A,C) = 1/3 (2 a1c1 + 2 a2c2 + a1c2 + a2c1)
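
Before building on the expanded form, it seems worth checking it numerically. This is only a sketch: it draws random a1, a2, b1, b2, fills in the third coordinate from the zero-mean constraint, and compares the direct sum with the expansion. The variance is not normalized here, since the identity is bilinear and does not need it; all names are hypothetical.

```python
import random

def corr_sum(a, b):
    """Direct form: 1/3 (a1b1 + a2b2 + a3b3)."""
    return sum(x * y for x, y in zip(a, b)) / 3

def corr_expanded(a1, a2, b1, b2):
    """Expanded form after substituting a3 = -a2-a1 and b3 = -b2-b1."""
    return (2 * a1 * b1 + 2 * a2 * b2 + a1 * b2 + a2 * b1) / 3

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    a1, a2, b1, b2 = (random.uniform(-1, 1) for _ in range(4))
    a = (a1, a2, -a1 - a2)
    b = (b1, b2, -b1 - b2)
    gap = abs(corr_sum(a, b) - corr_expanded(a1, a2, b1, b2))
    max_gap = max(max_gap, gap)

print(max_gap)  # should be at floating-point noise level
```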

Can we prove any interesting bounds algebraically from the above?

Here's a constraint on each variable, since the variance is 1. Since 1/3 (a1^2 + a2^2 + a3^2) = 1, it follows that:
a1^2 + a2^2 + (-a2-a1)^2 = 3
a1^2 + a2^2 + a1^2 + a2^2 + 2a1a2 = 3
2 a1^2 + 2 a2^2 + 2a1a2 = 3

Coming up with a hypothesis, proving it for n=3, and the induction step are left as an exercise for the reader (and the writer too). Possibly a challenging one.


From:

en-ki.livejournal.com

(1) Looking at the other boundary condition, if A and B are uncorrelated and B and C are uncorrelated, then A and C can have arbitrary correlation. So it's not going to be something as simple as the triangle inequality.

(2) Supposing mean to be zero and variance one and taking A, B, C as vectors in Euclidean space,

R(A,B) = A · B

so the property you are looking for will be a property of the dot product.

Further, this is cos(AB), where AB is the angle between the vectors A and B.

Fixing AB and BC, AC is maximized when A, B, and C are coplanar with A and C on opposite sides of B, so that AC = AB + BC. Take them all to be positive; then R(A,C) is minimized when AC is maximized. So by the cosine addition formula

R(A,C) >= R(A,B) R(B,C) - sqrt([1 - R(A,B)^2][1 - R(B,C)^2])

If you can make that pretty or enlightening, be my guest. :) But in the particular case of R(A,B) = R(B,C) = 0.95, you get R(A,C) >= 0.805.
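
Here is the bound as a small function, as a sketch. The lower bound is the formula above; the matching upper bound, from cos(AB - BC), is my addition and follows from the same cosine identities. The name corr_bounds is hypothetical.

```python
import math

def corr_bounds(r_ab, r_bc):
    """Range of R(A,C) compatible with given R(A,B) and R(B,C).

    Lower bound: cos(AB + BC), as in the comment above.
    Upper bound: cos(AB - BC), added here by the same reasoning.
    """
    slack = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    return r_ab * r_bc - slack, r_ab * r_bc + slack

low, high = corr_bounds(0.95, 0.95)
print(low, high)
```

With R(A,B) = R(B,C) = 0, the same function gives the full range from -1 to 1, matching point (1).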

Now that I think of it, it's probably reasonable to call the sine, sqrt(1 - R(A,B)^2), the "uncorrelation" of A and B: so as long as the product of A and B's and B and C's uncorrelations is less than the product of their correlations, A and C are positively correlated.

[Note that my default icon says "wrong" in it, referring to the use of a wide stance. This is probably not the best icon to use in a math discussion, especially since the latter half of my first statement is fairly bogus.]

gustavolacerda.livejournal.com

<< The correlation coefficient can also be viewed as the cosine of the angle between the two vectors of samples drawn from the two random variables. >>

This is unclear to me. Can you restate it?

what is the vector A? The vector of all samples from A?

Yes, we are treating the n samples from population A as a vector in Euclidean n-space and working with the dot product.

do you mean co-hyper-planar? how many dimensions are these vectors in?

These are unit vectors (since variance is 1) at the origin (since the mean is 0) in n-space (since there are n samples).

Two vectors define a plane, no matter how big the space.

C may or may not be in the plane defined by A and B. If it is, with A and C on opposite sides of B, the angles add: AC = AB + BC; otherwise, AC < AB + BC. (No proof on tap, but it should be pretty simple linear algebra.) So AC <= AB + BC no matter what.
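
In lieu of a proof, here is a quick numerical check that the angle AC never exceeds AB + BC. It is a sketch only: it samples random vectors in 5 dimensions (an arbitrary choice) and counts violations, and a clean run is evidence rather than a proof.

```python
import math
import random

def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    c = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(c)

random.seed(1)
violations = 0
for _ in range(10000):
    A, B, C = ([random.gauss(0, 1) for _ in range(5)] for _ in range(3))
    if angle(A, C) > angle(A, B) + angle(B, C) + 1e-9:
        violations += 1

print(violations)
```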

I'm on LJabber/AIM if you want to go real-time.

I think I get it! Thank you!

For simplicity, I am imagining this in 3 dimensions. We have 3 vectors: A, B, and C.

Let A be a vector.
We fix a B, so that AB is the given angle. Now there's a triangle.
Now, there are many places to make a C so that BC is the given angle. This describes a circle around B.
As you said, AC is maximized when C is placed in the plane described by A and B, and on the opposite side from A. This procedure lets us achieve the bound, i.e. construct an example.
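
The construction can be sketched directly in the plane: put A at angle 0, B at angle AB, and C at angle AB + BC. These are bare unit vectors in 2 dimensions rather than centered datasets, which I am assuming is enough to illustrate the geometry; the function names are made up.

```python
import math

def unit(theta):
    """Unit vector in the plane at angle theta."""
    return (math.cos(theta), math.sin(theta))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def extremal_example(r_ab, r_bc):
    """Coplanar A, B, C with A and C on opposite sides of B."""
    ab = math.acos(r_ab)
    bc = math.acos(r_bc)
    return unit(0.0), unit(ab), unit(ab + bc)

A, B, C = extremal_example(0.95, 0.95)
print(dot(A, B), dot(B, C), dot(A, C))
```

For 0.95 and 0.95 the last value comes out at about 0.805, so the bound is attained.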

no access here, but I finally get it! Thank you!

Now, using that formula with R(A,B) = R(B,C) = r, how high can r be while R(A,C) can still be 0?

0 = r^2 - sqrt((1-r^2)^2)
0 = r^2 - (1-r^2)
0 = 2r^2 - 1
r^2 = 1/2
r = 1/sqrt(2)

Q. "How small can the angle AB = BC be and still let the angle AC be a right angle?"

A. "45 degrees."
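
The same threshold can be found numerically, as a cross-check on the algebra. This sketch bisects on the lower bound r^2 - sqrt((1-r^2)^2), which is increasing in r on [0, 1], and converts the result to an angle. The variable names are my own.

```python
import math

def lower_bound(r):
    """Lower bound on R(A,C) when R(A,B) = R(B,C) = r."""
    return r * r - math.sqrt((1 - r * r) ** 2)

# Bisection: the bound is negative at r = 0 and positive at r = 1.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if lower_bound(mid) < 0:
        lo = mid
    else:
        hi = mid

r_star = (lo + hi) / 2
angle_deg = math.degrees(math.acos(r_star))
print(r_star, angle_deg)
```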

spoonless.livejournal.com

This reminds me a lot of Bell's Inequality:

C(b,c) + 1 >= |C(a,b)-C(a,c)|

(http://en.wikipedia.org/wiki/Bell's_Theorem)

although they're probably unrelated.


Gustavo Lacerda
