multiplication algorithms
Nov. 9th, 2006 07:47 am
Today, after theory lunch, I got an impromptu, 5-minute private lesson from Daniel Sleator (of splay-tree fame) on fast matrix multiplication.
I was surprised to learn that you can do better than O(n^3) in the worst case, in general. FWIU, Volker Strassen was the first one to propose an improvement, in 1969.
If you want to multiply a pair of 2x2 matrices, you would ordinarily do 8 multiplications (the result matrix has 4 entries, each with 2 terms). But it's possible to do it with 7: basically you multiply sums and differences of the entries, then add and subtract those seven products to get the four entries you need.
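For the curious, here are Strassen's seven products written out as a minimal Python sketch for 2x2 matrices with numeric entries (the function name is mine, not standard; replacing the scalar +, -, * with block-matrix operations and recursing is what gives the full algorithm):

```python
def strassen_2x2(A, B):
    # Multiply 2x2 matrices A, B (lists of lists of numbers)
    # using 7 multiplications instead of the usual 8.
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Each entry of the product is a sum/difference of the m's:
    # no further multiplications are needed.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

assert strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```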
If you're using such a divide-and-conquer approach, the fundamental (information-theoretic?) bound is: how many multiplications are necessary to form a basis from which to compute our answer?
To illustrate with a simpler example, imagine you have two 10-digit binary numbers, x and y. To do the divide-and-conquer, we split each of them into two halves.
Let x = [01010|00111] = [a | b]
As a number, x = 2^5 a + b
Likewise, y = 2^5 c + d
x*y is going to be a 20-digit number:
x*y = 2^10 ac + 2^5 (ad+bc) + bd (note that we can compute the last 5 digits of x*y from b and d alone)
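To make the split concrete, a quick check in Python (the variable names just follow the notation above):

```python
x = 0b0101000111             # the x from above
a, b = x >> 5, x & 0b11111   # a = 0b01010 = 10, b = 0b00111 = 7
assert x == 2**5 * a + b     # the split identity x = 2^5 a + b
```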
So it looks like we need 4 multiplications: a*c, a*d, b*c, b*d.
But in fact, if we compute (a+b)*(c+d) = ac + ad + bc + bd,
we don't need to multiply a*d and b*c at all: ad + bc = (a+b)*(c+d) - ac - bd, which takes only additions and subtractions once we have ac and bd. That's 3 multiplications total.
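Here's one level of that trick as a Python sketch (the function name and the 5-bit split are illustrative only; recursing on the three products instead of using * gives Karatsuba's O(n^log2(3)) algorithm):

```python
def three_mult_step(x, y, half=5):
    # Multiply two (2*half)-bit numbers with 3 multiplications, not 4.
    mask = (1 << half) - 1
    a, b = x >> half, x & mask    # x = 2^half * a + b
    c, d = y >> half, y & mask    # y = 2^half * c + d
    ac = a * c
    bd = b * d
    ad_plus_bc = (a + b) * (c + d) - ac - bd  # no separate a*d or b*c
    return (ac << (2 * half)) + (ad_plus_bc << half) + bd

x, y = 0b0101000111, 0b1100100001
assert three_mult_step(x, y) == x * y
```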
At first sight, this may not seem like a very good deal, since additions cost the same as multiplications.
But the point is that this generalizes to matrix multiplication, where the cost of multiplying blocks asymptotically dominates the cost of adding them. Therefore, the fewer multiplications, the more efficient you are in the limit.
The complexity of matrix multiplication is written O(n^omega). Solving Strassen's recurrence, T(n) = 7 T(n/2) + O(n^2), gives omega = log2(7), about 2.81; the best exponent known today is around 2.38 (Coppersmith-Winograd).
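The exponents fall straight out of the recurrences (via the master theorem, since the additive work is dominated by the recursive multiplications); a quick sanity check:

```python
from math import log2

print(log2(8))  # 3.0   -> naive blockwise T(n) = 8 T(n/2) + O(n^2)
print(log2(7))  # 2.807 -> Strassen's  T(n) = 7 T(n/2) + O(n^2)
print(log2(3))  # 1.585 -> Karatsuba's T(n) = 3 T(n/2) + O(n), for integers
```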
---
Given the kind of problem that this is, I'm surprised these algorithms aren't being discovered by computers. Going to these theory talks, and seeing clever (but hairy) tricks just keeps feeding my desire to automate the discovery of these theorems/algorithms.
The search for algorithms must constrain the hypothesis-space to algorithms that *might* possibly work (do there exist kinds of genetic programming where the search is guided by logical constraints, e.g. symmetry?). Given such a constrained search, some empirical testing may still be needed to "debug" the hypothesis, do some parameter-adjusting, etc.
Also, there's no reason we can't feed it some background knowledge that human computer scientists benefit from. For instance, I would like to formalize strategies like "divide-and-conquer". It's an interesting problem how to make such a general idea applicable to different kinds of representations / data-structures.
As a cognitive AI person, I feel like it's my duty to relieve mathematicians from the burden of thinking. But in my experience, theory people don't like to hear this.
(no subject)
Date: 2006-11-09 02:17 pm (UTC)
I still can't understand your surprise that algorithm invention isn't yet trivially automated. Seriously, again, if you think it is easy to do, what algorithm do you have in mind that you think would successfully do it? Why not spend the time you spend writing livejournal entries on coding up this algorithm instead? :)
(also, just as a performance note, multiplications do not at all cost the same amount as additions, generally; I suppose on modern architectures multiplication and addition of floating point numbers are comparable --- but in the matrix example, multiplying two matrices is basically always going to be harder than adding two matrices, and those are the things you're divide-and-conquering over)
(no subject)
Date: 2006-11-09 02:18 pm (UTC)
But for what it's worth, Allan Borodin at Toronto has done some really cool work on formalizing greedy algorithms. He hasn't done divide-and-conquer yet, but he might be working on it. The formalization of greedy algorithms is pretty cool, and can be used to prove theorems of the form "no greedy algorithm can give an approximation ratio of less than alpha" for various interesting problems (facility location and set cover spring to mind). But the point of his formalization is to go the other way -- instead of using it to find good greedy algorithms, he uses it to prove that greedy algorithms can't work well.
(no subject)
Date: 2006-11-09 02:38 pm (UTC)
On the other hand, I can more easily imagine a "simulated undergraduate" - Mike is working on a problem and tells his computer not to "think about solving this problem" but to "think about using dynamic programming for this problem" for the afternoon - just like there's apparently a "think about using a greedy algorithm for this problem" computer.
I hazard that this particular AI-building strategy would be more useful to Jcreed and me than to Mike - for our field, the query we'd be sending the simulated undergraduate is almost always "Induct over about a billion things to try to show this induction goes through, and if you always get stuck, describe that to me." Then we could do the more interesting things that we usually have to hold off on until we make the darn inductive proof go through. (Unsurprisingly, there already are provers that do this.)
Do you think there's a way to your goal without going through this "write lots of robotic undergraduates to solve increasingly interesting groups of problems" phase?
(no subject)
Date: 2006-11-09 02:46 pm (UTC)
I should also point out that Borodin's characterization of greedy algorithms so far has not led to a greedy-algorithm-finding technique, i.e. as far as I know there's no "think about using a greedy algorithm" computer. From what I remember, his formalization is a machine-like restriction, in that he basically states how greedy algorithms have to operate, then shows that all algorithms that we normally think of as greedy can be modified to work like this. A formalization like this doesn't necessarily lead to being able to find such algorithms -- look at Turing machines, for example. We have a perfectly good formalization of TMs, but we still can't automatically find TMs to do what we want.
(no subject)
Date: 2006-11-09 04:42 pm (UTC)
This is exactly my point. This is why minimizing multiplications creates asymptotically better algorithms.
(no subject)
Date: 2006-11-09 06:03 pm (UTC)
And your point is well taken - at most, what the formal characterization does is define a problem space that you could then write search tactics in. Assuming someone figured out how to define these tactics, they wouldn't have to be complete, just as provers that try to write inductive proofs aren't complete (just as the strategy of "have humans work to find inductive proofs and greedy algorithms" is not complete).
(no subject)
Date: 2006-11-09 06:41 pm (UTC)
But it may be possible to do much of this with brute-force attacks.
(no subject)
Date: 2006-11-09 06:47 pm (UTC)
A brute-force approach can in principle do anything: just search through all possible algorithms, and for each one, search through all possible proofs. Once you find a proof for an algorithm, the problem is solved.
But in practice, such an approach is disastrous. Intelligence boils down to effective search strategies, i.e. strategies that do a good job of pruning the search space: pruning a lot of unproductive paths while keeping enough of the productive ones.