Last Thursday I heard the second half of a DLS talk by Doug James (who did his PhD with Dinesh Pai) on synthesizing sounds via physical simulation, using something called "cubature" and some very serious-looking applied math. As with many engineering-type problems (e.g. vision), I didn't realize how hard this was until I looked at the state of the art.
Realistically simulating the sound of a garbage can falling on the floor (5-10 seconds of footage) apparently takes days on modern computers. He explained why bubbles produce a rising pitch as they pop, and why water drops falling on water make no sound themselves: it's the resulting bubble that does!
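(Not from the talk, but a back-of-the-envelope sketch: the classic Minnaert model, which I believe is the first-order physics behind these bubble sounds, says a bubble's resonant frequency goes as 1/radius, so a shrinking bubble chirps upward. In Python:)

    import math

    def minnaert_frequency(radius_m, gamma=1.4, pressure_pa=101325.0, density=998.0):
        """Resonant frequency (Hz) of a spherical air bubble in water.

        Classic Minnaert model: f0 = sqrt(3 * gamma * p0 / rho) / (2 * pi * a).
        Frequency scales as 1/radius, hence the rising pitch of a shrinking bubble.
        """
        return math.sqrt(3.0 * gamma * pressure_pa / density) / (2.0 * math.pi * radius_m)

    # A 1 mm bubble rings near 3.3 kHz; halving the radius doubles the pitch.
    for a in (2e-3, 1e-3, 0.5e-3):
        print("radius %.1f mm -> %.0f Hz" % (a * 1000, minnaert_frequency(a)))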
Since I'm a machine-learning guy, I asked him about data-driven / "semi-synthetic" approaches, e.g. sampling from an existing database of event-sound pairs and interpolating/extrapolating to the situation at hand. He said that this is what everyone else is doing. Another ML-ish thought: it would be interesting to solve the inverse problem, inferring the event from the sound (just as "vision is inverse graphics"). This would be useful in forensics, where you have audio but no video.
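(To make the database idea concrete, here's a toy sketch of what I have in mind: nearest-neighbour retrieval over hand-picked event features, blending the closest recorded clips. The features and the inverse-distance blend are placeholders I made up, not anyone's actual system.)

    import numpy as np

    # Hypothetical database: one feature vector per recorded event
    # (say: log mass, impact speed, material hardness) plus its audio clip.
    events = np.array([[0.2, 1.5, 0.9],
                       [1.1, 3.0, 0.4],
                       [0.7, 2.2, 0.6]])
    clips = [np.random.randn(44100) for _ in events]  # stand-ins for recordings

    def synthesize(query, k=2):
        """Blend the k nearest stored clips, weighted by inverse distance."""
        d = np.linalg.norm(events - query, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-9)
        w /= w.sum()
        return sum(wi * clips[i] for wi, i in zip(w, idx))

    out = synthesize(np.array([0.5, 2.0, 0.7]))  # new event -> interpolated sound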
Someone pointed out that many people don't know what real car crashes sound like, and that TV gives us a distorted idea. So Hollywood might not be all that interested in physically accurate synthetic audio.
(no subject)
Date: 2009-10-09 12:10 pm (UTC)

Anyway, I'm sure all these things will be worked out eventually, given enough math and DSP power... This is the first significant attempt I've heard of, but now it's only a matter of time...
(no subject)
Date: 2009-10-09 01:30 pm (UTC)

However, there are plenty of opportunities for computational methods of sound synthesis; the clever part will be doing it efficiently. Pure modeling of the physical reality, as James describes, seems foolishly wasteful to me. It's like using ray tracing for video games: lots of brute force, and you get pretty pictures, but it's ultimately useless in practice.
The clever, innovative approaches will give you a valid perceptual result (like Foley does) without all the ridiculous computation, just like the graphics pioneers have been doing. This has been researched quite a bit, and the work I'm most familiar with is from an old colleague of mine:
http://sound.media.mit.edu/publications.php#mkc
(His dissertation is around somewhere, I'm sure.)
He did exactly this kind of work, but in an extremely efficient manner.
Similarly, the inverse is an interesting problem, but because it's primarily a human-perceptual problem (much more so than visual interpretation is), it really needs to be addressed from that angle. "Sounds like" is very different from "is".