[personal profile] gusl
Last Thursday I heard the last half of DLS, featuring Doug James (who did his PhD with Dinesh Pai), about synthesizing sounds via physical simulation, using something called a "cubature", and some very serious-looking applied math. As with many engineering-type problems (e.g. vision), I didn't realize how difficult this was until I looked at the state of the art.

Realistically simulating the sound of a garbage can falling on the floor (5-10 s of footage) apparently takes days on modern computers. He explained why bubbles produce a rising pitch as they pop, and that water drops falling on water don't make a sound themselves: it's the resulting bubble that does!
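
For a rough feel of how bubble size maps to pitch, here is a small sketch. It assumes the textbook Minnaert resonance formula for an air bubble in water, which is my own addition and not necessarily what the talk used; the function name and default constants are just illustrative choices.

```python
import math

def minnaert_frequency(radius_m, pressure_pa=101325.0, density_kg_m3=998.0, gamma=1.4):
    """Resonant frequency (Hz) of a spherical gas bubble in a liquid.

    Assumption: the classic Minnaert formula, ignoring surface tension
    and viscosity. Defaults are roughly air in water at sea level.
    """
    return math.sqrt(3.0 * gamma * pressure_pa / density_kg_m3) / (2.0 * math.pi * radius_m)

# Smaller bubbles ring at a higher pitch: halving the radius doubles the frequency.
for radius_mm in (2.0, 1.0, 0.5):
    freq = minnaert_frequency(radius_mm / 1000.0)
    print(f"radius {radius_mm} mm -> about {freq:.0f} Hz")
```

Under these assumptions a 1 mm bubble rings at roughly 3.3 kHz, well within hearing range. This only covers the static size-to-pitch relation; it says nothing about how the pitch changes over a bubble's lifetime, which is the part James explained.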

Since I'm a machine learning guy, I asked him about data-oriented / "semi-synthetic" approaches, e.g. sampling from an existing database of event-sound pairs and trying to interpolate/extrapolate to the situation at hand. He said that this is what everyone else is doing. Another ML-y thought: it would be interesting to try to solve the inverse problem: from the sound, infer the event (just like "vision is inverse graphics"). This would be useful in forensics when you have audio but not video input.

Someone pointed out that many people don't know what real car crashes sound like, and that TV gives us a distorted idea. So Hollywood might not be that interested in synthetic audio.

(no subject)

Date: 2009-10-09 07:23 am (UTC)
From: [identity profile] gfish.livejournal.com
I've been interested in the inverse problem for a long time, but I've never attempted to do anything about it. The human ear can certainly determine a surprising amount about basic collisions just from the sound. You can usually work out the material, the general shape, and what the ground was like, after all.

(no subject)

Date: 2009-10-09 12:10 pm (UTC)
From: [identity profile] denorae.livejournal.com
Hollywood may not be interested in generated audio, but video game producers certainly are. Remember when (I think it was) Half-Life came out, and everyone was so impressed by the physics engine? "You can pick stuff up and throw it!" This was a new thing. The physics model, for graphics at least, had reached the point where they didn't need to render things in advance, which gave the player a much greater sense of improvisation within the environment. But so far, there has been no equivalent for audio. Let's say the player wants to pick up a rock and throw it against the wall. If you want it to look right, you need to know the weight and shape of the rock, what the light sources are, and where the wall is. To simulate the audio realistically, you need to know what the wall is made of, how thick the wall is, how big the room is, how reflective the room is, where the other surfaces are, etc. Realistic audio is much more dependent on environment and materials, and we haven't even gotten to simulated speech or moving sound sources (don't forget about Doppler!).

Anyway, I'm sure all these things will be worked out eventually, given enough math and DSP power... This is the first significant attempt I've heard of, but now it's only a matter of time...

(no subject)

Date: 2009-10-09 01:30 pm (UTC)
From: [identity profile] zarex.livejournal.com
Hollywood and videogames don't really want "realism" - they want illusion and hyper-realism. Reality is boring.

However, there are plenty of opportunities for computational methods for sound synthesis; the clever part will be doing it efficiently. Pure modeling of the physical reality as James describes seems foolishly wasteful to me. It's like using ray-tracing for video games. It's a lot of brute force, and you get pretty pictures, but it's ultimately useless in practice.

The clever, innovative approaches will give you a valid perceptual result (like Foley does) without all the ridiculous computation, just like the graphics pioneers have been doing. This has been researched quite a bit, and the work I'm most familiar with is from an old colleague of mine:

http://sound.media.mit.edu/publications.php#mkc

(His dissertation is around somewhere I'm sure)

He did exactly this kind of work, but in an extremely efficient manner.

Similarly, the inverse is an interesting problem, but because it's primarily a human perceptual problem (much more so than visual interpretation) it really needs to be addressed from that angle. "Sounds like" is very different from "is".
