This AI can harness sound to reveal the structure of invisible spaces
Imagine you’re walking through a series of rooms, circling closer and closer to a sound source, whether it’s music playing from a speaker or a person talking. The noise you hear as you move through this maze will distort and fluctuate based on where you are. With a scenario like this in mind, a team of researchers from the Massachusetts Institute of Technology and Carnegie Mellon University has been working on a model that can realistically depict how the sound around a listener changes as they move through a given space. They published their work on the topic in a preprint paper last week.
The sounds we hear in the world can vary depending on factors such as the type of spaces the sound waves bounce off, the materials they strike or pass through, and the distance they need to travel. These properties influence how sound scatters and decays. But researchers can reverse engineer this process, too. They can take a sound sample and use it to infer the shape of the environment (in some ways, it’s similar to how animals use echolocation to “see”).
“We mostly do spatial acoustics modeling, so [the focus is on] echo,” says Yilon Doe, a graduate student at MIT and an author on the paper. “Maybe if you’re in a concert hall, there’s a lot of echo, maybe if you’re in a cathedral, there’s a lot of echo, versus if you’re in a small room, there is really no echo.”
Their model, called the Neural Acoustic Field (NAF), is a neural network that accounts for the positions of both the sound source and the listener, as well as the geometry of the space through which the sound travels.
To train the NAF, the researchers fed it visual information about the scene and a few spectrograms (visual representations that capture the amplitude, frequency, and duration of sounds) of audio collected from what a listener would hear at different positions and vantage points.
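For readers unfamiliar with spectrograms, the sketch below shows one common way to compute one: a short-time Fourier transform over overlapping, windowed frames. It is an illustration only, not the researchers' pipeline. The function name `spectrogram`, the frame length, the hop size, and the 1 kHz test tone are all assumptions chosen for this example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a short-time Fourier transform.

    Hypothetical helper for illustration; frame_len and hop are arbitrary.
    Returns an array of shape (frequency bins, time frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # The FFT of each frame gives the amplitude at each frequency over time.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Assumed test input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index to frequency in Hz
print(spec.shape, peak_hz)
```

Each column of the result describes the frequency content of one short slice of audio, which is why a spectrogram can be treated like an image of a sound.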
“We have quite a few data points; based on that, we fit a kind of model that can predict how the sound would sound from any location in the room, and what it would sound like from a new location,” Doe says. “Once we fit that model, you can simulate all kinds of virtual walk-throughs.”
The team used acoustic data obtained from a simulated room of approx. “We also have some results on real scenes, but the problem is that collecting this data in the real world takes a lot of time,” Doe notes.
With this data, the model can learn to predict how the sounds the listener hears will change if they move to another position. For example, if music is coming from a speaker in the center of a room, this sound will be louder if the listener approaches it, and will become more muted if the listener enters another room. NAF can also use this information to predict the structure of the world around the listener.
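The loudness example above can be made concrete with a textbook simplification: for a point source in open space, sound pressure level falls by about 6 dB each time the distance doubles. The sketch below computes that change. It is not the NAF, and it deliberately ignores the walls, materials, and echoes that the model is designed to capture. The function name `level_change_db` and the 1 m reference distance are assumptions for this example.

```python
import math

def level_change_db(distance_m, reference_m=1.0):
    """Change in sound level (dB) relative to the level at reference_m.

    Assumes an idealized point source in a free field (no reflections),
    where sound pressure falls off in proportion to 1 / distance.
    """
    return 20.0 * math.log10(reference_m / distance_m)

# Walking toward or away from the speaker, relative to standing 1 m away.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:>4} m: {level_change_db(d):+.1f} dB")
```

In a real room the measured change departs from this curve because of reflections and obstructions, and that gap is the part a learned acoustic model has to account for.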
One of the big applications of this type of model is in virtual reality, where sounds could be accurately generated for a listener moving through a virtual space. The other big use Doe sees is in artificial intelligence.
“We have a lot of models of vision. But perception is not just about seeing, sound is also very important. We can also imagine that this is an attempt to do perception using sound.”
Sound isn’t the only medium researchers are exploring with AI. Today’s machine learning technology can take 2D images and use them to generate a 3D model of an object, offering new perspectives and views. This technique comes in handy especially in virtual reality settings, where engineers and artists have to build realism into screen spaces.
Additionally, models like these that focus on sound could enhance existing sensors and devices in low-light or underwater conditions. “Sound also allows you to see around corners. There is a lot of variation depending on the lighting conditions. Things look very different,” says Doe. “But the sound bounces back the same way most of the time. It’s a different sensory modality.”
At the moment, the main obstacle to further development of their model is a lack of data. “One thing that was surprisingly difficult was getting the data, because people haven’t explored this problem very much,” he says. “When you try to synthesize new views in virtual reality, there are a lot of data sets, all these real images. With more data sets, it will be very exciting to explore more of these approaches, especially in real scenes.”
Watch (and listen to) a tour of the virtual space below: