Real-Time Audio Normalization System (RTANS)

2023 | Individual Project

Project Overview:

I love music! I play a few instruments, including guitar and piano, but I love listening to music just as much as playing it. Before audio streaming platforms such as Spotify and Apple Music released audio normalization features, I had a problem. When one song finished and the next song in the queue started, the new song had often been mastered differently from the previous one, so it would come out either far too quiet or far too loud. My audio source was a Sony receiver, and there was no simple way to adjust its volume automatically, as Sony has not released any APIs for this piece of hardware. To make things more difficult, when the volume needed to be adjusted, and by how much, was unpredictable and depended entirely on the songs and the way they were mastered. This forced me to be creative and look for new solutions.

Proposed Solution:

To solve this problem, I knew I needed three things:

A way to measure the audio levels or "loudness" of the room at a given time.
A baseline for what I considered to be the optimal listening volume.
A way to adjust the volume of my Sony receiver (given that there was no API).

To start, let's walk through my solution for measuring the "volume of a room." There were many ways to approach this problem. My first attempt used my laptop's built-in microphone, and it failed miserably: the microphone was not nearly as sensitive as it needed to be.

Key Point: Microphone sensitivity refers to how efficiently a microphone converts sound pressure level (SPL) into an electrical signal that a computer can measure.

In order to overcome this problem, I knew I needed better hardware. I ended up buying an Arduino, a sound sensor module, some jumper wires, and a breadboard. Granted, Arduinos and their components are relatively cheap to begin with, but I was able to pick these up while they were on sale, so I was on my way!

I had never worked with an Arduino before, so there was a learning curve before I was able to make significant progress. Despite that, within a day or so, I was able to build a functioning SPL meter, so I now had a way to measure the SPL in the room. However, as soon as I began receiving readings, I realized that I had a different problem. The sound sensor I was using produced a raw analog value, while my Sony receiver controlled volume on a decibel scale. I knew that I either needed to design an algorithm to adjust the decibel level of the receiver directly from the analog input, or I could lean on mathematics to make my life a bit easier. I chose the latter...

With some research, I found that there is a formula that can be used to convert an analog audio signal into decibels. It looks something like this:

dB = 20 × log10(input_V / reference_V)   (note: log10 is the base-10 logarithm)
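
As a rough illustration of the formula above, here is a minimal Java sketch. The class name, the 10-bit ADC range (0 to 1023), and the 5 V reference are assumptions for the sake of the example and may not match the board or sensor I actually used.

```java
// Minimal sketch: convert a raw analog reading into a decibel value
// relative to a chosen reference voltage.
final class DecibelConverter {
    // Assumption: 10-bit ADC (counts 0..1023) powered from a 5 V reference.
    static final int ADC_MAX = 1023;
    static final double ADC_REF_VOLTS = 5.0;

    /** Converts a raw ADC count into volts. */
    static double toVolts(int adcCount) {
        return adcCount * ADC_REF_VOLTS / ADC_MAX;
    }

    /** Applies dB = 20 * log10(inputV / referenceV); both voltages must be positive. */
    static double toDecibels(double inputV, double referenceV) {
        if (inputV <= 0 || referenceV <= 0) {
            throw new IllegalArgumentException("voltages must be positive");
        }
        return 20.0 * Math.log10(inputV / referenceV);
    }

    public static void main(String[] args) {
        double referenceV = 0.1;            // hypothetical reference level
        double inputV = toVolts(512);       // roughly half of full scale
        System.out.printf("%.2f dB relative to %.1f V%n",
                toDecibels(inputV, referenceV), referenceV);
    }
}
```

A tenfold voltage ratio corresponds to 20 dB, so toDecibels(1.0, 0.1) comes out at 20. The value of reference_V is exactly the part that depends on the sensor's sensitivity.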

This would be great in a perfect world, but it would also require me to know the sensitivity of the sound sensor I was using, and this information was not labeled on the product. This creates a circular problem: you need an accurate SPL meter to calibrate the readings of an inaccurate one. I learned something here: building an accurate SPL meter can be difficult, and other approaches (such as measuring RMS, or simply the average loudness) may have been a better fit.

Rather than waste the hardware I had put together, I looked for a workaround and accepted that these readings were likely not accurate SPL readings of the room. However, I realized that the readings were accurate relative to each other, so I could design an algorithm that would perform "real-time calibration" of the input (after converting to dB) and use those values.

Part of designing this algorithm involved finding what I considered the "optimal listening volume," which let me address my second problem at the same time. By doing some research, I found that most audiophiles would agree that somewhere between 65 and 85 dB is an ideal and safe listening volume. I later parameterized this value, allowing me to adjust it while listening to music via a GUI I designed later on.

However, just like before, due to the sensitivity of the microphone, the lack of calibration, and the mathematical conversion from analog to decibels, I was working with somewhat lossy data. Without completely redesigning the hardware, I was curious to see how much progress I could make, even with these lossy values. I decided to move on, and with problems 1 and 2 somewhat tackled, it was time to figure out how I was going to adjust the volume of my receiver, given that there was no API.
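
The point that uncalibrated readings are still accurate relative to each other can be illustrated with a small Java sketch. It is not the calibration algorithm I wrote, only a demonstration of the underlying math: the difference in dB between two readings does not depend on the unknown reference voltage. The class name and the sample voltages are made up for the example.

```java
// Sketch: the dB difference between two readings is independent of the
// (unknown) reference voltage, because the reference cancels out.
final class RelativeLevel {
    static double toDecibels(double inputV, double referenceV) {
        return 20.0 * Math.log10(inputV / referenceV);
    }

    /** Difference in dB between a loud and a quiet reading taken with the same reference. */
    static double deltaDb(double quietV, double loudV, double referenceV) {
        return toDecibels(loudV, referenceV) - toDecibels(quietV, referenceV);
    }

    public static void main(String[] args) {
        double quietV = 0.2;   // hypothetical sensor voltage during a quiet song
        double loudV = 0.8;    // hypothetical sensor voltage during a loud song
        // Try several guesses for the unknown reference voltage.
        for (double referenceV : new double[] {0.01, 0.1, 1.0}) {
            System.out.printf("reference=%.2f V, difference=%.3f dB%n",
                    referenceV, deltaDb(quietV, loudV, referenceV));
        }
    }
}
```

Every guess for the reference gives the same difference of about 12 dB (20 × log10(4)), which is why a controller that only reacts to changes in level can get by without an absolute calibration.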

This receiver had a fun feature: if you typed the IP address of the device into a browser (while on the same network), it would open up a configuration menu. There was a sub-tab in this menu called "Remote." It was exactly what you might think: a digital version of the handheld remote, rebuilt and displayed on an HTML page. I knew this was my ticket to controlling the receiver without an API.

All of the code I had written up to this point for the decibel readings was in the Arduino IDE, and the rest was in Java. I decided to open a headless browser and use the IP address of the receiver to bring up the configuration GUI. From there, I would navigate from the Homepage to the Remote page and find the HTML class assigned to the buttons I needed (you can find it by inspecting a webpage in most common browsers, such as Chrome). I found the HTML classes given to the volume up and volume down buttons and wrote two functions in Java that would perform an action in that browser and trigger those buttons to be pressed. The functions were simple: VolumeUp() and VolumeDown().
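
The shape of those two functions can be sketched roughly as follows. This is a simplified stand-in, not my original code: the Page interface, the RecordingPage fake, and the class names "volume-up" and "volume-down" are all hypothetical. In a real setup, an implementation of Page would wrap whichever headless browser library is in use and click the matching element on the Remote page.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: volume functions that "press" buttons on the receiver's Remote page
// by clicking elements identified by their HTML class.
final class ReceiverRemote {
    /** Hypothetical abstraction over the headless browser page. */
    interface Page {
        void clickByClass(String cssClass);
    }

    // Hypothetical class names; the real ones come from inspecting the page.
    static final String VOLUME_UP_CLASS = "volume-up";
    static final String VOLUME_DOWN_CLASS = "volume-down";

    private final Page page;

    ReceiverRemote(Page page) {
        this.page = page;
    }

    void volumeUp() {
        page.clickByClass(VOLUME_UP_CLASS);
    }

    void volumeDown() {
        page.clickByClass(VOLUME_DOWN_CLASS);
    }

    /** Fake page that only records clicks, so the sketch runs without a browser. */
    static final class RecordingPage implements Page {
        final List<String> clicks = new ArrayList<>();

        @Override
        public void clickByClass(String cssClass) {
            clicks.add(cssClass);
        }
    }

    public static void main(String[] args) {
        RecordingPage page = new RecordingPage();
        ReceiverRemote remote = new ReceiverRemote(page);
        remote.volumeUp();
        remote.volumeDown();
        System.out.println(page.clicks); // prints [volume-up, volume-down]
    }
}
```

Keeping the browser behind a small interface also makes it possible to test the volume logic without the receiver on the network.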

I now had all the pieces I needed to put this project together.

With my lossy input situation somewhat taken care of and a way to control the volume of my receiver, the rest was pretty straightforward. If there was a significant change in volume (either an increase or a decrease) beyond some tolerance level (also parameterized), I could call the corresponding volume up or volume down function as many times as needed until the input decibel level returned to my predefined "ideal listening volume."
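
The adjustment logic described above can be sketched in Java along these lines. It is a minimal sketch under stated assumptions, not my original implementation: the Remote interface stands in for the volume up and volume down functions, and SimulatedRoom assumes that each button press shifts the measured level by exactly 1 dB, which a real room would not guarantee.

```java
// Sketch: press volume up/down until the measured level is back within
// a tolerance band around the target listening level.
final class VolumeController {
    /** Hypothetical stand-in for the volume up and volume down functions. */
    interface Remote {
        void volumeUp();
        void volumeDown();
    }

    private final Remote remote;
    private final double targetDb;
    private final double toleranceDb;

    VolumeController(Remote remote, double targetDb, double toleranceDb) {
        this.remote = remote;
        this.targetDb = targetDb;
        this.toleranceDb = toleranceDb;
    }

    /** Presses at most one button; returns 1 (up), -1 (down), or 0 (within tolerance). */
    int step(double measuredDb) {
        double error = measuredDb - targetDb;
        if (error > toleranceDb) {
            remote.volumeDown();
            return -1;
        }
        if (error < -toleranceDb) {
            remote.volumeUp();
            return 1;
        }
        return 0;
    }

    /** Simulated room: assumes each press changes the level by 1 dB. */
    static final class SimulatedRoom implements Remote {
        double levelDb;
        int presses;

        SimulatedRoom(double levelDb) {
            this.levelDb = levelDb;
        }

        @Override
        public void volumeUp() {
            levelDb += 1.0;
            presses++;
        }

        @Override
        public void volumeDown() {
            levelDb -= 1.0;
            presses++;
        }
    }

    public static void main(String[] args) {
        SimulatedRoom room = new SimulatedRoom(82.0);   // a loud song starts
        VolumeController controller = new VolumeController(room, 75.0, 2.0);
        int guard = 0;                                   // safety cap on iterations
        while (controller.step(room.levelDb) != 0 && guard++ < 100) {
            // keep stepping until the level is inside the tolerance band
        }
        System.out.println("presses=" + room.presses + ", level=" + room.levelDb);
    }
}
```

With a target of 75 dB and a tolerance of 2 dB, a simulated level of 82 dB takes five volume-down presses to settle at 77 dB. The tolerance band keeps the controller from flipping between up and down on small fluctuations.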

I could now turn my laptop on, plug in my Arduino, launch the program, and play music as long as I wanted. The volume was now adjusting automatically, and to my surprise (given the lossy data), it did a relatively good job of maintaining the volume I desired. I was basically normalizing the audio in real time (which comes with its own problems).

This project was not perfect in any way, but it was a fantastic learning experience, and I worked towards an innovative solution to a problem I faced regularly. Not too long after the completion of this project, a normalization feature was released on common streaming platforms such as Spotify and Apple Music. This software solution was much easier to use and more accurate than mine. Nonetheless, this was a fun project.