Balloon Popping Samples
The raw data we collected
North
East
South
West
North East
South East
North West
South West
Note: in our data, North is the direction away from the screen (towards the user), South is the direction into the screen (away from the user), East is to the left, and West is to the right; the remaining directions lie in between.
We collected audio data by popping balloons at the same distance from the stereo microphone, and at the same height, from 8 points located in the N, NE, E, SE, S, SW, W, and NW directions. Popping the balloons at these locations let us capture the sound waves arriving at the microphone from different angles. To obtain the most accurate data possible, we popped multiple balloons at each location and kept the recording that best captured the most important components. We recorded at a level elevation (0 degrees) to match our microphone's recording signature, which is more even when the microphone is level. We will likely also need to filter our audio through an equalizer to remove characteristics caused by the microphone's own sound signature.
Computing the Interaural Time Difference (ITD) and Interaural Level Difference (ILD)
The compute_ild function takes two input arguments: the audio data from the left and right channels of a stereo signal. It calculates the power of each channel by squaring the audio samples and summing the results. To get the ILD value, it takes the ratio of the left channel power to the right channel power and converts the result to decibels (dB) using the formula ILD = 10 * log10(left_power / right_power). With this definition, a positive ILD means the left channel carries more power and a negative ILD means the right channel does. The ILD represents the difference in sound level between the ears, and is an important cue used by the auditory system for sound localization when paired with the ITD. ILD also varies slightly with elevation, whereas ITD is vertically symmetric.
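The description above can be sketched in a few lines of numpy. This is a minimal reconstruction from the prose, not necessarily the exact code used for the results on this page; the synthetic test signal is an assumption added only for illustration.

```python
import numpy as np

def compute_ild(left, right):
    """Return the ILD in dB as 10 * log10(left_power / right_power)."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    # Power of each channel: sum of squared samples.
    left_power = np.sum(left ** 2)
    right_power = np.sum(right ** 2)
    # Note: a completely silent right channel would make this ratio undefined.
    return 10.0 * np.log10(left_power / right_power)

# Synthetic check (illustrative data, not a balloon recording):
# the left channel has twice the amplitude, so four times the power.
rng = np.random.default_rng(0)
right = rng.standard_normal(1000)
left = 2.0 * right
ild = compute_ild(left, right)
print(f"ILD: {ild:.2f} dB")  # about 6.02 dB, i.e. 10 * log10(4)
```

A positive result here means the left channel is louder, which is the sign convention that follows from putting left_power in the numerator.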
The compute_itd function takes three input arguments: the audio data from the left and right channels of a stereo signal, and the sampling rate of the signal. It calculates the cross-correlation between the left and right channels using the numpy correlate function with the mode set to 'full'. The index of the maximum value in the resulting cross-correlation array identifies the time shift between the two channels that maximizes their correlation. The time shift in samples is computed as the index of the maximum minus the length of the left channel, plus one. The ITD is then calculated as the time shift divided by the sampling rate. With numpy's definition of correlate, a positive ITD means the left channel lags the right (the sound reached the right channel first), and a negative ITD means the left channel leads. When paired with the ILD, the ITD can be used for sound localization.
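A minimal sketch of this cross-correlation approach is shown below, reconstructed from the description rather than copied from the original code. The impulse test signals and the 48 kHz rate are assumptions chosen only to make the sign convention visible.

```python
import numpy as np

def compute_itd(left, right, sample_rate):
    """Return the ITD in seconds from the peak of the cross-correlation."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    corr = np.correlate(left, right, mode="full")
    # In 'full' mode, index i corresponds to a lag of i - (len(right) - 1).
    # For a stereo file both channels have the same length, so this equals
    # argmax - len(left) + 1.
    lag = int(np.argmax(corr)) - (len(right) - 1)
    return lag / sample_rate

# Illustrative impulses: the left channel is delayed by 3 samples.
sample_rate = 48_000  # assumed rate for this example
right = np.zeros(64)
left = np.zeros(64)
right[10] = 1.0
left[13] = 1.0
itd = compute_itd(left, right, sample_rate)
print(itd)  # positive: the left channel lags, so the right side heard it first
```

Because the lag is an integer number of samples, the ITD can only be resolved to one sample period unless the signals are upsampled or the correlation peak is interpolated.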
Results from single balloons
North ILD: -1.9229315221309662 dB, North ITD: 0.0 s
East ILD: 1.3773463666439056 dB, East ITD: -2.0833333333333333e-05 s
South ILD: -2.045888751745224 dB, South ITD: 0.0 s
West ILD: -5.9534752368927 dB, West ITD: -2.0833333333333333e-05 s
North East ILD: 0.3517654538154602 dB, North East ITD: -2.0833333333333333e-05 s
North West ILD: -3.2634425163269043 dB, North West ITD: 2.0833333333333333e-05 s
South East ILD: 0.8375445008277893 dB, South East ITD: -6.25e-05 s
South West ILD: -3.884981870651245 dB, South West ITD: 4.1666666666666665e-05 s
Analyzing Data
ILD values
- North and South have similar values (-1.92 dB and -2.05 dB, respectively), which makes sense as both lie on the axis of symmetry.
- East and South East are positive (1.38 dB and 0.84 dB, respectively). Because the ILD is computed as left power over right power, this means the left channel is louder, consistent with East being to the left in our convention.
- West and South West are negative (-5.95 dB and -3.88 dB, respectively), meaning the right channel is louder. They are also significantly more negative than North and South, showing that there is indeed a direction-specific difference.
ITD values
- North and South are both 0, indicating that the sound arrives at both ears simultaneously.
- East, North East, and South East are negative (-2.08e-05 s, -2.08e-05 s, and -6.25e-05 s, respectively), showing that the sound reaches the left ear earlier.
- North West and South West are positive (2.08e-05 s and 4.17e-05 s, respectively), showing that the sound reaches the right ear earlier. West, however, measured -2.08e-05 s, the same sign as the East-side recordings, so it does not follow this pattern.
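The reported ITDs are all whole multiples of about 2.0833e-05 s, which is one sample period at 48 kHz. The sampling rate is not stated on this page, so 48 kHz is an assumption here; under that assumption the ITDs can be converted back to whole-sample lags as a quick sanity check:

```python
SAMPLE_RATE = 48_000  # assumed; not stated in the write-up

# ITD values in seconds, copied from the results listed above.
itd_seconds = {
    "North": 0.0,
    "East": -2.0833333333333333e-05,
    "South": 0.0,
    "West": -2.0833333333333333e-05,
    "North East": -2.0833333333333333e-05,
    "North West": 2.0833333333333333e-05,
    "South East": -6.25e-05,
    "South West": 4.1666666666666665e-05,
}

# Convert each ITD to the nearest whole number of samples.
itd_samples = {name: round(value * SAMPLE_RATE) for name, value in itd_seconds.items()}

for name, lag in itd_samples.items():
    print(f"{name}: {lag} sample(s)")
```

If the assumption holds, every measurement differs by at most three samples, so a single sample of error is enough to flip the sign of the smaller values.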
In summary, the signs of the ILD and ITD values mostly track which side of the microphone the balloon was popped on, while North and South are nearly indistinguishable from these two cues alone.
HRTF model based on Dataset
Find the average of all the SOFA files
Load an HRTF dataset (e.g., from the 3D3A Lab at Princeton University) in SOFA format.
Find the nearest HRTF measurements to your desired parameters (azimuth and elevation angles).
If necessary, interpolate the HRTF to obtain a smoother transfer function.
Apply the HRTF transfer function to your audio signal.
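The nearest-measurement and apply steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for this project: it assumes the impulse responses and source positions have already been read from the SOFA file into numpy arrays shaped (measurements, ears, samples) and (measurements, 2), it skips interpolation, and the function names nearest_hrir_index and apply_hrir as well as the toy data are hypothetical.

```python
import numpy as np

def nearest_hrir_index(positions_deg, azimuth_deg, elevation_deg):
    """Index of the measurement closest in angle to the requested direction.

    positions_deg: array of shape (M, 2) holding azimuth and elevation in degrees.
    """
    pos = np.radians(np.asarray(positions_deg, dtype=np.float64))
    az, el = pos[:, 0], pos[:, 1]
    target_az = np.radians(azimuth_deg)
    target_el = np.radians(elevation_deg)
    # Cosine of the great-circle angle; the largest value is the nearest direction.
    cos_angle = (np.sin(el) * np.sin(target_el)
                 + np.cos(el) * np.cos(target_el) * np.cos(az - target_az))
    return int(np.argmax(cos_angle))

def apply_hrir(mono, hrir_pair):
    """Convolve a mono signal with a (left, right) pair of impulse responses."""
    left = np.convolve(mono, hrir_pair[0])
    right = np.convolve(mono, hrir_pair[1])
    return np.stack([left, right])

# Toy data standing in for a real dataset (hypothetical values).
positions = np.array([[0.0, 0.0], [90.0, 0.0], [180.0, 0.0], [270.0, 0.0]])
hrirs = np.zeros((4, 2, 8))
for m in range(4):
    hrirs[m, 0, m] = 1.0      # left ear: unit impulse delayed by m samples
    hrirs[m, 1, m + 1] = 0.5  # right ear: quieter and one sample later

idx = nearest_hrir_index(positions, azimuth_deg=80.0, elevation_deg=0.0)
mono = np.array([1.0, 0.0, 0.0, 0.0])
binaural = apply_hrir(mono, hrirs[idx])
print(idx, binaural.shape)
```

Using the angular distance on the sphere rather than a plain difference of angles means a request at 350 degrees azimuth correctly selects the measurement at 0 degrees.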
The results obtained from the generalized HRTF model were significantly better than those based on our balloon recordings, which we attribute to the accurate, professionally conducted measurements in the HRTF dataset. With this higher-quality data, the model produced a noticeably clearer and more accurate spatial audio experience, which shows how much a generalized spatial audio model depends on reliable and precise HRTF data.
The results can also be attributed to the fact that the generalized HRTF model was put through hearing tests with multiple individuals from diverse backgrounds. This allowed it to be tuned for a range of ear shapes, sizes, and head-related features, making it more robust and able to deliver accurate, immersive spatial audio for a broader range of users.
[1]. R. Sridhar and E. Choueiri. The 3D3A Lab Head-Related Transfer Function Database. 3D3A Lab Technical Report #3, October 2021 (upcoming).
[2]. R. Sridhar, J. G. Tylka, and E. Y. Choueiri. A database of the head-related transfer function and morphological measurements. In Audio Engineering Society Convention 143, October 2017.