TY - CONF
T1 - Binaural rendering of spherical microphone array recordings by directly synthesizing the spatial pattern of the head-related transfer function
AU - Sakamoto, Shuichi
AU - Salvador, César
AU - Treviño, Jorge
AU - Suzuki, Yôiti
N1 - Funding Information: Part of this work was supported by a grant from the Strategic Information and Communications R&D Promotion Programme (SCOPE) No. 082102005 from the Ministry of Internal Affairs and Communications (MIC), Japan, the A3 Foresight Program for “Ultra-realistic acoustic interactive communication on next-generation Internet,” and JSPS KAKENHI Grant Numbers JP24240016, JP26280067, and JP16H01736.
PY - 2017
Y1 - 2017
AB - Binaural technologies can convey rich spatial auditory information to listeners using simple equipment such as headphones. Advanced binaural recording and reproduction methods use spherical microphone arrays and head-related transfer function (HRTF) datasets. Mainstream techniques, such as binaural Ambisonics, characterize the recorded sound field as a weighted sum of spherical harmonic functions. In contrast, this research seeks to generate individualized binaural signals directly from the microphone recordings, without relying on intermediate sound field representations. The approach, known as SENZI, applies a set of weighting filters to the recorded microphone signals, resulting in the target spatial pattern defined by the HRTF dataset. In this sense, the proposal requires finding the appropriate weighting filters by inverting a linear system. Binaural synthesis methods based on the solution to an inverse problem belong to one of two categories: HRTF modeling (type 1) or microphone signal modeling (type 2). The SENZI method considered here belongs to the HRTF modeling category. In addition, the problem is generally over- or underdetermined, depending on the number of microphones in the array and the number of HRTFs in the dataset. This also impacts the accuracy of the synthesized binaural signals. A design problem, therefore, is to choose the most appropriate number of microphones and HRTFs. Fortunately, large HRTF datasets, as well as massively multi-channel arrays, are now available. An example of the latter is a real-time implementation of the SENZI method using a 252-channel spherical microphone array and an FPGA-based processing subsystem. This research evaluates the binaural synthesis accuracy in relation to the number of microphones and HRTFs used to derive the weighting filters. Numerical simulations show that underdetermined systems generally yield better results than overdetermined ones.
KW - 3D audio technology
KW - Binaural synthesis
KW - Head-related transfer functions
KW - Microphone arrays
KW - Spherical acoustics
UR - http://www.scopus.com/inward/record.url?scp=85029452607&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029452607&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85029452607
T2 - 24th International Congress on Sound and Vibration, ICSV 2017
Y2 - 23 July 2017 through 27 July 2017
ER -