Dataset for evaluation of the performance of the methods of sound source localization algorithms using tetrahedral microphone arrays

Saulius Sakavičius

doi:10.3846/mla.2020.11462

Abstract

For the development and evaluation of sound source localization and separation methods, a concise audio dataset with complete geometrical information about the room, the positions of the sound sources, and the microphone array is needed. Computer simulation of such audio and geometrical data often relies on simplifications and is sufficiently accurate only for a specific set of conditions, so it is generally desirable to evaluate algorithms on real-world data. Three-dimensional sound source localization or direction-of-arrival estimation requires a non-coplanar microphone array. The simplest and most general type of non-coplanar array is a tetrahedral array. There is a lack of openly accessible real-world audio datasets obtained using such arrays. We present an audio dataset for the evaluation of sound source localization algorithms that use tetrahedral microphone arrays. The dataset is complete with the geometrical information of the room and the positions of the sound sources and the microphone array. Audio data were captured with two tetrahedral microphone arrays that have different inter-microphone distances, with one or two active sound sources. Because the source signals were speech signals, the dataset is suitable for speech recognition and direction-of-arrival estimation.
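The requirement for a non-coplanar array can be illustrated with a minimal far-field sketch. It is not taken from the article or the dataset: the 0.1 m spacing, the speed of sound, the source direction, and all function names are assumptions chosen for illustration. The sketch shows that the baselines of a tetrahedral array span three dimensions, so a direction vector can be recovered from time differences of arrival (TDOAs), whereas a planar array produces identical TDOAs for a direction and its mirror image about the array plane.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def regular_tetrahedron(edge):
    """Return 4x3 microphone positions of a regular tetrahedron centred at the origin."""
    verts = np.array(
        [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
    )
    # The vertices above are 2*sqrt(2) apart, so rescale to the requested edge length.
    return verts * edge / (2.0 * np.sqrt(2.0))


def far_field_tdoas(mics, direction):
    """TDOAs of microphones 1..3 relative to microphone 0 for a plane wave."""
    u = direction / np.linalg.norm(direction)
    # A microphone displaced towards the source receives the wavefront earlier.
    return -(mics[1:] - mics[0]) @ u / SPEED_OF_SOUND


def doa_from_tdoas(mics, tdoas):
    """Least-squares estimate of the unit direction vector from TDOAs."""
    baselines = mics[1:] - mics[0]
    u, *_ = np.linalg.lstsq(baselines, -SPEED_OF_SOUND * tdoas, rcond=None)
    return u / np.linalg.norm(u)


# Hypothetical geometries: a 0.1 m tetrahedron and a 0.1 m planar square.
tetra = regular_tetrahedron(0.1)
planar = np.array(
    [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [0.1, 0.1, 0]], dtype=float
)

rank_tetra = np.linalg.matrix_rank(tetra[1:] - tetra[0])
rank_planar = np.linalg.matrix_rank(planar[1:] - planar[0])

# Assumed source direction, normalised to unit length.
true_dir = np.array([0.3, -0.5, 0.8])
true_dir /= np.linalg.norm(true_dir)
mirrored_dir = true_dir * np.array([1.0, 1.0, -1.0])

estimate = doa_from_tdoas(tetra, far_field_tdoas(tetra, true_dir))

# The planar array cannot tell a direction from its mirror image in z.
planar_ambiguous = np.allclose(
    far_field_tdoas(planar, true_dir), far_field_tdoas(planar, mirrored_dir)
)

print("baseline rank (tetrahedral, planar):", rank_tetra, rank_planar)
print("estimated direction:", estimate)
print("planar array is ambiguous in z:", planar_ambiguous)
```

The sketch assumes ideal, noise-free TDOAs and a plane-wave source; real recordings such as those in the dataset would first need a TDOA estimator and would be affected by reverberation.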

Article in English.

Dataset for studying the characteristics of sound source localization methods that use tetrahedral microphone arrays

Summary

The development and characterization of sound source localization and separation algorithms requires a consistently compiled audio dataset supplemented with information about the acoustic properties of the room and the positions of the sound sources and the microphone array. Such audio and geometric data are often obtained by computer emulation, but most emulation methods rely on simplifications and are accurate only under certain conditions. The performance of sound source localization and separation algorithms can therefore be evaluated thoroughly only with real audio data. Determining the position of a sound source or its direction of arrival in space requires a microphone array whose elements are non-coplanar. The simplest and most general type of non-coplanar array is the tetrahedral array. At present there is no freely available dataset of audio and geometric data collected with microphone arrays of this type. This article presents a dataset intended for studying sound source localization and separation algorithms using tetrahedral microphone arrays. The dataset consists of audio data and the corresponding geometric information: the dimensions of the room and the positions of the sound sources and the microphone array relative to the room. The audio data were collected using two tetrahedral microphone arrays with different inter-microphone distances, with one or two simultaneously active sound sources. The sound sources reproduced human speech signals, so the presented dataset is suitable for studying speech recognition and direction-of-arrival estimation algorithms.

Keywords: audio dataset, sound source localization, room acoustics, tetrahedral microphone array, speech recognition, sound source separation.

Keywords: audio dataset, sound source localization, room acoustics, tetrahedral microphone array, speech recognition, source separation

How to Cite

Sakavičius, S. (2020). Dataset for evaluation of the performance of the methods of sound source localization algorithms using tetrahedral microphone arrays. Mokslas – Lietuvos Ateitis / Science – Future of Lithuania, 12. https://doi.org/10.3846/mla.2020.11462

Published in Issue

Feb 24, 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.
