1
Ge Y, Tang C, Li H, Chen Z, Wang J, Li W, Cooper J, Chetty K, Faccio D, Imran M, Abbasi QH. A comprehensive multimodal dataset for contactless lip reading and acoustic analysis. Sci Data 2023; 10:895. [PMID: 38092796; PMCID: PMC10719268; DOI: 10.1038/s41597-023-02793-w] [Received: 05/08/2023] [Accepted: 11/27/2023] [Indexed: 12/17/2023]
Abstract
Small-scale motion detection using non-invasive remote sensing techniques has recently attracted significant interest in speech recognition. This dataset paper aims to support the enhancement and restoration of speakers' speech information from diverse data sources. We introduce a novel multimodal dataset, named RVTALL, that combines radio frequency (RF), visual, text, audio, laser, and lip landmark information. Specifically, the dataset contains 7.5 GHz channel impulse response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency-modulated continuous-wave (FMCW) data from a millimeter-wave (mmWave) radar, visual and audio recordings, lip landmarks, and laser data, offering a unique multimodal resource for speech recognition research. In addition, a depth camera is used to record the subject's lip landmarks and voice. The dataset provides approximately 400 minutes of annotated speech profiles collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and can support research on lip reading and multimodal speech recognition.
Affiliation(s)
- Yao Ge
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Chong Tang
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
  - Department of Security and Crime Science, University College London, London, WC1E 6BT, UK
- Haobo Li
  - School of Physics & Astronomy, University of Glasgow, Glasgow, G12 8QQ, UK
- Zikang Chen
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Jingyan Wang
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Wenda Li
  - School of Science and Engineering, University of Dundee, Dundee, DD1 4HN, UK
- Jonathan Cooper
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Kevin Chetty
  - Department of Security and Crime Science, University College London, London, WC1E 6BT, UK
- Daniele Faccio
  - School of Physics & Astronomy, University of Glasgow, Glasgow, G12 8QQ, UK
- Muhammad Imran
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Qammer H Abbasi
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
2
Xia NH, Xie CF, Liu YS, Wei B, Zhang HL, Guo Z, Zhang L, Wang MY, He XD. Two-dimensional displacement estimation of one-dimensional laser speckle images for detection of acoustic vibration. Appl Opt 2023; 62:1785-1790. [PMID: 37132926; DOI: 10.1364/ao.482438] [Indexed: 05/04/2023]
Abstract
Detecting and recovering audio signals with optical methods is an appealing research topic. Observing the movement of secondary speckle patterns is a convenient way to do this. To reduce computational cost and speed up processing, an imaging device can capture one-dimensional laser speckle images, but doing so sacrifices the ability to detect speckle movement along one axis. This paper proposes a laser microphone system that estimates two-dimensional displacement from one-dimensional laser speckle images, allowing audio signals to be regenerated in real time even when the sound source is rotating. Experimental results show that the system can reconstruct audio signals under complex conditions.
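The basic idea behind speckle-based audio recovery, tracking how the speckle pattern shifts between successive frames, can be illustrated with a minimal sketch. It assumes a generic technique (zero-mean cross-correlation with parabolic sub-pixel refinement) applied along the single axis a line-scan image can observe. The names `shift_1d` and `synthetic_speckle` and the synthetic test profile are hypothetical. This is not the authors' two-dimensional estimation method, which the abstract does not describe.

```python
import random

# Hypothetical illustration: generic tracking of speckle shift along ONE axis
# of a line-scan image. This is not the paper's 2D estimation method.


def shift_1d(ref, cur, max_shift):
    """Estimate the shift of `cur` relative to `ref`, in pixels.

    Positive values mean the speckle pattern moved toward higher indices.
    Uses zero-mean cross-correlation over lags -max_shift..max_shift, then
    a parabola through the peak and its neighbours for sub-pixel precision.
    """
    n = len(ref)
    mr = sum(ref) / n
    mc = sum(cur) / n
    r = [v - mr for v in ref]  # remove mean brightness so only the
    c = [v - mc for v in cur]  # speckle structure drives the match

    def score(s):
        # Average product of ref[i] and cur[i + s] over the overlapping pixels.
        total, count = 0.0, 0
        for i in range(n):
            j = i + s
            if 0 <= j < n:
                total += r[i] * c[j]
                count += 1
        return total / count if count else float("-inf")

    lags = list(range(-max_shift, max_shift + 1))
    scores = [score(s) for s in lags]
    k = max(range(len(lags)), key=lambda idx: scores[idx])
    best = float(lags[k])
    if 0 < k < len(lags) - 1:
        y0, y1, y2 = scores[k - 1], scores[k], scores[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0:
            best += 0.5 * (y0 - y2) / denom  # vertex of the fitted parabola
    return best


def synthetic_speckle(n, seed=0):
    """Hypothetical test profile: random intensities smoothed over 3 pixels
    so each 'grain' spans a few pixels, loosely mimicking real speckle."""
    rng = random.Random(seed)
    raw = [rng.random() for _ in range(n + 2)]
    return [(raw[i] + raw[i + 1] + raw[i + 2]) / 3.0 for i in range(n)]


if __name__ == "__main__":
    ref = synthetic_speckle(256)
    cur = [ref[0]] * 3 + ref[:-3]  # same pattern moved 3 pixels to the right
    print(f"estimated shift: {shift_1d(ref, cur, max_shift=8):.3f} px")
```

Applied frame by frame against a reference frame, the sequence of estimated shifts would act as a proxy for the surface vibration, and therefore for the audio waveform. A single line sensor, however, only sees motion along its own axis, which is the limitation the paper sets out to overcome.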