Egas López, José Vicente and Gosztolya, Gábor (2023) Identifying Subjects Wearing a Mask from the Speech by Means of Encoded Speech Representations. LECTURE NOTES IN COMPUTER SCIENCE, 14102. pp. 131-140. ISSN 0302-9743
Abstract
In the current pandemic situation, one of the tools used to fight Covid-19 is wearing face masks in specific public spaces. As previous research on the Mask Augsburg Speech Corpus had verified, speech might be eligible to automatically determine whether the speaker is wearing a mask or not, but the performance of classification models is far from perfect at the moment. This paper employs seven transformer-based wav2vec2 models on this dataset, extracting the activations from the lower, convolutional blocks as well as from the higher, contextualized transformer blocks. We show that models obtained via the self-supervised pre-training phase lead to similar performances with both activation types. However, after fine-tuning the models for direct ASR purposes, the performance achieved by the contextualized representations dropped significantly. Here, we report the highest Unweighted Average Recall value on this corpus that was achieved by a standalone method.
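The abstract reports results as Unweighted Average Recall (UAR), which is the mean of the per-class recalls, so the "mask" and "no mask" classes count equally even if one is more frequent. The following is a minimal sketch of that metric only; the function name and the toy labels are hypothetical illustrations and are not taken from the paper or its corpus.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = defaultdict(int)    # number of reference items per class
    correct = defaultdict(int)  # correctly predicted items per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels for illustration only (not data from the paper).
y_true = ["mask", "mask", "mask", "mask", "clear", "clear"]
y_pred = ["mask", "mask", "mask", "clear", "clear", "mask"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recall is 3/4 for "mask" and 1/2 for "clear", so the UAR is below the plain accuracy, which is dominated by the larger class.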
Item Type: | Article |
---|---|
Uncontrolled Keywords: | speech analysis, surgical mask, wav2vec2, computational paralinguistics, transformers |
Subjects: | Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány; Q Science / természettudomány > QA Mathematics / matematika > QA76.76 Software Design and Development / Szoftvertervezés és -fejlesztés |
SWORD Depositor: | MTMT SWORD |
Depositing User: | MTMT SWORD |
Date Deposited: | 22 Aug 2024 08:23 |
Last Modified: | 22 Aug 2024 08:23 |
URI: | https://real.mtak.hu/id/eprint/203109 |