Optimizing class priors to improve the detection of social signals in audio data

Gosztolya, Gábor (2022) Optimizing class priors to improve the detection of social signals in audio data. Engineering Applications of Artificial Intelligence, 107. No. 104541. ISSN 0952-1976

Text
2022-eaai.pdf
Restricted to Repository staff only

Official URL: https://doi.org/10.1016/j.engappai.2021.104541

Abstract

To detect social signals such as laughter and filler events in an audio recording, the most straightforward approach is to use a Hidden Markov Model (HMM) or, as is common these days, a Hidden Markov Model/Deep Neural Network (HMM/DNN) hybrid. HMM/DNNs, however, perform best if the DNN outputs are first scaled by dividing them by the a priori class probabilities, before a dynamic programming (Viterbi) beam search is applied. These a priori class probability values (or priors for short) are usually estimated by counting the frame occurrences of each class in the training set and dividing these counts by the total number of frames. Such estimates, however, may be suboptimal for a number of reasons, ranging from imprecise labeling to the overconfidence of DNNs. In this study we show empirically that more reliable scaling factors can be obtained by optimization. Using this approach, we achieved a 6–9% relative error reduction at both the frame level and the segment level on a public database of spontaneous English mobile phone conversations.
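The prior-scaling idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: it uses pure Python, a toy HMM with one state per class, frame accuracy on held-out frames as the objective, and a simple (1+1) random search in log-prior space as a stand-in for CMA-ES (which the keywords name). By Bayes' rule, p(x|c) is proportional to p(c|x)/p(c), which is why dividing DNN posteriors by the priors yields scaled likelihoods for the HMM. The function names `estimate_priors`, `scaled_log_likelihoods`, `viterbi`, `frame_accuracy`, and `optimize_priors` are hypothetical.

```python
import math
import random
from collections import Counter


def estimate_priors(frame_labels, num_classes):
    """Count-based priors: occurrences of each class divided by the number of frames."""
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def scaled_log_likelihoods(posteriors, priors):
    """Hybrid HMM/DNN scaling: log p(x|c) = log p(c|x) - log p(c) + const.

    Assumes strictly positive posteriors (e.g. softmax outputs) and priors.
    """
    return [[math.log(p) - math.log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_obs, log_trans):
    """Most likely state sequence; a uniform initial state distribution is assumed."""
    n = len(log_trans)
    score = list(log_obs[0])
    back = []  # back[t][j] = best predecessor of state j at frame t + 1
    for obs in log_obs[1:]:
        ptrs, new_score = [], []
        for j in range(n):
            i = max(range(n), key=lambda k: score[k] + log_trans[k][j])
            ptrs.append(i)
            new_score.append(score[i] + log_trans[i][j] + obs[j])
        back.append(ptrs)
        score = new_score
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def frame_accuracy(path, labels):
    """Fraction of frames whose decoded state matches the reference label."""
    return sum(p == y for p, y in zip(path, labels)) / len(labels)


def optimize_priors(posteriors, labels, log_trans, init_priors,
                    iters=200, sigma=0.3, seed=0):
    """Tune the scaling factors on held-out frames.

    Stand-in for CMA-ES (hypothetical choice): a (1+1) random search over
    log-priors that keeps a candidate whenever its frame accuracy is at least
    as good as the best found so far.
    """
    rng = random.Random(seed)

    def evaluate(log_priors):
        priors = [math.exp(v) for v in log_priors]
        path = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans)
        return frame_accuracy(path, labels)

    best = [math.log(q) for q in init_priors]
    best_score = evaluate(best)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        cand_score = evaluate(cand)
        if cand_score >= best_score:
            best, best_score = cand, cand_score
    return [math.exp(v) for v in best], best_score
```

In this sketch, count-based priors from `estimate_priors` on the training labels would serve as `init_priors`, and `optimize_priors` would then be run on DNN outputs and labels of a development set. Because multiplying every prior by the same constant shifts all state scores of a frame equally, only the ratios between the values affect the Viterbi path; the optimized values therefore need not sum to one and are better viewed as scaling factors than as probabilities.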

Item Type:	Article
Uncontrolled Keywords:	Audio processing, Social signals, Laughter detection, Filler events, Deep neural networks, a priori estimates, Optimization, CMA-ES
Subjects:	T Technology / applied, engineering sciences > T2 Technology (General) / engineering sciences in general
SWORD Depositor:	MTMT SWORD
Depositing User:	MTMT SWORD
Date Deposited:	26 Sep 2022 11:38
Last Modified:	26 Sep 2022 11:38
URI:	http://real.mtak.hu/id/eprint/149746
