Halmai, Dániel and Gosztolya, Gábor (2025) Spoken Emotion Recognition Using Soft Labels. In: Speech and Computer. Lecture Notes in Computer Science, 16187 (16187). Springer Nature Switzerland, Cham, pp. 101-112. ISBN 9783032079558; 9783032079565 (In Press)
Text: 2025-specom-emotion.pdf (632kB), restricted to repository staff only
Abstract
In Spoken Emotion Recognition (SER), the task is to identify the speaker's emotion from the speech signal. Emotion, however, is a far more complex phenomenon than a single emotional category can describe, and although training a neural network on such categories (i.e. hard labels) clearly works in practice, it also leads to information loss. In this study we instead calculate soft labels for each speech recording from the votes of the annotators, and train our neural network models as a regression task on a large SER corpus (MSP Podcast). In our results, this procedure lowered the macro-averaged recall and F1-score, but improved classification accuracy, macro-averaged precision, and all metrics aggregated by weighting the class-wise values by class frequency. Our analysis suggests that this behaviour can probably be attributed to the notably lower mean training targets of the less frequent emotions, which caused our neural networks (trained in regression mode) to consistently output low probability estimates for these classes and to focus on the more frequent emotion categories instead.
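The idea of deriving soft labels from annotator votes can be illustrated with a minimal sketch. It assumes the soft label is simply the relative frequency of each emotion among a recording's votes; the abstract does not state the exact formula, so this is one plausible reading rather than the authors' method. The emotion inventory `EMOTIONS`, the helper names and the toy annotations are all hypothetical and are not taken from MSP Podcast.

```python
from collections import Counter

# Hypothetical emotion inventory; the label set used in the paper is not given here.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def soft_label(votes, classes=EMOTIONS):
    """Turn one recording's annotator votes into a relative-frequency vector."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def hard_label(votes, classes=EMOTIONS):
    """Majority-vote category (ties broken by class order), for comparison."""
    counts = Counter(votes)
    return max(classes, key=lambda c: counts[c])


def mean_targets(soft_labels):
    """Average regression target per class over a set of recordings."""
    n = len(soft_labels)
    return [sum(column) / n for column in zip(*soft_labels)]


# Toy annotations, invented for illustration only.
annotations = [
    ["neutral", "neutral", "happy", "neutral"],
    ["happy", "happy", "neutral", "happy"],
    ["neutral", "sad", "neutral", "neutral"],
    ["angry", "neutral", "neutral", "angry"],
]

targets = [soft_label(votes) for votes in annotations]
means = mean_targets(targets)

for name, value in zip(EMOTIONS, means):
    print(f"{name}: mean target {value:.4f}")
```

In this toy data the rarely chosen categories ("sad", "angry") end up with much smaller mean targets than "neutral". This is the kind of imbalance that the abstract names as the probable reason why regression-trained networks output low estimates for infrequent emotions.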
| Item Type: | Book Section |
|---|---|
| Subjects: | Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány |
| SWORD Depositor: | MTMT SWORD |
| Depositing User: | MTMT SWORD |
| Date Deposited: | 26 Nov 2025 07:26 |
| Last Modified: | 26 Nov 2025 07:26 |
| URI: | https://real.mtak.hu/id/eprint/229892 |