Ibrahimov, Ibrahim and Zainkó, Csaba and Gosztolya, Gábor (2025) Conformer-based Ultrasound-to-Speech Conversion. In: Annual Conference of the International Speech Communication Association, INTERSPEECH 2025. Interspeech. International Speech Communication Association (ISCA), Dublin, pp. 5578-5582.
Text: 2025-interspeech-ssi-conformer.pdf - Published Version (3MB)
Abstract
Deep neural networks have shown promise for the ultrasound-to-speech conversion task in Silent Speech Interfaces. In this work, we applied two Conformer-based DNN architectures (Base and one with bi-LSTM) to this task. Speaker-specific models were trained on the data of four speakers from the Ultrasuite-Tal80 dataset, and the generated mel spectrograms were converted to audio waveforms using a HiFi-GAN vocoder. Compared to a standard 2D-CNN baseline, objective measurements (MSE and mel cepstral distortion) showed no statistically significant improvement for either model. However, a MUSHRA listening test revealed that the Conformer with bi-LSTM provided better perceptual quality, while the Conformer Base matched the performance of the baseline and trained 3× faster due to its simpler architecture. These findings suggest that Conformer-based models, especially the Conformer with bi-LSTM, offer a promising alternative to CNNs for ultrasound-to-speech conversion. © 2025 Elsevier B.V., All rights reserved.
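
For readers unfamiliar with the two objective measures named in the abstract, the following is a minimal sketch of how MSE and mel cepstral distortion (MCD) are commonly computed. It is an illustration only and not the authors' evaluation code: the function names are hypothetical, the inputs are assumed to be time-aligned frame-by-coefficient lists, and the widely used MCD convention of skipping the 0th (energy) coefficient is an assumption about the setup.

```python
import math


def mse(ref, pred):
    """Mean squared error over two equally shaped frame-by-bin sequences."""
    diffs = [(r - p) ** 2
             for ref_frame, pred_frame in zip(ref, pred)
             for r, p in zip(ref_frame, pred_frame)]
    return sum(diffs) / len(diffs)


def mel_cepstral_distortion(ref_mcep, pred_mcep):
    """Average MCD in dB over time-aligned mel-cepstral frames.

    Assumption: coefficient 0 (energy) is excluded, as is common practice.
    Per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2).
    """
    scale = 10.0 / math.log(10.0)
    per_frame = []
    for ref_frame, pred_frame in zip(ref_mcep, pred_mcep):
        sq_dist = sum((r - p) ** 2
                      for r, p in zip(ref_frame[1:], pred_frame[1:]))
        per_frame.append(scale * math.sqrt(2.0 * sq_dist))
    return sum(per_frame) / len(per_frame)
```

Lower values are better for both measures; identical inputs give 0 in each case.
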
| Item Type: | Book Section |
|---|---|
| Additional Information: | This study was supported by the NRDI Office of the Hungarian Ministry of Innovation and Technology (grant TKP2021-NVA-09), and within the framework of the Artificial Intelligence National Laboratory Program (RRF-2.3.1-21-2022-00004) and the European Union’s HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI). |
| Uncontrolled Keywords: | conformer, ultrasound tongue imaging, silent speech synthesis |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| SWORD Depositor: | MTMT SWORD |
| Depositing User: | MTMT SWORD |
| Date Deposited: | 04 Feb 2026 15:27 |
| Last Modified: | 04 Feb 2026 15:27 |
| URI: | https://real.mtak.hu/id/eprint/233304 |