Trencsényi, Réka Eszter and Czap, László (2025) Artikulációs beszédszintézis megvalósítása dinamikus ultrahangfelvételek alapján [Implementation of articulatory speech synthesis based on dynamic ultrasound recordings]. BESZÉDTUDOMÁNY / SPEECH SCIENCE, 5 (1). pp. 90-116. ISSN 2732-3773
Text: 17316-Cikkszovege-76170-1-10-20250415.pdf - Published Version (1MB)
Abstract
Starting from 2D dynamic ultrasound sources recording the movement of the vocal organs and the speech signal of the speaker in a simultaneous and synchronised manner, we produce machine speech by means of artificial intelligence. As visual objects, we use tongue and palate contours fitted automatically to the anatomic boundaries of the ultrasound images, and for training, we extract geometric information from these contours, as the change of their shape fundamentally describes the movement of the vocal organs during articulation. The geometric data consist of radial distances between the tongue and palate contours and coefficients of the discrete cosine transform of the curves, respectively. Relying on this dataset, parameters connected to the acoustic content of the speech signal are trained by the network. These parameters can be interpreted in the framework of the acoustic tube model of the vocal tract, and according to this, reflection coefficients and areas of the articulation channel are to be trained. In this study, sentences are synthesised using linear predictive coding and the acoustic tube model.
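The abstract mentions two concrete representations: DCT coefficients of the tongue/palate contours, and reflection coefficients of the acoustic tube model. A minimal sketch of both, with entirely hypothetical sample data (the contour and area values below are not from the paper), might look like this; the reflection coefficients follow the standard Kelly-Lochbaum relation between adjacent tube-section areas:

```python
import numpy as np
from scipy.fft import dct, idct

# Hypothetical tongue contour: radial distances sampled at 64 fixed angles
# (stands in for a contour fitted to an ultrasound frame).
rng = np.random.default_rng(0)
angles = np.linspace(0.0, np.pi, 64)
contour = 2.0 + 0.5 * np.sin(angles) + 0.05 * rng.standard_normal(64)

# Compact DCT representation: keep only the first K coefficients,
# which capture the smooth overall shape of the curve.
K = 8
coeffs = dct(contour, norm="ortho")
compact = np.zeros_like(coeffs)
compact[:K] = coeffs[:K]
reconstructed = idct(compact, norm="ortho")

# Reflection coefficients from tube-section areas (Kelly-Lochbaum):
#   k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i)
# Example areas are illustrative only.
areas = np.array([1.0, 1.5, 2.2, 1.8, 0.9])
k = (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])
```

For a smooth contour, the truncated DCT reconstruction stays close to the original curve, and each reflection coefficient lies strictly inside (-1, 1), as required for a stable lattice synthesis filter.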
| Item Type: | Article |
|---|---|
| Subjects: | P Language and Literature / nyelvészet és irodalom > P0 Philology. Linguistics / filológia, nyelvészet |
| SWORD Depositor: | MTMT SWORD |
| Depositing User: | MTMT SWORD |
| Date Deposited: | 08 Jan 2026 09:14 |
| Last Modified: | 08 Jan 2026 09:14 |
| URI: | https://real.mtak.hu/id/eprint/231692 |