Transformer based Grapheme-to-Phoneme Conversion

Yolchuyeva, Sevinj and Németh, Géza and Gyires-Tóth, Bálint (2019) Transformer based Grapheme-to-Phoneme Conversion. In: 20th Annual Conference of the International Speech Communication Association INTERSPEECH 2019, Sept 15-19, 2019, Graz, Austria.

IS2019_Transformer_G2P_SY_1954.pdf - Published Version
Restricted to Registered users only until 1 January 2034.



The attention mechanism is one of the most successful techniques in deep-learning-based natural language processing (NLP). The transformer network architecture is based entirely on attention mechanisms: without any recurrent or convolutional layers, it outperforms sequence-to-sequence models in neural machine translation. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) into their pronunciations (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and on the NetTalk dataset. The results show that transformer-based G2P outperforms the convolutional approach in terms of word error rate, and that our results significantly exceed those of previous recurrent approaches (without attention) in both word and phoneme error rate on both datasets. Furthermore, the proposed model is much smaller than the previous approaches.
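The two metrics named in the abstract are commonly defined as follows in G2P work: phoneme error rate (PER) is the total Levenshtein edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and word error rate (WER) is the percentage of words whose predicted pronunciation is not exactly correct. The sketch below implements those common definitions; it is an assumption that the paper scores exactly this way (for instance, handling of CMUDict words with multiple reference pronunciations is not covered here), and the function names `edit_distance` and `error_rates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rates(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (PER, WER) in percent for parallel lists of reference/predicted phoneme sequences.

    Assumed definitions: PER = total edits / total reference phonemes;
    WER = share of words whose prediction differs from the reference at all.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    wrong_words = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    per = 100.0 * total_edits / total_phonemes
    wer = 100.0 * wrong_words / len(refs)
    return per, wer


# Usage with ARPAbet pronunciations in CMUDict style; the predictions are made up.
refs = [["K", "AE1", "T"], ["D", "AO1", "G"]]   # CAT, DOG
hyps = [["K", "AE1", "T"], ["D", "AA1", "G"]]   # second word has one wrong vowel
per, wer = error_rates(refs, hyps)
print(f"PER = {per:.2f}%, WER = {wer:.2f}%")
```

With one substituted phoneme out of six, PER is about 16.67%, while WER is 50% because one of the two words is wrong. This illustrates why WER is the stricter metric: a single phoneme error makes the whole word count as incorrect.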

Item Type: Conference or Workshop Item (Paper)
Subjects: T Technology / applied and engineering sciences > T2 Technology (General) / engineering sciences in general
Depositing User: Dr. Gyires-Tóth Bálint Pál
Date Deposited: 25 Sep 2019 21:50
Last Modified: 25 Sep 2019 21:50