Grapheme-to-Phoneme Conversion with Convolutional Neural Networks

Yolchuyeva, Sevinj and Németh, Géza and Gyires-Tóth, Bálint (2019) Grapheme-to-Phoneme Conversion with Convolutional Neural Networks. Applied Sciences, 9 (6). ISSN 2076-3417

G2P_CNN_based_Applied_Science.pdf - Published Version

Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of a word from its written form. It plays an essential role in natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNNs) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P model with residual connections and, in addition, a model that uses a CNN (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, as well as phoneme and word error rates, were evaluated on the public CMUDict dataset for US English, and the best-performing CNN-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
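Phoneme error rate (PER) and word error rate (WER) are the two accuracy metrics mentioned above. A common definition in the G2P literature is: PER is the total Levenshtein (edit) distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and WER is the fraction of words whose predicted pronunciation contains at least one error. The sketch below assumes that definition; the paper's exact scoring procedure may differ. The function names `edit_distance` and `per_wer` and the sample pronunciations are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete r from ref
                curr[j - 1] + 1,           # insert h into ref
                prev[j - 1] + int(r != h), # substitute (or match at no cost)
            )
        prev = curr
    return prev[-1]


def per_wer(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (PER, WER) as fractions, assuming the common definitions above."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and of equal length")
    total_errors = 0
    total_phonemes = 0
    wrong_words = 0
    for ref, hyp in zip(refs, hyps):
        dist = edit_distance(ref, hyp)
        total_errors += dist
        total_phonemes += len(ref)
        wrong_words += int(dist > 0)  # a word counts as wrong if any phoneme differs
    return total_errors / total_phonemes, wrong_words / len(refs)


# Usage with ARPAbet-style symbols (stress omitted); the predictions are made up.
refs = [["K", "AE", "T"], ["D", "AO", "G"]]
hyps = [["K", "AE", "T"], ["D", "AA", "G"]]
per, wer = per_wer(refs, hyps)
print(round(per, 4), wer)  # prints: 0.1667 0.5
```

Because WER counts a whole word as wrong after a single phoneme error, it is always at least as large as PER on the same data, which is why both are reported.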

Item Type: Article
Subjects: T Technology / applied and technical sciences > T2 Technology (General) / technical sciences in general
Depositing User: Dr. Gyires-Tóth Bálint Pál
Date Deposited: 26 Sep 2019 05:45
Last Modified: 26 Sep 2019 05:45