REAL

Deep Learning-Based Analysis of Ancient Greek Literary Texts in English Version: A Statistical Model Based on Word Frequency and Noise Probability for the Classification of Texts

Gál, Zoltán and Tóth, Erzsébet (2024) Deep Learning-Based Analysis of Ancient Greek Literary Texts in English Version: A Statistical Model Based on Word Frequency and Noise Probability for the Classification of Texts. INFOCOMMUNICATIONS JOURNAL, 16 (Specia). pp. 2-11. ISSN 2061-2079

InfocomJournal_2024_SpecISS_CogInf_CogAspVR_1.pdf - Published Version

Abstract

In this paper we present a methodology for clustering texts based on word frequency in the English translations of selected ancient Greek texts. As a basis for categorizing the literary masterpieces, we used the classification system of the ancient Library of Alexandria, devised by the prominent Greek scholar-poet Callimachus in the 3rd century BC. In our content analysis, we determined a triplet of values a, b, c describing a power function that appropriately fits the curve given by the word frequencies in the texts. In addition, we identified 16 special features of the texts that correspond to the token categories investigated in each text, such as the part of speech of a word in context, numerals, subordinating conjunctions, symbols, etc. We developed a cognitive model in which several hundred subtexts were used for supervised learning with the aim of subtext class recognition. For 200 subtexts, the triplet of a, b, c values, the classes of the subtexts, and their 16-dimensional feature vectors were learnt by a Recurrent Neural Network (RNN). It turned out that the Long Short-Term Memory RNN could efficiently predict which class a chosen subtext belongs to without considering the interpretation of the content. The influence of the non-zero error rate of new communication services on the meaning of the transferred texts was also investigated. The impact of noise on classification accuracy was found to depend linearly on the character error rate.
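The two inputs the abstract relies on, a rank-ordered word frequency curve and text corrupted at a given character error rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenization rule, and the noise model (independent substitution by a random lowercase letter) are assumptions made here for illustration only, and the sample sentence is invented.

```python
import random
import re
import string
from collections import Counter


def rank_frequencies(text):
    """Return word counts sorted from most to least frequent.

    Assumption: words are runs of lowercase letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [count for _, count in Counter(words).most_common()]


def add_character_noise(text, error_rate, seed=0):
    """Corrupt each character independently with probability error_rate.

    Assumption: a corrupted character is replaced by a different,
    uniformly chosen lowercase letter (a simple substitution model).
    """
    rng = random.Random(seed)
    noisy = []
    for ch in text:
        if rng.random() < error_rate:
            choices = [c for c in string.ascii_lowercase if c != ch]
            noisy.append(rng.choice(choices))
        else:
            noisy.append(ch)
    return "".join(noisy)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"  # invented sample text
    print(rank_frequencies(sample))
    print(add_character_noise(sample, error_rate=0.1))
```

A power function could then be fitted to the values returned by `rank_frequencies`, and the classifier re-evaluated on the output of `add_character_noise` for a range of error rates. The abstract does not give the exact functional form of the power function, so the fitting step is left out of this sketch.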

Item Type: Article
Uncontrolled Keywords: deep learning, old Greek literary texts, Pinakes, automatic content analysis, text classification, Recurrent Neural Network (RNN), Long-Short Term Memory, noisy texts
Subjects: P Language and Literature / nyelvészet és irodalom > PN Literature (General) / irodalom általában
Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 15 Jul 2024 12:35
Last Modified: 15 Jul 2024 12:35
URI: https://real.mtak.hu/id/eprint/200159
