Siklósi, Borbála and Orosz, György and Novák, Attila and Prószéky, Gábor (2012) Automatic structuring and correction suggestion system for Hungarian clinical records. In: 8th SaLTMiL Workshop on Creation and use of basic lexical resources for less-resourced languages, 2012.05.22, Istanbul, Turkey.
Abstract
The first steps of processing clinical documents are structuring and normalization. In this paper we demonstrate how we compensate for the lack of any structure in the raw data by automatically transforming simple formatting features into structural units. We then developed an algorithm to separate running text from tabular and numerical data. Finally, we generated correction suggestions for word forms recognized as incorrect. We also provide some evaluation results for using the system to correct input texts automatically by choosing the best suggestion from the generated list. Our method is based on the statistical characteristics of our Hungarian clinical data set and on the HUMor Hungarian morphological analyzer. We conclude that our algorithm cannot correct all mistakes by itself, but it is a very powerful tool for supporting the manual correction of Hungarian medical texts in order to produce a correct text corpus for this domain.
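The suggestion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain frequency list stands in for both the corpus statistics and the morphological analyzer, it only considers candidates within one edit of the input, and the names `edits1`, `suggest`, `correct`, and the toy `freq` table are hypothetical.

```python
from collections import Counter

# Hungarian lowercase letters (single characters only), used to generate edits.
ALPHABET = "aábcdeéfghiíjklmnoóöőpqrstuúüűvwxyz"


def edits1(word):
    """Return every string within one edit (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, freq, max_suggestions=5):
    """Rank known word forms within one edit of `word` by frequency.

    Assumption: membership in `freq` stands in for acceptance by a
    morphological analyzer; a real system would query the analyzer instead.
    """
    if word in freq:
        return [word]  # already a known form, nothing to correct
    candidates = [w for w in edits1(word) if w in freq]
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda w: (-freq[w], w))[:max_suggestions]


def correct(word, freq):
    """Automatic mode: take the top suggestion, or keep the word if none exists."""
    suggestions = suggest(word, freq)
    return suggestions[0] if suggestions else word


# Toy frequency table (invented counts, for illustration only).
freq = Counter({"beteg": 50, "betegség": 20, "lázas": 10, "láz": 30})

print(suggest("láza", freq))  # ['láz', 'lázas']
print(correct("betg", freq))  # beteg
```

Returning the whole ranked list corresponds to the assisted manual correction use case, while `correct` mirrors the automatic mode in which the best suggestion is chosen without human review.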
Item Type: | Conference or Workshop Item (Lecture) |
---|---|
Subjects: | P Language and Literature > PH Finno-Ugrian, Basque languages and literatures > PH04 Hungarian language and literature; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Depositing User: | Borbála Siklósi |
Date Deposited: | 07 Feb 2014 17:17 |
Last Modified: | 03 Apr 2023 06:37 |
URI: | http://real.mtak.hu/id/eprint/10202 |