REAL

Automatic structuring and correction suggestion system for Hungarian clinical records

Siklósi, Borbála and Orosz, György and Novák, Attila and Prószéky, Gábor (2012) Automatic structuring and correction suggestion system for Hungarian clinical records. In: 8th SaLTMiL Workshop on Creation and use of basic lexical resources for less-resourced languages, 2012.05.22, Istanbul, Turkey.

Abstract

The first steps in processing clinical documents are structuring and normalization. In this paper we demonstrate how we compensate for the lack of any structure in the raw data by automatically transforming simple formatting features into structural units. We then develop an algorithm to separate running text from tabular and numerical data. Finally, we generate correction suggestions for word forms recognized as incorrect. We also provide some evaluation results for using the system as an automatic corrector of input texts, which chooses the best suggestion from the generated list. Our method is based on the statistical characteristics of our Hungarian clinical data set and on the HUMor Hungarian morphological analyzer. We conclude that our algorithm cannot correct all mistakes by itself, but is a very powerful aid to the manual correction of Hungarian medical texts, with the goal of producing a correct text corpus for this domain.

Item Type: Conference or Workshop Item (Lecture)
Subjects: P Language and Literature > PH Finno-Ugrian, Basque languages and literatures > PH04 Hungarian language and literature
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Borbála Siklósi
Date Deposited: 07 Feb 2014 17:17
Last Modified: 07 Feb 2014 17:20
URI: http://real.mtak.hu/id/eprint/10202
