Developing multilingual text mining workflows in UIMA and U-compare

Georgios Kontonatsios, Ioannis Korkontzelos, Sophia Ananiadou

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

We present a generic, language-independent method for the construction of multilingual text mining workflows. The proposed mechanism is implemented as an extension of U-Compare, a platform built on top of the Unstructured Information Management Architecture (UIMA) that allows the construction, comparison and evaluation of interoperable text mining workflows. U-Compare previously supported only monolingual workflows. Building multilingual workflows poses challenging problems, such as representing multilingual document collections and executing language-dependent components in parallel. As an application of our method, we develop a multilingual workflow that extracts terms from a parallel collection using a new heuristic. For our experiments, we construct a parallel corpus consisting of approximately 188,000 PubMed article titles in French and English. Our application is evaluated against a popular monolingual term extraction method, C-value. © 2012 Springer-Verlag.
Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Lect. Notes Comput. Sci.
Place of Publication: Berlin, Heidelberg
Publisher: Springer Nature
Pages: 82-93
Number of pages: 11
Volume: 7337
ISBN (Print): 9783642311772
DOI: http://dx.doi.org/10.1007/978-3-642-31178-9_8
Publication status: Published - 2012
Event: 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012 - Groningen
Duration: 1 Jul 2012 → …

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg

Other

Other: 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012
City: Groningen
Period: 1/07/12 → …

Keywords

  • multilingual term extraction
  • multilingual text mining workflows
  • text mining
  • U-Compare
  • UIMA
