Abstract
Within the scientific literature, tables are commonly used to
present factual and statistical information in a compact way that is easy
for readers to digest. The ability to “understand” the structure of tables is
key for information extraction in many domains. However, the complexity
and variety of presentation layouts and value formats make it difficult to
automatically extract the roles and relationships of table cells. In this paper,
we present a model that structures tables in a machine-readable way and
a methodology to automatically disentangle and transform tables into the
modelled data structure. The method was tested in the domain of clinical
trials: it achieved an F-score of 94.26 % for cell function identification and
94.84 % for identification of inter-cell relationships.
Original language | English |
---|---|
Title of host publication | Natural Language Processing and Information Systems |
Subtitle of host publication | 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24, 2016, Proceedings |
Place of Publication | Switzerland |
Publisher | Springer Nature |
Pages | 162-174 |
Number of pages | 13 |
Volume | 9612 |
ISBN (Electronic) | 978-3-319-41754-7 |
ISBN (Print) | 978-3-319-41753-0 |
Publication status | Published - 17 Jun 2016 |
Event | 21st International Conference on Applications of Natural Language to Information Systems - Media City, Salford, United Kingdom. Duration: 22 Jun 2016 → 24 Jun 2016. Conference number: 21. http://www.salford.ac.uk/conferencing-at-salford/conference-management/current-conference/nldb-conference |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 9612 |
Conference
Conference | 21st International Conference on Applications of Natural Language to Information Systems |
---|---|
Abbreviated title | NLDB 2016 |
Country/Territory | United Kingdom |
City | Salford |
Period | 22/06/16 → 24/06/16 |
Keywords
- Table mining
- Text mining
- Data management
- Data modelling
- Natural language processing