PMC clinical trial disentangled tables data set



The database was created by processing 6558 clinical trial articles from the 2014 PubMed Central (PMC) public sample. The articles were obtained by matching PMC documents with their Medline records, and only documents whose Medline publication type contained the word "Clinical" were selected.
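The selection step can be sketched in Python as below. This is a minimal illustration, not the pipeline actually used to build the dataset. The `MedlineRecord` and `PmcArticle` types and the `select_clinical_articles` function are hypothetical, and pairing PMC and Medline documents by PubMed ID (PMID) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified record types; real Medline and PMC XML
# documents carry many more fields than these.
@dataclass
class MedlineRecord:
    pmid: str
    publication_types: list[str]


@dataclass
class PmcArticle:
    pmcid: str
    pmid: str | None  # assumption: some PMC articles have no PMID


def select_clinical_articles(pmc_articles, medline_records):
    """Keep PMC articles whose matching Medline record has a publication
    type containing the word 'Clinical' (e.g. 'Clinical Trial')."""
    # Index Medline records by PMID so each PMC article is matched in O(1).
    by_pmid = {rec.pmid: rec for rec in medline_records}
    selected = []
    for article in pmc_articles:
        record = by_pmid.get(article.pmid) if article.pmid else None
        if record is None:
            continue  # no Medline match, so the article cannot be classified
        if any("Clinical" in ptype for ptype in record.publication_types):
            selected.append(article)
    return selected


medline = [
    MedlineRecord("1", ["Journal Article", "Clinical Trial"]),
    MedlineRecord("2", ["Review"]),
]
pmc = [PmcArticle("PMC10", "1"), PmcArticle("PMC20", "2"), PmcArticle("PMC30", None)]
print([a.pmcid for a in select_clinical_articles(pmc, medline)])  # ['PMC10']
```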

The documents were processed with the TableDisentangler tool, which creates the majority of the database. The documents were then annotated with UMLS concepts using MetaMap, through a script that is part of TableDisentangler and handles communication with MetaMap. Three information extraction case studies were performed on these data:
- Extraction of patients' age
- Extraction of gender distribution
- Extraction of FEV1 measures (this has been performed for COPD studies only)
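The core idea of disentangling a table, pairing every data cell with the headers that describe it, can be sketched as follows. This is a simplified assumption, not TableDisentangler's implementation. The hypothetical `disentangle` function handles only a plain grid whose first row holds column headers and whose first column holds row headers, and its output field names are illustrative rather than the database's actual schema.

```python
def disentangle(table):
    """Turn a simple grid table into one record per data cell.

    Assumes (hypothetically) that row 0 holds column headers and
    column 0 holds row headers (stubs); tables with spanning or
    multi-level headers need considerably more work.
    """
    header_row = table[0]
    cells = []
    for r, row in enumerate(table[1:], start=1):
        stub = row[0]  # row header for every cell in this row
        for c in range(1, len(row)):
            cells.append({
                "row": r,
                "column": c,
                "row_header": stub,
                "column_header": header_row[c],
                "value": row[c],
            })
    return cells


table = [
    ["Characteristic", "Placebo", "Drug"],
    ["Age (years)", "54.2 ± 9.1", "55.0 ± 8.7"],
    ["Male, n (%)", "40 (51.3)", "38 (48.7)"],
]
for cell in disentangle(table):
    print(cell["row_header"], "|", cell["column_header"], "|", cell["value"])
```

Once each cell carries its own headers, extraction rules can inspect a single cell record without re-parsing the table layout.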

The information extraction case studies were performed using the TabInOut tool, which generates rules for extracting information from tables.
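A rule for the age and gender case studies might combine a pattern on the row header with a pattern on the cell value, as in the sketch below. The regular expressions and the functions `extract_age` and `extract_male_count` are hypothetical illustrations; they do not reproduce TabInOut's rule format or the rules used in the case studies.

```python
import re

# Hypothetical rule patterns, not TabInOut's rule format.
# Word boundaries stop "Stage" or "Female" from matching.
AGE_HEADER = re.compile(r"\bage\b", re.IGNORECASE)
MALE_HEADER = re.compile(r"\b(male|men)\b", re.IGNORECASE)
# "mean ± SD", also accepting "+/-" as the separator.
MEAN_SD = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)")
# "count (percent)", with an optional % sign inside the parentheses.
COUNT_PCT = re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%?\s*\)")


def extract_age(row_header, value):
    """Return (mean, sd) if the row is about age and the cell is 'mean ± SD'."""
    if not AGE_HEADER.search(row_header):
        return None
    m = MEAN_SD.match(value)
    return (float(m.group(1)), float(m.group(2))) if m else None


def extract_male_count(row_header, value):
    """Return (count, percent) if the row is about males and the cell is 'n (%)'."""
    if not MALE_HEADER.search(row_header):
        return None
    m = COUNT_PCT.match(value)
    return (int(m.group(1)), float(m.group(2))) if m else None


print(extract_age("Age (years)", "54.2 ± 9.1"))       # (54.2, 9.1)
print(extract_male_count("Male, n (%)", "40 (51.3)"))  # (40, 51.3)
```

Matching the header and the value separately keeps a rule from firing on, for example, a "Stage" row or a "Female" row, because the word-boundary anchors require "age" and "male" to appear as whole words.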

The database schema can be viewed at the following link:

Files included in the dataset:
- - This file contains the raw XML clinical documents from PMC.
- - Contains the database of data processed with TableDisentangler and TabInOut.
Date made available: 19 May 2017
Publisher: Mendeley Data
