MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation

Dataset

Description

MedNorm is a corpus of 27,979 textual descriptions simultaneously mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across biomedical and social media domains.
The cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that capture semantic similarities between concepts from different medical terminologies.
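
As a rough illustration of how such embeddings might be compared across terminologies, the sketch below ranks concepts by cosine similarity. It is a minimal example under stated assumptions, not the authors' method: the on-disk format of the released embeddings is not described here, so the vectors are random toy data, and the concept identifiers and the helper names `cosine_similarity` and `nearest_concepts` are hypothetical.

```python
import math
import random

EMBEDDING_DIM = 64  # dimensionality stated in the dataset description


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_concepts(query_id, embeddings, k=3):
    """Rank all other concepts by cosine similarity to the query concept."""
    query = embeddings[query_id]
    scores = [
        (concept_id, cosine_similarity(query, vector))
        for concept_id, vector in embeddings.items()
        if concept_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data only: placeholder identifiers and random vectors,
# NOT real MedNorm embeddings or real terminology codes.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
toy_embeddings = {
    "MEDDRA:placeholder_a": base,
    # A slightly perturbed copy stands in for a closely related concept.
    "SNOMED:placeholder_b": [x + 0.05 * rng.gauss(0.0, 1.0) for x in base],
    # An independent random vector stands in for an unrelated concept.
    "SNOMED:placeholder_c": [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)],
}

ranked = nearest_concepts("MEDDRA:placeholder_a", toy_embeddings, k=2)
for concept_id, score in ranked:
    print(f"{concept_id}\t{score:.3f}")
```

With real embeddings, the toy dictionary would be replaced by vectors loaded from the released files, keyed by their actual concept identifiers.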

For more details, see the paper entitled "MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation".
Date made available: 3 Jun 2019
Publisher: Mendeley Data

Research Beacons, Institutes and Platforms

  • Manchester Institute of Biotechnology
