Development and Evaluation of a Text-Analytics Algorithm for Automated Application of National COVID-19 Shielding Criteria in Rheumatology Patients

Meghna Jani, Ghada Alfattni, Maksim Belousov, Lynn Laidlaw, Yuanyuan Zhang, Michael Cheng, Karim Webb, Robyn Hamilton, Andrew S. Kanter, William G. Dixon, Goran Nenadic

Research output: Contribution to journal › Article › peer-review


At the beginning of the COVID-19 pandemic, the UK’s Scientific Committee issued extreme social distancing measures, termed ‘shielding’, aimed at a subpopulation deemed extremely clinically vulnerable to infection. National guidance for risk stratification was based on patients’ age, comorbidities, and immunosuppressive therapies, including biologics that are not captured in primary care records. This process required considerable clinician time to manually review outpatient letters. Our aim was to develop and evaluate an automated shielding algorithm by text-mining outpatient letter diagnoses and medications, reducing the need for future manual review.

Rheumatology outpatient letters from a large UK Foundation Trust were retrieved. Free-text diagnoses were processed using Intelligent Medical Objects® software (Concept Tagger), which utilised interface terminology for each condition mapped to SNOMED-CT codes. We developed the Medication Concept Recognition tool (MedCore Named Entity Recognition) to retrieve each medication's type, dose, duration and status (active/past) at the time of the letter. Age, diagnosis and medication variables were then combined to calculate a shielding score based on the most recent letter. The algorithm's performance was evaluated using clinical review as the gold standard. The time taken to deploy the developed algorithm on a larger patient subset was measured.
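The combination step described above (most recent letter per patient, then age, diagnosis and active-medication variables merged into a score) could be sketched roughly as follows. This is only an illustration: the `Letter` structure, the age threshold, the diagnosis and medication sets, and the one-point-per-criterion scoring are all hypothetical placeholders, not the national shielding criteria or the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Letter:
    """Hypothetical structured output of text-mining one outpatient letter."""
    patient_id: str
    letter_date: date
    age: int
    diagnosis_codes: frozenset  # mapped diagnosis codes (placeholders here)
    medications: tuple          # (name, status) pairs; status is "active" or "past"


# Placeholder rules for illustration only (NOT the real shielding criteria).
AGE_THRESHOLD = 70
RISK_DIAGNOSES = frozenset({"DX_A"})
RISK_MEDICATIONS = frozenset({"drug_a", "drug_b"})


def most_recent_letters(letters):
    """Keep only the latest letter for each patient."""
    latest = {}
    for letter in letters:
        current = latest.get(letter.patient_id)
        if current is None or letter.letter_date > current.letter_date:
            latest[letter.patient_id] = letter
    return latest


def shielding_score(letter):
    """Add one point per satisfied placeholder criterion."""
    score = 0
    if letter.age >= AGE_THRESHOLD:
        score += 1
    if letter.diagnosis_codes & RISK_DIAGNOSES:
        score += 1
    # Only medications marked active at the time of the letter count.
    active = {name for name, status in letter.medications if status == "active"}
    score += len(active & RISK_MEDICATIONS)
    return score


letters = [
    Letter("p1", date(2019, 6, 1), 71, frozenset({"DX_A"}), (("drug_a", "active"),)),
    Letter("p1", date(2020, 2, 1), 72, frozenset({"DX_A"}),
           (("drug_a", "past"), ("drug_b", "active"))),
    Letter("p2", date(2020, 1, 15), 40, frozenset(), ()),
]

scores = {pid: shielding_score(l) for pid, l in most_recent_letters(letters).items()}
print(scores)
```

In this toy input, the older letter for patient `p1` is ignored, and `drug_a` does not contribute to the score because the most recent letter records it as past rather than active.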

In total, 5,942 free-text diagnoses were extracted and mapped to SNOMED-CT, along with 13,665 free-text medications (n=803 patients). The automated algorithm demonstrated a sensitivity of 80% (95% CI: 75, 85%) and specificity of 92% (95% CI: 90, 94%). The positive likelihood ratio was 10 (95% CI: 8, 14), the negative likelihood ratio was 0.21 (95% CI: 0.16, 0.28), and the F1 score was 0.81. Evaluation of mismatches revealed that the algorithm performed correctly against the gold standard in most cases. The developed algorithm was then deployed on records from an additional 15,865 patients, which took 18 hours for data extraction and one hour to deploy.
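As a reading aid, the likelihood ratios follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below applies them to the rounded values reported above; the function name is illustrative, and the underlying confusion-matrix counts are not given in the abstract, so none are assumed.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded point estimates from the abstract: sensitivity 80%, specificity 92%.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.92)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

With these rounded inputs, LR+ comes out at 10 and LR− at about 0.22, close to the reported 0.21; the small difference is presumably because the published figure was computed from unrounded values.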

An automated algorithm for risk stratification has several advantages, including reducing the clinician time needed for manual review to allow more time for direct care, improving efficiency, and increasing transparency in individual patient communication. It has the potential to be adapted for future public health initiatives that require prompt automated review of hospital outpatient letters.
Original language: English
Journal: Annals of Rheumatic Diseases
Publication status: Accepted/In press - 26 Mar 2024

