TY - JOUR
T1 - Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research
T2 - Focus Group Interviews With Stakeholders
AU - Fitzpatrick, Natalie K
AU - Dobson, Richard
AU - Roberts, Angus
AU - Jones, Kerina
AU - Shah, Anoop D
AU - Nenadic, Goran
AU - Ford, Elizabeth
N1 - ©Natalie K Fitzpatrick, Richard Dobson, Angus Roberts, Kerina Jones, Anoop D Shah, Goran Nenadic, Elizabeth Ford. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 03.05.2023.
Funding Information:
This work was funded by Healtex and Health Data Research UK. The funders have no role in developing the content of this manuscript. The authors would like to acknowledge and thank all the focus group participants for their expert knowledge and continued support of this work and Hopkins Van Mil for facilitating the focus groups. ADS is supported by NIHR (AI_AWARD01864 and COV-LT-0009), UKRI (Horizon Europe Guarantee for DataTools4Heart) and British Heart Foundation Accelerator Award (AA/18/6/24223).
PY - 2023/5/3
Y1 - 2023/5/3
AB - BACKGROUND: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. OBJECTIVE: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. METHODS: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). RESULTS: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. CONCLUSIONS: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.
KW - consent
KW - databank
KW - electronic health records
KW - free text
KW - governance
KW - natural language processing
KW - public involvement
KW - unstructured text
U2 - 10.2196/45534
DO - 10.2196/45534
M3 - Article
C2 - 37133927
SN - 2291-9694
VL - 11
SP - e45534
JO - JMIR Medical Informatics
JF - JMIR Medical Informatics
M1 - e45534
ER -