Personal profile


Yan is an applied statistician with a PhD in statistical epidemiology, an MSc in health data science and a BSc in mathematics. He is also an experienced statistical programmer: he has worked at a data management company and has conducted extensive statistical analysis in his MSc and PhD projects for 6 years. He has expertise in using electronic health records (EHRs) to conduct epidemiological studies. He is also experienced in conducting statistical analyses that meet Food and Drug Administration (FDA) requirements for clinical trials of new medicines, having worked on multiple projects analysing trial data. His current research focuses on assessing the generalisability of risk prediction models (both traditional risk prediction models and machine learning (AI) models) using EHRs from UK databases.

Methodological knowledge


I am experienced in using EHR data to assess the performance of risk prediction models at both the population level and the individual level. I am also familiar with using risk prediction models and other statistical methods to conduct epidemiological studies, e.g. to investigate the drivers of antibiotic over-prescription.


I am capable of combining the strengths of different programming languages (SAS, R, Python, C++ and Java) and of quickly overcoming the computational challenges that software and hardware pose in research. I am familiar with statistical programming using SAS procedures, model fitting with R packages, and machine learning model fitting and validation with Python.


As an experienced data scientist, I am very familiar with analysing large datasets. I have fitted and validated multiple risk prediction models, including both traditional statistical models and machine learning models, in a cohort of 3.6 million individuals. I designed a workflow that combines the advantages of different programming languages to conduct efficient statistical analysis on large data.

Research interests

Currently I have three main research interests:

  1. The generalisability and clinical utility of traditional statistical risk prediction models, and how their performance at the individual level could be improved, mainly with cardiovascular disease as an exemplar.
  2. The clinical implementation of machine learning (AI) models for risk prediction, and how to take advantage of their strengths while working around their disadvantages.
  3. The drivers of antibiotic over-prescription, including what the main drivers of antibiotic prescribing are, how prescribing could best be reduced, and any other impacts of such interventions on antibiotic prescribing.


I taught new employees SAS programming for clinical trials.

I assist in teaching master's students how to program with R.

I supervise master's students on their dissertations.

I taught master's students an introduction to clinical risk prediction modelling with traditional statistical modelling and machine learning.

Other research

I am also a part-time video editor and photo designer, with the aim of making educational course content more interesting through memes, animations and comics.

I am also interested in independent game development with Python. I hope to bring more educational video games into education.

Expertise related to UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This person’s work contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being

Education/Academic qualification

Doctor of Philosophy, The University of Manchester

1 Sept 2017 to 1 Sept 2020

Award Date: 1 Sept 2020

Master of Science, Health data science, The University of Manchester

1 Sept 2016 to 1 Sept 2017

Award Date: 1 Sept 2017

Bachelor of Science, Mathematics and applied mathematics, Sichuan University

1 Sept 2009 to 1 Sept 2013

Award Date: 1 Sept 2013

Areas of expertise

  • Q Science (General)
  • Epidemiology
  • statistical modelling
  • Statistics
  • QA75 Electronic computers. Computer science
  • programming


  • Risk prediction
  • EHR
  • CVD
  • Epidemiology
  • Statistics
  • Machine Learning
  • Artificial Intelligence


Dive into the research topics where Yan Li 李彦 is active. These topic labels come from the works of this person. Together they form a unique fingerprint.