Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

Bernadeta Griciūtė, Lifeng Han, Goran Nenadic

Research output: Contribution to conference › Paper › peer-review

Abstract

Topic Modelling (TM) is a natural language processing (NLP) method for discovering topics in a collection of documents. Being an unsupervised method, it is a valuable tool for summarising the main topics and topic changes in large quantities of data. In this study, we apply two prevalent topic modelling techniques, Latent Dirichlet Allocation (LDA) and BERTopic, to analyse the change of topics in Swedish newspaper articles about COVID-19. We describe the corpus we created, comprising 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17th January 2020 to 13th March 2021. We hope this work can be an asset for grounding applications of topic modelling and can inspire similar case studies in an era of pandemics, supporting socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/aaronlifenghan/Swed-Covid-TM.

Original language: English
Pages: 627-636
Number of pages: 10
Publication status: Published - 26 Jun 2023
Event: 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI) - Houston, TX, USA
Duration: 26 Jun 2023 to 29 Jun 2023

Keywords

  • BERTopic
  • COVID-19
  • Latent Dirichlet Allocation (LDA)
  • Swedish Newspaper Articles
  • Topic Modelling
