Crude oil price forecasting incorporating news text

Yun Bai, Xixi Li, Suling Jia

Research output: Contribution to journal › Article › peer-review

News headlines are short and sparse, and they can be arbitrary, noisy, and ambiguous. This makes it difficult for the classic topic model LDA (latent Dirichlet allocation), which was designed for long text, to extract knowledge from them. Nonetheless, some existing research on text-based crude oil forecasting uses LDA to extract topics from news headlines; the resulting mismatch between the short text and the topic model degrades forecasting performance. Constructing high-quality features from news headlines with methods suited to short text is therefore crucial in crude oil forecasting. To tackle this issue, this paper introduces two novel indicators, one of topic and one of sentiment, for short and sparse text data. Empirical experiments show that AdaBoost.RT with our proposed text indicators, which give a more comprehensive characterization of the short and sparse text data, outperforms the other benchmarks. The method also yields good forecasting performance when applied to other futures commodities.

Original language: English
Pages (from-to): 367-383
Number of pages: 17
Journal: International Journal of Forecasting
Issue number: 1
Early online date: 19 Jul 2021
Publication status: Published - 10 Dec 2021


  • Crude oil price
  • Forecasting
  • Multivariate time series
  • News headlines
  • Text features

