Machine Learning Approach for Crude Oil Price Prediction

  • Siti Norbaiti Binti Abdullah

Student thesis: PhD


Crude oil prices affect the world economy and are therefore of interest to economic experts and politicians. The volatility of oil prices, which has shaped today's world economy, society and politics, has motivated researchers and continues to attract further study, and it is expected to raise new and interesting research challenges. In the present research, machine learning and computational intelligence techniques are applied to historical quantitative data and to linguistic data from online news services in order to predict crude oil prices through five models: (1) the Hierarchical Conceptual (HC) model; (2) the Artificial Neural Network-Quantitative (ANN-Q) model; (3) the Linguistic model; (4) the Rule-based Expert model; and (5) the Hybridisation of Linguistic and Quantitative (LQ) model.
First, to understand the behaviour of the crude oil market, the HC model serves as a platform for retrieving information that explains market behaviour; this information is drawn from Google News articles retrieved with the keyword "Crude oil price". Through a systematic approach, the price data are classified into categories that describe the level of impact of crude oil price movements on the market, and this classification singles out the crucial behavioural information contained in the articles. The resulting features are ranked hierarchically by level of impact and used as a reference for identifying the numeric data used in model (2). Model (2) is developed to validate the features retrieved in model (1). It introduces the Back Propagation Neural Network (BPNN) technique as an alternative to the conventional techniques used for forecasting the crude oil market, and the results of model (2) show that the BPNN produces more accurate results than those techniques, making it a competitive alternative. The same results also validate the features retrieved from model (1), showing that they cause market volatility.
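As a rough illustration of the kind of BPNN forecaster described for model (2), the sketch below trains a one-hidden-layer network by back propagation to predict the next value of a series from its previous values. It is a minimal sketch under stated assumptions, not the thesis's implementation: the number of lags, hidden units, learning rate, sigmoid activation, min-max-scaled input and the synthetic sine-wave series are all hypothetical choices, and the `make_windows` and `BPNN` names are illustrative only.

```python
import math
import random


def make_windows(series, lags):
    """Split a series into (previous `lags` values, next value) training pairs."""
    xs, ys = [], []
    for t in range(lags, len(series)):
        xs.append(series[t - lags:t])
        ys.append(series[t])
    return xs, ys


class BPNN:
    """Hypothetical one-hidden-layer network trained by online back propagation."""

    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Sigmoid hidden layer: one weight row and one bias per hidden unit.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # Linear output unit producing the next-step forecast.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def train(self, xs, ys, epochs=300):
        for _ in range(epochs):
            for x, target in zip(xs, ys):
                h, y = self._forward(x)
                err = y - target  # gradient of 0.5 * (y - target)**2 w.r.t. y
                # Error signal for each hidden unit, computed with the current w2.
                deltas = [err * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
                # Gradient-descent update of the output layer.
                self.w2 = [w - self.lr * err * hj for w, hj in zip(self.w2, h)]
                self.b2 -= self.lr * err
                # Gradient-descent update of the hidden layer.
                for j, d in enumerate(deltas):
                    self.w1[j] = [w - self.lr * d * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= self.lr * d

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Synthetic stand-in for a min-max-scaled price series (NOT real oil price data).
series = [0.5 + 0.4 * math.sin(0.3 * t) for t in range(80)]
xs, ys = make_windows(series, lags=4)

net = BPNN(n_in=4, n_hidden=6, lr=0.1, seed=1)
before = net.mse(xs, ys)
net.train(xs, ys, epochs=300)
after = net.mse(xs, ys)
next_value = net.predict(series[-4:])  # one-step-ahead forecast from the last 4 values
print(f"training MSE: {before:.4f} -> {after:.4f}; next forecast: {next_value:.3f}")
```

In a real setting the inputs would be scaled historical prices, possibly together with other numeric features selected with the help of the HC model, and forecasts would be scored on a held-out period rather than on the training windows.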
In model (3), a more systematic approach is introduced for extracting features from the news corpus. It applies a content-utilisation technique to the news articles and mines news sentiment using fuzzy grammar fragment extraction. To extract the features systematically, a domain-customised 'dictionary' of grammar definitions is built beforehand. The retrieved features are then used as linguistic data to predict the behaviour of the crude oil price. This model also produces a decision tree that hierarchically delineates the events (i.e., the market's rules) that made the market volatile; these rules later formed the basis of model (4). Model (5) then combines the linguistic predictions of model (3) with the numeric predictions of model (2). Taken together, the hybridisation of these two models and the integration of models (1) to (5) imitate the way crude oil market regulators assess the risk of their actions before executing a price hedge in the market, basing the risk calculation on the 'facts' (quantitative data) and 'rumours' (linguistic data) they have collected. The hybridisation of quantitative and linguistic data in this study shows promising accuracy, as evidenced by the optimum directional accuracy and the minimum error values obtained.
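The abstract reports the hybrid model's performance in terms of directional accuracy and forecast error but does not describe how the two data sources are combined. The sketch below is one hypothetical way to illustrate the idea: a quantitative forecast (standing in for model (2)) is shifted by a weighted news-sentiment score (standing in for the linguistic output of model (3)), with the weight fitted by least squares, and both forecasts are scored with RMSE and one common definition of directional accuracy. The additive combination, the [-1, 1] sentiment scale, the function names and all numbers are assumptions for illustration, not the thesis's LQ method or results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of a forecast."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def directional_accuracy(actual, predicted):
    """Fraction of steps where the forecast move has the same sign as the actual move.

    Both moves are measured from the previous actual price, actual[t - 1].
    """
    hits = sum(
        1
        for t in range(1, len(actual))
        if (actual[t] - actual[t - 1]) * (predicted[t] - actual[t - 1]) > 0
    )
    return hits / (len(actual) - 1)


def fit_hybrid_weight(actual, quant_pred, sentiment):
    """Least-squares beta for the hypothetical hybrid quant_pred + beta * sentiment."""
    residuals = [a - q for a, q in zip(actual, quant_pred)]
    num = sum(r * s for r, s in zip(residuals, sentiment))
    den = sum(s * s for s in sentiment)
    return num / den if den else 0.0


def hybrid_forecast(quant_pred, sentiment, beta):
    """Shift each quantitative forecast by the weighted linguistic (sentiment) signal."""
    return [q + beta * s for q, s in zip(quant_pred, sentiment)]


# Illustrative numbers only (NOT results from the thesis).
actual = [100.0, 101.0, 99.5, 98.0, 99.0, 101.5]      # observed prices
quant_pred = [100.2, 99.8, 101.2, 99.0, 97.6, 100.0]  # hypothetical numeric-model forecasts
sentiment = [0.0, 0.5, -0.6, -0.8, 0.3, 0.9]          # hypothetical news sentiment in [-1, 1]

beta = fit_hybrid_weight(actual, quant_pred, sentiment)
hybrid = hybrid_forecast(quant_pred, sentiment, beta)

for name, pred in [("quantitative", quant_pred), ("hybrid", hybrid)]:
    print(f"{name:12s} RMSE={rmse(actual, pred):.3f} DA={directional_accuracy(actual, pred):.2f}")
```

Because `beta` is fitted on the same observations it is scored on, the hybrid can never have a higher in-sample RMSE than the quantitative forecast alone (beta = 0 is one of the candidates); a fair comparison would fit the weight on a training period and evaluate both forecasts on a later, held-out period.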
Date of Award: 1 Aug 2014
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Xiaojun Zeng

Keywords
  • crude oil price prediction
  • quantitative prediction model
  • linguistic prediction model
  • AI hybrid models
  • hierarchical conceptual model
  • machine learning
  • oil price prediction
  • oil prediction
  • sentiment-mining
  • ANN
