This paper presents an interpretable ensemble modelling method in which the predictions of several individual base learners are combined through stacked generalisation: a second-layer model, the so-called meta-learner, is trained on the cross-validation predictions of each base learner. To provide interpretability, permutation variable importance (PVI) is computed on the ensemble: each variable is randomly shuffled in turn and the resulting reduction in the ensemble's predictive performance is measured. This is a novel contribution, as no previous attempts have been made in the soft sensor literature to investigate the interpretability of ensemble models that use heterogeneous base learners. The stacked ensemble also avoids model selection, the process of choosing among many candidate models. Model selection is often based on cross-validation, which is not guaranteed to select the model with the best true generalisation performance on the test set; the proposed method instead combines multiple models, removing the need to choose a single one. The efficacy of the proposed methodology, in terms of both variable importance and predictive performance, is demonstrated on a synthetic dataset in which the variable importance is known a priori, and on an industrial dataset of a refinery process provided by Dow. For the synthetic dataset, the proposed method is shown to select the correct causal variables, whereas the built-in variable importance measures of the individual models, namely partial least squares, Lasso, random forests and XGBoost, can assign inflated importance to non-causal, randomly generated variables. For the industrial study, the combined ensemble outperforms all individual base models in predictive performance, whilst also providing a new perspective on variable importance compared with previous studies.