Partial least squares with structured output for modelling the metabolomics data obtained from complex experimental designs: A study into the Y-block coding

Yun Xu, Howbeer Muhamad Ali, Ali Sayqal, Neil Dixon, Royston Goodacre

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, metabolomics studies commonly investigate multiple, potentially interacting, factors simultaneously, following a specific experimental design. Such data often cannot be treated as a “pure” regression or classification problem. Nevertheless, they are often still modelled as one or the other, which could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that reflects the experimental design better than the simple regression coding or binary class-membership coding commonly used in PLS modelling. The new Y coding is based on the same principle as structural modelling in machine learning. Two real metabolomics data sets were used to illustrate how the new Y coding can improve the interpretability of the PLS model compared with classic regression/classification coding.
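    The abstract does not spell out the exact Y coding used in the paper, so the sketch below only illustrates the general contrast it describes, and is not the authors' scheme. It assumes a hypothetical design with one categorical factor (two strains, coded as binary class-membership columns, as in PLS-DA) and one continuous factor (sampling time, coded as a numeric column, as in PLS-R), and places both side by side in one Y block. The function names `one_hot` and `hybrid_y`, the example factors, and the choice to autoscale Y are all illustrative assumptions.

```python
import numpy as np

def one_hot(labels):
    """Binary class-membership coding, as used in PLS-DA: one 0/1 column per level."""
    levels = sorted(set(labels))
    Y = np.zeros((len(labels), len(levels)))
    for i, lab in enumerate(labels):
        Y[i, levels.index(lab)] = 1.0
    return Y, levels

def hybrid_y(categorical, continuous, cont_name="continuous", autoscale=True):
    """Hypothetical hybrid Y block: one-hot columns for a categorical factor
    next to a numeric column for a continuous factor (illustrative only)."""
    Y_class, levels = one_hot(categorical)
    y_reg = np.asarray(continuous, dtype=float).reshape(-1, 1)
    Y = np.hstack([Y_class, y_reg])
    if autoscale:
        # Mean-centre and scale each column to unit variance so that no factor
        # dominates the PLS fit purely because of its units or range.
        sd = Y.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # constant columns are only centred, not scaled
        Y = (Y - Y.mean(axis=0)) / sd
    return Y, levels + [cont_name]

# Hypothetical design: two strains, each sampled at three time points (hours).
strain = ["A", "A", "A", "B", "B", "B"]
time_h = [0, 12, 24, 0, 12, 24]
Y, columns = hybrid_y(strain, time_h, cont_name="time_h")
print(columns)   # ['A', 'B', 'time_h']
print(Y.shape)   # (6, 3)
```

    A pure PLS-DA model would keep only the one-hot block and a pure PLS-R model only the time column; the hybrid block lets a single multi-response (PLS2) model see both factors at once. Any PLS implementation that accepts a multi-column Y, such as scikit-learn's `PLSRegression`, could in principle be fitted to a block like this.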
    Original language: English
    Journal: Metabolites
    Volume: 6
    Issue number: 4
    Early online date: 28 Oct 2016
    Publication status: Published - 2016

    Keywords

    • partial least squares; structural modelling; experimental design; metabolomics; Y coding
