TY - JOUR
T1 - Fast randomization of large genomic datasets while preserving alteration counts
AU - Gobbi, A.
AU - Iorio, F.
AU - Dawson, K.J.
AU - Wedge, D.C.
AU - Tamborero, D.
AU - Alexandrov, L.B.
AU - Lopez-Bigas, N.
AU - Garnett, M.J.
AU - Jurman, G.
AU - Saez-Rodriguez, J.
PY - 2014/9/1
Y1 - 2014/9/1
N2 - Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-Algorithm) is used to sample simulated datasets under a null model that preserves patient-and genewise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switchingsteps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deducted empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-Algorithm. We illustrate the performances of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks..
AB - Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-Algorithm) is used to sample simulated datasets under a null model that preserves patient-and genewise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switchingsteps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deducted empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-Algorithm. We illustrate the performances of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks..
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84907028033&partnerID=MN8TOARS
U2 - 10.1093/bioinformatics/btu474
DO - 10.1093/bioinformatics/btu474
M3 - Conference article
SN - 1367-4803
VL - 30
SP - i617-i623
JO - Bioinformatics
JF - Bioinformatics
IS - 17
ER -