Abstract
Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common, and the bit field containing the flags is unintuitive. Several libraries can read BAM files, such as Bio-SamTools for Perl and pysam for Python. Both give access to each read's information and can filter reads, but they require substantial boilerplate code, which is a high overhead for what is mostly ad hoc filtering.

Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalent filters and can be compiled to native code for embedding in larger programs.

Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives.
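To illustrate the general idea of replacing raw flag arithmetic with named predicates joined by logical connectives, here is a small Python sketch. It is not BAMQL's syntax or implementation. The `Read` record and the helpers `has_flag`, `on_chrom`, `mapq_at_least`, `and_`, `or_`, `not_` and `select` are hypothetical names chosen for this example. Only the FLAG bit values are taken from the SAM specification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# FLAG bit values as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record; real BAM records carry many more fields."""
    name: str
    flag: int
    chrom: str  # "*" is the SAM convention for an unplaced read
    mapq: int


# A predicate maps a read to True (keep) or False (discard).
Predicate = Callable[[Read], bool]


def has_flag(bit: int) -> Predicate:
    """True if the given FLAG bit is set, hiding the bitwise test behind a name."""
    return lambda r: (r.flag & bit) != 0


def on_chrom(name: str) -> Predicate:
    return lambda r: r.chrom == name


def mapq_at_least(q: int) -> Predicate:
    return lambda r: r.mapq >= q


# Logical connectives that build compound predicates from simpler ones.
def and_(*ps: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in ps)


def or_(*ps: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in ps)


def not_(p: Predicate) -> Predicate:
    return lambda r: not p(r)


def select(reads: Iterable[Read], query: Predicate) -> List[Read]:
    """Keep only the reads that satisfy the query."""
    return [r for r in reads if query(r)]


reads = [
    Read("r1", FLAG_PAIRED, "chr1", 60),
    Read("r2", FLAG_PAIRED | FLAG_DUPLICATE, "chr1", 60),
    Read("r3", FLAG_UNMAPPED, "*", 0),
    Read("r4", FLAG_PAIRED | FLAG_REVERSE, "chr2", 30),
]

# Mapped, non-duplicate reads that are either on chr2 or have MAPQ >= 50.
query = and_(
    not_(has_flag(FLAG_UNMAPPED)),
    not_(has_flag(FLAG_DUPLICATE)),
    or_(on_chrom("chr2"), mapq_at_least(50)),
)

print([r.name for r in select(reads, query)])  # ['r1', 'r4']
```

Because each condition has a name, a compound query reads like the question being asked. This avoids hand-written column indices and bit masks, which the Background identifies as a source of mistakes in AWK-based filtering.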
Field | Value |
---|---|
Original language | English |
Article number | 305 |
Journal | BMC Bioinformatics |
Volume | 17 |
Issue number | 1 |
Early online date | 11 Aug 2016 |
DOIs | |
Publication status | Published - 2016 |
Keywords
- BAM-format
- BAMQL
- Query language
Research Beacons, Institutes and Platforms
- Manchester Cancer Research Centre
Datasets
- BAMQL: a query language for extracting reads from BAM files
  Masella, A. P. (Creator), Lalansingh, C. M. (Creator), Sivasundaram, P. (Creator), Fraser, M. (Creator), Bristow, R. (Creator) & Boutros, P. C. (Creator), figshare, 11 Aug 2016
  DOI: 10.6084/m9.figshare.c.3625736
  Dataset