BAMQL: A query language for extracting reads from BAM files

Andre P. Masella, Christopher M. Lalansingh, Pragash Sivasundaram, Michael Fraser, Robert G. Bristow, Paul C. Boutros*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM's read information and can filter reads, but require substantial boilerplate code; this is high overhead for mostly ad hoc filtering.

Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalents and can be compiled to native code for embedding in larger programs.

Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives.
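To make the Background concrete, the following is a minimal Python sketch of the manual filtering the abstract describes: reading SAM-format text lines, decoding the flag bit field, and combining predicates with logical connectives. It is not BAMQL syntax and not the authors' implementation. The flag bit values come from the SAM specification; the helper names (`has_flag`, `min_mapq`, `all_of`, `negate`, `filter_reads`) and the sample reads are hypothetical, chosen only for illustration.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


def parse_sam_line(line):
    """Split one SAM alignment line into the few fields used below."""
    cols = line.rstrip("\n").split("\t")
    return {
        "name": cols[0],
        "flag": int(cols[1]),
        "chrom": cols[2],
        "pos": int(cols[3]),
        "mapq": int(cols[4]),
    }


# Hypothetical predicate helpers: each returns a function read -> bool.
def has_flag(bit):
    return lambda read: bool(read["flag"] & bit)


def min_mapq(threshold):
    return lambda read: read["mapq"] >= threshold


def all_of(*preds):
    return lambda read: all(p(read) for p in preds)


def negate(pred):
    return lambda read: not pred(read)


def filter_reads(lines, pred):
    """Return the names of reads matching pred, skipping '@' header lines."""
    names = []
    for line in lines:
        if line.startswith("@"):
            continue
        read = parse_sam_line(line)
        if pred(read):
            names.append(read["name"])
    return names


# Made-up example records (tab-separated SAM text), for illustration only.
SAM_LINES = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t1040\tchr1\t150\t60\t50M\t*\t0\t0\t*\t*",
    "r4\t16\tchr2\t500\t10\t50M\t*\t0\t0\t*\t*",
]

# Mapped, non-duplicate reads with mapping quality of at least 30.
good_reads = all_of(
    negate(has_flag(FLAG_UNMAPPED)),
    negate(has_flag(FLAG_DUPLICATE)),
    min_mapq(30),
)

kept = filter_reads(SAM_LINES, good_reads)
reverse_strand = filter_reads(SAM_LINES, has_flag(FLAG_REVERSE))
```

Even in this small case, the column indices and bit masks have to be kept straight by hand, which is the kind of overhead the abstract argues a dedicated query language avoids.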

Original language: English
Article number: 305
Journal: BMC Bioinformatics
Volume: 17
Issue number: 1
Early online date: 11 Aug 2016
Publication status: Published - 2016

Keywords

  • BAM-format
  • BAMQL
  • Query language

Research Beacons, Institutes and Platforms

  • Manchester Cancer Research Centre
