Transcription factors (TFs) can bind DNA in a cooperative manner, enabling a mutual increase in occupancy. Through this type of interaction, alternative binding sites can be preferentially bound in different tissues to regulate tissue-specific expression programmes. Recently, deep learning models have become state-of-the-art in various pattern analysis tasks, including applications in the field of genomics. We therefore investigate the application of convolutional neural network (CNN) models to the discovery of sequence features determining cooperative and differential TF binding across tissues. We analyse ChIP-seq data from MEIS, TFs which are broadly expressed across mouse branchial arches, and HOXA2, which is expressed in the second and more posterior branchial arches. By developing models predictive of MEIS differential binding in all three tissues we are able to accurately predict HOXA2 co-binding sites. We evaluate transfer-like and multitask approaches to regularising the high-dimensional classification task with a larger regression dataset, allowing for creation of deeper and more accurate models. We test the performance of perturbation and gradient-based attribution methods in identifying the HOXA2 sites from differential MEIS data. Our results show that deep regularised models significantly outperform shallow CNNs as well as k-mer methods in the discovery of tissue-specific sites bound in vivo.
|Journal||Nucleic Acids Res|
|Publication status||Published - 24 Jan 2020|