Reference Citation Analysis

For: Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 2016;26:990-9. [PMID: 27197224 PMCID: PMC4937568 DOI: 10.1101/gr.200535.115] [Citation(s) in RCA: 550] [Impact Index Per Article: 61.1] [Received: 10/04/2015] [Accepted: 04/26/2016] [Indexed: 12/22/2022]

Cited by Other Article(s)

501

Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 2018;19:202. [PMID: 29855387 PMCID: PMC5984344 DOI: 10.1186/s12859-018-2187-1] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 04/13/2017] [Accepted: 05/04/2018] [Indexed: 01/07/2023]

Abstract

Background

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions that precisely control gene expression. Identifying active cis-regulatory regions in the human genome is therefore critical for understanding gene regulation and for assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning make it possible to predict cis-regulatory regions genome wide.

Results

Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, which uses supervised deep learning approaches to identify enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, they enable a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to predictive performance. Applying DECRES, we delineate the locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome).

Conclusion

The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and it inspires the development of other advanced neural network models to further improve genome annotations.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2187-1) contains supplementary material, which is available to authorized users.

502

Zou LS, Erdos MR, Taylor DL, Chines PS, Varshney A, Parker SCJ, Collins FS, Didion JP. BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues. BMC Genomics 2018;19:390. [PMID: 29792182 PMCID: PMC5966887 DOI: 10.1186/s12864-018-4766-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 02/06/2018] [Accepted: 05/08/2018] [Indexed: 01/14/2023]

503

Min X, Zeng W, Chen N, Chen T, Jiang R. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding. Bioinformatics 2018;33:i92-i101. [PMID: 28881969 PMCID: PMC5870572 DOI: 10.1093/bioinformatics/btx234] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Indexed: 01/30/2023]

Abstract

Motivation

Experimental techniques for measuring chromatin accessibility are expensive and time consuming, which motivates the development of computational approaches that predict open chromatin regions from DNA sequences. Existing methods in this direction fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications, a comprehensive framework that integrates useful k-mer co-occurrence information with recent advances in deep learning is still lacking.

Results

We fill this gap by addressing chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors from the k-mer co-occurrence matrix using an unsupervised representation learning approach. We then construct a supervised deep learning architecture composed of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method learns high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. By exploring different embedding strategies, we show that k-mer embedding effectively enhances model performance. We also demonstrate the efficacy of both the convolutional and the BLSTM layers by comparing two variations of the network architecture, and we confirm the robustness of our model to hyper-parameters through sensitivity analysis. We hope our method will deepen understanding of how to employ deep learning in genomic studies and shed light on the mechanisms of chromatin accessibility.
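
As an illustration of the k-mer preprocessing step summarized above, the following minimal Python sketch splits a sequence into overlapping k-mers and counts how often pairs of k-mers co-occur within a small window. The stride of 1, the window size, and the function names `split_kmers` and `kmer_cooccurrence` are hypothetical choices for illustration, not the settings used by Min et al.; the unsupervised embedding step that would then be fit to these counts (for example, a GloVe-style objective) is not shown.

```python
from collections import Counter


def split_kmers(seq: str, k: int, stride: int = 1) -> list[str]:
    """Split a DNA sequence into k-mers of length k, stepping by `stride`.

    Hypothetical helper: stride=1 (overlapping k-mers) is an assumption.
    Sequences shorter than k yield an empty list.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_cooccurrence(kmers: list[str], window: int = 2) -> Counter:
    """Count ordered (center, context) k-mer pairs within `window` positions.

    Each neighbouring pair is counted in both directions, so the counts form
    a symmetric co-occurrence matrix. The window size is an assumption.
    """
    counts = Counter()
    for i, center in enumerate(kmers):
        lo, hi = max(0, i - window), min(len(kmers), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, kmers[j])] += 1
    return counts


if __name__ == "__main__":
    kmers = split_kmers("ACGTAC", k=3)
    print(kmers)  # ['ACG', 'CGT', 'GTA', 'TAC']
    cooc = kmer_cooccurrence(kmers, window=1)
    print(cooc[("ACG", "CGT")])  # 1
```

Counting both directions keeps the matrix symmetric, which is the form of input that co-occurrence-based embedding methods typically expect; a real pipeline would also map each k-mer to an integer index before training.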

Availability and implementation

The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm.

Supplementary information

Supplementary materials are available at Bioinformatics online.

504

Fraser K, Bruckner DM, Dordick JS. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem Res Toxicol 2018;31:412-430. [PMID: 29722533 DOI: 10.1021/acs.chemrestox.8b00054] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/30/2022]

505

Amidi A, Amidi S, Vlachakis D, Megalooikonomou V, Paragios N, Zacharaki EI. EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation. PeerJ 2018;6:e4750. [PMID: 29740518 PMCID: PMC5937476 DOI: 10.7717/peerj.4750] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/25/2017] [Accepted: 04/21/2018] [Indexed: 11/20/2022]

506

Zhu L, Zhang HB, Huang DS. Direct AUC optimization of regulatory motifs. Bioinformatics 2018;33:i243-i251. [PMID: 28881989 PMCID: PMC5870558 DOI: 10.1093/bioinformatics/btx255] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/04/2022]

507

Koh PW, Pierson E, Kundaje A. Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 2018;33:i225-i233. [PMID: 28881977 PMCID: PMC5870713 DOI: 10.1093/bioinformatics/btx243] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 01/09/2023]

508

Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018;28:739-750. [PMID: 29588361 PMCID: PMC5932613 DOI: 10.1101/gr.227819.117] [Citation(s) in RCA: 280] [Impact Index Per Article: 40.0] [Received: 07/17/2017] [Accepted: 03/23/2018] [Indexed: 01/10/2023]

509

Diao JA, Kohane IS, Manrai AK. Biomedical informatics and machine learning for clinical genomics. Hum Mol Genet 2018;27:R29-R34. [PMID: 29566172 PMCID: PMC5946905 DOI: 10.1093/hmg/ddy088] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/21/2018] [Revised: 03/08/2018] [Accepted: 03/08/2018] [Indexed: 12/22/2022]

510

Pei L, Zheng Y, Zou S, Li Z. Dynamics of four-neuron recurrent inhibitory loop with state-dependent time delays. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.02.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

511

Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, Najarian K, Athey BD. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics 2018;19:629-650. [PMID: 29697304 PMCID: PMC6022084 DOI: 10.2217/pgs-2018-0008] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 01/24/2018] [Accepted: 03/09/2018] [Indexed: 01/02/2023]

512

Artificial intelligence used in genome analysis studies. EuroBiotech Journal 2018. [DOI: 10.2478/ebtj-2018-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]

513

Avsec Ž, Barekatain M, Cheng J, Gagneur J. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks. Bioinformatics 2018;34:1261-1269. [PMID: 29155928 PMCID: PMC5905632 DOI: 10.1093/bioinformatics/btx727] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/18/2017] [Revised: 10/16/2017] [Accepted: 11/15/2017] [Indexed: 12/01/2022]

514

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018;15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 877] [Impact Index Per Article: 125.3] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022]

Affiliation(s)

Travers Ching Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
Daniel S Himmelstein Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Brett K Beaulieu-Jones Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Alexandr A Kalinin Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Brian T Do Harvard Medical School, Boston, MA, USA
Gregory P Way Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Enrico Ferrero Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
Paul-Michael Agapow Data Science Institute, Imperial College London, London, UK
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Michael M Hoffman Princess Margaret Cancer Centre, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Wei Xie Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Gail L Rosen Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Benjamin J Lengerich Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Johnny Israeli Biophysics Program, Stanford University, Stanford, CA, USA
Jack Lanchantin Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Stephen Woloszynek Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Anne E Carpenter Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Avanti Shrikumar Department of Computer Science, Stanford University, Stanford, CA, USA
Jinbo Xu Toyota Technological Institute at Chicago, Chicago, IL, USA
Evan M Cofer Department of Computer Science, Trinity University, San Antonio, TX, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
Christopher A Lavender Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
Srinivas C Turaga Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
Amr M Alexandari Department of Computer Science, Stanford University, Stanford, CA, USA
Zhiyong Lu National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
David J Harris Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
Dave DeCaprio ClosedLoop.ai, Austin, TX, USA
Yanjun Qi Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Anshul Kundaje Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
Yifan Peng National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Laura K Wiley Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
Marwin H S Segler Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
Simina M Boca Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
S Joshua Swamidass Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
Austin Huang Department of Medicine, Brown University, Providence, RI, USA
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA; Morgridge Institute for Research, Madison, WI, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

515

Kim HK, Min S, Song M, Jung S, Choi JW, Kim Y, Lee S, Yoon S, Kim HH. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat Biotechnol 2018;36:239-241. [PMID: 29431740 DOI: 10.1038/nbt.4061] [Citation(s) in RCA: 210] [Impact Index Per Article: 30.0] [Received: 05/24/2017] [Accepted: 12/08/2017] [Indexed: 12/26/2022]

Affiliation(s)

Hui Kwon Kim Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea
Seonwoo Min Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
Myungjae Song Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, Republic of Korea
Soobin Jung Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea
Jae Woo Choi Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
Younggwang Kim Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea
Sangeun Lee Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea
Sungroh Yoon Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Hyongbum Henry Kim Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea; Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea; Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea; Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea; Yonsei-IBS Institute, Yonsei University, Seoul, Republic of Korea

516

Zhang Y, An L, Xu J, Zhang B, Zheng WJ, Hu M, Tang J, Yue F. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 2018;9:750. [PMID: 29467363 PMCID: PMC5821732 DOI: 10.1038/s41467-018-03113-2] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 05/07/2017] [Accepted: 01/19/2018] [Indexed: 12/31/2022]

517

Cao C, Liu F, Tan H, Song D, Shu W, Li W, Zhou Y, Bo X, Xie Z. Deep Learning and Its Applications in Biomedicine. Genomics Proteomics Bioinformatics 2018;16:17-32. [PMID: 29522900 PMCID: PMC6000200 DOI: 10.1016/j.gpb.2017.07.003] [Citation(s) in RCA: 253] [Impact Index Per Article: 36.1] [Received: 06/18/2017] [Revised: 06/18/2017] [Accepted: 07/05/2017] [Indexed: 12/19/2022]

518

Alakwaa F, Chaudhary K, Garmire LX. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data. J Proteome Res 2018;17:337-347. [PMID: 29110491 PMCID: PMC5759031 DOI: 10.1021/acs.jproteome.7b00595] [Citation(s) in RCA: 134] [Impact Index Per Article: 19.1] [Received: 08/22/2017] [Indexed: 12/17/2022]

519

Celesti F, Celesti A, Wan J, Villari M. Why Deep Learning Is Changing the Way to Approach NGS Data Processing: A Review. IEEE Rev Biomed Eng 2018;11:68-76. [DOI: 10.1109/rbme.2018.2825987] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/05/2022]

520

Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018;14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 12/24/2022]

521

Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018;137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]

522

Ye F. Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data. PLoS One 2017;12:e0188746. [PMID: 29236718 PMCID: PMC5728507 DOI: 10.1371/journal.pone.0188746] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/02/2017] [Accepted: 10/02/2017] [Indexed: 01/02/2023]

523

Banovich NE, Li YI, Raj A, Ward MC, Greenside P, Calderon D, Tung PY, Burnett JE, Myrthil M, Thomas SM, Burrows CK, Romero IG, Pavlovic BJ, Kundaje A, Pritchard JK, Gilad Y. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res 2017;28:122-131. [PMID: 29208628 PMCID: PMC5749177 DOI: 10.1101/gr.224436.117] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 04/28/2017] [Accepted: 11/20/2017] [Indexed: 12/17/2022]

Affiliation(s)

Nicholas E Banovich Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Yang I Li Department of Genetics, Stanford University, Stanford, California 94305, USA
Anil Raj Department of Genetics, Stanford University, Stanford, California 94305, USA
Michelle C Ward Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
Peyton Greenside Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
Diego Calderon Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
Po Yuan Tung Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
Jonathan E Burnett Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Marsha Myrthil Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Samantha M Thomas Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Courtney K Burrows Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Irene Gallego Romero Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Bryan J Pavlovic Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
Anshul Kundaje Department of Genetics, Stanford University, Stanford, California 94305, USA
Jonathan K Pritchard Department of Genetics, Stanford University, Stanford, California 94305, USA; Department of Biology, Stanford University, Stanford, California 94305, USA; Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA
Yoav Gilad Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA

524

Singh R, Lanchantin J, Sekhon A, Qi Y. Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin. Adv Neural Inf Process Syst 2017;30:6785-6795. [PMID: 30147283 PMCID: PMC6105294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]

525

Xu Y, Wang Y, Luo J, Zhao W, Zhou X. Deep learning of the splicing (epi)genetic code reveals a novel candidate mechanism linking histone modifications to ESC fate decision. Nucleic Acids Res 2017;45:12100-12112. [PMID: 29036709 PMCID: PMC5716079 DOI: 10.1093/nar/gkx870] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 03/29/2017] [Revised: 09/08/2017] [Accepted: 09/15/2017] [Indexed: 01/31/2023]

526

Min X, Zeng W, Chen S, Chen N, Chen T, Jiang R. Predicting enhancers with deep convolutional neural networks. BMC Bioinformatics 2017;18:478. [PMID: 29219068 PMCID: PMC5773911 DOI: 10.1186/s12859-017-1878-3] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Indexed: 01/22/2023]

527

Xie R, Wen J, Quitadamo A, Cheng J, Shi X. A deep auto-encoder model for gene expression prediction. BMC Genomics 2017;18:845. [PMID: 29219072 PMCID: PMC5773895 DOI: 10.1186/s12864-017-4226-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 01/08/2023]

528

Ransohoff JD, Wei Y, Khavari PA. The functions and unique features of long intergenic non-coding RNA. Nat Rev Mol Cell Biol 2017;19:143-157. [PMID: 29138516 DOI: 10.1038/nrm.2017.104] [Citation(s) in RCA: 923] [Impact Index Per Article: 115.4] [Indexed: 12/21/2022]

529

Computational biology: deep learning. Emerg Top Life Sci 2017;1:257-274. [PMID: 33525807 PMCID: PMC7289034 DOI: 10.1042/etls20160025] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 04/21/2017] [Revised: 09/13/2017] [Accepted: 09/18/2017] [Indexed: 02/06/2023]

530

Gene Prediction in Metagenomic Fragments with Deep Learning. Biomed Res Int 2017;2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]

531

Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, Seelig G. Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences. Genome Res 2017;27:2015-2024. [PMID: 29097404 PMCID: PMC5741052 DOI: 10.1101/gr.224964.117] [Citation(s) in RCA: 119] [Impact Index Per Article: 14.9] [Received: 05/12/2017] [Accepted: 10/18/2017] [Indexed: 11/25/2022]

532

Zeng H, Gifford DK. Predicting the impact of non-coding variants on DNA methylation. Nucleic Acids Res 2017;45:e99. [PMID: 28334830 PMCID: PMC5499808 DOI: 10.1093/nar/gkx177] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 12/15/2016] [Accepted: 03/13/2017] [Indexed: 12/22/2022]

533

Finnegan A, Song JS. Maximum entropy methods for extracting the learned features of deep neural networks. PLoS Comput Biol 2017;13:e1005836. [PMID: 29084280 PMCID: PMC5679649 DOI: 10.1371/journal.pcbi.1005836] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 04/18/2017] [Revised: 11/09/2017] [Accepted: 10/23/2017] [Indexed: 11/19/2022]

Abstract

New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

Deep learning is a state-of-the-art reformulation of artificial neural networks, which have a long history of development. It performs very well in diverse automated classification and prediction problems, including handwriting recognition, image identification, and biological pattern recognition. Its modern success can be attributed to improved training algorithms, clever network architectures, the rapid growth of available data, and advanced computing power, all of which have allowed a great expansion in the number of unknown parameters to be estimated by the model. These parameters, however, are so intricately connected through highly nonlinear functions that it has been difficult to interpret which essential features of the data a deep neural network actually uses to achieve its excellent performance. We address this problem by using ideas from statistical physics to sample new, unseen data that are likely to behave similarly to the original data points when passed through the trained network. This synthetic data cloud around each original data point retains informative features while averaging out nonessential ones, ultimately allowing us to extract important network-learned features from the original data set and thus improving the human interpretability of deep learning methods. We demonstrate how our method can be applied to biological sequence analysis.


534

Alvarez RV, Li S, Landsman D, Ovcharenko I. SNPDelScore: combining multiple methods to score deleterious effects of noncoding mutations in the human genome. Bioinformatics 2017;34:289-291. [PMID: 28968739 DOI: 10.1093/bioinformatics/btx583] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Revised: 09/11/2017] [Accepted: 09/13/2017] [Indexed: 11/12/2022] Open

535

Schwessinger R, Suciu MC, McGowan SJ, Telenius J, Taylor S, Higgs DR, Hughes JR. Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints. Genome Res 2017;27:1730-1742. [PMID: 28904015 PMCID: PMC5630036 DOI: 10.1101/gr.220202.117] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2017] [Accepted: 08/07/2017] [Indexed: 12/22/2022]

536

Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, Li Y, Eraslan G, Amin TB, Goke J, Mueller NS, Kellis M, Kundaje A, Beer MA, Keles S, Gifford DK, Yosef N. Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 2017;38:1240-1250. [PMID: 28220625 PMCID: PMC5560998 DOI: 10.1002/humu.23197] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2016] [Revised: 01/19/2017] [Accepted: 02/12/2017] [Indexed: 02/03/2023]

Affiliation(s)

Anat Kreimer Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Bioengineering and Therapeutic Sciences, Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
Haoyang Zeng Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Matthew D. Edwards Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Yuchun Guo Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Kevin Tian Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Sunyoung Shin Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
Rene Welch Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
Michael Wainberg Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
Rahul Mohan Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
Nicholas A. Sinnott-Armstrong Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
Yue Li Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
Gökcen Eraslan Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
Talal Bin Amin Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
Jonathan Goke Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
Nikola S. Mueller Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
Manolis Kellis Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
Anshul Kundaje Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
Michael A Beer McKusick-Nathans Institute of Genetic Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Sunduz Keles Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
David K. Gifford Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Nir Yosef Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA 02139


537

Fu H, Zhang X. Noncoding Variants Functional Prioritization Methods Based on Predicted Regulatory Factor Binding Sites. Curr Genomics 2017;18:322-331. [PMID: 29081688 PMCID: PMC5635616 DOI: 10.2174/1389202918666170228143619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/31/2022] Open

Abstract

BACKGROUND

With the advent of the post-genomic era, research into the genetic mechanisms of disease has come to depend increasingly on studies of genes, gene networks, and gene-protein interaction networks. To explore gene expression and regulation, researchers have carried out many studies on transcription factors and their binding sites (TFBSs). Building on the large number of TFBS predictions produced by deep learning models, further computation and analysis have been performed to reveal the relationship between gene mutations and the occurrence of disease. Deep learning methods have been shown to outperform conventional methods in predicting the functions of noncoding variants. Research on predicting the functions of single nucleotide polymorphisms (SNPs) is expected to reveal how gene mutations affect human traits and diseases.

RESULTS

We reviewed conventional TFBS identification methods from different perspectives. For deep learning methods that predict TFBSs, we discussed related issues such as raw data preprocessing, the structural design of deep convolutional neural networks (CNNs), and measures of model performance. We then summarized the techniques commonly used to identify functional noncoding variants from de novo sequence.
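Deep CNNs for TFBS prediction, as discussed above, usually begin by one-hot encoding the raw sequence and scanning it with convolutional filters. The sketch below shows that first layer in plain Python, assuming a single hand-set filter (`GATA_FILTER`, a hypothetical example weighted +1 for the consensus base and -1 otherwise), a ReLU nonlinearity, and non-overlapping max pooling. It is a generic illustration, not the architecture of any method covered by the review; in a trained model the filter weights would be learned from data.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one row of four channels (A, C, G, T) per base; unknown bases such as N become all zeros."""
    index = {b: i for i, b in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def filter_from_consensus(motif, match=1.0, mismatch=-1.0):
    """Hypothetical hand-set filter: +1 for the consensus base, -1 otherwise."""
    return [[match if b == base else mismatch for b in BASES] for base in motif]


def conv1d_scan(encoded, filt, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) followed by ReLU."""
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for k in range(width):
            for c in range(4):
                total += encoded[start + k][c] * filt[k][c]
        scores.append(max(0.0, total))  # ReLU
    return scores


def max_pool(scores, size):
    """Non-overlapping max pooling; the last window may be shorter."""
    return [max(scores[i:i + size]) for i in range(0, len(scores), size)]


GATA_FILTER = filter_from_consensus("GATA")
activations = conv1d_scan(one_hot("CCGATACC"), GATA_FILTER)
print(activations, max_pool(activations, 2))
```

Only the window that exactly matches the filter's consensus produces a positive activation; a real model stacks many learned filters and further layers on top of this step.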

CONCLUSION

With the rapid development of high-throughput assays, increasing amounts of sample data and chromatin features should help improve the prediction accuracy of deep convolutional neural networks for TFBS identification. Meanwhile, gaining more insight into the deep CNN framework itself has proved useful both for improving model performance and for developing designs better suited to the sample data. Based on the feature values predicted by deep CNN models, prioritization models for functional noncoding variants should help reveal the effects of gene mutations on disease.


538

Reiman D, Metwally A. Using convolutional neural networks to explore the microbiome. Annu Int Conf IEEE Eng Med Biol Soc 2017;2017:4269-4272. [PMID: 29060840 DOI: 10.1109/embc.2017.8037799] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

539

Zhang H, Zhu L, Huang DS. WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data. Sci Rep 2017;7:3217. [PMID: 28607381 PMCID: PMC5468353 DOI: 10.1038/s41598-017-03554-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2016] [Accepted: 05/02/2017] [Indexed: 01/24/2023] Open

540

Gomez-Cabrero D, Tegnér J. Iterative Systems Biology for Medicine – Time for advancing from network signatures to mechanistic equations. Curr Opin Syst Biol 2017. [DOI: 10.1016/j.coisb.2017.05.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

541

Feigin ME, Garvin T, Bailey P, Waddell N, Chang DK, Kelley DR, Shuai S, Gallinger S, McPherson JD, Grimmond SM, Khurana E, Stein LD, Biankin AV, Schatz MC, Tuveson DA. Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma. Nat Genet 2017;49:825-833. [PMID: 28481342 PMCID: PMC5659388 DOI: 10.1038/ng.3861] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2016] [Accepted: 04/10/2017] [Indexed: 12/15/2022]

Affiliation(s)

Michael E Feigin Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA; Lustgarten Foundation Pancreatic Cancer Research Laboratory, Cold Spring Harbor, New York, USA
Tyler Garvin Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
Peter Bailey Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK
Nicola Waddell QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia; Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
David K Chang Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK; The Kinghorn Cancer Centre, Cancer Research Program, Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia; Department of Surgery, Bankstown Hospital, Bankstown, Sydney, New South Wales, Australia; South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, New South Wales, Australia
David R Kelley Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
Shimin Shuai Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
Steven Gallinger Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada; Division of General Surgery, Toronto General Hospital, Toronto, Ontario, Canada
John D McPherson Genome Technologies Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Sean M Grimmond Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK; Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
Ekta Khurana Sandra and Edward Meyer Cancer Center, Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York, USA
Lincoln D Stein Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada; Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Andrew V Biankin Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK; South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, New South Wales, Australia; West of Scotland Pancreatic Unit, Glasgow Royal Infirmary, Glasgow, Scotland, UK
Michael C Schatz Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA; Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
David A Tuveson Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA; Lustgarten Foundation Pancreatic Cancer Research Laboratory, Cold Spring Harbor, New York, USA; Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, New York, USA


542

Pärnamaa T, Parts L. Accurate Classification of Protein Subcellular Localization from High-Throughput Microscopy Images Using Deep Learning. G3 (Bethesda) 2017;7:1385-1392. [PMID: 28391243 PMCID: PMC5427497 DOI: 10.1534/g3.116.033654] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/18/2016] [Accepted: 11/22/2016] [Indexed: 11/29/2022]

543

Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017;2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

544

Bussemaker HJ, Causton HC, Fazlollahi M, Lee E, Muroff I. Network-based approaches that exploit inferred transcription factor activity to analyze the impact of genetic variation on gene expression. Curr Opin Syst Biol 2017;2:98-102. [PMID: 28691107 DOI: 10.1016/j.coisb.2017.04.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

545

Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol 2017;18:67. [PMID: 28395661 PMCID: PMC5387360 DOI: 10.1186/s13059-017-1189-z] [Citation(s) in RCA: 245] [Impact Index Per Article: 30.6] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2017] [Accepted: 03/07/2017] [Indexed: 12/31/2022] Open

546

Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017;49:618-624. [PMID: 28288115 PMCID: PMC5395419 DOI: 10.1038/ng.3810] [Citation(s) in RCA: 232] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 02/13/2017] [Indexed: 12/17/2022]

547

Pan X, Shen HB. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics 2017;18:136. [PMID: 28245811 PMCID: PMC5331642 DOI: 10.1186/s12859-017-1561-8] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2016] [Accepted: 02/23/2017] [Indexed: 01/08/2023] Open

Abstract

Background

RNAs play key roles in cells through their interactions with RNA-binding proteins (RBPs), and the binding motifs of these proteins are crucial to understanding the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs, and why they bind specific positions, is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from rapidly growing multi-source data (e.g., sequence and structure), the domain-specific features and formats of these data have posed significant computational challenges. One current difficulty is that the knowledge shared across sources lies at a higher level of abstraction than the observed data, which makes direct integration of observed data across domains inefficient. Another difficulty is interpreting the prediction results. Existing approaches tend to stop after outputting potential discrete binding sites on the sequences, but how to assemble these sites into meaningful binding motifs is a topic worthy of further investigation.

Results

In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid of a convolutional neural network and a deep belief network to predict RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstract feature space using multiple layers of learning blocks, in which the shared representations across different domains are integrated. To validate iDeep, we performed experiments on 31 large-scale CLIP-seq datasets. Our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared with the best single-source predictor, and that through cross-domain knowledge integration at an abstract level, iDeep outperforms state-of-the-art predictors by 6%. Besides the overall improvement in prediction performance, the convolutional neural network module embedded in iDeep can also automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with experimentally verified results, suggesting that iDeep is a promising approach for real-world applications.
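The comparisons above are reported as average AUC. As a reminder of what that metric measures, here is a minimal sketch, not taken from iDeep, that computes the ROC AUC directly from its probabilistic definition: the chance that a randomly chosen positive (bound) site scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Two bound sites (label 1) and two unbound sites (label 0) with predictor scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))
```

The pairwise loop is O(n·m) and chosen for clarity; rank-based formulations (the Mann-Whitney U statistic) give the same value more efficiently on large CLIP-seq test sets.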

Conclusion

The iDeep framework not only achieves better performance than state-of-the-art predictors but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users.


548

Sequence-specific bias correction for RNA-seq data using recurrent neural networks. BMC Genomics 2017;18:1044. [PMID: 28198674 PMCID: PMC5310274 DOI: 10.1186/s12864-016-3262-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

549

Wouters J, Kalender Atak Z, Aerts S. Decoding transcriptional states in cancer. Curr Opin Genet Dev 2017;43:82-92. [PMID: 28129557 DOI: 10.1016/j.gde.2017.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2016] [Revised: 01/05/2017] [Accepted: 01/09/2017] [Indexed: 12/27/2022]

550

Lanchantin J, Singh R, Wang B, Qi Y. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks. Pac Symp Biocomput 2017;22:254-265. [PMID: 27896980 PMCID: PMC5787355 DOI: 10.1142/9789813207813_0025] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]