1. Nag S, Banerjee C, Goyal M, Siddiqui AA, Saha D, Mazumder S, Debsharma S, Pramanik S, Saha SJ, De R, Bandyopadhyay U. Plasmodium falciparum Alba6 exhibits DNase activity and participates in stress response. iScience 2024; 27:109467. [PMID: 38558939 PMCID: PMC10981135 DOI: 10.1016/j.isci.2024.109467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 12/12/2023] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Abstract
Alba domain proteins, owing to their functional plasticity, play a significant role in organisms. Here, we report an intrinsic DNase activity of PfAlba6 from Plasmodium falciparum, an etiological agent responsible for human malignant malaria. We identified that tyrosine 28 plays a critical role in the Mg2+-driven 5'-3' DNase activity of PfAlba6. PfAlba6 cleaves both dsDNA and ssDNA. We also characterized the PfAlba6-DNA interaction and observed concentration-dependent oligomerization in the presence of DNA, which is evident from size exclusion chromatography and single-molecule AFM imaging. The PfAlba6 mRNA expression level is up-regulated several-fold following heat stress and treatment with artemisinin, indicating a possible role in stress response. PfAlba6 has no human orthologs and is expressed in all intra-erythrocytic stages; thus, this protein can potentially be a new anti-malarial drug target.
Affiliation(s)
- Shiladitya Nag
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Chinmoy Banerjee
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Manish Goyal
  - Department of Molecular & Cell Biology, School of Dental Medicine, Boston University Medical Campus, Boston, MA, USA
- Asim Azhar Siddiqui
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Debanjan Saha
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Somnath Mazumder
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
  - Department of Zoology, Raja Peary Mohan College, 1 Acharya Dhruba Pal Road, Uttarpara, West Bengal 712258, India
- Subhashis Debsharma
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Saikat Pramanik
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Shubhra Jyoti Saha
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
- Rudranil De
  - Amity Institute of Biotechnology, Amity University, Kolkata, Plot No: 36, 37 & 38, Major Arterial Road, Action Area II, Kadampukur Village, Newtown, Kolkata, West Bengal 700135, India
- Uday Bandyopadhyay
  - Division of Infectious Diseases and Immunology, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700032, West Bengal, India
  - Division of Molecular Medicine, Bose Institute, Unified Academic Campus, EN 80, Sector V, Bidhan Nagar, Kolkata, West Bengal 700091, India
2. Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation and RNA degradation. One of these modifications, m5c, participates in various cellular functions such as RNA structural stability and translation efficiency, and has gained popularity among biologists. Detecting RNA m5c methylation sites through biological experiments requires considerable effort, time and money. Most researchers use pre-processed RNA sequences of 41 nucleotides in which the methylated cytosine is at the center; some of the information around these motifs may therefore be lost. Conventional methods are unable to process the RNA sequence directly because of its high dimensionality and thus need optimized techniques for better feature extraction. To handle these challenges, the goal of this study is to employ an end-to-end, 1D CNN-based model to classify and interpret m5c methylation sites. Moreover, our aim is to analyze the sequence in its full length, where the methylated cytosine may not be at the center. The evaluation of the proposed architecture showed promising results, outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy for 41-nucleotide sequences and 96.10% accuracy for full-length sequences.
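As an illustration of the 41-nucleotide input described in this abstract, the following minimal sketch cuts a window centred on a candidate cytosine and one-hot encodes it, which is a common way to feed sequences to a 1D CNN. The function names, the `N` padding symbol and the A/C/G/U channel order are assumptions made for this example; the abstract does not describe the authors' actual preprocessing.

```python
# Illustrative sketch only: helper names, padding symbol and channel order
# are assumptions, not taken from the cited paper.
BASES = "ACGU"  # assumed channel order for the one-hot encoding


def extract_window(seq, center, flank=20, pad="N"):
    """Return the (2 * flank + 1)-nt window centred on index `center`.

    Positions that fall outside the sequence are filled with `pad`, so the
    candidate site always sits at index `flank` of the returned string.
    """
    left = max(0, center - flank)
    right = min(len(seq), center + flank + 1)
    left_pad = pad * (flank - (center - left))
    right_pad = pad * (flank - (right - 1 - center))
    return left_pad + seq[left:right] + right_pad


def one_hot(window):
    """Encode each base as a 4-element 0/1 vector; unknown bases become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in window]


if __name__ == "__main__":
    rna = "GGACUAAGG"           # toy sequence, cytosine at index 3
    window = extract_window(rna, center=3)
    encoded = one_hot(window)
    print(len(window), window[20], encoded[20])
```

A full-length variant, as the abstract proposes, would skip the windowing step and encode the whole sequence, at the cost of variable-length inputs.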
Affiliation(s)
- Irum Aslam
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
  - EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
  - College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia
- Mohammed ELAffendi
  - EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
  - Public Health and Community Medicine Department (Industrial medicine and occupational health specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
  - EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
3. Wei H, Zhao Z, Luo R. Machine-Learned Molecular Surface and Its Application to Implicit Solvent Simulations. J Chem Theory Comput 2021; 17:6214-6224. [PMID: 34516109 DOI: 10.1021/acs.jctc.1c00492] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Implicit solvent models, such as Poisson-Boltzmann models, play important roles in computational studies of biomolecules. A vital step in almost all implicit solvent models is to determine the solvent-solute interface, and the solvent excluded surface (SES) is the most widely used interface definition in these models. However, classical algorithms used for computing SES are geometry-based, so that they are neither suitable for parallel implementations nor convenient for obtaining surface derivatives. To address the limitations, we explored a machine learning strategy to obtain a level set formulation for the SES. The training process was conducted in three steps, eventually leading to a model with over 95% agreement with the classical SES. Visualization of tested molecular surfaces shows that the machine-learned SES overlaps with the classical SES in almost all situations. Further analyses show that the machine-learned SES is incredibly stable in terms of rotational variation of tested molecules. Our timing analysis shows that the machine-learned SES is roughly 2.5 times as efficient as the classical SES routine implemented in Amber/PBSA on a tested central processing unit (CPU) platform. We expect further performance gain on massively parallel platforms such as graphics processing units (GPUs) given the ease in converting the machine-learned SES to a parallel procedure. We also implemented the machine-learned SES into the Amber/PBSA program to study its performance on reaction field energy calculation. The analysis shows that the two sets of reaction field energies are highly consistent with a 1% deviation on average. Given its level set formulation, we expect the machine-learned SES to be applied in molecular simulations that require either surface derivatives or high efficiency on parallel computing platforms.
Affiliation(s)
- Haixin Wei
  - Departments of Materials Science and Engineering, Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California, Irvine, California 92697, United States
- Zekai Zhao
  - Departments of Materials Science and Engineering, Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California, Irvine, California 92697, United States
- Ray Luo
  - Departments of Materials Science and Engineering, Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California, Irvine, California 92697, United States
4. Shuvo MH, Gulfam M, Bhattacharya D. DeepRefiner: high-accuracy protein structure refinement by deep network calibration. Nucleic Acids Res 2021; 49:W147-W152. [PMID: 33999209 PMCID: PMC8262753 DOI: 10.1093/nar/gkab361] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/08/2021] [Revised: 04/18/2021] [Accepted: 04/23/2021] [Indexed: 12/20/2022]
Abstract
The DeepRefiner webserver, freely available at http://watson.cse.eng.auburn.edu/DeepRefiner/, is an interactive and fully configurable online system for high-accuracy protein structure refinement. Fuelled by deep learning, DeepRefiner offers the ability to leverage cutting-edge deep neural network architectures which can be calibrated for on-demand selection of adventurous or conservative refinement modes targeted at degree or consistency of refinement. The method has been extensively tested in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments under the group name 'Bhattacharya-Server' and was officially ranked as the No. 2 refinement server in CASP13 (second only to 'Seok-server' and outperforming all other refinement servers) and No. 2 refinement server in CASP14 (second only to 'FEIG-S' and outperforming all other refinement servers including 'Seok-server'). The DeepRefiner web interface offers a number of convenient features, including (i) fully customizable refinement job submission and validation; (ii) automated job status update, tracking, and notifications; (iii) interactive and interpretable web-based results retrieval with quantitative and visual analysis; and (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Affiliation(s)
- Md Hossain Shuvo
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Muhammad Gulfam
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
  - Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
5. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 2021; 8:53. [PMID: 33816053 PMCID: PMC8010506 DOI: 10.1186/s40537-021-00444-8] [Citation(s) in RCA: 671] [Impact Index Per Article: 223.7] [Received: 01/21/2021] [Accepted: 03/22/2021] [Indexed: 05/04/2023]
Abstract
In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching or even beating those provided by human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown fast in the last few years and it has been extensively used to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), which are the most utilized DL network type, and describes the development of CNN architectures together with their main features, e.g., starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we further present the challenges and suggested solutions to help researchers understand the existing research gaps. It is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and summary and conclusion.
Affiliation(s)
- Laith Alzubaidi
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000 Australia
  - AlNidhal Campus, University of Information Technology & Communications, Baghdad, 10001 Iraq
- Jinglan Zhang
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000 Australia
- Amjad J. Humaidi
  - Control and Systems Engineering Department, University of Technology, Baghdad, 10001 Iraq
- Ayad Al-Dujaili
  - Electrical Engineering Technical College, Middle Technical University, Baghdad, 10001 Iraq
- Ye Duan
  - Faculty of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65211 USA
- Omran Al-Shamma
  - AlNidhal Campus, University of Information Technology & Communications, Baghdad, 10001 Iraq
- J. Santamaría
  - Department of Computer Science, University of Jaén, 23071 Jaén, Spain
- Mohammed A. Fadhel
  - College of Computer Science and Information Technology, University of Sumer, Thi Qar, 64005 Iraq
- Muthana Al-Amidie
  - Faculty of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65211 USA
- Laith Farhan
  - School of Engineering, Manchester Metropolitan University, Manchester, M1 5GD UK
6. Marconi G, Aiello D, Kindiger B, Storchi L, Marrone A, Reale L, Terzaroli N, Albertini E. The Role of APOSTART in Switching between Sexuality and Apomixis in Poa pratensis. Genes (Basel) 2020; 11:genes11080941. [PMID: 32824095 PMCID: PMC7464379 DOI: 10.3390/genes11080941] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/25/2020] [Revised: 08/11/2020] [Accepted: 08/11/2020] [Indexed: 12/20/2022]
Abstract
The production of seeds without sex is considered the holy grail of plant biology. The transfer of apomixis to various crop species has the potential to transform plant breeding, since it will allow new varieties to retain valuable traits through asexual reproduction. Therefore, a greater molecular understanding of apomixis is fundamental. In a previous work we identified a gene, namely APOSTART, that seemed to be involved in this asexual mode of reproduction, which is very common in Poa pratensis L., and here we present a detailed work aimed at clarifying its role in apomixis. In situ hybridization showed that PpAPOSTART is expressed in reproductive tissues from pre-meiosis to embryo development. Interestingly, it is expressed early in a few nucellar cells of apomictic individuals, possibly switching from a somatic to a reproductive cell as in aposporic apomixis. Moreover, out of 13 APOSTART members, we identified one, APOSTART_6, as specifically expressed in flower tissue. APOSTART_6 also exhibited delayed expression in apomictic genotypes when compared with sexual types. Most importantly, the SCAR (Sequence Characterized Amplified Region) derived from the APOSTART_6 sequence completely co-segregated with apomixis.
Affiliation(s)
- Gianpiero Marconi
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy
- Domenico Aiello
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy
- Bryan Kindiger
  - USDA-ARS, Grazinglands Research Laboratory, 7207 West Cheyenne St., El Reno, OK 73036, USA
- Loriano Storchi
  - Dipartimento di Farmacia, Università G. d’Annunzio, via dei Vestini 31, 66100 Chieti, Italy
  - Molecular Discovery Limited, Elstree WD6 3FG, UK
- Alessandro Marrone
  - Dipartimento di Farmacia, Università G. d’Annunzio, via dei Vestini 31, 66100 Chieti, Italy
- Lara Reale
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy
- Niccolò Terzaroli
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy
- Emidio Albertini
  - Dipartimento di Scienze Agrarie, Alimentari e Ambientali, Università degli Studi di Perugia, Borgo XX Giugno 74, 06121 Perugia, Italy
  - Corresponding author
7. LRRpredictor: A New LRR Motif Detection Method for Irregular Motifs of Plant NLR Proteins Using an Ensemble of Classifiers. Genes (Basel) 2020; 11:genes11030286. [PMID: 32182725 PMCID: PMC7140858 DOI: 10.3390/genes11030286] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/07/2020] [Revised: 02/28/2020] [Accepted: 03/04/2020] [Indexed: 12/17/2022]
Abstract
Leucine-rich-repeats (LRRs) belong to an archaic procaryal protein architecture that is widely involved in protein-protein interactions. In eukaryotes, LRR domains developed into key recognition modules in many innate immune receptor classes. Due to the high sequence variability imposed by recognition specificity, precise repeat delineation is often difficult especially in plant NOD-like Receptors (NLRs) notorious for showing far larger irregularities. To address this problem, we introduce here LRRpredictor, a method based on an ensemble of estimators designed to better identify LRR motifs in general but particularly adapted for handling more irregular LRR environments, thus allowing to compensate for the scarcity of structural data on NLR proteins. The extrapolation capacity tested on a set of annotated LRR domains from six immune receptor classes shows the ability of LRRpredictor to recover all previously defined specific motif consensuses and to extend the LRR motif coverage over annotated LRR domains. This analysis confirms the increased variability of LRR motifs in plant and vertebrate NLRs when compared to extracellular receptors, consistent with previous studies. Hence, LRRpredictor is able to provide novel insights into the diversification of LRR domains and a robust support for structure-informed analyses of LRRs in immune receptor functioning.
8. Mosior J, Bourland R, Soma S, Nathan C, Sacchettini J. Structural insights into phosphopantetheinyl hydrolase PptH from Mycobacterium tuberculosis. Protein Sci 2019; 29:744-757. [PMID: 31886928 DOI: 10.1002/pro.3813] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/21/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 11/07/2022]
Abstract
The amidinourea 8918 was recently reported to inhibit the type II phosphopantetheinyl transferase (PPTase) of Mycobacterium tuberculosis (Mtb), PptT, a potential drug-target that activates synthases and synthetases involved in cell wall biosynthesis and secondary metabolism. Surprisingly, high-level resistance to 8918 occurred in Mtb harboring mutations within the gene adjacent to pptT, rv2795c, highlighting the role of the encoded protein as a potentiator of the bactericidal action of the amidinourea. Those studies revealed that Rv2795c (PptH) is a phosphopantetheinyl (PpT) hydrolase, possessing activity antagonistic with respect to PptT. We have solved the crystal structure of Mtb's phosphopantetheinyl hydrolase, making it the first phosphopantetheinyl (carrier protein) hydrolase structurally characterized. The 2.5 Å structure revealed the hydrolases' four-layer (α/β/β/α) sandwich fold featuring a Mn-Fe binuclear center within the active site. A structural similarity search confirmed that PptH most closely resembles previously characterized metallophosphoesterases (MPEs), particularly within the vicinity of the active site, suggesting that it may utilize a similar catalytic mechanism. In addition, analysis of the structure has allowed for the rationalization of the previously reported PptH mutations associated with 8918-resistance. Notably, differences in the sequences and predicted structural characteristics of the PpT hydrolases PptH of Mtb and E. coli's acyl carrier protein hydrolase (AcpH) indicate that the two enzymes evolved convergently and therefore are representative of two distinct PpT hydrolase families.
Affiliation(s)
- John Mosior
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Ronnie Bourland
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Shivatheja Soma
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
- Carl Nathan
  - Department of Microbiology and Immunology, Weill Cornell Medicine, New York, New York
- James Sacchettini
  - Department of Biochemistry and Biophysics, Texas Agricultural and Mechanical University, College Station, Texas
9. Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019; 166:4-21. [PMID: 31022451 DOI: 10.1016/j.ymeth.2019.04.008] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Received: 11/14/2018] [Revised: 03/23/2019] [Accepted: 04/15/2019] [Indexed: 12/13/2022]
Abstract
Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in vast majorities of analysis pipelines. In this review, we provide both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable to use deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoder, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all the four kinds of data type, with the implementation written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.
10. WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets. Quantitative Biology 2018. [DOI: 10.1007/s40484-018-0155-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
11. Wang Z, Jumper JM, Wang S, Freed KF, Sosnick TR. A Membrane Burial Potential with H-Bonds and Applications to Curved Membranes and Fast Simulations. Biophys J 2018; 115:1872-1884. [PMID: 30413241 DOI: 10.1016/j.bpj.2018.10.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/25/2018] [Revised: 09/21/2018] [Accepted: 10/10/2018] [Indexed: 10/28/2022]
Abstract
We use the statistics of a large and curated training set of transmembrane helical proteins to develop a knowledge-based potential that accounts for the dependence on both the depth of burial of the protein in the membrane and the degree of side-chain exposure. Additionally, the statistical potential includes depth-dependent energies for unsatisfied backbone hydrogen bond donors and acceptors, which are found to be relatively small, ∼2 RT. Our potential accurately places known proteins within the bilayer. The potential is applied to the mechanosensing MscL channel in membranes of varying thickness and curvature, as well as to the prediction of protein structure. The potential is incorporated into our new Upside molecular dynamics algorithm. Notably, we account for the exchange of protein-lipid interactions for protein-protein interactions as helices contact each other, thereby avoiding overestimating the energetics of helix association within the membrane. Simulations of most multimeric complexes find that isolated monomers and the oligomers retain the same orientation in the membrane, suggesting that the assembly of prepositioned monomers presents a viable mechanism of oligomerization.
Affiliation(s)
- Zongan Wang
  - Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois
- John M Jumper
  - Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois; Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois
- Sheng Wang
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Toyota Technological Institute at Chicago, Chicago, Illinois
- Karl F Freed
  - Department of Chemistry, The University of Chicago, Chicago, Illinois; James Franck Institute, The University of Chicago, Chicago, Illinois
- Tobin R Sosnick
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois; Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois
12. Shao M, Ma J, Wang S. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics 2018; 33:i267-i273. [PMID: 28881999 PMCID: PMC5870651 DOI: 10.1093/bioinformatics/btx267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, because the signal related to boundaries is noisy and weak. Results: We present DeepBound, an effective approach to identify boundaries of expressed transcripts from RNA-seq reads alignment. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to solve the label-imbalance problem, we incorporate the AUC (area under the curve) score into the optimizing objective function, which is a novel step. To address the issue that deep probabilistic graphical models require a large number of labeled training samples, we propose to use simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulation datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. Availability and implementation: DeepBound is freely available at https://github.com/realbigws/DeepBound.
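To make the AUC score mentioned in this abstract concrete, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. Because this pairwise view does not depend on how rare the positive class is, it is less sensitive to label imbalance than plain accuracy. This illustrates the statistic only; how DeepBound folds AUC into its training objective is not described in the abstract, and the function name is an assumption for this example.

```python
# Illustrative sketch only: shows the AUC statistic, not DeepBound's objective.
def auc_score(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    `scores` are predicted scores, `labels` are 1 (positive) or 0 (negative).
    Runs in O(P * N) time, which is fine for a small demonstration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Two boundary positions (label 1) among two non-boundary positions (label 0).
    print(auc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A training objective would need a differentiable surrogate of this pairwise count, since the comparison `p > n` itself has no useful gradient.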
Collapse
Affiliation(s)
- Mingfu Shao
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- To whom correspondence should be addressed. or
| | - Jianzhu Ma
- School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sheng Wang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- To whom correspondence should be addressed.
|
13
|
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 790] [Impact Index Per Article: 131.7] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
| | | | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | | | - Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
| | - Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
14
|
Abstract
BACKGROUND: Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. RESULTS: To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. CONCLUSION: We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand the contribution of genotypes to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Affiliation(s)
- Rui Xie
- Department of Computer Science, University of Missouri at Columbia, Columbia, MO, USA
| | - Jia Wen
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
| | - Andrew Quitadamo
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri at Columbia, Columbia, MO, USA
| | - Xinghua Shi
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
|