51
Bajaj P, Manjunath K, Varadarajan R. Structural and functional determinants inferred from deep mutational scans. Protein Sci 2022; 31:e4357. [PMID: 35762712 DOI: 10.1002/pro.4357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Revised: 04/04/2022] [Accepted: 05/11/2022] [Indexed: 11/08/2022]
Abstract
Mutations that affect protein binding to a cognate partner primarily occur either at buried residues or at exposed residues directly involved in partner binding. Distinguishing between these two categories based solely on mutational phenotypes is challenging. The bacterial toxin CcdB kills cells by binding to DNA gyrase. Cell death is prevented by binding of CcdB to its cognate antitoxin CcdA, at an extended interface that partially overlaps with the GyrA binding site. Using the CcdAB toxin-antitoxin (TA) system as a model, a comprehensive site-saturation mutagenesis library of CcdB was generated in its native operonic context. The mutational sensitivity of each mutant was estimated by using deep sequencing to measure its relative abundance in two strains, one resistant and one sensitive to the toxic activity of CcdB. Because the CcdAB complex binds to its own promoter and represses transcription, the ability of each mutant to bind CcdA was inferred through a RelE reporter gene assay. By analyzing mutant phenotypes in the CcdB-sensitive, CcdB-resistant, and RelE reporter strains, it was possible to assign residues to buried, CcdA-interacting, or GyrA-interacting sites. A few mutants were individually constructed, expressed, and biophysically characterized to validate the molecular mechanisms responsible for the observed phenotypes. Residues inferred to be important for antitoxin binding are also likely to be important for rejuvenating CcdB from the CcdB-gyrase complex. Therefore, even in the absence of structural information, such high-throughput strategies, when coupled to appropriate genetic screens, can be deployed to predict the structural and functional determinants of proteins.
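The mutational-sensitivity readout compares each mutant's relative abundance in the CcdB-resistant and CcdB-sensitive strains. A minimal sketch of such a comparison, assuming a pseudocount-smoothed log2 ratio normalized to wild type; the paper's actual scoring scheme may differ, and the mutant labels and read counts below are hypothetical.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int], pseudocount: float = 0.5) -> Dict[str, float]:
    """Convert raw read counts to frequencies; the pseudocount avoids log(0)."""
    total = sum(c + pseudocount for c in counts.values())
    return {m: (c + pseudocount) / total for m, c in counts.items()}


def sensitivity_scores(resistant: Dict[str, int], sensitive: Dict[str, int],
                       wt: str = "WT") -> Dict[str, float]:
    """log2 of each mutant's sensitive/resistant frequency ratio, relative to WT.

    A score of 0 means wild-type-like behavior. How the sign maps to toxin
    activity depends on the screen design (assumption, not the paper's rule).
    """
    if resistant.keys() != sensitive.keys():
        raise ValueError("both strains must report the same set of mutants")
    f_res = relative_abundance(resistant)
    f_sen = relative_abundance(sensitive)
    wt_ratio = f_sen[wt] / f_res[wt]
    return {m: math.log2((f_sen[m] / f_res[m]) / wt_ratio) for m in f_res}


# Hypothetical counts for wild type and two mutants.
resistant_counts = {"WT": 1000, "mutA": 900, "mutB": 1100}
sensitive_counts = {"WT": 50, "mutA": 40, "mutB": 2000}
scores = sensitivity_scores(resistant_counts, sensitive_counts)
```

Normalizing to wild type makes scores comparable across sequencing runs of different depth, since only within-sample ratios enter the calculation.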
Affiliation(s)
- Priyanka Bajaj
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Kavyashree Manjunath
- Centre for Chemical Biology and Therapeutics, Institute for Stem Cell Science and Regenerative Medicine, Bangalore, India
52
Application of Deep Learning Models and Network Method for Comprehensive Air-Quality Index Prediction. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Accurate pollutant prediction is essential in fields such as meteorology, meteorological disasters, and climate change studies. In this study, long short-term memory (LSTM) and deep neural network (DNN) models were applied to predict six pollutants and the comprehensive air-quality index (CAI) in Korea from 2015 to 2020. In addition, we used the network method to identify the data sources that best provide factors affecting CAI behavior. This study had two steps: (1) predicting the six pollutants, namely particulate matter (PM10), fine particulate matter (PM2.5), ozone (O3), sulfur dioxide (SO2), nitrogen dioxide (NO2), and carbon monoxide (CO), using the LSTM model; and (2) forecasting the CAI using the six pollutants predicted in the first step as predictors for the DNNs. The predictive ability of each model for the six pollutants and the CAI was evaluated by comparing its predictions with observed air-quality data. This study showed that combining a DNN model with the network method provided high predictive power, and this combination could be a remarkable strength in CAI prediction. As the need for disaster management increases, LSTM and DNN models combined with the network method are expected to have ample potential to track the dynamics of air pollution behaviors.
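The first step trains an LSTM on pollutant time series. A common way to turn a series into supervised samples is a sliding window; the sketch below assumes a fixed window length and a target a given number of steps ahead, neither of which is specified in the abstract, and `make_windows` is a hypothetical helper. In the second step, the six pollutant predictions would become the input features of the DNN that estimates the CAI.

```python
from typing import List, Tuple


def make_windows(series: List[float], window: int,
                 horizon: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Split a univariate series into (input window, target) pairs.

    Each sample uses `window` consecutive observations to predict the value
    `horizon` steps after the window ends.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Hypothetical hourly concentrations of one pollutant.
X, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Each pollutant would get its own windows (or a multivariate variant), and the resulting arrays are the shape an LSTM layer typically consumes after adding a feature dimension.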
53
Molina RS, Rix G, Mengiste AA, Alvarez B, Seo D, Chen H, Hurtado J, Zhang Q, García-García JD, Heins ZJ, Almhjell PJ, Arnold FH, Khalil AS, Hanson AD, Dueber JE, Schaffer DV, Chen F, Kim S, Fernández LÁ, Shoulders MD, Liu CC. In vivo hypermutation and continuous evolution. NATURE REVIEWS. METHODS PRIMERS 2022; 2:37. [PMID: 37073402 PMCID: PMC10108624 DOI: 10.1038/s43586-022-00130-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Affiliation(s)
- Rosana S. Molina
- Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Gordon Rix
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Amanuella A. Mengiste
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Beatriz Alvarez
- Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Daeje Seo
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Haiqi Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Juan Hurtado
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Qiong Zhang
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Jorge Donato García-García
- Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo Mexico, C.P. 45138, Zapopan, Jalisco, Mexico
- Zachary J. Heins
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Patrick J. Almhjell
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Frances H. Arnold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Ahmad S. Khalil
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
- Andrew D. Hanson
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John E. Dueber
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
- Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David V. Schaffer
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
- Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Fei Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Seokhee Kim
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Luis Ángel Fernández
- Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Matthew D. Shoulders
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Chang C. Liu
- Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Department of Chemistry, University of California, Irvine, CA 92617, USA
54
Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS OMEGA 2022; 7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite decades of effort, developing a solubility prediction model with satisfactory accuracy for many of these applications remains challenging. The goals of this study are to assess current deep learning methods for solubility prediction, develop a general model capable of predicting the solubility of a broad range of organic molecules, and understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from molecular structure and explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also performing well. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance.
55
Eliasof M, Boesen T, Haber E, Keasar C, Treister E. Mimetic Neural Networks: A Unified Framework for Protein Design and Folding. FRONTIERS IN BIOINFORMATICS 2022; 2:715006. [DOI: 10.3389/fbinf.2022.715006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Accepted: 03/29/2022] [Indexed: 11/13/2022] Open
Abstract
Recent advances in machine learning techniques for protein structure prediction motivate better results for its inverse problem, protein design. In this work we introduce a new graph mimetic neural network, MimNet, and show that it is possible to build a reversible architecture that solves the structure and design problems in tandem, allowing protein backbone design to improve when the structure is better estimated. We use the ProteinNet data set and show that state-of-the-art results in protein design can be matched and even improved upon, given recent architectures for protein folding.
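Running structure and design "in tandem" relies on layers that can be inverted exactly. A minimal sketch of one reversible scheme, assuming a second-order (leapfrog-style) residual update x_next = 2·x_curr − x_prev + h·f(x_curr); it illustrates reversibility in general and is not presented as MimNet's actual layer. The map `layer` is a hypothetical fixed tanh standing in for a learned graph layer.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def layer(x: Vector) -> Vector:
    """Hypothetical nonlinearity standing in for a learned graph layer."""
    return [math.tanh(0.5 * v) for v in x]


def forward(x_prev: Vector, x_curr: Vector, f: Callable[[Vector], Vector],
            h: float, steps: int) -> Tuple[Vector, Vector]:
    """Leapfrog update: x_next = 2*x_curr - x_prev + h*f(x_curr)."""
    for _ in range(steps):
        fx = f(x_curr)
        x_next = [2 * c - p + h * g for p, c, g in zip(x_prev, x_curr, fx)]
        x_prev, x_curr = x_curr, x_next
    return x_prev, x_curr


def backward(x_prev: Vector, x_curr: Vector, f: Callable[[Vector], Vector],
             h: float, steps: int) -> Tuple[Vector, Vector]:
    """Exact inverse: recover the older state as 2*x_prev - x_curr + h*f(x_prev)."""
    for _ in range(steps):
        fx = f(x_prev)
        x_older = [2 * p - c + h * g for p, c, g in zip(x_prev, x_curr, fx)]
        x_prev, x_curr = x_older, x_prev
    return x_prev, x_curr


x0, x1 = [0.1, -0.2, 0.3], [0.15, -0.1, 0.25]
end_state = forward(x0, x1, layer, 0.1, 10)
recovered = backward(*end_state, layer, 0.1, 10)
```

Because each step can be undone without storing intermediate activations, such networks can run in either direction up to floating-point rounding, which is the property a joint folding/design model can exploit.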
56
Abstract
Three-dimensional protein structural data at the molecular level are pivotal for successful precision medicine. Such data are crucial not only for discovering drugs that block the active site of the target mutant protein but also for clarifying to the patient and the clinician how the mutations harbored by the patient work. The relative paucity of structural data reflects their cost, challenges in their interpretation, and a lack of clinical guidelines for their utilization. Rapid technological advances in experimental high-resolution structure determination are generating structures at an increasing rate. Computationally, modeling algorithms, including molecular dynamics simulations, are becoming more powerful, as is compute-intensive hardware, particularly graphics processing units, coinciding with the inception of the exascale era. Accessible, freely available, and detailed structural and dynamical data can be merged with big data to powerfully transform personalized pharmacology. Here we review protein and emerging genome high-resolution data, along with means, applications, and examples underscoring their usefulness in precision medicine. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Guy Nir
- Department of Biochemistry and Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
57
Gu J, Zhang T, Wu C, Liang Y, Shi X. Refined Contact Map Prediction of Peptides Based on GCN and ResNet. Front Genet 2022; 13:859626. [PMID: 35571037 PMCID: PMC9092020 DOI: 10.3389/fgene.2022.859626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022] Open
Abstract
Predicting peptide inter-residue contact maps, which determine the topology of the peptide structure, plays an important role in computational biology. However, because the number of known homologous structures is limited, there is still much room for improvement in inter-residue contact map prediction. Current models are not sufficient for accurately capturing the relationships between residues, especially long-range ones. In this article, we developed a novel deep neural network framework to refine the rough contact maps produced by existing methods. The rough contact map is used to construct a residue graph that is processed by a graph convolutional neural network (GCN). Because a GCN can better capture global information, it is used to grasp long-range contact relationships. A residual convolutional neural network is also applied in the framework to learn local information. We conducted experiments on four different test datasets, and the accuracy of inter-residue long-range contact map prediction demonstrates the effectiveness of our proposed method.
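The refinement step turns the rough contact map into a residue graph and runs a GCN over it. A minimal sketch of one graph-convolution layer, assuming edges are placed where the rough contact probability is at least 0.5 and using the widely used symmetric normalization Â = D^(-1/2) (A + I) D^(-1/2) with H' = ReLU(Â H W); the threshold, features, and weights are hypothetical, and the paper's exact graph construction may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def contact_graph(rough_map: Matrix, threshold: float = 0.5) -> Matrix:
    """Adjacency with self-loops: edge where predicted contact probability >= threshold."""
    n = len(rough_map)
    return [[1.0 if i == j or rough_map[i][j] >= threshold else 0.0 for j in range(n)]
            for i in range(n)]


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 A D^-1/2 (self-loops keep every degree >= 1)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(A_hat @ H @ W); each residue mixes its neighbours' features."""
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical two-residue example with a confident predicted contact.
a_hat = normalize(contact_graph([[1.0, 0.9], [0.9, 1.0]]))
refined = gcn_layer(a_hat, [[1.0], [3.0]], [[1.0]])
```

Stacking such layers lets information flow along predicted contacts regardless of sequence separation, which is why a graph view suits long-range contacts better than a purely local convolution.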
Affiliation(s)
- Jiawei Gu
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Tianhao Zhang
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Chunguo Wu
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- Yanchun Liang
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- Xiaohu Shi
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- *Correspondence: Xiaohu Shi,
58
VHH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022; 23:ijms23073721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022] Open
Abstract
VHH, i.e., VH domains of camelid single-chain antibodies, are very promising therapeutic agents due to their significant physicochemical advantages compared to classical mammalian antibodies. The number of experimentally solved VHH structures has increased significantly in recent years, which is of great help because it makes it possible to work directly on 3D structures to humanise or improve them. Unfortunately, most VHHs do not have experimentally determined 3D structures, so it is essential to find alternative ways to obtain structural information. Methods that predict structure from the primary amino acid sequence offer a way to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied to the 3D modelling of a given VHH sequence (21 in total). Besides the historical overview, it aims to show how modelling software programs have shaped the structural prediction of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools to predict a structural model of VHH from its sequence.
59
Choudhury C, Arul Murugan N, Deva Priyakumar U. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov Today 2022; 27:1847-1861. [PMID: 35301148 PMCID: PMC8920090 DOI: 10.1016/j.drudis.2022.03.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/04/2021] [Revised: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 02/08/2023]
Abstract
The current global health emergency in the form of the coronavirus disease 2019 (COVID-19) pandemic has highlighted the need for fast, accurate, and efficient drug discovery pipelines. Traditional drug discovery projects relying on in vitro high-throughput screening (HTS) involve large investments and sophisticated experimental set-ups, affordable only to big biopharmaceutical companies. In this scenario, applying efficient state-of-the-art computational methods and modern artificial intelligence (AI)-based algorithms to rapidly screen repurposable chemical space [approved drugs and natural products (NPs) with proven pharmacokinetic profiles] and identify initial leads is a powerful way to save resources and time. Structure-based drug repurposing is a popular in silico repurposing approach. In this review, we discuss traditional and modern AI-based computational methods and tools applied at various stages of structure-based drug discovery (SBDD) pipelines. Additionally, we highlight the role of generative models in generating molecules with scaffolds from repurposable chemical space. Teaser: This review highlights the importance of repurposable chemical space and the contributions of conventional in silico approaches and modern machine-learning algorithms to rapid structure-based drug repurposing.
Affiliation(s)
- Chinmayee Choudhury
- Department of Experimental Medicine and Biotechnology, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh 160012, India
- N Arul Murugan
- Department of Computer Science, School of Electrical Engineering and Computer Sciences, KTH Royal Institute of Technology, S-100 44, Stockholm, Sweden
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
60
Yu CH, Chen W, Chiang YH, Guo K, Martin Moldes Z, Kaplan DL, Buehler MJ. End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins. ACS Biomater Sci Eng 2022; 8:1156-1165. [PMID: 35129957 PMCID: PMC9347213 DOI: 10.1021/acsbiomaterials.1c01343] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract
Structural proteins are the basis of many biomaterials and key construction and functional components of all life. Further, it is well known that the functional diversity of proteins relies on their local structures, which derive from their primary amino acid sequences. Here, we report a deep learning model that predicts the secondary structure content of proteins directly from primary sequences with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing proteins with targeted material functions, especially mechanical properties. Using convolutional and recurrent architectures and natural language models, our deep learning model predicts the content of two essential types of secondary structure, the α-helix and the β-sheet. The training data are collected from the Protein Data Bank and contain many existing protein geometries. We find that our model can learn hidden features as patterns of input sequences that can then be directly related to secondary structure content. The α-helix and β-sheet content predictions show excellent agreement with the training data and with newly deposited protein structures that were not included in the original training set. We further demonstrate the features of the model by searching for de novo protein sequences that maximize or minimize α-helix or β-sheet content and comparing the predictions with folded models of these sequences based on AlphaFold2. Excellent agreement is found, underscoring that our model has predictive potential for rapidly designing proteins with specific secondary structures and could be widely applied in biomedical industries, including protein biomaterial design and regenerative medicine applications.
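Secondary structure content, the quantity predicted here, is the fraction of residues in each class. A minimal sketch of how such a training target could be computed from a per-residue DSSP assignment string, assuming DSSP's eight-state codes with H, G, I counted as helix and E, B as strand (a common reduction, not necessarily the paper's labeling rule); passing `helix_codes=("H",)` counts α-helix only.

```python
from typing import Dict, Iterable


def ss_content(dssp: str,
               helix_codes: Iterable[str] = ("H", "G", "I"),
               strand_codes: Iterable[str] = ("E", "B")) -> Dict[str, float]:
    """Fraction of residues assigned to helix and to strand/sheet.

    `dssp` holds one DSSP code per residue; coil residues ("-", " ", T, S, ...)
    count toward the length but toward neither class.
    """
    if not dssp:
        raise ValueError("empty assignment string")
    helix, strand = set(helix_codes), set(strand_codes)
    n = len(dssp)
    return {
        "helix": sum(code in helix for code in dssp) / n,
        "sheet": sum(code in strand for code in dssp) / n,
    }


# Hypothetical ten-residue assignment.
content = ss_content("HHHHEEEE--")
```

Whether 3-10 and π helices count as "α-helix" changes the targets noticeably for short helices, so the code sets are left as parameters rather than fixed.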
Affiliation(s)
- Chi-Hua Yu
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Wei Chen
- Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Yu-Hsuan Chiang
- Department of Civil Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Kai Guo
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Zaira Martin Moldes
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
- David L Kaplan
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Center for Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
61
Nair S, Shrikumar A, Schreiber J, Kundaje A. fastISM: performant in silico saturation mutagenesis for convolutional neural networks. Bioinformatics 2022; 38:2397-2403. [PMID: 35238376 PMCID: PMC9048647 DOI: 10.1093/bioinformatics/btac135] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/26/2021] [Revised: 02/09/2022] [Accepted: 03/01/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Deep-learning models, such as convolutional neural networks, are able to accurately map biological sequences to associated functional readouts and properties by learning predictive de novo representations. In silico saturation mutagenesis (ISM) is a popular feature attribution technique for inferring contributions of all characters in an input sequence to the model's predicted output. The main drawback of ISM is its runtime, as it requires a separate forward propagation through the trained model for every possible mutation of each character in the input sequence to predict the effects on the output. RESULTS We present fastISM, an algorithm that speeds up ISM by over 10× for commonly used convolutional neural network architectures. fastISM is based on the observations that the majority of computation in ISM is spent in convolutional layers, and that a single mutation only disrupts a limited region of intermediate layers, rendering most computation redundant. fastISM narrows the runtime gap between backpropagation-based feature attribution methods and ISM. On multi-output architectures it is far faster than backpropagation-based methods, making it feasible to run ISM on a large number of sequences. AVAILABILITY AND IMPLEMENTATION An easy-to-use Keras/TensorFlow 2 implementation of fastISM is available at https://github.com/kundajelab/fastISM. fastISM can be installed using pip install fastism. A hands-on tutorial can be found at https://colab.research.google.com/github/kundajelab/fastISM/blob/master/notebooks/colab/DeepSEA.ipynb. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
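Naive ISM runs one forward pass per possible substitution. The sketch below implements that brute-force baseline on a one-hot DNA sequence with a hypothetical toy model (a single convolution filter followed by global max pooling), and includes a check of the observation fastISM builds on: a single substitution changes only the convolution outputs whose window covers the mutated position. It is not fastISM's implementation; for real Keras models, the `fastism` package linked above is the tool to use.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def conv1d(x: List[List[float]], filt: List[List[float]]) -> List[float]:
    """Valid 1D convolution (cross-correlation) of a one-hot sequence with one filter."""
    k = len(filt)
    return [sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(len(ALPHABET)))
            for i in range(len(x) - k + 1)]


def model(x: List[List[float]], filt: List[List[float]]) -> float:
    """Toy model: convolution followed by global max pooling."""
    return max(conv1d(x, filt))


def ism(seq: str, filt: List[List[float]]) -> Dict[Tuple[int, str], float]:
    """Naive ISM: change in model output for every single-base substitution."""
    ref = model(one_hot(seq), filt)
    effects = {}
    for pos, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                mutant = seq[:pos] + alt + seq[pos + 1:]
                effects[(pos, alt)] = model(one_hot(mutant), filt) - ref
    return effects


def changed_outputs(seq: str, pos: int, alt: str, filt: List[List[float]]) -> List[int]:
    """Indices of convolution outputs that differ after mutating `pos` to `alt`."""
    before = conv1d(one_hot(seq), filt)
    after = conv1d(one_hot(seq[:pos] + alt + seq[pos + 1:]), filt)
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


motif_filter = one_hot("GAT")  # hypothetical filter that scores matches to "GAT"
seq = "ACGATTCA"
effects = ism(seq, motif_filter)
```

With a filter of width k, only outputs i in [pos - k + 1, pos] can change, so recomputing just that slice (and propagating it through later layers) avoids most of the redundant work a full forward pass would repeat.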
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
62
Zhao B, Kurgan L. Deep Learning in Prediction of Intrinsic Disorder in Proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
63
Ray A. Machine learning in postgenomic biology and personalized medicine. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology. On the one hand, the massive data generated in biological sciences by rapid and deep gene sequencing and by protein or other molecular structure determination require data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently relied on computationally expensive mechanistic model-based approaches. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
64
Saleh RO, Essia INA, Jasim SA. The Anticancer Effect of a Conjugated Antimicrobial Peptide Against Colorectal Cancer (CRC) Cells. J Gastrointest Cancer 2022; 54:165-170. [PMID: 35217999 DOI: 10.1007/s12029-021-00799-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 12/30/2021] [Indexed: 01/05/2023]
Abstract
PURPOSE Although antimicrobial peptides (AMPs) were initially known as compounds of the innate immune system that fight microbial pathogens, it has recently been proposed that differences between normal and cancer cell membranes underlie the anticancer effect of these peptides. The aim of this study was to evaluate the anticancer effect of a MELITININ+BMAP27-conjugated peptide against colorectal cancer (CRC) cells. METHODS MELITININ+BMAP27-conjugated peptides were designed, and β-naphthylalanine residues were added to the termini to improve the anticancer effect. The CRC cell lines HT29, SW742, HCT-116, and WiDr were used. Peptide solutions were prepared at concentrations of 5, 10, 25, 50, 100, 150, 200, and 400 μg/mL, and the rate of cell death after 12, 24, and 48 h was assessed using the MTT assay. After 30 µg/mL was confirmed as an effective, nontoxic concentration, the cells were exposed to this concentration and total RNA was extracted. Quantitative real-time PCR (RT-qPCR) was performed to amplify the Bax, caspase3, atg5, and GAPDH (glyceraldehyde 3-phosphate dehydrogenase, the internal control) genes. RESULTS Against normal cells, the IC50 of the peptide at 24 and 4 h was 80 and 100 µg/mL, respectively. After 24-72 h of treatment, a significant difference in the mean percentage of living CRC cells was observed at conjugated-peptide concentrations of 50-400 μg/mL (p < 0.05). The IC50 of the peptide at 24, 48, and 72 h of exposure was 30, 20, and 10 μg/mL, respectively. The peptide produced a significant 2.35-fold increase in mean Bax expression in CRC cells (p < 0.001). It also significantly increased caspase 3 expression 1.75-fold (p = 0.0112) and atg5 expression 1.2-fold (p = 0.0217). There was no significant difference among cell lines in the expression of each gene. 
CONCLUSION The conjugated peptide killed CRC cells by inducing apoptosis and necrosis. Further studies are needed.
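The fold changes reported above (e.g. 2.35-fold for Bax, with GAPDH as the internal control) are relative-expression values. The abstract does not name the quantification method; the widely used 2^-ΔΔCt calculation, which assumes near-100% amplification efficiency, is sketched below as an assumption, with hypothetical Ct values rather than data from the study.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the 2^-ddCt method.

    Each dCt normalizes the target gene to a reference gene (e.g. GAPDH)
    within one condition; ddCt then compares treated against control.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle earlier after
# treatment while the reference gene is unchanged.
fold = fold_change_ddct(ct_target_treated=24.0, ct_ref_treated=18.0,
                        ct_target_control=25.0, ct_ref_control=18.0)
print(fold)  # 2.0
```

Each cycle of difference corresponds to a doubling only under the efficiency assumption; efficiency-corrected variants exist when that assumption does not hold.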
Affiliation(s)
- Raed Obaid Saleh
- Department of Pharmacy, Al-Maarif University College, Ramadi City, Al-Anbar, Iraq
65
Cheng J, Xu Y, Zhao Y. Prediction of protein secondary structure based on deep residual convolutional neural network. BIOTECHNOL BIOTEC EQ 2022. [DOI: 10.1080/13102818.2022.2026815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Jinyong Cheng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, PR China
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
- Ying Xu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
- Yunxiang Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
66
Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
67
Fischer S, Stegmann F, Gnanapragassam VS, Lepenies B. From structure to function – Ligand recognition by myeloid C-type lectin receptors. Comput Struct Biotechnol J 2022; 20:5790-5812. [DOI: 10.1016/j.csbj.2022.10.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/14/2022] [Accepted: 10/14/2022] [Indexed: 11/29/2022]
68
Abstract
INTRODUCTION The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences, and constructs and disseminates databases of these predictions. Over 40 years of research have produced numerous resources. AREAS COVERED We identify and briefly summarize the most comprehensive collection to date, comprising over 100 disorder predictors. We focus on their predictive models, availability, and predictive performance, and we categorize and study them from a historical perspective to highlight informative trends. EXPERT OPINION We find a consistent trend of improving predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. Given the recent and convincing success of deep learners, their use will continue for the foreseeable future. Moreover, users have access to a broad range of resources for conveniently collecting accurate disorder predictions, including web servers and standalone programs for disorder prediction, servers that predict both disorder and disorder functions, and large databases of precomputed predictions. We also point to the need to address the shortage of accurate methods for predicting disordered binding regions.
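The review notes the shift toward meta-predictors, which combine the outputs of several individual disorder predictors. One of the simplest ways to do this is to average per-residue scores and apply a cutoff. The sketch below assumes scores in [0, 1] and a 0.5 threshold, neither of which comes from the review; many published meta-predictors instead learn how to weight their inputs.

```python
from statistics import mean


def consensus_disorder(score_tracks, threshold=0.5):
    """Average per-residue disorder scores from several predictors.

    score_tracks: list of equal-length lists, one per predictor, each
    holding a disorder propensity in [0, 1] for every residue.
    Returns the averaged scores and a boolean disorder call per residue.
    """
    if len({len(track) for track in score_tracks}) != 1:
        raise ValueError("every predictor must score the same residues")
    # zip(*tracks) groups the scores that all predictors gave one residue.
    consensus = [mean(column) for column in zip(*score_tracks)]
    calls = [score >= threshold for score in consensus]
    return consensus, calls


# Three hypothetical predictors scoring a three-residue fragment.
scores, disordered = consensus_disorder([
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.4],
    [0.8, 0.0, 0.8],
])
print(disordered)  # [True, False, True]
```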
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
69
Artificial intelligence unifies knowledge and actions in drug repositioning. Emerg Top Life Sci 2021; 5:803-813. [PMID: 34881780 DOI: 10.1042/etls20210223] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Revised: 11/08/2021] [Accepted: 11/09/2021] [Indexed: 11/17/2022]
Abstract
Drug repositioning aims to reuse existing drugs, shelved drugs, or drug candidates that failed clinical trials for other medical indications. Its appeal stems from the reduced risk associated with safety testing of new medications and the shorter time needed to bring a known drug into the clinic. Artificial intelligence (AI) has recently been pursued to speed up drug repositioning and discovery. The essence of AI in drug repositioning is to unify knowledge and actions, i.e., to incorporate real-world and experimental data to map out the best way forward in identifying effective therapeutics against a disease. In this review, we share positive expectations for the evolution of AI and drug repositioning and summarize the role of AI in several drug repositioning methods.
70
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Over the past decades, many studies have been devoted to building new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on critically analyzing their limitations, their recurrent biases toward the training set, their generalizability, and their interpretability. We found that predictor accuracy has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to improve performance.
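The ~1 kcal/mol figure is an error on predicted stability changes (ΔΔG). The helpers below sketch how such accuracy is commonly scored, via RMSE and Pearson correlation against experimental ΔΔG, together with an antisymmetry check (the ΔΔG of a reverse mutation should be the negative of the direct one). The antisymmetry test is a well-known bias check in this area but is not named in the abstract, and the exact error measure behind "1 kcal/mol" is not stated there; function names and inputs are illustrative.

```python
from math import sqrt


def rmse(predicted, experimental):
    """Root-mean-square error, in the units of the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry_bias(direct, reverse):
    """Mean of (ddG_direct + ddG_reverse) / 2 over mutation pairs.

    Reversing a mutation should flip the sign of ddG, so an unbiased
    predictor gives a value near zero.
    """
    return sum((d + r) / 2 for d, r in zip(direct, reverse)) / len(direct)


# Hypothetical predictions for three mutations and their reverses.
print(rmse([1.2, -0.5, 2.0], [1.0, 0.0, 2.5]))
print(antisymmetry_bias([1.2, -0.5, 2.0], [-0.8, 0.9, -1.0]))
```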
71
Gelman S, Fahlberg SA, Heinzelman P, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proc Natl Acad Sci U S A 2021; 118:e2104878118. [PMID: 34815338 PMCID: PMC8640744 DOI: 10.1073/pnas.2104878118] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Accepted: 10/01/2021] [Indexed: 11/18/2022]
Abstract
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
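A supervised model trained on deep mutational scanning data needs a numeric encoding of each variant. A common baseline input is a flattened one-hot matrix over the 20 standard amino acids. The sketch below assumes that encoding and a hypothetical `K2A`-style mutation notation; it is not the authors' code, and their structure-aware graph convolutional network is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_variant(wild_type, variant):
    """Apply comma-separated substitutions such as 'K2A,V3D' (1-based)."""
    residues = list(wild_type)
    for mutation in variant.split(","):
        wt, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
        if residues[position - 1] != wt:
            raise ValueError(
                f"{mutation}: wild type at {position} is {residues[position - 1]}")
        residues[position - 1] = mutant
    return "".join(residues)


def one_hot(sequence):
    """Encode a sequence as an L x 20 matrix with a single 1 per row."""
    matrix = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1  # raises KeyError on nonstandard residues
        matrix.append(row)
    return matrix


# Hypothetical three-residue wild type; a real model would use the full domain.
variant_seq = apply_variant("MKV", "K2A")
features = [bit for row in one_hot(variant_seq) for bit in row]
print(variant_seq, len(features))  # MAV 60
```

A flat one-hot vector treats every position independently; the abstract's finding that parameter sharing across positions helps is one motivation for convolutional or graph layers on top of such an encoding.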
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
- Morgridge Institute for Research, Madison, WI 53715
- Sarah A. Fahlberg
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Pete Heinzelman
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Philip A. Romero
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
- Morgridge Institute for Research, Madison, WI 53715
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53792
72
Timmons PB, Hewage CM. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform 2021; 22:bbab308. [PMID: 34396417 PMCID: PMC8575040 DOI: 10.1093/bib/bbab308] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/01/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/29/2023]
Abstract
Knowledge of a peptide's tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods to predict peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5-40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9 Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that, on average, APPTEST produces structures closer to native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
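APPTEST pairs a neural network with simulated annealing. The generic annealing loop below is a minimal sketch on a toy one-dimensional energy, showing the Metropolis acceptance rule and a geometric cooling schedule. APPTEST's actual energy function, move set, and schedule are not described in the abstract, so every name and parameter here is a hypothetical stand-in.

```python
import math
import random


def simulated_annealing(energy, initial, propose, t_start=10.0, t_end=0.01,
                        steps=5000, seed=0):
    """Minimize `energy` by simulated annealing; returns (best_state, best_energy)."""
    rng = random.Random(seed)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end over the run.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(x):
    return (x - 3.0) ** 2  # single minimum at x = 3


def nudge(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(toy_energy, initial=-5.0, propose=nudge)
print(round(best_x, 2), round(best_e, 4))
```

In a structure-prediction setting the state would be a set of backbone coordinates or torsion angles and the energy a learned or physics-based score; the loop itself is unchanged.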
Affiliation(s)
- Patrick Brendan Timmons
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
73
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022]
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
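To illustrate the 1D versus 3D distinction the review draws: sequences are often one-hot encoded, while for structure a pairwise distance map, or a contact map thresholded from it, is one common representation that does not change when the protein is rotated or translated. The sketch assumes one coordinate per residue (e.g. C-alpha, in Å) and the conventional 8 Å contact cutoff; both are illustrative choices, not taken from the review.

```python
from math import dist


def distance_map(coords):
    """Pairwise Euclidean distances between per-residue coordinates."""
    return [[dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0):
    """Binary L x L map: 1 where two distinct residues lie within `cutoff`."""
    distances = distance_map(coords)
    return [
        [1 if i != j and d <= cutoff else 0 for j, d in enumerate(row)]
        for i, row in enumerate(distances)
    ]


# Hypothetical three-residue coordinates.
ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)]
print(distance_map(ca)[0][1])  # 5.0
print(contact_map(ca))         # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```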
Affiliation(s)
- Marianne Defresne
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
- Sophie Barbe
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Thomas Schiex
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
74
Walker CC, Meek GA, Fobe TL, Shirts MR. Using a Coarse-Grained Modeling Framework to Identify Oligomeric Motifs with Tunable Secondary Structure. J Chem Theory Comput 2021; 17:6018-6035. [PMID: 34495659 DOI: 10.1021/acs.jctc.1c00528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Coarse-grained modeling can be used to explore general theories that are independent of specific chemical detail. In this paper, we present cg_openmm, a Python-based simulation framework for modeling coarse-grained hetero-oligomers and screening them for structural and thermodynamic characteristics of cooperative secondary structures. cg_openmm facilitates the building of coarse-grained topology and random starting configurations, setup of GPU-accelerated replica exchange molecular dynamics simulations with the OpenMM software package, and features a suite of postprocessing thermodynamic and structural analysis tools. In particular, native contact analysis, heat capacity calculations, and free energy of folding calculations are used to identify and characterize cooperative folding transitions and stable secondary structures. In this work, we demonstrate the capabilities of cg_openmm on a simple 1-1 Lennard-Jones coarse-grained model, in which each residue contains 1 backbone and 1 side-chain bead. By scanning both nonbonded and bonded force-field parameter spaces at the coarse-grained level, we identify and characterize sets of parameters which result in the formation of stable helices through cooperative folding transitions. Moreover, we show that the geometries and stabilities of these helices can be tuned by manipulating the force-field parameters.
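Two quantities named above, the Lennard-Jones interaction in the 1-1 coarse-grained model and native contact analysis, can be sketched in a few lines. These functions are not part of cg_openmm's API; they are standalone illustrations that assume the standard 12-6 Lennard-Jones form and define a native contact as "formed" when its current distance is within a tolerance factor (1.2 here, an assumed value) of its distance in the folded reference.

```python
def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def fraction_native_contacts(current, native, tolerance=1.2):
    """Fraction Q of native contacts that are still formed.

    native:  {(i, j): distance between beads i and j in the folded reference}
    current: {(i, j): distance between the same beads in the frame analyzed}
    """
    formed = sum(1 for pair, d_native in native.items()
                 if current[pair] <= tolerance * d_native)
    return formed / len(native)


# The potential crosses zero at r = sigma and has its minimum (-epsilon)
# at r = 2**(1/6) * sigma.
print(lennard_jones(1.0, sigma=1.0, epsilon=1.0))
print(lennard_jones(2 ** (1 / 6), sigma=1.0, epsilon=1.0))

# Hypothetical helix contacts: one intact, one broken.
q = fraction_native_contacts({(0, 3): 5.5, (1, 4): 9.0},
                             {(0, 3): 5.0, (1, 4): 5.0})
print(q)  # 0.5
```

Tracking Q across replica-exchange temperatures is one way to locate a cooperative folding transition alongside the heat-capacity peak.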
Affiliation(s)
- Christopher C Walker
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Garrett A Meek
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Theodore L Fobe
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
75
AoP-LSE: Antioxidant Proteins Classification Using Deep Latent Space Encoding of Sequence Features. Curr Issues Mol Biol 2021; 43:1489-1501. [PMID: 34698113 PMCID: PMC8928959 DOI: 10.3390/cimb43030105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/16/2022]
Abstract
It is of utmost importance to develop a computational method for accurately predicting antioxidants, as they play a vital role in preventing several diseases caused by oxidative stress. In this correspondence, we present an effective computational methodology based on deep latent space encoding. A deep neural network classifier fused with an autoencoder learns class labels in a pruned latent space. This strategy eliminates the need to develop a classifier and a feature selection model separately, allowing a single standalone model to harness the discriminating feature space effectively and make improved predictions. A thorough analytical study is presented, along with PCA/t-SNE visualizations and PCA-GCNR scores, to show the discriminating power of the proposed method. The proposed method achieved an MCC of 0.43 and a balanced accuracy of 76.2%, outperforming existing models. On an independent dataset, it also outperformed contemporary methods, correctly identifying novel proteins with an accuracy of 95%.
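The Matthews correlation coefficient (MCC) and balanced accuracy reported above are standard metrics for imbalanced binary classification. The functions below use the usual confusion-matrix definitions; the counts in the example are made up and do not come from the paper.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical counts for an imbalanced test set.
print(round(mcc(40, 30, 10, 20), 4))                # 0.4082
print(round(balanced_accuracy(40, 30, 10, 20), 4))  # 0.7083
```

Unlike plain accuracy, both metrics penalize a classifier that simply predicts the majority class, which is why they are preferred when antioxidant proteins are a minority of the data.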
76
Robson B. Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach. Comput Biol Med 2021; 138:104883. [PMID: 34598067 DOI: 10.1016/j.compbiomed.2021.104883] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/02/2021] [Revised: 09/05/2021] [Accepted: 09/17/2021] [Indexed: 01/05/2023]
Abstract
Many researchers have recently used protein secondary structure prediction (prediction of the local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as neural network deep learning. Protein secondary structure prediction continues to be a helpful tool in biomedical and life sciences research, but it is also extremely enticing for testing predictive methods, such as neural nets, that are intended for different or more general purposes. A complication is highlighted here for researchers testing their methods for other applications. Modern protein databases inevitably contain important, though often obscure, clues to the answer, so-called "strong buried clues", which are hard to avoid. This is because most proteins or parts of proteins in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, this can overstate, and so obscure, the true quality of a predictive method. However, for researchers using the algorithms as tools, understanding strong buried clues is of great value, because they need to make maximum use of all available information. A simple method related to the GOR methods, but with some features of neural nets in the sense of progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions of weights, amounting to gigabytes, but they are learned stably by exhaustive sampling. The significance of the findings is discussed in light of promising recent results from DeepMind's AlphaFold.
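The "strong buried clues" problem is essentially leakage: evolutionary relatives shared between training and test sets inflate apparent accuracy. A minimal leakage check is sketched below. It assumes `difflib`'s matching ratio as a crude stand-in for alignment-based sequence identity (dedicated sequence-clustering tools are the usual choice in practice) and an arbitrary 0.7 threshold; neither comes from the paper.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]: 2 * matched characters / total length."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def leaky_test_items(train_seqs, test_seqs, threshold=0.7):
    """Names of test sequences that closely resemble any training sequence."""
    flagged = []
    for name, seq in test_seqs.items():
        if max(similarity(seq, t) for t in train_seqs) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical sequences: one near-duplicate of a training chain, one unrelated.
train = ["MKTAYIAKQR"]
test = {"near_homolog": "MKTAYIAKQK", "unrelated": "WWWWWWWWWW"}
print(leaky_test_items(train, test))  # ['near_homolog']
```

Removing or separately reporting flagged items gives a fairer estimate of how a method generalizes; for a tool user, by contrast, those homologs are exactly the information worth exploiting.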
Affiliation(s)
- Barry Robson
- Ingine Inc. Ohio, USA and the Dirac Foundation Oxfordshire, UK.
77
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022]
Abstract
Functional annotation of genes of unknown function reveals previously unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function is the ortholog-based method. However, genetic data alone are often insufficient for function annotation, so integrating other data sources can increase the chance of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network of human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The model performed well, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for function inference by statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved in muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network with a hybrid deep learning technique can help identify unknown gene functions in malaria parasites. The approach is general and can be applied to other diseases to advance biomedical science.
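The abstract mentions statistical enrichment analysis without naming the test. A one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test, is a common choice and is assumed in the sketch below; the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the query set,   k: query genes carrying the annotation.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical: 4 of 10 query genes carry a term held by 50 of 5000 genes.
print(enrichment_p_value(k=4, n=10, K=50, N=5000))
```

When many annotation terms are tested, the resulting p-values are normally corrected for multiple testing before a function is assigned.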
78
Kell DB. The Transporter-Mediated Cellular Uptake and Efflux of Pharmaceutical Drugs and Biotechnology Products: How and Why Phospholipid Bilayer Transport Is Negligible in Real Biomembranes. Molecules 2021; 26:5629. [PMID: 34577099 PMCID: PMC8470029 DOI: 10.3390/molecules26185629] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/17/2021] [Revised: 09/03/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022]
Abstract
Over the years, my colleagues and I have come to realise that the likelihood of pharmaceutical drugs being able to diffuse through whatever unhindered phospholipid bilayer may exist in intact biological membranes in vivo is vanishingly low. This is because (i) most real biomembranes are mostly protein, not lipid, (ii) unlike purely lipid bilayers that can form transient aqueous channels, the high concentrations of proteins serve to stop such activity, (iii) natural evolution long ago selected against transport methods that just let any undesirable products enter a cell, (iv) transporters have now been identified for all kinds of molecules (even water) that were once thought not to require them, (v) many experiments show a massive variation in the uptake of drugs between different cells, tissues, and organisms, that cannot be explained if lipid bilayer transport is significant or if efflux were the only differentiator, and (vi) many experiments that manipulate the expression level of individual transporters as an independent variable demonstrate their role in drug and nutrient uptake (including in cytotoxicity or adverse drug reactions). This makes such transporters valuable both as a means of targeting drugs (not least anti-infectives) to selected cells or tissues and also as drug targets. The same considerations apply to the exploitation of substrate uptake and product efflux transporters in biotechnology. We are also beginning to recognise that transporters are more promiscuous, and antiporter activity is much more widespread, than had been realised, and that such processes are adaptive (i.e., were selected by natural evolution). The purpose of the present review is to summarise the above, and to rehearse and update readers on recent developments. These developments lead us to retain and indeed to strengthen our contention that for transmembrane pharmaceutical drug transport "phospholipid bilayer transport is negligible".
Affiliation(s)
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
- Mellizyme Biotechnology Ltd., IC1, Liverpool Science Park, Mount Pleasant, Liverpool L3 5TF, UK
79
Garagounis C, Delkis N, Papadopoulou KK. Unraveling the roles of plant specialized metabolites: using synthetic biology to design molecular biosensors. THE NEW PHYTOLOGIST 2021; 231:1338-1352. [PMID: 33997999 DOI: 10.1111/nph.17470] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/11/2021] [Accepted: 04/16/2021] [Indexed: 05/25/2023]
Abstract
Plants are a rich source of specialized metabolites with a broad range of bioactivities and many applications in human daily life. Over the past decades, significant progress has been made in identifying many such metabolites in different plant species and in elucidating their biosynthetic pathways. However, the biological roles of plant specialized metabolites remain elusive, and proposed functions lack an identified underlying molecular mechanism. Understanding the roles of specialized metabolites is frequently hampered by their dynamic production and their specific spatiotemporal accumulation within plant tissues and organs throughout a plant's life cycle. In this review, we propose using strategies from the field of Synthetic Biology to construct and optimize genetically encoded biosensors that can detect individual specialized metabolites in a standardized and high-throughput manner. This will help determine the precise localization of specialized metabolites at the tissue and single-cell levels. Such information will be useful in developing complete system-level models of specialized plant metabolism, which ultimately will demonstrate how the biosynthesis of specialized metabolites is integrated with the core processes of plant growth and development.
Affiliation(s)
- Constantine Garagounis
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
- Nikolaos Delkis
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
- Kalliope K Papadopoulou
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
80
Li B, Mendenhall J, Capra JA, Meiler J. A Multitask Deep-Learning Method for Predicting Membrane Associations and Secondary Structures of Proteins. J Proteome Res 2021; 20:4089-4100. [PMID: 34236204 PMCID: PMC8650144 DOI: 10.1021/acs.jproteome.1c00410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Prediction of residue-level structural attributes and protein-level structural classes helps model protein tertiary structures and understand protein functions. Existing methods either specialize in only one class of proteins or predict only a specific type of residue-level attribute. In this work, we develop a new deep-learning method, named Membrane Association and Secondary Structure Predictor (MASSP), for accurately predicting both residue-level structural attributes (secondary structure, location, orientation, and topology) and protein-level structural classes (bitopic, α-helical, β-barrel, and soluble). MASSP integrates a multilayer two-dimensional convolutional neural network (2D-CNN) with a long short-term memory (LSTM) neural network in a multitask framework. Our comparison shows that MASSP performs as well as or better than state-of-the-art methods in predicting residue-level secondary structures, boundaries of transmembrane segments, and topology. Furthermore, it achieves outstanding accuracy in predicting protein-level structural classes. MASSP automatically distinguishes the structural classes of input sequences and identifies transmembrane segments and topologies if present, making it broadly applicable to different classes of proteins. In summary, MASSP's good performance and broad applicability make it well suited for annotating residue-level attributes and protein-level structural classes at the proteome scale.
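Transmembrane segment boundaries are typically read off residue-level predictions by collapsing runs of identical labels. The sketch below shows that post-processing step; it assumes a hypothetical label alphabet (`i` inside, `M` membrane, `o` outside) and an optional minimum segment length, and does not reproduce MASSP's actual output format.

```python
from itertools import groupby


def label_segments(labels, target="M", min_length=1):
    """Return 1-based inclusive (start, end) spans of consecutive `target` labels."""
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label == target and length >= min_length:
            segments.append((position + 1, position + length))
        position += length
    return segments


# Hypothetical per-residue prediction for a 14-residue stretch.
prediction = "iiiMMMMMooMMMi"
print(label_segments(prediction))                # [(4, 8), (11, 13)]
print(label_segments(prediction, min_length=4))  # [(4, 8)]
```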
Affiliation(s)
- Bian Li
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37203, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States
- Jeffrey Mendenhall
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37203, United States
- John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143, United States
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37203, United States
- Institute for Drug Discovery, University Leipzig Medical School, Leipzig 04109, Germany
81
Park J, Chang S. A Particulate Matter Concentration Prediction Model Based on Long Short-Term Memory and an Artificial Neural Network. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18136801. [PMID: 34202834 PMCID: PMC8297184 DOI: 10.3390/ijerph18136801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/03/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 01/12/2023]
Abstract
Many countries are concerned about high particulate matter (PM) concentrations caused by rapid industrial development, which can harm both human health and the environment. To manage PM, PM concentrations are actively being predicted from historical data. Existing PM prediction technologies mostly evaluate how well a model reproduces current PM concentrations; however, PM must be forecast in advance, before it becomes highly concentrated and harms citizens in the affected regions. Research is therefore needed on an index that indicates whether the PM concentration will increase or decrease. We developed a model that predicts whether the PM concentration will increase or decrease after a given time, specifically for PM2.5 (fine PM) generated by anthropogenic volatile organic compounds. We also developed an algorithm that selects, on an hourly basis, between a long short-term memory (LSTM) model and an artificial neural network (ANN) model. The proposed algorithm achieved a higher F1-score than the LSTM, ANN, or random forest models alone. The model developed in this study could be used to predict future regional PM concentration levels more effectively.
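The hourly model-selection idea can be sketched as picking, for each hour, whichever model scored the best F1 on held-out data. This is a minimal sketch under stated assumptions: binary labels (1 = PM2.5 increase), selection by validation F1, and a hypothetical data layout; the paper's actual selection rule and the LSTM/ANN models themselves are not shown.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_models_by_hour(validation):
    """validation: {hour: {model_name: (y_true, y_pred)}} -> {hour: best model}."""
    return {
        hour: max(results, key=lambda name: f1_score(*results[name]))
        for hour, results in validation.items()
    }


# Hypothetical validation outcomes for two forecast hours.
validation = {
    0: {"lstm": ([1, 0, 1, 1], [1, 0, 1, 1]), "ann": ([1, 0, 1, 1], [1, 1, 1, 1])},
    1: {"lstm": ([0, 1], [1, 0]), "ann": ([0, 1], [0, 1])},
}
print(select_models_by_hour(validation))  # {0: 'lstm', 1: 'ann'}
```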
82
Scherer M, Fleishman SJ, Jones PR, Dandekar T, Bencurova E. Computational Enzyme Engineering Pipelines for Optimized Production of Renewable Chemicals. Front Bioeng Biotechnol 2021; 9:673005. [PMID: 34211966 PMCID: PMC8239229 DOI: 10.3389/fbioe.2021.673005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/26/2021] [Accepted: 05/06/2021] [Indexed: 11/13/2022]
Abstract
To enable a sustainable supply of chemicals, novel biotechnological solutions are required to replace reliance on fossil resources. One potential solution is to use tailored biosynthetic modules with which microorganisms metabolically convert CO2 or organic waste into chemicals and fuel. Currently, it is challenging to commercialize biotechnological processes for renewable chemical biomanufacturing because highly active and specific biocatalysts are lacking. Because experimental methods to engineer biocatalysts are time- and cost-intensive, it is important to establish efficient and reliable computational tools that can speed up the identification or optimization of selective, highly active, and stable enzyme variants for use in the biotechnological industry. Here, we review and suggest combinations of effective state-of-the-art software and online tools available for computational enzyme engineering pipelines to optimize metabolic pathways for the biosynthesis of renewable chemicals. Using examples relevant to biotechnology, we explain the underlying principles of enzyme engineering and design and illuminate future directions for the automated optimization of biocatalysts for assembling synthetic metabolic pathways.
Affiliation(s)
- Marc Scherer
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Patrik R Jones
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Thomas Dandekar
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Elena Bencurova
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
83
Afify HM, Abdelhalim MB, Mabrouk MS, Sayed AY. Protein secondary structure prediction (PSSP) using different machine algorithms. EGYPTIAN JOURNAL OF MEDICAL HUMAN GENETICS 2021. [DOI: 10.1186/s43042-021-00173-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Background
Computational approaches to protein secondary structure prediction (PSSP), which is vital for the pharmaceutical industry, have advanced rapidly. Protein structures determined in the laboratory provide insufficient information for the PSSP used in bioinformatics studies. In this paper, a support vector machine (SVM) model and a decision tree are applied to the RS126 dataset to address the PSSP problem. The decision tree is applied to the SVM output to extract relevant rules for PSSP. The number of rules produced was fairly small, and they are more comprehensible than other rule sets. Several of the proposed rules have compelling and relevant biological explanations.
Results
The results confirmed that the presence of particular amino acids in a protein sequence improves the stability of protein secondary structure prediction. The proposed algorithm achieved 85% accuracy for the E|~E (strand vs. non-strand) classifier.
Conclusions
The proposed rules could be very useful for guiding wet laboratory experiments aimed at determining protein secondary structure. Future work will focus on large protein datasets, avoiding overfitting, and on expanding the number of extracted rules for PSSP.
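An E|~E classifier collapses the usual three secondary-structure states (H = helix, E = strand, C = coil) into strand versus everything else. The snippet below is a hypothetical illustration of how per-residue accuracy for such a binary classifier could be computed from label strings; the function names are illustrative, and it is not the authors' SVM/decision-tree pipeline.

```python
def to_binary_e(states):
    """Collapse three-state labels (H, E, C) to 'E' (strand) versus '~E' (not strand)."""
    return ["E" if s == "E" else "~E" for s in states]


def e_vs_not_e_accuracy(true_ss, pred_ss):
    """Fraction of residues whose E/~E label is predicted correctly."""
    t, p = to_binary_e(true_ss), to_binary_e(pred_ss)
    if not t or len(t) != len(p):
        raise ValueError("need two non-empty label strings of equal length")
    return sum(a == b for a, b in zip(t, p)) / len(t)


# Toy example: one strand residue is missed; the helix/coil confusion does not count.
print(e_vs_not_e_accuracy("CCEEEECCHHHHCC", "CCEEECCCHHHCCC"))  # 13/14, about 0.929
```

Because H and C both map to ~E, a helix residue predicted as coil still counts as correct under this binary scheme.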
84
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023]
Abstract
Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of protein systems. The prediction of proteins' 3D structural components now depends heavily on machine learning techniques that interpret how protein sequences and their homology govern inter-residue contacts and structural organization. In particular, methods employing deep neural networks had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore recent applications of deep learning methods in protein structure prediction. We also look at the potential for deep learning methods to uncover still-unknown protein structures and functions and to help guide the study of drug-target interactions. Although significant problems remain to be addressed, we expect these techniques to play crucial roles in protein structural bioinformatics and in drug discovery in the near future.
Affiliation(s)
- Donghyuk Suh
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Jai Woo Lee
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Sun Choi
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
85
Biomedical Image Classification in a Big Data Architecture Using Machine Learning Algorithms. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:9998819. [PMID: 34122785 PMCID: PMC8191587 DOI: 10.1155/2021/9998819] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/06/2021] [Revised: 05/09/2021] [Accepted: 05/25/2021] [Indexed: 12/13/2022]
Abstract
In modern-day medicine, medical imaging has advanced immensely and can capture many kinds of biomedical images from patients. To assist medical specialists, these images can be used to train an intelligent system that helps identify the different diseases detectable in them. Classification plays an important role here: it groups the images into disease categories and streamlines the next step of a computer-aided diagnosis system. In machine learning, classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of observations whose category membership is known. The goal of this paper is to survey classification algorithms for biomedical images. The paper then describes how these algorithms can be applied within a big data architecture using the Spark framework. It further proposes a classification workflow based on the best-performing algorithms identified in the literature, namely support vector machines and deep learning. The algorithm for the feature extraction step of the classification process is presented, and it can be customized for all other steps of the proposed workflow.
86
Chandrasekaran S, Danos N, George UZ, Han JP, Quon G, Müller R, Tsang Y, Wolgemuth C. The Axes of Life: A roadmap for understanding dynamic multiscale systems. Integr Comp Biol 2021; 61:2011-2019. [PMID: 34048574 DOI: 10.1093/icb/icab114] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
The biological challenges facing humanity are complex and multifactorial, and they are intimately tied to the future of our health, welfare, and stewardship of the Earth. Tackling problems in diverse areas, such as agriculture, ecology, and health care, requires linking vast data sets that encompass numerous components and spatiotemporal scales. Here, we provide a new framework and a roadmap for using experiments and computation to understand dynamic biological systems that span multiple scales. We discuss theories that can help us understand complex biological systems, highlight the limitations of existing methodologies, and recommend data generation practices. New technologies such as big data analytics and artificial intelligence can help bridge different scales and data types. We recommend ways to make such models transparent and compatible with existing theories of biological function, and to make biological data sets readable by advanced machine learning algorithms. Overall, the barriers to tackling pressing biological challenges are not only technological but also sociological. Hence, we also provide recommendations for promoting interdisciplinary interactions between scientists.
Affiliation(s)
- Nicole Danos
- Department of Biology, University of San Diego, San Diego, CA, USA
- Uduak Z George
- Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
- Jin-Ping Han
- IBM TJ Watson Research Center, Ossining, NY, USA
- Gerald Quon
- Department of Molecular and Cellular Biology, University of California-Davis, Davis, CA, USA
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
- Yinphan Tsang
- Department of Natural Resources and Environmental Management, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Charles Wolgemuth
- Departments of Physics and Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
87
Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022]
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinnings of biology. Although recent advances in experimental approaches have greatly enhanced our ability to determine protein structures experimentally, the gap between the number of protein sequences and the number of known protein structures keeps growing. Computational protein structure prediction is one way to fill this gap. The field has recently seen many advances due to deep learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in protein structure prediction due to DL-based methods, as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, namely protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and quality assessment/refinement. We also highlight some end-to-end DL-based approaches to protein structure prediction. Additionally, because there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in protein structure prediction.
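Contact-map prediction is the first pipeline step listed above. For reference, a widely used convention (the one adopted in CASP) treats two residues as in contact when their Cβ atoms (Cα for glycine) are closer than 8 Å. The sketch below derives a true binary contact map from known coordinates under that convention; the function name and the minimum sequence separation of 6 are assumptions for illustration. A predictor would instead output a probability for each residue pair.

```python
import math


def contact_map(cb_coords, threshold=8.0, min_separation=6):
    """L x L boolean contact map from a list of (x, y, z) C-beta coordinates in angstroms."""
    n = len(cb_coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs that are trivially close because they are near each other in sequence.
        for j in range(i + min_separation, n):
            if math.dist(cb_coords[i], cb_coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True  # contacts are symmetric
    return cmap


# Toy 8-residue "chain": only residues 0 and 7 are spatially close (5 angstroms apart).
coords = [(0.0, 0.0, 0.0)] + [(100.0 * i, 0.0, 0.0) for i in range(1, 7)] + [(5.0, 0.0, 0.0)]
cm = contact_map(coords)
print([(i, j) for i in range(len(cm)) for j in range(i + 1, len(cm)) if cm[i][j]])  # [(0, 7)]
```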
Affiliation(s)
- Subash C. Pakhrin
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
88
Milchevskaya V, Nikitin AM, Lukshin SA, Filatov IV, Kravatsky YV, Tumanyan VG, Esipova NG, Milchevskiy YV. Structural coordinates: A novel approach to predict protein backbone conformation. PLoS One 2021; 16:e0239793. [PMID: 34014953 PMCID: PMC8136669 DOI: 10.1371/journal.pone.0239793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Accepted: 04/14/2021] [Indexed: 11/19/2022]
Abstract
Motivation
Local protein structure is usually described by assigning each peptide to a unique class from a set of predefined structures. These classifications may differ in the number of structural classes, the length of the peptides, or the class attribution criteria. Most methods that predict the local structure of a protein from its sequence first rely on some classification and only then proceed to assess the 3D conformation. However, most classification methods rely on the existence of homologous proteins, unavoidably lose information by attributing a peptide to a single class, or suffer from a suboptimal choice of representative classes.
Results
To alleviate these challenges, we propose a method that constructs a peptide's structural representation from its sequence, reflecting its similarity to several basic representative structures. For 5-mer peptides and 16 representative structures, we achieved a Q16 classification accuracy of 67.9%, which is higher than currently reported in the literature. Our prediction method does not use information about protein homologues but relies only on the physicochemical properties of the amino acids and on statistics from resolved structures. We also show that the 3D coordinates of a peptide can be uniquely recovered from its structural coordinates, and we state the conditions required under various geometric constraints.
Affiliation(s)
- Vladislava Milchevskaya (corresponding author)
- Institute of Medical Statistics and Bioinformatics, Faculty of Medicine, University of Cologne, Cologne, Germany
- Ivan V. Filatov
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yury V. Milchevskiy (corresponding author)
- Engelhardt Institute of Molecular Biology, Moscow, Russia
89
Remodelling structure-based drug design using machine learning. Emerg Top Life Sci 2021; 5:13-27. [PMID: 33825834 DOI: 10.1042/etls20200253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/28/2021] [Revised: 03/17/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022]
Abstract
To keep up with the rapid pace of discovery in biomedicine, a plethora of research endeavors have been directed toward rational drug development, which gradually gave way to structure-based drug design (SBDD). In the past few decades, SBDD has played a major role in identifying novel drug-like molecules capable of altering the structures and/or functions of target macromolecules involved in different disease pathways and networks. Unfortunately, post-delivery drug failures due to adverse drug interactions have constrained the use of SBDD in biomedical applications. However, recent technological advancements, along with a parallel surge in clinical research, have led to the concomitant establishment of other powerful computational techniques such as artificial intelligence (AI) and machine learning (ML). These leading-edge tools, with their ability to successfully predict the side effects of a wide range of drugs, have eventually taken over the field of drug design. ML, a subset of AI, is a robust computational tool capable of data analysis and analytical model building with minimal human intervention. It is based on algorithms that use large sets of 'training data' as inputs to predict new output values and that improve iteratively through experience. In this review, along with a brief discussion of the evolution of the drug discovery process, we focus on the methodologies underlying the technological advancements of machine learning. With specific examples, this review also emphasises the tremendous contributions of ML to biomedicine while exploring possibilities for future developments.
90
Vedithi SC, Malhotra S, Acebrón-García-de-Eulate M, Matusevicius M, Torres PHM, Blundell TL. Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae. Front Mol Biosci 2021; 8:663301. [PMID: 34026836 PMCID: PMC8138464 DOI: 10.3389/fmolb.2021.663301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 04/12/2021] [Indexed: 02/02/2023]
Abstract
Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal, and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance owing to its long duration (12 months), and does not protect against the associated nerve damage, a severe complication of leprosy. The chronic infectious peripheral neuropathy associated with the disease is primarily due to bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel or repurposed drugs that can serve as short-duration, effective alternatives to the existing treatment regimens and prevent the nerve damage and consequent disability associated with the disease. Because M. leprae is an obligate pathogen, the bacillus cannot be cultivated in vitro, which limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge about the structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted efforts to discover novel drugs. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable for uncovering druggable targets that are essential for bacterial survival and for the bacillus's predilection for human neuronal Schwann cells. Of the 1,614 protein-coding genes in the M. leprae genome, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts to model the M. leprae proteome using a suite of protein-modeling software developed in the Blundell laboratory.
Precise template selection using sequence-structure homology recognition software, multi-template modeling of the monomeric models, and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable the building of homo-oligomers are discussed in the context of interface stability. We also describe software that determines the druggable proteome using information from chokepoint analysis of metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets, and fragment hotspot maps.
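Chokepoint analysis is one of the criteria mentioned above. Under the commonly used definition, a chokepoint reaction is one that uniquely consumes a given substrate or uniquely produces a given product in a metabolic network. The sketch below applies that definition to a toy network given as a hypothetical dictionary of reactions. It ignores reaction reversibility and currency metabolites (such as ATP or water), which real analyses usually have to handle, and it is not the Blundell laboratory software.

```python
from collections import defaultdict


def chokepoints(reactions):
    """reactions: dict mapping name -> (substrates, products). Returns chokepoint reaction names."""
    consumers = defaultdict(set)  # metabolite -> reactions that consume it
    producers = defaultdict(set)  # metabolite -> reactions that produce it
    for name, (substrates, products) in reactions.items():
        for m in substrates:
            consumers[m].add(name)
        for m in products:
            producers[m].add(name)
    result = set()
    for table in (consumers, producers):
        for rxns in table.values():
            if len(rxns) == 1:  # the only reaction consuming (or producing) this metabolite
                result |= rxns
    return result


# Toy network: r1 and r5 are redundant routes from A to B, so neither is a chokepoint.
network = {
    "r1": (["A"], ["B"]),
    "r2": (["A"], ["C"]),
    "r3": (["B"], ["D"]),
    "r4": (["C"], ["D"]),
    "r5": (["A"], ["B"]),
}
print(sorted(chokepoints(network)))  # ['r2', 'r3', 'r4']
```

Inhibiting a chokepoint reaction cannot be bypassed by an alternative reaction for that metabolite, which is why such reactions are treated as candidate drug targets.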
Affiliation(s)
- Sundeep Chaitanya Vedithi (corresponding author)
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Sony Malhotra
- Rutherford Appleton Laboratory, Science and Technology Facilities Council, Oxon, United Kingdom
- Pedro Henrique Monteiro Torres
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Tom L. Blundell (corresponding author)
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
91
Nallasamy V, S M. Bingham deep neural and oppositional fish swarm optimized protein structure prediction. J Biomol Struct Dyn 2021; 40:8706-8724. [PMID: 33955323 DOI: 10.1080/07391102.2021.1915181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Essential proteins are known to take part in managing cellular activities in living organisms. Moreover, predicting protein structure from the amino acid sequence helps in understanding cellular functions. Several essential-protein prediction methods have been proposed previously. However, these methods have been unsatisfactory because of their low sensitivity to imbalanced data. To address this issue, this paper presents a novel protein secondary structure prediction method called Bingham Deep Convolutional-based Oppositional Artificial Fish Optimized (BDC-OAFO). First, a protein structure identification framework called Bingham Distributed Deep Convolutional (BDDC) is designed to identify essential proteins while eliminating the imbalanced-learning issue. Next, a secondary structure prediction framework called Oppositional Artificial Fish Swarm Optimization is proposed to obtain precise prediction results. Secondary structure is then predicted by emulating three biological behaviors of artificial fish (foraging, following, and swarming), using a proximal count, an oppositional function, and a Gaussian function. To evaluate the performance of BDC-OAFO, we conducted experiments on a Protein Data Bank dataset. The experimental results show that BDC-OAFO performs better at identifying essential proteins and predicts more precisely than several other well-known prediction methods, which confirms its significance. Communicated by Ramaswamy H. Sarma.
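The abstract names an "oppositional function" without defining it. In generic opposition-based learning, the opposite of a candidate x in the interval [a, b] is a + b - x, and a search keeps whichever of the two scores better. The sketch below shows that generic idea as an opposition-based population initialisation; it is an assumption about the general technique, not the BDC-OAFO method, and `opposition_init` is a hypothetical name.

```python
import random


def opposition_init(pop_size, lower, upper, fitness, rng=None):
    """Opposition-based initialisation: keep the best pop_size of random and opposite points.

    lower/upper are per-dimension bounds; fitness is minimised (lower is better).
    """
    if rng is None:
        rng = random.Random(0)
    dim = len(lower)
    population = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(pop_size)]
    # Opposite point of x in [a, b] is a + b - x, computed per dimension.
    opposites = [[lower[d] + upper[d] - x[d] for d in range(dim)] for x in population]
    candidates = population + opposites
    candidates.sort(key=fitness)
    return candidates[:pop_size]


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


pop = opposition_init(5, [-5.0, -5.0], [5.0, 5.0], sphere)
print(len(pop))  # 5
```

Evaluating each random point together with its opposite roughly doubles the cost of initialisation, in exchange for a starting population that tends to lie closer to good regions of the search space.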
Affiliation(s)
- Malarvizhi S
- Department of Computer Science, Thiruvalluvar Government Arts College, Namakkal, Tamil Nadu, India
92
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, with long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools for drawing meaningful insights and improving decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have unprecedented potential to accelerate CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementation in the CNS area. After introducing AI/ML models, their conceptualization, and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art in AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and its implementation in therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may resolve these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming-Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
93
One-Dimensional Convolutional Neural Network with Adaptive Moment Estimation for Modelling of the Sand Retention Test. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11093802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/05/2023]
Abstract
Stand-alone screens (SASs) are active sand control methods in which compatible screens and slot sizes are selected through the sand retention test (SRT) to filter out unacceptable amounts of sand produced from oil and gas wells. SRTs have been modelled in the laboratory using computer simulation to replicate experimental conditions and ensure that the selected screens suit the selected reservoirs. However, SRT experimental setups and result analyses are not standardized. Small changes to the experimental setup can cause large variations in results, leading to different plugging performance and sand retention analyses. Moreover, conducting many laboratory experiments is expensive and time-consuming. Because CNNs have achieved promising results in the petroleum industry for both classification and regression problems, this method is proposed for SRT to reduce the time, cost, and effort of laboratory testing by predicting plugging performance and sand production. Deep learning has not yet been applied to SRT. Therefore, in this study, a deep learning model using a one-dimensional convolutional neural network (1D-CNN) with adaptive moment estimation is developed to model the SRT, with the aim of classifying the plugging sign (the screen plugs or does not plug) and predicting sand production and retained permeability, using varying sand distribution, SAS, screen slot size, and sand concentration as inputs. For the slurry test, the proposed 1D-CNN model predicted retained permeability and classified the plugging sign with robust accuracy, with R² values above 90%, while sand production was predicted with 77% accuracy. In addition, the model for the sand pack test achieved 84% accuracy in predicting sand production.
For comparison, gradient boosting (GB), k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) models were also built on the same datasets. The results showed that the proposed 1D-CNN model outperformed the other four machine learning models in prediction accuracy for both SRT tests.
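Adaptive moment estimation (Adam) is the optimiser named in the title. As a reading aid, the scalar sketch below implements the standard Adam update with its usual default decay rates (β1 = 0.9, β2 = 0.999, ε = 1e-8) on a toy quadratic. The learning rate of 0.1, the step count, and the objective are illustrative assumptions, and the code is not the authors' 1D-CNN training setup; deep learning frameworks provide Adam as a built-in optimiser.

```python
import math


def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a scalar function with Adam, given a function returning its gradient."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2 * (th - 3), theta0=0.0)
print(theta_star)
```

Dividing by the square root of the second-moment estimate gives each parameter its own effective step size, which is the "adaptive" part of the method.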
94
Kyrilis FL, Belapure J, Kastritis PL. Detecting Protein Communities in Native Cell Extracts by Machine Learning: A Structural Biologist's Perspective. Front Mol Biosci 2021; 8:660542. [PMID: 33937337 PMCID: PMC8082361 DOI: 10.3389/fmolb.2021.660542] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/29/2021] [Accepted: 03/18/2021] [Indexed: 11/13/2022]
Abstract
Native cell extracts hold great promise for understanding the molecular structure of ordered biological systems at high resolution. This is because higher-order biomolecular interactions, dubbed protein communities, may be retained in their (near-)native state, in contrast to when the proteins of interest are extensively purified or artificially overexpressed. Distinct machine-learning approaches are applied to discover protein-protein interactions within cell extracts, reconstruct dedicated biological networks, and report on protein community members from various organisms. Validating these results, e.g., by cross-linking mass spectrometry or cell biology methods, is also important. In addition, cell extracts are amenable to structural analysis by cryo-electron microscopy (cryo-EM), but owing to their inherent complexity, sorting the structural signatures of protein communities derived by cryo-EM is a formidable task. Image-processing workflows inspired by machine-learning techniques would improve the ability to distinguish structural signatures, to correlate proteomic and network data with structural signatures and the subsequently reconstructed cryo-EM maps, and, ultimately, to characterize unidentified protein communities at high resolution. In this review article, we summarize recent literature on detecting protein communities in native cell extracts and identify the remaining challenges and opportunities. We argue that progress in, and the integration of, machine learning, cryo-EM, and complementary structural proteomics approaches would provide the basis for a multiscale molecular description of protein communities within native cell extracts.
Affiliation(s)
- Fotis L. Kyrilis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Jaydeep Belapure
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Panagiotis L. Kastritis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Biozentrum, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
95
The whole is greater than its parts: ensembling improves protein contact prediction. Sci Rep 2021; 11:8039. [PMID: 33850214 PMCID: PMC8044223 DOI: 10.1038/s41598-021-87524-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2021] [Accepted: 03/29/2021] [Indexed: 11/30/2022]
Abstract
The prediction of amino acid contacts from protein sequence is an important problem, as predicted contacts are a vital step towards predicting folded protein structures. We propose that a powerful concept from deep learning, called ensembling, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. To build on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
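A minimal sketch of the ensembling idea, assuming each predictor outputs an L x L contact-probability map and that ensembling is a plain unweighted average of those maps. The function names, the minimum sequence separation of 6, and the top-k precision metric are illustrative choices, not the authors' published pipeline.

```python
import numpy as np


def ensemble_contact_maps(maps):
    """Average several L x L contact-probability maps (simple unweighted mean)."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in maps])
    if stacked.ndim != 3 or stacked.shape[1] != stacked.shape[2]:
        raise ValueError("each map must be a square L x L array of the same size")
    mean = stacked.mean(axis=0)
    # Contacts are symmetric, so enforce p(i, j) == p(j, i).
    return (mean + mean.T) / 2.0


def top_contacts(prob, k, min_sep=6):
    """Return the k highest-probability pairs (i, j) with j - i >= min_sep."""
    n = prob.shape[0]
    # Upper-triangle indices offset by min_sep keep only pairs with j >= i + min_sep.
    i_idx, j_idx = np.triu_indices(n, k=min_sep)
    scores = prob[i_idx, j_idx]
    order = np.argsort(-scores, kind="stable")[:k]
    return [(int(i_idx[o]), int(j_idx[o])) for o in order]


def precision(predicted, true_contacts):
    """Fraction of predicted pairs that are true contacts."""
    if not predicted:
        return 0.0
    truth = set(true_contacts)
    return sum(p in truth for p in predicted) / len(predicted)
```

An unweighted mean treats every predictor equally; weighting each model by its validation accuracy is a common alternative, and the abstract does not state which scheme the authors used.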
96
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022]
Abstract
The exponential growth of biomedical data in recent years has prompted the application of numerous machine learning techniques to emerging problems in biology and clinical research. By enabling automatic feature extraction and selection and the generation of predictive models, these methods can be used to study complex biological systems efficiently. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework as techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and in particular deep learning, in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
97
Daley SK, Cordell GA. Natural Products, the Fourth Industrial Revolution, and the Quintuple Helix. Nat Prod Commun 2021. [DOI: 10.1177/1934578x211003029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
Abstract
The profound interconnectedness of the sciences and technologies embodied in the Fourth Industrial Revolution is discussed in terms of the global role of natural products, and how that interplays with the development of sustainable and climate-conscious practices of cyberecoethnopharmacolomics within the Quintuple Helix for the promotion of a healthier planet and society.
Affiliation(s)
- Geoffrey A. Cordell
- Natural Products Inc., Evanston, IL, USA
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL, USA
98
Que-Salinas U, Ramírez-González PE, Torres-Carbajal A. Determination of thermodynamic state variables of liquids from their microscopic structures using an artificial neural network. Soft Matter 2021; 17:1975-1984. [PMID: 33427848 DOI: 10.1039/d0sm02127j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
In this work, we implement a machine-learning method to predict the thermodynamic state of a liquid using only its microscopic structure, as given by the radial distribution function (RDF). The main goal is to determine the equation of state of the system. This is achieved by predicting the density, the temperature, or both simultaneously from the RDF alone. We implement and train a feed-forward artificial neural network (ANN) to address the different cases of interest, in which single or simultaneous predictions are made. Owing to its versatility, the Lennard-Jones (LJ) fluid is used as the reference system in this study. The ANN is trained over a wide range of densities and temperatures, covering the liquid-vapour coexistence region, the liquid phase, and supercritical states. We show that the overall percentage relative error of most predictions across the different cases studied is around 3%. As a practical case study, we use the ANN predictions to determine the pressure equation of state for different isotherms and find very good agreement with the exact results. Our ANN implementation is a versatile and useful tool for predicting thermodynamic state variables when some information is unknown and, consequently, for enhancing the thermodynamic description of liquids.
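The abstract uses the RDF as the network's only input. As a hedged illustration of what that input is, the sketch below estimates g(r) from a single configuration of particles in a periodic cubic box using the minimum-image convention. The function name, the bin settings, and the single-snapshot estimate are assumptions; the authors' simulation protocol and averaging over configurations are not described here.

```python
import numpy as np


def radial_distribution(positions, box_length, n_bins=50, r_max=None):
    """Estimate g(r) for N particles in a periodic cubic box of side box_length."""
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    if r_max is None:
        r_max = box_length / 2.0
    if r_max > box_length / 2.0:
        # The minimum-image convention is only valid up to half the box length.
        raise ValueError("r_max must not exceed box_length / 2")
    # Pairwise displacement vectors, wrapped to the nearest periodic image.
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count each unordered pair once (excluding self-pairs on the diagonal).
    i, j = np.triu_indices(n, k=1)
    counts, edges = np.histogram(dist[i, j], bins=n_bins, range=(0.0, r_max))
    density = n / box_length ** 3
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # Factor 2 turns pair counts into per-particle neighbour counts; dividing by
    # the ideal-gas expectation (density * shell volume) gives g(r) ~ 1 for no structure.
    g = 2.0 * counts / (n * density * shell_vol)
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, g
```

For uniformly random positions g(r) fluctuates around 1, whereas a dense LJ liquid shows the peaks and troughs of its coordination shells; it is this shape that would encode the state point a network could learn from.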
Affiliation(s)
- Ulices Que-Salinas
- Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico.
- Pedro E Ramírez-González
- CONACYT-Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico
- Alexis Torres-Carbajal
- Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico.
99
100
Katsimpouras C, Stephanopoulos G. Enzymes in biotechnology: Critical platform technologies for bioprocess development. Curr Opin Biotechnol 2021; 69:91-102. [PMID: 33422914 DOI: 10.1016/j.copbio.2020.12.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/28/2020] [Revised: 11/09/2020] [Accepted: 12/08/2020] [Indexed: 01/02/2023]
Abstract
Enzymes are core elements of biosynthetic pathways employed in the synthesis of numerous bioproducts. Here, we review enzyme promiscuity, enzyme engineering, enzyme immobilization, and cell-free systems as fundamental strategies of bioprocess development. Promiscuous enzymes are the first candidates in the quest for new activities to power new, artificial, or bypass pathways that expand substrate range and catalyze the production of new products. If the activity or regulation of available enzymes is unsuitable for a process, protein engineering can be applied to improve them to the required level. When cell toxicity and low productivity cannot be engineered away, cell-free systems are an attractive option, especially in combination with enzyme immobilization, which allows extended enzyme use. Overall, the above methods support powerful platforms for bioprocess development and optimization.
Affiliation(s)
- Constantinos Katsimpouras
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139 MA, USA
- Gregory Stephanopoulos
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139 MA, USA