1
Wang T, Zhang X, Zhang O, Chen G, Pan P, Wang E, Wang J, Wu J, Zhou D, Wang L, Jin R, Chen S, Shen C, Kang Y, Hsieh CY, Hou T. Highly Accurate and Efficient Deep Learning Paradigm for Full-Atom Protein Loop Modeling with KarmaLoop. RESEARCH (WASHINGTON, D.C.) 2024; 7:0408. [PMID: 39055686 PMCID: PMC11268956 DOI: 10.34133/research.0408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 05/22/2024] [Indexed: 07/27/2024]
Abstract
Protein loop modeling is a challenging and highly nontrivial task in protein structure prediction. Despite recent progress, existing methods, including knowledge-based, ab initio, hybrid, and deep learning (DL) methods, fall substantially short of either atomic accuracy or computational efficiency. To overcome these limitations, we present KarmaLoop, a novel paradigm that distinguishes itself as the first DL method centered on full-atom (encompassing both backbone and side-chain heavy atoms) protein loop modeling. Our results demonstrate that KarmaLoop considerably outperforms conventional and DL-based methods of loop modeling in terms of both accuracy and efficiency, with average RMSDs of 1.77 and 1.95 Å for the CASP13+14 and CASP15 benchmark datasets, respectively, and achieves a speedup of at least 2 orders of magnitude in general compared with other methods. Consequently, our comprehensive evaluations indicate that KarmaLoop provides a state-of-the-art DL solution for protein loop modeling, with the potential to hasten the advancement of protein engineering, antibody-antigen recognition, and drug design.
Affiliation(s)
- Tianyue Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ercheng Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Donghao Zhou
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
- Langcheng Wang
  - Department of Pathology, New York University Medical Center, New York, NY 10016, USA
- Ruofan Jin
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Shicheng Chen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
2
Yue X, Li Y, Wei M, Duan Y, Yang L, Chen FE. Rational redesign of the loop dynamics of carbonyl reductase LfSDR1 to improve the stereoselectivity for asymmetric synthesis of bulky chiral alcohols. Int J Biol Macromol 2024; 274:133345. [PMID: 38944066 DOI: 10.1016/j.ijbiomac.2024.133345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/04/2024] [Accepted: 06/19/2024] [Indexed: 07/01/2024]
Abstract
Engineering biocatalysts with enhanced stereoselectivity is highly desirable, and active-site loop dynamics play an important role in its regulation. However, knowledge of their precise roles in catalysis and evolution is limited. Here, we used Rosetta enzyme design combined with molecular dynamics (MD) simulations to reprogram the landscape of the key active-site loop dynamics of the carbonyl reductase LfSDR1 and improve its stereoselectivity. The key flexible loop in the active site showed the potential to regulate the catalytic properties. A library of virtual variants was produced using Rosetta design, and the dynamic effects on the loop were assessed with MD simulations. A candidate was obtained with significantly higher stereoselectivity (ee > 99%) than the wild type (ee = 42%), without loss of catalytic activity or thermostability. The molecular basis of the enhanced catalytic properties was examined by MD simulations, which revealed the role of the G92L mutation in regulating loop dynamics to stabilize the environment of the active site. Finally, a series of challenging bulky substrate derivatives were assessed using the G92L variant, and all showed improved stereoselectivity (ee > 99%). This study provides novel insights for improving stereoselectivity through rational engineering of the loop dynamics of biocatalysts.
Affiliation(s)
- Xiaoping Yue
  - Engineering Center of Catalysis and Synthesis for Chiral Molecules, Fudan University, Shanghai 200433, China; Shanghai Engineering Center of Industrial Catalysis for Chiral Drugs, Fudan University, Shanghai 200433, China; School of Chemical Engineering, Jiangxi Normal University, Nanchang 330022, China
- Yitong Li
  - Engineering Center of Catalysis and Synthesis for Chiral Molecules, Fudan University, Shanghai 200433, China; Shanghai Engineering Center of Industrial Catalysis for Chiral Drugs, Fudan University, Shanghai 200433, China
- Mankun Wei
  - School of Life Science, Jiangxi Normal University, Nanchang 330022, China
- Yu Duan
  - School of Life Science, Jiangxi Normal University, Nanchang 330022, China
- Lin Yang
  - School of Chemical Engineering, Jiangxi Normal University, Nanchang 330022, China
- Fen-Er Chen
  - Engineering Center of Catalysis and Synthesis for Chiral Molecules, Fudan University, Shanghai 200433, China; Shanghai Engineering Center of Industrial Catalysis for Chiral Drugs, Fudan University, Shanghai 200433, China; School of Chemical Engineering, Jiangxi Normal University, Nanchang 330022, China
3
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models, but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into the likelihood functions that are commonly used for substitution model selection. RESULTS: We present a method to perform selection among site-dependent SCS models, and between empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation in the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families, observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
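To make the ABC idea in this abstract concrete, here is a minimal sketch of rejection-based ABC model choice. It is not ProteinModelerABC's code or interface: the two Gaussian "models", the single summary statistic, the tolerance, and every function name below are hypothetical stand-ins chosen only to show how model probabilities can be estimated without a likelihood function (and without the regression adjustment the abstract mentions).

```python
import random
from collections import Counter


def abc_model_choice(observed_stat, simulators, n_sims=5000, tolerance=0.1, seed=0):
    """Rejection ABC: approximate posterior model probabilities.

    Each simulation picks a model from a uniform prior, simulates a summary
    statistic, and is accepted if it lies within `tolerance` of the observed
    statistic. Model probabilities are the accepted fractions.
    """
    rng = random.Random(seed)
    names = list(simulators)
    accepted = Counter()
    for _ in range(n_sims):
        name = rng.choice(names)          # uniform prior over candidate models
        stat = simulators[name](rng)      # simulate data, reduce to a summary statistic
        if abs(stat - observed_stat) <= tolerance:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; raise tolerance or n_sims")
    return {name: accepted[name] / total for name in names}


def sample_mean(mu, rng, n=30):
    """Toy summary statistic: mean of n Gaussian draws centred on mu."""
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n


# Hypothetical toy models (NOT substitution models): they differ only in the prior on mu.
simulators = {
    "model_A": lambda rng: sample_mean(rng.uniform(-1.0, 1.0), rng),
    "model_B": lambda rng: sample_mean(rng.uniform(1.0, 3.0), rng),
}

posterior = abc_model_choice(observed_stat=0.2, simulators=simulators)
print(posterior)
```

In a real application the simulators would generate protein alignments under each substitution model and the summary statistics would be multivariate; the acceptance logic stays the same.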
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
4
Khalaf MNA, Soliman THA, Mohamed SS. PLM-GAN: A Large-Scale Protein Loop Modeling Using pix2pix GAN. ACS OMEGA 2024; 9:437-446. [PMID: 38222545 PMCID: PMC10785670 DOI: 10.1021/acsomega.3c05863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/01/2023] [Accepted: 11/22/2023] [Indexed: 01/16/2024]
Abstract
Revealing the tertiary structure of proteins holds huge significance as it unveils their vital properties and functions. These intricate three-dimensional configurations comprise diverse interactions including ionic, hydrophobic, and disulfide forces. In certain instances, these structures exhibit missing regions, necessitating the reconstruction of specific segments, thereby resulting in challenges in protein design, which encompasses loop modeling, circular permutation, and interface prediction. To address this problem, we present two pioneering models: pix2pix generative adversarial network (GAN) and PLM-GAN. The pix2pix GAN model is adept at generating and inpainting distance matrices of protein structures, whereas the PLM-GAN model incorporates residual blocks into the U-Net network of the GAN, building upon the foundation of the pix2pix GAN model. To bolster the models' performance, we introduce a novel loss function named the "missing to real regions loss" (LMTR) within the GAN framework. Additionally, we introduce a distinctive approach of pairing two different distance matrices: one representing the native protein structure and the other representing the same structure with a missing region that undergoes changes in each successive epoch. Moreover, we extend the reconstruction of missing regions, encompassing up to 30 amino acids and increase the protein length by 128 amino acids. The evaluation of our pix2pix GAN and PLM-GAN models on a random selection of natural proteins (4ZCB, 3FJB, and 2REZ) demonstrated promising experimental results. Our models constitute significant contributions to addressing intricate challenges in protein structure design. These contributions hold immense potential to propel advancements in protein-protein interactions, drug design, and further innovations in protein engineering. 
Data, code, trained models, examples, and measurements are available at https://github.com/mena01/PLM-GAN-A-Large-Scale-Protein-Loop-Modeling-Using-pix2pix-GAN_.
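The pairing described in this abstract (a native distance matrix alongside the same matrix with a missing region) can be sketched as follows. This is only an illustration under stated assumptions: the coordinates are invented, the fill value of 0.0 and the helper names `distance_matrix` and `mask_region` are hypothetical, and the abstract does not say how the authors encode or normalise their matrices.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mask_region(matrix, start, end, fill=0.0):
    """Return a copy with rows and columns start..end-1 blanked out.

    Blanking both rows and columns removes every distance that involves a
    residue in the missing region, which is the part a model would inpaint.
    """
    n = len(matrix)
    return [
        [fill if (start <= i < end or start <= j < end) else matrix[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Invented toy chain: six residues on a straight line, 3.8 Å apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
native = distance_matrix(coords)
masked = mask_region(native, start=2, end=4)   # residues 2 and 3 are "missing"

# (native, masked) is one training pair in the sense of the abstract;
# picking a different (start, end) each epoch changes the missing region.
print(masked[0])
```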
Affiliation(s)
- Mena Nagy A Khalaf
  - Information System Department, Faculty of Computer and Information, Assiut University, Assiut 71515, Egypt
- Taysir Hassan A Soliman
  - Information System Department, Faculty of Computer and Information, Assiut University, Assiut 71515, Egypt
- Sara Salah Mohamed
  - Information System Department, Faculty of Computer and Information, Assiut University, Assiut 71515, Egypt
  - Mathematics and Computer Science Department, Faculty of Science, New Valley University, New Valley 71511, Egypt
5
Wang J, Wang W, Shang Y. Protein Loop Modeling Using AlphaFold2. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3306-3313. [PMID: 37037235 DOI: 10.1109/tcbb.2023.3264899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The functions of proteins are largely determined by their three-dimensional (3D) structures. Loop modeling tries to predict the conformation of a relatively short stretch of protein backbone and side chains. It is a difficult problem due to conformational variability. Recently, AlphaFold2 has achieved outstanding results in 3D protein structure prediction and is expected to perform well on loop modeling. In this paper, we investigate the performance of AlphaFold2 variants on popular loop modeling benchmark datasets and propose an efficient protocol for using AlphaFold2 for loop modeling, called IAFLoop. To predict the structure of a loop region, IAFLoop gives a moderately extended segment of the target loop region as input to AlphaFold2, runs a fast version of AlphaFold2 using a reduced database without ensembling, and uses RMSD-based consensus scores to select the final output models. Our experimental results on benchmark datasets show that IAFLoop generates highly accurate loop models. It achieves performance comparable to the original application of AlphaFold2 in terms of RMSD error, and much better results on some targets, while using only half the time. Compared to the best previous methods, IAFLoop reduces the RMSD error by almost half on the 8-residue loop dataset, and by more than 70% on the 12-residue loop dataset.
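One plausible reading of "RMSD-based consensus scores" is sketched below: each candidate loop is scored by its mean RMSD to all other candidates, and the candidate closest to the consensus is kept. The abstract does not give IAFLoop's exact scoring, so the scoring rule, the function names, and the toy coordinates are all assumptions; the sketch also assumes the candidates already share one coordinate frame, so no superposition step is performed.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("models must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def consensus_select(models):
    """Score each model by its mean RMSD to the others; lowest score wins."""
    if len(models) < 2:
        raise ValueError("need at least two candidate models")
    scores = []
    for i, model in enumerate(models):
        others = [rmsd(model, other) for j, other in enumerate(models) if j != i]
        scores.append(sum(others) / len(others))
    best = min(range(len(models)), key=scores.__getitem__)
    return best, scores


# Invented two-atom "loops": the first two agree closely, the third is an outlier.
candidates = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],
]
best_index, consensus_scores = consensus_select(candidates)
print(best_index, consensus_scores)
```

The design intuition is that independent predictions tend to cluster around a well-supported conformation, so an outlier far from every other candidate receives a poor score.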
6
Corbella M, Pinto GP, Kamerlin SCL. Loop dynamics and the evolution of enzyme activity. Nat Rev Chem 2023; 7:536-547. [PMID: 37225920 DOI: 10.1038/s41570-023-00495-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Accepted: 04/06/2023] [Indexed: 05/26/2023]
Abstract
In the early 2000s, Tawfik presented his 'New View' on enzyme evolution, highlighting the role of conformational plasticity in expanding the functional diversity of limited repertoires of sequences. This view is gaining increasing traction with increasing evidence of the importance of conformational dynamics in both natural and laboratory evolution of enzymes. The past years have seen several elegant examples of harnessing conformational (particularly loop) dynamics to successfully manipulate protein function. This Review revisits flexible loops as critical participants in regulating enzyme activity. We showcase several systems of particular interest: triosephosphate isomerase barrel proteins, protein tyrosine phosphatases and β-lactamases, while briefly discussing other systems in which loop dynamics are important for selectivity and turnover. We then discuss the implications for engineering, presenting examples of successful loop manipulation in either improving catalytic efficiency, or changing selectivity completely. Overall, it is becoming clearer that mimicking nature by manipulating the conformational dynamics of key protein loops is a powerful method of tailoring enzyme activity, without needing to target active-site residues.
Affiliation(s)
- Marina Corbella
  - Department of Chemistry, Uppsala University, Uppsala, Sweden
- Gaspar P Pinto
  - Department of Chemistry, Uppsala University, Uppsala, Sweden
  - Cortex Discovery GmbH, Regensburg, Germany
- Shina C L Kamerlin
  - Department of Chemistry, Uppsala University, Uppsala, Sweden
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
7
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
  - Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035, India
- Malarvizhi Seshiah
  - Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu, India
8
Zhu Y, Wang M, Yin X, Zhang J, Meijering E, Hu J. Deep Learning in Diverse Intelligent Sensor Based Systems. SENSORS (BASEL, SWITZERLAND) 2022; 23:s23010062. [PMID: 36616657 PMCID: PMC9823653 DOI: 10.3390/s23010062] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2022] [Revised: 12/06/2022] [Accepted: 12/14/2022] [Indexed: 05/27/2023]
Abstract
Deep learning has become a predominant method for solving data analysis problems in virtually all fields of science and engineering. The increasing complexity and the large volume of data collected by diverse sensor systems have spurred the development of deep learning methods and have fundamentally transformed the way the data are acquired, processed, analyzed, and interpreted. With the rapid development of deep learning technology and its ever-increasing range of successful applications across diverse sensor systems, there is an urgent need to provide a comprehensive investigation of deep learning in this domain from a holistic view. This survey paper aims to contribute to this by systematically investigating deep learning models/methods and their applications across diverse sensor systems. It also provides a comprehensive summary of deep learning implementation tips and links to tutorials, open-source codes, and pretrained models, which can serve as an excellent self-contained reference for deep learning practitioners and those seeking to innovate deep learning in this space. In addition, this paper provides insights into research topics in diverse sensor systems where deep learning has not yet been well-developed, and highlights challenges and future opportunities. This survey serves as a catalyst to accelerate the application and transformation of deep learning in diverse sensor systems.
Affiliation(s)
- Yanming Zhu
  - School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
- Min Wang
  - School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Xuefei Yin
  - School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Jue Zhang
  - School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Erik Meijering
  - School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
- Jiankun Hu
  - School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
9
Nallasamy V, Seshiah M. Protein Structure Prediction Using Quantile Dragonfly and Structural Class-Based Deep Learning. INT J PATTERN RECOGN 2022. [DOI: 10.1142/s021800142250015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Predicting three-dimensional structure of a protein in the field of computational molecular biology has received greater attention. Most of the recent research works aimed at exploring search space, however with the increasing nature and size of data, protein structure identification and prediction are still in the preliminary stage. This work is aimed at exploring search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps. They are protein structure identification and prediction. In the first step, protein structure identification is performed by means of QRD optimization model to identify protein structure with minimum error. Here the protein structure identification is first performed as the raw database contains sequence information and does not contain structural information. An optimization model is designed to obtain the structural information from the database. However, protein structure gives much more insight than its sequence. Therefore, to perform computational prediction of protein structure from its sequence, actual protein structure prediction is made. The second step involves the actual protein structure prediction via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained by utilizing structural class maximum correlation coefficient. Finally, the proposed method is tested on a set of different unique numbers of protein data and compared to the state-of-the-art methods. The obtained results showed the potentiality of the proposed method in terms of metrics, error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient and [Formula: see text]-measure, respectively. 
It also shows that the proposed QRD-SCHDL method attains comparable results and outperformed in certain cases, thereby signifying the efficiency of the proposed work.
Affiliation(s)
- Varanavasi Nallasamy
  - Department of Computer Science, Periyar University, Salem-636011, Tamil Nadu, India
- Malarvizhi Seshiah
  - Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram-637401, Namakkal, Tamil Nadu, India
10
Miller I, Totrov M, Korotchkina L, Kazyulkin DN, Gudkov AV, Korolev S. Structural dissection of sequence recognition and catalytic mechanism of human LINE-1 endonuclease. Nucleic Acids Res 2021; 49:11350-11366. [PMID: 34554261 PMCID: PMC8565326 DOI: 10.1093/nar/gkab826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Revised: 09/03/2021] [Accepted: 09/08/2021] [Indexed: 11/12/2022]
Abstract
Long interspersed nuclear element-1 (L1) is an autonomous non-LTR retrotransposon comprising ∼20% of the human genome. L1 self-propagation causes genomic instability and is strongly associated with aging, cancer and other diseases. The endonuclease domain of L1’s ORF2p protein (L1-EN) initiates de novo L1 integration by nicking the consensus sequence 5′-TTTTT/AA-3′. In contrast, related nucleases including structurally conserved apurinic/apyrimidinic endonuclease 1 (APE1) are non-sequence specific. To investigate mechanisms underlying sequence recognition and catalysis by L1-EN, we solved crystal structures of L1-EN complexed with DNA substrates. This showed that conformational properties of the preferred sequence drive L1-EN’s sequence-specificity and catalysis. Unlike APE1, L1-EN does not bend the DNA helix, but rather causes ‘compression’ near the cleavage site. This provides multiple advantages for L1-EN’s role in retrotransposition including facilitating use of the nicked poly-T DNA strand as a primer for reverse transcription. We also observed two alternative conformations of the scissile bond phosphate, which allowed us to model distinct conformations for a nucleophilic attack and a transition state that are likely applicable to the entire family of nucleases. This work adds to our mechanistic understanding of L1-EN and related nucleases and should facilitate development of L1-EN inhibitors as potential anticancer and antiaging therapeutics.
Affiliation(s)
- Ian Miller
  - Edward A. Doisy Department of Biochemistry and Molecular Biology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA
- Andrei V Gudkov
  - Genome Protection, Inc., Buffalo, NY 14203, USA
  - Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA
- Sergey Korolev
  - Edward A. Doisy Department of Biochemistry and Molecular Biology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA
11
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
12
Carrillo-Cabada H, Benson J, Razavi AM, Mulligan B, Cuendet MA, Weinstein H, Taufer M, Estrada T. A Graphic Encoding Method for Quantitative Classification of Protein Structure and Representation of Conformational Changes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1336-1349. [PMID: 31603792 PMCID: PMC9119144 DOI: 10.1109/tcbb.2019.2945291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
In order to successfully predict a protein's function throughout its trajectory, in addition to uncovering changes in its conformational state, it is necessary to employ techniques that maintain its 3D information while performing at scale. We extend a protein representation that encodes secondary and tertiary structure into fixed-size color images, and a neural network architecture (called GEM-net) that leverages our encoded representation. We show the applicability of our method in two ways: (1) performing protein function prediction, reaching accuracy between 78 and 83 percent, and (2) visualizing and detecting conformational changes in protein trajectories during molecular dynamics simulations.
|
13
|
Abstract
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized at atomic resolution using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. In the following chapter, we present an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of a similar protocol has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
|
14
|
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
15
|
Jauro F, Chiroma H, Gital AY, Almutairi M, Abdulhamid SM, Abawajy JH. Deep learning architectures in emerging cloud computing architectures: Recent development, challenges and next research trend. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106582] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 02/05/2023]
|
16
|
Investigation of machine learning techniques on proteomics: A comprehensive survey. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2019; 149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]
Abstract
Proteomics is the large-scale study of proteins, which has enabled the identification of ever-increasing numbers of proteins. Proteins are a necessary part of living organisms, with numerous functions. The proteome is the complete set of proteins that are produced or modified by an organism or system. The proteome varies with time and with the specific requirements, or stresses, that a cell or organism experiences. Proteomics is an interdisciplinary area that derives from the genetic data of various genome projects. Much proteomics data is gathered with the help of high-throughput techniques, such as mass spectrometry and microarrays. It would often take weeks or months to analyze the data and perform comparisons by hand. Therefore, biologists and chemists are collaborating with computer scientists and mathematicians to create programs and pipelines that analyze protein data computationally. Using bioinformatics techniques, researchers are able to perform faster analysis and protein data storage. The goal of this paper is to give a brief review of machine learning techniques and their application in the field of proteomics.
|
17
|
Deep Learning in the Biomedical Applications: Recent and Future Status. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9081526] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 02/06/2023]
Abstract
Deep neural networks are currently the most effective machine learning technology in the biomedical domain. In this domain, the areas of interest include the Omics (study of the genome—genomics—and proteins—transcriptomics, proteomics, and metabolomics), bioimaging (study of biological cells and tissue), medical imaging (study of human organs by creating visual representations), BBMI (study of the brain and body machine interface), and public and medical health management (PmHM). This paper reviews the major deep learning concepts pertinent to such biomedical applications. Concise overviews are provided for the Omics and the BBMI. We end our analysis with a critical discussion, interpretation, and relevant open challenges.
|
18
|
Kundert K, Kortemme T. Computational design of structured loops for new protein functions. Biol Chem 2019; 400:275-288. [PMID: 30676995 PMCID: PMC6530579 DOI: 10.1515/hsz-2018-0348] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/16/2018] [Accepted: 12/18/2018] [Indexed: 12/20/2022]
Abstract
The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.
Affiliation(s)
- Kale Kundert
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
| | - Tanja Kortemme
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
| |
|