1
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
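As a reading aid, the equivariance property the abstract describes can be written out explicitly. This is the standard textbook statement, not a formula quoted from the review; the symbols $\phi$ (a network layer acting on $N$ node coordinates), $Q$, and $t$ are notation assumed here.

```latex
% Equivariance under rotations/reflections Q and translations t:
% transforming the input coordinates transforms the output the same way.
\[
  \phi(Qx_1 + t, \dots, Qx_N + t)_i \;=\; Q\,\phi(x_1, \dots, x_N)_i + t,
  \qquad Q \in O(3),\; t \in \mathbb{R}^3,\; i = 1, \dots, N .
\]
```

Taking $Q$ from the orthogonal group $O(3)$ rather than only the rotation group covers the reflections mentioned in the abstract.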
Affiliation(s)
- Farzan Soleymani
- Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
- Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
- Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
2
Bordin N, Scholes H, Rauer C, Roca-Martínez J, Sillitoe I, Orengo C. Clustering protein functional families at large scale with hierarchical approaches. Protein Sci 2024; 33:e5140. [PMID: 39145441 PMCID: PMC11325189 DOI: 10.1002/pro.5140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024]
Abstract
Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, UK
- Harry Scholes
- Institute of Structural and Molecular Biology, University College London, London, UK
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London, UK
- Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, Madrid, Spain
- Joel Roca-Martínez
- Institute of Structural and Molecular Biology, University College London, London, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
3
Terui R, Berger SE, Sambel LA, Song D, Chistol G. Single-molecule imaging reveals the mechanism of bidirectional replication initiation in metazoa. Cell 2024; 187:3992-4009.e25. [PMID: 38866019 PMCID: PMC11283366 DOI: 10.1016/j.cell.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 03/28/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024]
Abstract
Metazoan genomes are copied bidirectionally from thousands of replication origins. Replication initiation entails the assembly and activation of two CMG helicases (Cdc45⋅Mcm2-7⋅GINS) at each origin. This requires several replication firing factors (including TopBP1, RecQL4, and DONSON) whose exact roles are still under debate. How two helicases are correctly assembled and activated at each origin is a long-standing question. By visualizing the recruitment of GINS, Cdc45, TopBP1, RecQL4, and DONSON in real time, we uncovered that replication initiation is surprisingly dynamic. First, TopBP1 transiently binds to the origin and dissociates before the start of DNA synthesis. Second, two Cdc45 are recruited together, even though Cdc45 alone cannot dimerize. Next, two copies of DONSON and two GINS simultaneously arrive at the origin, completing the assembly of two CMG helicases. Finally, RecQL4 is recruited to the CMG⋅DONSON⋅DONSON⋅CMG complex and promotes DONSON dissociation and CMG activation via its ATPase activity.
Affiliation(s)
- Riki Terui
- Chemical and Systems Biology Department, Stanford School of Medicine, Stanford, CA 94305, USA
- Scott E Berger
- Biophysics Program, Stanford School of Medicine, Stanford, CA 94305, USA
- Larissa A Sambel
- Chemical and Systems Biology Department, Stanford School of Medicine, Stanford, CA 94305, USA
- Dan Song
- Chemical and Systems Biology Department, Stanford School of Medicine, Stanford, CA 94305, USA
- Gheorghe Chistol
- Chemical and Systems Biology Department, Stanford School of Medicine, Stanford, CA 94305, USA; Biophysics Program, Stanford School of Medicine, Stanford, CA 94305, USA; Cancer Biology Program, Stanford School of Medicine, Stanford, CA 94305, USA; Stanford Cancer Institute, Stanford School of Medicine, Stanford, CA 94305, USA; BioX Interdisciplinary Institute, Stanford School of Medicine, Stanford, CA 94305, USA.
4
Shen X, Jin J, Zhang G, Yan B, Yu X, Wu H, Yang M, Zhang F. The chromosome-level genome assembly of Aphidoletes aphidimyza Rondani (Diptera: Cecidomyiidae). Sci Data 2024; 11:785. [PMID: 39019956 PMCID: PMC11255235 DOI: 10.1038/s41597-024-03614-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 07/05/2024] [Indexed: 07/19/2024] Open
Abstract
Aphidoletes aphidimyza is widely recognized as an effective predator of aphids in agricultural systems. However, there is limited understanding of its predation mechanisms. In this study, we generated a high-quality chromosome-level assembly of the A. aphidimyza genome by combining PacBio, Illumina, and Hi-C data. The genome has a size of 192.08 Mb, with a scaffold N50 size of 46.85 Mb, and 99.08% (190.35 Mb) of the assembly is located on four chromosomes. The BUSCO analysis of our assembly indicates a completeness of 97.8% (n = 1,367), including 1,307 (95.6%) single-copy BUSCOs and 30 (2.2%) duplicated BUSCOs. Additionally, we annotated a total of 13,073 protein-coding genes, 18.43% (35.40 Mb) repetitive elements, and 376 non-coding RNAs. Our study is the first to report a chromosome-scale genome for A. aphidimyza. It provides a valuable genomic resource for the molecular study of A. aphidimyza.
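The BUSCO percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only a reader's sanity check, not part of the authors' pipeline; it assumes the usual BUSCO convention that "complete" means single-copy plus duplicated, and the variable and function names are illustrative.

```python
# Counts as quoted in the abstract (n = 1,367 BUSCO groups searched).
total_buscos = 1367
single_copy = 1307
duplicated = 30

def pct(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)

# Assumption: complete = single-copy + duplicated (standard BUSCO reporting).
complete = single_copy + duplicated

summary = {
    "single_copy_pct": pct(single_copy, total_buscos),
    "duplicated_pct": pct(duplicated, total_buscos),
    "complete_pct": pct(complete, total_buscos),
}
print(summary)
```

Under that assumption the three rounded values agree with the 95.6%, 2.2%, and 97.8% reported above.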
Affiliation(s)
- Xiuxian Shen
- Institute of Entomology, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, College of Agriculture, Guizhou University, Guiyang, 550025, China
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Jianfeng Jin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Guoqiang Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Bin Yan
- Institute of Entomology, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, College of Agriculture, Guizhou University, Guiyang, 550025, China
- Xiaofei Yu
- College of Tobacco Science, Guizhou University, Guiyang, 550025, China
- Huizi Wu
- Zunyi Branch of Guizhou Tobacco Company, Zunyi, 564200, China
- Maofa Yang
- Institute of Entomology, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, College of Agriculture, Guizhou University, Guiyang, 550025, China.
- College of Tobacco Science, Guizhou University, Guiyang, 550025, China.
- Feng Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China.
5
Hamamsy T, Morton JT, Blackwell R, Berenberg D, Carriero N, Gligorijevic V, Strauss CEM, Leman JK, Cho K, Bonneau R. Protein remote homology detection and structural alignment using deep learning. Nat Biotechnol 2024; 42:975-985. [PMID: 37679542 PMCID: PMC11180608 DOI: 10.1038/s41587-023-01917-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/07/2022] [Accepted: 07/26/2023] [Indexed: 09/09/2023]
Abstract
Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
Grants
- R35GM122515 National Science Foundation (NSF)
- IOS-1546218 National Science Foundation (NSF)
- R35 GM122515 NIGMS NIH HHS
- R01 DK103358 NIDDK NIH HHS
- CBET-1728858 National Science Foundation (NSF)
- R01 AI130945 NIAID NIH HHS
- This research was supported by NIH R01DK103358, the Simons Foundation, NSF IOS-1546218, R35GM122515, NSF CBET-1728858, NIH R01AI130945, to T.H. This research was supported by the intramural research program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) to J.T.M. This research was supported by the Flatiron Institute as part of the Simons Foundation to Robert Blackwell, J.K.L., and N.C. This research was supported by Los Alamos National Lab to C.S. This research was supported by the Samsung Advanced Institute of Technology (Next Generation Deep Learning: from pattern recognition to AI), Samsung Research (Improving Deep Learning using Latent Structure), and NSF Award 1922658 to K.C.
- Simons Foundation
- U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
Affiliation(s)
- Tymor Hamamsy
- Center for Data Science, New York University, New York, NY, USA
- James T Morton
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Robert Blackwell
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Daniel Berenberg
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Prescient Design, New York, NY, USA
- Nicholas Carriero
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, USA.
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.
- Prescient Design, New York, NY, USA.
- CIFAR, Toronto, Ontario, Canada.
- Richard Bonneau
- Center for Data Science, New York University, New York, NY, USA.
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.
- Prescient Design, New York, NY, USA.
- Department of Biology, New York University, New York, NY, USA.
6
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving part of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, is widely applied in drug discovery to reduce time and resource consumption. Combined with machine intelligence algorithms, VS has become a rapidly progressing technology that learns within robust decision orders for data curation and for screening hit molecules from large VS libraries in minutes or hours. The exponential growth of chemical and biological data into public-domain "big data" demands modern, advanced machine intelligence-driven VS approaches that can screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB or SB) to integrated LB and SB techniques that explore various aspects of the ligand and the target protein to improve the rate of appropriate hit molecule prediction. Current trends demand advanced, intelligent solutions that can handle enormous data volumes in drug discovery and can screen and optimize hits or leads with few or no false positives. This review follows the big-data drift and the tremendous growth in computational architecture. It categorizes and highlights individual VS techniques, details the literature on machine learning implementations and modern machine intelligence approaches, and discusses limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
7
Abbass J, Parisi C. Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets. J Biomol Struct Dyn 2024:1-16. [PMID: 38505995 DOI: 10.1080/07391102.2024.2328736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
In addition to the growth of protein structures generated through wet laboratory experiments and deposited in the PDB repository, AlphaFold predictions have significantly contributed to the creation of a much larger database of protein structures. Annotating such a vast number of structures has become an increasingly challenging task. CATH is widely recognized as one of the most common platforms for addressing this challenge, as it classifies proteins based on their structural and evolutionary relationships, offering the scientific community an invaluable resource for uncovering various properties, including functional annotations. While CATH annotation involves, to some extent, human intervention, keeping up with the classification of the rapidly expanding repositories of protein structures has become exceedingly difficult. Therefore, there is a pressing need for a fully automated approach. On the other hand, the abundance of protein sequences stemming from next generation sequencing technologies, lacking structural annotations, presents an additional challenge to the scientific community. Consequently, 'pre-annotating' protein sequences with structural features, ensuring a high level of precision, could prove highly advantageous. In this paper, after a thorough investigation, we introduce a novel machine-learning model capable of classifying any protein domain, whether it has a known structure or not, into one of the 40 main CATH Architectures. We achieve an F1 Score of 0.92 using only the amino acid sequence and a score of 0.94 using both the sequence of amino acids and the sequence of structural alphabets. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Jad Abbass
- School of Computer Science and Mathematics, Kingston University, London, UK
- Charles Parisi
- School of Computer Science and Mathematics, Kingston University, London, UK
- Telecom Physique Strasbourg, Strasbourg University, Strasbourg, France
8
Pavlenok M, Nair RR, Hendrickson RC, Niederweis M. The C-terminus is essential for the stability of the mycobacterial channel protein MspA. Protein Sci 2024; 33:e4912. [PMID: 38358254 PMCID: PMC10868439 DOI: 10.1002/pro.4912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 12/15/2023] [Accepted: 01/10/2024] [Indexed: 02/16/2024]
Abstract
Outer membrane proteins perform essential functions in uptake and secretion processes in bacteria. MspA is an octameric channel protein in the outer membrane of Mycobacterium smegmatis and is structurally distinct from any other known outer membrane protein. MspA is the founding member of a family with more than 3000 homologs and is one of the most widely used proteins in nanotechnological applications due to its advantageous pore structure and extraordinary stability. While a conserved C-terminal signal sequence is essential for folding and protein assembly in the outer membrane of Gram-negative bacteria, the molecular determinants of these processes are unknown for MspA. In this study, we show that mutation and deletion of methionine 183 in the highly conserved C-terminus of MspA and mutation of the conserved tryptophan 40 lead to a complete loss of protein in heat extracts of M. smegmatis. Swapping these residues partially restores the heat stability of MspA indicating that methionine 183 and tryptophan 40 form a conserved sulfur-π electron interaction, which stabilizes the MspA monomer. Flow cytometry showed that all MspA mutants are surface-accessible demonstrating that oligomerization and membrane integration in M. smegmatis are not affected. Thus, the conserved C-terminus of MspA is essential for its thermal stability, but it is not required for protein assembly in its native membrane, indicating that this process is mediated by a mechanism distinct from that in Gram-negative bacteria. These findings will benefit the rational design of MspA-like pores to tailor their properties in current and future applications.
Affiliation(s)
- Mikhail Pavlenok
- Department of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Michael Niederweis
- Department of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama, USA
9
Botkin JR, Farmer AD, Young ND, Curtin SJ. Genome assembly of Medicago truncatula accession SA27063 provides insight into spring black stem and leaf spot disease resistance. BMC Genomics 2024; 25:204. [PMID: 38395768 PMCID: PMC10885650 DOI: 10.1186/s12864-024-10112-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024] Open
Abstract
Medicago truncatula, a model legume and alfalfa relative, has served as an essential resource for advancing our understanding of legume physiology, functional genetics, and crop improvement traits. The necrotrophic fungus Ascochyta medicaginicola is the causal agent of spring black stem (SBS) and leaf spot, a devastating foliar disease of alfalfa affecting stand survival, yield, and forage quality. Host resistance to SBS disease is poorly understood, and control methods rely on cultural practices. Resistance has been observed in M. truncatula accession SA27063 (HM078) with two recessively inherited quantitative-trait loci (QTL), rnpm1 and rnpm2, previously reported. To shed light on host resistance, we carried out a de novo genome assembly of HM078. The genome, referred to as MtHM078 v1.0, comprises 23 contigs totaling 481.19 Mbp. Notably, this assembly contains a substantial amount of novel centromere-related repeat sequences due to deep long-read sequencing. Genome annotation resulted in 98.4% of BUSCO fabales proteins being complete. The assembly enabled sequence-level analysis of rnpm1 and rnpm2 for gene content, synteny, and structural variation between SBS-resistant accession SA27063 (HM078) and SBS-susceptible accession A17 (HM101). Fourteen candidate genes were identified, and some have been implicated in resistance to necrotrophic fungi. Especially interesting candidates include loss-of-function events in HM078 because they fit the inverse gene-for-gene model, where resistance is recessively inherited. In rnpm1, these include a loss-of-function in a disease resistance gene due to a premature stop codon, and a 10.85 kbp retrotransposon-like insertion disrupting a ubiquitin conjugating E2. In rnpm2, we identified a frameshift mutation causing a loss-of-function in a glycosidase, as well as a missense and frameshift mutation altering an F-box family protein.
This study generated a high-quality genome of HM078 and has identified promising candidates that, once validated, could be further studied in alfalfa to enhance disease resistance.
Affiliation(s)
- Jacob R Botkin
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, 55108, USA
- Andrew D Farmer
- National Center for Genome Resources, Santa Fe, NM, 87505, USA
- Nevin D Young
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, 55108, USA
- Shaun J Curtin
- United States Department of Agriculture, Plant Science Research Unit, St Paul, MN, 55108, USA.
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA.
- Center for Plant Precision Genomics, University of Minnesota, St. Paul, MN, 55108, USA.
- Center for Genome Engineering, University of Minnesota, St. Paul, MN, 55108, USA.
10
Duran-Romaña R, Houben B, De Vleeschouwer M, Louros N, Wilson MP, Matthijs G, Schymkowitz J, Rousseau F. N-glycosylation as a eukaryotic protective mechanism against protein aggregation. Sci Adv 2024; 10:eadk8173. [PMID: 38295165 PMCID: PMC10830103 DOI: 10.1126/sciadv.adk8173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 12/28/2023] [Indexed: 02/02/2024]
Abstract
The tendency for proteins to form aggregates is an inherent part of every proteome and arises from the self-assembly of short protein segments called aggregation-prone regions (APRs). While posttranslational modifications (PTMs) have been implicated in modulating protein aggregation, their direct role in APRs remains poorly understood. In this study, we used a combination of proteome-wide computational analyses and biophysical techniques to investigate the potential involvement of PTMs in aggregation regulation. Our findings reveal that while most PTM types are disfavored near APRs, N-glycosylation is enriched and evolutionarily selected, especially in proteins prone to misfolding. Experimentally, we show that N-glycosylation inhibits the aggregation of peptides in vitro through steric hindrance. Moreover, mining existing proteomics data, we find that the loss of N-glycans at the flanks of APRs leads to specific protein aggregation in Neuro2a cells. Our findings indicate that, among its many molecular functions, N-glycosylation directly prevents protein aggregation in higher eukaryotes.
Affiliation(s)
- Ramon Duran-Romaña
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Bert Houben
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Matthias De Vleeschouwer
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Nikolaos Louros
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Matthew P. Wilson
- Laboratory for Molecular Diagnosis, Center for Human Genetics, KU Leuven, 3000 Leuven, Belgium
- Gert Matthijs
- Laboratory for Molecular Diagnosis, Center for Human Genetics, KU Leuven, 3000 Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
11
Roy BG, Choi J, Fuchs MF. Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality. Biomolecules 2024; 14:62. [PMID: 38254661 PMCID: PMC10813169 DOI: 10.3390/biom14010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024] Open
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, contemporary functional repertoires of individual proteins are incomplete. However, these can be enhanced by modeling tools. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software confirmed not only previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1A^VSR, a short-chain reductase for protein 1B^Hel/VSR, a parathyroid hormone family domain for protein 1E^Pol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2B^MP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2A^HP/Sd, 2B^MP, and 3A^? had low confidence, while predictions for proteins 1A^VSR, 1B^Hel*/VSR, 1C^VPg, 1D^Pro, 1E^Pol*/Sd, and 2C^CP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings.
Affiliation(s)
- Brandon G. Roy
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, 15 Castle Creek Drive, Geneva, NY 14456, USA; (J.C.); (M.F.F.)
12
Hamamsy T, Barot M, Morton JT, Steinegger M, Bonneau R, Cho K. Learning sequence, structure, and function representations of proteins with language models. bioRxiv 2023:2023.11.26.568742. [PMID: 38045331 PMCID: PMC10690258 DOI: 10.1101/2023.11.26.568742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.
13
Nijkamp E, Ruffolo JA, Weinstein EN, Naik N, Madani A. ProGen2: Exploring the boundaries of protein language models. Cell Syst 2023; 14:968-978.e3. [PMID: 37909046 DOI: 10.1016/j.cels.2023.10.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 01/12/2023] [Revised: 05/01/2023] [Accepted: 10/02/2023] [Indexed: 11/02/2023]
Abstract
Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial-intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters and trained on different sequence datasets drawn from over a billion proteins from genomic, metagenomic, and immune repertoire databases. ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences, generating novel viable sequences, and predicting protein fitness without additional fine-tuning. As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model. Our models and code are open sourced for widespread adoption in protein engineering. A record of this paper's Transparent Peer Review process is included in the supplemental information.
Affiliation(s)
- Jeffrey A Ruffolo
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, USA; Profluent Bio, Berkeley, CA, USA
- Eli N Weinstein
- Data Science Institute, Columbia University, New York, NY, USA
- Ali Madani
- Salesforce Research, Palo Alto, CA, USA; Profluent Bio, Berkeley, CA, USA
14
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
15
Schweke H, Xu Q, Tauriello G, Pantolini L, Schwede T, Cazals F, Lhéritier A, Fernandez-Recio J, Rodríguez-Lumbreras LÁ, Schueler-Furman O, Varga JK, Jiménez-García B, Réau MF, Bonvin A, Savojardo C, Martelli PL, Casadio R, Tubiana J, Wolfson H, Oliva R, Barradas-Bautista D, Ricciardelli T, Cavallo L, Venclovas Č, Olechnovič K, Guerois R, Andreani J, Martin J, Wang X, Kihara D, Marchand A, Correia B, Zou X, Dey S, Dunbrack R, Levy E, Wodak S. Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics 2023; 23:e2200323. [PMID: 37365936 PMCID: PMC10937251 DOI: 10.1002/pmic.202200323] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/04/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/28/2023]
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
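The area under the ROC curve quoted above can be read as the probability that a randomly chosen physiological interface receives a higher score than a randomly chosen non-physiological one. A minimal Python sketch of that pairwise definition follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    labels: 1 for the positive class (e.g. physiological), 0 otherwise.
    scores: higher means "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(toy_labels, toy_scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

The quadratic pairwise loop is chosen for clarity; for a benchmark of 1677 structures it is still cheap, but a rank-based implementation would scale better.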
Affiliation(s)
- Julia K. Varga
- Hebrew University of Jerusalem Institute for Medical Research Israel-Canada
- Jérôme Tubiana
- Tel Aviv University Blavatnik School of Computer Science
- Haim Wolfson
- Tel Aviv University Blavatnik School of Computer Science
- Xiaoqin Zou
- Dalton Cardiovascular Research Center, Institute for Data Science and Informatics, University of Missouri
16
Yan B, Di X, Yang M, Wu H, Yu X, Zhang F. Chromosome-Scale Genome Assembly of the Solitary Parasitoid Wasp Microplitis manilae Ashmead, 1904 (Braconidae: Microgastrinae). Genome Biol Evol 2023; 15:evad144. [PMID: 37515590 PMCID: PMC10448859 DOI: 10.1093/gbe/evad144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 07/13/2023] [Accepted: 07/23/2023] [Indexed: 07/31/2023] Open
Abstract
Parasitoid wasps are invaluable natural enemies extensively used to control coleopteran, dipteran, and lepidopteran pests in agriculture and forestry owing to their killing and reproductive actions on hosts. The important larval endoparasitoid wasp Microplitis manilae, which belongs to the Microgastrinae subfamily, parasitizes the larval stages of Spodoptera spp., such as Spodoptera litura and Spodoptera frugiperda. The absence of a genomic resource for M. manilae has impeded studies on chemosensory- and detoxification-related genes. This study presents a chromosome-level genome assembly of M. manilae with a genome size of 293.18 Mb, which includes 222 contigs (N50 size, 7.58 Mb) and 134 scaffolds (N50 size, 27.33 Mb). A major proportion of the genome (284.76 Mb; 97.13%) was anchored to 11 pseudochromosomes with a single-copy BUSCO score of 98.4%. Furthermore, 14,316 protein-coding genes, 165.14 Mb (57.99%) repetitive elements, and 871 noncoding RNAs were annotated and identified. Additionally, a manual annotation of 399 genes associated with chemosensation and 168 genes involved in detoxification was conducted. This study provides a valuable and high-quality genomic resource to facilitate further functional genomics research on parasitoid wasps.
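For readers unfamiliar with the N50 statistic quoted above: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The Python sketch below illustrates the calculation on made-up contig lengths (the function name `n50` and the toy values are assumptions for illustration only) and re-derives the anchored fraction from the two sizes given in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in Mb (not taken from the assembly).
toy_contigs = [8, 6, 5, 3, 2, 1]
print(n50(toy_contigs))  # total is 25, so half is 12.5; 8 + 6 = 14 reaches it, giving 6

# Anchored fraction from the figures in the abstract: 284.76 Mb of 293.18 Mb.
anchored_pct = round(284.76 / 293.18 * 100, 2)
print(anchored_pct)  # 97.13
```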
Affiliation(s)
- Bin Yan
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Guiyang, Guizhou, China
- Natural Enemies Breeding Center of Guizhou, Guizhou University, Guiyang, Guizhou, China
- Xueyuan Di
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Guiyang, Guizhou, China
- Natural Enemies Breeding Center of Guizhou, Guizhou University, Guiyang, Guizhou, China
- Maofa Yang
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Guiyang, Guizhou, China
- Natural Enemies Breeding Center of Guizhou, Guizhou University, Guiyang, Guizhou, China
- College of Tobacco Science, Guizhou University, Guiyang, Guizhou, China
- Huizi Wu
- Guizhou Provincial Tobacco Company Zunyi Branch, Zunyi, Guizhou, China
- Xiaofei Yu
- Natural Enemies Breeding Center of Guizhou, Guizhou University, Guiyang, Guizhou, China
- College of Tobacco Science, Guizhou University, Guiyang, Guizhou, China
- Feng Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
17
Fan Z, Wang LY, Xiao L, Tan B, Luo B, Ren TY, Liu N, Zhang ZS, Bai M. Lampshade web spider Ectatosticta davidi chromosome-level genome assembly provides evidence for its phylogenetic position. Commun Biol 2023; 6:748. [PMID: 37463957 PMCID: PMC10354039 DOI: 10.1038/s42003-023-05129-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 07/10/2023] [Indexed: 07/20/2023] Open
Abstract
The spider Ectatosticta davidi belongs to the lampshade web spider family, Hypochilidae, a lineage closely related to Filistatidae and recovered as sister to the remaining araneomorph spiders. Here we present the final assembled genome of E. davidi, comprising 2.16 Gb in 15 chromosomes, and confirm the evolutionary position of Hypochilidae. Moreover, we find that the GMC gene family exhibits high conservation throughout the evolution of true spiders. We also find that the MaSp genes of E. davidi may represent an early stage of the MaSp and MiSp genes in other true spiders, while CrSp shares a common origin with AgSp and PySp but differs from MaSp. Altogether, this study contributes to addressing the limited availability of genomic sequences from Hypochilidae spiders, and provides a valuable resource for investigating the genomic evolution of spiders.
Affiliation(s)
- Zheng Fan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Lu-Yu Wang
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Lin Xiao
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Bing Tan
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Bin Luo
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Tian-Yu Ren
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Ning Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Zhi-Sheng Zhang
- School of Life Sciences, Southwest University, 400700, Chongqing, China
- Ming Bai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Northeast Asia Biodiversity Research Center, Northeast Forestry University, 150040, Harbin, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
18
Feng T, Pucker B, Kuang T, Song B, Yang Y, Lin N, Zhang H, Moore MJ, Brockington SF, Wang Q, Deng T, Wang H, Sun H. The genome of the glasshouse plant noble rhubarb (Rheum nobile) provides a window into alpine adaptation. Commun Biol 2023; 6:706. [PMID: 37429977 DOI: 10.1038/s42003-023-05044-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/04/2022] [Accepted: 06/14/2023] [Indexed: 07/12/2023] Open
Abstract
Glasshouse plants are species that trap warmth via specialized morphology and physiology, mimicking a human glasshouse. In the Himalayan alpine region, the highly specialized glasshouse morphology has independently evolved in distinct lineages to adapt to intensive UV radiation and low temperature. Here we demonstrate that the glasshouse structure - specialized cauline leaves - is highly effective in absorbing UV light but transmitting visible and infrared light, creating an optimal microclimate for the development of reproductive organs. We reveal that this glasshouse syndrome has evolved at least three times independently in the rhubarb genus Rheum. We report the genome sequence of the flagship glasshouse plant Rheum nobile and identify key genetic network modules in association with the morphological transition to specialized glasshouse leaves, including active secondary cell wall biogenesis, upregulated cuticular cutin biosynthesis, and suppression of photosynthesis and terpenoid biosynthesis. The distinct cell wall organization and cuticle development might be important for the specialized optical property of glasshouse leaves. We also find that the expansion of LTRs has likely played an important role in noble rhubarb adaptation to high elevation environments. Our study will enable additional comparative analyses to identify the genetic basis underlying the convergent occurrence of glasshouse syndrome.
Affiliation(s)
- Tao Feng
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- CAS Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge, CB2 3EA, UK
- CeBiTec & Faculty of Biology, Bielefeld University, Universitaetsstrasse, Bielefeld, 33615, Germany
- Institute of Plant Biology & BRICS, TU Braunschweig, 38106, Braunschweig, Germany
- Tianhui Kuang
- CAS Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Bo Song
- CAS Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, St. Paul, MN, 55108, USA
- Nan Lin
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
- Huajie Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
- Michael J Moore
- Department of Biology, Oberlin College, Oberlin, OH, 44074, USA
- Samuel F Brockington
- Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge, CB2 3EA, UK
- Qingfeng Wang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
- Tao Deng
- CAS Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Hengchang Wang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
- Hang Sun
- CAS Key Laboratory for Plant Biodiversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
19
Varadi M, Bordin N, Orengo C, Velankar S. The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors. Curr Opin Struct Biol 2023; 79:102543. [PMID: 36807079 DOI: 10.1016/j.sbi.2023.102543] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/11/2022] [Revised: 01/04/2023] [Accepted: 01/13/2023] [Indexed: 02/21/2023]
Abstract
The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.
Affiliation(s)
- Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK. https://twitter.com/nicolabordin
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
20
Choudhury A, Saha S, Maiti NC, Datta S. Exploring structural features and potential lipid interactions of Pseudomonas aeruginosa type three secretion effector PemB by spectroscopic and calorimetric experiments. Protein Sci 2023; 32:e4627. [PMID: 36916835 PMCID: PMC10044109 DOI: 10.1002/pro.4627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 03/06/2023] [Accepted: 03/10/2023] [Indexed: 03/15/2023]
Abstract
The Type Three Secretion System (T3SS) is a sophisticated nano-scale weapon utilized by several Gram-negative bacteria under stringent spatio-temporal regulation to manipulate and evade host immune systems in order to cause infection. To the best of our knowledge, the present study is the first report to characterize inherent features of the native type three secretion effector protein PemB through biophysical techniques. First, we demonstrate the binding affinity of PemB for phosphoinositides through isothermal calorimetric titrations. Second, we shed light on its strong homo-oligomerization propensity in aqueous solution through multiple biophysical methods. Third, we employ several spectroscopic techniques to delineate its disordered and helical conformation. Lastly, we perform a phylogenetic analysis of this new effector to elucidate its evolutionary relationship with other organisms. Taken together, our results should contribute to existing knowledge of the Pseudomonas aeruginosa secretome.
Affiliation(s)
- Arkaprabha Choudhury
- Department of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology (CSIR-IICB), Kolkata 700032, India
- Biological Sciences, Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Saumen Saha
- Department of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology (CSIR-IICB), Kolkata 700032, India
- Nakul Chandra Maiti
- Department of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology (CSIR-IICB), Kolkata 700032, India
- Biological Sciences, Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Saumen Datta
- Department of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology (CSIR-IICB), Kolkata 700032, India
- Biological Sciences, Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
21
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
22
Zhang Q, Zhou Q, Han S, Li Y, Wang Y, He H. The genome of sheep ked (Melophagus ovinus) reveals potential mechanisms underlying reproduction and narrower ecological niches. BMC Genomics 2023; 24:54. [PMID: 36717784 PMCID: PMC9887928 DOI: 10.1186/s12864-023-09155-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/16/2022] [Accepted: 01/27/2023] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Melophagus ovinus is considered to be of great veterinary health significance. However, little is known about the genetic mechanisms underlying its specific biological characteristics or about novel methods for controlling M. ovinus. RESULTS In total, the de novo genome assembly of M. ovinus was 188.421 Mb in size (330 scaffolds, N50 length: 10.666 Mb), with a mean GC content of 27.74%. A total of 13,372 protein-coding genes were functionally annotated. Phylogenetic analysis indicated that the divergence of M. ovinus and Glossina fuscipes took place 72.76 Mya, within the Late Cretaceous. Gene family expansion and contraction analysis revealed that M. ovinus has 65 rapidly evolving families (26 expansions and 39 contractions), mainly involved in DNA metabolic activity, transposase activity, odorant receptor 59a/67d-like, IMD domain-containing protein, and cuticle protein. The universal and tightly conserved list of milk protein orthologues has been assembled from the genome of M. ovinus. Contractions and losses of sensory receptors and vision-associated Rhodopsin genes were significant in M. ovinus, which indicates that M. ovinus has narrower ecological niches. CONCLUSIONS We sequenced, assembled, and annotated the whole genome sequence of M. ovinus, and carried out a preliminary analysis of the genetic mechanisms underlying its adaptive evolution. These resources will provide insights into the biological underpinnings of this parasite and into disease control strategies.
Affiliation(s)
- Qingxun Zhang
- National Research Center for Wildlife-Borne Diseases, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Beijing Milu Ecological Research Center, Beijing, 100076, China
- Qingsong Zhou
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Shuyi Han
- National Research Center for Wildlife-Borne Diseases, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Ying Li
- State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, 810016, China
- Ye Wang
- National Research Center for Wildlife-Borne Diseases, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Hongxuan He
- National Research Center for Wildlife-Borne Diseases, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
23
Nallapareddy V, Bordin N, Sillitoe I, Heinzinger M, Littmann M, Waman VP, Sen N, Rost B, Orengo C. CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics 2023; 39:6989624. [PMID: 36648327 PMCID: PMC9887088 DOI: 10.1093/bioinformatics/btad029] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/03/2022] [Revised: 12/07/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS The CATHe models trained on 1773 largest and 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION The code for the developed models is available on https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed on https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vamsi Nallapareddy
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology - i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology - i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology - i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
- Institute for Advanced Study (TUM-IAS), Garching/Munich 85748, Germany
- TUM School of Life Sciences Weihenstephan (WZW) 85354, Germany
| | | |
|
24
|
Miller J, Zimin AV, Gordus A. Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus. Gigascience 2022; 12:giad002. [PMID: 36762707 PMCID: PMC9912274 DOI: 10.1093/gigascience/giad002] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2022] [Revised: 11/18/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023] Open
Abstract
The orb web is a remarkable example of animal architecture that is observed in families of spiders that diverged over 200 million years ago. While several genomes exist for araneid orb-weavers, none exist for other orb-weaving families, hampering efforts to investigate the genetic basis of this complex behavior. Here we present a chromosome-level genome assembly for the cribellate orb-weaving spider Uloborus diversus. The assembly reinforces evidence of an ancient arachnid genome duplication and identifies complete open reading frames for every class of spidroin gene, which encode the proteins that are the key structural components of spider silks. We identified the 2 X chromosomes of U. diversus as well as candidate sex-determining loci. This chromosome-level assembly will be a valuable resource for evolutionary research into the origins of orb-weaving, spidroin evolution, chromosomal rearrangement, and chromosomal sex determination in spiders.
Affiliation(s)
- Jeremiah Miller
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Andrew Gordus
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
25
|
Zeller M, Huson DH. Comparison of functional classification systems. NAR Genom Bioinform 2022; 4:lqac090. [PMID: 36465499 PMCID: PMC9713901 DOI: 10.1093/nargab/lqac090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2022] [Accepted: 11/08/2022] [Indexed: 06/12/2024] Open
Abstract
In microbiome analysis, functional profiling is based on assigning reads or contigs to terms or nodes in a functional classification system. There are a number of large, general-purpose functional classifications that are in use, such as eggNOG, KEGG, InterPro and SEED. Smaller, special-purpose classifications include CARD, EC, MetaCyc and VFDB. Here, we compare the different classifications in terms of their overlap, redundancy, structure and assignment rates. We also provide mappings between main concepts in different classifications. For the large classifications, we find that eggNOG performs the best with respect to sequence redundancy and structure, SEED has the cleanest hierarchy, whereas KEGG and InterPro:BP might be more informative for medical applications. We illustrate the practical assignment rates for different classifications using a number of metagenomic samples.
Affiliation(s)
- Monika Zeller
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076 Tübingen, Germany
| | - Daniel H Huson
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076 Tübingen, Germany
| |
|
26
|
Tubiana T, Sillitoe I, Orengo C, Reuter N. Dissecting peripheral protein-membrane interfaces. PLoS Comput Biol 2022; 18:e1010346. [PMID: 36516231 PMCID: PMC9797079 DOI: 10.1371/journal.pcbi.1010346] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2022] [Revised: 12/28/2022] [Accepted: 11/24/2022] [Indexed: 12/15/2022] Open
Abstract
Peripheral membrane proteins (PMPs) include a wide variety of proteins that share the ability to bind transiently to the chemically complex interfacial region of membranes through their interfacial binding site (IBS). In contrast to protein-protein or protein-DNA/RNA interfaces, peripheral protein-membrane interfaces are poorly characterized. We collected a dataset of PMP domains representative of the variety of PMP functions: membrane-targeting domains (Annexin, C1, C2, discoidin C2, PH, PX), enzymes (PLA, PLC/D) and lipid-transfer proteins (START). The dataset contains 1328 experimental structures and 1194 AlphaFold models. We mapped the amino acid composition and structural patterns of the IBS of each protein in this dataset, and evaluated which were more likely to be found at the IBS compared to the rest of the domains' accessible surface. In agreement with earlier work, we find that about two thirds of the PMPs in the dataset have protruding hydrophobes (Leu, Ile, Phe, Tyr, Trp and Met) at their IBS. The three aromatic amino acids Trp, Tyr and Phe are a hallmark of PMP IBSs regardless of whether they protrude on loops or not. This is also the case for lysines but not arginines, suggesting that, unlike for Arg-rich membrane-active peptides, the less membrane-disruptive lysine is preferred in PMPs. Another striking observation was the over-representation of glycines at the IBS of PMPs compared to the rest of their surface, possibly giving IBS loops the flexibility they need to insert between membrane lipids. The analysis of the 9 superfamilies revealed amino acid distribution patterns in agreement with their known functions and membrane-binding mechanisms. Besides revealing novel amino acid patterns at protein-membrane interfaces, our work contributes a new PMP dataset and an analysis pipeline that can be built upon for future studies of PMP properties, or for developing PMP prediction tools using, for example, machine learning approaches.
Affiliation(s)
- Thibault Tubiana
- Department of Chemistry, University of Bergen, Bergen, Norway
- Computational Biology Unit, University of Bergen, Bergen, Norway
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Nathalie Reuter
- Department of Chemistry, University of Bergen, Bergen, Norway
- Computational Biology Unit, University of Bergen, Bergen, Norway
| |
|
27
|
Yue J, Wei Y, Sun Z, Chen Y, Wei X, Wang H, Pasin F, Zhao M. AlkB RNA demethylase homologues and N6-methyladenosine are involved in Potyvirus infection. Mol Plant Pathol 2022; 23:1555-1564. [PMID: 35700092 PMCID: PMC9452765 DOI: 10.1111/mpp.13239] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/22/2022] [Revised: 05/25/2022] [Accepted: 05/26/2022] [Indexed: 05/28/2023]
Abstract
Proteins of the alkylation B (AlkB) superfamily show RNA demethylase activity, removing methyl adducts from N6-methyladenosine (m6A). m6A is a reversible epigenetic mark of RNA that regulates human virus replication but has unclear roles in plant virus infection. We focused on Potyvirus, the largest genus of plant RNA viruses, and report here the identification of AlkB domains within P1 of endive necrotic mosaic virus (ENMV) and an additional virus of a putative novel species within Potyvirus. We show that Nicotiana benthamiana m6A levels are reduced by infection with plum pox virus (PPV) and potato virus Y (PVY). The two potyviruses lack AlkB, and the results suggest a general involvement of RNA methylation in potyvirus infection and evolution. Methylated RNA immunoprecipitation sequencing of virus-infected samples showed that m6A peaks are enriched in plant transcript 3' untranslated regions and in discrete internal and 3' terminal regions of PPV and PVY genomes. Down-regulation of N. benthamiana AlkB homologues of the plant-specific ALKBH9 clade caused a significant decrease in PPV and PVY accumulation. In summary, our study provides evolutionary and experimental evidence that supports the involvement of m6A and the proviral roles of AlkB homologues in Potyvirus infection.
Affiliation(s)
- Jianying Yue
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, Hohhot, China
| | - Yao Wei
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, Hohhot, China
| | - Zhenqi Sun
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, Hohhot, China
| | - Yahan Chen
- College of Plant Protection, Gansu Agricultural University, Lanzhou, China
| | - Xuefeng Wei
- Development of Fine Chemicals, Guizhou University, Guizhou, China
| | - Haijuan Wang
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, Hohhot, China
| | - Fabio Pasin
- Instituto de Biología Molecular y Celular de Plantas (IBMCP), Consejo Superior de Investigaciones Científicas - Universitat Politècnica de València (CSIC-UPV), Valencia, Spain
- School of Science, University of Padua, Padua, Italy
| | - Mingmin Zhao
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, Hohhot, China
| |
|
28
|
Accounting for small variations in the tracrRNA sequence improves sgRNA activity predictions for CRISPR screening. Nat Commun 2022; 13:5255. [PMID: 36068235 PMCID: PMC9448816 DOI: 10.1038/s41467-022-33024-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/21/2022] [Accepted: 08/30/2022] [Indexed: 12/17/2022] Open
Abstract
CRISPR technology is a powerful tool for studying genome function. To aid in picking sgRNAs that have maximal efficacy against a target of interest from many possible options, several groups have developed models that predict sgRNA on-target activity. Although multiple tracrRNA variants are commonly used for screening, no existing models account for this feature when nominating sgRNAs. Here we develop an on-target model, Rule Set 3, that makes optimal predictions for multiple tracrRNA variants. We validate Rule Set 3 on a new dataset of sgRNAs tiling essential and non-essential genes, demonstrating substantial improvement over prior prediction models. By analyzing the differences in sgRNA activity between tracrRNA variants, we show that Pol III transcription termination is a strong determinant of sgRNA activity. We expect these results to improve the performance of CRISPR screening and inform future research on tracrRNA engineering and sgRNA modeling.
|
29
|
Fang Y, Qin X, Liao Q, Du R, Luo X, Zhou Q, Li Z, Chen H, Jin W, Yuan Y, Sun P, Zhang R, Zhang J, Wang L, Cheng S, Yang X, Yan Y, Zhang X, Zhang Z, Bai S, Van de Peer Y, Lucas WJ, Huang S, Yan J. The genome of homosporous maidenhair fern sheds light on the euphyllophyte evolution and defences. Nat Plants 2022; 8:1024-1037. [PMID: 36050462 PMCID: PMC7613604 DOI: 10.1038/s41477-022-01222-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/23/2020] [Accepted: 07/13/2022] [Indexed: 05/06/2023]
Abstract
Euphyllophytes encompass almost all extant plants, including two sister clades, ferns and seed plants. Decoding fern genomes is key to deeper insight into the origin of euphyllophytes and the evolution of seed plants. Here we report a chromosome-level genome assembly of Adiantum capillus-veneris L., a model homosporous fern. This fern genome comprises 30 pseudochromosomes with a total size of 4.8 gigabases and a contig N50 length of 16.22 Mb. Gene co-expression network analysis showed that homospore development in ferns has relatively high genetic similarity to pollen development in seed plants. Analysing the fern defence response expands understanding of the evolution and diversity of endogenous bioactive jasmonates in plants. Moreover, comparing fern genomes with those of other land plants reveals changes in gene families important for the evolutionary novelties within the euphyllophyte clade. These results lay a foundation for studies on fern genome evolution and function, as well as the origin and evolution of euphyllophytes.
Affiliation(s)
- Yuhan Fang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xing Qin
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Qinggang Liao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Ran Du
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xizhi Luo
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Qian Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Peng Cheng Laboratory, Artificial Intelligence Research Center, Shenzhen, China
| | - Zhen Li
- Department of Plant Biotechnology and Bioinformatics, Ghent University and VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Hengchi Chen
- Department of Plant Biotechnology and Bioinformatics, Ghent University and VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Wanting Jin
- State Key Laboratory of Protein and Plant Gene Research, Quantitative Biology Center, College of Life Sciences, Peking University, Beijing, China
| | - Yaning Yuan
- State Key Laboratory of Protein and Plant Gene Research, Quantitative Biology Center, College of Life Sciences, Peking University, Beijing, China
| | - Pengbo Sun
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Rui Zhang
- Eastern China Conservation Centre for Wild Endangered Plant Resources, Shanghai Chenshan Botanical Garden, Shanghai, China
| | - Jiao Zhang
- Eastern China Conservation Centre for Wild Endangered Plant Resources, Shanghai Chenshan Botanical Garden, Shanghai, China
| | - Li Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Shifeng Cheng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xueyong Yang
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yuehong Yan
- The Orchid Conservation and Research Centre of Shenzhen, Shenzhen, China
| | - Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhonghua Zhang
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- College of Horticulture, Qingdao Agricultural University, Qingdao, China
| | - Shunong Bai
- State Key Laboratory of Protein and Plant Gene Research, Quantitative Biology Center, College of Life Sciences, Peking University, Beijing, China
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University and VIB Center for Plant Systems Biology, Ghent, Belgium
- College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, China
- Centre for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
| | - William John Lucas
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Department of Plant Biology, College of Biological Sciences, University of California, Davis, CA, USA
| | - Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Jianbin Yan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| |
|
30
|
Climate-Endangered Arctic Epishelf Lake Harbors Viral Assemblages with Distinct Genetic Repertoires. Appl Environ Microbiol 2022; 88:e0022822. [PMID: 36005820 PMCID: PMC9469726 DOI: 10.1128/aem.00228-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Milne Fiord, located on the coastal margin of the Last Ice Area (LIA) in the High Arctic (82°N, Canada), harbors an epishelf lake, a rare type of ice-dependent ecosystem in which a layer of freshwater overlies marine water connected to the open ocean. This microbe-dominated ecosystem faces catastrophic change due to the deterioration of its ice environment related to warming temperatures. We produced the first assessment of viral abundance, diversity, and distribution in this vulnerable ecosystem and explored the niches available for viral taxa and the functional genes underlying their distribution. We found that the viral community in the freshwater layer was distinct from, and more diverse than, the community in the underlying seawater and contained a different set of putative auxiliary metabolic genes, including the sulfur starvation-linked gene tauD and the gene coding for patatin-like phospholipase. The halocline community resembled the freshwater more than the marine community, but harbored viral taxa unique to this layer. We observed distinct viral assemblages immediately below the halocline, at a depth that was associated with a peak of prasinophyte algae and the viral family Phycodnaviridae. We also assembled 15 complete circular genomes, including a putative Pelagibacter phage with a marine distribution. It appears that despite its isolated and precarious situation, the varied niches in this epishelf lake support a diverse viral community, highlighting the importance of characterizing underexplored microbiota in the Last Ice Area before these ecosystems undergo irreversible change. IMPORTANCE: Viruses are key to understanding polar aquatic ecosystems, which are dominated by microorganisms. However, studies of viral communities are challenging to interpret because the vast majority of viruses are known only from sequence fragments, and their taxonomy, hosts, and genetic repertoires are unknown. Our study establishes a basis for comparison that will advance understanding of viral ecology in diverse global environments, particularly in the High Arctic. Rising temperatures in this region mean that researchers have limited time remaining to understand the biodiversity and biogeochemical cycles of ice-dependent environments and the consequences of these rapid, irreversible changes. The case of the Milne Fiord epishelf lake has special urgency because of the rarity of this type of “floating lake” ecosystem and its location in the Last Ice Area, a region of thick sea ice with global importance for conservation efforts.
|
31
|
Song YF, Yu LC, Yang MF, Ye S, Yan B, Li LT, Wu C, Liu JF. A Long-Read Genome Assembly of a Native Mite in China Pyemotes zhonghuajia Yu, Zhang & He (Prostigmata: Pyemotidae) Reveals Gene Expansion in Toxin-Related Gene Families. Toxins (Basel) 2022; 14:toxins14080571. [PMID: 36006233 PMCID: PMC9415403 DOI: 10.3390/toxins14080571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/16/2022] [Accepted: 08/17/2022] [Indexed: 11/16/2022] Open
Abstract
Pyemotes zhonghuajia Yu, Zhang & He (Prostigmata: Pyemotidae), discovered in China, has been demonstrated to be a highly efficient natural enemy for controlling many agricultural and forestry pests. This mite injects toxins into the host (eggs, larvae, pupae, and adults), paralyzing it, and then obtains nourishment for reproductive development. These toxins have been shown to be safe for mammals and have the potential to be used as biocontrol pesticides. Toxin proteins have been identified from many arthropods, especially those of the orders Scorpiones and Araneae, some of which are now widely used as efficient biocontrol pesticides. However, toxin proteins in mites are not yet understood. In this study, we assembled the genome of P. zhonghuajia using PacBio technology and then identified toxin-related genes that are likely to be responsible for the paralytic process of P. zhonghuajia. The genome assembly has a size of 71.943 Mb, including 20 contigs with an N50 length of 21.248 Mb and a BUSCO completeness ratio of 90.6% (n = 1367). These contigs were subsequently assigned to three chromosomes. There were 11,183 protein-coding genes annotated, which were assessed at 91.2% BUSCO completeness (n = 1066). Neurotoxin and dermonecrotic toxin gene families were significantly expanded within the genus Pyemotes, and they also formed several gene clusters on the chromosomes. Most of the genes from these two families and all three agatoxin genes showed higher expression in one-day-old mites than in seven-day-pregnant mites, supporting the view that one-day-old mites cause paralysis and even death of the host. The identification of these toxin proteins may provide insights into how to improve the parasitism efficiency of this mite, and the purified proteins may be used to develop new biological pesticides.
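For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the cited study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [8, 5, 4, 2, 1]
print(n50(toy_contigs))  # 8 + 5 = 13 >= 20 / 2, so this prints 5
```

A larger N50 relative to the assembly size indicates that most of the sequence sits in a few long contigs.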
Affiliation(s)
- Yan-Fei Song
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Scientific Observing and Experiment Station of Crop Pest Guiyang, Ministry of Agriculture, Guiyang 550025, China
| | - Li-Chen Yu
- Changli Institute of Pomology, Hebei Academy of Agriculture and Forestry Sciences, Changli 066600, China
| | - Mao-Fa Yang
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Scientific Observing and Experiment Station of Crop Pest Guiyang, Ministry of Agriculture, Guiyang 550025, China
- College of Tobacco Science, Guizhou University, Guiyang 550025, China
| | - Shuai Ye
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Scientific Observing and Experiment Station of Crop Pest Guiyang, Ministry of Agriculture, Guiyang 550025, China
| | - Bin Yan
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Scientific Observing and Experiment Station of Crop Pest Guiyang, Ministry of Agriculture, Guiyang 550025, China
| | - Li-Tao Li
- Changli Institute of Pomology, Hebei Academy of Agriculture and Forestry Sciences, Changli 066600, China
| | - Chen Wu
- The New Zealand Institute for Plant and Food Research Limited, Auckland 1142, New Zealand
| | - Jian-Feng Liu
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
- Institute of Entomology, Guizhou University, Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Scientific Observing and Experiment Station of Crop Pest Guiyang, Ministry of Agriculture, Guiyang 550025, China
| |
|
32
|
Luan YX, Cui Y, Chen WJ, Jin JF, Liu AM, Huang CW, Potapov M, Bu Y, Zhan S, Zhang F, Li S. High-quality genomes reveal significant genetic divergence and cryptic speciation in the model organism Folsomia candida (Collembola). Mol Ecol Resour 2022; 23:273-293. [PMID: 35962787 PMCID: PMC10087712 DOI: 10.1111/1755-0998.13699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/01/2022]
Abstract
The collembolan Folsomia candida Willem, 1902, is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, its suitability as an ideal "standard" is questioned because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we obtained two high-quality chromosome-level genomes of F. candida, for a parthenogenetic strain (named FCDK, 219.08 Mb, 25,139 protein-coding genes) and a sexual strain (named FCSH, 153.09 Mb, 21,609 protein-coding genes), reannotated the genome of the parthenogenetic strain reported by Faddeeva-Vakhrusheva et al. in 2017 (named FCBL, 221.7 Mb, 25,980 protein-coding genes), and conducted comparative genomic analyses of the three strains. High similarity between FCDK and FCBL in synteny, genome architecture, and mitochondrial and nuclear gene sequences supports that they are conspecific. The seven chromosomes of FCDK are each 25-54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to the broader environmental adaptation of parthenogenetic strains. In addition, FCDK has fewer strain-specific microRNAs than FCSH, and their mitochondrial and nuclear genes have diverged greatly. In conclusion, FCDK/FCBL and FCSH have accumulated independent genetic changes and evolved into distinct species since 10 Mya. Our work provides important genomic resources for studying the mechanisms of rapid cryptic speciation and soil arthropod adaptation to soil ecosystems.
Affiliation(s)
- Yun-Xia Luan
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
| | - Yingying Cui
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China
| | | | - Jian-Feng Jin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
| | - Ai-Min Liu
- Department of Pomology, College of Horticulture, South China Agricultural University, Guangzhou, China
| | - Cheng-Wang Huang
- CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
| | | | - Yun Bu
- Natural History Research Center, Shanghai Natural History Museum, Shanghai Science & Technology Museum, Shanghai, China
| | - Shuai Zhan
- CAS Key Laboratory of Insect Developmental and Evolutionary Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Feng Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
| | - Sheng Li
- Guangdong Provincial Key Laboratory of Insect Development Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China; Guangmeiyuan R&D Center, Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, South China Normal University, Meizhou, China
| |
|
33
|
Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022] Open
Abstract
Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce using single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, recognizes distant homologous relationships better than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
Affiliation(s)
- Michael Heinzinger
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
| | - Maria Littmann
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Burkhard Rost
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching, Germany & TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
| |
|
34
|
Pan Z, Jin J, Xu C, Yu D. Chromosomal-level genome assembly of the springtail Tomocerus qinae (Collembola: Tomoceridae). Genome Biol Evol 2022; 14:6550138. [PMID: 35298623 PMCID: PMC8995043 DOI: 10.1093/gbe/evac039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 03/10/2022] [Indexed: 11/14/2022] Open
Abstract
The family Tomoceridae is among the earliest derived collembolan lineages and is thus of key importance in understanding the evolution of Collembola. Here, we assembled a chromosome-level genome of one tomocerid species, Tomocerus qinae, by combining Nanopore long reads and Hi-C data. The final genome size was 334.44 Mb, with scaffold/contig N50 lengths of 71.85/13.94 Mb. BUSCO assessment indicated that 96.80% of complete arthropod universal single-copy orthologs (n = 1,013) were present in the assembly. Repeat elements accounted for 26.11% (87.26 Mb) of the genome, and 494 noncoding RNAs were identified. A total of 20,451 protein-coding genes were predicted, which captured 96.0% (973) of the BUSCO genes. Gene family evolution analyses identified 4,825 expanded gene families in T. qinae; 47 of them experienced significant expansions, and these significantly expanded gene families are mainly involved in proliferation and growth. This study provides an important genomic resource for future evolutionary and comparative genomics analyses of Collembola.
Affiliation(s)
- Zhixiang Pan
- School of Life Sciences, Taizhou University, Taizhou, Zhejiang province 318000, China
| | - Jianfeng Jin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing 210095, China
| | - Cong Xu
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing 210095, China
| | - Daoyuan Yu
- Soil Ecology Lab, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing 210095, China
| |
|
35
|
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
|
36
|
Zhang D, Jin J, Niu Z, Zhang F, Orr MC, Zhou Q, Luo A, Zhu C. Chromosome-Level Genome Assembly of Anthidium xuezhongi Niu & Zhu, 2020 (Hymenoptera: Apoidea: Megachilidae: Anthidiini). Genome Biol Evol 2022; 14:6527634. [PMID: 35150256 PMCID: PMC8850706 DOI: 10.1093/gbe/evac014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2022] [Indexed: 11/23/2022] Open
Abstract
Anthidiini, a large bee tribe characterized by light-colored maculations, represents nearly 1,000 pollinator species, but no genomes are yet available for this tribe. Here, we report a chromosome-level genome assembly of Anthidium xuezhongi collected from the Tibetan Plateau. Using PacBio long reads and Hi-C data, we assembled a genome of 189.14 Mb with 99.94% of the assembly located in 16 chromosomes. Our assembly contains 23 scaffolds, with the scaffold N50 length of 12.53 Mb, and BUSCO completeness of 98.70% (n = 1,367). We masked 25.98 Mb (13.74%) of the assembly as repetitive elements, identified 385 noncoding RNAs, and predicted 10,820 protein-coding genes (99.20% BUSCO completeness). Gene family evolution analyses identified 9,251 gene families, of which 31 gene families experienced rapid evolution. Interspecific chromosomal variation among A. xuezhongi, Bombus terrestris, and Apis mellifera showed strong chromosomal syntenic relationships. This high-quality genome assembly is a valuable resource for evolutionary and comparative genomic analyses of bees.
Affiliation(s)
- Dan Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China.,College of Biological Sciences, University of Chinese Academy of Sciences, No.19A Yuquan Road, Shijingshan District, Beijing, 10049, P. R., China
| | - Jianfeng Jin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, P. R. China
| | - Zeqing Niu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China
| | - Feng Zhang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, P. R. China
| | - Michael C Orr
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China
| | - Qingsong Zhou
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China
| | - Arong Luo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China.,International College, University of Chinese Academy of Sciences, No.19A Yuquan Road, Shijingshan District, Beijing, 10049, P. R., China
| | - Chaodong Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China.,College of Biological Sciences, University of Chinese Academy of Sciences, No.19A Yuquan Road, Shijingshan District, Beijing, 10049, P. R., China.,State Key Laboratory of Integrated Pest Management, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, P. R., China
| |
|
37
|
Anand N, Eguchi R, Mathews II, Perez CP, Derry A, Altman RB, Huang PS. Protein sequence design with a learned potential. Nat Commun 2022; 13:746. [PMID: 35136054 PMCID: PMC8826426 DOI: 10.1038/s41467-022-28313-9] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Received: 08/14/2021] [Accepted: 01/08/2022] [Indexed: 11/08/2022] Open
Abstract
The task of protein sequence design is central to nearly all rational protein engineering problems, and enormous effort has gone into the development of energy functions to guide design. Here, we investigate the capability of a deep neural network model to automate design of sequences onto protein backbones, having learned directly from crystal structure data and without any human-specified priors. The model generalizes to native topologies not seen during training, producing experimentally stable designs. We evaluate the generalizability of our method to a de novo TIM-barrel scaffold. The model produces novel sequences, and high-resolution crystal structures of two designs show excellent agreement with in silico models. Our findings demonstrate the tractability of an entirely learned method for protein sequence design.
Affiliation(s)
- Namrata Anand
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Raphael Eguchi
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Irimpan I Mathews
- Stanford Synchrotron Radiation Lightsource, Menlo Park, CA, 94025, USA
| | - Carla P Perez
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Alexander Derry
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA
| | - Russ B Altman
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Departments of Genetics and Medicine, Stanford University, Stanford, CA, USA
| | - Po-Ssu Huang
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
| |
|
38
|
Mascotti ML. Resurrecting Enzymes by Ancestral Sequence Reconstruction. Methods in Molecular Biology (Clifton, N.J.) 2022; 2397:111-136. [PMID: 34813062 DOI: 10.1007/978-1-0716-1826-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Ancestral Sequence Reconstruction (ASR) allows one to infer the sequences of extinct proteins using the phylogeny of extant proteins. It consists of disclosing the evolutionary history (i.e., the phylogeny) of a protein family of interest and then inferring the sequences of its ancestors (i.e., the nodes in the phylogeny). Assisted by gene synthesis, the selected ancestors can be resurrected in the lab and experimentally characterized. The crucial step to succeed with ASR is starting from a reliable phylogeny. At the same time, it is of the utmost importance to have a clear idea of the evolutionary history of the family under study and the events that influenced it. This allows us to implement ASR with well-defined hypotheses and to apply the appropriate experimental methods. In recent years, ASR has become popular for testing hypotheses about the origin of functionalities and changes in activities, and for understanding the physicochemical properties of proteins, among others. In this context, the aim of this chapter is to present the ASR approach applied to the reconstruction of enzymes (i.e., proteins with catalytic roles). The spirit of this contribution is to provide a basic, hands-to-work guide for biochemists and biologists who are unfamiliar with molecular phylogenetics.
Affiliation(s)
- Maria Laura Mascotti
- Molecular Enzymology group, University of Groningen, Groningen, The Netherlands.,IMIBIO-SL CONICET, Facultad de Química Bioquímica y Farmacia, Universidad Nacional de San Luis, San Luis, Argentina.
| |
|
39
|
Kandathil SM, Greener JG, Lau AM, Jones DT. Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins. Proc Natl Acad Sci U S A 2022; 119:e2113348119. [PMID: 35074909 PMCID: PMC8795500 DOI: 10.1073/pnas.2113348119] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/20/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022] Open
Abstract
Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run, and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models with accuracy comparable to or greater than that of our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale three-dimensional modeling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
| | - Joe G Greener
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
| | - Andy M Lau
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
| | - David T Jones
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
| |
|
40
|
Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022; 23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Affiliation(s)
- Venket Raghavan
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
| | - Louis Kraft
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
| | | | | |
|
41
|
Zhang J, Zhang F, Tay WT, Robin C, Shi Y, Guan F, Yang Y, Wu Y. Population genomics provides insights into lineage divergence and local adaptation within the cotton bollworm. Mol Ecol Resour 2022; 22:1875-1891. [PMID: 35007400 DOI: 10.1111/1755-0998.13581] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/06/2021] [Revised: 12/20/2021] [Accepted: 01/05/2022] [Indexed: 11/28/2022]
Abstract
The cotton bollworm Helicoverpa armigera is a cosmopolitan pest and its diverse habitats plausibly contribute to the formation of diverse lineages. Despite the significant threat it poses to economic crops worldwide, its evolutionary history and genetic basis of local adaptation are poorly understood. In this study, we de novo assembled a high-quality chromosome-level reference genome of H. a. armigera (contig N50 = 7.34 Mb), with 99.13% of the HaSCD2 assembly assigned into 31 chromosomes (Z-chromosome + 30 autosomes). We constructed an ultra-dense variation map across 14 cotton bollworm populations and identified a novel lineage in northwestern China. Historical inference showed that effective population size changes coincided with global temperature fluctuation. We identified nine differentiated genes in the three H. armigera lineages (H. a. armigera, H. a. conferta, and the new northwestern Chinese lineage), of which per and clk genes are involved in circadian rhythm. Selective sweep analyses identified a series of GO categories related to climate adaptation, feeding behavior and insecticide tolerance. Our findings reveal fundamental knowledge of the local adaptation of different cotton bollworm lineages and will guide the formulation of cotton bollworm management measures at different scales.
Affiliation(s)
- Jianpeng Zhang
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| | - Feng Zhang
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| | - Wee Tek Tay
- CSIRO Black Mountain Laboratories, Clunies Ross Street, ACT, 2601, Australia
| | - Charles Robin
- School of BioSciences, University of Melbourne, Parkville, VIC, 3010, Australia
| | - Yu Shi
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| | - Fang Guan
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| | - Yihua Yang
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| | - Yidong Wu
- College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
| |
|
42
|
Shahrear S, Afroj Zinnia M, Sany MRU, Islam ABMMK. Functional Analysis of Hypothetical Proteins of Vibrio parahaemolyticus Reveals the Presence of Virulence Factors and Growth-Related Enzymes With Therapeutic Potential. Bioinform Biol Insights 2022; 16:11779322221136002. [PMID: 36386863 PMCID: PMC9661560 DOI: 10.1177/11779322221136002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 09/30/2022] [Indexed: 11/11/2022] Open
Abstract
Vibrio parahaemolyticus, an aquatic pathogen, is a major concern in the shrimp aquaculture industry. Several strains of this pathogen are responsible for causing acute hepatopancreatic necrosis disease as well as other serious illnesses, both of which result in severe economic losses. The genome sequences of two pathogenic strains of V. parahaemolyticus, MSR16 and MSR17, isolated from Bangladesh, have been reported to gain a better understanding of their diversity and virulence. However, the prevalence of hypothetical proteins (HPs) makes it challenging to obtain a comprehensive understanding of the pathogenesis of V. parahaemolyticus. The aim of the present study is to provide a functional annotation of the HPs to elucidate their role in pathogenesis employing several in silico tools. The exploration of protein domains and families, similarity searches against proteins with known function, gene ontology enrichment, along with protein-protein interaction analysis of the HPs led to the functional assignment with a high level of confidence for 656 proteins out of a pool of 2631 proteins. The in silico approach used in this study was important for accurately assigning function to HPs and inferring interactions with proteins with previously described functions. The HPs with predicted functions were categorized into various groups such as enzymes involved in small-compound biosynthesis pathways, iron-binding proteins, antibiotic resistance proteins, and other proteins. Several proteins with potential druggability were identified among them. In addition, the HPs were investigated in search of virulence factors, which led to the identification of proteins that have the potential to be exploited as vaccine candidates. The findings of the study will be effective in gaining a better understanding of the molecular mechanisms of bacterial pathogenesis. They may also provide an insight into the process of evaluating promising targets for the development of drugs and vaccines against V. parahaemolyticus.
Affiliation(s)
- Sazzad Shahrear
- Department of Genetic Engineering and Biotechnology, University of Dhaka, Dhaka, Bangladesh
| | | | - Md. Rabi Us Sany
- Department of Genetic Engineering and Biotechnology, University of Dhaka, Dhaka, Bangladesh
| | | |
|
43
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|
44
|
Abstract
Chelonus formosanus Sonan 1932 (Hymenoptera: Braconidae) is a wasp capable of parasitizing a variety of lepidopteran pests at the “egg-larval” stage and is distributed throughout Taiwan, Guangdong, Zhejiang, and Hainan provinces of China. This wasp has been successfully used to control pests such as Spodoptera litura Fabricius, 1775, Spodoptera frugiperda (JE Smith, 1797), Spodoptera exigua (Hübner, 1808), and Helicoverpa armigera (Hübner, 1808). So far, there is only one genome assembled from the Chelonus genus [Chelonus insularis (Cresson, 1865)] and it is fragmented with 455 scaffolds. Here, we report a chromosome-level genome assembly of C. formosanus, which was sequenced using PacBio, Illumina, and Hi-C technologies. The long reads were 35.4 Gb (∼150× coverage) with an average length of 15.23 kb. The size of the genome assembly was 139.59 Mb. More than 99.46% of the assembled sequences were anchored to seven pseudochromosomes (138.84 Mb). The Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment results showed 99.0% of the 1,367 genes (insect_odb10 database) were completely present. We annotated 11,242 protein-coding genes including 98.6% of BUSCO complete genes that were recovered. Nearly one-fourth of the genome assembly (22.25%) was annotated as repetitive sequences and 324 noncoding RNAs were predicted. There were 58 gene families found with significant expansion including allelopathic families (odorant receptors and ionotropic receptors), which may play a crucial role in efficiently locating a wide range of hosts. This high-quality genome assembly and annotation could provide a highly valuable parasitic wasp resource for the biological control of lepidopteran pests.
Affiliation(s)
- Jian-Feng Liu
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang, China
- Institute of Entomology, Guizhou University; Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region; Scientific Observing and Experimental Station of Crop Pest in Guiyang, Ministry of Agriculture, Guiyang, China
| | - Hai-Yan Zhao
- College of Tobacco Science, Guizhou University, Guiyang, China
- Corresponding author
| | - Yan-Fei Song
- Institute of Entomology, Guizhou University; Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region; Scientific Observing and Experimental Station of Crop Pest in Guiyang, Ministry of Agriculture, Guiyang, China
| | - Yuan-Chan Yu
- Institute of Entomology, Guizhou University; Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region; Scientific Observing and Experimental Station of Crop Pest in Guiyang, Ministry of Agriculture, Guiyang, China
| | - Mao-Fa Yang
- Institute of Entomology, Guizhou University; Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region; Scientific Observing and Experimental Station of Crop Pest in Guiyang, Ministry of Agriculture, Guiyang, China
- College of Tobacco Science, Guizhou University, Guiyang, China
- Corresponding author
| |
|
45
|
Schlachter CR, O’Malley A, Grimes LL, Tomashek JJ, Chruszcz M, Lee LA. Purification, Characterization, and Structural Studies of a Sulfatase from Pedobacter yulinensis. Molecules 2021; 27:87. [PMID: 35011319 PMCID: PMC8746622 DOI: 10.3390/molecules27010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 12/15/2021] [Accepted: 12/21/2021] [Indexed: 11/17/2022] Open
Abstract
Sulfatases are ubiquitous enzymes that hydrolyze sulfate from sulfated organic substrates such as carbohydrates, steroids, and flavones. These enzymes can be exploited in the field of biotechnology to analyze sulfated metabolites in humans, such as steroids and drugs of abuse. Because genomic data far outstrip biochemical characterization, the analysis of sulfatases from published sequences can lead to the discovery of new and unique activities advantageous for biotechnological applications. We expressed and characterized a putative sulfatase (PyuS) from the bacterium Pedobacter yulinensis. PyuS contains the (C/S)XPXR sulfatase motif, where the Cys or Ser is post-translationally converted into a formylglycine residue (FGly). His-tagged PyuS was co-expressed in Escherichia coli with a formylglycine-generating enzyme (FGE) from Mycobacterium tuberculosis and purified. We obtained several crystal structures of PyuS, and the FGly modification was detected at the active site. The enzyme has sulfatase activity on aromatic sulfated substrates as well as phosphatase activity on some aromatic phosphates; however, PyuS did not have detectable activity on 17α-estradiol sulfate, cortisol 21-sulfate, or boldenone sulfate.
Affiliation(s)
- Caleb R. Schlachter
- Integrated Micro-Chromatography Systems, 110 Centrum Drive, Irmo, SC 29063, USA
| | - Andrea O’Malley
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC 29208, USA
| | - Linda L. Grimes
- Integrated Micro-Chromatography Systems, 110 Centrum Drive, Irmo, SC 29063, USA
| | - John J. Tomashek
- Integrated Micro-Chromatography Systems, 110 Centrum Drive, Irmo, SC 29063, USA
| | - Maksymilian Chruszcz
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC 29208, USA
| | - L. Andrew Lee
- Integrated Micro-Chromatography Systems, 110 Centrum Drive, Irmo, SC 29063, USA
| |
|
46
|
Behzadi P, Gajdács M. Worldwide Protein Data Bank (wwPDB): A virtual treasure for research in biotechnology. Eur J Microbiol Immunol (Bp) 2021; 11:77-86. [PMID: 34908533 PMCID: PMC8830413 DOI: 10.1556/1886.2021.00020] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/06/2021] [Accepted: 11/23/2021] [Indexed: 12/25/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides a wide range of digital data regarding biology and biomedicine. This huge internet resource holds a wide range of important biological data, obtained from experiments around the globe by different scientists. The Worldwide Protein Data Bank (wwPDB) represents a brilliant collection of 3D structure data associated with important and vital biomolecules including nucleic acids (RNAs and DNAs) and proteins. Moreover, this database accumulates knowledge regarding the function and evolution of biomacromolecules, which supports different disciplines such as biotechnology. The 3D structure, functional characteristics and phylogenetic properties of biomacromolecules give a deep understanding of the biomolecules' characteristics. An important advantage of the wwPDB database is that its data are updated every week. This updating process helps users to have the newest data and information for their projects. The data and information in wwPDB can greatly support accurate visualization and illustration of biomacromolecules in biotechnology. As demonstrated by the SARS-CoV-2 pandemic, rapidly accessible and reliable biological data for microbiology, immunology, vaccinology, and drug development are critical to address many healthcare-related challenges that are facing humanity. The aim of this paper is to introduce the readers to wwPDB, and to highlight the importance of this database in biotechnology, with the expectation that the number of scientists interested in the utilization of Protein Data Bank's resources will increase substantially in the coming years.
Affiliation(s)
- Payam Behzadi
- Department of Microbiology, College of Basic Sciences, Shahr-e-Qods Branch, Islamic Azad University, Tehran, 37541-374, Iran
- Márió Gajdács
- Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, 6720, Szeged, Hungary. Corresponding author. Tel.: +36-62-342-532.
47
Yan B, Yu X, Dai R, Li Z, Yang M. Chromosome-Level Genome Assembly of Nephotettix cincticeps (Uhler, 1896) (Hemiptera: Cicadellidae: Deltocephalinae). Genome Biol Evol 2021; 13:evab236. [PMID: 34677607 PMCID: PMC8598198 DOI: 10.1093/gbe/evab236] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 10/14/2021] [Indexed: 12/22/2022] Open
Abstract
The green rice leafhopper, Nephotettix cincticeps (Uhler), is an important rice pest and a vector of the rice dwarf virus in Asia. Here, we produced a high-quality chromosome-level genome assembly of 753.23 Mb using PacBio (∼110×) and Hi-C (∼94×) data. It contained 163 scaffolds and 950 contigs, with scaffold and contig N50 lengths of 85.36 Mb and 2.57 Mb, respectively. A total of 731.19 Mb (97.07%) of the assembly was anchored into eight pseudochromosomes. Genome completeness reached 97.0% according to the insect reference Benchmarking Universal Single-Copy Orthologs (BUSCO) gene set (n = 1,367). We masked 347.10 Mb (46.08%) of the genome as repetitive elements. We identified 962 noncoding RNAs and predicted 14,337 protein-coding genes. We also assigned GO term and KEGG pathway annotations to 10,049 and 9,251 genes, respectively. Significantly expanded gene families were primarily involved in immunity, cuticle, digestion, detoxification, and embryonic development. This study provides a crucial genomic resource for better understanding the biology and evolution of the family Cicadellidae.
Affiliation(s)
- Bin Yan
- Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Institute of Entomology, Guizhou University, Guiyang, China
- Xiaofei Yu
- College of Tobacco Science, Guizhou University, Guiyang, China
- Renhuai Dai
- Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Institute of Entomology, Guizhou University, Guiyang, China
- Zizhong Li
- Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Institute of Entomology, Guizhou University, Guiyang, China
- Maofa Yang
- Guizhou Provincial Key Laboratory for Agricultural Pest Management of the Mountainous Region, Institute of Entomology, Guizhou University, Guiyang, China
- College of Tobacco Science, Guizhou University, Guiyang, China
48
Tian X, Su X, Li C, Zhou Y, Li S, Guo J, Fan Q, Lü S, Zhang Y. Draft genome of the blister beetle, Epicauta chinensis. Int J Biol Macromol 2021; 193:1694-1706. [PMID: 34742848 DOI: 10.1016/j.ijbiomac.2021.11.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/13/2021] [Revised: 10/31/2021] [Accepted: 11/01/2021] [Indexed: 01/07/2023]
Abstract
The presence of cantharidin (CTD) in blister beetles is a significant ecological adaptive mechanism that defends against predators and regulates courtship and mating behaviors. To better understand CTD biosynthesis as well as its biology and pharmacology, we assembled a genome of 151.88 Mb for Epicauta chinensis using PacBio sequencing technology. Gene annotation yielded 249,238 repeats, 527 non-coding RNAs, and 12,520 protein-coding genes. Compared with 11 other insects, most expanded gene families in E. chinensis are likely associated with environmental adaptation, such as chemoreception, immunity, and detoxification. We further annotated P450s and immune-related genes, identifying a total of 117 putative P450s (7 CYP2, 67 CYP3, 36 CYP4, and 7 mitochondrial P450s) and 281 immune-related genes. Comparative analysis of the insect immune repertoires indicated the presence of immune genes, such as MD2-like, that were detected only in coleopteran insects. This suggests lineage-specific gene evolution in coleopteran insects. Based on the gene family evolution analysis, we identified two probable candidate genes for CTD biosynthesis, CYP4TT1 and phytanoyl-CoA dioxygenase. The high-quality reference genome of E. chinensis provides the genetic basis for further investigation of CTD biosynthesis and in-depth studies of the development and evolution of blister beetles.
Affiliation(s)
- Xing Tian
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xinxin Su
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chenjing Li
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yifei Zhou
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shuying Li
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jiamin Guo
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Qiqi Fan
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shumin Lü
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yalin Zhang
- Key Laboratory of Plant Protection Resources & Pest Management of the Ministry of Education, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
49
Genome-Wide Identification and Characterization of Polygalacturonase Gene Family in Maize (Zea mays L.). Int J Mol Sci 2021; 22:ijms221910722. [PMID: 34639068 PMCID: PMC8509529 DOI: 10.3390/ijms221910722] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/04/2021] [Revised: 09/27/2021] [Accepted: 09/29/2021] [Indexed: 11/29/2022] Open
Abstract
Polygalacturonase (PG, EC 3.2.1.15) is a crucial enzyme for pectin degradation and is involved in various developmental processes such as fruit ripening, pollen development, cell expansion, and organ abscission. However, information on the PG gene family in the maize (Zea mays L.) genome and on the specific members involved in maize anther development is still lacking. In this study, we identified 55 PG family genes in the maize genome and characterized their evolutionary relationships and expression patterns. Phylogenetic analysis revealed that ZmPGs group into six clades, and gene structures within the same clade are highly conserved, suggesting functional conservation. The ZmPGs are randomly distributed across maize chromosomes, and collinearity analysis showed that many ZmPGs might be derived from tandem and segmental duplications and that these genes are under purifying selection. Furthermore, gene expression analysis provided insights into possible functional divergence among ZmPGs. Based on RNA-seq data analysis, we found that many ZmPGs are expressed in various tissues, while 18 ZmPGs are highly expressed in maize anther; their detailed expression profiles at different anther developmental stages were further investigated using RT-qPCR analysis. These results provide valuable information for further functional characterization and application of the ZmPGs in maize.
50
Wang Y, Zhang R, Wang M, Zhang L, Shi CM, Li J, Fan F, Geng S, Liu X, Yang D. The first chromosome-level genome assembly of a green lacewing Chrysopa pallens and its implication for biological control. Mol Ecol Resour 2021; 22:755-767. [PMID: 34549894 PMCID: PMC9292380 DOI: 10.1111/1755-0998.13503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/30/2021] [Revised: 09/04/2021] [Accepted: 09/09/2021] [Indexed: 12/13/2022]
Abstract
Many lacewing species (Insecta: Neuroptera) are important predators of pests with great potential in biological control. So far, no chromosome-level genome has been published for Neuroptera. Here we report a high-quality chromosome-level reference genome for the green lacewing Chrysopa pallens (Neuroptera: Chrysopidae), one of the most important insect natural enemies used in pest biocontrol. The genome was sequenced using a combination of PacBio and Hi-C technologies and assembled into seven chromosomes with a total size of 517.21 Mb, accounting for 96.07% of the genome sequence. A total of 12,840 protein-coding genes were identified, and approximately 206.21 Mb of repeated sequences were annotated. Phylogenetic analyses indicated that C. pallens diverged from its common ancestor with Tribolium castaneum (Coleoptera) approximately 300 million years ago. The gene families involved in digestion, detoxification, chemoreception, carbohydrate metabolism, immunity, nerves, and development were significantly expanded, revealing the potential genomic basis for the polyphagia of C. pallens and its role as an excellent biocontrol agent. This high-quality genome of C. pallens will provide an important genomic resource for future population genetic, evolutionary, and phylogenetic investigations of Chrysopidae, as well as comparative genomic studies of Neuropterida and other insects.
Affiliation(s)
- Yuyu Wang
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Ruyue Zhang
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Mengqing Wang
- Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Lisheng Zhang
- Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Cheng-Min Shi
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Jing Li
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Fan Fan
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Shuo Geng
- College of Plant Protection, Hebei Agricultural University, Baoding, China
- Xingyue Liu
- Department of Entomology, China Agricultural University, Beijing, China
- Ding Yang
- Department of Entomology, China Agricultural University, Beijing, China