1
Waman VP, Bordin N, Alcraft R, Vickerstaff R, Rauer C, Chan Q, Sillitoe I, Yamamori H, Orengo C. CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds. J Mol Biol 2024:168551. [PMID: 38548261 DOI: 10.1016/j.jmb.2024.168551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Rachel Alcraft
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Robert Vickerstaff
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Qian Chan
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Hazuki Yamamori
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom.
2
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
Affiliation(s)
- Balasubramanian Harihar
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India.
3
Johnson SR, Peshwa M, Sun Z. Sensitive remote homology search by local alignment of small positional embeddings from protein language models. eLife 2024; 12:RP91415. [PMID: 38488154 PMCID: PMC10942778 DOI: 10.7554/elife.91415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024] Open
Abstract
Accurately detecting distant evolutionary relationships between proteins remains an ongoing challenge in bioinformatics. Search methods based on primary sequence struggle to accurately detect homology between sequences with less than 20% amino acid identity. Profile- and structure-based strategies extend sensitive search capabilities into this twilight zone of sequence similarity but require slow pre-processing steps. Recently, whole-protein and positional embeddings from deep neural networks have shown promise for providing sensitive sequence comparison and annotation at long evolutionary distances. Embeddings are generally faster to compute than profiles and predicted structures but still suffer several drawbacks related to the ability of whole-protein embeddings to discriminate domain-level homology, and the database size and search speed of methods using positional embeddings. In this work, we show that low-dimensionality positional embeddings can be used directly in speed-optimized local search algorithms. As a proof of concept, we use the ESM2 3B model to convert primary sequences directly into the 3D interaction (3Di) alphabet or amino acid profiles and use these embeddings as input to the highly optimized Foldseek, HMMER3, and HH-suite search algorithms. Our results suggest that positional embeddings as small as a single byte can provide sufficient information for dramatically improved sensitivity over amino acid sequence searches without sacrificing search speed.
Affiliation(s)
- Zhiyi Sun
  - New England Biolabs Inc, Ipswich, United States
4
Ahuja N, Cao X, Schultz DT, Picciani N, Lord A, Shao S, Jia K, Burdick DR, Haddock SHD, Li Y, Dunn CW. Giants among Cnidaria: Large Nuclear Genomes and Rearranged Mitochondrial Genomes in Siphonophores. Genome Biol Evol 2024; 16:evae048. [PMID: 38502059 PMCID: PMC10980510 DOI: 10.1093/gbe/evae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 02/20/2024] [Accepted: 03/07/2024] [Indexed: 03/20/2024] Open
Abstract
Siphonophores (Cnidaria: Hydrozoa) are abundant predators found throughout the ocean and are important constituents of the global zooplankton community. They range in length from a few centimeters to tens of meters. They are gelatinous, fragile, and difficult to collect, so many aspects of the biology of these roughly 200 species remain poorly understood. To survey siphonophore genome diversity, we performed Illumina sequencing of 32 species sampled broadly across the phylogeny. Sequencing depth was sufficient to estimate nuclear genome size from k-mer spectra in six specimens, ranging from 0.7 to 2.3 Gb, with heterozygosity estimates between 0.69% and 2.32%. Incremental k-mer counting indicates k-mer peaks can be absent with nearly 20× read coverage, suggesting minimum genome sizes range from 1.4 to 5.6 Gb in the 25 samples without peaks in the k-mer spectra. This work confirms most siphonophore nuclear genomes are large relative to the genomes of other cnidarians, but also identifies several with reduced size that are tractable targets for future siphonophore nuclear genome assembly projects. We also assembled complete mitochondrial genomes for 33 specimens from these new data, indicating a conserved gene order shared among nonsiphonophore hydrozoans, Cystonectae, and some Physonectae, revealing the ancestral mitochondrial gene order of siphonophores. Our results also suggest extensive rearrangement of mitochondrial genomes within other Physonectae and in Calycophorae. Though siphonophores comprise a small fraction of cnidarian species, this survey greatly expands our understanding of cnidarian genome diversity. This study further illustrates both the importance of deep phylogenetic sampling and the utility of k-mer-based genome skimming in understanding the genomic diversity of a clade.
Affiliation(s)
- Namrata Ahuja
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Xuwen Cao
  - Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Darrin T Schultz
  - Department of Neuroscience and Developmental Biology, University of Vienna, Vienna 1010, Austria
- Natasha Picciani
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Arianna Lord
  - Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Shengyuan Shao
  - Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Kejue Jia
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
- Yuanning Li
  - Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Casey W Dunn
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
5
Rosen Y, Brbić M, Roohani Y, Swanson K, Li Z, Leskovec J. Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN. Nat Methods 2024:10.1038/s41592-024-02191-z. [PMID: 38366243 DOI: 10.1038/s41592-024-02191-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/22/2024] [Indexed: 02/18/2024]
Abstract
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.
Affiliation(s)
- Yanay Rosen
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Maria Brbić
  - School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Yusuf Roohani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Kyle Swanson
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziang Li
  - Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA.
6
Hannon Bozorgmehr J. Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences. Mol Genet Genomics 2024; 299:6. [PMID: 38315248 DOI: 10.1007/s00438-023-02090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 10/15/2023] [Indexed: 02/07/2024]
Abstract
Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes, lacking coding orthologs, are now produced every year. However, many are likely cases of duplicates that are difficult to recognize. Here, I re-examine the claims and show that four very well-known examples of genes alleged to have emerged completely "from scratch" (FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish) may have plausible evolutionary ancestors in pre-existing genes. The first two are likely highly diverged retrogenes coding for regulatory proteins that have been misidentified as orphans. The antifreeze glycoprotein, moreover, may not have evolved from repetitive non-genic sequences but, as in several other related cases, from an apolipoprotein that could have become pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show there has been a tendency not to invest the necessary effort in searching for homologs outside of a very limited syntenic or phylostratigraphic methodology. A robust approach is used for improving detection that draws upon similarities, not just in terms of statistical sequence analysis, but also relating to biochemistry and function, to obviate notable failures to identify homologs.
7
Jaito N, Kaewsawat N, Phetlum S, Uengwetwanit T. Metagenomic discovery of lipases with predicted structural similarity to Candida antarctica lipase B. PLoS One 2023; 18:e0295397. [PMID: 38055755 PMCID: PMC10699602 DOI: 10.1371/journal.pone.0295397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023] Open
Abstract
Here we employed sequence-based and structure-based screening to prospect for lipases that are structural homologs of Candida antarctica lipase B (CalB). CalB, a widely used biocatalyst, was used as the structural template because of its enzymatic properties. Structural homologs could aid in the discovery of novel wild-type enzymes with desirable features and serve as scaffolds for further biocatalyst design. Available metagenomic data from various environments were leveraged as a source for bioprospecting. We identified two bacterial lipases that showed high structural similarity to CalB with <40% sequence identity. The enzymes were partially purified, and the enzymatic characteristics of the two candidate lipases were examined in comparison to CalB. The first candidate exhibited an optimal pH of 8 and an optimal temperature of 50°C, similar to CalB. The second lipase candidate demonstrated an optimal pH of 8 and a higher optimal temperature of 55°C. Notably, this candidate sustained considerable activity at extreme conditions, maintaining high activity at 70°C or pH 9, contrasting with the diminished activity of CalB under similar conditions. Further comprehensive experimentation is warranted to uncover and exploit these novel enzymatic properties for practical biotechnological purposes.
Affiliation(s)
- Nongluck Jaito
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Nattha Kaewsawat
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Suthathip Phetlum
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Tanaporn Uengwetwanit
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
8
Nussinov R, Liu Y, Zhang W, Jang H. Cell phenotypes can be predicted from propensities of protein conformations. Curr Opin Struct Biol 2023; 83:102722. [PMID: 37871498 PMCID: PMC10841533 DOI: 10.1016/j.sbi.2023.102722] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/25/2023]
Abstract
Proteins exist as dynamic conformational ensembles. Here we suggest that the propensities of the conformations can be predictors of cell function. The conformational states that the molecules preferentially visit can be viewed as phenotypic determinants, and their mutations work by altering the relative propensities, and thus the cell phenotype. Our examples include (i) inactive state variants harboring cancer driver mutations that present active state-like conformational features, as in K-Ras4B G12V compared to other K-Ras4B G12X mutations; (ii) mutants of the same protein presenting vastly different phenotypic and clinical profiles: cancer and neurodevelopmental disorders; (iii) alterations in the occupancies of the conformational (sub)states influencing enzyme reactivity. Thus, protein conformational propensities can determine cell fate. They can also suggest the efficiency of allosteric drugs.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA.
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Wengang Zhang
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
9
Kulikova AV, Parker JK, Davies BW, Wilke CO. Semantic search using protein large language models detects class II microcins in bacterial genomes. bioRxiv 2023:2023.11.15.567263. [PMID: 38014091 PMCID: PMC10680697 DOI: 10.1101/2023.11.15.567263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Class II microcins are antimicrobial peptides that have shown some potential as novel antibiotics. However, to date only ten class II microcins have been described, and discovery of novel microcins has been hampered by their short length and high sequence divergence. Here, we ask if we can use numerical embeddings generated by protein large language models to detect microcins in bacterial genome assemblies and whether this method can outperform sequence-based methods such as BLAST. We find that embeddings detect known class II microcins much more reliably than does BLAST and that any two microcins tend to have a small distance in embedding space even though they typically are highly diverged at the sequence level. In datasets of Escherichia coli, Klebsiella spp., and Enterobacter spp. genomes, we further find novel putative microcins that were previously missed by sequence-based search methods.
10
Al-Ayari EA, Shehata MG, El-Hadidi M, Shaalan MG. In silico SNP prediction of selected protein orthologues in insect models for Alzheimer's, Parkinson's, and Huntington's diseases. Sci Rep 2023; 13:18986. [PMID: 37923901 PMCID: PMC10624829 DOI: 10.1038/s41598-023-46250-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 10/30/2023] [Indexed: 11/06/2023] Open
Abstract
Alzheimer's, Parkinson's, and Huntington's are the most common neurodegenerative diseases; they are incurable and affect the elderly population. Discovery of effective treatments for these diseases is often difficult, expensive, and serendipitous. Previous comparative studies on different model organisms have revealed that most animals share similar cellular and molecular characteristics. The meta-SNP tool, which integrates four different tools (SIFT, PANTHER, SNAP, and PhD-SNP), was used to identify non-synonymous single nucleotide polymorphisms (nsSNPs). Prediction of nsSNPs was conducted on three representative proteins for Alzheimer's, Parkinson's, and Huntington's diseases: APPl in Drosophila melanogaster, LRRK1 in Aedes aegypti, and VCPl in Tribolium castaneum, with a view to using insect models to investigate neurodegenerative diseases. We conclude from the protein comparative analysis between different insect models and nsSNP analyses that D. melanogaster is the best model for Alzheimer's, representing five nsSNPs of the 21 suggested mutations in the APPl protein. Aedes aegypti is the best model for Parkinson's, representing three nsSNPs in the LRRK1 protein. Tribolium castaneum is the best model for Huntington's disease, representing 13 SNPs of 37 suggested mutations in the VCPl protein. This study aimed to improve human neural health by identifying the best insect to model Alzheimer's, Parkinson's, and Huntington's.
Affiliation(s)
- Eshraka A Al-Ayari
  - Entomology Department, Faculty of Science, Ain Shams University, Cairo, Egypt.
- Magdi G Shehata
  - Entomology Department, Faculty of Science, Ain Shams University, Cairo, Egypt
- Mohamed El-Hadidi
  - Bioinformatics Group, Center for Informatics Sciences (CIS), School of Information Technology and Computer Science (ITCS), Nile University, Giza, Egypt
- Mona G Shaalan
  - Entomology Department, Faculty of Science, Ain Shams University, Cairo, Egypt
11
Kilinc M, Jia K, Jernigan RL. JSONWP: a static website generator for protein bioinformatics research. Bioinformatics Advances 2023; 3:vbad154. [PMID: 37904893 PMCID: PMC10613403 DOI: 10.1093/bioadv/vbad154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 10/24/2023] [Indexed: 11/01/2023]
Abstract
Motivation: Presenting the integrated results of bioinformatics research can be challenging and requires sophisticated visualization components, which can be time-consuming to develop. This article presents a new way to effectively communicate research findings. Results: We have developed a static web page generator, JSONWP, which is specifically designed for protein bioinformatics research. Utilizing React (a JavaScript library used to build interactive and dynamic user interfaces for web applications), we have integrated publicly available bioinformatics visualization components to provide standardized access to these components. JSON (JavaScript Object Notation), a lightweight textual data format often used to structure and exchange information between different software tools, is used as the input source due to its ability to represent nearly all types of data using key and value pairs. This allows researchers to use their preferred programming language to create a JSON representation, which can then be converted into a website by JSONWP. No server or domain is required to host the website, as only the publicly accessible JSON file is required. Conclusions: Overall, JSONWP provides a useful new tool for bioinformatics researchers to effectively communicate their findings. The open-source implementation is located at https://github.com/MesihK/react-json-wpbuilder, and the tool can be used at jsonwp.onrender.com.
Affiliation(s)
- Mesih Kilinc
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, United States
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, United States
- Robert L Jernigan
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, United States
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, United States
12
Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. Frontiers in Bioinformatics 2023; 3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023] Open
Abstract
Understanding protein sequences and how they relate to the functions of proteins is extremely important. One of the most basic operations in bioinformatics is sequence alignment. Usually the first thing learned from an alignment is which positions are the most conserved, and these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method from Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improvements in alignments for function, even for a disordered protein.
Affiliation(s)
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Mesih Kilinc
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Robert L. Jernigan
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
13
Harnagel AP, Sheshova M, Zheng M, Zheng M, Skorupinska-Tudek K, Swiezewska E, Lupoli TJ. Preference of Bacterial Rhamnosyltransferases for 6-Deoxysugars Reveals a Strategy To Deplete O-Antigens. J Am Chem Soc 2023. [PMID: 37437030 PMCID: PMC10375533 DOI: 10.1021/jacs.3c03005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2023]
Abstract
Bacteria synthesize hundreds of bacteria-specific or "rare" sugars that are absent in mammalian cells and enriched in 6-deoxy monosaccharides such as l-rhamnose (l-Rha). Across bacteria, l-Rha is incorporated into glycans by rhamnosyltransferases (RTs) that couple nucleotide sugar substrates (donors) to target biomolecules (acceptors). Since l-Rha is required for the biosynthesis of bacterial glycans involved in survival or host infection, RTs represent potential antibiotic or antivirulence targets. However, purified RTs and their unique bacterial sugar substrates have been difficult to obtain. Here, we use synthetic nucleotide rare sugar and glycolipid analogs to examine substrate recognition by three RTs that produce cell envelope components in diverse species, including a known pathogen. We find that bacterial RTs prefer pyrimidine nucleotide-linked 6-deoxysugars, not those containing a C6-hydroxyl, as donors. While glycolipid acceptors must contain a lipid, isoprenoid chain length and stereochemistry can vary. Based on these observations, we demonstrate that a 6-deoxysugar transition state analog inhibits an RT in vitro and reduces levels of RT-dependent O-antigen polysaccharides in Gram-negative cells. As O-antigens are virulence factors, bacteria-specific sugar transferase inhibition represents a novel strategy to prevent bacterial infections.
Affiliation(s)
- Alexa P Harnagel
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Mia Sheshova
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Meng Zheng
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Maggie Zheng
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Ewa Swiezewska
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, 02-106, Poland
- Tania J Lupoli
  - Department of Chemistry, New York University, New York, New York 10003, United States
14
Stok C, Tsaridou S, van den Tempel N, Everts M, Wierenga E, Bakker FJ, Kok Y, Alves IT, Jae LT, Raas MWD, Huis In 't Veld PJ, de Boer HR, Bhattacharya A, Karanika E, Warner H, Chen M, van de Kooij B, Dessapt J, Ter Morsche L, Perepelkina P, Fradet-Turcotte A, Guryev V, Tromer EC, Chan KL, Fehrmann RSN, van Vugt MATM. FIRRM/C1orf112 is synthetic lethal with PICH and mediates RAD51 dynamics. Cell Rep 2023; 42:112668. [PMID: 37347663 DOI: 10.1016/j.celrep.2023.112668] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/11/2022] [Revised: 04/21/2023] [Accepted: 06/05/2023] [Indexed: 06/24/2023] Open
Abstract
Joint DNA molecules are natural byproducts of DNA replication and repair. Persistent joint molecules give rise to ultrafine DNA bridges (UFBs) in mitosis, compromising sister chromatid separation. The DNA translocase PICH (ERCC6L) has a central role in UFB resolution. A genome-wide loss-of-function screen is performed to identify the genetic context of PICH dependency. In addition to genes involved in DNA condensation, centromere stability, and DNA-damage repair, we identify FIGNL1-interacting regulator of recombination and mitosis (FIRRM), formerly known as C1orf112. We find that FIRRM interacts with and stabilizes the AAA+ ATPase FIGNL1. Inactivation of either FIRRM or FIGNL1 results in UFB formation, prolonged accumulation of RAD51 at nuclear foci, and impaired replication fork dynamics and consequently impairs genome maintenance. Combined, our data suggest that inactivation of FIRRM and FIGNL1 dysregulates RAD51 dynamics at replication forks, resulting in persistent DNA lesions and a dependency on PICH to preserve cell viability.
Affiliation(s)
- Colin Stok: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Stavroula Tsaridou: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Nathalie van den Tempel: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Marieke Everts: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Elles Wierenga: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Femke J Bakker: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Yannick Kok: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Inês Teles Alves: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Lucas T Jae: Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
- Maximilian W D Raas: Oncode Institute, Hubrecht Institute, Royal Academy of Arts and Sciences, Uppsalalaan 8, 3584CT Utrecht, the Netherlands; Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
- Pim J Huis In 't Veld: Department of Mechanistic Cell Biology, Max Planck Institute of Molecular Physiology, 44227 Dortmund, Germany
- H Rudolf de Boer: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Arkajyoti Bhattacharya: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Eleftheria Karanika: Genome Damage and Stability Centre, University of Sussex, Brighton BN1 9RQ, UK
- Harry Warner: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Mengting Chen: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Bert van de Kooij: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Julien Dessapt: CHU de Québec Research Center-Université Laval (L'Hôtel-Dieu de Québec), Cancer Research Center, Université Laval, Québec, QC GIR 3S3, Canada
- Lars Ter Morsche: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Polina Perepelkina: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Amelie Fradet-Turcotte: CHU de Québec Research Center-Université Laval (L'Hôtel-Dieu de Québec), Cancer Research Center, Université Laval, Québec, QC GIR 3S3, Canada
- Victor Guryev: European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Eelco C Tromer: Cell Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Faculty of Science and Engineering, University of Groningen, Nijenborgh 7, 9747 AG Groningen, the Netherlands
- Kok-Lung Chan: Genome Damage and Stability Centre, University of Sussex, Brighton BN1 9RQ, UK
- Rudolf S N Fehrmann: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
- Marcel A T M van Vugt: Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713GZ Groningen, the Netherlands
|