1. Viner C, Ishak CA, Johnson J, Walker NJ, Shi H, Sjöberg-Herrera MK, Shen SY, Lardo SM, Adams DJ, Ferguson-Smith AC, De Carvalho DD, Hainer SJ, Bailey TL, Hoffman MM. Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet. Genome Biol 2024; 25:11. [PMID: 38191487 PMCID: PMC10773111 DOI: 10.1186/s13059-023-03070-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 09/21/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
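The idea of scoring sites with a position weight matrix over an expanded alphabet can be sketched as follows. This is a minimal illustration, not the authors' implementation: the symbols `m` (5-methylcytosine) and `h` (5-hydroxymethylcytosine), the toy motif, the uniform background, and all function names are assumptions made for the example.

```python
import math

# Assumed expanded alphabet: the four standard bases plus two cytosine
# modifications (m = 5-methylcytosine, h = 5-hydroxymethylcytosine).
ALPHABET = "ACGTmh"


def log_odds_pwm(freqs, background, pseudocount=0.01):
    """Turn per-position symbol frequencies into log2-odds scores."""
    pwm = []
    for column in freqs:
        total = sum(column.get(s, 0.0) + pseudocount for s in ALPHABET)
        pwm.append({
            s: math.log2(((column.get(s, 0.0) + pseudocount) / total) / background[s])
            for s in ALPHABET
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds of each symbol in a site as wide as the motif."""
    return sum(column[symbol] for column, symbol in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical three-position motif that prefers a methylated cytosine.
toy_freqs = [{"T": 1.0}, {"m": 0.9, "C": 0.1}, {"G": 1.0}]
# Uniform background for illustration only; real modified bases are much rarer.
toy_background = {s: 1.0 / len(ALPHABET) for s in ALPHABET}
toy_pwm = log_odds_pwm(toy_freqs, toy_background)
```

With this toy matrix, a site containing `m` at the second position scores higher than the same site with an unmodified `C`, which is the kind of modification sensitivity the abstract describes.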
Affiliation(s)
- Coby Viner
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Charles A Ishak
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- James Johnson
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Nicolas J Walker
  - Department of Genetics, University of Cambridge, Cambridge, England
- Hui Shi
  - Department of Genetics, University of Cambridge, Cambridge, England
- Marcela K Sjöberg-Herrera
  - Wellcome Sanger Institute, Cambridge, England
  - Faculty of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Shu Yi Shen
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Santana M Lardo
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel D De Carvalho
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Sarah J Hainer
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Timothy L Bailey
  - Department of Pharmacology, University of Nevada, Reno, Reno, NV, USA
- Michael M Hoffman
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
2. Zhu HT, Xia YH, Zhang GJ. E2EDA: Protein Domain Assembly Based on End-to-End Deep Learning. J Chem Inf Model 2023; 63:6451-6461. [PMID: 37788318 DOI: 10.1021/acs.jcim.3c01387] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/05/2023]
Abstract
With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.
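The assembly step described above, turning a predicted rigid motion into a spatial transformation of one domain, can be illustrated with a small sketch. The abstract does not say how RMNet encodes a rigid motion, so the rotation-matrix-plus-translation form used here, and every function name, is an assumption for illustration only.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z axis (illustrative helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid_motion(coords, rotation, translation):
    """Return R x + t for every (x, y, z) point in coords."""
    moved = []
    for point in coords:
        rotated = [
            sum(rotation[i][j] * point[j] for j in range(3)) for i in range(3)
        ]
        moved.append(tuple(r + t for r, t in zip(rotated, translation)))
    return moved


def assemble(domain_a, domain_b, rotation, translation):
    """Keep domain A fixed and place domain B using the given rigid motion."""
    return list(domain_a) + apply_rigid_motion(domain_b, rotation, translation)
```

Because a rigid motion preserves all distances within the moved domain, only the relative orientation of the two domains changes, which is the quantity the abstract says E2EDA predicts.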
Affiliation(s)
- Hai-Tao Zhu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
3. Nevers Y, Glover NM, Dessimoz C, Lecompte O. Protein length distribution is remarkably uniform across the tree of life. Genome Biol 2023; 24:135. [PMID: 37291671 PMCID: PMC10251718 DOI: 10.1186/s13059-023-02973-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 05/16/2023] [Indexed: 06/10/2023] Open
Abstract
BACKGROUND In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied. RESULTS Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller. CONCLUSIONS These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.
Affiliation(s)
- Yannis Nevers
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Natasha M Glover
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, UK
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Odile Lecompte
  - Department of Computer Science, Centre de Recherche en Biomédecine de Strasbourg, ICube, UMR 7357, University of Strasbourg, CNRS, Strasbourg, France
4. Mohammed Y, Goodlett D, Borchers CH. Bioinformatics Tools and Knowledgebases to Assist Generating Targeted Assays for Plasma Proteomics. Methods Mol Biol 2023; 2628:557-577. [PMID: 36781806 DOI: 10.1007/978-1-0716-2978-9_32] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/15/2023]
Abstract
In targeted proteomics experiments, selecting the appropriate proteotypic peptides as surrogate for the target protein is a crucial pre-acquisition step. This step is largely a bioinformatics exercise that involves integrating information on the peptides and proteins and using various software tools and knowledgebases. We present here a few resources that automate and simplify the selection process to a great degree. These tools and knowledgebases were developed primarily to streamline targeted proteomics assay development and include PeptidePicker, PeptidePickerDB, MRMAssayDB, MouseQuaPro, and PeptideTracker. We have used these tools to develop and document thousands of targeted proteomics assays, many of them for plasma proteins with focus on human and mouse. An important aspect in all these resources is the integrative approach on which they are based. Using these tools in the first steps of designing a singleplexed or multiplexed targeted proteomic experiment can reduce the necessary experimental steps tremendously. All the tools and knowledgebases we describe here are Web-based and freely accessible so scientists can query the information conveniently from the browser. This chapter provides an overview of these software tools and knowledgebases, their content, and how to use them for targeted plasma proteomics. We further demonstrate how to use them with the results of the HUPO Human Plasma Proteome Project to produce a new database of 3.8 k targeted assays for known human plasma proteins. Upon experimental validation, these assays should help in the further quantitative characterizing of the plasma proteome.
Affiliation(s)
- Yassene Mohammed
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, ZA, Netherlands
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- David Goodlett
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
  - University of Gdansk, International Centre for Cancer Vaccine Science, Gdansk, Poland
- Christoph H Borchers
  - Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada
  - Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada
  - Division of Experimental Medicine, McGill University, Montreal, QC, Canada
  - Department of Pathology, McGill University, Montreal, QC, Canada
5. Genomic basis of the giga-chromosomes and giga-genome of tree peony Paeonia ostii. Nat Commun 2022; 13:7328. [PMID: 36443323 PMCID: PMC9705720 DOI: 10.1038/s41467-022-35063-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/21/2022] [Accepted: 11/17/2022] [Indexed: 11/29/2022] Open
Abstract
Tree peony (Paeonia ostii) is an economically important ornamental plant native to China. It is also notable for its seed oil, which is abundant in unsaturated fatty acids such as α-linolenic acid (ALA). Here, we report chromosome-level genome assembly (12.28 Gb) of P. ostii. In contrast to monocots with giant genomes, tree peony does not appear to have undergone lineage-specific whole-genome duplication. Instead, explosive LTR expansion in the intergenic regions within a short period (~ two million years) may have contributed to the formation of its giga-genome. In addition, expansion of five types of histone encoding genes may have helped maintain the giga-chromosomes. Further, we conduct genome-wide association studies (GWAS) on 448 accessions and show expansion and high expression of several genes in the key nodes of fatty acid biosynthetic pathway, including SAD, FAD2 and FAD3, may function in high level of ALAs synthesis in tree peony seeds. Moreover, by comparing with cultivated tree peony (P. suffruticosa), we show that ectopic expression of class A gene AP1 and reduced expression of class C gene AG may contribute to the formation of petaloid stamens. Genomic resources reported in this study will be valuable for studying chromosome/genome evolution and tree peony breeding.
6. Peng CX, Zhou XG, Xia YH, Liu J, Hou MH, Zhang GJ. Structural analogue-based protein structure domain assembly assisted by deep learning. Bioinformatics 2022; 38:4513-4521. [PMID: 35962986 DOI: 10.1093/bioinformatics/btac553] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/06/2022] [Revised: 07/27/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, the full-chain modelling appears to be lower on average accuracy than that for the constituent domains and requires higher demand on computing hardware, indicating the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that the domains often interact in the similar way in the quaternary orientations if the domains have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION http://zhanglab-bioinf.com/SADA. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
7. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
8. Gene Mining for Conserved, Non-Annotated Proteins of Podosphaera xanthii Identifies Novel Target Candidates for Controlling Powdery Mildews by Spray-Induced Gene Silencing. J Fungi (Basel) 2021; 7:jof7090735. [PMID: 34575773 PMCID: PMC8465782 DOI: 10.3390/jof7090735] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2021] [Revised: 08/31/2021] [Accepted: 09/06/2021] [Indexed: 11/17/2022] Open
Abstract
The powdery mildew fungus Podosphaera xanthii is one of the most important limiting factors for cucurbit production worldwide. Despite the significant efforts made by breeding and chemical companies, effective control of this pathogen remains elusive to growers. In this work, we examined the suitability of RNAi technology called spray-induced gene silencing (SIGS) for controlling cucurbit powdery mildew. Using leaf disc and cotyledon infiltration assays, we tested the efficacy of dsRNA applications to induce gene silencing in P. xanthii. Furthermore, to identify new target candidate genes, we analyzed sixty conserved and non-annotated proteins (CNAPs) deduced from the P. xanthii transcriptome in silico. Six proteins presumably involved in essential functions, specifically respiration (CNAP8878, CNAP9066, CNAP10905 and CNAP30520), glycosylation (CNAP1048) and efflux transport (CNAP948), were identified. Functional analysis of these CNAP coding genes by dsRNA-induced gene silencing resulted in strong silencing phenotypes with large reductions in fungal growth and disease symptoms. Due to their important contributions to fungal development, the CNAP1048, CNAP10905 and CNAP30520 genes were selected as targets to conduct SIGS assays under plant growth chamber conditions. The spray application of these dsRNAs induced high levels of disease control, supporting that SIGS could be a sustainable approach to combat powdery mildew diseases.
9. Yang Y, Huang L, Xu C, Qi L, Wu Z, Li J, Chen H, Wu Y, Fu T, Zhu H, Saand MA, Li J, Liu L, Fan H, Zhou H, Qin W. Chromosome-scale genome assembly of areca palm (Areca catechu). Mol Ecol Resour 2021; 21:2504-2519. [PMID: 34133844 DOI: 10.1111/1755-0998.13446] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2020] [Revised: 06/08/2021] [Accepted: 06/11/2021] [Indexed: 11/28/2022]
Abstract
Areca palm (Areca catechu L.; family Arecaceae) is an important tropical medicinal crop and is also used for masticatory and religious purposes in Asia. Improvements to areca properties made by traditional breeding tools have been very slow, and further advances in its cultivation and practical use require genomic information, which is still unavailable. Here, we present a chromosome-scale reference genome assembly for areca by combining Illumina and PacBio data with Hi-C mapping technologies, covering the predicted A. catechu genome length (2.59 Gb, variety "Reyan#1") to an estimated 240× read depth. The assembly was 2.51 Gb in length with a scaffold N50 of 1.7 Mb. The scaffolds were then further assembled into 16 pseudochromosomes, with an N50 of 172 Mb. Transposable elements comprised 80.37% of the areca genome, and 68.68% of them were long-terminal repeat retrotransposon elements. The areca palm genome was predicted to harbour 31,571 protein-coding genes and overall, 92.92% of genes were functionally annotated, including enriched and expanded families of genes responsible for biosynthesis of flavonoid, anthocyanin, monoterpenoid and their derivatives. Comparative analyses indicated that A. catechu probably diverged from its close relatives Elaeis guineensis and Cocos nucifera approximately 50.3 million years ago (Ma). Two whole genome duplication events in areca palm were found to be shared by palms and monocots, respectively. This genome assembly and associated resources represent an important addition to the palm genomics community and will be a valuable resource that will facilitate areca palm breeding and improve our understanding of areca palm biology and evolution.
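For readers unfamiliar with the N50 statistic quoted above, the standard definition can be sketched as follows: the largest length L such that sequences of length at least L together cover at least half of the total assembly. The function name and the toy lengths are illustrative, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
```

Joining scaffolds into pseudochromosomes yields far fewer, far longer sequences, which is why the abstract reports a much larger N50 after the Hi-C step.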
Affiliation(s)
- Yaodong Yang
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Liyun Huang
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Chunyan Xu
  - BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Lan Qi
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Jia Li
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Yi Wu
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Tao Fu
  - BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Hui Zhu
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Mumtaz Ali Saand
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Jing Li
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Liyun Liu
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Haikou Fan
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Huanqi Zhou
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
- Weiquan Qin
  - Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
10. Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Arthur C. Silva
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Rafael F. Dantas
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Barbara F. Gomes
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Lauro R. Souza Neto
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Jose Brandao-Neto
  - Diamond Light Source Ltd., Didcot, United Kingdom
  - Research Complex at Harwell, Didcot, United Kingdom
- Raymond J. Owens
  - The Rosalind Franklin Institute, Harwell, United Kingdom
  - Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nicholas Furnham
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Bruno J. Neves
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Floriano P. Silva-Junior
  - LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Carolina H. Andrade
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
11. Lam SD, Ashford P, Díaz-Sánchez S, Villar M, Gortázar C, de la Fuente J, Orengo C. Arthropod Ectoparasites Have Potential to Bind SARS-CoV-2 via ACE. Viruses 2021; 13:v13040708. [PMID: 33921873 PMCID: PMC8073597 DOI: 10.3390/v13040708] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/02/2021] [Revised: 04/16/2021] [Accepted: 04/16/2021] [Indexed: 12/17/2022] Open
Abstract
Coronavirus-like organisms have been previously identified in Arthropod ectoparasites (such as ticks and unfed cat flea). Yet, the question regarding the possible role of these arthropods as SARS-CoV-2 passive/biological transmission vectors is still poorly explored. In this study, we performed in silico structural and binding energy calculations to assess the risks associated with possible ectoparasite transmission. We found sufficient similarity between ectoparasite ACE and human ACE2 protein sequences to build good quality 3D-models of the SARS-CoV-2 Spike:ACE complex to assess the impacts of ectoparasite mutations on complex stability. For several species (e.g., water flea, deer tick, body louse), our analyses showed no significant destabilisation of the SARS-CoV-2 Spike:ACE complex, suggesting these species would bind the viral Spike protein. Our structural analyses also provide structural rationale for interactions between the viral Spike and the ectoparasite ACE proteins. Although we do not have experimental evidence of infection in these ectoparasites, the predicted stability of the complex suggests this is possible, raising concerns of a possible role in passive transmission of the virus to their human hosts.
Collapse
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK;
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Correspondence: (S.D.L.); (J.d.l.F.); (C.O.)
| | - Paul Ashford
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK;
| | - Sandra Díaz-Sánchez
- SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005 Ciudad Real, Spain; (S.D.-S.); (M.V.); (C.G.)
- Margarita Villar
- SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005 Ciudad Real, Spain; (S.D.-S.); (M.V.); (C.G.)
- Regional Centre for Biomedical Research (CRIB), Biochemistry Section, Faculty of Science and Chemical Technologies, University of Castilla-La Mancha, 13071 Ciudad Real, Spain
- Christian Gortázar
- SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005 Ciudad Real, Spain; (S.D.-S.); (M.V.); (C.G.)
- José de la Fuente
- SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005 Ciudad Real, Spain; (S.D.-S.); (M.V.); (C.G.)
- Center for Veterinary Health Sciences, Department of Veterinary Pathobiology, Oklahoma State University, Stillwater, OK 74078, USA
- Correspondence: (S.D.L.); (J.d.l.F.); (C.O.)
- Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK
- Correspondence: (S.D.L.); (J.d.l.F.); (C.O.)
12
|
Bhowmick P, Roome S, Borchers CH, Goodlett DR, Mohammed Y. An Update on MRMAssayDB: A Comprehensive Resource for Targeted Proteomics Assays in the Community. J Proteome Res 2021; 20:2105-2115. [PMID: 33683131 PMCID: PMC8041396 DOI: 10.1021/acs.jproteome.0c00961] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Precise multiplexed quantification of proteins in biological samples can be achieved by targeted proteomics using multiple or parallel reaction monitoring (MRM/PRM). Combined with internal standards, the method achieves very good repeatability and reproducibility, enabling excellent protein quantification and allowing longitudinal and cohort studies. A laborious part of performing such experiments lies in the preparation steps dedicated to the development and validation of individual protein assays. Several public repositories host information on targeted proteomics assays, including NCI’s Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb, and PeptideTracker, all offering varying levels of detail. We introduced MRMAssayDB in 2018 as an integrated resource for targeted proteomics assays. The Web-based application maps and links the assays from the repositories, includes comprehensive up-to-date protein and sequence annotations, and provides multiple visualization options on the peptide and protein level. We have extended MRMAssayDB with more assays and extensive annotations. Currently it contains >828 000 assays covering >51 000 proteins from 94 organisms, of which >17 000 proteins are present in >2400 biological pathways, and >48 000 map to >21 000 Gene Ontology terms. This is an increase of about four times the number of assays since introduction. We have expanded annotations of interaction, biological pathways, and disease associations. A newly added visualization module for coupled molecular structural annotation browsing allows the user to interactively examine peptide sequence and any known PTMs and disease mutations, and map all to available protein 3D structures. Because of its integrative approach, MRMAssayDB enables a holistic view of suitable proteotypic peptides and commonly used transitions in empirical data. Availability: http://mrmassaydb.proteincentre.com.
Affiliation(s)
- Pallab Bhowmick
- University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
- University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Simon Roome
- University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
- University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Christoph H Borchers
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, Quebec H3T 1E2, Canada
- Department of Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Nobel Street, Moscow 121205, Russia
- David R Goodlett
- University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
- University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- University of Gdansk, International Centre for Cancer Vaccine Science, 80-309 Gdansk, Poland
- Yassene Mohammed
- University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
- University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZA Leiden, Netherlands
13
|
Lam SD, Babu MM, Lees J, Orengo CA. Biological impact of mutually exclusive exon switching. PLoS Comput Biol 2021; 17:e1008708. [PMID: 33651795 PMCID: PMC7954323 DOI: 10.1371/journal.pcbi.1008708] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/01/2020] [Revised: 03/12/2021] [Accepted: 01/14/2021] [Indexed: 12/27/2022] Open
Abstract
Alternative splicing can expand the diversity of proteomes. Homologous mutually exclusive exons (MXEs) originate from the same ancestral exon and result in polypeptides with similar structural properties but altered sequence. Why would some genes switch homologous exons, and what is the biological impact? Here, we analyse the extent of sequence, structural and functional variability in MXEs and report the first large-scale, structure-based analysis of the biological impact of MXE events from different genomes. MXE-specific residues tend to map to single domains, are highly enriched in surface-exposed residues and cluster at or near protein functional sites. Thus, MXE events are likely to maintain the protein fold but alter the specificity and selectivity of protein function. This comprehensive resource of MXE events and their annotations is available at: http://gene3d.biochem.ucl.ac.uk/mxemod/. These findings highlight how small but significant changes at critical positions on a protein surface are exploited in evolution to alter function.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London, United Kingdom
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- * E-mail: (SDL); (JL); (CO)
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Department of Structural Biology and Center for Data Driven Discovery, St Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Jonathan Lees
- Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- * E-mail: (SDL); (JL); (CO)
- Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London, United Kingdom
- * E-mail: (SDL); (JL); (CO)
14
|
Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 2021; 49:D373-D379. [PMID: 33174605 PMCID: PMC7779010 DOI: 10.1093/nar/gkaa1007] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 09/15/2020] [Revised: 10/10/2020] [Accepted: 10/14/2020] [Indexed: 01/11/2023] Open
Abstract
OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Clément-Marie Train
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Kimberly J Gilbert
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Ishita Mediratta
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Department of Computer Science and Information Systems, BITS Pilani K.K. Birla Goa Campus, India
- David Moi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Hale-Seda Radoykova
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
- Victor Rossier
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Alex Warwick Vesztrocy
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
15
|
Zhou X, Li Y, Zhang C, Zheng W, Zhang G, Zhang Y. Progressive and accurate assembly of multi-domain protein structures from cryo-EM density maps. bioRxiv 2020:2020.10.15.340455. [PMID: 33083802 PMCID: PMC7574260 DOI: 10.1101/2020.10.15.340455] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/31/2023]
Abstract
Progress in cryo-electron microscopy (cryo-EM) has provided the potential for large-size protein structure determination. However, the solution rate for multi-domain proteins remains low due to the difficulty in modeling inter-domain orientations. We developed DEMO-EM, an automatic method to assemble multi-domain structures from cryo-EM maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep neural network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to twelve continuous and discontinuous domains with medium-to-low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (TM-score >0.5) for 98% of cases and significantly outperformed the state-of-the-art methods. DEMO-EM was applied to the SARS-CoV-2 coronavirus genome and generated models with an average TM-score/RMSD of 0.97/1.4 Å to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modeling with atomic-level accuracy from cryo-EM maps.
Affiliation(s)
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou 310023, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
16
|
Lam SD, Bordin N, Waman VP, Scholes HM, Ashford P, Sen N, van Dorp L, Rauer C, Dawson NL, Pang CSM, Abbasian M, Sillitoe I, Edwards SJL, Fraternali F, Lees JG, Santini JM, Orengo CA. SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals. Sci Rep 2020; 10:16471. [PMID: 33020502 PMCID: PMC7536205 DOI: 10.1038/s41598-020-71936-5] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 06/09/2020] [Accepted: 08/17/2020] [Indexed: 01/04/2023] Open
Abstract
SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. To enter host cells, the viral spike protein (S-protein) binds to its receptor, ACE2, and is then processed by TMPRSS2. Whilst receptor binding contributes to the viral host range, S-protein:ACE2 complexes from other animals have not been investigated widely. To predict infection risks, we modelled S-protein:ACE2 complexes from 215 vertebrate species, calculated changes in the energy of the complex caused by mutations in each species, relative to human ACE2, and correlated these changes with COVID-19 infection data. We also analysed structural interactions to better understand the key residues contributing to affinity. We predict that mutations are more detrimental in ACE2 than TMPRSS2. Finally, we demonstrate phylogenetically that human SARS-CoV-2 strains have been isolated in animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but few fish, birds or reptiles. Susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.
Affiliation(s)
- S D Lam
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- N Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- V P Waman
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- H M Scholes
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- P Ashford
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- N Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Indian Institute of Science Education and Research, Pune, 411008, India
- L van Dorp
- UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- C Rauer
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- N L Dawson
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- C S M Pang
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- M Abbasian
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- I Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- S J L Edwards
- Department of Science and Technology Studies, University College London, London, WC1E 6BT, UK
- F Fraternali
- Randall Division of Cell and Molecular Biophysics, Guy's Campus, New Hunt's House, King's College London, London, SE1 1UL, UK
- J G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, OX3 0BP, UK
- J M Santini
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- C A Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
17
|
Lewis TE, Sillitoe I, Lees JG. cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly. Bioinformatics 2020; 35:1766-1767. [PMID: 30295745 PMCID: PMC6513158 DOI: 10.1093/bioinformatics/bty863] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/07/2018] [Revised: 09/18/2018] [Accepted: 10/05/2018] [Indexed: 11/26/2022] Open
Abstract
Motivation: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large datasets for which they require prohibitive amounts of CPU time and memory. Results: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. Availability and implementation: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics online.
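The problem stated in this abstract, choosing the best-scoring subset of candidate matches with no overlaps, can be illustrated with a textbook weighted-interval-scheduling dynamic program. The Python sketch below is only an illustration under stated assumptions and is not CRH's actual C++ implementation: the function name `resolve_hits` is hypothetical, each hit is assumed to be one continuous segment with inclusive integer residue coordinates, scores are assumed to be additive, and discontinuous domains are not handled.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical hit representation: (start, stop, score), inclusive coordinates.
Hit = Tuple[int, int, float]


def resolve_hits(hits: List[Hit]) -> Tuple[float, List[Hit]]:
    """Return the best total score and one non-overlapping subset achieving it."""
    ordered = sorted(hits, key=lambda h: h[1])  # sort by stop position
    stops = [h[1] for h in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)   # best[i]: best score using only the first i hits
    take = [False] * n       # take[i]: whether hit i is used in best[i + 1]
    prev = [0] * n           # prev[i]: number of earlier hits ending before hit i starts

    for i, (start, _stop, score) in enumerate(ordered):
        # Count earlier hits whose stop is strictly less than this hit's start.
        prev[i] = bisect_right(stops, start - 1, 0, i)
        with_hit = best[prev[i]] + score
        if with_hit > best[i]:
            best[i + 1] = with_hit
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen hits.
    chosen: List[Hit] = []
    i = n
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


if __name__ == "__main__":
    candidates = [(1, 50, 10.0), (40, 100, 15.0), (51, 120, 8.0), (101, 150, 7.0)]
    print(resolve_hits(candidates))
```

Sorting by stop position and using a binary search for the last compatible hit keeps this sketch at O(n log n) time, which is the usual reason a dynamic program is preferred over enumerating subsets for this kind of problem.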
Affiliation(s)
- T E Lewis
- Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK
- I Sillitoe
- Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK
- J G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, Oxfordshire, UK
18
|
Ma B, France MT, Crabtree J, Holm JB, Humphrys MS, Brotman RM, Ravel J. A comprehensive non-redundant gene catalog reveals extensive within-community intraspecies diversity in the human vagina. Nat Commun 2020; 11:940. [PMID: 32103005 PMCID: PMC7044274 DOI: 10.1038/s41467-020-14677-3] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 06/14/2019] [Accepted: 01/23/2020] [Indexed: 12/12/2022] Open
Abstract
Analysis of metagenomic and metatranscriptomic data is complicated and typically requires extensive computational resources. Leveraging a curated reference database of genes encoded by members of the target microbiome can make these analyses more tractable. In this study, we assemble a comprehensive human vaginal non-redundant gene catalog (VIRGO) that includes 0.95 million non-redundant genes. The gene catalog is functionally and taxonomically annotated. We also construct vaginal orthologous groups (VOG) from VIRGO. The gene-centric design of VIRGO and VOG provides an easily accessible tool to comprehensively characterize the structure and function of vaginal metagenome and metatranscriptome datasets. To highlight the utility of VIRGO, we analyze 1,507 additional vaginal metagenomes and identify a high degree of intraspecies diversity within and across vaginal microbiota. VIRGO offers a convenient reference database and toolkit that will facilitate a more in-depth understanding of the role of vaginal microorganisms in women’s health and reproductive outcomes. Reference databases are essential for studies on host-microbiota interactions. Here, the authors present the construction of VIRGO, a human vaginal non-redundant gene catalog, which represents a comprehensive resource for taxonomic and functional profiling of vaginal microbiomes from metagenomic and metatranscriptomic datasets.
Affiliation(s)
- Bing Ma
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Michael T France
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Johanna B Holm
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Michael S Humphrys
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Rebecca M Brotman
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Jacques Ravel
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
19
|
Martin KP, MacKenzie SM, Barnes JW, Ytreberg FM. Protein Stability in Titan's Subsurface Water Ocean. Astrobiology 2020; 20:190-198. [PMID: 31730377 PMCID: PMC7041334 DOI: 10.1089/ast.2018.1972] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2018] [Accepted: 10/08/2019] [Indexed: 06/10/2023]
Abstract
Models of Titan predict that there is a subsurface ocean of water and ammonia under a layer of ice. Such an ocean would be important in the search for extraterrestrial life since it provides a potentially habitable environment. To evaluate how Earth-based proteins would behave in Titan's subsurface ocean environment, we used molecular dynamics simulations to calculate the properties of proteins with the most common secondary structure types (alpha helix and beta sheet) in both Earth and Titan-like conditions. The Titan environment was simulated by using a temperature of 300 K, a pressure of 1000 bar, and a eutectic mixture of water and ammonia. We analyzed protein compactness, flexibility, and backbone dihedral distributions to identify differences between the two environments. Secondary structures in the Titan environment were found to be less long-lasting and less flexible, and to have small differences in backbone dihedral preferences (e.g., in one instance a pi helix formed). These environment-driven differences could lead to changes in how these proteins interact with other biomolecules and therefore to changes in how evolution would potentially shape proteins to function in subsurface ocean environments.
Affiliation(s)
- Kyle P. Martin
- Department of Physics, University of Idaho, Moscow, Idaho
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho
- F. Marty Ytreberg
- Department of Physics, University of Idaho, Moscow, Idaho
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho
20
|
Waman VP, Blundell TL, Buchan DWA, Gough J, Jones D, Kelley L, Murzin A, Pandurangan AP, Sillitoe I, Sternberg M, Torres P, Orengo C. The Genome3D Consortium for Structural Annotations of Selected Model Organisms. Methods Mol Biol 2020; 2165:27-67. [PMID: 32621218 DOI: 10.1007/978-1-0716-0708-4_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal that provides both predicted models and annotations of proteins in model organisms, using several resources developed by these labs such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in the CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Thus, Genome3D serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. It thereby helps researchers broaden their perspective on structure/function predictions for their query protein of interest in selected model organisms.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Daniel W A Buchan
- Department of Computer Science, University College London, London, UK
- Julian Gough
- MRC Laboratory of Molecular Biology, Cambridge, UK
- David Jones
- Department of Computer Science, University College London, London, UK
- Lawrence Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
- Michael Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
- Pedro Torres
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK.
21
|
Li N, Qian S, Li B, Zhan X. Quantitative analysis of the human ovarian carcinoma mitochondrial phosphoproteome. Aging (Albany NY) 2019; 11:6449-6468. [PMID: 31442208 PMCID: PMC6738437 DOI: 10.18632/aging.102199] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/15/2019] [Accepted: 08/10/2019] [Indexed: 05/02/2023]
Abstract
To investigate the existence and potential biological roles of mitochondrial phosphoproteins (mtPPs) in human ovarian carcinoma (OC), mitochondria purified from OC and control tissues were analyzed with TiO2 enrichment-based iTRAQ quantitative proteomics. In total, 67 mtPPs with 124 phosphorylation sites were identified, of which 48 were differential mtPPs (mtDPPs). Eighteen mtPPs had been reported previously in OCs, and the results of this study were consistent with the previous literature. GO analysis revealed that those mtPPs were involved in multiple cellular processes. The PPI network indicated that those mtPPs were mutually correlated, and that some mtPPs acted as hub molecules, such as EIF2S2, RPLP0, RPLP2, CFL1, MYH10, HSP90, HSPD1, PSMA3, TMX1, VDAC2, VDAC3, TOMM22, and TOMM20. In total, 32 mtPP-pathway systems (p<0.05) were enriched and clustered into 15 groups, including mitophagy, apoptosis, deubiquitination, signaling by VEGF, RHO-GTPase effectors, mitochondrial protein import, translation initiation, RNA transport, cellular responses to stress, and c-MYC transcriptional activation. In total, 29 mtPPs contained certain protein domains. Upstream regulation analysis showed that TP53, TGFB1, dexamethasone, and thapsigargin might act as inhibitors, and L-dopa and forskolin might act as activators. This study provided novel insights into mitochondrial protein phosphorylation and its potential roles in OC pathogenesis, and offered a new biomarker resource for OCs.
Affiliation(s)
- Na Li
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Shehua Qian
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Biao Li
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Xianquan Zhan
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
22
Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, Orengo C, Lees J. Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Res 2019; 46:D435-D439. [PMID: 29112716 PMCID: PMC5753370 DOI: 10.1093/nar/gkx1069] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 11/28/2022] Open
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.
Affiliation(s)
- Tony E Lewis
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Natalie Dawson
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Su Datt Lam
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Tristan Clarke
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- David Lee
- Bristol Life Sciences Building, University of Bristol, Bristol Life Sciences Building, Bristol, BS8 1TQ, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Jonathan Lees
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Oxford Brookes University, Faculty of Health and Life Sciences, Oxford, Oxfordshire, UK
23
Altenhoff AM, Glover NM, Train CM, Kaleb K, Warwick Vesztrocy A, Dylus D, de Farias TM, Zile K, Stevenson C, Long J, Redestig H, Gonnet GH, Dessimoz C. The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res 2019; 46:D477-D485. [PMID: 29106550 PMCID: PMC5753216 DOI: 10.1093/nar/gkx1019] [Citation(s) in RCA: 191] [Impact Index Per Article: 38.2] [Received: 09/15/2017] [Accepted: 10/27/2017] [Indexed: 12/22/2022] Open
Abstract
The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstrasse 6, 8092 Zurich, Switzerland
- Natasha M Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Clément-Marie Train
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Klara Kaleb
- Dept. of Genetics, Evolution & Environment, University College London, Gower St, London WC1E 6BT, UK
- Alex Warwick Vesztrocy
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Dept. of Genetics, Evolution & Environment, University College London, Gower St, London WC1E 6BT, UK
- David Dylus
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Tarcisio M de Farias
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Karina Zile
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Dept. of Genetics, Evolution & Environment, University College London, Gower St, London WC1E 6BT, UK
- Charles Stevenson
- Dept. of Genetics, Evolution & Environment, University College London, Gower St, London WC1E 6BT, UK
- Jiao Long
- Bayer Crop Science NV, Technologiepark 38, 9052 Gent, Belgium
- Gaston H Gonnet
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstrasse 6, 8092 Zurich, Switzerland
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Dept. of Genetics, Evolution & Environment, University College London, Gower St, London WC1E 6BT, UK
- Dept. of Computer Science, University College London, Gower St, London WC1E 6BT, UK
24
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022] Open
Abstract
Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate the analysis and enhance its accuracy are greatly desired. Although a variety of methods have been developed to improve protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy than other de novo methods. Moreover, an in-depth assessment revealed an improved capacity to control the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.
Affiliation(s)
- Jiajun Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
25
Assembling multidomain protein structures through analogous global structural alignments. Proc Natl Acad Sci U S A 2019; 116:15930-15938. [PMID: 31341084 DOI: 10.1073/pnas.1905068116] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Indexed: 01/07/2023] Open
Abstract
Most proteins exist with multiple domains in cells for cooperative functionality. However, structural biology and protein folding methods are often optimized for single-domain structures, resulting in a rapidly growing gap between the improved capability for tertiary structure determination and high demand for multidomain structure models. We have developed a pipeline, termed DEMO, for constructing multidomain protein structures by docking-based domain assembly simulations, with interdomain orientations determined by the distance profiles from analogous templates as detected through domain-level structure alignments. The pipeline was tested on a comprehensive benchmark set of 356 proteins consisting of 2-7 continuous and discontinuous domains, for which DEMO generated models with correct global fold (TM-score > 0.5) for 86% of cases with continuous domains and for 100% of cases with discontinuous domain structures, starting from randomly oriented target-domain structures. DEMO was also applied to reassemble multidomain targets in the CASP12 and CASP13 experiments using domain structures excised from the top server predictions, where the full-length DEMO models showed a significantly improved quality over the original server models. Finally, sparse restraints of mass spectrometry-generated cross-linking data and cryo-EM density maps are incorporated into DEMO, resulting in improvements in the average TM-score by 6.3% and 12.5%, respectively. The results demonstrate an efficient approach to assembling multidomain structures, which can be easily used for automated, genome-scale multidomain protein structure assembly.
26
Polonio Á, Seoane P, Claros MG, Pérez-García A. The haustorial transcriptome of the cucurbit pathogen Podosphaera xanthii reveals new insights into the biotrophy and pathogenesis of powdery mildew fungi. BMC Genomics 2019; 20:543. [PMID: 31272366 PMCID: PMC6611051 DOI: 10.1186/s12864-019-5938-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/15/2019] [Accepted: 06/26/2019] [Indexed: 12/11/2022] Open
Abstract
Background: Podosphaera xanthii is the main causal agent of powdery mildew disease in cucurbits and is responsible for important yield losses in these crops worldwide. Powdery mildew fungi are obligate biotrophs. In these parasites, biotrophy is determined by the presence of haustoria, which are specialized structures of parasitism developed by these fungi for the acquisition of nutrients and the delivery of effectors. Detailed molecular studies of powdery mildew haustoria are scarce due mainly to difficulties in their isolation. Therefore, their analysis is considered an important challenge for powdery mildew research. The aim of this work was to gain insights into powdery mildew biology by analysing the haustorial transcriptome of P. xanthii. Results: Prior to RNA isolation and massive-scale mRNA sequencing, a flow cytometric approach was developed to isolate P. xanthii haustoria free of visible contaminants. Next, several commercial kits were used to isolate total RNA and to construct the cDNA and Illumina libraries that were finally sequenced by the Illumina NextSeq system. Using this approach, the maximum amount of information obtainable from low-quality RNA was used to accomplish the de novo assembly of the P. xanthii haustorial transcriptome. The subsequent analysis of this transcriptome and comparison with the epiphytic transcriptome allowed us to identify the importance of several biological processes for haustorial cells, such as protection against reactive oxygen species, the acquisition of different nutrients and genetic regulation mediated by non-coding RNAs. In addition, we identified several secreted proteins expressed exclusively in haustoria, such as cell adhesion proteins, that have not been related to powdery mildew biology to date. Conclusions: This work provides a novel approach to study the molecular aspects of powdery mildew haustoria. In addition, the results of this study have allowed us to identify certain previously unknown processes and proteins involved in the biology of powdery mildews that could be essential for their biotrophy and pathogenesis.
Affiliation(s)
- Álvaro Polonio
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, Bulevar Louis Pasteur 31, 29071, Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora", Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Bulevar Louis Pasteur 31, 29071, Málaga, Spain
- Pedro Seoane
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Bulevar Louis Pasteur 31, 29071, Málaga, Spain
- M Gonzalo Claros
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Bulevar Louis Pasteur 31, 29071, Málaga, Spain
- Alejandro Pérez-García
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, Bulevar Louis Pasteur 31, 29071, Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora", Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Bulevar Louis Pasteur 31, 29071, Málaga, Spain
27
Santos HJ, Imai K, Makiuchi T, Tomii K, Horton P, Nozawa A, Okada K, Tozawa Y, Nozaki T. Novel lineage-specific transmembrane β-barrel proteins in the endoplasmic reticulum of Entamoeba histolytica. FEBS J 2019; 286:3416-3432. [PMID: 31045303 DOI: 10.1111/febs.14870] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/14/2018] [Revised: 03/06/2019] [Accepted: 04/29/2019] [Indexed: 12/11/2022]
Abstract
β-barrel outer membrane proteins (BOMPs) are essential components of outer membranes of Gram-negative bacteria and endosymbiotic organelles, usually involved in the transport of proteins and substrates across the membrane. Based on the analysis of our in silico BOMP predictor data for the Entamoeba histolytica genome, we detected a new transmembrane β-barrel domain-containing protein, EHI_192610. Sequence analysis revealed that this protein is unique to Entamoeba species, and it exclusively clusters with a homolog, EHI_099780, which is similarly lineage specific. Both proteins possess an N-terminal signal peptide sequence as well as multiple repeats that contain dyad hydrophobic periodicities. Data from immunofluorescence assay of trophozoites expressing the respective candidates showed the absence of colocalization with mitosomal marker, and interestingly demonstrated partial colocalization with endoplasmic reticulum (ER) proteins instead. Integration to organellar membrane was supported by carbonate fractionation assay and immunoelectron microscopy. CD analysis of reconstituted proteoliposomes containing EHI_192610 showed a spectrum demonstrating a predominant β-sheet structure, suggesting that this protein is β-strand rich. Furthermore, the presence of repeat regions with predicted transmembrane β-strand pairs in both EHI_192610 and EHI_099780, is consistent with the hypothesis that BOMPs originated from the amplification of ββ-hairpin modules, suggesting that the two Entamoeba-specific proteins are novel β-barrels, intriguingly localized partially to the ER membrane.
Affiliation(s)
- Herbert J Santos
- Graduate School of Medicine, The University of Tokyo, Japan
- Department of Parasitology, National Institute of Infectious Diseases, Tokyo, Japan
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Kenichiro Imai
- Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Takashi Makiuchi
- Department of Infectious Diseases, Tokai University School of Medicine, Isehara, Japan
- Kentaro Tomii
- Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Paul Horton
- Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan
- Akira Nozawa
- Proteo-Science Center, Ehime University, Matsuyama, Japan
- Kenta Okada
- Proteo-Science Center, Ehime University, Matsuyama, Japan
- Yuzuru Tozawa
- Graduate School of Science and Engineering, Saitama University, Japan
- Tomoyoshi Nozaki
- Graduate School of Medicine, The University of Tokyo, Japan
- Department of Parasitology, National Institute of Infectious Diseases, Tokyo, Japan
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
28
Sillitoe I, Dawson N, Lewis TE, Das S, Lees JG, Ashford P, Tolulope A, Scholes HM, Senatorov I, Bujan A, Ceballos Rodriguez-Conde F, Dowling B, Thornton J, Orengo CA. CATH: expanding the horizons of structure-based functional annotations for genome sequences. Nucleic Acids Res 2019; 47:D280-D284. [PMID: 30398663 PMCID: PMC6323983 DOI: 10.1093/nar/gky1097] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 09/15/2018] [Revised: 10/16/2018] [Accepted: 11/02/2018] [Indexed: 01/20/2023] Open
Abstract
This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully-classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution, protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.
Affiliation(s)
- Ian Sillitoe
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Natalie Dawson
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Tony E Lewis
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Sayoni Das
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Jonathan G Lees
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Paul Ashford
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Adeyelu Tolulope
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Harry M Scholes
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Ilya Senatorov
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Andra Bujan
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Benjamin Dowling
- Structural and Molecular Biology, University College London WC1E 6BT, UK
- Janet Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine A Orengo
- Structural and Molecular Biology, University College London WC1E 6BT, UK
29
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
30
Zavadil Kokáš F, Bergougnoux V, Majeská Čudejková M. SATrans: New Free Available Software for Annotation of Transcriptome and Functional Analysis of Differentially Expressed Genes. J Comput Biol 2018; 26:117-123. [PMID: 30328709 DOI: 10.1089/cmb.2018.0149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique allowing a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and provides the possibility to work with thousands of sequences using a standard personal computer or notebook running on the Linux operating system.
Affiliation(s)
- Filip Zavadil Kokáš
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
- Véronique Bergougnoux
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
- Mária Majeská Čudejková
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
31
Bhowmick P, Mohammed Y, Borchers CH. MRMAssayDB: an integrated resource for validated targeted proteomics assays. Bioinformatics 2018; 34:3566-3571. [PMID: 29762640 PMCID: PMC6184479 DOI: 10.1093/bioinformatics/bty385] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/15/2017] [Revised: 04/28/2018] [Accepted: 05/10/2018] [Indexed: 02/04/2023] Open
Abstract
Motivation: Multiple Reaction Monitoring (MRM)-based targeted proteomics is increasingly being used to study the molecular basis of disease. When combined with an internal standard, MRM allows absolute quantification of proteins in virtually any type of sample, but the development and validation of an MRM assay for a specific protein is laborious. Therefore, several public repositories now host targeted proteomics MRM assays, including NCI's Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb and PeptideTracker, all of which contain different levels of information. Results: Here we present MRMAssayDB, a web-based application that integrates these repositories into a single resource. MRMAssayDB maps and links the targeted assays, annotates the proteins with information from UniProtKB, KEGG pathways and Gene Ontologies, and provides several visualization options on the peptide and protein level. Currently MRMAssayDB contains >168K assays covering more than 34K proteins from 63 organisms; >13.5K of these proteins are present in >2.3K KEGG biological pathways corresponding to >300 master pathways, and map to >13K GO biological processes. MRMAssayDB allows comprehensive searches for a targeted-proteomics assay depending on the user's interests, by using target-protein name or accession number, or using annotations such as subcellular localization, biological pathway, or disease or drug associations. The user can see how many data repositories include a specific peptide assay, and the commonly used transitions for each peptide in all empirical data from the repositories. Availability and implementation: http://mrmassaydb.proteincentre.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pallab Bhowmick
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Yassene Mohammed
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, ZA, The Netherlands
- Christoph H Borchers
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada
32
Lykins JD, Filippova EV, Halavaty AS, Minasov G, Zhou Y, Dubrovska I, Flores KJ, Shuvalova LA, Ruan J, El Bissati K, Dovgin S, Roberts CW, Woods S, Moulton JD, Moulton H, McPhillie MJ, Muench SP, Fishwick CWG, Sabini E, Shanmugam D, Roos DS, McLeod R, Anderson WF, Ngô HM. CSGID Solves Structures and Identifies Phenotypes for Five Enzymes in Toxoplasma gondii. Front Cell Infect Microbiol 2018; 8:352. [PMID: 30345257 PMCID: PMC6182094 DOI: 10.3389/fcimb.2018.00352] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/30/2018] [Accepted: 09/14/2018] [Indexed: 12/23/2022] Open
Abstract
Toxoplasma gondii, an Apicomplexan parasite, causes significant morbidity and mortality, including severe disease in immunocompromised hosts and devastating congenital disease, with no effective treatment for the bradyzoite stage. To address this, we used the Tropical Disease Research database, crystallography, molecular modeling, and antisense to identify and characterize a range of potential therapeutic targets for toxoplasmosis. Phosphoglycerate mutase II (PGMII), nucleoside diphosphate kinase (NDK), ribulose phosphate 3-epimerase (RPE), ribose-5-phosphate isomerase (RPI), and ornithine aminotransferase (OAT) were structurally characterized. Crystallography revealed insights into the overall structure, protein oligomeric states and molecular details of active sites important for ligand recognition. Literature and molecular modeling suggested potential inhibitors and druggability. The targets were further studied with vivoPMO to interrupt enzyme synthesis, identifying the targets as potentially important to parasitic replication and, therefore, of therapeutic interest. Targeted vivoPMO resulted in statistically significant perturbation of parasite replication without concomitant host cell toxicity, consistent with a previous CRISPR/Cas9 screen showing PGM, RPE, and RPI contribute to parasite fitness. PGM, RPE, and RPI have the greatest promise for affecting replication in tachyzoites. These targets are shared between other medically important parasites and may have wider therapeutic potential.
Affiliation(s)
- Joseph D. Lykins
- Pritzker School of Medicine, University of Chicago, Chicago, IL, United States
| | - Ekaterina V. Filippova
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Andrei S. Halavaty
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - George Minasov
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Ying Zhou
- Department of Ophthalmology and Visual Sciences, University of Chicago, Chicago, IL, United States
| | - Ievgeniia Dubrovska
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Kristin J. Flores
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Ludmilla A. Shuvalova
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Jiapeng Ruan
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Kamal El Bissati
- Department of Ophthalmology and Visual Sciences, University of Chicago, Chicago, IL, United States
| | - Sarah Dovgin
- Illinois Math and Science Academy, Aurora, IL, United States
| | - Craig W. Roberts
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, United Kingdom
| | - Stuart Woods
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, United Kingdom
| | | | - Hong Moulton
- Department of Biomedical Sciences, College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States
| | - Martin J. McPhillie
- Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, United Kingdom
| | - Stephen P. Muench
- School of Biomedical Sciences, Faculty of Biological Sciences, and Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, United Kingdom
| | - Colin W. G. Fishwick
- School of Chemistry and Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, United Kingdom
| | - Elisabetta Sabini
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | | | - David S. Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States
| | - Rima McLeod
- Department of Ophthalmology and Visual Sciences, University of Chicago, Chicago, IL, United States
- Department of Pediatrics (Infectious Diseases), Institute of Genomics, Genetics, and Systems Biology, Global Health Center, Toxoplasmosis Center, CHeSS, The College, University of Chicago, Chicago, IL, United States
| | - Wayne F. Anderson
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Huân M. Ngô
- Center for Structural Genomics of Infectious Diseases and the Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- BrainMicro LLC, New Haven, CT, United States
| |
|
33
|
Martínez-Cruz J, Romero D, de la Torre FN, Fernández-Ortuño D, Torés JA, de Vicente A, Pérez-García A. The Functional Characterization of Podosphaera xanthii Candidate Effector Genes Reveals Novel Target Functions for Fungal Pathogenicity. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2018; 31:914-931. [PMID: 29513627 DOI: 10.1094/mpmi-12-17-0318-r] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 05/23/2023]
Abstract
Podosphaera xanthii is the main causal agent of powdery mildew disease in cucurbits. In a previous study, we determined that P. xanthii expresses approximately 50 Podosphaera effector candidates (PECs), identified based on the presence of a predicted signal peptide and the absence of functional annotation. In this work, we used host-induced gene silencing (HIGS), employing Agrobacterium tumefaciens as a vector for the delivery of the silencing constructs (ATM-HIGS), to identify genes involved in early plant-pathogen interaction. The analysis of seven selected PEC-encoding genes showed that six of them, PEC007, PEC009, PEC019, PEC032, PEC034, and PEC054, are required for P. xanthii pathogenesis, as revealed by reduced fungal growth and increased production of hydrogen peroxide by host cells. In addition, protein models and protein-ligand predictions allowed us to identify putative functions for these candidates. The biochemical activities of PEC019, PEC032, and PEC054 were elucidated using their corresponding proteins expressed in Escherichia coli. These proteins were confirmed as phospholipid-binding protein, α-mannosidase, and cellulose-binding protein. Further, BLAST searches showed that these three effectors are widely distributed in phytopathogenic fungi. These results suggest novel targets for fungal effectors, such as host-cell plasma membrane, host-cell glycosylation, and damage-associated molecular pattern-triggered immunity.
Affiliation(s)
- Jesús Martínez-Cruz
- 1 Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga and Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Diego Romero
- 1 Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga and Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Fernando N de la Torre
- 2 Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain; and
| | - Dolores Fernández-Ortuño
- 3 Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29750 Algarrobo-Costa, Málaga, Spain
| | - Juan A Torés
- 3 Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29750 Algarrobo-Costa, Málaga, Spain
| | - Antonio de Vicente
- 1 Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga and Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Alejandro Pérez-García
- 1 Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga and Instituto de Hortofruticultura Subtropical y Mediterránea "La Mayora"-Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| |
|
34
|
Qi F, Motz M, Jung K, Lassak J, Frishman D. Evolutionary analysis of polyproline motifs in Escherichia coli reveals their regulatory role in translation. PLoS Comput Biol 2018; 14:e1005987. [PMID: 29389943 PMCID: PMC5811046 DOI: 10.1371/journal.pcbi.1005987] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 09/28/2017] [Revised: 02/13/2018] [Accepted: 01/17/2018] [Indexed: 12/14/2022] Open
Abstract
Translation of consecutive prolines causes ribosome stalling, which is alleviated but cannot be fully compensated by the elongation factor P. However, the presence of polyproline motifs in about one third of the E. coli proteins underlines their potential functional importance, which remains largely unexplored. We conducted an evolutionary analysis of polyproline motifs in the proteomes of 43 E. coli strains and found evidence of evolutionary selection against translational stalling, which is especially pronounced in proteins with high translational efficiency. Against the overall trend of polyproline motif loss in evolution, we observed their enrichment in the vicinity of translational start sites, in the inter-domain regions of multi-domain proteins, and downstream of transmembrane helices. Our analysis demonstrates that the time gain caused by ribosome pausing at polyproline motifs might be advantageous in protein regions bracketing domains and transmembrane helices. Polyproline motifs might therefore be crucial for co-translational folding and membrane insertion. Polyproline motifs induce ribosome stalling during translation, but the functional significance of this effect remains unclear. Our evolutionary analysis of polyproline motifs reveals that they are disfavored in E. coli proteomes as a consequence of the reduced translation efficiency, supporting the conjecture that translation efficiency-based evolutionary pressure shapes protein sequences. Enrichment of polyproline motifs in the protein regions bracketing structural domains and transmembrane helices indicates their regulatory role in co-translational protein folding and transmembrane helix insertion. Polyproline motifs could thus serve as protein-level cis-acting elements, which directly regulate the rate of translation elongation.
Affiliation(s)
- Fei Qi
- Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, Freising, Germany
| | - Magdalena Motz
- Center for Integrated Protein Science Munich, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Biology I, Microbiology, Ludwig-Maximilians-Universität München, Martinsried, Germany
| | - Kirsten Jung
- Center for Integrated Protein Science Munich, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Biology I, Microbiology, Ludwig-Maximilians-Universität München, Martinsried, Germany
| | - Jürgen Lassak
- Center for Integrated Protein Science Munich, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Biology I, Microbiology, Ludwig-Maximilians-Universität München, Martinsried, Germany
| | - Dmitrij Frishman
- Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, Freising, Germany
- St Petersburg State Polytechnic University, St Petersburg, Russia
| |
|
35
|
Rojas-Lopez M, Zorgani MA, Kelley LA, Bailly X, Kajava AV, Henderson IR, Polticelli F, Pizza M, Rosini R, Desvaux M. Identification of the Autochaperone Domain in the Type Va Secretion System (T5aSS): Prevalent Feature of Autotransporters with a β-Helical Passenger. Front Microbiol 2018; 8:2607. [PMID: 29375499 PMCID: PMC5767081 DOI: 10.3389/fmicb.2017.02607] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/18/2017] [Accepted: 12/14/2017] [Indexed: 11/29/2022] Open
Abstract
Autotransporters (ATs) belong to a family of modular proteins secreted by the Type V, subtype a, secretion system (T5aSS) and are considered an important source of virulence factors in lipopolysaccharidic diderm bacteria (archetypical Gram-negative bacteria). While exported by the Sec pathway, the ATs are further secreted across the outer membrane via their own C-terminal translocator forming a β-barrel, through which the rest of the protein, namely the passenger, can pass. In several ATs, an autochaperone domain (AC) present at the C-terminal region of the passenger and upstream of the translocator was demonstrated to be strictly required for proper secretion and folding. However, since it has been functionally characterised and identified in only a handful of ATs, doubts have recently arisen about the commonality and conservation of this structural element in the T5aSS. To circumvent the issue of sequence divergence, and taking advantage of the resolved three-dimensional structure of some ACs, identification of this domain was performed following structural alignment among all AT passengers experimentally resolved by crystallography before searching in a dataset of 1523 ATs. While demonstrating that the AC is indeed a conserved structure found in numerous ATs, phylogenetic analysis further revealed a distribution into deeply rooted branches, from which emerge 20 main clusters. Sequence analysis revealed that an AC could be identified in the large majority of SAATs (self-associating ATs) but not in any LEATs (lipase/esterase ATs) nor in some PATs (protease autotransporters) and PHATs (phosphatase/hydrolase ATs). Structural analysis indicated that an AC was present in passengers exhibiting a single-stranded right-handed parallel β-helix, whatever the type of β-solenoid, but not in those with an α-helical globular fold.
From this investigation, the AC of type 1 appears to be a prevalent and conserved structural element exclusively associated with β-helical AT passengers, and should promote further studies of protein secretion and folding via the T5aSS, especially for α-helical AT passengers.
Affiliation(s)
- Maricarmen Rojas-Lopez
- Université Clermont Auvergne, INRA, UMR454 MEDiS, Clermont-Ferrand, France
- GSK, Siena, Italy
| | - Mohamed A Zorgani
- Université Clermont Auvergne, INRA, UMR454 MEDiS, Clermont-Ferrand, France
| | - Lawrence A Kelley
- Structural Bioinformatics Group, Imperial College London, London, United Kingdom
| | - Xavier Bailly
- Institut National de la Recherche Agronomique, UR346 Epidémiologie Animale, Saint Genès Champanelle, France
| | - Andrey V Kajava
- CRBM UMR5237 CNRS, Institut de Biologie Computationnelle, Université Montpellier, Montpellier, France
| | - Ian R Henderson
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
| | - Fabio Polticelli
- Department of Sciences, National Institute of Nuclear Physics, Roma Tre University, Rome, Italy
| | | | | | - Mickaël Desvaux
- Université Clermont Auvergne, INRA, UMR454 MEDiS, Clermont-Ferrand, France
| |
|
36
|
Laddach A, Ng JCF, Chung SS, Fraternali F. Genetic variants and protein-protein interactions: a multidimensional network-centric view. Curr Opin Struct Biol 2018; 50:82-90. [PMID: 29306755 DOI: 10.1016/j.sbi.2017.12.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/16/2017] [Revised: 12/19/2017] [Accepted: 12/20/2017] [Indexed: 01/18/2023]
Abstract
We review recent progress in the mapping of genetic variants to proteins, in the context of their interactions, as measured from experiments and/or computational predictions. Such variants can impact on the molecular mechanisms underlying an interaction and its stability. We highlight recent work which relies on the effective use of protein-protein interaction networks (PPINs), integrated with 3D structural information, for evaluating disease-associated variants. Furthermore, we discuss how the integration of multiple layers of biological information, in the context of PPINs, can improve the interpretation of genetic variants and inspire new therapeutic strategies.
Affiliation(s)
- Anna Laddach
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
| | - Joseph Chi-Fung Ng
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
| | - Sun Sook Chung
- Randall Division of Cell and Molecular Biophysics, King's College London, UK; Department of Haematological Medicine, King's College London, UK
| | - Franca Fraternali
- Randall Division of Cell and Molecular Biophysics, King's College London, UK.
| |
|
37
|
Isom DG, Page SC, Collins LB, Kapolka NJ, Taghon GJ, Dohlman HG. Coordinated regulation of intracellular pH by two glucose-sensing pathways in yeast. J Biol Chem 2017; 293:2318-2329. [PMID: 29284676 DOI: 10.1074/jbc.ra117.000422] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/12/2017] [Revised: 12/22/2017] [Indexed: 12/19/2022] Open
Abstract
The yeast Saccharomyces cerevisiae employs multiple pathways to coordinate sugar availability and metabolism. Glucose and other sugars are detected by a G protein-coupled receptor, Gpr1, as well as a pair of transporter-like proteins, Rgt2 and Snf3. When glucose is limiting, however, an ATP-driven proton pump (Pma1) is inactivated, leading to a marked decrease in cytoplasmic pH. Here we determine the relative contribution of the two sugar-sensing pathways to pH regulation. Whereas cytoplasmic pH is strongly dependent on glucose abundance and is regulated by both glucose-sensing pathways, ATP is largely unaffected and therefore cannot account for the changes in Pma1 activity. These data suggest that the pH is a second messenger of the glucose-sensing pathways. We show further that different sugars differ in their ability to control cellular acidification, in the manner of inverse agonists. We conclude that the sugar-sensing pathways act via Pma1 to invoke coordinated changes in cellular pH and metabolism. More broadly, our findings support the emerging view that cellular systems have evolved the use of pH signals as a means of adapting to environmental stresses such as those caused by hypoxia, ischemia, and diabetes.
Affiliation(s)
- Daniel G Isom
- From the Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina 27599-7365
- the Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, Florida 33136
| | - Stephani C Page
- From the Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina 27599-7365
| | - Leonard B Collins
- the Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599-7432
| | - Nicholas J Kapolka
- the Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, Florida 33136
| | - Geoffrey J Taghon
- the Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, Florida 33136
| | - Henrik G Dohlman
- From the Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina 27599-7365
| |
|
38
|
González-Sánchez A, Cubillas CA, Miranda F, Dávalos A, García-de Los Santos A. The ropAe gene encodes a porin-like protein involved in copper transit in Rhizobium etli CFN42. Microbiologyopen 2017; 7:e00573. [PMID: 29280343 PMCID: PMC6011978 DOI: 10.1002/mbo3.573] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/01/2017] [Revised: 11/19/2017] [Accepted: 11/21/2017] [Indexed: 11/16/2022] Open
Abstract
Copper (Cu) is an essential micronutrient for all aerobic forms of life. Its oxidation states (Cu+/Cu2+) make this metal an important cofactor of enzymes catalyzing redox reactions in essential biological processes. In gram‐negative bacteria, Cu uptake is an unexplored component of a finely regulated trafficking network, mediated by protein–protein interactions that deliver Cu to target proteins and efflux surplus metal to avoid toxicity. Rhizobium etli CFN42 is a facultative symbiotic diazotroph that must ensure its appropriate Cu supply for living either free in the soil or as an intracellular symbiont of leguminous plants. In crop fields, rhizobia have to contend with copper‐based fungicides. A detailed deletion analysis of the pRet42e (505 kb) plasmid from an R. etli mutant with enhanced CuCl2 tolerance led us to the identification of the ropAe gene, predicted to encode an outer membrane protein (OMP) with a β–barrel channel structure that may be involved in Cu transport. In support of this hypothesis, the functional characterization of ropAe revealed that: (I) gene disruption increased copper tolerance of the mutant, and its complementation with the wild‐type gene restored its wild‐type copper sensitivity; (II) the ropAe gene maintains a low basal transcription level in copper overload, but is upregulated when copper is scarce; (III) disruption of ropAe in an actP (copA) mutant background, defective in copper efflux, partially reduced its copper sensitivity phenotype. Finally, BLASTP comparisons and a maximum likelihood phylogenetic analysis highlight the diversification of four RopA paralogs in members of the Rhizobiaceae family. Orthologs of RopAe are highly conserved in the Rhizobiales order, poorly conserved in other alpha proteobacteria and phylogenetically unrelated to characterized porins involved in Cu or Mn uptake.
Affiliation(s)
- Antonio González-Sánchez
- Programa de Ingeniería Genómica, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Ciro A Cubillas
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Fabiola Miranda
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Araceli Dávalos
- Programa de Ingeniería Genómica, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Alejandro García-de Los Santos
- Programa de Ingeniería Genómica, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| |
|
39
|
Sharan M, Förstner KU, Eulalio A, Vogel J. APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins. Nucleic Acids Res 2017; 45:e96. [PMID: 28334975 PMCID: PMC5499795 DOI: 10.1093/nar/gkx137] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/24/2016] [Accepted: 02/27/2017] [Indexed: 11/14/2022] Open
Abstract
RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
Affiliation(s)
- Malvika Sharan
- Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
| | - Konrad U Förstner
- Core Unit Systems Medicine, University of Würzburg, 97080 Würzburg, Germany
| | - Ana Eulalio
- Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
| | - Jörg Vogel
- Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
| |
|
40
|
Qin G, Xu C, Ming R, Tang H, Guyot R, Kramer EM, Hu Y, Yi X, Qi Y, Xu X, Gao Z, Pan H, Jian J, Tian Y, Yue Z, Xu Y. The pomegranate (Punica granatum L.) genome and the genomics of punicalagin biosynthesis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2017; 91:1108-1128. [PMID: 28654223 DOI: 10.1111/tpj.13625] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 11/23/2016] [Revised: 06/15/2017] [Accepted: 06/21/2017] [Indexed: 05/21/2023]
Abstract
Pomegranate (Punica granatum L.) is a perennial fruit crop grown since ancient times that has been planted worldwide and is known for its functional metabolites, particularly punicalagins. We have sequenced and assembled the pomegranate genome with 328 Mb anchored into nine pseudo-chromosomes and annotated 29 229 gene models. A Myrtales lineage-specific whole-genome duplication event was detected that occurred in the common ancestor before the divergence of pomegranate and Eucalyptus. Repetitive sequences accounted for 46.1% of the assembled genome. We found that the integument development gene INNER NO OUTER (INO) was under positive selection and potentially contributed to the development of the fleshy outer layer of the seed coat, an edible part of pomegranate fruit. The genes encoding the enzymes for synthesis and degradation of lignin, hemicelluloses and cellulose were also differentially expressed between soft- and hard-seeded varieties, reflecting differences in their accumulation in cultivars differing in seed hardness. Candidate genes for punicalagin biosynthesis were identified and their expression patterns indicated that gallic acid synthesis in tissues could follow different biochemical pathways. The genome sequence of pomegranate provides a valuable resource for the dissection of many biological and biochemical traits and also provides important insights for the acceleration of breeding. Elucidation of the biochemical pathway(s) involved in punicalagin biosynthesis could assist breeding efforts to increase production of this bioactive compound.
Affiliation(s)
- Gaihua Qin
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| | - Chunyan Xu
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Ray Ming
- Fujian Agriculture and Forestry University and University of Illinois at Urbana-Champaign School of Integrative Biology Joint Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61822, USA
| | - Haibao Tang
- Fujian Agriculture and Forestry University and University of Illinois at Urbana-Champaign School of Integrative Biology Joint Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Romain Guyot
- Institut de Recherche pour le Développement, Diversité, Adaptation et Développement des Plantes, Montpellier, 34394, France
| | - Elena M Kramer
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Yudong Hu
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Xingkai Yi
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| | - Yongjie Qi
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| | - Xiangyang Xu
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Zhenghui Gao
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| | - Haifa Pan
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| | - Jianbo Jian
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Yinping Tian
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Zhen Yue
- BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Yiliu Xu
- Institute of Horticulture, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
- Key Laboratory of Genetic Improvement and Ecophysiology of Horticultural Crops, Hefei, Anhui Province, 230031, China
| |
|
41
|
Richier B, Vijandi CDM, Mackensen S, Salecker I. Lapsyn controls branch extension and positioning of astrocyte-like glia in the Drosophila optic lobe. Nat Commun 2017; 8:317. [PMID: 28827667 PMCID: PMC5567088 DOI: 10.1038/s41467-017-00384-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/15/2016] [Accepted: 06/21/2017] [Indexed: 11/09/2022] Open
Abstract
Astrocytes have diverse, remarkably complex shapes in different brain regions. Their branches closely associate with neurons. Despite the importance of this heterogeneous glial cell type for brain development and function, the molecular cues controlling astrocyte branch morphogenesis and positioning during neural circuit assembly remain largely unknown. We found that in the Drosophila visual system, astrocyte-like medulla neuropil glia (mng) variants acquire stereotypic morphologies with columnar and layered branching patterns in a stepwise fashion from mid-metamorphosis onwards. Using knockdown and loss-of-function analyses, we uncovered a previously unrecognized role for the transmembrane leucine-rich repeat protein Lapsyn in regulating mng development. lapsyn is expressed in mng and is cell-autonomously required for branch extension into the synaptic neuropil and anchoring of cell bodies at the neuropil border. Lapsyn works in concert with the fibroblast growth factor (FGF) pathway to promote branch morphogenesis, while correct positioning is essential for mng survival mediated by gliotrophic FGF signaling. How glial cells, such as astrocytes, acquire their characteristic morphology during development is poorly understood. Here the authors describe the morphogenesis of astrocyte-like glia in the Drosophila optic lobe, and through an RNAi screen, they identify a transmembrane LRR protein, Lapsyn, that plays a critical role in this process.
Affiliation(s)
- Benjamin Richier
- The Francis Crick Institute, Visual Circuit Assembly Laboratory, 1 Midland Road, London, NW1 1AT, UK
- The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
| | | | - Stefanie Mackensen
- The Francis Crick Institute, Visual Circuit Assembly Laboratory, 1 Midland Road, London, NW1 1AT, UK
- University of Münster, Institute of Neuro- and Behavioral Biology, Badestr. 9, 48149, Muenster, Germany
| | - Iris Salecker
- The Francis Crick Institute, Visual Circuit Assembly Laboratory, 1 Midland Road, London, NW1 1AT, UK
| |
|
42
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| |
|
43
|
Galperin MY, Fernández-Suárez XM, Rigden DJ. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res 2017; 45:D1-D11. [PMID: 28053160 PMCID: PMC5210597 DOI: 10.1093/nar/gkw1188] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 11/11/2016] [Accepted: 11/16/2016] [Indexed: 12/23/2022] Open
Abstract
This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein-protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as 'breakthrough' contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the 'golden set' of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/.
Affiliation(s)
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
| |
|
44
|
CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. Methods Mol Biol 2017; 1558:79-110. [PMID: 28150234 DOI: 10.1007/978-1-4939-6783-4_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/05/2022]
Abstract
This chapter describes the generation of the data in the CATH-Gene3D online resource and how it can be used to study protein domains and their evolutionary relationships. Methods will be presented for: comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide on using the webpages.
|
45
|
Abstract
Protein function is a concept that can have different interpretations in different biological contexts, and the number and diversity of novel proteins identified by large-scale "omics" technologies pose increasingly new challenges. In this review we explore current strategies used to predict protein function, focusing on high-throughput sequence analysis: for example, inference based on sequence similarity, sequence composition, structure, and protein-protein interaction. Various prediction strategies are discussed together with illustrative workflows highlighting the use of some benchmark tools and knowledge bases in the field.
Affiliation(s)
- Leonardo Magalhães Cruz
- Department of Biochemistry and Molecular Biology, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
| | - Sheyla Trefflich
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| | - Vinícius Almir Weiss
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| | - Mauro Antônio Alves Castro
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| |
|
46
|
Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Chang HY, Dosztányi Z, El-Gebali S, Fraser M, Gough J, Haft D, Holliday GL, Huang H, Huang X, Letunic I, Lopez R, Lu S, Marchler-Bauer A, Mi H, Mistry J, Natale DA, Necci M, Nuka G, Orengo CA, Park Y, Pesseat S, Piovesan D, Potter SC, Rawlings ND, Redaschi N, Richardson L, Rivoire C, Sangrador-Vegas A, Sigrist C, Sillitoe I, Smithers B, Squizzato S, Sutton G, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Xenarios I, Yeh LS, Young SY, Mitchell AL. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res 2016; 45:D190-D199. [PMID: 27899635 PMCID: PMC5210578 DOI: 10.1093/nar/gkw1107] [Citation(s) in RCA: 1019] [Impact Index Per Article: 127.4] [Received: 10/24/2016] [Accepted: 10/27/2016] [Indexed: 02/07/2023] Open
Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Affiliation(s)
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Patricia C Babbitt
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peer Bork
- European Molecular Biology Laboratory, Biocomputing, Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Alan J Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Pázmány Péter sétány 1/c, Budapest, Hungary
| | - Sara El-Gebali
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julian Gough
- Computer Science department, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
| | - David Haft
- Bioinformatics Department, J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - Gemma L Holliday
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Xiaosong Huang
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Ivica Letunic
- Biobyte Solutions GmbH, Bothestr. 142, 69126 Heidelberg, Germany
| | - Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Shennan Lu
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Darren A Natale
- Georgetown University Medical Center, 3300 Whitehaven St, NW, Washington, DC 20007, USA
| | - Marco Necci
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christine A Orengo
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
| | - Youngmi Park
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Damiano Piovesan
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Neil D Rawlings
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Lorna Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Catherine Rivoire
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christian Sigrist
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Ian Sillitoe
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
| | - Ben Smithers
- Computer Science department, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
| | - Silvano Squizzato
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Granger Sutton
- Bioinformatics Department, J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Silvio C E Tosatto
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- CNR Institute of Neuroscience, via U. Bassi 58/b, 35131 Padua, Italy
| | - Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Ioannis Xenarios
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Lai-Su Yeh
- Georgetown University Medical Center, 3300 Whitehaven St, NW, Washington, DC 20007, USA
| | - Siew-Yit Young
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alex L Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
47
|
Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, Orengo CA, Sillitoe I. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res 2016; 45:D289-D295. [PMID: 27899584 PMCID: PMC5210570 DOI: 10.1093/nar/gkw1098] [Citation(s) in RCA: 242] [Impact Index Per Article: 30.3] [Received: 09/16/2016] [Revised: 10/25/2016] [Accepted: 10/27/2016] [Indexed: 01/05/2023] Open
Abstract
The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; and the redesign of the web pages and download site.
Affiliation(s)
- Natalie L Dawson
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Tony E Lewis
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Sayoni Das
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Jonathan G Lees
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - David Lee
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Paul Ashford
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| |
|
48
|
Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
|