1
Bahadır S, Abdulla MF, Mostafa K, Kavas M, Hacıkamiloğlu S, Kurt O, Yıldırım K. Exploring the role of FAT genes in Solanaceae species through genome-wide analysis and genome editing. The Plant Genome 2024; 17:e20506. [PMID: 39253757 PMCID: PMC11628882 DOI: 10.1002/tpg2.20506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Revised: 07/08/2024] [Accepted: 08/04/2024] [Indexed: 09/11/2024]
Abstract
Plants produce numerous fatty acid derivatives, and some of these compounds have significant regulatory functions, such as governing effector-induced resistance, systemic resistance, and other defense pathways. This study systematically identified and characterized eight FAT (acyl-acyl carrier protein thioesterase) genes, four in the Solanum lycopersicum genome and four in the Solanum tuberosum genome. Phylogenetic analysis classified these genes into four distinct groups, exhibiting conserved domain structures across different plant species. Promoter analysis revealed various cis-acting elements, most of which are associated with stress responsiveness and with growth and development. Micro-RNA (miRNA) analysis identified specific miRNAs, notably miRNA166, targeting different FAT genes in both species. Using clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9)-mediated knockout, mutant lines for SlFATB1 and SlFATB3 were successfully generated and exhibited diverse mutation types. Biochemical evaluation of selected mutant lines revealed significant changes in fatty acid composition, with variations in linoleic and linolenic acid content. The study also explored the impact of FAT gene knockout on tomato leaf architecture through scanning electron microscopy, providing insights into potential morphological alterations. Knocking out FAT genes resulted in a significant reduction in both trichome and stomatal density. These findings contribute to a comprehensive understanding of FAT genes in Solanaceous species, encompassing genetic, functional, and phenotypic aspects.
Affiliation(s)
- Sibel Bahadır
  - Faculty of Agriculture, Department of Agricultural Biotechnology, Ondokuz Mayis University, Samsun, Turkey
- Mohamed Farah Abdulla
  - Faculty of Agriculture, Department of Agricultural Biotechnology, Ondokuz Mayis University, Samsun, Turkey
- Karam Mostafa
  - Faculty of Agriculture, Department of Agricultural Biotechnology, Ondokuz Mayis University, Samsun, Turkey
  - The Central Laboratory for Date Palm Research and Development, Agricultural Research Center (ARC), Giza, Egypt
- Musa Kavas
  - Faculty of Agriculture, Department of Agricultural Biotechnology, Ondokuz Mayis University, Samsun, Turkey
- Safa Hacıkamiloğlu
  - Faculty of Agriculture, Department of Field Crops, Ondokuz Mayis University, Samsun, Turkey
- Orhan Kurt
  - Faculty of Agriculture, Department of Field Crops, Ondokuz Mayis University, Samsun, Turkey
- Kubilay Yıldırım
  - Faculty of Science, Department of Molecular Biology and Genetics, Ondokuz Mayis University, Samsun, Turkey
2
Klemm P, Stadler PF, Lechner M. Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs. Frontiers in Bioinformatics 2023; 3:1322477. [PMID: 38152702 PMCID: PMC10751348 DOI: 10.3389/fbinf.2023.1322477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 11/06/2023] [Indexed: 12/29/2023] Open
Abstract
Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy (the pseudo-reciprocal best alignment heuristic) that reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
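The reciprocal best alignment idea named in this abstract can be illustrated with a toy sketch. It shows only the plain reciprocal-best-hit rule between two species (a pair is kept when each protein is the other's top-scoring match); it is not Proteinortho's implementation and does not reproduce the pseudo-reciprocal variant or the clustering step. The function names, the score dictionaries, and the protein identifiers are hypothetical and chosen for illustration.

```python
def best_hits(scores):
    """Map each query protein to its single best-scoring target.

    `scores` is a hypothetical dict {(query, target): alignment_score}.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)   # species A searched against species B
    reverse = best_hits(b_vs_a)   # species B searched against species A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy alignment scores (invented values, not real data).
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 90.0}
b_vs_a = {("b1", "a1"): 195.0, ("b2", "a2"): 170.0, ("b2", "a1"): 40.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, a3 finds b1 as its best hit, but b1 prefers a1, so a3 is left without a reciprocal partner. Note that the sketch needs both search directions, which is the all vs. all cost the abstract says the revised strategy cuts in half.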
Affiliation(s)
- Paul Klemm
  - Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Marburg, Germany
- Peter F. Stadler
  - Bioinformatics Group, Institute of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, United States
- Marcus Lechner
  - Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Marburg, Germany
3
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023; 18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023] Open
Abstract
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), a complete understanding of the potential of the encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far away the goal of a full list of S. cerevisiae gene functions still is. RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via mentions of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~230 elite genes with ≥ 75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Yet, these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, literature for already well-studied genes (with a threshold T10 (≥ 10 FPEs) and higher) continues to grow steadily. CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data show that the moment of publishing the full genome in reality coincides with the onset of decline of gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
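The per-article ratio described in this abstract can be made concrete with a small sketch. It assumes that a gene's FPE total is the sum, over articles, of that gene's share of all gene-name mentions in each article; the abstract states the ratio but not the aggregation, so the summation is an assumption, as are the function name, the input layout, and the placeholder gene names.

```python
from collections import Counter, defaultdict


def full_publication_equivalents(articles):
    """Accumulate per-gene FPE totals.

    `articles` is a hypothetical list in which each article is the list of
    gene names found in its text, one entry per mention.
    """
    fpe = defaultdict(float)
    for mentions in articles:
        counts = Counter(mentions)
        total = sum(counts.values())
        if total == 0:
            continue  # article mentions no gene names, so it adds nothing
        for gene, n in counts.items():
            # Share of this article's gene-name mentions that go to `gene`.
            fpe[gene] += n / total
    return dict(fpe)


# Toy corpus with placeholder gene names (invented, not real data).
articles = [
    ["geneA", "geneA", "geneB", "geneA"],  # geneA gets 3/4, geneB gets 1/4
    ["geneA"],                             # a dedicated article: geneA gets 1
    ["geneB", "geneC"],                    # each gene gets 1/2
]

fpe = full_publication_equivalents(articles)
print(fpe)
```

Under this assumption, each article with at least one gene mention contributes exactly 1 in total, so the FPE values across all genes add up to the number of such articles.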
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
- Swati Sinha
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
4
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biol Direct 2023; 18:7. [PMID: 36855185 PMCID: PMC9976479 DOI: 10.1186/s13062-023-00362-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/11/2022] [Accepted: 02/21/2023] [Indexed: 03/02/2023] Open
Abstract
BACKGROUND: Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims at quantifying the illumination of the E. coli gene function space by the scientific literature and how close we are to the goal of a complete list of E. coli gene functions. RESULTS: The scientific literature about E. coli protein-coding genes has been mapped onto the genome via mentions of names for genomic regions in scientific articles, both for strain K-12 MG1655 and for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genomes. The article match was quantified with the ratio of a given gene name's occurrences to the mentions of any gene names in the paper. The various genome regions have an extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs; FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion's share of the papers. For K-12, ~65% of the literature covers just 342 elite genes; for the softcore genome, ~68% of the FPEs concern only 342 elite gene families (GFs). We also find that most genes/GFs are mentioned at least once in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005-2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for already well-studied genes started to grow explosively with threshold T10 (≥ 10 FPEs). Typically, a body of ~20 actual articles generated over ~15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli, together with genomic co-localization and sequence-analytic exploration, hints that the previously completely uncharacterized genes yahV and yddL are associated with osmotic stress response/motility mechanisms. CONCLUSION: If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are in reach within the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears overcome and steady incremental research becomes possible.
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Swati Sinha
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
5
Capel H, Weiler R, Dijkstra M, Vleugels R, Bloem P, Feenstra KA. ProteinGLUE multi-task benchmark suite for self-supervised protein modeling. Sci Rep 2022; 12:16047. [PMID: 36163232 PMCID: PMC9512797 DOI: 10.1038/s41598-022-19608-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/17/2021] [Accepted: 08/31/2022] [Indexed: 11/09/2022] Open
Abstract
Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue .
Affiliation(s)
- Henriette Capel
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Robin Weiler
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Maurits Dijkstra
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Reinier Vleugels
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Peter Bloem
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- K Anton Feenstra
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
6
Escorcia-Rodríguez JM, Esposito M, Freyre-González JA, Moreno-Hagelsieb G. Non-synonymous to synonymous substitutions suggest that orthologs tend to keep their functions, while paralogs are a source of functional novelty. PeerJ 2022; 10:e13843. [PMID: 36065404 PMCID: PMC9440661 DOI: 10.7717/peerj.13843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/25/2020] [Accepted: 07/14/2022] [Indexed: 01/18/2023] Open
Abstract
Orthologs separate after lineages split from each other and paralogs after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used five working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. The results showed that orthologs, by all definitions tested, had values of dN/dS noticeably lower than those of paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. The differences in dN/dS ratios, and thus the suggested functional stability of orthologs, remained after eliminating gene comparisons with potential problems, such as genes with high codon usage biases, low coverage of either of the aligned sequences, or sequences with very high similarities. Separation by percent identity of the encoded proteins showed that the differences between the dN/dS ratios of orthologs and paralogs were more evident at high sequence identity and less so as identity dropped. These last results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggest that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that choosing orthologs as probably functionally coherent remains the right approach in comparative genomics.
Affiliation(s)
- Juan M. Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Mario Esposito
  - Department of Biology, Wilfrid Laurier University, Waterloo, Canada
- Julio A. Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
7
Phyletic Distribution and Diversification of the Phage Shock Protein Stress Response System in Bacteria and Archaea. mSystems 2022; 7:e0134821. [PMID: 35604119 PMCID: PMC9239133 DOI: 10.1128/msystems.01348-21] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022] Open
Abstract
The PspA protein domain is found in all domains of life, highlighting its central role in Psp networks. To date, all insights into the core functions of Psp responses derive mainly from protein network blueprints representing only three bacterial phyla.
8
Abstract
Behavior genetics is a controversial science. For decades, scholars have sought to understand the role of heredity in human behavior and life-course outcomes. Recently, technological advances and the rapid expansion of genomic databases have facilitated the discovery of genes associated with human phenotypes such as educational attainment and substance use disorders. To maximize the potential of this flourishing science, and to minimize potential harms, careful analysis of what it would mean for genes to be causes of human behavior is needed. In this paper, we advance a framework for identifying instances of genetic causes, interpreting those causal relationships, and applying them to advance causal knowledge more generally in the social sciences. Central to thinking about genes as causes is counterfactual reasoning, the cornerstone of causal thinking in statistics, medicine, and philosophy. We argue that within-family genetic effects represent the product of a counterfactual comparison in the same way as average treatment effects (ATEs) from randomized controlled trials (RCTs). Both ATEs from RCTs and within-family genetic effects are shallow causes: They operate within intricate causal systems (non-unitary), produce heterogeneous effects across individuals (non-uniform), and are not mechanistically informative (non-explanatory). Despite these limitations, shallow causal knowledge can be used to improve understanding of the etiology of human behavior and to explore sources of heterogeneity and fade-out in treatment effects.
Affiliation(s)
- James W Madole
  - Department of Psychology, University of Texas at Austin, Austin, TX, USA
  - VA Puget Sound Health Care System, Seattle, WA, USA
- K Paige Harden
  - Department of Psychology, University of Texas at Austin, Austin, TX, USA
9
Zhao C, Liu T, Wang Z. Functional Similarities of Protein-Coding Genes in Topologically Associating Domains and Spatially-Proximate Genomic Regions. Genes (Basel) 2022; 13:genes13030480. [PMID: 35328034 PMCID: PMC8951421 DOI: 10.3390/genes13030480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Revised: 02/26/2022] [Accepted: 03/05/2022] [Indexed: 02/01/2023] Open
Abstract
Topologically associating domains (TADs) are the structural and functional units of the genome. However, the functions of protein-coding genes existing in the same or different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes existing in the same TAD and between different TADs, and also in the same gap region (the region between two consecutive TADs) and between different gap regions. We found that the protein-coding genes from the same TAD or gap region are more likely to share similar protein functions, and this trend is more obvious with TADs than the gap regions. We further created two types of gene–gene spatial interaction networks: the first type is based on Hi-C contacts, whereas the second type is based on both Hi-C contacts and the relationship of being in the same TAD. A graph auto-encoder was applied to learn the network topology, reconstruct the two types of networks, and predict the functions of the central genes/nodes based on the functions of the neighboring genes/nodes. It was found that better performance was achieved with the second type of network. Furthermore, we detected long-range spatially-interactive regions based on Hi-C contacts and calculated the functional similarities of the gene pairs from these regions.
10
Zhang H, Ouyang Z, Zhao N, Han S, Zheng S. Transcriptional Regulation of the Creatine Utilization Genes of Corynebacterium glutamicum ATCC 14067 by AmtR, a Central Nitrogen Regulator. Front Bioeng Biotechnol 2022; 10:816628. [PMID: 35223787 PMCID: PMC8864220 DOI: 10.3389/fbioe.2022.816628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 01/13/2022] [Indexed: 11/23/2022] Open
Abstract
In the genus Corynebacterium, AmtR is a key component of the nitrogen regulatory system, and it belongs to the TetR family of transcription regulators. There has been much research on AmtR structure, functions, and regulons in the type strain C. glutamicum ATCC 13032, but little research in other C. glutamicum strains. In this study, chromatin immunoprecipitation and massively parallel DNA sequencing (ChIP-seq) was performed to identify the AmtR regulon in C. glutamicum ATCC 14067. Ten peaks were obtained in the C. glutamicum ATCC 14067 genome including two new peaks related to three operons (RS_01910-RS_01915, RS_15995, and RS_16000). The interactions between AmtR and the promoter regions of the three operons were confirmed by electrophoretic mobility shift assays (EMSAs). The RS_01910, RS_01915, RS_15995, and RS_16000 are not present in the type strain C. glutamicum ATCC 13032. Sequence analysis indicates that RS_01910, RS_01915, RS_15995, and RS_16000, are related to the degradation of creatine and creatinine; RS_01910 may encode a protein related to creatine transport. The genes RS_01910, RS_01915, RS_15995, and RS_16000 were given the names crnA, creT, cshA, and hyuB, respectively. Real-time quantitative PCR (RT-qPCR) analysis and sfGFP (superfolder green fluorescent protein) analysis reveal that AmtR directly and negatively regulates the transcription and expression of crnA, creT, cshA, and hyuB. A growth test shows that C. glutamicum ATCC 14067 can use creatine or creatinine as a sole nitrogen source. In comparison, a creT deletion mutant strain is able to grow on creatinine but loses the ability to grow on creatine. This study provides the first genome-wide captures of the dynamics of in vivo AmtR binding events and the regulatory network they define. These elements provide more options for synthetic biology by extending the scope of the AmtR regulon.
Affiliation(s)
- Hao Zhang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Research Center of Industrial Enzyme and Green Manufacturing Technology, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Zhilin Ouyang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Research Center of Industrial Enzyme and Green Manufacturing Technology, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Nannan Zhao
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Research Center of Industrial Enzyme and Green Manufacturing Technology, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Shuangyan Han
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Research Center of Industrial Enzyme and Green Manufacturing Technology, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Suiping Zheng
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Research Center of Industrial Enzyme and Green Manufacturing Technology, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
11
Mansoor M, Nauman M, Ur Rehman H, Benso A. Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction. Soft comput 2022. [DOI: 10.1007/s00500-021-06707-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
12
Sales-Lee J, Perry DS, Bowser BA, Diedrich JK, Rao B, Beusch I, Yates JR, Roy SW, Madhani HD. Coupling of spliceosome complexity to intron diversity. Curr Biol 2021; 31:4898-4910.e4. [PMID: 34555349 DOI: 10.1016/j.cub.2021.09.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/26/2021] [Revised: 08/17/2021] [Accepted: 09/01/2021] [Indexed: 10/20/2022]
Abstract
We determined that over 40 spliceosomal proteins are conserved between many fungal species and humans but were lost during the evolution of S. cerevisiae, an intron-poor yeast with unusually rigid splicing signals. We analyzed null mutations in a subset of these factors, most of which had not been investigated previously, in the intron-rich yeast Cryptococcus neoformans. We found they govern splicing efficiency of introns with divergent spacing between intron elements. Importantly, most of these factors also suppress usage of weak nearby cryptic/alternative splice sites. Among these, orthologs of GPATCH1 and the helicase DHX35 display correlated functional signatures and copurify with each other as well as components of catalytically active spliceosomes, identifying a conserved G patch/helicase pair that promotes splicing fidelity. We propose that a significant fraction of spliceosomal proteins in humans and most eukaryotes are involved in limiting splicing errors, potentially through kinetic proofreading mechanisms, thereby enabling greater intron diversity.
Affiliation(s)
- Jade Sales-Lee
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
- Daniela S Perry
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
- Bradley A Bowser
  - Department of Molecular and Cellular Biology, University of California, Merced, Merced, CA 95343, USA
- Jolene K Diedrich
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
- Beiduo Rao
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
- Irene Beusch
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
- John R Yates
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
- Scott W Roy
  - Department of Biology, San Francisco State University, San Francisco, CA 94132, USA
- Hiten D Madhani
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
13
|
Fabritius AS, Bayless BA, Li S, Stoddard D, Heydeck W, Ebmeier CC, Anderson L, Gunnels T, Nachiappan C, Whittall JB, Old W, Agard DA, Nicastro D, Winey M. Proteomic analysis of microtubule inner proteins (MIPs) in Rib72 null Tetrahymena cells reveals functional MIPs. Mol Biol Cell 2021; 32:br8. [PMID: 34406789 PMCID: PMC8693976 DOI: 10.1091/mbc.e20-12-0786] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022] Open
Abstract
The core structure of motile cilia and flagella, the axoneme, is built from a stable population of doublet microtubules. This unique stability is brought about, at least in part, by a network of microtubule inner proteins (MIPs) that are bound to the luminal side of the microtubule walls. Rib72A and Rib72B were identified as MIPs in the motile cilia of the protist Tetrahymena thermophila. Loss of these proteins leads to ciliary defects and loss of additional MIPs. We performed mass spectrometry coupled with proteomic analysis and bioinformatics to identify the MIPs lost in RIB72A/B knockout Tetrahymena axonemes. We identified a number of candidate MIPs and pursued one, Fap115, for functional characterization. We find that loss of Fap115 results in disrupted cell swimming and aberrant ciliary beating. Cryo-electron tomography reveals that Fap115 localizes to MIP6a in the A-tubule of the doublet microtubules. Overall, our results highlight the complex relationship between MIPs, ciliary structure, and ciliary function.
Affiliation(s)
- Amy S Fabritius
  - Department of Molecular and Cellular Biology, University of California Davis, Davis, CA 95616
- Brian A Bayless
  - Department of Molecular and Cellular Biology, University of California Davis, Davis, CA 95616
  - Department of Biology, Santa Clara University, Santa Clara, CA 95053
- Sam Li
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158
- Daniel Stoddard
  - Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Westley Heydeck
  - Department of Molecular Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO 80309
- Christopher C Ebmeier
  - Department of Molecular Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO 80309
- Lauren Anderson
  - Department of Molecular and Cellular Biology, University of California Davis, Davis, CA 95616
- Tess Gunnels
  - Department of Biology, Santa Clara University, Santa Clara, CA 95053
- Justen B Whittall
  - Department of Biology, Santa Clara University, Santa Clara, CA 95053
- William Old
  - Department of Molecular Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO 80309
- David A Agard
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158
- Daniela Nicastro
  - Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Mark Winey
  - Department of Molecular and Cellular Biology, University of California Davis, Davis, CA 95616
14
|
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Chris P Ponting
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
15
|
Finding functional associations between prokaryotic virus orthologous groups: a proof of concept. BMC Bioinformatics 2021; 22:438. [PMID: 34525942 PMCID: PMC8442406 DOI: 10.1186/s12859-021-04343-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Accepted: 08/27/2021] [Indexed: 02/02/2023] Open
Abstract
Background: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function. Results: In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and an area under the receiver operating characteristic curve (AUROC) score of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated. Conclusions: We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04343-w.
16
|
The Xanthomonas RaxH-RaxR Two-Component Regulatory System Is Orthologous to the Zinc-Responsive Pseudomonas ColS-ColR System. Microorganisms 2021; 9:microorganisms9071458. [PMID: 34361895 PMCID: PMC8306577 DOI: 10.3390/microorganisms9071458] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Revised: 06/30/2021] [Accepted: 07/02/2021] [Indexed: 01/08/2023] Open
Abstract
Genome sequence comparisons to infer likely gene functions require accurate ortholog assignments. In Pseudomonas spp., the sensor-regulator ColS-ColR two-component regulatory system responds to zinc and other metals to control certain membrane-related functions, including lipid A remodeling. In Xanthomonas spp., three different two-component regulatory systems, RaxH-RaxR, VgrS-VgrR, and DetS-DetR, have been denoted as ColS-ColR in several different genome annotations and publications. To clarify these assignments, we compared the sensor periplasmic domain sequences and found that those from Pseudomonas ColS and Xanthomonas RaxH share a similar size as well as the location of a Glu-X-X-Glu metal ion-binding motif. Furthermore, we determined that three genes adjacent to raxRH are predicted to encode enzymes that remodel the lipid A component of lipopolysaccharide. The modifications catalyzed by lipid A phosphoethanolamine transferase (EptA) and lipid A 1-phosphatase (LpxE) previously were detected in lipid A from multiple Xanthomonas spp. The third gene encodes a predicted lipid A glycosyl transferase (ArnT). Together, these results indicate that the Xanthomonas RaxH-RaxR system is orthologous to the Pseudomonas ColS-ColR system that regulates lipid A remodeling. To avoid future confusion, we recommend that the terms ColS and ColR no longer be applied to Xanthomonas spp., and that the Vgr, Rax, and Det designations be used instead.
17
|
Collins DH, Wirén A, Labédan M, Smith M, Prince DC, Mohorianu I, Dalmay T, Bourke AFG. Gene expression during larval caste determination and differentiation in intermediately eusocial bumblebees, and a comparative analysis with advanced eusocial honeybees. Mol Ecol 2021; 30:718-735. [PMID: 33238067 PMCID: PMC7898649 DOI: 10.1111/mec.15752] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/05/2020] [Revised: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 12/19/2022]
Abstract
The queen‐worker caste system of eusocial insects represents a prime example of developmental polyphenism (environmentally‐induced phenotypic polymorphism) and is intrinsic to the evolution of advanced eusociality. However, the comparative molecular basis of larval caste determination and subsequent differentiation in the eusocial Hymenoptera remains poorly known. To address this issue within bees, we profiled caste‐associated gene expression in female larvae of the intermediately eusocial bumblebee Bombus terrestris. In B. terrestris, female larvae experience a queen‐dependent period during which their caste fate as adults is determined followed by a nutrition‐sensitive period also potentially affecting caste fate but for which the evidence is weaker. We used mRNA‐seq and qRT‐PCR validation to isolate genes differentially expressed between each caste pathway in larvae at developmental stages before and after each of these periods. We show that differences in gene expression between caste pathways are small in totipotent larvae, then peak after the queen‐dependent period. Relatively few novel (i.e., taxonomically‐restricted) genes were differentially expressed between castes, though novel genes were significantly enriched in late‐instar larvae in the worker pathway. We compared sets of caste‐associated genes in B. terrestris with those reported from the advanced eusocial honeybee, Apis mellifera, and found significant but relatively low levels of overlap of gene lists between the two species. These results suggest both the existence of low numbers of shared toolkit genes and substantial divergence in caste‐associated genes between Bombus and the advanced eusocial Apis since their last common eusocial ancestor.
Affiliation(s)
- David H Collins
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- Anders Wirén
  - School of Biological Sciences, University of East Anglia, Norwich, UK
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
- Marjorie Labédan
  - School of Biological Sciences, University of East Anglia, Norwich, UK
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Michael Smith
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- David C Prince
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- Irina Mohorianu
  - School of Biological Sciences, University of East Anglia, Norwich, UK
  - Jeffrey Cheah Biomedical Centre, WT-MRC Cambridge Stem Cell Institute, Cambridge, UK
- Tamas Dalmay
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- Andrew F G Bourke
  - School of Biological Sciences, University of East Anglia, Norwich, UK
18
|
Orosz F. On the TPPP-like proteins of flagellated fungi. Fungal Biol 2020; 125:357-367. [PMID: 33910677 DOI: 10.1016/j.funbio.2020.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/18/2020] [Revised: 12/02/2020] [Accepted: 12/06/2020] [Indexed: 12/12/2022]
Abstract
TPPP-like proteins, which stabilize microtubules, constitute a eukaryotic superfamily characterized by the presence of the p25alpha domain. TPPPs in the strict sense are present in animals except Trichoplax adhaerens, which instead contains apicortin, in which a part of the p25alpha domain is combined with a DCX domain. Apicortin is absent in other animals and occurs mostly in the protozoan phylum Apicomplexa. A strong correlation between the occurrence of the p25alpha domain and that of the eukaryotic cilium/flagellum has been suggested. Species of the deeper-branching clades of Fungi possess a flagellum but others have lost it, so investigating fungal genomes can help test this suggestion. Indeed, these proteins are present in early-branching Fungi. Both TPPP and apicortin are present in Rozellomycota (Cryptomycota) and Chytridiomycota, TPPP in Blastocladiomycota, and apicortin in Neocallimastigomycota, Monoblepharomycota and the non-flagellated Mucoromycota. Besides the "normal" TPPP occurring in animals, a special, fungal-type TPPP, in which a part of the p25alpha domain is duplicated, is also present in Fungi. Dikarya, the most developed subkingdom of Fungi, lacks both flagellum and TPPPs. These findings strengthen the view that every ciliated/flagellated organism contains p25alpha domain-containing proteins, whereas very few non-flagellated organisms do.
Affiliation(s)
- Ferenc Orosz
  - Institute of Enzymology, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, 1117, Budapest, Hungary
19
|
Identification of Uncharacterized Components of Prokaryotic Immune Systems and Their Diverse Eukaryotic Reformulations. J Bacteriol 2020; 202:JB.00365-20. [PMID: 32868406 DOI: 10.1128/jb.00365-20] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 06/24/2020] [Accepted: 08/25/2020] [Indexed: 12/19/2022] Open
Abstract
Nucleotide-activated effector deployment, prototyped by interferon-dependent immunity, is a common mechanistic theme shared by immune systems of several animals and prokaryotes. Prokaryotic versions include CRISPR-Cas with the CRISPR polymerase domain, their minimal variants, and systems with second messenger oligonucleotide or dinucleotide synthetase (SMODS). Cyclic or linear oligonucleotide signals in these systems help set a threshold for the activation of potentially deleterious downstream effectors in response to invader detection. We establish such a regulatory mechanism to be a more general principle of immune systems, which can also operate independently of such messengers. Using sensitive sequence analysis and comparative genomics, we identify 12 new prokaryotic immune systems, which we unify by this principle of threshold-dependent effector activation. These display regulatory mechanisms paralleling physiological signaling based on 3'-5' cyclic mononucleotides, NAD+-derived messengers, two- and one-component signaling that includes histidine kinase-based signaling, and proteolytic activation. Furthermore, these systems allowed the identification of multiple new sensory signal sensory components, such as a tetratricopeptide repeat (TPR) scaffold predicted to recognize NAD+-derived signals, unreported versions of the STING domain, prokaryotic YEATS domains, and a predicted nucleotide sensor related to receiver domains. We also identify previously unrecognized invader detection components and effector components, such as prokaryotic versions of the Wnt domain. Finally, we show that there have been multiple acquisitions of unidentified STING domains in eukaryotes, while the TPR scaffold was incorporated into the animal immunity/apoptosis signal-regulating kinase (ASK) signalosome. IMPORTANCE: Both prokaryotic and eukaryotic immune systems face the dangers of premature activation of effectors and degradation of self-molecules in the absence of an invader. To mitigate this, they have evolved threshold-setting regulatory mechanisms for the triggering of effectors only upon the detection of a sufficiently strong invader signal. This work defines general templates for such regulation in effector-based immune systems. Using this, we identify several previously uncharacterized prokaryotic immune mechanisms that accomplish the regulation of downstream effector deployment by using nucleotide, NAD+-derived, two-component, and one-component signals paralleling physiological homeostasis. This study has also helped identify several previously unknown sensor and effector modules in these systems. Our findings also augment the growing evidence for the emergence of key animal immunity and chromatin regulatory components from prokaryotic progenitors.
20
|
Raue S, Fan SH, Rosenstein R, Zabel S, Luqman A, Nieselt K, Götz F. The Genome of Staphylococcus epidermidis O47. Front Microbiol 2020; 11:2061. [PMID: 32983045 PMCID: PMC7477909 DOI: 10.3389/fmicb.2020.02061] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/03/2020] [Accepted: 08/05/2020] [Indexed: 12/21/2022] Open
Abstract
The skin-colonizing, coagulase-negative Staphylococcus epidermidis causes nosocomial infections and is an important opportunistic and highly adaptable pathogen. To gain more insight into this species, we sequenced the genome of the biofilm-positive, methicillin-susceptible S. epidermidis O47 strain (hereafter O47). This strain belongs to the most frequently isolated sequence type 2. In contrast to the RP62A strain, O47 can be transformed, which makes it a preferred strain for molecular studies. S. epidermidis O47’s genome has a single chromosome of about 2.5 million base pairs and no plasmid. Its oriC sequence has the same directionality as S. epidermidis RP62A, S. carnosus, S. haemolyticus, S. saprophyticus and is inverted in comparison to Staphylococcus aureus and S. epidermidis ATCC 12228. A phylogenetic analysis based on all S. epidermidis genomes currently available at GenBank revealed that O47 is most closely related to DAR1907. The genome of O47 contains genes for the typical global regulatory systems known in staphylococci. In addition, it contains most of the genes encoding the typical virulence factors of S. epidermidis but not those of S. aureus, with the exception of a putative hemolysin III. O47 has the typical S. epidermidis genetic islands and several mobile genetic elements, which include a staphylococcal cassette chromosome (SCC) of about 54 kb in length and two prophages, φO47A and φO47B. However, its genome has no transposons and the smallest number of insertion sequence (IS) elements compared to the other known S. epidermidis genomes. By sequencing and analyzing the genome of O47, we provide the basis for its utilization in genetic and molecular studies of biofilm formation.
Affiliation(s)
- Stefan Raue
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
  - Microbial Genetics, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Tübingen, Germany
- Sook-Ha Fan
  - Microbial Genetics, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Tübingen, Germany
- Ralf Rosenstein
  - Infection Biology, Interfaculty Institute for Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Tübingen, Germany
- Susanne Zabel
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Arif Luqman
  - Microbial Genetics, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Tübingen, Germany
  - Biology Department, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
- Kay Nieselt
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Friedrich Götz
  - Microbial Genetics, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Tübingen, Germany
21
|
Cantini L, Kairov U, de Reyniès A, Barillot E, Radvanyi F, Zinovyev A. Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics 2020; 35:4307-4313. [PMID: 30938767 PMCID: PMC6821374 DOI: 10.1093/bioinformatics/btz225] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/14/2018] [Revised: 03/20/2019] [Accepted: 04/01/2019] [Indexed: 12/26/2022] Open
Abstract
Motivation: Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. Lack of this knowledge might have a crucial impact when generalizing the predictions made in a study to others. Results: We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts of evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs in order to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in the between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the Stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC) for which 14 independent transcriptomic datasets can be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or robust and tumor-type specific transcriptomic signatures of tumoral cells or tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. Availability and implementation: The RBH construction tool is available from http://goo.gl/DzpwYp. Supplementary information: Supplementary data are available at Bioinformatics online.
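The reciprocal-best-hit idea in this abstract can be illustrated with a small sketch. It is not the authors' tool: it assumes, purely for illustration, that components from two datasets are stored as weight vectors over the same ordered gene list and are compared by absolute Pearson correlation. The function names and the toy vectors are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def reciprocal_best_hits(comps_a, comps_b):
    """Return pairs (i, j) where j is i's best partner and i is j's best partner.

    Similarity is the absolute Pearson correlation (an illustrative choice).
    """
    score = {
        (i, j): abs(pearson(u, v))
        for i, u in comps_a.items()
        for j, v in comps_b.items()
    }
    best_for_a = {i: max(comps_b, key=lambda j: score[(i, j)]) for i in comps_a}
    best_for_b = {j: max(comps_a, key=lambda i: score[(i, j)]) for j in comps_b}
    return sorted((i, j) for i, j in best_for_a.items() if best_for_b[j] == i)


# Hypothetical components over five shared genes.
dataset_a = {
    "A1": [1, 2, 3, 4, 5],
    "A2": [1, -1, 1, -1, 1],
    "A3": [1, 2, 3, 4, 6],  # close to A1, so it competes for the same partner
}
dataset_b = {
    "B1": [-2, -4, -6, -8, -10],  # sign-flipped copy of A1
    "B2": [2, -2, 2, -2, 3],
}

rbh_edges = reciprocal_best_hits(dataset_a, dataset_b)
print(rbh_edges)
```

In this toy input A3 also picks B1 as its best partner, but B1 prefers A1, so A3 contributes no edge. Taking the absolute value treats a sign-flipped component as the same signal, which is an assumption of this sketch.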
Affiliation(s)
- Laura Cantini
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
  - Computational Systems Biology Team, Institut de Biologie de l'École Normale Supérieure, CNRS UMR8197, INSERM U1024, École Normale Supérieure, PSL Research University, Paris, France
- Ulykbek Kairov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Aurélien de Reyniès
  - Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
- Emmanuel Barillot
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
- François Radvanyi
  - Institut Curie, PSL Research University, CNRS, UMR144, Equipe Labellisée Ligue Contre le Cancer, Paris, France
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR144, Paris
- Andrei Zinovyev
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
  - Lobachevsky University, Nizhny Novgorod, Russia
22
|
Koo DCE, Bonneau R. Towards region-specific propagation of protein functions. Bioinformatics 2020; 35:1737-1744. [PMID: 30304483 PMCID: PMC6513163 DOI: 10.1093/bioinformatics/bty834] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/23/2018] [Revised: 08/23/2018] [Accepted: 10/08/2018] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION: Due to the nature of experimental annotation, most protein function prediction methods operate at the protein-level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain-family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features. RESULTS: We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms. AVAILABILITY AND IMPLEMENTATION: The code and features are freely available at: https://github.com/ek1203/rsfp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Da Chen Emily Koo
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Richard Bonneau
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  - Center for Data Science, New York University, New York, NY, USA
23
|
Stadler PF, Geiß M, Schaller D, López Sánchez A, González Laffitte M, Valdivia DI, Hellmuth M, Hernández Rosales M. From pairs of most similar sequences to phylogenetic best matches. Algorithms Mol Biol 2020; 15:5. [PMID: 32308731 PMCID: PMC7147060 DOI: 10.1186/s13015-020-00165-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/16/2019] [Accepted: 03/26/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. RESULTS: If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. CONCLUSION: Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations. AVAILABILITY: Accompanying software is available at https://github.com/david-schaller/AsymmeTree.
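The statement that best hits coincide with best matches for ultrametric dissimilarities, and can fail under rate variation, can be checked on a toy example. The sketch below is not the AsymmeTree software: the tree, branch lengths and function names are hypothetical. It assumes a gene tree ((x, y1), y2), so y1 is the true best match of x, and compares clock-like branch lengths with a version in which the y1 branch is much longer.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric nested-dict distance table from unordered pairs."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(list(d), 3):
        _, mid, hi = sorted((d[a][b], d[a][c], d[b][c]))
        if hi - mid > tol:
            return False
    return True


def best_hit(d, query, candidates):
    """Candidate with the smallest distance to the query gene."""
    return min(candidates, key=lambda g: d[query][g])


# Clock-like branch lengths: x and y1 each sit 1 from their parent,
# the parent sits 1 from the root, and y2 sits 2 from the root.
clock = symmetric({("x", "y1"): 2, ("x", "y2"): 4, ("y1", "y2"): 4})

# Same topology, but the y1 branch has length 5 (a lineage-specific rate increase).
skewed = symmetric({("x", "y1"): 6, ("x", "y2"): 4, ("y1", "y2"): 8})

for name, d in (("clock", clock), ("skewed", skewed)):
    print(name, is_ultrametric(d), best_hit(d, "x", ["y1", "y2"]))
```

Under the clock-like distances the best hit of x is its true best match y1. With the long y1 branch the distances are still additive but no longer ultrametric, and the best hit becomes y2, which is the kind of error the quartet-based approach in the abstract is meant to correct.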
Affiliation(s)
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
  - Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, 111321 Bogotá, D.C., Colombia
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Manuela Geiß
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
  - Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232 Hagenberg, Austria
- David Schaller
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
- Alitzel López Sánchez
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México
- Marcos González Laffitte
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México
- Dulce I. Valdivia
  - Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Km. 9.6 Libramiento Norte Carretera Irapuato-León, 36821 Irapuato, GTO, México
- Marc Hellmuth
  - School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT, UK
- Maribel Hernández Rosales
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México
| |
Collapse
|
24
|
Forsthoefel DJ, Cejda NI, Khan UW, Newmark PA. Cell-type diversity and regionalized gene expression in the planarian intestine. eLife 2020; 9:e52613. [PMID: 32240093 PMCID: PMC7117911 DOI: 10.7554/elife.52613] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/09/2019] [Accepted: 03/06/2020] [Indexed: 12/17/2022]
Abstract
Proper function and repair of the digestive system are vital to most animals. Deciphering the mechanisms involved in these processes requires an atlas of gene expression and cell types. Here, we applied laser-capture microdissection (LCM) and RNA-seq to characterize the intestinal transcriptome of Schmidtea mediterranea, a planarian flatworm that can regenerate all organs, including the gut. We identified hundreds of genes with intestinal expression undetected by previous approaches. Systematic analyses revealed extensive conservation of digestive physiology and cell types with other animals, including humans. Furthermore, spatial LCM enabled us to uncover previously unappreciated regionalization of gene expression in the planarian intestine along the medio-lateral axis, especially among intestinal goblet cells. Finally, we identified two intestine-enriched transcription factors that specifically regulate regeneration (hedgehog signaling effector gli-1) or maintenance (RREB2) of goblet cells. Altogether, this work provides resources for further investigation of mechanisms involved in gastrointestinal function, repair and regeneration.
Affiliation(s)
- David J Forsthoefel
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, United States
  - Howard Hughes Medical Institute, Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Nicholas I Cejda
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, United States
- Umair W Khan
  - Howard Hughes Medical Institute, Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Phillip A Newmark
  - Howard Hughes Medical Institute, Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, United States
|
25
|
Gao K, Miller J. Primary orthologs from local sequence context. BMC Bioinformatics 2020; 21:48. [PMID: 32028880 PMCID: PMC7006074 DOI: 10.1186/s12859-020-3384-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/13/2019] [Accepted: 01/22/2020] [Indexed: 02/05/2023]
Abstract
BACKGROUND The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to gene-, centric modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity don't reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS We demonstrate that short-range sequence context, as short as a single "maximal" match, distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as "non-nested maximal matches": maximal matches that are not nested into other maximal matches. It emerges that on both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free, and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
Affiliation(s)
- Kun Gao
  - School of Science, Southwest University of Science and Technology, 59 Qinglong Road, Mianyang, Sichuan Province, 621010, People's Republic of China
- Jonathan Miller
  - Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
|
26
|
Geiß M, Stadler PF, Hellmuth M. Reciprocal best match graphs. J Math Biol 2019; 80:865-953. [PMID: 31691135 DOI: 10.1007/s00285-019-01444-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/19/2019] [Revised: 06/10/2019] [Indexed: 11/24/2022]
Abstract
Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that at the same time are cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex-colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
Affiliation(s)
- Manuela Geiß
  - Bioinformatics Group, Department of Computer Science, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Competence Center for Scalable Data Services and Solutions, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090, Vienna, Austria
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
- Marc Hellmuth
  - Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, 17487, Greifswald, Germany
  - Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041, Saarbrücken, Germany
|
27
|
Grisdale CJ, Smith DR, Archibald JM. Relative Mutation Rates in Nucleomorph-Bearing Algae. Genome Biol Evol 2019; 11:1045-1053. [PMID: 30859201 PMCID: PMC6456004 DOI: 10.1093/gbe/evz056] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 03/08/2019] [Indexed: 12/23/2022]
Abstract
Chlorarachniophyte and cryptophyte algae are unique among plastid-containing species in that they have a nucleomorph genome: a compact, highly reduced nuclear genome from a photosynthetic eukaryotic endosymbiont. Despite their independent origins, the nucleomorph genomes of these two lineages have similar genomic architectures, but little is known about the evolutionary pressures impacting nucleomorph DNA, particularly how their rates of evolution compare to those of the neighboring genetic compartments (the mitochondrion, plastid, and nucleus). Here, we use synonymous substitution rates to estimate relative mutation rates in the four genomes of nucleomorph-bearing algae. We show that the relative mutation rates of the host versus endosymbiont nuclear genomes are similar in both chlorarachniophytes and cryptophytes, despite the fact that nucleomorph gene sequences are notoriously highly divergent. There is some evidence, however, for slightly elevated mutation rates in the nucleomorph DNA of chlorarachniophytes, a feature not observed in that of cryptophytes. For both lineages, relative mutation rates in the plastid appear to be lower than those in the nucleus and nucleomorph (and, in one case, the mitochondrion), which is consistent with studies of other plastid-bearing protists. Given the divergent nature of nucleomorph genes, our finding of relatively low evolutionary rates in these genomes suggests that for both lineages a burst of evolutionary change and/or decreased selection pressures likely occurred early in the integration of the secondary endosymbiont.
Affiliation(s)
- Cameron J Grisdale
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- David R Smith
  - Department of Biology, University of Western Ontario, London, Ontario, Canada
- John M Archibald
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
|
28
|
Machado KCT, Fortuin S, Tomazella GG, Fonseca AF, Warren RM, Wiker HG, de Souza SJ, de Souza GA. On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Front Microbiol 2019; 10:1410. [PMID: 31281302 PMCID: PMC6596428 DOI: 10.3389/fmicb.2019.01410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/26/2019] [Accepted: 06/05/2019] [Indexed: 01/19/2023]
Abstract
In proteomics, peptide information within mass spectrometry (MS) data from a specific organism sample is routinely matched against a protein sequence database that best represents that organism. However, if the species/strain in the sample is unknown or genetically poorly characterized, it becomes challenging to determine a database that can represent the sample. Building customized protein sequence databases that merge multiple strains of a given species has become a strategy to overcome such restrictions. However, as more genetic information becomes publicly available and interesting genetic features such as the existence of pan- and core genes within a species are revealed, we questioned how efficient such merging strategies are at reporting relevant information. To test this, we constructed databases containing conserved and unique sequences for 10 different species. Features that are relevant for probabilistic-based protein identification by proteomics were then monitored. As expected, an increase in database complexity correlates with pangenomic complexity. However, Mycobacterium tuberculosis and Bordetella pertussis generated very complex databases despite having low pangenomic complexity. We further tested database performance by using MS data from eight clinical strains of M. tuberculosis, and from two published datasets from Staphylococcus aureus. We show that by using an approach where database size is controlled by removing repeated identical tryptic sequences across strains/species, computational time can be reduced drastically as database complexity increases.
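The last step of this abstract, removing repeated identical tryptic sequences across strains, can be sketched as follows. The abstract does not give the paper's exact procedure, so this is only an assumed minimal version: it uses the common in-silico trypsin rule (cleave after K or R unless the next residue is P), ignores missed cleavages and length filters, and all function names, strain labels and sequences are hypothetical.

```python
import re


def tryptic_peptides(protein, min_len=1):
    """In-silico trypsin digest: cleave after K or R, but not before P.

    Splitting on a zero-width pattern requires Python 3.7 or newer.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # A cleavage site at the very end yields an empty piece; drop it.
    return [p for p in pieces if len(p) >= min_len]


def nonredundant_peptide_db(strains):
    """Map each distinct tryptic peptide to the sorted list of strains containing it."""
    seen = {}
    for strain, proteins in strains.items():
        for protein in proteins:
            for peptide in tryptic_peptides(protein):
                seen.setdefault(peptide, set()).add(strain)
    return {peptide: sorted(names) for peptide, names in seen.items()}


# Hypothetical toy input: two strains sharing one protein and differing in another.
strains = {
    "strain_1": ["MKAAR", "GGKPLR"],
    "strain_2": ["MKAAR", "GGKPLK"],
}

db = nonredundant_peptide_db(strains)
total = sum(len(tryptic_peptides(p)) for ps in strains.values() for p in ps)
print(total, len(db))  # 6 peptides before merging, 4 distinct entries after
```

Keeping the strain list per peptide preserves the information about which strains a match is compatible with, while the search space grows only with the number of distinct peptides.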
Affiliation(s)
- Karla C T Machado
  - Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Suereta Fortuin
  - DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Gisele Guicardi Tomazella
  - Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
  - The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway
  - The Institute of Bioinformatics and Biotechnology, Natal, Brazil
- Andre F Fonseca
  - Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Robin Mark Warren
  - DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Harald G Wiker
  - The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway
- Sandro Jose de Souza
  - Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
  - The Brain Institute, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Gustavo Antonio de Souza
  - Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
  - Department of Biochemistry, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
|
29
|
Abstract
Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let T be a phylogenetic (gene) tree and $\sigma$ an assignment of leaves of T to species. The best match graph $(G,\sigma)$ is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionary) closest relatives of x compared to all other genes contained in the species $\sigma(y)$. Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether $(G,\sigma)$ is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains $(G,\sigma)$, which can also be constructed in cubic time.
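The definition in this abstract can be made concrete with a small sketch that builds the arcs of a best match graph from a rooted gene tree. It is an illustration only, not the cubic-time recognition algorithm of the paper: the tree is given as an assumed child-to-parent dictionary, "closest relative" is read as "deepest last common ancestor with x", and every name below (`parent`, `sigma`, the gene labels) is hypothetical.

```python
def ancestors(node, parent):
    """Path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_depth(x, y, parent):
    """Depth of the last common ancestor of x and y (the root has depth 0)."""
    above_y = set(ancestors(y, parent))
    lca = next(v for v in ancestors(x, parent) if v in above_y)
    return len(ancestors(lca, parent)) - 1


def best_match_graph(parent, sigma):
    """Arcs (x, y): y lies in another species and has the deepest LCA with x within its species."""
    arcs = set()
    for x in sigma:
        best = {}  # species -> (depth of deepest LCA found, genes attaining it)
        for y in sigma:
            if sigma[y] == sigma[x]:
                continue
            depth = lca_depth(x, y, parent)
            if sigma[y] not in best or depth > best[sigma[y]][0]:
                best[sigma[y]] = (depth, {y})
            elif depth == best[sigma[y]][0]:
                best[sigma[y]][1].add(y)
        for _, genes in best.values():
            arcs.update((x, y) for y in genes)
    return arcs


# Hypothetical gene tree ((a1, b1)u, b2)root with a1 in species A and b1, b2 in species B.
parent = {"u": "root", "b2": "root", "a1": "u", "b1": "u"}
sigma = {"a1": "A", "b1": "B", "b2": "B"}

print(sorted(best_match_graph(parent, sigma)))
# -> [('a1', 'b1'), ('b1', 'a1'), ('b2', 'a1')]
```

The arc from b2 to a1 without a reverse arc shows why the graph is directed: a1 is the only gene of species A and so is trivially the best match of b2, but b2 is not a best match of a1.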
|
30
|
Wielstra B, McCartney-Melstad E, Arntzen J, Butlin R, Shaffer H. Phylogenomics of the adaptive radiation of Triturus newts supports gradual ecological niche expansion towards an incrementally aquatic lifestyle. Mol Phylogenet Evol 2019; 133:120-127. [DOI: 10.1016/j.ympev.2018.12.032] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/08/2018] [Revised: 12/30/2018] [Accepted: 12/30/2018] [Indexed: 11/29/2022]
|
31
|
González JM, Hernández L, Manzano I, Pedrós-Alió C. Functional annotation of orthologs in metagenomes: a case study of genes for the transformation of oceanic dimethylsulfoniopropionate. ISME JOURNAL 2019; 13:1183-1197. [PMID: 30643200 DOI: 10.1038/s41396-019-0347-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/25/2018] [Revised: 11/22/2018] [Accepted: 12/25/2018] [Indexed: 11/09/2022]
Abstract
Dimethylsulfoniopropionate (DMSP) is produced mainly by phytoplankton and bacteria. It is relatively abundant and ubiquitous in the marine environment, where bacterioplankton make use of it readily as both carbon and sulfur sources. In one transformation pathway, part of the molecule becomes dimethylsulfide (DMS), which escapes into the atmosphere and plays an important role in the sulfur exchange between oceans and atmosphere. Through its other dominant catabolic pathway, bacteria are able to use it as sulfur source. During the past few years, a number of genes involved in its transformation have been characterized. Identifying genes in taxonomic groups not amenable to conventional methods of cultivation is challenging. Indeed, functional annotation of genes in environmental studies is not straightforward, considering that particular taxa are not well represented in the available sequence databases. Furthermore, many genes belong to families of paralogs with similar sequences but perhaps different functions. In this study, we develop in silico approaches to infer protein function of an environmentally important gene (dmdA) that carries out the first step in the sulfur assimilation from DMSP. The method combines a set of tools to annotate a targeted gene in genome databases and metagenome assemblies. The method will be useful to identify genes that carry out key biochemical processes in the environment.
Affiliation(s)
- José M González
  - Department of Microbiology, University of La Laguna, La Laguna, Spain
- Laura Hernández
  - Department of Microbiology, University of La Laguna, La Laguna, Spain
- Iris Manzano
  - Department of Microbiology, University of La Laguna, La Laguna, Spain
- Carlos Pedrós-Alió
  - Systems Biology Program, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
|
32
|
Hu C, Yang H, Jiang K, Wang L, Yang B, Hsieh T, Lan S, Huang W. Development of polymorphic microsatellite markers by using de novo transcriptome assembly of Calanthe masuca and C. sinica (Orchidaceae). BMC Genomics 2018; 19:800. [PMID: 30400862 PMCID: PMC6219035 DOI: 10.1186/s12864-018-5161-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2018] [Accepted: 10/11/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND Calanthe masuca and C. sinica are two genetically closely related species in Orchidaceae. C. masuca is widely distributed in Asia, whereas C. sinica is restricted to Yunnan and Guangxi Provinces in southwest China. Both play important roles in horticulture and are under the pressure of population decline. Understanding their genetic background can greatly help us develop effective conservation strategies for these species. Simple sequence repeats (SSRs) are useful for genetic diversity analysis, presumably providing key information for the study and preservation of the wild populations of the two species we are interested in. RESULTS In this study, we performed RNA-seq analysis on the leaves of C. masuca and C. sinica, obtaining 40,916 and 71,618 unigenes for each species, respectively. In total, 2,019/3,865 primer pairs were successfully designed from 3,764/7,189 putative SSRs, among which 197 polymorphic SSRs were screened out according to orthologous gene pairs. After mononucleotide exclusion, a subset of 129 SSR primers were analysed, and 13 of them were found to have high polymorphism levels. Further analysis demonstrated that they were feasible and effective against C. masuca and C. sinica as well as transferable to another species in Calanthe. Molecular evolutionary analysis revealed functional pathways commonly enriched in unigenes with similar evolutionary rates in the two species, as well as pathways specific to each species, implicating species-specific adaptation. The divergence time between the two closely related species was tentatively determined to be 3.42 ± 1.86 Mya. CONCLUSIONS We completed and analysed the transcriptomes of C. masuca and C. sinica, assembling large numbers of unigenes and generating effective polymorphic SSR markers. This is the first report of the development of expressed sequence tag (EST)-SSR markers for Calanthe. In addition, our study could enable further genetic diversity analysis and functional and comparative genomic studies on Calanthe.
Affiliation(s)
- Chao Hu
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Institute of Botany, Chinese Academy of Sciences, Beijing, 100093 China
- Hongxing Yang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
- Kai Jiang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
- Ling Wang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
- Boyun Yang
  - School of Life Science, Nanchang University, Nanchang, 330031 China
- Tungyu Hsieh
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Siren Lan
  - College of Landscape, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
- Weichang Huang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai, 201602 China
  - College of Landscape, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
|
33
|
Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018; 18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]
Abstract
The mentioning of gene names in the body of the scientific literature 1901-2017 and their fractional counting is used as a proxy to assess the level of biological function discovery. A literature score of one has been defined as full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. It has been found that less than 5000 human genes have each at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein-coding genes, 119 non-coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet, thousands of proteins have never been mentioned at all, ≈2000 further proteins have not even one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate measured as numbers of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year has grown until 2000 but is in decline thereafter. This drop is partially offset by function discoveries for non-coding RNAs. The full human genome sequencing does not boost the function discovery rate. Since 2000, the fastest growing group in the literature is that with at least 500 FPEs per gene.
Affiliation(s)
- Swati Sinha
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671 Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671 Singapore
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Bharata Kalbuaji
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671 Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671 Singapore
  - School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 637553 Singapore
|
34
|
Ryo M, Yamashino T, Nomoto Y, Goto Y, Ichinose M, Sato K, Sugita M, Aoki S. Light-regulated PAS-containing histidine kinases delay gametophore formation in the moss Physcomitrella patens. JOURNAL OF EXPERIMENTAL BOTANY 2018; 69:4839-4851. [PMID: 29992239 PMCID: PMC6137987 DOI: 10.1093/jxb/ery257] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/03/2018] [Accepted: 07/04/2018] [Indexed: 05/07/2023]
Abstract
Two-component systems (TCSs) are signal transduction mechanisms for responding to various environmental stimuli. In angiosperms, TCSs involved in phytohormone signaling have been intensively studied, whereas there are only a few reports on TCSs in basal land plants. The moss Physcomitrella patens possesses several histidine kinases (HKs) that are lacking in seed plant genomes. Here, we studied two of these unique HKs, PAS-histidine kinase 1 (PHK1) and its paralog PHK2, both of which have PAS (Per-Arnt-Sim) domains, which are known to show versatile functions such as sensing light or molecular oxygen. We found homologs of PHK1 and PHK2 only in early diverged clades such as bryophytes and lycophytes, but not in seed plants. The PAS sequences of PHK1 and PHK2 are more similar to a subset of bacterial PAS sequences than to any angiosperm PAS sequences. Gene disruption lines that lack either PHK1 or PHK2 or both formed gametophores earlier than the wild-type, and consistently, more caulonema side branches were induced in response to light in the disruption lines. Therefore, PHK1 and PHK2 delay the timing of gametophore development, probably by suppressing light-induced caulonema branching. This study provides new insights into the evolution of TCSs in plants.
Affiliation(s)
- Masashi Ryo
  - Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Takafumi Yamashino
  - Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Yuji Nomoto
  - Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Yuki Goto
  - Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Mizuho Ichinose
  - Center for Gene Research, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
  - Institute of Transformative Bio-Molecules, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Kensuke Sato
  - Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Mamoru Sugita
  - Center for Gene Research, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
- Setsuyuki Aoki
  - Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
  - Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
|
35
|
Perricone U, Gulotta MR, Lombino J, Parrino B, Cascioferro S, Diana P, Cirrincione G, Padova A. An overview of recent molecular dynamics applications as medicinal chemistry tools for the undruggable site challenge. MEDCHEMCOMM 2018; 9:920-936. [PMID: 30108981 PMCID: PMC6072422 DOI: 10.1039/c8md00166a] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/26/2018] [Accepted: 04/19/2018] [Indexed: 12/14/2022]
Abstract
Molecular dynamics (MD) has become increasingly popular due to the development of hardware and software solutions and the improvement in algorithms, which allowed researchers to scale up calculations in order to speed them up. MD simulations are usually used to address protein folding issues or protein-ligand complex stability through energy profile analysis over time. In recent years, the development of new tools able to deeply explore a potential energy surface (PES) has allowed researchers to focus on the dynamic nature of the binding recognition process and binding-induced protein conformational changes. Moreover, modern approaches have been demonstrated to be effective and reliable in calculating some kinetic and thermodynamic parameters behind the host-guest recognition process. Starting from all of these considerations, several efforts have been made in order to integrate MD within the virtual screening process in drug discovery. Knowledge retrieved from MD can, in fact, be exploited as a starting point to build pharmacophores or docking constraints in the early stage of the screening campaign as well as to define key features, in order to unravel hidden binding modes and help the optimisation of the molecular structure of a lead compound. Based on these outcomes, researchers are nowadays using MD as an invaluable tool to discover and target previously considered undruggable binding sites, including protein-protein interactions and allosteric sites on a protein surface. As a matter of fact, the use of MD has been recognised as vital to the discovery of selective protein-protein interaction modulators. The use of a dynamic overview on how the host-guest recognition occurs and of the relative conformational modifications induced allows researchers to optimise small molecules and small peptides capable of tightly interacting within the cleft between two proteins. 
In this review, we aim to present the most recent applications of MD as an integrated tool to be used in the rational design of small molecules or small peptides able to modulate undruggable targets, such as allosteric sites and protein-protein interactions.
Affiliation(s)
- Ugo Perricone
- Computational and Medicinal Chemistry Group, Fondazione Ri.MED, Via Bandiera 11, 90133 Palermo, Italy
| | - Maria Rita Gulotta
- Computational and Medicinal Chemistry Group, Fondazione Ri.MED, Via Bandiera 11, 90133 Palermo, Italy
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Jessica Lombino
- Computational and Medicinal Chemistry Group, Fondazione Ri.MED, Via Bandiera 11, 90133 Palermo, Italy
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Barbara Parrino
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Stella Cascioferro
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Patrizia Diana
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Girolamo Cirrincione
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
| | - Alessandro Padova
- Computational and Medicinal Chemistry Group, Fondazione Ri.MED, Via Bandiera 11, 90133 Palermo, Italy
| |
|
36
|
Eisenhaber B, Sinha S, Wong WC, Eisenhaber F. Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z. Cell Cycle 2018; 17:874-880. [PMID: 29764287 PMCID: PMC6056205 DOI: 10.1080/15384101.2018.1456294] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Distant homology relationships among proteins with many transmembrane regions (TMs) are difficult to detect as they are clouded by the TMs’ hydrophobic compositional bias and mutational divergence in connecting loops. In the case of several GPI lipid anchor biosynthesis pathway components, the hidden evolutionary signal can be revealed with dissectHMMER, a sequence similarity search tool focusing on fold-critical, high complexity sequence segments. We find that a sequence module with 10 TMs in PIG-W, described as acyl transferase, is homologous to PIG-U, a transamidase subunit without characterized molecular function, and to mannosyltransferases PIG-B, PIG-M, PIG-V and PIG-Z. We conclude that this new, membrane-embedded domain named BindGPILA functions as the unit for recognizing, binding and stabilizing the GPI lipid anchor in a modification-competent form as this appears the only functional aspect shared among all proteins. Thus, PIG-U's likely molecular function is shuttling/presenting the anchor in a productive conformation to the transamidase complex.
Affiliation(s)
- Birgit Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Swati Sinha
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Wing-Cheong Wong
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore; School of Computer Engineering, Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553, Republic of Singapore
| |
|
37
|
Ozer EA. ClustAGE: a tool for clustering and distribution analysis of bacterial accessory genomic elements. BMC Bioinformatics 2018; 19:150. [PMID: 29678129 PMCID: PMC5910555 DOI: 10.1186/s12859-018-2154-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2017] [Accepted: 04/11/2018] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The non-conserved accessory genome of bacteria can be associated with important adaptive characteristics that can contribute to niche specificity or pathogenicity of strains. High degrees of structural and compositional diversity in genomic islands and other elements of the accessory genome can complicate characterization of accessory genome contents among populations of strains. Methods for easily and effectively defining the distributions of discrete elements of the accessory genome among bacterial strains in a population are needed to explore the relationships between the flexible genome and bacterial adaptive traits. RESULTS We have developed the open-source software package ClustAGE. This program, written in Perl, uses BLAST to cluster nucleotide accessory genomic elements from the genomes of multiple bacterial strains and to identify their distribution within the study population. The program output can be used in combination with strain phenotype data or other characteristics to detect associations. Optional graphical output is available for visualizing accessory genome gene content and distribution patterns. The capabilities of the software are demonstrated on a collection of 14 Pseudomonas aeruginosa genome sequences. CONCLUSIONS The ClustAGE software and utilities are effective for identifying characteristics and distributions of accessory genomic elements among groups of bacterial genomes. The ability to easily and effectively characterize the accessory genome of a sequence collection may provide a better understanding of the accessory genome's contribution to a species' adaptation and pathogenesis. The ClustAGE source code can be downloaded from https://clustage.sourceforge.io and a limited web-based implementation is available at http://vfsmspineagent.fsm.northwestern.edu/cgi-bin/clustage.cgi .
Affiliation(s)
- Egon A Ozer
- Department of Medicine, Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
| |
|
38
|
Scholte LLS, Pascoal-Xavier MA, Nahum LA. Helminths and Cancers From the Evolutionary Perspective. Front Med (Lausanne) 2018; 5:90. [PMID: 29713629 PMCID: PMC5911458 DOI: 10.3389/fmed.2018.00090] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Accepted: 03/22/2018] [Indexed: 01/20/2023] Open
Abstract
Helminths include free-living and parasitic Platyhelminthes and Nematoda which infect millions of people worldwide. Some Platyhelminthes species of blood flukes (Schistosoma haematobium, Schistosoma japonicum, and Schistosoma mansoni) and liver flukes (Clonorchis sinensis and Opisthorchis viverrini) are known to be involved in human cancers. Other helminths are likely to be carcinogenic. Our main goals are to summarize the current knowledge of human cancers caused by Platyhelminthes, point out some helminth and human biomarkers identified so far, and highlight the potential contributions of phylogenetics and molecular evolution to cancer research. Human cancers caused by helminth infection include cholangiocarcinoma, colorectal hepatocellular carcinoma, squamous cell carcinoma, and urinary bladder cancer. Chronic inflammation is proposed as a common pathway for cancer initiation and development. Furthermore, different bacteria present in gastric, colorectal, and urogenital microbiomes might be responsible for enlarging inflammatory and fibrotic responses in cancers. Studies have suggested that different biomarkers are involved in helminth infection and human cancer development, although the detailed mechanisms remain under debate. Different helminth proteins have been studied by different approaches. However, their evolutionary relationships remain unresolved. Here, we illustrate the strengths of homology identification and function prediction of uncharacterized proteins from genome sequencing projects based on an evolutionary framework. Together, these approaches may help identify new biomarkers for disease diagnostics and intervention measures. This work has potential applications in the field of phylomedicine (evolutionary medicine) and may contribute to parasite and cancer research.
Affiliation(s)
- Larissa L. S. Scholte
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Vice-Presidência de Pesquisa e Coleções Biológicas, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil
| | - Marcelo A. Pascoal-Xavier
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Departamento de Anatomia Patológica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Laila A. Nahum
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Faculdade Promove de Tecnologia, Belo Horizonte, Brazil
| |
|
39
|
Identification of new antibacterial targets in RNA polymerase of Mycobacterium tuberculosis by detecting positive selection sites. Comput Biol Chem 2018; 73:25-30. [PMID: 29413813 DOI: 10.1016/j.compbiolchem.2017.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2016] [Revised: 10/07/2017] [Accepted: 11/03/2017] [Indexed: 01/22/2023]
Abstract
Bacterial RNA polymerase (RNAP) is an effective target for antibacterial treatment. To search for new potential targets in the RNAP of Mycobacterium, we tested for adaptive selection in RNAP-related genes across 13 strains of Mycobacterium by phylogenetic analysis. We first collected sequences of 17 genes, including rpoA, rpoB, rpoC, rpoZ, and sigma factors A-M. Maximum likelihood trees were then constructed, followed by positive selection detection. We found that sigG shows positive selection along the clade (M. tuberculosis, M. bovis), suggesting its important evolutionary role and its potential to be a new antibacterial target. Moreover, the regions near 933Cys and 935His on the rpoB subunit of M. tuberculosis showed significant positive selection and could also be attractive new targets for anti-tuberculosis drugs.
|
40
|
de Castro MR, Tostes CDS, Dávila AMR, Senger H, da Silva FAB. SparkBLAST: scalable BLAST processing using in-memory operations. BMC Bioinformatics 2017; 18:318. [PMID: 28655296 PMCID: PMC5488373 DOI: 10.1186/s12859-017-1723-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2016] [Accepted: 06/12/2017] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis. RESULTS Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times. CONCLUSIONS The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing.
Affiliation(s)
- Marcelo Rodrigo de Castro
- Computer Science Department, Federal University of São Carlos, Rod. Washington Luís, Km 235, São Carlos, 21040-900, Brazil
| | | | - Alberto M R Dávila
- LBCS-IOC, Oswaldo Cruz Foundation, Av Brasil 4365, Rio de Janeiro, 21040-900, Brazil
| | - Hermes Senger
- Computer Science Department, Federal University of São Carlos, Rod. Washington Luís, Km 235, São Carlos, 21040-900, Brazil
| | | |
|
41
|
Knüppel R, Kuttenberger C, Ferreira-Cerca S. Toward Time-Resolved Analysis of RNA Metabolism in Archaea Using 4-Thiouracil. Front Microbiol 2017; 8:286. [PMID: 28286499 PMCID: PMC5323407 DOI: 10.3389/fmicb.2017.00286] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2016] [Accepted: 02/13/2017] [Indexed: 11/13/2022] Open
Abstract
Archaea are widespread organisms colonizing almost every habitat on Earth. However, the molecular biology of archaea still remains relatively uncharacterized. RNA metabolism is a central cellular process, which has been extensively analyzed in both bacteria and eukarya. In contrast, analysis of RNA metabolism dynamics in archaea has been limited to date. To facilitate analysis of RNA metabolism dynamics at a system-wide scale in archaea, we have established non-radioactive pulse labeling of RNA, using the nucleotide analog 4-thiouracil (4TU), in two commonly used model archaea: the halophilic euryarchaeon Haloferax volcanii and the thermo-acidophilic crenarchaeon Sulfolobus acidocaldarius. In this work, we show that 4TU pulse labeling can be efficiently performed in these two organisms in a dose- and time-dependent manner. In addition, our results suggest that uracil prototrophy had no critical impact on the overall 4TU incorporation in RNA molecules. Accordingly, our work suggests that 4TU incorporation can be widely performed in archaea, thereby expanding the molecular toolkit to analyze archaeal gene expression network dynamics in unprecedented detail.
Affiliation(s)
- Robert Knüppel
- Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
| | - Corinna Kuttenberger
- Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
| | - Sébastien Ferreira-Cerca
- Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
| |
|
42
|
Allen SL, Bonduriansky R, Sgro CM, Chenoweth SF. Sex-biased transcriptome divergence along a latitudinal gradient. Mol Ecol 2017; 26:1256-1272. [PMID: 28100025 DOI: 10.1111/mec.14015] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2016] [Revised: 11/23/2016] [Accepted: 11/28/2016] [Indexed: 12/26/2022]
Abstract
Sex-dependent gene expression is likely an important genomic mechanism that allows sex-specific adaptation to environmental changes. Among Drosophila species, sex-biased genes display remarkably consistent evolutionary patterns; male-biased genes evolve faster than unbiased genes in both coding sequence and expression level, suggesting sex differences in selection through time. However, comparatively little is known of the evolutionary process shaping sex-biased expression within species. Latitudinal clines offer an opportunity to examine how changes in key ecological parameters also influence sex-specific selection and the evolution of sex-biased gene expression. We assayed male and female gene expression in Drosophila serrata along a latitudinal gradient in eastern Australia spanning most of its endemic distribution. Analysis of 11 631 genes across eight populations revealed strong sex differences in the frequency, mode and strength of divergence. Divergence was far stronger in males than females and while latitudinal clines were evident in both sexes, male divergence was often population specific, suggesting responses to localized selection pressures that do not covary predictably with latitude. While divergence was enriched for male-biased genes, there was no overrepresentation of X-linked genes in males. By contrast, X-linked divergence was elevated in females, especially for female-biased genes. Many genes that diverged in D. serrata have homologs also showing latitudinal divergence in Drosophila simulans and Drosophila melanogaster on other continents, likely indicating parallel adaptation in these distantly related species. Our results suggest that sex differences in selection play an important role in shaping the evolution of gene expression over macro- and micro-ecological spatial scales.
Affiliation(s)
- Scott L Allen
- The School of Biological Sciences, The University of Queensland, St. Lucia, Qld, 4072, Australia
| | - Russell Bonduriansky
- Evolution & Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
| | - Carla M Sgro
- School of Biological Sciences, Monash University, Melbourne, Vic., 3800, Australia
| | - Stephen F Chenoweth
- The School of Biological Sciences, The University of Queensland, St. Lucia, Qld, 4072, Australia
| |
|
43
|
Bayram H, Sayadi A, Goenaga J, Immonen E, Arnqvist G. Novel seminal fluid proteins in the seed beetle Callosobruchus maculatus identified by a proteomic and transcriptomic approach. INSECT MOLECULAR BIOLOGY 2017; 26:58-73. [PMID: 27779332 DOI: 10.1111/imb.12271] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
The seed beetle Callosobruchus maculatus is a significant agricultural pest and increasingly studied model of sexual conflict. Males possess genital spines that increase the transfer of seminal fluid proteins (SFPs) into the female body. As SFPs alter female behaviour and physiology, they are likely to modulate reproduction and sexual conflict in this species. Here, we identified SFPs using proteomics combined with a de novo transcriptome. A prior 2D-sodium dodecyl sulphate polyacrylamide gel electrophoresis analysis identified male accessory gland protein spots that were probably transferred to the female at mating. Proteomic analysis of these spots identified 98 proteins, a majority of which were also present within ejaculates collected from females. Standard annotation workflows revealed common functional groups for SFPs, including proteases and metabolic proteins. Transcriptomic analysis found 84 transcripts differentially expressed between the sexes. Notably, genes encoding 15 proteins were highly expressed in male abdomens and only negligibly expressed within females. Most of these sequences corresponded to 'unknown' proteins (nine of 15) and may represent rapidly evolving SFPs novel to seed beetles. Our combined analyses highlight 44 proteins for which there is strong evidence that they are SFPs. These results can inform further investigation, to better understand the molecular mechanisms of sexual conflict in seed beetles.
Affiliation(s)
- H Bayram
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - A Sayadi
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - J Goenaga
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - E Immonen
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - G Arnqvist
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| |
|
44
|
Cong Y, Chan YB, Phillips CA, Langston MA, Ragan MA. Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF. Front Microbiol 2017; 8:21. [PMID: 28154557 PMCID: PMC5243798 DOI: 10.3389/fmicb.2017.00021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2016] [Accepted: 01/04/2017] [Indexed: 11/13/2022] Open
Abstract
Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
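The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. It treats each genome as a "document" and each length-k substring as a "word", and it is only the generic textbook weighting, not the authors' pipeline for inferring lateral regions. The function names, the toy sequences and the choice k = 4 are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tf_idf(genomes: Dict[str, str], k: int) -> Dict[str, Dict[str, float]]:
    """Score each k-mer in each genome with term frequency x inverse document frequency."""
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_docs = len(genomes)
    # Document frequency: in how many genomes does each k-mer occur at least once?
    doc_freq: Counter = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    scores: Dict[str, Dict[str, float]] = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return scores


# Hypothetical toy "genomes"; real inputs would be full genome sequences.
toy = {"g1": "ACGTACGT", "g2": "ACGTTTTT", "g3": "GGGGACGT"}
scores = tf_idf(toy, k=4)
```

In this toy input, a word shared by every genome (such as `ACGT`) receives a weight of zero, whereas a word confined to one genome is weighted up, which is the property that makes the measure useful for highlighting atypical sequence regions.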
Affiliation(s)
- Yingnan Cong
- Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, University of Queensland, St Lucia QLD, Australia
| | - Yao-Ban Chan
- School of Mathematics and Statistics, University of Melbourne, Parkville VIC, Australia
| | - Charles A Phillips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville TN, USA
| | - Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville TN, USA
| | - Mark A Ragan
- Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, University of Queensland, St Lucia QLD, Australia
| |
|
45
|
Abstract
Protein function is a concept that can have different interpretations in different biological contexts, and the number and diversity of novel proteins identified by large-scale "omics" technologies poses increasingly new challenges. In this review we explore current strategies used to predict protein function focused on high-throughput sequence analysis, as for example, inference based on sequence similarity, sequence composition, structure, and protein-protein interaction. Various prediction strategies are discussed together with illustrative workflows highlighting the use of some benchmark tools and knowledge bases in the field.
Affiliation(s)
- Leonardo Magalhães Cruz
- Department of Biochemistry and Molecular Biology, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
| | - Sheyla Trefflich
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| | - Vinícius Almir Weiss
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| | - Mauro Antônio Alves Castro
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
| |
|
46
|
Inferring Functional Relationships from Conservation of Gene Order. Methods Mol Biol 2016. [PMID: 27896735 DOI: 10.1007/978-1-4939-6613-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
Predicting functional associations using the Gene Neighbor Method depends on the simple idea that if genes are conserved next to each other in evolutionarily distant prokaryotes they might belong to a polycistronic transcription unit. The procedure presented in this chapter starts with the organization of the genes within genomes into pairs of adjacent genes. Then, the pairs of adjacent genes in a genome of interest are mapped to their corresponding orthologs in other, informative, genomes. The final step is to verify if the mapped orthologs are also pairs of adjacent genes in the informative genomes.
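The three steps described in this abstract (pair adjacent genes, map the pairs to orthologs, check adjacency in the informative genome) can be sketched as follows. This is a simplified illustration rather than the chapter's actual procedure: it assumes a linear gene order, ignores strand and intergenic distance, and takes a precomputed one-to-one ortholog table as given. All function names and gene identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def adjacent_pairs(gene_order: List[str]) -> Set[FrozenSet[str]]:
    """Step 1: organize a genome into unordered pairs of neighbouring genes."""
    return {frozenset((a, b)) for a, b in zip(gene_order, gene_order[1:])}


def conserved_neighbors(
    query_order: List[str],
    informative_order: List[str],
    orthologs: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Steps 2 and 3: map query pairs to orthologs and keep those still adjacent."""
    informative = adjacent_pairs(informative_order)
    conserved = []
    for a, b in zip(query_order, query_order[1:]):
        # Skip pairs in which either gene has no ortholog in the informative genome.
        if a in orthologs and b in orthologs:
            if frozenset((orthologs[a], orthologs[b])) in informative:
                conserved.append((a, b))
    return conserved


# Hypothetical gene orders and ortholog table for illustration only.
query = ["geneA", "geneB", "geneC", "geneD"]
informative = ["x3", "x1", "x2", "x9", "x4"]
orthologs = {"geneA": "x1", "geneB": "x2", "geneC": "x3", "geneD": "x4"}
result = conserved_neighbors(query, informative, orthologs)
print(result)  # [('geneA', 'geneB')]
```

Using unordered pairs means that an inverted neighbourhood in the informative genome still counts as conserved; a stricter variant could additionally require matching strand.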
|
47
|
Yap CK, Eisenhaber B, Eisenhaber F, Wong WC. xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation. Biol Direct 2016; 11:63. [PMID: 27894340 PMCID: PMC5126834 DOI: 10.1186/s13062-016-0163-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2016] [Accepted: 10/24/2016] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. RESULTS In this work, both the speed of HMMER3 and glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework for tackling the large-scale domain annotation task. Briefly, HMMER3 is utilized for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space by HMMER2 is sufficiently replicated by HMMER3. We find that the latter is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However in the case of the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with descending domain detection sensitivity from 99.8 to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, the E-values to false-positive rates (FPR) mapping by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER. 
CONCLUSION The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising for domain-detection with regard to sensitivity and sequence-to-domain alignment incompleteness. The xHMMER3x2 code and its webserver (for Pfam release 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/ . REVIEWERS Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
Affiliation(s)
- Choon-Kong Yap
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore; School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore.
| | - Wing-Cheong Wong
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
| |
|
48
|
Gupta M, Chauhan R, Prasad Y, Wadhwa G, Jain CK. Protein-protein interaction and molecular dynamics analysis for identification of novel inhibitors in Burkholderia cepacia GG4. Comput Biol Chem 2016; 65:80-90. [PMID: 27776248 DOI: 10.1016/j.compbiolchem.2016.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2016] [Revised: 09/24/2016] [Accepted: 10/06/2016] [Indexed: 11/25/2022]
Abstract
The lack of complete treatments and the appearance of multidrug-resistant strains of the Burkholderia cepacia complex (Bcc) are causing an increased risk of lung infections in cystic fibrosis patients. Bcc infection is a major risk to human health and creates an urgent need to identify new therapeutics against these bacteria. Network biology has emerged as one of the prospective hopes for identifying novel drug targets and hits. We have applied protein-protein interaction methodology to identify new drug-target candidates (orthologs) in Burkholderia cepacia GG4, which is an important strain for studying the quorum-sensing phenomenon. An evolution-based ortholog mapping approach has been applied for generating large-scale protein-protein interactions in B. cepacia. As a case study, one of the identified drug targets, GEM_3202, an NH3-dependent NAD synthetase protein, has been studied and potential ligand molecules were screened using the ZINC database. The three-dimensional structure of the NH3-dependent NAD synthetase protein has been predicted with the MODELLER v9.11 tool using multiple PDB templates, namely 3DPI, 2PZ8 and 1NSY, with sequence identities of 76%, 50% and 50%, respectively. The structure has been validated with a Ramachandran plot having 100% of the residues of NadE in the allowed region and an overall quality factor of 81.75 using the ERRAT tool. High-throughput screening and Vina resulted in two potential hits against NadE, ZINC83103551 and ZINC38008121. These molecules showed the lowest binding energy of -5.7 kcal/mol and high stability in the binding pockets during molecular dynamics simulation analysis. A similar approach to target identification could be applied to clinical strains of other pathogenic microbes.
Affiliation(s)
- Money Gupta
  - Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
- Rashi Chauhan
  - Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
- Yamuna Prasad
  - Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India
- Gulshan Wadhwa
  - Department of Biotechnology (DBT), Ministry of Science & Technology, New Delhi-110003, India
- Chakresh Kumar Jain
  - Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
|
49
|
Naqvi AAT, Anjum F, Khan FI, Islam A, Ahmad F, Hassan MI. Sequence Analysis of Hypothetical Proteins from Helicobacter pylori 26695 to Identify Potential Virulence Factors. Genomics Inform 2016; 14:125-135. [PMID: 27729842 PMCID: PMC5056897 DOI: 10.5808/gi.2016.14.3.125] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/20/2016] [Revised: 08/05/2016] [Accepted: 08/29/2016] [Indexed: 12/16/2022] Open
Abstract
Helicobacter pylori is a Gram-negative bacterium that is responsible for gastritis in humans. Its spiral flagellated body helps in locomotion and colonization in the host environment. It is capable of living in the highly acidic environment of the stomach with the help of acid-adaptive genes. The genome of the H. pylori 26695 strain contains 1,555 coding genes that encode 1,445 proteins. Of these, 340 proteins are characterized as hypothetical proteins (HPs). This study involves extensive analysis of the HPs using an established pipeline, comprising various bioinformatics tools and databases, to determine the probable functions of the HPs and to identify virulence factors. After extensive analysis of all 340 HPs, we found that 104 HPs show characteristic similarities to proteins with known functions. On the basis of such similarities, we assigned probable functions to 104 HPs with high confidence and precision. The predicted HPs include representative members of diverse functional classes of proteins such as enzymes, transporters, binding proteins, regulatory proteins, proteins involved in cellular processes and other proteins with miscellaneous functions. We therefore classified the 104 HPs into the aforementioned functional groups. During the virulence factor analysis of the HPs, we found that 11 HPs show significant virulence. The identification of virulence proteins with the help of their predicted functions may pave the way for drug target estimation and the development of effective drugs to counter the activity of those proteins.
Affiliation(s)
- Ahmad Abu Turab Naqvi
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
- Farah Anjum
  - Female College of Applied Medical Science, Taif University, Al-Taif 21974, Kingdom of Saudi Arabia
- Faez Iqbal Khan
  - School of Chemistry and Chemical Engineering, Henan University of Technology, Henan 450001, China
- Asimul Islam
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
- Faizan Ahmad
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
- Md Imtaiyaz Hassan
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
|
50
|
Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016; 32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022] Open
Abstract
Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases. Results: CS-BLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity. Availability and Implementation: Benchmark datasets and all scripts are placed at (http://sonnhammer.org/download/Homology_benchmark).
Contact: forslund@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ganapathi Varma Saripella
  - Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Erik L L Sonnhammer
  - Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Kristoffer Forslund
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg 69117, Germany
|