1
Földi C, Merényi Z, Balázs B, Csernetics Á, Miklovics N, Wu H, Hegedüs B, Virágh M, Hou Z, Liu XB, Galgóczy L, Nagy LG. Snowball: a novel gene family required for developmental patterning of fruiting bodies of mushroom-forming fungi (Agaricomycetes). mSystems 2024; 9:e0120823. [PMID: 38334416 PMCID: PMC10949477 DOI: 10.1128/msystems.01208-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 01/10/2024] [Indexed: 02/10/2024] Open
Abstract
The morphogenesis of sexual fruiting bodies of fungi is a complex process determined by a genetically encoded program. Fruiting bodies reached the highest complexity levels in the Agaricomycetes; yet, the underlying genetics is currently poorly known. In this work, we functionally characterized a highly conserved gene termed snb1, whose expression level increases rapidly during fruiting body initiation. According to phylogenetic analyses, orthologs of snb1 are present in almost all agaricomycetes and may represent a novel conserved gene family that plays a substantial role in fruiting body development. We disrupted snb1 using CRISPR/Cas9 in the agaricomycete model organism Coprinopsis cinerea. snb1 deletion mutants formed unique, snowball-shaped, rudimentary fruiting bodies that could not differentiate caps, stipes, and lamellae. We took advantage of this phenotype to study fruiting body differentiation using RNA-Seq analyses. This revealed differentially regulated genes and gene families that, based on wild-type RNA-Seq data, were upregulated early during development and showed tissue-specific expression, suggesting a potential role in differentiation. Taken together, the novel gene family of snb1 and the differentially expressed genes in the snb1 mutants provide valuable insights into the complex mechanisms underlying developmental patterning in the Agaricomycetes.
IMPORTANCE Fruiting bodies of mushroom-forming fungi (Agaricomycetes) are complex multicellular structures, with a spatially and temporally integrated developmental program that is, however, currently poorly known. In this study, we present a novel, conserved gene family, Snowball (snb), named after the unique, differentiation-less fruiting body morphology of snb1 knockout strains in the model mushroom Coprinopsis cinerea. snb is highly conserved among agaricomycetes and encodes a protein of unknown function. A comparative transcriptomic analysis of the early developmental stages of differentiated wild-type and non-differentiated mutant fruiting bodies revealed conserved differentially expressed genes, which may be related to tissue differentiation and developmental patterning during fruiting body development.
Affiliation(s)
- Csenge Földi
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
  - Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, Hungary
- Zsolt Merényi
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Bálint Balázs
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Árpád Csernetics
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Nikolett Miklovics
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Hongli Wu
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Botond Hegedüs
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Máté Virágh
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Zhihao Hou
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- Xiao-Bin Liu
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
- László Galgóczy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
  - Department of Biotechnology, Faculty of Science and Informatics, University of Szeged, Szeged, Hungary
- László G. Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Szeged, Hungary
2
Zhao B, Zhao J, Wang M, Guo Y, Mehmood A, Wang W, Xiong Y, Luo S, Wei DQ, Zhao XQ, Wang Y. Exploring microproteins from various model organisms using the mip-mining database. BMC Genomics 2023; 24:661. [PMID: 37919660 PMCID: PMC10623795 DOI: 10.1186/s12864-023-09735-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Accepted: 10/12/2023] [Indexed: 11/04/2023] Open
Abstract
Microproteins, prevalent across all kingdoms of life, play a crucial role in cell physiology and human health. Although global gene transcription is widely explored and transcriptome data are abundantly available, our understanding of microprotein functions based on transcriptome data is still limited. To mitigate this problem, we present a database, Mip-mining ( https://weilab.sjtu.edu.cn/mipmining/ ), underpinned by high-quality RNA-sequencing data and aimed exclusively at analyzing microprotein functions. Mip-mining hosts 336 sets of high-quality transcriptome data from 8626 samples and nine representative living organisms, including microorganisms, plants, animals, and humans. The database focuses on a range of diseases and environmental stress conditions, taking into account chemical, physical, biological, and disease-related stresses. In addition, the platform enables customized analysis: users can input their own data sets with self-determined cutoff values. The practicality of Mip-mining is demonstrated by identifying essential microproteins in different species and revealing the importance of ATP15 in the acetic acid stress tolerance of budding yeast. We believe that Mip-mining will facilitate a greater understanding and application of microproteins in biotechnology. Moreover, it will be beneficial for designing therapeutic strategies under various biological conditions.
Affiliation(s)
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jing Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Muyao Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yangfan Guo
  - Central Laboratory of Yan'an Hospital Affiliated to Kunming Medical University, Kunming, 650051, China
- Aamir Mehmood
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Weibin Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Shenggan Luo
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
  - Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, 518055, Guangdong, China
- Xin-Qing Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Engineering Research Center of Cell & Therapeutic Antibody, School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
3
Yang J, Yue HR, Pan LY, Feng JX, Zhao S, Suwannarangsee S, Chempreda V, Liu CG, Zhao XQ. Fungal strain improvement for efficient cellulase production and lignocellulosic biorefinery: Current status and future prospects. Bioresource Technology 2023:129449. [PMID: 37406833 DOI: 10.1016/j.biortech.2023.129449] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/31/2023] [Revised: 06/29/2023] [Accepted: 07/01/2023] [Indexed: 07/07/2023]
Abstract
Lignocellulosic biomass (LCB) has been recognized as a valuable carbon source for the sustainable production of biofuels and value-added biochemicals. Crude enzymes produced by fungal cell factories make LCB degradation more economical. However, high enzyme production cost remains a great challenge. Filamentous fungi have been widely used to produce cellulolytic enzymes, and metabolic engineering of fungi contributes to efficient cellulase production for LCB biorefinery. Here, the latest progress in utilizing fungal cell factories for cellulase production is summarized, including developing genome engineering tools to improve the efficiency of fungal cell factories, manipulating promoters, and modulating transcription factors. Multi-omics analysis of fungi contributes to identifying novel genetic elements for enhancing cellulase production. Furthermore, the importance of translational regulation of cellulase production is emphasized. Efficient development of fungal cell factories based on integrative strain engineering would benefit the overall bioconversion efficiency of LCB for sustainable bioproduction.
Affiliation(s)
- Jie Yang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hou-Ru Yue
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Li-Ya Pan
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi Research Center for Microbial and Enzyme Engineering Technology, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Jia-Xun Feng
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi Research Center for Microbial and Enzyme Engineering Technology, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Shuai Zhao
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi Research Center for Microbial and Enzyme Engineering Technology, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Surisa Suwannarangsee
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Khlong Luang, Pathumthani 12120, Thailand
- Verawat Chempreda
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Khlong Luang, Pathumthani 12120, Thailand
- Chen-Guang Liu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xin-Qing Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
4
Sruthi KB, Menon A, P A, Vasudevan Soniya E. Pervasive translation of small open reading frames in plant long non-coding RNAs. Frontiers in Plant Science 2022; 13:975938. [PMID: 36352887 PMCID: PMC9638090 DOI: 10.3389/fpls.2022.975938] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Long non-coding RNAs (lncRNAs) are primarily recognized as non-coding transcripts longer than 200 nucleotides with low coding potential and are present in both eukaryotes and prokaryotes. Recent findings reveal that lncRNAs can code for micropeptides in various species. Micropeptides are generated from small open reading frames (smORFs) and have been discovered frequently in short mRNAs and non-coding RNAs, such as lncRNAs, circular RNAs, and pri-miRNAs. The most accepted definition of a smORF is an ORF containing fewer than 100 codons, and ribosome profiling and mass spectrometry are the most prevalent experimental techniques used to identify them. Although the majority of micropeptides perform critical roles throughout plant developmental processes and stress conditions, only a handful of their functions have been verified to date. Even though more research is being directed toward identifying micropeptides, there is still a dearth of information regarding these peptides in plants. This review outlines the lncRNA-encoded peptides, the evolutionary roles of such peptides in plants, and the techniques used to identify them. It also describes the functions of the pri-miRNA and circRNA-encoded peptides that have been identified in plants.
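The length-based definition quoted above (an ORF with fewer than 100 codons) can be illustrated with a minimal sketch. It is not a method from the review: the function name `find_smorfs`, the restriction to the forward strand and ATG starts, and the choice to exclude the stop codon from the codon count are all assumptions made for illustration.

```python
# Illustrative sketch only: scan the forward strand for ATG...stop ORFs
# and keep those shorter than a codon threshold (assumed: 100 codons,
# counted from the start codon up to, but excluding, the stop codon).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, max_codons=100):
    """Return (start, end, n_codons) tuples for candidate smORFs.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only the first ATG upstream of each stop is reported per frame.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until a stop codon or the sequence end.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream of this start
            n_codons = (j - i) // 3
            if n_codons < max_codons:
                hits.append((i, j + 3, n_codons))
            i = j + 3  # resume scanning after the stop codon
    return hits


print(find_smorfs("CCATGAAATTTTAGGG"))  # [(2, 14, 3)]
```

A real screen would also consider the reverse strand and non-AUG starts, and would still need translation evidence such as ribosome profiling or mass spectrometry, as the abstract notes.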
5
Identification and analysis of smORFs in Chlamydomonas reinhardtii. Genomics 2022; 114:110444. [PMID: 35933072 DOI: 10.1016/j.ygeno.2022.110444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/22/2022] [Revised: 07/06/2022] [Accepted: 07/31/2022] [Indexed: 11/24/2022]
Abstract
Small open reading frames (smORFs) have been acknowledged as important contributors to organismal functions in species ranging from bacteria to higher eukaryotes. However, smORFs in green algae remain little investigated, despite the importance of these organisms in ecology and evolution. We applied bioinformatic analysis, ribosome profiling, and small peptide proteomics to provide a genome-wide, high-confidence smORF database for the model green alga Chlamydomonas reinhardtii. The whole genome was first screened to mine potential coding smORFs. Conservation analysis, ribosome profiling, and proteomics data were then processed to identify conserved smORFs and generate translation evidence. The combination of procedures resulted in 2014 smORFs that might exist in the C. reinhardtii genome. The expression of smORFs under Cd treatment suggested that two smORFs might participate in redox reactions, three in inorganic phosphate transport, and one in DNA repair under stress. Our study built a genome-wide smORF database for C. reinhardtii, providing target smORFs for further research.
6
Identification and characterisation of sPEPs in Cryptococcus neoformans. Fungal Genet Biol 2022; 160:103688. [PMID: 35339703 DOI: 10.1016/j.fgb.2022.103688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/02/2022] [Accepted: 03/21/2022] [Indexed: 11/24/2022]
Abstract
Short open reading frame (sORF)-encoded peptides (sPEPs) have been found across a wide range of genomic locations in a variety of species. To date, their identification, validation, and characterisation in the human fungal pathogen Cryptococcus neoformans has been limited due to a lack of standardised protocols. We have developed an enrichment process that enables sPEP detection within a protein sample from this polysaccharide-encapsulated yeast, and implemented proteogenomics to provide insights into the validity of predicted and hypothetical sORFs annotated in the C. neoformans genome. Novel sORFs were discovered within the 5' and 3' UTRs of known transcripts as well as in "non-coding" RNAs. One novel candidate, dubbed NPB1, that resided in an RNA annotated as "non-coding", was chosen for characterisation. Through the creation of both specific point mutations and a full deletion allele, the function of the new sPEP, Npb1, was shown to resemble that of the bacterial trans-translation protein SmpB.
7
Choudhary S, Li W, Smith AD. Accurate detection of short and long active ORFs using Ribo-seq data. Bioinformatics 2020; 36:2053-2059. [PMID: 31750902 DOI: 10.1093/bioinformatics/btz878] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/18/2019] [Revised: 11/04/2019] [Accepted: 11/20/2019] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION Ribo-seq, a technique for deep-sequencing ribosome-protected mRNA fragments, has enabled transcriptome-wide monitoring of translation in vivo. It has opened avenues for re-evaluating the coding potential of open reading frames (ORFs), including many short ORFs that were previously presumed to be non-translating. However, the detection of translating ORFs, specifically short ORFs, from Ribo-seq data, remains challenging due to its high heterogeneity and noise. RESULTS We present ribotricer, a method for detecting actively translating ORFs by directly leveraging the three-nucleotide periodicity of Ribo-seq data. Ribotricer demonstrates higher accuracy and robustness compared with other methods at detecting actively translating ORFs including short ORFs on multiple published datasets across species inclusive of Arabidopsis, Caenorhabditis elegans, Drosophila, human, mouse, rat, yeast and zebrafish. AVAILABILITY AND IMPLEMENTATION Ribotricer is available at https://github.com/smithlabcode/ribotricer. All analysis scripts and results are available at https://github.com/smithlabcode/ribotricer-results. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
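The three-nucleotide periodicity mentioned above can be illustrated with a deliberately simplified sketch. This is not ribotricer's scoring method, which the abstract does not describe; the function name `frame_fractions` and the input format (per-nucleotide P-site read counts along a candidate ORF, starting at its first nucleotide) are assumptions for illustration only.

```python
# Illustrative sketch only, NOT ribotricer's algorithm: summarise how
# strongly read counts along a candidate ORF concentrate in one of the
# three codon positions (frames).
def frame_fractions(psite_counts):
    """Return the fraction of reads falling in frame 0, 1 and 2.

    psite_counts: hypothetical list of per-nucleotide read counts,
    indexed from the first nucleotide of the candidate ORF.
    """
    totals = [0, 0, 0]
    for pos, count in enumerate(psite_counts):
        totals[pos % 3] += count
    total = sum(totals)
    if total == 0:
        return (0.0, 0.0, 0.0)  # no reads, no evidence either way
    return tuple(t / total for t in totals)


# A translated ORF is expected to show most reads in a single frame,
# whereas a roughly uniform split suggests noise.
periodic = frame_fractions([9, 1, 0, 8, 1, 1, 10, 0, 0])
uniform = frame_fractions([1] * 9)
print(periodic[0] > 0.8, max(uniform) < 0.4)  # True True
```

Short ORFs contribute few codons, so a frame fraction computed this way is noisy for them; that difficulty is the problem the abstract says dedicated methods aim to address.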
Affiliation(s)
- Saket Choudhary
  - Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Wenzheng Li
  - Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Andrew D Smith
  - Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
8
Khitun A, Slavoff SA. Proteomic Detection and Validation of Translated Small Open Reading Frames. Curr Protoc Chem Biol 2020; 11:e77. [PMID: 31750990 DOI: 10.1002/cpch.77] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
Small open reading frames (smORFs) encode previously unannotated polypeptides or short proteins that regulate translation in cis (eukaryotes) and/or are independently functional (prokaryotes and eukaryotes). Ongoing efforts for complete annotation and functional characterization of smORF-encoded proteins have yielded novel regulators and therapeutic targets. However, because they are excluded from protein databases, initiate at non-AUG start codons, and produce few unique tryptic peptides, unannotated small proteins cannot be detected with standard proteomic methods. Here, we outline a procedure for mass spectrometry-based detection of translated smORFs in cultured human cells, from protein extraction, digestion, and LC-MS/MS to database preparation and data analysis. Following proteomic detection, translation from a unique smORF may be validated via siRNA-based silencing or overexpression and epitope tagging. This is necessary to unambiguously assign a peptide to a smORF within a specific transcript isoform or genomic locus. Provided that sufficient starting material is available, this workflow can be applied to any cell type/organism and adjusted to study specific (patho)physiological contexts including, but not limited to, development, stress, and disease. © 2019 by John Wiley & Sons, Inc.
- Basic Protocol 1: Protein extraction, size selection, and trypsin digestion
- Alternate Protocol 1: In-solution C8 column size selection
- Support Protocol 1: Chloroform/methanol precipitation
- Support Protocol 2: Reduction, alkylation, and in-solution protease digestion
- Support Protocol 3: Peptide de-salting
- Basic Protocol 2: Two-dimensional LC-MS/MS with ERLIC fractionation
- Basic Protocol 3: Transcriptomic database construction
- Alternate Protocol 2: Transcriptomics database generation with gffread
- Basic Protocol 4: Non-annotated peptide identification from LC-MS/MS data
- Basic Protocol 5: Validation using isotopically labeled synthetic peptide standards and siRNA
- Basic Protocol 6: Transcript validation using transient overexpression
Affiliation(s)
- Alexandra Khitun
  - Department of Chemistry, Yale University, New Haven, Connecticut
  - Chemical Biology Institute, Yale University, West Haven, Connecticut
- Sarah A Slavoff
  - Department of Chemistry, Yale University, New Haven, Connecticut
  - Chemical Biology Institute, Yale University, West Haven, Connecticut
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut
9
Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 2020; 21:293. [PMID: 32272892 PMCID: PMC7147072 DOI: 10.1186/s12864-020-6707-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 11/29/2019] [Accepted: 03/30/2020] [Indexed: 02/02/2023] Open
Abstract
Background The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. Results We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically disperse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. Conclusions The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
Affiliation(s)
- Nicolas Scalzitti
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Anne Jeannin-Girardon
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Pierre Collet
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Olivier Poch
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
10
Cerqueira FR, Vasconcelos ATR. OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. Database: The Journal of Biological Databases and Curation 2020; 2020:5989499. [PMID: 33206960 PMCID: PMC7673341 DOI: 10.1093/database/baaa067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 07/11/2020] [Accepted: 07/27/2020] [Indexed: 11/14/2022]
Abstract
Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty of finding patterns in tiny sequences is the main reason small ORFs are overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs, for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.
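For readers unfamiliar with the two metrics reported above, the following minimal sketch shows the standard definitions of sensitivity and specificity. The function name and the example counts are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: standard definitions of sensitivity and
# specificity from confusion-matrix counts. The counts used below are
# made up for the example and are not results from the paper.
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions in [0, 1].

    tp/fn: true ORF alignments kept / wrongly discarded
    tn/fp: spurious alignments discarded / wrongly kept
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true alignments recovered
    specificity = tn / (tn + fp)  # share of spurious alignments rejected
    return sensitivity, specificity


sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(sens, spec)  # 0.9 0.8
```

A stricter e-value cutoff typically raises specificity at the expense of sensitivity, which is the trade-off the abstract says a multivariate classifier handles better for short alignments.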
Affiliation(s)
- Fabio R Cerqueira
  - Department of Production Engineering, Universidade Federal Fluminense, Rua Domingos Silvério s/n, Petrópolis, 25 650-050, Rio de Janeiro, Brazil
  - Graduate Program in Computer Science, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
11
Korandla DR, Wozniak JM, Campeau A, Gonzalez DJ, Wright ES. AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions. Bioinformatics 2019; 36:1022-1029. [PMID: 31532487 PMCID: PMC7998711 DOI: 10.1093/bioinformatics/btz714] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/18/2019] [Revised: 09/05/2019] [Accepted: 09/13/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. RESULTS Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88-95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. AVAILABILITY AND IMPLEMENTATION AssessORF is available as an R package via the Bioconductor package repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Deepank R Korandla
  - Department of Biological Sciences, USA
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Jacob M Wozniak
  - Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Anaamika Campeau
  - Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- David J Gonzalez
  - Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
12
Mathema VB, Dondorp AM, Imwong M. OSTRFPD: Multifunctional Tool for Genome-Wide Short Tandem Repeat Analysis for DNA, Transcripts, and Amino Acid Sequences with Integrated Primer Designer. Evol Bioinform Online 2019; 15:1176934319843130. [PMID: 31040636 PMCID: PMC6482647 DOI: 10.1177/1176934319843130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/26/2019] [Accepted: 03/15/2019] [Indexed: 01/18/2023] Open
Abstract
Microsatellite mining is a common outcome of the in silico approach to genomic studies. The resulting short tandemly repeated DNA could be used as molecular markers for studying polymorphism, genotyping and forensics. The omni short tandem repeat finder and primer designer (OSTRFPD) is among the few versatile, platform-independent open-source tools written in Python that enables researchers to identify and analyse genome-wide short tandem repeats in both nucleic acids and protein sequences. OSTRFPD is designed to run either in a user-friendly fully featured graphical interface or in a command line interface mode for advanced users. OSTRFPD can detect both perfect and imperfect repeats of low complexity with customisable scores. Moreover, the software has built-in architecture to simultaneously filter selection of flanking regions in DNA and generate microsatellite-targeted primers implementing the Primer3 platform. The software has built-in motif-sequence generator engines and an additional option to use the dictionary mode for custom motif searches. The software generates search results including general statistics containing motif categorisation, repeat frequencies, densities, coverage, guanine–cytosine (GC) content, and simple text-based imperfect alignment visualisation. Thus, OSTRFPD presents users with a quick single-step solution package to assist development of microsatellite markers and categorise tandemly repeated amino acids in proteome databases. Practical implementation of OSTRFPD was demonstrated using publicly available whole-genome sequences of selected Plasmodium species. OSTRFPD is freely available and open-sourced for improvement and user-specific adaptation.
Affiliation(s)
- Vivek Bhakta Mathema
  - Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Arjen M Dondorp
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
  - Centre for Tropical Medicine, Churchill Hospital, Oxford, UK
- Mallika Imwong
  - Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Mallika Imwong, Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand.
|
13
|
Ahmad S, Gromiha MM, Raghava GPS, Schönbach C, Ranganathan S. APBioNet's annual International Conference on Bioinformatics (InCoB) returns to India in 2018. BMC Genomics 2019; 19:266. [PMID: 30999857 PMCID: PMC7402400 DOI: 10.1186/s12864-019-5582-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/12/2023]
Abstract
InCoB, one of the largest annual bioinformatics conferences in the Asia-Pacific region since its launch in 2002, returned to New Delhi, India after 12 years, with a conference attendance of 314 delegates. The 2018 conference had sessions on Big Data and Algorithms, Next Generation Sequencing and Omics Science, Structure, Function and Interactions, Disease and Drug Discovery and Plant and Agricultural Bioinformatics. The conference also featured an industry track as well as panel discussions on Women in Bioinformatics and Democratization vs. Quality control in academic publishing. Asia Pacific Bioinformatics Interaction & Networking Society (APbians) was launched as an APBionet Special Interest Group. Of the 52 oral presentations made, 22 were accepted in supplemental issues of BMC Bioinformatics, BMC Genomics or BMC Medical Genomics and are briefly reviewed here. Next year’s InCoB will be held in Jakarta, Indonesia from September 10–12, 2019.
Affiliation(s)
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110 067, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, 600 036, India
- Gajendra P S Raghava
  - Centre for Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, 110020, India
- Christian Schönbach
  - Department of Biology, School of Science and Technology, Nazarbayev University, Astana, Kazakhstan
  - International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811, Japan
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia
  - Transformational Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia
|