1
|
Lin Y, Qi X, Wan Y, Chen Z, Fang H, Liang C. Genome-wide analysis of the MADS-box gene family in Lonicera japonica and a proposed floral organ identity model. BMC Genomics 2023; 24:447. [PMID: 37553575 PMCID: PMC10408238 DOI: 10.1186/s12864-023-09509-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 07/08/2023] [Indexed: 08/10/2023] Open
Abstract
BACKGROUND: Lonicera japonica Thunb. is widely used in traditional Chinese medicine. Medicinal L. japonica mainly consists of dried flower buds and partially opened flowers, so the flowers are an important quality indicator. MADS-box genes encode transcription factors that regulate flower development. However, little is known about these genes in L. japonica. RESULTS: In this study, 48 MADS-box genes were identified in L. japonica, including 20 Type-I genes (8 Mα, 2 Mβ, and 10 Mγ) and 28 Type-II genes (26 MIKCc and 2 MIKC*). The Type-I and Type-II genes differed significantly in gene structure, conserved domains, protein structure, chromosomal distribution, phylogeny, and expression pattern. Type-I genes had a simpler gene structure, lacked the K domain, had low protein structure conservation, were tandemly distributed on the chromosomes, had more frequent lineage-specific duplications, and were expressed at low levels. In contrast, Type-II genes had a more complex gene structure; contained conserved M, I, K, and C domains; had highly conserved protein structure; and were expressed at high levels throughout the flowering period. Eleven floral homeotic MADS-box genes orthologous to the genes of the proposed Arabidopsis ABCDE model of floral organ identity determination were identified in L. japonica. By integrating expression pattern and protein interaction data for these genes, we developed a possible model for floral organ identity determination. CONCLUSION: This study identified and characterized the MADS-box gene family in L. japonica on a genome-wide scale. Eleven floral homeotic MADS-box genes were identified and a possible model for floral organ identity determination was developed. This study contributes to our understanding of the MADS-box gene family and its possible involvement in floral organ development in L. japonica.
Affiliation(s)
- Yi Lin
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Xiwu Qi
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
- Yan Wan
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Zequn Chen
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
- Hailing Fang
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
- Chengyuan Liang
  - Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Nanjing, 210014, Jiangsu Province, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
|
2
|
Sinha A, Sangeet S, Roy S. Evolution of Sequence and Structure of SARS-CoV-2 Spike Protein: A Dynamic Perspective. ACS Omega 2023; 8:23283-23304. [PMID: 37426203 PMCID: PMC10324094 DOI: 10.1021/acsomega.3c00944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 06/01/2023] [Indexed: 07/11/2023]
Abstract
The novel coronavirus (SARS-CoV-2) enters its host cell through a surface spike protein. The spike protein has accumulated several mutations at the genomic level that modulated its structure and function and gave rise to several variants of concern. Recent advances in high-resolution structure determination and multiscale imaging techniques, cost-effective next-generation sequencing, and the development of new computational methods (including information theory, statistical methods, machine learning, and many other artificial intelligence-based techniques) have contributed greatly to the characterization of the sequence, structure, and function of the spike protein and its variants, and thereby to the understanding of viral pathogenesis, evolution, and transmission. Building on the sequence-structure-function paradigm, this review summarizes not only the important findings on structure and function but also the structural dynamics of the different spike components, highlighting the effects of mutations on them. Because dynamic fluctuations of the three-dimensional spike structure often provide important clues for functional modulation, quantifying time-dependent fluctuations of mutational events over the spike structure and its genetic/amino acid sequence helps identify alarming functional transitions with implications for enhanced fusogenicity and pathogenicity of the virus. Although these dynamic events are more difficult to capture than a static, average property, this review covers those challenging aspects of characterizing the evolutionary dynamics of spike sequence and structure and their implications for function.
|
3
|
Ejigu GF, Jung J. Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing. Biology 2020; 9:E295. [PMID: 32962098 PMCID: PMC7565776 DOI: 10.3390/biology9090295] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/21/2020] [Revised: 09/13/2020] [Accepted: 09/16/2020] [Indexed: 12/16/2022]
Abstract
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and it has shifted the research focus into genome annotation. The challenging tasks involved in annotation rely on the currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and improve our ability to diagnose genetic disorders. Here, we present a summary of both structural and functional annotations, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that immensely aid the annotation process and the contributions of the scientific community to the annotation. Further, we discuss quality-control practices and the need for re-annotation, and highlight the future of annotation.
Affiliation(s)
- Jaehee Jung
  - Department of Information and Communication Engineering, Myongji University, Yongin-si 17058, Gyeonggi-do, Korea
|
4
|
In or Out? New Insights on Exon Recognition through Splice-Site Interdependency. Int J Mol Sci 2020; 21:ijms21072300. [PMID: 32225107 PMCID: PMC7177576 DOI: 10.3390/ijms21072300] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/07/2020] [Revised: 03/13/2020] [Accepted: 03/23/2020] [Indexed: 01/02/2023] Open
Abstract
Noncanonical splice-site mutations are an important cause of inherited diseases. Based on in vitro and stem-cell-based studies, some splice-site variants show a stronger splice defect than expected based on their predicted effects, suggesting that other sequence motifs influence the outcome. We investigated whether splice defects due to human-inherited-disease-associated variants in noncanonical splice-site sequences in ABCA4, DMD, and TMC1 could be rescued by strengthening the splice site on the other side of the exon. Noncanonical 5′- and 3′-splice-site variants were selected. Rescue variants were introduced based on an increase in predicted splice-site strength, and the effects of these variants were analyzed using in vitro splice assays in HEK293T cells. Exon skipping due to five variants in noncanonical splice sites of exons in ABCA4, DMD, and TMC1 could be partially or completely rescued by increasing the predicted strengths of the other splice site of the same exon. We named this mechanism “splicing interdependency”, and it is likely based on exon recognition by splicing machinery. Awareness of this interdependency is of importance in the classification of noncanonical splice-site variants associated with disease and may open new opportunities for treatments.
|
5
|
Abstract
Spliceosome-mediated mRNA trans-splicing (SMaRT) is a promising strategy for the treatment of genetic diseases that cannot be targeted by classical therapy approaches. SMaRT uses an exogenous pre-mRNA trans-splicing molecule (PTM) to correct a diseased target pre-mRNA. The process relies on splicing two separate pre-mRNA molecules in trans, creating a mature chimeric mRNA that consists of the protein-coding sequence of the PTM together with that of the endogenous mRNA. For therapeutic applications, the most critical step in SMaRT is to develop PTMs that yield a high ratio of trans-splicing to regular cis-splicing. This protocol provides guidelines on how to design PTMs and describes a fast screening assay to test their efficiencies. To elucidate the therapeutic potential of the best candidates in a more native setting, these PTMs are tested further on mini genes.
Affiliation(s)
- Lisa M Riedmayr
  - Center for Integrated Protein Science Munich CIPSM, Ludwig-Maximilians-Universität München, Munich, Germany
  - Department of Pharmacy, Center for Drug Research, Ludwig-Maximilians-Universität München, Munich, Germany
|
6
|
Abstract
This unit describes the usage of geneid, an efficient gene-finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs. geneid software is in the public domain, and is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene-finding tools. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Tyler Alioto
  - Centre Nacional d'Anàlisi Genòmica (CNAG-CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Enrique Blanco
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Genís Parra
  - Centre Nacional d'Anàlisi Genòmica (CNAG-CRG), Barcelona, Spain
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Roderic Guigó
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
|
7
|
Stella A, Lastella P, Loconte DC, Bukvic N, Varvara D, Patruno M, Bagnulo R, Lovaglio R, Bartolomeo N, Serio G, Resta N. Accurate Classification of NF1 Gene Variants in 84 Italian Patients with Neurofibromatosis Type 1. Genes (Basel) 2018; 9:genes9040216. [PMID: 29673180 PMCID: PMC5924558 DOI: 10.3390/genes9040216] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 02/22/2018] [Revised: 03/27/2018] [Accepted: 04/03/2018] [Indexed: 11/16/2022] Open
Abstract
Neurofibromatosis type 1 (NF1) is one of the most common autosomal dominant genetic diseases. It is caused by mutations in the NF1 gene, which encodes the large protein neurofibromin. Genetic testing of NF1 is cumbersome because 50% of cases are sporadic and there are no mutation hot spots. In addition, the most recognizable NF1 clinical features, café-au-lait spots (CALs) and axillary and/or inguinal freckling, appear early in childhood but are rather non-specific. Thus, the identification of causative variants is extremely important for early diagnosis, especially in paediatric patients. Here, we aimed to identify the underlying genetic defects in 72 index patients referred to our centre for NF1. Causative mutations were identified in 58 subjects, with 29 being novel changes. We evaluated missense and non-canonical splicing mutations with both protein and splicing prediction algorithms. The proportion of splicing mutations detected was higher than that reported in recent patient series and in the Human Gene Mutation Database (HGMD). After applying in silico predictive tools to 41 previously reported missense variants, we found that 46.3% of these putatively missense mutations were predicted to alter splicing instead. Our data suggest that mutations affecting splicing are frequently underestimated if not analysed in depth. We confirm that hamartomas can be useful for diagnosing NF1 in children. Lisch nodules and cutaneous neurofibromas were more frequent in patients with frameshifting mutations. In conclusion, we demonstrated that comprehensive in silico analysis can be a highly specific method for predicting the nature of NF1 mutations and may help in assuring proper patient care.
Affiliation(s)
- Alessandro Stella
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Patrizia Lastella
  - Centro di Malattie Rare, Azienda Ospedaliero-Universitario Policlinico di Bari, 70124 Bari, Italy
- Daria Carmela Loconte
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Nenad Bukvic
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Dora Varvara
  - Azienda Ospedaliero-Universitario Policlinico di Bari, 70124 Bari, Italy
- Margherita Patruno
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Rosanna Bagnulo
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Rosaura Lovaglio
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Nicola Bartolomeo
  - Sezione di Igiene, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Gabriella Serio
  - Sezione di Igiene, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
- Nicoletta Resta
  - Laboratorio di Genetica Medica, Dipartimento di Scienze Biomediche e Oncologia Umana, Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
|
8
|
Pucker B, Holtgräwe D, Weisshaar B. Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence. BMC Res Notes 2017; 10:667. [PMID: 29202864 PMCID: PMC5716242 DOI: 10.1186/s13104-017-2985-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/25/2017] [Accepted: 11/23/2017] [Indexed: 12/26/2022] Open
Abstract
OBJECTIVE: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from Araport11, the recently released annotation of the Columbia-0 reference genome sequence. RESULTS: Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mappings and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated using Araport11's information, which was generated from high-coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated by comparison to the initial gene prediction (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against GeneSet_Nd-1_v1.1 indicate a high gene prediction quality.
Affiliation(s)
- Boas Pucker
  - Faculty of Biology & Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Daniela Holtgräwe
  - Faculty of Biology & Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Bernd Weisshaar
  - Faculty of Biology & Center for Biotechnology, Bielefeld University, Bielefeld, Germany
|
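The abstract of entry 8 evaluates the improved gene set by counting reciprocal best hits against Col-0 genes. As a rough illustration of that idea only, the sketch below pairs genes whose best-scoring match points back at each other. The score dictionaries, gene identifiers, and function names are hypothetical; the abstract does not describe the search tool or scoring the authors used.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): score}; this input layout is an
    assumption made for the example, not a format from the cited paper.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy, made-up scores between two gene sets.
col_vs_nd = {("colA", "ndX"): 200, ("colA", "ndY"): 150,
             ("colB", "ndY"): 180, ("colC", "ndY"): 90}
nd_vs_col = {("ndX", "colA"): 210, ("ndY", "colB"): 175, ("ndY", "colC"): 80}
pairs = reciprocal_best_hits(col_vs_nd, nd_vs_col)
```

In this toy input, `colC` has a best hit (`ndY`) that does not point back to it, so it is left unpaired; the fraction of genes that do pair up is the kind of figure the abstract reports.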
9
|
Abstract
Gene finding is the process of identifying genome sequence regions representing stretches of DNA that encode biologically active products, such as proteins or functional noncoding RNAs. As this is usually the first step in the analysis of any novel genomic sequence or resequenced sample of well-known organisms, it is a very important issue, as all downstream analyses depend on the results. This chapter describes the biological basis for gene finding, and the programs and computational approaches that are available for the automated identification of protein-coding genes. For bacterial, archaeal, and eukaryotic genomes, as well as for multi-species sequence data originating from environmental community studies, the state of the art in automated gene finding is described.
Affiliation(s)
- Alice Carolyn McHardy
  - Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Andreas Kloetgen
  - Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
  - Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich Heine University, Düsseldorf, Germany
|
10
|
Identification and characterization of protein coding genes in monsonia (Monsonia burkeana Planch. ex Harv.) using a combination of approaches. Genes Genomics 2016. [DOI: 10.1007/s13258-016-0499-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
11
|
The Poitiers School of Mathematical and Theoretical Biology: Besson-Gavaudan-Schützenberger's Conjectures on Genetic Code and RNA Structures. Acta Biotheor 2016; 64:403-426. [PMID: 27592342 DOI: 10.1007/s10441-016-9287-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/02/2016] [Accepted: 08/23/2016] [Indexed: 02/08/2023]
Abstract
The French school of theoretical biology was initiated mainly in Poitiers during the 1960s by scientists such as J. Besson, G. Bouligand, P. Gavaudan, M. P. Schützenberger and R. Thom, who launched many new research domains on the fractal dimension, the combinatorial properties of the genetic code and related amino acids, and the genetic regulation of biological processes. It is now known that RNA molecules are often involved in the regulation of complex genetic networks as effectors, e.g., activators (small RNAs acting as transcription factors), inhibitors (micro-RNAs) or hybrids (circular RNAs). Examples of such networks will be given showing that (1) there exist RNA "relics" that have played an important role during evolution and have survived in many genomes, and whose sub-sequence probability distribution is quantified by the Shannon entropy, and (2) the robustness of the dynamics of the networks they regulate can be characterized by the Kolmogorov-Sinaï dynamic entropy and the attractor entropy.
|
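Entry 11 quantifies the probability distribution of RNA sub-sequences with the Shannon entropy. A minimal sketch of that quantity, assuming overlapping fixed-length sub-sequences (k-mers) and base-2 logarithms, is shown here; the function name, the choice of k, and the example sequences are illustrative and are not taken from the cited article.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        # Sequence shorter than k: no sub-sequences to count.
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mer frequencies p.
    return -sum((c / total) * log2(c / total) for c in counts.values())


uniform = kmer_entropy("ACGU", k=1)     # four equally frequent symbols
repetitive = kmer_entropy("AAAAAA", k=2)  # a single repeated 2-mer
```

A sequence built from one repeated sub-sequence has zero entropy, while four equally frequent single bases give the maximum of 2 bits for k = 1.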
12
|
Li H, Hu C, Bai L, Li H, Li M, Zhao X, Czajkowsky DM, Shao Z. Ultra-deep sequencing of ribosome-associated poly-adenylated RNA in early Drosophila embryos reveals hundreds of conserved translated sORFs. DNA Res 2016; 23:571-580. [PMID: 27559081 PMCID: PMC5144680 DOI: 10.1093/dnares/dsw040] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/09/2016] [Accepted: 07/11/2016] [Indexed: 11/23/2022] Open
Abstract
There is growing recognition that small open reading frames (sORFs) encoding peptides shorter than 100 amino acids are an important class of functional elements in the eukaryotic genome, with several already identified to play critical roles in growth, development, and disease. However, our understanding of their biological importance has been hindered owing to the significant technical challenges limiting their annotation. Here we combined ultra-deep sequencing of ribosome-associated poly-adenylated RNAs with rigorous conservation analysis to identify a comprehensive population of translated sORFs during early Drosophila embryogenesis. In total, we identify 399 sORFs, including those previously annotated but without evidence of translational capacity, those found within transcripts previously classified as non-coding, and those not previously known to be transcribed. Further, we find, for the first time, evidence for translation of many sORFs with different isoforms, suggesting their regulation is as complex as longer ORFs. Furthermore, many sORFs are found not associated with ribosomes in late-stage Drosophila S2 cells, suggesting that many of the translated sORFs may have stage-specific functions during embryogenesis. These results thus provide the first comprehensive annotation of the sORFs present during early Drosophila embryogenesis, a necessary basis for a detailed delineation of their function in embryogenesis and other biological processes.
Affiliation(s)
- Hongmei Li
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Chuansheng Hu
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Ling Bai
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Hua Li
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Mingfa Li
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaodong Zhao
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Daniel M Czajkowsky
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhifeng Shao
  - Bio-ID Center, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
|
13
|
Sharma V, Elghafari A, Hiller M. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Res 2016; 44:e103. [PMID: 27016733 PMCID: PMC4914097 DOI: 10.1093/nar/gkw210] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 02/02/2016] [Revised: 03/04/2016] [Accepted: 03/18/2016] [Indexed: 12/03/2022] Open
Abstract
Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes.
Affiliation(s)
- Virag Sharma
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
- Anas Elghafari
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
  - Technical University, 01069 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
|
14
|
Sheshukova EV, Shindyapina AV, Komarova TV, Dorokhov YL. “Matreshka” genes with alternative reading frames. Russ J Genet 2016. [DOI: 10.1134/s1022795416020149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
15
|
Abstract
Long noncoding RNAs (lncRNAs) are pivotal regulators of genome structure and gene expression. LncRNAs can directly interact with chromatin-modifying enzymes and nucleosome-remodeling factors to control chromatin structure and accessibility of genetic information. Moreover, lncRNA expression can be controlled by chromatin-remodeling factors, suggesting a feedback circuit of regulation. Here, we discuss the recent advances of lncRNA studies, focusing on the function and mechanism of lncRNA-chromatin interactions.
Affiliation(s)
- Pei Han
  - Krannert Institute of Cardiology and Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Ching-Pin Chang
  - Krannert Institute of Cardiology and Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
|
16
|
Chu Q, Ma J, Saghatelian A. Identification and characterization of sORF-encoded polypeptides. Crit Rev Biochem Mol Biol 2015; 50:134-41. [PMID: 25857697 DOI: 10.3109/10409238.2015.1016215] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 01/10/2023]
Abstract
Molecular biology, genomics and proteomics methods have been utilized to reveal a non-annotated class of endogenous polypeptides (small proteins and peptides) encoded by short open reading frames (sORFs), or small open reading frames (smORFs). We refer to these polypeptides as s(m)ORF-encoded polypeptides or SEPs. The early SEPs were identified via genetic screens, and many of the RNAs that contain s(m)ORFs were originally considered to be non-coding; however, elegant work in bacteria and flies demonstrated that these s(m)ORFs code for functional polypeptides as small as 11 amino acids in length. The discovery of these initial SEPs led to a search for these molecules using methods such as ribosome profiling and proteomics, which have revealed the existence of many SEPs, including novel human SEPs. Unlike screens, omics methods do not necessarily link a SEP to a cellular or biological function, but functional genomic and proteomic strategies have demonstrated that at least some of these newly discovered SEPs have biochemical and cellular functions. Here, we provide an overview of these results and discuss the future directions in this emerging field.
Affiliation(s)
- Qian Chu
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, Helmsley Center for Genomic Medicine, La Jolla, CA, USA
|
17
|
Cooke IR, Jones D, Bowen JK, Deng C, Faou P, Hall NE, Jayachandran V, Liem M, Taranto AP, Plummer KM, Mathivanan S. Proteogenomic analysis of the Venturia pirina (Pear Scab Fungus) secretome reveals potential effectors. J Proteome Res 2014; 13:3635-44. [PMID: 24965097 DOI: 10.1021/pr500176c] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
A proteogenomic analysis is presented for Venturia pirina, a fungus that causes scab disease on European pear (Pyrus communis). V. pirina is host-specific, and the infection is thought to be mediated by secreted effector proteins. Currently, only 36 V. pirina proteins are catalogued in GenBank, and the genome sequence is not publicly available. To identify putative effectors, V. pirina was grown in vitro on and in cellophane sheets, mimicking its growth in infected leaves. Secreted extracts were analyzed by tandem mass spectrometry, and the data (ProteomeXchange identifier PXD000710) were queried against a protein database generated by combining in silico predicted transcripts with six-frame translations of a whole genome sequence of V. pirina (GenBank Accession JEMP00000000). We identified 1088 distinct V. pirina protein groups (FDR 1%), including 1085 detected for the first time. Thirty novel (not in silico predicted) proteins were found, of which 14 were identified as potential effectors based on characteristic features of fungal effector protein sequences. We also used evidence from semitryptic peptides at the protein N-terminus to corroborate in silico signal peptide predictions for 22 proteins, including several potential effectors. The analysis highlights the utility of proteogenomics in the study of secreted effectors.
Affiliation(s)
- Ira R Cooke
- Department of Biochemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
|
18
|
Emerging evidence for functional peptides encoded by short open reading frames. Nat Rev Genet 2014; 15:193-204. [PMID: 24514441 DOI: 10.1038/nrg3520] [Citation(s) in RCA: 382] [Impact Index Per Article: 38.2] [Indexed: 12/19/2022]
Abstract
Short open reading frames (sORFs) are a common feature of all genomes, but their coding potential has mostly been disregarded, partly because of the difficulty in determining whether these sequences are translated. Recent innovations in computing, proteomics and high-throughput analyses of translation start sites have begun to address this challenge and have identified hundreds of putative coding sORFs. The translation of some of these has been confirmed, although the contribution of their peptide products to cellular functions remains largely unknown. This Review examines this hitherto overlooked component of the proteome and considers potential roles for sORF-encoded peptides.
|
19
|
Two new methods for DNA splice site prediction based on neuro-fuzzy network and clustering. Neural Comput Appl 2013. [DOI: 10.1007/s00521-012-1257-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
20
|
Adi SS, Ferreira CE. Syntenic global alignment and its application to the gene prediction problem. Journal of the Brazilian Computer Society 2013. [DOI: 10.1007/s13173-013-0115-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Given the increasing number of available genomic sequences, one now faces the task of identifying their protein coding regions. The gene prediction problem can be addressed in several ways, and one of the most promising methods makes use of information derived from the comparison of homologous sequences. In this work, we develop a new comparative-based gene prediction program, called Exon_Finder2. This tool is based on a new type of alignment we propose, called syntenic global alignment, that can deal satisfactorily with sequences that share regions with different rates of conservation. In addition to this new type of alignment itself, we also describe a dynamic programming algorithm that computes a best syntenic global alignment of two sequences, as well as its related score. The applicability of our approach was validated by the promising initial results achieved by Exon_Finder2. On a benchmark including 120 pairs of human and mouse genomic sequences, most of their encoded genes were successfully identified by our program.
|
21
|
Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, Chitsaz H, Chou WC, Corbeil J, Del Fabbro C, Docking TR, Durbin R, Earl D, Emrich S, Fedotov P, Fonseca NA, Ganapathy G, Gibbs RA, Gnerre S, Godzaridis E, Goldstein S, Haimel M, Hall G, Haussler D, Hiatt JB, Ho IY, Howard J, Hunt M, Jackman SD, Jaffe DB, Jarvis ED, Jiang H, Kazakov S, Kersey PJ, Kitzman JO, Knight JR, Koren S, Lam TW, Lavenier D, Laviolette F, Li Y, Li Z, Liu B, Liu Y, Luo R, Maccallum I, Macmanes MD, Maillet N, Melnikov S, Naquin D, Ning Z, Otto TD, Paten B, Paulo OS, Phillippy AM, Pina-Martins F, Place M, Przybylski D, Qin X, Qu C, Ribeiro FJ, Richards S, Rokhsar DS, Ruby JG, Scalabrin S, Schatz MC, Schwartz DC, Sergushichev A, Sharpe T, Shaw TI, Shendure J, Shi Y, Simpson JT, Song H, Tsarev F, Vezzi F, Vicedomini R, Vieira BM, Wang J, Worley KC, Yin S, Yiu SM, Yuan J, Zhang G, Zhang H, Zhou S, Korf IF. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2013; 2:10. [PMID: 23870653 PMCID: PMC3844414 DOI: 10.1186/2047-217x-2-10] [Citation(s) in RCA: 420] [Impact Index Per Article: 38.2] [Received: 01/23/2013] [Accepted: 07/15/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
|
22
|
Abstract
Motivation: Subcellular localization is one aspect of protein function. Despite advances in high-throughput imaging, localization maps remain incomplete. Several methods accurately predict localization, but many challenges remain to be tackled. Results: In this study, we introduced a framework to predict localization in life's three domains, including globular and membrane proteins (3 classes for archaea; 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches high levels of sustained performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane and non-membrane proteins. In our hands, it compared favorably with top methods when tested on new data. Availability: Online through PredictProtein (predictprotein.org); as standalone version at http://www.rostlab.org/services/loctree2. Contact: localization@rostlab.org Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatyana Goldberg
- TUM, Bioinformatik-I12, Informatik, Boltzmannstrasse 3, Garching 85748, Germany.
|
23
|
Peyretaillade E, Parisot N, Polonais V, Terrat S, Denonfoux J, Dugat-Bony E, Wawrzyniak I, Biderre-Petit C, Mahul A, Rimour S, Gonçalves O, Bornes S, Delbac F, Chebance B, Duprat S, Samson G, Katinka M, Weissenbach J, Wincker P, Peyret P. Annotation of microsporidian genomes using transcriptional signals. Nat Commun 2012; 3:1137. [DOI: 10.1038/ncomms2156] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 04/11/2012] [Accepted: 09/20/2012] [Indexed: 12/24/2022] Open
|
24
|
Cheng H, Chan WS, Li Z, Wang D, Liu S, Zhou Y. Small open reading frames: current prediction techniques and future prospect. Curr Protein Pept Sci 2012; 12:503-7. [PMID: 21787300 DOI: 10.2174/138920311796957667] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 04/01/2011] [Revised: 04/01/2011] [Accepted: 05/04/2011] [Indexed: 11/22/2022]
Abstract
Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform more poorly for small genes than for larger genes, while a method dedicated to sORF discovery has a level of accuracy similar to that of homology search. The result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further advance the accuracy of sORF prediction.
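To make the sORF definition used in this abstract concrete (an open reading frame of fewer than 100 codons), the following is a minimal sketch of a naive candidate scan. It is not the prediction method evaluated by the authors: the function name `find_sorfs`, the `min_codons` parameter, the forward-strand-only scan, and the choice to report the longest ATG-to-stop span per stop codon are all illustrative assumptions.

```python
# Illustrative sketch only: enumerate candidate small ORFs on the forward strand.
# The < 100 codon cutoff follows the abstract; everything else is an assumption.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(dna, max_codons=100, min_codons=2):
    """Return sorted (start, end, codon_count) tuples for ATG..stop ORFs.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    codon_count counts the start codon but not the stop codon.
    Only ORFs with min_codons <= codon_count < max_codons are reported.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None  # first ATG seen since the previous in-frame stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS:
                if start is not None:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons < max_codons:
                        hits.append((start, i + 3, n_codons))
                start = None
    return sorted(hits)


if __name__ == "__main__":
    # ATG AAA TTT TAG embedded at offset 2 (frame 2)
    print(find_sorfs("CCATGAAATTTTAGGG"))
```

A scan like this returns very many candidates on real genomes, which is consistent with the abstract's point that separating translated sORFs from chance ORFs needs additional evidence.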
Affiliation(s)
- Haoyu Cheng
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
25
|
Goli B, Nair AS. The elusive short gene – an ensemble method for recognition for prokaryotic genome. Biochem Biophys Res Commun 2012; 422:36-41. [DOI: 10.1016/j.bbrc.2012.04.090] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/13/2012] [Accepted: 04/17/2012] [Indexed: 10/28/2022]
|
26
|
García-Pedrajas N, de Haro-García A. Scaling up data mining algorithms: review and taxonomy. Progress in Artificial Intelligence 2012. [DOI: 10.1007/s13748-011-0004-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
|
27
|
Haas BJ, Zeng Q, Pearson MD, Cuomo CA, Wortman JR. Approaches to Fungal Genome Annotation. Mycology 2011; 2:118-141. [PMID: 22059117 PMCID: PMC3207268 DOI: 10.1080/21501203.2011.606851] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 01/08/2023] Open
Abstract
Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.
Affiliation(s)
- Brian J Haas
- Genome Sequencing and Analysis Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, U.S.A
|
28
|
Specht M, Stanke M, Terashima M, Naumann-Busch B, Janssen I, Höhner R, Hom EFY, Liang C, Hippler M. Concerted action of the new Genomic Peptide Finder and AUGUSTUS allows for automated proteogenomic annotation of the Chlamydomonas reinhardtii genome. Proteomics 2011; 11:1814-23. [PMID: 21432999 DOI: 10.1002/pmic.201000621] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 09/30/2010] [Revised: 01/31/2011] [Accepted: 02/11/2011] [Indexed: 12/24/2022]
Abstract
The use and development of post-genomic tools naturally depends on large-scale genome sequencing projects. The usefulness of post-genomic applications is dependent on the accuracy of genome annotations, for which the correct identification of intron-exon borders in complex genomes of eukaryotic organisms is often an error-prone task. Although automated algorithms for predicting intron-exon structures are available, supporting exon evidence is necessary to achieve comprehensive genome annotation. Besides cDNA and EST support, peptides identified via MS/MS can be used as extrinsic evidence in a proteogenomic approach. We describe an improved version of the Genomic Peptide Finder (GPF), which aligns de novo predicted amino acid sequences to the genomic DNA sequence of an organism while correcting for peptide sequencing errors and accounting for the possibility of splicing. We have coupled GPF and the gene finding program AUGUSTUS in a way that provides automatic structural annotations of the Chlamydomonas reinhardtii genome, using highly unbiased GPF evidence. A comparison of the AUGUSTUS gene set incorporating GPF evidence to the standard JGI FM4 (Filtered Models 4) gene set reveals 932 GPF peptides that are not contained in the Filtered Models 4 gene set. Furthermore, the GPF evidence improved the AUGUSTUS gene models by altering 65 gene models and adding three previously unidentified genes.
Affiliation(s)
- Michael Specht
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, Germany
|
29
|
Misawa K, Kikuno RF. GeneWaltz--A new method for reducing the false positives of gene finding. BioData Min 2010; 3:6. [PMID: 20875138 PMCID: PMC2955682 DOI: 10.1186/1756-0381-3-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/28/2010] [Accepted: 09/28/2010] [Indexed: 11/28/2022] Open
Abstract
Background Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because they waste the time and resources of experimental studies. Methods We developed GeneWaltz, a new filtering method that reduces the risk of false positives in gene finding. GeneWaltz utilizes a codon-to-codon substitution matrix that was constructed by comparing protein-coding regions from orthologous gene pairs between mouse and human genomes. Using this matrix, a scoring scheme was developed; it assigned higher scores to coding regions and lower scores to non-coding regions. The regions with high scores were considered candidate coding regions. One-dimensional Karlin-Altschul statistics was used to test the significance of the coding regions identified by GeneWaltz. Results The proportion of false positives among genes predicted by GENSCAN and Twinscan was high, especially when the exons were short. GeneWaltz significantly reduced the ratio of false positives to all positives predicted by GENSCAN and Twinscan, especially when the exons were short. Conclusions GeneWaltz will be helpful in experimental genomic studies. GeneWaltz binaries and the matrix are available online at http://en.sourceforge.jp/projects/genewaltz/.
Affiliation(s)
- Kazuharu Misawa
- Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, RIKEN, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
|
30
|
Bidard F, Imbeaud S, Reymond N, Lespinet O, Silar P, Clavé C, Delacroix H, Berteaux-Lecellier V, Debuchy R. A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina. BMC Res Notes 2010; 3:171. [PMID: 20565839 PMCID: PMC2908635 DOI: 10.1186/1756-0500-3-171] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 03/21/2010] [Accepted: 06/18/2010] [Indexed: 01/16/2023] Open
Abstract
Background The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. Findings We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. Conclusions A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.
Affiliation(s)
- Frédérique Bidard
- Univ Paris-Sud 11, Institut de Génétique et Microbiologie UMR8621, F-91405 Orsay, France.
|
31
|
Bertini I, Decaria L, Rosato A. The annotation of full zinc proteomes. J Biol Inorg Chem 2010; 15:1071-8. [DOI: 10.1007/s00775-010-0666-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 01/15/2010] [Accepted: 04/16/2010] [Indexed: 11/29/2022]
|
32
|
An overview of the current status of eukaryote gene prediction strategies. Gene 2010; 461:1-4. [PMID: 20430068 DOI: 10.1016/j.gene.2010.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 03/22/2010] [Revised: 04/15/2010] [Accepted: 04/16/2010] [Indexed: 01/12/2023]
Abstract
As sequence data continues to be generated at a logarithmic rate our dependence on effective in silico gene prediction methods is also increasing. Herein, I review the current state of eukaryote gene prediction methods; their strengths, weaknesses and future directions.
|
33
|
Ma LJ, Fedorova ND. A practical guide to fungal genome projects: strategy, technology, cost and completion. Mycology 2010. [DOI: 10.1080/21501201003680943] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/10/2023] Open
|
34
|
Stanke M. Computational Gene Prediction in Eukaryotic Genomes. Cellular Origin, Life in Extreme Habitats and Astrobiology 2010:291-306. [DOI: 10.1007/978-90-481-3795-4_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
35
|
Liu YJ, Zheng D, Balasubramanian S, Carriero N, Khurana E, Robilotto R, Gerstein MB. Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotranspositional activity. BMC Genomics 2009; 10:480. [PMID: 19835609 PMCID: PMC2770531 DOI: 10.1186/1471-2164-10-480] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 03/25/2009] [Accepted: 10/16/2009] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins. RESULTS We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively. 
CONCLUSION Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
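The abstract above dates pseudogenes using Kimura's two-parameter (K2P) model. As a hedged illustration of that model only, the sketch below computes the standard K2P distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transition and transversion differences between two aligned sequences. The function name `k2p_distance` and the handling of gaps and ambiguous bases (skipped) are assumptions, not details taken from the paper. Converting a distance into an age additionally requires a substitution rate, which the abstract does not state, so that step is left out.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G or T in either sequence
    are skipped (an assumption made for this sketch).
    Raises ValueError if the sequences differ in length, share no comparable
    sites, or are too diverged for the logarithm to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log/math.sqrt raise ValueError when the argument is not valid,
    # which happens for saturated (too divergent) sequence pairs
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition in ten sites
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 6))
```

Separating transitions from transversions is what distinguishes K2P from a single-rate correction, since transitions typically accumulate faster.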
Affiliation(s)
- Yuen-Jong Liu
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, 110 Francis Street, Boston, MA, USA
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Deyou Zheng
- Albert Einstein College of Medicine of Yeshiva University, Department of Neurology, Rose F. Kennedy Center, 1410 Pelham Parkway South, Room 915B, Bronx, NY 10461, USA
- Suganthi Balasubramanian
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Nicholas Carriero
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Ekta Khurana
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Rebecca Robilotto
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Mark B Gerstein
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
|
36
|
Findlay GD, MacCoss MJ, Swanson WJ. Proteomic discovery of previously unannotated, rapidly evolving seminal fluid genes in Drosophila. Genome Res 2009; 19:886-96. [PMID: 19411605 DOI: 10.1101/gr.089391.108] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Indexed: 12/24/2022]
Abstract
As genomic sequences become easier to acquire, shotgun proteomics will play an increasingly important role in genome annotation. With proteomics, researchers can confirm and revise existing genome annotations and discover completely new genes. Proteomic-based de novo gene discovery should be especially useful for sets of genes with characteristics that make them difficult to predict with gene-finding algorithms. Here, we report the proteomic discovery of 19 previously unannotated genes encoding seminal fluid proteins (Sfps) that are transferred from males to females during mating in Drosophila. Using bioinformatics, we detected putative orthologs of these genes, as well as 19 others detected by the same method in a previous study, across several related species. Gene expression analysis revealed that nearly all predicted orthologs are transcribed and that most are expressed in a male-specific or male-biased manner. We suggest several reasons why these genes escaped computational prediction. Like annotated Sfps, many of these new proteins show a pattern of adaptive evolution, consistent with their potential role in influencing male sperm competitive ability. However, in contrast to annotated Sfps, these new genes are shorter, have a higher rate of nonsynonymous substitution, and have a markedly lower GC content in coding regions. Our data demonstrate the utility of applying proteomic gene discovery methods to a specific biological process and provide a more complete picture of the molecules that are critical to reproductive success in Drosophila.
Affiliation(s)
- Geoffrey D Findlay
- Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA.
|
37
|
Harrow J, Nagy A, Reymond A, Alioto T, Patthy L, Antonarakis SE, Guigó R. Identifying protein-coding genes in genomic sequences. Genome Biol 2009; 10:201. [PMID: 19226436 PMCID: PMC2687780 DOI: 10.1186/gb-2009-10-1-201] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 11/26/2022] Open
Abstract
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Affiliation(s)
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge, UK
|
38
|
Jost D, Everaers R. Genome wide application of DNA melting analysis. Journal of Physics: Condensed Matter 2009; 21:034108. [PMID: 21817253 DOI: 10.1088/0953-8984/21/3/034108] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Correspondences between functional and thermodynamic melting properties in a genome are being increasingly employed for ab initio gene finding and for the interpretation of the evolution of genomes. Here we present the first systematic genome wide comparison between biologically coding domains and thermodynamically stable regions. In particular, we develop statistical methods to estimate the reliability of the resulting predictions. Not surprisingly, we find that the success of the approach depends on the difference in GC content between the coding and the non-coding parts of the genome and on the percentage of coding base-pairs in the sequence. These prerequisites vary strongly between species, where we observe no systematic differences between eukaryotes and prokaryotes. We find a number of organisms in which the strong correlation of coding domains and thermodynamically stable regions allows us to identify putative exons or genes to complement existing approaches. In contrast to previous investigations along these lines we have not employed the Poland-Scheraga (PS) model of DNA melting but use the earlier Zimm-Bragg (ZB) model. The Ising-like form of the ZB model can be viewed as an approximation to the PS model, with averaged loop entropies included into the cooperative factor [Formula: see text]. This results in a speed-up by a factor of 20-100 compared to the Fixman-Freire algorithm for the solution of the PS model. We show that for genomic sequences the resulting systematic errors are negligible compared to the parameterization uncertainty of the models. We argue that for limited computing resources, available CPU power is better invested in broadening the statistical base for genomic investigations than in marginal improvements of the description of the physical melting behavior.
Affiliation(s)
- Daniel Jost
- Laboratoire de Physique de l'École Normale Supérieure de Lyon, Université de Lyon, CNRS UMR 5672, 46 Allée d'Italie 69364 Lyon Cedex 07, France
|
39
|
Abstract
The accurate identification of exons and introns that comprise a complete plant gene structure can be a time-consuming and challenging task. Novel Web-based tools facilitate the process by providing a convenient interface to current transcript evidence, and portals to relevant bioinformatics software. With a few keystrokes, the user can explore alternative transcript assemblies and, for example, select for annotation those that are clearly supported by transcript evidence and similarity to known genes. The implementation of the tool at the PlantGDB resource also allows immediate communication of the novel annotations to the community through Web display.
Affiliation(s)
- Volker Brendel
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
|
40
|
Valentini G, Tagliaferri R, Masulli F. Computational intelligence and machine learning in bioinformatics. Artif Intell Med 2008; 45:91-6. [PMID: 18929473 DOI: 10.1016/j.artmed.2008.08.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
|
41
|
A Method for Construction, Cloning and Expression of Intron-Less Gene from Unannotated Genomic DNA. Mol Biotechnol 2008; 40:217-23. [DOI: 10.1007/s12033-008-9076-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/20/2008] [Accepted: 05/23/2008] [Indexed: 10/22/2022]
|
42
|
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
|
43
|
Ferro M, Tardif M, Reguer E, Cahuzac R, Bruley C, Vermat T, Nugues E, Vigouroux M, Vandenbrouck Y, Garin J, Viari A. PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences. J Proteome Res 2008; 7:1873-83. [PMID: 18348511 DOI: 10.1021/pr070415k] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
PepLine is a fully automated software pipeline that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana chloroplast envelope samples. Our results demonstrate that PepLine was competitive with protein database search software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
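The second and third modules described in this abstract (mapping tags onto six-frame translations, then clustering hits) can be illustrated with a minimal Python sketch. This is not PepLine's code: the function names, the exact-substring matching, and the `max_gap` clustering rule are all assumptions made for illustration, and a real PST also carries flanking mass information that is ignored here.

```python
# Illustrative sketch only; names and the clustering rule are hypothetical.
BASES = "TCAG"
# Standard genetic code in canonical TCAG order (NCBI translation table 1).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Yield (strand, frame, protein) for the three frames on each strand."""
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            yield strand, frame, translate(seq[frame:])


def find_tag_hits(dna, tag):
    """Return (strand, frame, start, end) hits in forward-strand coordinates."""
    n = len(dna)
    hits = []
    for strand, frame, protein in six_frame_translations(dna):
        idx = protein.find(tag)
        while idx != -1:
            s = frame + 3 * idx
            e = s + 3 * len(tag)
            # Reverse-strand positions are flipped back onto the forward strand.
            hits.append((strand, frame, s, e) if strand == "+"
                        else (strand, frame, n - e, n - s))
            idx = protein.find(tag, idx + 1)
    return hits


def cluster_hits(hits, max_gap=300):
    """Group same-strand hits whose gap to the running cluster is <= max_gap."""
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[0], h[2])):
        last = clusters[-1] if clusters else None
        if (last and last[-1][0] == hit[0]
                and hit[2] - max(h[3] for h in last) <= max_gap):
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters
```

For example, `find_tag_hits("ATGGCCAAATTT", "AK")` locates the tag in forward frame 0 at nucleotides 3 to 9. Searching the raw six-frame translation, rather than a precomputed ORF database, is what lets this kind of approach work on unannotated genomes.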
Affiliation(s)
- Myriam Ferro
- CEA, DSV, iRTSV, Laboratoire d'Etude de la Dynamique des Protéomes, Grenoble, F-38054, France
|
44
|
Kashyap L, Tabish M, Ganesh S, Dubey D. Identification and comparative analysis of novel alternatively spliced transcripts of RhoGEF domain encoding gene in C. elegans and C. briggsae. Bioinformation 2007; 2:43-9. [PMID: 18188419 PMCID: PMC2174416 DOI: 10.6026/97320630002043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/27/2007] [Revised: 08/16/2007] [Accepted: 08/23/2007] [Indexed: 11/23/2022]
Abstract
The Y95B8A.12 gene of C. elegans encodes a RhoGEF domain, which is a novel module in the guanine nucleotide exchange factors (GEFs). Alternative splicing increases transcriptome and proteome diversification. The C. elegans genome sequencing consortium has reported two alternatively spliced transcripts of the Y95B8A.12 gene. In the work presented here, we report the presence of four new spliced transcripts of Y95B8A.12 arising as a result of alternative splicing in the pre-mRNA encoded by the Y95B8A.12 gene. Our methodology involved the use of various gene- or exon-finding programmes and several other bioinformatics tools, followed by experimental validation. We have also studied the alternative splicing pattern in the orthologous RhoGEF domain-encoding gene from C. briggsae and have obtained very similar results. These previously unreported spliced transcripts, which were not detected through conventional approaches, not only point towards the extent of alternative splicing in C. elegans genes but also emphasize the need to analyze genome data using a combination of bioinformatics tools to delineate all possible gene products.
Affiliation(s)
- Luv Kashyap
- Department of Biochemistry, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, India
|
45
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
46
|
Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr Protoc Bioinformatics 2007; Chapter 4:Unit 4.3. [PMID: 30332532 DOI: 10.1002/0471250953.bi0403s18] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.8] [Indexed: 05/13/2023]
Abstract
This unit describes the usage of geneid, an efficient gene-finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. Geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs. Geneid software is in the public domain, and it is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene finding tools.
Affiliation(s)
- Enrique Blanco
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona, Spain
|
47
|
Brandão A. The untranslated regions of genes from Trypanosoma cruzi: perspectives for functional characterization of strains and isolates. Mem Inst Oswaldo Cruz 2007; 101:775-7. [PMID: 17160286 DOI: 10.1590/s0074-02762006000700011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/16/2006] [Accepted: 08/15/2006] [Indexed: 11/21/2022]
Abstract
The sequencing of the Trypanosoma cruzi genome has been completed and a great deal of information is now available. However, the organization of protozoan genomes is somewhat elusive, and much effort must be applied to reveal all the information encoded in the nucleotide sequences. Among the DNA segments that need further investigation are the untranslated regions of genes. Many of the T. cruzi genes that were revealed by the genome sequencing lack information about the untranslated regions. In this paper, some features of these untranslated segments as well as their applications in T. cruzi populations are discussed.
Affiliation(s)
- Adeilton Brandão
- Departamento de Medicina Tropical, Instituto Oswaldo Cruz-Fiocruz, 21045-900 Rio de Janeiro, RJ, Brazil.
|
48
|
|
49
|
Hanada K, Zhang X, Borevitz JO, Li WH, Shiu SH. A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection. Genome Res 2007; 17:632-40. [PMID: 17395691 PMCID: PMC1855179 DOI: 10.1101/gr.5836207] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Indexed: 11/24/2022]
Abstract
Large-scale cDNA sequencing projects and tiling array studies have revealed the presence of many unannotated genes. For protein coding genes, small coding sequences may not be identified by gene finders because of the conservative nature of prediction algorithms. In this study, we identified small open reading frames (sORFs) with high coding potential by a simple gene finding method (Coding Index, CI) based on the nucleotide composition bias found in most coding sequences. Applying this method to 18 Arabidopsis thaliana and 84 yeast sORF genes with evidence of expression at the protein level gives 100% accurate prediction. In the A. thaliana genome, we identified 7159 sORFs that are likely coding sequences (coding sORFs) with the CI measure at the 1% false-positive rate. To determine if these coding sORFs are parts of functional genes, we evaluated each coding sORF for evidence of transcription or evolutionary conservation. At the 5% false-positive rate, we found that 2996 coding sORFs are likely expressed in at least one experimental condition of the A. thaliana tiling array data. In addition, the evolutionary conservation of each A. thaliana sORF was examined within A. thaliana or between A. thaliana and five plants with complete or partial genome sequences. In 3997 coding sORFs with readily identifiable homologous sequences, 2376 are subject to purifying selection at the 1% false-positive rate. After eliminating coding sORFs with similarity to known transposable elements and those that are likely missing exons of known genes, the remaining 3241 coding sORFs with either evidence of transcription or purifying selection likely belong to novel coding genes in the A. thaliana genome.
Affiliation(s)
- Kousuke Hanada
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Xu Zhang
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Justin O. Borevitz
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Corresponding author. Fax (517) 353-7244
|
50
|
Saeys Y, Rouzé P, Van de Peer Y. In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists. Bioinformatics 2007; 23:414-20. [PMID: 17204465 DOI: 10.1093/bioinformatics/btl639] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to their biased codon usage. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA http://bioinformatics.psb.ugent.be/.
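The Markov-model baseline that this abstract compares against can be sketched roughly as follows. This is a generic illustration, not the authors' method or any published gene finder: the function names, the order-2 context, and the add-one smoothing are assumptions, and practical coding models are usually frame-specific (3-periodic), which this sketch omits for brevity.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Fit a fixed-order Markov chain over A/C/G/T; return a log-prob function."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def log_prob(context, base):
        ctx = counts.get(context, {})
        # Add-pseudocount smoothing over the four nucleotides.
        total = sum(ctx.values()) + 4 * pseudocount
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def coding_log_odds(seq, coding_lp, noncoding_lp, order=2):
    """Sum of per-base log-likelihood ratios; > 0 favours the coding model."""
    return sum(
        coding_lp(seq[i - order:i], seq[i]) - noncoding_lp(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )


# Toy training data (hypothetical, for demonstration only).
coding_lp = train_markov(["ATGGCCGCCGCCGCC"])
noncoding_lp = train_markov(["ATATATATATATAT"])
score = coding_log_odds("GCCGCC", coding_lp, noncoding_lp)
```

Because the score is a sum over positions, short sequences contribute few terms and the estimate is noisy, which is the statistical difficulty with short exons that the abstract describes.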
Affiliation(s)
- Yvan Saeys
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Technologiepark 927, B-9052 Ghent, Belgium.
|