1
Gu X, Li L, Zhong X, Su Y, Wang T. The size diversity of the Pteridaceae family chloroplast genome is caused by overlong intergenic spacers. BMC Genomics 2024; 25:396. [PMID: 38649816 PMCID: PMC11036588 DOI: 10.1186/s12864-024-10296-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND While the size of chloroplast genomes (cpDNAs) is often influenced by the expansion and contraction of inverted repeat regions and the enrichment of repeats, it is the intergenic spacers (IGSs) that appear to play a pivotal role in determining the size of Pteridaceae cpDNAs. This provides an opportunity to examine the evolution of chloroplast genome structure in the Pteridaceae family. This study added five Pteridaceae species and compared them with 36 published counterparts.
RESULTS Poor alignment was observed in the non-coding regions of Pteridaceae cpDNAs and was attributed to the widespread presence of overlong IGSs. These overlong IGSs were identified as a major factor influencing variation in cpDNA size. Compared with non-expanded IGSs, overlong IGSs exhibited significantly higher GC content and were rich in repetitive sequences. Species divergence time estimations suggest that these overlong IGSs may have already existed during the early radiation of the Pteridaceae family.
CONCLUSIONS This study provides new insights into the genetic variation, evolutionary history, and dynamic changes in the cpDNA structure of the Pteridaceae family, and offers a fundamental resource for further research on its evolution.
Affiliation(s)
- Xiaolin Gu
  - College of Life Sciences, South China Agricultural University, 510642, Guangzhou, China
- Lingling Li
  - College of Life Sciences, South China Agricultural University, 510642, Guangzhou, China
- Xiaona Zhong
  - College of Life Sciences, South China Agricultural University, 510642, Guangzhou, China
- Yingjuan Su
  - School of Life Sciences, Sun Yat-sen University, 510275, Guangzhou, China
  - Research Institute of Sun Yat-sen University in Shenzhen, 518057, Shenzhen, China
- Ting Wang
  - College of Life Sciences, South China Agricultural University, 510642, Guangzhou, China
2
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdisciplinary Reviews. RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, China
- Ni A An
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
  - Southwest United Graduate School, Kunming, China
3
Khandia R, Gurjar P, Kamal MA, Greig NH. Relative synonymous codon usage and codon pair analysis of depression associated genes. Sci Rep 2024; 14:3502. [PMID: 38346990 PMCID: PMC10861588 DOI: 10.1038/s41598-024-51909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/11/2024] [Indexed: 02/15/2024] Open
Abstract
Depression negatively impacts mood, behavior, and mental and physical health. It is the third leading cause of suicides worldwide and leads to decreased quality of life. We examined 18 genes available at the Genetic Testing Registry (GTR) of the National Center for Biotechnology Information to investigate molecular patterns present in depression-associated genes. Different genotypes and differential expression of the genes are responsible for ensuing depression. The present study investigated codon usage patterns, which might play important roles in modulating the expression of depression-associated genes. Of the 18 genes, seven tended to be up-regulated and two down-regulated; for the remaining genes, different genotypes, an outcome of SNPs, were responsible, alone or in combination with differential expression, for different conditions associated with depression. Codon context analysis revealed the abundance of identical GTG-GTG and CTG-CTG pairs and the rarity of methionine-initiated codon pairs. Information on codon usage, preferred codons, rare codons, and codon context might be used in constructing a deliverable synthetic construct to correct gene expression levels in the human body that are altered in the depressive state. Other molecular signatures also revealed the role of evolutionary forces in shaping codon usage.
Affiliation(s)
- Rekha Khandia
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal, 462026, MP, India
- Pankaj Gurjar
  - Centre for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamilnadu, India
  - Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, NSW, Australia
- Mohammad Amjad Kamal
  - Joint Laboratory of Artificial Intelligence in Healthcare, Institutes for Systems Genetics and West China School of Nursing, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
  - Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, 1207, Bangladesh
  - Enzymoics, Novel Global Community Educational Foundation, 7 Peterlee place, Hebersham, NSW, 2770, Australia
- Nigel H Greig
  - Translational Gerontology Branch, Intramural Research Program, National Institute on Aging, NIH, Baltimore, MD, 21224, USA
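Note on entry 3: relative synonymous codon usage (RSCU), named in the title of Khandia et al., is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that standard definition only; it is not the authors' pipeline. The function name `rscu`, the `CODON_TABLE` construction, and the choice to skip stop codons and unrecognized triplets are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table layout).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the coding sequence.

    RSCU = observed codon count / (amino-acid total / number of synonymous codons).
    Stop codons and triplets containing non-ACGT symbols are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families keyed by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*":
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                           # amino acid absent from this CDS
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result


if __name__ == "__main__":
    # Toy valine-only sequence: GTG used three times, GTT once.
    for codon, value in sorted(rscu("GTGGTGGTGGTT").items()):
        print(codon, round(value, 2))
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks an under-used codon.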
4
Baker L, David C, Jacobs DJ. Ab initio gene prediction for protein-coding regions. Bioinformatics Advances 2023; 3:vbad105. [PMID: 37638212 PMCID: PMC10448985 DOI: 10.1093/bioadv/vbad105] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 07/04/2023] [Accepted: 08/08/2023] [Indexed: 08/29/2023]
Abstract
Motivation Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, especially when assessed over a wide range of species, generally corresponds to sensitivity and specificity levels in the low 60% range. A common weakness of most methods is the tendency to learn patterns that are species-specific to varying degrees. There is a need for methods that extract genetic features able to distinguish coding from noncoding regions without being sensitive to specific organism characteristics.
Results A new method based on a neural network (NN) that uses a collection of sensors to create input features is presented. It is shown that accurate predictions are achieved even when the NN is trained on organisms that are phylogenetically very different from the test organisms. A consensus prediction algorithm for a CoDing Sequence (CDS) is subsequently applied to the first nucleotide level of NN predictions, boosting accuracy through a data-driven procedure that optimizes a CDS/non-CDS threshold. An aggregate accuracy benchmark at the nucleotide level shows that this new approach performs better than existing ab initio methods, while requiring significantly less training data.
Availability and implementation https://github.com/BioMolecularPhysicsGroup-UNCC/MachineLearning.
Affiliation(s)
- Lonnie Baker
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, NC 28223, United States
- Charles David
  - Department of Bioinformatics, The New Zealand Institute for Plant and Food Research, Lincoln 7608, New Zealand
- Donald J Jacobs
  - Department of Physics and Optical Science, University of North Carolina at Charlotte, NC 28223, United States
  - UNC Charlotte School of Data Science, University of North Carolina at Charlotte, NC 28223, United States
5
Liu D, Liu LL, Zheng XQ, Chen R, Lin LR, Yang TC, Tong ML. Genetic Profiling of the Full-Length tprK Gene in Patients with Primary and Secondary Syphilis. Microbiol Spectr 2023; 11:e0493122. [PMID: 37036342 PMCID: PMC10269439 DOI: 10.1128/spectrum.04931-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/17/2023] [Indexed: 04/11/2023] Open
Abstract
TprK antigenic variation is acknowledged as an important strategy developed by Treponema pallidum to achieve immune evasion. Previous studies applied short-read sequencing to explore tprK gene sequence diversity in clinical samples; however, due to the limitations of short-read sequencing, it was difficult to determine the linkage between the seven V regions, and crucial information about full-length tprK variants was lost. Although two recent studies explored complete tprK gene profiles in natural human syphilis infection, there are still too few profiled full-length tprK variants among clinical T. pallidum isolates to fully understand the characteristics of TprK coding diversity. Here, Pacific Biosciences (PacBio) long-read sequencing was applied to examine the diversity of full-length tprK variants in 21 clinical T. pallidum isolates from 11 patients with primary syphilis and 10 patients with secondary syphilis. A total of 398 high-confidence full-length sequences, which presented remarkable sequence heterogeneity, were found. However, these full-length tprK variants exhibited limited variation in length and GC content, showing 24 length types and average GC content of 51.5 ± 0.42% and 51.6 ± 0.26% for primary and secondary syphilis samples, respectively. Additionally, the combined patterns of mutated V regions generating new tprK variants were obviously different in primary and secondary syphilis samples. The diversity of tprK gene sequences in primary syphilis samples may represent the underlying variability of the bacterium; conversely, the variability of the tprK gene in secondary syphilis samples may more accurately reflect how T. pallidum escapes host immune clearance. These data highlight the tprK gene as an important coding gene that shows conflicting genetic characteristics but underlies the persistence of spirochete infection. 
IMPORTANCE The resurgence of syphilis in both low- and high-income countries has attracted attention, and persistent infection by the pathogen has long been a research focus. The tprK gene, encoding the hypervariable outer membrane protein, is thought to be responsible for pathogen immune evasion and persistent infection. Here, PacBio long-read sequencing was applied to examine the diversity of full-length tprK variants in 21 clinical T. pallidum isolates from 11 patients with primary syphilis and 10 patients with secondary syphilis. The results showed that the sequences of the tprK gene were remarkably heterogeneous; however, the sequences presented limited variation in length and GC content. The investigation of the combined patterns of the V regions allowed us to gain insight into the features of the tprK gene generating new variants at different clinical stages. The findings of this study will be helpful for further exploration of the pathogenesis of syphilis.
Affiliation(s)
- Dan Liu
  - Center of Clinical Laboratory, Zhongshan Hospital, School of Medicine, Xiamen University, Xiamen, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Li-Li Liu
  - Center of Clinical Laboratory, Zhongshan Hospital, School of Medicine, Xiamen University, Xiamen, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Xin-Qi Zheng
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Rui Chen
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Li-Rong Lin
  - Center of Clinical Laboratory, Zhongshan Hospital, School of Medicine, Xiamen University, Xiamen, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Tian-Ci Yang
  - Center of Clinical Laboratory, Zhongshan Hospital, School of Medicine, Xiamen University, Xiamen, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
- Man-Li Tong
  - Center of Clinical Laboratory, Zhongshan Hospital, School of Medicine, Xiamen University, Xiamen, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, China
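Note on entry 5: the abstract of Liu et al. reports GC content per clinical stage as a mean with a spread (for example 51.5 ± 0.42%). A minimal sketch of how such a summary could be computed is given below. It assumes the spread is a sample standard deviation, which the abstract does not state, and the function names `gc_content` and `summarize_gc` are hypothetical.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize_gc(sequences):
    """Return (mean, sample standard deviation) of GC% over a list of sequences.

    The standard deviation is reported as 0.0 when only one sequence is given.
    """
    values = [gc_content(s) for s in sequences]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    avg, sd = summarize_gc(["ATGCGC", "GGCCAT", "ATATGC"])
    print(f"GC content: {avg:.1f} ± {sd:.2f}%")
```

Ambiguous symbols such as N are excluded from the denominator here; counting them instead would lower the reported percentage, so the convention should be stated alongside any result.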
6
Nevers Y, Glover NM, Dessimoz C, Lecompte O. Protein length distribution is remarkably uniform across the tree of life. Genome Biol 2023; 24:135. [PMID: 37291671 PMCID: PMC10251718 DOI: 10.1186/s13059-023-02973-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/03/2022] [Accepted: 05/16/2023] [Indexed: 06/10/2023] Open
Abstract
BACKGROUND In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied.
RESULTS Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller.
CONCLUSIONS These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.
Affiliation(s)
- Yannis Nevers
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Natasha M Glover
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, UK
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Odile Lecompte
  - Department of Computer Science, Centre de Recherche en Biomédecine de Strasbourg, ICube, UMR 7357, University of Strasbourg, CNRS, Strasbourg, France
7
Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we summarize some of the most prominent and well-established factors that determine the differences found in amino acid usage, both across evolution and intragenomically.
Affiliation(s)
- Guillermo Lamolle
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Diego Simón
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Andrés Iriarte
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de La República, Montevideo, Uruguay
- Héctor Musto
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
8
The first draft genome assembly and data analysis of the Malaysian mahseer (Tor tambroides). Aquaculture and Fisheries 2022. [DOI: 10.1016/j.aaf.2022.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
9
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022] Open
Abstract
During their long evolutionary history, viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. The origin of overlapping genes, the factors that favor their birth and retention, and the ways they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
10
The whale shark genome reveals how genomic and physiological properties scale with body size. Proc Natl Acad Sci U S A 2020; 117:20662-20671. [PMID: 32753383 DOI: 10.1073/pnas.1922576117] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 01/18/2023] Open
Abstract
The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals and yeast. We examined the scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic traits also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: Guanine and cytosine (GC) content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to introns being highly enriched in repetitive elements such as CR1-like long interspersed nuclear elements, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark genome also has the second slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan and showed that the whale shark is a promising model for studies of neural architecture and lifespan.
11
Avissa R, Widyaningtyas ST, Bela B. Optimization of the apolipoprotein B mRNA editing enzyme catalytic polypeptide-like-3G (APOBEC3G) gene to enhance its expression in Escherichia coli. Medical Journal of Indonesia 2020. [DOI: 10.13181/mji.oa.202853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022] Open
Abstract
BACKGROUND Apolipoprotein B mRNA editing enzyme catalytic polypeptide-like-3G (APOBEC3G) can abolish HIV infection by inducing lethal mutations in the HIV genome. The HIV protein virion infectivity factor (Vif) can interact with APOBEC3G protein and cause its degradation. Development of a method that can screen for substances inhibiting the APOBEC3G-Vif interaction is necessary to identify substances that could potentially be used in anti-HIV drug development. To increase expression of the recombinant APOBEC3G protein that will be used in an APOBEC3G-Vif interaction assay, we developed an optimized APOBEC3G gene for expression in Escherichia coli.
METHODS The gene encoding APOBEC3G was codon-optimized in accordance with prokaryotic codon usage using DNA 2.0 software to avoid codon bias that could inhibit its expression. The APOBEC3G gene was synthesized and sub-cloned into the pQE80L plasmid vector. pQE80L containing APOBEC3G was screened by polymerase chain reaction, restriction enzyme digestion, and sequencing to verify its DNA sequence. The recombinant APOBEC3G was expressed in E. coli under isopropyl-β-D-thiogalactoside (IPTG) induction and purified using nickel-nitrilotriacetic acid (Ni-NTA) resin.
RESULTS The synthetic gene coding APOBEC3G was successfully cloned into the pQE80L vector and could be expressed abundantly in E. coli BL21 in the presence of IPTG.
CONCLUSIONS Recombinant APOBEC3G is robustly expressed in E. coli BL21, and the APOBEC3G protein could be purified by using Ni-NTA. The molecular weight of the recombinant APOBEC3G produced is smaller than the expected value. However, the protein is predicted to be able to interact with Vif because this interaction is determined by a specific domain located on the N-terminal of APOBEC3G.
12
Exploring Castellaniella defragrans Linalool (De)hydratase-Isomerase for Enzymatic Hydration of Alkenes. Molecules 2019; 24:molecules24112092. [PMID: 31159367 PMCID: PMC6600392 DOI: 10.3390/molecules24112092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/28/2019] [Revised: 05/30/2019] [Accepted: 05/31/2019] [Indexed: 01/08/2023] Open
Abstract
Acyclic monoterpenes constitute a large and highly abundant class of secondary plant metabolites and are, therefore, attractive low-cost raw materials for the chemical industry. To date, numerous biocatalysts for their transformation are known, giving access to highly sought-after monoterpenoids. In view of the high selectivity associated with many of these reactions, the demand for enzymes generating commercially important target molecules is unabated. Here, linalool (de)hydratase-isomerase (Ldi, EC 4.2.1.127) from Castellaniella defragrans was examined for the regio- and stereoselective hydration of the acyclic monoterpene β-myrcene to (S)-(+)-linalool. Expression of the native enzyme in Escherichia coli allowed for identification of bottlenecks limiting enzyme activity, which were investigated by mutating selected residues implicated in enzyme assembly and function. Combining these analyses with the recently published 3D structures of Ldi highlighted the precisely coordinated reduction-oxidation state of two cysteine pairs in correct oligomeric assembly and in the catalytic mechanism, respectively. Subcellular targeting studies upon fusion of Ldi to different signal sequences revealed the significance of periplasmic localization of the mature enzyme in the heterologous expression host. This study provides biochemical and mechanistic insight into the hydration of β-myrcene, a nonfunctionalized terpene, and emphasizes its potential for access to scarcely available but commercially interesting tertiary alcohols.
13
Kasai F, O'Brien PCM, Ferguson-Smith MA. Squamate Chromosome Size and GC Content Assessed by Flow Karyotyping. Cytogenet Genome Res 2019; 157:46-52. [PMID: 30904910 DOI: 10.1159/000497265] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022] Open
Abstract
Chromosome homologies in reptiles have been investigated extensively by gene mapping and chromosome painting. Relative chromosome size can be estimated roughly from conventional karyotypes, but chromosome GC content cannot be evaluated by any of these approaches. However, GC content can be obtained by whole-genome sequencing, although complete data are available only for a limited number of reptilian species. Chromosomes can be characterized by size and GC content in bivariate flow karyotypes, in which the distribution of peaks represents the differences. We have analysed flow karyotypes from 9 representative squamate species and show chromosome profiles for each species based on the relationship between size and GC content. Our results reveal that the GC content of macrochromosomes is invariable in the 9 species. A higher GC content was found in microchromosomes, similar to profiles previously determined in crocodile, turtle, and chicken. The findings suggest that karyotype evolution in reptiles is characterized by unique features of chromosome GC content.
|
14
|
Miravet-Verde S, Ferrar T, Espadas-García G, Mazzolini R, Gharrab A, Sabido E, Serrano L, Lluch-Senar M. Unraveling the hidden universe of small proteins in bacterial genomes. Mol Syst Biol 2019; 15:e8290. [PMID: 30796087 PMCID: PMC6385055 DOI: 10.15252/msb.20188290] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Identification of small open reading frames (smORFs) encoding small proteins (≤ 100 amino acids; SEPs) is a challenge in the fields of genome annotation and protein discovery. Here, by combining a novel bioinformatics tool (RanSEPs) with “‐omics” approaches, we were able to describe 109 bacterial small ORFomes. Predictions were first validated by performing an exhaustive search of SEPs present in Mycoplasma pneumoniae proteome via mass spectrometry, which illustrated the limitations of shotgun approaches. Then, RanSEPs predictions were validated and compared with other tools using proteomic datasets from different bacterial species and SEPs from the literature. We found that up to 16 ± 9% of proteins in an organism could be classified as SEPs. Integration of RanSEPs predictions with transcriptomics data showed that some annotated non‐coding RNAs could in fact encode for SEPs. A functional study of SEPs highlighted an enrichment in the membrane, translation, metabolism, and nucleotide‐binding categories. Additionally, 9.7% of the SEPs included a N‐terminus predicted signal peptide. We envision RanSEPs as a tool to unmask the hidden universe of small bacterial proteins.
Affiliation(s)
- Samuel Miravet-Verde
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Tony Ferrar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Guadalupe Espadas-García
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Rocco Mazzolini
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Anas Gharrab
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Eduard Sabido
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Luis Serrano
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Maria Lluch-Senar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
15
|
Wang G, Yin H, Li B, Yu C, Wang F, Xu X, Cao J, Bao Y, Wang L, Abbasi AA, Bajic VB, Ma L, Zhang Z. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics 2019; 35:2949-2956. [DOI: 10.1093/bioinformatics/btz008] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Revised: 12/05/2018] [Accepted: 01/07/2019] [Indexed: 01/24/2023] Open
Abstract
Motivation
The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging: it requires prior knowledge of well-established sequences and annotations or species-specific training data, yet only a limited number of species have high-quality sequences and annotations.
Results
Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy, well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
Availability and implementation
LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hongyan Yin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Boyang Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Chunlei Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Fan Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Xingjian Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jiabao Cao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yiming Bao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Liguo Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
| | - Amir A Abbasi
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Kingdom of Saudi Arabia
| | - Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
|
16
|
Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018; 10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/10/2018] [Indexed: 12/11/2022] Open
Abstract
The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University
|
17
|
Kasai F, O'Brien PCM, Pereira JC, Ferguson-Smith MA. Marsupial chromosome DNA content and genome size assessed from flow karyotypes: invariable low autosomal GC content. ROYAL SOCIETY OPEN SCIENCE 2018; 5:171539. [PMID: 30224977 PMCID: PMC6124049 DOI: 10.1098/rsos.171539] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/03/2017] [Accepted: 08/06/2018] [Indexed: 06/08/2023]
Abstract
Extensive chromosome homologies revealed by cross-species chromosome painting between marsupials have suggested a high level of genome conservation during evolution. Surprisingly, it has been reported that marsupial genome sizes vary by more than 1.2 Gb between species. We have shown previously that individual chromosome sizes and GC content can be measured in flow karyotypes, and have applied this method to compare four marsupial species. Chromosome sizes and GC content were calculated for the grey short-tailed opossum (2n = 18), tammar wallaby (2n = 16), Tasmanian devil (2n = 14) and fat-tailed dunnart (2n = 14), resulting in genome sizes of 3.41, 3.31, 3.17 and 3.25 Gb, respectively. The findings under the same conditions allow a comparison between the four species, indicating that the genomes of these four species are 1-8% larger than human. We show that marsupial genomes are characterized by a low GC content invariable between autosomes and distinct from the higher GC content of the marsupial X chromosome.
Affiliation(s)
- Fumio Kasai
- Author for correspondence: Fumio Kasai e-mail:
|
18
|
Kapase VU, Nesamma AA, Jutur PP. Identification and characterization of candidates involved in production of OMEGAs in microalgae: a gene mining and phylogenomic approach. Prep Biochem Biotechnol 2018; 48:619-628. [PMID: 29932840 DOI: 10.1080/10826068.2018.1476886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Optimizing the production of high-value renewables such as OMEGAs through pathway engineering requires an in-depth understanding of the structure-function relationship of genes involved in the OMEGA biosynthetic pathways. In this preliminary study, we use bioinformatic analysis to identify and characterize ∼221 putative genes involved in production of OMEGAs from the Streptophyta (plants), Chlorophyta (green algae), Rhodophyta (red algae), and Bacillariophyta (diatoms) lineages, based on their phylogenomic profiling, conserved motif/domain organization and physico-chemical properties. The MEME suite predicted 12 distinct protein domains, which are conserved among these putative genes. Phylogenomic analysis showed that the putative candidate genes [such as FAD2 (delta-12 desaturase); ECR (enoyl-CoA reductase); ACOT (acyl CoA thioesterase); ECH (enoyl-CoA hydratase); and ACAT (acetyl-CoA acyltransferase)], which share similar domains and motif patterns, were remarkably well conserved. Furthermore, the subcellular network prediction of OMEGA biosynthetic pathway genes revealed a unique interaction between the light-dependent chlorophyll biosynthesis and glycerol-3-phosphate dehydrogenase, which predicts a major cross-talk between the key essential pathways. Such bioinformatic analysis will provide insights for finding the key regulatory genes to optimize the productivity of OMEGAs in microalgal cell factories.
Affiliation(s)
- Vikas U Kapase
- Omics of Algae Group, Integrative Biology, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
| | - Asha A Nesamma
- Omics of Algae Group, Integrative Biology, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
| | - Pannaga P Jutur
- Omics of Algae Group, Integrative Biology, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
|
19
|
Li M, Ponce-Gordo F, Grim JN, Li C, Zou H, Li W, Wu S, Wang G. Morphological Redescription of Opalina undulata Nie 1932 from Fejervarya limnocharis with Molecular Phylogenetic Study of Opalinids (Heterokonta, Opalinea). J Eukaryot Microbiol 2018; 65:783-791. [DOI: 10.1111/jeu.12520] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2017] [Revised: 03/20/2018] [Accepted: 03/23/2018] [Indexed: 11/26/2022]
Affiliation(s)
- Ming Li
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
| | - Francisco Ponce-Gordo
- Departamento de Microbiología y Parasitología; Facultad de Farmacia; Universidad Complutense de Madrid; Plaza Ramón y Cajal s/n 28040 Madrid Spain
| | - J. Norman Grim
- Department of Biological Sciences; Northern Arizona University; Flagstaff Arizona 86011
| | - Can Li
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
| | - Hong Zou
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
| | - Wenxiang Li
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
| | - Shangong Wu
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
| | - Guitang Wang
- Key Laboratory of Aquaculture Disease Control; Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology; Institute of Hydrobiology; Chinese Academy of Sciences; Wuhan 430072 China
|
20
|
Hu X, Ke L, Wang Z, Zeng Z. Dynamic transcriptome landscape of Asian domestic honeybee (Apis cerana) embryonic development revealed by high-quality RNA sequencing. BMC DEVELOPMENTAL BIOLOGY 2018; 18:11. [PMID: 29653508 PMCID: PMC5899340 DOI: 10.1186/s12861-018-0169-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Accepted: 04/03/2018] [Indexed: 12/18/2022]
Abstract
Background Honeybee development consists of four stages: embryo, larva, pupa and adult. Embryogenesis, a key process of cell division and differentiation, takes 3 days in honeybees. However, the embryonic transcriptome and the dynamic regulation of embryonic transcription are still largely uncharacterized in honeybees, especially in the Asian honeybee (Apis cerana). Here, we employed high-quality RNA-seq to explore the transcriptome of Asian honeybee embryos at three ages, approximately 24, 48 and 72 h (referred to as Day1, Day2 and Day3, respectively). Results Nine embryo samples, three from each age, were collected for RNA-seq. According to the staging scheme of honeybee embryos and the morphological features we observed, our Day1, Day2 and Day3 embryos likely corresponded to the late stage four, stage eight and stage ten development stages, respectively. Hierarchical clustering and principal component analysis showed that same-age samples were grouped together, and the Day2 samples had a closer relationship with the Day3 samples than the Day1 samples. Finally, a total of 18,284 genes harboring 55,646 transcripts were detected in the A. cerana embryos, of which 44.5% consisted of the core transcriptome shared by all three ages of embryos. A total of 4088 upregulated and 3046 downregulated genes were identified among the three embryo ages, of which 2010, 3177 and 1528 genes were upregulated and 2088, 2294 and 303 genes were downregulated from Day1 to Day2, from Day1 to Day3 and from Day2 to Day3, respectively. The downregulated genes were mostly involved in cellular, biosynthetic and metabolic processes, gene expression and protein localization, and macromolecule modification; the upregulated genes mainly participated in cell development and differentiation, tissue, organ and system development, and morphogenesis. Interestingly, several biological processes related to the response to and detection of light stimuli were enriched in the first-day A. cerana embryogenesis but not in the Apis mellifera embryogenesis, which was valuable for further investigations. Conclusions Our transcriptomic data substantially expand the number of known transcribed elements in the A. cerana genome and provide a high-quality view of the transcriptome dynamics of A. cerana embryonic development. Electronic supplementary material The online version of this article (10.1186/s12861-018-0169-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaofen Hu
- Honeybee Research Institute, Jiangxi Agricultural University, Nanchang, 330045, Jiangxi, China
| | - Li Ke
- Honeybee Research Institute, Jiangxi Agricultural University, Nanchang, 330045, Jiangxi, China
| | - Zilong Wang
- Honeybee Research Institute, Jiangxi Agricultural University, Nanchang, 330045, Jiangxi, China
| | - Zhijiang Zeng
- Honeybee Research Institute, Jiangxi Agricultural University, Nanchang, 330045, Jiangxi, China.
|
21
|
Hamid MH, Rozano L, Yeong WC, Abdullah JO, Saidi NB. Analysis of MAP kinase MPK4/MEKK1/MKK genes of Carica papaya L. comparative to other plant homologues. Bioinformation 2017; 13:31-41. [PMID: 28642634 PMCID: PMC5463617 DOI: 10.6026/97320630013031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2016] [Revised: 02/17/2017] [Accepted: 02/17/2017] [Indexed: 12/25/2022] Open
Abstract
Mitogen-activated protein kinase 4 (MPK4) interacts with the mitogen-activated protein kinase kinase kinase 1 (MEKK1)/mitogen-activated protein kinase kinase 1 (MKK1)/mitogen-activated protein kinase kinase 2 (MKK2) complex to affect its function in plant development or against pathogen attacks. The KEGG (Kyoto Encyclopedia of Genes and Genomes) network analysis of Arabidopsis thaliana revealed close interactions between those four genes in the same plant-pathogen interaction pathway, which warrants further study of these genes due to their evolutionary conservation in different plant species. Through targeting the signature sequence in MPK4 of papaya using orthologs from Arabidopsis, the predicted sequence of MPK4 was studied using a comparative in silico approach between different plant species and the MAP cascade complex of MEKK1/MKK1/MKK2. This paper reports that MPK4 is highly conserved in papaya, with 93% identity across more than 500 bases compared in each species predicted. Slight variations found in the MEKK1/MKK1/MKK2 complex nevertheless still illustrated sequence similarities between most of the species. Localization of each gene in the cascade network was also predicted, enabling future functional verification of these gene interactions using knockout and/or gene silencing approaches.
Affiliation(s)
- Muhammad Hanam Hamid
- Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
| | - Lina Rozano
- Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
| | - Wee Chien Yeong
- Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
| | - Janna Ong Abdullah
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
| | - Noor Baity Saidi
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
|
22
|
Samchenko AA, Kiselev SS, Kabanov AV, Kondratjev MS, Komarov VM. On the nature of the domination of oligomeric (dA:dT)n tracts in the structure of eukaryotic genomes. Biophysics (Nagoya-shi) 2016. [DOI: 10.1134/s0006350916060233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
|
23
|
Sun S, Xiao J, Zhang H, Zhang Z. Pangenome Evidence for Higher Codon Usage Bias and Stronger Translational Selection in Core Genes of Escherichia coli. Front Microbiol 2016; 7:1180. [PMID: 27536275 PMCID: PMC4971109 DOI: 10.3389/fmicb.2016.01180] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Accepted: 07/18/2016] [Indexed: 11/25/2022] Open
Abstract
Codon usage bias, which results from the interplay of mutation and selection, has been intensively studied in Escherichia coli. However, codon usage analysis in an E. coli pangenome remains unexplored and the relative importance of mutation and selection acting on core genes and strain-specific genes is unknown. Here we perform comprehensive codon usage analyses based on a collection of multiple complete genome sequences of E. coli. Our results show that core genes that are present in all strains have higher codon usage bias than strain-specific genes that are unique to single strains. We further explore the forces influencing codon usage and investigate how the major force differs between core and strain-specific genes. Our results demonstrate that although mutation may exert genome-wide influences on codon usage, acting similarly in different gene sets, selection becomes the dominant force shaping biased codon usage as genes are present in an increasing number of strains. Together, our results provide important insights for better understanding genome plasticity and complexity as well as the evolutionary mechanisms behind codon usage bias.
Affiliation(s)
- Shixiang Sun
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Jingfa Xiao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Huiyong Zhang
- College of Life Sciences, Henan Agricultural University, Zhengzhou, China
| | - Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
|
24
|
Wang G, Sun S, Zhang Z. Randomness in Sequence Evolution Increases over Time. PLoS One 2016; 11:e0155935. [PMID: 27224236 PMCID: PMC4880282 DOI: 10.1371/journal.pone.0155935] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2016] [Accepted: 05/06/2016] [Indexed: 12/02/2022] Open
Abstract
The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical randomness tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution.
Affiliation(s)
- Guangyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Shixiang Sun
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
|
25
|
Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains. J Invertebr Pathol 2016; 137:10-22. [PMID: 27090923 DOI: 10.1016/j.jip.2016.03.014] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2015] [Revised: 03/07/2016] [Accepted: 03/29/2016] [Indexed: 02/04/2023]
Abstract
Isolates of the baculovirus species Lymantria dispar multiple nucleopolyhedrovirus have been formulated and applied to suppress outbreaks of the gypsy moth, L. dispar. To evaluate the genetic diversity in this species at the genomic level, the genomes of three isolates from Massachusetts, USA (LdMNPV-Ab-a624), Spain (LdMNPV-3054), and Japan (LdMNPV-3041) were sequenced and compared with four previously determined LdMNPV genome sequences. The LdMNPV genome sequences were collinear and contained the same homologous repeats (hrs) and clusters of baculovirus repeat orf (bro) gene family members in the same relative positions in their genomes, although sequence identities in these regions were low. Of 146 non-bro ORFs annotated in the genome of the representative isolate LdMNPV 5-6, 135 ORFs were found in every other LdMNPV genome, including the 37 core genes of Baculoviridae and other genes conserved in genus Alphabaculovirus. Phylogenetic inference with an alignment of the core gene nucleotide sequences grouped isolates 3041 (Japan) and 2161 (Korea) separately from a cluster containing isolates from Europe, North America, and Russia. To examine phenotypic diversity, bioassays were carried out with a selection of isolates against neonate larvae from three European gypsy moth (Lymantria dispar dispar) and three Asian gypsy moth (Lymantria dispar asiatica and Lymantria dispar japonica) colonies. LdMNPV isolates 2161 (Korea), 3029 (Russia), and 3041 (Japan) exhibited a greater degree of pathogenicity against all L. dispar strains than LdMNPV from a sample of Gypchek. This study provides additional information on the genetic diversity of LdMNPV isolates and their activity against the Asian gypsy moth, a potential invasive pest of North American trees and forests.
26
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
27
Van Campenhout J, Vanreusel A, Van Belleghem S, Derycke S. Transcription, Signaling Receptor Activity, Oxidative Phosphorylation, and Fatty Acid Metabolism Mediate the Presence of Closely Related Species in Distinct Intertidal and Cold-Seep Habitats. Genome Biol Evol 2015; 8:51-69. [PMID: 26637468 PMCID: PMC4758239 DOI: 10.1093/gbe/evv242] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023] Open
Abstract
Bathyal cold seeps are isolated extreme deep-sea environments characterized by low species diversity while biomass can be high. The Håkon Mosby mud volcano (Barents Sea, 1,280 m) is a rather stable chemosynthetic driven habitat characterized by prominent surface bacterial mats with high sulfide concentrations and low oxygen levels. Here, the nematode Halomonhystera hermesi thrives in high abundances (11,000 individuals 10 cm−2). Halomonhystera hermesi is a member of the intertidal Halomonhystera disjuncta species complex that includes five cryptic species (GD1-5). GD1-5’s common habitat is characterized by strong environmental fluctuations. Here, we compared the transcriptomes of H. hermesi and GD1, H. hermesi’s closest relative. Genes encoding proteins involved in oxidative phosphorylation are more strongly expressed in H. hermesi than in GD1, and many genes were only observed in H. hermesi while being completely absent in GD1. Both observations could in part be attributed to high sulfide concentrations and low oxygen levels. Additionally, fatty acid elongation was also prominent in H. hermesi confirming the importance of highly unsaturated fatty acids in this species. Significant higher amounts of transcription factors and genes involved in signaling receptor activity were observed in GD1 (many of which were completely absent in H. hermesi), allowing fast signaling and transcriptional reprogramming which can mediate survival in dynamic intertidal environments. GC content was approximately 8% higher in H. hermesi coding unigenes resulting in differential codon usage between both species and a higher proportion of amino acids with GC-rich codons in H. hermesi. In general our results showed that most pathways were active in both environments and that only three genes are under natural selection. This indicates that also plasticity should be taken in consideration in the evolutionary history of Halomonhystera species. 
Such plasticity, as well as possible preadaptation to low oxygen and high sulfide levels might have played an important role in the establishment of a cold-seep Halomonhystera population.
Affiliation(s)
- Jelle Van Campenhout
  - Research Group Marine Biology, Biology Department, Ghent University, Belgium
  - Department of Biology, Center for Molecular Phylogenetics and Evolution (CeMoFe), Ghent University, Biology Department, Belgium
- Ann Vanreusel
  - Research Group Marine Biology, Biology Department, Ghent University, Belgium
- Steven Van Belleghem
  - Terrestrial Ecology Unit, Biology Department, Ghent University, Belgium
  - OD Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Sofie Derycke
  - Research Group Marine Biology, Biology Department, Ghent University, Belgium
  - OD Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
28
Chen JY, Shen QS, Zhou WZ, Peng J, He BZ, Li Y, Liu CJ, Luan X, Ding W, Li S, Chen C, Tan BCM, Zhang YE, He A, Li CY. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates. PLoS Genet 2015; 11:e1005391. [PMID: 26177073 PMCID: PMC4503675 DOI: 10.1371/journal.pgen.1005391] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 03/20/2015] [Accepted: 06/24/2015] [Indexed: 01/08/2023] Open
Abstract
While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with lower chance of non-sense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existed genomic contexts. Although gene duplication has been believed as a predominant mechanism for creating new genes, recent reports suggested that new proteins could evolve “de novo” from non-coding DNA regions. These de novo genes are also named as “motherless” genes due to their lack of ancestral proteins as precursors, while recently we and others found that lncRNAs may represent an intermediate stage of their origination. 
To further elucidate this lncRNA-protein transition process, here we identified 64 hominoid-specific de novo genes and report a new mechanism for the origination of functional de novo proteins from ancestral non-coding transcripts: These non-coding “precursors” are generally not more selectively constrained than other lncRNA loci; and the existence of these de novo proteins is not beyond anticipation under neutral expectation; however, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution.
Affiliation(s)
- Jia-Yu Chen
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Qing Sunny Shen
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Wei-Zhen Zhou
  - Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
- Jiguang Peng
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Bin Z. He
  - FAS Center for Systems Biology & Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
- Yumei Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chu-Jun Liu
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Xuke Luan
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, China
- Wanqiu Ding
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Shuxian Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chunyan Chen
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yong E. Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Aibin He
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, China
  - * E-mail: (AH); (CYL)
- Chuan-Yun Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - * E-mail: (AH); (CYL)
29
Mutations That Stimulate flhDC Expression in Escherichia coli K-12. J Bacteriol 2015; 197:3087-96. [PMID: 26170415 DOI: 10.1128/jb.00455-15] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 06/08/2015] [Accepted: 07/09/2015] [Indexed: 01/01/2023] Open
Abstract
Motility is a beneficial attribute that enables cells to access and explore new environments and to escape detrimental ones. The organelle of motility in Escherichia coli is the flagellum, and its production is initiated by the activating transcription factors FlhD and FlhC. The expression of these factors by the flhDC operon is highly regulated and influenced by environmental conditions. The flhDC promoter is recognized by σ(70) and is dependent on the transcriptional activator cyclic AMP (cAMP)-cAMP receptor protein complex (cAMP-CRP). A number of K-12 strains exhibit limited motility due to low expression levels of flhDC. We report here a large number of mutations that stimulate flhDC expression in such strains. They include single nucleotide changes in the -10 element of the promoter, in the promoter spacer, and in the cAMP-CRP binding region. In addition, we show that insertion sequence (IS) elements or a kanamycin gene located hundreds of base pairs upstream of the promoter can effectively enhance transcription, suggesting that the topology of a large upstream region plays a significant role in the regulation of flhDC expression. None of the mutations eliminated the requirement for cAMP-CRP for activation. However, several mutations allowed expression in the absence of the nucleoid organizing protein, H-NS, which is normally required for flhDC expression. IMPORTANCE The flhDC operon of Escherichia coli encodes transcription factors that initiate flagellar synthesis, an energetically costly process that is highly regulated. Few deregulating mutations have been reported thus far. This paper describes new single nucleotide mutations that stimulate flhDC expression, including a number that map to the promoter spacer region. In addition, this work shows that insertion sequence elements or a kanamycin gene located far upstream from the promoter or repressor binding sites also stimulate transcription, indicating a role of regional topology in the regulation of flhDC expression.
30
Lengths of Orthologous Prokaryotic Proteins Are Affected by Evolutionary Factors. BIOMED RESEARCH INTERNATIONAL 2015; 2015:786861. [PMID: 26114113 PMCID: PMC4465819 DOI: 10.1155/2015/786861] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/08/2014] [Accepted: 11/02/2014] [Indexed: 12/16/2022]
Abstract
Proteins of the same functional family (for example, kinases) may have significantly different lengths. It is an open question whether such variation in length is random or it appears as a response to some unknown evolutionary driving factors. The main purpose of this paper is to demonstrate existence of factors affecting prokaryotic gene lengths. We believe that the ranking of genomes according to lengths of their genes, followed by the calculation of coefficients of association between genome rank and genome property, is a reasonable approach in revealing such evolutionary driving factors. As we demonstrated earlier, our chosen approach, Bubble-sort, combines stability, accuracy, and computational efficiency as compared to other ranking methods. Application of Bubble Sort to the set of 1390 prokaryotic genomes confirmed that genes of Archaeal species are generally shorter than Bacterial ones. We observed that gene lengths are affected by various factors: within each domain, different phyla have preferences for short or long genes; thermophiles tend to have shorter genes than the soil-dwellers; halophiles tend to have longer genes. We also found that species with overrepresentation of cytosines and guanines in the third position of the codon (GC3 content) tend to have longer genes than species with low GC3 content.
31
Wong TY, Schwartzbach SD. Protein Mis-Termination Initiates Genetic Diseases, Cancers, and Restricts Bacterial Genome Expansion. JOURNAL OF ENVIRONMENTAL SCIENCE AND HEALTH. PART C, ENVIRONMENTAL CARCINOGENESIS & ECOTOXICOLOGY REVIEWS 2015; 33:255-285. [PMID: 26087060 DOI: 10.1080/10590501.2015.1053461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Protein termination is an important cellular process. Protein termination relies on the stop-codons in the mRNA interacting properly with the releasing factors on the ribosome. One third of inherited diseases, including cancers, are associated with the mutation of the stop-codons. Many pathogens and viruses are able to manipulate their stop-codons to express their virulence. The influence of stop-codons is not limited to the primary reading frame of the genes. Stop-codons in the second and third reading frames are referred as premature stop signals (PSC). Stop-codons and PSCs together are collectively referred as stop-signals. The ratios of the stop-signals (referred as translation stop-signals ratio or TSSR) of genetically related bacteria, despite their great differences in gene contents, are much alike. This nearly identical Genomic-TSSR value of genetically related bacteria may suggest that bacterial genome expansion is limited by their unique stop-signals bias. We review the protein termination process and the different types of stop-codon mutation in plants, animals, microbes, and viruses, with special emphasis on the role of PSCs in directing bacterial evolution in their natural environments. Knowing the limit of genomic boundary could facilitate the formulation of new strategies in controlling the spread of diseases and combat antibiotic-resistant bacteria.
Affiliation(s)
- Tit-Yee Wong
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee, USA
32
Eichenmüller M, Trippel F, Kreuder M, Beck A, Schwarzmayr T, Häberle B, Cairo S, Leuschner I, von Schweinitz D, Strom TM, Kappler R. The genomic landscape of hepatoblastoma and their progenies with HCC-like features. J Hepatol 2014; 61:1312-20. [PMID: 25135868 DOI: 10.1016/j.jhep.2014.08.009] [Citation(s) in RCA: 290] [Impact Index Per Article: 29.0] [Received: 03/28/2014] [Revised: 07/15/2014] [Accepted: 08/07/2014] [Indexed: 02/01/2023]
Abstract
BACKGROUND & AIMS Hepatoblastoma (HB) is the most common childhood liver cancer and occasionally presents with histological and clinical features reminiscent of hepatocellular carcinoma (HCC). Identification of molecular mechanisms that drive the neoplastic continuation towards more aggressive HCC phenotypes may help to guide the new stage of targeted therapies. METHODS We performed comprehensive studies on genetic and chromosomal alterations as well as candidate gene function and their clinical relevance. RESULTS Whole-exome sequencing identified HB as a genetically very simple tumour (2.9 mutations per tumour) with recurrent mutations in β-catenin (CTNNB1) (12/15 cases) and the transcription factor NFE2L2 (2/15 cases). Their HCC-like progenies share the common CTNNB1 mutation, but additionally exhibit a significantly increased mutation number and chromosomal instability due to deletions of the genome guardians RAD17 and TP53, accompanied by telomerase reverse-transcriptase (TERT) promoter mutations. Targeted genotyping of 33 primary tumours and cell lines revealed CTNNB1, NFE2L2, and TERT mutations in 72.5%, 9.8%, and 5.9% of cases, respectively. All NFE2L2 mutations affected residues of the NFE2L2 protein that are recognized by the KEAP1/CUL3 complex for proteasomal degradation. Consequently, cells transfected with mutant NFE2L2 were insensitive to KEAP1-mediated downregulation of NFE2L2 signalling. Clinically, overexpression of the NFE2L2 target gene NQO1 in tumours was significantly associated with metastasis, vascular invasion, the adverse prognostic C2 gene signature, as well as poor outcome. CONCLUSIONS Our study demonstrates the importance of CTNNB1 mutations and NFE2L2-KEAP1 pathway activation in HB development and defines loss of genomic stability and TERT promoter mutations as prominent characteristics of aggressive HB with HCC features.
Affiliation(s)
- Melanie Eichenmüller
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Franziska Trippel
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Michaela Kreuder
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Alexander Beck
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Thomas Schwarzmayr
  - Institute of Human Genetics, Helmholtz Center Munich, Neuherberg, Germany; Institute of Human Genetics, Technical University Munich, Munich, Germany
- Beate Häberle
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Ivo Leuschner
  - Institute of Paidopathology, Pediatric Tumor Registry, Christian-Albrechts-University Kiel, Kiel, Germany
- Dietrich von Schweinitz
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Tim M Strom
  - Institute of Human Genetics, Helmholtz Center Munich, Neuherberg, Germany; Institute of Human Genetics, Technical University Munich, Munich, Germany
- Roland Kappler
  - Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany.
33
Fares M. Identifying Evolution Signatures in Molecules. NATURAL SELECTION 2014:9-27. [DOI: 10.1201/b17795-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
34
Wang Z, Guo F, Mao Y, Xia Y, Zhang T. Metabolic characteristics of a glycogen-accumulating organism in Defluviicoccus cluster II revealed by comparative genomics. MICROBIAL ECOLOGY 2014; 68:716-728. [PMID: 24889288 DOI: 10.1007/s00248-014-0440-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/30/2014] [Accepted: 05/20/2014] [Indexed: 06/03/2023]
Abstract
Glycogen-accumulating organisms (GAOs) may compete with phosphate-accumulating organisms (PAOs) for short-chain fatty acids (VFAs) in anaerobic polyhydroxyalkanoates (PHA) synthesis, but no consequently aerobic polyphosphate accumulation in enhanced biological phosphorus removal (EBPR) process, thus deteriorating the EBPR process. They are detected frequently in the deteriorated EBPR process, but their metabolisms are still far from our comprehensions for there is seldom pure culture. In this study, a nearly complete draft genome of a GAOs in Defluviicoccus cluster II, GAO-HK, is recruited from the metagenome of activated sludge in a full-scale industrial anoxic/aerobic wastewater plant. Comparative genomics reveal similar metabolisms of PHA and glycogen in GAOs of GAO-HK, Defluviicoccus tetraformis TFO71 (TFO71) and Competibacter phosphatis clade IIA (CPIIA), and PAOs of Accumulibacter clade IIA UW-1 (UW-1) and Tetrasphaera elongata Lp2 (Lp2). Although there are similar gene cassettes related with polyphosphate metabolism in these GAOs and PAOs, especially for Defluviicoccus-relative bacteria and UW-1, ppk1 in GAOs are diverse from those in the identified PAOs, implying the difference of polyphosphate metabolism in GAOs and PAOs. Additionally, genes related to the dissimilatory denitrification are absent in TFO71 and GAO-HK, implying that additional nitrate or nitrite may favor PAOs over Defluviicoccus-relative GAOs. Therefore, PAOs suffering from competition of Defluviicoccus-relative GAOs might be rescued with the additional nitrate/nitrite, which is important to improve the stability of EBPR processes.
Affiliation(s)
- Zhiping Wang
  - Environmental Biotechnology Laboratory, The University of Hong Kong, Hong Kong, SAR, China
35
Wang J, Raskin L, Samuels DC, Shyr Y, Guo Y. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 2014; 31:318-23. [PMID: 25297068 DOI: 10.1093/bioinformatics/btu668] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION The transition/transversion (Ti/Tv) ratio and heterozygous/nonreference-homozygous (het/nonref-hom) ratio have been commonly computed in genetic studies as a quality control (QC) measurement. Additionally, these two ratios are helpful in our understanding of the patterns of DNA sequence evolution. RESULTS To thoroughly understand these two genomic measures, we performed a study using 1000 Genomes Project (1000G) released genotype data (N=1092). An additional two datasets (N=581 and N=6) were used to validate our findings from the 1000G dataset. We compared the two ratios among continental ancestry, genome regions and gene functionality. We found that the Ti/Tv ratio can be used as a quality indicator for single nucleotide polymorphisms inferred from high-throughput sequencing data. The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry. The het/nonref-hom ratio varies greatly by ancestry, but not by genome regions and functionality. Furthermore, extreme guanine + cytosine content (either high or low) is negatively associated with the Ti/Tv ratio magnitude. Thus, when performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region. Failure to take these considerations into account at the QC stage will bias any following analysis. CONTACT yan.guo@vanderbilt.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
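As an illustration of the Ti/Tv measure discussed in this abstract, the following is a minimal sketch of how the ratio can be computed from a list of single-nucleotide substitutions. The function name `ti_tv_ratio` and the `(ref, alt)` input format are assumptions made for this example and are not taken from the cited study.

```python
# Hypothetical helper, not the authors' pipeline: counts transitions and
# transversions in a list of (reference base, alternate base) pairs.
BASES = set("ACGT")
# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) base pairs."""
    ti = 0
    tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
    print(ti_tv_ratio(example))
```

In practice the pairs would be read from a variant call file, and, as the abstract notes, the expected value depends on the genome region being assessed.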
Affiliation(s)
- Jing Wang
  - Center for Quantitative Sciences and Department of Medicine and Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37212, USA
- Leon Raskin
  - Center for Quantitative Sciences and Department of Medicine and Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37212, USA
- David C Samuels
  - Center for Quantitative Sciences and Department of Medicine and Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37212, USA
- Yu Shyr
  - Center for Quantitative Sciences and Department of Medicine and Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37212, USA
- Yan Guo
  - Center for Quantitative Sciences and Department of Medicine and Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37212, USA
36
Posnien N, Zeng V, Schwager EE, Pechmann M, Hilbrant M, Keefe JD, Damen WGM, Prpic NM, McGregor AP, Extavour CG. A comprehensive reference transcriptome resource for the common house spider Parasteatoda tepidariorum. PLoS One 2014; 9:e104885. [PMID: 25118601 PMCID: PMC4132015 DOI: 10.1371/journal.pone.0104885] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 05/06/2014] [Accepted: 07/17/2014] [Indexed: 12/12/2022] Open
Abstract
Parasteatoda tepidariorum is an increasingly popular model for the study of spider development and the evolution of development more broadly. However, fully understanding the regulation and evolution of P. tepidariorum development in comparison to other animals requires a genomic perspective. Although research on P. tepidariorum has provided major new insights, gene analysis to date has been limited to candidate gene approaches. Furthermore, the few available EST collections are based on embryonic transcripts, which have not been systematically annotated and are unlikely to contain transcripts specific to post-embryonic stages of development. We therefore generated cDNA from pooled embryos representing all described embryonic stages, as well as post-embryonic stages including nymphs, larvae and adults, and using Illumina HiSeq technology obtained a total of 625,076,514 100-bp paired end reads. We combined these data with 24,360 ESTs available in GenBank, and 1,040,006 reads newly generated from 454 pyrosequencing of a mixed-stage embryo cDNA library. The combined sequence data were assembled using a custom de novo assembly strategy designed to optimize assembly product length, number of predicted transcripts, and proportion of raw reads incorporated into the assembly. The de novo assembly generated 446,427 contigs with an N50 of 1,875 bp. These sequences obtained 62,799 unique BLAST hits against the NCBI non-redundant protein data base, including putative orthologs to 8,917 Drosophila melanogaster genes based on best reciprocal BLAST hit identity compared with the D. melanogaster proteome. Finally, we explored the utility of the transcriptome for RNA-Seq studies, and showed that this resource can be used as a mapping scaffold to detect differential gene expression in different cDNA libraries. This resource will therefore provide a platform for future genomic, gene expression and functional approaches using P. tepidariorum.
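The N50 statistic quoted in this abstract can be illustrated with a short sketch. It uses the common definition (the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` is an assumption for this example, and the cited study may have used assembly software with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    total first reaches half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Toy contig lengths, not data from the cited assembly.
    print(n50([2, 3, 4, 5, 6]))
```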
Affiliation(s)
- Nico Posnien
  - Johann-Friedrich-Blumenbach-Institute for Zoology and Anthropology, Department of Developmental Biology, Georg-August-University Göttingen, GZMB Ernst-Caspari-Haus, Göttingen, Germany
  - * E-mail: (NP); (CGE)
- Victor Zeng
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Evelyn E. Schwager
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, United Kingdom
- Matthias Pechmann
  - Cologne Biocenter, Institute of Developmental Biology, University of Cologne, Cologne, Germany
- Maarten Hilbrant
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, United Kingdom
- Joseph D. Keefe
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, United Kingdom
- Wim G. M. Damen
  - Department of Genetics, Friedrich Schiller University Jena, Jena, Germany
- Nikola-Michael Prpic
  - Johann-Friedrich-Blumenbach-Institute for Zoology and Anthropology, Department of Developmental Biology, Georg-August-University Göttingen, GZMB Ernst-Caspari-Haus, Göttingen, Germany
- Alistair P. McGregor
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, United Kingdom
- Cassandra G. Extavour
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - * E-mail: (NP); (CGE)
37
Li XQ, Du D. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla. PLoS One 2014; 9:e88339. [PMID: 24551092 PMCID: PMC3923770 DOI: 10.1371/journal.pone.0088339] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 09/21/2013] [Accepted: 01/06/2014] [Indexed: 12/05/2022] Open
Abstract
C+G content (GC content or G+C content) is known to be correlated with genome/chromosome size in bacteria but the relationship for other kingdoms remains unclear. This study analyzed genome size, chromosome size, and base composition in most of the available sequenced genomes in various kingdoms. Genome size tends to increase during evolution in plants and animals, and the same is likely true for bacteria. The genomic C+G contents were found to vary greatly in microorganisms but were quite similar within each animal or plant subkingdom. In animals and plants, the C+G contents are ranked as follows: monocot plants>mammals>non-mammalian animals>dicot plants. The variation in C+G content between chromosomes within species is greater in animals than in plants. The correlation between average chromosome C+G content and chromosome length was found to be positive in Proteobacteria, Actinobacteria (but not in other analyzed bacterial phyla), Ascomycota fungi, and likely also in some plants; negative in some animals, insignificant in two protist phyla, and likely very weak in Archaea. Clearly, correlations between C+G content and chromosome size can be positive, negative, or not significant depending on the kingdoms/groups or species. Different phyla or species exhibit different patterns of correlation between chromosome-size and C+G content. Most chromosomes within a species have a similar pattern of variation in C+G content but outliers are common. The data presented in this study suggest that the C+G content is under genetic control by both trans- and cis- factors and that the correlation between C+G content and chromosome length can be positive, negative, or not significant in different phyla.
Affiliation(s)
- Xiu-Qing Li
- Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada
- Donglei Du
- Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, Fredericton, New Brunswick, Canada
38
Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013; 5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]
Abstract
The GC content in the third codon position (GC3) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC3 was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC3 from 5′ to 3′. Moreover, GC3-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC3 bimodal distribution we hypothesize that GC3 has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC3 distribution and tested the association between GC3, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC3, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC3 and genic CpG methylation. The comparison between 5′-3′ gradients of CG3-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC3 and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC3-poor and GC3-rich genes are the products of several competing processes.
Affiliation(s)
- Tatiana Tatarinova
- Laboratory of Applied Pharmacokinetics and Bioinformatics, University of Southern California.
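The GC3 statistic discussed in the Tatarinova et al. abstract above (GC content at the third codon position) can be illustrated with a minimal sketch. The function name `gc3_content` is hypothetical, and the sketch assumes an in-frame coding sequence that starts at the first codon position; it ignores any trailing incomplete codon and is not the authors' pipeline.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of complete codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (hypothetical helper,
    not taken from the cited paper).
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    third_bases = seq[2:usable:3]         # bases at codon position 3
    if not third_bases:
        raise ValueError("sequence must contain at least one complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Codons ATG, GCC, AAA have third bases G, C, A.
print(gc3_content("ATGGCCAAA"))
```

Computing this value per gene across a genome would give the kind of GC3 distribution (unimodal or bimodal) that the abstract refers to.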
39
Gurudeeban S, Satyavani K, Ramanathan T. Phylogeny of Indian rhizophoraceae based on the molecular data from chloroplast tRNA(LEU)UAA intergenic sequences. Pak J Biol Sci 2013; 16:1130-7. [PMID: 24506012 DOI: 10.3923/pjbs.2013.1130.1137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/07/2023]
Abstract
Molecular identification data for the unexplored Indian Rhizophoraceae, a family of eco-friendly mangroves, are an important resource for molecular phylogenetics. We describe the phylogenetic relationships among the Rhizophoraceae genera Rhizophora, Ceriops and Bruguiera using tRNA Leu (UAA) intron sequences as a molecular marker. The results of the present study reveal a congeneric relationship between R. apiculata, R. mucronata and B. gymnorhiza, indicating a high degree of gene flow among them, and show the pairwise distribution of the study plants within the Rhizophoraceae family. The phylogram constructed using the tRNA Leu sequence clearly clustered species of the same genus into individual groups. The stem-loop could be divided into two classes, both built up from two base-pairing heptanucleotide repeats. Size variation was primarily caused by different numbers of repeats, but some strains also contained additional sequences in this stem-loop. Statistical summaries of DNA sequence data make it possible to identify the structural signature of the genome and to classify phylogenetic relationships among species, as reflected in differences in the distribution of genetic diversity within their DNA sequences.
Affiliation(s)
- S Gurudeeban
- Marine Floral Biotechnology Laboratory, Centre of Advanced Study in Marine Biology, Faculty of Marine Sciences, Annamalai University, Parangipettai 608502, Tamil Nadu, India
- K Satyavani
- Marine Floral Biotechnology Laboratory, Centre of Advanced Study in Marine Biology, Faculty of Marine Sciences, Annamalai University, Parangipettai 608502, Tamil Nadu, India
- T Ramanathan
- Marine Floral Biotechnology Laboratory, Centre of Advanced Study in Marine Biology, Faculty of Marine Sciences, Annamalai University, Parangipettai 608502, Tamil Nadu, India
40
Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol 2013; 30:2134-44. [PMID: 23813978 DOI: 10.1093/molbev/mst116] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/24/2022]
Abstract
Despite the rapid increase of size in phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
Affiliation(s)
- Jonathan Romiguier
- CNRS, Université Montpellier, Institut des Sciences de l'Evolution, Montpellier, France.
41
Abstract
Background Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome. Results In this paper we introduce a metagenomics gene caller (MGC) which is an improvement over the state-of-the-art prediction algorithm Orphelia. Orphelia uses a two-stage machine learning approach and computes a model that classifies extracted ORFs from fragmented sequences. We hypothesise and demonstrate evidence that sequences need separate models based on their local GC-content in order to avoid the noise introduced to a single model computed with sequences from the entire GC spectrum. We have also added two amino-acid features based on the benefit of amino-acid usage shown in our previous research. Our algorithm is able to predict genes and translation initiation sites (TIS) more accurately than Orphelia which uses a single model. Conclusions Learning separate models for several pre-defined GC-content regions as opposed to a single model approach improves the performance of the neural network as demonstrated by the experimental results presented in this paper. The inclusion of amino-acid usage features also helps improve the overall accuracy of our algorithm. MGC's improvement sets the ground for further investigation into the use of GC-content to separate data for training models in machine learning based gene finders.
Affiliation(s)
- Achraf El Allali
- Department of Computer Science and Engineering, University of South Carolina, 315 Main Street, Columbia, SC 29208, USA.
42
Predicting statistical properties of open reading frames in bacterial genomes. PLoS One 2012; 7:e45103. [PMID: 23028785 PMCID: PMC3454372 DOI: 10.1371/journal.pone.0045103] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/02/2012] [Accepted: 08/14/2012] [Indexed: 11/26/2022]
Abstract
An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
43
Tiessen A, Pérez-Rodríguez P, Delaye-Arredondo LJ. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 2012; 5:85. [PMID: 22296664 PMCID: PMC3296660 DOI: 10.1186/1756-0500-5-85] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 05/20/2011] [Accepted: 02/01/2012] [Indexed: 11/29/2022]
Abstract
BACKGROUND The sizes of proteins are relevant to their biochemical structure and to their biological function. The statistical distribution of protein lengths across a diverse set of taxa can provide hints about the evolution of proteomes. RESULTS Using the full genomic sequences of over 1,302 prokaryotic and 140 eukaryotic species, two datasets containing 1.2 and 6.1 million proteins were generated and analyzed statistically. The lengthwise distribution of proteins can be roughly described with a gamma type or log-normal model, depending on the species. However, the shape parameter of the gamma model does not have a fixed value of 2, as previously suggested, but varies between 1.5 and 3 in different species. A gamma model with unrestricted shape parameter best described the distributions in ~48% of the species, whereas the log-normal distribution better described the observed protein sizes in 42% of the species. The gamma restricted function and the sum of exponentials distribution fit better in only ~5% of the species. Eukaryotic proteins have an average size of 472 aa, whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average). Average protein sizes in different phylogenetic groups were: Alveolata (628 aa), Amoebozoa (533 aa), Fornicata (543 aa), Placozoa (453 aa), Eumetazoa (486 aa), Fungi (487 aa), Stramenopila (486 aa), Viridiplantae (392 aa). Amino acid composition is biased according to protein size. Protein length correlated negatively with %C, %M, %K, %F, %R, %W, %Y and positively with %D, %E, %Q, %S and %T. Prokaryotic proteins had a different protein size bias for %E, %G, %K and %M as compared to eukaryotes. CONCLUSIONS Mathematical modeling of protein length empirical distributions can be used to assess the quality of small-ORF annotation in genomic releases (detection of too many false positive small ORFs).
There is a negative correlation between average protein size and total number of proteins among eukaryotes but not in prokaryotes. The %GC content is positively correlated to total protein number and protein size in prokaryotes but not in eukaryotes. Small proteins have a different amino acid bias than larger proteins. Compared to prokaryotic species, the evolution of eukaryotic proteomes was characterized by increased protein number (massive gene duplication) and substantial changes of protein size (domain addition/subtraction).
Affiliation(s)
- Axel Tiessen
- Departamento de Ingeniería Genética, CINVESTAV Irapuato, Irapuato, CP 36821, Mexico
44
Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct 2012; 7:2. [PMID: 22230424 PMCID: PMC3274465 DOI: 10.1186/1745-6150-7-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 08/12/2011] [Accepted: 01/10/2012] [Indexed: 12/02/2022]
Abstract
Background As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. 
Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. Reviewers This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.
Affiliation(s)
- Hao Wu
- James D Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310007, China
45
Huang Q, Cheng X, Cheung MK, Kiselev SS, Ozoline ON, Kwan HS. High-density transcriptional initiation signals underline genomic islands in bacteria. PLoS One 2012; 7:e33759. [PMID: 22448273 PMCID: PMC3309015 DOI: 10.1371/journal.pone.0033759] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/21/2011] [Accepted: 02/21/2012] [Indexed: 02/07/2023]
Abstract
Genomic islands (GIs), frequently associated with the pathogenicity of bacteria and having a substantial influence on bacterial evolution, are groups of "alien" elements which probably undergo special temporal-spatial regulation in the host genome. Are there particular hallmark transcriptional signals for these "exotic" regions? We here explore the potential transcriptional signals that underline the GIs beyond the conventional views on basic sequence composition, such as codon usage and GC property bias. We show that there is a significant enrichment of the transcription start positions (TSPs) in the GI regions compared to the whole genome of Salmonella enterica and Escherichia coli. There was up to a four-fold increase for 70% of the GIs, implying that a high-density TSP profile can potentially differentiate the GI regions. Based on this feature, we developed a new sliding-window method, GIST (Genomic-island Identification by Signals of Transcription), to identify these regions. Subsequently, we compared the known GI-associated features of the GIs detected by GIST and by the existing method Islandviewer to those of the whole genome. Our method demonstrates high sensitivity in detecting GIs harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property, shorter gene length and biased "non-optimal" codon usage. The special transcriptional signals discovered here may contribute to the coordinate expression regulation of foreign genes. Finally, by using GIST, we detected many interesting GIs in the 2011 German E. coli O104:H4 outbreak strain TY-2482, including the microcin H47 system and gene cluster ycgXEFZ-ymgABC that activates the production of biofilm matrix. The aforesaid findings highlight the power of GIST to predict GIs with distinct intrinsic features to the genome.
The heterogeneity of cumulative TSPs profiles may not only be a better identity for "alien" regions, but also provide hints to the special evolutionary course and transcriptional regulation of GI regions.
Affiliation(s)
- Qianli Huang
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Xuanjin Cheng
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Man Kit Cheung
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Sergey S. Kiselev
- Institute of Cell Biophysics, Russian Academy of Sciences, Moscow, Russia
- Olga N. Ozoline
- Institute of Cell Biophysics, Russian Academy of Sciences, Moscow, Russia
- Hoi Shan Kwan
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
46
Schmid P, Flegel WA. Codon usage in vertebrates is associated with a low risk of acquiring nonsense mutations. J Transl Med 2011; 9:87. [PMID: 21651781 PMCID: PMC3123582 DOI: 10.1186/1479-5876-9-87] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/31/2011] [Accepted: 06/08/2011] [Indexed: 12/18/2022]
Abstract
BACKGROUND Codon usage in genomes is biased towards specific subsets of codons. Codon usage bias affects translational speed and accuracy, and it is associated with the tRNA levels and the GC content of the genome. Spontaneous mutations drive genomes to a low GC content. Active cellular processes are needed to maintain a high GC content, which influences the codon usage of a species. Loss-of-function mutations, such as nonsense mutations, are the molecular basis of many recessive alleles, which can greatly affect the genome of an organism and are the cause of many genetic diseases in humans. METHODS We developed an event based model to calculate the risk of acquiring nonsense mutations in coding sequences. Complete coding sequences and genomes of 40 eukaryotes were analyzed for GC and CpG content, codon usage, and the associated risk of acquiring nonsense mutations. We included one species per genus for all eukaryotes with available reference sequence. RESULTS We discovered that the codon usage bias detected in genomes of high GC content decreases the risk of acquiring nonsense mutations (Pearson's r = -0.95; P < 0.0001). In the genomes of all examined vertebrates, including humans, this risk was lower than expected (0.93 ± 0.02; mean ± SD) and lower than the risk in genomes of non-vertebrates (1.02 ± 0.13; P = 0.019). CONCLUSIONS While the maintenance of a high GC content is energetically costly, it is associated with a codon usage bias harboring a low risk of acquiring nonsense mutations. The reduced exposure to this risk may contribute to the fitness of vertebrates.
Affiliation(s)
- Pirmin Schmid
- National Institutes of Health, Clinical Center, Bethesda, MD, USA
47
Stoletzki N. The surprising negative correlation of gene length and optimal codon use--disentangling translational selection from GC-biased gene conversion in yeast. BMC Evol Biol 2011; 11:93. [PMID: 21481245 PMCID: PMC3096941 DOI: 10.1186/1471-2148-11-93] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 05/14/2010] [Accepted: 04/11/2011] [Indexed: 02/06/2023]
Abstract
Background Surprisingly, in several multi-cellular eukaryotes optimal codon use correlates negatively with gene length. This contrasts with the expectation under selection for translational accuracy. While suggested explanations focus on variation in strength and efficiency of translational selection, it has rarely been noticed that the negative correlation is reported only in organisms whose optimal codons are biased towards codons that end with G or C (-GC). This raises the question of whether forces that affect base composition, such as GC-biased gene conversion, contribute to the negative correlation between optimal codon use and gene length. Results Yeast is a good organism to study this, as equal numbers of optimal codons end in -GC and -AT and one may hence compare frequencies of optimal GC-ending with optimal AT-ending codons to disentangle the forces. Results of this study demonstrate that, in yeast, frequencies of GC-ending (optimal and non-optimal) codons decrease with gene length and increase with recombination. A decrease of GC-ending codons along genes contributes to the negative correlation with gene length. Correlations with recombination and gene expression differentiate between GC-ending and optimal codons, and substitution patterns also support effects of GC-biased gene conversion. Conclusion While the general effect of GC-biased gene conversion is well known, the negative correlation of optimal codon use with gene length has not been considered in this context before. Initiation of gene conversion events in promoter regions and the presence of a gene conversion gradient most likely explain the observed decrease of GC-ending codons with gene length and gene position.
Affiliation(s)
- Nina Stoletzki
- Ludwig-Maximilians-Universität, Biocenter, Grosshadernerstr. 2, D-82152 Planegg-Martinsried, Germany.
48
49
McCoy MW, Allen AP, Gillooly JF. The random nature of genome architecture: predicting open reading frame distributions. PLoS One 2009; 4:e6456. [PMID: 19649247 PMCID: PMC2714469 DOI: 10.1371/journal.pone.0006456] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/05/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e. distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes. METHODOLOGY/PRINCIPAL FINDINGS By fitting mixture models to data from whole genome sequences we show that the size-frequency distributions for ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60-80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and that (ii) size-frequency distributions of the remaining "non-random" ORFs are well-fitted by log-normal or gamma distributions, and similar to the size distributions of annotated proteins. CONCLUSIONS/SIGNIFICANCE Our findings suggest stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomics units in both prokaryotes and eukaryotes.
Affiliation(s)
- Michael W McCoy
- Department of Biology, Boston University, Boston, Massachusetts, United States of America.
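The idea in the McCoy et al. abstract above, that random ORF lengths can be predicted from GC content, can be illustrated with a simple null model. This is a sketch under stated assumptions, not the authors' stochastic assembly model: bases are assumed independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, and the function names are hypothetical.

```python
def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (illustrative null model only).
    """
    if not 0.0 <= gc < 1.0:
        raise ValueError("gc must be in [0, 1)")
    p_at = (1.0 - gc) / 2.0   # probability of A, and of T
    p_gc = gc / 2.0           # probability of G, and of C
    # TAA contributes p_at**3; TAG and TGA each contribute p_at**2 * p_gc.
    return p_at ** 3 + 2.0 * p_at ** 2 * p_gc


def expected_orf_codons(gc: float) -> float:
    """Mean number of codons up to and including the first stop codon.

    Under the null model the position of the first stop is geometric,
    so the mean is 1 / p_stop.
    """
    return 1.0 / stop_codon_probability(gc)


for gc in (0.25, 0.50, 0.75):
    print(f"GC = {gc:.2f}: expected length = {expected_orf_codons(gc):.1f} codons")
```

Because all three stop codons are AT-rich, this null model predicts longer random ORFs in GC-rich genomes, which is why GC content alone can account for a share of the observed ORF size distribution.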
50
Comparative component analysis of exons with different splicing frequencies. PLoS One 2009; 4:e5387. [PMID: 19404386 PMCID: PMC2671145 DOI: 10.1371/journal.pone.0005387] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/30/2008] [Accepted: 03/31/2009] [Indexed: 12/12/2022]
Abstract
Transcriptional isoforms are not just random combinations of exons. What causes exons to be differentially spliced, and are exons with different splicing frequencies subject to divergent regulation by potential elements or splicing signals? Beyond the conventional classification into alternatively spliced exons (ASEs) and constitutively spliced exons (CSEs), we have classified exons from alternatively spliced human genes and their mouse orthologs (12,314 and 5,464, respectively) into four types based on their splicing frequencies. Our analysis indicates that different groups of exons present divergent compositional and regulatory properties. Interestingly, as splicing frequency decreases, exons tend to have greater lengths and higher GC content, and to contain more splicing elements and repetitive elements, which seems to imply that the splicing frequency is influenced by such factors. Comparison of non-alternatively spliced (NAS) mouse genes with alternatively spliced human orthologs also suggested that exons with lower splicing frequencies may be newly evolved ones that gained functions, with splicing frequencies altered through evolution. Our findings have revealed for the first time that certain factors may have critical influence on the splicing frequency, suggesting that exons with lower splicing frequencies may originate from old repetitive sequences, with splicing sites altered by mutation, gaining novel functions and becoming more frequently spliced.