Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Oliver JL, Marín A. A relationship between GC content and coding-sequence length. J Mol Evol 1996;43:216-23. [PMID: 8703087 DOI: 10.1007/bf02338829] [Citation(s) in RCA: 72] [Impact Index Per Article: 2.6] [Indexed: 02/01/2023]

Cited by Other Article(s)

Gu X, Li L, Zhong X, Su Y, Wang T. The size diversity of the Pteridaceae family chloroplast genome is caused by overlong intergenic spacers. BMC Genomics 2024;25:396. [PMID: 38649816 PMCID: PMC11036588 DOI: 10.1186/s12864-024-10296-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]

Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". WILEY INTERDISCIPLINARY REVIEWS. RNA 2024;15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]

Affiliation(s)

Xiaoge Liu State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
Chunfu Xiao State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
Xinwei Xu State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
Jie Zhang State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
Fan Mo State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
Jia-Yu Chen State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
Nicholas Delihas Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
Li Zhang Chinese Institute for Brain Research, Beijing, China
Ni A An State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
Chuan-Yun Li State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China; Chinese Institute for Brain Research, Beijing, China; Southwest United Graduate School, Kunming, China

Khandia R, Gurjar P, Kamal MA, Greig NH. Relative synonymous codon usage and codon pair analysis of depression associated genes. Sci Rep 2024;14:3502. [PMID: 38346990 PMCID: PMC10861588 DOI: 10.1038/s41598-024-51909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/11/2024] [Indexed: 02/15/2024]

Baker L, David C, Jacobs DJ. Ab initio gene prediction for protein-coding regions. BIOINFORMATICS ADVANCES 2023;3:vbad105. [PMID: 37638212 PMCID: PMC10448985 DOI: 10.1093/bioadv/vbad105] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 07/04/2023] [Accepted: 08/08/2023] [Indexed: 08/29/2023]

Liu D, Liu LL, Zheng XQ, Chen R, Lin LR, Yang TC, Tong ML. Genetic Profiling of the Full-Length tprK Gene in Patients with Primary and Secondary Syphilis. Microbiol Spectr 2023;11:e0493122. [PMID: 37036342 PMCID: PMC10269439 DOI: 10.1128/spectrum.04931-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/17/2023] [Indexed: 04/11/2023]

Abstract

TprK antigenic variation is acknowledged as an important strategy developed by Treponema pallidum to achieve immune evasion. Previous studies applied short-read sequencing to explore tprK gene sequence diversity in clinical samples; however, due to the limitations of short-read sequencing, it was difficult to determine the linkage between the seven V regions, and crucial information about full-length tprK variants was lost. Although two recent studies explored complete tprK gene profiles in natural human syphilis infection, there are still too few profiled full-length tprK variants among clinical T. pallidum isolates to fully understand the characteristics of TprK coding diversity. Here, Pacific Biosciences (PacBio) long-read sequencing was applied to examine the diversity of full-length tprK variants in 21 clinical T. pallidum isolates from 11 patients with primary syphilis and 10 patients with secondary syphilis. A total of 398 high-confidence full-length sequences, which presented remarkable sequence heterogeneity, were found. However, these full-length tprK variants exhibited limited variation in length and GC content, showing 24 length types and average GC content of 51.5 ± 0.42% and 51.6 ± 0.26% for primary and secondary syphilis samples, respectively. Additionally, the combined patterns of mutated V regions generating new tprK variants were obviously different in primary and secondary syphilis samples. The diversity of tprK gene sequences in primary syphilis samples may represent the underlying variability of the bacterium; conversely, the variability of the tprK gene in secondary syphilis samples may more accurately reflect how T. pallidum escapes host immune clearance. These data highlight the tprK gene as an important coding gene that shows conflicting genetic characteristics but underlies the persistence of spirochete infection. 
IMPORTANCE The resurgence of syphilis in both low- and high-income countries has attracted attention, and persistent infection by the pathogen has long been a research focus. The tprK gene, encoding the hypervariable outer membrane protein, is thought to be responsible for pathogen immune evasion and persistent infection. Here, PacBio long-read sequencing was applied to examine the diversity of full-length tprK variants in 21 clinical T. pallidum isolates from 11 patients with primary syphilis and 10 patients with secondary syphilis. The results showed that the sequences of the tprK gene were remarkably heterogeneous; however, the sequences presented limited variation in length and GC content. The investigation of the combined patterns of the V regions allowed us to gain insight into the features of the tprK gene generating new variants at different clinical stages. The findings of this study will be helpful for further exploration of the pathogenesis of syphilis.

Nevers Y, Glover NM, Dessimoz C, Lecompte O. Protein length distribution is remarkably uniform across the tree of life. Genome Biol 2023;24:135. [PMID: 37291671 PMCID: PMC10251718 DOI: 10.1186/s13059-023-02973-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/03/2022] [Accepted: 05/16/2023] [Indexed: 06/10/2023]

Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]

The first draft genome assembly and data analysis of the Malaysian mahseer (Tor tambroides). AQUACULTURE AND FISHERIES 2022. [DOI: 10.1016/j.aaf.2022.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]

Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021;12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022]

The whale shark genome reveals how genomic and physiological properties scale with body size. Proc Natl Acad Sci U S A 2020;117:20662-20671. [PMID: 32753383 DOI: 10.1073/pnas.1922576117] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 01/18/2023]

Avissa R, Widyaningtyas ST, Bela B. Optimization of the apolipoprotein B mRNA editing enzyme catalytic polypeptide-like-3G (APOBEC3G) gene to enhance its expression in Escherichia coli. MEDICAL JOURNAL OF INDONESIA 2020. [DOI: 10.13181/mji.oa.202853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

Exploring Castellaniella defragrans Linalool (De)hydratase-Isomerase for Enzymatic Hydration of Alkenes. Molecules 2019;24:molecules24112092. [PMID: 31159367 PMCID: PMC6600392 DOI: 10.3390/molecules24112092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/28/2019] [Revised: 05/30/2019] [Accepted: 05/31/2019] [Indexed: 01/08/2023]

Kasai F, O'Brien PCM, Ferguson-Smith MA. Squamate Chromosome Size and GC Content Assessed by Flow Karyotyping. Cytogenet Genome Res 2019;157:46-52. [PMID: 30904910 DOI: 10.1159/000497265] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]

Miravet-Verde S, Ferrar T, Espadas-García G, Mazzolini R, Gharrab A, Sabido E, Serrano L, Lluch-Senar M. Unraveling the hidden universe of small proteins in bacterial genomes. Mol Syst Biol 2019;15:e8290. [PMID: 30796087 PMCID: PMC6385055 DOI: 10.15252/msb.20188290] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Indexed: 12/14/2022]

Wang G, Yin H, Li B, Yu C, Wang F, Xu X, Cao J, Bao Y, Wang L, Abbasi AA, Bajic VB, Ma L, Zhang Z. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics 2019;35:2949-2956. [DOI: 10.1093/bioinformatics/btz008] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/15/2018] [Revised: 12/05/2018] [Accepted: 01/07/2019] [Indexed: 01/24/2023]

Abstract

Motivation

The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has gained intense interests over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations.

Results

Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy, well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.

Availability and implementation

LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.

Supplementary information

Supplementary data are available at Bioinformatics online.
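The two features named in the abstract above, open reading frame (ORF) length and GC content, can be computed from a transcript sequence as in the minimal sketch below. This is an illustration only and is not the LGC algorithm, whose scoring model the abstract does not describe. The function names `gc_content` and `longest_orf_length` are hypothetical, and the sketch assumes forward-strand ORFs that start at ATG and end at the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper().replace("U", "T")
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def longest_orf_length(seq: str) -> int:
    """Return the length in nucleotides of the longest forward-strand ORF.

    Assumption for this sketch: an ORF runs from an ATG to the first
    in-frame stop codon, and the stop codon is included in the length.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATAGCC"
    print(longest_orf_length(transcript), gc_content(transcript))
```

Plotting these two values against each other for sets of known coding and non-coding transcripts is one way to inspect the kind of divergence the abstract reports; any classification threshold would have to come from the published method or from the user's own data.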

Affiliation(s)

Guangyu Wang CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Hongyan Yin CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Boyang Li Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
Chunlei Yu CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Fan Wang CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Xingjian Xu CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jiabao Cao CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yiming Bao CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Liguo Wang Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
Amir A Abbasi National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
Vladimir B Bajic King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Kingdom of Saudi Arabia
Lina Ma CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Zhang Zhang CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018;10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 10/10/2018] [Indexed: 12/11/2022]

Kasai F, O'Brien PCM, Pereira JC, Ferguson-Smith MA. Marsupial chromosome DNA content and genome size assessed from flow karyotypes: invariable low autosomal GC content. ROYAL SOCIETY OPEN SCIENCE 2018;5:171539. [PMID: 30224977 PMCID: PMC6124049 DOI: 10.1098/rsos.171539] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/03/2017] [Accepted: 08/06/2018] [Indexed: 06/08/2023]

Kapase VU, Nesamma AA, Jutur PP. Identification and characterization of candidates involved in production of OMEGAs in microalgae: a gene mining and phylogenomic approach. Prep Biochem Biotechnol 2018;48:619-628. [PMID: 29932840 DOI: 10.1080/10826068.2018.1476886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]

Li M, Ponce-Gordo F, Grim JN, Li C, Zou H, Li W, Wu S, Wang G. Morphological Redescription of Opalina undulata Nie 1932 from Fejervarya limnocharis with Molecular Phylogenetic Study of Opalinids (Heterokonta, Opalinea). J Eukaryot Microbiol 2018;65:783-791. [DOI: 10.1111/jeu.12520] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/02/2017] [Revised: 03/20/2018] [Accepted: 03/23/2018] [Indexed: 11/26/2022]

Hu X, Ke L, Wang Z, Zeng Z. Dynamic transcriptome landscape of Asian domestic honeybee (Apis cerana) embryonic development revealed by high-quality RNA sequencing. BMC DEVELOPMENTAL BIOLOGY 2018;18:11. [PMID: 29653508 PMCID: PMC5899340 DOI: 10.1186/s12861-018-0169-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/25/2018] [Accepted: 04/03/2018] [Indexed: 12/18/2022]

Abstract

Background

Honeybee development consists of four stages: embryo, larva, pupa and adult. Embryogenesis, a key process of cell division and differentiation, takes 3 days in honeybees. However, the embryonic transcriptome and the dynamic regulation of embryonic transcription are still largely uncharacterized in honeybees, especially in the Asian honeybee (Apis cerana). Here, we employed high-quality RNA-seq to explore the transcriptome of Asian honeybee embryos at three ages, approximately 24, 48 and 72 h (referred to as Day1, Day2 and Day3, respectively).

Results

Nine embryo samples, three from each age, were collected for RNA-seq. According to the staging scheme of honeybee embryos and the morphological features we observed, our Day1, Day2 and Day3 embryos likely corresponded to the late stage four, stage eight and stage ten development stages, respectively. Hierarchical clustering and principal component analysis showed that same-age samples were grouped together, and the Day2 samples had a closer relationship with the Day3 samples than the Day1 samples. Finally, a total of 18,284 genes harboring 55,646 transcripts were detected in the A. cerana embryos, of which 44.5% consisted of the core transcriptome shared by all three ages of embryos. A total of 4088 upregulated and 3046 downregulated genes were identified among the three embryo ages, of which 2010, 3177 and 1528 genes were upregulated and 2088, 2294 and 303 genes were downregulated from Day1 to Day2, from Day1 to Day3 and from Day2 to Day3, respectively. The downregulated genes were mostly involved in cellular, biosynthetic and metabolic processes, gene expression and protein localization, and macromolecule modification; the upregulated genes mainly participated in cell development and differentiation, tissue, organ and system development, and morphogenesis. Interestingly, several biological processes related to the response to and detection of light stimuli were enriched in the first-day A. cerana embryogenesis but not in the Apis mellifera embryogenesis, which was valuable for further investigations.

Conclusions

Our transcriptomic data substantially expand the number of known transcribed elements in the A. cerana genome and provide a high-quality view of the transcriptome dynamics of A. cerana embryonic development.

Electronic supplementary material

The online version of this article (10.1186/s12861-018-0169-1) contains supplementary material, which is available to authorized users.

Hamid MH, Rozano L, Yeong WC, Abdullah JO, Saidi NB. Analysis of MAP kinase MPK4/MEKK1/MKK genes of Carica papaya L. comparative to other plant homologues. Bioinformation 2017;13:31-41. [PMID: 28642634 PMCID: PMC5463617 DOI: 10.6026/97320630013031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/30/2016] [Revised: 02/17/2017] [Accepted: 02/17/2017] [Indexed: 12/25/2022]

Samchenko AA, Kiselev SS, Kabanov AV, Kondratjev MS, Komarov VM. On the nature of the domination of oligomeric (dA:dT)n tracts in the structure of eukaryotic genomes. Biophysics (Nagoya-shi) 2016. [DOI: 10.1134/s0006350916060233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]

Sun S, Xiao J, Zhang H, Zhang Z. Pangenome Evidence for Higher Codon Usage Bias and Stronger Translational Selection in Core Genes of Escherichia coli. Front Microbiol 2016;7:1180. [PMID: 27536275 PMCID: PMC4971109 DOI: 10.3389/fmicb.2016.01180] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/22/2016] [Accepted: 07/18/2016] [Indexed: 11/25/2022]

Wang G, Sun S, Zhang Z. Randomness in Sequence Evolution Increases over Time. PLoS One 2016;11:e0155935. [PMID: 27224236 PMCID: PMC4880282 DOI: 10.1371/journal.pone.0155935] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/31/2016] [Accepted: 05/06/2016] [Indexed: 12/02/2022]

Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains. J Invertebr Pathol 2016;137:10-22. [PMID: 27090923 DOI: 10.1016/j.jip.2016.03.014] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/25/2015] [Revised: 03/07/2016] [Accepted: 03/29/2016] [Indexed: 02/04/2023]

Savisaar R, Hurst LD. Purifying Selection on Exonic Splice Enhancers in Intronless Genes. Mol Biol Evol 2016;33:1396-418. [PMID: 26802218 PMCID: PMC4868121 DOI: 10.1093/molbev/msw018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/31/2022]

Van Campenhout J, Vanreusel A, Van Belleghem S, Derycke S. Transcription, Signaling Receptor Activity, Oxidative Phosphorylation, and Fatty Acid Metabolism Mediate the Presence of Closely Related Species in Distinct Intertidal and Cold-Seep Habitats. Genome Biol Evol 2015;8:51-69. [PMID: 26637468 PMCID: PMC4758239 DOI: 10.1093/gbe/evv242] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]

Abstract

Bathyal cold seeps are isolated extreme deep-sea environments characterized by low species diversity while biomass can be high. The Håkon Mosby mud volcano (Barents Sea, 1,280 m) is a rather stable chemosynthetic driven habitat characterized by prominent surface bacterial mats with high sulfide concentrations and low oxygen levels. Here, the nematode Halomonhystera hermesi thrives in high abundances (11,000 individuals 10 cm⁻²). Halomonhystera hermesi is a member of the intertidal Halomonhystera disjuncta species complex that includes five cryptic species (GD1-5). GD1-5’s common habitat is characterized by strong environmental fluctuations. Here, we compared the transcriptomes of H. hermesi and GD1, H. hermesi’s closest relative. Genes encoding proteins involved in oxidative phosphorylation are more strongly expressed in H. hermesi than in GD1, and many genes were only observed in H. hermesi while being completely absent in GD1. Both observations could in part be attributed to high sulfide concentrations and low oxygen levels. Additionally, fatty acid elongation was also prominent in H. hermesi confirming the importance of highly unsaturated fatty acids in this species. Significant higher amounts of transcription factors and genes involved in signaling receptor activity were observed in GD1 (many of which were completely absent in H. hermesi), allowing fast signaling and transcriptional reprogramming which can mediate survival in dynamic intertidal environments. GC content was approximately 8% higher in H. hermesi coding unigenes resulting in differential codon usage between both species and a higher proportion of amino acids with GC-rich codons in H. hermesi. In general our results showed that most pathways were active in both environments and that only three genes are under natural selection. This indicates that also plasticity should be taken in consideration in the evolutionary history of Halomonhystera species. Such plasticity, as well as possible preadaptation to low oxygen and high sulfide levels might have played an important role in the establishment of a cold-seep Halomonhystera population.

Chen JY, Shen QS, Zhou WZ, Peng J, He BZ, Li Y, Liu CJ, Luan X, Ding W, Li S, Chen C, Tan BCM, Zhang YE, He A, Li CY. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates. PLoS Genet 2015;11:e1005391. [PMID: 26177073 PMCID: PMC4503675 DOI: 10.1371/journal.pgen.1005391] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 03/20/2015] [Accepted: 06/24/2015] [Indexed: 01/08/2023]

Abstract

While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with lower chance of non-sense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existed genomic contexts.

Although gene duplication has been believed as a predominant mechanism for creating new genes, recent reports suggested that new proteins could evolve “de novo” from non-coding DNA regions. These de novo genes are also named as “motherless” genes due to their lack of ancestral proteins as precursors, while recently we and others found that lncRNAs may represent an intermediate stage of their origination. To further elucidate this lncRNA-protein transition process, here we identified 64 hominoid-specific de novo genes and report a new mechanism for the origination of functional de novo proteins from ancestral non-coding transcripts: These non-coding “precursors” are generally not more selectively constrained than other lncRNA loci; and the existence of these de novo proteins is not beyond anticipation under neutral expectation; however, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution.

Affiliation(s)

Jia-Yu Chen Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Qing Sunny Shen Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Wei-Zhen Zhou Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
Jiguang Peng Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Bin Z. He FAS Center for Systems Biology & Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
Yumei Li Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Chu-Jun Liu Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Xuke Luan Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China; Peking-Tsinghua Center for Life Sciences, Beijing, China
Wanqiu Ding Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Shuxian Li Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
Chunyan Chen Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
Bertrand Chin-Ming Tan Molecular Medicine Research Center, Chang Gung University, Tao-Yuan, Taiwan
Yong E. Zhang Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
Aibin He Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China; Peking-Tsinghua Center for Life Sciences, Beijing, China
Chuan-Yun Li Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China

Mutations That Stimulate flhDC Expression in Escherichia coli K-12. J Bacteriol 2015;197:3087-96. [PMID: 26170415 DOI: 10.1128/jb.00455-15] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 06/08/2015] [Accepted: 07/09/2015] [Indexed: 01/01/2023] Open

Abstract

UNLABELLED

Motility is a beneficial attribute that enables cells to access and explore new environments and to escape detrimental ones. The organelle of motility in Escherichia coli is the flagellum, and its production is initiated by the activating transcription factors FlhD and FlhC. The expression of these factors by the flhDC operon is highly regulated and influenced by environmental conditions. The flhDC promoter is recognized by σ(70) and is dependent on the transcriptional activator cyclic AMP (cAMP)-cAMP receptor protein complex (cAMP-CRP). A number of K-12 strains exhibit limited motility due to low expression levels of flhDC. We report here a large number of mutations that stimulate flhDC expression in such strains. They include single nucleotide changes in the -10 element of the promoter, in the promoter spacer, and in the cAMP-CRP binding region. In addition, we show that insertion sequence (IS) elements or a kanamycin gene located hundreds of base pairs upstream of the promoter can effectively enhance transcription, suggesting that the topology of a large upstream region plays a significant role in the regulation of flhDC expression. None of the mutations eliminated the requirement for cAMP-CRP for activation. However, several mutations allowed expression in the absence of the nucleoid organizing protein, H-NS, which is normally required for flhDC expression.

IMPORTANCE

The flhDC operon of Escherichia coli encodes transcription factors that initiate flagellar synthesis, an energetically costly process that is highly regulated. Few deregulating mutations have been reported thus far. This paper describes new single nucleotide mutations that stimulate flhDC expression, including a number that map to the promoter spacer region. In addition, this work shows that insertion sequence elements or a kanamycin gene located far upstream from the promoter or repressor binding sites also stimulate transcription, indicating a role of regional topology in the regulation of flhDC expression.

Lengths of Orthologous Prokaryotic Proteins Are Affected by Evolutionary Factors. BIOMED RESEARCH INTERNATIONAL 2015;2015:786861. [PMID: 26114113 PMCID: PMC4465819 DOI: 10.1155/2015/786861] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/08/2014] [Accepted: 11/02/2014] [Indexed: 12/16/2022]

Wong TY, Schwartzbach SD. Protein Mis-Termination Initiates Genetic Diseases, Cancers, and Restricts Bacterial Genome Expansion. JOURNAL OF ENVIRONMENTAL SCIENCE AND HEALTH. PART C, ENVIRONMENTAL CARCINOGENESIS & ECOTOXICOLOGY REVIEWS 2015;33:255-285. [PMID: 26087060 DOI: 10.1080/10590501.2015.1053461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]

Eichenmüller M, Trippel F, Kreuder M, Beck A, Schwarzmayr T, Häberle B, Cairo S, Leuschner I, von Schweinitz D, Strom TM, Kappler R. The genomic landscape of hepatoblastoma and their progenies with HCC-like features. J Hepatol 2014;61:1312-20. [PMID: 25135868 DOI: 10.1016/j.jhep.2014.08.009] [Citation(s) in RCA: 290] [Impact Index Per Article: 29.0] [Received: 03/28/2014] [Revised: 07/15/2014] [Accepted: 08/07/2014] [Indexed: 02/01/2023]

Abstract

BACKGROUND & AIMS

Hepatoblastoma (HB) is the most common childhood liver cancer and occasionally presents with histological and clinical features reminiscent of hepatocellular carcinoma (HCC). Identification of molecular mechanisms that drive the neoplastic continuation towards more aggressive HCC phenotypes may help to guide the new stage of targeted therapies.

METHODS

We performed comprehensive studies on genetic and chromosomal alterations as well as candidate gene function and their clinical relevance.

RESULTS

Whole-exome sequencing identified HB as a genetically very simple tumour (2.9 mutations per tumour) with recurrent mutations in β-catenin (CTNNB1) (12/15 cases) and the transcription factor NFE2L2 (2/15 cases). Their HCC-like progenies share the common CTNNB1 mutation, but additionally exhibit a significantly increased mutation number and chromosomal instability due to deletions of the genome guardians RAD17 and TP53, accompanied by telomerase reverse-transcriptase (TERT) promoter mutations. Targeted genotyping of 33 primary tumours and cell lines revealed CTNNB1, NFE2L2, and TERT mutations in 72.5%, 9.8%, and 5.9% of cases, respectively. All NFE2L2 mutations affected residues of the NFE2L2 protein that are recognized by the KEAP1/CUL3 complex for proteasomal degradation. Consequently, cells transfected with mutant NFE2L2 were insensitive to KEAP1-mediated downregulation of NFE2L2 signalling. Clinically, overexpression of the NFE2L2 target gene NQO1 in tumours was significantly associated with metastasis, vascular invasion, the adverse prognostic C2 gene signature, as well as poor outcome.

CONCLUSIONS

Our study demonstrates the importance of CTNNB1 mutations and NFE2L2-KEAP1 pathway activation in HB development and defines loss of genomic stability and TERT promoter mutations as prominent characteristics of aggressive HB with HCC features.

Fares M. Identifying Evolution Signatures in Molecules. NATURAL SELECTION 2014:9-27. [DOI: 10.1201/b17795-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]

Wang Z, Guo F, Mao Y, Xia Y, Zhang T. Metabolic characteristics of a glycogen-accumulating organism in Defluviicoccus cluster II revealed by comparative genomics. MICROBIAL ECOLOGY 2014;68:716-728. [PMID: 24889288 DOI: 10.1007/s00248-014-0440-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/30/2014] [Accepted: 05/20/2014] [Indexed: 06/03/2023]

Wang J, Raskin L, Samuels DC, Shyr Y, Guo Y. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 2014;31:318-23. [PMID: 25297068 DOI: 10.1093/bioinformatics/btu668] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Indexed: 11/14/2022]

Posnien N, Zeng V, Schwager EE, Pechmann M, Hilbrant M, Keefe JD, Damen WGM, Prpic NM, McGregor AP, Extavour CG. A comprehensive reference transcriptome resource for the common house spider Parasteatoda tepidariorum. PLoS One 2014;9:e104885. [PMID: 25118601 PMCID: PMC4132015 DOI: 10.1371/journal.pone.0104885] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 05/06/2014] [Accepted: 07/17/2014] [Indexed: 12/12/2022] Open

Abstract

Parasteatoda tepidariorum is an increasingly popular model for the study of spider development and the evolution of development more broadly. However, fully understanding the regulation and evolution of P. tepidariorum development in comparison to other animals requires a genomic perspective. Although research on P. tepidariorum has provided major new insights, gene analysis to date has been limited to candidate gene approaches. Furthermore, the few available EST collections are based on embryonic transcripts, which have not been systematically annotated and are unlikely to contain transcripts specific to post-embryonic stages of development. We therefore generated cDNA from pooled embryos representing all described embryonic stages, as well as post-embryonic stages including nymphs, larvae and adults, and using Illumina HiSeq technology obtained a total of 625,076,514 100-bp paired-end reads. We combined these data with 24,360 ESTs available in GenBank, and 1,040,006 reads newly generated from 454 pyrosequencing of a mixed-stage embryo cDNA library. The combined sequence data were assembled using a custom de novo assembly strategy designed to optimize assembly product length, number of predicted transcripts, and proportion of raw reads incorporated into the assembly. The de novo assembly generated 446,427 contigs with an N50 of 1,875 bp. These sequences obtained 62,799 unique BLAST hits against the NCBI non-redundant protein database, including putative orthologs to 8,917 Drosophila melanogaster genes based on best reciprocal BLAST hit identity compared with the D. melanogaster proteome. Finally, we explored the utility of the transcriptome for RNA-Seq studies, and showed that this resource can be used as a mapping scaffold to detect differential gene expression in different cDNA libraries. This resource will therefore provide a platform for future genomic, gene expression and functional approaches using P. tepidariorum.
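For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # made-up lengths, total = 32
    print(n50(toy_contigs))
```

With the toy lengths above, the two longest contigs (8 + 8) already cover half of the total of 32, so the sketch reports 8.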

Li XQ, Du D. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla. PLoS One 2014;9:e88339. [PMID: 24551092 PMCID: PMC3923770 DOI: 10.1371/journal.pone.0088339] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 09/21/2013] [Accepted: 01/06/2014] [Indexed: 12/05/2022] Open

Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013;5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022] Open

Abstract

The GC content in the third codon position (GC₃) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC₃ was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC₃ from 5′ to 3′. Moreover, GC₃-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC₃ bimodal distribution we hypothesize that GC₃ has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC₃ distribution and tested the association between GC₃, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC₃, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC₃ and genic CpG methylation. The comparison between 5′-3′ gradients of CG₃-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC₃ and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC₃-poor and GC₃-rich genes are the products of several competing processes.
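As a concrete illustration of the two quantities this abstract relies on, the sketch below computes GC₃ for a coding sequence and a Pearson correlation coefficient between two lists of values. It is a hedged example only: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and nothing here reproduces the authors' pipeline or data.

```python
from math import sqrt


def gc3(cds):
    """Fraction of codons whose third base is G or C (sequence assumed in frame)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # ignore a trailing partial codon
    thirds = seq[2:usable:3]           # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(gc3("ATGGCCAAA"))  # codons ATG, GCC, AAA: third bases G, C, A
```

In an analysis of the kind described, one would compute GC₃ per gene and pass it to `pearson_r` together with per-gene methylation levels; the methylation values themselves would have to come from experimental data.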

Gurudeeban S, Satyavani K, Ramanathan T. Phylogeny of Indian Rhizophoraceae based on the molecular data from chloroplast tRNA(LEU)UAA intergenic sequences. Pak J Biol Sci 2013;16:1130-7. [PMID: 24506012 DOI: 10.3923/pjbs.2013.1130.1137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/07/2023]

Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol 2013;30:2134-44. [PMID: 23813978 DOI: 10.1093/molbev/mst116] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/24/2022] Open

El Allali A, Rose JR. MGC: a metagenomic gene caller. BMC Bioinformatics 2013;14 Suppl 9:S6. [PMID: 23901840 PMCID: PMC3698006 DOI: 10.1186/1471-2105-14-s9-s6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open

Predicting statistical properties of open reading frames in bacterial genomes. PLoS One 2012;7:e45103. [PMID: 23028785 PMCID: PMC3454372 DOI: 10.1371/journal.pone.0045103] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/02/2012] [Accepted: 08/14/2012] [Indexed: 11/26/2022] Open

Tiessen A, Pérez-Rodríguez P, Delaye-Arredondo LJ. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 2012;5:85. [PMID: 22296664 PMCID: PMC3296660 DOI: 10.1186/1756-0500-5-85] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 05/20/2011] [Accepted: 02/01/2012] [Indexed: 11/29/2022] Open

Abstract

BACKGROUND

The sizes of proteins are relevant to their biochemical structure and for their biological function. The statistical distribution of protein lengths across a diverse set of taxa can provide hints about the evolution of proteomes.

RESULTS

Using the full genomic sequences of over 1,302 prokaryotic and 140 eukaryotic species, two datasets containing 1.2 and 6.1 million proteins were generated and analyzed statistically. The lengthwise distribution of proteins can be roughly described with a gamma type or log-normal model, depending on the species. However, the shape parameter of the gamma model does not have a fixed value of 2, as previously suggested, but varies between 1.5 and 3 in different species. A gamma model with an unrestricted shape parameter best described the distributions in ~48% of the species, whereas the log-normal distribution better described the observed protein sizes in 42% of the species. The restricted gamma function and the sum of exponentials distribution gave a better fit in only ~5% of the species. Eukaryotic proteins have an average size of 472 aa, whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average). Average protein sizes in different phylogenetic groups were: Alveolata (628 aa), Amoebozoa (533 aa), Fornicata (543 aa), Placozoa (453 aa), Eumetazoa (486 aa), Fungi (487 aa), Stramenopila (486 aa), Viridiplantae (392 aa). Amino acid composition is biased according to protein size. Protein length correlated negatively with %C, %M, %K, %F, %R, %W, %Y and positively with %D, %E, %Q, %S and %T. Prokaryotic proteins had a different protein size bias for %E, %G, %K and %M as compared to eukaryotes.

CONCLUSIONS

Mathematical modeling of empirical protein length distributions can be used to assess the quality of small-ORF annotation in genomic releases (detection of too many false-positive small ORFs). There is a negative correlation between average protein size and total number of proteins among eukaryotes but not in prokaryotes. The %GC content is positively correlated with total protein number and protein size in prokaryotes but not in eukaryotes. Small proteins have a different amino acid bias than larger proteins. Compared to prokaryotic species, the evolution of eukaryotic proteomes was characterized by increased protein number (massive gene duplication) and substantial changes of protein size (domain addition/subtraction).

Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct 2012;7:2. [PMID: 22230424 PMCID: PMC3274465 DOI: 10.1186/1745-6150-7-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 08/12/2011] [Accepted: 01/10/2012] [Indexed: 12/02/2022] Open

Abstract

Background

As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes.

Results

Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group.

Conclusion

Our studies provide several lines of evidence indicating that the DNA polymerase III α subunit and its isoforms, participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years.

Reviewers

This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.

Huang Q, Cheng X, Cheung MK, Kiselev SS, Ozoline ON, Kwan HS. High-density transcriptional initiation signals underline genomic islands in bacteria. PLoS One 2012;7:e33759. [PMID: 22448273 PMCID: PMC3309015 DOI: 10.1371/journal.pone.0033759] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/21/2011] [Accepted: 02/21/2012] [Indexed: 02/07/2023] Open

Abstract

Genomic islands (GIs), frequently associated with the pathogenicity of bacteria and having a substantial influence on bacterial evolution, are groups of "alien" elements which probably undergo special temporal-spatial regulation in the host genome. Are there particular hallmark transcriptional signals for these "exotic" regions? We here explore the potential transcriptional signals that underline the GIs beyond the conventional views on basic sequence composition, such as codon usage and GC property bias. We show that there is a significant enrichment of transcription start positions (TSPs) in the GI regions compared to the whole genome of Salmonella enterica and Escherichia coli. There was up to a four-fold increase for 70% of the GIs, implying that a high-density TSP profile can potentially differentiate the GI regions. Based on this feature, we developed a new sliding-window method, GIST (Genomic-island Identification by Signals of Transcription), to identify these regions. Subsequently, we compared the known GI-associated features of the GIs detected by GIST and by the existing method Islandviewer to those of the whole genome. Our method demonstrates high sensitivity in detecting GIs harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property, shorter gene length and biased "non-optimal" codon usage. The special transcriptional signals discovered here may contribute to the coordinate expression regulation of foreign genes. Finally, by using GIST, we detected many interesting GIs in the 2011 German E. coli O104:H4 outbreak strain TY-2482, including the microcin H47 system and the gene cluster ycgXEFZ-ymgABC that activates the production of biofilm matrix. These findings highlight the power of GIST to predict GIs whose intrinsic features are distinct from those of the rest of the genome. The heterogeneity of cumulative TSP profiles may not only be a better identifier of "alien" regions, but also provide hints about the special evolutionary course and transcriptional regulation of GI regions.

Schmid P, Flegel WA. Codon usage in vertebrates is associated with a low risk of acquiring nonsense mutations. J Transl Med 2011;9:87. [PMID: 21651781 PMCID: PMC3123582 DOI: 10.1186/1479-5876-9-87] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/31/2011] [Accepted: 06/08/2011] [Indexed: 12/18/2022] Open

Stoletzki N. The surprising negative correlation of gene length and optimal codon use--disentangling translational selection from GC-biased gene conversion in yeast. BMC Evol Biol 2011;11:93. [PMID: 21481245 PMCID: PMC3096941 DOI: 10.1186/1471-2148-11-93] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 05/14/2010] [Accepted: 04/11/2011] [Indexed: 02/06/2023] Open

Comparative Genomics Of Insect Endosymbionts. 2010. [DOI: 10.1201/9780203009918.ch3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

McCoy MW, Allen AP, Gillooly JF. The random nature of genome architecture: predicting open reading frame distributions. PLoS One 2009;4:e6456. [PMID: 19649247 PMCID: PMC2714469 DOI: 10.1371/journal.pone.0006456] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/05/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022] Open

Comparative component analysis of exons with different splicing frequencies. PLoS One 2009;4:e5387. [PMID: 19404386 PMCID: PMC2671145 DOI: 10.1371/journal.pone.0005387] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/30/2008] [Accepted: 03/31/2009] [Indexed: 12/12/2022] Open