1
Sun W, Li M, Wang J. Characteristics of duplicated gene expression and DNA methylation regulation in different tissues of allopolyploid Brassica napus. BMC Plant Biology 2024; 24:518. [PMID: 38851683 PMCID: PMC11162574 DOI: 10.1186/s12870-024-05245-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 06/04/2024] [Indexed: 06/10/2024]
Abstract
Plant polyploidization increases the complexity of epigenomes and transcriptional regulation, resulting in genome evolution and enhanced adaptability. However, few studies have examined the relationship between gene expression and epigenetic modification in different plant tissues after allopolyploidization. In this study, we examined gene expression and DNA methylation patterns in four tissues (stems, leaves, flowers and siliques) of Brassica napus and its diploid progenitors. On this basis, the alternative splicing patterns and cis-trans regulation patterns of the four tissues in B. napus and its diploid progenitors were also analyzed. The number of alternative splicing events in B. napus is higher than that in the diploid progenitors, and the IR type increases the most during allopolyploidization. In addition, we studied the fate changes of duplicated genes after allopolyploidization in B. napus. We found that the fate of most duplicated genes is conserved, but the number of genes undergoing neofunctionalization and specialization is also large. The fates of duplicated genes in B. napus were classified according to five duplication types (WGD, PD, DSD, TD, TRD). This study also analyzed the generational transmission of expression and DNA methylation patterns. Our study provides a reference for the fate differentiation of duplicated genes during allopolyploidization.
Affiliation(s)
- Weiqi Sun
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Mengdi Li
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, China
  - Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, 710069, China
- Jianbo Wang
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, China.

2
Corzo G, Seeling-Branscomb CE, Seeling JM. Differential Synonymous Codon Selection in the B56 Gene Family of PP2A Regulatory Subunits. Int J Mol Sci 2023; 25:392. [PMID: 38203563 PMCID: PMC10778929 DOI: 10.3390/ijms25010392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/18/2023] [Accepted: 12/23/2023] [Indexed: 01/12/2024] Open
Abstract
Protein phosphatase 2A (PP2A) functions as a tumor suppressor and consists of a scaffolding, catalytic, and regulatory subunit. The B56 gene family of regulatory subunits imparts distinct functions onto PP2A. Codon usage bias (CUB) involves the selection of synonymous codons, which can affect gene expression by modulating processes such as transcription and translation. CUB can vary along the length of a gene, and differential use of synonymous codons can be important in the divergence of gene families. The N-termini of the gene product encoded by B56α possessed high CUB, high GC content at the third codon position (GC3), and high rare codon content. In addition, differential CUB was found in the sequence encoding two B56γ N-terminal splice forms. The sequence encoding the N-termini of B56γ/γ, relative to B56δ/γ, displayed CUB, utilized more frequent codons, and had higher GC3 content. B56α mRNA had stronger than predicted secondary structure at its 5' end, and the B56δ/γ splice variants had long regions of weaker than predicted secondary structure at their 5' end. The data suggest that B56α is expressed at relatively low levels as compared to the other B56 isoforms and that the B56δ/γ splice variant is expressed more highly than B56γ/γ.
Affiliation(s)
- Gabriel Corzo
  - Department of Biology, Hofstra University, Hempstead, NY 11549, USA
- Joni M. Seeling
  - Department of Biology, Hofstra University, Hempstead, NY 11549, USA

3
Liu Y, Liang N, Xian Q, Zhang W. GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2. BMC Plant Biology 2023; 23:608. [PMID: 38036992 PMCID: PMC10691020 DOI: 10.1186/s12870-023-04634-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 11/26/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Although GC variation constitutes a fundamental element of genome and species diversity, the precise mechanisms driving it remain unclear. The abundant sequence data available for ITS2, a commonly employed phylogenetic marker in plants, offer an exceptional resource for exploring GC variation across angiosperms. RESULTS A comprehensive selection of 8666 species, comprising 165 genera, 63 families, and 30 orders, was used for the analyses. The alignment of ITS2 sequence-structures and partitioning of secondary structures into paired and unpaired regions were performed using 4SALE. Substitution rates and frequencies among GC base-pairs in the paired regions of ITS2 were calculated using RNA-specific models in the PHASE package. The results showed that the distribution of ITS2 GC contents on the angiosperm phylogeny was heterogeneous, but their increase was generally associated with ITS2 sequence homogenization, thereby supporting the occurrence of GC-biased gene conversion (gBGC) during the concerted evolution of ITS2. Additionally, the GC content in the paired regions of the ITS2 secondary structure was significantly higher than that of the unpaired regions, indicating the selection of GC for thermodynamic stability. Furthermore, the RNA substitution models demonstrated that base-pair transformations favored both the elevation and fixation of GC in the paired regions, providing further support for gBGC. CONCLUSIONS Our findings highlight the significance of secondary structure in GC investigation and demonstrate that both gBGC and structure-based selection are influential factors driving angiosperm ITS2 GC content.
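The comparison of GC content between paired and unpaired regions described in this abstract can be illustrated with a minimal sketch. It assumes the secondary structure is available as a dot-bracket string of the same length as the sequence; the study itself partitioned structures with 4SALE, so the function name `gc_paired_unpaired` and the toy input are hypothetical and not taken from the paper.

```python
def gc_paired_unpaired(seq, structure):
    """Return (GC% in paired positions, GC% in unpaired positions).

    `structure` is assumed to be dot-bracket notation: '(' and ')' mark
    paired bases, '.' marks unpaired bases.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    paired = [b for b, s in zip(seq.upper(), structure) if s in "()"]
    unpaired = [b for b, s in zip(seq.upper(), structure) if s == "."]

    def gc_percent(bases):
        # An empty region has no defined GC content.
        if not bases:
            return float("nan")
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    return gc_percent(paired), gc_percent(unpaired)


# Hypothetical toy hairpin, not data from the study.
paired_gc, unpaired_gc = gc_paired_unpaired("GCAUUGC", "((...))")
```

In this toy hairpin every paired base is G or C and every loop base is A or U, which is the direction of the difference the abstract reports, though real ITS2 structures are far larger and less extreme.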
Affiliation(s)
- Yubo Liu
  - Marine College, Shandong University, Weihai, 264209, China
  - Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 201800, China
- Nan Liang
  - Marine College, Shandong University, Weihai, 264209, China
  - Allergy Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Qing Xian
  - Marine College, Shandong University, Weihai, 264209, China
- Wei Zhang
  - Marine College, Shandong University, Weihai, 264209, China.

4
Tyagi S, Kabade PG, Gnanapragasam N, Singh UM, Gurjar AKS, Rai A, Sinha P, Kumar A, Singh VK. Codon Usage Provide Insights into the Adaptation of Rice Genes under Stress Condition. Int J Mol Sci 2023; 24:ijms24021098. [PMID: 36674611 PMCID: PMC9861248 DOI: 10.3390/ijms24021098] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/08/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023] Open
Abstract
Plants experience abiotic and biotic stresses, and to combat them, they re-program the expression of growth-, metabolism-, and resistance-related genes. These genes differ in their synonymous codon usage frequency and show codon usage bias. Here, we investigated the correlation among codon usage bias, gene expression, and underlying mechanisms in rice under abiotic and biotic stress conditions. The results indicated that genes with higher expression (up- or downregulated) levels had high GC content (≥60%), a low effective number of codons (≤40), and exhibited strong biases towards the codons with C/G at the third nucleotide position, irrespective of stress received. TTC, ATC, and CTC were the most preferred codons, while TAC, CAC, AAC, GAC, and TGC were moderately preferred under any stress (abiotic or biotic) condition. Additionally, downregulated genes are under mutational pressure (R² ≥ 0.5) while upregulated genes are under natural selection pressure (R² ≤ 0.5). Based on these results, we also identified the possible target codons that can be used to design an optimized set of genes with specific codons to develop climate-resilient varieties. In conclusion, under stress, rice has a bias towards codon usage which is correlated with GC content, gene expression level, and gene length.
Affiliation(s)
- Swati Tyagi
  - International Rice Research Institute-South Asia Regional Centre (ISARC), Varanasi 221106, India
- Niranjani Gnanapragasam
  - International Rice Research Institute (IRRI)-South-Asia Hub, International Crops Research Institute for the Semi-Arid Tropics, Hyderabad 502324, India
- Uma Maheshwar Singh
  - International Rice Research Institute-South Asia Regional Centre (ISARC), Varanasi 221106, India
- Ashutosh Rai
  - International Rice Research Institute-South Asia Regional Centre (ISARC), Varanasi 221106, India
- Pallavi Sinha
  - International Rice Research Institute (IRRI)-South-Asia Hub, International Crops Research Institute for the Semi-Arid Tropics, Hyderabad 502324, India
- Arvind Kumar
  - International Rice Research Institute-South Asia Regional Centre (ISARC), Varanasi 221106, India
- Vikas Kumar Singh
  - International Rice Research Institute-South Asia Regional Centre (ISARC), Varanasi 221106, India
  - International Rice Research Institute (IRRI)-South-Asia Hub, International Crops Research Institute for the Semi-Arid Tropics, Hyderabad 502324, India

5
Zhang Y, Shen Z, Meng X, Zhang L, Liu Z, Liu M, Zhang F, Zhao J. Codon usage patterns across seven Rosales species. BMC Plant Biology 2022; 22:65. [PMID: 35123393 PMCID: PMC8817548 DOI: 10.1186/s12870-022-03450-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/24/2021] [Accepted: 01/31/2022] [Indexed: 05/03/2023]
Abstract
BACKGROUND Codon usage bias (CUB) analysis is an effective method for studying specificity, evolutionary relationships, and mRNA translation and discovering new genes among various species. In general, CUB analysis is mainly performed within one species or between closely related species and no such study has been applied among species with distant genetic relationships. Here, seven Rosales species with high economic value were selected to conduct CUB analysis. RESULTS The results showed that the average GC1, GC2 and GC3 contents were 51.08, 40.52 and 43.12%, respectively, indicating that the A/T content is more abundant and the Rosales species prefer A/T at the third codon position. Neutrality plot and ENc plot analysis revealed that natural selection was the main factor leading to CUB during the evolution of Rosales species. All 7 Rosales species contained three high-frequency codons, AGA, GTT and TTG, encoding Arg, Val and Leu, respectively. The 7 Rosales species differed in high-frequency codon pairs and the distribution of GC3, though the usage patterns of closely related species were more consistent. The results of the biclustering heat map among 7 Rosales species and 20 other species were basically consistent with the results of genome data, suggesting that CUB analysis is an effective method for revealing evolutionary relationships among species at the family or order level. In addition, chlorophytes prefer codons ending in G/C, while monocotyledonous and dicotyledonous plants prefer codons ending in A/T. CONCLUSIONS The CUB pattern among Rosales species was mainly affected by natural selection. This work is the first to highlight the CUB patterns and characteristics of Rosales species and provides a new perspective for studying genetic relationships across a wide range of species.
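The GC1, GC2 and GC3 values quoted in this abstract are GC percentages at the first, second and third codon positions. As a minimal sketch of how such values can be computed for a single coding sequence (the function name `gc_by_codon_position` and the toy sequence are hypothetical; the study's own pipeline is not described here):

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as percentages for one in-frame coding sequence."""
    seq = cds.upper()
    # Drop a trailing partial codon so each position is sampled equally often.
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    percentages = []
    for offset in range(3):
        bases = seq[offset:usable:3]  # every third base, starting at this codon position
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases))
    return tuple(percentages)


# Hypothetical toy sequence (ATG GCC AAG TTT), not data from the study.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAGTTT")
```

Genome-wide averages such as those in the abstract would be obtained by applying the same calculation across all coding sequences of a species.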
Affiliation(s)
- Yao Zhang
  - College of Life Science, Hebei Agricultural University, Baoding, China
  - Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
- Zenan Shen
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
- Xiangrui Meng
  - College of Life Science, Hebei Agricultural University, Baoding, China
  - Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
- Liman Zhang
  - College of Life Science, Hebei Agricultural University, Baoding, China
  - Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
- Zhiguo Liu
  - Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, China
- Mengjun Liu
  - Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, China
- Fa Zhang
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
- Jin Zhao
  - College of Life Science, Hebei Agricultural University, Baoding, China
  - Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China

6
Entrambasaguas L, Ruocco M, Verhoeven KJF, Procaccini G, Marín-Guirao L. Gene body DNA methylation in seagrasses: inter- and intraspecific differences and interaction with transcriptome plasticity under heat stress. Sci Rep 2021; 11:14343. [PMID: 34253765 PMCID: PMC8275578 DOI: 10.1038/s41598-021-93606-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/19/2020] [Accepted: 06/28/2021] [Indexed: 02/06/2023] Open
Abstract
The role of DNA methylation and its interaction with gene expression and transcriptome plasticity is poorly understood, and current insight comes mainly from studies in very few model plant species. Here, we study gene body DNA methylation (gbM) and gene expression patterns in ecotypes from contrasting thermal environments of two marine plants with contrasting life history strategies in order to explore the potential role epigenetic mechanisms could play in gene plasticity and responsiveness to heat stress. In silico transcriptome analysis of CpG O/E ratios suggested that the bulk of Posidonia oceanica and Cymodocea nodosa genes possess high levels of intragenic methylation. We also observed a correlation between gbM and gene expression flexibility: genes with low DNA methylation tend to show flexible gene expression and plasticity under changing conditions. Furthermore, the empirical determination of global DNA methylation (5-mC) showed patterns of intra- and inter-specific divergence that suggest a link between methylation level and the plants' latitude of origin and life history. Although we cannot discern whether gbM regulates gene expression or vice versa, or if other molecular mechanisms play a role in facilitating transcriptome responsiveness, our findings point to the existence of a relationship between gene responsiveness and gbM patterns in marine plants.
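The CpG observed/expected ratio used in this abstract as an in silico proxy for methylation can be sketched as follows. Low ratios are commonly read as a signature of historical methylation, because methylated cytosines tend to mutate to thymine over evolutionary time and deplete CpG dinucleotides. The normalization below (CpG count times length, divided by C count times G count) is one common form and is an assumption here, not necessarily the exact formula the authors used; the function name is hypothetical.

```python
def cpg_observed_expected(seq):
    """CpG O/E ratio of one sequence: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("ratio is undefined without at least one C and one G")
    cpg_count = s.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    return cpg_count * len(s) / (c_count * g_count)


# Hypothetical toy sequences, not data from the study.
cpg_rich = cpg_observed_expected("ACGTCGAA")   # two CpG sites in 8 bases
cpg_depleted = cpg_observed_expected("ACTGCTGA")  # C and G present, no CpG
```

Applied per transcript, the distribution of these ratios across a transcriptome is what allows genes to be grouped into putatively high- and low-methylation classes.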
Affiliation(s)
- Laura Entrambasaguas
  - Integrative Marine Ecology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy
- Miriam Ruocco
  - Integrative Marine Ecology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy
- Koen J F Verhoeven
  - Terrestrial Ecology Department, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB, Wageningen, The Netherlands
- Gabriele Procaccini
  - Integrative Marine Ecology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy.
- Lazaro Marín-Guirao
  - Integrative Marine Ecology Department, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy
  - Seagrass Ecology Group, Oceanographic Center of Murcia, Spanish Institute of Oceanography, C/Varadero, 30740, San Pedro del Pinatar, Spain

7
Sweet DR, Lam C, Jain MK. Evolutionary Protection of Krüppel-Like Factors 2 and 4 in the Development of the Mature Hemovascular System. Front Cardiovasc Med 2021; 8:645719. [PMID: 34079826 PMCID: PMC8165158 DOI: 10.3389/fcvm.2021.645719] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/23/2020] [Accepted: 04/21/2021] [Indexed: 02/02/2023] Open
Abstract
A properly functioning hemovascular system, consisting of circulating innate immune cells and endothelial cells (ECs), is essential in the distribution of nutrients to distant tissues while ensuring protection from invading pathogens. Professional phagocytes (e.g., macrophages) and ECs have co-evolved in vertebrates to adapt to increased physiological demands. Intercellular interactions between components of the hemovascular system facilitate numerous functions in physiology and disease in part through the utilization of shared signaling pathways and factors. Krüppel-like factors (KLFs) 2 and 4 are two such transcription factors with critical roles in both cellular compartments. Decreased expression of either factor in myeloid or endothelial cells increases susceptibility to a multitude of inflammatory diseases, underscoring the essential role for their expression in maintaining cellular quiescence. Given the close evolutionary relationship between macrophages and ECs, along with their shared utilization of KLF2 and 4, we hypothesize that KLF genes evolved in such a way that protected their expression in myeloid and endothelial cells. Within this Perspective, we review the roles of KLF2 and 4 in the hemovascular system and explore evolutionary trends in their nucleotide composition that suggest a coordinated protection that corresponds with the development of mature myeloid and endothelial systems.
Affiliation(s)
- David R Sweet
  - Case Cardiovascular Research Institute, Case Western Reserve University, Cleveland, OH, United States
  - Harrington Heart and Vascular Institute, University Hospitals Cleveland Medical Center, Cleveland, OH, United States
  - Department of Pathology, Case Western Reserve University, Cleveland, OH, United States
- Cherry Lam
  - Department of Biology, New York University, New York, NY, United States
- Mukesh K Jain
  - Case Cardiovascular Research Institute, Case Western Reserve University, Cleveland, OH, United States
  - Harrington Heart and Vascular Institute, University Hospitals Cleveland Medical Center, Cleveland, OH, United States

8
Kim EY, Wang L, Lei Z, Li H, Fan W, Cho J. Ribosome stalling and SGS3 phase separation prime the epigenetic silencing of transposons. Nature Plants 2021; 7:303-309. [PMID: 33649597 DOI: 10.1038/s41477-021-00867-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 04/29/2020] [Accepted: 02/01/2021] [Indexed: 05/20/2023]
Abstract
Transposable elements (TEs, transposons) are mobile DNAs that can cause fatal mutations [1]. To suppress their activity, host genomes deploy small interfering RNAs (siRNAs) that trigger and maintain their epigenetic silencing [2,3]. Whereas 24-nucleotide (nt) siRNAs mediate RNA-directed DNA methylation (RdDM) to reinforce the silent state of TEs [3], activated or naive TEs give rise to 21- or 22-nt siRNAs by the RNA-DEPENDENT RNA POLYMERASE 6 (RDR6)-mediated pathway, triggering both RNAi and de novo DNA methylation [4,5]. This process, which is called RDR6-RdDM, is critical for the initiation of epigenetic silencing of active TEs; however, their specific recognition and the selective processing of siRNAs remain elusive. Here, we suggest that plant transposon RNAs undergo frequent ribosome stalling caused by their unfavourable codon usage. Ribosome stalling subsequently induces RNA truncation and localization to cytoplasmic siRNA bodies, both of which are essential prerequisites for RDR6 targeting [6,7]. In addition, SUPPRESSOR OF GENE SILENCING 3 (SGS3), the RDR6-interacting protein [7], exhibits phase separation both in vitro and in vivo through its prion-like domains, implicating the role of liquid-liquid phase separation in siRNA body formation. Our study provides insight into the host recognition of active TEs, which is important for the maintenance of genome integrity.
Affiliation(s)
- Eun Yu Kim
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Ling Wang
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Shanghai, China
- Zhen Lei
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Shanghai, China
- Hui Li
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Shanghai, China
- Wenwen Fan
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Shanghai, China
- Jungnam Cho
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China.
  - University of Chinese Academy of Sciences, Shanghai, China.
  - CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Chinese Academy of Sciences, Shanghai, China.

9
Pachganov S, Murtazalieva K, Zarubin A, Taran T, Chartier D, Tatarinova TV. Prediction of Rice Transcription Start Sites Using TransPrise: A Novel Machine Learning Approach. Methods Mol Biol 2021; 2238:261-274. [PMID: 33471337 DOI: 10.1007/978-1-0716-1068-8_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
As the interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites. In this paper, we present TransPrise, an efficient deep learning tool for predicting positions of eukaryotic transcription start sites. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer with a graphics processing unit, the run time of TransPrise is 250 min on a genome 374 Mb long. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all the necessary packages, models, and code as well as the source code of the TransPrise algorithm are available at http://compubioverne.group/. The source code is ready to use and to be customized to predict TSS in any eukaryotic organism.
Affiliation(s)
- Stepan Pachganov
  - Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
- Alexei Zarubin
  - Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics, Tomsk, Russia
- Duane Chartier
  - International Center for Art Intelligence, Inc, Los Angeles, CA, USA
- Tatiana V Tatarinova
  - Vavilov Institute of General Genetics, Moscow, Russia.
  - Department of Biology, University of La Verne, La Verne, CA, USA.
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia.
  - Siberian Federal University, Krasnoyarsk, Russia.

10
Shen Z, Gan Z, Zhang F, Yi X, Zhang J, Wan X. Analysis of codon usage patterns in citrus based on coding sequence data. BMC Genomics 2020; 21:234. [PMID: 33327935 PMCID: PMC7739459 DOI: 10.1186/s12864-020-6641-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/22/2020] [Accepted: 03/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage is an important determinant of gene expression levels that can help us understand codon biology, evolution and mRNA translation of species. The majority of previous codon usage studies have focused on single-species analysis, and few studies have focused on species within the same genus. In this study, we proposed a multispecies codon usage analysis workflow to reveal the genetic features and correlation in citrus. RESULTS Our codon usage analysis workflow was based on the GC content, GC plot, and relative synonymous codon usage value of each codon in 8 citrus species. This approach allows for the comparison of codon usage bias of different citrus species. Next, we performed cluster analysis and obtained an overview of the relationship in citrus. However, traditional methods cannot conduct quantitative analysis of the correlation. To further estimate the correlation among the citrus species, we used the frequency profile to construct feature vectors of each species. The Pearson correlation coefficient was used to quantitatively analyze the distance among the citrus species. This result was consistent with the cluster analysis. CONCLUSIONS Our findings showed that the citrus species are conserved at the genetic level and demonstrated the existing genetic evolutionary relationship in citrus. This work provides new insights into codon biology and the evolution of citrus and other plant species.
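The workflow in this abstract combines relative synonymous codon usage (RSCU) values with a Pearson correlation between per-species feature vectors. A minimal sketch of both steps follows. It assumes the standard genetic code and uses RSCU values directly as the feature vector, which is an illustrative choice; the authors' "frequency profile" may be defined differently, and all names and toy sequences here are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")


def rscu(cds):
    """Relative synonymous codon usage for the 61 sense codons of one CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codon in SENSE_CODONS:
        family = [c for c in SENSE_CODONS if CODON_TABLE[c] == CODON_TABLE[codon]]
        total = sum(counts[c] for c in family)
        # Observed count divided by the mean count across the synonymous family;
        # amino acids absent from the sequence get 0.0 in this sketch.
        values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # raises ZeroDivisionError for constant vectors


# Hypothetical toy sequences standing in for two species' coding sequences.
species_a = rscu("GCTGCTGCCAAAAAGTTT")
species_b = rscu("GCTGCCGCCAAAAAATTC")
vector_a = [species_a[c] for c in SENSE_CODONS]
vector_b = [species_b[c] for c in SENSE_CODONS]
similarity = pearson(vector_a, vector_b)
```

An RSCU value above 1 means a codon is used more often than expected under equal use of its synonyms. Computing `pearson` for every pair of species would give a similarity matrix that can be compared with a cluster analysis, as the abstract describes.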
Affiliation(s)
- Zenan Shen
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100000, China
- Zhimeng Gan
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China
- Fa Zhang
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100000, China
- Xinyao Yi
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, 29201, USA
- Jinzhi Zhang
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China
- Xiaohua Wan
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
  - University of Chinese Academy of Sciences, Beijing, 100000, China.

11
Sarpan N, Taranenko E, Ooi SE, Low ETL, Espinoza A, Tatarinova TV, Ong-Abdullah M. DNA methylation changes in clonally propagated oil palm. Plant Cell Reports 2020; 39:1219-1233. [PMID: 32591850 DOI: 10.1007/s00299-020-02561-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Accepted: 06/17/2020] [Indexed: 06/11/2023]
Abstract
Several hypomethylated sites within the Karma region of EgDEF1 and hotspot regions in chromosomes 1, 2, 3, and 5 may be associated with mantling. One of the main challenges faced by the oil palm industry is fruit abnormalities, such as the "mantled" phenotype that can lead to reduced yields. This clonal abnormality is an epigenetic phenomenon and has been linked to the hypomethylation of a transposable element within the EgDEF1 gene. To understand the epigenome changes in clones, methylomes of clonal oil palms were compared to methylomes of seedling-derived oil palms. Whole-genome bisulfite sequencing data from seedlings, normal, and mantled clones were analyzed to determine and compare the context-specific DNA methylomes. In seedlings, coding and regulatory regions are generally hypomethylated while introns and repeats are extensively methylated. Genes with a low number of guanines and cytosines in the third position of codons (GC3-poor genes) were increasingly methylated towards their 3' region, while GC3-rich genes remain demethylated, similar to patterns in other eukaryotic species. Predicted promoter regions were generally hypomethylated in seedlings. In clones, CG, CHG, and CHH methylation levels generally decreased in functionally important regions, such as promoters, 5' UTRs, and coding regions. Although random regions were found to be hypomethylated in clonal genomes, hypomethylation of certain hotspot regions may be associated with the clonal mantling phenotype. Our findings, therefore, suggest that other hypomethylated CHG sites within the Karma of EgDEF1 and hypomethylated hotspot regions in chromosomes 1, 2, 3 and 5 are associated with mantling.
Collapse
Affiliation(s)
- Norashikin Sarpan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
| | - Elizaveta Taranenko
- Department of Biology, University of La Verne, La Verne, CA, USA
- Department of Fundamental Biology and Biotechnology, Siberian Federal University, 660074, Krasnoyarsk, Russia
| | - Siew-Eng Ooi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
| | - Eng-Ti Leslie Low
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
| | | | - Tatiana V Tatarinova
- Department of Biology, University of La Verne, La Verne, CA, USA.
- Department of Fundamental Biology and Biotechnology, Siberian Federal University, 660074, Krasnoyarsk, Russia.
- Vavilov Institute for General Genetics, Moscow, Russia.
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia.
| | - Meilina Ong-Abdullah
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia.
|
12
|
The whale shark genome reveals how genomic and physiological properties scale with body size. Proc Natl Acad Sci U S A 2020; 117:20662-20671. [PMID: 32753383 DOI: 10.1073/pnas.1922576117] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 01/18/2023] Open
Abstract
The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals and yeast. We examined the scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic traits also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: Guanine and cytosine (GC) content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to introns being highly enriched in repetitive elements such as CR1-like long interspersed nuclear elements, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark genome also has the second slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan and showed that the whale shark is a promising model for studies of neural architecture and lifespan.
|
13
|
Saha J, Saha BK, Pal Sarkar M, Roy V, Mandal P, Pal A. Comparative Genomic Analysis of Soil Dwelling Bacteria Utilizing a Combinational Codon Usage and Molecular Phylogenetic Approach Accentuating on Key Housekeeping Genes. Front Microbiol 2019; 10:2896. [PMID: 31921071 PMCID: PMC6928123 DOI: 10.3389/fmicb.2019.02896] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/25/2019] [Accepted: 12/02/2019] [Indexed: 01/02/2023] Open
Abstract
Soil is a diversified and complex ecological niche, home to a myriad of microorganisms, particularly bacteria. The physico-chemical complexities of soil result in a plethora of physiological variations among the different types of soil-dwelling bacteria, giving rise to a wide variation in genome structure and complexity. This makes it attractive to analyze and compare the genomes of a large number of soil bacteria to comprehend their genome complexity and evolution. In this study, a combination of codon usage and molecular phylogenetics of the whole genome and key housekeeping genes like infB (translation initiation factor 2), trpB (tryptophan synthase, beta subunit), atpD (ATP synthase, beta subunit), and rpoB (RNA polymerase, beta subunit) of 92 soil bacterial species spread across the entire eubacterial domain and residing in different soil types was performed. The results indicated a direct relationship of genome size with codon bias and coding frequency in the studied bacteria. The codon usage profile demonstrated by the gene trpB was found to be relatively different from the rest of the housekeeping genes, with a large number of bacteria having a greater percentage of genes with Nc values less than the Nc of trpB. The results from the overall codon usage bias profile also indicated that the codon usage bias in the key housekeeping genes of soil bacteria was mainly due to selection pressure and not mutation. The analysis of hydrophobicity of the gene product encoded by the rpoB coding sequences demonstrated tight clustering across all the soil bacteria, suggesting conservation of protein structure for maintenance of form and function. The phylogenetic affinities inferred using the 16S rRNA gene and the housekeeping genes demonstrated conflicting signals, with the trpB gene being the noisiest one. The housekeeping gene atpD was found to show the least amount of evolutionary change in the soil bacteria considered in this study, except in two Clostridium species. The phylogenetic and codon usage analysis of the soil bacteria consistently demonstrated the relatedness of Azotobacter chroococcum with different species of the genus Pseudomonas.
Affiliation(s)
- Jayanti Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, India
| | - Barnan K. Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, India
| | - Monalisha Pal Sarkar
- Mycology & Plant Pathology Laboratory, Department of Botany, Raiganj University, Raiganj, India
| | - Vivek Roy
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, India
| | - Parimal Mandal
- Mycology & Plant Pathology Laboratory, Department of Botany, Raiganj University, Raiganj, India
| | - Ayon Pal
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, India
|
14
|
Pachganov S, Murtazalieva K, Zarubin A, Sokolov D, Chartier DR, Tatarinova TV. TransPrise: a novel machine learning approach for eukaryotic promoter prediction. PeerJ 2019; 7:e7990. [PMID: 31695967 PMCID: PMC6827441 DOI: 10.7717/peerj.7990] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/09/2019] [Accepted: 10/04/2019] [Indexed: 02/01/2023] Open
Abstract
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows in which the precise location of the TSS is identified. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise classification and regression models with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. The Matthews correlation coefficient value for TransPrise is 0.79, more than two times larger than the 0.31 for TSSPlant classification models. This represents a high level of prediction accuracy. Additionally, the mean absolute error for the regression model is 29.19 nt, allowing for accurate prediction of TSS location. TransPrise was also tested in Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSS in any eukaryotic organism.
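For readers unfamiliar with the Matthews correlation coefficient reported in this abstract, the following is a minimal sketch of how the metric is computed from a binary confusion matrix. It is not taken from the TransPrise code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns a value in [-1, 1]; 0.0 is returned when the
    denominator is zero (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a TSS / non-TSS classifier (not from the paper).
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(example_mcc, 3))
```

Unlike plain accuracy, this metric uses all four cells of the confusion matrix, which may be why it is often preferred when positive and negative classes are unbalanced.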
Affiliation(s)
- Stepan Pachganov
- Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
| | - Khalimat Murtazalieva
- Vavilov Institute for General Genetics, Moscow, Russia.,Institute of Bioinformatics, Moscow, Russia
| | - Aleksei Zarubin
- Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics, Tomsk, Russia
| | | | - Duane R Chartier
- International Center for Art Intelligence, Inc., Los Angeles, CA, United States of America
| | - Tatiana V Tatarinova
- Vavilov Institute for General Genetics, Moscow, Russia.,Department of Biology, University of La Verne, La Verne, CA, United States of America.,A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia.,Siberian Federal University, Krasnoyarsk, Russia
|
15
|
Pirih N, Kunej T. An Updated Taxonomy and a Graphical Summary Tool for Optimal Classification and Comprehension of Omics Research. OMICS 2018; 22:337-353. [DOI: 10.1089/omi.2017.0186] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Affiliation(s)
- Nina Pirih
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
| | - Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
|
16
|
Paul P, Malakar AK, Chakraborty S. Codon usage vis-a-vis start and stop codon context analysis of three dicot species. J Genet 2018; 97:97-107. [PMID: 29666329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
To understand the variation in genomic composition and its effect on codon usage, we performed a comparative analysis of codon usage and nucleotide usage in the genes of three dicots, Glycine max, Arabidopsis thaliana and Medicago truncatula. The dicot genes were found to be A/T rich and to have predominantly A-ending and/or T-ending codons. GC3s directly mimic the usage pattern of global GC content. Relative synonymous codon usage analysis suggests that the high usage frequency of A/T over G/C mononucleotide-containing codons in AT-rich dicot genomes is due to compositional constraint as a factor of codon usage bias. Odds ratio analysis identified the dinucleotides TpG, TpC, GpA, CpA and CpT as over-represented, whereas CpG and TpA were under-represented. The results of the (NcExp-NcObs)/NcExp plot suggest that selection pressure other than mutation played a significant role in influencing the pattern of codon usage in these dicots. PR2 analysis revealed the significant role of selection pressure on codon usage. Analysis of variance on codon usage at start and stop sites showed variation in codon selection at these sites. This study provides evidence that the dicot genes were subjected to compositional selection pressure.
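The relative synonymous codon usage (RSCU) measure mentioned in this abstract can be sketched as follows: for each amino acid, the observed count of a codon is divided by the count expected if all synonymous codons were used equally. This is an illustrative sketch only, assuming the standard genetic code; the function name `rscu` and the toy sequence are hypothetical and do not come from the cited paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids observed in the coding sequence.

    RSCU = observed codon count * number of synonymous codons
           / total count for that amino acid.
    Stop codons and codons with ambiguous bases are ignored; a trailing
    incomplete codon is dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence: three GCT and one GCA (alanine), one AAA and one AAG (lysine).
toy = rscu("GCTGCTGCTGCAAAAAAG")
print(toy["GCT"], toy["GCA"], toy["AAA"])
```

An RSCU value above 1 indicates a codon used more often than expected under equal synonymous usage, and a value below 1 indicates one used less often.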
Affiliation(s)
- Prosenjit Paul
- Department of Biotechnology, Assam University, Silchar 788 011, India.
|
17
|
Paul P, Malakar AK, Chakraborty S. Codon usage vis-a-vis start and stop codon context analysis of three dicot species. J Genet 2018. [DOI: 10.1007/s12041-018-0892-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
|
18
|
Reynoso MA, Pauluzzi GC, Kajala K, Cabanlit S, Velasco J, Bazin J, Deal R, Sinha NR, Brady SM, Bailey-Serres J. Nuclear Transcriptomes at High Resolution Using Retooled INTACT. PLANT PHYSIOLOGY 2018; 176:270-281. [PMID: 28956755 PMCID: PMC5761756 DOI: 10.1104/pp.17.00688] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/25/2017] [Accepted: 09/26/2017] [Indexed: 05/03/2023]
Abstract
Isolated nuclei provide access to early steps in gene regulation involving chromatin as well as transcript production and processing. Here, we describe transfer of the isolation of nuclei from tagged specific cell types (INTACT) to the monocot rice (Oryza sativa L.). The purification of biotinylated nuclei was redesigned by replacing the outer nuclear-envelope-targeting domain of the nuclear tagging fusion (NTF) protein with an outer nuclear-envelope-anchored domain. This modified NTF was combined with codon-optimized Escherichia coli BirA in a single T-DNA construct. We also developed inexpensive methods for INTACT, T-DNA insertion mapping, and profiling of the complete nuclear transcriptome, including a ribosomal RNA degradation procedure that minimizes pre-ribosomal RNA (pre-rRNA) transcripts. A high-resolution comparison of nuclear and steady-state poly(A)+ transcript populations of seedling root tips confirmed the capture of pre-messenger RNA (pre-mRNA) and exposed distinctions in diversity and abundance of the nuclear and total transcriptomes. This retooled INTACT can enable high-resolution monitoring of the nuclear transcriptome and chromatin in specific cell types of rice and other species.
Affiliation(s)
- Mauricio A Reynoso
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521
| | - Germain C Pauluzzi
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521
| | - Kaisa Kajala
- Department of Plant Biology, University of California, Davis, California 95616
- Genome Center, University of California, Davis, California 95616
| | - Sean Cabanlit
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521
| | - Joel Velasco
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521
| | - Jérémie Bazin
- IPS2, Institute of Plant Science-Paris Saclay (CNRS-INRA), University of Paris-Saclay, F-911405, Orsay, France
| | - Roger Deal
- Department of Biology, Emory University, Atlanta, Georgia 30322
| | - Neelima R Sinha
- Department of Plant Biology, University of California, Davis, California 95616
| | - Siobhan M Brady
- Department of Plant Biology, University of California, Davis, California 95616
- Genome Center, University of California, Davis, California 95616
| | - Julia Bailey-Serres
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521
|
19
|
Matsumura H, Nakano Y, Ochi H, Onohara Y, Sairaku A, Tokuyama T, Tomomori S, Motoda C, Amioka M, Hironobe N, Toshishige M, Takahashi S, Imai K, Sueda T, Chayama K, Kihara Y. H558R, a common SCN5A polymorphism, modifies the clinical phenotype of Brugada syndrome by modulating DNA methylation of SCN5A promoters. J Biomed Sci 2017; 24:91. [PMID: 29202755 PMCID: PMC5713129 DOI: 10.1186/s12929-017-0397-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND A common SCN5A polymorphism H558R (c.1673 A > G, rs1805124) improves sodium channel activity in mutated channels and is known to be a genetic modifier in patients with Brugada syndrome (BrS). We investigated clinical manifestations and underlying mechanisms of H558R in BrS. METHODS AND RESULTS We genotyped H558R in 100 BrS patients (mean age 45 ± 14 years; 91 men) and 1875 controls (mean age 54 ± 18 years; 1546 men). We compared clinical parameters in BrS patients with and without H558R (H558R+ vs. H558R- group, N = 9 vs. 91). We also obtained right atrial sections from 30 patients during aortic aneurysm operations and compared SCN5A expression and methylation with or without H558R. H558R was less frequent in BrS patients than in controls (9.0% vs. 19.2%, P = 0.028). The ventricular fibrillation (VF) occurrence ratio was significantly lower (0% vs. 29.7%, P = 0.03) and spontaneous type 1 ECG was less frequently observed in the H558R+ than in the H558R- group (33.3% vs. 74.7%, P = 0.01). The SCN5A expression level was significantly higher and the methylation rate was significantly lower in sections with H558R (N = 10) than in those without (0.98 ± 0.14 vs. 0.83 ± 0.19, P = 0.04; 0.7 ± 0.2% vs. 1.6 ± 0.1%, P = 0.004, respectively). In BrS patients with heterozygous H558R, the A allele mRNA expression was 1.38-fold higher than G allele expression. CONCLUSION The SCN5A polymorphism H558R may be a modifier that protects against VF occurrence in BrS. H558R decreased SCN5A promoter methylation and increased the expression level in cardiac tissue. An allelic expression imbalance in BrS patients with a heterozygous H558R may also contribute to the protective effects in heterozygous mutations.
Affiliation(s)
- Hiroya Matsumura
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Yukiko Nakano
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
- Laboratory for Digestive Diseases, Center for Genomic Medicine, RIKEN, Hiroshima, Japan
| | - Hidenori Ochi
- Laboratory for Digestive Diseases, Center for Genomic Medicine, RIKEN, Hiroshima, Japan
- Department of Gastroenterology and Metabolism, Division of Frontier Medical Science, Programs for Biomedical Research Graduate School of Biomedical Science, Hiroshima University, Hiroshima, Japan
| | - Yuko Onohara
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Akinori Sairaku
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Takehito Tokuyama
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Shunsuke Tomomori
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Chikaaki Motoda
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Michitaka Amioka
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Naoya Hironobe
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Masaaki Toshishige
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
| | - Shinya Takahashi
- Department of Cardiovascular Surgery, Hiroshima University Hospital, Hiroshima, Japan
| | - Katsuhiko Imai
- Department of Cardiovascular Surgery, Hiroshima University Hospital, Hiroshima, Japan
| | - Taijiro Sueda
- Department of Cardiovascular Surgery, Hiroshima University Hospital, Hiroshima, Japan
| | - Kazuaki Chayama
- Laboratory for Digestive Diseases, Center for Genomic Medicine, RIKEN, Hiroshima, Japan
- Department of Gastroenterology and Metabolism, Division of Frontier Medical Science, Programs for Biomedical Research Graduate School of Biomedical Science, Hiroshima University, Hiroshima, Japan
| | - Yasuki Kihara
- Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
|
20
|
Goswami AM. Codon usage patterns of 3β-hydroxysteroid dehydrogenase type 2 gene across mammalian species and the influence of mutation and selection pressure. GENE REPORTS 2017. [DOI: 10.1016/j.genrep.2017.08.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
21
|
Mazumdar P, Binti Othman R, Mebus K, Ramakrishnan N, Ann Harikrishna J. Codon usage and codon pair patterns in non-grass monocot genomes. ANNALS OF BOTANY 2017; 120:893-909. [PMID: 29155926 PMCID: PMC5710610 DOI: 10.1093/aob/mcx112] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 01/17/2017] [Accepted: 09/19/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND AND AIMS Studies on codon usage in monocots have focused on grasses, and observed patterns of this taxon were generalized to all monocot species. Here, non-grass monocot species were analysed to investigate the differences between grass and non-grass monocots. METHODS First, studies of codon usage in monocots were reviewed. The current information was then extended regarding codon usage, as well as codon-pair context bias, using four completely sequenced non-grass monocot genomes (Musa acuminata, Musa balbisiana, Phoenix dactylifera and Spirodela polyrhiza) for which comparable transcriptome datasets are available. Measurements were taken regarding relative synonymous codon usage, effective number of codons, derived optimal codon and GC content and then the relationships investigated to infer the underlying evolutionary forces. KEY RESULTS The research identified optimal codons, rare codons and preferred codon-pair context in the non-grass monocot species studied. In contrast to the bimodal distribution of GC3 (GC content in third codon position) in grasses, non-grass monocots showed a unimodal distribution. Disproportionate use of G and C (and of A and T) in two- and four-codon amino acids detected in the analysis rules out the mutational bias hypothesis as an explanation of genomic variation in GC content. There was found to be a positive relationship between CAI (codon adaptation index; predicts the level of expression of a gene) and GC3. In addition, a strong correlation was observed between coding and genomic GC content and negative correlation of GC3 with gene length, indicating a strong impact of GC-biased gene conversion (gBGC) in shaping codon usage and nucleotide composition in non-grass monocots. CONCLUSION Optimal codons in these non-grass monocots show a preference for G/C in the third codon position. These results support the concept that codon usage and nucleotide composition in non-grass monocots are mainly driven by gBGC.
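As a small illustration of the GC3 quantity discussed in this abstract (GC content at the third codon position), the sketch below computes GC3 for a single coding sequence. It is an assumption-laden example rather than the authors' pipeline: the function name `gc3` is hypothetical, the input is assumed to be an in-frame coding sequence, and ambiguous bases are simply counted as non-GC.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame sequence.

    A trailing incomplete codon is ignored. Raises ValueError when the
    sequence contains no complete codon.
    """
    cds = cds.upper()
    # Index 3k + 2 exists only when codon k is complete, so this slice
    # picks exactly the third base of every complete codon.
    third_positions = cds[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Toy sequence ATG GCC AAA TTC: third positions are G, C, A, C.
print(gc3("ATGGCCAAATTC"))
```

Computing this value per gene and plotting a histogram across a genome is one way the unimodal versus bimodal GC3 distributions described above could be examined.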
Affiliation(s)
- Purabi Mazumdar
- Centre for Research in Biotechnology for Agriculture, University of Malaya, Kuala Lumpur, Malaysia
| | - RofinaYasmin Binti Othman
- Centre for Research in Biotechnology for Agriculture, University of Malaya, Kuala Lumpur, Malaysia
- Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
| | - Katharina Mebus
- Centre for Research in Biotechnology for Agriculture, University of Malaya, Kuala Lumpur, Malaysia
| | - N Ramakrishnan
- Electrical and Computer System Engineering, School of Engineering, Monash University Malaysia, Bandar Sunway, Malaysia
| | - Jennifer Ann Harikrishna
- Centre for Research in Biotechnology for Agriculture, University of Malaya, Kuala Lumpur, Malaysia
- Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
|
22
|
Nakayama TJ, Rodrigues FA, Neumaier N, Marcolino-Gomes J, Molinari HBC, Santiago TR, Formighieri EF, Basso MF, Farias JRB, Emygdio BM, de Oliveira ACB, Campos ÂD, Borém A, Harmon FG, Mertz-Henning LM, Nepomuceno AL. Insights into soybean transcriptome reconfiguration under hypoxic stress: Functional, regulatory, structural, and compositional characterization. PLoS One 2017; 12:e0187920. [PMID: 29145496 PMCID: PMC5690659 DOI: 10.1371/journal.pone.0187920] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/15/2017] [Accepted: 10/27/2017] [Indexed: 11/19/2022] Open
Abstract
Soybean (Glycine max) is one of the major crops worldwide, and flooding stress affects the production and expansion of cultivated areas. Oxygen is essential for mitochondrial aerobic respiration to supply the energy demand of plant cells. Because oxygen diffusion in water is 10,000 times lower than in air, partial (hypoxic) or total (anoxic) oxygen deficiency is an important component of flooding. Even when oxygen is externally available, oxygen deficiency frequently occurs in bulky, dense or metabolically active tissues such as phloem, meristems, seeds, and fruits. In this study, we analyzed conserved and divergent root transcriptional responses between flood-tolerant Embrapa 45 and flood-sensitive BR 4 soybean cultivars under hypoxic stress conditions with RNA-seq. To understand how soybean genes evolve and respond to hypoxia, stable and differentially expressed genes were characterized structurally and compositionally, and their mechanistic relationships were compared. Between cultivars, Embrapa 45 showed fewer up-regulated and more down-regulated genes, and stronger induction of phosphoglucomutase (Glyma05g34790), unknown protein related to N-terminal protein myristoylation (Glyma06g03430), protein suppressor of phyA-105 (Glyma06g37080), and fibrillin (Glyma10g32620). RNA-seq and qRT-PCR analysis of non-symbiotic hemoglobin (Glyma11g12980) indicated divergence in gene structure between cultivars. Transcriptional changes for genes in amino acids and derivative metabolic process suggest involvement of amino acid metabolism in tRNA modifications, translation accuracy/efficiency, and endoplasmic reticulum stress in both cultivars under hypoxia. Gene groups differed in the frequency of promoter TATA box, ABREs (ABA-responsive elements), and CRT/DREs (C-repeat/dehydration-responsive elements). Gene groups also differed in structure, composition, and codon usage, indicating biological significance. Additional data suggest that cis-acting ABRE elements can mediate gene expression independent of ABA in soybean roots under hypoxia.
Affiliation(s)
- Thiago J. Nakayama
- Departamento de Fitotecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Fabiana A. Rodrigues
- Embrapa Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, Paraná, Brazil
| | - Norman Neumaier
- Embrapa Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, Paraná, Brazil
| | | | - Hugo B. C. Molinari
- Embrapa Agroenergia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Distrito Federal, Brazil
| | - Thaís R. Santiago
- Embrapa Agroenergia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Distrito Federal, Brazil
| | - Eduardo F. Formighieri
- Embrapa Agroenergia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Distrito Federal, Brazil
| | - Marcos F. Basso
- Embrapa Agroenergia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Distrito Federal, Brazil
| | - José R. B. Farias
- Embrapa Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, Paraná, Brazil
| | - Beatriz M. Emygdio
- Embrapa Clima Temperado, Empresa Brasileira de Pesquisa Agropecuária, Pelotas, Rio Grande do Sul, Brazil
| | - Ana C. B. de Oliveira
- Embrapa Clima Temperado, Empresa Brasileira de Pesquisa Agropecuária, Pelotas, Rio Grande do Sul, Brazil
| | - Ângela D. Campos
- Embrapa Clima Temperado, Empresa Brasileira de Pesquisa Agropecuária, Pelotas, Rio Grande do Sul, Brazil
| | - Aluízio Borém
- Departamento de Fitotecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Frank G. Harmon
- Department of Plant and Microbial Biology, University of California-Berkeley, Berkeley, California, United States of America
|
23
|
Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017; 12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023] Open
Abstract
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that probabilistic integrative algorithms-driven models allow accurate classification of DNA sequence into “promoters” and “non-promoters” even in absence of the full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 “promoter-specific” transcription factors), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 of the “promiscuous” transcription factors with little or no location preference with respect to TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots.
Affiliation(s)
- Martin Triska
- Children’s Hospital Los Angeles, University of Southern California, Los Angeles, CA, United States of America
- Faculty of Advanced Technology, University of South Wales, Pontypridd, Wales, United Kingdom
| | | | - Ancha Baranova
- School of Systems Biology, George Mason University, Fairfax, VA, United States of America
- Research Centre for Medical Genetics, Moscow, Russia
| | - Alexander Kel
- geneXplain GmbH, Wolfenbuettel, Germany
- Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
| | - Tatiana V. Tatarinova
- School of Systems Biology, George Mason University, Fairfax, VA, United States of America
- Department of Biology, Division of Natural Sciences, University of La Verne, La Verne, CA, United States of America
- Bioinformatics Center, AA Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Vavilov Institute for General Genetics, Moscow, Russia
|
24
|
Niu Z, Xue Q, Wang H, Xie X, Zhu S, Liu W, Ding X. Mutational Biases and GC-Biased Gene Conversion Affect GC Content in the Plastomes of Dendrobium Genus. Int J Mol Sci 2017; 18:E2307. [PMID: 29099062 PMCID: PMC5713276 DOI: 10.3390/ijms18112307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 08/29/2017] [Revised: 09/27/2017] [Accepted: 10/20/2017] [Indexed: 01/03/2023] Open
Abstract
The variation of GC content is a key genome feature because it is associated with fundamental elements of genome organization. However, the reason for this variation is still an open question. Several hypotheses have been proposed to explain the variation of GC content during genome evolution, but they have not been explicitly investigated in whole plastome sequences. Dendrobium is one of the largest genera in the orchid family. Evolutionary studies of plastomic organization and base composition are limited in this genus. In this study, we obtained high-quality plastome sequences of D. loddigesii and D. devonianum. The comparison showed a nearly identical organization in Dendrobium plastomes, indicating that plastomic organization is highly conserved in the genus Dendrobium. Furthermore, the impact of three evolutionary forces, namely selection, mutational biases, and GC-biased gene conversion (gBGC), on the variation of GC content in Dendrobium plastomes was evaluated. Our results revealed: (1) consistent GC content evolution trends and mutational biases in single-copy (SC) and inverted repeat (IR) regions; and (2) that gBGC has influenced plastome-wide GC content evolution. These results suggest that both mutational biases and gBGC affect GC content in the plastomes of the genus Dendrobium.
Affiliation(s)
- Zhitao Niu
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Qingyun Xue
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Hui Wang
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Xuezhu Xie
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Shuying Zhu
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Wei Liu
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
- Xiaoyu Ding
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
25
Sidorenko LV, Lee TF, Woosley A, Moskal WA, Bevan SA, Merlo PAO, Walsh TA, Wang X, Weaver S, Glancy TP, Wang P, Yang X, Sriram S, Meyers BC. GC-rich coding sequences reduce transposon-like, small RNA-mediated transgene silencing. Nature Plants 2017; 3:875-884. [PMID: 29085072 DOI: 10.1038/s41477-017-0040-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/24/2015] [Accepted: 09/29/2017] [Indexed: 05/04/2023]
Abstract
The molecular basis of transgene susceptibility to silencing is poorly characterized in plants; thus, we evaluated several transgene design parameters as means to reduce heritable transgene silencing. Analyses of Arabidopsis plants with transgenes encoding a microalgal polyunsaturated fatty acid (PUFA) synthase revealed that small RNA (sRNA)-mediated silencing, combined with the use of repetitive regulatory elements, led to aggressive transposon-like silencing of canola-biased PUFA synthase transgenes. Diversifying regulatory sequences and using native microalgal coding sequences (CDSs) with higher GC content improved transgene expression and resulted in a remarkable trans-generational stability via reduced accumulation of sRNAs and DNA methylation. Further experiments in maize with transgenes individually expressing three crystal (Cry) proteins from Bacillus thuringiensis (Bt) tested the impact of CDS recoding using different codon bias tables. Transgenes with higher GC content exhibited increased transcript and protein accumulation. These results demonstrate that the sequence composition of transgene CDSs can directly impact silencing, providing design strategies for increasing transgene expression levels and reducing risks of heritable loss of transgene expression.
Affiliation(s)
- Tzuu-Fen Lee
- Delaware Biotechnology Institute, Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
- Dupont Pioneer, Johnston, IA, 50131, USA
- Aaron Woosley
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- William A Moskal
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Scott A Bevan
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- P Ann Owens Merlo
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Terence A Walsh
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Xiujuan Wang
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Staci Weaver
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Todd P Glancy
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- PoHao Wang
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Xiaozeng Yang
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China
- Shreedharan Sriram
- Dow AgroSciences LLC., 9330 Zionsville Road, Indianapolis, IN, 46268, USA
- Blake C Meyers
- Delaware Biotechnology Institute, Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
- Donald Danforth Plant Science Center, St. Louis, MO, 63132, USA
26
Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017; 12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022]
Abstract
Background: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Results: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
Conclusions: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. Reviewers: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-017-0191-4) contains supplementary material, which is available to authorized users.
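The GC3 measure defined in the abstract above (fraction of cytosine and guanine in the third position of a codon) can be illustrated with a minimal sketch. The function names `gc3` and `is_gc3_rich` are hypothetical and not taken from the paper; the sketch assumes an in-frame coding sequence given as a plain string, and it only reuses the GC3-rich cutoff (0.75286) quoted in the abstract.

```python
GC3_RICH_THRESHOLD = 0.75286  # cutoff quoted in the abstract for GC3-rich genes


def gc3(cds: str) -> float:
    """Return the fraction of G or C at third codon positions.

    Assumes `cds` is an in-frame coding sequence; a trailing
    partial codon, if any, is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # length covered by complete codons
    thirds = seq[2:usable:3]           # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def is_gc3_rich(cds: str, threshold: float = GC3_RICH_THRESHOLD) -> bool:
    """Hypothetical helper: flag a gene as GC3-rich using the quoted cutoff."""
    return gc3(cds) >= threshold
```

For example, `gc3("ATGGCCAAATAG")` examines the third bases G, C, A and G of the four codons and returns 0.75, which falls just below the quoted cutoff.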
Affiliation(s)
- Kuang-Lim Chan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Tatiana V Tatarinova
- Department of Biology, University of La Verne, La Verne, California, 91750, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Rozana Rosli
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Nadzirah Amiruddin
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Norazah Azizi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Mohd Amin Ab Halim
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nik Shazana Nik Mohd Sanusi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nagappan Jayanthi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Petr Ponomarenko
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Martin Triska
- Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
- Victor Solovyev
- Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
- Mohd Firdaus-Raih
- Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Ravigadevi Sambanthamurthi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Denis Murphy
- Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Eng-Ti Leslie Low
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
27
Evolution of Brain Active Gene Promoters in Human Lineage Towards the Increased Plasticity of Gene Regulation. Mol Neurobiol 2017; 55:1871-1904. [PMID: 28233272 DOI: 10.1007/s12035-017-0427-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/10/2016] [Accepted: 01/26/2017] [Indexed: 01/31/2023]
Abstract
Adaptability to a variety of environmental conditions is a prominent feature of Homo sapiens. We hypothesize that this feature can be explained by evolutionary changes in gene promoters active in the brain prefrontal cortex, leading to a more flexible gene regulation network. The genotype-dependent range of gene expression can be broader in humans than in other higher primates. Thus, we searched for specific signatures of evolutionary changes in promoter architectures of multiple hominid genes, including the genes active in human cortical neurons, that may indicate an increase in the variability of gene expression rather than just changes in the level of expression, such as downregulation or upregulation of the genes. We performed a whole-genome search for genetic alterations that may impact gene regulation "flexibility" in the course of hominid evolution, such as (i) CpG dinucleotide content, (ii) predicted nucleosome-DNA dissociation constant, and (iii) predicted affinities for TATA-binding protein (TBP) in gene promoters. We tested all putative promoter regions across the human genome, and especially gene promoters in an active chromatin state in neurons of the prefrontal cortex, the brain region critical for abstract thinking and social and behavioral adaptation. Our data imply that the origin of modern man has been associated with an increase in the flexibility of promoter-driven gene regulation in the brain. In contrast, after splitting from the ancestral lineages of H. sapiens, the evolution of ape species is characterized by reduced flexibility of gene promoter functioning, underlying reduced variability of gene expression.
28
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022]
Abstract
We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, the area rich with regulatory regions.
Affiliation(s)
- Tatiana V Tatarinova
- Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
- Vavilov Institute of General Genetics, Moscow, Russia; F1 Genomics, San Diego, CA, USA; School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
29
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: why are STRs causing repeat expansion diseases maintained in the human population, and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, apoptosis and neurodevelopment were biological processes significantly enriched specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
Affiliation(s)
- Makoto K Shimada
- Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan; National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Ryoko Sanbonmatsu
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yumi Yamaguchi-Kabata
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
- Chisato Yamasaki
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
- Ranajit Chakraborty
- Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
- Takashi Gojobori
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tadashi Imanishi
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
30
McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, dePamphilis CW, Givnish TJ, Pires JC, Stevenson DW, Leebens-Mack JH. A Phylogenomic Assessment of Ancient Polyploidy and Genome Evolution across the Poales. Genome Biol Evol 2016; 8:1150-64. [PMID: 26988252 PMCID: PMC4860692 DOI: 10.1093/gbe/evw060] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 12/17/2022]
Abstract
Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification.
Affiliation(s)
- Michael R McKain
- Donald Danforth Plant Science Center, St. Louis, Missouri; Department of Plant Biology, University of Georgia
- Haibao Tang
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China; School of Plant Sciences, iPlant Collaborative, University of Arizona
- Joel R McNeal
- Department of Ecology, Evolution, and Organismal Biology, Kennesaw State University; Department of Plant Biology, University of Georgia
- Jerrold I Davis
- L. H. Bailey Hortorium and Department of Plant Biology, Cornell University
- Claude W dePamphilis
- Department of Biology and Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park, Pennsylvania
- J Chris Pires
- Division of Biological Sciences, University of Missouri, Columbia
31
Tatarinova TV, Lysnyansky I, Nikolsky YV, Bolshoy A. The mysterious orphans of Mycoplasmataceae. Biol Direct 2016; 11:2. [PMID: 26747447 PMCID: PMC4706650 DOI: 10.1186/s13062-015-0104-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/23/2015] [Accepted: 12/30/2015] [Indexed: 01/08/2023]
Abstract
Background: The length of a protein sequence is largely determined by its function. In certain species, it may also be affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length differ between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions. Results: We compared the distribution of lengths of “having homologs proteins” (HHPs) and “non-having homologs proteins” (orphans or ORFans) in all currently completely sequenced and COG-annotated prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are indeed, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes. Conclusions: We confirmed on a comprehensive set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths.
We offer several possible biological explanations of this phenomenon, such as an adaptation to Mycoplasmataceae’s ecological niche, specifically its “quiet” co-existence with host organisms, resulting in long ABC transporters. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0104-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tatiana V Tatarinova
- Children's Hospital Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, 90027, CA, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, 90089, CA, USA
- Inna Lysnyansky
- Mycoplasma Unit, Division of Avian and Aquatic Diseases, Kimron Veterinary Institute, POB 12, Beit Dagan, 50250, Israel
- Yuri V Nikolsky
- School of Systems Biology, George Mason University, 10900 University Blvd, MSN 5B3, Manassas, VA, 20110, USA; Prosapia Genetics, LLC, 534 San Andres Dr., Solana Beach, CA, 92075, USA; Vavilov Institute of General Genetics, Moscow, Russian Federation
- Alexander Bolshoy
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Haifa, Israel
32
Lengths of Orthologous Prokaryotic Proteins Are Affected by Evolutionary Factors. BioMed Research International 2015; 2015:786861. [PMID: 26114113 PMCID: PMC4465819 DOI: 10.1155/2015/786861] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/08/2014] [Accepted: 11/02/2014] [Indexed: 12/16/2022]
Abstract
Proteins of the same functional family (for example, kinases) may have significantly different lengths. It is an open question whether such variation in length is random or arises as a response to unknown evolutionary driving factors. The main purpose of this paper is to demonstrate the existence of factors affecting prokaryotic gene lengths. We believe that ranking genomes according to the lengths of their genes, followed by calculating coefficients of association between genome rank and genome property, is a reasonable approach to revealing such evolutionary driving factors. As we demonstrated earlier, our chosen approach, Bubble Sort, combines stability, accuracy, and computational efficiency as compared to other ranking methods. Application of Bubble Sort to a set of 1390 prokaryotic genomes confirmed that genes of Archaeal species are generally shorter than Bacterial ones. We observed that gene lengths are affected by various factors: within each domain, different phyla have preferences for short or long genes; thermophiles tend to have shorter genes than soil-dwellers; halophiles tend to have longer genes. We also found that species with overrepresentation of cytosines and guanines in the third position of the codon (GC3 content) tend to have longer genes than species with low GC3 content.
33
Shen W, Wang D, Ye B, Shi M, Ma L, Zhang Y, Zhao Z. GC3-biased gene domains in mammalian genomes. Bioinformatics 2015; 31:3081-4. [PMID: 26019240 PMCID: PMC4576692 DOI: 10.1093/bioinformatics/btv329] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/02/2015] [Accepted: 05/19/2015] [Indexed: 01/17/2023]
Abstract
Motivation: Synonymous codon usage bias has been shown to be correlated with many genomic features among different organisms. However, the biological significance of codon bias with respect to gene function and genome organization remains unclear. Results: Guanine and cytosine content at the third codon position (GC3) could be used as a good indicator of codon bias. Here, we used relative GC3 bias values to compare the strength of GC3 bias of genes in human and mouse. We reported, for the first time, that GC3-rich and GC3-poor gene products might have distinct sub-cellular spatial distributions. Moreover, we extended the view of genomic gene domains and identified conserved GC3-biased gene domains along chromosomes. Our results indicated that similar GC3-biased genes might be co-translated in specific spatial regions to share local translational machineries, and that GC3 could be involved in the organization of genome architecture. Availability and implementation: Source code is available upon request from the authors. Contact: zhaozh@nic.bmi.ac.cn or zany1983@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenlong Shen
- Beijing Institute of Biotechnology, Beijing 100071, China
- Dong Wang
- Beijing Institute of Biotechnology, Beijing 100071, China
- Bingyu Ye
- Beijing Institute of Biotechnology, Beijing 100071, China; College of Life Sciences, Capital Normal University, Beijing 100048, China
- Minglei Shi
- Beijing Institute of Biotechnology, Beijing 100071, China
- Lei Ma
- College of Life Sciences, Shihezi University, Shihezi 832003, China
- Yan Zhang
- Beijing Institute of Biotechnology, Beijing 100071, China
- Zhihu Zhao
- Beijing Institute of Biotechnology, Beijing 100071, China
34
Faucillion ML, Larsson J. Increased expression of X-linked genes in mammals is associated with a higher stability of transcripts and an increased ribosome density. Genome Biol Evol 2015; 7:1039-52. [PMID: 25786432 PMCID: PMC4419800 DOI: 10.1093/gbe/evv054] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Mammalian sex chromosomes evolved from the degeneration of one homolog of a pair of ancestral autosomes, the proto-Y. This resulted in a gene dose imbalance that is believed to be restored (partially or fully) through upregulation of gene expression from the single active X-chromosome in both sexes by a dosage compensatory mechanism. We analyzed multiple genome-wide RNA stability data sets and found significantly longer average half-lives for X-chromosome transcripts than for autosomal transcripts in various human cell lines, both male and female, and in mice. Analysis of ribosome profiling data shows that ribosome density is higher on X-chromosome transcripts than on autosomal transcripts in both humans and mice, suggesting that the higher stability is causally linked to a higher translation rate. Our results and observations are in accordance with a dosage compensatory upregulation of expressed X-linked genes. We therefore propose that differential mRNA stability and translation rates of the autosomes and sex chromosomes contribute to an evolutionarily conserved dosage compensation mechanism in mammals.
Affiliation(s)
- Jan Larsson
- Department of Molecular Biology, Umeå University, Sweden
|
35
|
Clément Y, Fustier MA, Nabholz B, Glémin S. The bimodal distribution of genic GC content is ancestral to monocot species. Genome Biol Evol 2014; 7:336-48. [PMID: 25527839 PMCID: PMC4316631 DOI: 10.1093/gbe/evu278] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/27/2022]
Abstract
In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots.
Affiliation(s)
- Yves Clément
- Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Benoit Nabholz
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
|
36
|
Jiang N, Wang L, Chen J, Wang L, Leach L, Luo Z. Conserved and divergent patterns of DNA methylation in higher vertebrates. Genome Biol Evol 2014; 6:2998-3014. [PMID: 25355807 PMCID: PMC4255770 DOI: 10.1093/gbe/evu238] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Accepted: 10/20/2014] [Indexed: 02/07/2023]
Abstract
DNA methylation in the genome plays a fundamental role in the regulation of gene expression and is widespread in the genome of eukaryotic species. For example, in higher vertebrates, there is a "global" methylation pattern involving complete methylation of CpG sites genome-wide, except in promoter regions that are typically enriched for CpG dinucleotides, or so called "CpG islands." Here, we comprehensively examined and compared the distribution of CpG sites within ten model eukaryotic species and linked the observed patterns to the role of DNA methylation in controlling gene transcription. The analysis revealed two distinct but conserved methylation patterns for gene promoters in human and mouse genomes, involving genes with distinct distributions of promoter CpGs and gene expression patterns. Comparative analysis with four other higher vertebrates revealed that the primary regulatory role of the DNA methylation system is highly conserved in higher vertebrates.
Affiliation(s)
- Ning Jiang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China; School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Lin Wang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Jing Chen
- School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Luwen Wang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Lindsey Leach
- School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Zewei Luo
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China; School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
|
37
|
Babbitt GA, Alawad MA, Schulze KV, Hudson AO. Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid. Nucleic Acids Res 2014; 42:10915-26. [PMID: 25200075 PMCID: PMC4176184 DOI: 10.1093/nar/gku811] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Abstract
While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content; suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an 'accessory' during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
Affiliation(s)
- Gregory A Babbitt
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
- Mohammed A Alawad
- B. Thomas Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
- Katharina V Schulze
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston TX, USA 77030
- André O Hudson
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester NY, USA 14623
|
38
|
Glémin S, Clément Y, David J, Ressayre A. GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis. Trends Genet 2014; 30:263-70. [PMID: 24916172 DOI: 10.1016/j.tig.2014.05.002] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 03/18/2014] [Revised: 05/09/2014] [Accepted: 05/13/2014] [Indexed: 01/06/2023]
Abstract
In angiosperms (as in other species), GC content varies along and between genes, within a genome, and between genomes of different species, but the reason for this distribution is still an open question. Grass genomes are particularly intriguing because they exhibit a strong bimodal distribution of genic GC content and a sharp 5'-3' decreasing GC content gradient along most genes. Here, we propose a unifying model to explain the main patterns of GC content variation at the gene and genome scale. We argue that GC content patterns could be mainly determined by the interactions between gene structure, recombination patterns, and GC-biased gene conversion. Recent studies on fine-scale recombination maps in angiosperms support this hypothesis and previous results also fit this model. We propose that our model could be used as a null hypothesis to search for additional forces that affect GC content in angiosperms.
Affiliation(s)
- Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France.
- Yves Clément
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France; Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
- Jacques David
- Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
- Adrienne Ressayre
- INRA, UMR de Génétique Végétale, INRA/CNRS/Univ Paris-Sud/AgroParistech, Ferme du Moulon, F-91190 Gif sur Yvette, France
|
39
|
Abstract
In this paper we present NPEST, a novel tool for the analysis of expressed sequence tags (EST) distributions and transcription start site (TSS) prediction. This method estimates an unknown probability distribution of ESTs using a maximum likelihood (ML) approach, which is then used to predict positions of TSS. Accurate identification of TSS is an important genomics task, since the position of regulatory elements with respect to the TSS can have large effects on gene regulation, and performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach expands recognition capabilities to multiple TSS per locus that may be a useful tool to enhance the understanding of alternative splicing mechanisms. This paper presents analysis of simulated data as well as statistical analysis of promoter regions of a model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16520 loci and developed a database of TSS, which is now publicly available at www.glacombio.net/NPEST.
|
40
|
Elhaik E, Pellegrini M, Tatarinova TV. Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa. BMC Bioinformatics 2014; 15:23. [PMID: 24447369 PMCID: PMC3903047 DOI: 10.1186/1471-2105-15-23] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/25/2013] [Accepted: 12/26/2013] [Indexed: 12/21/2022]
Abstract
Background: The methylation of cytosines at CpG dinucleotides, which plays an important role in gene expression regulation, is one of the most studied epigenetic modifications. Thus far, the detection of DNA methylation has been determined mostly by experimental methods, which are not only prone to bench effects and artifacts but are also time-consuming, expensive, and cannot be easily scaled up to many samples. It is therefore useful to develop computational prediction methods for DNA methylation. Our previous studies highlighted the existence of correlations between the GC content of the third codon position (GC3), methylation, and gene expression. We thus designed a model to predict methylation in Oryza sativa based on genomic sequence features and gene expression data.
Results: We first derive equations to describe the relationship between gene methylation levels, GC3, expression, length, and other gene compositional features. We next assess gene compositional features involving sixmers and their association with methylation levels and other gene level properties. By applying our sixmer-based approach on rice gene expression data we show that it can accurately predict methylation (Pearson’s correlation coefficient r = 0.79) for the majority (79%) of the genes. Matlab code with our model is included.
Conclusions: Gene expression variation can be used as predictors of gene methylation levels.
Affiliation(s)
- Tatiana V Tatarinova
- Children's Hospital Los Angeles, Keck School of Medicine, University of Southern California, 4650 Sunset Blvd, Los Angeles, CA 90027, USA.
|
41
|
DNA methylation, epigenetics, and evolution in vertebrates: facts and challenges. INTERNATIONAL JOURNAL OF EVOLUTIONARY BIOLOGY 2014; 2014:475981. [PMID: 24551476 PMCID: PMC3914449 DOI: 10.1155/2014/475981] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 07/31/2013] [Revised: 11/11/2013] [Accepted: 11/23/2013] [Indexed: 12/22/2022]
Abstract
DNA methylation is a key epigenetic modification in vertebrate genomes, known to be involved in biological processes such as regulation of gene expression, DNA structure and control of transposable elements. Despite increasing knowledge about DNA methylation, we still lack a complete understanding of its specific functions and correlation with environment and gene expression in diverse organisms. To understand how global DNA methylation levels changed under environmental influence during vertebrate evolution, we analyzed its distribution pattern along the whole genome in mammals, reptiles and fishes, showing that it is correlated with temperature, independently of phylogenetic inheritance. Other studies in mammals and plants have shown that environmental stimuli can promote epigenetic changes that, in turn, might generate localized changes in DNA sequence resulting in phenotypic effects. All these observations suggest that environment can affect the epigenome of vertebrates by generating hugely different methylation patterns that could, possibly, be reflected in phenotypic differences. We are at the first steps towards understanding the mechanisms that underlie the role of environment in molding the entire genome over evolutionary times. The next challenge will be to map similarities and differences of DNA methylation in vertebrates and to associate them with environmental adaptation and evolution.
|