1
Chauhan S, Ahmad S. Enabling full‐length evolutionary profiles based deep convolutional neural network for predicting DNA‐binding proteins from sequence. Proteins 2019; 88:15-30. [DOI: 10.1002/prot.25763] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/10/2019] [Revised: 06/01/2019] [Accepted: 06/15/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Sucheta Chauhan
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
2
Casola C, Betrán E. The Genomic Impact of Gene Retrocopies: What Have We Learned from Comparative Genomics, Population Genomics, and Transcriptomic Analyses? Genome Biol Evol 2017; 9:1351-1373. [PMID: 28605529 PMCID: PMC5470649 DOI: 10.1093/gbe/evx081] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Accepted: 05/18/2017] [Indexed: 02/07/2023] Open
Abstract
Gene duplication is a major driver of organismal evolution. Gene retroposition is a mechanism of gene duplication whereby a gene's transcript is used as a template to generate retroposed gene copies, or retrocopies. Intriguingly, the formation of retrocopies depends upon the enzymatic machinery encoded by retrotransposable elements, genomic parasites occurring in the majority of eukaryotes. Most retrocopies are depleted of the regulatory regions found upstream of their parental genes; therefore, they were initially considered transcriptionally incompetent gene copies, or retropseudogenes. However, examples of functional retrocopies, or retrogenes, have accumulated since the 1980s. Here, we review what we have learned about retrocopies in animals, plants and other eukaryotic organisms, with a particular emphasis on comparative and population genomic analyses complemented with transcriptomic datasets. In addition, these data have provided information about the dynamics of the different "life cycle" stages of retrocopies (i.e., polymorphic retrocopy number variants, fixed retropseudogenes and retrogenes) and have provided key insights into the retroduplication mechanisms, the patterns and evolutionary forces at work during the fixation process and the biological function of retrogenes. Functional genomic and transcriptomic data have also revealed that many retropseudogenes are transcriptionally active and a biological role has been experimentally determined for many. Finally, we have learned that not only non-long terminal repeat retroelements but also long terminal repeat retroelements play a role in the emergence of retrocopies across eukaryotes. This body of work has shown that mRNA-mediated duplication represents a widespread phenomenon that produces an array of new genes that contribute to organismal diversity and adaptation.
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University, TX
- Esther Betrán
- Department of Biology, University of Texas at Arlington, Arlington, TX
3
Carelli FN, Hayakawa T, Go Y, Imai H, Warnefors M, Kaessmann H. The life history of retrocopies illuminates the evolution of new mammalian genes. Genome Res 2016; 26:301-14. [PMID: 26728716 PMCID: PMC4772013 DOI: 10.1101/gr.198473.115] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 08/18/2015] [Accepted: 12/21/2015] [Indexed: 02/03/2023]
Abstract
New genes contribute substantially to adaptive evolutionary innovation, but the functional evolution of new mammalian genes has been little explored at a broad scale. Previous work established mRNA-derived gene duplicates, known as retrocopies, as models for the study of new gene origination. Here we combine mammalian transcriptomic and epigenomic data to unveil the processes underlying the evolution of stripped-down retrocopies into complex new genes. We show that although some robustly expressed retrocopies are transcribed from preexisting promoters, most evolved new promoters from scratch or recruited proto-promoters in their genomic vicinity. In particular, many retrocopy promoters emerged from ancestral enhancers (or bivalent regulatory elements) or are located in CpG islands not associated with other genes. We detected 88–280 selectively preserved retrocopies per mammalian species, illustrating that these mechanisms facilitated the birth of many functional retrogenes during mammalian evolution. The regulatory evolution of originally monoexonic retrocopies was frequently accompanied by exon gain, which facilitated co-option of distant promoters and allowed expression of alternative isoforms. While young retrogenes are often initially expressed in the testis, increased regulatory and structural complexities allowed retrogenes to functionally diversify and evolve somatic organ functions, sometimes as complex as those of their parents. Thus, some retrogenes evolved the capacity to temporarily substitute for their parents during the process of male meiotic X inactivation, while others rendered parental functions superfluous, allowing for parental gene loss. Overall, our reconstruction of the “life history” of mammalian retrogenes highlights retroposition as a general model for understanding new gene birth and functional evolution.
Affiliation(s)
- Francesco Nicola Carelli
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Takashi Hayakawa
- Department of Wildlife Science (Nagoya Railroad Company, Limited), Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan; Japan Monkey Center, Inuyama, Aichi 484-0081, Japan
- Yasuhiro Go
- Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan; Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan; Department of Physiological Sciences, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi 484-8585, Japan
- Hiroo Imai
- Department of Cellular and Molecular Biology, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan
- Maria Warnefors
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Henrik Kaessmann
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
4
Wu CS, Yu CY, Chuang CY, Hsiao M, Kao CF, Kuo HC, Chuang TJ. Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency. Genome Res 2013; 24:25-36. [PMID: 24131564 PMCID: PMC3875859 DOI: 10.1101/gr.159483.113] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Indexed: 01/09/2023]
Abstract
Trans-splicing is a post-transcriptional event that joins exons from separate pre-mRNAs. Detection of trans-splicing is usually severely hampered by experimental artifacts and genetic rearrangements. Here, we develop a new computational pipeline, TSscan, which integrates different types of high-throughput long-/short-read transcriptome sequencing of different human embryonic stem cell (hESC) lines to effectively minimize false positives while detecting trans-splicing. Combining TSscan screening with multiple experimental validation steps revealed that most chimeric RNA products were platform-dependent experimental artifacts of RNA sequencing. We successfully identified and confirmed four trans-spliced RNAs, including the first reported trans-spliced large intergenic noncoding RNA (“tsRMST”). We showed that these trans-spliced RNAs were all highly expressed in human pluripotent stem cells and differentially expressed during hESC differentiation. Our results further indicated that tsRMST can contribute to pluripotency maintenance of hESCs by suppressing lineage-specific gene expression through the recruitment of NANOG and the PRC2 complex factor, SUZ12. Taken together, our findings provide important insights into the role of trans-splicing in pluripotency maintenance of hESCs and help to facilitate future studies into trans-splicing, opening up this important but understudied class of post-transcriptional events for comprehensive characterization.
Affiliation(s)
- Chan-Shuo Wu
- Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
5
Chen FC, Chen YZ, Chuang TJ. CNVVdb: a database of copy number variations across vertebrate genomes. Bioinformatics 2009; 25:1419-21. [PMID: 19321736 PMCID: PMC2682513 DOI: 10.1093/bioinformatics/btp166] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022] Open
Abstract
SUMMARY: CNVVdb is a web interface for identification of putative copy number variations (CNVs) among 16 vertebrate species using the-same-species self-alignments and cross-species pairwise alignments. By querying genomic coordinates in the target species, all the potential paralogous/orthologous regions that overlap ≥ 80-100% (adjustable) of the query sequences with user-specified sequence identity (≥ 60% to ≥ 90%) are returned. Additional information is also given for the genes that are included in the returned regions, including gene description, alternatively spliced transcripts, gene ontology descriptions and other biologically important information. CNVVdb also provides information on pseudogenes and single nucleotide polymorphisms (SNPs) for the CNV-related genomic regions. Moreover, multiple sequence alignments of shared CNVs across species are also provided. With the combination of CNV, SNP, pseudogene and functional information, CNVVdb can be very useful for comparative and functional studies in vertebrates. AVAILABILITY: CNVVdb is freely accessible at http://CNVVdb.genomics.sinica.edu.tw.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli County 350, Taipei, Taiwan
6
Abstract
Gene copies that stem from the mRNAs of parental source genes have long been viewed as evolutionary dead-ends with little biological relevance. Here we review a range of recent studies that have unveiled a significant number of functional retroposed gene copies in both mammalian and some non-mammalian genomes. These studies have not only revealed previously unknown mechanisms for the emergence of new genes and their functions but have also provided fascinating general insights into molecular and evolutionary processes that have shaped genomes. For example, analyses of chromosomal gene movement patterns via RNA-based gene duplication have shed fresh light on the evolutionary origin and biology of our sex chromosomes.