1
Dopkins N, Nixon DF. Activation of human endogenous retroviruses and its physiological consequences. Nat Rev Mol Cell Biol 2024; 25:212-222. [PMID: 37872387 DOI: 10.1038/s41580-023-00674-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 09/27/2023] [Indexed: 10/25/2023]
Abstract
Human endogenous retroviruses (HERVs) are abundant sequences that persist within the human genome as remnants of ancient retroviral infections. These sequences became fixed and accumulate mutations or deletions over time. HERVs have affected human evolution and physiology by providing a unique repertoire of coding and non-coding sequences to the genome. In healthy individuals, HERVs participate in immune responses, formation of syncytiotrophoblasts and cell-fate specification. In this Review, we discuss how endogenized retroviral motifs and regulatory sequences have been co-opted into human physiology and how they are tightly regulated. Infections and mutations can derail this regulation, leading to differential HERV expression, which may contribute to pathologies including neurodegeneration, pathological inflammation and oncogenesis. Emerging evidence demonstrates that HERVs are crucial to human health and represent an understudied facet of many diseases, and we therefore argue that investigating their fundamental properties could improve existing therapies and help develop novel therapeutic strategies.
Affiliation(s)
- Nicholas Dopkins
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, NY, USA.
- Douglas F Nixon
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, NY, USA.
2
Lozano-Iturbe V, Blanco-Agudín N, Vázquez-Espinosa E, Fernández-Vega I, Merayo-Lloves J, Vazquez F, Girón RM, Quirós LM. The Binding of Pseudomonas aeruginosa to Cystic Fibrosis Bronchial Epithelial Model Cells Alters the Composition of the Exosomes They Produce Compared to Healthy Control Cells. Int J Mol Sci 2024; 25:895. [PMID: 38255969 PMCID: PMC10815301 DOI: 10.3390/ijms25020895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 01/05/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024] Open
Abstract
Cystic fibrosis (CF) is a genetic disease that causes dehydration of the surface of the airways, increasing lung infections, most frequently caused by Pseudomonas aeruginosa. Exosomes are nanovesicles released by cells that play an essential role in intercellular communication, although their role during bacterial infections is not well understood. In this article, we analyze the alterations in exosomes produced by healthy bronchial epithelial and cystic fibrosis cell lines caused by the interaction with P. aeruginosa. The proteomic study detected alterations in 30% of the species analyzed. In healthy cells, they mainly involve proteins related to the extracellular matrix, cytoskeleton, and various catabolic enzymes. In CF cells, they involve proteins related to the cytoskeleton and matrix, in addition to the proteasome. These differences could be related to the inflammatory response. A study of miRNAs detected alterations in 18% of the species analyzed. The prediction of their potential biological targets identified 7149 genes, regulated by up to 7 different miRNAs. The identification of their functions showed that they preferentially affected molecules involved in binding and catalytic activities, although with differences between cell types. In conclusion, this study shows differences in exosomes between CF and healthy cells that could be involved in the response to infection.
Affiliation(s)
- Víctor Lozano-Iturbe
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Spain; (V.L.-I.); (N.B.-A.); (F.V.)
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Noelia Blanco-Agudín
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Spain; (V.L.-I.); (N.B.-A.); (F.V.)
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Emma Vázquez-Espinosa
- Pneumology Service, Institute for Health Research (IP), Hospital Universitario de La Princesa, 28006 Madrid, Spain
- Iván Fernández-Vega
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Department of Pathology, Hospital Universitario Central de Asturias, 33011 Oviedo, Spain
- Jesús Merayo-Lloves
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Fernando Vazquez
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Spain; (V.L.-I.); (N.B.-A.); (F.V.)
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Department of Microbiology, Hospital Universitario Central de Asturias, 33011 Oviedo, Spain
- Rosa M. Girón
- Pneumology Service, Institute for Health Research (IP), Hospital Universitario de La Princesa, 28006 Madrid, Spain
- Luis M. Quirós
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Spain; (V.L.-I.); (N.B.-A.); (F.V.)
- Instituto Universitario Fernández-Vega, Fundación de Investigación Oftalmológica, University of Oviedo, 33012 Oviedo, Spain; (I.F.-V.); (J.M.-L.)
- Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
3
Maeng JH, Jang HJ, Du AY, Tzeng SC, Wang T. Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Genome Res 2023; 33:gr.277061.122. [PMID: 38065624 PMCID: PMC10760525 DOI: 10.1101/gr.277061.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 11/13/2023] [Indexed: 01/04/2024]
Abstract
Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5' transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
Affiliation(s)
- Ju Heon Maeng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- H Josh Jang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Shin-Cheng Tzeng
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
4
Yan Y, Tian Y, Wu Z, Zhang K, Yang R. Interchromosomal Colocalization with Parental Genes Is Linked to the Function and Evolution of Mammalian Retrocopies. Mol Biol Evol 2023; 40:msad265. [PMID: 38060983 PMCID: PMC10733166 DOI: 10.1093/molbev/msad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/25/2023] [Accepted: 11/29/2023] [Indexed: 12/22/2023] Open
Abstract
Retrocopies are gene duplicates arising from reverse transcription of mature mRNA transcripts and their insertion back into the genome. While long being regarded as processed pseudogenes, more and more functional retrocopies have been discovered. How the stripped-down retrocopies recover expression capability and become functional paralogs continually intrigues evolutionary biologists. Here, we investigated the function and evolution of retrocopies in the context of 3D genome organization. By mapping retrocopy-parent pairs onto sequencing-based and imaging-based chromatin contact maps in human and mouse cell lines and onto Hi-C interaction maps in 5 other mammals, we found that retrocopies and their parental genes show a higher-than-expected interchromosomal colocalization frequency. The spatial interactions between retrocopies and parental genes occur frequently at loci in active subcompartments and near nuclear speckles. Accordingly, colocalized retrocopies are more actively transcribed and translated and are more evolutionarily conserved than noncolocalized ones. The active transcription of colocalized retrocopies may result from their permissive epigenetic environment and shared regulatory elements with parental genes. Population genetic analysis of retroposed gene copy number variants in human populations revealed that retrocopy insertions are not entirely random in regard to interchromosomal interactions and that colocalized retroposed gene copy number variants are more likely to reach high frequencies, suggesting that both insertion bias and natural selection contribute to the colocalization of retrocopy-parent pairs. Further dissection implies that reduced selection efficacy, rather than positive selection, contributes to the elevated allele frequency of colocalized retroposed gene copy number variants. Overall, our results hint at a role of interchromosomal colocalization in the "resurrection" of initially neutral retrocopies.
Affiliation(s)
- Yubin Yan
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Yuhan Tian
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Zefeng Wu
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Kunling Zhang
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Ruolin Yang
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
5
Qi K, Dou Y, Li C, Liu Y, Song C, Li X, Wang K, Qiao R, Li X, Yang F, Han X. CircGUCY2C regulates cofilin 1 by sponging miR-425-3p to promote the proliferation of porcine skeletal muscle satellite cells. Arch Anim Breed 2023; 66:285-298. [PMID: 38039333 PMCID: PMC10655074 DOI: 10.5194/aab-66-285-2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 09/07/2023] [Indexed: 12/03/2023] Open
Abstract
Circular ribonucleic acids (or circRNAs) are an emerging class of endogenous noncoding RNAs that are involved in physiological and pathological processes. Increasing evidence suggests that circRNAs play an important regulatory role in skeletal muscle development and meat quality regulation. In this study, it was found that circGUCY2C exhibits a high expression level in the longissimus dorsi muscle. It shows resistance to RNase R and additionally promotes the mRNA expression of cyclin-dependent kinase 2 (CDK2) and proliferating cell nuclear antigen (PCNA). Specifically, it was observed that the overexpression of circGUCY2C could promote the transition of porcine skeletal muscle satellite cells into the S and G2 phases of the cell cycle and that it regulates the proliferation of porcine skeletal muscle satellite cells. In contrast, miR-425-3p plays the opposite role and has an inhibitory effect on the proliferation of porcine skeletal muscle satellite cells. MiR-425-3p has been described as a target of circGUCY2C; consequently, the depletion of miR-425-3p promoted the proliferation of porcine skeletal muscle satellite cells. CFL1 (cofilin 1) is a target of miR-425-3p, and circGUCY2C upregulated CFL1 expression by inhibiting miR-425-3p. Collectively, our research outcomes demonstrate that circGUCY2C significantly influences the proliferation of porcine skeletal muscle satellite cells by selectively targeting the miR-425-3p-CFL1 axis, and our work partially clarified the role of circGUCY2C in porcine skeletal muscle satellite cells. Thus, the study provides new insight into the function of circGUCY2C and adds to the knowledge of the post-transcriptional regulation of pork quality.
Affiliation(s)
- Kunlong Qi
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yaqing Dou
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Chenlei Li
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Yingke Liu
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Chenglei Song
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Xinjian Li
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Kejun Wang
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Ruimin Qiao
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Xiuling Li
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Feng Yang
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
- Xuelei Han
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou 450046, China
6
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, Sao Paulo, Brazil
- Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Tempus Labs, Chicago, IL, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Roderic Guigo
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
- Artemis G Hatzigeorgiou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research, University of Bern, Bern, Switzerland
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Igor Ulitsky
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
- Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Human Technopole, Milan, Italy.
- Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
7
Chen Y, Sim A, Wan YK, Yeo K, Lee JJX, Ling MH, Love MI, Göke J. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat Methods 2023; 20:1187-1195. [PMID: 37308696 PMCID: PMC10448944 DOI: 10.1038/s41592-023-01908-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 12/22/2021] [Accepted: 05/08/2023] [Indexed: 06/14/2023]
Abstract
Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.
Affiliation(s)
- Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Andre Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Keith Yeo
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Joseph Jing Xian Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Min Hao Ling
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
- Department of Statistics and Data Science, National University of Singapore, Singapore, Republic of Singapore.
8
Seczynska M, Lehner PJ. The sound of silence: mechanisms and implications of HUSH complex function. Trends Genet 2023; 39:251-267. [PMID: 36754727 DOI: 10.1016/j.tig.2022.12.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 11/02/2022] [Revised: 12/14/2022] [Accepted: 12/30/2022] [Indexed: 02/08/2023]
Abstract
The vertebrate genome is under constant threat of invasion by genetic parasites. Whether the host can immediately recognize and respond to invading elements has been unclear. The discovery of the human silencing hub (HUSH) complex, and the finding that it provides immediate protection from genome invasion by silencing products of reverse transcription, have important implications for mammalian genome evolution. In this review, we summarize recent insights into HUSH function and describe how cellular introns provide a novel means of self-nonself discrimination, allowing HUSH to recognize and transcriptionally repress a broad range of intronless genetic elements. We discuss how HUSH contributes to genome evolution, and highlight studies reporting the critical role of HUSH in development and implicating HUSH in the control of immune signaling and cancer progression.
Affiliation(s)
- Marta Seczynska
- Cambridge Institute for Therapeutic Immunology & Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, UK.
- Paul J Lehner
- Cambridge Institute for Therapeutic Immunology & Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, UK.
9
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. ARXIV 2023:arXiv:2303.13996v1. [PMID: 36994150 PMCID: PMC10055485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, São Paulo, SP, Brasil
- Silvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Francisco M. De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA; Tempus Labs, Inc., Chicago, IL
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Roderic Guigo
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Da Vinci Building. Melbourn Science Park, Royston UK SG8 6HB
- Artemis G. Hatzigeorgiou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
- School of Biology and Environmental Science, University College Dublin, D04 V1W8 Dublin, Ireland; Conway Institute of Biomedical and Biomolecular Research, University College Dublin, D04 V1W8 Dublin, Ireland; Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland; Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
- Terence D. Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kim D. Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama Kanagawa 230-0045 Japan
- Igor Ulitsky
- Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Ales Varabyou
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A. Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville 3010 Vic Australia
- Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Human Technopole, via Rita Levi Montalcini 1, Milan 20157 Italy
- Steven L. Salzberg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
10
Mattick JS. RNA out of the mist. Trends Genet 2023; 39:187-207. [PMID: 36528415 DOI: 10.1016/j.tig.2022.11.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/26/2022] [Revised: 11/08/2022] [Accepted: 11/27/2022] [Indexed: 12/23/2022]
Abstract
RNA has long been regarded primarily as the intermediate between genes and proteins. It was a surprise then to discover that eukaryotic genes are mosaics of mRNA sequences interrupted by large tracts of transcribed but untranslated sequences, and that multicellular organisms also express many long 'intergenic' and antisense noncoding RNAs (lncRNAs). The identification of small RNAs that regulate mRNA translation and half-life did not disturb the prevailing view that animal and plant genomes are full of evolutionary debris and that their development is mainly supervised by transcription factors. Gathering evidence to the contrary involved addressing the low conservation, expression, and genetic visibility of lncRNAs, demonstrating their cell-specific roles in cell and developmental biology, and their association with chromatin-modifying complexes and phase-separated domains. The emerging picture is that most lncRNAs are the products of genetic loci termed 'enhancers', which marshal generic effector proteins to their sites of action to control cell fate decisions during development.
Affiliation(s)
- John S Mattick
- School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia; UNSW RNA Institute, UNSW, Sydney, NSW 2052, Australia.
| |
|
11
|
Manuel JM, Guilloy N, Khatir I, Roucou X, Laurent B. Re-evaluating the impact of alternative RNA splicing on proteomic diversity. Front Genet 2023; 14:1089053. [PMID: 36845399 PMCID: PMC9947481 DOI: 10.3389/fgene.2023.1089053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Alternative splicing (AS) constitutes a mechanism by which protein-coding genes and long non-coding RNA (lncRNA) genes produce more than a single mature transcript. From plants to humans, AS is a powerful process that increases transcriptome complexity. Importantly, splice variants produced from AS can potentially encode for distinct protein isoforms which can lose or gain specific domains and, hence, differ in their functional properties. Advances in proteomics have shown that the proteome is indeed diverse due to the presence of numerous protein isoforms. For the past decades, with the help of advanced high-throughput technologies, numerous alternatively spliced transcripts have been identified. However, the low detection rate of protein isoforms in proteomic studies raised debatable questions on whether AS contributes to proteomic diversity and on how many AS events are really functional. We propose here to assess and discuss the impact of AS on proteomic complexity in the light of the technological progress, updated genome annotation, and current scientific knowledge.
Affiliation(s)
- Jeru Manoj Manuel
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada; Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Noé Guilloy
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Inès Khatir
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada; Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Xavier Roucou
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada; Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC, Canada; Quebec Network for Research on Protein Function Structure and Engineering, PROTEO, Québec, QC, Canada
| | - Benoit Laurent
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada; Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
- *Correspondence: Benoit Laurent,
| |
|
12
|
Qian SH, Chen L, Xiong YL, Chen ZX. Evolution and function of developmentally dynamic pseudogenes in mammals. Genome Biol 2022; 23:235. [PMID: 36348461 PMCID: PMC9641868 DOI: 10.1186/s13059-022-02802-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2021] [Accepted: 10/23/2022] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND Pseudogenes are excellent markers for genome evolution, which are emerging as crucial regulators of development and disease, especially cancer. However, systematic functional characterization and evolution of pseudogenes remain largely unexplored. RESULTS To systematically characterize pseudogenes, we date the origin of human and mouse pseudogenes across vertebrates and observe a burst of pseudogene gain in these two lineages. Based on a hybrid sequencing dataset combining full-length PacBio sequencing, sample-matched Illumina sequencing, and public time-course transcriptome data, we observe that abundant mammalian pseudogenes could be transcribed, which contribute to the establishment of organ identity. Our analyses reveal that developmentally dynamic pseudogenes are evolutionarily conserved and show an increasing weight during development. Besides, they are involved in complex transcriptional and post-transcriptional modulation, exhibiting the signatures of functional enrichment. Coding potential evaluation suggests that 19% of human pseudogenes could be translated, thus serving as a new way for protein innovation. Moreover, pseudogenes carry disease-associated SNPs and conduce to cancer transcriptome perturbation. CONCLUSIONS Our discovery reveals an unexpectedly high abundance of mammalian pseudogenes that can be transcribed and translated, and these pseudogenes represent a novel regulatory layer. Our study also prioritizes developmentally dynamic pseudogenes with signatures of functional enrichment and provides a hybrid sequencing dataset for further unraveling their biological mechanisms in organ development and carcinogenesis in the future.
Affiliation(s)
- Sheng Hu Qian
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137)
| | - Lu Chen
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137)
| | - Yu-Li Xiong
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137)
| | - Zhen-Xia Chen
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan, 430070 PR China (grid.35155.37, 0000 0004 1790 4137); Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen, 518124 PR China (grid.35155.37, 0000 0004 1790 4137); Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124 PR China (grid.488316.0, 0000 0004 4912 1102)
| |
|
13
|
Reixachs-Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. WILEY INTERDISCIPLINARY REVIEWS. RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high‐throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting new opportunities to develop new methods to fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; Translation > Regulation; RNA Evolution and Genomics > Computational Analyses of RNA.
Affiliation(s)
- Marina Reixachs-Solé
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia; EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
| | - Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia; EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia; Catalan Institution for Research and Advanced Studies, Barcelona, Spain; Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
| |
|
14
|
Bowles H, Kabiljo R, Al Khleifat A, Jones A, Quinn JP, Dobson RJB, Swanson CM, Al-Chalabi A, Iacoangeli A. An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data. FRONTIERS IN BIOINFORMATICS 2022; 2:1062328. [PMID: 36845320 PMCID: PMC9945273 DOI: 10.3389/fbinf.2022.1062328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 12/12/2022] [Indexed: 02/10/2023] Open
Abstract
There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.
Affiliation(s)
- Harry Bowles
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Renata Kabiljo
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Ashley Jones
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Richard J. B. Dobson
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- NIHR Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, London, United Kingdom
| | - Chad M. Swanson
- Department of Infectious Diseases, School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
| | - Ammar Al-Chalabi
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Neurology, King’s College Hospital, London, United Kingdom
| | - Alfredo Iacoangeli
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
- *Correspondence: Alfredo Iacoangeli,
| |
|
15
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. ENTROPY (BASEL, SWITZERLAND) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
- Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
| |
|
16
|
You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021; 22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Affiliation(s)
- Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Luyi Tian
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Jafar S. Jabbari
- Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia
- Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
| | - Peter F. Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
| |
|
17
|
Sun D, Li X, Yin Z, Hou Z. The Full-Length Transcriptome Provides New Insights Into the Transcript Complexity of Abdominal Adipose and Subcutaneous Adipose in Pekin Ducks. Front Physiol 2021; 12:767739. [PMID: 34858212 PMCID: PMC8631521 DOI: 10.3389/fphys.2021.767739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Accepted: 10/21/2021] [Indexed: 01/12/2023] Open
Abstract
Adipose tissues have a central role in organisms, and adipose content is a crucial economic trait of poultry. Pekin duck is an ideal model to study the mechanism of abdominal and subcutaneous adipose deposition for its high ability of adipose synthesis and deposition. Alternative splicing contributes to functional diversity in abdominal and subcutaneous adipose. However, there has been no systematic analysis of the dynamics of differential alternative splicing of abdominal and subcutaneous adipose in Pekin duck. In our study, the Pacific Biosciences (PacBio) Iso-Seq technology was applied to explore the transcriptional complexity of abdominal and subcutaneous adipose in Pekin ducks. In total, 143,931 and 111,337 full-length non-chimeric transcriptome sequences of abdominal and subcutaneous adipocytes were obtained from 41.78 GB raw data, respectively. These data led us to identify 19,212 long non-coding RNAs (lncRNAs) and 74,571 alternative splicing events. In addition, combined with the next-generation sequencing technology, we correlated the structure and function annotation with the differential expression profiles of abdominal and subcutaneous adipose transcripts. This study identified lots of novel alternative splicing events and major transcripts of transcription factors related to adipose synthesis. STAT3 was reported as a vital gene for adipogenesis, and we found that its major transcript is STAT3-1, which may play a considerable role in the process of adipose synthesis in Pekin duck. This study greatly increases our understanding of the gene models, genome annotations, genome structures, and the complexity and diversity of abdominal and subcutaneous adipose in Pekin duck. These data provide insights into the regulation of alternative splicing events, which form an essential part of transcript diversity during adipogenesis in poultry. The results of this study provide an invaluable resource for studying alternative splicing and tissue-specific expression.
Affiliation(s)
- Dandan Sun
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiaoqin Li
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Zhongtao Yin
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Zhuocheng Hou
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing, China
| |
|
18
|
Chen Z, He X. Application of third-generation sequencing in cancer research. MEDICAL REVIEW (BERLIN, GERMANY) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific BioSciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
| |
|
19
|
Chen X, Chen Z, Wu H, Liu X, Nie F, Wang Z, Sun M. Comprehensive Genomic Characterization Analysis Identifies an Oncogenic Pseudogene RP11-3543B.1 in Human Gastric Cancer. Front Cell Dev Biol 2021; 9:743652. [PMID: 34660601 PMCID: PMC8511815 DOI: 10.3389/fcell.2021.743652] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/19/2021] [Accepted: 09/06/2021] [Indexed: 01/15/2023] Open
Abstract
Background: Gastrointestinal cancers (GICs) are the most common group of malignancies, and many of their types are leading causes of cancer-related death worldwide. Pseudogenes have been revealed to have critical regulatory roles in human cancers. The objective of this study is to comprehensively characterize pseudogene expression profiles and identify key pseudogenes in the development of gastric cancer (GC). Methods: Pseudogene expression profiles were analyzed in six types of GICs from The Cancer Genome Atlas RNA-seq data to identify GIC-related pseudogenes. Meanwhile, the genomic characterization, including somatic alterations of pseudogenes, was analyzed. Then, CCK8 and colony formation assays were performed to evaluate the biological function of RP11-3543B.1 and miR-145 in gastric cancer cells. The mechanisms of pseudogene RP11-3543B.1 in GC cells were explored using bioinformatics analysis, next-generation sequencing and luciferase reporter assays. Results: We identified a great number of pseudogenes with significantly altered expression in GICs, and some of these pseudogenes were expressed differently among the six cancer types. Amplification or deletion of pseudogene-containing loci was involved in the alterations of pseudogene expression in GICs. Among these altered pseudogenes, RP11-3543B.1 is significantly upregulated in gastric cancer. Down-regulation of RP11-3543B.1 expression impaired GC cell proliferation both in vitro and in vivo. RP11-3543B.1 exerts an oncogenic function by targeting miR-145-5p to regulate MAPK4 expression in gastric cancer cells. Conclusion: Our study reveals the potential of pseudogene expression as a new paradigm for investigating GI cancer tumorigenesis and discovering prognostic biomarkers for patients.
Affiliation(s)
- Xin Chen
- Department of Oncology, Second Affiliated Hospital, Nanjing Medical University, Nanjing, China
| | - Zhenyao Chen
- Department of Oncology, Second Affiliated Hospital, Nanjing Medical University, Nanjing, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Hao Wu
- Department of Oncology, First Affiliated Hospital, Nanjing Medical University, Nanjing, China
| | - Xianghua Liu
- Department of Biochemistry and Molecular Biology, Nanjing Medical University, Nanjing, China
| | - Fengqi Nie
- Department of Oncology, Second Affiliated Hospital, Nanjing Medical University, Nanjing, China
| | - Zhaoxia Wang
- Department of Oncology, Second Affiliated Hospital, Nanjing Medical University, Nanjing, China
| | - Ming Sun
- Suzhou Cancer Center Core Laboratory, Suzhou Municipal Hospital, Gusu School, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou, China
| |
|
20
|
Troskie RL, Faulkner GJ, Cheetham SW. Processed pseudogenes: A substrate for evolutionary innovation: Retrotransposition contributes to genome evolution by propagating pseudogene sequences with rich regulatory potential throughout the genome. Bioessays 2021; 43:e2100186. [PMID: 34569081 DOI: 10.1002/bies.202100186] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/08/2022]
Abstract
Processed pseudogenes may serve as a genetic reservoir for evolutionary innovation. Here, we argue that through the activity of long interspersed element-1 retrotransposons, processed pseudogenes disperse coding and noncoding sequences rich with regulatory potential throughout the human genome. While these sequences may appear to be non-functional, a lack of contemporary function does not prohibit future development of biological activity. Here, we discuss the dynamic evolution of certain processed pseudogenes into coding and noncoding genes and regulatory elements, and their implication in wide-ranging biological and pathological processes. Also see the video abstract here: https://youtu.be/iUY_mteVoPI.
Affiliation(s)
- Robin-Lee Troskie
- Mater Research Institute, University of Queensland, Woolloongabba, Australia
| | - Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, Woolloongabba, Australia; Queensland Brain Institute, University of Queensland, Brisbane, Australia
| | - Seth W Cheetham
- Mater Research Institute, University of Queensland, Woolloongabba, Australia
| |
|
21
|
Dorado G, Gálvez S, Rosales TE, Vásquez VF, Hernández P. Analyzing Modern Biomolecules: The Revolution of Nucleic-Acid Sequencing - Review. Biomolecules 2021; 11:1111. [PMID: 34439777 PMCID: PMC8393538 DOI: 10.3390/biom11081111] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/16/2021] [Revised: 07/12/2021] [Accepted: 07/23/2021] [Indexed: 02/06/2023] Open
Abstract
Recent developments have revolutionized the study of biomolecules. Among them are molecular markers and the amplification and sequencing of nucleic acids. Sequencing technologies are classified into three generations. The first allows sequencing of small DNA fragments. The second increases throughput while reducing turnaround time and cost, and is therefore more convenient for sequencing full genomes and transcriptomes. The third generation is currently pushing technology to its limits: it can sequence single molecules without prior amplification, which was previously impossible. It also represents a new revolution, allowing researchers to sequence RNA directly, without prior reverse transcription. These technologies are having a significant impact on different areas, such as medicine, agronomy, ecology and biotechnology. Additionally, the study of biomolecules is revealing interesting evolutionary information, including what makes us human and phenomena like non-coding RNA expansion. All this is redefining the concepts of gene and transcript. Basic analyses and applications are now facilitated by new genome-editing tools, such as CRISPR. All these developments in general, and nucleic-acid sequencing in particular, are opening an exciting new era of biomolecule analyses and applications, including personalized medicine and the diagnosis and prevention of diseases in humans and other animals.
Affiliation(s)
- Gabriel Dorado
- Dep. Bioquímica y Biología Molecular, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario (ceiA3), Universidad de Córdoba, 14071 Córdoba, Spain
| | - Sergio Gálvez
- Dep. Lenguajes y Ciencias de la Computación, Boulevard Louis Pasteur 35, Universidad de Málaga, 29071 Málaga, Spain
| | - Teresa E. Rosales
- Laboratorio de Arqueobiología, Avda. Universitaria s/n, Universidad Nacional de Trujillo, 13011 Trujillo, Peru
| | - Víctor F. Vásquez
- Centro de Investigaciones Arqueobiológicas y Paleoecológicas Andinas Arqueobios, Martínez de Companón 430-Bajo 100, Urbanización San Andres, 13088 Trujillo, Peru
| | - Pilar Hernández
- Instituto de Agricultura Sostenible (IAS), Consejo Superior de Investigaciones Científicas (CSIC), Alameda del Obispo s/n, 14080 Córdoba, Spain
| |
|
22
|
Faulkner GJ. The evolving gene regulatory landscape-a tinkerer of complex creatures. Genome Biol 2021; 22:199. [PMID: 34238343 PMCID: PMC8265009 DOI: 10.1186/s13059-021-02412-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Geoffrey J Faulkner
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia; Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD, 4102, Australia.
| |
|