1
Ramberg S, Høyheim B, Østbye TKK, Andreassen R. A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon. Front Genet 2021; 12:656334. [PMID: 33986770 PMCID: PMC8110904 DOI: 10.3389/fgene.2021.656334] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/20/2021] [Accepted: 04/01/2021] [Indexed: 12/18/2022] Open
Abstract
Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows for full-length transcript sequencing from single RNA-molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads by short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remaining were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. 
More than 80% of transcripts were assigned GO terms and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic Salmon or applied for similar projects in other species.
Affiliation(s)
- Sigmund Ramberg
  - Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
- Bjørn Høyheim
  - Department of Preclinical Sciences and Pathology, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, Ås, Norway
- Rune Andreassen
  - Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
2
Li HD, Zhang W, Luo Y, Wang J. IsoDetect: Detection of Splice Isoforms from Third Generation Long Reads Based on Short Feature Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200316101205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background:
Transcriptome annotation is the basis for understanding gene structures
and analysing gene expression. The transcriptome annotation of many organisms such as humans
is far from complete, due partly to the challenge in the identification of isoforms that are
produced from the same gene through alternative splicing. Third generation sequencing (TGS)
reads provide unprecedented opportunity for detecting isoforms due to their long length that
exceeds the length of most isoforms. One limitation of current TGS reads-based isoform detection
methods is that they are exclusively based on sequence reads, without incorporating the sequence
information of annotated isoforms.
Objective:
We aim to develop a method to detect isoforms by incorporating annotated isoforms.
Methods:
Based on annotated isoforms, we propose a splice isoform detection method called
IsoDetect. First, the sequences at exon-exon junctions are extracted from annotated isoforms as
“short feature sequences”, which are used to distinguish splice isoforms. Second, we align these
feature sequences to long reads and partition long reads into groups that contain the same set of
feature sequences, thereby avoiding the pair-wise comparison among the large number of long
reads. Third, clustering and consensus generation are carried out based on sequence similarity. For
the long reads that do not contain any short feature sequence, clustering analysis based on
sequence similarity is performed to identify isoforms. Therefore, our method can detect not only
known but also novel isoforms.
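The first two steps described above can be illustrated with a minimal sketch. This is not the IsoDetect implementation: it assumes exact substring matching in place of the alignment step the abstract describes, and the function names (`collect_features`, `partition_reads`), the `flank` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def collect_features(isoforms, flank=4):
    """Collect the short sequence spanning each exon-exon junction.

    `isoforms` is a list of isoforms, each given as a list of exon
    sequences. `flank` (a hypothetical parameter) is the number of bases
    taken from each side of a junction.
    """
    features = set()
    for exons in isoforms:
        for left, right in zip(exons, exons[1:]):
            features.add(left[-flank:] + right[:flank])
    return sorted(features)


def partition_reads(reads, features):
    """Group read names by the set of feature sequences each read contains.

    Exact substring search stands in for alignment here (an assumption made
    to keep the sketch short). Reads that share a feature set land in the
    same group, so no pairwise read-to-read comparison is needed.
    """
    groups = defaultdict(list)
    for name, seq in reads.items():
        key = frozenset(f for f in features if f in seq)
        groups[key].append(name)
    return dict(groups)


# Toy annotation: one gene with three exons and an exon-skipping isoform.
e1, e2, e3 = "AAAACCCC", "GGGGTTTT", "ACGTACGT"
features = collect_features([[e1, e2, e3], [e1, e3]])

# Toy long reads: two full-length, one exon-skipping, one unannotated.
reads = {
    "r1": e1 + e2 + e3,
    "r2": e1 + e3,
    "r3": e1 + e2 + e3,
    "r4": "GGGGGGGG",
}
groups = partition_reads(reads, features)
for key, names in groups.items():
    print(sorted(key), names)
```

Reads in the group with an empty feature set correspond to the reads that, per the abstract, are clustered by sequence similarity alone to find novel isoforms.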
Results:
Tested on two datasets from Calypte anna and Zebra Finch, IsoDetect shows higher speed
and good accuracies compared with four existing methods.
Conclusion:
IsoDetect may become a promising method for isoform detection.
Affiliation(s)
- Hong-Dong Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Wenjing Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yuwen Luo
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
3
Ma L, Chen Z, Huang DW, Cissé OH, Rothenburger JL, Latinne A, Bishop L, Blair R, Brenchley JM, Chabé M, Deng X, Hirsch V, Keesler R, Kutty G, Liu Y, Margolis D, Morand S, Pahar B, Peng L, Van Rompay KKA, Song X, Song J, Sukura A, Thapar S, Wang H, Weissenbacher-Lang C, Xu J, Lee CH, Jardine C, Lempicki RA, Cushion MT, Cuomo CA, Kovacs JA. Diversity and Complexity of the Large Surface Protein Family in the Compacted Genomes of Multiple Pneumocystis Species. mBio 2020; 11:e02878-19. [PMID: 32127451 PMCID: PMC7064768 DOI: 10.1128/mbio.02878-19] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/31/2019] [Accepted: 01/16/2020] [Indexed: 12/23/2022] Open
Abstract
Pneumocystis, a major opportunistic pathogen in patients with a broad range of immunodeficiencies, contains abundant surface proteins encoded by a multicopy gene family, termed the major surface glycoprotein (Msg) gene superfamily. This superfamily has been identified in all Pneumocystis species characterized to date, highlighting its important role in Pneumocystis biology. In this report, through a comprehensive and in-depth characterization of 459 msg genes from 7 Pneumocystis species, we demonstrate, for the first time, the phylogeny and evolution of conserved domains in Msg proteins and provide a detailed description of the classification, unique characteristics, and phylogenetic relatedness of five Msg families. We further describe, for the first time, the relative expression levels of individual msg families in two rodent Pneumocystis species, the substantial variability of the msg repertoires in P. carinii from laboratory and wild rats, and the distinct features of the expression site for the classic msg genes in Pneumocystis from 8 mammalian host species. Our analysis suggests multiple functions for this superfamily rather than just conferring antigenic variation to allow immune evasion as previously believed. This study provides a rich source of information that lays the foundation for the continued experimental exploration of the functions of the Msg superfamily in Pneumocystis biology.

IMPORTANCE Pneumocystis continues to be a major cause of disease in humans with immunodeficiency, especially those with HIV/AIDS and organ transplants, and is being seen with increasing frequency worldwide in patients treated with immunodepleting monoclonal antibodies. Annual health care associated with Pneumocystis pneumonia costs ∼$475 million in the United States alone. In addition to causing overt disease in immunodeficient individuals, Pneumocystis can cause subclinical infection or colonization in healthy individuals, which may play an important role in species preservation and disease transmission. Our work sheds new light on the diversity and complexity of the msg superfamily and strongly suggests that the versatility of this superfamily reflects multiple functions, including antigenic variation to allow immune evasion and optimal adaptation to host environmental conditions to promote efficient infection and transmission. These findings are essential to consider in developing new diagnostic and therapeutic strategies.
Affiliation(s)
- Liang Ma
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Zehua Chen
  - Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Da Wei Huang
  - Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA
- Ousmane H Cissé
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Jamie L Rothenburger
  - Department of Pathobiology, Canadian Wildlife Health Cooperative, Ontario Veterinary College, University of Guelph, Ontario, Canada
- Lisa Bishop
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Robert Blair
  - Tulane National Primate Research Center, Tulane University, New Orleans, Louisiana, USA
- Jason M Brenchley
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Magali Chabé
  - Université Lille, CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, U1019-UMR 8204-CIIL-Centre d'Infection et d'Immunité de Lille, Lille, France
- Xilong Deng
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Vanessa Hirsch
  - Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Rebekah Keesler
  - California National Primate Research Center, University of California, Davis, Davis, California, USA
- Geetha Kutty
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Yueqin Liu
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Daniel Margolis
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Serge Morand
  - Institut des Sciences de l'Evolution, Université de Montpellier 2, Montpellier, France
- Bapi Pahar
  - Tulane National Primate Research Center, Tulane University, New Orleans, Louisiana, USA
- Li Peng
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Koen K A Van Rompay
  - California National Primate Research Center, University of California, Davis, Davis, California, USA
- Xiaohong Song
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Jun Song
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Antti Sukura
  - Department of Veterinary Pathology, Faculty of Veterinary Medicine, University of Helsinki, Helsinki, Finland
- Sabrina Thapar
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Honghui Wang
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
- Jie Xu
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Chao-Hung Lee
  - Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Claire Jardine
  - Department of Pathobiology, Canadian Wildlife Health Cooperative, Ontario Veterinary College, University of Guelph, Ontario, Canada
- Richard A Lempicki
  - Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA
- Melanie T Cushion
  - Department of Internal Medicine, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Christina A Cuomo
  - Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Joseph A Kovacs
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
4
Kumar V, Vollbrecht T, Chernyshev M, Mohan S, Hanst B, Bavafa N, Lorenzo A, Kumar N, Ketteringham R, Eren K, Golden M, Oliveira MF, Murrell B. Long-read amplicon denoising. Nucleic Acids Res 2019; 47:e104. [PMID: 31418021 PMCID: PMC6765106 DOI: 10.1093/nar/gkz657] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/12/2019] [Revised: 07/03/2019] [Accepted: 07/17/2019] [Indexed: 01/03/2023] Open
Abstract
Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called 'amplicon denoising', this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available.
Affiliation(s)
- Venkatesh Kumar
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Thomas Vollbrecht
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Mark Chernyshev
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Sanjay Mohan
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Brian Hanst
  - Department of Biology, University of California, San Diego, La Jolla 92093, CA, USA
- Nicholas Bavafa
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Antonia Lorenzo
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Nikesh Kumar
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Robert Ketteringham
  - Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Science, University of Cape Town, Cape Town 7925, South Africa
- Kemal Eren
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Michael Golden
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Michelli F Oliveira
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
- Ben Murrell
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden
  - Department of Medicine, University of California, San Diego, La Jolla 92093, CA, USA
5
Mohindra V, Dangi T, Chowdhury LM, Jena JK. Tissue specific alpha-2-Macroglobulin (A2M) splice isoform diversity in Hilsa shad, Tenualosa ilisha (Hamilton, 1822). PLoS One 2019; 14:e0216144. [PMID: 31335900 PMCID: PMC6650032 DOI: 10.1371/journal.pone.0216144] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/29/2018] [Accepted: 04/15/2019] [Indexed: 12/12/2022] Open
Abstract
The present study, for the first time, reported twelve A2M isoforms in Tenualosa ilisha, through SMRT sequencing. Hilsa shad, T. ilisha, an anadromous fish, faces environmental stresses and is thus prone to diseases. Here, expression profiles of different A2M isoforms in four tissues were studied in T. ilisha, for the tissue-specific diversity of A2M. Large-scale, high-quality, full-length transcripts (>99% accuracy) were obtained from liver, ovary, testes and gill transcriptomes, through Iso-sequencing on PacBio RSII. A total of 12 isoforms, with complete putative proteins, were detected in three tissues (7 isoforms in liver, 4 in ovary and 1 in testes). The complete structure of A2M mRNA was predicted from these isoforms, containing a 4680 bp sequence, 35 exons and 1508 amino acids. With Homo sapiens A2M as reference, six functional domains (A2M_N, A2M_N2, A2M, Thiol-ester_cl, Complement and Receptor domain), along with a bait region, were predicted in the A2M consensus protein. A total of 35 splice sites were identified in the T. ilisha A2M consensus transcript, with the highest frequency (55.7%) of GT-AG splice sites, as compared to that of Homo sapiens. The liver showed the longest isoform (X1), consisting of all domains, while the smallest (X10) was found in the ovary, with one Receptor domain. The present study predicted five putative markers (I-212, I-269, A-472, S-567 and Y-906) for EUS disease resistance in the A2M protein, which were present in the MG2 domains (A2M_N and A2M_N2), by comparing with those of resistant and susceptible/unknown response species. These markers classified fishes into two groups, resistant and susceptible response. The potential markers predicted in T. ilisha placed it in the EUS-susceptible category. Putative markers reported in the A2M protein may serve as molecular markers in the diagnosis of EUS disease resistance/susceptibility in fishes and may have potential for inclusion in the marker panel for pilot studies. Further challenge studies are required to confirm the role of particular A2M isoforms and the markers identified in immune protection against EUS disease.
Affiliation(s)
- Vindhya Mohindra
  - ICAR-National Bureau of Fish Genetic Resources (ICAR-NBFGR), Lucknow, India
- Tanushree Dangi
  - ICAR-National Bureau of Fish Genetic Resources (ICAR-NBFGR), Lucknow, India
- J. K. Jena
  - Indian Council of Agricultural Research (ICAR), Krishi Anusandhan Bhawan-II, New Delhi, India
6
Matsumura W, Fujita Y, Shinkuma S, Suzuki S, Yokoshiki S, Goto H, Hayashi H, Ono K, Inoie M, Takashima S, Nakayama C, Nomura T, Nakamura H, Abe R, Sato N, Shimizu H. Cultured Epidermal Autografts from Clinically Revertant Skin as a Potential Wound Treatment for Recessive Dystrophic Epidermolysis Bullosa. J Invest Dermatol 2019; 139:2115-2124.e11. [PMID: 31054844 DOI: 10.1016/j.jid.2019.03.1155] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/19/2018] [Revised: 03/05/2019] [Accepted: 03/20/2019] [Indexed: 01/19/2023]
Abstract
Inherited skin disorders have been reported recently to have sporadic normal-looking areas, where a portion of the keratinocytes have recovered from causative gene mutations (revertant mosaicism). We observed a case of recessive dystrophic epidermolysis bullosa treated with cultured epidermal autografts (CEAs), whose CEA-grafted site remained epithelized for 16 years. We proved that the CEA product and the grafted area included cells with revertant mosaicism. Based on these findings, we conducted an investigator-initiated clinical trial of CEAs from clinically revertant skin for recessive dystrophic epidermolysis bullosa. The donor sites were analyzed by genetic analysis, immunofluorescence, electron microscopy, and quantification of the reverted mRNA with deep sequencing. The primary endpoint was the ulcer epithelization rate per patient at 4 weeks after the last CEA application. Three patients with recessive dystrophic epidermolysis bullosa with 8 ulcers were enrolled, and the epithelization rate for each patient at the primary endpoint was 87.7%, 100%, and 57.0%, respectively. The clinical effects were found to persist for at least 76 weeks after CEA transplantation. One of the three patients had apparent revertant mosaicism in the donor skin and in the post-transplanted area. CEAs from clinically normal skin are a potentially well-tolerated treatment for recessive dystrophic epidermolysis bullosa.
Affiliation(s)
- Wakana Matsumura
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Yasuyuki Fujita
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Satoru Shinkuma
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan; Department of Dermatology, Niigata University, Niigata, Japan
- Shotaro Suzuki
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Saki Yokoshiki
  - Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Sapporo, Japan
- Hideki Goto
  - Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Sapporo, Japan; Department of Hematology, Hokkaido University Faculty of Medicine, Sapporo, Japan
- Hiroshi Hayashi
  - Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Sapporo, Japan
- Kota Ono
  - Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Sapporo, Japan
- Shota Takashima
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Chihiro Nakayama
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Toshifumi Nomura
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Hideki Nakamura
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Riichiro Abe
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan; Department of Dermatology, Niigata University, Niigata, Japan
- Norihiro Sato
  - Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Sapporo, Japan
- Hiroshi Shimizu
  - Department of Dermatology, Hokkaido University Graduate School of Medicine, Sapporo, Japan
7
Wei C, Li M, Qin J, Xu Y, Zhang Y, Wang H. Transcriptome analysis reveals the effects of grafting on sweetpotato scions during the full blooming stages. Genes Genomics 2019; 41:895-907. [PMID: 31030407 DOI: 10.1007/s13258-019-00823-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/14/2018] [Accepted: 04/20/2019] [Indexed: 01/21/2023]
Abstract
BACKGROUND: Sweetpotato (Ipomoea batatas) is a hexaploid plant, and most genotypes generally do not flower in the subtropics. Heterografting was carried out between sweetpotato cultivar 'Xushu 18' and Japanese morning glory (Ipomoea nil). With sweetpotato as 'scion' and I. nil as 'rootstock', sweetpotato was induced to flower in the autumn. However, little is known about the molecular mechanisms underlying sweetpotato responses to grafting, especially during the full blooming stages.
OBJECTIVES: To investigate the poorly understood molecular responses underlying the grafting-induced phenotypic processes in sweetpotato at full anthesis.
METHODS: In this study, to explore the transcriptome diversity and complexity of sweetpotato, PacBio Iso-Seq and Illumina RNA-seq analysis were combined to obtain full-length transcripts and to profile the changes in gene expression of five tissues: scion flowers (SF), scion leaves (SL), scion stems (SS), own-rooted leaves (OL) and own-rooted stems (OS).
RESULTS: A total of 138,151 transcripts were generated with an average length of 2255 bp, and more than 72% (100,396) of the transcripts were full-length. During full blooming, to examine the difference in gene expression of sweetpotato under grafting and natural growth conditions, 7905, 7795 and 15,707 differentially expressed genes were detected in pairwise comparisons of OS versus SS, OL versus SL and SL versus SF, respectively. Moreover, differential transcription of genes associated with anthocyanin biosynthesis, the light pathway and photosynthesis, and the ethylene signal transduction pathway was observed in scion responses to grafting.
CONCLUSION: Our study is useful in understanding the molecular basis of grafting-induced flowering in grafted sweetpotatoes, and will lay a foundation for further research on sweetpotato breeding in the future.
Affiliation(s)
- Changhe Wei
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Ming Li
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China; Institute of Biotechnology and Nuclear Technology, Sichuan Academy of Agricultural Sciences, Chengdu, 610061, China
- Jia Qin
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Yunfan Xu
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Yizheng Zhang
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Haiyan Wang
  - Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, College of Life Sciences, Sichuan University, Chengdu, 610064, China
8
Chao Y, Yuan J, Guo T, Xu L, Mu Z, Han L. Analysis of transcripts and splice isoforms in Medicago sativa L. by single-molecule long-read sequencing. Plant Mol Biol 2019; 99:219-235. [PMID: 30600412 DOI: 10.1007/s11103-018-0813-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/17/2018] [Accepted: 12/14/2018] [Indexed: 05/20/2023]
Abstract
The full-length transcriptome of alfalfa was analyzed with PacBio single-molecule long-read sequencing technology. The transcriptome data provided full-length sequences and gene isoforms of transcripts in alfalfa, which will improve genome annotation and enhance our understanding of the gene structure of alfalfa. Alfalfa (Medicago sativa L.) is an important forage crop that is planted worldwide. Because of its complex genome and unfinished whole-genome sequencing, the sequences and complete structure of mRNA transcripts remain unclear in alfalfa. In this study, single-molecule long-read sequencing was applied to investigate the alfalfa transcriptome using the Pacific Biosciences platform, and a total of 113,321 transcripts were obtained from young, mature and senescent leaves. We identified 72,606 open reading frames including 46,616 full-length ORFs, 1670 transcription factors from 54 TF families and 44,040 simple sequence repeats from 30,797 sequences. A total of 7568 alternative splicing events were identified, and the majority of alternative splicing events in alfalfa were intron retention. In addition, we identified 17,740 long non-coding RNAs. Our results show the feasibility of deep sequencing full-length RNA from the alfalfa transcriptome at the single-molecule level.
Affiliation(s)
- Yuehui Chao
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
- Jianbo Yuan
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
- Tao Guo
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
- Lixin Xu
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
- Zhiyuan Mu
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
- Liebao Han
  - Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
9
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018] [Indexed: 01/07/2023] Open
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.
Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate reads that are long, but error-prone. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes, but also mutations and other genetic artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source, and is freely available online.
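The idea of collapsing many error-prone reads into a consensus sequence can be illustrated with a small sketch. This is not FLEA's algorithm; it is a plain column-wise majority vote, and it assumes the reads have already been aligned to equal length with `-` as the gap character. The function name `majority_consensus` and the toy reads are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return a column-wise majority-vote consensus of pre-aligned reads.

    Assumption: all reads have the same length (they are already aligned).
    Columns whose most common symbol is the gap are dropped from the result.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Most frequent symbol in this alignment column.
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy example: five noisy copies of the same template, each with at most one error.
reads = ["ACGTTGCA", "ACGTAGCA", "ACG-TGCA", "ACGTTGCA", "TCGTTGCA"]
print(majority_consensus(reads))
```

Isolated errors in individual reads are outvoted as long as they do not recur at the same column in most reads, which is why a consensus over many reads can be far more accurate than any single read.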
|
10
|
Sahlin K, Tomaszkiewicz M, Makova KD, Medvedev P. Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon. Nat Commun 2018; 9:4601. [PMID: 30389934 PMCID: PMC6214943 DOI: 10.1038/s41467-018-06910-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 01/22/2018] [Accepted: 09/29/2018] [Indexed: 12/30/2022] Open
Abstract
A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.
Affiliation(s)
- Kristoffer Sahlin
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
- Marta Tomaszkiewicz
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Kateryna D Makova
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Medical Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Computational Biology and Bioinformatics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Paul Medvedev
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Medical Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Computational Biology and Bioinformatics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
|
11
|
Delaye L, Ruiz-Ruiz S, Calderon E, Tarazona S, Conesa A, Moya A. Evidence of the Red-Queen Hypothesis from Accelerated Rates of Evolution of Genes Involved in Biotic Interactions in Pneumocystis. Genome Biol Evol 2018; 10:1596-1606. [PMID: 29893833 PMCID: PMC6012782 DOI: 10.1093/gbe/evy116] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 06/05/2018] [Indexed: 01/15/2023] Open
Abstract
Pneumocystis species are ascomycete fungi adapted to live inside the lungs of mammals. These ascomycetes show extensive stenoxenism, meaning that each species of Pneumocystis infects a single species of host. Here, we study the effect exerted by natural selection on gene evolution in the genomes of three Pneumocystis species. We show that genes involved in host interaction evolve under positive selection. In the first place, we found strong evidence of episodic diversifying selection in Major surface glycoproteins (Msg). These proteins are located on the surface of Pneumocystis and are used for host attachment and probably for immune system evasion. Consistent with their function as antigens, most sites under diversifying selection in Msg code for residues with large relative surface accessibility areas. We also found evidence of positive selection in part of the cell machinery used to export Msg to the cell surface. Specifically, we found that genes participating in glycosylphosphatidylinositol (GPI) biosynthesis show an increased rate of nonsynonymous substitutions (dN) versus synonymous substitutions (dS). GPI is a molecule synthesized in the endoplasmic reticulum that is used to anchor proteins to membranes. We interpret the aforementioned findings as evidence of selective pressure exerted by the host immune system on Pneumocystis species, shaping the evolution of Msg and several proteins involved in GPI biosynthesis. We suggest that genome evolution in Pneumocystis is well described by the Red-Queen hypothesis whereby genes relevant for biotic interactions show accelerated rates of evolution.
Affiliation(s)
- Luis Delaye
  - Departamento de Ingeniería Genética, CINVESTAV Irapuato, Guanajuato, México
- Susana Ruiz-Ruiz
  - Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (FISABIO)-Salud Pública, València, Spain
- Enrique Calderon
  - Instituto de Biomedicina de Sevilla, Hospital Universitario Virgen del Rocío/Consejo Superior de Investigaciones Científicas/Universidad de Sevilla
  - Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Sonia Tarazona
  - Centro de Investigacion Principe Felipe, València, Spain
  - Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Spain
- Ana Conesa
  - Centro de Investigacion Principe Felipe, València, Spain
  - Microbiology and Cell Science, University of Florida
- Andrés Moya
  - Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (FISABIO)-Salud Pública, València, Spain
  - Institute for Integrative Systems Biology, Universitat de València, Spain
|
12
|
Huang C, Sam V, Du S, Le T, Fletcher A, Lau W, Meyer K, Asaki E, Huang DW, Johnson C. Towards Personalized Medicine: An Improved De Novo Assembly Procedure for Early Detection of Drug Resistant HIV Minor Quasispecies in Patient Samples. Bioinformation 2018; 14:449-454. [PMID: 30310253 PMCID: PMC6166399 DOI: 10.6026/97320630014449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/30/2018] [Revised: 09/17/2018] [Accepted: 09/17/2018] [Indexed: 12/31/2022] Open
Abstract
The third-generation sequencing technology, PacBio, has shown an ability to sequence HIV amplicons in their full length. The long reads of PacBio offer a distinct advantage for comprehensively understanding the complexity of virus evolution at the quasispecies level (i.e. maintaining linkage information of variants) compared to the short reads from Illumina shotgun sequencing. However, due to the high-noise nature of the PacBio reads, it is still a challenge to build accurate contigs at high sensitivity. Most previously developed NGS assembly tools work with the assumption that the input reads are fairly accurate, which is largely true for the data derived from Sanger or Illumina technologies. When these tools are applied to PacBio high-noise reads, they are largely driven by noise rather than true signal, eventually leading to poor results in most cases. In this study, we propose a de novo assembly procedure, which comprises a positive-focused strategy and linkage-frequency noise reduction, so that it is more suitable for PacBio high-noise reads. We further tested the de novo assembly procedure on HIV PacBio benchmark data and clinical samples, which accurately assembled dominant and minor populations of HIV quasispecies as expected. The improved de novo assembly procedure shows potential to promote PacBio technology in the field of HIV drug-resistance clinical detection, as well as in broad HIV phylogenetic studies.
Affiliation(s)
- Cindy Huang
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - Thomas Wootton High School, Rockville, Maryland 20850
- Vichetra Sam
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Sophie Du
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Tuan Le
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Anthony Fletcher
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- William Lau
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Kathleen Meyer
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Esther Asaki
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Da Wei Huang
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Calvin Johnson
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
|
13
|
Cissé OH, Hauser PM. Genomics and evolution of Pneumocystis species. Infect Genet Evol 2018; 65:308-320. [PMID: 30138710 DOI: 10.1016/j.meegid.2018.08.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/30/2018] [Revised: 08/15/2018] [Accepted: 08/17/2018] [Indexed: 01/20/2023]
Abstract
The genus Pneumocystis comprises highly diversified fungal species that cause severe pneumonia in individuals with a deficient immune system. These fungi infect exclusively mammals and present a strict host species specificity. These species have co-diverged with their hosts for long periods of time (> 100 MYA). Details of their biology and evolution are fragmentary mainly because of a lack of an established long-term culture system. Recent genomic advances have unlocked new areas of research and allow new hypotheses to be tested. We review here new findings of the genomic studies in relation with the evolutionary trajectory of these fungi and discuss the impact of genomic data analysis in the context of the population genetics. The combination of slow genome decay and limited expansion of specific gene families and introns reflect intimate interactions of these species with their hosts. The evolutionary adaptation of these organisms is profoundly influenced by their population structure, which in turn is determined by intrinsic features such as their self-fertilizing mating system, high host specificity, long generation times, and transmission mode. Essential key questions concerning their adaptation and speciation remain to be answered. The next cornerstone will consist in the establishment of a long-term culture system and genetic manipulation that should allow unravelling the driving forces of Pneumocystis species evolution.
Affiliation(s)
- Ousmane H Cissé
  - Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, MD 20892, USA
- Philippe M Hauser
  - Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
|
14
|
Ma L, Cissé OH, Kovacs JA. A Molecular Window into the Biology and Epidemiology of Pneumocystis spp. Clin Microbiol Rev 2018; 31:e00009-18. [PMID: 29899010 PMCID: PMC6056843 DOI: 10.1128/cmr.00009-18] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/29/2022] Open
Abstract
Pneumocystis, a unique atypical fungus with an elusive lifestyle, has had an important medical history. It came to prominence as an opportunistic pathogen that not only can cause life-threatening pneumonia in patients with HIV infection and other immunodeficiencies but also can colonize the lungs of healthy individuals from a very early age. The genus Pneumocystis includes a group of closely related but heterogeneous organisms that have a worldwide distribution, have been detected in multiple mammalian species, are highly host species specific, inhabit the lungs almost exclusively, and have never convincingly been cultured in vitro, making Pneumocystis a fascinating but difficult-to-study organism. Improved molecular biologic methodologies have opened a new window into the biology and epidemiology of Pneumocystis. Advances include an improved taxonomic classification, identification of an extremely reduced genome and concomitant inability to metabolize and grow independent of the host lungs, insights into its transmission mode, recognition of its widespread colonization in both immunocompetent and immunodeficient hosts, and utilization of strain variation to study drug resistance, epidemiology, and outbreaks of infection among transplant patients. This review summarizes these advances and also identifies some major questions and challenges that need to be addressed to better understand Pneumocystis biology and its relevance to clinical care.
Affiliation(s)
- Liang Ma
  - Critical Care Medicine Department, NIH Clinical Center, Bethesda, Maryland, USA
- Ousmane H Cissé
  - Critical Care Medicine Department, NIH Clinical Center, Bethesda, Maryland, USA
- Joseph A Kovacs
  - Critical Care Medicine Department, NIH Clinical Center, Bethesda, Maryland, USA
|
15
|
Giolai M, Paajanen P, Verweij W, Percival-Alwyn L, Baker D, Witek K, Jupe F, Bryan G, Hein I, Jones JDG, Clark MD. Targeted capture and sequencing of gene-sized DNA molecules. Biotechniques 2016; 61:315-322. [PMID: 27938323 DOI: 10.2144/000114484] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 04/26/2016] [Accepted: 10/06/2016] [Indexed: 11/23/2022] Open
Abstract
Targeted capture provides an efficient and sensitive means for sequencing specific genomic regions in a high-throughput manner. To date, this method has mostly been used to capture exons from the genome (the exome) using short insert libraries and short-read sequencing technology, enabling the identification of genetic variants or new members of large gene families. Sequencing larger molecules results in the capture of whole genes, including intronic and intergenic sequences that are typically more polymorphic and allow the resolution of the gene structure of homologous genes, which are often clustered together on the chromosome. Here, we describe an improved method for the capture and single-molecule sequencing of DNA molecules as large as 7 kb by means of size selection and optimized PCR conditions. Our approach can be used to capture, sequence, and distinguish between similar members of the NB-LRR gene family-key genes in plant immune systems.
Affiliation(s)
- Michael Giolai
  - Earlham Institute (EI), Norwich Research Park, Norwich, UK
- Walter Verweij
  - Earlham Institute (EI), Norwich Research Park, Norwich, UK
- David Baker
  - Earlham Institute (EI), Norwich Research Park, Norwich, UK
- Kamil Witek
  - The Sainsbury Laboratory, Norwich Research Park, Norwich, UK
- Florian Jupe
  - The Sainsbury Laboratory, Norwich Research Park, Norwich, UK
  - Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA
- Ingo Hein
  - The James Hutton Institute, Dundee, UK
- Matthew D Clark
  - Earlham Institute (EI), Norwich Research Park, Norwich, UK
  - School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
|