1
Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024; 23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open
Abstract
Third-generation sequencing (TGS) technologies have revolutionized genome science in the past decade. However, the long-read data produced by TGS platforms suffer from a much higher error rate than that of the previous technologies, thus complicating the downstream analysis. Several error correction tools for long-read data have been developed; these tools can be categorized into hybrid and self-correction tools. So far, these two types of tools have been investigated separately, and their interplay remains understudied. Here, we integrate hybrid and self-correction methods for high-quality error correction. Our procedure leverages the inter-similarity between long-read data and high-accuracy information from short reads. We compare the performance of our method and state-of-the-art error correction tools on Escherichia coli and Arabidopsis thaliana datasets. The results show that the integration approach outperformed the existing error correction methods and holds promise for improving the quality of downstream analyses in genomic research.
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
- Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
- Binshuang Zheng
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
- Rong Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
- Xiaocai Zhang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 138632, Singapore, Singapore
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
2
Lou F, Ren Z, Tang Y, Han Z. Full-length transcriptome reveals the circularly polarized light response-related molecular genetic characteristics of Oratosquilla oratoria. Comparative Biochemistry and Physiology. Part D, Genomics & Proteomics 2024; 49:101183. [PMID: 38141370 DOI: 10.1016/j.cbd.2023.101183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/16/2023] [Accepted: 12/16/2023] [Indexed: 12/25/2023]
Abstract
The mantis shrimp is the only animal that can recognize circularly polarized light (CPL), but its molecular genetic characteristics are unclear. Multi-tissue level full-length (FL) transcriptome sequencing of Oratosquilla oratoria, a representative widely distributed mantis shrimp, was performed in the present study. We used comparative transcriptomics to explore the critical genes of O. oratoria selected by CPL and the GNβ gene associated with CPL signal transduction was hypothesized to be positively selected. Furthermore, the FL transcriptomes of O. oratoria compound eyes under five light conditions were sequenced and used to detect alternative splicing (AS). The ASs associated with CPL recognition mainly occurred in the LWS, ARR and TRPC regions. The number of FL transcripts with AS events and annotation information also provided evidence that O. oratoria could recognize LCPL. Additionally, 51 sequences belonging to the LWS, UV and Peropsin gene families were identified based on conserved 7tm domains. The LWS, UV and Peropsin opsins have similar 3D structures with seven domains across the cell membrane and conserved KSLRTPSN, DRY, and QAKK motifs. In conclusion, these results are undoubtedly valuable for perfecting the vision theory of O. oratoria and other mantis shrimp.
Affiliation(s)
- Fangrui Lou
- School of Ocean, Yantai University, Yantai 264003, Shandong, China.
- Zhongjie Ren
- School of Ocean, Yantai University, Yantai 264003, Shandong, China
- Yongzheng Tang
- School of Ocean, Yantai University, Yantai 264003, Shandong, China
- Zhiqiang Han
- Fishery College, Zhejiang Ocean University, Zhoushan 316022, Zhejiang, China.
3
Liu S, Ebel ER, Luniewski A, Zulawinska J, Simpson ML, Kim J, Ene N, Braukmann TWA, Congdon M, Santos W, Yeh E, Guler JL. Direct long read visualization reveals metabolic interplay between two antimalarial drug targets. bioRxiv: the preprint server for biology 2023:2023.02.13.528367. [PMID: 36824743 PMCID: PMC9948948 DOI: 10.1101/2023.02.13.528367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Increases in the copy number of large genomic regions, termed genome amplification, are an important adaptive strategy for malaria parasites. Numerous amplifications across the Plasmodium falciparum genome contribute directly to drug resistance or impact the fitness of this protozoan parasite. During the characterization of parasite lines with amplifications of the dihydroorotate dehydrogenase (DHODH) gene, we detected increased copies of an additional genomic region that encompassed 3 genes (~5 kb) including GTP cyclohydrolase I (GCH1 amplicon). While this gene is reported to increase the fitness of antifolate resistant parasites, GCH1 amplicons had not previously been implicated in any other antimalarial resistance context. Here, we further explored the association between GCH1 and DHODH copy number. Using long read sequencing and single read visualization, we directly observed a higher number of tandem GCH1 amplicons in parasites with increased DHODH copies (up to 9 amplicons) compared to parental parasites (3 amplicons). While all GCH1 amplicons shared a consistent structure, expansions arose in 2-unit steps (from 3 to 5 to 7, etc copies). Adaptive evolution of DHODH and GCH1 loci was further bolstered when we evaluated prior selection experiments; DHODH amplification was only successful in parasite lines with pre-existing GCH1 amplicons. These observations, combined with the direct connection between metabolic pathways that contain these enzymes, lead us to propose that the GCH1 locus is beneficial for the fitness of parasites exposed to DHODH inhibitors. This finding highlights the importance of studying variation within individual parasite genomes as well as biochemical connections of drug targets as novel antimalarials move towards clinical approval.
Affiliation(s)
- Shiwei Liu
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Current affiliation: Indiana University School of Medicine, Indianapolis, IN, USA
- Emily R. Ebel
- Stanford, Departments of Pediatrics and Microbiology & Immunology, Stanford, CA, USA
- Julia Zulawinska
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Jane Kim
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Nnenna Ene
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Molly Congdon
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
- Webster Santos
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
- Ellen Yeh
- Stanford University, Departments of Pathology and Microbiology & Immunology, Stanford, CA, USA
- Jennifer L. Guler
- University of Virginia, Department of Biology, Charlottesville, VA, USA
4
Greenberg G, Ravi AN, Shomorony I. LexicHash: sequence similarity estimation via lexicographic comparison of hashes. Bioinformatics 2023; 39:btad652. [PMID: 37878809 PMCID: PMC10628434 DOI: 10.1093/bioinformatics/btad652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 10/11/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION: Pairwise sequence alignment is a heavy computational burden, particularly in the context of third-generation sequencing technologies. This issue is commonly addressed by approximately estimating sequence similarities using a hash-based method such as MinHash. In MinHash, all k-mers in a read are hashed and the minimum hash value, the min-hash, is stored. Pairwise similarities can then be estimated by counting the number of min-hash matches between a pair of reads, across many distinct hash functions. The choice of the parameter k controls an important tradeoff in the task of identifying alignments: larger k-values give greater confidence in the identification of alignments (high precision) but can lead to many missing alignments (low recall), particularly in the presence of significant noise. RESULTS: In this work, we introduce LexicHash, a new similarity estimation method that is effectively independent of the choice of k and attains the high precision of large-k and the high sensitivity of small-k MinHash. LexicHash is a variant of MinHash with a carefully designed hash function. When estimating the similarity between two reads, instead of simply checking whether min-hashes match (as in standard MinHash), one checks how "lexicographically similar" the LexicHash min-hashes are. In our experiments on 40 PacBio datasets, the area under the precision-recall curves obtained by LexicHash had an average improvement of 20.9% over MinHash. Additionally, the LexicHash framework lends itself naturally to an efficient search of the largest alignments, yielding an O(n) time algorithm, and circumventing the seemingly fundamental O(n^2) scaling associated with pairwise similarity search. AVAILABILITY AND IMPLEMENTATION: LexicHash is available on GitHub at https://github.com/gcgreenberg/LexicHash.
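The standard MinHash scheme summarized in the abstract above can be sketched as follows. This is a minimal illustration of plain MinHash only, not of LexicHash and not of the authors' implementation; the function names, the use of seeded BLAKE2b digests as the family of "distinct hash functions", and the parameter values are all assumptions made for the example.

```python
import hashlib


def kmers(read, k):
    """Return the set of all length-k substrings (k-mers) of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def hash_kmer(kmer, seed):
    """Hash a k-mer to a 64-bit integer; each seed acts as a distinct hash function (assumed scheme)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(read, k, num_hashes):
    """For each hash function, store the minimum hash value over all k-mers of the read."""
    kmer_set = kmers(read, k)
    if not kmer_set:
        raise ValueError("read is shorter than k")
    return [min(hash_kmer(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_similarity(sketch_a, sketch_b):
    """Estimate similarity as the fraction of hash functions whose min-hashes match."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

A hypothetical call would build `minhash_sketch(read, k=7, num_hashes=64)` for each read and pass two sketches to `estimate_similarity`. Raising `k` in this sketch makes matches rarer but more specific, which is the precision and recall tradeoff the abstract describes.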
Affiliation(s)
- Grant Greenberg
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Aditya Narayan Ravi
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Ilan Shomorony
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
5
Niaré K, Greenhouse B, Bailey JA. An optimized GATK4 pipeline for Plasmodium falciparum whole genome sequencing variant calling and analysis. Malar J 2023; 22:207. [PMID: 37420214 PMCID: PMC10327343 DOI: 10.1186/s12936-023-04632-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/07/2023] [Accepted: 06/21/2023] [Indexed: 07/09/2023] Open
Abstract
BACKGROUND: Accurate variant calls from whole genome sequencing (WGS) of Plasmodium falciparum infections are crucial in malaria population genomics. Here a falciparum variant calling pipeline based on GATK version 4 (GATK4) was optimized and applied to 6626 public Illumina WGS samples. METHODS: Control WGS and accurate PacBio assemblies of 10 laboratory strains were leveraged to optimize parameters that control the heterozygosity, local assembly region size, ploidy, mapping and base quality in both GATK HaplotypeCaller and GenotypeGVCFs. From these controls, a high-quality training dataset was generated to recalibrate the raw variant data. RESULTS: On current high-quality samples (read length = 250 bp, insert size = 405-524 bp), the optimized pipeline shows improved sensitivity (86.6 ± 1.7% for SNPs and 82.2 ± 5.9% for indels) compared to the default GATK4 pipeline (77.7 ± 1.3% for SNPs; and 73.1 ± 5.1% for indels, adjusted P < 0.001) and previous variant calling with GATK version 3 (GATK3, 70.3 ± 3.0% for SNPs and 59.7 ± 5.8% for indels, adjusted P < 0.001). Its sensitivity on simulated mixed infection samples (80.8 ± 6.1% for SNPs and 78.3 ± 5.1% for indels) was again improved relative to default GATK4 (68.8 ± 6.0% for SNPs and 38.9 ± 0.7% for indels, adjusted P < 0.001). Precision was high and comparable across all pipelines on each type of data tested. The resulting combination of high-quality SNPs and indels increases the resolution of local population structure detection in sub-Saharan Africa. Finally, increasing ploidy improves the detection of drug resistance mutations and estimation of complexity of infection. CONCLUSIONS: Overall, this study provides an optimized falciparum GATK4 pipeline resource for variant calling which should help improve genomic studies of malaria.
Affiliation(s)
- Karamoko Niaré
- Department of Pathology and Laboratory Medicine, Brown University, Providence, RI, USA.
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA.
- Bryan Greenhouse
- EPPIcenter Program, Division of HIV, Infectious Diseases, and Global Medicine, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Jeffrey A Bailey
- Department of Pathology and Laboratory Medicine, Brown University, Providence, RI, USA.
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA.
6
Insight into molecular diagnosis for antimalarial drug resistance of Plasmodium falciparum parasites: A review. Acta Trop 2023; 241:106870. [PMID: 36849091 DOI: 10.1016/j.actatropica.2023.106870] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/08/2023] [Revised: 02/20/2023] [Accepted: 02/22/2023] [Indexed: 02/27/2023]
Abstract
Malaria is an infectious disease transmitted by the female Anopheles mosquito and poses a severe threat to human health. At present, antimalarial drugs are the primary treatment for malaria. The widespread use of artemisinin-based combination therapies (ACTs) has dramatically reduced the number of malaria-related deaths; however, the emergence of resistance has the potential to reverse this progress. Accurate and timely diagnosis of drug-resistant strains of Plasmodium parasites via detecting molecular markers (such as Pfnhe1, Pfmrp, Pfcrt, Pfmdr1, Pfdhps, Pfdhfr, and Pfk13) is essential for malaria control and elimination. Here, we review the current techniques commonly used for molecular diagnosis of antimalarial resistance in P. falciparum and discuss their sensitivities and specificities for different drug resistance-associated molecular markers, with the aim of providing insights into possible directions for future precise point-of-care testing (POCT) of antimalarial drug resistance of malaria parasites.
7
Niaré K, Greenhouse B, Bailey JA. An Optimized GATK4 Pipeline for Plasmodium falciparum Whole Genome Sequencing Variant Calling and Analysis. Research Square 2023:rs.3.rs-2561857. [PMID: 36824880 PMCID: PMC9949269 DOI: 10.21203/rs.3.rs-2561857/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
Background: Accurate variant calls from whole genome sequencing (WGS) of Plasmodium falciparum infections are crucial in malaria population genomics. Here we optimized a falciparum variant calling pipeline based on GATK version 4 (GATK4) and applied it to 6,626 public Illumina WGS samples. Methods: We optimized parameters that control the heterozygosity, local assembly region size, ploidy, mapping and base quality in both GATK HaplotypeCaller and GenotypeGVCFs leveraging control WGS and accurate PacBio assemblies of 10 laboratory strains. From these controls we generated a high-quality training dataset to recalibrate the raw variant data. Results: On current high-quality samples (read length = 250 bp, insert size = 405-524 bp), we show improved sensitivity (86.6 ± 1.7% for SNPs and 82.2 ± 5.9% for indels) compared to the default GATK4 pipeline (77.7 ± 1.3% for SNPs; and 73.1 ± 5.1% for indels, adjusted P < 0.001) and previous variant calling with GATK version 3 (GATK3, 70.3 ± 3.0% for SNPs and 59.7 ± 5.8% for indels, adjusted P < 0.001). The sensitivity of our pipeline on simulated mixed infection samples (80.8 ± 6.1% for SNPs and 78.3 ± 5.1% for indels) was again improved relative to default GATK4 (68.8 ± 6.0% for SNPs and 38.9 ± 0.7% for indels, adjusted P < 0.001). Precision was high and comparable across all pipelines on each type of data tested. We further show that using the combination of high-quality SNPs and indels increases the resolution of local population structure detection in sub-Saharan Africa. We finally demonstrate that increasing ploidy improves the detection of drug resistance mutations and estimation of complexity of infection. Conclusions: Overall, we provide an optimized GATK4 pipeline and resource for falciparum variant calling which should help improve genomic studies of malaria.
8
Brashear AM, Cui L. Population genomics in neglected malaria parasites. Front Microbiol 2022; 13:984394. [PMID: 36160257 PMCID: PMC9493318 DOI: 10.3389/fmicb.2022.984394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Malaria elimination includes neglected human malaria parasites Plasmodium vivax, Plasmodium ovale spp., and Plasmodium malariae. Biological features such as association with low-density infection and the formation of hypnozoites responsible for relapse make their elimination challenging. Studies on these parasites rely primarily on clinical samples due to the lack of long-term culture techniques. With improved methods to enrich parasite DNA from clinical samples, whole-genome sequencing of the neglected malaria parasites has gained increasing popularity. Population genomics of more than 2200 P. vivax global isolates has improved our knowledge of parasite biology and host-parasite interactions, identified vaccine targets and potential drug resistance markers, and provided a new way to track parasite migration and introduction and monitor the evolutionary response of local populations to elimination efforts. Here, we review advances in population genomics for neglected malaria parasites, discuss how the rich genomic information is being used to understand parasite biology and epidemiology, and explore opportunities for the applications of malaria genomic data in malaria elimination practice.
9
Akoniyon OP, Adewumi TS, Maharaj L, Oyegoke OO, Roux A, Adeleke MA, Maharaj R, Okpeku M. Whole Genome Sequencing Contributions and Challenges in Disease Reduction Focused on Malaria. Biology 2022; 11:587. [PMID: 35453786 PMCID: PMC9027812 DOI: 10.3390/biology11040587] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/13/2022] [Revised: 03/31/2022] [Accepted: 04/01/2022] [Indexed: 12/11/2022]
Abstract
Malaria elimination remains an important goal that requires the adoption of sophisticated science and management strategies in the era of the COVID-19 pandemic. The advent of next generation sequencing (NGS) is making whole genome sequencing (WGS) a standard today in the field of life sciences, as PCR genotyping and targeted sequencing provide insufficient information compared to the whole genome. Thus, adapting WGS approaches to malaria parasites is pertinent to studying the epidemiology of the disease, as different regions are at different phases in their malaria elimination agenda. Therefore, this review highlights the applications of WGS in disease management and the challenges of WGS in controlling malaria parasites, and outlines the roles of WGS in the pursuit of malaria reduction and elimination. WGS has had an invaluable impact on malaria research and has helped countries reach the elimination phase rapidly by providing the information needed to thwart transmission, pathology, and drug resistance. However, to eliminate malaria in sub-Saharan Africa (SSA), with high malaria transmission, we recommend that WGS machines should be readily available and affordable in the region.
Affiliation(s)
- Olusegun Philip Akoniyon
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Taiye Samson Adewumi
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Leah Maharaj
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Olukunle Olugbenle Oyegoke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Alexandra Roux
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Matthew A. Adeleke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Rajendra Maharaj
- Office of Malaria Research, South African Medical Research Council, Cape Town 7505, South Africa
- Moses Okpeku
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
10
Baptista RP, Li Y, Sateriale A, Sanders MJ, Brooks KL, Tracey A, Ansell BRE, Jex AR, Cooper GW, Smith ED, Xiao R, Dumaine JE, Georgeson P, Pope BJ, Berriman M, Striepen B, Cotton JA, Kissinger JC. Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal expanded transporter repertoire and duplication of entire chromosome ends including subtelomeric regions. Genome Res 2022; 32:203-213. [PMID: 34764149 PMCID: PMC8744675 DOI: 10.1101/gr.275325.121] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/11/2021] [Accepted: 11/10/2021] [Indexed: 11/25/2022]
Abstract
Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design, and interpretation. We have generated a new C. parvum IOWA genome assembly supported by Pacific Biosciences (PacBio) and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species: C. parvum, Cryptosporidium hominis, and Cryptosporidium tyzzeri. We made 1926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns, and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis, and C. tyzzeri revealed that most "missing" orthologs are found, suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation, and single-nucleotide variants (SNVs). Using the new assembly and annotation as reference, 190 genes are identified as evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap free, and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplication of entire chromosome ends including subtelomeric regions revealing a new level of genome plasticity that will both inform and impact future research.
Affiliation(s)
- Rodrigo P Baptista
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
- Yiran Li
- Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
- Adam Sateriale
- Department of Pathology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Mandy J Sanders
- The Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom
- Karen L Brooks
- The Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom
- Alan Tracey
- The Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom
- Brendan R E Ansell
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne and Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Australia
- Aaron R Jex
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne and Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Australia
- Garrett W Cooper
- Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, Melbourne VIC 3000, Australia
- Ethan D Smith
- Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, Melbourne VIC 3000, Australia
- Rui Xiao
- Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
- Jennifer E Dumaine
- Department of Pathology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Peter Georgeson
- Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, Melbourne VIC 3000, Australia
- Melbourne Bioinformatics, The University of Melbourne, Parkville VIC 3010, Australia
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Melbourne VIC 3000, Australia
- Bernard J Pope
- Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, Melbourne VIC 3000, Australia
- Melbourne Bioinformatics, The University of Melbourne, Parkville VIC 3010, Australia
- Department of Surgery (Royal Melbourne Hospital), Melbourne Medical School, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne 3010, Australia
- Department of Medicine, Central Clinical School, Faculty of Medicine Nursing and Health Sciences, Monash University, Melbourne 3004, Australia
- Matthew Berriman
- The Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom
- Boris Striepen
- Department of Pathology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- James A Cotton
- The Wellcome Sanger Institute, Hinxton, CB10 1SA, United Kingdom
- Jessica C Kissinger
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
- Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
11
Zhang X, Deitsch KW, Dzikowski R. CRISPR-Cas9 Editing of the Plasmodium falciparum Genome: Special Applications. Methods Mol Biol 2022; 2470:241-253. [PMID: 35881350 DOI: 10.1007/978-1-0716-2189-9_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
The virulence of Plasmodium falciparum has been attributed in large part to the expression on the surface of infected red blood cells of the variant surface antigen Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1). Different forms of this protein are encoded by individual members of the multicopy gene family called var. Two attributes of the var gene family are key to the pathogenesis of malaria caused by P. falciparum: the hyperrecombinogenic nature of the var gene family that continuously generates antigenic diversity within parasite populations, and the ability of parasites to express only a single var gene at a time and to switch which gene is expressed over the course of an infection. The unique attributes of CRISPR-Cas9 have been applied to help decipher the molecular mechanisms underlying these unusual properties of the var gene family, both as a source of the DNA double strand breaks that initiate var gene recombination and as a way to recruit molecular probes to specific regions of the genome. In this chapter, we describe these somewhat unusual applications of the CRISPR-Cas9 system.
Affiliation(s)
- Xu Zhang
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, NY, USA
- Kirk William Deitsch
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, NY, USA
- Ron Dzikowski
- Department of Microbiology and Molecular Genetics, The Institute for Medical Research Israel-Canada, The Kuvin Center for the Study of Infectious and Tropical Diseases, Hebrew University-Hadassah Medical School, Jerusalem, Israel.
12
Li T, Zhang X, Guo L, Qi T, Tang H, Wang H, Qiao X, Zhang M, Zhang B, Feng J, Zuo Z, Zhang Y, Xing C, Wu J. Single-molecule real-time transcript sequencing of developing cotton anthers facilitates genome annotation and fertility restoration candidate gene discovery. Genomics 2021; 113:4245-4253. [PMID: 34793949 DOI: 10.1016/j.ygeno.2021.11.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/03/2020] [Revised: 07/04/2021] [Accepted: 11/10/2021] [Indexed: 01/23/2023]
Abstract
Heterosis refers to the superior phenotypes observed in hybrids. The cytoplasmic male sterility (CMS) system plays an important role in cotton heterosis utilization. However, the global gene expression patterns of CMS-D2 and its interaction with the restorer gene Rf1 remain unclear. Here, full-length transcript sequencing was performed in anthers of the CMS-D2 restorer line using PacBio single-molecule real-time sequencing technology. Combining PacBio SMRT long-read isoforms and Illumina RNA-seq data, 107,066 isoforms from 44,338 loci were obtained, including 10,086 novel isoforms of novel genes and 66,419 new isoforms of known genes. In total, 56,572 alternative splicing (AS) events, 1146 lncRNAs, 61 fusion transcripts, 10,466 genes exhibiting alternative polyadenylation (APA), and 60,995 novel isoforms with predicted open reading frames (ORFs) were further identified. Furthermore, genes specifically expressed in the restorer line were selected and confirmed by qRT-PCR. These findings provide a basis for upland cotton genome annotation and transcriptome research, and will help to reveal the molecular mechanism of interaction between Rf1 and CMS-D2 cytoplasm.
Affiliation(s)
- Ting Li
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou 450000, Henan, China
| | - Xuexian Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Liping Guo
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Tingxiang Qi
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Huini Tang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Hailin Wang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China
| | - Xiuqin Qiao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Meng Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China
| | - Bingbing Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China
| | - Juanjuan Feng
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China
| | - Zhidan Zuo
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China
| | - Yongjie Zhang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou 450000, Henan, China
| | - Chaozhu Xing
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
| | - Jianyong Wu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou 450000, Henan, China; State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory for Cotton Genetic Improvement, Ministry of Agriculture, 38 Huanghe Dadao, Anyang 455000, Henan, China.
|
13
|
Alternative splicing landscape of small brown planthopper and different response of JNK2 isoforms to rice stripe virus infection. J Virol 2021; 96:e0171521. [PMID: 34757837 DOI: 10.1128/jvi.01715-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Alternative splicing (AS) is a frequent posttranscriptional regulatory event occurring in response to various endogenous and exogenous stimuli in most eukaryotic organisms. However, little is known about the effects of insect-transmitted viruses on AS events in insect vectors. The present study used third-generation sequencing technology and RNA sequencing (RNA-Seq) to evaluate the AS response in the small brown planthopper Laodelphax striatellus to rice stripe virus (RSV). The full-length transcriptome of L. striatellus was obtained using single-molecule real-time sequencing technology (SMRT). Posttranscriptional regulatory events, including AS, alternative polyadenylation, and fusion transcripts, were analyzed. A total of 28,175 nonredundant transcript isoforms included 24,950 transcripts assigned to 8,500 annotated genes of L. striatellus, and 5,000 of these genes (58.8%) had AS events. RNA-Seq of the gut samples of insects infected by RSV for 8 d identified 3,458 differentially expressed transcripts (DETs); 2,185 of these DETs were transcribed from 1,568 genes that had AS events, indicating that 31.4% of alternatively spliced genes responded to RSV infection of the gut. One of the c-Jun N-terminal kinase (JNK) genes, JNK2, experienced exon skipping, resulting in three transcript isoforms. These three isoforms differentially responded to RSV infection during development and in various organs. Injection of double-stranded RNAs targeting all or two isoforms indicated that three or at least two JNK2 isoforms facilitated RSV accumulation in planthoppers. These results implied that AS events could participate in the regulation of complex relationships between viruses and insect vectors. Importance Alternative splicing (AS) is a regulatory mechanism that occurs after gene transcription. AS events can enrich protein diversity to promote the reactions of the organisms to various endogenous and exogenous stimulations. 
It is not known how insect vectors exploit AS events to cope with transmitted viruses. The present study used third-generation sequencing technology to obtain the profile of AS events in the small brown planthopper Laodelphax striatellus, which is an efficient vector for rice stripe virus (RSV). The results indicated that 31.4% of alternatively spliced genes responded to RSV infection in the gut of planthoppers. One of the c-Jun N-terminal kinase (JNK) genes, JNK2, produced three transcript isoforms by AS. These three isoforms showed different responses to RSV infection, and at least two isoforms facilitated viral accumulation in planthoppers. These results implied that AS events could participate in the regulation of complex relationships between viruses and insect vectors.
|
14
|
Wang W, Wang L, Wang L, Tan M, Ogutu CO, Yin Z, Zhou J, Wang J, Wang L, Yan X. Transcriptome analysis and molecular mechanism of linseed (Linum usitatissimum L.) drought tolerance under repeated drought using single-molecule long-read sequencing. BMC Genomics 2021; 22:109. [PMID: 33563217 PMCID: PMC7871411 DOI: 10.1186/s12864-021-07416-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/26/2019] [Accepted: 01/29/2021] [Indexed: 12/13/2022] Open
Abstract
Background Oil flax (linseed, Linum usitatissimum L.) is one of the most important oil crops. However, the increase in drought resulting from climate change has dramatically reduced linseed yield and quality, and very little is known about how linseed coordinates the expression of drought resistance genes in response to different levels of drought stress (DS) at the genome-wide level. Results To explore the linseed transcriptional response to DS and repeated drought (RD) stress, we determined the drought tolerance of different linseed varieties. Then we performed full-length transcriptome sequencing of a drought-resistant variety (Z141) and a drought-sensitive variety (NY-17) under DS and RD stress at the seedling stage using single-molecule real-time sequencing and RNA-sequencing. Gene Ontology (GO) and reduce and visualize GO (REVIGO) enrichment analysis showed that upregulated genes of Z141 were enriched in more functional pathways related to plant drought tolerance than those of NY-17 under DS. In addition, 4436 linseed transcription factors were identified, and 1190 were responsive to stress treatments. Moreover, protein-protein interaction (PPI) network analysis showed that the proline biosynthesis pathway interacts with stress response genes through RAD50 (DNA repair protein 50) interacting protein 1 (RIN-1). Finally, proline biosynthesis and DNA repair structural gene expression patterns were verified by RT-PCR. Conclusions The drought tolerance of Z141 may be related to its upregulation of drought tolerance genes under DS. Proline may play an important role in linseed drought tolerance by maintaining cell osmotic and protecting DNA from ROS damage. In summary, this study provides a new perspective to understand the drought adaptability of linseed. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07416-5.
Affiliation(s)
- Wei Wang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Lei Wang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Ling Wang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Meilian Tan
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Collins O Ogutu
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Wuhan, 430074, China
| | - Ziyan Yin
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Jian Zhou
- Wuhan Igenebook Biotechnology Co.,Ltd, Wuhan, 430075, China
| | - Jiaomei Wang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Lijun Wang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China
| | - Xingchu Yan
- Key Laboratory of Biology and Genetic Improvement of Oil Crops of Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of Chinese Academy of Agricultural Science, Wuhan, 430062, China.
|
15
|
Liu J, Wang J, Xiao X, Lai X, Dai D, Zhang X, Zhu X, Zhao Z, Wang J, Li Z. A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model. BMC Genomics 2020; 21:753. [PMID: 33208104 PMCID: PMC7677778 DOI: 10.1186/s12864-020-07008-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022] Open
Abstract
Background The emergence of third-generation sequencing technology, featuring longer read lengths, represents a great advance over next-generation sequencing technology and has greatly promoted biological research. However, third-generation sequencing data have a high sequencing error rate, which inevitably affects downstream analysis. Although sequencing error rates have improved in recent years, large amounts of data were produced with high error rates, and discarding them would be a huge waste. Thus, error correction for third-generation sequencing data is especially important. Existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. There is therefore a lack of error correction algorithms for heterozygous loci, especially at low coverage. Results In this article, we propose an error correction method named QIHC. QIHC is a hybrid correction method, which needs both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high accuracy in error correction. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core module of QIHC, which is designed to handle the calculations for multiple cases at a heterozygous site. The other two modules enable the reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by existing error correction methods.
Conclusions To verify the performance of our method, we selected Canu and Jabba to compare with QIHC in several aspects. Because QIHC is a hybrid correction method, we first conducted a group of experiments under different coverages of the next-generation sequencing data. QIHC is far ahead of Jabba in accuracy. We then varied the coverage of the third-generation sequencing data and again compared the performance of Canu, Jabba and QIHC. QIHC outperforms the other two methods in the accuracy of both correcting sequencing errors and identifying heterozygous sites, especially at low coverage. We also carried out a comparison between Canu and QIHC at different error rates of the third-generation sequencing data. QIHC still performs better. Therefore, QIHC is superior to the existing error correction methods when heterozygous sites exist.
Affiliation(s)
- Jiaqi Liu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Jiayin Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.
| | - Xiao Xiao
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,School of Public Policy and Administration, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xin Lai
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Daocheng Dai
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xuanping Zhang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Xiaoyan Zhu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Zhongmeng Zhao
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Shaanxi Engineering Research Center of Medical and Health Big Data, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
| | - Juan Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Annoroad Gene Institute, Beijing, 100176, China
| | - Zhimin Li
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China.,Annoroad Gene Institute, Beijing, 100176, China.
|
16
|
Lateral Gene Transfer Mechanisms and Pan-genomes in Eukaryotes. Trends Parasitol 2020; 36:927-941. [DOI: 10.1016/j.pt.2020.07.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/24/2020] [Revised: 07/20/2020] [Accepted: 07/20/2020] [Indexed: 02/06/2023]
|
17
|
Chappell L, Ross P, Orchard L, Russell TJ, Otto TD, Berriman M, Rayner JC, Llinás M. Refining the transcriptome of the human malaria parasite Plasmodium falciparum using amplification-free RNA-seq. BMC Genomics 2020; 21:395. [PMID: 32513207 PMCID: PMC7278070 DOI: 10.1186/s12864-020-06787-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 12/09/2019] [Accepted: 05/19/2020] [Indexed: 12/24/2022] Open
Abstract
Background Plasmodium parasites undergo several major developmental transitions during their complex lifecycle, which are enabled by precisely ordered gene expression programs. Transcriptomes from the 48-h blood stages of the major human malaria parasite Plasmodium falciparum have been described using cDNA microarrays and RNA-seq, but these assays have not always performed well within non-coding regions, where the AT-content is often 90–95%. Results We developed a directional, amplification-free RNA-seq protocol (DAFT-seq) to reduce bias against AT-rich cDNA, which we have applied to three strains of P. falciparum (3D7, HB3 and IT). While strain-specific differences were detected, overall there is strong conservation between the transcriptional profiles. For the 3D7 reference strain, transcription was detected from 89% of the genome, with over 78% of the genome transcribed into mRNAs. We also find that transcription from bidirectional promoters frequently results in non-coding, antisense transcripts. These datasets allowed us to refine the 5′ and 3′ untranslated regions (UTRs), which can be variable, long (> 1000 nt), and often overlap those of adjacent transcripts. Conclusions The approaches applied in this study allow a refined description of the transcriptional landscape of P. falciparum and demonstrate that very little of the densely packed P. falciparum genome is inactive or redundant. By capturing the 5′ and 3′ ends of mRNAs, we reveal both constant and dynamic use of transcriptional start sites across the intraerythrocytic developmental cycle that will be useful in guiding the definition of regulatory regions for use in future experimental gene expression studies.
Affiliation(s)
- Lia Chappell
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Philipp Ross
- Department of Biochemistry & Molecular Biology and Huck Center for Malaria Research, Pennsylvania State University, University Park, PA, 16802, USA.,Present Address: Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL, 60637, USA
| | - Lindsey Orchard
- Department of Biochemistry & Molecular Biology and Huck Center for Malaria Research, Pennsylvania State University, University Park, PA, 16802, USA
| | - Timothy J Russell
- Department of Biochemistry & Molecular Biology and Huck Center for Malaria Research, Pennsylvania State University, University Park, PA, 16802, USA
| | - Thomas D Otto
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.,Present Address: Institute of Infection, Immunity and Inflammation, MVLS, University of Glasgow, Glasgow, G12 8TA, UK
| | - Matthew Berriman
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Julian C Rayner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.,Present Address: Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, UK
| | - Manuel Llinás
- Department of Biochemistry & Molecular Biology and Huck Center for Malaria Research, Pennsylvania State University, University Park, PA, 16802, USA.,Department of Chemistry, Pennsylvania State University, University Park, PA, 16802, USA.
|
18
|
Hammam E, Ananda G, Sinha A, Scheidig-Benatar C, Bohec M, Preiser PR, Dedon PC, Scherf A, Vembar SS. Discovery of a new predominant cytosine DNA modification that is linked to gene expression in malaria parasites. Nucleic Acids Res 2020; 48:184-199. [PMID: 31777939 PMCID: PMC6943133 DOI: 10.1093/nar/gkz1093] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/31/2019] [Revised: 10/09/2019] [Accepted: 11/05/2019] [Indexed: 12/13/2022] Open
Abstract
DNA cytosine modifications are key epigenetic regulators of cellular processes in mammalian cells, with their misregulation leading to varied disease states. In the human malaria parasite Plasmodium falciparum, a unicellular eukaryotic pathogen, little is known about the predominant cytosine modifications, cytosine methylation (5mC) and hydroxymethylation (5hmC). Here, we report the first identification of a hydroxymethylcytosine-like (5hmC-like) modification in P. falciparum asexual blood stages using a suite of biochemical methods. In contrast to mammalian cells, we report 5hmC-like levels in the P. falciparum genome of 0.2–0.4%, which are significantly higher than the methylated cytosine (mC) levels of 0.01–0.05%. Immunoprecipitation of hydroxymethylated DNA followed by next generation sequencing (hmeDIP-seq) revealed that 5hmC-like modifications are enriched in gene bodies with minimal dynamic changes during asexual development. Moreover, levels of the 5hmC-like base in gene bodies positively correlated to transcript levels, with more than 2000 genes stably marked with this modification throughout asexual development. Our work highlights the existence of a new predominant cytosine DNA modification pathway in P. falciparum and opens up exciting avenues for gene regulation research and the development of antimalarials.
Affiliation(s)
- Elie Hammam
- Institut Pasteur, 75015 Paris, France.,CNRS ERL9195, 75015 Paris, France.,INSERM U1201, 75015 Paris, France.,Sorbonne Université, Ecole doctorale Complexité du Vivant ED515, F-75005 Paris, France
| | - Guruprasad Ananda
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Ameya Sinha
- Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore 138602, Singapore.,School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Christine Scheidig-Benatar
- Institut Pasteur, 75015 Paris, France.,CNRS ERL9195, 75015 Paris, France.,INSERM U1201, 75015 Paris, France
| | - Mylene Bohec
- Institut Curie Genomics of Excellence (ICGex) Platform, Institut Curie Research Center, 75005 Paris, France
| | - Peter R Preiser
- Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore 138602, Singapore.,School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Peter C Dedon
- Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore 138602, Singapore.,Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Artur Scherf
- Institut Pasteur, 75015 Paris, France.,CNRS ERL9195, 75015 Paris, France.,INSERM U1201, 75015 Paris, France
| | - Shruthi S Vembar
- Institut Pasteur, 75015 Paris, France.,CNRS ERL9195, 75015 Paris, France.,INSERM U1201, 75015 Paris, France
|
19
|
Wang A, Au KF. Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads. Genome Biol 2020; 21:14. [PMID: 31952552 PMCID: PMC6966875 DOI: 10.1186/s13059-019-1885-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2019] [Accepted: 11/10/2019] [Indexed: 11/10/2022] Open
Abstract
The error-prone third-generation sequencing (TGS) long reads can be corrected by the high-quality second-generation sequencing (SGS) short reads, which is referred to as hybrid error correction. We here investigate the influences of the principal algorithmic factors of two major types of hybrid error correction methods by mathematical modeling and analysis on both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long read error rate. We also demonstrate that the original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods.
Affiliation(s)
- Anqi Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA.
- Department of Biostatistics, University of Iowa, Iowa City, IA, 52242, USA.
|
20
|
Moser KA, Drábek EF, Dwivedi A, Stucke EM, Crabtree J, Dara A, Shah Z, Adams M, Li T, Rodrigues PT, Koren S, Phillippy AM, Munro JB, Ouattara A, Sparklin BC, Dunning Hotopp JC, Lyke KE, Sadzewicz L, Tallon LJ, Spring MD, Jongsakul K, Lon C, Saunders DL, Ferreira MU, Nyunt MM, Laufer MK, Travassos MA, Sauerwein RW, Takala-Harrison S, Fraser CM, Sim BKL, Hoffman SL, Plowe CV, Silva JC. Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential. Genome Med 2020; 12:6. [PMID: 31915075 PMCID: PMC6950926 DOI: 10.1186/s13073-019-0708-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/26/2019] [Accepted: 12/19/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have been shown to provide significant protection against controlled human malaria infection (CHMI) in clinical trials. Initial CHMI studies showed significantly higher durable protection against homologous than heterologous strains, suggesting the presence of strain-specific vaccine-induced protection. However, interpretation of these results and understanding of their relevance to vaccine efficacy have been hampered by the lack of knowledge on genetic differences between vaccine and CHMI strains, and how these strains are related to parasites in malaria endemic regions. METHODS Whole genome sequencing using long-read (Pacific Biosciences) and short-read (Illumina) sequencing platforms was conducted to generate de novo genome assemblies for the vaccine strain, NF54, and for strains used in heterologous CHMI (7G8 from Brazil, NF166.C8 from Guinea, and NF135.C10 from Cambodia). The assemblies were used to characterize sequences in each strain relative to the reference 3D7 (a clone of NF54) genome. Strains were compared to each other and to a collection of clinical isolates (sequenced as part of this study or from public repositories) from South America, sub-Saharan Africa, and Southeast Asia. RESULTS While few variants were detected between 3D7 and NF54, we identified tens of thousands of variants between NF54 and the three heterologous strains. These variants include SNPs, indels, and small structural variants that fall in regulatory and immunologically important regions, including transcription factors (such as PfAP2-L and PfAP2-G) and pre-erythrocytic antigens that may be key for sporozoite vaccine-induced protection. Additionally, these variants directly contributed to diversity in immunologically important regions of the genomes as detected through in silico CD8+ T cell epitope predictions. 
Of all heterologous strains, NF135.C10 had the highest number of unique predicted epitope sequences when compared to NF54. Comparison to global clinical isolates revealed that these four strains are representative of their geographic origin despite long-term culture adaptation; of note, NF135.C10 is from an admixed population, and not part of recently formed subpopulations resistant to artemisinin-based therapies present in the Greater Mekong Sub-region. CONCLUSIONS These results will assist in the interpretation of vaccine efficacy of whole-organism vaccines against homologous and heterologous CHMI.
Affiliation(s)
- Kara A. Moser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Present address: Institute for Global Health and Infectious Diseases, University of North Carolina Chapel Hill, Chapel Hill, USA
| | - Elliott F. Drábek
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Ankit Dwivedi
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Emily M. Stucke
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Antoine Dara
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Zalak Shah
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Matthew Adams
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Tao Li
- Sanaria, Inc., Rockville, MD 20850 USA
| | - Priscila T. Rodrigues
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD 20892 USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD 20892 USA
| | - James B. Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Amed Ouattara
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Benjamin C. Sparklin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Julie C. Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Kirsten E. Lyke
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Lisa Sadzewicz
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Luke J. Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Michele D. Spring
- Department of Bacterial and Parasitic Diseases, Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
| | - Krisada Jongsakul
- Department of Bacterial and Parasitic Diseases, Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
| | - Chanthap Lon
- Department of Bacterial and Parasitic Diseases, Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
| | - David L. Saunders
- Department of Bacterial and Parasitic Diseases, Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Present address: Warfighter Expeditionary Medicine and Treatment, US Army Medical Material Development Activity, Frederick, USA
| | - Marcelo U. Ferreira
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
| | - Myaing M. Nyunt
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Present address: Duke Global Health Institute, Duke University, Durham, NC 27708 USA
| | - Miriam K. Laufer
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Mark A. Travassos
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Robert W. Sauerwein
- Department of Medical Microbiology, Radboud University Medical Center, Nijmegen, Netherlands
| | - Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Claire M. Fraser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
| | - Christopher V. Plowe
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Present address: Duke Global Health Institute, Duke University, Durham, NC 27708 USA
| | - Joana C. Silva
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201 USA
|
21
|
Lou F, Song N, Han Z, Gao T. Single-molecule real-time (SMRT) sequencing facilitates Tachypleus tridentatus genome annotation. Int J Biol Macromol 2020; 147:89-97. [PMID: 31923512 DOI: 10.1016/j.ijbiomac.2020.01.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/12/2019] [Revised: 01/04/2020] [Accepted: 01/04/2020] [Indexed: 12/19/2022]
Abstract
Tachypleus tridentatus is a keystone species in marine ecosystems. Its hemolymph also provides the limulus amebocyte lysate (LAL) used to detect bacterial endotoxin in human medical services. Here we combined SMRT sequencing and Illumina RNA-seq to characterize novel isoforms, novel genetic loci, fusion isoform formation and transcriptome structure, and thereby to unveil the transcriptome complexity of T. tridentatus. We identified 26,705 non-redundant isoforms from 10,919 genetic loci, including 25,713 novel isoforms, 2403 novel genes and 170 fusion isoforms. In addition, 1578 novel genes and 23,172 novel isoforms were annotated in the NR, Pfam, KOG, COG, eggNOG, Swiss-Prot, KEGG and GO databases. We also obtained 4671 gene family clusters based on genetic loci. Furthermore, 17,296 APAs, 4887 AS events, 1054 lncRNAs, and 1435 TFs were identified in the T. tridentatus long-read transcriptome, and the target genes of the 1054 lncRNA sequences were predicted. Overall, our work provides the first long-read transcriptome, and these data are needed to improve the annotation of the T. tridentatus genome and to optimize the boundaries of 12,342 originally annotated reference genes. Furthermore, this information is a potential resource for studying LAL secretion mechanisms in T. tridentatus.
Affiliation(s)
- Fangrui Lou
- Fishery College, Ocean University of China, Qingdao, Shandong 266003, China; Fishery College, Zhejiang Ocean University, Zhoushan, Zhejiang 316022, China
| | - Na Song
- Fishery College, Ocean University of China, Qingdao, Shandong 266003, China
| | - Zhiqiang Han
- Fishery College, Zhejiang Ocean University, Zhoushan, Zhejiang 316022, China.
| | - Tianxiang Gao
- Fishery College, Zhejiang Ocean University, Zhoushan, Zhejiang 316022, China.
|
22
|
Calarco L, Barratt J, Ellis J. Detecting sequence variants in clinically important protozoan parasites. Int J Parasitol 2019; 50:1-18. [PMID: 31857072 DOI: 10.1016/j.ijpara.2019.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/19/2019] [Revised: 09/29/2019] [Accepted: 10/01/2019] [Indexed: 02/06/2023]
Abstract
Second and third generation sequencing methods are crucial for population genetic studies, and variant detection is a popular approach for exploiting this sequence data. While mini- and microsatellites are historically useful markers for studying important Protozoa such as Toxoplasma and Plasmodium spp., detecting non-repetitive variants such as those found in genes can be fundamental to investigating a pathogen's biology. These variants, namely single nucleotide polymorphisms and insertions and deletions, can help elucidate the genetic basis of an organism's pathogenicity, identify selective pressures, and resolve phylogenetic relationships. They also have the added benefit of possessing a comparatively low mutation rate, which contributes to their stability. However, there is a plethora of variant analysis tools with nuanced pipelines and conflicting recommendations for best practise, which can be confounding. This lack of standardisation means that variant analysis requires careful parameter optimisation, an understanding of its limitations, and the availability of high quality data. This review explores the value of variant detection when applied to non-model organisms such as clinically important protozoan pathogens. The limitations of current methods are discussed, including special considerations that require the end-users' attention to ensure that the results generated are reproducible, and the biological conclusions drawn are valid.
Affiliation(s)
- Larissa Calarco
- School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia.
| | - Joel Barratt
- School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
| | - John Ellis
- School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
|
23
|
Erath J, Djuranovic S, Djuranovic SP. Adaptation of Translational Machinery in Malaria Parasites to Accommodate Translation of Poly-Adenosine Stretches Throughout Its Life Cycle. Front Microbiol 2019; 10:2823. [PMID: 31866984 PMCID: PMC6908487 DOI: 10.3389/fmicb.2019.02823] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/26/2019] [Accepted: 11/21/2019] [Indexed: 11/13/2022] Open
Abstract
Malaria is caused by unicellular apicomplexan parasites of the genus Plasmodium, which includes the major human parasite Plasmodium falciparum. The complex cycle of the malaria parasite in both mosquito and human hosts has been studied extensively. There is tight control of gene expression in each developmental stage, and at every level of gene synthesis: from RNA transcription, to its subsequent translation, and finally post-translational modifications of the resulting protein. Whole-genome sequencing of P. falciparum has laid the foundation for significant biological advances by revealing surprising genomic information. The P. falciparum genome is extremely AT-rich (∼80%), with a substantial portion of genes encoding intragenic polyadenosine (polyA) tracks being expressed throughout the entire parasite life cycle. In most eukaryotes, intragenic polyA runs act as negative regulators of gene expression. Recent studies have shown that translation of mRNAs containing 12 or more consecutive adenosines results in ribosomal stalling and frameshifting, activating mRNA surveillance mechanisms. In contrast, P. falciparum translational machinery can efficiently and accurately translate polyA tracks without activating mRNA surveillance pathways. This unique feature of P. falciparum raises interesting questions: (1) How is P. falciparum able to efficiently and correctly translate polyA track transcripts, and (2) What are the specifics of the translational machinery and mRNA surveillance mechanisms that separate P. falciparum from other organisms? In this review, we analyze possible evolutionary shifts in P. falciparum protein synthesis machinery that allow efficient translation of an AU-rich transcriptome. We focus on physiological and structural differences of P. falciparum stage-specific ribosomes, ribosome-associated proteins, and changes in mRNA surveillance mechanisms throughout the complete parasite life cycle, with an emphasis on the mosquito and liver stages.
Affiliation(s)
| | - Sergej Djuranovic
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO, United States
| | - Slavica Pavlovic Djuranovic
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO, United States
|
24
|
Tan C, Liu H, Ren J, Ye X, Feng H, Liu Z. Single-molecule real-time sequencing facilitates the analysis of transcripts and splice isoforms of anthers in Chinese cabbage (Brassica rapa L. ssp. pekinensis). BMC Plant Biol 2019; 19:517. [PMID: 31771515 PMCID: PMC6880451 DOI: 10.1186/s12870-019-2133-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/27/2019] [Accepted: 11/12/2019] [Indexed: 05/06/2023]
Abstract
BACKGROUND Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published. Here, the Pacific Biosciences (PacBio) Sequel platform and next-generation sequencing (NGS) technology were combined to generate full-length sequences and completed structures of transcripts in anthers of Chinese cabbage. RESULTS Using single-molecule real-time sequencing (SMRT), a total of 1,098,119 circular consensus sequences (CCSs) were generated with a mean length of 2664 bp. More than 75% of the CCSs were considered full-length non-chimeric (FLNC) reads. After error correction, 725,731 high-quality FLNC reads were estimated to carry 51,501 isoforms from 19,503 loci, consisting of 38,992 novel isoforms from known genes and 3691 novel isoforms from novel genes. Of the novel isoforms, we identified 407 long non-coding RNAs (lncRNAs) and 37,549 open reading frames (ORFs). Furthermore, a total of 453,270 alternative splicing (AS) events were identified and the majority of AS models in anther were determined to be approximate exon skipping (XSKIP) events. Of the key genes regulated during anther development, AS events were mainly identified in the genes SERK1, CALS5, NEF1, and CESA1/3. Additionally, we identified 104 fusion transcripts and 5806 genes that had alternative polyadenylation (APA). CONCLUSIONS Our work demonstrated the transcriptome diversity and complexity of anther development in Chinese cabbage. The findings provide a basis for further genome annotation and transcriptome research in Chinese cabbage.
Affiliation(s)
- Chong Tan
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China
| | - Hongxin Liu
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China
| | - Jie Ren
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China
| | - Xueling Ye
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China
| | - Hui Feng
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China
| | - Zhiyong Liu
- College of Horticulture, Shenyang Agricultural University, Shenyang, Liaoning, 110866, People's Republic of China.
|
25
|
Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the canonical reference malaria parasite genome from 2002-2019. Wellcome Open Res 2019; 4:58. [PMID: 31080894 PMCID: PMC6484455 DOI: 10.12688/wellcomeopenres.15194.2] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Accepted: 05/24/2019] [Indexed: 01/15/2023] Open
Abstract
Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. As the malaria species responsible for the most deaths worldwide, the richness of annotation and accuracy of the sequence are important resources for the P. falciparum research community as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002 over 60% of predicted genes had unknown functions. As of March 2019, this number has been significantly decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.
Affiliation(s)
- Ulrike Böhme
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Thomas D Otto
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK; Institute of Infection, Immunity and Inflammation, MVLS, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Mandy Sanders
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Chris I Newbold
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK; Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Matthew Berriman
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
|
26
|
Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the canonical reference malaria parasite genome from 2002-2019. Wellcome Open Res 2019; 4:58. [PMID: 31080894 DOI: 10.12688/wellcomeopenres.15194.1] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Accepted: 03/21/2019] [Indexed: 11/20/2022] Open
Abstract
Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. As the malaria species responsible for the most deaths worldwide, the richness of annotation and accuracy of the sequence are important resources for the P. falciparum research community as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002 over 60% of predicted genes had unknown functions. As of March 2019, this number has been significantly decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.
Affiliation(s)
- Ulrike Böhme
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Thomas D Otto
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK; Institute of Infection, Immunity and Inflammation, MVLS, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Mandy Sanders
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Chris I Newbold
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK; Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Matthew Berriman
- Parasite Genomics, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
|
27
|
Jayakumar V, Sakakibara Y. Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data. Brief Bioinform 2019; 20:866-876. [PMID: 29112696 PMCID: PMC6585154 DOI: 10.1093/bib/bbx147] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 06/29/2017] [Revised: 09/22/2017] [Indexed: 12/20/2022] Open
Abstract
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.
|
28
|
Schmedes SE, Patel D, Kelley J, Udhayakumar V, Talundzic E. Using the Plasmodium mitochondrial genome for classifying mixed-species infections and inferring the geographical origin of P. falciparum parasites imported to the U.S. PLoS One 2019; 14:e0215754. [PMID: 31039178 PMCID: PMC6490880 DOI: 10.1371/journal.pone.0215754] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/17/2018] [Accepted: 04/08/2019] [Indexed: 12/20/2022] Open
Abstract
The ability to identify mixed-species infections and track the origin of Plasmodium parasites can further enhance the development of treatment and prevention recommendations as well as outbreak investigations. Here, we explore the utility of using the full Plasmodium mitochondrial genome to classify Plasmodium species, detect mixed infections, and infer the geographical origin of P. falciparum parasites imported to the United States (U.S.). Using the recently developed standardized, high-throughput Malaria Resistance Surveillance (MaRS) protocol, the full Plasmodium mitochondrial genomes of 265 malaria cases imported to the U.S. from 2014-2017 were sequenced and analyzed. P. falciparum infections were found in 94.7% (251/265) of samples. Five percent (14/265) of samples were identified as mixed-Plasmodium species or non-P. falciparum, including P. vivax, P. malariae, P. ovale curtisi, and P. ovale wallikeri. P. falciparum mitochondrial haplotype analysis revealed that more than eighteen percent of samples had at least two P. falciparum mitochondrial genome haplotypes, indicating either heteroplasmy or multi-clonal infections. Maximum-likelihood phylogenies of 912 P. falciparum mitochondrial genomes with known country of origin were used to infer the geographical origin of thirteen samples from persons with unknown travel histories as: Africa (country unspecified) (n = 10), Ghana (n = 1), Southeast Asia (n = 1), and the Philippines (n = 1). We demonstrate the utility and current limitations of using the Plasmodium mitochondrial genome to classify samples with mixed infections and infer the geographical origin of P. falciparum malaria cases imported to the U.S. with unknown travel history.
Affiliation(s)
- Sarah E. Schmedes
- Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Association of Public Health Laboratories, Silver Spring, Maryland, United States of America
| | - Dhruviben Patel
- Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Williams Consulting LLC, Baltimore, Maryland, United States of America
| | - Julia Kelley
- Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Atlanta Research and Education Foundation, Atlanta, Georgia, United States of America
| | - Venkatachalam Udhayakumar
- Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - Eldin Talundzic
- Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
|
29
|
Korhonen PK, Hall RS, Young ND, Gasser RB. Common workflow language (CWL)-based software pipeline for de novo genome assembly from long- and short-read data. Gigascience 2019; 8:giz014. [PMID: 30821816 PMCID: PMC6451199 DOI: 10.1093/gigascience/giz014] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/01/2018] [Revised: 11/03/2018] [Accepted: 01/25/2019] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Here, we created an automated pipeline for the de novo assembly of genomes from Pacific Biosciences long-read and Illumina short-read data using common workflow language (CWL). To evaluate the performance of this pipeline, we assembled the nuclear genomes of the eukaryotes Caenorhabditis elegans (∼100 Mb), Drosophila melanogaster (∼138 Mb), and Plasmodium falciparum (∼23 Mb) directly from publicly accessible nucleotide sequence datasets and assessed the quality of the assemblies against curated reference genomes. FINDINGS We showed a dependency of the accuracy of assembly on sequencing technology and GC content and repeatedly achieved assemblies that meet the high standards set by the National Human Genome Research Institute, being applicable to gene prediction and subsequent genomic analyses. CONCLUSIONS This CWL pipeline overcomes current challenges of achieving repeatability and reproducibility of assembly results and offers a platform for the re-use of the workflow and the integration of diverse datasets. This workflow is publicly available via GitHub (https://github.com/vetscience/Assemblosis) and is currently applicable to the assembly of haploid and diploid genomes of eukaryotes.
Affiliation(s)
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Ross S Hall
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
|
30
|
Fu S, Wang A, Au KF. A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol 2019; 20:26. [PMID: 30717772 PMCID: PMC6362602 DOI: 10.1186/s13059-018-1605-z] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 03/20/2018] [Accepted: 12/05/2018] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Third-generation sequencing technologies have advanced biological research by generating reads that are substantially longer than those of second-generation sequencing technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. The output data quality is very important for downstream analysis, whereas computing resources could limit the utility of some compute-intensive tools. There is a lack of standardized assessments for these long-read error-correction methods. RESULTS Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. CONCLUSIONS Taking into account all of these metrics, we provide a suggestive guideline for method choice based on available data size, computing resources, and individual research goals.
Affiliation(s)
- Shuhua Fu
- Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA
| | - Anqi Wang
- Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA
| | - Kin Fai Au
- Department of Internal Medicine, University of Iowa, Iowa City, IA, 52242, USA.
- Department of Biostatistics, University of Iowa, Iowa City, IA, 52242, USA.
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
|
31
|
Chao Y, Yuan J, Guo T, Xu L, Mu Z, Han L. Analysis of transcripts and splice isoforms in Medicago sativa L. by single-molecule long-read sequencing. Plant Mol Biol 2019; 99:219-235. [PMID: 30600412 DOI: 10.1007/s11103-018-0813-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/17/2018] [Accepted: 12/14/2018] [Indexed: 05/20/2023]
Abstract
The full-length transcriptome of alfalfa was analyzed with PacBio single-molecule long-read sequencing technology. The transcriptome data provided full-length sequences and gene isoforms of transcripts in alfalfa, which will improve genome annotation and enhance our understanding of the gene structure of alfalfa. Alfalfa (Medicago sativa L.) is an important forage crop planted worldwide. Because of its complex genome and unfinished whole-genome sequencing, the sequences and complete structure of mRNA transcripts remain unclear in alfalfa. In this study, single-molecule long-read sequencing was applied to investigate the alfalfa transcriptome using the Pacific Biosciences platform, and a total of 113,321 transcripts were obtained from young, mature and senescent leaves. We identified 72,606 open reading frames including 46,616 full-length ORFs, 1670 transcription factors from 54 TF families and 44,040 simple sequence repeats from 30,797 sequences. A total of 7568 alternative splicing events were identified, and the majority of alternative splicing events in alfalfa were intron retention. In addition, we identified 17,740 long non-coding RNAs. Our results show the feasibility of deep sequencing full-length RNA from the alfalfa transcriptome at the single-molecule level.
Affiliation(s)
- Yuehui Chao
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
| | - Jianbo Yuan
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
| | - Tao Guo
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
| | - Lixin Xu
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
| | - Zhiyuan Mu
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China
| | - Liebao Han
- Turfgrass Research Institute, Beijing Forestry University, Beijing, 100083, China.
|
32
|
Bruske E, Otto TD, Frank M. Whole genome sequencing and microsatellite analysis of the Plasmodium falciparum E5 NF54 strain show that the var, rifin and stevor gene families follow Mendelian inheritance. Malar J 2018; 17:376. [PMID: 30348135 PMCID: PMC6198375 DOI: 10.1186/s12936-018-2503-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/21/2018] [Accepted: 10/03/2018] [Indexed: 12/30/2022] Open
Abstract
Background Plasmodium falciparum exhibits a high degree of inter-isolate genetic diversity in its variant surface antigen (VSA) families: P. falciparum erythrocyte membrane protein 1, repetitive interspersed family (RIFIN) and subtelomeric variable open reading frame (STEVOR). The role of recombination in the generation of this diversity is a subject of ongoing research. Here the genome of E5, a sibling of the 3D7 genome strain, is presented. Short and long read whole genome sequencing (WGS) techniques (Illumina, Pacific Biosciences) and a set of 84 microsatellites (MS) were employed to characterize the 3D7 and non-3D7 parts of the E5 genome. This is the first time that VSA genes in sibling parasites were analysed with long read sequencing technology. Results Of the 5733 E5 genes only 278 genes, mostly var and rifin/stevor genes, had no orthologues in the 3D7 genome. WGS and MS analysis revealed that chromosomal crossovers occurred at a rate of 0–3 per chromosome. var, stevor and rifin genes were inherited within the respective non-3D7 or 3D7 chromosomal context. 54 of the 84 MS PCR fragments correctly identified the respective MS as 3D7 or non-3D7 and this correlated with var and rifin/stevor gene inheritance in the adjacent chromosomal regions. E5 had 61 var and 189 rifin/stevor genes. One large non-chromosomal recombination event resulted in a new var gene on chromosome 14. The remainder of the E5 3D7-type subtelomeric and central regions were identical to 3D7. Conclusions The data show that the rifin/stevor and var gene families represent the most diverse compartments of the P. falciparum genome but that the majority of var genes are inherited without alterations within their respective parental chromosomal context. Furthermore, MS genotyping with 54 MS can successfully distinguish between two sibling progeny of a natural P. falciparum cross and thus can be used to investigate identity by descent in field isolates.
Electronic supplementary material The online version of this article (10.1186/s12936-018-2503-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ellen Bruske
- Institute of Tropical Medicine, University of Tuebingen, Wilhelmstr. 27, 72074, Tuebingen, Germany
| | - Thomas D Otto
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK; Centre of Immunobiology, Institute of Infection, Immunity & Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK.
| | - Matthias Frank
- Institute of Tropical Medicine, University of Tuebingen, Wilhelmstr. 27, 72074, Tuebingen, Germany.
|
33
|
Molecular assays for antimalarial drug resistance surveillance: A target product profile. PLoS One 2018; 13:e0204347. [PMID: 30235327 PMCID: PMC6147503 DOI: 10.1371/journal.pone.0204347] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/26/2018] [Accepted: 09/05/2018] [Indexed: 11/25/2022] Open
Abstract
Antimalarial drug resistance is a major constraint for malaria control and elimination efforts. Artemisinin-based combination therapy is now the mainstay for malaria treatment. However, delayed parasite clearance following treatment with artemisinin derivatives has now spread in the Greater Mekong Subregion and may emerge or spread to other malaria endemic regions. This spread is of great concern for malaria control programmes, as no alternatives to artemisinin-based combination therapies are expected to be available in the near future. There is a need to strengthen surveillance systems for early detection of and response to the antimalarial drug resistance threat. Current surveillance is mainly done through therapeutic efficacy studies; however, these studies are complex and both time- and resource-intensive. For multiple common antimalarials, parasite drug resistance has been correlated with specific genetic mutations, and the molecular markers associated with antimalarial drug resistance offer a simple and powerful tool to monitor the emergence and spread of resistant parasites. Different techniques to analyse molecular markers associated with antimalarial drug resistance are available, each with advantages and disadvantages. However, procedures are not adequately harmonized to facilitate comparisons between sites. Here we describe the target product profiles for tests to analyse molecular markers associated with antimalarial drug resistance, discuss how use of current techniques can be standardised, and identify the requirements for an ideal product that would allow malaria endemic countries to provide useful spatial and temporal information on the spread of resistance.
|
34
|
Li Y, Fang C, Fu Y, Hu A, Li C, Zou C, Li X, Zhao S, Zhang C, Li C. A survey of transcriptome complexity in Sus scrofa using single-molecule long-read sequencing. DNA Res 2018; 25:421-437. [PMID: 29850846 PMCID: PMC6105124 DOI: 10.1093/dnares/dsy014] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 01/08/2018] [Accepted: 05/08/2018] [Indexed: 12/19/2022] Open
Abstract
Alternative splicing (AS) and fusion transcripts greatly expand transcriptome and proteome diversity. However, the reliability of these events and the extent of the epigenetic mechanisms involved have not been adequately addressed because of uncertainties about the complete structure of mRNA. Here we combined single-molecule real-time sequencing, Illumina RNA-seq and DNA methylation data to characterize the landscape of DNA methylation with respect to AS, fusion isoform formation and lncRNA features, and to further unveil the transcriptome complexity of the pig. Our analysis identified an unprecedented number of high-quality full-length isoforms, with over 28,127 novel isoforms from 26,881 novel genes. More than 92,000 novel AS events were detected; intron retention was the predominant AS mode, followed by exon skipping. Interestingly, we found that DNA methylation played an important role in generating various AS isoforms by regulating splicing sites, promoter regions and first exons. Furthermore, we identified a large number of fusion transcripts and novel lncRNAs, and found that DNA methylation of the promoter and gene body could regulate lncRNA expression. Our results significantly improved existing gene models of the pig and revealed that pig AS and epigenetic modification are more complex than previously thought.
Affiliation(s)
- Yao Li
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Chengchi Fang
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Yuhua Fu
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - An Hu
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Cencen Li
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Cheng Zou
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Xinyun Li
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Shuhong Zhao
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Chengjun Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
| | - Changchun Li
- Key Lab of Agriculture Animal Genetics, Breeding, and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
|
35
|
Prosser C, Meyer W, Ellis J, Lee R. Evolutionary ARMS Race: Antimalarial Resistance Molecular Surveillance. Trends Parasitol 2018; 34:322-334. [PMID: 29396203 DOI: 10.1016/j.pt.2018.01.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/20/2017] [Revised: 01/02/2018] [Accepted: 01/03/2018] [Indexed: 01/13/2023]
Abstract
Molecular surveillance of antimalarial drug resistance markers has become an important part of resistance detection and containment. In the current climate of multidrug resistance, including resistance to the global front-line drug artemisinin, there is a consensus to upscale molecular surveillance. The most salient limitation to current surveillance efforts is that skill and infrastructure requirements preclude many regions. This includes sub-Saharan Africa, where Plasmodium falciparum is responsible for most of the global malaria disease burden. New molecular and data technologies have emerged with an emphasis on accessibility. These may allow surveillance to be conducted in broad settings where it is most needed, including at the primary healthcare level in endemic countries, and extending to the village health worker.
Affiliation(s)
- Christiane Prosser
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Clinical School-Sydney Medical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, University of Sydney, Sydney, NSW, Australia; Westmead Institute for Medical Research, Westmead, NSW, Australia.
| | - Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Clinical School-Sydney Medical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, University of Sydney, Sydney, NSW, Australia; Westmead Institute for Medical Research, Westmead, NSW, Australia
| | - John Ellis
- School of Life Sciences, University of Technology Sydney, NSW, Australia
| | - Rogan Lee
- Centre for Infectious Diseases and Microbiology Laboratory Services, Institute of Clinical Pathology & Medical Research, Westmead Hospital, Westmead, NSW, Australia
|
36
|
Abstract
The human malaria parasite Plasmodium falciparum replicates within circulating red blood cells, where it is subjected to conditions that frequently cause DNA damage. The repair of DNA double-stranded breaks (DSBs) is thought to rely almost exclusively on homologous recombination (HR), due to a lack of efficient nonhomologous end joining. However, given that the parasite is haploid during this stage of its life cycle, the mechanisms involved in maintaining genome stability are poorly understood. Of particular interest are the subtelomeric regions of the chromosomes, which contain the majority of the multicopy variant antigen-encoding genes responsible for virulence and disease severity. Here, we show that parasites utilize a competitive balance between de novo telomere addition, also called “telomere healing,” and HR to stabilize chromosome ends. Products of both repair pathways were observed in response to DSBs that occurred spontaneously during routine in vitro culture or resulted from experimentally induced DSBs, demonstrating that both pathways are active in repairing DSBs within subtelomeric regions and that the pathway utilized was determined by the DNA sequences immediately surrounding the break. In combination, these two repair pathways enable parasites to efficiently maintain chromosome stability while also contributing to the generation of genetic diversity.
Malaria is a major global health threat, causing approximately 430,000 deaths annually. This mosquito-transmitted disease is caused by Plasmodium parasites, with infection with the species Plasmodium falciparum being the most lethal. Mechanisms underlying DNA repair and maintenance of genome integrity in P. falciparum are not well understood and represent a gap in our understanding of how parasites survive the hostile environment of their vertebrate and insect hosts. Our work examines DNA repair in real time by using single-molecule real-time (SMRT) sequencing focused on the subtelomeric regions of the genome that harbor the multicopy gene families important for virulence and the maintenance of infection. We show that parasites utilize two competing molecular mechanisms to repair double-strand breaks, homologous recombination and de novo telomere addition, with the pathway used being determined by the surrounding DNA sequence. In combination, these two pathways balance the need to maintain genome stability with the selective advantage of generating antigenic diversity.
|
37
|
Zhang Y, Yao Y, Du W, Wu K, Xu W, Lin M, Tan H, Li J. Development of loop-mediated isothermal amplification with Plasmodium falciparum unique genes for molecular diagnosis of human malaria. Pathog Glob Health 2017; 111:247-255. [PMID: 28683669 PMCID: PMC5560202 DOI: 10.1080/20477724.2017.1347379] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023] Open
Abstract
To achieve better outcomes in the treatment and prophylaxis of malaria, it is imperative to develop a sensitive, specific, and accurate assay for early diagnosis of Plasmodium falciparum infection, which is the major cause of malaria. In this study, we aimed to develop a loop-mediated isothermal amplification (LAMP) assay with P. falciparum unique genes for sensitive, specific, and accurate detection of P. falciparum infection. The unique genes of P. falciparum were randomly selected from PlasmoDB. The LAMP primers of the unique genes were designed using PrimerExplorer V4. LAMP assays with primers from unique genes of P. falciparum and the conserved 18S rRNA gene were developed and their sensitivity was assessed. The specificity of the most sensitive LAMP assay was further examined using genomic DNA from Plasmodium vivax, Plasmodium yoelii and Toxoplasma gondii. Finally, the unique gene-based LAMP assay was validated using clinical samples of P. falciparum infection cases. A total of 31 sets of top-scored LAMP primers from nine unique genes were selected from the pools of designed primers. The LAMP assay with PF3D7_1253300-5 was the most sensitive, with a detection limit of 5 parasites/μl, and it was negative with the genomic DNA samples of P. vivax, P. yoelii, and T. gondii. The LAMP assay with PF3D7_0112300 (18S rRNA) was less sensitive, with a detection limit of 50 parasites/μl; it was negative with the genomic DNA samples of P. yoelii and T. gondii, but positive with P. vivax. In clinical samples of P. falciparum infection cases, the positive detection rate of the LAMP assay with PF3D7_1253300-5 was 90% (27/30), higher than that of the assay with PF3D7_0112300 (18S rRNA) (80%, 24/30). The LAMP assay with the primer set PF3D7_1253300-5 was more sensitive, specific, and accurate than that with PF3D7_0112300 (18S rRNA) in detecting P. falciparum infection, and therefore it is a promising tool for diagnosis of P. falciparum infection.
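The positive detection rates quoted in this abstract follow directly from the reported sample counts. As a minimal sketch of that arithmetic (the helper name `detection_rate` is an illustrative assumption and not part of the cited study), the percentages can be checked as follows:

```python
def detection_rate(positives: int, total: int) -> float:
    """Return the fraction of samples that tested positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positives / total


# Sample counts as reported in the abstract: 27/30 and 24/30 clinical samples.
rates = {
    "PF3D7_1253300-5": detection_rate(27, 30),
    "PF3D7_0112300 (18S rRNA)": detection_rate(24, 30),
}

for primer_set, rate in rates.items():
    # Format each fraction as a whole-number percentage.
    print(f"{primer_set}: {rate:.0%}")
```

With only 30 samples per assay, the three-sample difference between the two primer sets is a small absolute gap, which is worth keeping in mind when reading the comparison.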
Affiliation(s)
- Yijing Zhang
- Department of Human Parasitology, College of Basic Medicine; Department of Infectious Diseases, Renmin Hospital, Hubei University of Medicine, Shiyan, People’s Republic of China
| | - Yi Yao
- Department of Human Parasitology, College of Basic Medicine; Department of Infectious Diseases, Renmin Hospital, Hubei University of Medicine, Shiyan, People’s Republic of China
| | - Weixing Du
- Department of Human Parasitology, College of Basic Medicine; Department of Infectious Diseases, Renmin Hospital, Hubei University of Medicine, Shiyan, People’s Republic of China
| | - Kai Wu
- Department of Schistosomiasis and Endemic Diseases, Wuhan City Center for Disease Prevention and Control, Wuhan, People’s Republic of China
| | - Wenyue Xu
- The Department of Pathogenic Biology, Third Military Medical University, Chongqing, People’s Republic of China
| | - Min Lin
- Department of Histology and Embryology, Shantou University Medical College, Shantou, People’s Republic of China
| | - Huabing Tan
- Department of Human Parasitology, College of Basic Medicine; Department of Infectious Diseases, Renmin Hospital, Hubei University of Medicine, Shiyan, People’s Republic of China
| | - Jian Li
- Department of Human Parasitology, College of Basic Medicine; Department of Infectious Diseases, Renmin Hospital, Hubei University of Medicine, Shiyan, People’s Republic of China
- Corresponding author.
|
38
|
Mason CE, Afshinnekoo E, Tighe S, Wu S, Levy S. International Standards for Genomes, Transcriptomes, and Metagenomes. J Biomol Tech 2017; 28:8-18. [PMID: 28337071 PMCID: PMC5359768 DOI: 10.7171/jbt.17-2801-006] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/22/2022]
Abstract
Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine.
Affiliation(s)
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
| | - Ebrahim Afshinnekoo
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- School of Medicine, New York Medical College, Valhalla, New York 10595, USA
| | - Scott Tighe
- Advanced Genomics Lab, University of Vermont Cancer Center, Burlington, Vermont 05405, USA
| | - Shixiu Wu
- Hangzhou Cancer Institute in Hangzhou Cancer Hospital, Hangzhou, China
| | - Shawn Levy
- HudsonAlpha Institute of Technology, Huntsville, Alabama 35806, USA
|
39
|
Dara A, Drábek EF, Travassos MA, Moser KA, Delcher AL, Su Q, Hostelley T, Coulibaly D, Daou M, Dembele A, Diarra I, Kone AK, Kouriba B, Laurens MB, Niangaly A, Traore K, Tolo Y, Fraser CM, Thera MA, Djimde AA, Doumbo OK, Plowe CV, Silva JC. New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali. Genome Med 2017; 9:30. [PMID: 28351419 PMCID: PMC5368897 DOI: 10.1186/s13073-017-0422-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/20/2016] [Accepted: 03/02/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Encoded by the var gene family, highly variable Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) proteins mediate tissue-specific cytoadherence of infected erythrocytes, resulting in immune evasion and severe malaria disease. Sequencing and assembling the 40-60 var gene complement for individual infections has been notoriously difficult, impeding molecular epidemiological studies and the assessment of particular var elements as subunit vaccine candidates. METHODS We developed and validated a novel algorithm, Exon-Targeted Hybrid Assembly (ETHA), to perform targeted assembly of var gene sequences, based on a combination of Pacific Biosciences and Illumina data. RESULTS Using ETHA, we characterized the repertoire of var genes in 12 samples from uncomplicated malaria infections in children from a single Malian village and showed them to be as genetically diverse as vars from isolates from around the globe. The gene var2csa, a member of the var family associated with placental malaria pathogenesis, was present in each genome, as were vars previously associated with severe malaria. CONCLUSION ETHA, a tool to discover novel var sequences from clinical samples, will aid the understanding of malaria pathogenesis and inform the design of malaria vaccines based on PfEMP1. ETHA is available at: https://sourceforge.net/projects/etha/
Affiliation(s)
- Antoine Dara
- Division of Malaria Research, Institute for Global Health, University of Maryland School of Medicine, Baltimore, MD USA
| | - Elliott F. Drábek
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
| | - Mark A. Travassos
- Division of Malaria Research, Institute for Global Health, University of Maryland School of Medicine, Baltimore, MD USA
| | - Kara A. Moser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
| | - Arthur L. Delcher
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
| | - Qi Su
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
| | - Timothy Hostelley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
| | - Drissa Coulibaly
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Modibo Daou
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Ahmadou Dembele
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Issa Diarra
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Abdoulaye K. Kone
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Bourema Kouriba
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Matthew B. Laurens
- Division of Malaria Research, Institute for Global Health, University of Maryland School of Medicine, Baltimore, MD USA
| | - Amadou Niangaly
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Karim Traore
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Youssouf Tolo
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Claire M. Fraser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD USA
| | - Mahamadou A. Thera
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Abdoulaye A. Djimde
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Ogobara K. Doumbo
- Malaria Research and Training Center, University of Science, Techniques and Technologies, Bamako, Mali
| | - Christopher V. Plowe
- Division of Malaria Research, Institute for Global Health, University of Maryland School of Medicine, Baltimore, MD USA
| | - Joana C. Silva
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD USA
|
40
|
Volkman SK, Herman J, Lukens AK, Hartl DL. Genome-Wide Association Studies of Drug-Resistance Determinants. Trends Parasitol 2016; 33:214-230. [PMID: 28179098 DOI: 10.1016/j.pt.2016.10.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/15/2016] [Revised: 09/26/2016] [Accepted: 10/06/2016] [Indexed: 02/07/2023]
Abstract
Population genetic strategies that leverage association, selection, and linkage have identified drug-resistant loci. However, challenges and limitations persist in identifying drug-resistance loci in malaria. In this review we discuss the genetic basis of drug resistance and the use of genome-wide association studies, complemented by selection and linkage studies, to identify and understand mechanisms of drug resistance and response. We also discuss the implications of nongenetic mechanisms of drug resistance recently reported in the literature, and present models of the interplay between nongenetic and genetic processes that contribute to the emergence of drug resistance. Throughout, we examine artemisinin resistance as an example to emphasize challenges in identifying phenotypes suitable for population genetic studies as well as complications due to multiple-factor drug resistance.
Affiliation(s)
- Sarah K Volkman
- Harvard T.H. Chan School of Public Health, Department of Immunology and Infectious Disease, Boston, MA, USA; The Broad Institute of MIT and Harvard, Infectious Disease Initiative, Cambridge, MA, USA; Simmons College, School of Nursing and Health Science, Boston, MA, USA.
| | - Jonathan Herman
- Harvard T.H. Chan School of Public Health, Department of Immunology and Infectious Disease, Boston, MA, USA; Weill Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Amanda K Lukens
- Harvard T.H. Chan School of Public Health, Department of Immunology and Infectious Disease, Boston, MA, USA; The Broad Institute of MIT and Harvard, Infectious Disease Initiative, Cambridge, MA, USA
| | - Daniel L Hartl
- The Broad Institute of MIT and Harvard, Infectious Disease Initiative, Cambridge, MA, USA; Harvard University, Organismic and Evolutionary Biology, Cambridge, MA, USA
|