1
Zhao LY, Song J, Liu Y, Song CX, Yi C. Mapping the epigenetic modifications of DNA and RNA. Protein Cell 2020; 11:792-808. [PMID: 32440736 PMCID: PMC7647981 DOI: 10.1007/s13238-020-00733-7] [Citation(s) in RCA: 204] [Impact Index Per Article: 40.8] [Received: 02/12/2020] [Accepted: 03/16/2020] [Indexed: 02/05/2023]
Abstract
Over 17 and 160 types of chemical modifications have been identified in DNA and RNA, respectively. Interest in understanding the various biological functions of DNA and RNA modifications has led to the cutting-edge fields of epigenomics and epitranscriptomics. Developing chemical and biological tools to detect specific modifications in the genome or transcriptome has greatly facilitated their study. Here, we review recent technological advances in this rapidly evolving field. We focus on high-throughput detection methods and biological findings for these modifications, and we discuss questions that remain to be addressed. We also summarize third-generation sequencing methods, which enable long-read and single-molecule sequencing of DNA and RNA modifications.
Review
2
Lagarde J, Uszczynska-Ratajczak B, Carbonell S, Pérez-Lluch S, Abad A, Davis C, Gingeras TR, Frankish A, Harrow J, Guigo R, Johnson R. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing. Nat Genet 2017; 49:1731-1740. [PMID: 29106417 PMCID: PMC5709232 DOI: 10.1038/ng.3988] [Citation(s) in RCA: 182] [Impact Index Per Article: 22.8] [Received: 02/01/2017] [Accepted: 10/11/2017] [Indexed: 12/20/2022]
Abstract
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues, which resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
research-article
3
The Genome of C57BL/6J "Eve", the Mother of the Laboratory Mouse Genome Reference Strain. G3-GENES GENOMES GENETICS 2019; 9:1795-1805. [PMID: 30996023 PMCID: PMC6553538 DOI: 10.1534/g3.119.400071] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 12/30/2022]
Abstract
Isogenic laboratory mouse strains enhance reproducibility because individual animals are genetically identical. For the most widely used isogenic strain, C57BL/6, there exists a wealth of genetic, phenotypic, and genomic data, including a high-quality reference genome (GRCm38.p6). Now 20 years after the first release of the mouse reference genome, C57BL/6J mice are at least 26 inbreeding generations removed from GRCm38, and the strain is now maintained with periodic reintroduction of cryorecovered mice derived from a single breeder pair, aptly named Adam and Eve. To provide an update to the mouse reference genome that more accurately represents the genome of today's C57BL/6J mice, we took advantage of long-read, short-read, and optical mapping technologies to generate a de novo assembly of the C57BL/6J Eve genome (B6Eve). Using these data, we have addressed recurring variants observed in previous mouse genomic studies. We have also identified structural variations, closed gaps in the mouse reference assembly, and revealed previously unannotated coding sequences. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses and will inform a reference genome that is more representative of the C57BL/6J mice in use today.
Research Support, N.I.H., Intramural
4
Ferraj A, Audano PA, Balachandran P, Czechanski A, Flores JI, Radecki AA, Mosur V, Gordon DS, Walawalkar IA, Eichler EE, Reinholdt LG, Beck CR. Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements. CELL GENOMICS 2023; 3:100291. [PMID: 37228752 PMCID: PMC10203049 DOI: 10.1016/j.xgen.2023.100291] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/14/2022] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023]
Abstract
Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.
research-article
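The Ferraj et al. abstract defines SVs as variants of at least 50 bp. It also contrasts the share of SV calls that are TEs (39%) with the share of altered bases they account for (75%). The Python sketch below shows how those two kinds of fraction can diverge. The `Variant` record, the `is_te` flag and the toy calls are hypothetical; this is not the authors' callset or pipeline.

```python
from dataclasses import dataclass

SV_MIN_LEN = 50  # size threshold from the abstract: SVs are variants >= 50 bp


@dataclass
class Variant:
    chrom: str
    pos: int
    length: int   # size of the inserted or deleted sequence, in bp
    is_te: bool   # hypothetical flag: variant sequence annotated as a transposable element


def split_by_size(variants):
    """Separate structural variants (>= SV_MIN_LEN) from smaller indels."""
    svs = [v for v in variants if v.length >= SV_MIN_LEN]
    small = [v for v in variants if v.length < SV_MIN_LEN]
    return svs, small


def te_fractions(svs):
    """Return (share of SV calls that are TEs, share of SV bases contributed by TEs)."""
    if not svs:
        return 0.0, 0.0
    te_calls = sum(1 for v in svs if v.is_te)
    te_bases = sum(v.length for v in svs if v.is_te)
    total_bases = sum(v.length for v in svs)
    return te_calls / len(svs), te_bases / total_bases


# Toy calls, invented for illustration only.
calls = [
    Variant("chr1", 100, 30, False),     # below the threshold: an indel, not an SV
    Variant("chr1", 5_000, 6_000, True),
    Variant("chr2", 200, 80, False),
    Variant("chr2", 900, 120, False),
    Variant("chr3", 10, 300, True),
]
svs, small = split_by_size(calls)
call_share, base_share = te_fractions(svs)
print(len(svs), len(small), call_share, round(base_share, 3))
```

In this toy set, half of the SV calls are TEs, yet they carry about 97% of the altered bases. A single long TE insertion dominates the base count.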
5
Mangin A, de Pontual L, Tsai YC, Monteil L, Nizon M, Boisseau P, Mercier S, Ziegle J, Harting J, Heiner C, Gourdon G, Tomé S. Robust Detection of Somatic Mosaicism and Repeat Interruptions by Long-Read Targeted Sequencing in Myotonic Dystrophy Type 1. Int J Mol Sci 2021; 22:2616. [PMID: 33807660 PMCID: PMC7962047 DOI: 10.3390/ijms22052616] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/12/2021] [Revised: 02/26/2021] [Accepted: 02/27/2021] [Indexed: 02/07/2023]
Abstract
Myotonic dystrophy type 1 (DM1) is the most complex and variable trinucleotide repeat disorder, caused by an unstable CTG repeat expansion that reaches up to 4,000 CTG repeats in the most severe cases. The genetic and clinical variability of DM1 depends on the sex and age of the transmitting parent, but also on the CTG repeat number, the presence of repeat interruptions and/or the degree of somatic instability. Currently, it is difficult to determine these contributing factors simultaneously and accurately in DM1 patients due to the limitations of the gold-standard methods used in molecular diagnostics and research laboratories. Our study showed the efficiency of the latest PacBio long-read sequencing technology in sequencing large CTG trinucleotide repeat expansions, detecting multiple and single repeat interruptions, and estimating the levels of somatic mosaicism in DM1 patients carrying complex CTG repeat expansions inaccessible to most methods. Using this innovative approach, we revealed the existence of de novo CCG interruptions associated with CTG stabilization/contraction across generations in a new DM1 family. We also demonstrated that our method is suitable for sequencing the DM1 locus and measuring somatic mosaicism in DM1 families carrying more than 1,000 pure CTG repeats. Better characterization of expanded alleles in DM1 patients can significantly improve prognosis and genetic counseling, not only in DM1 but also in other tandem DNA repeat disorders.
research-article
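The Mangin et al. abstract reports counting CTG repeats, detecting CCG interruptions and estimating somatic mosaicism from long reads. It does not describe the analysis itself. The following Python sketch is a hypothetical illustration under two assumptions: each read has already been trimmed to the repeat tract, and each read starts in frame. It ignores sequencing errors, which a real pipeline would have to handle. `scan_ctg_tract` and `size_spread` are illustrative names, not the authors' tools.

```python
from collections import Counter


def scan_ctg_tract(tract: str):
    """Walk an in-frame repeat tract one triplet at a time.

    Returns the number of CTG units, a list of (triplet_index, triplet)
    for every non-CTG interruption, and the longest uninterrupted CTG run.
    """
    tract = tract.upper()
    usable = len(tract) - len(tract) % 3          # drop a trailing partial triplet
    triplets = [tract[i:i + 3] for i in range(0, usable, 3)]
    ctg_units, interruptions = 0, []
    run = longest_run = 0
    for idx, triplet in enumerate(triplets):
        if triplet == "CTG":
            ctg_units += 1
            run += 1
            longest_run = max(longest_run, run)
        else:
            interruptions.append((idx, triplet))  # e.g. a CCG interruption
            run = 0
    return ctg_units, interruptions, longest_run


def size_spread(repeat_counts):
    """Summarize per-read repeat sizes: the modal size plus the smallest and largest seen.

    A wide spread around the mode is one simple (hypothetical) proxy for somatic mosaicism.
    """
    mode, _ = Counter(repeat_counts).most_common(1)[0]
    return mode, min(repeat_counts), max(repeat_counts)


read_tract = "CTG" * 5 + "CCG" + "CTG" * 3
print(scan_ctg_tract(read_tract))
print(size_spread([100, 102, 98, 100, 140, 100]))
```

Scanning in frame keeps the CTG count and the interruption positions in the same coordinate system, so an interruption can be reported by its triplet index within the tract.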
6
Abstract
Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Particularly long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements.
Methods: We used these components on ONT's miniature ‘MinION’ device to sequence native genomic DNA obtained from the near-haploid cancer cell line HAP1. We analysed our data using recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings.
Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and achieved a mean consensus identity of 99.8% for sequenced mtDNA reads.
Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for practical long-read applications with relevance to complex genomes.
Journal Article [Citation(s) in RCA: 22]
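The Results of the HAP1 nanopore abstract quote both a mean read length (9.6 kb) and an N50 (~17 kb). The Python sketch below applies the standard N50 definition to a toy list of lengths: N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. It also checks that ~240,000 reads at a 9.6 kb mean are consistent with the ~2.3 billion bases reported. The toy lengths are invented, and this is not the authors' analysis code.

```python
def n50(lengths):
    """Return the N50: the length L such that reads >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one read length")


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]           # kb, invented for illustration
print("mean:", sum(toy_lengths) / len(toy_lengths))  # 6.0
print("N50:", n50(toy_lengths))                      # 8

# Consistency check on the reported yield: reads x mean length ~ total bases.
reads, mean_len_bp = 240_000, 9_600
print(reads * mean_len_bp)                           # 2304000000, i.e. ~2.3 Gb
```

In right-skewed length distributions such as nanopore reads, a minority of long reads carries a large share of the bases. The N50 therefore typically sits above the mean, which is consistent with the gap between 9.6 kb and ~17 kb.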
7
Hiatt SM, Lawlor JM, Handley LH, Ramaker RC, Rogers BB, Partridge EC, Boston LB, Williams M, Plott CB, Jenkins J, Gray DE, Holt JM, Bowling KM, Bebin EM, Grimwood J, Schmutz J, Cooper GM. Long-read genome sequencing for the molecular diagnosis of neurodevelopmental disorders. HGG ADVANCES 2021; 2:100023. [PMID: 33937879 PMCID: PMC8087252 DOI: 10.1016/j.xhgg.2021.100023] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/17/2020] [Accepted: 01/07/2021] [Indexed: 02/07/2023]
Abstract
Exome and genome sequencing have proven to be effective tools for the diagnosis of neurodevelopmental disorders (NDDs), but large fractions of NDDs cannot be attributed to currently detectable genetic variation. This is likely, at least in part, because many genetic variants are difficult or impossible to detect with typical short-read sequencing approaches. Here, we describe a genomic analysis using Pacific Biosciences circular consensus sequencing (CCS) reads, which are both long (>10 kb) and accurate (>99% bp accuracy). We used CCS on six proband-parent trios with NDDs that were unexplained despite extensive testing, including genome sequencing with short reads. We identified variants and created de novo assemblies in each trio, with global metrics indicating that these datasets are more accurate and comprehensive than those provided by short-read data. In one proband, we identified a likely pathogenic (LP), de novo L1-mediated insertion in CDKL5 that results in duplication of exon 3, leading to a frameshift. In a second proband, we identified multiple large de novo structural variants, including insertion-translocations affecting DGKB and MLLT3, which we show disrupt MLLT3 transcript levels. We consider this extensive structural variation likely pathogenic. The breadth and quality of variant detection, coupled with the finding of variants of clinical and research interest in two of six probands with unexplained NDDs, support the hypothesis that long-read genome sequencing can substantially improve rare disease genetic discovery rates.
research-article
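The Hiatt et al. abstract describes CCS reads as >99% accurate per base. Read accuracy is commonly expressed on the Phred scale, Q = -10 log10(error rate), so 99% corresponds to Q20. The short Python sketch below converts between the two. It is a generic conversion, not part of the authors' analysis.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(1 - accuracy); accuracy must be below 1."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def accuracy_from_phred(q: float) -> float:
    """Inverse conversion: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


for acc in (0.99, 0.999):
    print(f"{acc:.3f} accuracy -> Q{phred_from_accuracy(acc):.1f}")
print(f"Q30 -> {accuracy_from_phred(30):.4f} accuracy")
```

Each additional 10 Phred units means a tenfold drop in the expected error rate. This is why small-looking changes in percent accuracy near 100% matter.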
8
Sasani TA, Cone KR, Quinlan AR, Elde NC. Long read sequencing reveals poxvirus evolution through rapid homogenization of gene arrays. eLife 2018; 7:35453. [PMID: 30156554 PMCID: PMC6115191 DOI: 10.7554/elife.35453] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/27/2018] [Accepted: 08/12/2018] [Indexed: 12/21/2022]
Abstract
Poxvirus adaptation can involve combinations of recombination-driven gene copy number variation and beneficial single nucleotide variants (SNVs) at the same loci. How these distinct mechanisms of genetic diversification might simultaneously facilitate adaptation to host immune defenses is unknown. We performed experimental evolution with vaccinia virus populations harboring a SNV in a gene actively undergoing copy number amplification. Using long sequencing reads from the Oxford Nanopore Technologies platform, we phased SNVs within large gene copy arrays for the first time. Our analysis uncovered a mechanism of adaptive SNV homogenization reminiscent of gene conversion, which is actively driven by selection. This study reveals a new mechanism for the fluid gain of beneficial mutations in genetic regions undergoing active recombination in viruses and illustrates the value of long read sequencing technologies for investigating complex genome dynamics in diverse biological systems.
Research Support, Non-U.S. Gov't
9
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. QUANTITATIVE PLANT BIOLOGY 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups, and the sequences of smaller plant genomes can be completed within days. This has also led to large-scale investigation of genomes from multiple species to address fundamental questions about the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges include accurately resolving diploid or polyploid genome sequences and better accounting for intraspecific diversity by switching from single reference genome sequences to a pangenome graph.
Review
10
Abstract
Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Particularly long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements.
Methods: We used these components on ONT's miniature 'MinION' device to sequence native genomic DNA obtained from the near-haploid cancer cell line HAP1. We analysed our data using recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings.
Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and achieved a mean consensus identity of 99.8% for sequenced mtDNA reads.
Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for practical long-read applications.
Journal Article [Citation(s) in RCA: 17]
11
Rausch T, Snajder R, Leger A, Simovic M, Giurgiu M, Villacorta L, Henssen AG, Fröhling S, Stegle O, Birney E, Bonder MJ, Ernst A, Korbel JO. Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures. CELL GENOMICS 2023; 3:100281. [PMID: 37082141 PMCID: PMC10112291 DOI: 10.1016/j.xgen.2023.100281] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/20/2022] [Revised: 06/14/2022] [Accepted: 02/22/2023] [Indexed: 04/22/2023]
Abstract
Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which escape discovery using short-read sequencing. We employed Oxford Nanopore Technologies (ONT) long-read sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assembled complex rearrangements, including a 1.55-Mbp chromothripsis event, and uncovered a complex SV pattern termed templated insertion (TI) thread, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. TI threads occur in 3% of cancers, with a prevalence of up to 74% in liposarcoma, and frequently colocalize with chromothripsis. We also performed long-read-based methylome profiling and discovered allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in cancer-driver genes. Our study shows the advantage of long-read sequencing in the discovery and characterization of complex somatic rearrangements.
research-article
12
Abstract
Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Particularly long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements.
Methods: We used these components on ONT's miniature 'MinION' device to sequence native genomic DNA obtained from the near-haploid cancer cell line HAP1. We analysed our data using recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings.
Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and achieved a mean consensus identity of 99.8% for sequenced mtDNA reads.
Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for practical long-read applications.
Journal Article [Citation(s) in RCA: 16]
13
Tseng E, Rowell WJ, Glenn OC, Hon T, Barrera J, Kujawa S, Chiba-Falek O. The Landscape of SNCA Transcripts Across Synucleinopathies: New Insights From Long Reads Sequencing Analysis. Front Genet 2019; 10:584. [PMID: 31338105 PMCID: PMC6629766 DOI: 10.3389/fgene.2019.00584] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/26/2019] [Accepted: 06/04/2019] [Indexed: 11/21/2022]
Abstract
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson's disease (PD) and dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain in PD and DLB patients. In addition, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here, we present the first study to use long-read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues of PD, DLB, and control samples, using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3' UTR lengths, as well as novel 5' starts and 3' ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114 kb SNCA region, with the longest phased block exceeding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach could have an impact on the study of disease risk genes such as SNCA, providing new insights into the genetic etiologies, including perturbations to the landscape of the gene transcripts, of human complex diseases such as synucleinopathies.
research-article
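The Tseng et al. gDNA result is summarized by two numbers: the fraction of the ~114 kb SNCA region that is phased, and the length of the longest phase block. The Python sketch below computes both from phase-block intervals. The block coordinates are invented for illustration. The half-open (start, end) convention is an assumption, not the output format of any particular phasing tool.

```python
def phasing_summary(blocks, region_start, region_end):
    """Fraction of [region_start, region_end) covered by phase blocks, and the longest block.

    `blocks` is a list of half-open (start, end) intervals; overlapping blocks are merged
    so that shared bases are counted only once.
    """
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in blocks
        if e > region_start and s < region_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:        # gap: close the current merged interval
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:                                      # overlapping or abutting: extend it
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    longest = max((e - s for s, e in clipped), default=0)
    return covered / (region_end - region_start), longest


# Invented phase blocks over a 114 kb region.
toy_blocks = [(0, 54_500), (60_000, 90_000), (95_000, 102_000)]
fraction, longest = phasing_summary(toy_blocks, 0, 114_000)
print(f"{fraction:.1%} phased, longest block {longest:,} bp")
```

Merging before summing keeps the phased fraction from exceeding 100% if a caller ever reports overlapping blocks.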
14
Zhang Y, Zhang Y, Burke JM, Gleitsman K, Friedrich SM, Liu KJ, Wang TH. A Simple Thermoplastic Substrate Containing Hierarchical Silica Lamellae for High-Molecular-Weight DNA Extraction. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2016; 28:10630-10636. [PMID: 27862402 PMCID: PMC5234087 DOI: 10.1002/adma.201603738] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/14/2016] [Revised: 09/05/2016] [Indexed: 05/22/2023]
Abstract
An inexpensive, magnetic thermoplastic nanomaterial is developed that uses a hierarchical layering of micro- and nanoscale silica lamellae to create a high-surface-area, low-shear substrate capable of capturing vast amounts of ultrahigh-molecular-weight DNA. Extraction is performed via a simple 45-min process and can achieve binding capacities up to 1,000,000 times greater than those of silica microparticles.
research-article
15
Lamb HJ, Ross EM, Nguyen LT, Lyons RE, Moore SS, Hayes BJ. Characterization of the poll allele in Brahman cattle using long-read Oxford Nanopore sequencing. J Anim Sci 2020; 98:5823688. [PMID: 32318708 DOI: 10.1093/jas/skaa127] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/11/2020] [Accepted: 04/20/2020] [Indexed: 12/13/2022]
Abstract
Brahman cattle (Bos indicus) are well adapted to thrive in tropical environments. Since the breed's introduction to Australia in 1933, its ability to grow and reproduce on marginal lands has proven its value to the tropical beef industry. The poll phenotype, which describes the absence of horns, has become desirable in the cattle industry for animal welfare and handler safety reasons. The poll locus has been mapped to chromosome 1. Four alleles, each a copy number variant, have been reported across this locus in B. indicus and Bos taurus. However, the causative mutation in Brahman cattle has not been fully characterized. Oxford Nanopore Technologies' MinION sequencer was used to sequence four homozygous poll (PcPc), four homozygous horned (pp), and three heterozygous (Pcp) Brahmans to characterize the poll allele in Brahman cattle. A total of 98 Gb were sequenced, and an average coverage of 3.33X was achieved. Read N50 values ranged from 9.9 to 19 kb. Examination of the mapped reads across the poll locus revealed insertions approximately 200 bp in length in the poll animals that were absent in the horned animals. These results are consistent with the Celtic poll allele, a 212-bp duplication that replaces 10 bp. This provides direct evidence that the Celtic poll allele is segregating in the Australian Brahman population.
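Mean coverage is total sequenced bases divided by genome size. The Python arithmetic below checks that the figures in the Lamb et al. abstract are consistent with each other. It assumes, since the abstract does not say, that the 98 Gb total is split across all 11 animals and that 3.33X is the per-animal average. Under that reading, the implied genome size is close to the roughly 2.7 Gb size of a cattle genome. The sketch also computes the net length of the Celtic allele, a 212-bp duplication that replaces 10 bp.

```python
def implied_genome_size(total_bases: float, n_samples: int, mean_coverage: float) -> float:
    """Genome size implied by: coverage = bases per sample / genome size."""
    return total_bases / n_samples / mean_coverage


total_bases = 98e9        # total bases sequenced, from the abstract
n_animals = 4 + 4 + 3     # PcPc + pp + Pcp Brahmans
mean_cov = 3.33           # reported average coverage (assumed to be per animal)

genome = implied_genome_size(total_bases, n_animals, mean_cov)
print(f"implied genome size: {genome / 1e9:.2f} Gb")

# Celtic poll allele: a 212-bp duplication that replaces 10 bp of reference sequence.
net_insertion = 212 - 10
print(f"net insertion: {net_insertion} bp")  # 202 bp, matching the ~200 bp insertions seen
```

The net-length calculation explains why the allele appears in the alignments as an insertion of about 200 bp rather than 212 bp.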
16
Pucker B, Rückert C, Stracke R, Viehöver P, Kalinowski J, Weisshaar B. Twenty-Five Years of Propagation in Suspension Cell Culture Results in Substantial Alterations of the Arabidopsis Thaliana Genome. Genes (Basel) 2019; 10:E671. [PMID: 31480756 PMCID: PMC6770967 DOI: 10.3390/genes10090671] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/26/2019] [Revised: 08/23/2019] [Accepted: 08/29/2019] [Indexed: 01/16/2023]
Abstract
Arabidopsis thaliana is one of the best-studied plant model organisms. Besides being grown in greenhouses, this plant can also be propagated as cells in suspension culture. At7 is one such cell line, established about 25 years ago. Here, we report the sequencing and analysis of the At7 genome. Large-scale duplications and deletions compared to the Columbia-0 (Col-0) reference sequence were detected. The number of deletions exceeds the number of insertions, indicating an ongoing reduction in haploid genome size. Patterns of small sequence variants differ from those observed between A. thaliana accessions; e.g., the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.
research-article
17
Laurent S, Gehrig C, Nouspikel T, Amr SS, Oza A, Murphy E, Vannier A, Béna FS, Carminho-Rodrigues MT, Blouin JL, Cao Van H, Abramowicz M, Paoloni-Giacobino A, Guipponi M. Molecular characterization of pathogenic OTOA gene conversions in hearing loss patients. Hum Mutat 2021; 42:373-377. [PMID: 33492714 DOI: 10.1002/humu.24167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/16/2020] [Revised: 12/02/2020] [Accepted: 12/16/2020] [Indexed: 11/11/2022]
Abstract
Bi-allelic loss-of-function variants of OTOA are a well-known cause of moderate-to-severe hearing loss. Whereas non-allelic homologous recombination-mediated deletions of the gene are well known, gene conversions to the pseudogene OTOAP1 have been reported in the literature but have never been fully described, nor has their pathogenicity been assessed. Here, we report two unrelated patients with moderate hearing loss who were compound heterozygotes for a converted allele and a deletion of OTOA. The conversions were initially detected through sequencing depth anomalies at the OTOA locus after exome sequencing, then confirmed with long-range polymerase chain reaction (PCR). Both conversions lead to loss of function by introducing a premature stop codon in exon 22 (p.Glu787*). Using genomic alignments and long-read nanopore sequencing, we found that the two probands carry stretches of converted DNA of widely different lengths (at least 9 kbp and around 900 bp, respectively).
Journal Article
18
Reimer K, Neugebauer K. Preparation of Mammalian Nascent RNA for Long Read Sequencing. CURRENT PROTOCOLS IN MOLECULAR BIOLOGY 2020; 133:e128. [PMID: 33085989 PMCID: PMC7586757 DOI: 10.1002/cpmb.128] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Long-read sequencing technologies now allow high-quality sequencing of RNAs (or their cDNAs) that are hundreds to thousands of nucleotides long. Long-read sequences of nascent RNA provide single-nucleotide-resolution information about co-transcriptional RNA processing events, e.g., splicing, folding, and base modifications. Here, we describe how to isolate nascent RNA from mammalian cells by subcellular fractionation to obtain chromatin-associated RNA, how to deplete poly(A)+ RNA and rRNA, and, finally, how to generate a full-length cDNA library for use on long-read sequencing platforms. This approach reveals coordinated splicing status across multi-intron transcripts, exposing patterns of splicing or other RNA processing events that cannot be obtained from traditional short-read RNA sequencing. © 2020 Wiley Periodicals LLC.
Basic Protocol 1: Subcellular fractionation
Basic Protocol 2: Nascent RNA isolation and adapter ligation
Basic Protocol 3: cDNA amplicon preparation
Research Support, N.I.H., Extramural
19
Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021; 11:life11080862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022]
Abstract
Long-read sequencing platforms such as those from Pacific Biosciences (PacBio; Menlo Park, CA, USA) and Oxford Nanopore Technologies (ONT; Oxford, UK) offer advantages that various research fields, such as genomics and transcriptomics, can exploit. Selecting an appropriate sequencing platform is crucial for the success of a study, so there is a need to compare these long-read sequencing platforms and evaluate them for specific research questions. This study aims to compare the performance of the PacBio and ONT platforms for transcriptomic analysis using transcriptome data from three different tissues (hepatopancreas, intestine, and gonads) of the juvenile black tiger shrimp, Penaeus monodon. We compared three important features: (i) the main characteristics of the sequencing libraries and their alignment with the reference genome, (ii) transcript assembly features and isoform identification, and (iii) the correlation between the two platforms' quantification of gene expression levels. Our analyses suggest that read-length bias and differences in sequencing throughput are highly influential factors when using long reads in transcriptome studies. These comparisons can provide a guideline for designing transcriptome studies that use these two long-read sequencing technologies.
4 |
6 |
20
Alba P, Carfora V, Feltrin F, Diaconu EL, Sorbara L, Dell'Aira E, Cerci T, Ianzano A, Donati V, Franco A, Battisti A. Evidence of structural rearrangements in ESBL-positive pESI(like) megaplasmids of S. Infantis. FEMS Microbiol Lett 2023; 370:7049104. [PMID: 36806934 PMCID: PMC9990980 DOI: 10.1093/femsle/fnad014] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/29/2022] [Revised: 01/19/2023] [Accepted: 02/17/2023] [Indexed: 02/23/2023]
Abstract
The increasing prevalence of pESI(like)-positive, multidrug-resistant (MDR) S. Infantis in Europe is a cause of major concern. As previously demonstrated, the pESI(like) megaplasmid not only carries antimicrobial resistance (AMR) genes (at least the tet, dfr, and sul genes) but also harbours several virulence and fitness genes, as well as toxin/antitoxin systems that enhance its persistence in the S. Infantis host. In this study, five prototype pESI(like) plasmids from either CTX-M-1 or CTX-M-65 ESBL-producing strains were long-read sequenced using Oxford Nanopore Technologies (ONT), and their complete sequences were resolved. Comparison of the structure and gene content of the five sequenced plasmids, and further comparison with previously published pESI(like) sequences, indicated that although the sequences of these 'mosaic' pESI(like) plasmids remain almost identical, their structures appear to differ, being composed of regions inserted or transposed through different events. These results are essential for better understanding the plasticity and evolution of the pESI(like) megaplasmid, and therefore for better addressing risk management options and policy decisions to fight AMR and MDR in Salmonella and other food-borne pathogens. Graphical representation of the complete sequence of the pESI-like plasmid (ID 12037823/11). Block colours indicate gene function: red, repB gene; pink, class I integrons (IntI); yellow, mobile elements; blue, resistance genes; green, toxin/antitoxin systems; grey, mer operon; light green, genes involved in conjugation.
research-article |
2 |
5 |
21
Ng E, Dobrica MO, Harris JM, Wu Y, Tsukuda S, Wing PAC, Piazza P, Balfe P, Matthews PC, Ansari MA, McKeating JA. An enrichment protocol and analysis pipeline for long read sequencing of the hepatitis B virus transcriptome. J Gen Virol 2023; 104:001856. [PMID: 37196057 PMCID: PMC10845048 DOI: 10.1099/jgv.0.001856] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/03/2023] [Accepted: 04/21/2023] [Indexed: 05/19/2023]
Abstract
Hepatitis B virus (HBV) is one of the smallest human DNA viruses, and its 3.2 kb genome encodes multiple overlapping open reading frames, making its viral transcriptome challenging to dissect. Previous studies have combined quantitative PCR and next-generation sequencing to identify viral transcripts and splice junctions; however, the fragmentation and selective amplification used in short-read sequencing preclude the resolution of full-length RNAs. Our study coupled an oligonucleotide enrichment protocol with state-of-the-art long-read sequencing (PacBio) to identify the repertoire of HBV RNAs. This methodology yields sequencing libraries in which up to 25% of reads are of viral origin and enables the identification of canonical (unspliced), non-canonical (spliced), and chimeric viral-human transcripts. Sequencing RNA isolated from cells infected de novo with HBV or transfected with 1.3× overlength HBV genomes allowed us to assess the viral transcriptome and to annotate 5' truncations and polyadenylation profiles. The two HBV model systems showed excellent agreement in the pattern of major viral RNAs; however, they differed in the abundance of spliced transcripts. Viral-host chimeric transcripts were identified and were more common in the transfected cells. Enrichment capture and PacBio sequencing allow canonical and non-canonical HBV RNAs to be assigned using an open-source analysis pipeline that enables accurate mapping of the HBV transcriptome.
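Two headline results here are the share of reads of viral origin and the detection of viral-human chimeras. A minimal sketch of that bookkeeping follows, assuming each read has already been aligned to a combined host-plus-HBV reference and summarized as the list of reference sequences its alignments hit. The reference name "HBV" and the read records are hypothetical, and this is not the authors' open-source pipeline; a real analysis would also filter alignments by mapping quality and check the junction position within each chimeric read.

```python
from collections import Counter
from typing import Dict, List

VIRAL_REFERENCES = {"HBV"}  # hypothetical name of the viral contig in the combined reference


def classify_read(references_hit: List[str]) -> str:
    """Label a read by which genomes its (primary and supplementary) alignments hit."""
    hits = set(references_hit)
    if not hits:
        return "unmapped"
    viral = hits & VIRAL_REFERENCES
    host = hits - VIRAL_REFERENCES
    if viral and host:
        return "chimeric"  # one molecule aligns partly to HBV and partly to the host
    return "viral" if viral else "host"


def read_fractions(reads: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of reads in each class; empty input yields an empty dict."""
    counts = Counter(classify_read(refs) for refs in reads.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hypothetical reads: reference sequences hit by each read's alignments.
reads = {
    "read1": ["HBV"],
    "read2": ["chr1"],
    "read3": ["HBV", "chr5"],
    "read4": [],
}
print(read_fractions(reads))
```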
methods-article |
2 |
4 |
22
Yu MHC, Chau JFT, Au SLK, Lo HM, Yeung KS, Fung JLF, Mak CCY, Chung CCY, Chan KYK, Chung BHY, Kan ASY. Evaluating the Clinical Utility of Genome Sequencing for Cytogenetically Balanced Chromosomal Abnormalities in Prenatal Diagnosis. Front Genet 2021; 11:620162. [PMID: 33584815 PMCID: PMC7873444 DOI: 10.3389/fgene.2020.620162] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Accepted: 12/21/2020] [Indexed: 11/13/2022]
Abstract
Balanced chromosomal abnormalities (BCAs) are changes in the localization or orientation of a chromosomal segment without visible gain or loss of genetic material. BCAs occur in 1 in 500 newborns and are associated with an increased risk of multiple congenital anomalies and/or neurodevelopmental disorders, especially when they arise de novo. In this pilot project, we used short-read genome sequencing (GS) to retrospectively re-sequence ten prenatal subjects with de novo BCAs and compared the performance of GS with that of the original karyotyping. GS characterized all BCAs found by conventional karyotyping, with the added benefit of precise sub-band delineation. By identifying BCA breakpoints at the nucleotide level, GS revealed disruption of OMIM genes in three cases and cryptic gains/losses at the breakpoints in two cases. Of these five cases, four reached a definitive genetic diagnosis, while the remaining case had a BCA interpreted as being of unknown clinical significance. The additional information gained from GS can change the interpretation of BCAs and has the potential to improve genetic counseling and perinatal management by providing a more specific genetic diagnosis. This demonstrates the added clinical utility of GS for the diagnosis of BCAs.
Journal Article |
4 |
4 |
23
Kirov I, Merkulov P, Dudnikov M, Polkhovskaya E, Komakhin RA, Konstantinov Z, Gvaramiya S, Ermolaev A, Kudryavtseva N, Gilyok M, Divashuk MG, Karlov GI, Soloviev A. Transposons Hidden in Arabidopsis thaliana Genome Assembly Gaps and Mobilization of Non-Autonomous LTR Retrotransposons Unravelled by Nanotei Pipeline. Plants (Basel) 2021; 10:2681. [PMID: 34961152 PMCID: PMC8704663 DOI: 10.3390/plants10122681] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/03/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 06/12/2023]
Abstract
Long-read data are a powerful tool for discovering new active transposable elements (TEs). However, no ready-to-use tools were available to extract this information from low-coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that detects TE-containing structural variants, including individual TE transpositions. We used this pipeline to identify TE insertions in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including copies of the well-characterized ONSEN retrotransposon family, that were hidden in genome assembly gaps. These results demonstrate that some TEs are inaccessible to analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant, which has elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transpositions of non-autonomous TEs. One of these, a non-autonomous TE derived from AT5TE33540, belongs to the TR-GAG retrotransposons, which have a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, even though they lack most of the encoded proteins. In summary, nanotei is a useful tool for detecting active TEs and their insertions in plant genomes using low-coverage Nanopore genome sequencing data.
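The abstract does not describe nanotei's internals. As a generic illustration (not nanotei's algorithm), a long read that spans a TE insertion absent from the reference typically aligns with a large insertion ("I") operation in its CIGAR string, as defined in the SAM specification. The sketch below scans one alignment's CIGAR for such events; the 300 bp threshold and the function name are hypothetical, and a practical caller would cluster support across reads and compare the inserted sequence with a TE library.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")  # operations that advance along the reference


def large_insertions(ref_start: int, cigar: str, min_len: int = 300) -> List[Tuple[int, int]]:
    """Return (reference position, inserted length) for insertions >= min_len.

    ref_start is the 0-based leftmost reference position of the alignment.
    """
    hits = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "I" and length >= min_len:
            # Inserted bases sit between reference positions ref_pos - 1 and ref_pos.
            hits.append((ref_pos, length))
        if op in REFERENCE_CONSUMING:
            ref_pos += length
    return hits


# Hypothetical read: soft clip, 200 aligned bases, a 5 kb insertion, then more alignment.
print(large_insertions(1000, "50S200M5000I300M10D100M"))  # [(1200, 5000)]
```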
research-article |
4 |
3 |
24
Kaplun L, Krautz-Peterson G, Neerman N, Stanley C, Hussey S, Folwick M, McGarry A, Weiss S, Kaplun A. ONT long-read WGS for variant discovery and orthogonal confirmation of short read WGS derived genetic variants in clinical genetic testing. Front Genet 2023; 14:1145285. [PMID: 37152986 PMCID: PMC10160624 DOI: 10.3389/fgene.2023.1145285] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/16/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023]
Abstract
Technological advances in next-generation sequencing have dramatically increased the clinical efficiency of genetic testing, allowing detection of a wide variety of variants, from single-nucleotide events to large structural aberrations. Whole-genome sequencing (WGS) has allowed exploration of regions of the genome, such as intergenic regions, that might not be targeted by other approaches. A single technique that detects all genetic variants at once is intended to make the diagnostic process faster, more comprehensive, and more efficient. Nevertheless, several shortcomings cannot be effectively addressed by short-read sequencing, such as determining the precise size of short tandem repeat (STR) expansions, phasing potentially compound recessive variants, and resolving some structural variants and their exact boundaries. Therefore, in some cases variants can only be tentatively detected by short-read sequencing and require orthogonal confirmation, particularly for clinical reporting. Moreover, certain regulatory authorities, for example New York State CLIA, require orthogonal confirmation of every reportable variant. Such confirmations often involve numerous techniques that are not necessarily available in the same laboratory and not always performed promptly, negating the advantages of a "one-technique-for-all" approach and making the process lengthy, prone to logistical and analytical faults, and financially inefficient. Fortunately, these weak spots of short-read sequencing can be compensated for by long-read technologies, which detect some types of variants comparably or better while lacking the limitations of short-read sequencing mentioned above.
At Variantyx, we have developed an integrated clinical genetic testing approach that augments short-read WGS-based variant detection with Oxford Nanopore Technologies (ONT) long-read sequencing, providing simultaneous orthogonal confirmation of all types of variants with the added benefit of improved identification of the exact size and position of detected aberrations. The validation study of this augmented test demonstrated that ONT sequencing can efficiently verify multiple types of reportable variants, ensuring highly reliable detection and a quick turnaround time for WGS-based clinical genetic testing.
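The abstract notes that short reads cannot determine the precise size of STR expansions, whereas a single long read can span an entire repeat. A minimal sketch of sizing a repeat from one spanning read follows, assuming an error-free read; real ONT reads contain indel errors, so practical sizing tools use error-tolerant alignment or combine evidence across reads. The function name is hypothetical.

```python
def longest_repeat_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive, exact copies of `unit` anywhere in `sequence`."""
    k = len(unit)
    best = 0
    for frame in range(k):  # the repeat may start at any offset
        run = 0
        for i in range(frame, len(sequence) - k + 1, k):
            if sequence[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. CAA in a CAG tract) ends the run
    return best


# Hypothetical read spanning a CAG repeat that is interrupted by one CAA.
read = "GG" + "CAG" * 12 + "CAA" + "CAG" * 3 + "TT"
print(longest_repeat_run(read, "CAG"))  # 12
```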
research-article |
2 |
3 |
25
|
Nicot F, Trémeaux P, Latour J, Jeanne N, Ranger N, Raymond S, Dimeglio C, Salin G, Donnadieu C, Izopet J. Whole-genome sequencing of SARS-CoV-2: Comparison of target capture and amplicon single molecule real-time sequencing protocols. J Med Virol 2022; 95:e28123. [PMID: 36056719 PMCID: PMC9539136 DOI: 10.1002/jmv.28123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/13/2022] [Revised: 08/17/2022] [Accepted: 08/30/2022] [Indexed: 01/11/2023]
Abstract
Fast, accurate sequencing methods are needed to identify new variants and genetic mutations of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome. Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing provides long, highly accurate sequences through circular consensus reads. This study compares the performance of a target capture PacBio SMRT protocol for whole-genome sequencing (WGS) of SARS-CoV-2 with that of an amplicon PacBio SMRT sequencing protocol. The median genome coverage was higher (p < 0.05) with the target capture protocol (99.3% [interquartile range, IQR: 96.3-99.5]) than with the amplicon protocol (99.3% [IQR: 69.9-99.3]). The clades of 65 samples determined with both protocols were 100% concordant. After adjusting for Ct values, S gene coverage was higher with the target capture protocol than with the amplicon protocol. After stratification by Ct value, higher S gene coverage with the target capture protocol was observed only for samples with Ct > 17 (p < 0.01). PacBio SMRT sequencing protocols appear to be suitable for WGS, genotyping, and detecting mutations of SARS-CoV-2.
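The abstract reports genome coverage as a median with an interquartile range across samples but does not define coverage. A minimal sketch follows, assuming coverage means the percentage of reference positions with read depth at or above a threshold (10 here, a hypothetical choice) and that per-position depths are already available for each sample. Quartile conventions differ between software packages, so quartiles from Python's `statistics.quantiles` with `method="inclusive"` may not match those of the original study.

```python
import statistics
from typing import List, Sequence, Tuple


def genome_coverage(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of reference positions with depth >= min_depth (threshold is an assumption)."""
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def median_and_iqr(values: List[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) across samples."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical per-position depths for a tiny five-position "genome".
print(genome_coverage([0, 5, 10, 20, 30]))  # 60.0

# Hypothetical per-sample coverage percentages.
print(median_and_iqr([70, 96, 99, 99, 100]))  # (99, 96.0, 99.0)
```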
research-article |
3 |
2 |