1
Li Q, Wu J, Mao X. The roles of different gene expression regulators in acoustic variation in the intermediate horseshoe bat revealed by long-read and short-read RNA sequencing data. Curr Zool 2024; 70:575-588. [PMID: 39463690 PMCID: PMC11502156 DOI: 10.1093/cz/zoad045] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Accepted: 09/27/2023] [Indexed: 10/29/2024]
Abstract
Gene expression changes contribute greatly to phenotypic variation in nature. Studying the patterns of gene expression regulators is important for fully understanding the molecular mechanisms underlying phenotypic variation. In horseshoe bats, the cochleae are finely tuned to echoes of the call frequency. Here, using 2 recently diverged subspecies of the intermediate horseshoe bat (Rhinolophus affinis hainanus and R. a. himalayanus) with great acoustic variation as the system, we aim to explore the relative roles of different regulators of gene expression (differential gene expression, alternative splicing (AS), and long non-coding RNAs (lncRNAs)) in phenotypic variation, using a combination of Illumina short-read and Nanopore long-read RNA-seq data from the cochlea. Compared to R. a. hainanus, R. a. himalayanus exhibited many more upregulated differentially expressed genes (DEGs), several of which may play important roles in the maintenance and damage repair of auditory hair cells. We identified 411 differentially expressed lncRNAs; their target DEGs upregulated in R. a. himalayanus were also mainly involved in a protective mechanism for auditory hair cells. Using 3 different methods of AS analysis, we identified several candidate alternatively spliced genes (ASGs) expressing different isoforms that may be associated with the acoustic divergence of the 2 subspecies. We observed significantly less overlap than expected between DEGs and ASGs, supporting complementary roles of differential gene expression and AS in generating phenotypic variation. Overall, our study highlights the importance of combining short-read and long-read RNA-seq data when examining the regulation of gene expression changes responsible for phenotypic variation.
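"Significantly less overlap than expected" between DEGs and ASGs is a statement about a null expectation. The abstract does not name the test used, so the following is only a minimal sketch. It assumes a one-sided hypergeometric test in which the ASG set is treated as a random draw from a shared background of expressed genes. The function names are hypothetical, and the counts in the `__main__` block are placeholders, not values from the study.

```python
from math import comb


def expected_overlap(n_background: int, n_deg: int, n_asg: int) -> float:
    """Mean DEG/ASG overlap if ASGs were a random draw from the background."""
    return n_asg * n_deg / n_background


def depletion_p_value(n_background: int, n_deg: int, n_asg: int, k_obs: int) -> float:
    """One-sided P(X <= k_obs) for X ~ Hypergeometric(N, K=n_deg, n=n_asg).

    A small value means the observed overlap is smaller than expected by
    chance (depletion), the direction described in the abstract.
    """
    total = comb(n_background, n_asg)
    # Feasible overlaps range from n_asg - (number of non-DEGs) up to min(n_deg, n_asg).
    k_min = max(0, n_asg - (n_background - n_deg))
    k_max = min(k_obs, n_deg, n_asg)
    hits = sum(
        comb(n_deg, k) * comb(n_background - n_deg, n_asg - k)
        for k in range(k_min, k_max + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    N, K, n, k = 15000, 1200, 800, 30
    print(f"expected overlap = {expected_overlap(N, K, n):.1f}")
    print(f"P(overlap <= {k}) = {depletion_p_value(N, K, n, k):.3g}")
```

Python integers are arbitrary-precision, so the exact binomial sums stay exact even for genome-scale counts. The division converts the result to a float only at the end.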
Affiliation(s)
- Qianqian Li
- School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200062, China
- Jianyu Wu
- School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200062, China
- Xiuguang Mao
- School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200062, China
2
Unneberg P, Larsson M, Olsson A, Wallerman O, Petri A, Bunikis I, Vinnere Pettersson O, Papetti C, Gislason A, Glenner H, Cartes JE, Blanco-Bercial L, Eriksen E, Meyer B, Wallberg A. Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins. Nat Commun 2024; 15:6297. [PMID: 39090106 PMCID: PMC11294593 DOI: 10.1038/s41467-024-50239-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 05/15/2024] [Indexed: 08/04/2024]
Abstract
Krill are vital as food for many marine animals but also impacted by global warming. To learn how they and other zooplankton may adapt to a warmer world we studied local adaptation in the widespread Northern krill (Meganyctiphanes norvegica). We assemble and characterize its large genome and compare genome-scale variation among 74 specimens from the colder Atlantic Ocean and warmer Mediterranean Sea. The 19 Gb genome likely evolved through proliferation of retrotransposons, now targeted for inactivation by extensive DNA methylation, and contains many duplicated genes associated with molting and vision. Analysis of 760 million SNPs indicates extensive homogenizing gene-flow among populations. Nevertheless, we detect signatures of adaptive divergence across hundreds of genes, implicated in photoreception, circadian regulation, reproduction and thermal tolerance, indicating polygenic adaptation to light and temperature. The top gene candidate for ecological adaptation was nrf-6, a lipid transporter with a Mediterranean variant that may contribute to early spring reproduction. Such variation could become increasingly important for fitness in Atlantic stocks. Our study underscores the widespread but uneven distribution of adaptive variation, necessitating characterization of genetic variation among natural zooplankton populations to understand their adaptive potential, predict risks and support ocean conservation in the face of climate change.
Affiliation(s)
- Per Unneberg
- Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Mårten Larsson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Anna Olsson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
- Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
- Anna Petri
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
- Ignas Bunikis
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
- Olga Vinnere Pettersson
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
- Astthor Gislason
- Marine and Freshwater Research Institute, Pelagic Division, Reykjavik, Iceland
- Henrik Glenner
- Department of Biological Sciences, University of Bergen, Bergen, Norway
- Center for Macroecology, Evolution and Climate Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Joan E Cartes
- Instituto de Ciencias del Mar (ICM-CSIC), Barcelona, Spain
- Bettina Meyer
- Section Polar Biological Oceanography, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Institute for Chemistry and Biology of the Marine Environment, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Helmholtz Institute for Functional Marine Biodiversity (HIFMB), University of Oldenburg, Oldenburg, Germany
- Andreas Wallberg
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden.
3
Vass M, Székely AJ, Carlsson-Graner U, Wikner J, Andersson A. Microeukaryote community coalescence strengthens community stability and elevates diversity. FEMS Microbiol Ecol 2024; 100:fiae100. [PMID: 39003240 PMCID: PMC11287207 DOI: 10.1093/femsec/fiae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 06/19/2024] [Accepted: 07/12/2024] [Indexed: 07/15/2024]
Abstract
Mixing of entire microbial communities is a frequent yet understudied phenomenon. Here, we mimicked estuarine conditions in a microcosm experiment by mixing a freshwater river community with a brackish sea community. We then assessed how both environmental and community coalescence, induced by varying mixing processes, affected microeukaryotic communities. The composition of coalesced communities showed signs of shifting towards the sea parent community, suggesting an asymmetrical community coalescence outcome; this outcome was, in addition, generally less affected by environmental coalescence. Community stability, inferred from community cohesion, differed between the river and sea parent communities and increased following coalescence treatments. Generally, community coalescence increased alpha diversity and promoted competition through the introduction (or emergence) of additional (or rare) species. These competitive interactions in turn had a community-stabilizing effect, as evidenced by the increased proportion of negative cohesion. The fate of microeukaryotes was influenced by mixing ratios and frequencies (i.e. one-time versus repeated coalescence). Namely, diatoms were negatively affected by coalescence, while fungi, ciliates, and cercozoans were promoted to varying extents, depending on the mixing ratios of the parent communities. Our study suggests that coalescence outcomes were more predictable when the sea parent community dominated the final community, and that this predictability was further enhanced when communities collided repeatedly.
Affiliation(s)
- Máté Vass
- Department of Ecology and Environmental Science, Umeå University, SE-90187 Umeå, Sweden
- Division of Systems and Synthetic Biology, Department of Life Sciences, Science for Life Laboratory, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Anna J Székely
- Division of Microbial Ecology, Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, SE-75007 Uppsala, Sweden
- Ulla Carlsson-Graner
- Department of Ecology and Environmental Science, Umeå University, SE-90187 Umeå, Sweden
- Johan Wikner
- Department of Ecology and Environmental Science, Umeå University, SE-90187 Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, SE-90571 Hörnefors, Sweden
- Agneta Andersson
- Department of Ecology and Environmental Science, Umeå University, SE-90187 Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, SE-90571 Hörnefors, Sweden
4
Vass M, Ramasamy KP, Andersson A. Microbial hitchhikers on microplastics: The exchange of aquatic microbes across distinct aquatic habitats. Environ Microbiol 2024; 26:e16618. [PMID: 38561820 DOI: 10.1111/1462-2920.16618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 03/16/2024] [Indexed: 04/04/2024]
Abstract
Microplastics (MPs) have the potential to modify aquatic microbial communities and distribute microorganisms, including pathogens. This poses a potential risk to aquatic life and human health. Despite this, the fate of 'hitchhiking' microbes on MPs that traverse different aquatic habitats remains largely unknown. To address this, we conducted a 50-day microcosm experiment, manipulating estuarine conditions to study the exchange of bacteria and microeukaryotes between river, sea and plastisphere using a long-read metabarcoding approach. Our findings revealed a significant increase in bacteria on the plastisphere, including Pseudomonas, Sphingomonas, Hyphomonas, Brevundimonas, Aquabacterium and Thalassolituus, all of which are known for their pollutant degradation capabilities, specifically polycyclic aromatic hydrocarbons. We also observed a strong association of plastic-degrading fungi (i.e., Cladosporium and Plectosphaerella) and early-diverging fungi (Cryptomycota, also known as Rozellomycota) with the plastisphere. Sea MPs were primarily colonised by fungi (70%), with a small proportion of river-transported microbes (1%-4%). The mere presence of MPs in seawater increased the relative abundance of planktonic fungi from 2% to 25%, suggesting significant exchanges between planktonic and plastisphere communities. Using microbial source tracking, we discovered that MPs only dispersed 3.5% and 5.5% of river bacterial and microeukaryotic communities into the sea, respectively. Hence, although MPs select and facilitate the dispersal of ecologically significant microorganisms, drastic compositional changes across distinct aquatic habitats are unlikely.
Affiliation(s)
- Máté Vass
- Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
- Division of Systems and Synthetic Biology, Department of Life Sciences, Science for Life Laboratory, Chalmers University of Technology, Gothenburg, Sweden
- Kesava Priyan Ramasamy
- Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, Umeå, Sweden
- Agneta Andersson
- Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, Umeå, Sweden
5
Ma J, Zhao X, Qi E, Han R, Yu T, Li G. Highly efficient clustering of long-read transcriptomic data with GeLuster. Bioinformatics 2024; 40:btae059. [PMID: 38310330 PMCID: PMC10881092 DOI: 10.1093/bioinformatics/btae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 01/08/2024] [Accepted: 01/30/2024] [Indexed: 02/05/2024]
Abstract
MOTIVATION Advances in long-read RNA sequencing technologies promise a bright future for transcriptome analysis, in which clustering long reads according to their gene family of origin is of great importance. However, existing de novo clustering algorithms require substantial computing resources. RESULTS We developed GeLuster, a new algorithm for clustering long RNA-seq reads. In our tests on one simulated dataset and nine real datasets, GeLuster exhibited superior performance. On the tested Nanopore datasets, it ran 2.9-17.5 times as fast as the second-fastest method, used less than one-seventh of the memory, and achieved higher clustering accuracy. GeLuster performed similarly on the PacBio data. It sets the stage for large-scale transcriptome studies in the future. AVAILABILITY AND IMPLEMENTATION GeLuster is freely available at https://github.com/yutingsdu/GeLuster.
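The abstract does not describe how GeLuster groups reads, and the sketch below is not its algorithm. It is a deliberately naive baseline meant only to make "clustering long reads by gene family of origin" concrete. It assumes that reads from the same gene share many exact k-mers. Two reads are linked when they share at least `min_shared` k-mers, and clusters are the connected components, which union-find computes. The function names and the `k`/`min_shared` defaults are hypothetical illustration choices.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_reads(reads: List[str], k: int = 15, min_shared: int = 3) -> List[List[int]]:
    """Group read indices into clusters linked by >= min_shared shared k-mers."""
    # Inverted index: k-mer -> ids of reads containing it (ids appended in ascending order).
    index: Dict[str, List[int]] = defaultdict(list)
    for rid, read in enumerate(reads):
        for km in kmer_set(read, k):
            index[km].append(rid)

    # Count shared k-mers for every pair of reads that co-occur under some k-mer.
    shared: Dict[Tuple[int, int], int] = defaultdict(int)
    for ids in index.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                shared[(ids[i], ids[j])] += 1

    uf = UnionFind(len(reads))
    for (a, b), count in shared.items():
        if count >= min_shared:
            uf.union(a, b)

    clusters: Dict[int, List[int]] = defaultdict(list)
    for rid in range(len(reads)):
        clusters[uf.find(rid)].append(rid)
    return sorted(clusters.values())
```

The pairwise counting step grows quadratically with the number of reads that share a k-mer. A baseline like this would therefore be costly on real transcriptomes. Exact k-mer matches also become rarer in error-prone long reads as k grows.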
Affiliation(s)
- Junchi Ma
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Xiaoyu Zhao
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Enfeng Qi
- School of Mathematics and Statistics, Guangxi Normal University, Guilin 541000, China
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Ting Yu
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
6
Westrin KJ, Kretzschmar WW, Emanuelsson O. ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs. BMC Bioinformatics 2024; 25:54. [PMID: 38302873 PMCID: PMC10836024 DOI: 10.1186/s12859-024-05663-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 01/18/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. RESULTS We present the de novo transcript isoform assembler ClusTrast, which takes short-read RNA-seq data as input and works in five steps. It assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species. ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at the cost of a moderate reduction in precision. For recall, ClusTrast ranked highest at the lower end of expression levels (below the 15th percentile) for all tested datasets, and over the entire range for almost all datasets. ClusTrast often reconstructed reference transcripts to at least 95% of their length (35-69% of transcripts across the six datasets). More than half of the reference transcripts (58-81%) were reconstructed with contigs that exhibited polymorphism, as measured on a subset of reliably predicted contigs. ClusTrast recall increased when a union of assembled transcripts from more than one assembly tool was used as the primary assembly. CONCLUSION We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.
Affiliation(s)
- Karl Johan Westrin
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65, Solna, Sweden
- Warren W Kretzschmar
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65, Solna, Sweden
- Department of Medicine Huddinge, Center for Hematology and Regenerative Medicine (HERM), Karolinska Institute, 141 52, Flemingsberg, Sweden
- Olof Emanuelsson
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65, Solna, Sweden.
7
Sekiguchi Y, Teramoto K, Tourlousse DM, Ohashi A, Hamajima M, Miura D, Yamada Y, Iwamoto S, Tanaka K. A large-scale genomically predicted protein mass database enables rapid and broad-spectrum identification of bacterial and archaeal isolates by mass spectrometry. Genome Biol 2023; 24:257. [PMID: 38049850 PMCID: PMC10696839 DOI: 10.1186/s13059-023-03096-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 10/24/2023] [Indexed: 12/06/2023]
Abstract
MALDI-TOF MS-based microbial identification relies on reference spectral libraries, which limits the screening of diverse isolates, including uncultured lineages. We present a new strategy for broad-spectrum identification of bacterial and archaeal isolates by MALDI-TOF MS using a large-scale database of protein masses predicted from nearly 200,000 publicly available genomes. We verify the ability of the database to identify microorganisms at the species level and below, achieving correct identification for > 90% of measured spectra. We further demonstrate its utility by identifying uncultured strains from mouse feces with metagenomics, allowing the identification of new strains by customizing the database with metagenome-assembled genomes.
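One way to picture database-driven MALDI-TOF identification is to compare the observed peak masses against each genome's predicted protein masses within a mass tolerance. The abstract does not specify the authors' scoring scheme, and the sketch below is not it. It assumes a simple score, the fraction of observed peaks matched within a hypothetical 500 ppm tolerance. It also uses a toy database whose taxon names and masses are invented for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def count_matched_peaks(observed: List[float], predicted_sorted: List[float],
                        tol_ppm: float) -> int:
    """Number of observed peaks with a predicted mass within +/- tol_ppm."""
    matched = 0
    for mz in observed:
        tol = mz * tol_ppm * 1e-6
        # First predicted mass >= mz - tol; it matches if it is also <= mz + tol.
        i = bisect_left(predicted_sorted, mz - tol)
        if i < len(predicted_sorted) and predicted_sorted[i] <= mz + tol:
            matched += 1
    return matched


def rank_taxa(observed: List[float], database: Dict[str, List[float]],
              tol_ppm: float = 500.0) -> List[Tuple[float, str]]:
    """Score each database entry by the fraction of observed peaks it explains."""
    scores = []
    for taxon, masses in database.items():
        hits = count_matched_peaks(observed, sorted(masses), tol_ppm)
        scores.append((hits / len(observed), taxon))
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores


if __name__ == "__main__":
    toy_db = {"taxon_A": [5001.0, 7002.0, 12000.0], "taxon_B": [4000.0, 9000.5]}
    print(rank_taxa([5000.0, 7000.0, 9000.0], toy_db))
```

The entries are plain lists of predicted masses. Under a scheme like this, extending the database with metagenome-assembled genomes would only require appending their predicted masses.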
Affiliation(s)
- Yuji Sekiguchi
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 6, Ibaraki, 305-8566, Japan.
- Dieter M Tourlousse
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 6, Ibaraki, 305-8566, Japan
- Akiko Ohashi
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 6, Ibaraki, 305-8566, Japan
- Mayu Hamajima
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 6, Ibaraki, 305-8566, Japan
- Daisuke Miura
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 6, Ibaraki, 305-8566, Japan
- Yoshihiro Yamada
- Koichi Tanaka Mass Spectrometry Research Laboratory, Shimadzu Corporation, Kyoto, Japan
- Shinichi Iwamoto
- Koichi Tanaka Mass Spectrometry Research Laboratory, Shimadzu Corporation, Kyoto, Japan
- Koichi Tanaka
- Koichi Tanaka Mass Spectrometry Research Laboratory, Shimadzu Corporation, Kyoto, Japan
8
Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023; 30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Processing large data sets has become an essential part of computational genomics. The greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields. It has also created computational challenges in processing large sequencing experiments. The minimizer sketch is a popular sequence-sketching method that underlies core steps in computational genomics such as read mapping, sequence assembly, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of a few classical approaches. More recently, efforts have been made to build minimizer sketches with more desirable properties than the classical constructions. In this survey, we review the history of the minimizer sketch, the theory developed around the concept, and the plethora of applications that take advantage of such sketches. We aim to provide readers with a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of a better fusion of theory and application in the future.
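The central object of the survey can be made concrete with a (w, k)-minimizer scheme. The scheme slides a window over w consecutive k-mers and keeps the smallest k-mer in each window under some ordering. The set of kept positions is the sketch. Below is a minimal sketch that assumes lexicographic order by default with leftmost tie-breaking. Practical tools commonly substitute a pseudo-random (hash-based) order, because lexicographic order tends to select more positions than necessary; the `order` parameter allows that swap. The function name and parameters are illustrative and are not taken from the survey.

```python
from typing import Callable, List, Tuple


def minimizer_sketch(seq: str, k: int, w: int,
                     order: Callable[[str], object] = lambda s: s) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs selected by a (w, k)-minimizer scheme.

    Each window covers w consecutive k-mers; the k-mer with the smallest
    order(kmer) is selected, ties broken by the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    selected: List[Tuple[int, str]] = []
    last = -1
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w),
                   key=lambda i: (order(seq[i:i + k]), i))
        # Selected positions never move backwards as the window slides, so
        # comparing with the previous pick avoids recording a minimizer twice.
        if best != last:
            selected.append((best, seq[best:best + k]))
            last = best
    return selected


if __name__ == "__main__":
    sketch = minimizer_sketch("ACGTACGT", k=3, w=2)
    print(sketch)
    print("density:", len(sketch) / (len("ACGTACGT") - 3 + 1))
```

Take `ACGTACGT` with k = 3 and w = 2. The sketch keeps positions 0, 1, 2 and 4, which is 4 of the 6 k-mers. The last two windows share position 4, and sharing of this kind is how a minimizer sketch ends up smaller than the full k-mer set. Every window still contains at least one selected position.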
Affiliation(s)
- Hongyu Zheng
- Computer Science Department, Princeton University, Princeton, New Jersey, USA
- Guillaume Marçais
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Carl Kingsford
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
9
Zhang R, Duan Q, Luo Q, Deng L. PacBio Full-Length Transcriptome of a Tetraploid Sinocyclocheilus multipunctatus Provides Insights into the Evolution of Cavefish. Animals (Basel) 2023; 13:3399. [PMID: 37958154 PMCID: PMC10648740 DOI: 10.3390/ani13213399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/21/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023]
Abstract
Sinocyclocheilus multipunctatus is a second-class nationally protected wild animal in China. As a cavefish, S. multipunctatus is highly adaptable to harsh subterranean environments. In this study, we used PacBio SMRT sequencing technology to generate a first representative full-length transcriptome for S. multipunctatus. Sequence clustering analysis yielded 232,126 full-length transcripts. Of these, 40,487 were annotated in public databases. We also identified 70,300 microsatellites, 2384 transcription factors, and 16,321 long non-coding RNAs. The phylogenetic tree showed that S. multipunctatus has a closer relationship to Carassius auratus and Cyprinus carpio, diverging from their common ancestor ~14.74 million years ago (Mya). We also found that S. multipunctatus experienced an additional whole-genome duplication (WGD) event between 15.6 and 17.5 Mya, which may have promoted the evolution of the species. Meanwhile, the overall evolutionary rates of polyploid S. multipunctatus were significantly higher than those of the other cyprinids. We identified 220 positively selected genes (PSGs) in the two sub-genomes of S. multipunctatus. These PSGs are likely to play critical roles in adaptation to diverse cave environments. This study may facilitate future investigations into the genomic characteristics of S. multipunctatus and provide valuable insights into the evolutionary history of polyploid S. multipunctatus.
10
Tomaszkiewicz M, Sahlin K, Medvedev P, Makova KD. Transcript Isoform Diversity of Ampliconic Genes on the Y Chromosome of Great Apes. Genome Biol Evol 2023; 15:evad205. [PMID: 37967251 PMCID: PMC10673640 DOI: 10.1093/gbe/evad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 10/20/2023] [Accepted: 11/03/2023] [Indexed: 11/17/2023]
Abstract
Y chromosomal ampliconic genes (YAGs) are important for male fertility, as they encode proteins functioning in spermatogenesis. The variation in copy number and expression levels of these multicopy gene families has been studied in great apes; however, the diversity of splicing variants remains unexplored. Here, we deciphered the sequences of polyadenylated transcripts of all nine YAG families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) from testis samples of six great ape species (human, chimpanzee, bonobo, gorilla, Bornean orangutan, and Sumatran orangutan). To achieve this, we enriched YAG transcripts with capture probe hybridization and sequenced them with long (Pacific Biosciences) reads. Our analysis of this data set resulted in several findings. First, we observed evolutionarily conserved alternative splicing patterns for most YAG families except for BPY2 and PRY. Second, our results suggest that BPY2 transcripts and proteins originate from separate genomic regions in bonobo versus human, which is possibly facilitated by acquiring new promoters. Third, our analysis indicates that the PRY gene family, having the highest representation of noncoding transcripts, has been undergoing pseudogenization. Fourth, we have not detected signatures of selection in the five YAG families shared among great apes, even though we identified many species-specific protein-coding transcripts. Fifth, we predicted consensus disorder regions across most gene families and species, which could be used for future investigations of male infertility. Overall, our work illuminates the YAG isoform landscape and provides a genomic resource for future functional studies focusing on infertility phenotypes in humans and critically endangered great apes.
Affiliation(s)
- Marta Tomaszkiewicz
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Kateryna D Makova
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
11
Petri AJ, Sahlin K. isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics 2023; 39:i222-i231. [PMID: 37387174 PMCID: PMC10311309 DOI: 10.1093/bioinformatics/btad264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION With advances in long-read transcriptome sequencing, we can now fully sequence transcripts, which greatly improves our ability to study transcription processes. A popular long-read transcriptome sequencing technique is Oxford Nanopore Technologies (ONT), which, through its cost-effective sequencing and high throughput, has the potential to characterize the transcriptome in a cell. However, due to transcript variability and sequencing errors, long cDNA reads need substantial bioinformatic processing to produce a set of isoform predictions from the reads. Several genome- and annotation-based methods exist to produce transcript predictions. However, such methods require high-quality genomes and annotations and are limited by the accuracy of long-read splice aligners. In addition, gene families with high heterogeneity may not be well represented by a reference genome and would benefit from reference-free analysis. Reference-free methods to predict transcripts from ONT, such as RATTLE, exist, but their sensitivity is not comparable to that of reference-based approaches. RESULTS We present isONform, a high-sensitivity algorithm to construct isoforms from ONT cDNA sequencing data. The algorithm is based on iterative bubble popping on gene graphs built from fuzzy seeds from the reads. Using simulated, synthetic, and biological ONT cDNA data, we show that isONform has substantially higher sensitivity than RATTLE, albeit with some loss in precision. On biological data, we show that isONform's predictions are substantially more consistent than RATTLE's with those of the annotation-based method StringTie2. We believe isONform can be used both for isoform construction for organisms without well-annotated genomes and as an orthogonal method to verify predictions of reference-based methods. AVAILABILITY AND IMPLEMENTATION https://github.com/aljpetri/isONform.
Affiliation(s)
- Alexander J Petri
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
12
Sanchez-Cid C, Ghaly TM, Gillings MR, Vogel TM. Sub-inhibitory gentamicin pollution induces gentamicin resistance gene integration in class 1 integrons in the environment. Sci Rep 2023; 13:8612. [PMID: 37244902 PMCID: PMC10224954 DOI: 10.1038/s41598-023-35074-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 05/12/2023] [Indexed: 05/29/2023]
Abstract
Antibiotics at sub-inhibitory concentrations are often found in the environment. Here they could impose selective pressure on bacteria, leading to the selection and dissemination of antibiotic resistance, despite being under the inhibitory threshold. The goal of this study was to evaluate the effects of sub-inhibitory concentrations of gentamicin on environmental class 1 integron cassettes in natural river microbial communities. Gentamicin at sub-inhibitory concentrations promoted the integration and selection of gentamicin resistance genes (GmRG) in class 1 integrons after only a one-day exposure. Therefore, sub-inhibitory concentrations of gentamicin induced integron rearrangements, increasing the mobilization potential of gentamicin resistance genes and potentially increasing their dissemination in the environment. This study demonstrates the effects of antibiotics at sub-inhibitory concentrations in the environment and supports concerns about antibiotics as emerging pollutants.
Affiliation(s)
- Concepcion Sanchez-Cid
- Environmental Microbial Genomics, UMR 5005 Laboratoire Ampère, CNRS, École Centrale de Lyon, Université de Lyon, Écully, France.
- Timothy M Ghaly
- School of Natural Sciences, Macquarie University, NSW, 2109, Australia
- Michael R Gillings
- School of Natural Sciences, Macquarie University, NSW, 2109, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, NSW, 2109, Australia
- Timothy M Vogel
- Université de Lyon, Université Claude Bernard Lyon 1, UMR CNRS 5557, UMR INRAe 1418, VetAgro Sup, Ecologie Microbienne, F-69622, Villeurbanne, France
13
Kariuki EG, Kibet C, Paredes JC, Mboowa G, Mwaura O, Njogu J, Masiga D, Bugg TDH, Tanga CM. Metatranscriptomic analysis of the gut microbiome of black soldier fly larvae reared on lignocellulose-rich fiber diets unveils key lignocellulolytic enzymes. Front Microbiol 2023; 14:1120224. [PMID: 37180276 PMCID: PMC10171111 DOI: 10.3389/fmicb.2023.1120224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/09/2022] [Accepted: 04/03/2023] [Indexed: 05/16/2023] Open
Abstract
Recently, the black soldier fly larvae (BSFL) gut microbiome has received increased attention, primarily due to its role in waste bioconversion. However, there is a lack of information on the positive effect on the activities of the gut microbiomes and enzymes (CAZyme families) acting on lignocellulose. In this study, BSFL were subjected to lignocellulose-rich diets: chicken feed (CF), chicken manure (CM), brewers' spent grain (BSG), and water hyacinth (WH). The mRNA libraries were prepared, and RNA sequencing was performed using the PCR-cDNA approach on the MinION sequencing platform. Our results demonstrated that BSFL reared on BSG and WH had the highest abundance of Bacteroides and Dysgonomonas. The GH51 and GH43_16 enzyme families, with both α-L-arabinofuranosidases and exo-alpha-L-arabinofuranosidase 2, were common in the gut of BSFL reared on the highly lignocellulosic WH and BSG diets. Gene clusters that encode hemicellulolytic arabinofuranosidases in the CAZy family GH51 were also identified. These findings provide novel insight into the shift of gut microbiomes and the potential role of BSFL in the bioconversion of various highly lignocellulosic diets to fermentable sugars for subsequent value-added products (bioethanol). Further research on the role of these enzymes to improve existing technologies and their biotechnological applications is crucial.
Affiliation(s)
- Eric G. Kariuki
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
- Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
- Caleb Kibet
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
- Juan C. Paredes
- Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
- Gerald Mboowa
- Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
- Oscar Mwaura
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
- John Njogu
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
- Daniel Masiga
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
- Timothy D. H. Bugg
- Department of Chemistry, School of Life Sciences, University of Warwick, Coventry, United Kingdom
- Chrysantus M. Tanga
- International Centre of Insect Physiology and Ecology (icipe), Nairobi, Kenya
14
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become ever more important. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
15
Qi Q, Ghaly TM, Penesyan A, Rajabal V, Stacey JA, Tetu SG, Gillings MR. Uncovering Bacterial Hosts of Class 1 Integrons in an Urban Coastal Aquatic Environment with a Single-Cell Fusion-Polymerase Chain Reaction Technology. Environ Sci Technol 2023; 57:4870-4879. [PMID: 36912846 DOI: 10.1021/acs.est.2c09739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Horizontal gene transfer (HGT) is a key driver of bacterial evolution via transmission of genetic materials across taxa. Class 1 integrons are genetic elements that correlate strongly with anthropogenic pollution and contribute to the spread of antimicrobial resistance (AMR) genes via HGT. Despite their significance to human health, there is a shortage of robust, culture-free surveillance technologies for identifying uncultivated environmental taxa that harbor class 1 integrons. We developed a modified version of epicPCR (emulsion, paired isolation, and concatenation polymerase chain reaction (PCR)) that links class 1 integrons amplified from single bacterial cells to taxonomic markers from the same cells in emulsified aqueous droplets. Using this single-cell genomic approach and Nanopore sequencing, we successfully assigned class 1 integron gene cassette arrays containing mostly AMR genes to their hosts in coastal water samples that were affected by pollution. Our work presents the first application of epicPCR for targeting variable, multigene loci of interest. We also identified the genus Rhizobacter as a novel host of class 1 integrons. These findings establish epicPCR as a powerful tool for linking taxa to class 1 integrons in environmental bacterial communities and offer the potential to direct mitigation efforts toward hotspots of class 1 integron-mediated dissemination of AMR.
Affiliation(s)
- Qin Qi
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- Timothy M Ghaly
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- Anahit Penesyan
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- Vaheesan Rajabal
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- Jeremy Ac Stacey
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- Sasha G Tetu
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- Michael R Gillings
- School of Natural Sciences, Macquarie University, 14 Eastern Road, Sydney, NSW 2109, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
16
Tomaszkiewicz M, Sahlin K, Medvedev P, Makova KD. Transcript Isoform Diversity of Ampliconic Genes on the Y Chromosome of Great Apes. bioRxiv 2023:2023.03.02.530874. [PMID: 36993458 PMCID: PMC10054944 DOI: 10.1101/2023.03.02.530874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Y-chromosomal Ampliconic Genes (YAGs) are important for male fertility, as they encode proteins functioning in spermatogenesis. The variation in copy number and expression levels of these multicopy gene families has recently been studied in great apes; however, the diversity of splicing variants remains unexplored. Here we deciphered the sequences of polyadenylated transcripts of all nine YAG families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) from testis samples of six great ape species (human, chimpanzee, bonobo, gorilla, Bornean orangutan, and Sumatran orangutan). To achieve this, we enriched YAG transcripts with capture-probe hybridization and sequenced them with long (Pacific Biosciences) reads. Our analysis of this dataset resulted in several findings. First, we uncovered a high diversity of YAG transcripts across great apes. Second, we observed evolutionarily conserved alternative splicing patterns for most YAG families except for BPY2 and PRY. Our results suggest that BPY2 transcripts and predicted proteins in several great ape species (bonobo and the two orangutans) have independent evolutionary origins and are not homologous to human reference transcripts and proteins. In contrast, our results suggest that the PRY gene family, having the highest representation of transcripts without open reading frames, has been undergoing pseudogenization. Third, even though we have identified many species-specific protein-coding YAG transcripts, we have not detected any signatures of positive selection. Overall, our work illuminates the YAG isoform landscape and its evolutionary history, and provides a genomic resource for future functional studies focusing on infertility phenotypes in humans and critically endangered great apes.
Affiliation(s)
- Marta Tomaszkiewicz
- Department of Biomedical Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
- Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Kateryna D Makova
- Center for Medical Genomics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
17
Sahlin K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 2022; 23:260. [PMID: 36522758 PMCID: PMC9753264 DOI: 10.1186/s13059-022-02831-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/24/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022] Open
Abstract
Read alignment is often the computational bottleneck in analyses. Recently, several advances have been made on seeding methods for fast sequence comparison. We combine two such methods, syncmers and strobemers, in a novel seeding approach for constructing dynamic-sized fuzzy seeds and implement the method in a short-read aligner, strobealign. The seeding is fast to construct and effectively reduces repetitiveness in the seeding step, as shown using a novel metric E-hits. strobealign is several times faster than traditional aligners at similar and sometimes higher accuracy while being both faster and more accurate than more recently proposed aligners for short reads of lengths 150 nt and longer. Availability: https://github.com/ksahlin/strobealign.
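The abstract names syncmers as one ingredient of strobealign's seeds. As a minimal sketch only, the Python function below selects open syncmers from a DNA string: a k-mer is kept when its smallest s-mer starts at a fixed offset `t`. The function name and the parameters `k`, `s`, `t` are illustrative assumptions; s-mers are ranked lexicographically as a stand-in for the hash ordering real tools use, and neither strobealign's actual parameter choices nor its strobemer linking step are reproduced here.

```python
def open_syncmers(seq: str, k: int, s: int, t: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs for open syncmers in seq.

    Hypothetical helper: a k-mer is selected if its smallest s-mer starts at
    offset t (0-based) inside the k-mer. s-mers are compared lexicographically
    here, a simplification; practical tools rank s-mers by a hash value.
    """
    if not (0 < s <= k and 0 <= t <= k - s):
        raise ValueError("require 0 < s <= k and 0 <= t <= k - s")
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the first occurrence, which breaks ties to the left
        if smers.index(min(smers)) == t:
            selected.append((i, kmer))
    return selected


print(open_syncmers("ACGTTGCA", k=4, s=2, t=0))
```

Because the test looks only inside each k-mer, whether a k-mer is selected does not depend on the surrounding sequence; this context independence is what distinguishes syncmers from window minimizers.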
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden.
18
Blassel L, Medvedev P, Chikhi R. Mapping-friendly sequence reductions: Going beyond homopolymer compression. iScience 2022; 25:105305. [PMID: 36339268 PMCID: PMC9633736 DOI: 10.1016/j.isci.2022.105305] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/05/2022] [Revised: 08/17/2022] [Accepted: 10/03/2022] [Indexed: 11/09/2022] Open
Abstract
Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for ameliorating the detrimental effects of homopolymer expansion/contraction errors present in long reads is homopolymer compression. It collapses runs of repeated nucleotides, to remove some sequencing errors and improve mapping sensitivity. Though our intuitive understanding justifies why homopolymer compression works, it in no way implies that it is the best transformation that can be done. In this paper, we explore if there are transformations that can be applied in the same pre-processing manner as homopolymer compression that would achieve better alignment sensitivity. We introduce a more general framework than homopolymer compression, called mapping-friendly sequence reductions. We transform the reference and the reads using these reductions and then apply an alignment algorithm. We demonstrate that some mapping-friendly sequence reductions lead to improved mapping accuracy, outperforming homopolymer compression.
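Homopolymer compression, the baseline this paper generalizes, fits in a few lines. The sketch below (Python, with a hypothetical function name) collapses every run of identical nucleotides to a single base; it does not implement the paper's broader family of mapping-friendly sequence reductions.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical characters (e.g. 'AAA') into one character."""
    # groupby yields one (key, run) pair per maximal run of equal consecutive items
    return "".join(base for base, _run in groupby(seq))


# Reads that differ only in homopolymer run lengths become identical after compression.
for read in ("AAACGGT", "AACGGGT"):
    print(read, "->", homopolymer_compress(read))
```

Both example reads compress to `ACGT`, which is how the transform absorbs homopolymer expansion/contraction errors. As the abstract describes, the same reduction is applied to both the reference and the reads before alignment.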
Affiliation(s)
- Luc Blassel
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, Collège doctoral, Paris F-75005, France
- Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
19
Vass M, Eriksson K, Carlsson-Graner U, Wikner J, Andersson A. Co-occurrences enhance our understanding of aquatic fungal metacommunity assembly and reveal potential host-parasite interactions. FEMS Microbiol Ecol 2022; 98:fiac120. [PMID: 36202390 PMCID: PMC9621394 DOI: 10.1093/femsec/fiac120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/16/2022] [Revised: 08/30/2022] [Accepted: 10/03/2022] [Indexed: 01/21/2023] Open
Abstract
Our knowledge of aquatic fungal communities, their assembly, distributions and ecological roles in marine ecosystems is scarce. Hence, we aimed to investigate fungal metacommunities of coastal habitats in a subarctic zone (northern Baltic Sea, Sweden). Using a novel joint species distribution model and network approach, we quantified the importance of biotic associations contributing to the assembly of mycoplankton and detected potential biotic interactions between fungus-alga pairs. Our long-read metabarcoding approach identified 493 fungal taxa, of which a dominant fraction (44.4%) was assigned to early-diverging fungi (i.e. Cryptomycota and Chytridiomycota). Alpha diversity of mycoplankton declined and community compositions changed along inlet-bay-offshore transects. The distributions of most fungi were influenced more by environmental factors than by spatial drivers, and the influence of biotic associations was pronounced when environmental filtering was weak. We found a large number of co-occurrences (120) among the dominant fungal groups, and the 25 associations between fungal and algal OTUs suggested potential host-parasite and/or saprotroph links, supporting a Cryptomycota-based mycoloop pathway. We emphasize that the contribution of biotic associations to mycoplankton assembly is important to consider in future studies, as it helps to improve predictions of species distributions in aquatic ecosystems.
Affiliation(s)
- Máté Vass
- Department of Ecology and Environmental Science, Umeå University, SE-901 87, Umeå, Sweden
- Karolina Eriksson
- Department of Ecology and Environmental Science, Umeå University, SE-901 87, Umeå, Sweden
- Ulla Carlsson-Graner
- Department of Ecology and Environmental Science, Umeå University, SE-901 87, Umeå, Sweden
- Johan Wikner
- Department of Ecology and Environmental Science, Umeå University, SE-901 87, Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, SE-905 71, Hörnefors, Sweden
- Agneta Andersson
- Department of Ecology and Environmental Science, Umeå University, SE-901 87, Umeå, Sweden
- Umeå Marine Sciences Centre, Umeå University, SE-905 71, Hörnefors, Sweden
20
de la Rubia I, Srivastava A, Xue W, Indi JA, Carbonell-Sala S, Lagarde J, Albà MM, Eyras E. RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing. Genome Biol 2022; 23:153. [PMID: 35804393 PMCID: PMC9264490 DOI: 10.1186/s13059-022-02715-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/05/2021] [Accepted: 06/20/2022] [Indexed: 11/04/2022] Open
Abstract
Nanopore sequencing enables the efficient and unbiased measurement of transcriptomes. Current methods for transcript identification and quantification rely on mapping reads to a reference genome, which precludes the study of species with a partial or missing reference or the identification of disease-specific transcripts not readily identifiable from a reference. We present RATTLE, a tool to perform reference-free reconstruction and quantification of transcripts using only Nanopore reads. Using simulated data and experimental data from isoform spike-ins, human tissues, and cell lines, we show that RATTLE accurately determines transcript sequences and their abundances, and shows good scalability with the number of transcripts.
Affiliation(s)
- Ivan de la Rubia
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain
- Akanksha Srivastava
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
- Australian National University, Acton, Canberra, ACT, 2601, Australia
- Wenjing Xue
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
- Australian National University, Acton, Canberra, ACT, 2601, Australia
- Joel A Indi
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
- Universidade de Lisboa, Lisboa, Portugal
- Silvia Carbonell-Sala
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain
- Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
- Julien Lagarde
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain
- Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
- M Mar Albà
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
- Australian National University, Acton, Canberra, ACT, 2601, Australia
- Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
21
González-Miguéns R, Todorov M, Blandenier Q, Duckert C, Porfirio-Sousa AL, Ribeiro GM, Ramos D, Lahr DJG, Buckley D, Lara E. Deconstructing Difflugia: The tangled evolution of lobose testate amoebae shells (Amoebozoa: Arcellinida) illustrates the importance of convergent evolution in protist phylogeny. Mol Phylogenet Evol 2022; 175:107557. [PMID: 35777650 DOI: 10.1016/j.ympev.2022.107557] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/23/2022] [Revised: 05/25/2022] [Accepted: 05/31/2022] [Indexed: 10/17/2022]
Abstract
Protists, the micro-eukaryotes that are neither plants, animals, nor fungi, make up the greatest part of eukaryotic diversity on Earth. Yet their evolutionary histories and patterns are still mostly ignored, and their complexity overlooked. Protists are often assumed to keep stable morphologies for long periods of time (morphological stasis). In this work, we test this paradigm using Arcellinida testate amoebae as a model. We build a taxon-rich phylogeny based on two mitochondrial genes (COI and NADH) and one nuclear gene (SSU), and reconstruct morphological evolution among clades. In addition, we prove the existence of mitochondrial mRNA editing for the COI gene. The trees show a lack of conservatism of shell outlines within the main clades, as well as a widespread occurrence of morphological convergence between distantly related taxa. Our results therefore refute widespread morphological stasis, which may be an artefact resulting from low taxon coverage. As a corollary, we also revise the group's systematics, notably by emending the large and highly polyphyletic genus Difflugia. Among other outcomes, these results lead to the erection of a new infraorder, Cylindrothecina, as well as two new genera, Cylindrifflugia and Golemanskia.
Affiliation(s)
- Milcho Todorov
- Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Science, 1113 Sofia, Bulgaria
- Quentin Blandenier
- Laboratory of Soil Biodiversity, University of Neuchâtel, Emile-Argand 11, 2000 Neuchâtel, Switzerland
- Clément Duckert
- Laboratory of Soil Biodiversity, University of Neuchâtel, Emile-Argand 11, 2000 Neuchâtel, Switzerland
- Giulia M Ribeiro
- Department of Zoology, Institute of Biosciences, University of São Paulo, Brazil
- Diana Ramos
- Real Jardín Botánico (RJB-CSIC), Plaza Murillo 2, 28014 Madrid, Spain
- Daniel J G Lahr
- Department of Zoology, Institute of Biosciences, University of São Paulo, Brazil
- David Buckley
- Department of Biology (Genetics), Universidad Autónoma de Madrid, Spain
- Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM), Universidad Autónoma de Madrid, Spain
- Enrique Lara
- Real Jardín Botánico (RJB-CSIC), Plaza Murillo 2, 28014 Madrid, Spain
22
Belbasi M, Blanca A, Harris RS, Koslicki D, Medvedev P. The minimizer Jaccard estimator is biased and inconsistent. Bioinformatics 2022; 38:i169-i176. [PMID: 35758786 PMCID: PMC9235516 DOI: 10.1093/bioinformatics/btac244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022] Open
Abstract
Motivation Sketching is now widely used in bioinformatics to reduce data size and increase data processing speed. Sketching approaches entice with improved scalability but also carry the danger of decreased accuracy and added bias. In this article, we investigate the minimizer sketch and its use to estimate the Jaccard similarity between two sequences. Results We show that the minimizer Jaccard estimator is biased and inconsistent, which means that the expected difference (i.e. the bias) between the estimator and the true value is not zero, even in the limit as the lengths of the sequences grow. We derive an analytical formula for the bias as a function of how the shared k-mers are laid out along the sequences. We show both theoretically and empirically that there are families of sequences where the bias can be substantial (e.g. the true Jaccard can be more than double the estimate). Finally, we demonstrate that this bias affects the accuracy of the widely used mashmap read mapping tool. Availability and implementation Scripts to reproduce our experiments are available at https://github.com/medvedevgroup/minimizer-jaccard-estimator/tree/main/reproduce. Supplementary information Supplementary data are available at Bioinformatics online.
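To make the estimator concrete, here is a minimal sketch assuming that the minimizer Jaccard estimator is the Jaccard index of the two sequences' sets of window minimizers, compared against the exact Jaccard index of their k-mer sets. k-mers are ordered lexicographically as a simplifying assumption (practical tools order by a hash), and all function names are hypothetical. A single toy pair only shows that the estimate can differ from the true value; the bias analysed in the paper is a statement about the estimator's expected value, which one example does not demonstrate.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All k-mers of seq in positional order (duplicates kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def window_minimizers(seq: str, k: int, w: int) -> set[str]:
    """Smallest k-mer (lexicographic order, an assumption) in every window of w consecutive k-mers."""
    km = kmers(seq, k)
    return {min(km[i:i + w]) for i in range(len(km) - w + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_jaccard(seq_a: str, seq_b: str, k: int, w: int) -> tuple[float, float]:
    """Return (exact k-mer Jaccard, minimizer-based Jaccard estimate)."""
    exact = jaccard(set(kmers(seq_a, k)), set(kmers(seq_b, k)))
    estimate = jaccard(window_minimizers(seq_a, k, w), window_minimizers(seq_b, k, w))
    return exact, estimate


exact, estimate = compare_jaccard("ACGTACGT", "ACGTTCGT", k=3, w=2)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

For this toy pair with k = 3 and w = 2, the exact k-mer Jaccard is 2/7 (about 0.29), while the minimizer-based estimate is 2/5 = 0.4.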
Affiliation(s)
- Mahdi Belbasi
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Antonio Blanca
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Robert S Harris
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- David Koslicki
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
23
Egeter B, Veríssimo J, Lopes-Lima M, Chaves C, Pinto J, Riccardi N, Beja P, Fonseca NA. Speeding up the detection of invasive bivalve species using environmental DNA: a Nanopore and Illumina sequencing comparison. Mol Ecol Resour 2022; 22:2232-2247. [PMID: 35305077 DOI: 10.1111/1755-0998.13610] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/18/2020] [Revised: 02/09/2022] [Accepted: 03/02/2022] [Indexed: 11/30/2022]
Abstract
Traditional detection of aquatic invasive species via morphological identification is often time-consuming and can require a high level of taxonomic expertise, leading to delayed mitigation responses. Environmental DNA (eDNA) detection approaches of multiple species using Illumina-based sequencing technology have been used to overcome these hindrances, but sample processing is often lengthy. More recently, portable nanopore sequencing technology has become available, which has the potential to make molecular detection of invasive species more widely accessible and substantially decrease sample turnaround times. However, nanopore-sequenced reads have a much higher error rate than those produced by Illumina platforms, which has so far hindered the adoption of this technology. We provide a detailed laboratory protocol and bioinformatic tools (msi package) to increase the reliability of nanopore sequencing to detect invasive species, and we test its application using invasive bivalves while comparing it with Illumina-based sequencing. We sampled water from sites with pre-existing bivalve occurrence and abundance data, and contrasting bivalve communities, in Italy and Portugal. Samples were extracted, amplified, and sequenced by the two platforms. The mean agreement between sequencing methods was 69% and the difference between methods was non-significant. The lack of detections of some species at some sites could be explained by their known low abundances. This is the first reported use of MinION to detect aquatic invasive species from eDNA samples.
Affiliation(s)
- Bastian Egeter
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,NatureMetrics, Bakeham Lane, Egham, Surrey, TW20 9TY, U.K
| | - Joana Veríssimo
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Manuel Lopes-Lima
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal.,IUCN SSC Mollusc Specialist Group, c/o 219 Huntingdon Road, Cambridge, CB3 0DL, U.K
| | - Cátia Chaves
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal
| | - Joana Pinto
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal
| | | | - Pedro Beja
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal.,CIBIO/InBIO, Instituto Superior de Agronomia, Universidade de Lisboa, Lisboa, Portugal
- Nuno A Fonseca
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal.,BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661, Vairão, Portugal
24
Ghaly TM, Penesyan A, Pritchard A, Qi Q, Rajabal V, Tetu SG, Gillings MR. Methods for the targeted sequencing and analysis of integrons and their gene cassettes from complex microbial communities. Microb Genom 2022; 8. [PMID: 35298369 PMCID: PMC9176274 DOI: 10.1099/mgen.0.000788] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/26/2023]
Abstract
Integrons are microbial genetic elements that can integrate mobile gene cassettes. They are mostly known for spreading antibiotic resistance cassettes among human pathogens. However, beyond clinical settings, gene cassettes encode an extraordinarily diverse range of functions important for bacterial adaptation. The recovery and sequencing of cassettes has promising applications, including surveillance of clinically important genes, particularly antibiotic resistance determinants; investigating the functional diversity of integron-carrying bacteria; and novel enzyme discovery. Although gene cassettes can be directly recovered using PCR, there are no standardised methods for their amplification and, importantly, for validating sequences as genuine integron gene cassettes. Here, we present reproducible methods for the amplification, sequence processing, and validation of gene cassette amplicons from complex communities. We describe two different PCR assays that amplify either cassettes together with integron integrases or gene cassettes together within cassette arrays. We compare the performance of Nanopore and Illumina sequencing, and present bioinformatic pipelines that filter sequences to ensure that they represent amplicons from genuine integrons. Using a diverse set of environmental DNAs, we show that our approach can consistently recover thousands of unique cassettes per sample and up to hundreds of different integron integrases. Recovered cassettes confer a wide range of functions, including antibiotic resistance, with as many as 300 resistance cassettes found in a single sample. In particular, we show that class 1 integrons are collecting and concentrating resistance genes out of the broader diversity of cassette functions. The methods described here can be applied to any environmental or clinical microbiome sample.
Affiliation(s)
- Timothy M Ghaly
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia
- Anahit Penesyan
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia.,ARC Centre of Excellence in Synthetic Biology, Macquarie University, New South Wales 2109, Australia
- Alexander Pritchard
- Division of Food Sciences, University of Nottingham, Loughborough LE12 5RD, United Kingdom
- Qin Qi
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia
- Vaheesan Rajabal
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia.,ARC Centre of Excellence in Synthetic Biology, Macquarie University, New South Wales 2109, Australia
- Sasha G Tetu
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia.,ARC Centre of Excellence in Synthetic Biology, Macquarie University, New South Wales 2109, Australia
- Michael R Gillings
- School of Natural Sciences, Macquarie University, New South Wales 2109, Australia.,ARC Centre of Excellence in Synthetic Biology, Macquarie University, New South Wales 2109, Australia
25
Vierstraete AR, Braeckman BP. Amplicon_sorter: A tool for reference‐free amplicon sorting based on sequence similarity and for building consensus sequences. Ecol Evol 2022; 12:e8603. [PMID: 35261737 PMCID: PMC8888255 DOI: 10.1002/ece3.8603] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/16/2021] [Revised: 01/12/2022] [Accepted: 01/18/2022] [Indexed: 11/23/2022]
Abstract
Oxford Nanopore Technologies (ONT) is a third‐generation sequencing technology that is gaining popularity in ecological research for its portable and low‐cost sequencing possibilities. Although the technology excels at long‐read sequencing, it can also be applied to sequence amplicons. The downside of ONT is the low quality of the raw reads. Hence, generating a high‐quality consensus sequence is still a challenge. We present Amplicon_sorter, a tool for reference‐free sorting of ONT sequenced amplicons based on their similarity in sequence and length and for building solid consensus sequences.
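The abstract does not describe Amplicon_sorter's internal algorithm. As a rough illustration of reference-free grouping of amplicon reads by length and sequence similarity, the Python sketch below greedily assigns each read to the first group whose longest read has a similar length and a k-mer Jaccard similarity above a cutoff. The k-mer Jaccard measure, the default thresholds (k = 8, min_sim = 0.5, 10% length tolerance) and the greedy strategy are illustrative assumptions, not the tool's implementation; consensus building is not shown.

```python
from typing import List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_sort(reads: List[str], k: int = 8, min_sim: float = 0.5,
                max_len_diff: float = 0.1) -> List[List[str]]:
    """Greedily group reads (hypothetical helper, not Amplicon_sorter's API).

    Reads are visited longest first. A read joins the first group whose
    representative (the group's first, i.e. longest, read) is within
    max_len_diff relative length and shares at least min_sim Jaccard
    similarity of k-mers; otherwise it founds a new group.
    """
    groups: List[Tuple[Set[str], int, List[str]]] = []
    for read in sorted(reads, key=len, reverse=True):
        ks = kmer_set(read, k)
        for rep_ks, rep_len, members in groups:
            if (abs(len(read) - rep_len) <= max_len_diff * rep_len
                    and jaccard(ks, rep_ks) >= min_sim):
                members.append(read)
                break
        else:
            groups.append((ks, len(read), [read]))
    return [members for _, _, members in groups]
```

Because a single sequencing error changes at most k k-mers, noisy copies of the same amplicon keep a high Jaccard similarity, while the length check separates amplicons of clearly different sizes cheaply.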
Affiliation(s)
- Andy R. Vierstraete
- Laboratory of Aging Physiology and Molecular Evolution, University of Gent, Gent, Belgium
- Bart P. Braeckman
- Laboratory of Aging Physiology and Molecular Evolution, University of Gent, Gent, Belgium
26
Wilburn DB, Kunkel CL, Feldhoff RC, Feldhoff PW, Searle BC. Recurrent Co-Option and Recombination of Cytokine and Three Finger Proteins in Multiple Reproductive Tissues Throughout Salamander Evolution. Front Cell Dev Biol 2022; 10:828947. [PMID: 35281090 PMCID: PMC8904931 DOI: 10.3389/fcell.2022.828947] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2021] [Accepted: 02/01/2022] [Indexed: 11/13/2022]
Abstract
Reproductive proteins evolve at unparalleled rates, resulting in tremendous diversity of both molecular composition and biochemical function between gametes of different taxonomic clades. To date, the proteomic composition of amphibian gametes is largely a molecular mystery, particularly for Urodeles (salamanders and newts), for which few genomic-scale resources exist. In this study, we provide the first detailed molecular characterization of gametes from two salamander species (Plethodon shermani and Desmognathus ocoee) that are models of reproductive behavior. Long-read PacBio transcriptome sequencing of testis and ovary of both species revealed sex-specific expression of many genes common to vertebrate gametes, including an expression profile similar to that of the egg coat genes of Xenopus oocytes. In contrast to the broad conservation of oocyte genes, major testis transcripts included paralogs of salamander-specific courtship pheromones (PRF, PMF, and SPF) that were confirmed as major sperm proteins by mass spectrometry proteomics. Sperm-specific paralogs of PMF and SPF are likely the most abundant secreted proteins in P. shermani and D. ocoee, respectively. In contrast, sperm PRF lacks a signal peptide and may be expressed in the cytoplasm. PRF pheromone genes evolved independently multiple times by repeated gene duplication of sperm PRF genes, with signal peptides recovered through recombination with PMF genes. Phylogenetic analysis of courtship pheromones and their sperm paralogs supports the conclusion that each protein family evolved for these two reproductive contexts at distinct evolutionary time points between 17 and 360 million years ago. Our combined phylogenetic, transcriptomic and proteomic analyses of plethodontid reproductive tissues support the conclusion that the recurrent co-option and recombination of three-finger proteins (TFPs) and cytokine-like proteins have been a novel driving force throughout salamander evolution and reproduction.
Affiliation(s)
- Damien B. Wilburn
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- *Correspondence: Damien B. Wilburn,
- Christy L. Kunkel
- Department of Biology, John Carroll University, Cleveland Heights, OH, United States
- Richard C. Feldhoff
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, United States
- Pamela W. Feldhoff
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, United States
- Brian C. Searle
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
27
Allemand E, Ango F. Analysis of Splicing Regulation by Third-Generation Sequencing. Methods Mol Biol 2022; 2537:81-95. [PMID: 35895260 DOI: 10.1007/978-1-0716-2521-7_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In Metazoa, the diversity of transcripts produced by RNA polymerase II is generated essentially through post-transcriptional processing of the nascent transcripts. The regulation of exon inclusion by alternative splicing is one of the main sources of this diversity, which leads to the expansion of the proteome. The portfolio of alternative transcripts remains largely underestimated. Improvements in sequencing technologies have enhanced the characterization of RNA isoforms and have continually expanded the known diversity of gene expression. Here, we describe a high-throughput approach to assess in depth the splicing regulation of target gene(s) using third-generation sequencing (TGS) technologies.
Affiliation(s)
- Eric Allemand
- Laboratory of cellular and molecular mechanisms of hematological disorders and therapeutic implications, Institut IMAGINE, INSERM, Paris, France.
- Fabrice Ango
- INM, University of Montpellier, INSERM, Montpellier, France
28
Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 614] [Impact Index Per Article: 153.5] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
Affiliation(s)
- Yunhao Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yue Zhao
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
- Audrey Bollas
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yuru Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
- Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA.
29
Sahlin K. Effective sequence similarity detection with strobemers. Genome Res 2021; 31:2080-2094. [PMID: 34667119 PMCID: PMC8559714 DOI: 10.1101/gr.275648.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/13/2021] [Accepted: 08/20/2021] [Indexed: 01/08/2023]
Abstract
k-mer-based methods are widely used in bioinformatics for various types of sequence comparisons. However, a single mutation alters up to k consecutive k-mers, which makes most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, for example, spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches, and only in a second step is a pairing or grouping of k-mers performed. Such techniques produce many redundant k-mer matches owing to the small size of k. Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of two or more linked shorter k-mers, where the combination of linked k-mers is decided by a hash function. We use simulated data to show that strobemers provide more evenly distributed sequence matches and are less sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers also produce higher match coverage across sequences. We further implement a proof-of-concept sequence-matching tool, StrobeMap, and use synthetic and biological Oxford Nanopore sequencing data to show the utility of strobemers for sequence comparison in different contexts, such as sequence clustering and alignment.
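The two ideas in this abstract can be illustrated in a few lines of Python: one substitution changes every k-mer that overlaps it, and an order-2 "randstrobe" links a k-mer to a second k-mer picked from a downstream window by a hash-based rule. This is a sketch under stated assumptions, not StrobeMap: the selection rule (minimising (h(s1) + h(s2)) mod q), the window bounds w_min/w_max, the BLAKE2b-based hash and hashing the concatenated strobes are all choices made here for illustration.

```python
import hashlib
from typing import List, Tuple


def kmer_hash(s: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def kmers(seq: str, k: int) -> List[str]:
    """All k-mers of seq, in positional order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def randstrobes_order2(seq: str, k: int, w_min: int, w_max: int,
                       q: int = (1 << 31) - 1) -> List[Tuple[int, int, int]]:
    """Return (start of strobe 1, start of strobe 2, strobemer hash) tuples.

    Strobe 1 is the k-mer at position i; strobe 2 is the k-mer starting in
    [i + w_min, i + w_max] that minimises (h(strobe1) + h(strobe2)) mod q.
    Assumes 0 < w_min <= w_max.
    """
    hashes = [kmer_hash(m) for m in kmers(seq, k)]
    n = len(hashes)
    out = []
    for i in range(n):
        lo, hi = i + w_min, min(i + w_max, n - 1)
        if lo > hi:
            break  # the downstream window no longer fits inside seq
        j = min(range(lo, hi + 1), key=lambda p: (hashes[i] + hashes[p]) % q)
        out.append((i, j, kmer_hash(seq[i:i + k] + seq[j:j + k])))
    return out


def changed_kmers(a: str, b: str, k: int) -> int:
    """Count positions where equal-length sequences a and b have different k-mers."""
    return sum(x != y for x, y in zip(kmers(a, k), kmers(b, k)))


if __name__ == "__main__":
    seq = "ACGTTGCATGCATCGATCGTAGCTAGCTAGGATCCATGCAAGTCGATTGCA"
    mut = seq[:20] + "C" + seq[21:]  # seq[20] is "A"; substitute it with "C"
    print(changed_kmers(seq, mut, 5))  # 5: every 5-mer overlapping position 20 changes
    shared = ({h for _, _, h in randstrobes_order2(seq, 5, 3, 8)}
              & {h for _, _, h in randstrobes_order2(mut, 5, 3, 8)})
    print(len(shared))  # randstrobes still shared after the substitution
```

Because the second strobe can sit anywhere in the window, a small indel between the two strobes can leave the pair unchanged, which is the intuition behind the reduced sensitivity claimed above.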
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 10691 Stockholm, Sweden
30
Langat SK, Eyase F, Bulimo W, Lutomiah J, Oyola SO, Imbuga M, Sang R. Profiling of RNA Viruses in Biting Midges (Ceratopogonidae) and Related Diptera from Kenya Using Metagenomics and Metabarcoding Analysis. mSphere 2021; 6:e0055121. [PMID: 34643419 PMCID: PMC8513680 DOI: 10.1128/msphere.00551-21] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/15/2021] [Accepted: 09/15/2021] [Indexed: 11/28/2022]
Abstract
Vector-borne diseases (VBDs) cause an enormous health burden worldwide, accounting for more than 17% of all infectious diseases and over 700,000 deaths each year. A significant number of these VBDs are caused by RNA virus pathogens. Here, we used metagenomics and metabarcoding analysis to characterize RNA viruses and their insect hosts among biting midges from Kenya. We identified a total of 15 phylogenetically distinct insect-specific viruses. These viruses fall into six families, with one virus falling in the recently proposed negevirus taxon. The six virus families are Partitiviridae, Iflaviridae, Tombusviridae, Solemoviridae, Totiviridae, and Chuviridae. In addition, we identified many insect species that were possibly associated with the identified viruses. Ceratopogonidae was the most common family of midges identified; others included Chironomidae and Cecidomyiidae. Our findings reveal a diverse RNA virome among Kenyan midges that includes previously unknown viruses. Further, metabarcoding analysis based on COI (cytochrome c oxidase subunit 1 mitochondrial gene) barcodes reveals a diverse array of midge species among the insects used in the study. Successful application of metagenomics and metabarcoding methods to characterize RNA viruses and their insect hosts in this study highlights the possible simultaneous application of these two methods as cost-effective approaches to virus surveillance and host characterization. IMPORTANCE The majority of the viruses that currently cause diseases in humans and animals are RNA viruses, and more specifically arthropod-transmitted viruses. They cause diseases such as dengue, West Nile infection, bluetongue disease, Schmallenberg disease, and yellow fever, among others. Several sequencing studies have shown that a diverse array of RNA viruses in insect vectors remains unknown. Some of these could be ancient lineages that could aid comprehensive studies of RNA virus evolution. Such studies may provide insights into the evolution of currently pathogenic viruses. Here, we applied metagenomics to field-collected midges and characterized several RNA viruses, recovering complete and nearly complete genomes of these viruses. We also characterized the insect host species that are associated with these viruses. These results add to the currently known diversity of RNA viruses among biting midges as well as their associated insect hosts.
Affiliation(s)
- Solomon K. Langat
- Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Centre for Virus Research, Kenya Medical Research Institute, Nairobi, Kenya
- Fredrick Eyase
- Institute of Biotechnology Research, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Department of Emerging Infectious Diseases, United States Army Medical Research Directorate—Africa, Nairobi, Kenya
- Wallace Bulimo
- Centre for Virus Research, Kenya Medical Research Institute, Nairobi, Kenya
- Department of Biochemistry, University of Nairobi, Nairobi, Kenya
- Joel Lutomiah
- Centre for Virus Research, Kenya Medical Research Institute, Nairobi, Kenya
- Mabel Imbuga
- Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Rosemary Sang
- Centre for Virus Research, Kenya Medical Research Institute, Nairobi, Kenya
31
Srivathsan A, Lee L, Katoh K, Hartop E, Kutty SN, Wong J, Yeo D, Meier R. ONTbarcoder and MinION barcodes aid biodiversity discovery and identification by everyone, for everyone. BMC Biol 2021; 19:217. [PMID: 34587965 PMCID: PMC8479912 DOI: 10.1186/s12915-021-01141-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 05/11/2021] [Accepted: 09/03/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. Here we present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. RESULTS We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations based on several runs of the latest generation of MinION flow cells ("R10.3"), which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present novel software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ, demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. CONCLUSIONS We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital costs with scalability. Small projects can use the flow cell dongle ("Flongle"), while large projects can rely on MinION flow cells that can be stopped and re-used after collecting sufficient data for a given project.
Affiliation(s)
- Amrita Srivathsan
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Leshon Lee
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Kazutaka Katoh
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Artificial Intelligence Research Center, AIST, Tokyo, Japan
- Emily Hartop
- Zoology Department, Stockholms Universitet, Stockholm, Sweden
- Station Linné, Öland, Sweden
- Sujatha Narayanan Kutty
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Tropical Marine Science Institute, National University of Singapore, Singapore, Singapore
- Johnathan Wong
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Darren Yeo
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Rudolf Meier
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore.
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Center for Integrative Biodiversity Discovery, Berlin, Germany.
32
Fu Y, Mahmoud M, Muraliraman VV, Sedlazeck FJ, Treangen TJ. Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment. Gigascience 2021; 10:giab063. [PMID: 34561697 PMCID: PMC8463296 DOI: 10.1093/gigascience/giab063] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/02/2021] [Revised: 07/22/2021] [Accepted: 08/29/2021] [Indexed: 01/23/2023]
Abstract
BACKGROUND Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection. FINDINGS We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan leverages the computed normalized edit distance of the mapped reads via minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments for Oxford Nanopore Technology long reads for both simulated and real datasets. These improvements, in turn, lead to improved accuracy for structural variant calling performance on human genome datasets compared to either of the read-mapping methods alone. CONCLUSIONS Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.
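The dual-mode idea described above can be sketched as a routing step: compute a normalised edit distance for each first-pass (minimap2) alignment and send only the poorly aligned reads to the slower, more accurate mapper. In this Python sketch the `Alignment` record, normalisation by aligned length, and the fixed `threshold` are hypothetical choices for illustration; Vulcan's own cutoff rule and defaults may differ, and no mapper is actually invoked.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Alignment:
    """Hypothetical minimal record for one first-pass alignment."""
    read_id: str
    edit_distance: int   # e.g. the SAM NM tag reported by the first-pass mapper
    aligned_length: int  # bases used to normalise the edit distance (assumption)


def normalized_edit_distance(aln: Alignment) -> float:
    """Edit distance per aligned base; empty alignments count as maximally poor."""
    if aln.aligned_length <= 0:
        return float("inf")
    return aln.edit_distance / aln.aligned_length


def split_for_realignment(alns: Iterable[Alignment],
                          threshold: float) -> Tuple[List[Alignment], List[Alignment]]:
    """Keep alignments at or below threshold; flag the rest for a second mapper."""
    keep: List[Alignment] = []
    realign: List[Alignment] = []
    for aln in alns:
        target = realign if normalized_edit_distance(aln) > threshold else keep
        target.append(aln)
    return keep, realign


if __name__ == "__main__":
    alns = [Alignment("r1", 12, 1000), Alignment("r2", 180, 1200), Alignment("r3", 0, 0)]
    keep, realign = split_for_realignment(alns, threshold=0.1)
    print([a.read_id for a in keep], [a.read_id for a in realign])  # ['r1'] ['r2', 'r3']
```

Only the flagged subset pays the cost of the slower mapper, which is how a two-mode pipeline can gain accuracy without realigning every read.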
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
33
Nanopore sequencing in non-human forensic genetics. Emerg Top Life Sci 2021; 5:465-473. [PMID: 34002773 PMCID: PMC8457772 DOI: 10.1042/etls20200287] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/10/2021] [Revised: 04/21/2021] [Accepted: 04/29/2021] [Indexed: 12/28/2022]
Abstract
The past decade has seen a rapid expansion of non-human forensic genetics coinciding with the development of 2nd and 3rd generation DNA sequencing technologies. Nanopore sequencing is one such technology that offers massively parallel sequencing at a fraction of the capital cost of other sequencing platforms. The application of nanopore sequencing to species identification has already been widely demonstrated in biomonitoring studies and has significant potential for non-human forensic casework, particularly in the area of wildlife forensics. This review examines nanopore sequencing technology and assesses its potential applications, advantages and drawbacks for use in non-human forensics, alongside other next-generation sequencing platforms and as a possible replacement for Sanger sequencing. We assess the specific challenges of sequence error rate and the standardisation of consensus sequence production, before discussing recent progress in the validation of nanopore sequencing for use in forensic casework. We conclude that nanopore sequencing may be able to play a considerable role in the future of non-human forensic genetics, especially for applications to wildlife law enforcement within emerging forensic laboratories.
34
Dong X, Tian L, Gouil Q, Kariyawasam H, Su S, De Paoli-Iseppi R, Prawer YDJ, Clark MB, Breslin K, Iminitoff M, Blewitt ME, Law CW, Ritchie ME. The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools. NAR Genom Bioinform 2021; 3:lqab028. [PMID: 33937765 PMCID: PMC8074342 DOI: 10.1093/nargab/lqab028] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/22/2020] [Revised: 02/26/2021] [Accepted: 03/30/2021] [Indexed: 12/12/2022]
Abstract
Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequencing error rate and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.
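limma-voom, mentioned above, begins by converting raw counts to log2 counts per million with small offsets, log2((count + 0.5) / (library size + 1) × 10^6), before estimating precision weights from the mean-variance trend. The original tools are R/Bioconductor packages; the Python sketch below reproduces only that first transformation, under the assumption that each library size is simply the column total, and does not implement the weighting or linear modelling steps.

```python
import math
from typing import List, Sequence


def log_cpm(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """voom-style log2 counts per million.

    counts[g][s] is the read count for gene g in sample s. Each library size
    is taken as the column total; the 0.5 and 1 offsets keep zero counts finite.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [math.log2((row[s] + 0.5) / (lib_sizes[s] + 1) * 1e6) for s in range(n_samples)]
        for row in counts
    ]


if __name__ == "__main__":
    # Two genes x two samples; the second sample is a very small library.
    for row in log_cpm([[0, 10], [999_999, 30]]):
        print([round(v, 2) for v in row])
```

In a library of about 10^6 reads a zero count maps to roughly -1, whereas in a 40-read library every single read shifts log-CPM substantially, which is one way to see why small nanopore library sizes reduce statistical power.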
Affiliation(s)
- Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Luyi Tian
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Hasaru Kariyawasam
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Ricardo De Paoli-Iseppi
- Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Yair David Joseph Prawer
- Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Michael B Clark
- Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Kelsey Breslin
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Megan Iminitoff
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Marnie E Blewitt
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Charity W Law
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
35
Ciuffreda L, Rodríguez-Pérez H, Flores C. Nanopore sequencing and its application to the study of microbial communities. Comput Struct Biotechnol J 2021; 19:1497-1511. [PMID: 33815688 PMCID: PMC7985215 DOI: 10.1016/j.csbj.2021.02.020] [Citation(s) in RCA: 98] [Impact Index Per Article: 24.5] [Received: 11/18/2020] [Revised: 02/24/2021] [Accepted: 02/27/2021] [Indexed: 12/14/2022]
Abstract
Since its introduction, nanopore sequencing has enhanced our ability to study complex microbial samples by making it possible to sequence long reads in real time using inexpensive and portable technologies. Long reads have made it possible to address several previously unsolved issues in the field, such as the resolution of complex genomic structures, and have facilitated access to metagenome-assembled genomes (MAGs). Furthermore, the low cost and portability of the platforms, together with the development of rapid protocols and analysis pipelines, have made nanopore technology an attractive and increasingly used tool for real-time, in-field sequencing in environmental microbial analysis. This review provides an up-to-date summary of the experimental protocols and bioinformatic tools for the study of microbial communities using nanopore sequencing, highlighting the most important and recent research in the field with a major focus on infectious diseases. An overview of the main approaches, including targeted and shotgun approaches, metatranscriptomics, epigenomics, and epitranscriptomics, is provided, together with an outlook on the major challenges and prospects for the use of this technology in microbial studies.
Affiliation(s)
- Laura Ciuffreda
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
- Héctor Rodríguez-Pérez
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
- Carlos Flores
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 Santa Cruz de Tenerife, Spain
36
Masutani B, Arimura SI, Morishita S. Investigating the mitochondrial genomic landscape of Arabidopsis thaliana by long-read sequencing. PLoS Comput Biol 2021; 17:e1008597. [PMID: 33434206 PMCID: PMC7833223 DOI: 10.1371/journal.pcbi.1008597] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2020] [Revised: 01/25/2021] [Accepted: 12/01/2020] [Indexed: 11/18/2022]
Abstract
Plant mitochondrial genomes have distinctive features compared with those of animals; namely, they are large and divergent, with sizes ranging from hundreds of thousands to a few million bases. Recombination among repetitive regions is thought to produce similar structures that differ slightly, known as "multipartite structures," which contribute to different phenotypes. Although many reference plant mitochondrial genomes represent almost all the genes in mitochondria, the full spectrum of their structures remains largely unknown. Long-read sequencing technology is expected to reveal this landscape; however, many studies have aimed to assemble only one representative circular genome, because existing assemblers cannot properly resolve multipartite structures. To elucidate multipartite structures, we leveraged the information in existing reference genomes and classified long reads according to their corresponding structures. We developed a method that exploits two classic algorithms, partial order alignment (POA) and the hidden Markov model (HMM), to construct a sensitive read classifier. This method enables us to represent a set of reads as a POA graph and analyze it using the HMM. We can then calculate the likelihood of a read occurring in a given cluster, resulting in an iterative clustering algorithm. On synthetic data, our proposed method reliably detected a single variant site in 9,000-bp synthetic long reads with a 15% sequencing-error rate and produced accurate clustering. It was also capable of clustering long reads from six very similar sequences containing only slight differences. On real data, we assembled putative multipartite structures of the mitochondrial genomes of Arabidopsis thaliana from nine accessions sequenced using PacBio Sequel. The results indicated that there are recurrent and strain-specific structures in A. thaliana mitochondrial genomes.
Collapse
Affiliation(s)
- Bansho Masutani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Shin-ichi Arimura
- Laboratory of Plant Molecular Genetics, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
37
Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 2021; 12:2. [PMID: 33397972 PMCID: PMC7782715 DOI: 10.1038/s41467-020-20340-8] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 05/27/2020] [Accepted: 11/25/2020] [Indexed: 01/24/2023]
Abstract
Oxford Nanopore (ONT) is a leading long-read technology that has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts end to end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates, which have thus far limited its scope to reference-based analyses. When a reference is not available, or is not a viable option due to reference bias, error correction is a crucial step towards the reconstruction of the sequenced transcripts and their downstream sequence analysis. In this paper, we present a novel computational method to error-correct ONT cDNA sequencing data, called isONcorrect. IsONcorrect is able to jointly use all isoforms from a gene during error correction, thereby allowing it to correct reads at low sequencing depths. We obtain a median accuracy of 98.9-99.6%, demonstrating the feasibility of applying cost-effective cDNA full-transcript-length sequencing for reference-free transcriptome analysis.
Collapse
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
- Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA
38
Oikonomopoulos S, Bayega A, Fahiminiya S, Djambazian H, Berube P, Ragoussis J. Methodologies for Transcript Profiling Using Long-Read Technologies. Front Genet 2020; 11:606. [PMID: 32733532 PMCID: PMC7358353 DOI: 10.3389/fgene.2020.00606] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 01/08/2020] [Accepted: 05/19/2020] [Indexed: 12/28/2022]
Abstract
RNA sequencing using next-generation sequencing (NGS) technologies is currently the standard approach for gene expression profiling, particularly in large-scale, high-throughput studies. NGS technologies comprise high-throughput, cost-efficient short-read RNA-Seq, while emerging single-molecule, long-read RNA-Seq technologies have enabled new approaches to studying the transcriptome and its function. Single-molecule, long-read technologies are currently commercially available from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), while new methodologies based on short-read sequencing are also being developed to provide long-range, single-molecule-level information, for example, the 10x Genomics linked-read methodology. The shift toward long-read sequencing technologies for transcriptome characterization is driven by recent increases in throughput and decreases in cost, which make these technologies attractive for de novo transcriptome assembly, isoform expression quantification, and in-depth RNA species analysis. These types of analyses were challenging with standard short-read approaches because of the complex nature of the transcriptome, which consists of transcripts of variable length and multiple alternatively spliced isoforms for most genes, as well as the high sequence similarity of highly abundant RNA species such as rRNAs. Here we focus on single-molecule sequencing technologies and single-cell technologies that, combined with perturbation tools, allow the analysis of complete RNA species, whether short or long, at high resolution. In parallel, these tools have opened new ways of understanding gene functions at the tissue, network, and pathway levels, as well as their detailed functional characterization.
Analysis of the epitranscriptome, including RNA methylation and other modifications and the effects of such modifications on biological systems, is now enabled through direct RNA sequencing instead of classical indirect approaches. However, many difficulties and challenges remain, such as methodologies for generating full-length RNA or cDNA libraries from all species of RNA, not only poly(A)-containing transcripts, and the identification of allele-specific transcripts, which is hampered by the current error rates of single-molecule technologies. Bioinformatic analysis of long-read data for accurate identification of 5' and 3' UTRs is also still in development.
Affiliation(s)
- Spyros Oikonomopoulos
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Anthony Bayega
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Somayyeh Fahiminiya
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Haig Djambazian
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Pierre Berube
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Jiannis Ragoussis
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Department of Bioengineering, McGill University, Montréal, QC, Canada
39
Jain C, Rhie A, Zhang H, Chu C, Walenz BP, Koren S, Phillippy AM. Weighted minimizer sampling improves long read mapping. Bioinformatics 2020; 36:i111-i118. [PMID: 32657365 PMCID: PMC7355284 DOI: 10.1093/bioinformatics/btaa435] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g. Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions. RESULTS We introduce a novel weighted-minimizer sampling algorithm. A unique feature of the proposed algorithm is that it performs minimizer sampling while considering a weight for each k-mer; i.e. the higher the weight of a k-mer, the more likely it is to be selected. By down-weighting frequently occurring k-mers, we are able to meet both objectives: (i) avoid excessive false-positive matches and (ii) maintain the minimizer match guarantee. We tested our algorithm, Winnowmap, using both simulated and real long-read data and compared it to a state-of-the-art long read mapper, Minimap2. Our results demonstrate a reduction in the mapping error-rate from 0.14% to 0.06% in the recently finished human X chromosome (154.3 Mbp), and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats and achieves these results with sparser sampling, leading to better index compression and competitive runtimes. AVAILABILITY AND IMPLEMENTATION Winnowmap is built on top of the Minimap2 codebase and is available at https://github.com/marbl/winnowmap.
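The minimizer guarantee and the down-weighting idea described in the Winnowmap abstract can be illustrated with a small sketch. This is not Winnowmap's implementation (which is built on the Minimap2 codebase); it assumes a simplified two-tier weighting in which each k-mer is flagged as frequent or not using a hypothetical count cutoff (`max_count`), and it uses BLAKE2b from Python's standard library as a stand-in hash. All function names are illustrative.

```python
import hashlib
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (stand-in for a mapper's own hash function)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def frequent_kmers(seq: str, k: int, max_count: int) -> frozenset:
    """K-mers occurring more than max_count times in seq (hypothetical cutoff)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return frozenset(km for km, c in counts.items() if c > max_count)


def minimizers(seq: str, k: int, w: int, frequent: frozenset = frozenset()) -> set:
    """Return (position, k-mer) minimizers over every window of w consecutive k-mers.

    The ordering key is (is_frequent, hash): a frequent k-mer is selected only when
    every k-mer in its window is frequent, so every window still yields a minimizer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    keys = [(km in frequent, kmer_hash(km)) for km in kmers]
    picked = set()
    for start in range(len(kmers) - w + 1):
        # Smallest key in this window; ties resolve to the leftmost position.
        best = min(range(start, start + w), key=lambda i: keys[i])
        picked.add((best, kmers[best]))
    return picked
```

Putting the frequency flag first in the sort key means a repetitive k-mer can win a window only when the window offers no alternative. This is how the sketch avoids selecting high-frequency minimizers without discarding any window, the trade-off the abstract contrasts with tools that simply drop frequent minimizers.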
Collapse
Affiliation(s)
- Chirag Jain
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Haowen Zhang
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Claudia Chu
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Brian P Walenz
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
40
Seah A, Lim MC, McAloose D, Prost S, Seimon TA. MinION-Based DNA Barcoding of Preserved and Non-Invasively Collected Wildlife Samples. Genes (Basel) 2020; 11:genes11040445. [PMID: 32325704 PMCID: PMC7230362 DOI: 10.3390/genes11040445] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/27/2020] [Revised: 04/14/2020] [Accepted: 04/16/2020] [Indexed: 01/14/2023]
Abstract
The ability to sequence a variety of wildlife samples with portable, field-friendly equipment will have significant impacts on wildlife conservation and health applications. However, the only currently available field-friendly DNA sequencer, the MinION by Oxford Nanopore Technologies, has a high error rate compared to standard laboratory-based sequencing platforms and has not been systematically validated for DNA barcoding accuracy for preserved and non-invasively collected tissue samples. We tested whether various wildlife sample types, field-friendly methods, and our clustering-based bioinformatics pipeline, SAIGA, can be used to generate consistent and accurate consensus sequences for species identification. Here, we systematically evaluate variation in cytochrome b sequences amplified from scat, hair, feather, fresh frozen liver, and formalin-fixed paraffin-embedded (FFPE) liver. Each sample was processed by three DNA extraction protocols. For all sample types tested, the MinION consensus sequences matched the Sanger references with 99.29%-100% sequence similarity, even for samples that were difficult to amplify, such as scat and FFPE tissue extracted with Chelex resin. Sequencing errors occurred primarily in homopolymer regions, as identified in previous MinION studies. We demonstrate that it is possible to generate accurate DNA barcode sequences from preserved and non-invasively collected wildlife samples using portable MinION sequencing, creating more opportunities to apply portable sequencing technology for species identification.
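The abstract reports agreement between MinION consensus sequences and Sanger references as percent sequence similarity. As a minimal illustration of those two quantities only, the sketch below takes a column-wise majority vote over reads that are assumed to be already aligned to equal length, then computes percent identity against an equally long, pre-aligned reference. It is not the SAIGA pipeline, which the abstract describes as clustering-based; the alignment step, the gap symbol `-`, and the function names are assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over reads pre-aligned to equal length ('-' marks a gap)."""
    if not aligned_reads or len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be non-empty and pre-aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # drop columns where a gap wins the vote
            consensus.append(base)
    return "".join(consensus)


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length, aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
```

Voting per column lets isolated read errors, such as the homopolymer indels mentioned in the abstract, be outvoted when enough reads agree; real pipelines additionally need an alignment or clustering step that this sketch leaves out.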
Collapse
Affiliation(s)
- Adeline Seah
- Zoological Health Program, Wildlife Conservation Society, Bronx Zoo, 2300 Southern Blvd, Bronx, NY 10460, USA
- Marisa C.W. Lim
- Zoological Health Program, Wildlife Conservation Society, Bronx Zoo, 2300 Southern Blvd, Bronx, NY 10460, USA
- Denise McAloose
- Zoological Health Program, Wildlife Conservation Society, Bronx Zoo, 2300 Southern Blvd, Bronx, NY 10460, USA
- Stefan Prost
- LOEWE-Centre for Translational Biodiversity Genomics, Senckenberg Nature Research Society, 60325 Frankfurt, Germany
- South African National Biodiversity Institute, National Zoological Garden, Pretoria 0001, South Africa
- Tracie A. Seimon
- Zoological Health Program, Wildlife Conservation Society, Bronx Zoo, 2300 Southern Blvd, Bronx, NY 10460, USA