1
Luebbert L, Sullivan DK, Carilli M, Hjörleifsson KE, Winnett AV, Chari T, Pachter L. Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.11.571168. [PMID: 38168363 PMCID: PMC10760059 DOI: 10.1101/2023.12.11.571168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to making accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.
Affiliation(s)
- Laura Luebbert
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Delaney K. Sullivan
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Maria Carilli
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Alexander Viloria Winnett
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
2
Terrazos Miani MA, Borcard L, Gempeler S, Baumann C, Bittel P, Leib SL, Neuenschwander S, Ramette A. NASCarD (Nanopore Adaptive Sampling with Carrier DNA): A Rapid, PCR-Free Method for SARS-CoV-2 Whole-Genome Sequencing in Clinical Samples. Pathogens 2024; 13:61. [PMID: 38251368 PMCID: PMC10818518 DOI: 10.3390/pathogens13010061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 01/04/2024] [Accepted: 01/07/2024] [Indexed: 01/23/2024]
Abstract
Whole-genome sequencing (WGS) represents the main technology for SARS-CoV-2 lineage characterization in diagnostic laboratories worldwide. The rapid, near-full-length sequencing of the viral genome is commonly enabled by high-throughput sequencing of PCR amplicons derived from cDNA molecules. Here, we present a new approach called NASCarD (Nanopore Adaptive Sampling with Carrier DNA), which allows a low amount of nucleic acids to be sequenced while selectively enriching for sequences of interest, hence limiting the production of non-target sequences. Using COVID-19 positive samples available during the omicron wave, we demonstrate how the method may lead to >99% genome completeness of the SARS-CoV-2 genome sequences within 7 h of sequencing at a competitive cost. The new approach may have applications beyond SARS-CoV-2 sequencing for other DNA or RNA pathogens in clinical samples.
Affiliation(s)
- Alban Ramette
  - Institute for Infectious Diseases, University of Bern, Friedbühlstrasse 25, 3001 Bern, Switzerland
3
Rádai Z, Váradi A, Takács P, Nagy NA, Schmitt N, Prépost E, Kardos G, Laczkó L. An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies. BMC Genomics 2024; 25:45. [PMID: 38195441 PMCID: PMC10777565 DOI: 10.1186/s12864-023-09910-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 12/15/2023] [Indexed: 01/11/2024]
Abstract
BACKGROUND: Parameters adversely affecting the contiguity and accuracy of the assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects, overlooking their potential interactions possibly exacerbating one another's effects in a multiplicative manner. To investigate whether or not they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes, with varying levels of error rate, sequencing depth, PCR and optical duplicate ratios.
RESULTS: We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality.
CONCLUSIONS: We provide a framework for consideration in future studies using de novo genome assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth, balancing between its positive effect on contiguity and negative effect on accuracy due to its interaction with error rate. Furthermore, the properties of the genomes to be sequenced also should be taken into account, as they might influence the effects of error sources themselves.
Affiliation(s)
- Zoltán Rádai
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Dermatology, University Hospital Düsseldorf, Heinrich-Heine-University, Düsseldorf, Germany
- Alex Váradi
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary
- Péter Takács
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Health Informatics, Institute of Health Sciences, Faculty of Health, University of Debrecen, Debrecen, Hungary
- Nikoletta Andrea Nagy
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Evolutionary Zoology, ELKH-DE Behavioural Ecology Research Group, University of Debrecen, Debrecen, Hungary
  - Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
- Nicholas Schmitt
  - Department of Dermatology, University Hospital Düsseldorf, Heinrich-Heine-University, Düsseldorf, Germany
- Eszter Prépost
  - Department of Health Industry, University of Debrecen, Debrecen, Hungary
- Gábor Kardos
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Gerontology, Faculty of Health Sciences, University of Debrecen, Debrecen, Hungary
- Levente Laczkó
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
  - ELKH-DE Conservation Biology Research Group, Debrecen, Hungary
4
Pentz JT, MacGillivray K, DuBose JG, Conlin PL, Reinhardt E, Libby E, Ratcliff WC. Evolutionary consequences of nascent multicellular life cycles. eLife 2023; 12:e84336. [PMID: 37889142 PMCID: PMC10611430 DOI: 10.7554/elife.84336] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/20/2022] [Accepted: 10/08/2023] [Indexed: 10/28/2023]
Abstract
A key step in the evolutionary transition to multicellularity is the origin of multicellular groups as biological individuals capable of adaptation. Comparative work, supported by theory, suggests clonal development should facilitate this transition, although this hypothesis has never been tested in a single model system. We evolved 20 replicate populations of otherwise isogenic clonally reproducing 'snowflake' yeast (Δace2/Δace2) and aggregative 'floc' yeast (GAL1p::FLO1/GAL1p::FLO1) with daily selection for rapid growth in liquid media, which favors faster cell division, followed by selection for rapid sedimentation, which favors larger multicellular groups. While both genotypes adapted to this regime, growing faster and having higher survival during the group-selection phase, there was a stark difference in evolutionary dynamics. Aggregative floc yeast obtained nearly all their increased fitness from faster growth, not improved group survival, indicating that selection acted primarily at the level of cells. In contrast, clonal snowflake yeast mainly benefited from higher group-dependent fitness, indicating a shift in the level of Darwinian individuality from cells to groups. Through genome sequencing and mathematical modeling, we show that the genetic bottlenecks in a clonal life cycle also drive much higher rates of genetic drift, a result with complex implications for this evolutionary transition. Our results highlight the central role that early multicellular life cycles play in the process of multicellular adaptation.
Affiliation(s)
- Kathryn MacGillivray
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
  - Interdisciplinary Graduate Program in Quantitative Biosciences, Georgia Institute of Technology, Atlanta, United States
- James G DuBose
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
- Peter L Conlin
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
- Emma Reinhardt
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- William C Ratcliff
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
5
Arredondo-Alonso S, Gladstone R, Pöntinen A, Gama J, Schürch A, Lanza V, Johnsen P, Samuelsen Ø, Tonkin-Hill G, Corander J. Mge-cluster: a reference-free approach for typing bacterial plasmids. NAR Genom Bioinform 2023; 5:lqad066. [PMID: 37435357 PMCID: PMC10331934 DOI: 10.1093/nargab/lqad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/08/2023] [Accepted: 06/26/2023] [Indexed: 07/13/2023]
Abstract
Extrachromosomal elements of bacterial cells such as plasmids are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only become accessible recently with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in their scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences that are compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling a consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.
Affiliation(s)
- Anna K Pöntinen
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- João A Gama
  - Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
- Anita C Schürch
  - Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Val F Lanza
  - CIBERINFEC, Madrid, Spain
  - Bioinformatics Unit, University Hospital Ramón y Cajal, IRYCIS, Madrid, Spain
- Pål Jarle Johnsen
  - Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
- Ørjan Samuelsen
  - Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
  - Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
- Gerry Tonkin-Hill
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Helsinki, Finland
6
Rigby CV, Sabsay KR, Bisht K, Eggink D, Jalal H, te Velthuis AJW. Evolution of transient RNA structure-RNA polymerase interactions in respiratory RNA virus genomes. Virus Evol 2023; 9:vead056. [PMID: 37692892 PMCID: PMC10492445 DOI: 10.1093/ve/vead056] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/25/2023] [Revised: 08/02/2023] [Accepted: 08/24/2023] [Indexed: 09/12/2023]
Abstract
RNA viruses are important human pathogens that cause seasonal epidemics and occasional pandemics. Examples are influenza A viruses (IAV) and coronaviruses (CoV). When emerging IAV and CoV spill over to humans, they adapt to evade immune responses and optimize their replication and spread in human cells. In IAV, adaptation occurs in all viral proteins, including the viral ribonucleoprotein (RNP) complex. RNPs consist of a copy of the viral RNA polymerase, a double-helical coil of nucleoprotein, and one of the eight segments of the IAV RNA genome. The RNA segments and their transcripts are partially structured to coordinate the packaging of the viral genome and modulate viral mRNA translation. In addition, RNA structures can affect the efficiency of viral RNA synthesis and the activation of the host innate immune response. Here, we investigated if RNA structures that modulate IAV replication processivity, so-called template loops (t-loops), vary during the adaptation of pandemic and emerging IAV to humans. Using cell culture-based replication assays and in silico sequence analyses, we find that the sensitivity of the IAV H3N2 RNA polymerase to t-loops increased between isolates from 1968 and 2017, whereas the total free energy of t-loops in the IAV H3N2 genome was reduced. This reduction is particularly prominent in the PB1 gene. In H1N1 IAV, we find two separate reductions in t-loop free energy, one following the 1918 pandemic and one following the 2009 pandemic. No destabilization of t-loops is observed in the influenza B virus genome, whereas analysis of SARS-CoV-2 isolates reveals destabilization of viral RNA structures. Overall, we propose that a loss of free energy in the RNA genome of emerging respiratory RNA viruses may contribute to the adaptation of these viruses to the human population.
Affiliation(s)
- Charlotte V Rigby
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
  - Department of Pathology, Addenbrooke’s Hospital, University of Cambridge, Hills Road, Cambridge CB2 2QQ, UK
  - Addenbrooke’s Hospital, Public Health England, Hills Road, Cambridge CB2 2QQ, UK
- Kimberly R Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
  - Carl Icahn Laboratory, Lewis-Sigler Institute, Princeton University, South Drive, Princeton, NJ 08544, USA
- Karishma Bisht
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
- Dirk Eggink
  - Department of Medical Microbiology, Amsterdam UMC, Meibergdreef 9, Amsterdam 1105 AZ, The Netherlands
- Hamid Jalal
  - Addenbrooke’s Hospital, Public Health England, Hills Road, Cambridge CB2 2QQ, UK
- Aartjan J W te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Antonie van Leeuwenhoeklaan 9, Bilthoven 3721 MA, the Netherlands
7
Rigby C, Sabsay K, Bisht K, Eggink D, Jalal H, te Velthuis AJ. Evolution of transient RNA structure-RNA polymerase interactions in respiratory RNA virus genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.25.542331. [PMID: 37292879 PMCID: PMC10245964 DOI: 10.1101/2023.05.25.542331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
RNA viruses are important human pathogens that cause seasonal epidemics and occasional pandemics. Examples are influenza A viruses (IAV) and coronaviruses (CoV). When emerging IAV and CoV spill over to humans, they adapt to evade immune responses and optimize their replication and spread in human cells. In IAV, adaptation occurs in all viral proteins, including the viral ribonucleoprotein (RNP) complex. RNPs consist of a copy of the viral RNA polymerase, a double-helical coil of nucleoprotein, and one of the eight segments of the IAV RNA genome. The RNA segments and their transcripts are partially structured to coordinate the packaging of the viral genome and modulate viral mRNA translation. In addition, RNA structures can affect the efficiency of viral RNA synthesis and the activation of the host innate immune response. Here, we investigated if RNA structures that modulate IAV replication processivity, so-called template loops (t-loops), vary during the adaptation of pandemic and emerging IAV to humans. Using cell culture-based replication assays and in silico sequence analyses, we find that the sensitivity of the IAV H3N2 RNA polymerase to t-loops increased between isolates from 1968 and 2017, whereas the total free energy of t-loops in the IAV H3N2 genome was reduced. This reduction is particularly prominent in the PB1 gene. In H1N1 IAV, we find two separate reductions in t-loop free energy, one following the 1918 pandemic and one following the 2009 pandemic. No destabilization of t-loops is observed in the IBV genome, whereas analysis of SARS-CoV-2 isolates reveals destabilization of viral RNA structures. Overall, we propose that a loss of free energy in the RNA genome of emerging respiratory RNA viruses may contribute to the adaptation of these viruses to the human population.
Affiliation(s)
- Charlotte Rigby
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, 08544 New Jersey, United States
  - University of Cambridge, Department of Pathology, Addenbrooke’s Hospital, Cambridge CB2 2QQ, United Kingdom
  - Public Health England, Addenbrooke’s Hospital, Cambridge CB2 2QQ, United Kingdom
- Kimberly Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, 08544 New Jersey, United States
  - Sigler Institute, Princeton University, Princeton, NJ 08544, United States
- Karishma Bisht
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, 08544 New Jersey, United States
- Dirk Eggink
  - Department of Medical Microbiology, Amsterdam UMC, Amsterdam, The Netherlands
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
- Hamid Jalal
  - Public Health England, Addenbrooke’s Hospital, Cambridge CB2 2QQ, United Kingdom
- Aartjan J.W. te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, 08544 New Jersey, United States
8
Flynn JM, Hu KB, Clark AG. Three recent sex chromosome-to-autosome fusions in a Drosophila virilis strain with high satellite DNA content. Genetics 2023; 224:iyad062. [PMID: 37052958 PMCID: PMC10213488 DOI: 10.1093/genetics/iyad062] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/15/2022] [Revised: 12/02/2022] [Accepted: 04/07/2023] [Indexed: 04/14/2023]
Abstract
The karyotype, or number and arrangement of chromosomes, has varying levels of stability across both evolution and disease. Karyotype changes often originate from DNA breaks near the centromeres of chromosomes, which generally contain long arrays of tandem repeats or satellite DNA. Drosophila virilis possesses among the highest relative satellite abundances of studied species, with almost half its genome composed of three related 7 bp satellites. We discovered a strain of D. virilis that we infer recently underwent three independent chromosome fusion events involving the X and Y chromosomes, in addition to one subsequent fission event. Here, we isolate and characterize the four different karyotypes we discovered in this strain, which we believe demonstrates remarkable genome instability. We discovered that one of the substrains with an X-autosome fusion has an X-to-Y chromosome nondisjunction rate 20× higher than the D. virilis reference strain (21% vs 1%). Finally, we found an overall higher rate of DNA breakage in the substrain with higher satellite DNA compared to a genetically similar substrain with less satellite DNA. This suggests that satellite DNA abundance may play a role in the risk of genome instability. Overall, we introduce a novel system consisting of a single strain with four different karyotypes, which we believe will be useful for future studies of genome instability, centromere function, and sex chromosome evolution.
Affiliation(s)
- Jullien M Flynn
  - Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building Room 227, Ithaca, NY 14853, USA
- Kevin B Hu
  - Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building Room 227, Ithaca, NY 14853, USA
- Andrew G Clark
  - Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building Room 227, Ithaca, NY 14853, USA
9
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation compared to single nucleotide variants (SNV) in different plant species. Additionally, SV were associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches were described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detected large SV were weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest the usage of the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany
10
Mouri K, Dewey HB, Castro R, Berenzy D, Kales S, Tewhey R. Whole-genome functional characterization of RE1 silencers using a modified massively parallel reporter assay. CELL GENOMICS 2023; 3:100234. [PMID: 36777181 PMCID: PMC9903721 DOI: 10.1016/j.xgen.2022.100234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Revised: 09/12/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022]
Abstract
Both upregulation and downregulation by cis-regulatory elements help modulate precise gene expression. However, our understanding of repressive elements is far more limited than that of activating elements. To address this gap, we characterized RE1, a group of transcriptional silencers bound by REST, at genome-wide scale using a modified massively parallel reporter assay (MPRAduo). MPRAduo empirically defined a minimal binding strength of REST (REST motif-intrinsic value [m-value]), above which cofactors colocalize and silence transcription. We identified 1,500 human variants that alter RE1 silencing and found that their effect sizes are predictable when they overlap with REST-binding sites above the m-value. Additionally, we demonstrate that non-canonical REST-binding motifs exhibit silencer function only if they precisely align half sites with specific spacer lengths. Our results show mechanistic insights into RE1, which allow us to predict its activity and the effect of variants on RE1, providing a paradigm for performing genome-wide functional characterization of transcription-factor-binding sites.
Affiliation(s)
- Susan Kales
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
  - Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  - Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
11
Valenzuela SL, Norambuena T, Morgante V, García F, Jiménez JC, Núñez C, Fuentes I, Pollak B. Viroscope: Plant viral diagnosis from high-throughput sequencing data using biologically-informed genome assembly coverage. Front Microbiol 2022; 13:967021. [PMID: 36338106 PMCID: PMC9634423 DOI: 10.3389/fmicb.2022.967021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 09/29/2022] [Indexed: 11/25/2022]
Abstract
High-throughput sequencing (HTS) methods are transforming our capacity to detect pathogens and perform disease diagnosis. Although sequencing advances have enabled accessible and point-of-care HTS, data analysis pipelines have yet to provide robust tools for precise and certain diagnosis, particularly in cases of low sequencing coverage. Lack of standardized metrics and harmonized detection thresholds confounds the problem further, impeding the adoption and implementation of these solutions in real-world applications. In this work, we tackle these issues and propose biologically-informed viral genome assembly coverage as a method to improve diagnostic certainty. We use the identification of viral replicases, an essential function of viral life cycles, to define genome coverage thresholds in which biological functions can be described. We validate the analysis pipeline, Viroscope, using field samples, synthetic and published datasets, and demonstrate that it provides sensitive and specific viral detection. Furthermore, we developed Viroscope.io, a web service that provides on-demand viral diagnosis from HTS data, to facilitate adoption and implementation by phytosanitary agencies and enable precise viral diagnosis.
Affiliation(s)
- Bernardo Pollak
  - Meristem SpA, Santiago, Chile
  - Multiplex SpA, Santiago, Chile
  - *Correspondence: Bernardo Pollak,
12
Wang Y, Korneliussen TS, Holman LE, Manica A, Pedersen MW. ngsLCA: A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.14006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Yucheng Wang
- Department of Zoology University of Cambridge Cambridge UK
- Lundbeck Foundation GeoGenetics Centre, Globe Institute University of Copenhagen Copenhagen K Denmark
- ALPHA, State Key Laboratory of Tibetan Plateau Earth System, Environment and Resources (TPESER) Institute of Tibetan Plateau Research (ITPCAS), Chinese Academy of Sciences (CAS) Beijing China
- BGI BGI‐Shenzhen Shanghai China
- Luke E. Holman
- School of Ocean and Earth Science, National Oceanography Centre Southampton University of Southampton Southampton UK
- Section for Evolutionary Genomics, Faculty of Health and Medical Sciences, Globe Institute University of Copenhagen Copenhagen Denmark
- Andrea Manica
- Department of Zoology University of Cambridge Cambridge UK
- Mikkel Winther Pedersen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute University of Copenhagen Copenhagen K Denmark
|
13
|
Weisweiler M, Arlt C, Wu PY, Van Inghelandt D, Hartwig T, Stich B. Structural variants in the barley gene pool: precision and sensitivity to detect them using short-read sequencing and their association with gene expression and phenotypic variation. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:3511-3529. [PMID: 36029318 PMCID: PMC9519679 DOI: 10.1007/s00122-022-04197-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Structural variants (SV) of 23 barley inbreds, detected by the best combination of SV callers based on short-read sequencing, were associated with genome-wide and gene-specific gene expression and, thus, were evaluated to predict agronomic traits. In human genetics, several studies have shown that phenotypic variation is more likely to be caused by structural variants (SV) than by single nucleotide variants. However, accurate yet cost-efficient discovery of SV in complex genomes remains challenging. The objectives of our study were to (i) facilitate SV discovery studies by benchmarking SV callers and their combinations with respect to their sensitivity and precision to detect SV in the barley genome, (ii) characterize the occurrence and distribution of SV clusters in the genomes of 23 barley inbreds that are the parents of a unique resource for mapping quantitative traits, the double round robin population, (iii) quantify the association of SV clusters with transcript abundance, and (iv) evaluate the use of SV clusters for the prediction of phenotypic traits. In our computer simulations based on a sequencing coverage of 25x, a sensitivity > 70% and a precision > 95% were observed for all combinations of SV types and SV length categories if the best combination of SV callers was used. We observed a significant (P < 0.05) association of gene-associated SV clusters with global gene-specific gene expression. Furthermore, about 9% of all SV clusters that were within 5 kb of a gene were significantly (P < 0.05) associated with the gene expression of the corresponding gene. The prediction ability of SV clusters was higher than that of single-nucleotide polymorphisms from an array across the seven studied phenotypic traits.
These findings suggest the usefulness of exploiting SV information when fine mapping and cloning the causal genes underlying quantitative traits as well as the high potential of using SV clusters for the prediction of phenotypes in diverse germplasm sets.
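The sensitivity and precision figures quoted in this abstract follow the usual definitions, which can be sketched as below. This is only an illustration under simplifying assumptions: variants are matched by exact identity of a hypothetical (type, chromosome, start, end) tuple, whereas real SV benchmarks typically allow some breakpoint tolerance, and the example variants are invented.

```python
from typing import Hashable, Set, Tuple


def sensitivity_precision(truth: Set[Hashable], calls: Set[Hashable]) -> Tuple[float, float]:
    """Compute (sensitivity, precision) for a call set against a truth set.

    sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    Matching is by exact set membership (a simplifying assumption).
    """
    tp = len(truth & calls)   # simulated variants that were called
    fn = len(truth - calls)   # simulated variants that were missed
    fp = len(calls - truth)   # calls with no matching simulated variant
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Invented toy data: (SV type, chromosome, start, end)
truth = {
    ("DEL", "chr1H", 100, 600),
    ("INS", "chr1H", 2000, 2050),
    ("DUP", "chr2H", 500, 1500),
    ("INV", "chr3H", 10, 900),
}
calls = {
    ("DEL", "chr1H", 100, 600),
    ("INS", "chr1H", 2000, 2050),
    ("DUP", "chr2H", 500, 1500),
    ("DEL", "chr5H", 1, 50),
}
sens, prec = sensitivity_precision(truth, calls)
print(sens, prec)
```

Computing the two values separately per SV type and length category, as the abstract describes, would amount to filtering both sets before calling the function.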
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Christopher Arlt
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Po-Ya Wu
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Delphine Van Inghelandt
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Thomas Hartwig
- Institute for Molecular Physiology, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany.
- Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225, Düsseldorf, Germany.
|
14
|
Váradi A, Kaszab E, Kardos G, Prépost E, Szarka K, Laczkó L. Rapid genotyping of targeted viral samples using Illumina short-read sequencing data. PLoS One 2022; 17:e0274414. [PMID: 36112576 PMCID: PMC9481040 DOI: 10.1371/journal.pone.0274414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/30/2022] [Indexed: 11/19/2022]
Abstract
The most important information about microorganisms might be their accurate genome sequence. Using current Next Generation Sequencing methods, sequencing data can be generated at an unprecedented pace. However, we still lack tools for the automated and accurate reference-based genotyping of viral sequencing reads. This paper presents our pipeline designed to reconstruct the dominant consensus genome of viral samples and analyze their within-host variability. We benchmarked our approach on numerous datasets and showed that the consensus genome of samples could be obtained reliably without further manual data curation. Our pipeline can be a valuable tool for rapidly identifying viral samples. The pipeline is publicly available on the project’s GitHub page (https://github.com/laczkol/QVG).
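To make the notions of a dominant consensus genome and within-host variability concrete, here is a minimal sketch that takes a per-position pileup of observed bases, calls the majority base, and reports the minor-allele fraction. It does not reproduce the QVG pipeline: the pileup representation, the function names, and the minimum-depth cutoff are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Sequence


def consensus(pileup: Sequence[str], min_depth: int = 2) -> str:
    """Call the most frequent base at each position; emit 'N' below min_depth."""
    called = []
    for bases in pileup:
        if len(bases) < min_depth:
            called.append("N")  # too few reads to call a base
            continue
        base, _count = Counter(bases).most_common(1)[0]
        called.append(base)
    return "".join(called)


def minor_allele_fraction(bases: str) -> float:
    """Fraction of reads at one position that disagree with the majority base."""
    if not bases:
        return 0.0
    major_count = Counter(bases).most_common(1)[0][1]
    return 1 - major_count / len(bases)


# Invented pileup: each string holds the bases observed at one reference position.
pileup = ["AAAA", "CCCT", "G", "TTTT"]
seq = consensus(pileup)
maf = minor_allele_fraction(pileup[1])
print(seq, maf)
```

Positions with a non-zero minor-allele fraction are the ones a within-host variability analysis would examine further, subject to sequencing-error filtering that this sketch omits.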
Affiliation(s)
- Alex Váradi
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Department of Laboratory Medicine, University of Pécs, Pécs, Hungary
- Eszter Kaszab
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Veterinary Medical Research Institute, Budapest, Hungary
- Gábor Kardos
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Eszter Prépost
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Krisztina Szarka
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Levente Laczkó
- Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- ELKH-DE Conservation Biology Research Group, Debrecen, Hungary
- * E-mail:
|
15
|
Das JK, Roy S. A study on non-synonymous mutational patterns in structural proteins of SARS-CoV-2. Genome 2021; 64:665-678. [PMID: 33788636 DOI: 10.1139/gen-2020-0157] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/20/2022]
Abstract
SARS-CoV-2 is mutating and creating divergent variants across the world. An in-depth investigation of the amino acid substitutions in the genomic signature of SARS-CoV-2 proteins is essential for understanding its host adaptation and infection biology. A total of 9587 SARS-CoV-2 structural protein sequences collected from 49 different countries are used to characterize protein-wise variants, substitution patterns (type and location), and major substitution changes. The majority of the substitutions are distinct, mostly in a particular location, and lead to a change in an amino acid's biochemical properties. In terms of mutational changes, envelope (E) and membrane (M) proteins are relatively more stable than nucleocapsid (N) and spike (S) proteins. Several co-occurring substitutions are observed, particularly in S and N proteins. Substitutions specific to active sub-domains reveal that the heptapeptide repeat, fusion peptide, and transmembrane regions of the S protein, and the N-terminal and C-terminal domains of the N protein, are heavily mutated. We also observe a few deleterious mutations in the above domains. The overall study on non-synonymous mutation in structural proteins of SARS-CoV-2 at the start of the pandemic indicates a diversity amongst virus sequences.
Affiliation(s)
- Jayanta Kumar Das
- Department of Pediatrics, Johns Hopkins University School of Medicine, Maryland, USA
- Swarup Roy
- Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India
|