1
Tripathi S, Voogdt CGP, Bassler SO, Anderson M, Huang PH, Sakenova N, Capraz T, Jain S, Koumoutsi A, Bravo AM, Trotter V, Zimmerman M, Sonnenburg JL, Buie C, Typas A, Deutschbauer AM, Shiver AL, Huang KC. Randomly barcoded transposon mutant libraries for gut commensals I: Strategies for efficient library construction. Cell Rep 2024; 43:113517. [PMID: 38142397 DOI: 10.1016/j.celrep.2023.113517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/22/2023] [Accepted: 11/14/2023] [Indexed: 12/26/2023]
Abstract
Randomly barcoded transposon mutant libraries are powerful tools for studying gene function and organization, assessing gene essentiality and pathways, discovering potential therapeutic targets, and understanding the physiology of gut bacteria and their interactions with the host. However, construction of high-quality libraries with uniform representation can be challenging. In this review, we survey various strategies for barcoded library construction, including transposition systems, methods of transposon delivery, optimal library size, and transconjugant selection schemes. We discuss the advantages and limitations of each approach, as well as factors to consider when selecting a strategy. In addition, we highlight experimental and computational advances in arraying condensed libraries from mutant pools. We focus on examples of successful library construction in gut bacteria and their application to gene function studies and drug discovery. Given the need for understanding gene function and organization in gut bacteria, we provide a comprehensive guide for researchers to construct randomly barcoded transposon mutant libraries.
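Library size is one of the design choices the review surveys. As a rough illustration only (not necessarily the criterion the authors recommend), the classic Clarke-Carbon estimate gives the number of independent insertion mutants N needed so that a target making up a fraction f of insertion sites is hit at least once with probability P, assuming uniformly random, independent insertions: N = ln(1 - P) / ln(1 - f). Real transposons show insertion biases and cannot recover mutants in essential genes, so practical libraries typically need to be larger. The function names below are hypothetical.

```python
import math


def mutants_needed(target_fraction: float, prob_hit: float) -> int:
    """Clarke-Carbon library-size estimate, N = ceil(ln(1 - P) / ln(1 - f)).

    Assumes insertions are independent and uniformly distributed over all
    insertion sites (an idealization); `target_fraction` is f, `prob_hit` is P.
    """
    if not (0.0 < target_fraction < 1.0 and 0.0 < prob_hit < 1.0):
        raise ValueError("target_fraction and prob_hit must lie strictly between 0 and 1")
    # P(target missed after N insertions) = (1 - f)^N; solve (1 - f)^N <= 1 - P.
    # log1p keeps precision when f is tiny.
    return math.ceil(math.log1p(-prob_hit) / math.log1p(-target_fraction))


def prob_missed(target_fraction: float, n_mutants: int) -> float:
    """Probability that a target is never hit in a library of n_mutants insertions."""
    return (1.0 - target_fraction) ** n_mutants


# Hypothetical example: a 1 kb gene in a 5 Mb genome, 99% chance of at least one hit.
f = 1_000 / 5_000_000
n = mutants_needed(f, 0.99)
print(n, prob_missed(f, n))
```

For this hypothetical 1 kb gene, the estimate comes to roughly 23,000 mutants, with a residual miss probability just under 1%.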
Affiliation(s)
- Surya Tripathi
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Carlos Geert Pieter Voogdt
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany; Structural and Computational Biology Unit, EMBL, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Stefan Oliver Bassler
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany; Faculty of Biosciences, Heidelberg University, Grabengasse 1, 69117 Heidelberg, Germany
- Mary Anderson
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Po-Hsun Huang
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Nazgul Sakenova
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Tümay Capraz
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany; Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Sunit Jain
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Alexandra Koumoutsi
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Afonso Martins Bravo
- Department of Fundamental Microbiology, University of Lausanne, 1015 Lausanne, Switzerland
- Valentine Trotter
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Michael Zimmerman
- Structural and Computational Biology Unit, EMBL, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Justin L Sonnenburg
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA; Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Cullen Buie
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Athanasios Typas
- Genome Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany; Structural and Computational Biology Unit, EMBL, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Adam M Deutschbauer
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anthony L Shiver
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Kerwyn Casey Huang
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA; Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
2
Willett JLE, Barnes AMT, Brunson DN, Lecomte A, Robertson EB, Dunny GM. Optimized Replication of Arrayed Bacterial Mutant Libraries Increases Access to Biological Resources. Microbiol Spectr 2023; 11:e0169323. [PMID: 37432110 PMCID: PMC10434011 DOI: 10.1128/spectrum.01693-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Accepted: 06/19/2023] [Indexed: 07/12/2023]
Abstract
Biological collections, including arrayed libraries of single transposon (Tn) or deletion mutants, greatly accelerate the pace of bacterial genetic research. Despite the importance of these resources, few protocols exist for the replication and distribution of these materials. Here, we describe a protocol for creating multiple replicates of an arrayed bacterial Tn library consisting of approximately 6,800 mutants in 96-well plates (73 plates). Our protocol provides multiple checkpoints to guard against contamination and minimize genetic drift caused by freeze/thaw cycles. This approach can also be scaled for arrayed culture collections of various sizes. Overall, this protocol is a valuable resource for other researchers considering the construction and distribution of arrayed culture collection resources for the benefit of the greater scientific community. IMPORTANCE Arrayed mutant collections drive robust genetic screens, but few protocols exist for replication of these resources and subsequent quality control. Increasing the distribution of arrayed biological collections will increase the accessibility and use of these resources. Developing standardized techniques for replication of these resources is essential for ensuring their quality and usefulness to the scientific community.
Affiliation(s)
- Julia L. E. Willett
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Aaron M. T. Barnes
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Debra N. Brunson
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Alexandre Lecomte
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, Jouy-en-Josas, France
- Ethan B. Robertson
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Gary M. Dunny
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, Minnesota, USA
3
Johnson MS, Venkataram S, Kryazhimskiy S. Best Practices in Designing, Sequencing, and Identifying Random DNA Barcodes. J Mol Evol 2023; 91:263-280. [PMID: 36651964 PMCID: PMC10276077 DOI: 10.1007/s00239-022-10083-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/08/2022] [Accepted: 12/15/2022] [Indexed: 01/19/2023]
Abstract
Random DNA barcodes are a versatile tool for tracking cell lineages, with applications ranging from development to cancer to evolution. Here, we review and critically evaluate barcode designs as well as methods of barcode sequencing and initial processing of barcode data. We first demonstrate how various barcode design decisions affect data quality and propose a new design that balances all considerations that we are currently aware of. We then discuss various options for the preparation of barcode sequencing libraries, including inline indices and Unique Molecular Identifiers (UMIs). Finally, we test the performance of several established and new bioinformatic pipelines for the extraction of barcodes from raw sequencing reads and for error correction. We find that both alignment and regular expression-based approaches work well for barcode extraction, and that error-correction pipelines designed specifically for barcode data are superior to generic ones. Overall, this review will help researchers to approach their barcoding experiments in a deliberate and systematic way.
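The review reports that both alignment-based and regular-expression-based extraction work well and that barcode-specific error correction outperforms generic tools. The sketch below illustrates the regular-expression route plus a deliberately naive error-correction step. The flanking sequences, the 20-nt barcode length, and the greedy one-mismatch merge are all hypothetical choices made for illustration; they are not the pipelines the authors benchmarked.

```python
import re
from collections import Counter

# Hypothetical construct: a 20-nt random barcode between two constant flanks.
LEFT_FLANK = "GCTAGC"
RIGHT_FLANK = "GGATCC"
BARCODE_LEN = 20
BARCODE_RE = re.compile(f"{LEFT_FLANK}([ACGTN]{{{BARCODE_LEN}}}){RIGHT_FLANK}")


def extract_barcodes(reads):
    """Count barcodes from reads in which both flanks match exactly."""
    counts = Counter()
    for read in reads:
        match = BARCODE_RE.search(read)
        if match:
            counts[match.group(1)] += 1
    return counts


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_one_mismatch(counts):
    """Naive error correction: fold each barcode into an at-least-as-abundant
    barcode that differs at exactly one position (quadratic; illustration only)."""
    merged = Counter()
    centers = []
    for barcode, n in counts.most_common():  # most abundant first
        parent = next((c for c in centers if hamming(c, barcode) == 1), None)
        if parent is None:
            centers.append(barcode)
            merged[barcode] += n
        else:
            merged[parent] += n
    return merged
```

Requiring exact flank matches discards reads with sequencing errors in the constant regions; an alignment-based extractor, or a pattern that tolerates flank mismatches, would recover more of them.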
Affiliation(s)
- Milo S Johnson
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Sandeep Venkataram
- Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, CA, 92093, USA
- Sergey Kryazhimskiy
- Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, CA, 92093, USA
4
Willett JLE, Barnes AMT, Brunson DN, Lecomte A, Robertson EB, Dunny GM. Optimized replication of arrayed bacterial mutant libraries increase access to biological resources. bioRxiv 2023:2023.04.25.537918. [PMID: 37162974 PMCID: PMC10168237 DOI: 10.1101/2023.04.25.537918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Biological collections, including arrayed libraries of single transposon or deletion mutants, greatly accelerate the pace of bacterial genetics research. Despite the importance of these resources, few protocols exist for the replication and distribution of these materials. Here, we describe a protocol for creating multiple replicates of an arrayed bacterial Tn library consisting of approximately 6,800 mutants in 73 × 96-well plates. Our protocol provides multiple checkpoints to guard against contamination and minimize genetic drift caused by freeze/thaw cycles. This approach can also be scaled for arrayed culture collections of various sizes. Overall, this protocol is a valuable resource for other researchers considering the construction and distribution of arrayed culture collection resources for the benefit of the greater scientific community. Importance Arrayed mutant collections drive robust genetic screens, yet few protocols exist for replication of these resources and subsequent quality control. Increasing distribution of arrayed biological collections will increase accessibility to and use of these resources. Developing standardized techniques for replication of these resources is essential for ensuring their quality and usefulness to the scientific community.
Affiliation(s)
- Julia L. E. Willett
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, MN, USA
- Aaron M. T. Barnes
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, MN, USA
- Present address: Minnesota Department of Health, MN, USA
- Debra N. Brunson
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, FL, USA
- Alexandre Lecomte
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, Jouy-en-Josas, France
- Ethan B. Robertson
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, MN, USA
- Gary M. Dunny
- Department of Microbiology and Immunology, University of Minnesota Medical School, Minneapolis, MN, USA
5
Zhang L, Chen J, Ma C, Liu X, Xu L. Performance Analysis of Electromyogram Signal Compression Sampling in a Wireless Body Area Network. Micromachines 2022; 13:1748. [PMID: 36296102 PMCID: PMC9611018 DOI: 10.3390/mi13101748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/10/2022] [Accepted: 10/13/2022] [Indexed: 06/16/2023]
Abstract
The rapid growth in demand for portable and intelligent hardware has placed tremendous pressure on signal sampling, transfer, and storage resources. As an emerging signal acquisition technology, compressed sensing (CS) has promising application prospects in low-cost wireless sensor networks. To reduce energy consumption and maintain a longer acquisition duration for high-sample-rate electromyogram (EMG) signals, this paper comprehensively analyzes compressed sensing of EMG signals. A fair comparison is carried out on the performance of 52 ordinary wavelet sparse bases and five widely applied reconstruction algorithms at different compression levels. The experimental results show that the db2 wavelet basis can sparsify EMG signals so that the compressed signals are reconstructed properly, as reflected in its low percentage root mean square distortion (PRD) values at most compression ratios. In addition, the basis pursuit (BP) reconstruction algorithm provides a more efficient reconstruction process and better reconstruction performance than the other algorithms compared. The experimental records and comparative analysis identify suitable sparse bases and reconstruction algorithms for EMG signals, serving as prior experiments for further practical applications and as a benchmark for future academic research.
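The abstract ranks sparse bases by their PRD values. A minimal sketch of the common definition, PRD = 100 × sqrt(Σ(x_i − x̂_i)² / Σx_i²), where x is the original EMG segment and x̂ its CS reconstruction, follows. Some studies instead subtract the signal mean in the denominator (often called PRDN); the abstract does not say which variant was used, so treat this as an assumption.

```python
import math
from typing import Sequence


def prd(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Percentage root-mean-square distortion between a signal and its reconstruction.

    PRD = 100 * sqrt(sum((x - x_hat)^2) / sum(x^2)); lower means a closer reconstruction.
    """
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    energy = sum(x * x for x in original)
    if energy == 0:
        raise ValueError("original signal has zero energy")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 100.0 * math.sqrt(error / energy)
```

For example, `prd([3, 4], [3, 0])` is 80 (percent, up to floating-point rounding): the squared error is 16 against a signal energy of 25.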
Affiliation(s)
- Liangyu Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, 195 Innovation Road, Shenyang 110169, China
- Junxin Chen
- College of Medicine and Biological Information Engineering, Northeastern University, 195 Innovation Road, Shenyang 110169, China
- Chenfei Ma
- Edinburgh Neuroprosthetics Laboratory, School of Informatics, The University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
- Xiufang Liu
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lisheng Xu
- College of Medicine and Biological Information Engineering, Northeastern University, 195 Innovation Road, Shenyang 110169, China
6
Clouard C, Ausmees K, Nettelblad C. A joint use of pooling and imputation for genotyping SNPs. BMC Bioinformatics 2022; 23:421. [PMID: 36229780 PMCID: PMC9563787 DOI: 10.1186/s12859-022-04974-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 09/29/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Despite continuing technological advances, the cost of large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, such as in wildlife monitoring and plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare imputation performance on pooled data between the Beagle algorithm and a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented. RESULTS We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, outperforming the Beagle imputation model, which reaches 80% concordance. We observe that the pooling design gives higher concordance for rare variants than the traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. CONCLUSIONS We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find potential applications in any context where genotyping costs limit study size, such as marker-assisted selection in plant breeding.
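The group-testing idea can be sketched for a single biallelic marker, assuming a hypothetical 4 × 4 row/column design in which 16 samples are mixed into 8 pools (4 rows and 4 columns). This is a simplified illustration, not the authors' design or estimation algorithm: it tracks only whether a pool contains any carrier of the alternate allele, whereas the paper estimates genotype probabilities and passes them to imputation. Samples in a negative row or column are certainly non-carriers; the remaining intersections are resolved only when a single row or a single column is positive, mirroring the abstract's distinction between directly identifiable markers and those that need statistical estimation.

```python
from itertools import product


def make_pools(side=4):
    """Row/column design: sample i sits at row i // side, column i % side."""
    rows = [[r * side + c for c in range(side)] for r in range(side)]
    cols = [[r * side + c for r in range(side)] for c in range(side)]
    return rows + cols  # first `side` pools are rows, the rest are columns


def pool_results(carriers, pools):
    """A pool is positive if it contains at least one carrier of the alternate allele."""
    return [any(sample in carriers for sample in pool) for pool in pools]


def decode(outcomes, side=4):
    """Return (certain_carriers, ambiguous) for one marker from pool outcomes."""
    pos_rows = [r for r in range(side) if outcomes[r]]
    pos_cols = [c for c in range(side) if outcomes[side + c]]
    # A sample not in both a positive row and a positive column is a non-carrier.
    cells = {r * side + c for r, c in product(pos_rows, pos_cols)}
    # With a single positive row (or column), every positive column (or row)
    # must have its carrier at the intersection, so all cells are carriers.
    if len(pos_rows) <= 1 or len(pos_cols) <= 1:
        return cells, set()
    # Two or more positive rows and columns: binary outcomes cannot pin down
    # the carriers; a probabilistic estimate is needed.
    return set(), cells
```

With carriers at samples 0 and 5, for instance, two rows and two columns light up and all four intersections {0, 1, 4, 5} remain ambiguous, which is where probability-based estimation and imputation come in.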
Affiliation(s)
- Camille Clouard
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
- Kristiina Ausmees
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
- Carl Nettelblad
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
7
Nguyen Ba AN, Lawrence KR, Rego-Costa A, Gopalakrishnan S, Temko D, Michor F, Desai MM. Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast. eLife 2022; 11:73983. [PMID: 35147078 PMCID: PMC8979589 DOI: 10.7554/elife.73983] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/16/2021] [Accepted: 02/11/2022] [Indexed: 11/25/2022]
Abstract
Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.
Affiliation(s)
- Alex N Nguyen Ba
- Department of Organismic and Evolutionary Biology, Harvard University
- Artur Rego-Costa
- Department of Organismic and Evolutionary Biology, Harvard University
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University
8
Furth N, Shilo S, Cohen N, Erez N, Fedyuk V, Schrager AM, Weinberger A, Dror AA, Zigron A, Shehadeh M, Sela E, Srouji S, Amit S, Levy I, Segal E, Dahan R, Jones D, Douek DC, Shema E. Unified platform for genetic and serological detection of COVID-19 with single-molecule technology. medRxiv 2021. [PMID: 34075385 PMCID: PMC8168389 DOI: 10.1101/2021.05.25.21257501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
The COVID-19 pandemic raises the need for diverse diagnostic approaches to rapidly detect different stages of viral infection. The flexible and quantitative nature of single-molecule imaging technology renders it optimal for development of new diagnostic tools. Here we present a proof-of-concept for a single-molecule based, enzyme-free assay for detection of SARS-CoV-2. The unified platform we developed allows direct detection of the viral genetic material from patients' samples, as well as their immune response consisting of IgG and IgM antibodies. Thus, it establishes a platform for diagnostics of COVID-19, which could also be adjusted to diagnose additional pathogens.
Affiliation(s)
- Noa Furth
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Shay Shilo
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Niv Cohen
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Nir Erez
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Vadim Fedyuk
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Alexander M Schrager
- Human Immunology Section, Vaccine Research Center, National Institutes of Health, Bethesda, MD, USA
- Adina Weinberger
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Amiel A Dror
- Department of Otolaryngology, Head and Neck Surgery, Galilee Medical Center, Nahariya, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Asaf Zigron
- Oral and Maxillofacial Department, Galilee Medical Center, Nahariya, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Mona Shehadeh
- Clinical Laboratories Division, Clinical Biochemistry and Endocrinology Laboratory, Galilee Medical Center, Naharia, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Eyal Sela
- Department of Otolaryngology, Head and Neck Surgery, Galilee Medical Center, Nahariya, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Samer Srouji
- Oral and Maxillofacial Department, Galilee Medical Center, Nahariya, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Itzchak Levy
- Sheba Medical Center, Ramat Gan, Israel; Sackler Medical School, Tel Aviv University, Tel Aviv, Israel
- Eran Segal
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Rony Dahan
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Daniel C Douek
- Human Immunology Section, Vaccine Research Center, National Institutes of Health, Bethesda, MD, USA
- Efrat Shema
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
9
Furth N, Shilo S, Cohen N, Erez N, Fedyuk V, Schrager AM, Weinberger A, Dror AA, Zigron A, Shehadeh M, Sela E, Srouji S, Amit S, Levy I, Segal E, Dahan R, Jones D, Douek DC, Shema E. Unified platform for genetic and serological detection of COVID-19 with single-molecule technology. PLoS One 2021; 16:e0255096. [PMID: 34310620 PMCID: PMC8312974 DOI: 10.1371/journal.pone.0255096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Accepted: 07/10/2021] [Indexed: 11/26/2022]
Abstract
The COVID-19 pandemic raises the need for diverse diagnostic approaches to rapidly detect different stages of viral infection. The flexible and quantitative nature of single-molecule imaging technology renders it optimal for development of new diagnostic tools. Here we present a proof-of-concept for a single-molecule based, enzyme-free assay for detection of SARS-CoV-2. The unified platform we developed allows direct detection of the viral genetic material from patients' samples, as well as their immune response consisting of IgG and IgM antibodies. Thus, it establishes a platform for diagnostics of COVID-19, which could also be adjusted to diagnose additional pathogens.
Affiliation(s)
- Noa Furth
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Shay Shilo
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Niv Cohen
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Nir Erez
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Vadim Fedyuk
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Alexander M. Schrager
- Human Immunology Section, Vaccine Research Center, National Institutes of Health, Bethesda, MD, United States of America
- Adina Weinberger
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Amiel A. Dror
- Department of Otolaryngology, Head and Neck Surgery, Galilee Medical Center, Nahariya, Israel
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Asaf Zigron
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Oral and Maxillofacial Department, Galilee Medical Center, Nahariya, Israel
- Mona Shehadeh
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Clinical Laboratories Division, Clinical Biochemistry and Endocrinology Laboratory, Galilee Medical Center, Naharia, Israel
- Eyal Sela
- Department of Otolaryngology, Head and Neck Surgery, Galilee Medical Center, Nahariya, Israel
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Samer Srouji
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Oral and Maxillofacial Department, Galilee Medical Center, Nahariya, Israel
- Itzchak Levy
- Sheba Medical Center, Ramat Gan, Israel
- Sackler Medical School, Tel Aviv University, Tel Aviv, Israel
- Eran Segal
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Rony Dahan
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Dan Jones
- SeqLL, Woburn, MA, United States of America
- Daniel C. Douek
- Human Immunology Section, Vaccine Research Center, National Institutes of Health, Bethesda, MD, United States of America
- Efrat Shema
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
10
Rapid ordering of barcoded transposon insertion libraries of anaerobic bacteria. Nat Protoc 2021; 16:3049-3071. [PMID: 34021295 DOI: 10.1038/s41596-021-00531-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/03/2019] [Accepted: 02/16/2021] [Indexed: 02/07/2023]
Abstract
Commensal bacteria from the human intestinal microbiota play important roles in health and disease. Research into the mechanisms by which these bacteria exert their effects is hampered by the complexity of the microbiota, the strict growth requirements of the individual species and a lack of genetic tools and resources. The assembly of ordered transposon insertion libraries, in which nearly all nonessential genes have been disrupted and the strains stored as independent monocultures, would be a transformative resource for research into many microbiota members. However, assembly of these libraries must be fast and inexpensive in order to empower investigation of the large number of species that typically compose gut communities. The methods used to generate ordered libraries must also be adapted to the anaerobic growth requirements of most intestinal bacteria. We have developed a protocol to assemble ordered libraries of transposon insertion mutants that is fast, cheap and effective for even strict anaerobes. The protocol differs from currently available methods by making use of cell sorting to order the library and barcoded transposons to facilitate the localization of ordered mutations in the library. By tracking transposon insertions using barcode sequencing, our approach increases the accuracy and reduces the time and effort required to locate mutants in the library. Ordered libraries can be sorted and characterized over the course of 2 weeks using this approach. We expect this protocol will lower the barrier to generating comprehensive, ordered mutant libraries for many species in the human microbiota, allowing for new investigations into genotype-phenotype relationships within this important microbial ecosystem.
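The abstract says barcode sequencing is used to localize sorted mutants in the ordered library but does not spell out the pooling scheme. As an assumption for illustration only, the sketch below supposes that the wells of one 96-well plate are combined into row pools and column pools, each pool's barcodes are sequenced, and a barcode is assigned to a well only when it appears in exactly one row pool and exactly one column pool. The function name and data layout are hypothetical.

```python
from string import ascii_uppercase

ROWS = list(ascii_uppercase[:8])  # plate rows A-H
COLS = list(range(1, 13))         # plate columns 1-12


def locate_barcodes(row_pools, col_pools):
    """Assign barcodes to wells of one 96-well plate from pooled sequencing.

    row_pools maps a row letter to the set of barcodes detected in that row pool;
    col_pools maps a column number to the set detected in that column pool.
    A barcode is placed only if it appears in exactly one row pool and exactly
    one column pool; everything else is returned as unresolved.
    """
    placed, unresolved = {}, set()
    seen = set().union(*row_pools.values(), *col_pools.values())
    for barcode in sorted(seen):
        rows = [r for r in ROWS if barcode in row_pools.get(r, set())]
        cols = [c for c in COLS if barcode in col_pools.get(c, set())]
        if len(rows) == 1 and len(cols) == 1:
            placed[barcode] = f"{rows[0]}{cols[0]}"
        else:
            unresolved.add(barcode)
    return placed, unresolved
```

A barcode detected in two rows or two columns (for example, the same mutant sorted into two wells) is left unresolved; conversely, two barcodes mapping to the same well can flag a well that received more than one cell.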
11
Park D, Swayambhu G, Lyga T, Pfeifer BA. Complex natural product production methods and options. Synth Syst Biotechnol 2021; 6:1-11. [PMID: 33474503 PMCID: PMC7803631 DOI: 10.1016/j.synbio.2020.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Revised: 11/19/2020] [Accepted: 12/21/2020] [Indexed: 12/29/2022]
Abstract
Natural products have had a major impact upon quality of life, with antibiotics as a classic example of a transformative impact upon human health. In this contribution, we will highlight both historic and emerging methods of natural product bio-manufacturing. Traditional methods of natural product production relied upon native cellular host systems. In this context, pragmatic and effective methodologies were established to enable widespread access to natural products. In reviewing such strategies, we will also highlight the development of heterologous natural product biosynthesis, which relies instead on a surrogate host system theoretically capable of advanced production potential. In comparing native and heterologous systems, we will comment on the base organisms used for natural product biosynthesis and how the properties of such cellular hosts dictate scaled engineering practices to facilitate compound distribution. In concluding the article, we will examine novel efforts in production practices that entirely eliminate the constraints of cellular production hosts. That is, cell-free production efforts will be introduced and reviewed for the purpose of complex natural product biosynthesis. Included in this final analysis will be our own research efforts to test the cell-free biosynthesis of the complex polyketide antibiotic erythromycin.
Affiliation(s)
- Dongwon Park
- Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Girish Swayambhu
- Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Thomas Lyga
- Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Blaine A Pfeifer
- Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
| |
Collapse
|
12
|
Zhang T, Foreman R, Wollman R. Identifying chromatin features that regulate gene expression distribution. Sci Rep 2020; 10:20566. [PMID: 33239733 PMCID: PMC7688950 DOI: 10.1038/s41598-020-77638-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2020] [Accepted: 11/10/2020] [Indexed: 12/17/2022]
Abstract
Gene expression variability, differences in the number of mRNA per cell across a population of cells, is ubiquitous across diverse organisms with broad impacts on cellular phenotypes. The role of chromatin in regulating average gene expression has been extensively studied. However, what aspects of the chromatin contribute to gene expression variability is still underexplored. Here we addressed this problem by leveraging chromatin diversity and using a systematic investigation of randomly integrated expression reporters to identify what aspects of chromatin microenvironment contribute to gene expression variability. Using DNA barcoding and split-pool decoding, we created a large library of isogenic reporter clones and identified reporter integration sites in a massive and parallel manner. By mapping our measurements of reporter expression at different genomic loci with multiple epigenetic profiles including the enrichment of transcription factors and the distance to different chromatin states, we identified new factors that impact the regulation of gene expression distributions.
Affiliation(s)
- Thanutra Zhang
- Institute for Quantitative and Computational Biosciences, UCLA, Los Angeles, CA, USA
- Robert Foreman
- Institute for Quantitative and Computational Biosciences, UCLA, Los Angeles, CA, USA
- Roy Wollman
- Institute for Quantitative and Computational Biosciences, UCLA, Los Angeles, CA, USA
- Departments of Integrative Biology and Physiology and Chemistry and Biochemistry, UCLA, Los Angeles, CA, USA

13
Furstenau TN, Cocking JH, Hepp CM, Fofanov VY. Sample pooling methods for efficient pathogen screening: Practical implications. PLoS One 2020; 15:e0236849. [PMID: 33175841 PMCID: PMC7657563 DOI: 10.1371/journal.pone.0236849] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/10/2020] [Accepted: 10/14/2020] [Indexed: 01/06/2023]
Abstract
Due to the large number of negative tests, individually screening large populations for rare pathogens can be wasteful and expensive. Sample pooling methods improve the efficiency of large-scale pathogen screening campaigns by reducing the number of tests and reagents required to accurately categorize positive and negative individuals. Such methods rely on group testing theory which mainly focuses on minimizing the total number of tests; however, many other practical concerns and tradeoffs must be considered when choosing an appropriate method for a given set of circumstances. Here we use computational simulations to determine how several theoretical approaches compare in terms of (a) the number of tests, to minimize costs and save reagents, (b) the number of sequential steps, to reduce the time it takes to complete the assay, (c) the number of samples per pool, to avoid the limits of detection, (d) simplicity, to reduce the risk of human error, and (e) robustness, to poor estimates of the number of positive samples. We found that established methods often perform very well in one area but very poorly in others. Therefore, we introduce and validate a new method which performs fairly well across each of the above criteria making it a good general use approach.
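The abstract describes pooling as a trade-off among number of tests, pool size and other criteria. The classic two-stage Dorfman scheme makes that trade-off concrete. It is offered here as a reference point, not as the new method the authors introduce. Under Dorfman pooling, samples are tested in pools of k, and every member of a positive pool is then retested individually. This gives an expected cost of 1/k + 1 - (1 - p)^k tests per sample.

The minimal Python sketch below assumes independent samples with prevalence p and an error-free assay. The function names are illustrative.

```python
def dorfman_tests_per_sample(p: float, k: int) -> float:
    """Expected tests per sample for two-stage Dorfman pooling.

    Assumes independent samples with prevalence p and an error-free assay.
    Each sample carries 1/k of its pool's test, plus one individual retest
    whenever the pool is positive, which happens with probability 1-(1-p)^k.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("prevalence p must lie in [0, 1]")
    if k < 1:
        raise ValueError("pool size k must be at least 1")
    if k == 1:
        return 1.0  # individual testing: no second stage
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p: float, k_max: int = 64) -> tuple[int, float]:
    """Return (k, cost) minimizing expected tests per sample over 1..k_max."""
    return min(
        ((k, dorfman_tests_per_sample(p, k)) for k in range(1, k_max + 1)),
        key=lambda kv: kv[1],
    )


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        k, cost = best_pool_size(p)
        print(f"prevalence={p:.3f}: pool size {k}, {cost:.3f} tests/sample")
```

Lower prevalence favours larger pools. That is exactly where the limit-of-detection concern in criterion (c) starts to matter, and it is one reason a method tuned only to minimize test count can perform poorly on other criteria.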
Affiliation(s)
- Tara N. Furstenau
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, United States of America
- Jill H. Cocking
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, United States of America
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Crystal M. Hepp
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, United States of America
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America
- Viacheslav Y. Fofanov
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, United States of America
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America

14
Shental N, Levy S, Wuvshet V, Skorniakov S, Shalem B, Ottolenghi A, Greenshpan Y, Steinberg R, Edri A, Gillis R, Goldhirsh M, Moscovici K, Sachren S, Friedman LM, Nesher L, Shemer-Avni Y, Porgador A, Hertz T. Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers. Sci Adv 2020; 6:eabc5961. [PMID: 32917716 PMCID: PMC7485993 DOI: 10.1126/sciadv.abc5961] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 05/03/2020] [Accepted: 07/28/2020] [Indexed: 05/26/2023]
Abstract
Recent reports suggest that 10 to 30% of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infected patients are asymptomatic and that viral shedding may occur before symptom onset. Therefore, there is an urgent need to increase diagnostic testing capabilities to prevent disease spread. We developed P-BEST, a method for Pooling-Based Efficient SARS-CoV-2 Testing, which identifies all positive subjects within a set of samples using a single round of testing. Each sample is assigned into multiple pools using a combinatorial pooling strategy based on compressed sensing. We pooled sets of 384 samples into 48 pools, providing both an eightfold increase in testing efficiency and an eightfold reduction in test costs, while identifying up to five positive carriers. We then used P-BEST to screen 1115 health care workers using 144 tests. P-BEST provides an efficient and easy-to-implement solution for increasing testing capacity that can be easily integrated into diagnostic laboratories.
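P-BEST decodes its pools with compressed sensing, which is beyond a short example. The simpler idea behind any non-adaptive design that places each sample in several pools can be sketched with the "COMP" rule from group testing. Under COMP, any sample that appears in at least one negative pool is cleared, and whatever remains is a candidate positive.

The sketch below reuses the 384-sample, 48-pool scale from the abstract. Several pieces are assumptions, not P-BEST's actual design or decoder:

- the random layout;
- the choice of 6 pools per sample;
- the function names;
- the error-free assay.

```python
import random


def random_pooling_design(n_samples: int, n_pools: int, pools_per_sample: int,
                          seed: int = 0) -> list[set[int]]:
    """Hypothetical design: put each sample into a random subset of pools.

    Returns pools, where pools[j] is the set of sample indices in pool j.
    """
    rng = random.Random(seed)
    pools: list[set[int]] = [set() for _ in range(n_pools)]
    for s in range(n_samples):
        for j in rng.sample(range(n_pools), pools_per_sample):
            pools[j].add(s)
    return pools


def comp_decode(pools: list[set[int]], pool_positive: list[bool],
                n_samples: int) -> set[int]:
    """COMP decoding: clear every sample that occurs in a negative pool.

    With an error-free assay every true positive survives; a negative sample
    survives too (a false candidate) if all of its pools happen to be positive.
    """
    cleared: set[int] = set()
    for members, positive in zip(pools, pool_positive):
        if not positive:
            cleared |= members
    return set(range(n_samples)) - cleared


if __name__ == "__main__":
    n_samples, n_pools = 384, 48            # scale taken from the abstract
    pools = random_pooling_design(n_samples, n_pools, pools_per_sample=6)
    carriers = {17, 201, 333}               # hypothetical positive samples
    results = [bool(members & carriers) for members in pools]
    print(sorted(comp_decode(pools, results, n_samples)))
```

Under the error-free assumption, COMP never discards a true positive, so any false candidates could be settled by individual retests. The abstract reports that P-BEST instead identifies all positives within a single round of testing.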
Affiliation(s)
- Noam Shental
- Department of Computer Science, The Open University of Israel, Ra'anana, Israel
- Shlomia Levy
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Vered Wuvshet
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Shosh Skorniakov
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Bar Shalem
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Aner Ottolenghi
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yariv Greenshpan
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Avishay Edri
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Roni Gillis
- Goldman Medical School, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Michal Goldhirsh
- Goldman Medical School, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Khen Moscovici
- Goldman Medical School, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Sinai Sachren
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Lilach M Friedman
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Lior Nesher
- Soroka University Medical Center, Beer-Sheva, Israel
- Yonat Shemer-Avni
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Soroka University Medical Center, Beer-Sheva, Israel
- Angel Porgador
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Tomer Hertz
- Department of Microbiology and Immunology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

15
Zhang T, Pilko A, Wollman R. Loci specific epigenetic drug sensitivity. Nucleic Acids Res 2020; 48:4797-4810. [PMID: 32246716 PMCID: PMC7229858 DOI: 10.1093/nar/gkaa210] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/31/2019] [Revised: 02/10/2020] [Accepted: 03/27/2020] [Indexed: 12/14/2022]
Abstract
Therapeutic targeting of epigenetic modulators offers a novel approach to the treatment of multiple diseases. The cellular consequences of chemical compounds that target epigenetic regulators (epi-drugs) are complex. Epi-drugs affect global cellular phenotypes and cause local changes to gene expression due to alteration of a gene's chromatin environment. Despite increasing use in the clinic, the mechanisms responsible for cellular changes are unclear. Specifically, to what degree the effects are a result of cell-wide changes or disease-related, locus-specific effects is unknown. Here we developed a platform to systematically and simultaneously investigate the sensitivity of epi-drugs at hundreds of genomic locations by combining DNA barcoding, unique split-pool encoding, and single-cell expression measurements. Internal controls are used to isolate locus-specific effects separately from any global consequences these drugs have. Using this platform we discovered widespread locus-specific sensitivities to three distinct epi-drugs that target histone deacetylase, DNA methylation and bromodomain proteins. By leveraging ENCODE data on chromatin modification, we identified features of chromatin environments that are most likely to be affected by epi-drugs. The measurements of locus-specific epi-drug sensitivities will pave the way to the development of targeted therapy for personalized medicine.
Affiliation(s)
- Thanutra Zhang
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA, USA
- Anna Pilko
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA, USA
- Departments of Integrative Biology and Physiology and Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
- Roy Wollman
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA, USA
- Departments of Integrative Biology and Physiology and Chemistry and Biochemistry, University of California, Los Angeles, CA, USA

16
Damian D, Maghembe R, Damas M, Wensman JJ, Berg M. Application of Viral Metagenomics for Study of Emerging and Reemerging Tick-Borne Viruses. Vector Borne Zoonotic Dis 2020; 20:557-565. [PMID: 32267808 DOI: 10.1089/vbz.2019.2579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Ticks are important vectors for different tick-borne viruses, some of which cause diseases and death in humans, livestock, and wild animals. Tick-borne encephalitis virus, Crimean-Congo hemorrhagic fever virus, Kyasanur forest disease virus, severe fever with thrombocytopenia syndrome virus, Heartland virus, African swine fever virus, Nairobi sheep disease virus, and Louping ill virus are just a few examples of important tick-borne viruses. The majority of tick-borne viruses have RNA genomes that routinely undergo rapid genetic modifications such as point mutations during their replication. These genomic changes can influence the spread of viruses to new habitats and hosts and lead to the emergence of novel viruses that can pose a threat to public health. Therefore, investigation of the viruses circulating in ticks is important to understand their diversity, host and vector range, and evolutionary history, as well as to predict new emerging pathogens. The choice of detection method is important, as most methods detect only those viruses that have been previously well described. On the other hand, viral metagenomics is a useful tool to simultaneously identify all the viruses present in a sample, including novel variants of already known viruses or completely new viruses. This review describes tick-borne viruses, their historical background of emergence, and their reemergence in nature, and the use of viral metagenomics for viral discovery and studies of viral evolution.
Affiliation(s)
- Donath Damian
- Department of Molecular Biology and Biotechnology, University of Dar es Salaam, Dar es Salaam, Tanzania
- Reuben Maghembe
- Department of Molecular Biology and Biotechnology, University of Dar es Salaam, Dar es Salaam, Tanzania
- Modester Damas
- Department of Molecular Biology and Biotechnology, University of Dar es Salaam, Dar es Salaam, Tanzania
- Jonas Johansson Wensman
- Section of Ruminant Medicine, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Mikael Berg
- Section of Virology, Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences, Uppsala, Sweden

17
Borgers K, Vandewalle K, Festjens N, Callewaert N. A guide to Mycobacterium mutagenesis. FEBS J 2019; 286:3757-3774. [PMID: 31419030 DOI: 10.1111/febs.15041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2019] [Revised: 07/05/2019] [Accepted: 08/12/2019] [Indexed: 12/18/2022]
Abstract
The genus Mycobacterium includes several pathogens that cause severe disease in humans, like Mycobacterium tuberculosis (M. tb), the infectious agent causing tuberculosis. Genetic tools to engineer mycobacterial genomes, in a targeted or random fashion, have provided opportunities to investigate M. tb infection and pathogenesis. Furthermore, they have allowed the identification and validation of potential targets for the diagnosis, prevention, and treatment of tuberculosis. This review describes the various methods that are available for the generation of mutants in Mycobacterium species, focusing specifically on tools for altering slow-growing mycobacteria from the M. tb complex. Among others, it covers recent molecular biological technologies (e.g. ORBIT) for rapidly and/or comprehensively obtaining targeted mutants genome-wide in mycobacteria. As such, this review can be used as a guide to select the appropriate genetic tools to generate mycobacterial mutants of interest, which can be used as tools to aid understanding of M. tb infection or to help develop TB intervention strategies.
Affiliation(s)
- Katlyn Borgers
- VIB-UGent Center for Medical Biotechnology, Belgium; Department of Biochemistry and Microbiology, Ghent University, Belgium
- Kristof Vandewalle
- VIB-UGent Center for Medical Biotechnology, Belgium; Department of Biochemistry and Microbiology, Ghent University, Belgium
- Nele Festjens
- VIB-UGent Center for Medical Biotechnology, Belgium; Department of Biochemistry and Microbiology, Ghent University, Belgium
- Nico Callewaert
- VIB-UGent Center for Medical Biotechnology, Belgium; Department of Biochemistry and Microbiology, Ghent University, Belgium

18
Zhernakov AI, Afonin AM, Gavriliuk ND, Moiseeva OM, Zhukov VA. s-dePooler: determination of polymorphism carriers from overlapping DNA pools. BMC Bioinformatics 2019; 20:45. [PMID: 30669964 PMCID: PMC6343301 DOI: 10.1186/s12859-019-2616-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/08/2018] [Accepted: 01/09/2019] [Indexed: 11/26/2022]
Abstract
Background: Sample pooling is widely used to reduce costs and labour. DNA sample pooling combined with massively parallel sequencing is a powerful tool for discovering DNA variants (polymorphisms) in large populations, which is the basis of research fields such as genome-wide association studies and evolutionary and population studies. Using overlapping pools, in which each sample is present in multiple pools, can enhance the accuracy of polymorphism detection and allow carriers of rare variants to be identified. Surprisingly, there is a lack of tools for interpreting results and identifying carriers, i.e. for “depooling”. Results: Here we present s-dePooler, an application for analysing data from pooling experiments. s-dePooler uses the variant information (VCF file) and the pooling scheme to produce a list of candidate carriers for each polymorphism. We incorporated s-dePooler into a pipeline (dePoP) that automates pooling analysis. The performance of the pipeline was tested on a synthetic dataset built from 1000 Genomes Project data, resulting in the successful identification of 97% of carriers of polymorphisms present in fewer than ~10% of carriers. Conclusions: s-dePooler along with dePoP can be used to identify carriers of polymorphisms in overlapping pools, and is compatible with any pooling scheme with equivalent molar ratios of pooled samples. s-dePooler and dePoP, with usage instructions and test data, are freely available at https://github.com/lab9arriam/depop.
Affiliation(s)
- Aleksandr Igorevich Zhernakov
- Research Department of Non-Coronary Heart Diseases, Almazov National Medical Research Center, Ministry of Health of Russia, 2 Akkuratova St., St. Petersburg, 197341, Russia; All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 3 Podbelsky Ch., St. Petersburg - Pushkin, 196608, Russia
- Alexey Mikhailovich Afonin
- All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 3 Podbelsky Ch., St. Petersburg - Pushkin, 196608, Russia
- Natalia Dmitrievna Gavriliuk
- Research Department of Non-Coronary Heart Diseases, Almazov National Medical Research Center, Ministry of Health of Russia, 2 Akkuratova St., St. Petersburg, 197341, Russia
- Olga Mikhailovna Moiseeva
- Research Department of Non-Coronary Heart Diseases, Almazov National Medical Research Center, Ministry of Health of Russia, 2 Akkuratova St., St. Petersburg, 197341, Russia
- Vladimir Aleksandrovich Zhukov
- Research Department of Non-Coronary Heart Diseases, Almazov National Medical Research Center, Ministry of Health of Russia, 2 Akkuratova St., St. Petersburg, 197341, Russia; All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 3 Podbelsky Ch., St. Petersburg - Pushkin, 196608, Russia

19
Comprehensive Functional Analysis of the Enterococcus faecalis Core Genome Using an Ordered, Sequence-Defined Collection of Insertional Mutations in Strain OG1RF. mSystems 2018; 3:mSystems00062-18. [PMID: 30225373 PMCID: PMC6134198 DOI: 10.1128/msystems.00062-18] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 05/03/2018] [Accepted: 08/03/2018] [Indexed: 12/14/2022]
Abstract
Enterococcus faecalis is a common commensal bacterium in animal gastrointestinal (GI) tracts and a leading cause of opportunistic infections of humans in the modern health care setting. E. faecalis OG1RF is a plasmid-free strain that contains few mobile elements yet retains the robust survival characteristics, intrinsic antibiotic resistance, and virulence traits characteristic of most E. faecalis genotypes. To facilitate interrogation of the core enterococcal genetic determinants for competitive fitness in the GI tract, biofilm formation, intrinsic antimicrobial resistance, and survival in the environment, we generated an arrayed, sequence-defined set of chromosomal transposon insertions in OG1RF. We used an orthogonal pooling strategy in conjunction with Illumina sequencing to identify a set of mutants with unique, single Himar-based transposon insertions.
The mutants contained insertions in 1,926 of 2,651 (72.6%) annotated open reading frames and in the majority of hypothetical protein-encoding genes and intergenic regions greater than 100 bp in length, which could encode small RNAs. As proof of principle of the usefulness of this arrayed transposon library, we created a minimal input pool containing 6,829 mutants chosen for maximal genomic coverage and used an approach that we term SMarT (sequence-defined mariner technology) transposon sequencing (TnSeq) to identify numerous genetic determinants of bile resistance in E. faecalis OG1RF. These included several genes previously associated with bile acid resistance as well as new loci. Our arrayed library allows functional screening of a large percentage of the genome with a relatively small number of mutants, reducing potential effects of bottlenecking, and enables immediate recovery of mutants following competitions. IMPORTANCE The robust ability of Enterococcus faecalis to survive outside the host and to spread via oral-fecal transmission and its high degree of intrinsic and acquired antimicrobial resistance all complicate the treatment of hospital-acquired enterococcal infections. The conserved E. faecalis core genome serves as an important genetic scaffold for evolution of this bacterium in the modern health care setting and also provides interesting vaccine and drug targets. We used an innovative pooling/sequencing strategy to map a large collection of arrayed transposon insertions in E. faecalis OG1RF and generated an arrayed library of defined mutants covering approximately 70% of the OG1RF genome. Then, we performed high-throughput transposon sequencing experiments using this library to determine core genomic determinants of bile resistance in OG1RF. This collection is a valuable resource for comprehensive, functional enterococcal genomics using both traditional and high-throughput approaches and enables immediate recovery of mutants of interest.

20
Salomé PA. Divide and Conquer: High-Throughput Screening of Chlamydomonas Cell Cycle Mutants. Plant Cell 2018; 30:1167-1168. [PMID: 29789358 PMCID: PMC6048788 DOI: 10.1105/tpc.18.00391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Affiliation(s)
- Patrice A Salomé
- Department of Chemistry and Biochemistry, University of California, Los Angeles

21
Breker M, Lieberman K, Cross FR. Comprehensive Discovery of Cell-Cycle-Essential Pathways in Chlamydomonas reinhardtii. Plant Cell 2018; 30:1178-1198. [PMID: 29743196 PMCID: PMC6048789 DOI: 10.1105/tpc.18.00071] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/29/2018] [Revised: 03/26/2018] [Accepted: 05/08/2018] [Indexed: 05/05/2023]
Abstract
We generated a large collection of temperature-sensitive lethal mutants in the unicellular green alga Chlamydomonas reinhardtii, focusing on mutations specifically affecting cell cycle regulation. We used UV mutagenesis and robotically assisted phenotypic screening to isolate candidates. To overcome the bottleneck at the critical step of molecular identification of the causative mutation ("driver"), we developed MAPS-SEQ (meiosis-assisted purifying selection sequencing), a multiplexed genetic/bioinformatics strategy. MAPS-SEQ allowed us to perform multiplexed simultaneous determination of the driver mutations from hundreds of neutral "passenger" mutations in each member of a large pool of mutants. This method should work broadly, including in multicellular diploid genetic systems, for any scorable trait. Using MAPS-SEQ, we identified essential genes spanning a wide range of molecular functions. Phenotypic clustering based on DNA content analysis and cell morphology indicated that the mutated genes function in the cell cycle at multiple points and by diverse mechanisms. The collection is sufficiently complete to allow specific conditional inactivation of almost all cell-cycle-regulatory pathways. Approximately seventy-five percent of the essential genes identified in this project had clear orthologs in land plant genomes, a huge enrichment compared with the value of ∼20% for the Chlamydomonas genome overall. Findings about these mutants will likely have direct relevance to essential cell biology in land plants.
Affiliation(s)
- Michal Breker
- Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, New York 10065
- Kristi Lieberman
- Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, New York 10065
- Frederick R Cross
- Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, New York 10065

22
Pool deconvolution approach for high-throughput gene mining from Bacillus thuringiensis. Appl Microbiol Biotechnol 2017; 102:1467-1482. [DOI: 10.1007/s00253-017-8633-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/29/2017] [Revised: 10/24/2017] [Accepted: 11/05/2017] [Indexed: 11/27/2022]

23
Erlich Y, Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science 2017; 355:950-954. [PMID: 28254941 DOI: 10.1126/science.aaj2038] [Citation(s) in RCA: 291] [Impact Index Per Article: 41.6] [Received: 09/12/2016] [Accepted: 02/09/2017] [Indexed: 12/16/2022]
Abstract
DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
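DNA Fountain takes its name from fountain (Luby transform) codes. In a fountain code, the file is split into segments, and each encoded unit (a "droplet", stored here as one oligo) holds the XOR of a pseudo-randomly chosen subset of segments. The droplet also carries the seed that regenerates that subset, so a reader can peel the segments back out from any sufficiently large collection of droplets.

The Python sketch below shows only that encode and peel-decode loop on byte strings. Three parts are simplifying assumptions: the uniform degree choice, the 4-byte segments and the function names. The sketch also omits the degree distribution, the biochemical screening of sequences and the error correction that a real DNA storage pipeline needs.

```python
import random
from typing import Optional


def _subset(seed: int, n_segments: int) -> list[int]:
    """Regenerate, from a droplet's seed, which segments were XORed into it.

    Simplification (assumption): degree drawn uniformly from 1..min(3, n);
    real LT codes draw it from a soliton-type distribution.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, seg_len: int, n_droplets: int) -> list[tuple[int, bytes]]:
    """Split data into equal-length segments and emit (seed, payload) droplets."""
    if not data:
        raise ValueError("nothing to encode")
    padded_len = (len(data) + seg_len - 1) // seg_len * seg_len
    data = data.ljust(padded_len, b"\0")  # zero-pad the last segment
    segs = [data[i:i + seg_len] for i in range(0, padded_len, seg_len)]
    droplets = []
    for seed in range(n_droplets):
        payload = bytes(seg_len)  # all-zero start value for XOR accumulation
        for idx in _subset(seed, len(segs)):
            payload = _xor(payload, segs[idx])
        droplets.append((seed, payload))
    return droplets


def decode(droplets: list[tuple[int, bytes]], n_segments: int) -> Optional[list[bytes]]:
    """Peeling decoder: resolve any droplet with exactly one unknown segment."""
    pending = [(set(_subset(seed, n_segments)), payload) for seed, payload in droplets]
    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for idxs, payload in pending:
            unknown = [i for i in idxs if i not in recovered]
            if len(unknown) != 1:
                continue
            value = payload
            for i in idxs:
                if i in recovered:  # strip out every already-known segment
                    value = _xor(value, recovered[i])
            recovered[unknown[0]] = value
            progress = True
    if len(recovered) < n_segments:
        return None  # too few droplets; a real reader would sequence more oligos
    return [recovered[i] for i in range(n_segments)]


if __name__ == "__main__":
    message = b"DNA Fountain sketch!"  # 20 bytes -> 5 segments of 4 bytes
    droplets = encode(message, seg_len=4, n_droplets=100)
    segments = decode(droplets, n_segments=5)
    if segments is not None:
        print(b"".join(segments)[:len(message)])
```

The decoder needs only the seeds it actually reads, not any particular droplets. In a fountain code, losing individual oligos therefore costs extra reads rather than specific pieces of the file.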
Affiliation(s)
- Yaniv Erlich
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA; Center for Computational Biology and Bioinformatics (C2B2), Department of Systems Biology, Columbia University, New York, NY 10027, USA

24
Anzai IA, Shaket L, Adesina O, Baym M, Barstow B. Rapid curation of gene disruption collections using Knockout Sudoku. Nat Protoc 2017; 12:2110-2137. [DOI: 10.1038/nprot.2017.073] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]

25
Verdin E, Wipf-Scheibel C, Gognalons P, Aller F, Jacquemond M, Tepfer M. Sequencing viral siRNAs to identify previously undescribed viruses and viroids in a panel of ornamental plant samples structured as a matrix of pools. Virus Res 2017; 241:19-28. [PMID: 28576697 DOI: 10.1016/j.virusres.2017.05.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/30/2016] [Revised: 05/12/2017] [Accepted: 05/24/2017] [Indexed: 10/19/2022]
Abstract
Ornamental plants constitute a largely unknown and potentially important source of pathogens affecting not only ornamental plants, but also major crop species. We have carried out studies using high-throughput sequencing of 21-24 nt RNAs from potentially virus-infected ornamental plants, followed by assembly of sequence scaffolds, to identify the virus and viroid genomes present in a panel of 67 plant samples representing 46 species belonging to the main sectors of the ornamental plant industry (cut flowers, pot plants, bulbs). A pilot study demonstrated that samples could be pooled (5 samples per pool), and the overall process simplified without loss of detection of important known pathogens. In a full-scale study, pools of 5 samples were organized in a 5×5 matrix to facilitate attribution of a sequence to a precise sample directly from analysis of the matrix. In the total of 67 samples analyzed in the two studies, partial sequences suggesting the presence of 25 previously unknown viruses and viroids were detected, including all types of virus and viroid genomes, and also showed four cases of known viruses infecting previously undescribed hosts. Furthermore, two types of potential mis-assembly were analyzed, and were shown to not affect the conclusions regarding the presence of the pathogens identified, but show that mis-assembly can affect the results when the objective is determining complete bona fide viral genome sequences. These results clearly confirm that ornamental plants constitute a potential source of unknown viruses and viroids that could have a major impact on agriculture, and that sequencing siRNAs of potentially virus- or viroid-infected ornamental plants is an effective means for screening for the presence of potentially important pathogens.
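The abstract says that arranging pools of 5 samples in a 5×5 matrix lets a sequence be attributed to one sample directly. The sketch below assumes the common row/column layout, which may differ from the study's exact design:

- 25 samples are placed in a grid;
- each row and each column forms one pool of 5;
- every sample therefore appears in exactly one row pool and one column pool.

The sample and pool names are hypothetical.

```python
from itertools import product


def matrix_pools(samples: list[str], size: int = 5) -> dict[str, list[str]]:
    """Lay size*size samples out in a grid; return row pools R1.. and column pools C1.."""
    if len(samples) != size * size:
        raise ValueError(f"expected {size * size} samples")
    grid = [samples[r * size:(r + 1) * size] for r in range(size)]
    pools = {f"R{r + 1}": grid[r] for r in range(size)}
    pools.update({f"C{c + 1}": [grid[r][c] for r in range(size)] for c in range(size)})
    return pools


def attribute(pools: dict[str, list[str]], positive: set[str]) -> set[str]:
    """Return samples lying in both a positive row pool and a positive column pool.

    One positive row plus one positive column pins down a single sample;
    several positive rows and columns yield every intersection, some of
    which may be false candidates.
    """
    rows = [p for p in positive if p.startswith("R")]
    cols = [p for p in positive if p.startswith("C")]
    return {s for r, c in product(rows, cols) for s in set(pools[r]) & set(pools[c])}


if __name__ == "__main__":
    samples = [f"S{i:02d}" for i in range(1, 26)]
    pools = matrix_pools(samples)
    # A viral contig assembled from pools R2 and C4 points to one sample.
    print(attribute(pools, {"R2", "C4"}))
```

The layout tests 25 samples with 10 sequencing libraries and resolves a single infected sample unambiguously. When one virus infects several samples, the row-column intersections multiply, and further checks would be needed to resolve them.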
Affiliation(s)
- Eric Verdin
- Pathologie Végétale, INRA, F-84140 Montfavet, France.
- François Aller
- Fasteris SA, Ch. du Pont-du-Centenaire 109, CH-1228 Plan-les-Ouates, Switzerland
- Mark Tepfer
- Pathologie Végétale, INRA, F-84140 Montfavet, France; Institut Jean-Pierre Bourgin (IJPB), INRA, AgroParisTech, CNRS, Saclay Plant Sciences (SPS), Université Paris-Saclay, F-78026 Versailles, France.
26
Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku. Nat Commun 2016; 7:13270. [PMID: 27830751 PMCID: PMC5109470 DOI: 10.1038/ncomms13270] [Received: 04/06/2016] [Accepted: 09/14/2016]
Abstract
Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction. Knockout collections provide a valuable tool to explore gene function, yet are expensive and technically challenging to produce at a genome-wide scale. Here Baym et al. devise a cost-effective transposon-based method to quickly develop a knockout collection for the electroactive microbe Shewanella oneidensis.
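The combinatorial pooling in Knockout Sudoku gives every well an address along several pooling axes. The toy Python sketch below is a generic Cartesian address decoder, not the authors' probabilistic algorithm. It assumes three hypothetical axes (plate, row, column) and invented mutant names. It shows why a mutant recovered from more than one well produces several candidate addresses, the kind of ambiguity that the self-consistency step described in the abstract must resolve.

```python
from itertools import product

AXES = ("plate", "row", "col")  # hypothetical pooling axes


def observe(layout):
    """Simulate sequencing the axis pools: for each mutant, collect the pools
    (one per axis per well) in which its insertion is detected."""
    hits = {}
    for mutant, wells in layout.items():
        pools = {axis: set() for axis in AXES}
        for well in wells:  # well = (plate, row, col)
            for axis, pool in zip(AXES, well):
                pools[axis].add(pool)
        hits[mutant] = pools
    return hits


def candidate_wells(pools):
    """Every well address consistent with the observed pools (Cartesian product)."""
    return sorted(product(*(sorted(pools[axis]) for axis in AXES)))


# Invented layout: one mutant sits in a single well, another in two wells.
layout = {
    "geneA::Tn": [(1, "C", 7)],
    "geneB::Tn": [(1, "A", 2), (3, "H", 11)],
}
hits = observe(layout)
print(candidate_wells(hits["geneA::Tn"]))       # [(1, 'C', 7)]
print(len(candidate_wells(hits["geneB::Tn"])))  # 8 (only 2 are real wells)
```

A uniquely placed mutant decodes directly. A duplicated mutant yields 2 × 2 × 2 candidate addresses, so extra evidence is needed to pick the real wells.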
27
Kaseniit KE, Theilmann MR, Robertson A, Evans EA, Haque IS. Group Testing Approach for Trinucleotide Repeat Expansion Disorder Screening. Clin Chem 2016; 62:1401-8. [DOI: 10.1373/clinchem.2016.259796] [Received: 04/27/2016] [Accepted: 07/22/2016]
Abstract
BACKGROUND
Fragile X syndrome (FXS, OMIM #300624) is an X-linked condition caused by trinucleotide repeat expansions in the 5′ UTR (untranslated region) of the fragile X mental retardation 1 (FMR1) gene. FXS testing is commonly performed in expanded carrier screening and has been proposed for inclusion in newborn screening. However, because pathogenic alleles are long and have low complexity (>200 CGG repeats), FXS is currently tested by a single-plex electrophoresis-resolved PCR assay rather than multiplexed approaches like next-generation sequencing or mass spectrometry. In this work, we sought an experimental design based on nonadaptive group testing that could accurately and reliably identify the size of abnormally expanded FMR1 alleles of males and females.
METHODS
We developed a new group testing scheme named StairCase (SC) that was designed to the constraints of the FXS testing problem, and compared its performance to existing group testing schemes by simulation. We experimentally evaluated SC's performance on 210 samples from the Coriell Institute biorepositories using pooled PCR followed by capillary electrophoresis on 3 replicates of each of 3 pooling layouts differing by the mapping of samples to pools.
RESULTS
The SC pooled PCR approach demonstrated perfect classification of samples by clinical category (normal, intermediate, premutation, or full mutation) for 90 positives and 1800 negatives, with a batch of 210 samples requiring only 21 assays.
CONCLUSIONS
Group testing based on SC is an implementable approach to trinucleotide repeat expansion disorder testing that offers ≥10-fold reduction in assay costs over current single-plex methods.
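StairCase is a design tailored to FXS testing, and the abstract does not give its construction. To illustrate the general nonadaptive group-testing idea it builds on, here is the textbook COMP decoder in Python. COMP is a different and simpler technique than StairCase: every sample that appears in a negative pool is cleared, and the remaining samples are reported as possible positives. The pool design is a toy assumption. The reported savings reduce to simple arithmetic: 210 samples in 21 assays is 10 samples per assay, consistent with the stated ≥10-fold reduction.

```python
def comp_decode(pools, results):
    """COMP decoding for nonadaptive group testing.

    pools:   list of sets, pools[i] = sample IDs mixed into assay i
    results: list of bools, results[i] = True if assay i was positive
    Returns the samples never seen in a negative pool (possible positives).
    """
    if len(pools) != len(results):
        raise ValueError("one result per pool is required")
    all_samples = set().union(*pools)
    cleared = set()
    for members, positive in zip(pools, results):
        if not positive:
            cleared |= members  # a negative pool clears all of its members
    return sorted(all_samples - cleared)


# Toy design (assumption): 6 samples on a 2 x 3 grid, 5 pools, each sample in 2 pools.
pools = [{1, 2, 3}, {4, 5, 6}, {1, 4}, {2, 5}, {3, 6}]
truth = {5}
results = [bool(p & truth) for p in pools]
print(comp_decode(pools, results))  # [5]
print(210 / 21)                     # 10.0 samples per assay in the reported batch
```

COMP never misses a true positive, but with several positives it can report extra candidates, which is why practical designs such as StairCase are tuned to the expected number of positives.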
28
Li C, Cao C, Tu J, Sun X. An accurate clone-based haplotyping method by overlapping pool sequencing. Nucleic Acids Res 2016; 44:e112. [PMID: 27095193 PMCID: PMC4937318 DOI: 10.1093/nar/gkw284] [Received: 10/13/2015] [Accepted: 04/07/2016]
Abstract
Chromosome-long haplotyping of human genomes is important to identify genetic variants with differing gene expression, in human evolution studies, clinical diagnosis, and other biological and medical fields. Although several methods have realized haplotyping based on sequencing technologies or population statistics, accuracy and cost are factors that prohibit their wide use. Borrowing ideas from group testing theories, we proposed a clone-based haplotyping method by overlapping pool sequencing. The clones from a single individual were pooled combinatorially and then sequenced. According to the distinct pooling pattern for each clone in the overlapping pool sequencing, alleles for the recovered variants could be assigned to their original clones precisely. Subsequently, the clone sequences could be reconstructed by linking these alleles accordingly and assembling them into haplotypes with high accuracy. To verify the utility of our method, we constructed 130,110 clones in silico for the individual NA12878 and simulated the pooling and sequencing process. Ultimately, 99.9% of variants on chromosome 1 that were covered by clones from both parental chromosomes were recovered correctly, and 112 haplotype contigs were assembled with an N50 length of 3.4 Mb and no switch errors. A comparison with current clone-based haplotyping methods indicated our method was more accurate.
Affiliation(s)
- Cheng Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210002, China
- Changchang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210002, China
- Jing Tu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210002, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210002, China
29
Zepeda-Mendoza ML, Bohmann K, Carmona Baez A, Gilbert MTP. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses. BMC Res Notes 2016; 9:255. [PMID: 27142414 PMCID: PMC4855357 DOI: 10.1186/s13104-016-2064-9] [Received: 11/26/2015] [Accepted: 04/26/2016]
Abstract
BACKGROUND
DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing, it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5′-nucleotide tags to both primers, producing double-tagged amplicons, and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.
RESULTS
We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate dissimilarity between PCR replicates, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third, mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
CONCLUSIONS
DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets and ultimately eases the handling of datasets from large-scale studies.
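The three filtering criteria listed for DAMe (minimum length, copy number, and reproducibility across PCR replicates) can be illustrated with a short Python sketch. This is not DAMe's code. The function name, data layout, thresholds, and example reads are hypothetical and are chosen only to show how the three parameters combine for a single sample.

```python
from collections import Counter


def filter_amplicons(replicates, min_replicates=2, min_copies=2, min_length=100):
    """Keep unique sequences that are long enough, abundant enough within a
    replicate, and reproduced across enough PCR replicates of one sample.

    replicates: list of lists of reads, one inner list per PCR replicate
    Returns {sequence: [copy number in each replicate]} for retained sequences.
    """
    counts = [Counter(reads) for reads in replicates]  # collapse to unique sequences
    kept = {}
    for seq in set().union(*counts):
        if len(seq) < min_length:
            continue
        per_rep = [c[seq] for c in counts]
        # a replicate supports the sequence only if it holds enough copies of it
        support = sum(1 for n in per_rep if n >= min_copies)
        if support >= min_replicates:
            kept[seq] = per_rep
    return kept


good = "A" * 120      # reproducible sequence
spurious = "C" * 120  # abundant in only one replicate
short = "G" * 50      # fails the length filter
reps = [
    [good] * 5 + [spurious] * 3 + [short] * 9,
    [good] * 4 + [short] * 7,
    [good] * 6 + [spurious] * 1,
]
result = filter_amplicons(reps)
print({s[:5] + "...": n for s, n in result.items()})  # {'AAAAA...': [5, 4, 6]}
```

Only the sequence supported in at least two replicates survives. The spurious one is well represented in a single replicate only, and the short one is removed by length regardless of abundance.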
Affiliation(s)
- Marie Lisandra Zepeda-Mendoza
- Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark.
- Kristine Bohmann
- Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
- Aldo Carmona Baez
- Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark; Undergraduate Program on Genomic Sciences, Center for Genomic Sciences, National Autonomous University of Mexico (UNAM), Av. Universidad s/n Col. Chamilpa, 62210, Cuernavaca, Morelos, Mexico
- M Thomas P Gilbert
- Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
30
31
Schmidt T, Schmid-Burgk JL, Hornung V. Synthesis of an arrayed sgRNA library targeting the human genome. Sci Rep 2015; 5:14987. [PMID: 26446710 PMCID: PMC4597219 DOI: 10.1038/srep14987] [Received: 03/08/2015] [Accepted: 08/21/2015]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) in conjunction with CRISPR-associated proteins (Cas) can be employed to introduce double-strand breaks into mammalian genomes at user-defined loci. The endonuclease activity of the Cas complex can be targeted to a specific genomic region using a single guide RNA (sgRNA). We developed a ligation-independent cloning (LIC) assembly method for efficient and bias-free generation of large sgRNA libraries. Using this system, we performed an iterative shotgun cloning approach to generate an arrayed sgRNA library that targets one critical exon of almost every protein-coding human gene. An orthogonal mixing and deconvolution approach was used to obtain 19,506 unique sequence-validated sgRNAs (91.4% coverage). As tested in HEK 293T cells, constructs of this library have a median genome editing activity of 54.6%, and employing sgRNAs of this library to generate knockout cells was successful for 19 out of 19 genes tested.
Affiliation(s)
- Tobias Schmidt
- Institute of Molecular Medicine, University Hospital, University of Bonn, Sigmund-Freud-Str. 25, 53127 Bonn, Germany
- Jonathan L. Schmid-Burgk
- Institute of Molecular Medicine, University Hospital, University of Bonn, Sigmund-Freud-Str. 25, 53127 Bonn, Germany
- Veit Hornung
- Institute of Molecular Medicine, University Hospital, University of Bonn, Sigmund-Freud-Str. 25, 53127 Bonn, Germany
32
Characterization of genome-wide ordered sequence-tagged Mycobacterium mutant libraries by Cartesian Pooling-Coordinate Sequencing. Nat Commun 2015; 6:7106. [PMID: 25960123 PMCID: PMC4432585 DOI: 10.1038/ncomms8106] [Received: 10/28/2014] [Accepted: 04/07/2015]
Abstract
Reverse genetics research approaches require the availability of methods to rapidly generate specific mutants. Alternatively, where these methods are lacking, the construction of pre-characterized libraries of mutants can be extremely valuable. However, this can be complex, expensive and time consuming. Here, we describe a robust, easy to implement parallel sequencing-based method (Cartesian Pooling-Coordinate Sequencing or CP-CSeq) that reports both on the identity as well as on the location of sequence-tagged biological entities in well-plate archived clone collections. We demonstrate this approach using a transposon insertion mutant library of the Mycobacterium bovis BCG vaccine strain, providing the largest resource of mutants in any strain of the M. tuberculosis complex. The method is applicable to any entity for which sequence-tagged identification is possible. The generation of characterized panels of specific mutants is an essential but time-consuming step of reverse genetic studies. Here Vandewalle et al. describe CP-CSeq, an easy to implement parallel sequencing method for rapid library construction.
33
Skums P, Artyomenko A, Glebova O, Ramachandran S, Mandoiu I, Campo DS, Dimitrova Z, Zelikovsky A, Khudyakov Y. Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling. Bioinformatics 2014; 31:682-90. [PMID: 25359889 DOI: 10.1093/bioinformatics/btu726]
Abstract
MOTIVATION
Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One possible cost-effective alternative is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations.
RESULTS
We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework, including algorithms for optimal virus-specific pool design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy.
AVAILABILITY AND IMPLEMENTATION
The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.
Affiliation(s)
- Pavel Skums
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Alexander Artyomenko
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Olga Glebova
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Sumathi Ramachandran
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Ion Mandoiu
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- David S Campo
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Zoya Dimitrova
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Alex Zelikovsky
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Yury Khudyakov
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
34
Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat Genet 2014; 46:1343-9. [PMID: 25326703 DOI: 10.1038/ng.3119] [Received: 02/26/2014] [Accepted: 09/24/2014]
Abstract
Haplotype-resolved genome sequencing enables the accurate interpretation of medically relevant genetic variation, deep inferences regarding population history and non-invasive prediction of fetal genomes. We describe an approach for genome-wide haplotyping based on contiguity-preserving transposition (CPT-seq) and combinatorial indexing. Tn5 transposition is used to modify DNA with adaptor and index sequences while preserving contiguity. After DNA dilution and compartmentalization, the transposase is removed, resolving the DNA into individually indexed libraries. The libraries in each compartment, enriched for neighboring genomic elements, are further indexed via PCR. Combinatorial 96-plex indexing at both the transposition and PCR stage enables the construction of phased synthetic reads from each of the nearly 10,000 'virtual compartments'. We demonstrate the feasibility of this method by assembling >95% of the heterozygous variants in a human genome into long, accurate haplotype blocks (N50 = 1.4-2.3 Mb). The rapid, scalable and cost-effective workflow could enable haplotype resolution to become routine in human genome sequencing.
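Combining two 96-plex index sets gives 96 × 96 = 9,216 distinct index pairs, which is where the "nearly 10,000" virtual compartments come from. The Python sketch below maps an index pair to a compartment number and bins reads by compartment. The 0-based index numbering and the (index, index, sequence) read format are assumptions made for illustration.

```python
N_TN5 = 96  # indexes added during transposition (first stage)
N_PCR = 96  # indexes added by PCR after dilution and compartmentalization (second stage)


def compartment_id(tn5_index, pcr_index):
    """Map a (transposition index, PCR index) pair to one virtual compartment."""
    if not (0 <= tn5_index < N_TN5 and 0 <= pcr_index < N_PCR):
        raise ValueError("index out of range")
    return tn5_index * N_PCR + pcr_index


def group_reads(reads):
    """Bin reads by virtual compartment; reads sharing a bin are enriched for
    fragments from neighboring genomic regions."""
    bins = {}
    for tn5_index, pcr_index, seq in reads:
        bins.setdefault(compartment_id(tn5_index, pcr_index), []).append(seq)
    return bins


print(N_TN5 * N_PCR)           # 9216 virtual compartments
print(compartment_id(0, 0))    # 0
print(compartment_id(95, 95))  # 9215
```

Row-major numbering guarantees that every index pair maps to a different compartment, so the two index reads alone recover which compartment a fragment came from.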
35
Adey A, Kitzman JO, Burton JN, Daza R, Kumar A, Christiansen L, Ronaghi M, Amini S, Gunderson KL, Steemers FJ, Shendure J. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res 2014; 24:2041-9. [PMID: 25327137 PMCID: PMC4248320 DOI: 10.1101/gr.178319.114]
Abstract
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are “subhaploid,” in that the lengths of fragments contained in each pool sums to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate “joins” are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
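The central idea of fragScaff, scoring pairs by how often they co-occur in the same indexed pools and then resolving the candidate-join graph with a minimum spanning tree, can be sketched in Python. This is a simplified stand-in, not fragScaff itself. It scores whole contigs rather than contig ends, and the Jaccard co-occurrence score, the 1 - score edge weight, the threshold, Kruskal's algorithm, and the pool memberships are all choices made here for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of indexed pools two contigs have in common (0.0 = never together)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def scaffold_tree(pools_by_contig, min_score=0.1):
    """Score every contig pair, keep candidate joins with score >= min_score,
    and return a minimum spanning forest (Kruskal) using weight = 1 - score,
    so pairs that co-occur strongly are the cheapest edges."""
    edges = []
    for u, v in combinations(sorted(pools_by_contig), 2):
        score = jaccard(pools_by_contig[u], pools_by_contig[v])
        if score >= min_score:
            edges.append((1.0 - score, u, v, score))
    parent = {c: c for c in pools_by_contig}
    tree = []
    for _weight, u, v, score in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # joining u and v does not close a cycle
            parent[ru] = rv
            tree.append((u, v, round(score, 2)))
    return tree


# Invented pool memberships: ctg1, ctg2, ctg3 are neighbors; ctg4 lies elsewhere.
pools = {
    "ctg1": {1, 2, 3, 4},
    "ctg2": {2, 3, 4, 5},
    "ctg3": {4, 5, 6, 7},
    "ctg4": {40, 41, 42},
}
print(scaffold_tree(pools))  # [('ctg1', 'ctg2', 0.6), ('ctg2', 'ctg3', 0.33)]
```

The weak ctg1 and ctg3 join (score 1/7) is discarded because it would close a cycle, and ctg4 stays unjoined. A real scaffolder would still need to orient contigs and turn the tree into linear paths; this sketch stops at the tree.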
Affiliation(s)
- Andrew Adey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
- Jacob O Kitzman
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
- Joshua N Burton
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
- Riza Daza
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
- Akash Kumar
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
- Lena Christiansen
- Illumina, Inc., Advanced Research Group, San Diego, California 92122, USA
- Mostafa Ronaghi
- Illumina, Inc., Advanced Research Group, San Diego, California 92122, USA
- Sasan Amini
- Illumina, Inc., Advanced Research Group, San Diego, California 92122, USA
- Kevin L Gunderson
- Illumina, Inc., Advanced Research Group, San Diego, California 92122, USA
- Frank J Steemers
- Illumina, Inc., Advanced Research Group, San Diego, California 92122, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
36
Cao CC, Sun X. Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing. Bioinformatics 2014; 31:515-22. [PMID: 25304780 DOI: 10.1093/bioinformatics/btu670]
Abstract
MOTIVATION
A variety of hypotheses have been proposed for finding the missing heritability of complex diseases in genome-wide association studies. Studies have focused on the value of haplotypes to improve the power of detecting associations with disease. To facilitate haplotype-based association analysis, it is necessary to accurately estimate haplotype frequencies of pooled samples.
RESULTS
Taking advantage of databases that contain prior haplotypes, we present Ehapp, which is based on an algorithm for solving a system of linear equations, to estimate the frequencies of haplotypes from pooled sequencing data. Effects of various factors in sequencing on the performance are evaluated using simulated data. Our method could estimate the frequencies of haplotypes with only about 3% average relative difference for pooled sequencing of a mixture of 10 haplotypes with a total coverage of 50×. When unknown haplotypes exist, our method maintains excellent performance for haplotypes with actual frequencies >0.05. Comparisons with existing methods on simulated data in conjunction with publicly available Illumina sequencing data indicate that our method is state of the art for many sequencing study designs. We also demonstrate the feasibility of applying overlapping pool sequencing to identify rare haplotype carriers cost-effectively.
AVAILABILITY AND IMPLEMENTATION
Ehapp (in Perl) for Linux platforms is available online (http://bioinfo.seu.edu.cn/Ehapp/).
CONTACT
xsun@seu.edu.cn
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
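With a database of known haplotypes, estimating their frequencies in a pool becomes a linear system. At each SNP i, the pooled alternate-allele frequency p_i equals the sum of the frequencies f_j of the haplotypes j that carry the alternate allele, and the f_j sum to 1. The Python sketch below solves that system by ordinary least squares via the normal equations. It illustrates the formulation only and is not Ehapp's algorithm, which the abstract does not detail. The toy haplotypes and allele frequencies are invented.

```python
def solve(a, b):
    """Solve a square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: haplotypes are not distinguishable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def haplotype_frequencies(haplotypes, allele_freqs):
    """Least-squares haplotype frequencies from pooled alt-allele frequencies.

    haplotypes:   list of 0/1 lists, one per known haplotype (1 = alt allele at SNP i)
    allele_freqs: observed alt-allele frequency at each SNP in the pool
    Design-matrix rows are the SNPs plus one extra row enforcing sum(f) = 1.
    """
    k = len(haplotypes)
    rows = [[h[i] for h in haplotypes] for i in range(len(allele_freqs))] + [[1] * k]
    rhs = list(allele_freqs) + [1.0]
    # normal equations: (A^T A) f = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(k)]
    return solve(ata, atb)


# Invented example: 3 known haplotypes over 4 SNPs, true frequencies 0.5 / 0.3 / 0.2.
H = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 0]]
observed = [0.5, 0.3, 0.2, 0.8]
print([round(f, 3) for f in haplotype_frequencies(H, observed)])  # [0.5, 0.3, 0.2]
```

With noisy pooled data the fit is only approximate and can return small negative values; ruling those out would require a constrained solver, which this sketch does not attempt.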
Affiliation(s)
- Chang-Chang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
37
Zielinski D, Gordon A, Zaks BL, Erlich Y. iPipet: sample handling using a tablet. Nat Methods 2014; 11:784-5. [PMID: 25075904 DOI: 10.1038/nmeth.3028]
Affiliation(s)
- Dina Zielinski
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA
- Assaf Gordon
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA
- Yaniv Erlich
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA
38
Bonachea EM, Zender G, White P, Corsmeier D, Newsom D, Fitzgerald-Butt S, Garg V, McBride KL. Use of a targeted, combinatorial next-generation sequencing approach for the study of bicuspid aortic valve. BMC Med Genomics 2014; 7:56. [PMID: 25260786 PMCID: PMC4181662 DOI: 10.1186/1755-8794-7-56] [Received: 06/13/2014] [Accepted: 09/24/2014]
Abstract
BACKGROUND
Bicuspid aortic valve (BAV) is the most common type of congenital heart disease, with a population prevalence of 1-2%. While BAV is known to be highly heritable, mutations in single genes (such as GATA5 and NOTCH1) have been reported in few human BAV cases. Traditional gene sequencing methods are time- and labor-intensive, while next-generation high-throughput sequencing remains costly for large patient cohorts and requires extensive bioinformatics processing. Here we describe an approach to targeted multi-gene sequencing with combinatorial pooling of samples from BAV patients.
METHODS
We studied a previously described cohort of 78 unrelated subjects with echocardiogram-identified BAV. Subjects were identified as having isolated BAV or BAV associated with coarctation of the aorta (BAV-CoA). BAV cusp fusion morphology was defined as right-left cusp fusion, right non-coronary cusp fusion, or left non-coronary cusp fusion. Samples were combined into 19 pools using a uniquely overlapping combinatorial design; a given mutation could be attributed to a single individual on the basis of which pools contained the mutation. A custom gene capture of 97 candidate genes was sequenced on the Illumina HiSeq 2000. Multistep bioinformatics processing was performed for base calling, variant identification, and in silico analysis of putative disease-causing variants.
RESULTS
Targeted capture identified 42 rare, non-synonymous, exonic variants involving 35 of the 97 candidate genes. Among these, in silico analysis classified 33 variants as putative disease-causing changes. Sanger sequencing confirmed 31 of these variants, found among 16 individuals. There were no significant differences in variant burden among BAV fusion phenotypes or between isolated BAV and BAV-CoA. Pathway analysis suggests a role for the WNT signaling pathway in human BAV.
CONCLUSION
We successfully developed a pooling and targeted capture strategy that enabled rapid and cost-effective next-generation sequencing of target genes in a large patient cohort. This approach identified a large number of putative disease-causing variants in a cohort of patients with BAV, including variants in 26 genes not previously associated with human BAV. The data suggest that BAV heritability is complex and polygenic. Our pooling approach saved over $39,350 compared with an unpooled, targeted capture sequencing strategy.
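A "uniquely overlapping combinatorial design" places each subject in a distinct set of pools, so the pattern of pools containing a private variant identifies its carrier. The abstract does not say how the 78 subjects were assigned to the 19 pools. The Python sketch below assumes, purely for illustration, that each subject gets its own pair of pools. Since C(19, 2) = 171 ≥ 78, pairs are enough, and a variant carried by one subject is decoded by exact pattern match.

```python
from itertools import combinations
from math import comb

N_POOLS, N_SUBJECTS = 19, 78


def pair_design(n_subjects=N_SUBJECTS, n_pools=N_POOLS):
    """Assign each subject a distinct pair of pools (hypothetical weight-2 design)."""
    pairs = list(combinations(range(n_pools), 2))
    if len(pairs) < n_subjects:
        raise ValueError("not enough distinct pool pairs")
    return {subject: frozenset(pairs[subject]) for subject in range(n_subjects)}


def decode(design, positive_pools):
    """Return the subject whose pool pattern exactly matches the positive pools, else None."""
    lookup = {pattern: subject for subject, pattern in design.items()}
    return lookup.get(frozenset(positive_pools))


design = pair_design()
print(comb(N_POOLS, 2))            # 171 distinct pairs available
print(sorted(design[42]))          # [2, 10], the pools holding subject 42's DNA
print(decode(design, design[42]))  # 42
```

If two subjects share a variant, the positive pools form the union of two pairs and match no single pattern, so `decode` returns `None` and the variant needs individual follow-up.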
Affiliation(s)
- Kim L McBride
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA.
39
Abstract
This work presents bipolar neural systems for check-rule embedded pattern restoration, fault-tolerant information encoding, and Sudoku memory construction and association. The primitive bipolar neural unit is generalized to have internal fields and activations, which are respectively characterized by exponential growth and logistic differential dynamics, in response to inhibitory and excitatory stimuli. Coupling extended bipolar units induces multi-state artificial Potts neurons, which are interconnected with inhibitory synapses for Latin square encoding, K-alphabet Latin square encoding, and Sudoku encoding. The proposed neural dynamics can generally restore Sudoku patterns from partial sparse clues. Neural relaxation is based on mean field annealing, which guarantees reliable convergence to ground states. Sudoku associative memory combines the inhibitory interconnections of Sudoku encoding with Hebbian excitatory synapses that encode conjunctive relations among active units over memorized patterns. Sudoku associative memory is empirically shown to be reliable and effective for restoring memorized patterns subject to typical sparse clues, fewer partial clues, dense clues, and perturbed or damaged clues. On this basis, compound Sudoku patterns are further extended to emulate complex topological information encoding.
40
Cao CC, Li C, Sun X. Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers. BMC Bioinformatics 2014; 15:195. [PMID: 24934981 PMCID: PMC4229885 DOI: 10.1186/1471-2105-15-195] [Received: 11/06/2013] [Accepted: 06/10/2014]
Abstract
Background Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, identifying rare variant carriers by large-scale resequencing remains prohibitively expensive because of the challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and at dramatically reduced cost. Results Based on quantitative group testing, we propose an overlapping pool sequencing strategy that allows efficient recovery of variant carriers among numerous individuals at much lower cost than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of the sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5–97.9% of variants with variant frequencies ranging from 0.5 to 1.5%. Conclusions Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
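To make the overlapping-pool idea concrete, here is a minimal Python sketch of a random k-set design with threshold-based pool calling. It assumes error-free pool calls and a single fixed read-count threshold, and for decoding it swaps in the simple combinatorial COMP rule (a sample stays a candidate only if every pool containing it is positive) instead of the authors' heuristic Bayesian decoder. The function names and parameters (`n_pools`, `k`, `threshold`) are illustrative, not taken from the paper.

```python
import random


def random_k_set_design(n_samples, n_pools, k, seed=0):
    """Assign each sample to k distinct pools chosen uniformly at random."""
    rng = random.Random(seed)
    return {s: set(rng.sample(range(n_pools), k)) for s in range(n_samples)}


def call_pools(variant_read_counts, threshold):
    """Declare a pool positive when its variant read count reaches a fixed threshold
    (a simplification of the depth-model-based threshold in the abstract)."""
    return {p for p, count in enumerate(variant_read_counts) if count >= threshold}


def comp_decode(design, positive_pools):
    """COMP decoding: keep a sample only if all of its pools are positive.
    With error-free pool calls, no true carrier is ever discarded."""
    return sorted(s for s, pools in design.items() if pools <= positive_pools)


# Hypothetical example: 200 samples, 40 pools, each sample placed in 3 pools.
design = random_k_set_design(n_samples=200, n_pools=40, k=3)
carriers = {17, 123}
positive = set().union(*(design[c] for c in carriers))
print(comp_decode(design, positive))  # contains 17 and 123, plus any false positives
```

A non-carrier survives COMP decoding only if all k of its pools happen to contain a carrier, which is why such designs work best when carriers are rare; the quantitative read-count information used by the authors is what lets them go beyond this purely binary rule.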
Affiliation(s)
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
41
Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries. PLoS One 2014; 9:e98968. [PMID: 24911009 PMCID: PMC4049660 DOI: 10.1371/journal.pone.0098968] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/27/2013] [Accepted: 05/09/2014] [Indexed: 11/19/2022]
Abstract
High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.
42
Zuzarte PC, Denroche RE, Fehringer G, Katzov-Eckert H, Hung RJ, McPherson JD. A two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms. PLoS One 2014; 9:e93455. [PMID: 24728235 PMCID: PMC3984111 DOI: 10.1371/journal.pone.0093455] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 12/18/2013] [Accepted: 03/04/2014] [Indexed: 11/18/2022]
Abstract
We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information regarding sample identity. DNA from 576 individuals was arranged into four 12 row by 12 column matrices and then pooled by row and by column resulting in 96 total pools with 12 individuals in each pool. Pooling of DNA was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched over a 250 kb region previously identified by GWAS to significantly predispose individuals to lung cancer. All 96 pools (12 row and 12 column pools from 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification based on Ion PGM sequencing confirmed the presence of 91.4% of confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
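The row and column bookkeeping can be sketched in a few lines of Python. This assumes one 12 × 12 matrix of sample IDs laid out row by row and error-free variant calls per pool; the helper names are hypothetical. A carrier must sit at the intersection of a positive row pool and a positive column pool, so a single carrier in a matrix is traced exactly, while two carriers in different rows and columns produce four candidate intersections.

```python
from itertools import product

N_ROWS, N_COLS = 12, 12  # matrix shape used in the study


def build_matrix(sample_ids):
    """Lay out 144 sample IDs row by row into a 12 x 12 matrix."""
    if len(sample_ids) != N_ROWS * N_COLS:
        raise ValueError("expected exactly 144 samples per matrix")
    return [list(sample_ids[r * N_COLS:(r + 1) * N_COLS]) for r in range(N_ROWS)]


def make_pools(matrix):
    """Return (row_pools, col_pools); each sample lands in exactly one of each."""
    row_pools = [list(row) for row in matrix]
    col_pools = [[matrix[r][c] for r in range(N_ROWS)] for c in range(N_COLS)]
    return row_pools, col_pools


def trace_carriers(matrix, positive_rows, positive_cols):
    """Candidate carriers are the samples at every (positive row, positive column) intersection."""
    return [matrix[r][c] for r, c in product(sorted(positive_rows), sorted(positive_cols))]


matrix = build_matrix([f"S{i:03d}" for i in range(144)])
print(trace_carriers(matrix, {3}, {7}))        # ['S043']: one carrier, traced exactly
print(trace_carriers(matrix, {3, 5}, {2, 7}))  # ['S038', 'S043', 'S062', 'S067']: ambiguous
```

The ambiguous second case is why this layout suits rare variants, where more than one carrier per matrix is unlikely.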
Affiliation(s)
- Philip C. Zuzarte
- Genome Technologies, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Robert E. Denroche
- Genome Technologies, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Gordon Fehringer
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada
- Hagit Katzov-Eckert
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada
- Rayjean J. Hung
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada
- Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- John D. McPherson
- Genome Technologies, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
43
Harper M, Gronenberg L, Liao J, Lee C. Comprehensive detection of genes causing a phenotype using phenotype sequencing and pathway analysis. PLoS One 2014; 9:e88072. [PMID: 24586303 PMCID: PMC3935835 DOI: 10.1371/journal.pone.0088072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/27/2013] [Accepted: 01/06/2014] [Indexed: 12/30/2022]
Abstract
Discovering all the genetic causes of a phenotype is an important goal in functional genomics. We combine an experimental design for detecting independent genetic causes of a phenotype with a high-throughput sequencing analysis that maximizes sensitivity for comprehensively identifying them. Testing this approach on a set of 24 mutant strains generated for a metabolic phenotype with many known genetic causes, we show that this pathway-based phenotype sequencing analysis greatly improves sensitivity of detection compared with previous methods, and reveals a wide range of pathways that can cause this phenotype. We demonstrate our approach on a metabolic re-engineering phenotype, the PEP/OAA metabolic node in E. coli, which is crucial to a substantial number of metabolic pathways and under renewed interest for biofuel research. Out of 2157 mutations in these strains, pathway-phenoseq discriminated just five gene groups (12 genes) as statistically significant causes of the phenotype. Experimentally, these five gene groups, and the next two high-scoring pathway-phenoseq groups, either have a clear connection to the PEP metabolite level or offer an alternative path of producing oxaloacetate (OAA), and thus clearly explain the phenotype. These high-scoring gene groups also show strong evidence of positive selection pressure, compared with strictly neutral selection in the rest of the genome.
Affiliation(s)
- Marc Harper
- Institute for Genomics and Proteomics, University of California Los Angeles, Los Angeles, California, United States of America
- Luisa Gronenberg
- Department of Chemical and Biomolecular Engineering, University of California Los Angeles, Los Angeles, California, United States of America
- James Liao
- Institute for Genomics and Proteomics, University of California Los Angeles, Los Angeles, California, United States of America
- Department of Chemical and Biomolecular Engineering, University of California Los Angeles, Los Angeles, California, United States of America
- Christopher Lee
- Institute for Genomics and Proteomics, University of California Los Angeles, Los Angeles, California, United States of America
- Dept. of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
- Dept. of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, United States of America
44
Large-scale mapping of transposable element insertion sites using digital encoding of sample identity. Genetics 2013; 196:615-23. [PMID: 24374352 DOI: 10.1534/genetics.113.159483] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
Abstract
Determining the genomic locations of transposable elements is a common experimental goal. When mapping large collections of transposon insertions, individualized amplification and sequencing is both time consuming and costly. We describe an approach in which large numbers of insertion lines can be simultaneously mapped in a single DNA sequencing reaction by using digital error-correcting codes to encode line identity in a unique set of barcoded pools.
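One way to picture this digital encoding is sketched below in Python. It assumes a Hamming(7,4) code, chosen here only for illustration (the abstract does not say which code the authors used): each insertion line receives a distinct non-zero 7-bit codeword and is placed in pool j whenever bit j is 1, and an insertion site detected in a given set of pools is assigned to the line with the nearest codeword. Because Hamming(7,4) codewords differ in at least three positions, one missed or spurious pool can be corrected. All function names are hypothetical.

```python
from itertools import product


def hamming74_codebook():
    """All 16 Hamming(7,4) codewords as 7-bit tuples: 4 data bits + 3 parity bits."""
    book = []
    for d0, d1, d2, d3 in product((0, 1), repeat=4):
        book.append((d0, d1, d2, d3, d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3))
    return book


def assign_lines(line_ids, codebook):
    """Give each line a distinct non-zero codeword (an all-zero word would put it in no pool)."""
    usable = [cw for cw in codebook if any(cw)]
    if len(line_ids) > len(usable):
        raise ValueError("not enough codewords for all lines")
    return dict(zip(line_ids, usable))


def pool_members(assignment, n_pools=7):
    """Pool j receives every line whose codeword has bit j set."""
    return [[line for line, cw in assignment.items() if cw[j]] for j in range(n_pools)]


def decode(observed, assignment):
    """Return the line whose codeword is nearest (Hamming distance) to the observed pool pattern."""
    def distance(cw):
        return sum(a != b for a, b in zip(cw, observed))
    return min(assignment, key=lambda line: distance(assignment[line]))


lines = [f"line{i:02d}" for i in range(15)]
assignment = assign_lines(lines, hamming74_codebook())
true_pattern = assignment["line05"]
noisy = tuple(b ^ 1 if j == 2 else b for j, b in enumerate(true_pattern))  # one pool misread
print(decode(noisy, assignment))  # line05
```

Seven pools cover at most 15 lines with this particular code; larger collections would need a longer code and more pools.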
45
Cao CC, Li C, Huang Z, Ma X, Sun X. Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing. Genet Epidemiol 2013; 37:820-30. [PMID: 24166758 DOI: 10.1002/gepi.21769] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2013] [Revised: 09/09/2013] [Accepted: 09/27/2013] [Indexed: 01/19/2023]
Abstract
Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, although most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for the missing heritability. Identifying rare variants through large-scale resequencing is becoming increasingly important but remains prohibitively expensive despite the rapid decline in sequencing costs. Nevertheless, group testing-based overlapping pool sequencing, in which pooled rather than individual samples are sequenced, greatly reduces both the effort of sample preparation and the cost of screening for rare variants. Here, we propose an overlapping pool sequencing strategy to screen for rare variants with optimal sequencing depth, together with a corresponding cost model. We formulated a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Using the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing can be selected to minimize cost and guarantee accuracy. Because of the mixing constraint and the high depth required for pooled sequencing, the results showed that it was more cost-effective to divide a large population into smaller blocks that were tested independently with optimized strategies. Finally, we conducted an experiment to screen for carriers of variants with a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. Using overlapping pool sequencing, the cost of screening for carriers of variants with a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb.
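The abstract does not spell out the depth model, so the Python sketch below assumes the simplest version: reads from a pool of n diploid individuals sample alleles binomially, a single heterozygous carrier contributes an allele fraction p = 1/(2n), and sequencing errors are ignored. Under those assumptions, the probability of seeing at least r variant reads at depth d is P(X ≥ r) = 1 − Σ_{i<r} C(d, i) p^i (1 − p)^(d − i), and the minimum adequate depth can be found by direct search. The function names and parameters are hypothetical.

```python
from math import comb


def prob_at_least(r, depth, p):
    """P(X >= r) for X ~ Binomial(depth, p)."""
    return 1.0 - sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(r))


def min_depth(pool_size, min_reads, confidence, max_depth=100_000):
    """Smallest depth at which one heterozygous carrier among `pool_size` diploid
    individuals yields >= min_reads variant reads with probability >= confidence."""
    p = 1.0 / (2 * pool_size)  # allele fraction contributed by a single het carrier
    for depth in range(min_reads, max_depth + 1):
        if prob_at_least(min_reads, depth, p) >= confidence:
            return depth
    raise ValueError("confidence not reached within max_depth")


for n in (5, 10, 20):
    print(n, min_depth(pool_size=n, min_reads=3, confidence=0.99))
```

For small p the required depth grows roughly in proportion to pool size, which is consistent with the abstract's observation that splitting a large population into smaller blocks can be more cost-effective.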
Affiliation(s)
- Chang-Chang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
46
Eskin I, Hormozdiari F, Conde L, Riby J, Skibola CF, Eskin E, Halperin E. eALPS: estimating abundance levels in pooled sequencing using available genotyping data. J Comput Biol 2013; 20:861-77. [PMID: 24144111 DOI: 10.1089/cmb.2013.0105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/16/2023]
Abstract
Recent advances in high-throughput sequencing technologies make it possible to better characterize genetic variation in humans and other organisms. On many occasions, either by design or by necessity, sequencing is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. This scenario occurs naturally in metagenomics, where a pool of bacteria is sequenced, and in population studies that use DNA pools by design. In particular, various pooling designs have recently been suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that uncertainty in the DNA proportions of different individuals in the pools might lead to spurious associations. Fortunately, genotype data are often available for at least some of the individuals in the pool. Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data to accurately estimate the proportions of the samples in the pool, even when not all individuals in the pool have been genotyped (eALPS-LD). Using real data from a pooled sequencing study of non-Hodgkin's lymphoma, we demonstrate that estimating the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to quantifying species in metagenomic samples (eALPS-BCR) and is particularly suitable for metagenomic quantification of closely related species.
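The core estimation problem can be phrased as constrained least squares: if individual j has alternate-allele count g_ij ∈ {0, 1, 2} at SNP i and contributes a fraction w_j of the pooled DNA, the expected pooled allele frequency at SNP i is Σ_j w_j g_ij / 2, subject to w_j ≥ 0 and Σ_j w_j = 1. The Python sketch below fits w by projected gradient descent onto that simplex. It illustrates this simple linear model under the assumptions that every individual is genotyped and pooled frequencies are noise-free; it is not the eALPS algorithm itself and does not cover the LD-based (eALPS-LD) or metagenomic (eALPS-BCR) extensions. All names are hypothetical.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # ends as the threshold for the largest k meeting the condition
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(genotypes, pooled_freqs, n_iter=500):
    """genotypes[i][j]: alt-allele count (0/1/2) of individual j at SNP i.
    pooled_freqs[i]: alt-allele frequency measured in the pool at SNP i.
    Minimizes 0.5 * ||A w - y||^2 over the simplex, with A = genotypes / 2."""
    m, n = len(genotypes), len(genotypes[0])
    A = [[g / 2.0 for g in row] for row in genotypes]
    # Step size 1/L; the squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    step = 1.0 / sum(a * a for row in A for a in row)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        residual = [sum(a * x for a, x in zip(row, w)) - y for row, y in zip(A, pooled_freqs)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        w = project_to_simplex([x - step * g for x, g in zip(w, grad)])
    return w


# Synthetic check: 4 individuals with known proportions, 200 SNPs, noise-free pool.
rng = random.Random(1)
true_w = [0.4, 0.3, 0.2, 0.1]
G = [[rng.choice((0, 1, 1, 2)) for _ in range(4)] for _ in range(200)]
y = [sum(g / 2.0 * w for g, w in zip(row, true_w)) for row in G]
print([round(x, 3) for x in estimate_proportions(G, y)])  # should be close to true_w
```

Real pooled data add sampling noise in the allele frequencies, so in practice the fit would only approximate the true proportions.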
Affiliation(s)
- Itamar Eskin
- The Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel
47
Yan S, Wang N, Chen Z, Wang Y, He N, Peng Y, Li Q, Deng X. Genes encoding the production of extracellular polysaccharide bioflocculant are clustered on a 30-kb DNA segment in Bacillus licheniformis. Funct Integr Genomics 2013; 13:425-34. [DOI: 10.1007/s10142-013-0333-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/11/2013] [Revised: 07/18/2013] [Accepted: 08/12/2013] [Indexed: 10/26/2022]
48
Garvin MR, Saitoh K, Gharrett AJ. Application of single nucleotide polymorphisms to non-model species: a technical review. Mol Ecol Resour 2010; 10:915-34. [PMID: 21565101 DOI: 10.1111/j.1755-0998.2010.02891.x] [Citation(s) in RCA: 159] [Impact Index Per Article: 14.5] [Indexed: 01/23/2023]
Abstract
Single nucleotide polymorphisms (SNPs) have gained wide use in humans and model species and are becoming the marker of choice for applications in other species. Technology that was developed for work in model species may provide useful tools for SNP discovery and genotyping in non-model organisms. However, SNP discovery can be expensive, labour intensive, and introduce ascertainment bias. In addition, the most efficient approaches to SNP discovery will depend on the research questions that the markers are to resolve as well as the focal species. We discuss advantages and disadvantages of several past and recent technologies for SNP discovery and genotyping and summarize a variety of SNP discovery and genotyping studies in ecology and evolution.
Affiliation(s)
- M R Garvin
- Fisheries Division, School of Fisheries and Ocean Sciences, University of Alaska Fairbanks, 17101 Point Lena Loop Road, Juneau, AK 99801, USA
- National Research Institute of Fisheries Science, Fukuura, Kanazawa, Yokohama 236-8648, Japan
49
Téllez-Sosa J, Rodríguez MH, Gómez-Barreto RE, Valdovinos-Torres H, Hidalgo AC, Cruz-Hervert P, Luna RS, Carrillo-Valenzo E, Ramos C, García-García L, Martínez-Barnetche J. Using high-throughput sequencing to leverage surveillance of genetic diversity and oseltamivir resistance: a pilot study during the 2009 influenza A(H1N1) pandemic. PLoS One 2013; 8:e67010. [PMID: 23843978 PMCID: PMC3699567 DOI: 10.1371/journal.pone.0067010] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 09/26/2012] [Accepted: 05/17/2013] [Indexed: 01/13/2023]
Abstract
BACKGROUND Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The "deep sequencing" approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. METHODOLOGY AND PRINCIPAL FINDINGS We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high-depth, small-sample-number approach towards a low-depth, large-sample-number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced, at rather low depth, a 307-bp amplicon of the neuraminidase (NA) gene of the pandemic Influenza A(H1N1) (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November 2009, during the pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to the second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene; it also enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. CONCLUSIONS NGS of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights into viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the replacement of genetic variants observed during the second wave. As sequencing throughput rises and library multiplexing and automation improve, we foresee that the approach presented here can be scaled up for global genetic surveillance of influenza and other infectious diseases.
Affiliation(s)
- Juan Téllez-Sosa
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Mario Henry Rodríguez
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Rosa E. Gómez-Barreto
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Humberto Valdovinos-Torres
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Ana Cecilia Hidalgo
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Pablo Cruz-Hervert
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- René Santos Luna
- Centro de Información para Decisiones en Salud Pública, Instituto Nacional de Salud Pública, Cuernavaca, México
- Celso Ramos
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Lourdes García-García
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
- Jesús Martínez-Barnetche
- Centro de Investigaciones sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, México
50
Navon O, Sul JH, Han B, Conde L, Bracci PM, Riby J, Skibola CF, Eskin E, Halperin E. Rare variant association testing under low-coverage sequencing. Genetics 2013; 194:769-79. [PMID: 23636738 PMCID: PMC3697979 DOI: 10.1534/genetics.113.150169] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/05/2013] [Accepted: 04/17/2013] [Indexed: 01/15/2023]
Abstract
Deep sequencing technologies enable the study of the effects of rare variants on disease risk. While methods have been developed to increase statistical power for detecting such effects, detecting subtle associations requires studies with hundreds or thousands of individuals, which is prohibitively costly. Recently, low-coverage sequencing has been shown to effectively reduce the cost of genome-wide association studies using current sequencing technologies. However, current methods for disease association testing on rare variants cannot be applied directly to low-coverage sequencing data, as they require individual genotype data, which may not be called correctly because of low coverage and inherent sequencing errors. In this article, we propose two novel methods for detecting association of rare variants with disease risk using low-coverage, error-prone sequencing. We show by simulation that our methods outperform previous methods under both low- and high-coverage sequencing and under different disease architectures. We use real data and simulation studies to demonstrate that, to maximize the power to detect associations for a fixed budget, it is desirable to include more samples while lowering coverage, and to perform the analysis using our suggested methods.
Affiliation(s)
- Oron Navon
- Molecular Microbiology and Biotechnology Department, Tel-Aviv University, Tel Aviv 69978, Israel
- Jae Hoon Sul
- Computer Science Department, University of California, Los Angeles, California 90095
- Buhm Han
- Division of Genetics, Brigham & Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142
- Lucia Conde
- Department of Epidemiology, School of Public Health, and the Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Paige M. Bracci
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94107
- Jacques Riby
- Department of Epidemiology, School of Public Health, and the Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Christine F. Skibola
- Department of Epidemiology, School of Public Health, and the Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Eleazar Eskin
- Computer Science Department, University of California, Los Angeles, California 90095
- Department of Human Genetics, University of California, Los Angeles, California 90095
- Eran Halperin
- Molecular Microbiology and Biotechnology Department, Tel-Aviv University, Tel Aviv 69978, Israel
- The Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
- International Computer Science Institute, Berkeley, California 94704