1. Ramezani M, Weisbart E, Bauman J, Singh A, Yong J, Lozada M, Way GP, Kavari SL, Diaz C, Leardini E, Jetley G, Pagnotta J, Haghighi M, Batista TM, Pérez-Schindler J, Claussnitzer M, Singh S, Cimini BA, Blainey PC, Carpenter AE, Jan CH, Neal JT. A genome-wide atlas of human cell morphology. Nat Methods 2025:10.1038/s41592-024-02537-7. [PMID: 39870862 DOI: 10.1038/s41592-024-02537-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 10/25/2024] [Indexed: 01/29/2025]
Abstract
A key challenge of the modern genomics era is developing empirical data-driven representations of gene function. Here we present the first unbiased morphology-based genome-wide perturbation atlas in human cells, containing three genome-wide genotype-phenotype maps comprising CRISPR-Cas9-based knockouts of >20,000 genes in >30 million cells. Our optical pooled cell profiling platform (PERISCOPE) combines a destainable high-dimensional phenotyping panel (based on Cell Painting) with optical sequencing of molecular barcodes and a scalable open-source analysis pipeline to facilitate massively parallel screening of pooled perturbation libraries. This perturbation atlas comprises high-dimensional phenotypic profiles of individual cells with sufficient resolution to cluster thousands of human genes, reconstruct known pathways and protein-protein interaction networks, interrogate subcellular processes and identify culture media-specific responses. Using this atlas, we identify the poorly characterized disease-associated TMEM251/LYSET as a Golgi-resident transmembrane protein essential for mannose-6-phosphate-dependent trafficking of lysosomal enzymes. In sum, this perturbation atlas and screening platform represents a rich and accessible resource for connecting genes to cellular functions at scale.
Affiliation(s)
- Meraj Ramezani
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Erin Weisbart
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Julia Bauman
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanford University, Stanford, CA, USA
- Avtar Singh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genentech Department of Cellular and Tissue Genomics, South San Francisco, CA, USA
- John Yong
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Maria Lozada
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gregory P Way
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sanam L Kavari
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- University of Pennsylvania, Philadelphia, PA, USA
- Celeste Diaz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanford University, Stanford, CA, USA
- Eddy Leardini
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gunjan Jetley
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jenlu Pagnotta
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Thiago M Batista
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Joaquín Pérez-Schindler
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Melina Claussnitzer
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Beth A Cimini
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Paul C Blainey
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biological Engineering, MIT, Cambridge, MA, USA
- Koch Institute for Integrative Research, MIT, Cambridge, MA, USA
- Calvin H Jan
- Calico Life Sciences LLC, South San Francisco, CA, USA
- James T Neal
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA.
2. Tonelli A, Cousin P, Gambetta MC. Protocol for detecting genomic insulators in Drosophila using insulator-seq, a massively parallel reporter assay. STAR Protoc 2024; 5:103391. [PMID: 39453817 PMCID: PMC11541826 DOI: 10.1016/j.xpro.2024.103391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 09/10/2024] [Accepted: 09/24/2024] [Indexed: 10/27/2024] Open
Abstract
Genomic insulators are DNA elements that prevent transcriptional activation of a promoter by an enhancer when interposed. We present a protocol for insulator-seq that enables high-throughput screening of genomic insulators using a plasmid-based massively parallel reporter assay in Drosophila cultured cells. We describe steps for insulator reporter plasmid library generation, transient transfection into cultured cells, and sequencing library preparation and provide a pipeline for data analysis. For complete details on the use and execution of this protocol, please refer to Tonelli et al.¹
Affiliation(s)
- Anastasiia Tonelli
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.
- Pascal Cousin
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
3. Shang Y, Wang Z, Xi L, Wang Y, Liu M, Feng Y, Wang J, Wu Q, Xiang X, Chen M, Ding Y. Droplet-based single-cell sequencing: Strategies and applications. Biotechnol Adv 2024; 77:108454. [PMID: 39271031 DOI: 10.1016/j.biotechadv.2024.108454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 08/22/2024] [Accepted: 09/10/2024] [Indexed: 09/15/2024]
Abstract
Notable advancements in single-cell omics technologies have not only addressed longstanding challenges but also enabled studies of cellular heterogeneity at unprecedented resolution and scale. These strides have led to groundbreaking insights into complex biological systems, paving the way for a more profound comprehension of human biology and diseases. Droplet microfluidic technology has become a crucial component in many single-cell sequencing workflows in terms of throughput, cost-effectiveness, and automation. Utilizing a microfluidic chip to encapsulate and profile individual cells within droplets has significantly improved single-cell research. Therefore, this review aims to comprehensively describe droplet microfluidics-assisted omics methods from a single-cell perspective. The strategies for using droplet microfluidics in genomics, epigenomics, transcriptomics, and proteomics analyses are first introduced. On this basis, the focus then turns to the latest applications of this technology in different sequencing patterns, including mono- and multi-omics. Finally, the challenges and further perspectives of droplet-based single-cell sequencing in both foundational research and commercial applications are discussed.
Affiliation(s)
- Yuting Shang
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Zhengzheng Wang
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Liqing Xi
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Yantao Wang
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Meijing Liu
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Ying Feng
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
- Juan Wang
- College of Food Science, South China Agricultural University, Guangzhou 510432, China
- Qingping Wu
- National Health Commission Science and Technology Innovation Platform for Nutrition and Safety of Microbial Food, Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
- Xinran Xiang
- Jiangsu Key Laboratory of Huaiyang Food Safety and Nutrition Function Evaluation, Jiangsu Collaborative Innovation Center of Regional Modern Agriculture & Environmental Protection, Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, School of Life Science, Huaiyin Normal University, Huai'an 223300, China; Fujian Key Laboratory of Aptamers Technology, Fuzhou General Clinical Medical School (the 900th Hospital), Fujian Medical University, Fuzhou 350001, China.
- Moutong Chen
- National Health Commission Science and Technology Innovation Platform for Nutrition and Safety of Microbial Food, Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China.
- Yu Ding
- Department of Food Science & Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China.
4. Kauer L, Imholt C, Jacob J, Berens C, Kühn R. Seasonal shifts and land-use impact: unveiling the gut microbiomes of bank voles (Myodes glareolus) and common voles (Microtus arvalis). FEMS Microbiol Ecol 2024; 100:fiae159. [PMID: 39611357 DOI: 10.1093/femsec/fiae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 10/28/2024] [Accepted: 11/27/2024] [Indexed: 11/30/2024] Open
Abstract
Gut microbial diversity influences the health and vitality of the host, yet it is itself affected by internal and external factors, including land-use. The impact of land-use practices on wild rodents' gut microbiomes remains understudied, despite their abundance and potential as reservoirs for zoonotic pathogens. We examined the bacterial and fungal gut microbiomes of bank voles (Myodes glareolus) and common voles (Microtus arvalis) across grassland and forest habitats with varying land-use intensities and types. We collected rodents seasonally and used 16S rRNA and ITS amplicon sequencing for microbe identification. We found significant differences in alpha and beta diversities between the species, with M. arvalis exhibiting higher diversity. Seasonality emerged as a prominent factor influencing microbial diversity, with significant variations between sampling months. While land-use affects the gut microbiome, its impact is subordinate to seasonal variations. Differential abundance analysis underscores the dynamic nature of microbial composition, with seasonal changes playing a predominant role. Overall, our findings highlight the significant influence of seasonality on gut microbiome diversity and composition in wild rodents, reflecting dietary shifts associated with seasonal changes. Understanding the interplay between environmental factors and microbial communities in wild rodents enhances our knowledge of ecosystem health and resilience, warranting further investigation.
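The abstract compares alpha and beta diversity but does not name the metrics used. As a rough illustration only, the sketch below assumes the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, both common choices for amplicon count data. The function names and the toy counts are hypothetical and are not taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Toy read counts per taxon for two hypothetical samples (not real data).
sample_a = [40, 30, 20, 10]
sample_b = [70, 20, 10, 0]

alpha_a = shannon_index(sample_a)
alpha_b = shannon_index(sample_b)
beta_ab = bray_curtis(sample_a, sample_b)
```

In this toy case the more even sample (`sample_a`) gets the higher Shannon value, and the Bray-Curtis value ranges from 0 for identical samples to 1 for samples that share no taxa.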
Affiliation(s)
- Lea Kauer
- Molecular Zoology, Department of Zoology, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Christian Imholt
- Julius Kühn-Institute, Federal Research Centre for Cultivated Plants, Institute for Epidemiology and Pathogen Diagnostics, Rodent Research, 48161 Münster, Germany
- Jens Jacob
- Julius Kühn-Institute, Federal Research Centre for Cultivated Plants, Institute for Epidemiology and Pathogen Diagnostics, Rodent Research, 48161 Münster, Germany
- Christian Berens
- Friedrich-Loeffler-Institut, Institute of Molecular Pathogenesis, 07743 Jena, Germany
- Ralph Kühn
- Molecular Zoology, Department of Zoology, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Department of Fish, Wildlife and Conservation Ecology, New Mexico State University, 8803 Las Cruces, NM, United States
5. Asami S, Yin C, Garza LA, Kalhor R. Deconvolving organogenesis in space and time via spatial transcriptomics in thick tissues. bioRxiv 2024:2024.09.24.614640. [PMID: 39386671 PMCID: PMC11463617 DOI: 10.1101/2024.09.24.614640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Organ development is guided by a space-time landscape that constrains cell behavior. This landscape is challenging to characterize for the hair follicle - the most abundant mini organ - due to its complex microscopic structure and asynchronous development. We developed 3DEEP, a tissue clearing and spatial transcriptomic strategy for characterizing tissue blocks up to 400 µm in thickness. We captured 371 hair follicles at different stages of organogenesis in 1 mm³ of skin of a 12-hour-old mouse with 6 million transcripts from 81 genes. From this single time point, we deconvoluted follicles by age based on whole-organ molecular pseudotimes to animate a stop-motion 3D atlas of follicle development along its trajectory. We defined molecular stages for hair follicle organogenesis and characterized the order of emergence for its structures, differential signaling dynamics at its top and bottom, morphogen shifts preceding and accompanying structural changes, and series of structural changes leading to the formation of its canal and opening. We further found that hair follicle stem cells and their niche are established and stratified early in organogenesis, before the formation of the hair bulb. Overall, this work demonstrates the power of increased depth of spatial transcriptomics to provide a four-dimensional analysis of organogenesis.
Affiliation(s)
- Soichiro Asami
- Department of Biomedical Engineering, Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chenshuo Yin
- Department of Biomedical Engineering, Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Luis A. Garza
- Department of Dermatology, Department of Cell Biology, Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Reza Kalhor
- Department of Biomedical Engineering, Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Molecular Biology and Genetics, Department of Medicine, Department of Neuroscience, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
6. Berezin CT, Peccoud S, Kar DM, Peccoud J. Cryptographic approaches to authenticating synthetic DNA sequences. Trends Biotechnol 2024; 42:1002-1016. [PMID: 38418329 PMCID: PMC11309913 DOI: 10.1016/j.tibtech.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/01/2024]
Abstract
In a bioeconomy that relies on synthetic DNA sequences, the ability to ensure their authenticity is critical. DNA watermarks can encode identifying data in short sequences and can be combined with error correction and encryption protocols to ensure that sequences are robust to errors and securely communicated. New digital signature techniques allow for public verification that a sequence has not been modified and can contain sufficient information for synthetic DNA to be self-documenting. In translating these techniques from bacteria to more complex genetically modified organisms (GMOs), special considerations must be made to allow for public verification of these products. We argue that these approaches should be widely implemented to assert authorship, increase the traceability, and detect the unauthorized use of synthetic DNA.
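To make the idea of encoding identifying data in a short DNA sequence concrete, here is a minimal sketch. It assumes a naive 2-bits-per-base mapping (A=00, C=01, G=10, T=11), which is an illustrative choice and not a scheme described in the article. The SHA-256 digest is only a tamper check, not a digital signature: public verification as discussed in the abstract would need an asymmetric signature scheme, which is not shown here. All function names are hypothetical.

```python
import hashlib

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode a sequence produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def sequence_digest(seq: str) -> str:
    """SHA-256 hex digest of a sequence; changes if any base is altered."""
    return hashlib.sha256(seq.upper().encode("ascii")).hexdigest()


watermark = bytes_to_dna(b"lab-42")   # hypothetical identifier
recovered = dna_to_bytes(watermark)   # round-trips to the original bytes
fingerprint = sequence_digest(watermark)
```

A real watermark would also need error correction and would have to respect biological constraints such as avoiding long homopolymers or unwanted coding changes, none of which this sketch attempts.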
Affiliation(s)
- Casey-Tyler Berezin
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Samuel Peccoud
- Department of Electrical Engineering, Colorado State University, Fort Collins, CO, USA
- Diptendu M Kar
- Department of Computer Sciences, Northeastern University, Boston, MA, USA
- Jean Peccoud
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA; Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA; School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA; Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA.
7. Yu T, Ren Z, Gao X, Li G, Han R. Generating barcodes for nanopore sequencing data with PRO. Fundamental Research 2024; 4:785-794. [PMID: 39660352 PMCID: PMC11630701 DOI: 10.1016/j.fmre.2024.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 02/20/2024] [Accepted: 04/09/2024] [Indexed: 12/12/2024] Open
Abstract
DNA barcodes, short and unique DNA sequences, play a crucial role in sample identification when processing many samples simultaneously, which helps reduce experimental costs. Nevertheless, the low quality of long-read sequencing makes it difficult to identify barcodes accurately, which poses significant challenges for the design of barcodes for large numbers of samples in a single sequencing run. Here, we present a comprehensive study of the generation of barcodes and develop a tool, PRO, that can be used for selecting optimal barcode sets and demultiplexing. We formulate the barcode design problem as a combinatorial problem and prove that finding the optimal largest barcode set in a given DNA sequence space in which all sequences have the same length is theoretically NP-complete. For practical applications, we developed the novel method PRO by introducing the probability divergence between two DNA sequences to expand the capacity of barcode kits while ensuring demultiplexing accuracy. Specifically, the maximum size of the barcode kits designed by PRO is 2,292, which keeps the length of barcodes the same as that of the official ones used by Oxford Nanopore Technologies (ONT). We validated the performance of PRO on a simulated nanopore dataset with high error rates. The demultiplexing accuracy of PRO reached 98.29% for a barcode kit of size 2,922, 4.31% higher than that of Guppy, the official demultiplexing tool. When the size of the barcode kit generated by PRO is the same as the official size provided by ONT, both tools show superior and comparable demultiplexing accuracy.
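The barcode design problem described above can be illustrated with a simple greedy heuristic. The sketch below is not PRO: it swaps in plain Hamming distance for the probability divergence the abstract mentions, and it enumerates all sequences of a small fixed length, which is only feasible for toy sizes. The function names and parameters are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length: int, min_dist: int, alphabet: str = "ACGT"):
    """Greedily keep every candidate at least min_dist from all kept barcodes.

    A heuristic only: since the optimal set is hard to find, this gives a
    valid barcode set but not necessarily the largest one.
    """
    chosen = []
    for cand in product(alphabet, repeat=length):
        seq = "".join(cand)
        if all(hamming(seq, c) >= min_dist for c in chosen):
            chosen.append(seq)
    return chosen


def demultiplex(read: str, barcodes, max_dist: int):
    """Assign a read to its nearest barcode, or None if it is too far away."""
    best = min(barcodes, key=lambda b: hamming(read, b))
    return best if hamming(read, best) <= max_dist else None


kit = greedy_barcodes(length=4, min_dist=3)
assigned = demultiplex("CAAA", kit, max_dist=1)    # one substitution from "AAAA"
unassigned = demultiplex("CAAA", kit, max_dist=0)  # exact matches only
```

With a minimum pairwise distance of 3, any read carrying a single substitution is still closer to its true barcode than to any other, which is why the first call can be resolved. Nanopore reads also contain insertions and deletions, so a practical tool would need an alignment-aware or probabilistic measure rather than Hamming distance.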
Affiliation(s)
- Ting Yu
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Shandong 266000, China
- Zitong Ren
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Shandong 266000, China
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division & Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Shandong 266000, China
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Shandong 266000, China
8. Alcantar MA, English MA, Valeri JA, Collins JJ. A high-throughput synthetic biology approach for studying combinatorial chromatin-based transcriptional regulation. Mol Cell 2024; 84:2382-2396.e9. [PMID: 38906116 DOI: 10.1016/j.molcel.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 04/11/2024] [Accepted: 05/24/2024] [Indexed: 06/23/2024]
Abstract
The construction of synthetic gene circuits requires the rational combination of multiple regulatory components, but predicting their behavior can be challenging due to poorly understood component interactions and unexpected emergent behaviors. In eukaryotes, chromatin regulators (CRs) are essential regulatory components that orchestrate gene expression. Here, we develop a screening platform to investigate the impact of CR pairs on transcriptional activity in yeast. We construct a combinatorial library consisting of over 1,900 CR pairs and use a high-throughput workflow to characterize the impact of CR co-recruitment on gene expression. We recapitulate known interactions and discover several instances of CR pairs with emergent behaviors. We also demonstrate that supervised machine learning models trained with low-dimensional amino acid embeddings accurately predict the impact of CR co-recruitment on transcriptional activity. This work introduces a scalable platform and machine learning approach that can be used to study how networks of regulatory components impact gene expression.
Affiliation(s)
- Miguel A Alcantar
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA
- Max A English
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA
- Jacqueline A Valeri
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- James J Collins
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
9. Huang X, Li X, Tay A. Advances in techniques to characterize cell-nanomaterial interactions (CNI). Nano Today 2024; 55:102149. [DOI: 10.1016/j.nantod.2024.102149] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/05/2025]
10. Al'Khafaji AM, Smith JT, Garimella KV, Babadi M, Popic V, Sade-Feldman M, Gatzen M, Sarkizova S, Schwartz MA, Blaum EM, Day A, Costello M, Bowers T, Gabriel S, Banks E, Philippakis AA, Boland GM, Blainey PC, Hacohen N. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 2024; 42:582-586. [PMID: 37291427 DOI: 10.1038/s41587-023-01815-7] [Citation(s) in RCA: 51] [Impact Index Per Article: 51.0] [Received: 10/01/2021] [Accepted: 05/02/2023] [Indexed: 06/10/2023]
Abstract
Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
Affiliation(s)
- Moshe Sade-Feldman
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Marc A Schwartz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatric Oncology, Dana Farber Cancer Institute, Boston, MA, USA
- Emily M Blaum
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Allyson Day
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tera Bowers
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric Banks
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genevieve M Boland
- Division of Surgical Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Paul C Blainey
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Koch Institute for Integrative Cancer Research at the Massachusetts Institute of Technology, Cambridge, MA, USA.
- Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Medicine, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
- Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Charlestown, MA, USA.
11. Mihai IS, Chafle S, Henriksson J. Representing and extracting knowledge from single-cell data. Biophys Rev 2024; 16:29-56. [PMID: 38495441 PMCID: PMC10937862 DOI: 10.1007/s12551-023-01091-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/22/2023] [Accepted: 06/28/2023] [Indexed: 03/19/2024] Open
Abstract
Single-cell analysis is currently one of the most high-resolution techniques to study biology. The large complex datasets that have been generated have spurred numerous developments in computational biology, in particular the use of advanced statistics and machine learning. This review attempts to explain the deeper theoretical concepts that underpin current state-of-the-art analysis methods. Single-cell analysis is covered from cell, through instruments, to current and upcoming models. The aim of this review is to spread concepts which are not yet in common use, especially from topology and generative processes, and how new statistical models can be developed to capture more of biology. This opens epistemological questions regarding our ontology and models, and some pointers will be given to how natural language processing (NLP) may help overcome our cognitive limitations for understanding single-cell data.
Affiliation(s)
- Ionut Sebastian Mihai
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Industrial Doctoral School, Umeå University, Umeå, Sweden
- Sarang Chafle
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Johan Henriksson
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
12. Handler K, Bach K, Borrelli C, Piscuoglio S, Ficht X, Acar IE, Moor AE. Fragment-sequencing unveils local tissue microenvironments at single-cell resolution. Nat Commun 2023; 14:7775. [PMID: 38012149 PMCID: PMC10681997 DOI: 10.1038/s41467-023-43005-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/23/2023] [Accepted: 10/27/2023] [Indexed: 11/29/2023] Open
Abstract
Cells collectively determine biological functions by communicating with each other, both through direct physical contact and through secreted factors. Consequently, the local microenvironment of a cell influences its behavior, gene expression, and cellular crosstalk. Disruption of this microenvironment causes reciprocal changes in those features, which can lead to the development and progression of diseases. Hence, assessing the cellular transcriptome while simultaneously capturing the spatial relationships of cells within a tissue provides highly valuable insights into how cells communicate in health and disease. Yet, methods to probe the transcriptome often fail to preserve native spatial relationships, lack single-cell resolution, or are highly limited in throughput, i.e., lack the capacity to assess multiple environments simultaneously. Here, we introduce fragment-sequencing (fragment-seq), a method that enables the characterization of single-cell transcriptomes within multiple spatially distinct tissue microenvironments. We apply fragment-seq to a murine model of the metastatic liver to study liver zonation and the metastatic niche. This analysis reveals zonated genes and ligand-receptor interactions enriched in specific hepatic microenvironments. Finally, we apply fragment-seq to other tissues and species, demonstrating the adaptability of our method.
Affiliation(s)
- Kristina Handler
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland
- Karsten Bach
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland
| | - Costanza Borrelli
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland
| | - Salvatore Piscuoglio
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Visceral Surgery and Precision Medicine Research Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Xenia Ficht
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland
| | - Ilhan E Acar
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland
| | - Andreas E Moor
- Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, 4056, Basel, Switzerland.
|
13
|
Trauernicht M, Rastogi C, Manzo S, Bussemaker H, van Steensel B. Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements. Nucleic Acids Res 2023; 51:9690-9702. [PMID: 37650627 PMCID: PMC10570033 DOI: 10.1093/nar/gkad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 07/24/2023] [Accepted: 08/22/2023] [Indexed: 09/01/2023] Open
Abstract
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
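As a reading aid, the generic form of a log-linear model of the kind the abstract mentions can be written as below. This is only a sketch of the model class: the feature functions x_i (for example, binding site affinity, copy number, or spacer length) and the coefficients β_i are illustrative symbols assumed here, not the authors' published parameterization.

```latex
% Generic log-linear model: the log of reporter activity A for a sequence s
% is a weighted sum of sequence features x_i(s) (hypothetical features).
\log A(s) = \beta_0 + \sum_{i=1}^{n} \beta_i \, x_i(s)
\quad\Longleftrightarrow\quad
A(s) = e^{\beta_0} \prod_{i=1}^{n} e^{\beta_i x_i(s)}
```

In a model of this form each feature contributes a multiplicative factor to activity, so a unit change in x_i scales the predicted activity by e^{β_i} regardless of the other features.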
Affiliation(s)
- Max Trauernicht
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Stefano G Manzo
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Department of Biosciences, University of Milan “La Statale”, 20133 Milan, Italy
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
|
14
|
Ramezani M, Bauman J, Singh A, Weisbart E, Yong J, Lozada M, Way GP, Kavari SL, Diaz C, Haghighi M, Batista TM, Pérez-Schindler J, Claussnitzer M, Singh S, Cimini BA, Blainey PC, Carpenter AE, Jan CH, Neal JT. A genome-wide atlas of human cell morphology. bioRxiv 2023:2023.08.06.552164. [PMID: 37609130 PMCID: PMC10441312 DOI: 10.1101/2023.08.06.552164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
A key challenge of the modern genomics era is developing data-driven representations of gene function. Here, we present the first unbiased morphology-based genome-wide perturbation atlas in human cells, containing three genome-scale genotype-phenotype maps comprising >20,000 single-gene CRISPR-Cas9-based knockout experiments in >30 million cells. Our optical pooled cell profiling approach (PERISCOPE) combines a de-stainable high-dimensional phenotyping panel (based on Cell Painting) with optical sequencing of molecular barcodes and a scalable open-source analysis pipeline to facilitate massively parallel screening of pooled perturbation libraries. This approach provides high-dimensional phenotypic profiles of individual cells, while simultaneously enabling interrogation of subcellular processes. Our atlas reconstructs known pathways and protein-protein interaction networks, identifies culture media-specific responses to gene knockout, and clusters thousands of human genes by phenotypic similarity. Using this atlas, we identify the poorly characterized disease-associated transmembrane protein TMEM251/LYSET as a Golgi-resident protein essential for mannose-6-phosphate-dependent trafficking of lysosomal enzymes, showing the power of these representations. In sum, our atlas and screening technology represent a rich and accessible resource for connecting genes to cellular functions at scale.
Affiliation(s)
- Meraj Ramezani
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Julia Bauman
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Current address: Stanford University, Stanford, CA, USA
| | - Avtar Singh
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Current address: Genentech Department of Cellular and Tissue Genomics, South San Francisco, CA, USA
| | - Erin Weisbart
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
| | - John Yong
- Calico Life Sciences LLC, South San Francisco, CA, USA
| | - Maria Lozada
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gregory P Way
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Current address: Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
| | - Sanam L Kavari
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Current address: University of Pennsylvania, Philadelphia, PA, USA
| | - Celeste Diaz
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Current address: Stanford University, Stanford, CA, USA
| | | | - Thiago M Batista
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA, USA
| | - Joaquín Pérez-Schindler
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA, USA
| | - Melina Claussnitzer
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | | | - Beth A Cimini
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
| | - Paul C Blainey
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- MIT Department of Biological Engineering, Cambridge, MA, USA
- Koch Institute for Integrative Research at MIT, Cambridge, MA, USA
| | | | - Calvin H Jan
- Calico Life Sciences LLC, South San Francisco, CA, USA
| | - James T Neal
- Broad Institute of MIT & Harvard, Cambridge, MA, USA
- Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA, USA
|
15
|
Lim CK, Yeoh JW, Kunartama AA, Yew WS, Poh CL. A biological camera that captures and stores images directly into DNA. Nat Commun 2023; 14:3921. [PMID: 37400476 DOI: 10.1038/s41467-023-38876-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/19/2022] [Accepted: 05/19/2023] [Indexed: 07/05/2023] Open
Abstract
The increasing integration between biological and digital interfaces has led to heightened interest in utilizing biological materials to store digital data, with the most promising approach involving the storage of data within defined sequences of DNA that are created by de novo DNA synthesis. However, there is a lack of methods that can obviate the need for de novo DNA synthesis, which tends to be costly and inefficient. Here, we detail a method of capturing 2-dimensional light patterns into DNA by utilizing optogenetic circuits to record light exposure into DNA, encoding spatial locations with barcoding, and retrieving stored images via high-throughput next-generation sequencing. We demonstrate the encoding of multiple images into DNA, totaling 1152 bits, selective image retrieval, as well as robustness to drying, heat and UV. We also demonstrate successful multiplexing using multiple wavelengths of light, capturing 2 different images simultaneously using red and blue light. This work thus establishes a 'living digital camera', paving the way towards integrating biological systems with digital devices.
Affiliation(s)
- Cheng Kai Lim
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), NUS Graduate School, National University of Singapore, Singapore, Singapore
| | - Jing Wui Yeoh
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
| | - Aurelius Andrew Kunartama
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
| | - Wen Shan Yew
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore
| | - Chueh Loo Poh
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore.
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore.
|
16
|
Ford K, Munson BP, Fong SH, Panwala R, Chu WK, Rainaldi J, Plongthongkum N, Arunachalam V, Kostrowicki J, Meluzzi D, Kreisberg JF, Jensen-Pergakes K, VanArsdale T, Paul T, Tamayo P, Zhang K, Bienkowska J, Mali P, Ideker T. Multimodal perturbation analyses of cyclin-dependent kinases reveal a network of synthetic lethalities associated with cell-cycle regulation and transcriptional regulation. Sci Rep 2023; 13:7678. [PMID: 37169829 PMCID: PMC10175263 DOI: 10.1038/s41598-023-33329-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 04/11/2023] [Indexed: 05/13/2023] Open
Abstract
Cell-cycle control is accomplished by cyclin-dependent kinases (CDKs), motivating extensive research into CDK-targeting small-molecule drugs as cancer therapeutics. Here we use combinatorial CRISPR/Cas9 perturbations to uncover an extensive network of functional interdependencies among CDKs and related factors, identifying 43 synthetic-lethal and 12 synergistic interactions. We dissect CDK perturbations using single-cell RNA-seq, for which we develop a novel computational framework to precisely quantify cell-cycle effects and diverse cell states orchestrated by specific CDKs. While pairwise disruption of CDK4/6 is synthetic-lethal, only CDK6 is required for normal cell-cycle progression and transcriptional activation. Multiple CDKs (CDK1/7/9/12) are synthetic-lethal in combination with PRMT5, independent of cell-cycle control. In-depth analysis of mRNA expression and splicing patterns provides multiple lines of evidence that the CDK-PRMT5 dependency is due to aberrant transcriptional regulation resulting in premature termination. These interdependencies translate to drug-drug synergies, with therapeutic implications in cancer and other diseases.
Affiliation(s)
- Kyle Ford
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Brenton P Munson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
| | - Samson H Fong
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
| | - Rebecca Panwala
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Wai Keung Chu
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Joseph Rainaldi
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Biomedical Sciences Program, University of California San Diego, La Jolla, CA, 92093, USA
| | - Nongluk Plongthongkum
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | | | | | - Dario Meluzzi
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
| | - Jason F Kreisberg
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
| | | | - Todd VanArsdale
- Pfizer Inc, 10555 Science Center Drive, San Diego, CA, 92121, USA
| | - Thomas Paul
- Pfizer Inc, 10555 Science Center Drive, San Diego, CA, 92121, USA
| | - Pablo Tamayo
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
| | - Kun Zhang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | | | - Prashant Mali
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.
| | - Trey Ideker
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA.
|
17
|
Reinbold C, Kong KYE, Kats I, Khmelinskii A, Knop M. Multiplexed protein stability (MPS) profiling of terminal degrons using fluorescent timer libraries in Saccharomyces cerevisiae. Methods Enzymol 2023; 686:321-344. [PMID: 37532406 DOI: 10.1016/bs.mie.2023.02.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023]
Abstract
N-terminal protein sequences and their proteolytic processing and modifications influence the stability and turnover of proteins by creating potential degrons for cellular proteolytic pathways. Understanding the impact of genetic perturbations of components that affect the processing of protein N-termini, and thereby their stability, requires methods compatible with proteome-wide studies of many N-termini simultaneously. Tandem fluorescent timers (tFT) allow the in vivo measurement of protein turnover completely independent of protein abundance and can be deployed for proteome-wide studies. Here we present a protocol for Multiplexed Protein Stability (MPS) profiling of tFT libraries encoding large numbers of different protein N-termini fused to tFT in the yeast Saccharomyces cerevisiae. This protocol includes fluorescence cell sorting-based profiling of these libraries using a pooling approach. The sorted pools are analyzed by multiplexed deep sequencing in order to generate a stability index for each N-terminal peptide fused to the tFT reporter, and to evaluate half-life changes across all species represented in the library.
Affiliation(s)
- Christian Reinbold
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
| | | | - Ilia Kats
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | | | - Michael Knop
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Cell Morphogenesis and Signal Transduction, German Cancer Research Center (DKFZ), DKFZ-ZMBH Alliance, Heidelberg, Germany.
|
18
|
He PC, Wei J, Dou X, Harada BT, Zhang Z, Ge R, Liu C, Zhang LS, Yu X, Wang S, Lyu R, Zou Z, Chen M, He C. Exon architecture controls mRNA m6A suppression and gene expression. Science 2023; 379:677-682. [PMID: 36705538 PMCID: PMC9990141 DOI: 10.1126/science.abj9090] [Citation(s) in RCA: 101] [Impact Index Per Article: 50.5] [Received: 06/09/2021] [Accepted: 01/16/2023] [Indexed: 01/28/2023]
Abstract
N6-methyladenosine (m6A) is the most abundant messenger RNA (mRNA) modification and plays crucial roles in diverse physiological processes. Using a massively parallel assay for m6A (MPm6A), we discover that m6A specificity is globally regulated by suppressors that prevent m6A deposition in unmethylated transcriptome regions. We identify exon junction complexes (EJCs) as m6A suppressors that protect exon junction-proximal RNA within coding sequences from methylation and regulate mRNA stability through m6A suppression. EJC suppression of m6A underlies multiple global characteristics of mRNA m6A specificity, with the local range of EJC protection sufficient to suppress m6A deposition in average-length internal exons but not in long internal and terminal exons. EJC-suppressed methylation sites colocalize with EJC-suppressed splice sites, which suggests that exon architecture broadly determines local mRNA accessibility to regulatory complexes.
Affiliation(s)
- P. Cody He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Committee on Immunology, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Jiangbo Wei
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Xiaoyang Dou
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Bryan T. Harada
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Zijie Zhang
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
- State Key Laboratory for Conservation and Utilization of Bio-Resources, School of Life Sciences, Yunnan University, Kunming, Yunnan 650091, China
| | - Ruiqi Ge
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Chang Liu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Li-Sheng Zhang
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Xianbin Yu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Shuai Wang
- Department of Neurobiology, The University of Chicago, Chicago, IL 60637, USA
| | - Ruitu Lyu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Zhongyu Zou
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Mengjie Chen
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
| | - Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Committee on Immunology, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
|
19
|
Kijima Y, Evans-Yamamoto D, Toyoshima H, Yachie N. A universal sequencing read interpreter. Sci Adv 2023; 9:eadd2793. [PMID: 36598975 PMCID: PMC9812397 DOI: 10.1126/sciadv.add2793] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/01/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Massively parallel DNA sequencing has led to the rapid growth of highly multiplexed experiments in biology. These experiments produce unique sequencing results that require specific analysis pipelines to decode highly structured reads. However, no versatile framework that interprets sequencing reads to extract their encoded information for downstream biological analysis has been developed. Here, we report INTERSTELLAR (interpretation, scalable transformation, and emulation of large-scale sequencing reads) that decodes data values encoded in theoretically any type of sequencing read and translates them into sequencing reads of another structure of choice. We demonstrated that INTERSTELLAR successfully extracted information from a range of short- and long-read sequencing reads and translated those of single-cell (sc)RNA-seq, scATAC-seq, and spatial transcriptomics to be analyzed by different software tools that have been developed for conceptually the same types of experiments. INTERSTELLAR will greatly facilitate the development of sequencing-based experiments and sharing of data analysis pipelines.
Affiliation(s)
- Yusuke Kijima
- School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
- Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
| | - Daniel Evans-Yamamoto
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0035, Japan
| | - Hiromi Toyoshima
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
| | - Nozomu Yachie
- School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
|
20
|
Press WH. Fast trimer statistics facilitate accurate decoding of large random DNA barcode sets even at large sequencing error rates. PNAS Nexus 2022; 1:pgac252. [PMID: 36712375 PMCID: PMC9802387 DOI: 10.1093/pnasnexus/pgac252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 10/31/2022] [Indexed: 11/06/2022]
Abstract
Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲10⁴ barcodes. We here consider the use of random barcodes of sufficient length that they remain accurately decodable even with ≳6 errors and even at [Formula: see text] or 20% nucleotide error rates. We show that length ∼34 nt is sufficient even with ≳10⁶ barcodes. The obvious objection to this scheme is that it requires comparing every read to every possible barcode by a slow Levenshtein or Needleman-Wunsch comparison. We show that several orders of magnitude speedup can be achieved by (i) a fast triage method that compares only trimer (three consecutive nucleotide) occurrence statistics, precomputed in linear time for both reads and barcodes, and (ii) the massive parallelism available on even today's commodity-grade graphics processing units (GPUs). With 10⁶ barcodes of length 34 and 10% DNA errors (substitutions and indels), we achieve in simulation 99.9% precision (decode accuracy) with 98.8% recall (read acceptance rate). Similarly high precision with somewhat smaller recall is achievable even with 20% DNA errors. The amortized computation cost on a commodity workstation with two GPUs (2022 capability and price) is estimated as between US$ 0.15 and US$ 0.60 per million decoded reads.
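The two-stage idea in this abstract (a cheap trimer-statistics triage followed by an exact edit-distance check on the survivors) can be illustrated with a minimal CPU-only sketch. Everything below is an assumption for illustration: the L1 distance between trimer count tables, the `shortlist` and `max_edits` parameters, and all function names are hypothetical and are not taken from the paper, which describes a GPU implementation.

```python
from collections import Counter


def trimer_counts(seq):
    """Count every window of three consecutive nucleotides (linear time)."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def trimer_distance(a, b):
    """L1 distance between two trimer count tables (assumed triage score)."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def levenshtein(s, t):
    """Classic dynamic-programming edit distance (substitutions and indels)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # match or substitution
        prev = cur
    return prev[-1]


def decode(read, barcodes, barcode_trimers, shortlist=3, max_edits=6):
    """Return the index of the best-matching barcode, or None if rejected.

    Stage 1 ranks all barcodes by the cheap trimer score; stage 2 runs the
    slow edit-distance comparison only on the `shortlist` best candidates.
    """
    read_tri = trimer_counts(read)
    ranked = sorted(range(len(barcodes)),
                    key=lambda i: trimer_distance(read_tri, barcode_trimers[i]))
    best = min(ranked[:shortlist], key=lambda i: levenshtein(read, barcodes[i]))
    return best if levenshtein(read, barcodes[best]) <= max_edits else None


# Hypothetical toy barcode set; trimer tables are precomputed once.
barcodes = ["ACGTACGTACGTAAAC", "TTTTGGGGCCCCAAAA", "GATTACAGATTACAGA"]
barcode_trimers = [trimer_counts(b) for b in barcodes]
noisy_read = "TTTTGGGGCCACAAAA"  # barcode 1 with a single substitution
result = decode(noisy_read, barcodes, barcode_trimers)
```

The design point is that the trimer tables for all barcodes are computed once, so the per-read cost is dominated by a simple vector comparison, and the quadratic-time alignment runs on only a handful of candidates.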
|
21
|
Lalli M, Yen A, Thopte U, Dong F, Moudgil A, Chen X, Milbrandt J, Dougherty JD, Mitra RD. Measuring transcription factor binding and gene expression using barcoded self-reporting transposon calling cards and transcriptomes. NAR Genom Bioinform 2022; 4:lqac061. [PMID: 36062164 PMCID: PMC9428926 DOI: 10.1093/nargab/lqac061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 07/04/2022] [Accepted: 08/24/2022] [Indexed: 11/29/2022] Open
Abstract
Calling cards technology using self-reporting transposons enables the identification of DNA-protein interactions through RNA sequencing. Although immensely powerful, current implementations of calling cards in bulk experiments on populations of cells are technically cumbersome and require many replicates to identify independent insertions into the same genomic locus. Here, we have drastically reduced the cost and labor requirements of calling card experiments in bulk populations of cells by introducing a DNA barcode into the calling card itself. An additional barcode incorporated during reverse transcription enables simultaneous transcriptome measurement in a facile and affordable protocol. We demonstrate that barcoded self-reporting transposons recover in vitro binding sites for four basic helix-loop-helix transcription factors with important roles in cell fate specification: ASCL1, MYOD1, NEUROD2 and NGN1. Further, simultaneous calling cards and transcriptional profiling during transcription factor overexpression identified both binding sites and gene expression changes for two of these factors. Lastly, we demonstrated barcoded calling cards can record binding in vivo in the mouse brain. In sum, RNA-based identification of transcription factor binding sites and gene expression through barcoded self-reporting transposon calling cards and transcriptomes is an efficient and powerful method to infer gene regulatory networks in a population of cells.
Affiliation(s)
- Matthew Lalli
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Edison Family Center for Genome Sciences and Systems Biology Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Allen Yen
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Urvashi Thopte
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Fengping Dong
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Edison Family Center for Genome Sciences and Systems Biology Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA
| | - Arnav Moudgil
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Edison Family Center for Genome Sciences and Systems Biology Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA
| | - Xuhua Chen
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Edison Family Center for Genome Sciences and Systems Biology Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA
| | - Jeffrey Milbrandt
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA
- Joseph D Dougherty
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
- Robi D Mitra
- Department of Genetics, School of Medicine, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA.,Edison Family Center for Genome Sciences and Systems Biology Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, USA
22
Bak SK, Seong W, Rha E, Lee H, Kim SK, Kwon KK, Kim H, Lee SG. Novel High-Throughput DNA Part Characterization Technique for Synthetic Biology. J Microbiol Biotechnol 2022; 32:1026-1033. [PMID: 35879270 PMCID: PMC9628936 DOI: 10.4014/jmb.2207.07013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 07/19/2022] [Accepted: 07/19/2022] [Indexed: 12/15/2022]
Abstract
This study presents a novel DNA part characterization technique that increases throughput by combinatorial DNA part assembly, solid plate-based quantitative fluorescence assay for phenotyping, and barcode tagging-based long-read sequencing for genotyping. We confirmed that the fluorescence intensities of colonies on plates were comparable to fluorescence at the single-cell level from a high-end, flow-cytometry device and developed a high-throughput image analysis pipeline. The barcode tagging-based long-read sequencing technique enabled rapid identification of all DNA parts and their combinations with a single sequencing experiment. Using our techniques, forty-four DNA parts (21 promoters and 23 RBSs) were successfully characterized in 72 h without any automated equipment. We anticipate that this high-throughput and easy-to-use part characterization technique will contribute to increasing part diversity and be useful for building genetic circuits and metabolic pathways in synthetic biology.
Affiliation(s)
- Seong-Kun Bak
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea,Biosystems and Bioengineering Program, University of Science and Technology, Daejeon 34141, Republic of Korea
- Wonjae Seong
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
- Eugene Rha
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
- Hyewon Lee
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
- Seong Keun Kim
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
- Kil Koang Kwon
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
- Haseong Kim
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea, Biosystems and Bioengineering Program, University of Science and Technology, Daejeon 34141, Republic of Korea. Corresponding author: H.S. Kim, Phone: +82-42-860-4372, Fax: +82-42-860-4489
- Seung-Goo Lee
- Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea, Biosystems and Bioengineering Program, University of Science and Technology, Daejeon 34141, Republic of Korea. Corresponding author: S.G. Lee, Phone: +82-42-860-4373
23
Cohen-Aharonov LA, Rebibo-Sabbah A, Yaacov A, Granit RZ, Strauss M, Colodner R, Cheshin O, Rosenberg S, Eavri R. High throughput SARS-CoV-2 variant analysis using molecular barcodes coupled with next generation sequencing. PLoS One 2022; 17:e0253404. [PMID: 35727806 PMCID: PMC9212143 DOI: 10.1371/journal.pone.0253404] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/16/2021] [Accepted: 04/24/2022] [Indexed: 11/18/2022]
Abstract
The identification of SARS-CoV-2 variants across the globe and their implications on the outspread of the pandemic, infection potential and resistance to vaccination, requires modification of the current diagnostic methods to map out viral mutations rapidly and reliably. Here, we demonstrate that integrating DNA barcoding technology, sample pooling and Next Generation Sequencing (NGS) provides an applicable solution for large-population viral screening combined with specific variant analysis. Our solution allows high throughput testing by barcoding each sample, followed by pooling of test samples using a multi-step procedure. First, patient-specific barcodes are added to the primers used in a one-step RT-PCR reaction, amplifying three different viral genes and one human housekeeping gene (as internal control). Then, samples are pooled, purified and finally, the generated sequences are read using an Illumina NGS system to identify the positive samples with a sensitivity of 82.5% and a specificity of 97.3%. Using this solution, we were able to identify six known and one unknown SARS-CoV-2 variants in a screen of 960 samples out of which 258 (27%) were positive for the virus. Thus, our diagnostic solution integrates the benefits of large population and epidemiological screening together with sensitive and specific identification of positive samples including variant analysis at a single nucleotide resolution.
Affiliation(s)
- Adar Yaacov
- Laboratory for Computational Biology of Cancer, Sharett Institute for Oncology, Hadassah - Hebrew University Medical Center, Jerusalem, Israel
- The Wohl Institute for Translational Medicine, Hadassah – Hebrew University Medical Center, Jerusalem, Israel
- Merav Strauss
- Microbiology Laboratory, Emek Medical Center, Afula, Israel
- Raul Colodner
- Microbiology Laboratory, Emek Medical Center, Afula, Israel
- Ori Cheshin
- Internal Medicine E, Emek Medical Center, Afula, Israel
- Shai Rosenberg
- Laboratory for Computational Biology of Cancer, Sharett Institute for Oncology, Hadassah - Hebrew University Medical Center, Jerusalem, Israel
- The Wohl Institute for Translational Medicine, Hadassah – Hebrew University Medical Center, Jerusalem, Israel
- Ronen Eavri
- Barcode Diagnostics Ltd., Nazareth, Israel
24
Emiliani FE, Hsu I, McKenna A. Multiplexed Assembly and Annotation of Synthetic Biology Constructs Using Long-Read Nanopore Sequencing. ACS Synth Biol 2022; 11:2238-2246. [PMID: 35695379 PMCID: PMC9295152 DOI: 10.1021/acssynbio.2c00126] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022]
Abstract
Recombinant DNA is a fundamental tool in biotechnology and medicine. These DNA sequences are often built, replicated, and delivered in the form of plasmids. Validation of these plasmid sequences is a critical and time-consuming step, which has been dominated for the last 35 years by Sanger sequencing. As plasmid sequences grow more complex with new DNA synthesis and cloning techniques, we need new approaches that address the corresponding validation challenges at scale. Here we prototype a high-throughput plasmid sequencing approach using DNA transposition and Oxford Nanopore sequencing. Our method, Circuit-seq, creates robust, full-length, and accurate plasmid assemblies without prior knowledge of the underlying sequence. We demonstrate the power of Circuit-seq across a wide range of plasmid sizes and complexities, generating full-length, contiguous plasmid maps. We then leverage our long-read data to characterize epigenetic marks and estimate plasmid contamination levels. Circuit-seq scales to large numbers of samples at a lower per-sample cost than commercial Sanger sequencing, accelerating a key step in synthetic biology, while low equipment costs make it practical for individual laboratories.
Affiliation(s)
- Francesco E Emiliani
- Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire 03756, United States
- Ian Hsu
- Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire 03756, United States
- Aaron McKenna
- Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire 03756, United States.,Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire 03756, United States
25
Molik DC. met v1: expanding on old estimations of biodiversity from eDNA with a new database framework. Database (Oxford) 2022; 2022:6583522. [PMID: 35543254 PMCID: PMC9216496 DOI: 10.1093/database/baac032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 03/10/2022] [Accepted: 04/28/2022] [Indexed: 11/14/2022]
Abstract
A long-standing problem in environmental DNA has been the inability to compute across large numbers of datasets. Here we introduce an open-source software framework that can store a large number of environmental DNA datasets, as well as provide a platform for analysis, in an easily customizable way. We show the utility of such an approach by analyzing over 1400 arthropod metabarcode datasets. This article introduces a new software framework, met, which utilizes large numbers of metabarcode datasets to draw conclusions about patterns of diversity at large spatial scales. Given more accurate estimations on the distribution of variance in metabarcode datasets, this software framework could facilitate novel analyses that are outside the scope of currently available similar platforms.
Database URL https://osf.io/spb8v/
Affiliation(s)
- David C Molik
- Navari Family Center for Digital Scholarship, Hesburgh Library, University of Notre Dame, Notre Dame, IN 46556, USA
- Department of Biological Sciences, Galvin Life Science Center, University of Notre Dame, Notre Dame, IN 46556, USA
26
Ezpeleta J, Garcia Labari I, Villanova GV, Bulacio P, Lavista-Llanos S, Posner V, Krsticevic F, Arranz S, Tapia E. Robust and scalable barcoding for massively parallel long-read sequencing. Sci Rep 2022; 12:7619. [PMID: 35538127 PMCID: PMC9090787 DOI: 10.1038/s41598-022-11656-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/25/2021] [Accepted: 04/25/2022] [Indexed: 01/02/2023]
Abstract
Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq.
Affiliation(s)
- Joaquín Ezpeleta
- Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario, Argentina.,Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Rosario, Argentina
- Ignacio Garcia Labari
- Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario, Argentina
- Gabriela Vanina Villanova
- Consejo Nacional de Investigaciones Científicas y Técnicas, Rosario, Argentina.,Laboratorio Mixto de Biotecnología Acuática, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario - Centro Científico Tecnológico y Educativo Acuario del Río Paraná, Rosario, Argentina
- Pilar Bulacio
- Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario, Argentina.,Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Rosario, Argentina
- Sofía Lavista-Llanos
- Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario, Argentina
- Victoria Posner
- Laboratorio Mixto de Biotecnología Acuática, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario - Centro Científico Tecnológico y Educativo Acuario del Río Paraná, Rosario, Argentina
- Flavia Krsticevic
- Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Jerusalem, Israel
- Silvia Arranz
- Laboratorio Mixto de Biotecnología Acuática, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario - Centro Científico Tecnológico y Educativo Acuario del Río Paraná, Rosario, Argentina
- Elizabeth Tapia
- Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario, Argentina.,Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Rosario, Argentina
27
Ezekannagha C, Becker A, Heider D, Hattab G. Design considerations for advancing data storage with synthetic DNA for long-term archiving. Mater Today Bio 2022; 15:100306. [PMID: 35677811 PMCID: PMC9167972 DOI: 10.1016/j.mtbio.2022.100306] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/21/2022] [Revised: 05/05/2022] [Accepted: 05/22/2022] [Indexed: 11/22/2022]
Abstract
Deoxyribonucleic acid (DNA) is increasingly emerging as a serious medium for long-term archival data storage because of its remarkable high-capacity, high-storage-density characteristics and its lasting ability to store data for thousands of years. Various encoding algorithms are generally required to store digital information in DNA and to maintain data integrity. Indeed, since DNA is the information carrier, its performance under different processing and storage conditions significantly impacts the capabilities of the data storage system. Therefore, the design of a DNA storage system must meet specific design considerations to be less error-prone, robust and reliable. In this work, we summarize the general processes and technologies employed when using synthetic DNA as a storage medium. We also share the design considerations for sustainable engineering to include viability. We expect this work to provide insight into how sustainable design can be used to develop an efficient and robust synthetic DNA-based storage system for long-term archiving.
Affiliation(s)
- Chisom Ezekannagha
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Corresponding author.
- Anke Becker
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, D-35043, Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Georges Hattab
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
28
Tedersoo L, Bahram M, Zinger L, Nilsson RH, Kennedy PG, Yang T, Anslan S, Mikryukov V. Best practices in metabarcoding of fungi: From experimental design to results. Mol Ecol 2022; 31:2769-2795. [PMID: 35395127 DOI: 10.1111/mec.16460] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 09/28/2021] [Revised: 02/07/2022] [Accepted: 03/30/2022] [Indexed: 02/06/2023]
Abstract
The development of high-throughput sequencing (HTS) technologies has greatly improved our capacity to identify fungi and unveil their ecological roles across a variety of ecosystems. Here we provide an overview of current best practices in metabarcoding analysis of fungal communities, from experimental design through molecular and computational analyses. By reanalysing published data sets, we demonstrate that operational taxonomic units (OTUs) outperform amplified sequence variants (ASVs) in recovering fungal diversity, a finding that is particularly evident for long markers. Additionally, analysis of the full-length ITS region allows more accurate taxonomic placement of fungi and other eukaryotes compared to the ITS2 subregion. Finally, we show that specific methods for compositional data analyses provide more reliable estimates of shifts in community structure. We conclude that metabarcoding analyses of fungi are especially promising for integrating fungi into the full microbiome and broader ecosystem functioning context, recovery of novel fungal lineages and ancient organisms as well as barcoding of old specimens including type material.
Affiliation(s)
- Leho Tedersoo
- Mycology and Microbiology Center, University of Tartu, Tartu, Estonia.,College of Science, King Saud University, Riyadh, Saudi Arabia
- Mohammad Bahram
- Mycology and Microbiology Center, University of Tartu, Tartu, Estonia.,Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Lucie Zinger
- Institut de Biologie de l'ENS (IBENS), Département de Biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France.,Naturalis Biodiversity Center, Leiden, The Netherlands
- R Henrik Nilsson
- Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Göteborg, Sweden
- Peter G Kennedy
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, USA
- Teng Yang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing, China
- Sten Anslan
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Vladimir Mikryukov
- Mycology and Microbiology Center, University of Tartu, Tartu, Estonia.,Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
29
Janjic A, Wange LE, Bagnoli JW, Geuder J, Nguyen P, Richter D, Vieth B, Vick B, Jeremias I, Ziegenhain C, Hellmann I, Enard W. Prime-seq, efficient and powerful bulk RNA sequencing. Genome Biol 2022; 23:88. [PMID: 35361256 PMCID: PMC8969310 DOI: 10.1186/s13059-022-02660-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 09/27/2021] [Accepted: 03/23/2022] [Indexed: 12/21/2022]
Abstract
Cost-efficient library generation by early barcoding has been central in propelling single-cell RNA sequencing. Here, we optimize and validate prime-seq, an early barcoding bulk RNA-seq method. We show that it performs equivalently to TruSeq, a standard bulk RNA-seq method, but is fourfold more cost-efficient due to almost 50-fold cheaper library costs. We also validate a direct RNA isolation step, show that intronic reads are derived from RNA, and compare cost-efficiencies of available protocols. We conclude that prime-seq is currently one of the best options to set up an early barcoding bulk RNA-seq protocol from which many labs would profit.
Affiliation(s)
- Aleksandar Janjic
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Graduate School of Systemic Neurosciences, Faculty of Biology, Ludwig-Maximilians University, Martinsried, Germany
- Lucas E Wange
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Johannes W Bagnoli
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Johanna Geuder
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Phong Nguyen
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Daniel Richter
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Beate Vieth
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Binje Vick
- Research Unit Apoptosis in Hematopoietic Stem Cells, Helmholtz Zentrum München, German Research Center for Environmental Health (HMGU), Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Irmela Jeremias
- Research Unit Apoptosis in Hematopoietic Stem Cells, Helmholtz Zentrum München, German Research Center for Environmental Health (HMGU), Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians University, Munich, Germany
- Christoph Ziegenhain
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Ines Hellmann
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany
- Wolfgang Enard
- Anthropology & Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Großhaderner Str. 2, 82152, Martinsried, Germany.
30
Logan R, Fleischmann Z, Annis S, Wehe AW, Tilly JL, Woods DC, Khrapko K. 3GOLD: optimized Levenshtein distance for clustering third-generation sequencing data. BMC Bioinformatics 2022; 23:95. [PMID: 35307007 PMCID: PMC8934446 DOI: 10.1186/s12859-022-04637-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 03/10/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Third-generation sequencing offers some advantages over next-generation sequencing predecessors, but with the caveat of harboring a much higher error rate. Clustering related sequences is an essential task in modern biology. To accurately cluster sequences rich in errors, error type and frequency need to be accounted for. Levenshtein distance is a well-established mathematical algorithm for measuring the edit distance between words and can specifically weight insertions, deletions and substitutions. However, there are drawbacks to using Levenshtein distance in a biological context, and hence it has rarely been used for this purpose. We present novel modifications to the Levenshtein distance algorithm to optimize it for clustering error-rich biological sequencing data.
RESULTS: We successfully introduced a bidirectional frameshift allowance with end-user determined accommodation caps combined with weighted error discrimination. Furthermore, our modifications dramatically improved the computational speed of Levenshtein distance. For simulated ONT MinION and PacBio Sequel datasets, the average clustering sensitivity for 3GOLD was 41.45% (S.D. 10.39) higher than Sequence-Levenshtein distance, 52.14% (S.D. 9.43) higher than Levenshtein distance, 55.93% (S.D. 8.67) higher than Starcode, 42.68% (S.D. 8.09) higher than CD-HIT-EST and 61.49% (S.D. 7.81) higher than DNACLUST. For biological ONT MinION data, 3GOLD clustering sensitivity was 27.99% higher than Sequence-Levenshtein distance, 52.76% higher than Levenshtein distance, 56.39% higher than Starcode, 48% higher than CD-HIT-EST and 70.4% higher than DNACLUST.
CONCLUSION: Our modifications to Levenshtein distance have improved its speed and accuracy compared to the classic Levenshtein distance, Sequence-Levenshtein distance and other commonly used clustering approaches on simulated and biological third-generation sequenced datasets. Our clustering approach is appropriate for datasets of unknown cluster centroids, such as those generated with unique molecular identifiers as well as known centroids such as barcoded datasets. A strength of our approach is high accuracy in resolving small clusters and mitigating the number of singletons.
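The abstract above notes that Levenshtein distance can weight insertions, deletions and substitutions separately. As a rough illustration of that general idea only, the following is a minimal sketch of a textbook weighted edit distance. It is not the 3GOLD algorithm: the function name `weighted_levenshtein` and the cost parameters are hypothetical, and the frameshift allowance and speed optimizations described by the authors are not reproduced here.

```python
def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Edit distance from sequence a to sequence b with per-error-type costs.

    Generic dynamic-programming formulation (hypothetical helper, not 3GOLD).
    Only two rows of the DP table are kept in memory at a time.
    """
    # Cost of turning the empty prefix of a into each prefix of b (insertions only).
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into the empty string (deletions only).
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                              # delete ca
                curr[j - 1] + ins_cost,                          # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Down-weighting deletions, e.g. for a platform assumed to be deletion-prone.
    print(weighted_levenshtein("ACGT", "AGT", del_cost=0.5))
```

Lowering the cost of the error type a sequencing platform produces most often makes reads carrying that error appear closer to their true barcode or centroid. Which weights are appropriate for a given platform is an assumption the user would need to calibrate.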
Affiliation(s)
- Robert Logan
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA.,Department of Biology, Eastern Nazarene College, 23 E Elm Ave, Quincy, MA, 02170, USA
- Zoe Fleischmann
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Sofia Annis
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Amy Wangsness Wehe
- Health and Natural Sciences Division, Mathematics Department, Fitchburg State University, Fitchburg, MA, 01420-2697, USA
- Jonathan L Tilly
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Dori C Woods
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Konstantin Khrapko
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA.
31
Abstract
Oligo library pools are powerful tools for systematic investigation of genetic and transcriptomic machinery such as promoter function and gene regulation, non-coding RNAs, or RNA modifications. Here, we provide a detailed protocol for cloning DNA oligo pools made up of tens of thousands of different constructs, aiming to preserve the complexity of the pools. This system would be suitable for expression in cell lines and can be followed up by next-generation sequencing analysis. For complete details on the use and execution of this protocol, please refer to Uzonyi et al. (2021).
- Restriction-based cloning of DNA pools
- Preservation of complexity of thousands of constructs
- Used to investigate genetic and transcriptomic machineries
- To be expressed in cell lines and followed up by NGS analysis
Affiliation(s)
- Anna Uzonyi
- Department of Molecular Genetics, Weizmann Institute of Science, 7610001 Rehovot, Israel
- Ronit Nir
- Department of Molecular Genetics, Weizmann Institute of Science, 7610001 Rehovot, Israel
- Schraga Schwartz
- Department of Molecular Genetics, Weizmann Institute of Science, 7610001 Rehovot, Israel
32
Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nat Biotechnol 2022; 40:896-905. [DOI: 10.1038/s41587-021-01160-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/15/2020] [Accepted: 11/11/2021] [Indexed: 02/08/2023]
33
Sauter B, Schneider L, Stress C, Gillingham D. An assessment of the mutational load caused by various reactions used in DNA encoded libraries. Bioorg Med Chem 2021; 52:116508. [PMID: 34800876 DOI: 10.1016/j.bmc.2021.116508] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/19/2021] [Revised: 10/14/2021] [Accepted: 10/15/2021] [Indexed: 02/08/2023]
Abstract
DNA encoded libraries have become an essential hit-finding tool in early drug discovery. Recent advances in synthetic methods for DNA encoded libraries have expanded the available chemical space, but precisely how each type of chemistry affects the DNA is unstudied. Available assays to quantify the damage are limited to write efficiency, where the ability to ligate DNA onto a working encoded library strand is measured, or qPCR is performed to measure the amplifiability of the DNA. These measures read signal quantity and overall integrity, but do not report on specific damages in the encoded information. Herein, we use next generation sequencing (NGS) to measure the quality of the read signal in order to quantify the truthfulness of the retrieved information. We identify CuAAC to be the worst offender in terms of DNA damage amongst commonly used reactions in DELs, causing an increase of G → T transversions. Furthermore, we show that the analysis provides useful information even in fully elaborated DELs; indeed we see that vestiges of the synthetic history, both chemical and biochemical, are written into the mutational spectra of NGS datasets.
Affiliation(s)
- Basilius Sauter
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, CH-4056 Basel, Switzerland.
- Lukas Schneider
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, CH-4056 Basel, Switzerland
- Cedric Stress
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, CH-4056 Basel, Switzerland
- Dennis Gillingham
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, CH-4056 Basel, Switzerland.
34
Liu S, Punthambaker S, Iyer EPR, Ferrante T, Goodwin D, Fürth D, Pawlowski AC, Jindal K, Tam JM, Mifflin L, Alon S, Sinha A, Wassie AT, Chen F, Cheng A, Willocq V, Meyer K, Ling KH, Camplisson CK, Kohman RE, Aach J, Lee JH, Yankner BA, Boyden ES, Church GM. Barcoded oligonucleotides ligated on RNA amplified for multiplexed and parallel in situ analyses. Nucleic Acids Res 2021; 49:e58. [PMID: 33693773 PMCID: PMC8191787 DOI: 10.1093/nar/gkab120] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/06/2020] [Revised: 01/29/2021] [Accepted: 03/02/2021] [Indexed: 12/13/2022]
Abstract
We present barcoded oligonucleotides ligated on RNA amplified for multiplexed and parallel in situ analyses (BOLORAMIS), a reverse transcription-free method for spatially-resolved, targeted, in situ RNA identification of single or multiple targets. BOLORAMIS was demonstrated on a range of cell types and human cerebral organoids. Singleplex experiments to detect coding and non-coding RNAs in human iPSCs showed a stem-cell signature pattern. Specificity of BOLORAMIS was found to be 92% as illustrated by a clear distinction between human and mouse housekeeping genes in a co-culture system, as well as by recapitulation of subcellular localization of lncRNA MALAT1. Sensitivity of BOLORAMIS was quantified by comparing with single molecule FISH experiments and found to be 11%, 12% and 35% for GAPDH, TFRC and POLR2A, respectively. To demonstrate BOLORAMIS for multiplexed gene analysis, we targeted 96 mRNAs within a co-culture of iNGN neurons and HMC3 human microglial cells. We used fluorescence in situ sequencing to detect error-robust 8-base barcodes associated with each of these genes. We then used this data to uncover the spatial relationship among cells and transcripts by performing single-cell clustering and gene–gene proximity analyses. We anticipate the BOLORAMIS technology for in situ RNA detection to find applications in basic and translational research.
Affiliation(s)
- Songlei Liu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Sukanya Punthambaker
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Eswar P R Iyer
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Thomas Ferrante
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Daniel Goodwin
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Daniel Fürth
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Andrew C Pawlowski
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Kunal Jindal
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Jenny M Tam
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Lauren Mifflin
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Shahar Alon
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Anubhav Sinha
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Asmamaw T Wassie
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Fei Chen
- Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Broad Institute, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Anne Cheng
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Valerie Willocq
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Katharina Meyer
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - King-Hwa Ling
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
| | - Conor K Camplisson
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Richie E Kohman
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - John Aach
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Je Hyuk Lee
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Bruce A Yankner
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Edward S Boyden
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Media Arts and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.,Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.,Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.,Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - George M Church
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.,Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
|
35
|
Pjevac P, Hausmann B, Schwarz J, Kohl G, Herbold CW, Loy A, Berry D. An Economical and Flexible Dual Barcoding, Two-Step PCR Approach for Highly Multiplexed Amplicon Sequencing. Front Microbiol 2021; 12:669776. [PMID: 34093488 PMCID: PMC8173057 DOI: 10.3389/fmicb.2021.669776] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 02/19/2021] [Accepted: 04/12/2021] [Indexed: 12/26/2022] Open
Abstract
In microbiome research, phylogenetic and functional marker gene amplicon sequencing is the most commonly-used community profiling approach. Consequently, a plethora of protocols for the preparation and multiplexing of samples for amplicon sequencing have been developed. Here, we present two economical high-throughput gene amplification and sequencing workflows that are implemented as standard operating procedures at the Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna. These workflows are based on a previously-published two-step PCR approach, but have been updated to either increase the accuracy of results, or alternatively to achieve orders of magnitude higher numbers of samples to be multiplexed in a single sequencing run. The high-accuracy workflow relies on unique dual sample barcoding. It allows the same level of sample multiplexing as the previously-published two-step PCR approach, but effectively eliminates residual read misassignments between samples (crosstalk) which are inherent to single barcoding approaches. The high-multiplexing workflow is based on combinatorial dual sample barcoding, which theoretically allows for multiplexing up to 299,756 amplicon libraries of the same target gene in a single massively-parallelized amplicon sequencing run. Both workflows presented here are highly economical, easy to implement, and can, without significant modifications or cost, be applied to any target gene of interest.
Affiliation(s)
- Petra Pjevac
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Bela Hausmann
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
| | - Jasmin Schwarz
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
| | - Gudrun Kohl
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Craig W. Herbold
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Alexander Loy
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - David Berry
- Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
|
36
|
Yang IS, Bae SW, Park B, Kim S. Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes. PLoS One 2021; 16:e0246354. [PMID: 33600481 PMCID: PMC7891705 DOI: 10.1371/journal.pone.0246354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Accepted: 01/15/2021] [Indexed: 11/19/2022] Open
Abstract
Short DNA oligonucleotides (~4 mer) have been used to index samples from different sources, such as in multiplex sequencing. Presently, longer oligonucleotides (8–12 mer) are being used as molecular barcodes with which to distinguish among raw DNA molecules in many high-tech sequence analyses, including low-frequency mutation detection, quantitative transcriptome analysis, and single-cell sequencing. Despite some advantages of using molecular barcodes with random sequences, such an approach makes it impossible to know the exact sequences used in an experiment and can lead to inaccurate interpretation due to misclustering of barcodes arising from the occurrence of unexpected mutations in the barcodes. The present study introduces a tool developed for selecting an optimal barcode subset during molecular barcoding. The program considers five barcode factors: GC content, homopolymers, simple sequence repeats with repeated units of dinucleotides, Hamming distance, and complementarity between barcodes. To evaluate a selected barcode set, penalty scores for the factors are defined based on their distributions observed in random barcodes. The algorithm employed in the program comprises two steps: i) random generation of an initial set and ii) optimal barcode selection via iterative replacement. Users can execute the program by inputting barcode length and the number of barcodes to be generated. Furthermore, the program accepts a user’s own values for other parameters, including penalty scores, for advanced use, allowing it to be applied in various conditions. In many test runs to obtain 100000 barcodes with lengths of 12 nucleotides, the program showed fast performance, efficient enough to generate optimal barcode sequences with merely the use of a desktop PC. We also showed that VFOS has comparable performance, flexibility in program running, consideration of simple sequence repeats, and fast computation time in comparison with two other tools (DNABarcodes and FreeBarcodes). Owing to the versatility and fast performance of the program, we expect that many researchers will opt to apply it for selecting optimal barcode sets during their experiments, including next-generation sequencing.
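Three of the barcode factors named in this abstract (GC content, homopolymers, Hamming distance) can be illustrated with a minimal Python sketch. This is not the VFOS program or its penalty scoring; the function names and example barcodes are hypothetical and chosen only to show what each factor measures.

```python
from itertools import combinations


def gc_content(barcode: str) -> float:
    """Fraction of bases in the barcode that are G or C."""
    return sum(base in "GC" for base in barcode) / len(barcode)


def longest_homopolymer(barcode: str) -> int:
    """Length of the longest run of one repeated base (non-empty barcode assumed)."""
    longest = run = 1
    for prev, cur in zip(barcode, barcode[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length barcodes")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes: list[str]) -> int:
    """Smallest Hamming distance over all pairs in a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


if __name__ == "__main__":
    # Hypothetical 8-mer barcodes, for illustration only.
    example_set = ["ACGTACGT", "TGCATGCA", "AACCGGTT"]
    for bc in example_set:
        print(bc, gc_content(bc), longest_homopolymer(bc))
    print("min pairwise Hamming:", min_pairwise_hamming(example_set))
```

A selection tool would typically turn such measurements into scores and swap out the worst-scoring barcodes iteratively; how VFOS weights them is described only in the paper itself.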
Affiliation(s)
- In Seok Yang
- Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
| | - Sang Won Bae
- Department of Computer Science, Kyonggi University, Suwon, Korea
| | - BeumJin Park
- Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
| | - Sangwoo Kim
- Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
|
37
|
Low-complexity and highly robust barcodes for error-rich single molecular sequencing. 3 Biotech 2021; 11:78. [PMID: 33505833 DOI: 10.1007/s13205-020-02607-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/23/2020] [Accepted: 12/23/2020] [Indexed: 12/28/2022] Open
Abstract
DNA barcodes are frequently corrupted due to insertion, deletion, and substitution errors during DNA synthesis, amplification and sequencing, resulting in index hopping. In this paper, we propose a new DNA barcode construction scheme that combines a cyclic block code with a predetermined pseudo-random sequence bit by bit to form bit pairs, and then converts the bit pairs to bases, i.e., the DNA barcodes. Then, we present a barcode identification scheme for noisy sequencing reads, which uses a combination of cyclic shifting and traditional dynamic programming to mark the insertion and deletion positions, and then performs erasure-and-error-correction decoding on the corrupted codewords. Furthermore, we verify the identification error rate of barcodes for multiple errors and evaluate the reliability of the barcodes in DNA context. This method can be easily generalized for constructing long barcodes, which may be used in scenarios with serious errors. Simulation results show that the bit error rate after identifying insertions/deletions is greatly reduced using the combination of cyclic shift and dynamic programming compared to using dynamic programming only. It indicates that the proposed method can effectively improve the accuracy for estimating insertion/deletion errors. And the overall identification error rate of the proposed method is lower than 10⁻⁵ when the probability of each base mutation is less than 0.1, which is the typical scenario in third-generation sequencing.
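The construction step described in this abstract, pairing each codeword bit with a pseudo-random bit and mapping the pair to a base, might look roughly like the sketch below. The pair-to-base table, the function names, and the example bit strings are assumptions for illustration; the abstract does not state the actual mapping, code, or sequence used.

```python
# Assumed mapping from (codeword bit, pseudo-random bit) to a base.
# The paper's real table is not given in the abstract.
PAIR_TO_BASE = {(0, 0): "A", (0, 1): "C", (1, 0): "G", (1, 1): "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def bits_to_barcode(codeword_bits: list[int], prs_bits: list[int]) -> str:
    """Combine the two bit streams position by position into a DNA barcode."""
    if len(codeword_bits) != len(prs_bits):
        raise ValueError("codeword and pseudo-random sequence must match in length")
    return "".join(PAIR_TO_BASE[(c, p)] for c, p in zip(codeword_bits, prs_bits))


def barcode_to_bits(barcode: str) -> tuple[list[int], list[int]]:
    """Split an error-free barcode back into codeword bits and pseudo-random bits."""
    pairs = [BASE_TO_PAIR[base] for base in barcode]
    return [c for c, _ in pairs], [p for _, p in pairs]


if __name__ == "__main__":
    codeword = [1, 0, 1, 1]  # hypothetical block-code codeword
    prs = [0, 1, 1, 0]       # hypothetical predetermined pseudo-random bits
    barcode = bits_to_barcode(codeword, prs)
    print(barcode)
    print(barcode_to_bits(barcode))
```

Because the pseudo-random bits are known in advance, a decoder can compare them with the bits recovered from a noisy read to estimate where insertions or deletions occurred; that identification stage is not sketched here.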
|
38
|
Sidore AM, Plesa C, Samson JA, Lubock NB, Kosuri S. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res 2020; 48:e95. [PMID: 32692349 PMCID: PMC7498354 DOI: 10.1093/nar/gkaa600] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/18/2019] [Revised: 06/13/2020] [Accepted: 07/11/2020] [Indexed: 01/12/2023] Open
Abstract
Multiplexed assays allow functional testing of large synthetic libraries of genetic elements, but are limited by the designability, length, fidelity and scale of the input DNA. Here, we improve DropSynth, a low-cost, multiplexed method that builds gene libraries by compartmentalizing and assembling microarray-derived oligonucleotides in vortexed emulsions. By optimizing enzyme choice, adding enzymatic error correction and increasing scale, we show that DropSynth can build thousands of gene-length fragments at >20% fidelity.
Affiliation(s)
- Angus M Sidore
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Calin Plesa
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Joyce A Samson
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nathan B Lubock
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA.,UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
39
|
Ultra-high throughput single-cell analysis of proteins and RNAs by split-pool synthesis. Commun Biol 2020; 3:213. [PMID: 32382044 PMCID: PMC7205613 DOI: 10.1038/s42003-020-0896-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/09/2019] [Accepted: 03/04/2020] [Indexed: 12/11/2022] Open
Abstract
Single-cell omics provide insight into cellular heterogeneity and function. Recent technological advances have accelerated single-cell analyses, but workflows remain expensive and complex. We present a method enabling simultaneous, ultra-high throughput single-cell barcoding of millions of cells for targeted analysis of proteins and RNAs. Quantum barcoding (QBC) avoids isolation of single cells by building cell-specific oligo barcodes dynamically within each cell. With minimal instrumentation (four 96-well plates and a multichannel pipette), cell-specific codes are added to each tagged molecule within cells through sequential rounds of classical split-pool synthesis. Here we show the utility of this technology in mouse and human model systems for as many as 50 antibodies to targeted proteins and, separately, >70 targeted RNA regions. We demonstrate that this method can be applied to multi-modal protein and RNA analyses. It can be scaled by expansion of the split-pool process and effectively renders sequencing instruments as versatile multi-parameter flow cytometers. Maeve O’Huallachain et al. report a method that enables simultaneous, ultra-high throughput single-cell barcoding for targeted single-cell protein and RNA analysis. They show the utility of their method in analyses of mRNA and protein expression in human and mouse cells.
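The scaling claim for split-pool barcoding can be made concrete with a small calculation. The sketch below assumes each of the four 96-well plates serves as one split-pool round with 96 distinct code segments, and that cells are assigned to wells uniformly at random; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
def barcode_space(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcodes after the given number of split-pool rounds."""
    return wells_per_round ** rounds


def expected_shared_fraction(n_cells: int, n_codes: int) -> float:
    """Probability that a given cell shares its barcode with at least one other
    cell, assuming every cell picks a barcode uniformly at random."""
    return 1.0 - (1.0 - 1.0 / n_codes) ** (n_cells - 1)


if __name__ == "__main__":
    codes = barcode_space(96, 4)  # assumed: one round per 96-well plate
    print(f"barcode combinations: {codes:,}")
    for cells in (10_000, 100_000, 1_000_000):
        frac = expected_shared_fraction(cells, codes)
        print(f"{cells:>9,} cells -> {frac:.4%} expected to share a barcode")
```

Under these assumptions, adding a round multiplies the barcode space by the number of wells, which is why expanding the split-pool process scales the method to larger cell numbers.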
|
40
|
Sequencing barcode construction and identification methods based on block error-correction codes. SCIENCE CHINA-LIFE SCIENCES 2020; 63:1580-1592. [PMID: 32303959 DOI: 10.1007/s11427-019-1651-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/23/2019] [Accepted: 02/11/2020] [Indexed: 02/07/2023]
Abstract
Multiplexed sequencing relies on specific sample labels, the barcodes, to tag DNA fragments belonging to different samples and to separate the output of the sequencers. However, the barcodes are often corrupted by insertion, deletion and substitution errors introduced during sequencing, which may lead to sample misassignment. In this paper, we propose a barcode construction method, which combines a block error-correction code with a predetermined pseudorandom sequence to generate a base sequence for labeling different samples. Furthermore, to identify the corrupted barcodes for assigning reads to their respective samples, we present a soft decision identification method that consists of inner decoding and outer decoding. The inner decoder establishes the hidden Markov model (HMM) for base insertion/deletion estimation with the pseudorandom sequence, and adapts the forward-backward (FB) algorithm to output the soft information of each bit in the block code. The outer decoder performs soft decision decoding using the soft information to effectively correct multiple errors in the barcodes. Simulation results show that the proposed methods are highly robust to high error rates of insertions, deletions and substitutions in the barcodes. In addition, compared with the inner decoding algorithm of the barcodes based on watermarks, the proposed inner decoding algorithm can greatly reduce the decoding complexity.
|
41
|
Short DNA Probes Developed for Sample Tracking and Quality Assurance in Gene Panel Testing. J Mol Diagn 2019; 21:1079-1094. [DOI: 10.1016/j.jmoldx.2019.07.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/09/2019] [Revised: 07/02/2019] [Accepted: 07/24/2019] [Indexed: 12/21/2022] Open
|
42
|
Feldman D, Singh A, Schmid-Burgk JL, Carlson RJ, Mezger A, Garrity AJ, Zhang F, Blainey PC. Optical Pooled Screens in Human Cells. Cell 2019; 179:787-799.e17. [PMID: 31626775 PMCID: PMC6886477 DOI: 10.1016/j.cell.2019.09.016] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Received: 12/13/2018] [Revised: 07/08/2019] [Accepted: 09/13/2019] [Indexed: 01/06/2023]
Abstract
Genetic screens are critical for the systematic identification of genes underlying cellular phenotypes. Pooling gene perturbations greatly improves scalability but is not compatible with imaging of complex and dynamic cellular phenotypes. Here, we introduce a pooled approach for optical genetic screens in mammalian cells. We use targeted in situ sequencing to demultiplex a library of genetic perturbations following image-based phenotyping. We screened a set of 952 genes across millions of cells for involvement in nuclear factor κB (NF-κB) signaling by imaging the translocation of RelA (p65) to the nucleus. Screening at a single time point across 3 cell lines recovered 15 known pathway components, while repeating the screen with live-cell imaging revealed a role for Mediator complex subunits in regulating the duration of p65 nuclear retention. These results establish a highly multiplexed approach to image-based screens of spatially and temporally defined phenotypes with pooled libraries.
Affiliation(s)
- David Feldman
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Physics, MIT, Cambridge, MA 02142, USA
| | - Avtar Singh
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Rebecca J Carlson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Health Sciences and Technology, MIT, Cambridge, MA 02142, USA
| | - Anja Mezger
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden
| | - Feng Zhang
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biological Engineering, MIT, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02142, USA; Department of Brain and Cognitive Science, MIT, Cambridge, MA 02142, USA; Howard Hughes Medical Institute, MIT, Cambridge, MA 02142, USA
| | - Paul C Blainey
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biological Engineering, MIT, Cambridge, MA 02142, USA; Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA 02142, USA.
|
43
|
Wang B, Zhang Q, Wei X. Tabu Variable Neighborhood Search for Designing DNA Barcodes. IEEE Trans Nanobioscience 2019; 19:127-131. [PMID: 31581087 DOI: 10.1109/tnb.2019.2942036] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
Massively parallel sequencing, a popular and efficient sequencing method, produces a tremendous number of sequences from multiple individual samples. Labeling sequences with barcodes (tags) prevents them from being unrecoverable or confused during sequencing, replication, and oligonucleotide synthesis. In view of DNA barcode set design, we propose a tabu variable neighborhood search (TVNS) to design DNA barcode sets. We propose using sequences with the maximum sum of edit distances as the basis for building the neighborhood structures of TVNS. We adopt an exhaustive search to complete local searches built from these structures. The algorithm switches between neighborhoods built from different seed data. Compared with previous designs, our proposed algorithm is effective for designing DNA barcode sets and improving the lower bound of DNA barcode sets.
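The edit-distance criterion mentioned in this abstract can be illustrated with a short sketch: a standard Levenshtein distance, plus the sum of distances from one candidate to an existing barcode set. This is not the TVNS algorithm itself, only the distance measure it builds on; the function names and example sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-base insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        cur = [i]
        for j, base_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                      # delete base_a
                cur[j - 1] + 1,                   # insert base_b
                prev[j - 1] + (base_a != base_b)  # match or substitute
            ))
        prev = cur
    return prev[-1]


def sum_of_edit_distances(candidate: str, barcode_set: list[str]) -> int:
    """Total edit distance from one candidate to every barcode in a set."""
    return sum(edit_distance(candidate, other) for other in barcode_set)


if __name__ == "__main__":
    existing = ["ACGT", "AGT", "TTTT"]  # hypothetical barcodes
    for candidate in ("ACGT", "GGCC"):
        print(candidate, sum_of_edit_distances(candidate, existing))
```

A candidate with a large distance sum lies far from the rest of the set, which makes it a plausible seed for exploring new barcodes. How the authors use such seeds to build and switch neighborhoods is not detailed in the abstract.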
|
44
|
Kluiver J, Niu F, Yuan Y, Kok K, van den Berg A, Dzikiewicz-Krawczyk A. NGS-Based High-Throughput Screen to Identify MicroRNAs Regulating Growth of B-Cell Lymphoma. Methods Mol Biol 2019; 1956:269-282. [PMID: 30779039 DOI: 10.1007/978-1-4939-9151-8_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
MicroRNAs (miRNAs) play important roles in development, differentiation, and homeostasis by regulating protein translation. In B-cell lymphoma, many miRNAs have altered expression levels, and for a limited subset of them, experimental data supports their functional relevance in lymphoma pathogenesis. This chapter describes an unbiased next-generation sequencing (NGS)-based high-throughput screening approach to identify miRNAs that are involved in the control of cell growth. First, we provide a protocol for performing high-throughput screening for miRNA inhibition and overexpression. Second, we describe the procedure for next-generation sequencing library preparation. Third, we provide a workflow for data analysis.
Affiliation(s)
- Joost Kluiver
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
| | - Fubiao Niu
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Ye Yuan
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.,Department of Clinical Pharmacy, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Klaas Kok
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Anke van den Berg
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - Agnieszka Dzikiewicz-Krawczyk
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.,Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
|
45
|
Tchesnokova V, Radey M, Chattopadhyay S, Larson L, Weaver JL, Kisiela D, Sokurenko EV. Pandemic fluoroquinolone resistant Escherichia coli clone ST1193 emerged via simultaneous homologous recombinations in 11 gene loci. Proc Natl Acad Sci U S A 2019; 116:14740-14748. [PMID: 31262826 PMCID: PMC6642405 DOI: 10.1073/pnas.1903002116] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/31/2022] Open
Abstract
Global growth in antibiotic resistance is a major social problem. A high level of resistance to fluoroquinolones requires the concurrent presence of at least 3 mutations in the target proteins: 2 in DNA gyrase (GyrA) and 1 in topoisomerase IV (ParC), which occur in a stepwise manner. In the Escherichia coli chromosome, the gyrA and parC loci are positioned about 1 Mb away from each other. Here we show that the 3 fluoroquinolone resistance mutations are tightly associated genetically in naturally occurring strains. In the latest pandemic uropathogenic and multidrug-resistant E. coli clonal group ST1193, the mutant variants of gyrA and parC were acquired not by a typical gradual, stepwise evolution but all at once. This happened as part of 11 simultaneous homologous recombination events involving 2 phylogenetically distant strains of E. coli, from a uropathogenic clonal complex ST14 and fluoroquinolone-resistant ST10. The gene exchanges swapped regions between 0.5 and 139 Kb in length (183 Kb total) spread along 976 Kb of chromosomal DNA around and between gyrA and parC loci. As a result, all 3 fluoroquinolone resistance mutations in GyrA and ParC have simultaneously appeared in ST1193. Based on molecular clock estimates, this potentially happened as recently as <12 y ago. Thus, naturally occurring homologous recombination events between 2 strains can involve numerous chromosomal gene locations simultaneously, resulting in the transfer of distant but tightly associated genetic mutations and emergence of a strain that is both highly pathogenic and antibiotic-resistant, with a rapid global spread capability.
Affiliation(s)
| | - Matthew Radey
- Department of Microbiology, University of Washington, Seattle, WA 98105
| | - Sujay Chattopadhyay
- Institute of Advanced Studies and Research, JIS University, Kolkata 700091, India
| | - Lydia Larson
- Department of Microbiology, University of Washington, Seattle, WA 98105
| | - Jamie Lee Weaver
- Department of Microbiology, University of Washington, Seattle, WA 98105
| | - Dagmara Kisiela
- Department of Microbiology, University of Washington, Seattle, WA 98105
| | - Evgeni V Sokurenko
- Department of Microbiology, University of Washington, Seattle, WA 98105;
|
46
|
Kats I, Khmelinskii A, Kschonsak M, Huber F, Knieß RA, Bartosik A, Knop M. Mapping Degradation Signals and Pathways in a Eukaryotic N-terminome. Mol Cell 2019; 70:488-501.e5. [PMID: 29727619 DOI: 10.1016/j.molcel.2018.03.033] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 09/22/2017] [Revised: 01/26/2018] [Accepted: 03/27/2018] [Indexed: 01/01/2023]
Abstract
Most eukaryotic proteins are N-terminally acetylated. This modification can be recognized as a signal for selective protein degradation (degron) by the N-end rule pathways. However, the prevalence and specificity of such degrons in the proteome are unclear. Here, by systematically examining how protein turnover is affected by N-terminal sequences, we perform a comprehensive survey of degrons in the yeast N-terminome. We find that approximately 26% of nascent protein N termini encode cryptic degrons. These degrons exhibit high hydrophobicity and are frequently recognized by the E3 ubiquitin ligase Doa10, suggesting a role in protein quality control. In contrast, N-terminal acetylation rarely functions as a degron. Surprisingly, we identify two pathways where N-terminal acetylation has the opposite function and blocks protein degradation through the E3 ubiquitin ligase Ubr1. Our analysis highlights the complexity of N-terminal degrons and argues that hydrophobicity, not N-terminal acetylation, is the predominant feature of N-terminal degrons in nascent proteins.
Affiliation(s)
- Ilia Kats
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Anton Khmelinskii
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Marc Kschonsak
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Florian Huber
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Robert A Knieß
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Anna Bartosik
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Michael Knop
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany; Deutsches Krebsforschungszentrum (DKFZ), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
|
47
|
Akhmetov A, Ellington AD, Marcotte EM. A highly parallel strategy for storage of digital information in living cells. BMC Biotechnol 2018; 18:64. [PMID: 30333005 PMCID: PMC6191901 DOI: 10.1186/s12896-018-0476-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/28/2017] [Accepted: 10/01/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Encoding arbitrary digital information in DNA has attracted attention as a potential avenue for large-scale, long-term data storage. However, enabling DNA data storage technologies requires improvements in data storage fidelity (tolerance to mutation), the facility of writing and reading the data (biases and systematic error arising from synthesis and sequencing), and overall scalability. RESULTS To this end, we have developed and implemented an encoding scheme that is suitable for detecting and correcting errors that may arise during storage, writing, and reading, such as those arising from nucleotide substitutions, insertions, and deletions. We propose a scheme for parallelized long-term storage of encoded sequences that relies on overlaps rather than the address blocks found in previously published work. Using computer simulations, we illustrate the encoding, sequencing, decoding, and recovery of encoded information, ultimately demonstrating the possibility of a successful round-trip read/write. These demonstrations show that, in theory, precise control over error tolerance is possible. Even after simulated degradation of DNA, recovery of the original data is possible owing to the error correction capabilities built into the encoding strategy. A secondary advantage of our method is that the statistical characteristics (such as repetitiveness and GC composition) of encoded sequences can also be tailored without sacrificing the overall ability to store large amounts of data. Finally, the combination of the overlap-based partitioning of data with the LZMA compression that is integral to encoding means that the entire sequence must be present for successful decoding. This feature enables inordinately strong encryption. As a potential application, an encrypted pathogen genome could be distributed and carried by cells without danger of being expressed, and could not even be read out in the absence of the entire DNA consortium.
CONCLUSIONS We have developed a method for DNA encoding, using a significantly different fundamental approach from existing work, which often performs better than alternatives and allows for a great deal of freedom and flexibility of application.
Affiliation(s)
- Azat Akhmetov
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
  - Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Andrew D Ellington
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
  - Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Edward M Marcotte
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
  - Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
|
48
|
Somervuo P, Koskinen P, Mei P, Holm L, Auvinen P, Paulin L. BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing. BMC Bioinformatics 2018; 19:257. [PMID: 29976145 PMCID: PMC6034344 DOI: 10.1186/s12859-018-2262-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/05/2018] [Accepted: 06/25/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. In order to tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and basecalling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which introduces the problem of how to select the best subset of the barcodes available at a sequencing core facility for each sequencing run. There are plenty of tools available for de novo barcode design, but they are not suitable for subset selection. RESULTS We have developed a tool which can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are automatically generated and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
CONCLUSIONS The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easy to access via a web browser.
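The two constraints named in this abstract (barcodes far apart in sequence space, and balanced nucleotide usage at each position) can be illustrated with a small compatibility check. The following is only a minimal sketch in Python, not BARCOSEL itself: it does not perform the integer-programming selection the authors describe, it uses Hamming distance as an assumed distance measure, and the function names and the example barcode set are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def position_imbalance(barcodes: list[str]) -> list[int]:
    """Per position: highest minus lowest count among A, C, G, T (0 = balanced)."""
    imbalance = []
    for i in range(len(barcodes[0])):
        counts = Counter(bc[i] for bc in barcodes)
        per_base = [counts.get(base, 0) for base in "ACGT"]
        imbalance.append(max(per_base) - min(per_base))
    return imbalance


# Hypothetical four-barcode set (cyclic shifts of ACGT), for illustration only.
candidate_set = ["ACGT", "CGTA", "GTAC", "TACG"]
print(min_pairwise_distance(candidate_set))  # 4
print(position_imbalance(candidate_set))     # [0, 0, 0, 0]
```

A larger minimum distance means more sequencing errors can be tolerated before two barcodes are confused, and an imbalance of zero at every position means each nucleotide is used equally often there.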
Affiliation(s)
- Panu Somervuo
  - Mathematical Biology Group, Department of Biosciences, FIN-00014 University of Helsinki, P.O. Box 65, Finland
  - Holm Group, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
- Patrik Koskinen
  - Holm Group, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
- Peng Mei
  - DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
- Liisa Holm
  - Holm Group, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
- Petri Auvinen
  - DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
- Lars Paulin
  - DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland
|
49
|
Abstract
Modern high-throughput biological assays study pooled populations of individual members by labeling each member with a unique DNA sequence called a “barcode.” DNA barcodes are frequently corrupted by DNA synthesis and sequencing errors, leading to significant data loss and incorrect data interpretation. Here, we describe an error correction strategy to improve the efficiency and statistical power of DNA barcodes. Our strategy accurately handles insertions and deletions (indels) in DNA barcodes, the most common type of error encountered during DNA synthesis and sequencing, resulting in order-of-magnitude increases in accuracy, efficiency, and signal-to-noise ratio. The accompanying software package makes deployment of these barcodes straightforward for the broader experimental scientist community. Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. 
We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
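The design considerations listed in this abstract (balanced GC content, short homopolymer runs) and the reason indels need an edit-distance view can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FREE code construction or the authors' software: the thresholds `gc_min`, `gc_max`, and `max_run` are hypothetical defaults, hairpin propensity is not checked, and a plain Levenshtein distance stands in for the FREE divergence defined in the paper.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes_filters(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                   max_run: int = 2) -> bool:
    """Hypothetical screen: GC fraction within bounds and no long homopolymer."""
    return gc_min <= gc_fraction(seq) <= gc_max and longest_homopolymer(seq) <= max_run


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion from a
                            curr[j - 1] + 1,             # insertion into a
                            prev[j - 1] + (ca != cb)))   # match or substitution
        prev = curr
    return prev[-1]


print(passes_filters("ACGTACGT"))  # True
print(passes_filters("AAAACGCG"))  # False: homopolymer run of four A's
print(levenshtein("ACGT", "AGT"))  # 1: a single deletion
```

A single deleted base changes the barcode length, so a position-by-position comparison such as Hamming distance is undefined or misleading in that case, while an edit distance still registers it as one error.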
|
50
|
Wang B, Zheng X, Zhou S, Zhou C, Wei X, Zhang Q, Wei Z. Constructing DNA Barcode Sets Based on Particle Swarm Optimization. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:999-1002. [PMID: 28287980 DOI: 10.1109/tcbb.2017.2679004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 06/06/2023]
Abstract
Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes, attached at the beginning or end of sequencing reads, are used to assign sequences to their samples. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with existing results, some lower bounds for DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
|