1
Massively parallel screen uncovers many rare 3' UTR variants regulating mRNA abundance of cancer driver genes. Nat Commun 2024; 15:3335. [PMID: 38637555 PMCID: PMC11026479 DOI: 10.1038/s41467-024-46795-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 03/06/2024] [Indexed: 04/20/2024] Open
Abstract
Understanding the function of rare non-coding variants represents a significant challenge. Using MapUTR, a screening method, we studied the function of rare 3' UTR variants affecting mRNA abundance post-transcriptionally. Among 17,301 rare gnomAD variants, an average of 24.5% were functional, with 70% in cancer-related genes, many in critical cancer pathways. This observation motivated an interrogation of 11,929 somatic mutations, uncovering 3928 (33%) functional mutations in 155 cancer driver genes. Functional MapUTR variants were enriched in microRNA- or protein-binding sites and may underlie outlier gene expression in tumors. Further, we introduce untranslated tumor mutational burden (uTMB), a metric reflecting the amount of somatic functional MapUTR variants of a tumor and show its potential in predicting patient survival. Through prime editing, we characterized three variants in cancer-relevant genes (MFN2, FOSL2, and IRAK1), demonstrating their cancer-driving potential. Our study elucidates the function of tens of thousands of non-coding variants, nominates non-coding cancer driver mutations, and demonstrates their potential contributions to cancer.
2
Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genomics 2023; 3:100404. [PMID: 37868037 PMCID: PMC10589626 DOI: 10.1016/j.xgen.2023.100404] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/22/2022] [Revised: 02/23/2023] [Accepted: 08/21/2023] [Indexed: 10/24/2023]
Abstract
Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit an expression quantitative trait locus (eQTL) signature, suggesting that MPRA could identify as-yet-unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.
3
A multiplexed bacterial two-hybrid for rapid characterization of protein-protein interactions and iterative protein design. Nat Commun 2023; 14:4636. [PMID: 37532706 PMCID: PMC10397247 DOI: 10.1038/s41467-023-38697-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/04/2021] [Accepted: 05/11/2023] [Indexed: 08/04/2023] Open
Abstract
Protein-protein interactions (PPIs) are crucial for biological functions and have applications ranging from drug design to synthetic cell circuits. Coiled-coils have been used as a model to study the sequence determinants of specificity. However, building well-behaved sets of orthogonal pairs of coiled-coils remains challenging due to inaccurate predictions of orthogonality and difficulties in testing at scale. To address this, we develop the next-generation bacterial two-hybrid (NGB2H) method, which allows for the rapid exploration of interactions of programmed protein libraries in a quantitative and scalable way using a next-generation sequencing readout. We design, build, and test large sets of orthogonal synthetic coiled-coils, assay over 8,000 PPIs, and use the dataset to train a more accurate coiled-coil scoring algorithm (iCipa). After characterizing nearly 18,000 new PPIs, we identify, to the best of our knowledge, the largest set of orthogonal coiled-coils to date, with fifteen on-target interactions. Our approach provides a powerful tool for the design of orthogonal PPIs.
4
Discovery and Validation of Context-Dependent Synthetic Mammalian Promoters. bioRxiv 2023:2023.05.11.539703. [PMID: 37214829 PMCID: PMC10197685 DOI: 10.1101/2023.05.11.539703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Cellular transcription enables cells to adapt to various stimuli and maintain homeostasis. Transcription factors bind to transcription response elements (TREs) in gene promoters, initiating transcription. Synthetic promoters, derived from natural TREs, can be engineered to control exogenous gene expression using endogenous transcription machinery. This technology has found extensive use in biological research for applications including reporter gene assays, biomarker development, and programming synthetic circuits in living cells. However, a reliable and precise method for selecting minimally-sized synthetic promoters with desired background, amplitude, and stimulation response profiles has been elusive. In this study, we introduce a massively parallel reporter assay library containing 6184 synthetic promoters, each less than 250 bp in length. This comprehensive library allows for rapid identification of promoters with optimal transcriptional output parameters across multiple cell lines and stimuli. We showcase this library's utility to identify promoters activated in unique cell types, and in response to metabolites, mitogens, cellular toxins, and agonism of both aminergic and non-aminergic GPCRs. We further show these promoters can be used in luciferase reporter assays, eliciting 50-100 fold dynamic ranges in response to stimuli. Our platform is effective, easily implemented, and provides a solution for selecting short-length promoters with precise performance for a multitude of applications.
5
Abstract
Predicting the function of noncoding variation is a major challenge in modern genetics. In this study, we used massively parallel reporter assays to screen 5706 variants identified from genome-wide association studies for both Alzheimer's disease (AD) and progressive supranuclear palsy (PSP), identifying 320 functional regulatory variants (frVars) across 27 loci, including the complex 17q21.31 region. We identified and validated multiple risk loci using CRISPR interference or excision, including complement 4 (C4A) and APOC1 in AD and PLEKHM1 and KANSL1 in PSP. Functional variants disrupt transcription factor binding sites converging on enhancers with cell type-specific activity in PSP and AD, implicating a neuronal SP1-driven regulatory network in PSP pathogenesis. These analyses suggest that noncoding genetic risk is driven by common genetic variants through their aggregate activity on specific transcriptional programs.
6
SwabExpress: An end-to-end protocol for extraction-free COVID-19 testing. Clin Chem 2021; 68:143-152. [PMID: 34286830 DOI: 10.1093/clinchem/hvab132] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/26/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND The urgent need for massively scaled clinical testing for SARS-CoV-2, along with global shortages of critical reagents and supplies, has necessitated development of streamlined laboratory testing protocols. Conventional nucleic acid testing for SARS-CoV-2 involves collection of a clinical specimen with a nasopharyngeal swab in transport medium, nucleic acid extraction, and quantitative reverse transcription PCR (RT-qPCR) (1). As testing has scaled across the world, the global supply chain has buckled, rendering testing reagents and materials scarce (2). To address shortages, we developed SwabExpress, an end-to-end protocol developed to employ mass produced anterior nares swabs and bypass the requirement for transport media and nucleic acid extraction. METHODS We evaluated anterior nares swabs, transported dry and eluted in low-TE buffer as a direct-to-RT-qPCR alternative to extraction-dependent viral transport media. We validated our protocol of using heat treatment for viral inactivation and added a proteinase K digestion step to reduce amplification interference. We tested this protocol across archived and prospectively collected swab specimens to fine-tune test performance. RESULTS After optimization, SwabExpress has a low limit of detection at 2-4 molecules/uL, 100% sensitivity, and 99.4% specificity when compared side-by-side with a traditional RT-qPCR protocol employing extraction. On real-world specimens, SwabExpress outperforms an automated extraction system while simultaneously reducing cost and hands-on time. CONCLUSION SwabExpress is a simplified workflow that facilitates scaled testing for COVID-19 without sacrificing test performance. It may serve as a template for the simplification of PCR-based clinical laboratory tests, particularly in times of critical shortages during pandemics.
7
Massively scaled-up testing for SARS-CoV-2 RNA via next-generation sequencing of pooled and barcoded nasal and saliva samples. Nat Biomed Eng 2021; 5:657-665. [PMID: 34211145 PMCID: PMC10810734 DOI: 10.1038/s41551-021-00754-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 11/20/2020] [Accepted: 05/20/2021] [Indexed: 02/02/2023]
Abstract
Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24 h. SwabSeq could be rapidly adapted for the detection of other pathogens.
8
SwabExpress: An end-to-end protocol for extraction-free COVID-19 testing. bioRxiv 2021:2020.04.22.056283. [PMID: 32511368 PMCID: PMC7263496 DOI: 10.1101/2020.04.22.056283] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/23/2022]
Abstract
BACKGROUND The urgent need for massively scaled clinical testing for SARS-CoV-2, along with global shortages of critical reagents and supplies, has necessitated development of streamlined laboratory testing protocols. Conventional nucleic acid testing for SARS-CoV-2 involves collection of a clinical specimen with a nasopharyngeal swab in transport medium, nucleic acid extraction, and quantitative reverse transcription PCR (RT-qPCR) (1). As testing has scaled across the world, the global supply chain has buckled, rendering testing reagents and materials scarce (2). To address shortages, we developed SwabExpress, an end-to-end protocol that employs mass-produced anterior nares swabs and bypasses the requirement for transport media and nucleic acid extraction. METHODS We evaluated anterior nares swabs, transported dry and eluted in low-TE buffer, as a direct-to-RT-qPCR alternative to extraction-dependent viral transport media. We validated our protocol of using heat treatment for viral inactivation and added a proteinase K digestion step to reduce amplification interference. We tested this protocol across archived and prospectively collected swab specimens to fine-tune test performance. RESULTS After optimization, SwabExpress has a low limit of detection at 2-4 molecules/uL, 100% sensitivity, and 99.4% specificity when compared side-by-side with a traditional RT-qPCR protocol employing extraction. On real-world specimens, SwabExpress outperforms an automated extraction system while simultaneously reducing cost and hands-on time. CONCLUSION SwabExpress is a simplified workflow that facilitates scaled testing for COVID-19 without sacrificing test performance. It may serve as a template for the simplification of PCR-based clinical laboratory tests, particularly in times of critical shortages during pandemics.
9
Swab-Seq: A high-throughput platform for massively scaled up SARS-CoV-2 testing. medRxiv 2021. [PMID: 32909008 PMCID: PMC7480060 DOI: 10.1101/2020.08.04.20167874] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/23/2022]
Abstract
The rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is due to the high rates of transmission by individuals who are asymptomatic at the time of transmission (1,2). Frequent, widespread testing of the asymptomatic population for SARS-CoV-2 is essential to suppress viral transmission. Despite increases in testing capacity, multiple challenges remain in deploying traditional reverse transcription and quantitative PCR (RT-qPCR) tests at the scale required for population screening of asymptomatic individuals. We have developed SwabSeq, a high-throughput testing platform for SARS-CoV-2 that uses next-generation sequencing as a readout. SwabSeq employs sample-specific molecular barcodes to enable thousands of samples to be combined and simultaneously analyzed for the presence or absence of SARS-CoV-2 in a single run. Importantly, SwabSeq incorporates an in vitro RNA standard that mimics the viral amplicon, but can be distinguished by sequencing. This standard allows for end-point rather than quantitative PCR, improves quantitation, reduces requirements for automation and sample-to-sample normalization, enables purification-free detection, and gives better ability to call true negatives. After setting up SwabSeq in a high-complexity CLIA laboratory, we performed more than 80,000 tests for COVID-19 in less than two months, confirming in a real-world setting that SwabSeq inexpensively delivers highly sensitive and specific results at scale, with a turnaround of less than 24 hours. Our clinical laboratory uses SwabSeq to test both nasal and saliva samples without RNA extraction, while maintaining analytical sensitivity comparable to or better than traditional RT-qPCR tests. Moving forward, SwabSeq can rapidly scale up testing to mitigate devastating spread of novel pathogens.
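The barcode-plus-standard idea in this abstract can be illustrated with a minimal Python sketch. It assumes reads have already been demultiplexed into (sample barcode, amplicon) pairs. The function name `call_samples`, the labels `virus` and `spike`, and both thresholds are hypothetical choices for illustration; the abstract does not state the calling rule or cutoffs actually used.

```python
from collections import Counter

# Hypothetical thresholds; the abstract does not give the values used.
MIN_TOTAL_READS = 100   # wells with fewer reads are treated as failed
POSITIVE_RATIO = 0.1    # viral / standard read ratio needed for a positive call


def call_samples(reads):
    """Call each barcoded sample from pooled sequencing reads.

    reads: iterable of (sample_barcode, amplicon) pairs, where amplicon is
    'virus' (viral amplicon) or 'spike' (the synthetic RNA standard).
    """
    counts = Counter(reads)
    barcodes = sorted({barcode for barcode, _ in counts})
    calls = {}
    for barcode in barcodes:
        virus = counts[(barcode, "virus")]
        spike = counts[(barcode, "spike")]
        if virus + spike < MIN_TOTAL_READS:
            # Too few reads overall: the well failed, so it cannot be
            # reported as a true negative.
            calls[barcode] = "inconclusive"
        elif virus / (spike + 1) >= POSITIVE_RATIO:
            calls[barcode] = "positive"
        else:
            # The standard amplified but the viral amplicon did not.
            calls[barcode] = "negative"
    return calls


reads = (
    [("BC01", "virus")] * 400 + [("BC01", "spike")] * 600
    + [("BC02", "virus")] * 2 + [("BC02", "spike")] * 900
    + [("BC03", "spike")] * 5
)
calls = call_samples(reads)
print(calls)
```

In this sketch the standard does two jobs: its read count normalizes the viral signal within each well, and its presence separates a genuine negative from a well in which amplification failed.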
10
Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 2021; 12:325. [PMID: 33436562 PMCID: PMC7804116 DOI: 10.1038/s41467-020-20094-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/14/2020] [Accepted: 11/04/2020] [Indexed: 12/21/2022] Open
Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
11
Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 2020; 9:e62669. [PMID: 33179598 PMCID: PMC7685706 DOI: 10.7554/elife.62669] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/02/2020] [Accepted: 11/11/2020] [Indexed: 02/06/2023] Open
Abstract
Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
12
DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res 2020; 48:e95. [PMID: 32692349 PMCID: PMC7498354 DOI: 10.1093/nar/gkaa600] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/18/2019] [Revised: 06/13/2020] [Accepted: 07/11/2020] [Indexed: 01/12/2023] Open
Abstract
Multiplexed assays allow functional testing of large synthetic libraries of genetic elements, but are limited by the designability, length, fidelity and scale of the input DNA. Here, we improve DropSynth, a low-cost, multiplexed method that builds gene libraries by compartmentalizing and assembling microarray-derived oligonucleotides in vortexed emulsions. By optimizing enzyme choice, adding enzymatic error correction and increasing scale, we show that DropSynth can build thousands of gene-length fragments at >20% fidelity.
13
Structural and functional characterization of G protein-coupled receptors with deep mutational scanning. eLife 2020; 9:54895. [PMID: 33084570 PMCID: PMC7707821 DOI: 10.7554/elife.54895] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 01/05/2020] [Accepted: 10/16/2020] [Indexed: 01/14/2023] Open
Abstract
The >800 human G protein–coupled receptors (GPCRs) are responsible for transducing diverse chemical stimuli to alter cell state and are the largest class of drug targets. Their myriad structural conformations and various modes of signaling make it challenging to understand their structure and function. Here, we developed a platform to characterize large libraries of GPCR variants in human cell lines with a barcoded transcriptional reporter of G protein signal transduction. We tested 7800 of 7828 possible single amino acid substitutions to the beta-2 adrenergic receptor (β2AR) at four concentrations of the agonist isoproterenol. We identified residues specifically important for β2AR signaling, mutations in the human population that are potentially loss of function, and residues that modulate basal activity. Using unsupervised learning, we identify residues critical for signaling, including all major structural motifs and molecular interfaces. We also find a previously uncharacterized structural latch spanning the first two extracellular loops that is highly conserved across Class A GPCRs and is conformationally rigid in both the inactive and active states of the receptor. More broadly, by linking deep mutational scanning with engineered transcriptional reporters, we establish a generalizable method for exploring pharmacogenomics, structure and function across broad classes of drug receptors.
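The library size quoted above can be checked by counting: 7828 = 19 × 412, which would be consistent with 19 alternative residues at each of 412 positions. The Python sketch below enumerates single amino acid substitutions for a toy sequence. The helper `single_substitutions`, the toy sequence, and the choice to skip the start residue are assumptions made for illustration, not details taken from the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(protein, skip_start=True):
    """List every single amino acid substitution as (position, wt, mutant).

    Positions are 1-based. skip_start leaves the initiator residue
    unmutated, which is an assumption of this sketch.
    """
    first = 1 if skip_start else 0
    return [
        (index + 1, wild_type, mutant)
        for index, wild_type in enumerate(protein)
        if index >= first
        for mutant in AMINO_ACIDS
        if mutant != wild_type
    ]


toy_protein = "MKTAYIAK"  # hypothetical 8-residue sequence
variants = single_substitutions(toy_protein)
print(len(variants))  # 7 mutable positions x 19 alternatives each

# The same counting applied to the library size quoted in the abstract.
positions_implied = 7828 // 19
coverage = 7800 / 7828
print(positions_implied, f"{coverage:.1%}")
```

Under that reading, the 7800 variants tested cover about 99.6% of the possible single substitutions.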
14
Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays. Cell Syst 2020; 11:75-85.e7. [PMID: 32603702 DOI: 10.1016/j.cels.2020.05.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/21/2019] [Revised: 02/16/2020] [Accepted: 05/26/2020] [Indexed: 11/15/2022]
Abstract
In eukaryotes, transcription factors (TFs) orchestrate gene expression by binding to TF-binding sites (TFBSs) and localizing transcriptional co-regulators and RNA polymerase II to cis-regulatory elements. However, we lack a basic understanding of the relationship between TFBS composition and their quantitative transcriptional responses. Here, we measured expression driven by 17,406 synthetic cis-regulatory elements with varied compositions of a model TFBS, the c-AMP response element (CRE), by using massively parallel reporter assays (MPRAs). We find that CRE number, affinity, and promoter proximity largely determine expression. In addition, we observe expression modulation based on the spacing between CREs and CRE distance to the promoter, where expression follows a helical periodicity. Finally, we compare library expression between an episomal MPRA and a genomically integrated MPRA, where a single cis-regulatory element is assayed per cell at a defined locus. These assays largely recapitulate each other, although weaker, non-canonical CREs exhibit greater activity in a genomic context.
15
A Scalable, Multiplexed Assay for Decoding GPCR-Ligand Interactions with RNA Sequencing. Cell Syst 2019; 8:254-260.e6. [PMID: 30904378 PMCID: PMC6907015 DOI: 10.1016/j.cels.2019.02.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/21/2018] [Revised: 01/16/2019] [Accepted: 02/26/2019] [Indexed: 12/13/2022]
Abstract
G protein-coupled receptors (GPCRs) are central to how mammalian cells sense and respond to chemicals. Mammalian olfactory receptors (ORs), the largest family of GPCRs, mediate the sense of smell through activation by small molecules, though for most ORs, bona fide ligands have not been identified. Here, we introduce a platform to screen large chemical panels against multiplexed GPCR libraries using next-generation sequencing of barcoded genetic reporters in stably engineered human cell lines. We mapped 39 mammalian ORs against 181 odorants and identified 79 interactions that, to our knowledge, have not been reported, including ligands for 15 previously orphaned receptors. This multiplexed receptor assay allows the cost-effective mapping of large chemical libraries to receptor repertoires at scale.
16
A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol Cell 2019; 73:183-194.e8. [PMID: 30503770 PMCID: PMC6599603 DOI: 10.1016/j.molcel.2018.10.037] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 04/12/2018] [Revised: 07/19/2018] [Accepted: 10/23/2018] [Indexed: 11/23/2022]
Abstract
Mutations that lead to splicing defects can have severe consequences on gene function and cause disease. Here, we explore how human genetic variation affects exon recognition by developing a multiplexed functional assay of splicing using Sort-seq (MFASS). We assayed 27,733 variants in the Exome Aggregation Consortium (ExAC) within or adjacent to 2,198 human exons in the MFASS minigene reporter and found that 3.8% (1,050) of variants, most of which are extremely rare, were large-effect splice-disrupting variants (SDVs). Importantly, we find that 83% of SDVs are located outside of canonical splice sites, are distributed evenly across distinct exonic and intronic regions, and are difficult to predict a priori. Our results indicate that extant, rare genetic variants can have large functional effects on splicing at appreciable rates, even outside the context of disease, and that MFASS enables their empirical assessment at scale.
17
Systematic Dissection of Sequence Elements Controlling σ70 Promoters Using a Genomically Encoded Multiplexed Reporter Assay in Escherichia coli. Biochemistry 2018; 58:1539-1551. [PMID: 29388765 DOI: 10.1021/acs.biochem.7b01069] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 11/28/2022]
Abstract
Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In Escherichia coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our data set using a simple log-linear statistical model. Simple neural network models explain >95% of the variance in our data set by capturing nonlinear interactions with the spacer, background, and UP elements.
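To make the "simple log-linear statistical model" concrete, here is a minimal Python sketch on simulated data. It treats log expression as the sum of a -35 effect and a -10 effect and reports the fraction of variance those two terms explain. All element names, effect sizes, and the noise level are hypothetical; the noise term stands in for everything the two-factor model leaves out (spacer, background, UP elements, and measurement error). This is not the authors' fitting code.

```python
import random
from itertools import product
from statistics import mean

random.seed(0)

# Hypothetical additive effects on log expression (not values from the study).
minus35 = {"m35_a": 0.0, "m35_b": 1.0, "m35_c": 2.0}
minus10 = {"m10_a": 0.0, "m10_b": 1.5, "m10_c": 3.0}

# Simulate a full factorial library:
# log(expression) = effect(-35) + effect(-10) + unexplained term.
log_expr = {
    (a, b): minus35[a] + minus10[b] + random.gauss(0, 0.3)
    for a, b in product(minus35, minus10)
}

# For a balanced factorial design, the least-squares fit of an additive
# model reduces to marginal means around the grand mean.
grand = mean(log_expr.values())
effect35 = {a: mean(log_expr[(a, b)] for b in minus10) - grand for a in minus35}
effect10 = {b: mean(log_expr[(a, b)] for a in minus35) - grand for b in minus10}

predicted = {(a, b): grand + effect35[a] + effect10[b] for a, b in log_expr}

ss_residual = sum((log_expr[k] - predicted[k]) ** 2 for k in log_expr)
ss_total = sum((log_expr[k] - grand) ** 2 for k in log_expr)
r_squared = 1 - ss_residual / ss_total
print(f"variance explained by -35 and -10 elements: {r_squared:.2f}")
```

A model of this form has no interaction terms, so any dependence of the -35 effect on the spacer or background ends up in the residual. That is the portion the abstract reports neural network models recovering.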
18
Abstract
Understanding the functional effects of DNA sequence variants is of critical importance for studies of basic biology, evolution, and medical genetics; however, measuring these effects in a high-throughput manner is a major challenge. One promising avenue is precise editing with the CRISPR-Cas9 system, which allows for generation of DNA double-strand breaks (DSBs) at genomic sites matching the targeting sequence of a guide RNA (gRNA). Recent studies have used CRISPR libraries to generate many frameshift mutations genome wide through faulty repair of CRISPR-directed breaks by nonhomologous end joining (NHEJ) (1). Here, we developed a CRISPR-library-based approach for highly efficient and precise genome-wide variant engineering. We used our method to examine the functional consequences of premature-termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the 3' end of the gene and did not affect an annotated protein domain. Unexpectedly, we discovered that some putatively essential genes are dispensable, whereas others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.
19
Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science 2018; 359:343-347. [PMID: 29301959 PMCID: PMC6261299 DOI: 10.1126/science.aao5167] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 07/28/2017] [Accepted: 12/18/2017] [Indexed: 12/14/2022]
Abstract
Improving our ability to construct and functionally characterize DNA sequences would broadly accelerate progress in biology. Here, we introduce DropSynth, a scalable, low-cost method to build thousands of defined gene-length constructs in a pooled (multiplexed) manner. DropSynth uses a library of barcoded beads that pull down the oligonucleotides necessary for a gene's assembly, which are then processed and assembled in water-in-oil emulsions. We used DropSynth to successfully build more than 7000 synthetic genes that encode phylogenetically diverse homologs of two essential genes in Escherichia coli. We tested the ability of phosphopantetheine adenylyltransferase homologs to complement a knockout E. coli strain in multiplex, revealing core functional motifs and reasons underlying homolog incompatibility. DropSynth coupled with multiplexed functional assays allows us to rationally explore sequence-function relationships at an unprecedented scale.
20
A systematic comparison of error correction enzymes by next-generation sequencing. Nucleic Acids Res 2017; 45:9206-9217. [PMID: 28911123 PMCID: PMC5587813 DOI: 10.1093/nar/gkx691] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/16/2017] [Revised: 07/14/2017] [Accepted: 07/31/2017] [Indexed: 11/13/2022] Open
Abstract
Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are a cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities: for example, ErrASE preferentially corrects C/G transversions, whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.
|
21
|
|
22
|
Abstract
Multiplex Automated Genome Engineering (MAGE) allows simultaneous mutagenesis of multiple target sites in bacterial genomes using short oligonucleotides. However, large-scale mutagenesis requires hundreds to thousands of unique oligos, which are costly to synthesize and impossible to scale up by traditional phosphoramidite column-based approaches. Here, we describe a novel method to amplify oligos from microarray chips for direct use in MAGE to perturb thousands of genomic sites simultaneously. We demonstrated the feasibility of large-scale mutagenesis by inserting T7 promoters upstream of 2585 operons in E. coli using this method, which we call Microarray-Oligonucleotide (MO)-MAGE. The resulting mutant library was characterized by high-throughput sequencing to show that all attempted insertions were estimated to have occurred at an average frequency of 0.02% per locus with 0.4 average insertions per cell. MO-MAGE enables cost-effective large-scale targeted genome engineering that should be useful for a variety of applications in synthetic biology and metabolic engineering.
|
23
|
Large-scale de novo DNA synthesis: technologies and applications. Nat Methods 2014; 11:499-507. [PMID: 24781323 PMCID: PMC7098426 DOI: 10.1038/nmeth.2918] [Citation(s) in RCA: 462] [Impact Index Per Article: 46.2] [Received: 12/10/2013] [Accepted: 03/10/2014] [Indexed: 12/23/2022]
Abstract
For over 60 years, the synthetic production of new DNA sequences has helped researchers understand and engineer biology. Here we summarize methods and caveats for the de novo synthesis of DNA, with particular emphasis on recent technologies that allow for large-scale and low-cost production. In addition, we discuss emerging applications enabled by large-scale de novo DNA constructs, as well as the challenges and opportunities that lie ahead.
|
24
|
|
25
|
Abstract
Most amino acids are encoded by multiple codons, and codon choice has strong effects on protein expression. Rare codons are enriched at the N terminus of genes in most organisms, although the causes and effects of this bias are unclear. Here, we measure expression from >14,000 synthetic reporters in Escherichia coli and show that using N-terminal rare codons instead of common ones increases expression by ~14-fold (median 4-fold). We quantify how individual N-terminal codons affect expression and show that these effects shape the sequence of natural genes. Finally, we demonstrate that reduced RNA structure and not codon rarity itself is responsible for expression increases. Our observations resolve controversies over the roles of N-terminal codon bias and suggest a straightforward method for optimizing heterologous gene expression in bacteria.
|
26
|
CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol 2013; 31:833-8. [PMID: 23907171 PMCID: PMC3818127 DOI: 10.1038/nbt.2675] [Citation(s) in RCA: 1304] [Impact Index Per Article: 118.5] [Received: 04/19/2013] [Accepted: 07/24/2013] [Indexed: 02/07/2023]
Abstract
Prokaryotic type II CRISPR-Cas systems can be adapted to enable targeted genome modifications across a range of eukaryotes. Here we engineer this system to enable RNA-guided genome regulation in human cells by tethering transcriptional activation domains either directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide RNA (sgRNA). Using this functionality we developed a transcriptional activation-based assay to determine the landscape of off-target binding of sgRNA:Cas9 complexes and compared it with the off-target activity of transcription activator-like (TAL) effectors. Our results reveal that specificity profiles are sgRNA dependent, and that sgRNA:Cas9 complexes and 18-mer TAL effectors can potentially tolerate 1-3 and 1-2 target mismatches, respectively. By engineering a requirement for cooperativity through offset nicking for genome editing or through multiple synergistic sgRNAs for robust transcriptional activation, we suggest methods to mitigate off-target phenomena. Our results expand the versatility of the sgRNA:Cas9 tool and highlight the critical need to engineer improved specificity.
|
27
|
Abstract
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing makes DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
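To make the idea of encoding arbitrary digital information in DNA concrete, here is a minimal sketch that stores one bit per base: a 0 becomes A or C and a 1 becomes G or T, choosing whichever base differs from the previous one so that no base repeats back to back. This mapping and all function names are assumptions for illustration; the abstract does not specify the encoding scheme.

```python
ZERO_BASES = "AC"  # assumed: either base encodes a 0 bit
ONE_BASES = "GT"   # assumed: either base encodes a 1 bit


def bytes_to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def encode(data: bytes) -> str:
    """Encode bytes as DNA, one bit per base, never repeating the previous base."""
    bases = []
    for bit in bytes_to_bits(data):
        options = ONE_BASES if bit == "1" else ZERO_BASES
        base = options[0]
        if bases and bases[-1] == base:
            base = options[1]  # switch to the alternative to avoid a repeat
        bases.append(base)
    return "".join(bases)


def decode(seq: str) -> bytes:
    """Recover the original bytes from a sequence produced by encode()."""
    bits = []
    for base in seq:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    if len(bits) % 8 != 0:
        raise ValueError("sequence length must be a multiple of 8 bases")
    joined = "".join(bits)
    return bytes(int(joined[i:i + 8], 2) for i in range(0, len(joined), 8))


if __name__ == "__main__":
    message = b"DNA"
    sequence = encode(message)
    print(sequence)
    print(decode(sequence))
```

Using two interchangeable bases per bit trades storage density for sequences that are easier to synthesize and read, since homopolymer runs are avoided by construction.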
|
28
|
|
29
|
Abstract
De novo synthesis of long double-stranded DNA constructs has a myriad of applications in biology and biological engineering. However, its widespread adoption has been hindered by high costs. Cost can be significantly reduced by using oligonucleotides synthesized on high-density DNA chips. However, most methods for using off-chip DNA for gene synthesis have failed to scale due to the high error rates, low yields, and high chemical complexity of the chip-synthesized oligonucleotides. We have recently demonstrated that some commercial DNA chip manufacturers have improved error rates, and that the issues of chemical complexity and low yields can be solved by using barcoded primers to accurately and efficiently amplify subpools of oligonucleotides. This article includes protocols for computationally designing the DNA chip, amplifying the oligonucleotide subpools, and assembling 500-800 basepair (bp) constructs.
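The subpool amplification step can be pictured computationally: if each chip oligo is flanked by a subpool-specific primer pair, amplifying with one pair selects only the matching oligos. The following is a minimal in-silico sketch, not the article's protocol; the sequences, primer lengths, and function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def select_subpool(oligos: List[str], fwd_primer: str, rev_primer: str) -> List[str]:
    """Return the payloads of oligos flanked by the given primer pair.

    An oligo matches when it starts with `fwd_primer` and ends with the
    reverse complement of `rev_primer`; the flanking sites are trimmed off.
    """
    rev_site = reverse_complement(rev_primer)
    flank_len = len(fwd_primer) + len(rev_site)
    payloads = []
    for oligo in oligos:
        if (
            len(oligo) >= flank_len
            and oligo.startswith(fwd_primer)
            and oligo.endswith(rev_site)
        ):
            payloads.append(oligo[len(fwd_primer):len(oligo) - len(rev_site)])
    return payloads


if __name__ == "__main__":
    # Hypothetical barcoded primer pairs for two subpools.
    fwd_a, rev_a = "ACGTAC", "GGATCA"
    fwd_b, rev_b = "TTGACC", "CATGGA"
    pool = [
        fwd_a + "AAAACCCC" + reverse_complement(rev_a),
        fwd_b + "GGGGTTTT" + reverse_complement(rev_b),
        fwd_a + "ACACACAC" + reverse_complement(rev_a),
    ]
    print(select_subpool(pool, fwd_a, rev_a))
```

Real amplification tolerates mismatches and depends on primer orthogonality, so exact string matching here is only a stand-in for the selection logic.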
|
30
|
TABASCO: A single molecule, base-pair resolved gene expression simulator. BMC Bioinformatics 2007; 8:480. [PMID: 18093293 PMCID: PMC2242808 DOI: 10.1186/1471-2105-8-480] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 05/01/2007] [Accepted: 12/19/2007] [Indexed: 11/16/2022] Open
Abstract
Background: Experimental studies of gene expression have identified some of the individual molecular components and elementary reactions that comprise and control cellular behavior. Given our current understanding of gene expression, and the goals of biotechnology research, both scientists and engineers would benefit from detailed simulators that can explicitly compute genome-wide expression levels as a function of individual molecular events, including the activities and interactions of molecules on DNA at single base pair resolution. However, for practical reasons including computational tractability, available simulators have not been able to represent genome-scale models of gene expression at this level of detail.
Results: Here we develop a simulator, TABASCO, which enables the precise representation of individual molecules and events in gene expression for genome-scale systems. We use a single molecule computational engine to track individual molecules interacting with and along nucleic acid polymers at single base resolution. Tabasco uses logical rules to automatically update and delimit the set of species and reactions that comprise a system during simulation, thereby avoiding the need for a priori specification of all possible combinations of molecules and reaction events. We confirm that single molecule, base-pair resolved simulation using TABASCO (Tabasco) can accurately compute gene expression dynamics and, moving beyond previous simulators, provide for the direct representation of intermolecular events such as polymerase collisions and promoter occlusion. We demonstrate the computational capacity of Tabasco by simulating the entirety of gene expression during bacteriophage T7 infection; for reference, the 39,937 base pair T7 genome encodes 56 genes that are transcribed by two types of RNA polymerases active across 22 promoters.
Conclusion: Tabasco enables genome-scale simulation of transcription and translation at individual molecule and single base-pair resolution. By directly representing the position and activity of individual molecules on DNA, Tabasco can directly test the effects of detailed molecular processes on system-wide gene expression. Tabasco would also be useful for studying the complex regulatory mechanisms controlling eukaryotic gene expression. The computational engine underlying Tabasco could also be adapted to represent other types of processive systems in which individual reaction events are organized across a single spatial dimension (e.g., polysaccharide synthesis).
|
31
|
Abstract
Natural biological systems are selected by evolution to continue to exist and evolve. Evolution likely gives rise to complicated systems that are difficult to understand and manipulate. Here, we redesign the genome of a natural biological system, bacteriophage T7, in order to specify an engineered surrogate that, if viable, would be easier to study and extend. Our initial design goals were to physically separate and enable unique manipulation of primary genetic elements. Implicit in our design are the hypotheses that overlapping genetic elements are, in aggregate, nonessential for T7 viability and that our models for the functions encoded by elements are sufficient. To test our initial design, we replaced the left 11 515 base pairs (bp) of the 39 937 bp wild-type genome with 12 179 bp of engineered DNA. The resulting chimeric genome encodes a viable bacteriophage that appears to maintain key features of the original while being simpler to model and easier to manipulate. The viability of our initial design suggests that the genomes encoding natural biological systems can be systematically redesigned and built anew in service of scientific understanding or human intention.
MESH Headings
- Algorithms
- Bacteriophage T7/genetics
- Bacteriophage T7/growth & development
- Bacteriophage T7/physiology
- Base Pairing
- DNA, Recombinant/chemical synthesis
- DNA, Recombinant/genetics
- DNA, Viral/genetics
- Escherichia coli/virology
- Genes, Overlapping
- Genes, Viral
- Genetic Engineering
- Genome, Viral
- Models, Biological
- Models, Genetic
- Molecular Sequence Data
- Organisms, Genetically Modified/genetics
- Organisms, Genetically Modified/growth & development
- Organisms, Genetically Modified/physiology
- Sequence Deletion
- Systems Biology/methods
- Transfection
- Viral Proteins/genetics
- Viral Proteins/physiology
- Virus Replication
|
32
|
Solid and papillary epithelial neoplasm of the pancreas in an 11-year-old girl: case report and literature review. Pediatr Hematol Oncol 2003; 20:357-60. [PMID: 12775532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
An 11-year-old girl presented with episodic abdominal pain of 2 years' duration. CT scan of the abdomen showed a mass in the tail of the pancreas. A distal pancreatectomy was done and the tumor was excised. Macroscopic and immunohistochemical studies were compatible with a solid and papillary epithelial neoplasm. This is a rare neoplasm with a decidedly female predominance. It has a very low malignant potential with a good prognosis. Surgical removal of the tumor is usually curative.
|
33
|
Abstract
The authors report on 2 children with pernicious anemia, sisters, who presented with hypermelanosis as one of the clinical manifestations. The hypermelanosis disappeared with adequate treatment of vitamin B12 deficiency. Vitamin B12 deficiency should be considered in the differential diagnosis of a child presenting with hyperpigmentation and macrocytic red cell indices.
|
34
|
Fever, hemorrhagic bullae and gastritis in a 20-month-old male. ANNALS OF ALLERGY 1989; 63:179-83. [PMID: 2774301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2023]
|