1
Gunamalai L, Singh P, Berg B, Shi L, Sanchez E, Smith A, Breton G, Bedford MT, Balciunas D, Kapoor A. Functional characterization of QT interval associated SCN5A enhancer variants identify combined additive effects. HGG Advances 2024; 6:100358. [PMID: 39354714 DOI: 10.1016/j.xhgg.2024.100358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/26/2024] [Accepted: 09/26/2024] [Indexed: 10/03/2024] Open
Abstract
Several empirical and theoretical studies suggest the presence of multiple enhancers per gene that collectively regulate gene expression, and that common sequence variation affecting the activities of these enhancers is a major source of inter-individual gene expression variability. However, for the vast majority of genes, enhancers and the underlying regulatory variation remain unknown. Even for the genes with well-characterized enhancers, the nature of the combined effects from multiple enhancers and their variants, when known, on gene expression regulation remains unexplored. Here, we have evaluated the combined effects from five SCN5A enhancers and their regulatory variants that are known to collectively correlate with SCN5A cardiac expression and underlie QT interval association in the general population. Using small deletions centered at the regulatory variants in episomal reporter assays in a mouse cardiomyocyte cell line, we demonstrate that the variants and their flanking sequences play a critical role in individual enhancer activities, likely as transcription factor (TF) binding sites. By oligonucleotide-based pulldown assays on predicted TFs, we identify the TFs likely driving allele-specific enhancer activities. Using all 32 possible allelic synthetic constructs in reporter assays, representing the five bi-allelic enhancers, we demonstrate combined additive effects on overall enhancer activities. Using transient enhancer assays in zebrafish embryos, we demonstrate that four elements act as enhancers in vivo. Together, these studies uncover the TFs driving the enhancer activities of QT interval associated SCN5A regulatory variants, reveal the additive effects from allelic combinations of these regulatory variants, and prove their potential to act as enhancers in vivo.
Affiliation(s)
- Lavanya Gunamalai
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Parul Singh
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Brian Berg
- Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Leilei Shi
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ernesto Sanchez
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Alexa Smith
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ghislain Breton
- Department of Integrative Biology and Pharmacology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Mark T Bedford
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Darius Balciunas
- Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Ashish Kapoor
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
2
Gervais NC, Shapiro RS. Discovering the hidden function in fungal genomes. Nat Commun 2024; 15:8219. [PMID: 39300175 DOI: 10.1038/s41467-024-52568-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 09/11/2024] [Indexed: 09/22/2024] Open
Abstract
New molecular technologies have helped unveil previously unexplored facets of the genome beyond the canonical proteome, including microproteins and short ORFs, products of alternative splicing, regulatory non-coding RNAs, as well as transposable elements, cis-regulatory DNA, and other highly repetitive regions of DNA. In this Review, we highlight what is known about this 'hidden genome' within the fungal kingdom. Using well-established model systems as a contextual framework, we describe key elements of this hidden genome in diverse fungal species, and explore how these factors perform critical functions in regulating fungal metabolism, stress tolerance, and pathogenesis. Finally, we discuss new technologies that may be adapted to further characterize the hidden genome in fungi.
Affiliation(s)
- Nicholas C Gervais
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
- Rebecca S Shapiro
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada.
3
Renganaath K, Albert FW. Trans-eQTL hotspots shape complex traits by modulating cellular states. bioRxiv: The Preprint Server for Biology 2024:2023.11.14.567054. [PMID: 38014174 PMCID: PMC10680915 DOI: 10.1101/2023.11.14.567054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Regulatory genetic variation shapes gene expression, providing an important mechanism connecting DNA variation and complex traits. The causal relationships between gene expression and complex traits remain poorly understood. Here, we integrated transcriptomes and 46 genetically complex growth traits in a large cross between two strains of the yeast Saccharomyces cerevisiae. We discovered thousands of genetic correlations between gene expression and growth, suggesting potential functional connections. Local regulatory variation was a minor source of these genetic correlations. Instead, genetic correlations tended to arise from multiple independent trans-acting regulatory loci. Trans-acting hotspots that affect the expression of numerous genes accounted for particularly large fractions of genetic growth variation and of genetic correlations between gene expression and growth. Genes with genetic correlations were enriched for similar biological processes across traits, but with heterogeneous direction of effect. Our results reveal how trans-acting regulatory hotspots shape complex traits by altering cellular states.
Affiliation(s)
- Kaushik Renganaath
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, MN 55455, USA
- Frank W Albert
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, MN 55455, USA
4
Hassan MM, Tenazas F, Williams A, Chiu JW, Robin C, Russell DA, Golz JF. Minimizing IP issues associated with gene constructs encoding the Bt toxin - a case study. BMC Biotechnol 2024; 24:37. [PMID: 38825715 PMCID: PMC11145813 DOI: 10.1186/s12896-024-00864-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 05/27/2024] [Indexed: 06/04/2024] Open
Abstract
BACKGROUND As part of a publicly funded initiative to develop genetically engineered Brassicas (cabbage, cauliflower, and canola) expressing Bacillus thuringiensis Crystal (Cry)-encoded insecticidal (Bt) toxin for Indian and Australian farmers, we designed several constructs that drive high-level expression of modified Cry1B and Cry1C genes (referred to as Cry1BM and Cry1CM; with M indicating modified). The two main motivations for modifying the DNA sequences of these genes were to minimise any licensing cost associated with the commercial cultivation of transgenic crop plants expressing CryM genes, and to remove or alter sequences that might adversely affect their activity in plants. RESULTS To assess the insecticidal efficacy of the Cry1BM/Cry1CM genes, constructs were introduced into the model Brassica Arabidopsis thaliana in which Cry1BM/Cry1CM expression was directed from either single (S4/S7) or double (S4S4/S7S7) subterranean clover stunt virus (SCSV) promoters. The resulting transgenic plants displayed a high level of Cry1BM/Cry1CM expression. Protein accumulation for Cry1CM ranged from 5.18 to 176.88 µg Cry1CM/g dry weight of leaves. Contrary to previous work on stunt promoters, we found no correlation between the use of either single or double stunt promoters and the expression levels of Cry1BM/Cry1CM genes, with a similar range of Cry1CM transcript abundance and protein content observed from both constructs. First instar Diamondback moth (Plutella xylostella) larvae fed on transgenic Arabidopsis leaves expressing the Cry1BM/Cry1CM genes showed 100% mortality, with a mean leaf damage score on a scale of zero to five of 0.125 for transgenic leaves and 4.2 for wild-type leaves. CONCLUSIONS Our work indicates that the modified Cry1 genes are suitable for the development of insect-resistant GM crops. Except for the PAT gene in the USA, our assessment of the intellectual property landscape of the components present within the constructs described here suggests that they can be used without the need for further licensing. This has the capacity to significantly reduce the cost of developing and using these Cry1M genes in GM crop plants in the future.
Affiliation(s)
- Md Mahmudul Hassan
- School of Biosciences, University of Melbourne, Parkville, VIC, 3010, Australia
- Department of Genetics and Plant Breeding, Patuakhali Science and Technology University, Dumki, Patuakhali, 8602, Bangladesh
- Francis Tenazas
- School of Biosciences, University of Melbourne, Parkville, VIC, 3010, Australia
- Adam Williams
- School of Biosciences, University of Melbourne, Parkville, VIC, 3010, Australia
- Jing-Wen Chiu
- School of Agriculture, Food and Ecosystem Sciences, University of Melbourne, Parkville, VIC, 3010, Australia
- Charles Robin
- School of Biosciences, University of Melbourne, Parkville, VIC, 3010, Australia
- Derek A Russell
- Melbourne Veterinary School, University of Melbourne, Parkville, VIC, 3010, Australia
- John F Golz
- School of Biosciences, University of Melbourne, Parkville, VIC, 3010, Australia.
5
Zeng B, Bendl J, Deng C, Lee D, Misir R, Reach SM, Kleopoulos SP, Auluck P, Marenco S, Lewis DA, Haroutunian V, Ahituv N, Fullard JF, Hoffman GE, Roussos P. Genetic regulation of cell type-specific chromatin accessibility shapes brain disease etiology. Science 2024; 384:eadh4265. [PMID: 38781378 DOI: 10.1126/science.adh4265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 12/20/2023] [Indexed: 05/25/2024]
Abstract
Nucleotide variants in cell type-specific gene regulatory elements in the human brain are risk factors for human disease. We measured chromatin accessibility in 1932 aliquots of sorted neurons and non-neurons from 616 human postmortem brains and identified 34,539 open chromatin regions with chromatin accessibility quantitative trait loci (caQTLs). Only 10.4% of caQTLs are shared between neurons and non-neurons, which supports cell type-specific genetic regulation of the brain regulome. Incorporating allele-specific chromatin accessibility improves statistical fine-mapping and refines molecular mechanisms that underlie disease risk. Using massively parallel reporter assays in induced excitatory neurons, we screened 19,893 brain QTLs and identified the functional impact of 476 regulatory variants. Combined, this comprehensive resource captures variation in the human brain regulome and provides insights into disease etiology.
Affiliation(s)
- Biao Zeng
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Chengyu Deng
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA
- Donghoon Lee
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ruth Misir
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sarah M Reach
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Steven P Kleopoulos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pavan Auluck
- Human Brain Collection Core, National Institute of Mental Health-Intramural Research Program, Bethesda, MD 20892, USA
- Stefano Marenco
- Human Brain Collection Core, National Institute of Mental Health-Intramural Research Program, Bethesda, MD 20892, USA
- David A Lewis
- Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Vahram Haroutunian
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters VA Medical Center, Bronx, NY 10468, USA
6
Siraj L, Castro RI, Dewey H, Kales S, Nguyen TTL, Kanai M, Berenzy D, Mouri K, Wang QS, McCaw ZR, Gosai SJ, Aguet F, Cui R, Vockley CM, Lareau CA, Okada Y, Gusev A, Jones TR, Lander ES, Sabeti PC, Finucane HK, Reilly SK, Ulirsch JC, Tewhey R. Functional dissection of complex and molecular trait variants at single nucleotide resolution. bioRxiv: The Preprint Server for Biology 2024:2024.05.05.592437. [PMID: 38766054 PMCID: PMC11100724 DOI: 10.1101/2024.05.05.592437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell types to measure the activity of 221,412 trait-associated variants that had been statistically fine-mapped. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
Affiliation(s)
- Layla Siraj
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biophysics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Harvard-Massachusetts Institute of Technology MD/PhD Program, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
- Qingbo S. Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Sager J. Gosai
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- François Aguet
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caleb A. Lareau
- Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- Thouis R. Jones
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric S. Lander
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Pardis C. Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Hilary K. Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Jacob C. Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
7
Park S, Kim M, Lee JW. Optimizing Nucleic Acid Delivery Systems through Barcode Technology. ACS Synth Biol 2024; 13:1006-1018. [PMID: 38526308 DOI: 10.1021/acssynbio.3c00602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Conventional biological experiments often focus on in vitro assays because of the inherent limitations when handling multiple variables in vivo, including labor-intensive and time-consuming procedures. Often only a subset of samples demonstrating significant efficacy in the in vitro assays can be evaluated in vivo. Nonetheless, because of the low correlation between the in vitro and in vivo tests, evaluation of the variables under examination in vivo and not solely in vitro is critical. An emerging approach to achieve high-throughput in vivo tests involves using a barcode system consisting of various nucleotide combinations. Unique barcodes for each variant enable the simultaneous testing of multiple entities, eliminating the need for separate individual tests. Subsequently, samples are collected and analyzed by barcode sequencing to identify crucial parameters. This review explores the development of barcode design and its applications, including the evaluation of nucleic acid delivery systems and the optimization of gene expression in vivo.
Affiliation(s)
- Soan Park
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- Mibang Kim
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- Jeong Wook Lee
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
8
Gunamalai L, Singh P, Berg B, Shi L, Sanchez E, Smith A, Breton G, Bedford MT, Balciunas D, Kapoor A. Functional characterization of QT interval associated SCN5A enhancer variants identify combined additive effects. bioRxiv: The Preprint Server for Biology 2024:2024.03.11.584440. [PMID: 38559211 PMCID: PMC10979898 DOI: 10.1101/2024.03.11.584440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Several empirical and theoretical studies suggest the presence of multiple enhancers per gene that collectively regulate gene expression, and that common sequence variation affecting the activities of these enhancers is a major source of inter-individual variability in gene expression. However, for the vast majority of genes, enhancers and the underlying regulatory variation remain unknown. Even for the genes with well-characterized enhancers, the nature of the combined effects from multiple enhancers and their variants, when known, on gene expression regulation remains unexplored. Here, we have evaluated the combined effects from five SCN5A enhancers and their regulatory variants that are known to collectively correlate with SCN5A cardiac expression and underlie QT interval association in the general population. Using small deletions centered at the regulatory variants in episomal reporter assays in a mouse cardiomyocyte cell line, we demonstrate that the variants and their flanking sequences play a critical role in individual enhancer activities, likely as transcription factor (TF) binding sites. By performing oligonucleotide-based pulldown assays on predicted TFs, we identify the TFs likely driving allele-specific enhancer activities. Using all 32 possible allelic synthetic constructs in reporter assays, representing the five biallelic enhancers in tandem in their genomic order, we demonstrate combined additive effects on overall enhancer activities. Using transient enhancer assays in developing zebrafish embryos, we demonstrate that four out of the five enhancer elements act as enhancers in vivo. Together, these studies extend the previous findings to uncover the TFs driving the enhancer activities of QT interval associated SCN5A regulatory variants, reveal the additive effects from allelic combinations of these regulatory variants, and prove their potential to act as enhancers in vivo.
9
Jiao Y, Nigam D, Barry K, Daum C, Yoshinaga Y, Lipzen A, Khan A, Parasa SP, Wei S, Lu Z, Tello-Ruiz MK, Dhiman P, Burow G, Hayes C, Chen J, Brandizzi F, Mortimer J, Ware D, Xin Z. A large sequenced mutant library - valuable reverse genetic resource that covers 98% of sorghum genes. The Plant Journal: For Cell and Molecular Biology 2024; 117:1543-1557. [PMID: 38100514 DOI: 10.1111/tpj.16582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/16/2023] [Revised: 09/08/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023]
Abstract
Mutant populations are crucial for functional genomics and discovering novel traits for crop breeding. Sorghum, a drought- and heat-tolerant C4 species, requires a vast, large-scale, annotated, and sequenced mutant resource to enhance crop improvement through functional genomics research. Here, we report a sorghum large-scale sequenced mutant population with 9.5 million ethyl methane sulfonate (EMS)-induced mutations that covered 98% of sorghum's annotated genes using inbred line BTx623. Remarkably, a total of 610 320 mutations within the promoter and enhancer regions of 18 000 and 11 790 genes, respectively, can be leveraged for novel research of cis-regulatory elements. A comparison of the distribution of mutations in the large-scale mutant library and sorghum association panel (SAP) provides insights into the influence of selection. EMS-induced mutations appeared to be random across different regions of the genome without significant enrichment in different sections of a gene, including the 5' UTR, gene body, and 3' UTR. In contrast, there was low variation density in the coding and UTR regions in the SAP. Based on the Ka/Ks value, the mutant library (~1) experienced little selection, unlike the SAP (0.40), which has been strongly selected through breeding. All mutation data are publicly searchable through SorbMutDB (https://www.depts.ttu.edu/igcast/sorbmutdb.php) and SorghumBase (https://sorghumbase.org/). This current large-scale sequence-indexed sorghum mutant population is a crucial resource that enriches the sorghum gene pool with novel diversity and a highly valuable tool for the Poaceae family that will advance plant biology research and crop breeding.
Affiliation(s)
- Yinping Jiao
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
- Deepti Nigam
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
- Kerrie Barry
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Chris Daum
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
| | - Yuko Yoshinaga
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
| | - Anna Lipzen
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
| | - Adil Khan
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
| | - Sai-Praneeth Parasa
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
| | - Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
| | - Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
| | | | - Pallavi Dhiman
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
| | - Gloria Burow
- Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
| | - Chad Hayes
- Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
| | - Junping Chen
- Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
| | - Federica Brandizzi
- MSU-DOE Plant Research Lab, Michigan State University, East Lansing, Michigan, USA
- Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, USA
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
| | - Jenny Mortimer
- Joint BioEnergy Institute, Emeryville, California, 94608, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California, 94720, USA
- School of Agriculture, Food and Wine, Waite Research Institute, Waite Research Precinct, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
- USDA-ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, New York, 14853, USA
| | - Zhanguo Xin
- Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
| |
|
10
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
|
11
|
Collins MA, Avery R, Albert FW. Substrate-specific effects of natural genetic variation on proteasome activity. PLoS Genet 2023; 19:e1010734. [PMID: 37126494 PMCID: PMC10174532 DOI: 10.1371/journal.pgen.1010734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 05/11/2023] [Accepted: 04/04/2023] [Indexed: 05/02/2023] Open
Abstract
Protein degradation is an essential biological process that regulates protein abundance and removes misfolded and damaged proteins from cells. In eukaryotes, most protein degradation occurs through the stepwise actions of two functionally distinct entities, the ubiquitin system and the proteasome. Ubiquitin system enzymes attach ubiquitin to cellular proteins, targeting them for degradation. The proteasome then selectively binds and degrades ubiquitinated substrate proteins. Genetic variation in ubiquitin system genes creates heritable differences in the degradation of their substrates. However, the challenges of measuring the degradative activity of the proteasome independently of the ubiquitin system in large samples have limited our understanding of genetic influences on the proteasome. Here, using the yeast Saccharomyces cerevisiae, we built and characterized reporters that provide high-throughput, ubiquitin system-independent measurements of proteasome activity. Using single-cell measurements of proteasome activity from millions of genetically diverse yeast cells, we mapped 15 loci across the genome that influence proteasomal protein degradation. Twelve of these 15 loci exerted specific effects on the degradation of two distinct proteasome substrates, revealing a high degree of substrate-specificity in the genetics of proteasome activity. Using CRISPR-Cas9-based allelic engineering, we resolved a locus to a causal variant in the promoter of RPT6, a gene that encodes a subunit of the proteasome's 19S regulatory particle. The variant increases RPT6 expression, which we show results in increased proteasome activity. Our results reveal the complex genetic architecture of proteasome activity and suggest that genetic influences on the proteasome may be an important source of variation in the many cellular and organismal traits shaped by protein degradation.
Affiliation(s)
- Mahlon A. Collins: Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Randi Avery: Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Frank W. Albert: Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
|
12
|
Mechanisms of regulatory evolution in yeast. Curr Opin Genet Dev 2022; 77:101998. [PMID: 36220001 PMCID: PMC10117219 DOI: 10.1016/j.gde.2022.101998] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/04/2022] [Revised: 08/30/2022] [Accepted: 09/01/2022] [Indexed: 02/06/2023]
Abstract
Studies of regulatory variation in yeast - at the level of new mutations, polymorphisms within a species, and divergence between species - have provided great insight into the molecular and evolutionary processes responsible for the evolution of gene expression in eukaryotes. The increasing ease with which yeast genomes can be manipulated and expression quantified in a high-throughput manner has recently accelerated mechanistic studies of cis- and trans-regulatory variation at multiple evolutionary timescales. These studies have, for example, identified differences in the properties of cis- and trans-acting mutations that affect their evolutionary fate, experimentally characterized the molecular mechanisms through which cis- and trans-regulatory variants act, and illustrated how regulatory networks can diverge between species with or without changes in gene expression.
|
13
|
High-throughput approaches to functional characterization of genetic variation in yeast. Curr Opin Genet Dev 2022; 76:101979. [PMID: 36075138 DOI: 10.1016/j.gde.2022.101979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 07/29/2022] [Accepted: 08/02/2022] [Indexed: 11/20/2022]
Abstract
Expansion of sequencing efforts to include thousands of genomes is providing a fundamental resource for determining the genetic diversity that exists in a population. Now, high-throughput approaches are necessary to begin to understand the role these genotypic changes play in affecting phenotypic variation. Saccharomyces cerevisiae maintains its position as an excellent model system to determine the function of unknown variants with its exceptional genetic diversity, phenotypic diversity, and reliable genetic manipulation tools. Here, we review strategies and techniques developed in yeast that scale classic approaches of assessing variant function. These approaches improve our ability to better map quantitative trait loci at a higher resolution, even for rare variants, and are already providing greater insight into the role that different types of mutations play in phenotypic variation and evolution not just in yeast but across taxa.
|
14
|
Vaknin I, Amit R. Molecular and experimental tools to design synthetic enhancers. Curr Opin Biotechnol 2022; 76:102728. [PMID: 35525178 DOI: 10.1016/j.copbio.2022.102728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/16/2021] [Revised: 03/16/2022] [Accepted: 04/03/2022] [Indexed: 11/03/2022]
Abstract
Understanding the grammar of enhancers and how they regulate gene expression is key for both basic research and for the pharma and biotech industries. The design and characterization of synthetic enhancers can expand the known regulatory space. This is achieved by the utilization of DNA Oligo Libraries (OLs), which facilitates screening of as many as millions of synthetic enhancer variants simultaneously. This review includes the latest commercial DNA OL synthesis technology and its capabilities, and a general 'know-how' guide for the design, construction, and analysis of OL-based synthetic enhancer characterization experiments. Specifically, we focus on synthetic-enhancer-based massively parallel reporter assay, Sort-seq methodologies (e.g. flow cytometry, deep sequencing), and a brief description of machine learning-based attempts for OL-analysis and follow-up validation experiments.
Affiliation(s)
- Inbal Vaknin: Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 3200000, Israel
- Roee Amit: Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 3200000, Israel; The Russell Berrie Nanotechnology Institute, Technion - Israel Institute of Technology, Haifa 3200000, Israel
|
15
|
Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB. Multiple causal variants underlie genetic associations in humans. Science 2022; 375:1247-1254. [PMID: 35298243 PMCID: PMC9725108 DOI: 10.1126/science.abj5117] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Indexed: 12/16/2022]
Abstract
Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.
Affiliation(s)
- Nathan S. Abell: Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Marianne K. DeGorter: Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Emily Greenwald: Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Kevin S. Smith: Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Zihuai He: Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Stephen B. Montgomery: Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
|
16
|
The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 92] [Impact Index Per Article: 46.0] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
|
17
|
Shih CH, Fay J. Cis-regulatory variants affect gene expression dynamics in yeast. eLife 2021; 10:e68469. [PMID: 34369376 PMCID: PMC8367379 DOI: 10.7554/elife.68469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2021] [Accepted: 08/06/2021] [Indexed: 12/14/2022] Open
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.
Affiliation(s)
- Ching-Hua Shih: Department of Biology, University of Rochester, Rochester, United States
- Justin Fay: Department of Biology, University of Rochester, Rochester, United States
|
18
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
Affiliation(s)
- Anna E Letiagina: Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina: Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin: Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin: Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
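Illustrative note: the normalization outlined in the MPRAdecoder abstract (barcode expression divided by barcode abundance in the plasmid DNA sample, then averaged per ROI, with uncertain barcodes excluded) can be sketched roughly as follows. This is not the MPRAdecoder implementation; the function name `normalized_expression`, the `min_plasmid_reads` threshold, the use of read fractions, and the toy counts are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def normalized_expression(cdna_counts, plasmid_counts, bc_to_roi, min_plasmid_reads=1):
    """Return (per-barcode normalized expression, per-ROI mean expression).

    cdna_counts / plasmid_counts: dicts mapping barcode -> read count.
    bc_to_roi: dict mapping each unambiguous ("genuine") barcode -> ROI name.
    Barcodes absent from bc_to_roi are treated as uncertain and ignored.
    min_plasmid_reads is a hypothetical filter, not a documented parameter.
    """
    # Library sizes are computed over genuine barcodes only (an assumption).
    total_cdna = sum(cdna_counts.get(bc, 0) for bc in bc_to_roi)
    total_plasmid = sum(plasmid_counts.get(bc, 0) for bc in bc_to_roi)
    if total_cdna == 0 or total_plasmid == 0:
        raise ValueError("no reads for genuine barcodes")

    per_bc = {}
    for bc in bc_to_roi:
        plasmid = plasmid_counts.get(bc, 0)
        if plasmid < min_plasmid_reads:
            continue  # too little plasmid signal to normalize against
        cdna_fraction = cdna_counts.get(bc, 0) / total_cdna
        plasmid_fraction = plasmid / total_plasmid
        per_bc[bc] = cdna_fraction / plasmid_fraction

    # Average the barcode-level values within each ROI.
    grouped = defaultdict(list)
    for bc, value in per_bc.items():
        grouped[bc_to_roi[bc]].append(value)
    per_roi = {roi: mean(values) for roi, values in grouped.items()}
    return per_bc, per_roi


if __name__ == "__main__":
    # Toy counts; "TTTT" has no unambiguous ROI assignment and is excluded.
    cdna = {"AAAC": 30, "CCGT": 10, "GGTA": 60, "TTTT": 5}
    plasmid = {"AAAC": 10, "CCGT": 10, "GGTA": 20, "TTTT": 7}
    mapping = {"AAAC": "roi_A", "CCGT": "roi_A", "GGTA": "roi_B"}
    per_bc, per_roi = normalized_expression(cdna, plasmid, mapping)
    print(per_bc)
    print(per_roi)
```

Dividing read fractions rather than raw counts keeps samples of different sequencing depth comparable; a real analysis would also need replicate handling and a considered minimum-read threshold.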
|