1
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting the activity of sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
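The basic MPRA readout can be illustrated with a short sketch. Assuming each element is tagged by several barcodes that are sequenced from both the RNA and the input DNA (plasmid) library, one common summary of activity is a depth-normalized log2 RNA/DNA ratio pooled over barcodes. The function name `mpra_activity`, the input layout, the CPM normalization and the pseudocount are illustrative assumptions, not the review's own pipeline.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_counts, pseudocount=1.0):
    """Return {element: log2(RNA/DNA)} aggregated over each element's barcodes.

    barcode_counts: iterable of (element_id, rna_count, dna_count), one tuple
    per barcode. Counts are scaled to counts per million (CPM) within each
    library so that differences in sequencing depth cancel out.
    """
    rows = list(barcode_counts)
    rna_total = sum(rna for _, rna, _ in rows)
    dna_total = sum(dna for _, _, dna in rows)
    rna_cpm = defaultdict(float)
    dna_cpm = defaultdict(float)
    for element, rna, dna in rows:
        # Depth-normalize each barcode, then pool all barcodes of an element.
        rna_cpm[element] += 1e6 * rna / rna_total
        dna_cpm[element] += 1e6 * dna / dna_total
    # The pseudocount keeps the ratio finite when an element has zero RNA reads.
    return {
        element: math.log2((rna_cpm[element] + pseudocount) / (dna_cpm[element] + pseudocount))
        for element in dna_cpm
    }


counts = [("enh1", 300, 100), ("enh1", 250, 90), ("ctrl", 50, 100), ("ctrl", 60, 110)]
activity = mpra_activity(counts)
print(activity)  # enh1 is positive (RNA-enriched), ctrl is negative
```

Pooling CPM over barcodes before taking the ratio is only one choice; computing per-barcode ratios and then taking a median is an alternative that is less sensitive to a single outlier barcode.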
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
2
Sasse A, Chikina M, Mostafavi S. Quick and effective approximation of in silico saturation mutagenesis experiments with first-order Taylor expansion. iScience 2024; 27:110807. [PMID: 39286491 PMCID: PMC11404212 DOI: 10.1016/j.isci.2024.110807] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/20/2024] [Indexed: 09/19/2024]
Abstract
To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable discovery of cis-regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to in vivo saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform for many sequences, and it becomes prohibitive as the length of the input sequences and the size of the model grow. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces its computation cost to a single gradient computation (one forward and one backward pass) per input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
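The TISM idea can be written down directly: a substitution from the reference base r to base b at position i changes the one-hot input by e[i, b] - e[i, r], so to first order its effect is grad[i, b] - grad[i, r]. The sketch below compares exact ISM with this approximation on a hypothetical quadratic toy model whose gradient is known in closed form; `make_toy_model`, `ism` and `taylor_ism` are illustrative names, and a real sequence-to-function network would obtain the gradient by automatic differentiation instead.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        x[i, ALPHABET.index(base)] = 1.0
    return x


def make_toy_model(w, v, a):
    """Hypothetical toy model f(x) = sum(w*x) + a*(sum(v*x))**2 and its gradient."""
    def f(x):
        return float(np.sum(w * x) + a * np.sum(v * x) ** 2)

    def grad(x):
        return w + 2.0 * a * np.sum(v * x) * v

    return f, grad


def ism(f, x):
    """Exact ISM: f(mutant) - f(reference), one model call per position and base."""
    ref = f(x)
    scores = np.zeros_like(x)
    for i in range(x.shape[0]):
        for b in range(x.shape[1]):
            mutant = x.copy()
            mutant[i] = 0.0
            mutant[i, b] = 1.0
            scores[i, b] = f(mutant) - ref
    return scores


def taylor_ism(grad, x):
    """First-order Taylor ISM: grad[i, b] - grad[i, reference base], one gradient call."""
    g = grad(x)
    ref_grad = np.sum(g * x, axis=1, keepdims=True)  # gradient entry of the observed base
    return g - ref_grad


rng = np.random.default_rng(0)
L = 20
w, v = rng.normal(size=(L, 4)), rng.normal(size=(L, 4))
x = one_hot("".join(rng.choice(list(ALPHABET), size=L)))
f, grad = make_toy_model(w, v, a=0.1)
print(np.corrcoef(ism(f, x).ravel(), taylor_ism(grad, x).ravel())[0, 1])
```

For this toy model the approximation error is exactly a * (v[i, b] - v[i, r])**2, so TISM is exact when a = 0 (a purely additive model) and degrades as the nonlinearity grows.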
Affiliation(s)
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 16354, USA
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Canadian Institute for Advanced Research, Toronto, ON MG5 1ZB, Canada
3
Fan S, Ma L, Song C, Han X, Zhong B, Lin Y. Promoter DNA methylation and transcription factor condensation are linked to transcriptional memory in mammalian cells. Cell Syst 2024; 15:808-823.e6. [PMID: 39243757 DOI: 10.1016/j.cels.2024.08.007] [Received: 05/01/2023] [Revised: 06/08/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
The regulation of genes can be mathematically described by input-output functions that are typically assumed to be time invariant. This fundamental assumption underpins the design of synthetic gene circuits and the quantitative understanding of natural gene regulatory networks. Here, we found that this assumption is challenged in mammalian cells. We observed that a synthetic reporter gene can exhibit unexpected transcriptional memory, leading to a shift in the dose-response curve upon a second induction. Mechanistically, we investigated the cis-dependency of transcriptional memory, revealing the necessity of promoter DNA methylation in establishing memory. Furthermore, we showed that the synthetic transcription factor's effective DNA binding affinity underlies trans-dependency, which is associated with its capacity to undergo biomolecular condensation. These principles enabled modulating memory by perturbing either cis- or trans-regulation of genes. Together, our findings suggest the potential pervasiveness of transcriptional memory and point to the need to model mammalian gene regulation with time-varying input-output functions. A record of this paper's transparent peer review process is included in the supplemental information.
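To make the notion of a time-invariant input-output function concrete, the sketch below uses a Hill function whose half-maximal dose K changes after the first induction, which is one simple way to represent a shifted dose-response curve. The `MemoryReporter` class, its parameter values and the direction of the shift are hypothetical; the abstract states only that the curve shifts upon a second induction.

```python
from dataclasses import dataclass


@dataclass
class MemoryReporter:
    """Toy reporter whose Hill dose-response curve shifts after its first induction.

    Hypothetical model: memory is a multiplicative change in the half-maximal
    dose K once the reporter has been induced before. All defaults are
    illustrative, not values from the study.
    """
    vmax: float = 1.0
    k_naive: float = 10.0       # half-maximal dose on first induction
    memory_factor: float = 0.5  # < 1 shifts the curve left, > 1 shifts it right
    n: float = 2.0              # Hill coefficient
    induced_before: bool = False

    def induce(self, dose: float) -> float:
        """Return the steady-state response to `dose` and record the induction."""
        k = self.k_naive * (self.memory_factor if self.induced_before else 1.0)
        self.induced_before = True
        return self.vmax * dose**self.n / (k**self.n + dose**self.n)


reporter = MemoryReporter()
first = reporter.induce(10.0)   # 0.5: the dose equals the naive K
second = reporter.induce(10.0)  # 0.8: same dose, higher response after memory
```

A time-invariant model would return the same value for both calls; here the response depends on the reporter's induction history.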
Affiliation(s)
- Shenqi Fan
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Liang Ma
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Chengzhi Song
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Xu Han
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Bijunyao Zhong
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yihan Lin
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu 610213, Sichuan, China
4
He AY, Danko CG. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. bioRxiv 2024:2024.03.13.583868. [PMID: 38559255 PMCID: PMC10979970 DOI: 10.1101/2024.03.13.583868] [Indexed: 04/04/2024]
Abstract
How the DNA sequence of cis-regulatory elements encodes transcription initiation patterns remains poorly understood. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that predicts the position and quantity of transcription initiation with single-nucleotide resolution from DNA sequence more accurately than existing approaches. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among transcriptional activators. Transcriptional activator and core promoter motifs work non-additively to encode distinct aspects of initiation, with the former driving initiation quantity and the latter driving initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.
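Predicting both where and how much initiation occurs is often handled by splitting a base-resolution track into a total count (quantity) and a normalized profile (position), as in BPNet-style models. The sketch below shows that decomposition and a multinomial profile loss; whether CLIPNET uses exactly this parameterization is an assumption here, and the function names are illustrative.

```python
import numpy as np


def split_quantity_and_profile(track):
    """Split a base-resolution initiation track into (quantity, profile).

    track: 1-D array of non-negative counts (e.g. PRO-cap 5' ends) over a window.
    quantity is the total signal; profile is a probability distribution over positions.
    """
    track = np.asarray(track, dtype=float)
    quantity = track.sum()
    if quantity == 0:
        profile = np.full(track.shape, 1.0 / track.size)  # no signal: uniform placeholder
    else:
        profile = track / quantity
    return quantity, profile


def profile_nll(observed, predicted_profile, eps=1e-12):
    """Multinomial negative log-likelihood of observed counts under a predicted
    profile, dropping the multinomial coefficient (constant with respect to the model)."""
    observed = np.asarray(observed, dtype=float)
    p = np.clip(np.asarray(predicted_profile, dtype=float), eps, 1.0)
    return float(-np.sum(observed * np.log(p)))


observed = np.array([0, 1, 4, 1, 0])
quantity, profile = split_quantity_and_profile(observed)
uniform = np.full(observed.size, 1.0 / observed.size)
print(quantity, profile_nll(observed, profile), profile_nll(observed, uniform))
```

Separating the two terms lets a model be scored on positional accuracy independently of how much total signal it predicts.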
Affiliation(s)
- Adam Y. He
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Graduate Field of Computational Biology, Cornell University
- Charles G. Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University
5
Kim J, Muller RY, Bondra ER, Ingolia NT. CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. bioRxiv 2024:2024.09.06.611573. [PMID: 39282439 PMCID: PMC11398470 DOI: 10.1101/2024.09.06.611573] [Indexed: 09/20/2024]
Abstract
Genome-wide CRISPR screens have emerged as powerful tools for uncovering the genetic underpinnings of diverse biological processes. Incisive screens often depend on directly measuring molecular phenotypes, such as regulated gene expression changes, provoked by CRISPR-mediated genetic perturbations. Here, we provide quantitative measurements of transcriptional responses in human cells across genome-scale perturbation libraries by coupling CRISPR interference (CRISPRi) with barcoded expression reporter sequencing (CiBER-seq). To enable CiBER-seq in mammalian cells, we optimize the integration of highly complex, barcoded sgRNA libraries into a defined genomic context. CiBER-seq profiling of a nuclear factor kappa B (NF-κB) reporter delineates the canonical signaling cascade linking the transmembrane TNF-alpha receptor to inflammatory gene activation and highlights cell-type-specific factors in this response. Importantly, CiBER-seq relies solely on bulk RNA sequencing to capture the regulatory circuit driving this rapid transcriptional response. Our work demonstrates the accuracy of CiBER-seq and its potential for dissecting genetic networks in mammalian cells with superior time resolution.
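The readout logic of a barcoded-reporter CRISPRi screen can be sketched as follows, assuming each sgRNA is linked to a reporter barcode counted in both RNA and DNA, and that non-targeting guides define the baseline. The `gene_scores` function, the CPM normalization, the pseudocount and the median aggregation across guides are illustrative choices, not the CiBER-seq analysis pipeline.

```python
import math
import statistics
from collections import defaultdict


def gene_scores(guide_counts, guide_to_gene, control="non-targeting", pseudocount=0.5):
    """Gene-level reporter effects from barcode counts linked to each sgRNA.

    guide_counts: {guide: (rna_barcode_reads, dna_barcode_reads)}
    guide_to_gene: {guide: target gene, or `control` for non-targeting guides}
    Returns {gene: median log2 change in reporter expression vs. controls}.
    """
    rna_total = sum(rna for rna, _ in guide_counts.values())
    dna_total = sum(dna for _, dna in guide_counts.values())
    log_ratio = {
        guide: math.log2((1e6 * rna / rna_total + pseudocount) / (1e6 * dna / dna_total + pseudocount))
        for guide, (rna, dna) in guide_counts.items()
    }
    # Center on non-targeting guides so that 0 means "no change in reporter expression".
    baseline = statistics.median(v for g, v in log_ratio.items() if guide_to_gene[g] == control)
    per_gene = defaultdict(list)
    for guide, value in log_ratio.items():
        if guide_to_gene[guide] != control:
            per_gene[guide_to_gene[guide]].append(value - baseline)
    # The median across guides damps the influence of a single ineffective sgRNA.
    return {gene: statistics.median(values) for gene, values in per_gene.items()}


counts = {"nt1": (100, 100), "nt2": (100, 100), "a1": (20, 100),
          "a2": (25, 100), "b1": (300, 100), "b2": (250, 100)}
targets = {"nt1": "non-targeting", "nt2": "non-targeting", "a1": "GENE_A",
           "a2": "GENE_A", "b1": "GENE_B", "b2": "GENE_B"}
scores = gene_scores(counts, targets)  # GENE_A knockdown lowers the reporter, GENE_B raises it
```

Because expression is read from barcodes in bulk RNA, a single sequencing run yields a score for every perturbation in the library.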
Affiliation(s)
- Jinyoung Kim
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Ryan Y. Muller
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Eliana R. Bondra
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Nicholas T. Ingolia
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
6
Engreitz JM, Lawson HA, Singh H, Starita LM, Hon GC, Carter H, Sahni N, Reddy TE, Lin X, Li Y, Munshi NV, Chahrour MH, Boyle AP, Hitz BC, Mortazavi A, Craven M, Mohlke KL, Pinello L, Wang T, Kundaje A, Yue F, Cody S, Farrell NP, Love MI, Muffley LA, Pazin MJ, Reese F, Van Buren E, Dey KK, Kircher M, Ma J, Radivojac P, Balliu B, Williams BA, Huangfu D, Park CY, Quertermous T, Das J, Calderwood MA, Fowler DM, Vidal M, Ferreira L, Mooney SD, Pejaver V, Zhao J, Gazal S, Koch E, Reilly SK, Sunyaev S, Carpenter AE, Buenrostro JD, Leslie CS, Savage RE, Giric S, Luo C, Plath K, Barrera A, Schubach M, Gschwind AR, Moore JE, Ahituv N, Yi SS, Hallgrimsdottir I, Gaulton KJ, Sakaue S, Booeshaghi S, Mattei E, Nair S, Pachter L, Wang AT, Shendure J, Agarwal V, Blair A, Chalkiadakis T, Chardon FM, Dash PM, Deng C, Hamazaki N, Keukeleire P, Kubo C, Lalanne JB, Maass T, Martin B, McDiarmid TA, Nobuhara M, Page NF, Regalado S, Sims J, Ushiki A, Best SM, Boyle G, Camp N, Casadei S, Da EY, Dawood M, Dawson SC, Fayer S, Hamm A, James RG, Jarvik GP, McEwen AE, Moore N, Pendyala S, Popp NA, Post M, Rubin AF, Smith NT, Stone J, Tejura M, Wang ZR, Wheelock MK, Woo I, Zapp BD, Amgalan D, Aradhana A, Arana SM, Bassik MC, Bauman JR, Bhattacharya A, Cai XS, Chen Z, Conley S, Deshpande S, Doughty BR, Du PP, Galante JA, Gifford C, Greenleaf WJ, Guo K, Gupta R, Isobe S, Jagoda E, Jain N, Jones H, Kang HY, Kim SH, Kim Y, Klemm S, Kundu R, Kundu S, Lago-Docampo M, Lee-Yow YC, Levin-Konigsberg R, Li DY, Lindenhofer D, Ma XR, Marinov GK, Martyn GE, McCreery CV, Metzl-Raz E, Monteiro JP, Montgomery MT, Mualim KS, Munger C, Munson G, Nguyen TC, Nguyen T, Palmisano BT, Pampari A, Rabinovitch M, Ramste M, Ray J, Roy KR, Rubio OM, Schaepe JM, Schnitzler G, Schreiber J, Sharma D, Sheth MU, Shi H, Singh V, Sinha R, Steinmetz LM, Tan J, Tan A, Tycko J, Valbuena RC, Amiri VVP, van Kooten MJFM, Vaughan-Jackson A, Venida A, Weldy CS, Worssam MD, Xia F, Yao D, Zeng T, Zhao Q, Zhou R, Chen ZS, Cimini BA, Coppin G, 
Coté AG, Haghighi M, Hao T, Hill DE, Lacoste J, Laval F, Reno C, Roth FP, Singh S, Spirohn-Fitzgerald K, Taipale M, Teelucksingh T, Tixhon M, Yadav A, Yang Z, Kraus WL, Armendariz DA, Dederich AE, Gogate A, El Hayek L, Goetsch SC, Kaur K, Kim HB, McCoy MK, Nzima MZ, Pinzón-Arteaga CA, Posner BA, Schmitz DA, Sivakumar S, Sundarrajan A, Wang L, Wang Y, Wu J, Xu L, Xu J, Yu L, Zhang Y, Zhao H, Zhou Q, Won H, Bell JL, Broadaway KA, Degner KN, Etheridge AS, Koller BH, Mah W, Mu W, Ritola KD, Rosen JD, Schoenrock SA, Sharp RA, Bauer D, Lettre G, Sherwood R, Becerra B, Blaine LJ, Che E, Francoeur MJ, Gibbs EN, Kim N, King EM, Kleinstiver BP, Lecluze E, Li Z, Patel ZM, Phan QV, Ryu J, Starr ML, Wu T, Gersbach CA, Crawford GE, Allen AS, Majoros WH, Iglesias N, Rai R, Venukuttan R, Li B, Anglen T, Bounds LR, Hamilton MC, Liu S, McCutcheon SR, McRoberts Amador CD, Reisman SJ, ter Weele MA, Bodle JC, Streff HL, Siklenka K, Strouse K, Bernstein BE, Babu J, Corona GB, Dong K, Duarte FM, Durand NC, Epstein CB, Fan K, Gaskell E, Hall AW, Ham AM, Knudson MK, Shoresh N, Wekhande S, White CM, Xi W, Satpathy AT, Corces MR, Chang SH, Chin IM, Gardner JM, Gardell ZA, Gutierrez JC, Johnson AW, Kampman L, Kasowski M, Lareau CA, Liu V, Ludwig LS, McGinnis CS, Menon S, Qualls A, Sandor K, Turner AW, Ye CJ, Yin Y, Zhang W, Wold BJ, Carilli M, Cheong D, Filibam G, Green K, Kawauchi S, Kim C, Liang H, Loving R, Luebbert L, MacGregor G, Merchan AG, Rebboah E, Rezaie N, Sakr J, Sullivan DK, Swarna N, Trout D, Upchurch S, Weber R, Castro CP, Chou E, Feng F, Guerra A, Huang Y, Jiang L, Liu J, Mills RE, Qian W, Qin T, Sartor MA, Sherpa RN, Wang J, Wang Y, Welch JD, Zhang Z, Zhao N, Mukherjee S, Page CD, Clarke S, Doty RW, Duan Y, Gordan R, Ko KY, Li S, Li B, Thomson A, Raychaudhuri S, Price A, Ali TA, Dey KK, Durvasula A, Kellis M, Iakoucheva LM, Kakati T, Chen Y, Benazouz M, Jain S, Zeiberg D, De Paolis Kaluza MC, Velyunskiy M, Gasch A, Huang K, Jin Y, Lu Q, Miao J, Ohtake M, Scopel E, Steiner RD, 
Sverchkov Y, Weng Z, Garber M, Fu Y, Haas N, Li X, Phalke N, Shan SC, Shedd N, Yu T, Zhang Y, Zhou H, Battle A, Jerby L, Kotler E, Kundu S, Marderstein AR, Montgomery SB, Nigam A, Padhi EM, Patel A, Pritchard J, Raine I, Ramalingam V, Rodrigues KB, Schreiber JM, Singhal A, Sinha R, Wang AT, Abundis M, Bisht D, Chakraborty T, Fan J, Hall DR, Rarani ZH, Jain AK, Kaundal B, Keshari S, McGrail D, Pease NA, Yi VF, Wu H, Kannan S, Song H, Cai J, Gao Z, Kurzion R, Leu JI, Li F, Liang D, Ming GL, Musunuru K, Qiu Q, Shi J, Su Y, Tishkoff S, Xie N, Yang Q, Yang W, Zhang H, Zhang Z, Beer MA, Hadjantonakis AK, Adeniyi S, Cho H, Cutler R, Glenn RA, Godovich D, Hu N, Jovanic S, Luo R, Oh JW, Razavi-Mohseni M, Shigaki D, Sidoli S, Vierbuchen T, Wang X, Williams B, Yan J, Yang D, Yang Y, Sander M, Gaulton KJ, Ren B, Bartosik W, Indralingam HS, Klie A, Mummey H, Okino ML, Wang G, Zemke NR, Zhang K, Zhu H, Zaitlen N, Ernst J, Langerman J, Li T, Sun Y, Rudensky AY, Periyakoil PK, Gao VR, Smith MH, Thomas NM, Donlin LT, Lakhanpal A, Southard KM, Ardy RC, Cherry JM, Gerstein MB, Andreeva K, Assis PR, Borsari B, Douglass E, Dong S, Gabdank I, Graham K, Jolanki O, Jou J, Kagda MS, Lee JW, Li M, Lin K, Miyasato SR, Rozowsky J, Small C, Spragins E, Tanaka FY, Whaling IM, Youngworth IA, Sloan CA, Belter E, Chen X, Chisholm RL, Dickson P, Fan C, Fulton L, Li D, Lindsay T, Luan Y, Luo Y, Lyu H, Ma X, Macias-Velasco J, Miga KH, Quaid K, Stitziel N, Stranger BE, Tomlinson C, Wang J, Zhang W, Zhang B, Zhao G, Zhuo X, Brennand K, Ciccia A, Hayward SB, Huang JW, Leuzzi G, Taglialatela A, Thakar T, Vaitsiankova A, Dey KK, Ali TA, Kim A, Grimes HL, Salomonis N, Gupta R, Fang S, Lee-Kim V, Heinig M, Losert C, Jones TR, Donnard E, Murphy M, Roberts E, Song S, Mostafavi S, Sasse A, Spiro A, Pennacchio LA, Kato M, Kosicki M, Mannion B, Slaven N, Visel A, Pollard KS, Drusinsky S, Whalen S, Ray J, Harten IA, Ho CH, Sanjana NE, Caragine C, Morris JA, Seruggia D, Kutschat AP, Wittibschlager S, Xu H, Fu R, 
He W, Zhang L, Osorio D, Bly Z, Calluori S, Gilchrist DA, Hutter CM, Morris SA, Samer EK. Deciphering the impact of genomic variation on function. Nature 2024; 633:47-57. [PMID: 39232149 DOI: 10.1038/s41586-024-07510-0] [Received: 04/11/2023] [Accepted: 05/02/2024] [Indexed: 09/06/2024]
Abstract
Our genomes influence nearly every aspect of human biology, from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.
7
Petersen RM, Vockley CM, Lea AJ. Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes. bioRxiv 2024:2024.08.23.609412. [PMID: 39229133 PMCID: PMC11370585 DOI: 10.1101/2024.08.23.609412] [Indexed: 09/05/2024]
Abstract
A major goal in evolutionary biology and biomedicine is to understand the complex interactions between genetic variants, the epigenome, and gene expression. However, the causal relationships between these factors remain poorly understood. mSTARR-seq, a methylation-sensitive massively parallel reporter assay, is capable of identifying methylation-dependent regulatory activity at many thousands of genomic regions simultaneously, and allows for the testing of causal relationships between DNA methylation and gene expression on a region-by-region basis. Here, we developed a multiplexed mSTARR-seq protocol to assay naturally occurring human genetic variation from 25 individuals sampled from 10 localities in Europe and Africa. We identified 6,957 regulatory elements in either the unmethylated or methylated state, and this set was enriched for enhancer and promoter annotations, as expected. The expression of 58% of these regulatory elements was modulated by methylation, which was generally associated with decreased RNA expression. Within our set of regulatory elements, we used allele-specific expression analyses to identify 8,020 sites with genetic effects on gene regulation; further, we found that 42.3% of these genetic effects varied between methylated and unmethylated states. Sites exhibiting methylation-dependent genetic effects were enriched for GWAS and EWAS annotations, implicating them in human disease. Compared to datasets that assay DNA from a single European individual, our multiplexed assay uncovers dramatically more genetic effects and methylation-dependent genetic effects, highlighting the importance of including diverse individuals in assays which aim to understand gene regulatory processes.
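The two quantities in this design, an allelic effect and its change with methylation, can be illustrated with point estimates. The sketch below assumes allele-resolved read counts for RNA and for the input DNA at one site in each methylation state; the function names and the pseudocount are hypothetical, and a real analysis would fit a statistical model (for example a binomial or beta-binomial model) rather than compare raw ratios.

```python
import math


def allelic_log2_ratio(rna_ref, rna_alt, dna_ref, dna_alt, pseudocount=0.5):
    """log2 of the alt/ref allele ratio in RNA, relative to the same ratio in the input DNA."""
    rna_ratio = (rna_alt + pseudocount) / (rna_ref + pseudocount)
    dna_ratio = (dna_alt + pseudocount) / (dna_ref + pseudocount)
    return math.log2(rna_ratio / dna_ratio)


def methylation_dependence(unmethylated, methylated):
    """Change in the allelic effect when the reporter DNA is methylated.

    Each argument is a (rna_ref, rna_alt, dna_ref, dna_alt) tuple of read counts.
    Values far from 0 suggest a methylation-dependent genetic effect.
    """
    return allelic_log2_ratio(*methylated) - allelic_log2_ratio(*unmethylated)


# Hypothetical site: an alt-allele bias in RNA that disappears once the DNA is methylated.
delta = methylation_dependence(unmethylated=(100, 200, 50, 50), methylated=(100, 100, 50, 50))
```

Normalizing the RNA allele ratio by the DNA allele ratio corrects for unequal representation of the two alleles in the input library.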
8
Chen W, Choi J, Li X, Nathans JF, Martin B, Yang W, Hamazaki N, Qiu C, Lalanne JB, Regalado S, Kim H, Agarwal V, Nichols E, Leith A, Lee C, Shendure J. Symbolic recording of signalling and cis-regulatory element activity to DNA. Nature 2024; 632:1073-1081. [PMID: 39020177 PMCID: PMC11357993 DOI: 10.1038/s41586-024-07706-4] [Received: 11/14/2021] [Accepted: 06/12/2024] [Indexed: 07/19/2024]
Abstract
Measurements of gene expression or signal transduction activity are conventionally performed using methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm in which such biological activities are stably recorded to the genome. Enhancer-driven genomic recording of transcriptional activity in multiplex (ENGRAM) is based on the signal-dependent production of prime editing guide RNAs that mediate the insertion of signal-specific barcodes (symbols) into a genomically encoded recording unit. We show how this strategy can be used for multiplex recording of the cell-type-specific activities of dozens to hundreds of cis-regulatory elements with high fidelity, sensitivity and reproducibility. Leveraging signal transduction pathway-responsive cis-regulatory elements, we also demonstrate time- and concentration-dependent genomic recording of WNT, NF-κB and Tet-On activities. By coupling ENGRAM to sequential genome editing via DNA Typewriter, we stably record information about the temporal dynamics of two orthogonal signalling pathways to genomic DNA. Finally, we apply ENGRAM to integratively record the transient activity of nearly 100 transcription factor consensus motifs across daily windows spanning the differentiation of mouse embryonic stem cells into gastruloids, an in vitro model of early mammalian development. Although these are proof-of-concept experiments and much work remains to fully realize the possibilities, the symbolic recording of biological signals or states within cells, to the genome and over time, has broad potential to complement contemporary paradigms for how we make measurements in biological systems.
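At its simplest, reading out a symbolic recorder reduces to counting inserted symbols. The sketch below assumes reads have already been parsed into the ordered list of symbols written into each recording unit; symbol frequencies then serve as a relative activity readout, and, with sequential DNA Typewriter-style writing, the slot index orders edits in time. The symbol labels and function names are hypothetical.

```python
from collections import Counter


def symbol_fractions(recordings):
    """Fraction of all recorded insertions carried by each symbol.

    recordings: list of recording units, each a list of symbols already parsed
    from sequencing reads, in the order they were written.
    """
    counts = Counter(symbol for unit in recordings for symbol in unit)
    total = sum(counts.values())
    return {symbol: c / total for symbol, c in counts.items()}


def symbol_fractions_by_slot(recordings):
    """Per-slot symbol fractions; with sequential (typewriter-style) writing,
    slot k holds each unit's k-th edit, so slots order events in time."""
    by_slot = {}
    for unit in recordings:
        for slot, symbol in enumerate(unit):
            by_slot.setdefault(slot, Counter())[symbol] += 1
    return {
        slot: {symbol: c / sum(counts.values()) for symbol, c in counts.items()}
        for slot, counts in sorted(by_slot.items())
    }


units = [["WNT", "NFKB"], ["WNT"], ["WNT", "WNT"], ["NFKB"]]
print(symbol_fractions(units))          # WNT carries 4 of 6 insertions
print(symbol_fractions_by_slot(units))  # slot 0: WNT 0.75; slot 1: NFKB 0.5, WNT 0.5
```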
Affiliation(s)
- Wei Chen
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Junhong Choi
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Jenny F Nathans
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Wei Yang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Department of Obstetrics & Gynecology, University of Washington, Seattle, WA, USA
- Institute for Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Samuel Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Haedong Kim
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Eva Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Anh Leith
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
9
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France
- Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
10
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
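A minimal illustration of a gauge freedom and one gauge fix, restricted to additive models: adding a constant c_l to every effect at position l and subtracting the sum of the c_l from the constant term leaves every prediction unchanged. The zero-sum gauge removes this freedom by centering each position's effects. This sketch covers only the additive case, not the paper's general family of gauges; the function names are illustrative.

```python
import numpy as np


def predict(theta0, theta, seqs):
    """Additive model f(x) = theta0 + sum_l theta[l, x_l].

    theta: (L, A) array of per-position, per-character effects.
    seqs: (N, L) integer array of character indices.
    """
    positions = np.arange(theta.shape[0])
    return theta0 + theta[positions, seqs].sum(axis=1)


def zero_sum_gauge(theta0, theta):
    """Fix the gauge so each position's effects sum to zero.

    Each position's mean effect is moved into the constant term, which leaves
    every prediction unchanged.
    """
    row_means = theta.mean(axis=1, keepdims=True)
    return theta0 + row_means.sum(), theta - row_means


rng = np.random.default_rng(1)
L, A = 5, 4
theta0, theta = 0.3, rng.normal(size=(L, A))
seqs = rng.integers(0, A, size=(50, L))

# A gauge transformation: shift row l by c_l and compensate in the constant term.
c = rng.normal(size=(L, 1))
theta0_b, theta_b = theta0 - c.sum(), theta + c
```

Both parameterizations give identical predictions yet map to the same gauge-fixed values, so the fixed parameters can be compared and interpreted directly.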
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
11
Mishra A, Jajodia A, Weston E, Jayavelu ND, Garcia M, Hossack D, Hawkins RD. Identification of functional enhancer variants associated with type I diabetes in CD4+ T cells. Front Immunol 2024; 15:1387253. [PMID: 38947339 PMCID: PMC11211866 DOI: 10.3389/fimmu.2024.1387253] [Received: 02/17/2024] [Accepted: 04/09/2024] [Indexed: 07/02/2024]
Abstract
Type 1 diabetes (T1D) is an autoimmune disease mediated by T-cell destruction of β cells in pancreatic islets. There is currently no known cure, and treatment consists of daily insulin injections. Genome-wide association studies and twin studies have indicated strong genetic heritability for T1D and implicated several genes. Because most strongly associated variants are noncoding, it remains unclear which of them are functional and therefore likely causal. Given that many of these genetic variants reside in enhancer elements, we tested 121 T1D-associated variants in CD4+ T-cell enhancers and, using massively parallel reporter assays, found four to be functional. Three of these enhancer variants weaken activity, while the fourth strengthens it. We link these variants to their cognate genes using 3D genome architecture or eQTL data and validate them using CRISPR editing. Validated target genes include CLEC16A and SOCS1. Although these genes have previously been implicated in T1D and other autoimmune diseases, we show that the enhancers controlling their expression harbor functional variants, which may therefore act as causal T1D variants.
Affiliation(s)
- Arpit Mishra
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Ajay Jajodia
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Eryn Weston
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Naresh Doni Jayavelu
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Mariana Garcia
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Daniel Hossack
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- R. David Hawkins
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Institute for Stem Cell and Regenerative Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Benaroya Research Institute at Virginia Mason, Seattle, WA, United States
12
Lalanne JB, Regalado SG, Domcke S, Calderon D, Martin BK, Li X, Li T, Suiter CC, Lee C, Trapnell C, Shendure J. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods 2024; 21:983-993. [PMID: 38724692 PMCID: PMC11166576 DOI: 10.1038/s41592-024-02260-3] [Received: 01/06/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024]
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
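The "Poisson counting noise" limit mentioned above can be made concrete with standard Poisson statistics. This is a general gloss, not a formula taken from the paper. Suppose the number of reporter molecules $N$ counted for a given cell or barcode is Poisson-distributed with mean $\lambda$. Then

$$
\operatorname{Var}(N) = \lambda, \qquad \mathrm{CV}(N) = \frac{\sqrt{\operatorname{Var}(N)}}{\mathbb{E}[N]} = \frac{1}{\sqrt{\lambda}},
$$

so relative precision improves only with the square root of the counts. At a mean of 100 counts, counting noise alone contributes a coefficient of variation of 10%, and reducing that to 5% requires about 400 counts.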
Affiliation(s)
- Samuel G Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Silvia Domcke
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diego Calderon
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tony Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Chase C Suiter
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
13
Quantitative profiling of regulatory DNA activity at single-cell resolution. Nat Methods 2024; 21:936-937. [PMID: 38724694 DOI: 10.1038/s41592-024-02261-2] [Indexed: 06/13/2024]
14
Fornes O, Meseguer A, Aguirre-Plans J, Gohl P, Bota PM, Molina-Fernández R, Bonet J, Chinchilla-Hernandez A, Pegenaute F, Gallego O, Fernandez-Fuentes N, Oliva B. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024]
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. Numerous high-throughput experimental methods exist to characterize TF-DNA binding specificities, but applying them is laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% of human TFs remain unknown: they have been neither determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and to automatically model TF regulatory complexes. We show the advantage of our approach over classical nearest-neighbor prediction at the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs, which are then used to scan a DNA sequence for occurrences. The best matches are either scored for binding or collected for subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modeled through (i) the co-localization of TFs and (ii) structural modeling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, and we compare these models with the experimentally known structures.
Affiliation(s)
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Alberto Meseguer
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Patrick Gohl
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Patricia M Bota
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Ruben Molina-Fernández
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Laboratory of Protein Design & Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Vaud, Switzerland
- Altair Chinchilla-Hernandez
- Live-Cell Structural Biology, Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Ferran Pegenaute
- Live-Cell Structural Biology, Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Oriol Gallego
- Live-Cell Structural Biology, Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Science, Aberystwyth University, SY23 3DA Aberystwyth, UK
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005, Catalonia, Spain
15
Cochran K, Yin M, Mantripragada A, Schreiber J, Marinov GK, Kundaje A. Dissecting the cis-regulatory syntax of transcription initiation with deep learning. bioRxiv 2024:2024.05.28.596138. [PMID: 38853896 PMCID: PMC11160661 DOI: 10.1101/2024.05.28.596138] [Indexed: 06/11/2024]
Abstract
Despite extensive characterization of mammalian Pol II transcription, the DNA sequence determinants of transcription initiation at a third of human promoters and at most enhancers remain poorly understood. We therefore trained and interpreted a neural network, ProCapNet, that accurately models base-resolution initiation profiles from PRO-cap experiments using local DNA sequence. ProCapNet learns sequence motifs with distinct effects on initiation rates and on transcription start site (TSS) positioning, and it uncovers context-specific cryptic initiator elements intertwined within other TF motifs. ProCapNet annotates predictive motifs in nearly all actively transcribed regulatory elements across multiple cell lines. This reveals a shared cis-regulatory logic across promoters and enhancers, mediated by a highly epistatic sequence syntax of cooperative and competitive motif interactions. ProCapNet models of RAMPAGE profiles, which measure steady-state RNA abundance at TSSs, distill initiation signals on par with models trained directly on PRO-cap profiles. ProCapNet learns a largely cell-type-agnostic cis-regulatory code of initiation. This code complements the sequence drivers of cell-type-specific chromatin state, which are critical for accurately predicting cell-type-specific transcription initiation.
Affiliation(s)
- Kelly Cochran
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jacob Schreiber
- Department of Genetics, Stanford University, Stanford, CA, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
16
Paterson AH, Queitsch C. Genome organization and botanical diversity. Plant Cell 2024; 36:1186-1204. [PMID: 38382084 PMCID: PMC11062460 DOI: 10.1093/plcell/koae045] [Received: 10/16/2023] [Revised: 02/07/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024]
Abstract
The rich diversity of angiosperms, both the planet's dominant flora and the cornerstone of agriculture, is integrally intertwined with a distinctive evolutionary history. Here, we explore the interplay between angiosperm genome organization and botanical diversity, empowered by genomic approaches ranging from genetic linkage mapping to analysis of gene regulation. Commonality in the genetic hardware of plants has enabled robust comparative genomics, which has provided a broad picture of angiosperm evolution and implicated both general processes and specific elements in contributing to botanical diversity. We argue that the hardware of plant genomes, in both content and dynamics, has been shaped by selection for rather substantial differences in gene regulation between plants and animals, as exemplified by maize and humans, organisms of comparable genome size and gene number. Their distinctive genome content and dynamics may partly reflect the indeterminate development of plants, which places strikingly different demands on gene regulation than animal development does. Repeated polyploidization of plant genomes and multiplication of individual genes, together with extensive rearrangement and differential retention, provide rich raw material for the selection of morphological and/or physiological variations that confer fitness in specific niches, whether natural or artificial. These findings exemplify the burgeoning information available for increasing our knowledge of plant biology and for modifying selected plants to better meet human needs.
Affiliation(s)
- Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA, USA
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
17
Park S, Kim M, Lee JW. Optimizing Nucleic Acid Delivery Systems through Barcode Technology. ACS Synth Biol 2024; 13:1006-1018. [PMID: 38526308 DOI: 10.1021/acssynbio.3c00602] [Indexed: 03/26/2024]
Abstract
Conventional biological experiments often focus on in vitro assays because handling multiple variables in vivo is inherently limited by labor-intensive and time-consuming procedures. Often, only a subset of the samples that demonstrate significant efficacy in vitro can be evaluated in vivo. Because in vitro and in vivo results correlate poorly, however, it is critical to evaluate the variables under examination in vivo and not solely in vitro. An emerging approach to high-throughput in vivo testing uses a barcode system consisting of various nucleotide combinations. Assigning a unique barcode to each variant enables multiple entities to be tested simultaneously, eliminating the need for separate individual tests. Samples are then collected and analyzed by barcode sequencing to identify crucial parameters. This review explores the development of barcode design and its applications, including the evaluation of nucleic acid delivery systems and the optimization of gene expression in vivo.
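The barcode readout described in this abstract can be sketched in a few lines of Python. Everything in the sketch is a hypothetical simplification, not a method from the review:

- the barcode sequences and formulation names are invented;
- barcodes are assumed to sit at the start of each read;
- the enrichment score is the ratio of a variant's share of output reads to its share of input reads.

Real pipelines also have to handle sequencing errors, UMIs, and replicates.

```python
from collections import Counter

# Hypothetical barcode-to-variant map: each delivery formulation (or construct)
# in the pooled screen carries its own unique barcode.
BARCODES = {
    "ACGTAC": "formulation_A",
    "TTGACC": "formulation_B",
    "GGCATA": "formulation_C",
}

def count_barcodes(reads, barcode_len=6):
    """Tally reads whose first `barcode_len` bases exactly match a known barcode."""
    counts = Counter()
    for read in reads:
        variant = BARCODES.get(read[:barcode_len])
        if variant is not None:  # reads with unknown barcodes are discarded
            counts[variant] += 1
    return counts

def enrichment(input_counts, output_counts):
    """Each variant's share of output reads divided by its share of input reads.

    Assumes both libraries contain at least one assigned read.
    """
    n_in = sum(input_counts.values())
    n_out = sum(output_counts.values())
    return {
        v: (output_counts[v] / n_out) / (input_counts[v] / n_in)
        for v in input_counts
        if input_counts[v] > 0
    }

if __name__ == "__main__":
    # Toy data: an evenly pooled input library and a tissue sample where
    # formulation_A is over-represented; one read carries an unknown barcode.
    input_reads = ["ACGTACGGG", "TTGACCAAA", "GGCATATTT"] * 10
    output_reads = ["ACGTACGGG"] * 20 + ["TTGACCAAA"] * 5 + ["GGCATATTT"] * 5 + ["NNNNNNAAA"]
    print(enrichment(count_barcodes(input_reads), count_barcodes(output_reads)))
```

Normalizing by the input library matters because pooled variants are rarely present at exactly equal amounts. Raw output counts alone would confound delivery efficiency with pooling bias.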
Affiliation(s)
- Soan Park
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- Mibang Kim
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- Jeong Wook Lee
- Department of Chemical Engineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, 77 CheongamRo, Gyeongbuk, 37673 NamGu, Pohang, Republic of Korea
18
Wessels HH, Stirn A, Méndez-Mancilla A, Kim EJ, Hart SK, Knowles DA, Sanjana NE. Prediction of on-target and off-target activity of CRISPR-Cas13d guide RNAs using deep learning. Nat Biotechnol 2024; 42:628-637. [PMID: 37400521 DOI: 10.1038/s41587-023-01830-8] [Received: 12/08/2021] [Accepted: 05/16/2023] [Indexed: 07/05/2023]
Abstract
Transcriptome engineering applications in living cells with RNA-targeting CRISPR effectors depend on accurate prediction of on-target activity and off-target avoidance. Here we design and test ~200,000 RfxCas13d guide RNAs targeting essential genes in human cells with systematically designed mismatches and insertions and deletions (indels). We find that mismatches and indels have a position- and context-dependent impact on Cas13d activity, and mismatches that result in G-U wobble pairings are better tolerated than other single-base mismatches. Using this large-scale dataset, we train a convolutional neural network that we term targeted inhibition of gene expression via gRNA design (TIGER) to predict efficacy from guide sequence and context. TIGER outperforms the existing models at predicting on-target and off-target activity on our dataset and published datasets. We show that TIGER scoring combined with specific mismatches yields the first general framework to modulate transcript expression, enabling the use of RNA-targeting CRISPRs to precisely control gene dosage.
Affiliation(s)
- Hans-Hermann Wessels
- New York Genome Center, New York City, NY, USA
- Department of Biology, New York University, New York City, NY, USA
- Andrew Stirn
- New York Genome Center, New York City, NY, USA
- Department of Computer Science, Columbia University, New York City, NY, USA
- Alejandro Méndez-Mancilla
- New York Genome Center, New York City, NY, USA
- Department of Biology, New York University, New York City, NY, USA
- Eric J Kim
- Department of Computer Science, Columbia University, New York City, NY, USA
- Sydney K Hart
- New York Genome Center, New York City, NY, USA
- Department of Biology, New York University, New York City, NY, USA
- David A Knowles
- New York Genome Center, New York City, NY, USA
- Department of Computer Science, Columbia University, New York City, NY, USA
- Data Science Institute, Columbia University, New York City, NY, USA
- Department of Systems Biology, Columbia University, New York City, NY, USA
- Neville E Sanjana
- New York Genome Center, New York City, NY, USA
- Department of Biology, New York University, New York City, NY, USA
19
Liu J, Ashuach T, Inoue F, Ahituv N, Yosef N, Kreimer A. Optimizing sequence design strategies for perturbation MPRAs: a computational evaluation framework. Nucleic Acids Res 2024; 52:1613-1627. [PMID: 38296821 PMCID: PMC10939410 DOI: 10.1093/nar/gkae012] [Received: 04/25/2023] [Revised: 12/26/2023] [Accepted: 01/12/2024] [Indexed: 02/02/2024]
Abstract
The advent of perturbation-based massively parallel reporter assays (MPRAs) has facilitated the delineation of the roles that non-coding regulatory elements play in orchestrating gene expression. However, few computational efforts have evaluated sequence design strategies for perturbation MPRAs or established guidelines for them. In this study, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Within this framework, we benchmark three different perturbation approaches in terms of alterations in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. Although the three approaches yield very similar results across multiple benchmarking metrics, predictive models for the random nucleotide shuffling approach are significantly more robust than those for the other two approaches. We therefore recommend designing perturbation-MPRA sequences by randomly shuffling the nucleotides of the perturbed site, followed by a coherence check to prevent the introduction of other variations of the target motifs. In summary, our evaluation framework and benchmarking findings provide a resource of computational pipelines and highlight the potential of perturbation MPRAs for predicting non-coding regulatory activities.
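The recommended design step, shuffling the perturbed site and then checking that the motif has not been recreated, might look like the Python sketch below. It is not the authors' pipeline. The "coherence check" here is a hypothetical simplification: neither the exact motif nor its reverse complement may occur in any window overlapping the shuffled site. The toy sequence and its NF-κB-like site are illustrative only.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]

def shuffle_site(seq, start, end, motif, rng, max_tries=1000):
    """Shuffle seq[start:end], rejecting shuffles that recreate the motif.

    Simplified coherence check (an assumption, not the published one): the
    motif and its reverse complement must not occur in any k-mer window that
    overlaps the shuffled site, where k = len(motif).
    """
    k = len(motif)
    lo, hi = max(0, start - k + 1), min(len(seq), end + k - 1)
    forbidden = {motif, revcomp(motif)}
    site = list(seq[start:end])
    for _ in range(max_tries):
        rng.shuffle(site)  # a permutation, so base composition is preserved
        candidate = seq[:start] + "".join(site) + seq[end:]
        window = candidate[lo:hi]
        if not any(m in window for m in forbidden):
            return candidate
    raise RuntimeError("no motif-free shuffle found; consider another strategy")

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequence with an NF-kB-like site at positions 4-13 (illustrative only).
    print(shuffle_site("TTTTGGGACTTTCCTTTT", 4, 14, "GGGACTTTCC", rng))
```

Shuffling rather than substituting keeps the perturbed sequence's GC content unchanged. The rejection loop is the simplest way to enforce the check. A fuller version would scan with position weight matrices for motif variants rather than exact string matches.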
Affiliation(s)
- Jiayi Liu
- Graduate Program in Cell & Developmental Biology, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, NJ 08854, USA
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
- Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, 387 Soda Hall, Berkeley, CA 94720, USA
- Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Faculty of Medicine Building B, Yoshidatachibanacho, Sakyo Ward, Kyoto 606-8303, Japan
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, 1700 4th Street, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California, 513 Parnassus Ave, San Francisco, CA 94143, USA
- Nir Yosef
- Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
- Chan-Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
- Department of Systems Immunology, Ragon Institute of MGH, MIT, and Harvard Institute of Science, 400 Technology Square, Cambridge, MA 02139, USA
- Anat Kreimer
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
20
Sun J, Noss S, Banerjee D, Das M, Girirajan S. Strategies for dissecting the complexity of neurodevelopmental disorders. Trends Genet 2024; 40:187-202. [PMID: 37949722 PMCID: PMC10872993 DOI: 10.1016/j.tig.2023.10.009] [Received: 05/27/2023] [Revised: 09/20/2023] [Accepted: 10/16/2023] [Indexed: 11/12/2023]
Abstract
Neurodevelopmental disorders (NDDs) are associated with a wide range of clinical features, affecting multiple pathways involved in brain development and function. Recent advances in high-throughput sequencing have unveiled numerous genetic variants associated with NDDs. These variants further contribute to disease complexity and make it challenging to infer disease causation and underlying mechanisms. Herein, we review current strategies for dissecting the complexity of NDDs using model organisms, induced pluripotent stem cells, single-cell sequencing technologies, and massively parallel reporter assays. We further highlight single-cell CRISPR-based screening techniques that enable genomic investigation of cellular transcriptomes with high efficiency, accuracy, and throughput. Overall, we provide an integrated review of experimental approaches that can be applied to investigate a broad range of complex disorders.
Affiliation(s)
- Jiawan Sun
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Serena Noss
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Deepro Banerjee
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Maitreya Das
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Santhosh Girirajan
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA
21
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024]
Abstract
Massively parallel reporter assays measure the transcriptional activities of many cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational modeling framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using this framework, we investigated the molecular mechanisms by which cis-regulatory mutations in a synthetic enhancer cause abnormal gene expression. In a human cell line, accounting for competitive binding between transcription factors (TFs) of the same family with slightly different binding preferences significantly increased the accuracy with which the transcriptional effects of thousands of single and multiple mutations were recapitulated. We also discovered that even when various harmful mutations occur in an activator binding site, the CRM can stably maintain or even increase gene expression through a certain form of competitive binding between TFs of the same family. These findings improve our understanding of the effects of SNPs and indels on CRMs and could help in building robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
22
Fowler DM, Rehm HL. Will variants of uncertain significance still exist in 2030? Am J Hum Genet 2024; 111:5-10. [PMID: 38086381 PMCID: PMC10806733 DOI: 10.1016/j.ajhg.2023.11.005] [Received: 08/22/2023] [Revised: 11/12/2023] [Accepted: 11/13/2023] [Indexed: 12/28/2023]
Abstract
In 2020, the National Human Genome Research Institute (NHGRI) made ten "bold predictions," including that "the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation 'variant of uncertain significance (VUS)' obsolete." We discuss the prospects for this prediction, arguing that many, if not most, VUS in coding regions will be resolved by 2030. We outline a confluence of recent changes making this possible, especially advances in the standards for variant classification that better leverage diverse types of evidence, improvements in computational variant effect predictor performance, scalable multiplexed assays of variant effect capable of saturating the genome, and data-sharing efforts that will maximize the information gained from each new individual sequenced and variant interpreted. We suggest that clinicians and researchers can realize a future where VUSs have largely been eliminated, in line with the NHGRI's bold prediction. The length of time taken to reach this future, and thus whether we are able to achieve the goal of largely eliminating VUSs by 2030, is largely a consequence of the choices made now and in the next few years. We believe that investing in eliminating VUSs is worthwhile, since their predominance remains one of the biggest challenges to precision genomic medicine.
Affiliation(s)
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
23
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' (how cells interpret DNA sequences to determine when, where and how much genes should be expressed) has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
- Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden.
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
24
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023]
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, and have been successfully applied in studies such as directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES, which combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Affiliation(s)
- Takahiro Nemoto
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan.
- Tommaso Ocari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Arthur Planul
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Muge Tekinsoy
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Emilia A Zin
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Deniz Dalkara
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Ulisse Ferrari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
25
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. bioRxiv 2023:2023.11.19.567742. [PMID: 38045264 PMCID: PMC10690217 DOI: 10.1101/2023.11.19.567742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
Affiliation(s)
- Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Zhe Liu
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Jasmine Sims
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
26
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 74] [Impact Index Per Article: 74.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
27
Busia A, Listgarten J. MBE: model-based enrichment estimation and prediction for differential sequencing data. Genome Biol 2023; 24:218. [PMID: 37784130 PMCID: PMC10544408 DOI: 10.1186/s13059-023-03058-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/27/2023] [Accepted: 09/14/2023] [Indexed: 10/04/2023]
Abstract
Characterizing differences in sequences between two conditions, such as with and without drug exposure, using high-throughput sequencing data is a prevalent problem involving quantifying changes in sequence abundances, and predicting such differences for unobserved sequences. A key shortcoming of current approaches is their extremely limited ability to share information across related but non-identical reads. Consequently, they cannot use sequencing data effectively, nor be directly applied in many settings of interest. We introduce model-based enrichment (MBE) to overcome this shortcoming. We evaluate MBE using both simulated and real data. Overall, MBE improves accuracy compared to current differential analysis methods.
Affiliation(s)
- Akosua Busia
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, 94720, CA, USA.
- Jennifer Listgarten
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, 94720, CA, USA.
28
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023]
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
Affiliation(s)
- Carlos Guzman
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
- Sascha Duttke
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Yixin Zhu
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Camila De Arruda Saldanha
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Nicholas L Downes
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Christopher Benner
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Sven Heinz
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
29
Gaulton KJ, Preissl S, Ren B. Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat Rev Genet 2023; 24:516-534. [PMID: 37161089 PMCID: PMC10629587 DOI: 10.1038/s41576-023-00598-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 03/27/2023] [Indexed: 05/11/2023]
Abstract
Genome-wide association studies (GWAS) have linked hundreds of thousands of sequence variants in the human genome to common traits and diseases. However, translating this knowledge into a mechanistic understanding of disease-relevant biology remains challenging, largely because such variants are predominantly in non-protein-coding sequences that still lack functional annotation at cell-type resolution. Recent advances in single-cell epigenomics assays have enabled the generation of cell type-, subtype- and state-resolved maps of the epigenome in heterogeneous human tissues. These maps have facilitated cell type-specific annotation of candidate cis-regulatory elements and their gene targets in the human genome, enhancing our ability to interpret the genetic basis of common traits and diseases.
Affiliation(s)
- Kyle J Gaulton
- Department of Paediatrics, Paediatric Diabetes Research Center, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Sebastian Preissl
- Center for Epigenomics, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
- Bing Ren
- Center for Epigenomics, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Department of Cellular and Molecular Medicine, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Ludwig Institute for Cancer Research, La Jolla, CA, USA.
30
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are continually being developed and commercialized. Emerging sequencing technologies aim to generate more data with fewer inputs and at lower cost. This has also translated into a greater number and variety of applications in genomics, alongside enhanced computational capacity (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading many researchers to move from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
31
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023]
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
32
The Impact of Genomic Variation on Function (IGVF) Consortium. arXiv 2023:arXiv:2307.13708v1. [PMID: 37547663 PMCID: PMC10402186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Our genomes influence nearly every aspect of human biology from molecular and cellular functions to phenotypes in health and disease. Human genetics studies have now associated hundreds of thousands of differences in our DNA sequence ("genomic variation") with disease risk and other phenotypes, many of which could reveal novel mechanisms of human biology and uncover the basis of genetic predispositions to diseases, thereby guiding the development of new diagnostics and therapeutics. Yet, understanding how genomic variation alters genome function to influence phenotype has proven challenging. To unlock these insights, we need a systematic and comprehensive catalog of genome function and the molecular and cellular effects of genomic variants. Toward this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations, and predictive modeling to investigate the relationships among genomic variation, genome function, and phenotypes. Through systematic comparisons and benchmarking of experimental and computational methods, we aim to create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how both coding and noncoding variants may connect through gene regulatory and protein interaction networks. These experimental data, computational predictions, and accompanying standards and pipelines will be integrated into an open resource that will catalyze community efforts to explore genome function and the impact of genetic variation on human biology and disease across populations.
33
Milito A, Aschern M, McQuillan JL, Yang JS. Challenges and advances towards the rational design of microalgal synthetic promoters in Chlamydomonas reinhardtii. J Exp Bot 2023; 74:3833-3850. [PMID: 37025006 DOI: 10.1093/jxb/erad100] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/09/2022] [Accepted: 03/24/2023] [Indexed: 06/19/2023]
Abstract
Microalgae hold enormous potential to provide a safe and sustainable source of high-value compounds, acting as carbon-fixing biofactories that could help to mitigate rapidly progressing climate change. Bioengineering microalgal strains will be key to optimizing and modifying their metabolic outputs, and to making them competitive with established industrial biotechnology hosts, such as bacteria or yeast. To achieve this, precise and tuneable control over transgene expression will be essential, making the development and rational design of synthetic promoters a key strategy. Among green microalgae, Chlamydomonas reinhardtii represents the reference species for bioengineering and synthetic biology; however, the repertoire of functional synthetic promoters for this species, and for microalgae generally, is limited in comparison to other commercial chassis, emphasizing the need to expand the current microalgal gene expression toolbox. Here, we discuss state-of-the-art promoter analyses, and highlight areas of research required to advance synthetic promoter development in C. reinhardtii. In particular, we exemplify high-throughput studies performed in other model systems that could be applicable to microalgae, and propose novel approaches to interrogating algal promoters. Lastly, we outline the major limitations hindering microalgal promoter development, while providing novel suggestions and perspectives for how to overcome them.
Affiliation(s)
- Alfonsina Milito
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Moritz Aschern
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Josie L McQuillan
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK
- Jae-Seong Yang
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
34
Fowler DM, Adams DJ, Gloyn AL, Hahn WC, Marks DS, Muffley LA, Neal JT, Roth FP, Rubin AF, Starita LM, Hurles ME. An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biol 2023; 24:147. [PMID: 37394429 PMCID: PMC10316620 DOI: 10.1186/s13059-023-02986-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Received: 06/02/2023] [Accepted: 06/13/2023] [Indexed: 07/04/2023]
Abstract
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants, only after (and in most cases long after) their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein-encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps, transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
Affiliation(s)
- Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
- Anna L. Gloyn
- Department of Pediatrics & Department of Genetics, Division of Endocrinology, Stanford School of Medicine, Stanford University, Stanford, CA USA
- William C. Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA USA
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Debora S. Marks
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Department of Systems Biology, Harvard Medical School, Cambridge, USA
- Lara A. Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- James T. Neal
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA USA
- Frederick P. Roth
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON Canada
- Alan F. Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC Australia
- Lea M. Starita
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
35
Guga S, Wang Y, Graham DC, Vyse TJ. A review of genetic risk in systemic lupus erythematosus. Expert Rev Clin Immunol 2023; 19:1247-1258. [PMID: 37496418 DOI: 10.1080/1744666x.2023.2240959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 05/10/2023] [Indexed: 07/28/2023]
Abstract
INTRODUCTION Systemic Lupus Erythematosus (SLE) is a complex multisystem autoimmune disease with a wide range of signs and symptoms in affected individuals. The utilization of genome-wide association study (GWAS) technology has led to an explosion in the number of genetic risk factors mapped for autoimmune diseases, including SLE. AREAS COVERED In this review, we summarize recently mapped genetic risk loci in SLE, which bring the total number of mapped loci to approximately 200. We review prioritization analyses of the associated variants and experimental validation of the putative causal variants. This includes the implementation of new bioinformatic techniques to align genomic and functional data and the use of transcriptomics with single-cell RNA-sequencing, CRISPR genome editing, and massively parallel reporter assays to analyze non-coding regulatory genetics. EXPERT OPINION Despite progress in identifying more genetic risk loci and variant-gene pairs for SLE, understanding its pathogenesis and applying findings clinically remains challenging. The polygenic risk score (PRS) has been used as an application of SLE genetics, but with limited performance in non-EUR populations. In the next few years, advancements in proteomics, post-translational modification estimation, and whole-genome sequencing will enhance disease understanding.
Affiliation(s)
- Suri Guga
- Department of Medical & Molecular Genetics, King's College London, London, UK
- Yuxuan Wang
- Department of Medical & Molecular Genetics, King's College London, London, UK
- Timothy J Vyse
- Department of Medical & Molecular Genetics, King's College London, London, UK
36
Wu F. Updated analysis to reject the laboratory-engineering hypothesis of SARS-CoV-2. ENVIRONMENTAL RESEARCH 2023; 224:115481. [PMID: 36804316 PMCID: PMC9937728 DOI: 10.1016/j.envres.2023.115481] [Received: 12/20/2022] [Revised: 02/08/2023] [Accepted: 02/09/2023] [Indexed: 06/18/2023]
Abstract
A clear understanding of the origin of SARS-CoV-2 is important for future pandemic preparedness. Here, I provided an updated analysis of the type IIS endonuclease maps in genomes of alphacoronavirus, betacoronavirus, and SARS-CoV-2. Scenarios to engineer SARS-CoV-2 in the laboratory and the associated workload were also discussed. The analysis clearly shows that the endonuclease fingerprint does not indicate a synthetic origin of SARS-CoV-2 and engineering a SARS-CoV-2 virus in the laboratory is extremely challenging both scientifically and financially. On the contrary, current scientific evidence does support the animal origin of SARS-CoV-2.
Affiliation(s)
- Fuqing Wu
- Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Texas, USA; Texas Epidemic Public Health Institute, TX, USA.
37
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease-associated SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
- Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
38
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023]
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
39
Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction. BIOLOGY 2023; 12:biology12030369. [PMID: 36979061 PMCID: PMC10045089 DOI: 10.3390/biology12030369] [Received: 12/15/2022] [Revised: 02/21/2023] [Accepted: 02/24/2023] [Indexed: 03/03/2023]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that play a central role in the post-transcriptional regulation of biological processes. miRNAs regulate transcripts through direct binding involving the Argonaute protein family. The exact rules of binding are not known, and several in silico miRNA target prediction methods have been developed to date. Deep learning has recently revolutionized miRNA target prediction. However, the higher predictive power comes with a decreased ability to interpret increasingly complex models. Here, we present a novel interpretation technique, called attribution sequence alignment, for miRNA target site prediction models that can interpret such deep learning models on a two-dimensional representation of miRNA and putative target sequence. Our method produces a human readable visual representation of miRNA:target interactions and can be used as a proxy for the further interpretation of biological concepts learned by the neural network. We demonstrate applications of this method in the clustering of experimental data into binding classes, as well as using the method to narrow down predicted miRNA binding sites on long transcript sequences. Importantly, the presented method works with any neural network model trained on a two-dimensional representation of interactions and can be easily extended to further domains such as protein–protein interactions.
40
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023]
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia; School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia; The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia; Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia.
- Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA; Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA; Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
41
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА (Clinical Practice) 2023. [DOI: 10.17816/clinpract115063] [Indexed: 01/11/2023]
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
42
Morova T, Ding Y, Huang CCF, Sar F, Schwarz T, Giambartolomei C, Baca S, Grishin D, Hach F, Gusev A, Freedman M, Pasaniuc B, Lack N. Optimized high-throughput screening of non-coding variants identified from genome-wide association studies. Nucleic Acids Res 2022; 51:e18. [PMID: 36546757 PMCID: PMC9943666 DOI: 10.1093/nar/gkac1198] [Received: 09/26/2022] [Revised: 11/19/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]
Abstract
The vast majority of disease-associated single nucleotide polymorphisms (SNP) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants affect transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked to orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
Affiliation(s)
- Tunc Morova
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Funda Sar
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Claudia Giambartolomei
- Central RNA Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Sylvan C Baca
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Dennis Grishin
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Science, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
- Alexander Gusev
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Matthew L Freedman
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nathan A Lack
43
Dace P, Findlay GM. Reducing uncertainty in genetic testing with Saturation Genome Editing. MED GENET-BERLIN 2022; 34:297-304. [PMID: 38836089 PMCID: PMC11006300 DOI: 10.1515/medgen-2022-2159] [Indexed: 06/06/2024]
Abstract
Accurate interpretation of human genetic data is critical for optimizing outcomes in the era of genomic medicine. Powerful methods for testing genetic variants for functional effects are allowing researchers to characterize thousands of variants across disease genes. Here, we review experimental tools enabling highly scalable assays of variants, focusing specifically on Saturation Genome Editing (SGE). We discuss examples of how this technique is being implemented for variant testing at scale and describe how SGE data for BRCA1 have been clinically validated and used to aid variant interpretation. The initial success at predicting variant pathogenicity with SGE has spurred efforts to expand this and related techniques to many more genes.
Affiliation(s)
- Phoebe Dace
- The Genome Function Laboratory, The Francis Crick Institute, 1 Midland Rd, London, United Kingdom
- Gregory M Findlay
- The Genome Function Laboratory, The Francis Crick Institute, 1 Midland Rd, London, United Kingdom
44
Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu Rev Genet 2022; 56:441-465. [PMID: 36055970 DOI: 10.1146/annurev-genet-072920-032107] [Indexed: 12/24/2022]
Abstract
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Affiliation(s)
- Daniel Tabet
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Victoria Parikh
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
- Prashant Mali
- Department of Bioengineering, University of California, San Diego, California, USA
- Frederick P Roth
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Melina Claussnitzer
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Center for Genomic Medicine and Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Harvard University, Boston, Massachusetts, USA
45
van den Elzen AMG, Watson MJ, Thoreen CC. mRNA 5' terminal sequences drive 200-fold differences in expression through effects on synthesis, translation and decay. PLoS Genet 2022; 18:e1010532. [PMID: 36441824 PMCID: PMC9731452 DOI: 10.1371/journal.pgen.1010532] [Received: 09/23/2022] [Revised: 12/08/2022] [Accepted: 11/15/2022] [Indexed: 11/30/2022]
Abstract
mRNA regulatory sequences control gene expression at multiple levels including translation initiation and mRNA decay. The 5' terminal sequences of mRNAs have unique regulatory potential because of their proximity to key post-transcriptional regulators. Here we have systematically probed the function of 5' terminal sequences in gene expression in human cells. Using a library of reporter mRNAs initiating with all possible 7-mer sequences at their 5' ends, we find an unexpected impact on transcription that underlies 200-fold differences in mRNA expression. Library sequences that promote high levels of transcription mirrored those found in native mRNAs and define two basic classes with similarities to classic Initiator (Inr) and TCT core promoter motifs. By comparing transcription, translation and decay rates, we identify sequences that are optimized for both efficient transcription and growth-regulated translation and stability, including variants of terminal oligopyrimidine (TOP) motifs. We further show that 5' sequences of endogenous mRNAs are enriched for multi-functional TCT/TOP hybrid sequences. Together, our results reveal how 5' sequences define two general classes of mRNAs with distinct growth-responsive profiles of expression across synthesis, translation and decay.
Affiliation(s)
- Antonia M. G. van den Elzen
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, Connecticut, United States of America
- Maegan J. Watson
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, Connecticut, United States of America
- Carson C. Thoreen
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, Connecticut, United States of America
46
Walton RT, Singh A, Blainey PC. Pooled genetic screens with image-based profiling. Mol Syst Biol 2022; 18:e10768. [PMID: 36366905 PMCID: PMC9650298 DOI: 10.15252/msb.202110768] [Received: 10/22/2021] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022]
Abstract
Spatial structure in biology, spanning molecular, organellular, cellular, tissue, and organismal scales, is encoded through a combination of genetic and epigenetic factors in individual cells. Microscopy remains the most direct approach to exploring the intricate spatial complexity defining biological systems and the structured dynamic responses of these systems to perturbations. Genetic screens with deep single-cell profiling via image features or gene expression programs have the capacity to show how biological systems work in detail by cataloging many cellular phenotypes with one experimental assay. Microscopy-based cellular profiling provides information complementary to next-generation sequencing (NGS) profiling and has only recently become compatible with large-scale genetic screens. Optical screening now offers the scale needed for systematic characterization and is poised for further scale-up. We discuss how these methodologies, together with emerging technologies for genetic perturbation and microscopy-based multiplexed molecular phenotyping, are powering new approaches to reveal genotype-phenotype relationships.
Affiliation(s)
- Russell T Walton
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biological Engineering, MIT, Cambridge, MA, USA
- Avtar Singh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Present address: Department of Cellular and Tissue Genomics, Genentech, South San Francisco, CA, USA
- Paul C Blainey
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biological Engineering, MIT, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
47
Pourseif MM, Masoudi-Sobhanzadeh Y, Azari E, Parvizpour S, Barar J, Ansari R, Omidi Y. Self-amplifying mRNA vaccines: Mode of action, design, development and optimization. Drug Discov Today 2022; 27:103341. [PMID: 35988718 DOI: 10.1016/j.drudis.2022.103341] [Received: 12/25/2021] [Revised: 07/14/2022] [Accepted: 08/15/2022] [Indexed: 11/25/2022]
Abstract
The mRNA-based vaccines are quality-by-design (QbD) immunotherapies that provide safe, tunable, scalable, streamlined and potent treatment possibilities against different types of diseases. The self-amplifying mRNA (saRNA) vaccines, as a highly advantageous class of mRNA vaccines, are inspired by the intracellular self-multiplication nature of some positive-sense RNA viruses. Such vaccine platforms provide a relatively increased expression level of vaccine antigen(s) together with self-adjuvanticity properties. Lined with the QbD saRNA vaccines, essential optimizations improve the stability, safety, and immunogenicity of the vaccine constructs. Here, we elaborate on the concepts and mode-of-action of mRNA and saRNA vaccines, articulate the potential limitations or technical bottlenecks, and explain possible solutions or optimization methods in the process of their design and development.
Affiliation(s)
- Mohammad M Pourseif
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran; Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran; Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Erfan Azari
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran; Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran; Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Sepideh Parvizpour
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran; Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Jaleh Barar
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran; Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
- Rais Ansari
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, Florida, USA
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, Florida, USA.
48
Qin W, Li L, Yang F, Wang S, Yang GY. High-throughput iSpinach fluorescent aptamer-based real-time monitoring of in vitro transcription. BIORESOUR BIOPROCESS 2022; 9:112. [PMID: 38647769 PMCID: PMC10991154 DOI: 10.1186/s40643-022-00598-0] [Received: 07/15/2022] [Accepted: 09/30/2022] [Indexed: 11/10/2022]
Abstract
In vitro transcription (IVT) is an essential technique for RNA synthesis. Methods for the accurate and rapid screening of IVT conditions will facilitate RNA polymerase engineering, promoter optimization, and screening for new transcription inhibitor drugs. However, traditional polyacrylamide gel electrophoresis (PAGE) and high-performance liquid chromatography methods are labor intensive, time consuming and not compatible with real-time analysis. Here, we developed an inexpensive, high-throughput, and real-time detection method for the monitoring of in vitro RNA synthesis called iSpinach aptamer-based monitoring of Transcription Activity in Real-time (STAR). STAR has a detection speed at least 100 times faster than conventional PAGE method and provides comparable results in the analysis of in vitro RNA synthesis reactions. It also can be used as an easy and quantitative method to detect the catalytic activity of T7 RNA polymerase. To further demonstrate the utility of STAR, it was applied to optimize the initially transcribed region of the green fluorescent protein gene and the 3T4T variants demonstrated significantly enhanced transcription output, with at least 1.7-fold and 2.8-fold greater output than the wild-type DNA template and common transcription template, respectively. STAR may provide a valuable tool for many biotechnical applications related to the transcription process, which may pave the way for the development of better RNA-related enzymes and new drugs.
Affiliation(s)
- Weitong Qin
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Liang Li
- Hzymes Biotechnology Co. Ltd, Hubei, 430010, China
- Fan Yang
- Hzymes Biotechnology Co. Ltd, Hubei, 430010, China
- Siyuan Wang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Guang-Yu Yang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
49
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022]
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach provides unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA

50
Du AY, Zhuo X, Sundaram V, Jensen NO, Chaudhari HG, Saccone NL, Cohen BA, Wang T. Functional characterization of enhancer activity during a long terminal repeat's evolution. Genome Res 2022; 32:1840-1851. [PMID: 36192170 PMCID: PMC9712623 DOI: 10.1101/gr.276863.122] [Received: 04/25/2022] [Accepted: 08/23/2022] [Indexed: 11/24/2022]
Abstract
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1- and CEBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects arising from mutations in the AP-1 and CEBP motifs. We observed that the two motifs are conserved at higher rates than expected under neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
Affiliation(s)
- Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xiaoyu Zhuo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Vasavi Sundaram
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nicholas O Jensen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Hemangi G Chaudhari
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nancy L Saccone
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Barak A Cohen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA