For: Orenstein Y, Shamir R. Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers. Bioinformatics 2013;29:i71-9. [PMID: 23813011 PMCID: PMC3694677 DOI: 10.1093/bioinformatics/btt230] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]

Cited by Other Article(s)

Gunter HM, Youlten SE, Reis ALM, McCubbin T, Madala BS, Wong T, Stevanovski I, Cipponi A, Deveson IW, Santini NS, Kummerfeld S, Croucher PI, Marcellin E, Mercer TR. A universal molecular control for DNA, mRNA and protein expression. Nat Commun 2024;15:2480. [PMID: 38509097 PMCID: PMC10954659 DOI: 10.1038/s41467-024-46456-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 02/28/2024] [Indexed: 03/22/2024]

Affiliation(s)

Helen M Gunter Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia BASE mRNA Facility, The University of Queensland, Brisbane, Queensland, Australia ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia
Scott E Youlten Department of Genetics, Yale University School of Medicine, New Haven, CT, 06510, USA Garvan Institute of Medical Research, Sydney, New South Wales, Australia St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia
Andre L M Reis Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia School of Electrical and Information Engineering, University of Sydney, Sydney, New South Wales, Australia
Tim McCubbin Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia
Bindu Swapna Madala Garvan Institute of Medical Research, Sydney, New South Wales, Australia Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia
Ted Wong Garvan Institute of Medical Research, Sydney, New South Wales, Australia
Igor Stevanovski Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia
Arcadi Cipponi Garvan Institute of Medical Research, Sydney, New South Wales, Australia St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia
Ira W Deveson Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia School of Electrical and Information Engineering, University of Sydney, Sydney, New South Wales, Australia
Nadia S Santini Centro Nacional de Investigación Disciplinaria en Conservación y Mejoramiento de Ecosistemas Forestales, INIFAP, Ciudad de México, 04010, Mexico
Sarah Kummerfeld Garvan Institute of Medical Research, Sydney, New South Wales, Australia St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia
Peter I Croucher Garvan Institute of Medical Research, Sydney, New South Wales, Australia St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia
Esteban Marcellin Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia
Tim R Mercer Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia. BASE mRNA Facility, The University of Queensland, Brisbane, Queensland, Australia. ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia. Garvan Institute of Medical Research, Sydney, New South Wales, Australia.

Alexandari AM, Horton CA, Shrikumar A, Shah N, Li E, Weilert M, Pufall MA, Zeitlinger J, Fordyce PM, Kundaje A. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding. bioRxiv 2023:2023.05.11.540401. [PMID: 37214836 PMCID: PMC10197627 DOI: 10.1101/2023.05.11.540401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]

Abstract

Transcription factors (TF) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de-novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. 
This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.

Orenstein Y. Reverse de Bruijn: Utilizing Reverse Peptide Synthesis to Cover All Amino Acid k-mers. J Comput Biol 2020;27:376-385. [PMID: 31995404 DOI: 10.1089/cmb.2019.0448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Dans PD, Balaceanu A, Pasi M, Patelli AS, Petkevičiūtė D, Walther J, Hospital A, Bayarri G, Lavery R, Maddocks JH, Orozco M. The static and dynamic structural heterogeneities of B-DNA: extending Calladine-Dickerson rules. Nucleic Acids Res 2019;47:11090-11102. [PMID: 31624840 PMCID: PMC6868377 DOI: 10.1093/nar/gkz905] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/18/2019] [Revised: 09/25/2019] [Accepted: 10/06/2019] [Indexed: 12/12/2022]

Affiliation(s)

Pablo D Dans Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain Department of Biological Sciences, University of the Republic (UdelaR), CENUR Gral. Rivera 1350, 50000 Salto, Uruguay
Alexandra Balaceanu Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain
Marco Pasi LBPA, École normale supérieure Paris-Saclay, 61 Av. du Pdt Wilson, Cachan 94235, France Bases Moléculaires et Structurales des Systèmes Infectieux, Univ. Lyon I/CNRS UMR 5086, IBCP, 7 Passage du Vercors, Lyon 69367, France
Alessandro S Patelli Institute of Mathematics, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
Daiva Petkevičiūtė Institute of Mathematics, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, Studentų g. 50, 51368 Kaunas, Lithuania
Jürgen Walther Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain
Adam Hospital Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain
Genís Bayarri Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain
Richard Lavery Bases Moléculaires et Structurales des Systèmes Infectieux, Univ. Lyon I/CNRS UMR 5086, IBCP, 7 Passage du Vercors, Lyon 69367, France
John H Maddocks Institute of Mathematics, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
Modesto Orozco Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10–12, 08028 Barcelona, Spain Department of Biochemistry and Molecular Biology. University of Barcelona, 08028 Barcelona, Spain

Orenstein Y, Puccinelli R, Kim R, Fordyce P, Berger B. Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping. Cell Syst 2017;5:230-236.e5. [PMID: 28957657 DOI: 10.1016/j.cels.2017.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2017] [Revised: 04/14/2017] [Accepted: 07/27/2017] [Indexed: 11/27/2022]

Li W, Thanos D, Provata A. Quantifying local randomness in human DNA and RNA sequences using Erdös motifs. J Theor Biol 2018;461:41-50. [PMID: 30336158 DOI: 10.1016/j.jtbi.2018.09.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2018] [Revised: 08/14/2018] [Accepted: 09/25/2018] [Indexed: 10/28/2022]

Orenstein Y, Yu YW, Berger B. Joker de Bruijn: Covering k-Mers Using Joker Characters. J Comput Biol 2018;25:1171-1178. [PMID: 30117747 PMCID: PMC6247992 DOI: 10.1089/cmb.2018.0032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]

Abstract

Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all k-mers is a de Bruijn sequence of length $\vert \Sigma \vert^k + k - 1$. Researchers would like to increase k to measure interactions at greater detail, but face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.
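
The length bound quoted in the abstract above can be checked with a small sketch. This is not the joker-character method of the cited paper; it is the textbook Lyndon-word (FKM) construction of an ordinary de Bruijn sequence, and the function names `de_bruijn_cyclic` and `de_bruijn_linear` are hypothetical names chosen for illustration. The linear form repeats the first k - 1 characters at the end, giving a sequence of |Σ|^k + k - 1 characters that contains every k-mer.

```python
from itertools import product


def de_bruijn_cyclic(alphabet, k):
    """Cyclic de Bruijn sequence over `alphabet` for order k (FKM construction)."""
    n = len(alphabet)
    a = [0] * (k + 1)      # working buffer, indices 1..k are used
    sequence = []

    def db(t, p):
        if t > k:
            # Emit the prefix only when its period p divides k (a Lyndon word).
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def de_bruijn_linear(alphabet, k):
    """Linear sequence covering every k-mer: the cyclic form plus a k-1 wrap-around."""
    cyclic = de_bruijn_cyclic(alphabet, k)
    return cyclic + cyclic[:k - 1]


if __name__ == "__main__":
    dna = "ACGT"
    k = 3
    seq = de_bruijn_linear(dna, k)
    covered = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = {"".join(p) for p in product(dna, repeat=k)}
    print(len(seq) == len(dna) ** k + k - 1)  # length matches the bound
    print(covered == all_kmers)               # every k-mer appears
```

For the DNA alphabet and k = 3 this yields 4^3 + 3 - 1 = 66 characters, which illustrates the exponential growth in k that motivates the joker characters described in the abstract.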

Orenstein Y, Berger B. Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers. J Comput Biol 2015;23:67-79. [PMID: 26713687 PMCID: PMC4752187 DOI: 10.1089/cmb.2015.0179] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022]

The complex task of choosing a de novo assembly: Lessons from fungal genomes. Comput Biol Chem 2014;53 Pt A:97-107. [DOI: 10.1016/j.compbiolchem.2014.08.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Accepted: 07/11/2014] [Indexed: 12/21/2022]