1
Alkhamis O, Xiao Y. Systematic Study of in Vitro Selection Stringency Reveals How To Enrich High-Affinity Aptamers. J Am Chem Soc 2023; 145:194-206. [PMID: 36574475 DOI: 10.1021/jacs.2c09522] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 12/28/2022]
Abstract
Aptamers are oligonucleotide receptors with great potential for sensing and therapeutic applications. They are isolated from random libraries through an in vitro method termed systematic evolution of ligands by exponential enrichment (SELEX). Although SELEX-based methods have been widely employed over several decades, many aspects of the experimental process remain poorly understood in terms of how to adjust the selection conditions to obtain aptamers with the desired set of binding characteristics. As a result, SELEX is often performed with arbitrary parameters that tend to produce aptamers with insufficient affinity and/or specificity. Having a better understanding of these basic principles could increase the likelihood of obtaining high-quality aptamers. Here, we have systematically investigated how altering the selection stringency in terms of target concentration, which is essentially the root source of selection pressure for aptamer isolation, affects the outcome of SELEX. By performing four separate trials of SELEX for the same small-molecule target, we experimentally prove that the use of excessively high target concentrations promotes enrichment of low-affinity binders while also suppressing the enrichment of high-affinity aptamers. These findings should be broadly applicable across SELEX methods, given that they share the same core operating principle, and will be crucial for guiding selections to obtain high-quality aptamers in the future.
Affiliation(s)
- Obtin Alkhamis
- Department of Chemistry, North Carolina State University, 2620 Yarbrough Dr., Raleigh, North Carolina 27695, United States
- Yi Xiao
- Department of Chemistry, North Carolina State University, 2620 Yarbrough Dr., Raleigh, North Carolina 27695, United States
2
Spirov AV, Myasnikova EM. Heuristic algorithms in evolutionary computation and modular organization of biological macromolecules: Applications to in vitro evolution. PLoS One 2022; 17:e0260497. [PMID: 35085255 PMCID: PMC8794168 DOI: 10.1371/journal.pone.0260497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 11/10/2021] [Indexed: 11/19/2022] Open
Abstract
Evolutionary computing (EC) is an area of computer science and applied mathematics covering heuristic optimization algorithms inspired by evolution in Nature. EC extensively studies the full variety of methods that were originally based on the principles of selectionism. As a result, many new algorithms and approaches, significantly more efficient than classical selectionist schemes, were found. This is especially true for some families of special problems. There are strong arguments to believe that EC approaches are quite suitable for modeling and numerical analysis of those methods of synthetic biology and biotechnology that are known as in vitro evolution. Therefore, it is natural to expect that the new algorithms and approaches developed in EC can be effectively applied in experiments on the directed evolution of biological macromolecules. According to John Holland's Schema theorem, the effective evolutionary search in genetic algorithms (GA) is provided by identifying short schemata of high fitness which in the further search recombine into larger building blocks (BBs) with higher and higher fitness. The multimodularity of functional biological macromolecules and the preservation of already found modules in the evolutionary search have a clear analogy with the BBs in EC. It seems reasonable to try to transfer and introduce the methods of EC, preserving BBs and essentially accelerating the search, into experiments on in vitro evolution. We extend the key instrument of Holland's theory, the Royal Roads fitness function, to problems of in vitro evolution (Biological Royal Staircase, BioRS, functions). The specific version of BioRS developed in this publication arises from the realities of experimental evolutionary search for (DNA-) RNA-devices (aptazymes).
Our numerical tests showed that for problems with the BioRS functions, simple heuristic algorithms, which turned out to be very effective for preserving BBs in GA, can be very effective in in vitro evolution approaches. We are convinced that such algorithms can be implemented in modern methods of in vitro evolution to achieve significant savings in time and resources and a significant increase in the efficiency of evolutionary search.
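For orientation, the classic Royal Road function that the BioRS functions extend can be sketched in a few lines. This is a minimal illustration of the standard R1-style definition (a bit string is divided into contiguous blocks, and each fully set block contributes its length to the fitness). It is not the BioRS function developed in the paper, and the name `royal_road` and the default `block_size` of 8 are assumptions chosen for the example.

```python
def royal_road(bits, block_size=8):
    """Classic Royal Road (R1-style) fitness.

    `bits` is a sequence of 0/1 values whose length is a multiple of
    `block_size`. Each contiguous block that consists entirely of 1s
    adds `block_size` to the fitness; incomplete blocks add nothing.
    """
    if block_size <= 0 or len(bits) % block_size != 0:
        raise ValueError("length of bits must be a positive multiple of block_size")
    return sum(
        block_size
        for start in range(0, len(bits), block_size)
        if all(bits[start:start + block_size])
    )


# Hypothetical 16-bit genomes: only complete blocks are rewarded.
one_block = [1] * 8 + [0] * 8
almost_two = [1] * 7 + [0] + [1] * 8
```

Because partially filled blocks earn nothing, a search procedure gains fitness only by completing and then preserving whole blocks, which is the building-block behaviour the abstract refers to.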
Affiliation(s)
- Alexander V. Spirov
- I. M. Sechenov Institute of Evolutionary Physiology and Biochemistry Russian Academy of Sciences, St. Petersburg, Russia
- The Institute of Scientific Information for Social Sciences RAS, Moscow, Russia
3
Modeling SELEX for regulatory regions using Royal Road and Royal Staircase fitness functions. Biosystems 2020; 200:104312. [PMID: 33278501 DOI: 10.1016/j.biosystems.2020.104312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/30/2020] [Revised: 11/11/2020] [Accepted: 11/23/2020] [Indexed: 01/24/2023]
Abstract
The field of evolutionary algorithms (EAs) emerged in the area of computer science due to transfer of ideas from biology and developed independently for several decades, enriched with techniques from probability theory, complexity theory and optimization methods. In this paper, we consider some recent results from EA theory transferred back into biology. The well-known biotechnological procedure SELEX (Systematic Evolution of Ligands by EXponential enrichment) is viewed as an experimental implementation of an evolutionary algorithm. Theoretical bounds on EA runtime are applied to model SELEX search for a regulatory region consisting of promoter and enhancer sequences. A comparison of theoretical bounds to the results of computational simulation indicates some cases where the theoretical bounds give favorable prediction, while simulation requires prohibitive computational resources.
4
Djordjevic M, Rodic A, Graovac S. From biophysics to 'omics and systems biology. European Biophysics Journal: EBJ 2019; 48:413-424. [PMID: 30972433 DOI: 10.1007/s00249-019-01366-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/12/2018] [Revised: 02/12/2019] [Accepted: 04/03/2019] [Indexed: 01/03/2023]
Abstract
Recent decades brought a revolution to biology, driven mainly by exponentially increasing amounts of data coming from "'omics" sciences. To handle these data, bioinformatics often has to combine biologically heterogeneous signals, for which methods from statistics and engineering (e.g. machine learning) are often used. While such an approach is sometimes necessary, it effectively treats the underlying biological processes as a black box. Similarly, systems biology deals with inherently complex systems, characterized by a large number of degrees of freedom, and interactions that are highly non-linear. To deal with this complexity, the underlying physical interactions are often (over)simplified, such as in Boolean modelling of network dynamics. In this review, we argue for the utility of applying a biophysical approach in bioinformatics and systems biology, including discussion of two examples from our research which address sequence analysis and understanding intracellular gene expression dynamics.
Affiliation(s)
- Marko Djordjevic
- Faculty of Biology, Institute of Physiology and Biochemistry, University of Belgrade, Belgrade, Serbia.
- Andjela Rodic
- Faculty of Biology, Institute of Physiology and Biochemistry, University of Belgrade, Belgrade, Serbia; Interdisciplinary PhD Program in Biophysics, University of Belgrade, Belgrade, Serbia
- Stefan Graovac
- Faculty of Biology, Institute of Physiology and Biochemistry, University of Belgrade, Belgrade, Serbia; Interdisciplinary PhD Program in Biophysics, University of Belgrade, Belgrade, Serbia
5
Abstract
Transcription factors (TFs) control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in TF binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here, we developed a versatile maximum likelihood framework named No Read Left Behind (NRLB) that infers a biophysical model of protein-DNA recognition across the full affinity range from a library of in vitro selected DNA binding sites. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. It can capture the specificity of the p53 tetramer and distinguish multiple binding modes within a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes.
6
Zhang L, Martini GD, Rube HT, Kribelbauer JF, Rastogi C, FitzPatrick VD, Houtman JC, Bussemaker HJ, Pufall MA. SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site. Genome Res 2017; 28:111-121. [PMID: 29196557 PMCID: PMC5749176 DOI: 10.1101/gr.222844.117] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/14/2017] [Accepted: 11/22/2017] [Indexed: 11/28/2022]
Abstract
The DNA-binding interfaces of the androgen (AR) and glucocorticoid (GR) receptors are virtually identical, yet these transcription factors share only about a third of their genomic binding sites and regulate similarly distinct sets of target genes. To address this paradox, we determined the intrinsic specificities of the AR and GR DNA-binding domains using a refined version of SELEX-seq. We developed an algorithm, SelexGLM, that quantifies binding specificity over a large (31-bp) binding site by iteratively fitting a feature-based generalized linear model to SELEX probe counts. This analysis revealed that the DNA-binding preferences of AR and GR homodimers differ significantly, both within and outside the 15-bp core binding site. The relative preference between the two factors can be tuned over a wide range by changing the DNA sequence, with AR more sensitive to sequence changes than GR. The specificity of AR extends to the regions flanking the core 15-bp site, where isothermal calorimetry measurements reveal that affinity is augmented by enthalpy-driven readout of poly(A) sequences associated with narrowed minor groove width. We conclude that the increased specificity of AR is correlated with more enthalpy-driven binding than GR. The binding models help explain differences in AR and GR genomic binding and provide a biophysical rationale for how promiscuous binding by GR allows functional substitution for AR in some castration-resistant prostate cancers.
Affiliation(s)
- Liyang Zhang
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
- Gabriella D Martini
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- H Tomas Rube
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- Judith F Kribelbauer
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- Jon C Houtman
- Department of Immunology, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
- Miles A Pufall
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
7
Djordjevic M, Djordjevic M, Zdobnov E. Scoring Targets of Transcription in Bacteria Rather than Focusing on Individual Binding Sites. Front Microbiol 2017; 8:2314. [PMID: 29213263 PMCID: PMC5702782 DOI: 10.3389/fmicb.2017.02314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/31/2017] [Accepted: 11/09/2017] [Indexed: 11/13/2022] Open
Abstract
Reliable identification of targets of bacterial regulators is necessary to understand bacterial gene expression regulation. These targets are commonly predicted by searching for high-scoring binding sites in the upstream genomic regions, which typically leads to a large number of false positives. In contrast to the common approach, here we propose a novel concept, where overrepresentation of the scoring distribution that corresponds to the entire searched region is assessed, as opposed to predicting individual binding sites. We explore two implementations of this concept, based on Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, which both provide straightforward P-value estimates for predicted targets. This approach is implemented for pleiotropic bacterial regulators, including σ70 (bacterial housekeeping σ factor) target predictions, which is a classical bioinformatics problem characterized by low specificity. We show that the KS-based approach is both faster and more accurate, departing from the current paradigm of AD being slower but more accurate. Moreover, the KS approach leads to a significant increase in the search accuracy compared to the standard approach, while at the same time straightforwardly assigning well-established P-values to each potential target. Consequently, the new KS-based method proposed here, which assigns P-values to fixed-length upstream regions, provides a fast and accurate approach for predicting bacterial transcription targets.
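The core quantity behind a KS-based comparison of score distributions can be illustrated with a two-sample Kolmogorov-Smirnov statistic. The sketch below is a generic pure-Python version, not the authors' implementation; the function name `ks_statistic` and the idea of passing region scores and background scores as plain lists are assumptions made for illustration, and no P-value is computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical
    cumulative distribution functions of the two samples. The supremum
    is always reached at one of the observed values, so only those
    points need to be checked.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical binding-site scores for one upstream region and a background set.
region_scores = [4.1, 5.3, 6.8, 7.2]
background_scores = [1.0, 1.9, 2.4, 3.3, 4.5]
d = ks_statistic(region_scores, background_scores)
```

A statistic near 1 means the region's score distribution is almost completely shifted relative to the background, while a value near 0 means the two distributions are indistinguishable.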
Affiliation(s)
- Marko Djordjevic
- Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Evgeny Zdobnov
- Swiss Institute of Bioinformatics and Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
8
Support vector machine classification of streptavidin-binding aptamers. PLoS One 2014; 9:e99964. [PMID: 24927174 PMCID: PMC4057401 DOI: 10.1371/journal.pone.0099964] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/28/2014] [Accepted: 05/21/2014] [Indexed: 11/21/2022] Open
Abstract
Background: Synthesizing and characterizing aptamers with high affinity and specificity have been extensively carried out for analytical and biomedical applications. Few publications can be found that describe structure–activity relationships (SARs) of candidate aptamer sequences. Methodology: This paper reports pattern recognition with support vector machine (SVM) classification techniques for the identification of streptavidin-binding aptamers as “low” or “high” affinity aptamers. The SVM parameters C and γ were optimized using genetic algorithms. Four descriptors, the topological descriptor PW4 (path/walk 4 - Randic shape index), the connectivity index X3A (average connectivity index chi-3), the topological charge index JGI2 (mean topological charge index of order 2), and the free energy E of the secondary structure, were used to describe the structures of candidate aptamer sequences from SELEX selection (Schütze et al. (2011) PLoS ONE (12):e29604). Conclusions: The predicted fractions of winning streptavidin-binding aptamers for ten rounds of SELEX conform to the aptamer evolutionary principles of SELEX-based screening. The feasibility of applying pattern recognition based on SVM and genetic algorithms for streptavidin-binding aptamers has been demonstrated.
9
Romero-López C, Díaz-González R, Berzal-Herranz A. RNA Selection and Evolution In Vitro: Powerful Techniques for the Analysis and Identification of new Molecular Tools. Biotechnol Biotec Eq 2014. [DOI: 10.1080/13102818.2007.10817461] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023] Open
10
Jing M, Bowser MT. Tracking the emergence of high affinity aptamers for rhVEGF165 during capillary electrophoresis-systematic evolution of ligands by exponential enrichment using high throughput sequencing. Anal Chem 2013; 85:10761-70. [PMID: 24125636 PMCID: PMC3892959 DOI: 10.1021/ac401875h] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/29/2022]
Abstract
Capillary electrophoresis-systematic evolution of ligands by exponential enrichment (CE-SELEX) is a powerful technique for isolating aptamers for various targets, from large proteins to small peptides with molecular weights of several kilodaltons. One of the unique characteristics of CE-SELEX is the relatively high heterogeneity of the ssDNA pools that remains even after multiple rounds of selection. Enriched sequences or highly abundant oligonucleotide motifs are rarely reported in CE-SELEX studies. In this work, we employed 454 pyrosequencing to profile the evolution of an oligonucleotide pool through multiple rounds of CE-SELEX selection against the target recombinant human vascular endothelial growth factor 165 (rhVEGF165). High throughput sequencing allowed up to 3 × 10⁴ sequences to be obtained from each selected pool and compared to the unselected library. Remarkably, the highest abundance contiguous sequence (contig) was only present in 0.8% of sequences even after four rounds of selection. Closer analyses of the most abundant contigs, the top 1000 oligonucleotide fragments, and even the eight original FASTA files showed no evidence of prevailing motifs in the selected pools. The sequencing results also provided insight into why many CE-SELEX selections obtain pools with reduced affinities after many rounds of selection (typically >4). Preferential amplification of a particular short polymerase chain reaction (PCR) product allowed this nonbinding sequence to overtake the pool in later rounds of selection, suggesting that further refinement of primer design or amplification optimization is necessary. High affinity aptamers with 10⁻⁸ M dissociation constants for rhVEGF165 were identified. The affinities of the higher abundance contigs were compared with aptamers randomly chosen from the final selection pool using affinity capillary electrophoresis (ACE) and fluorescence polarization (FP).
No statistical difference in affinity between the higher abundance contigs and the randomly chosen aptamers was observed, supporting the premise that CE-SELEX selects a uniquely heterogeneous pool of high affinity aptamers.
Affiliation(s)
- Meng Jing
- Department of Chemistry, University of Minnesota, 207 Pleasant Street SE, Minneapolis, Minnesota, 55455, United States
- Michael T. Bowser
- Department of Chemistry, University of Minnesota, 207 Pleasant Street SE, Minneapolis, Minnesota, 55455, United States
11
Djordjevic M. Efficient transcription initiation in bacteria: an interplay of protein-DNA interaction parameters. Integr Biol (Camb) 2013; 5:796-806. [PMID: 23511241 DOI: 10.1039/c3ib20221f] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
As the first, and usually rate-limiting, step of transcription initiation, bacterial RNA polymerase (RNAP) binds to double stranded DNA (dsDNA) and subsequently opens the two strands of DNA (the open complex formation). The rate determining step in the open complex formation is opening of a short (6 bp) DNA segment called the -10 region, which interacts with RNAP in both dsDNA and single stranded (ssDNA) forms. Accordingly, formation of the open complex depends on (physically independent) domains of RNAP that interact with ssDNA and dsDNA, as well as on parameters of DNA melting and sequences of -10 regions. We here aim to understand how these different interactions are mutually related to ensure efficient open complex formation. To achieve this, we use a recently developed biophysical model of transcription initiation, which allows the calculation of the kinetic parameters of transcription initiation on the scale of the whole genome. We consequently investigate kinetic properties of sequences derived from all E. coli intergenic regions, and from more than 300 experimentally confirmed E. coli σ70 promoters. We find that interaction specificities of σ70 DNA binding domains reduce the number of sequences where RNAP binds strongly, but forms the open complex too slowly to achieve functional transcription (so-called poised promoters). However, we find that, despite this reduction, there is still a significant number of such poised promoters in the intergenic regions, which may provide a major source of false positives in genome-wide searches of transcription start sites. Furthermore, we surprisingly find that sequences of -10 regions of the functional promoters increase the extent of RNAP poising, which we interpret in terms of an extension of a recently proposed model of promoter recognition ('mix-and-match model') to kinetic parameters.
Overall, our results allow better understanding of the design of σ70 DNA binding domains and promoter sequences, and place a fundamental limit on accuracy of methods for promoter detection that are based on strong RNAP binding (e.g. ChIP-chip).
Affiliation(s)
- Marko Djordjevic
- Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia.
12
Ozer A, White BS, Lis JT, Shalloway D. Density-dependent cooperative non-specific binding in solid-phase SELEX affinity selection. Nucleic Acids Res 2013; 41:7167-75. [PMID: 23737446 PMCID: PMC3737557 DOI: 10.1093/nar/gkt477] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/21/2023] Open
Abstract
The non-specific binding of undesired ligands to a target is the primary factor limiting the enrichment of tight-binding ligands in affinity selection. Solution-phase non-specific affinity is determined by the free-energy of ligand binding to a single target. However, the solid-phase affinity might be higher if a ligand bound concurrently to multiple adjacent immobilized targets in a cooperative manner. Cooperativity could emerge in this case as a simple consequence of the relationship between the free energy of binding, localization entropy and the spatial distribution of the immobilized targets. We tested this hypothesis using a SELEX experimental design and found that non-specific RNA aptamer ligands can concurrently bind up to four bead-immobilized peptide targets, and that this can increase their effective binding affinity by two orders-of-magnitude. Binding curves were quantitatively explained by a new statistical mechanical model of density-dependent cooperative binding, which relates cooperative binding to both the target concentration and the target surface density on the immobilizing substrate. Target immobilization plays a key role in SELEX and other ligand enrichment methods, particularly in new multiplexed microfluidic purification devices, and these results have strong implications for optimizing their performance.
Affiliation(s)
- Abdullah Ozer
- Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building, Ithaca, NY 14853, USA
13
Atherton J, Boley N, Brown B, Ogawa N, Davidson SM, Eisen MB, Biggin MD, Bickel P. A model for sequential evolution of ligands by exponential enrichment (SELEX) data. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas537] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
14
Influence of target concentration and background binding on in vitro selection of affinity reagents. PLoS One 2012; 7:e43940. [PMID: 22952815 PMCID: PMC3429449 DOI: 10.1371/journal.pone.0043940] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 04/26/2012] [Accepted: 07/27/2012] [Indexed: 11/28/2022] Open
Abstract
Nucleic acid-based aptamers possess many useful features that make them a promising alternative to antibodies and other affinity reagents, including well-established chemical synthesis, reversible folding, thermal stability and low cost. However, the selection process typically used to generate aptamers (SELEX) often requires significant resources and can fail to yield aptamers with sufficient affinity and specificity. A number of seminal theoretical models and numerical simulations have been reported in the literature offering insights into experimental factors that govern the effectiveness of the selection process. Though useful, these previous models have not considered the full spectrum of experimental factors or the potential impact of tuning these parameters at each round over the course of a multi-round selection process. We have developed an improved mathematical model to address this important question, and report that both target concentration and the degree of non-specific background binding are critical determinants of SELEX efficiency. Although smaller target concentrations should theoretically offer a superior selection outcome, we show that the level of background binding dramatically affects the target concentration that will yield maximum enrichment at each round of selection. Thus, our model enables experimentalists to determine appropriate target concentrations as a means for protocol optimization. Finally, we perform a comparative analysis of two different selection methods over multiple rounds of selection, and show that methods with inherently lower background binding offer dramatic advantages in selection efficiency.
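The interplay between target concentration and background binding can be illustrated with a toy equilibrium-binding calculation. This is a minimal sketch and not the model developed in the paper: it assumes the target is in large excess over the library (so free target is approximately total target), a single selection round, two competing sequences, and a fixed probability `background` that an unbound sequence is carried over anyway. All function and parameter names are hypothetical.

```python
def fraction_recovered(target, kd, background=0.0):
    """Fraction of a sequence recovered after one partitioning step.

    `target` and `kd` are concentrations in the same units (e.g. molar).
    Specific binding follows a simple 1:1 isotherm; `background` is the
    assumed probability that an unbound molecule is retained anyway.
    """
    specific = target / (target + kd)
    return specific + background * (1.0 - specific)


def enrichment(target, kd_high_affinity, kd_low_affinity, background=0.0):
    """Per-round enrichment of the tight binder relative to the weak binder."""
    return (
        fraction_recovered(target, kd_high_affinity, background)
        / fraction_recovered(target, kd_low_affinity, background)
    )


# Assumed example: a 10 nM binder competing with a 1 uM binder.
KD_HIGH, KD_LOW = 1e-8, 1e-6
saturating = enrichment(1e-3, KD_HIGH, KD_LOW)                  # target far above both Kd values
dilute_clean = enrichment(1e-10, KD_HIGH, KD_LOW)               # low target, no background
dilute_noisy = enrichment(1e-10, KD_HIGH, KD_LOW, background=0.01)
moderate_noisy = enrichment(1e-8, KD_HIGH, KD_LOW, background=0.01)
```

Under these assumptions, a saturating target concentration barely discriminates between the two binders, a very dilute target discriminates well only when background is absent, and with 1% background an intermediate concentration gives better enrichment than the most dilute one, which is the qualitative behaviour the abstract describes.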
15
A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST). PLoS One 2012; 7:e42761. [PMID: 22956994 PMCID: PMC3430675 DOI: 10.1371/journal.pone.0042761] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/25/2012] [Accepted: 07/12/2012] [Indexed: 01/13/2023] Open
Abstract
Background: In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java) so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. Methodology: Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. Principal Findings: Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5×10¹², 2.9×10⁻⁴⁶, and 1.2×10⁻⁷³) and informational content (11.0, 11.9, and 12.5 bits over 15 bp) of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp) consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp). Conclusions/Significance: TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.
16
Abstract
Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an experimental procedure that allows extraction, from an initially random pool of oligonucleotides, of the oligomers with a high binding affinity for a given molecular target. The highest affinity binding sequences isolated through SELEX can have numerous research, diagnostic, and therapeutic applications. Recently, important new modifications of the SELEX protocol have been proposed. In particular, a suitably modified SELEX experiment, together with an appropriate computational procedure, allows inference of protein-DNA interaction parameters with previously unattained accuracy. Such inference is possible even when there is no a priori information on transcription factor binding specificity, which allows accurate predictions of binding sites for any transcription factor of interest. In this chapter we discuss how to accurately determine protein-DNA interaction parameters from SELEX experiments. The chapter addresses the experimental and computational procedures needed to generate and analyze appropriate data.
17
Ladunga I. An overview of the computational analyses and discovery of transcription factor binding sites. Methods Mol Biol 2010; 674:1-22. [PMID: 20827582 DOI: 10.1007/978-1-60761-854-6_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]
Abstract
Here we provide a pragmatic, high-level overview of the computational approaches and tools for the discovery of transcription factor binding sites. Unraveling transcription regulatory networks and their malfunctions, such as cancer, became feasible due to recent stellar progress in experimental techniques and computational analyses. While predictions of isolated sites still pose notorious challenges, cis-regulatory modules (clusters) of binding sites can now be identified with high accuracy. Further support comes from conserved DNA segments, co-regulation, transposable elements, nucleosomes, and three-dimensional chromosomal structures. We introduce computational tools for the analysis and interpretation of chromatin immunoprecipitation, next-generation sequencing, SELEX, and protein-binding microarray results. Because immunoprecipitation produces overly large DNA segments and well over half of the sequencing reads constitute background noise, methods are presented for background correction, sequence read mapping, peak calling, false discovery rate estimation, and co-localization analyses. To discover short binding site motifs from extensive immunoprecipitation segments, we recommend algorithms and software based on expectation maximization and Gibbs sampling. Data integration using several databases further improves performance. Binding sites can be visualized in genomic and chromatin context using genome browsers. Binding site information, integrated with co-expression in large compendia of gene expression experiments, allows us to reveal complex transcriptional regulatory networks.
Affiliation(s)
- Istvan Ladunga
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, USA.
|
18
|
Zhao Y, Granas D, Stormo GD. Inferring binding energies from selected binding sites. PLoS Comput Biol 2009; 5:e1000590. [PMID: 19997485 PMCID: PMC2777355 DOI: 10.1371/journal.pcbi.1000590] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Received: 09/22/2009] [Accepted: 11/02/2009] [Indexed: 11/18/2022]
Abstract
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms. The DNA binding sites of transcription factors that control gene expression are often predicted based on a collection of known or selected binding sites. The most commonly used methods for inferring the binding site pattern, or sequence motif, assume that the sites are selected in proportion to their affinity for the transcription factor, ignoring the effect of the transcription factor concentration. 
We have developed a new maximum likelihood approach, in a program called BEEML, that directly takes into account the transcription factor concentration as well as non-specific contributions to the binding affinity, and we show in simulation studies that it gives a much more accurate model of the transcription factor binding sites than previous methods. We also develop a new method for extracting binding sites for a transcription factor from a random pool of DNA sequences, called high-throughput SELEX (HT-SELEX), and we show that after a single round of selection BEEML can obtain an accurate model of the transcription factor binding sites.
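The non-linear relationship between binding energy and site selection described above can be illustrated with a standard two-state occupancy expression, P(bound) = 1 / (1 + exp(E − μ)), with energies in units of kT. The sketch below is a simplified illustration under that assumption; it is not the BEEML implementation, it omits the non-specific binding term the abstract mentions, and the names `site_energy`, `binding_probability`, and the toy energy matrix are hypothetical.

```python
import math

def site_energy(seq, energy_matrix):
    """Additive binding energy (in kT) of a site: sum of per-position base terms."""
    return sum(energy_matrix[i][base] for i, base in enumerate(seq))

def binding_probability(energy, mu):
    """Probability that a site with the given energy is bound at chemical potential mu.

    A higher mu corresponds to a higher transcription factor concentration.
    """
    return 1.0 / (1.0 + math.exp(energy - mu))

# Toy 2-position energy matrix (illustrative values, consensus "AC" has energy 0).
toy_matrix = [
    {"A": 0.0, "C": 1.0, "G": 2.0, "T": 2.0},
    {"A": 1.5, "C": 0.0, "G": 2.0, "T": 1.0},
]

strong = site_energy("AC", toy_matrix)  # consensus site
weak = site_energy("GG", toy_matrix)    # weaker site

for mu in (-6.0, 0.0, 6.0):
    ratio = binding_probability(strong, mu) / binding_probability(weak, mu)
    print(mu, ratio)
```

At low μ the ratio of occupancies approaches exp(E_weak − E_strong), so selection is roughly proportional to affinity. At high μ both sites saturate and the ratio approaches 1, which is why ignoring the factor concentration can distort an inferred motif.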
Affiliation(s)
- Yue Zhao
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- David Granas
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
19
|
Better estimation of protein-DNA interaction parameters improve prediction of functional sites. BMC Biotechnol 2008; 8:94. [PMID: 19105805 PMCID: PMC2654563 DOI: 10.1186/1472-6750-8-94] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/09/2008] [Accepted: 12/23/2008] [Indexed: 11/23/2022]
Abstract
Background: Characterizing transcription factor binding motifs is a common bioinformatics task. For transcription factors with variable binding sites, we need many suboptimal binding sites in our training dataset to obtain accurate estimates of the free energy penalties for deviating from the consensus DNA sequence. One procedure to do that involves a modified SELEX (Systematic Evolution of Ligands by Exponential Enrichment) method designed to produce many such sequences. Results: We analyzed low-stringency SELEX data for E. coli Catabolic Activator Protein (CAP), and we show here that appropriate quantitative analysis improves our ability to predict in vitro affinity. To obtain the large number of sequences required for this analysis, we used a SELEX SAGE protocol developed by Roulet et al. The sequences obtained were subjected to bioinformatic analysis. The resulting bioinformatic model characterizes the sequence specificity of the protein more accurately than the specificities predicted in previous analyses from only the few known binding sites available in the literature. The consequences of this increase in accuracy for prediction of in vivo binding sites (and especially functional ones) in the E. coli genome are also discussed. We measured the dissociation constants of several putative CAP binding sites by EMSA (Electrophoretic Mobility Shift Assay) and compared the affinities to the bioinformatics scores provided by methods such as the weight matrix method and QPMEME (Quadratic Programming Method of Energy Matrix Estimation), trained on known binding sites as well as on the new sites from SELEX SAGE data. We also checked predicted genome sites for conservation in the related species S. typhimurium. We found that bioinformatics scores based on SELEX SAGE data perform better both in predicting physical binding energies and in detecting functional sites.
Conclusion: We think that training binding site detection algorithms on datasets from binding assays leads to better prediction. The improvements in accuracy came from the unbiased nature of the SELEX dataset rather than from the number of sites available. We believe that with progress in short-read sequencing technology, one could use SELEX methods to characterize the binding affinities of many low-specificity transcription factors.
|
20
|
Abstract
Over the last two decades, a large amount of data on initiation of transcription by bacterial RNA polymerase (RNAP) has been obtained. However, the question of how the open complex is formed remains open, and several qualitative hypotheses for opening of DNA by RNAP have been proposed. To provide a theoretical framework needed to analyze the assembled experimental data, we here develop the first quantitative model of open complex formation by bacterial RNAP. We first show that a simple hypothesis (which might follow from recent bioinformatic and experimental results), by which promoter DNA is melted in one step through thermal fluctuations, is inconsistent with experimental data. We next consider a more complex two-step view of open complex formation. According to this hypothesis, the transcription bubble is formed in the -10 region, and consequently extends to the transcription start site. We derive how the open complex formation rate depends on DNA duplex melting energy and on interaction energies of RNAP with promoter DNA in the closed and open complex. This relationship provides an explicit connection between transcription initiation rate and physical properties of the promoter sequence and promoter-RNAP interactions. We compare our model with both biochemical measurements and genomics data and report very good agreement with the experiments, with no free parameters used in model testing. This agreement therefore strongly supports both the quantitative model that we propose and the qualitative hypothesis on which the model is based. From a practical standpoint, our results allow efficient estimation of promoter kinetic parameters, as well as engineering of promoter sequences with the desired kinetic properties.
|
21
|
Abstract
DNA-protein interactions are fundamental to many biological processes, including the regulation of gene expression. Determining the binding affinities of transcription factors (TFs) to different DNA sequences allows the quantitative modeling of transcriptional regulatory networks and has been a significant technical challenge in molecular biology for many years. A recent paper by Maerkl and Quake [1] demonstrated the use of microfluidic technology for the analysis of DNA-protein interactions. An array of short DNA sequences was spotted onto a glass slide, which was then covered with a microfluidic device so that each spot sat within a chamber into which the flow of materials was controlled by valves. By trapping the DNA-protein complexes on the surface and measuring their concentrations microscopically, they could determine the binding affinity to a large number of DNA sequences that were varied systematically. They studied four TFs from the basic helix-loop-helix family of proteins, all of which bind to E-box sites with the consensus CAnnTG (where "n" can be any base), and showed that variations in affinity for different sites allow each TF to regulate different genes.
Affiliation(s)
- Gary D Stormo
- Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA.
|
22
|
Driller K, Pagenstecher A, Uhl M, Omran H, Berlis A, Gründer A, Sippel AE. Nuclear factor I X deficiency causes brain malformation and severe skeletal defects. Mol Cell Biol 2007; 27:3855-3867. [PMID: 17353270 PMCID: PMC1899988 DOI: 10.1128/mcb.02293-06] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Indexed: 11/20/2022]
Abstract
The transcription factor family of nuclear factor I (NFI) proteins is encoded by four closely related genes: Nfia, Nfib, Nfic, and Nfix. A potential role for NFI proteins in regulating developmental processes has been implicated by their specific expression pattern during embryonic development and by analysis of NFI-deficient mice. It was shown that loss of NFIA results in hydrocephalus and agenesis of the corpus callosum and that NFIB deficiency leads to neurological defects and to severe lung hypoplasia, whereas Nfic knockout mice exhibit specific tooth defects. Here we report the knockout analysis of the fourth and last member of this gene family, Nfix. Loss of NFIX is postnatally lethal and leads to hydrocephalus and to a partial agenesis of the corpus callosum. Furthermore, NFIX-deficient mice develop a deformation of the spine, which is due to a delay in ossification of vertebral bodies and a progressive degeneration of intervertebral disks. Impaired endochondral ossification and decreased mineralization were also observed in femoral sections of Nfix-/- mice. Consistent with the defects in bone ossification we could show that the expression level of tetranectin, a plasminogen-binding protein involved in mineralization, is specifically downregulated in bones of NFIX-deficient mice.
Affiliation(s)
- Katrin Driller
- Institut für Biologie III, Fakultät für Biologie, Albert-Ludwigs Universität Freiburg, Schänzlestrasse 1, D-79104 Freiburg, Germany
|
23
|
Djordjevic M. SELEX experiments: new prospects, applications and data analysis in inferring regulatory pathways. Biomol Eng 2007; 24:179-89. [PMID: 17428731 DOI: 10.1016/j.bioeng.2007.03.001] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Received: 12/31/2006] [Revised: 03/02/2007] [Accepted: 03/06/2007] [Indexed: 10/23/2022]
Abstract
Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an experimental procedure that allows extraction, from an initially random pool of oligonucleotides, of the oligomers with a desired binding affinity for a given molecular target. The procedure can be used to infer the strongest binders for a given DNA or RNA binding protein, and the highest affinity binding sequences isolated through SELEX can have numerous research, diagnostic and therapeutic applications. Recently, important new modifications of the SELEX protocol have been proposed. In particular, a modification of the standard SELEX procedure allows generating a dataset from which protein-DNA interaction parameters can be determined with unprecedented accuracy. Another variant of SELEX allows investigating interactions of a protein with nucleic-acid fragments derived from the entire genome of an organism. We review here different SELEX-based methods, with particular emphasis on the experimental design and on the applications aimed at inferring protein-DNA interactions. In addition to the experimental issues, we also review relevant methods of data analysis, as well as theoretical modeling of SELEX.
Affiliation(s)
- Marko Djordjevic
- Mathematical Biosciences Institute, The Ohio State University, Columbus, OH 43210, USA.
|
24
|
Levine HA, Nilsen-Hamilton M. A mathematical analysis of SELEX. Comput Biol Chem 2007; 31:11-35. [PMID: 17218151 PMCID: PMC2374838 DOI: 10.1016/j.compbiolchem.2006.10.002] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Received: 10/20/2006] [Accepted: 10/20/2006] [Indexed: 11/17/2022]
Abstract
Systematic evolution of ligands by exponential enrichment (SELEX) is a procedure by which a mixture of nucleic acids that vary in sequence can be separated into pure components with the goal of isolating those with specific biochemical activities. The basic idea is to combine the mixture with a specific target molecule and then separate the target-NA complex from the resulting reaction. The target-NA complex is separated by mechanical means (for example, by filtration), the NA is then eluted from the complex and amplified by polymerase chain reaction (PCR), and the process is repeated. After several rounds, one should be left with a pool of [NA] that consists mostly of the species in the original pool that best binds to the target. In Irvine et al. [Irvine, D., Tuerk, C., Gold, L., 1991. SELEXION, systematic evolution of nucleic acids by exponential enrichment with integrated optimization by non-linear analysis. J. Mol. Biol. 222, 739-761] a mathematical analysis of this process was given. In this paper we revisit Irvine et al. [Ibid]. By rewriting the equations for the SELEX process, we considerably reduce the labor of computing the round-to-round distribution of nucleic acid fractions. We also establish necessary and sufficient conditions for the SELEX process to converge to a pool consisting solely of the best binding nucleic acid to a fixed target in a manner that maximizes the percentage of bound target. The assumption is that there is a single nucleic acid binding site on the target that permits occupation by not more than one nucleic acid. We analyze the case for which there is no background loss (no support losses and no free [NA] left on the support). We then examine the case in which there are such losses. The significance of the analysis is that it suggests an experimental approach for the SELEX process as defined in Irvine et al. [Ibid] to converge to a pool consisting of a single best binding nucleic acid without recourse to any a priori information about the nature of the binding constants or the distribution of the individual nucleic acid fragments.
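The round-to-round calculation described above can be sketched numerically for the no-background-loss case. The following is a minimal illustration, not the formulation used by Levine and Nilsen-Hamilton or by Irvine et al.: it assumes 1:1 equilibrium binding with one site per target, perfect partitioning, and unbiased amplification back to the same total concentration. The names `free_target` and `selex_round` and all concentrations are hypothetical.

```python
def free_target(target_total, na_totals, kds):
    """Free target concentration at equilibrium, found by bisection.

    Solves T_total = T_free + sum_i N_i * T_free / (Kd_i + T_free),
    where N_i is the total concentration of nucleic acid species i.
    The right-hand side increases with T_free, so bisection is safe.
    """
    lo, hi = 0.0, target_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        bound = sum(n * mid / (kd + mid) for n, kd in zip(na_totals, kds))
        if mid + bound > target_total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def selex_round(fractions, kds, na_total, target_total):
    """Return pool fractions after one round (bind, partition, amplify)."""
    na_totals = [f * na_total for f in fractions]
    t_free = free_target(target_total, na_totals, kds)
    bound = [n * t_free / (kd + t_free) for n, kd in zip(na_totals, kds)]
    total_bound = sum(bound)
    # Only bound species are carried forward and re-amplified.
    return [b / total_bound for b in bound]

# Three species: tight, intermediate, and weak binder (Kd in molar, illustrative).
kds = [1e-9, 1e-7, 1e-5]
start = [0.001, 0.099, 0.9]

low_target = selex_round(start, kds, na_total=1e-6, target_total=1e-9)
high_target = selex_round(start, kds, na_total=1e-6, target_total=1e-4)
print("low target: ", low_target)
print("high target:", high_target)
```

In this toy setting, a target concentration far above the pool concentration binds nearly every species, so the tight binder is barely enriched, whereas a limiting target concentration forces competition and enriches it strongly. Real selections also involve background retention and amplification bias, which this sketch ignores.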
Affiliation(s)
- Marit Nilsen-Hamilton
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, 50011, United States of America
|