1
|
From UK-2A to florylpicoxamid: Active learning to identify a mimic of a macrocyclic natural product. J Comput Aided Mol Des 2024; 38:19. [PMID: 38630341 DOI: 10.1007/s10822-024-00555-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/26/2024] [Indexed: 04/19/2024]
Abstract
Scaffold replacement as part of an optimization process that requires maintenance of potency, desirable biodistribution, metabolic stability, and considerations of synthesis at very large scale is a complex challenge. Here, we consider a set of over 1000 time-stamped compounds, beginning with a macrocyclic natural-product lead and ending with a broad-spectrum crop anti-fungal. We demonstrate the application of the QuanSA 3D-QSAR method employing an active learning procedure that combines two types of molecular selection. The first identifies compounds predicted to be most active of those most likely to be well-covered by the model. The second identifies compounds predicted to be most informative based on exhibiting low predicted activity but showing high 3D similarity to a highly active nearest-neighbor training molecule. Beginning with just 100 compounds, using a deterministic and automatic procedure, five rounds of 20-compound selection and model refinement identify the binding metabolic form of florylpicoxamid. We show how iterative refinement broadens the domain of applicability of the successive models while also enhancing predictive accuracy. We also demonstrate how a simple method requiring very sparse data can be used to generate relevant ideas for synthetic candidates.
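The two selection types described in this abstract can be illustrated with a minimal Python sketch. It is not the QuanSA implementation: the `Candidate` fields, the `confidence` score, the numeric cutoffs, and the even split between the two selection types are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_activity: float  # model prediction, e.g. on a pKi-like scale (assumed)
    confidence: float          # hypothetical 0..1 score for how well the model covers the compound
    nn_similarity: float       # 3D similarity to the nearest training molecule, 0..1 (assumed scale)
    nn_activity: float         # measured activity of that nearest training molecule


def select_round(pool, n_active=10, n_informative=10,
                 min_confidence=0.5, active_cutoff=7.0, low_cutoff=6.0):
    """Pick one round of compounds using the two selection types.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Type 1: most active predictions among candidates the model covers well.
    covered = [c for c in pool if c.confidence >= min_confidence]
    active = sorted(covered, key=lambda c: c.predicted_activity,
                    reverse=True)[:n_active]
    chosen = {c.name for c in active}

    # Type 2: predicted to be weak, yet closely resembling a highly active
    # training molecule; measuring these is expected to be informative.
    surprising = [c for c in pool
                  if c.name not in chosen
                  and c.predicted_activity < low_cutoff
                  and c.nn_activity >= active_cutoff]
    informative = sorted(surprising, key=lambda c: c.nn_similarity,
                         reverse=True)[:n_informative]
    return active, informative
```

Calling `select_round` once per round and retraining on the newly measured compounds would give the iterative procedure; the retraining step itself is outside the scope of this sketch.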
|
2
|
Correction: Complex peptide macrocycle optimization: combining NMR restraints with conformational analysis to guide structure-based and ligand-based design. J Comput Aided Mol Des 2024; 38:12. [PMID: 38472529 DOI: 10.1007/s10822-024-00556-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2024]
|
3
|
Complex peptide macrocycle optimization: combining NMR restraints with conformational analysis to guide structure-based and ligand-based design. J Comput Aided Mol Des 2023; 37:519-535. [PMID: 37535171 PMCID: PMC10505130 DOI: 10.1007/s10822-023-00524-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/20/2023] [Indexed: 08/04/2023]
Abstract
Systematic optimization of large macrocyclic peptide ligands is a serious challenge. Here, we describe an approach for lead-optimization using the PD-1/PD-L1 system as a retrospective example of moving from initial lead compound to clinical candidate. We show how conformational restraints can be derived by exploiting NMR data to identify low-energy solution ensembles of a lead compound. Such restraints can be used to focus conformational search for analogs in order to accurately predict bound ligand poses through molecular docking and thereby estimate ligand strain and protein-ligand intermolecular binding energy. We also describe an analogous ligand-based approach that employs molecular similarity optimization to predict bound poses. Both approaches are shown to be effective for prioritizing lead-compound analogs. Surprisingly, relatively small ligand modifications, which may have minimal effects on predicted bound pose or intermolecular interactions, often lead to large changes in estimated strain that have dominating effects on overall binding energy estimates. Effective macrocyclic conformational search is crucial, whether in the context of NMR-based restraints, X-ray ligand refinement, partial torsional restraint for docking/ligand-similarity calculations or agnostic search for nominal global minima. Lead optimization for peptidic macrocycles can be made more productive using a multi-disciplinary approach that combines biophysical data with practical and efficient computational methods.
|
4
|
Unmasking the True Identity of Rapamycin's Minor Conformer. Journal of Natural Products 2023. [PMID: 37432113 PMCID: PMC10391620 DOI: 10.1021/acs.jnatprod.3c00421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2023]
Abstract
Rapamycin, a well-known macrocyclic natural product with myriad biological activities, has been the subject of intense study since its first isolation and characterization over five decades ago. Rapamycin has been found to adopt a single conformation in the solid state (both when protein bound and uncomplexed) and exists as a mixture of two conformations in solution. Early work established that the major conformer in solution is the trans amide isomer but left the minor conformer mostly uncharacterized. Since that time, it has been widely accepted that the minor conformer of rapamycin is the cis amide, based solely on analogy to FK-506, another potent immunosuppressive compound with some shared key structural elements. To address this long-standing and unresolved question, the solution structure of the minor conformer of rapamycin was investigated using a combination of NMR techniques and computational methods and determined to be a trans amide species with rotation about the ester linkage.
|
5
|
A Distributional Model of Bound Ligand Conformational Strain: From Small Molecules up to Large Peptidic Macrocycles. J Med Chem 2023; 66:1955-1971. [PMID: 36701387 PMCID: PMC9923749 DOI: 10.1021/acs.jmedchem.2c01744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
The internal conformational strain incurred by ligands upon binding a target site has a critical impact on binding affinity, and expectations about the magnitude of ligand strain guide conformational search protocols. Estimates for bound ligand strain begin with modeled ligand atomic coordinates from X-ray co-crystal structures. By deriving low-energy conformational ensembles to fit X-ray diffraction data, calculated strain energies are substantially reduced compared with prior approaches. We show that the distribution of expected global strain energy values is dependent on molecular size in a superlinear manner. The distribution of strain energy follows a rectified normal distribution whose mean and variance are related to conformational complexity. The modeled strain distribution closely matches calculated strain values from experimental data comprising over 3000 protein-ligand complexes. The distributional model has direct implications for conformational search protocols as well as for directions in molecular design.
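A rectified normal distribution, as named in this abstract, is the distribution of max(0, X) for a normally distributed X. The short Python sketch below samples from it and computes its analytic mean. The parameter values are placeholders: the abstract says only that the mean and variance relate to conformational complexity and does not give the functional form, so none is assumed here.

```python
import math
import random


def sample_rectified_normal(mu, sigma, n, seed=0):
    """Draw n samples of max(0, X) with X ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mu, sigma)) for _ in range(n)]


def rectified_normal_mean(mu, sigma):
    """Analytic mean of max(0, X): mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    z = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return mu * cdf + sigma * pdf


# Placeholder parameters, chosen only to exercise the functions.
samples = sample_rectified_normal(mu=2.0, sigma=3.0, n=100_000)
empirical_mean = sum(samples) / len(samples)
```

Because negative draws are clipped to zero, the distribution has a point mass at zero strain, and its mean is larger than the underlying `mu` whenever a meaningful share of the normal density lies below zero.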
|
6
|
Solution cis-Proline Conformation of IPCs Inhibitor Aureobasidin A Elucidated via NMR-Based Conformational Analysis. Journal of Natural Products 2022; 85:1449-1458. [PMID: 35622967 DOI: 10.1021/acs.jnatprod.1c01071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Aureobasidin A (abA) is a natural depsipeptide that inhibits inositol phosphorylceramide (IPC) synthases with significant broad-spectrum antifungal activity. abA is known to have two distinct conformations in solution corresponding to trans- and cis-proline (Pro) amide bond rotamers. While the trans-Pro conformation has been studied extensively, cis-Pro conformers have remained elusive. Conformational properties of cyclic peptides are known to strongly affect both potency and cell permeability, making a comprehensive characterization of abA conformation highly desirable. Here, we report a high-resolution 3D structure of the cis-Pro conformer of aureobasidin A elucidated for the first time using a recently developed NMR-driven computational approach. This approach utilizes ForceGen's advanced conformational sampling of cyclic peptides augmented by sparse distance and torsion angle constraints derived from NMR data. The obtained 3D conformational structure of cis-Pro abA has been validated using anisotropic residual dipolar coupling measurements. Support for the biological relevance of both the cis-Pro and trans-Pro abA configurations was obtained through molecular similarity experiments, which showed a significant 3D similarity between NMR-restrained abA conformational ensembles and another IPC synthase inhibitor, pleofungin A. Such ligand-based comparisons can further our understanding of the important steric and electrostatic characteristics of abA and can be utilized in the design of future therapeutics.
|
7
|
Synergy and Complementarity between Focused Machine Learning and Physics-Based Simulation in Affinity Prediction. J Chem Inf Model 2021; 61:5948-5966. [PMID: 34890185 DOI: 10.1021/acs.jcim.1c01382] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
We present results on the extent to which physics-based simulation (exemplified by FEP+) and focused machine learning (exemplified by QuanSA) are complementary for ligand affinity prediction. For both methods, predictions of activity for LFA-1 inhibitors from a medicinal chemistry lead optimization project were accurate within the applicable domain of each approach. A hybrid model that combined predictions by both approaches by simple averaging performed better than either method, with respect to both ranking and absolute pKi values. Two publicly available FEP+ benchmarks, covering 16 diverse biological targets, were used to test the generality of the synergy. By identifying training data specifically focused on relevant ligands, accurate QuanSA models were derived using ligand activity data known at the time of the original series publications. Results across the 16 benchmark targets demonstrated significant improvements both for ranking and for absolute pKi values using hybrid predictions that combined the FEP+ and QuanSA predicted affinity values. The results argue for a combined approach for affinity prediction that makes use of physics-driven methods as well as those driven by machine learning, each applied carefully on appropriate compounds, with hybrid prediction strategies being employed where possible.
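The hybrid prediction described in this abstract is a simple average of the two methods' predicted affinities. A minimal Python sketch follows; the dictionary layout and the ligand names are hypothetical, and the fallback to a single method's value when only one prediction exists is an assumption of this sketch, not something stated in the abstract.

```python
def hybrid_pki(fep, quansa):
    """Combine two sets of predicted pKi values by simple averaging.

    fep, quansa: dicts mapping ligand name -> predicted pKi.
    Ligands predicted by only one method keep that single value
    (an assumption made here for completeness).
    """
    hybrid = {}
    for ligand in sorted(set(fep) | set(quansa)):
        values = [preds[ligand] for preds in (fep, quansa) if ligand in preds]
        hybrid[ligand] = sum(values) / len(values)
    return hybrid


# Hypothetical predictions for three ligands.
fep_predictions = {"L1": 7.0, "L2": 8.0}
quansa_predictions = {"L1": 8.0, "L3": 6.5}
combined = hybrid_pki(fep_predictions, quansa_predictions)
```

Averaging tends to help when the two methods' errors are only weakly correlated, which is consistent with the complementarity the abstract reports.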
|
8
|
Conformational Strain of Macrocyclic Peptides in Ligand-Receptor Complexes Based on Advanced Refinement of Bound-State Conformers. J Med Chem 2021; 64:3282-3298. [PMID: 33724820 DOI: 10.1021/acs.jmedchem.0c02159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
Abstract
Macrocyclic peptides are an important modality in drug discovery, but molecular design is limited due to the complexity of their conformational landscape. To better understand conformational propensities, global strain energies were estimated for 156 protein-macrocyclic peptide cocrystal structures. Unexpectedly large strain energies were observed when the bound-state conformations were modeled with positional restraints. Instead, low-energy conformer ensembles were generated using xGen that fit experimental X-ray electron density maps and gave reasonable strain energy estimates. The ensembles featured significant conformational adjustments while still fitting the electron density as well as or better than the original coordinates. Strain estimates suggest the interaction energy in protein-ligand complexes can offset a greater amount of strain for macrocyclic peptides than for small molecules and non-peptidic macrocycles. Across all molecular classes, the approximate upper bound on global strain energies had the same relationship with molecular size, and bound-state ensembles from xGen yielded favorable binding energy estimates.
|
9
|
Geometrically Diverse Lariat Peptide Scaffolds Reveal an Untapped Chemical Space of High Membrane Permeability. J Am Chem Soc 2021; 143:705-714. [PMID: 33381960 PMCID: PMC8514148 DOI: 10.1021/jacs.0c06115] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/04/2023]
Abstract
Constrained, membrane-permeable peptides offer the possibility of engaging challenging intracellular targets. Structure-permeability relationships have been extensively studied in cyclic peptides whose backbones are cyclized from head to tail, like the membrane permeable and orally bioavailable natural product cyclosporine A. In contrast, the physicochemical properties of lariat peptides, which are cyclized from one of the termini onto a side chain, have received little attention. Many lariat peptide natural products exhibit interesting biological activities, and some, such as griselimycin and didemnin B, are membrane permeable and have intracellular targets. To investigate the structure-permeability relationships in the chemical space exemplified by these natural products, we generated a library of scaffolds using stable isotopes to encode stereochemistry and determined the passive membrane permeability of over 1000 novel lariat peptide scaffolds with molecular weights around 1000. Many lariats were surprisingly permeable, comparable to many known orally bioavailable drugs. Passive permeability was strongly dependent on N-methylation, stereochemistry, and ring topology. A variety of structure-permeability trends were observed including a relationship between alternating stereochemistry and high permeability, as well as a set of highly permeable consensus sequences. For the first time, robust structure-permeability relationships are established in synthetic lariat peptides exceeding 1000 compounds.
|
10
|
XGen: Real-Space Fitting of Complex Ligand Conformational Ensembles to X-ray Electron Density Maps. J Med Chem 2020; 63:10509-10528. [DOI: 10.1021/acs.jmedchem.0c01373] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
|
11
|
Structure- and Ligand-Based Virtual Screening on DUD-E+: Performance Dependence on Approximations to the Binding Pocket. J Chem Inf Model 2020; 60:4296-4310. [PMID: 32271577 DOI: 10.1021/acs.jcim.0c00115] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/30/2022]
Abstract
Using the DUD-E+ benchmark, we explore the impact of using a single protein pocket or ligand for virtual screening compared with using ensembles of alternative pockets, ligands, and sets thereof. For both structure-based and ligand-based approaches, the precise characterization of the binding site in question had a significant impact on screening performance. Using the single original DUD-E protein, Surflex-Dock yielded mean ROC area of 0.81 ± 0.11. Using the cognate ligand instead, with the eSim method for screening, yielded 0.77 ± 0.14. Moving to ensembles of five protein pocket variants increased docking performance to 0.84 ± 0.09. The result for the analogous ligand-based approach (using the five crystallographically aligned cognate ligands) was 0.83 ± 0.11. Using the same ligands, but making use of an automatically generated mutual alignment, yielded mean AUC nearly as good as from single-structure docking: 0.80 ± 0.12. Detailed results and statistical analyses show that structure- and ligand-based methods are complementary and can be fruitfully combined to enhance screening efficiency. A hybrid approach combining ensemble docking with eSim-based screening produced the best and most consistent performance (mean ROC area of 0.89 ± 0.08 and 1% early enrichment of 46-fold). Based on results from both the docking and ligand-similarity approaches, it is clearly unwise to make use of a single arbitrarily chosen protein structure for docking or single ligand query for similarity-based screening.
|
12
|
Electrostatic-field and surface-shape similarity for virtual screening and pose prediction. J Comput Aided Mol Des 2019; 33:865-886. [PMID: 31650386 PMCID: PMC6856045 DOI: 10.1007/s10822-019-00236-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 02/04/2023]
Abstract
We introduce a new method for rapid computation of 3D molecular similarity that combines electrostatic field comparison with comparison of molecular surface-shape and directional hydrogen-bonding preferences (called “eSim”). Rather than employing heuristic “colors” or user-defined molecular feature types to represent conformation-dependent molecular electrostatics, eSim calculates the similarity of the electrostatic fields of two molecules (in addition to shape and hydrogen-bonding). We present detailed virtual screening performance data on the standard 102 target DUD-E set. In its moderately fast screening mode, eSim running on a single computing core is capable of processing over 60 molecules per second. In this mode, eSim performed significantly better than all alternate methods for which full DUD-E data were available (mean ROC area of 0.74, $p < 10^{-9}$, by paired t-test, compared with the best performing alternate method). In addition, for 92 targets of the DUD-E set where multiple ligand-bound crystal structures were available, screening performance was assessed using alternate ligands or sets thereof (in their bound poses) as similarity targets. Using the joint alignment of five ligands for each protein target, mean ROC area exceeded 0.82 for the 92 targets. Design-focused application of ligand similarity methods depends on accurate predictions of geometric molecular relationships. We comprehensively assessed pose prediction accuracy by curating nearly 400,000 bound ligand pose pairs across the DUD-E targets. Overall, beginning from agnostic initial poses, we observed an 80% success rate for RMSD $\le 2.0$ Å among the top 20 predicted eSim poses. These examples were split roughly 50/50 into cases with high direct atomic overlap (where a shared scaffold exists between a pair) and low direct atomic overlap (where a ligand pair has dissimilar scaffolds but largely occupies the same space). Within the high direct atomic overlap subset, the pose prediction success rate was 93%. For the more challenging subset (where dissimilar scaffolds are to be aligned), the success rate was 70%. The eSim approach enables both large-scale screening and rational design of ligands and is rooted in physically meaningful, non-heuristic, molecular comparisons.
|
13
|
Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose. J Comput Aided Mol Des 2018; 32:731-757. [PMID: 29934750 PMCID: PMC6096883 DOI: 10.1007/s10822-018-0126-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/23/2018] [Accepted: 06/14/2018] [Indexed: 12/27/2022]
Abstract
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.
|
14
|
Magellan: A Web Based System for the Integrated Analysis of Heterogeneous Biological Data and Annotations; Application to DNA Copy Number and Expression Data in Ovarian Cancer. Cancer Inform 2017. [DOI: 10.1177/117693510600200019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Recent advances in high throughput biological methods allow researchers to generate enormous amounts of data from a single experiment. In order to extract meaningful conclusions from this tidal wave of data, it will be necessary to develop analytical methods of sufficient power and utility. It is particularly important that biologists themselves be able to perform many of these analyses, such that their background knowledge of the experimental system under study can be used to interpret results and direct further inquiries. We have developed a web-based system, Magellan, which allows the upload, storage, and analysis of multivariate data and textual or numerical annotations. Data and annotations are treated as abstract entities, to maximize the different types of information the system can store and analyze. Annotations can be used in analyses/visualizations, as a means of subsetting data to reduce dimensionality, or as a means of projecting variables from one data type or data set to another. Analytical methods are deployed within Magellan such that new functionalities can be added in a straightforward fashion. Using Magellan, we performed an integrated analysis of genome-wide comparative genomic hybridization (CGH), mRNA expression, and clinical data from ovarian tumors. Analyses included the use of permutation-based methods to identify genes whose mRNA expression levels correlated with patient survival, a nearest neighbor classifier to predict patient survival from CGH data, and curated annotations such as genomic position and derived annotations such as statistical computations to explore the quantitative relationship between CGH and mRNA expression data.
|
15
|
ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs. J Comput Aided Mol Des 2017; 31:419-439. [PMID: 28289981 PMCID: PMC5429375 DOI: 10.1007/s10822-017-0015-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 02/09/2017] [Accepted: 02/27/2017] [Indexed: 12/22/2022]
Abstract
We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12-23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.
|
16
|
Extrapolative prediction using physically-based QSAR. J Comput Aided Mol Des 2016; 30:127-52. [PMID: 26860112 PMCID: PMC4796382 DOI: 10.1007/s10822-016-9896-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/14/2015] [Accepted: 01/21/2016] [Indexed: 11/25/2022]
Abstract
Surflex-QMOD integrates chemical structure and activity data to produce physically-realistic models for binding affinity prediction. Here, we apply QMOD to a 3D-QSAR benchmark dataset and show broad applicability to a diverse set of targets. Testing new ligands within the QMOD model employs automated flexible molecular alignment, with the model itself defining the optimal pose for each ligand. QMOD performance was compared to that of four approaches that depended on manual alignments (CoMFA, two variations of CoMSIA, and CMF). QMOD showed comparable performance to the other methods on a challenging, but structurally limited, test set. The QMOD models were also applied to test a large and structurally diverse dataset of ligands from ChEMBL, nearly all of which were synthesized years after those used for model construction. Extrapolation across diverse chemical structures was possible because the method addresses the ligand pose problem and provides structural and geometric means to quantitatively identify ligands within a model’s applicability domain. Predictions for such ligands for the four tested targets were highly statistically significant based on rank correlation. Those molecules predicted to be highly active ($\mathrm{pK}_i \ge 7.5$) had a mean experimental $\mathrm{pK}_i$ of 7.5, with potent and structurally novel ligands being identified by QMOD for each target.
|
17
|
|
18
|
Prediction of off-target drug effects through data fusion. Pacific Symposium on Biocomputing 2014; 19:160-171. [PMID: 24297543 PMCID: PMC3897331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
We present a probabilistic data fusion framework that combines multiple computational approaches for drawing relationships between drugs and targets. The approach has special relevance to identifying surprising unintended biological targets of drugs. Comparisons between molecules are made based on 2D topological structural considerations, based on 3D surface characteristics, and based on English descriptions of clinical effects. Similarity computations within each modality were transformed into probability scores. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, clinical effects similarity or their combination. The methods were validated within a curated structural pharmacology database (SPDB) and further tested by blind application to data derived from the ChEMBL database. For prediction of off-target effects, 3D-similarity performed best as a single modality, but combining all methods produced performance gains. Striking examples of structurally surprising off-target predictions are presented.
|
19
|
Protein function annotation by local binding site surface similarity. Proteins 2013; 82:679-94. [PMID: 24166661 DOI: 10.1002/prot.24450] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/15/2013] [Revised: 10/02/2013] [Accepted: 10/10/2013] [Indexed: 11/06/2022]
Abstract
Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex-PSIM, a previously reported surface-based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ∼60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins.
|
20
|
A structure-guided approach for protein pocket modeling and affinity prediction. J Comput Aided Mol Des 2013; 27:917-34. [PMID: 24214361 PMCID: PMC3851759 DOI: 10.1007/s10822-013-9688-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/29/2013] [Accepted: 10/25/2013] [Indexed: 11/25/2022]
Abstract
Binding affinity prediction is frequently addressed using computational models constructed solely with molecular structure and activity data. We present a hybrid structure-guided strategy that combines molecular similarity, docking, and multiple-instance learning such that information from protein structures can be used to inform models of structure-activity relationships. The Surflex-QMOD approach has been shown to produce accurate predictions of binding affinity by constructing an interpretable physical model of a binding site with no experimental binding site structural information. We introduce a method to integrate protein structure information into the model induction process in order to construct more robust physical models. The structure-guided models accurately predict binding affinities over a broad range of compounds while producing more accurate representations of the protein pockets and ligand binding modes. Structure-guidance for the QMOD method yielded significant performance improvements, both for affinity and pose prediction, especially in cases where predictions were made on ligands very different from those used for model induction.
|
21
|
Iterative refinement of a binding pocket model: active computational steering of lead optimization. J Med Chem 2012; 55:8926-42. [PMID: 23046104 PMCID: PMC3640415 DOI: 10.1021/jm301210j] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 08/21/2012] [Indexed: 11/28/2022]
Abstract
Computational approaches for binding affinity prediction are most frequently demonstrated through cross-validation within a series of molecules or through performance shown on a blinded test set. Here, we show how such a system performs in an iterative, temporal lead optimization exercise. A series of gyrase inhibitors with known synthetic order formed the set of molecules that could be selected for "synthesis." Beginning with a small number of molecules, based only on structures and activities, a model was constructed. Compound selection was done computationally, each time making five selections based on confident predictions of high activity and five selections based on a quantitative measure of three-dimensional structural novelty. Compound selection was followed by model refinement using the new data. Iterative computational candidate selection produced rapid improvements in selected compound activity, and incorporation of explicitly novel compounds uncovered much more diverse active inhibitors than strategies lacking active novelty selection.
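The selection step described in this abstract (five confident high-activity picks plus five novelty picks per round) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Candidate` fields `predicted_activity`, `confidence`, and `novelty`, the `min_confidence` cutoff, and the function name `select_round` are all hypothetical stand-ins for the model outputs the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_activity: float  # hypothetical model output (higher = more active)
    confidence: float          # hypothetical prediction confidence in [0, 1]
    novelty: float             # hypothetical 3D structural novelty score


def select_round(pool, n_active=5, n_novel=5, min_confidence=0.5):
    """Pick confident high-activity candidates, then the most novel of the rest."""
    # Activity picks: restrict to confident predictions, rank by predicted activity.
    confident = [c for c in pool if c.confidence >= min_confidence]
    active = sorted(confident, key=lambda c: c.predicted_activity, reverse=True)[:n_active]

    # Novelty picks: rank everything not already chosen by novelty score.
    chosen = set(active)
    remaining = [c for c in pool if c not in chosen]
    novel = sorted(remaining, key=lambda c: c.novelty, reverse=True)[:n_novel]
    return active, novel
```

In an iterative setting, the ten selected compounds would be "synthesized" (their measured activities revealed), added to the training set, and the model rebuilt before the next call to `select_round`.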
|
22
|
Surflex-Dock: Docking benchmarks and real-world application. J Comput Aided Mol Des 2012; 26:687-99. [PMID: 22569590 DOI: 10.1007/s10822-011-9533-y] [Citation(s) in RCA: 184] [Impact Index Per Article: 15.3] [Received: 09/26/2011] [Accepted: 12/12/2011] [Indexed: 12/01/2022]
Abstract
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein-ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
|
23
|
Abstract
Drug structures may be quantitatively compared based on 2D topological structural considerations and based on 3D characteristics directly related to binding. A framework for combining multiple similarity computations is presented along with its systematic application to 358 drugs with overlapping pharmacology. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, or their combination. For prediction of primary targets, the benefit of 3D over 2D was relatively small, but for prediction of off-targets, the added benefit was large. In addition to assessing prediction, the relationship between chemical similarity and pharmacological novelty was studied. Drug pairs that shared high 3D similarity but low 2D similarity (i.e., a novel scaffold) were shown to be much more likely to exhibit pharmacologically relevant differences in terms of specific protein target modulation.
|
24
|
Surface-based protein binding pocket similarity. Proteins 2011; 79:2746-63. [PMID: 21769944 DOI: 10.1002/prot.23103] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 01/28/2011] [Revised: 05/06/2011] [Accepted: 05/25/2011] [Indexed: 11/08/2022]
Abstract
Protein similarity comparisons may be made on a local or global basis and may consider sequence information or differing levels of structural information. We present a local three-dimensional method that compares protein binding site surfaces in full atomic detail. The approach is based on the morphological similarity method, which has been widely applied for global comparison of small molecules. We apply the method to all-by-all comparisons of two sets of human protein kinases, a very diverse set of ATP-bound proteins from multiple species, and three heterogeneous benchmark protein binding site data sets. Cases of disagreement between sequence-based similarity and binding site similarity yield informative examples. Where sequence similarity is very low, high pocket similarity can reliably identify important binding motifs. Where sequence similarity is very high, significant differences in pocket similarity are related to ligand binding specificity and similarity. Local protein binding pocket similarity provides qualitatively complementary information to other approaches, and it can yield quantitative information in support of functional annotation.
|
25
|
QMOD: physically meaningful QSAR. J Comput Aided Mol Des 2010; 24:865-78. [PMID: 20721601 DOI: 10.1007/s10822-010-9379-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/25/2010] [Accepted: 08/03/2010] [Indexed: 10/19/2022]
Abstract
Computational methods for predicting ligand affinity where no protein structure is known generally take the form of regression analysis based on molecular features that have only a tangential relationship to a protein/ligand binding event. Such methods have utility in retrospective rationalization of activity patterns of substituents on a common scaffold, but are limited when either multiple scaffolds are present or when ligand alignment varies significantly based on structural changes. In addition, such methods generally assume independence and additivity of effect from scaffold substituents. Collectively, these non-physical modeling assumptions sharply limit the utility of widely used QSAR approaches for prospective prediction of ligand activity. The recently introduced Surflex-QMOD approach, by virtue of constructing physical models of binding sites, comes closer to a modeling approach that is congruent with protein ligand binding events. A set of congeneric CDK2 inhibitors showed that induced binding pockets can be quite congruent with the enzyme's active site but that model predictivity within a chemical series does not necessarily depend on congruence. Muscarinic antagonists were used to show that the QMOD approach is capable of making accurate predictions in cases where highly non-additive structure activity effects exist. The QMOD method offers a means to go beyond non-causative correlations in QSAR analysis.
|
26
|
Abstract
The eight contributions here provide ample evidence that shape as a volume or as a surface is a vibrant and useful concept when applied to drug discovery. It provides a reliable scaffold for "decoration" with chemical intuition (or bias) for virtual screening and lead optimization but also has its unadorned uses, as in library design, ligand fitting, pose prediction, or active site description. Computing power has facilitated this evolution by allowing shape to be handled precisely without the need to reduce down to point descriptors or approximate metrics, and the diversity of resultant applications argues for this being an important step forward. Certainly, it is encouraging that as computation has enabled our intuition, molecular shape has consistently surprised us in its usefulness and adaptability. The first Aurelius question, "What is the essence of a thing?", seems well answered; however, the third, "What do molecules do?", only partly so. Are the topics covered here exhaustive, or is there more to come? To date, there has been little published on the use of the volumetric definition of shape described here as a QSAR variable, for instance, in the prediction or classification of activity, although other shape definitions have been successfully applied, for instance, as embodied in the Compass program described above in "Shape from Surfaces". Crystal packing is a phenomenon much desired to be understood. Although powerful models have been applied to the problem, to what degree is this dominated purely by the shape of a molecule? The shape comparison described here is typically of a global nature, and yet some importance must surely be placed on partial shape matching, just as the substructure matching of chemical graphs has proved useful. The approach of using surfaces, as described here, offers some flavor of this, as does the use of metrics that penalize volume mismatch less than the Tanimoto, e.g., Tversky measures. As yet, there is little to go on as to how useful a paradigm this will be because there is less software and fewer concrete results. Finally, the distance between molecular shapes, or between any shapes defined as volumes or surfaces, is a metric property in the mathematical sense of the word. As yet, there has been little, if any, application of this observation. We cannot know what new application to the design and discovery of pharmaceuticals may yet arise from the simple concept of molecular shape, but it is fair to say that the progress so far is impressive.
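The contrast between Tanimoto and Tversky measures mentioned in this abstract can be made concrete with a small sketch. It uses the standard set-style definitions applied to volumes; the function names and the example volumes are illustrative assumptions, not values from the paper.

```python
def tanimoto(overlap, vol_a, vol_b):
    """Shape Tanimoto: overlap volume divided by the volume of the union."""
    return overlap / (vol_a + vol_b - overlap)


def tversky(overlap, vol_a, vol_b, alpha=1.0, beta=1.0):
    """Tversky index: mismatch in A and in B is weighted by alpha and beta.

    With alpha = beta = 1 this reduces to the Tanimoto coefficient.
    """
    only_a = vol_a - overlap  # volume of A not matched by B
    only_b = vol_b - overlap  # volume of B not matched by A
    return overlap / (overlap + alpha * only_a + beta * only_b)


# Hypothetical case: a fragment (volume 100) that sits entirely inside a
# larger molecule (volume 400), so the overlap equals the fragment volume.
fragment_vol, molecule_vol, overlap_vol = 100.0, 400.0, 100.0
global_score = tanimoto(overlap_vol, fragment_vol, molecule_vol)
partial_score = tversky(overlap_vol, fragment_vol, molecule_vol, alpha=1.0, beta=0.0)
```

Setting `beta=0.0` ignores the unmatched volume of the larger molecule, which is why an asymmetric Tversky measure rewards a partial shape match that the Tanimoto coefficient penalizes.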
|
27
|
Abstract
Computational methods for predicting ligand affinity where no protein structure is known generally take the form of regression analysis based on molecular features that have only a tangential relationship to a protein/ligand binding event. Such methods have limited utility when structural variation moves beyond congeneric series. We present a novel approach based on the multiple-instance learning method of Compass, where a physical model of a binding site is induced from ligands and their corresponding activity data. The model consists of molecular fragments that can account for multiple positions of literal protein residues. We demonstrate the method on 5HT1a ligands by training on a series with limited scaffold variation and testing on numerous ligands with variant scaffolds. Predictive error was between 0.5 and 1.0 log units (0.7-1.4 kcal/mol), with statistically significant rank correlations. Accurate activity predictions of novel ligands were demonstrated using a validation approach where a small number of ligands of limited structural variation known at a fixed time point were used to make predictions on a blind test set of widely varying molecules, some discovered at a much later time point.
|
28
|
Effects of protein conformation in docking: improved pose prediction through protein pocket adaptation. J Comput Aided Mol Des 2009; 23:355-74. [PMID: 19340588 DOI: 10.1007/s10822-009-9266-3] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Received: 01/23/2009] [Accepted: 03/14/2009] [Indexed: 11/30/2022]
Abstract
Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the "cognate" docking problem). In such cases, on well-curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the "cross-docking" case (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identify single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate.
|
29
|
Abstract
We describe a method for modeling chemical mutagenicity in terms of simple rules based on molecular features. A classification model was built using a rule-based ensemble method called RuleFit, developed by Friedman and Popescu. We show how performance compares favorably against literature methods. Performance was measured through the use of cross-validation and testing on external test sets. All data sets used are publicly available. The method automatically generated transparent rules in terms of molecular structure that agree well with known toxicology. While we have focused on chemical mutagenicity in demonstrating this method, we anticipate that it may be more generally useful in modeling other molecular properties such as other types of chemical toxicity.
|
30
|
Abstract
The field of computational chemistry, particularly as applied to drug design, has become increasingly important in terms of the practical application of predictive modeling to pharmaceutical research and development. Tools for exploiting protein structures or sets of ligands known to bind particular targets can be used for binding-mode prediction, virtual screening, and prediction of activity. A serious weakness within the field is a lack of standards with respect to quantitative evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods or comparative evaluations of methods in a manner that supports decision making for practical applications. Here we propose a modest beginning, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.
|
31
|
Abstract
Empirical scoring functions used in protein-ligand docking calculations are typically trained on a dataset of complexes with known affinities with the aim of generalizing across different docking applications. We report a novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function's training towards a particular application, such as screening enrichment. The approach combines multiple instance learning, positive data in the form of ligands of protein binding sites of known and unknown affinity and binding geometry, and negative (decoy) data of ligands thought not to bind particular protein binding sites or known not to bind in particular geometries. Performance of the method for the Surflex-Dock scoring function is shown in cross-validation studies and in eight blind test cases. Tuned functions optimized with a sufficient amount of data exhibited either improved or undiminished screening performance relative to the original function across all eight complexes. Analysis of the changes to the scoring function suggests that modifications can be learned that are related to protein-specific features such as active-site mobility.
|
32
|
Effects of inductive bias on computational evaluations of ligand-based modeling and on drug discovery. J Comput Aided Mol Des 2007; 22:147-59. [PMID: 18074107 DOI: 10.1007/s10822-007-9150-y] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 10/22/2007] [Accepted: 11/12/2007] [Indexed: 11/28/2022]
Abstract
Inductive bias is the set of assumptions that a person or procedure makes in making a prediction based on data. Different methods for ligand-based predictive modeling have different inductive biases, with a particularly sharp contrast between 2D and 3D similarity methods. A unique aspect of ligand design is that the data that exist to test methodology have been largely man-made, and that this process of design involves prediction. By analyzing the molecular similarities of known drugs, we show that the inductive bias of the historic drug discovery process has a very strong 2D bias. In studying the performance of ligand-based modeling methods, it is critical to account for this issue in dataset preparation, use of computational controls, and in the interpretation of results. We propose specific strategies to explicitly address the problems posed by inductive bias considerations.
|
33
|
Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility, and knowledge-based search. J Comput Aided Mol Des 2007; 21:281-306. [PMID: 17387436 DOI: 10.1007/s10822-007-9114-2] [Citation(s) in RCA: 456] [Impact Index Per Article: 26.8] [Received: 01/16/2007] [Accepted: 02/21/2007] [Indexed: 10/23/2022]
Abstract
The Surflex flexible molecular docking method has been generalized and extended in two primary areas related to the search component of docking. First, incorporation of a small-molecule force-field extends the search into Cartesian coordinates constrained by internal ligand energetics. Whereas previous versions searched only the alignment and acyclic torsional space of the ligand, the new approach supports dynamic ring flexibility and all-atom optimization of docked ligand poses. Second, knowledge of well established molecular interactions between ligand fragments and a target protein can be directly exploited to guide the search process. This offers advantages in some cases over the search strategy where ligand alignment is guided solely by a "protomol" (a pre-computed molecular representation of an idealized ligand). Results are presented on both docking accuracy and screening utility using multiple publicly available benchmark data sets that place Surflex's performance in the context of other molecular docking methods. In terms of docking accuracy, Surflex-Dock 2.1 performs as well as the best available methods. In the area of screening utility, Surflex's performance is extremely robust, and it is clearly superior to other methods within the set of cases for which comparative data are available, with roughly double the screening enrichment performance.
|
34
|
Abstract
Virtual screening by molecular docking has become established as a method for drug lead discovery and optimization. All docking algorithms make use of a scoring function in combination with a method of search. Two theoretical aspects of scoring function performance dominate operational performance. The first is the degree to which a scoring function has a global extremum within the ligand pose landscape at the proper location. The second is the degree to which the magnitude of the function at the extremum is accurate. Presuming adequate search strategies, a scoring function's location performance will dominate behavior with respect to docking accuracy: the degree to which a predicted pose of a ligand matches experimental observation. A scoring function's magnitude performance will dominate behavior with respect to screening utility: enrichment of true ligands over non-ligands. Magnitude estimation also controls pure scoring accuracy: the degree to which bona fide ligands of a particular protein may be correctly ranked. Approaches to the development of scoring functions have varied widely, with a number of functions yielding similarly high levels of performance relating to the location issue. However, even among functions performing equally well on location, widely varying performance is observed on the question of magnitude. In many cases, performance is good enough to yield high enrichments of true ligands versus non-ligands in screening across a wide variety of protein types. Generally, performance is not good enough to correctly rank among true ligands. Strategies for improvement are discussed.
|
35
|
Abstract
Systematic annotation of the primary targets of roughly 1000 known therapeutics reveals that over 700 of these modulate approximately 85 biological targets. We report the results of three analyses. In the first analysis, drug/drug similarities and target/target similarities were computed on the basis of three-dimensional ligand structures. Drug pairs sharing a target had significantly higher similarity than drug pairs sharing no target. Also, target pairs with no overlap in annotated drug specificity shared lower similarity than target pairs with increasing overlap. Two-way agglomerative clusterings of drugs and targets were consistent with known pharmacology and suggestive that side effects and drug-drug interactions might be revealed by modeling many targets. In the second analysis, we constructed and tested ligand-based models of 22 diverse targets in virtual screens using a background of screening molecules. Greater than 100-fold enrichment of cognate versus random molecules was observed in 20/22 cases. In the third analysis, selectivity of the models was tested using a background of drug molecules, with selectivity of greater than 80-fold observed in 17/22 cases. Predicted activities derived from crossing drugs against modeled targets identified a number of known side effects, drug specificities, and drug-drug interactions that have a rational basis in molecular structure.
|
36
|
Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer 2006; 6:96. [PMID: 16620391 PMCID: PMC1459181 DOI: 10.1186/1471-2407-6-96] [Citation(s) in RCA: 230] [Impact Index Per Article: 12.8] [Received: 02/15/2006] [Accepted: 04/18/2006] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND Genomic DNA copy number aberrations are frequent in solid tumors, although the underlying causes of chromosomal instability in tumors remain obscure. Genes likely to have genomic instability phenotypes when mutated (e.g. those involved in mitosis, replication, repair, and telomeres) are rarely mutated in chromosomally unstable sporadic tumors, even though such mutations are associated with some heritable cancer-prone syndromes. METHODS We applied array comparative genomic hybridization (CGH) to the analysis of breast tumors. The variation in the levels of genomic instability amongst tumors prompted us to investigate whether alterations in processes/genes involved in maintenance and/or manipulation of the genome were associated with particular types of genomic instability. RESULTS We discriminated three breast tumor subtypes based on genomic DNA copy number alterations. The subtypes varied with respect to level of genomic instability. We find that shorter telomeres and altered telomere-related gene expression are associated with amplification, implicating telomere attrition as a promoter of this type of aberration in breast cancer. On the other hand, the numbers of chromosomal alterations, particularly low-level changes, are associated with altered expression of genes in other functional classes (mitosis, cell cycle, DNA replication and repair). Further, although loss-of-function instability phenotypes have been demonstrated for many of the genes in model systems, we observed enhanced expression of most genes in tumors, indicating that overexpression, rather than deficiency, underlies instability. CONCLUSION Many of the genes associated with higher frequency of copy number aberrations are direct targets of E2F, supporting the hypothesis that deregulation of the Rb pathway is a major contributor to chromosomal instability in breast tumors. These observations are consistent with failure to find mutations in sporadic tumors in genes that have roles in maintenance or manipulation of the genome.
|
37
|
Abstract
MOTIVATION We present a novel algorithm, MaMF, for identifying transcription factor (TF) binding site motifs. The method is deterministic and depends on an indexing technique to optimize the search process. On common yeast datasets, MaMF performs competitively with other methods. We also present results on a challenging group of eight sets of human genes known to be responsive to a diverse group of TFs. In every case, MaMF finds the annotated motif among the top scoring putative motifs. We compared MaMF against other motif finders on a larger human group of 21 gene sets and found that MaMF performs better than other algorithms. We analyzed the remaining high scoring motifs and show that many correspond to other TFs that are known to co-occur with the annotated TF motifs. The significant and frequent presence of co-occurring transcription factor binding sites explains in part the difficulty of human motif finding. MaMF is a very fast algorithm, suitable for application to large numbers of interesting gene sets.
|
38
|
Abstract
MOTIVATION We present a system, QPACA (Quantitative Pathway Analysis in Cancer) for analysis of biological data in the context of pathways. QPACA supports data visualization and both fine- and coarse-grained specifications, but, more importantly, addresses the problems of pathway recognition and pathway augmentation. RESULTS Given a set of genes hypothesized to be part of a pathway or a coordinated process, QPACA is able to reliably distinguish true pathways from non-pathways using microarray expression data. Relying on the observation that only some of the experiments within a dataset are relevant to a specific biochemical pathway, QPACA automates selection of this subset using an optimization procedure. We present data on all human and yeast pathways found in the KEGG pathway database. In 117 out of 191 cases (61%), QPACA was able to correctly identify these positive cases as bona fide pathways with p-values measured using rigorous permutation analysis. Success in recognizing pathways was dependent on pathway size, with the largest quartile of pathways yielding 83% success. In cross-validation tests of pathway membership prediction, QPACA was able to yield enrichments for predicted pathway genes over random genes at rates of 2-fold or better the majority of the time, with rates of 10-fold or better 10-20% of the time. AVAILABILITY The software is available for academic research use free of charge by email request. SUPPLEMENTARY INFORMATION Data used in the paper may be downloaded from http://www.jainlab.org/downloads.html
|
39
|
Abstract
Surflex-Dock employs an empirically derived scoring function to rank putative protein-ligand interactions by flexible docking of small molecules to proteins of known structure. The scoring function employed by Surflex was developed purely on the basis of positive data, comprising noncovalent protein-ligand complexes with known binding affinities. Consequently, scoring function terms for improper interactions received little weight in parameter estimation, and an ad hoc scheme for avoiding protein-ligand interpenetration was adopted. We present a generalized method for incorporating synthetically generated negative training data, which allows for rigorous estimation of all scoring function parameters. Geometric docking accuracy remained excellent under the new parametrization. In addition, a test of screening utility covering a diverse set of 29 proteins and corresponding ligand sets showed improved performance. Maximal enrichment of true ligands over nonligands exceeded 20-fold in over 80% of cases, with enrichment of greater than 100-fold in over 50% of cases.
|
40
|
Expression of the tumor suppressor gene ARHI in epithelial ovarian cancer is associated with increased expression of p21WAF1/CIP1 and prolonged progression-free survival. Clin Cancer Res 2005; 10:6559-66. [PMID: 15475444 DOI: 10.1158/1078-0432.ccr-04-0698] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
PURPOSE ARHI, an imprinted putative tumor suppressor gene, is expressed in normal ovarian epithelial cells, but its expression is down-regulated or lost in most ovarian cancer cell lines. Reexpression of ARHI in cancer cells induces p21(WAF1/CIP1), down-regulates cyclin D1 promoter activity and inhibits growth in cell culture and in heterografts. To determine the relevance of these observations to clinical cancer, we have now measured ARHI expression in normal, benign and malignant ovarian tissues using immunohistochemistry and in situ hybridization. EXPERIMENTAL DESIGN Paraffin embedded tissues from 7 normal ovaries, 22 cystadenomas and 42 borderline lesions were analyzed using standard immunoperoxidase and in situ hybridization techniques to assess ARHI expression. In addition, immunohistochemistry against ARHI was performed on a tissue microarray containing 441 consecutive cases of ovarian carcinoma. RESULTS Strong ARHI expression was found in normal ovarian surface epithelial cells, cysts and follicles using immunohistochemistry and in situ hybridization. Reduced ARHI expression was observed in tumors of low malignant potential as well as in invasive cancers. ARHI expression was down-regulated in 63% of invasive ovarian cancer specimens and could not be detected in 47%. When immunohistochemistry and in situ hybridization were compared, ARHI protein expression could be down-regulated in the presence of ARHI mRNA. ARHI expression was correlated with expression of p21(WAF1/CIP1) (P = 0.0074) but not with cyclin D1 and associated with prolonged disease free survival (P = 0.001). On multivariate analysis, ARHI expression, grade and stage were independent prognostic factors. ARHI expression did not correlate with overall survival. CONCLUSIONS Persistence of ARHI expression in epithelial ovarian cancers correlated with prolonged disease free survival and expression of the cyclin dependent kinase inhibitor p21(WAF1/CIP1).
|
41
|
Fractional Genomic Alteration Detected by Array-Based Comparative Genomic Hybridization Independently Predicts Survival after Hepatic Resection for Metastatic Colorectal Cancer. Clin Cancer Res 2005; 11:1791-7. [PMID: 15756001 DOI: 10.1158/1078-0432.ccr-04-1418] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]
Abstract
PURPOSE Although liver resection is the primary curative therapy for patients with colorectal hepatic metastases, most patients have a recurrence. Identification of molecular markers that predict patients at highest risk for recurrence may help to target further therapy. EXPERIMENTAL DESIGN Array-based comparative genomic hybridization was used to investigate the association of DNA copy number alterations with outcome in patients with colorectal liver metastasis resected with curative intent. DNA from 50 liver metastases was labeled and hybridized onto an array consisting of 2,463 bacterial artificial chromosome clones covering the entire genome. The total fraction of genome altered (FGA) in the metastases and the patient's clinical risk score (CRS) were calculated to identify independent prognostic factors for survival. RESULTS An average of 30 ± 14% of the genome was altered in the liver metastases (14% gained and 16% lost). As expected, a lower CRS was an independent predictor of overall survival (P = 0.03). In addition, a high FGA also was an independent predictor of survival (P = 0.01). The median survival time in patients with a low CRS (score 0-2) and a high (≥20%) FGA was 38 months compared with 18 months in patients with a low CRS and a low FGA. Supervised analyses, using Prediction Analysis of Microarrays and Significance Analysis of Microarrays, identified a set of clones, predominantly located on chromosomes 7 and 20, which best predicted survival. CONCLUSIONS Both FGA and CRS are independent predictors of survival in patients with resected hepatic colorectal cancer metastases. The greater the FGA, the more likely the patient is to survive.
|
42
|
Mapping segmental and sequence variations among laboratory mice using BAC array CGH. Genome Res 2005; 15:302-11. [PMID: 15687294 PMCID: PMC546532 DOI: 10.1101/gr.2902505] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 06/16/2004] [Accepted: 11/15/2004] [Indexed: 01/14/2023]
Abstract
We used arrays of 2069 BACs (1303 nonredundant autosomal clones) to map sequence variation among Mus spretus (SPRET/Ei and SPRET/Glasgow) and Mus musculus (C3H/HeJ, BALB/cJ, 129/J, DBA/2J, NIH, FVB/N, and C57BL/6) strains. We identified 80 clones representing 74 autosomal loci of copy number variation (|log2 ratio| ≥ 0.4). These variant loci distinguish laboratory strains. By FISH mapping, we determined that 63 BACs mapped to a single site on C57BL/6J chromosomes, while 17 clones mapped to multiple chromosomes (n = 16) or multiple sites on one chromosome (n = 1). We also show that small ratio changes (Δlog2 ratio ≈ 0.1) distinguish homozygous and heterozygous regions of the genome in interspecific backcross mice, providing an efficient method for genotyping progeny of backcrosses.
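The copy number criterion quoted in this abstract, an absolute log2 ratio of at least 0.4, can be sketched as a simple filter. The clone names and ratio values below are made up for illustration, and the function name is hypothetical; only the 0.4 threshold comes from the abstract.

```python
from typing import Dict, List


def variant_clones(log2_ratios: Dict[str, float], threshold: float = 0.4) -> List[str]:
    """Return the clones whose |log2 ratio| meets the threshold, sorted by name."""
    return sorted(
        clone for clone, ratio in log2_ratios.items() if abs(ratio) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical clone identifiers and ratios, not data from the study.
    example = {
        "clone_A": 0.05,
        "clone_B": -0.62,
        "clone_C": 0.41,
        "clone_D": 0.10,
    }
    print(variant_clones(example))
```

The much smaller shift of about 0.1 that the abstract uses to separate homozygous from heterozygous regions would need averaging over many neighboring clones rather than a per-clone cutoff; that step is not shown here.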
|
43
|
Virtual screening in lead discovery and optimization. Current Opinion in Drug Discovery & Development 2004; 7:396-403. [PMID: 15338948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
Virtual screening by molecular docking, using a protein with an experimentally determined structure as a target, has become an established method for lead discovery and for enhancing efficiency in lead optimization. Generalizations of the quantitative structure-activity relationship concept have led to approaches for virtual screening in the absence of a protein target structure, instead relying upon ligand-based models as surrogates of protein active sites. Recently reported methods for ligand-based virtual screening can achieve similar enrichment rates to those obtained using molecular docking. This review will discuss recent advances in both domains of virtual screening, including theoretical and practical advances and the implications for their application.
|
44
|
Whole genome scanning identifies genotypes associated with recurrence and metastasis in prostate tumors. Hum Mol Genet 2004; 13:1303-13. [PMID: 15138198 DOI: 10.1093/hmg/ddh155] [Citation(s) in RCA: 146] [Impact Index Per Article: 7.3] [Indexed: 01/02/2023]
Abstract
Prostate cancer is the most commonly diagnosed non-cutaneous neoplasm among American males and is the second leading cause of cancer-related death. Prostate specific antigen screening has resulted in earlier disease detection, yet approximately 30% of men will die of metastatic disease. Slow disease progression, an aging population and associated morbidity and mortality underscore the need for improved disease classification and therapies. To address these issues, we analyzed a cohort of patients using array comparative genomic hybridization (aCGH). The cohort comprises 64 patients, half of whom experienced postoperative recurrence. Analysis of the aCGH profiles revealed numerous recurrent genomic copy number aberrations. Specific loss at 8p23.2 was associated with advanced stage disease, and gain at 11q13.1 was found to be predictive of postoperative recurrence independent of stage and grade. Moreover, comparison with an independent set of metastases revealed approximately 40 candidate markers associated with metastatic potential. Copy number aberrations at these loci may define metastatic genotypes.
|
45
|
Abstract
The majority of drug targets for small molecule therapeutics are proteins whose three-dimensional structure is not known to sufficient resolution to permit structure-based design. All three-dimensional QSAR approaches have a requirement for some hypothesis of ligand conformation and alignment, and predictions of molecular activity critically depend on this ligand-based binding site hypothesis. The molecular similarity function used in the Surflex docking system, coupled with quantitative pressure to minimize overall molecular volume, forms an effective objective function for generating hypotheses of bioactive conformations of sets of small molecules binding to their cognate proteins. Results are presented, assessing utility of the method for ligands of the serotonin, histamine, muscarinic, and GABA(A) receptors. The Surflex similarity module (Surflex-Sim) was able, in each case, to distinguish true ligands from random compounds using models constructed from just two or three known ligands. True positive rates of 60% were achieved with false positive rates of 0-3%; the theoretical enrichment rates were over 150-fold compared with random screening. The methods are practically applicable for rational design of ligands and for high-throughput virtual screening and offer competitive performance to many structure-based docking algorithms.
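To make the relationship between true positive rate, false positive rate, and enrichment concrete, the sketch below computes a standard enrichment factor: the hit rate among selected compounds divided by the hit rate of random selection. The abstract does not state how its theoretical enrichment was calculated, so this formula, the function name, and the assumed fraction of actives in the screening library are illustrative assumptions rather than the authors' procedure.

```python
def enrichment_factor(tpr: float, fpr: float, active_fraction: float) -> float:
    """Enrichment over random screening for a classifier with the given rates.

    tpr: fraction of true ligands selected
    fpr: fraction of non-ligands selected
    active_fraction: assumed fraction of the library that is active (hypothetical)
    """
    if not 0.0 < active_fraction < 1.0:
        raise ValueError("active_fraction must lie strictly between 0 and 1")
    # Fraction of the whole library that the model selects.
    selected = tpr * active_fraction + fpr * (1.0 - active_fraction)
    if selected == 0.0:
        raise ValueError("no compounds are selected at these rates")
    # Hit rate among selected compounds divided by the base rate of actives
    # simplifies to tpr / selected.
    return tpr / selected


if __name__ == "__main__":
    # 60% true positive rate, as in the abstract; the library composition
    # (1 active per 1000 compounds) is an assumption.
    for fpr in (0.03, 0.01, 0.003, 0.0):
        print(fpr, enrichment_factor(0.6, fpr, 0.001))
```

Under this definition, enrichment is bounded by the reciprocal of the active fraction, and it rises steeply as the false positive rate approaches zero.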
|
46
|
High-resolution analysis of DNA copy number alterations in colorectal cancer by array-based comparative genomic hybridization. Carcinogenesis 2004; 25:1345-57. [PMID: 15001537 DOI: 10.1093/carcin/bgh134] [Citation(s) in RCA: 159] [Impact Index Per Article: 8.0] [Indexed: 11/14/2022]
Abstract
Array-based comparative genomic hybridization (CGH) allows for the simultaneous examination of thousands of genomic loci at 1-2 Mb resolution. Copy number alterations detected by array-based CGH can aid in the identification and localization of cancer causing genes. Here we report the results of array-based CGH in a set of 125 primary colorectal tumors hybridized onto an array consisting of 2463 bacterial artificial chromosome clones. On average, 17.3% of the entire genome was altered in our samples (8.5 ± 6.7% gained and 8.8 ± 7.3% lost). Losses involving 8p, 17p, 18p or 18q occurred in 37, 46, 49 and 60% of cases, respectively. Gains involving 8q or 20q were observed 42 and 65% of the time, respectively. A transition from loss to gain occurred on chromosome 8 between 41 and 48 Mb, with 25% of cases demonstrating a gain of 8p11 (45-53 Mb). Chromosome 8 also contained four distinct loci demonstrating high-level amplifications, centering at 44.9, 60, 92.7 and 144.7 Mb. On 20q multiple high-level amplifications were observed, centering at 32.3, 37.8, 45.4, 54.7, 59.4 and 65 Mb. Few differences in DNA copy number alterations were associated with tumor stage, location, age and sex of the patient. Microsatellite stable and unstable (MSI-H) tumors differed significantly with respect to the frequency of alterations (20 versus 5%, respectively, P < 0.01). Interestingly, MSI-H tumors were also observed to have DNA copy number alterations, most commonly involving 8q. This high-resolution analysis of DNA copy number alterations in colorectal cancer by array-based CGH allowed for the identification of many small, previously uncharacterized, genomic regions, such as on chromosomes 8 and 20. Array-based CGH was also able to identify DNA copy number changes in MSI-H tumors.
|
47
|
Abstract
The RAS/mitogen-activated protein kinase pathway sends external growth-promoting signals to the nucleus. BRAF, a critical serine/threonine kinase in this pathway, is frequently activated by somatic mutation in melanoma. Using a cohort of 115 patients with primary invasive melanomas, we show that BRAF mutations are statistically significantly more common in melanomas occurring on skin subject to intermittent sun exposure than elsewhere (23 of 43 patients; P<.001, two-sided Fisher's exact test). By contrast, BRAF mutations in melanomas on chronically sun-damaged skin (1 of 12 patients) and melanomas on skin relatively or completely unexposed to sun, such as palms, soles, subungual sites (6 of 39 patients), and mucosal membranes (2 of 21 patients) are rare. We found no association of mutation status with clinical outcome or with the presence of an associated melanocytic nevus. The mutated BRAF allele was frequently found at an elevated copy number, implicating BRAF as one of the factors driving selection for the frequent copy number increases of chromosome 7q in melanoma. In summary, the uneven distribution of BRAF mutations strongly suggests distinct genetic pathways leading to melanoma. The high mutation frequency in melanomas arising on intermittently sun-exposed skin suggests a complex causative role of such exposure that mandates further evaluation.
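The comparison reported in this abstract can be sketched as a two-sided Fisher's exact test on a 2x2 table. The sketch assumes that "elsewhere" means the three other site groups pooled (1 of 12, 6 of 39, and 2 of 21, giving 9 of 72); the abstract does not spell out the table the authors tested, so this pooling is an assumption. The test is written out with the standard library rather than taken from a statistics package.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: intermittently sun-exposed skin vs. all other sites (pooled, assumed).
    # Columns: BRAF mutated vs. not mutated.
    p = fisher_exact_two_sided(23, 43 - 23, 9, 72 - 9)
    print(p)
```

With these counts the p-value falls well below the .001 level quoted in the abstract, which is consistent with the pooled reading of the comparison.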
|
48
|
Abstract
A sequence similarity metric operating on 10 kb upstream regions of gene pairs quantitatively predicts a portion of the co-variation in expression of gene pairs in large-scale gene expression studies of human tumors and tumor-derived cell lines. The signal on which the metric depends most strongly originates in the compositional structure of repetitive genomic sequences (particularly Alu elements) present in these upstream regions. This effect is completely separable from effects of isochore composition on gene expression. The results suggest that repetitive elements play some functional role in the transcriptional regulation of the specific genes in whose promoter regions they reside, and they lend credence to suggestions that the general phenomenon of repetitive element insertions may be a fundamental evolutionary mechanism for modulating gene transcription.
|
49
|
Abstract
Tumors with defects in mismatch repair (MMR) show fewer chromosomal changes by cytogenetic analyses than most solid tumors, suggesting that a greater proportion of the alterations required for malignancy occur in genes with nucleotide sequences susceptible to errors normally corrected by MMR. Here, we used genome-wide microarray comparative genomic hybridization to carry out a higher resolution evaluation of the effect of MMR competence on genomic alterations occurring in 20 cell lines and to determine if characteristic aberrations arise in MMR-proficient and -deficient HCT116 cells undergoing selection for methotrexate resistance. We observed different spectra of aberrations in MMR-proficient compared to -deficient cell lines, as well as among cell lines with different types of MMR-deficiency. We also observed different genetic routes to drug resistance. Resistant MMR-deficient cells most frequently displayed no copy number alterations (16/29 cell pools), whereas all MMR-proficient cells had unique abnormalities involving chromosome 5, including amplicons centered on the target gene, DHFR and/or a neighboring novel locus (7/13 pools). These observations support the concept that tumor genomes are shaped by selection for alterations that promote survival and growth advantage, as well as by the particular dysfunctions in genes responsible for maintenance of genetic integrity.
|
50
|
Genome-wide-array-based comparative genomic hybridization reveals genetic homogeneity and frequent copy number increases encompassing CCNE1 in fallopian tube carcinoma. Oncogene 2003; 22:4281-6. [PMID: 12833150 DOI: 10.1038/sj.onc.1206621] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.8] [Indexed: 11/09/2022]
Abstract
Fallopian tube carcinoma (FTC) is a rare, poorly studied and aggressive cancer, associated with poor survival. Since tumorigenesis is related to the acquisition of genetic changes, we used genome-wide array comparative genomic hybridization to analyse copy number aberrations occurring in FTC in order to obtain a better understanding of FTC carcinogenesis and to identify prognostic events and targets for therapy. We used arrays of 2464 genomic clones, providing approximately 1.4 Mb resolution across the genome to map genomic DNA copy number aberrations quantitatively from 14 FTC onto the human genome sequence. All tumors showed a high frequency of copy number aberrations with recurrent gains on 3q, 6p, 7q, 8q, 12p, 17q, 19 and 20q, and losses involving chromosomes 4, 5q, 8p, 16q, 17p, 18q and X. Recurrent regions of amplification included 1p34, 8p11-q11, 8q24, 12p, 17p13, 17q12-q21, 19p13, 19q12-q13 and 19q13. Candidate, known oncogenes mapping to these amplicons included CMYC (8q24), CCNE1 (19q12-q21) and AKT2 (19q13), whereas PIK3CA and KRAS, previously suggested to be candidate driver genes for amplification, mapped outside copy number maxima on 3q and 12p, respectively. The FTC were remarkably homogeneous, with some recurrent aberrations occurring in more than 70% of samples, which suggests a stereotyped pattern of tumor evolution.
|