1
Zhou H, Skolnick J. Utility of the Morgan Fingerprint in Structure-Based Virtual Ligand Screening. J Phys Chem B 2024; 128:5363-5370. [PMID: 38783525 PMCID: PMC11163432 DOI: 10.1021/acs.jpcb.4c01875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In modern drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing and refinement because it is cost-effective for large compound libraries. For decades, efforts have been devoted to developing VLS methods with high accuracy. These include the state-of-the-art FINDSITE suite of approaches developed in our lab: FINDSITEcomb2.0, FRAGSITE, FRAGSITE2, and the meta version FRAGSITEcomb. These methods combine ligand homology modeling (LHM), traditional ligand similarity methods, and more recently machine learning approaches to rank ligands and have proven to be superior to most recent deep learning and large language model-based approaches. Here, we describe further improvements to our previous best methods by combining the Morgan fingerprint (MF) with the originally used PubChem fingerprint and FP2 fingerprint. We then benchmarked FINDSITEcomb2.0M, FRAGSITEM, FRAGSITE2M, and the composite meta-approach FRAGSITEcombM. On the 102-target DUD-E set, adding the MF raised the 1% enrichment factor (EF1%) and area under the precision-recall curve (AUPR) of FRAGSITEcomb from 42.0/0.59 to 47.6/0.72. This 0.72 AUPR is significantly better than the AUPR of 0.443 of the state-of-the-art deep learning-based method DenseFS. An independent test on the 81-target DEKOIS2.0 set shows that EF1%/AUPR increases from 18.3/0.520 to 23.1/0.683. An ablation investigation shows that the MF contributes most of the improvement in all four approaches. Thus, the MF is a useful addition to structure-based VLS.
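The two metrics quoted in this abstract can be made concrete with a short sketch. It assumes each screened compound has a score (higher means more likely active) and a binary active/decoy label. EF1% is then the hit rate in the top 1% of the ranked list divided by the hit rate of the whole library, and AUPR is approximated by average precision. The function names are illustrative assumptions and are not taken from the FRAGSITE code.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Hit rate in the top `fraction` of the ranking divided by the overall hit rate."""
    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_actives = sum(labels)
    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_actives = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where this active is recovered.
            total += hits / rank
    return total / n_actives
```

With 100 compounds of which 10 are active, a perfect ranking gives an EF1% of 10 and an average precision of 1.0. The maximum attainable EF1% depends on the active-to-decoy ratio of the benchmark, so values are only comparable within the same data set.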
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States

2
von Beck T, Mena Hernandez L, Zhou H, Floyd K, Suthar MS, Skolnick J, Jacob J. Atovaquone and Pibrentasvir Inhibit the SARS-CoV-2 Endoribonuclease and Restrict Infection In Vitro but Not In Vivo. Viruses 2023; 15:1841. [PMID: 37766247 PMCID: PMC10534768 DOI: 10.3390/v15091841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/29/2023] Open
Abstract
The emergence of SARS-CoV-1 in 2003 followed by MERS-CoV and now SARS-CoV-2 has proven the latent threat these viruses pose to humanity. While the SARS-CoV-2 pandemic has shifted to a stage of endemicity, the threat of new coronaviruses emerging from animal reservoirs remains. To address this issue, the global community must develop small molecule drugs targeting highly conserved structures in the coronavirus proteome. Here, we characterized existing drugs for their ability to inhibit the endoribonuclease activity of the SARS-CoV-2 non-structural protein 15 (nsp15) via in silico, in vitro, and in vivo techniques. We identified nsp15 inhibition by the drugs pibrentasvir and atovaquone, which effectively inhibit SARS-CoV-2 and HCoV-OC43 at low micromolar concentrations in cell cultures. Furthermore, atovaquone, but not pibrentasvir, is observed to modulate HCoV-OC43 dsRNA and infection in a manner consistent with nsp15 inhibition. Although neither pibrentasvir nor atovaquone translates to clinical efficacy in a murine prophylaxis model of SARS-CoV-2 infection, atovaquone may serve as a basis for the design of future nsp15 inhibitors.
Affiliation(s)
- Troy von Beck
  - Emory Vaccine Center, Emory National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, USA
- Luis Mena Hernandez
  - Emory Vaccine Center, Emory National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, USA
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332, USA
- Katharine Floyd
  - Emory Vaccine Center, Emory National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, USA
- Mehul S. Suthar
  - Emory Vaccine Center, Emory National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, USA
  - Department of Pediatrics, Division of Infectious Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332, USA
- Joshy Jacob
  - Emory Vaccine Center, Emory National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, USA

3
de Sá Queiroz JHF, dos Santos Barbosa M, Miranda LGO, de Oliveira NR, Dellagostin OA, Marchioro SB, Simionatto S. Tp0684, Tp0750, and Tp0792 Recombinant Proteins as Antigens for the Serodiagnosis of Syphilis. Indian J Microbiol 2022; 62:419-427. [PMID: 35974924 PMCID: PMC9375814 DOI: 10.1007/s12088-022-01017-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Accepted: 03/20/2022] [Indexed: 11/28/2022] Open
Abstract
The incidence of syphilis has increased alarmingly over the years. Its diagnosis continues to be a challenge, leading to the search for new alternative and effective methods. The objective of this study was to select and evaluate three Treponema pallidum recombinant proteins for potential use in syphilis serodiagnosis. Bioinformatics analysis was performed with three T. pallidum antigens (Tp0684, Tp0750, and Tp0792) to assess their physical, antigenic, and structural characteristics. The antigens were chemically synthesized, recombinant plasmids were expressed in Escherichia coli BL21 Star™ (DE3), and the recombinant proteins were purified by nickel affinity chromatography. The antigenicity of the recombinant proteins was evaluated by western blotting and enzyme-linked immunosorbent assay (ELISA), using sera from patients with primary and latent syphilis. In silico analysis indicated antigenic potential because exposed B cell epitopes were detected in the evaluated proteins. Sera from patients with primary and latent syphilis specifically recognized the rTp0684, rTp0750, and rTp0792 recombinant antigens. Moreover, the rTp0684-ELISA receiver operating characteristic (ROC) analysis showed an area under the ROC curve of 0.99, indicating high diagnostic efficacy with 97.62% specificity and 95% sensitivity. In conclusion, rTp0684 showed better potential as an antigen for the development of syphilis serodiagnosis. Thus, bioinformatic analysis can be an important tool to guide the selection of antigens for serological diagnosis. Supplementary Information: The online version contains supplementary material available at 10.1007/s12088-022-01017-w.
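The ROC quantities reported for the rTp0684-ELISA can be illustrated with a minimal sketch. It assumes two lists of assay readings, one from syphilis-positive sera and one from negative controls, and a cutoff chosen by the analyst. The AUC is computed as the probability that a positive sample reads higher than a negative one. The function names are assumptions made for illustration and no data from the study are used.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when readings >= cutoff are called positive."""
    true_pos = sum(1 for s in pos_scores if s >= cutoff)
    true_neg = sum(1 for s in neg_scores if s < cutoff)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)
```

Moving the cutoff trades sensitivity against specificity, while the AUC summarizes the assay over all possible cutoffs, which is why both kinds of figures are usually reported together.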
Affiliation(s)
- Júlio Henrique Ferreira de Sá Queiroz
  - Laboratório de Pesquisa em Ciências da Saúde, Universidade Federal da Grande Dourados - UFGD, Rodovia Dourados - Itahum, km 12, Cidade Universitária, Dourados, MS 79804970 Brazil
- Marcelo dos Santos Barbosa
  - Laboratório de Pesquisa em Ciências da Saúde, Universidade Federal da Grande Dourados - UFGD, Rodovia Dourados - Itahum, km 12, Cidade Universitária, Dourados, MS 79804970 Brazil
- Lais Gonçalves Ortolani Miranda
  - Laboratório de Pesquisa em Ciências da Saúde, Universidade Federal da Grande Dourados - UFGD, Rodovia Dourados - Itahum, km 12, Cidade Universitária, Dourados, MS 79804970 Brazil
- Silvana Beutinger Marchioro
  - Laboratório de Pesquisa em Ciências da Saúde, Universidade Federal da Grande Dourados - UFGD, Rodovia Dourados - Itahum, km 12, Cidade Universitária, Dourados, MS 79804970 Brazil
  - Laboratório de Imunologia e Biologia Molecular, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, BA Brazil
- Simone Simionatto
  - Laboratório de Pesquisa em Ciências da Saúde, Universidade Federal da Grande Dourados - UFGD, Rodovia Dourados - Itahum, km 12, Cidade Universitária, Dourados, MS 79804970 Brazil

4
Nie L, Quan L, Wu T, He R, Lyu Q. TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a Transformer based on protein features. Bioinformatics 2022; 38:2705-2711. [PMID: 35561183 DOI: 10.1093/bioinformatics/btac188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/05/2021] [Revised: 01/04/2022] [Accepted: 03/26/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein structure can be severely disrupted by frameshift and non-sense mutations at specific positions in the protein sequence. Frameshift and non-sense mutation cases can also be found in healthy individuals. A method to distinguish neutral and potentially disease-associated frameshift and non-sense mutations is of practical and fundamental importance. It would allow researchers to rapidly screen out the potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. The problem of how to distinguish between neutral and potentially disease-associated frameshift and non-sense mutations remains under-researched.
RESULTS: We built a Transformer-based neural network model to predict the pathogenicity of frameshift and non-sense mutations based on protein features and named it TransPPMP. The feature matrix of contextual sequences computed by the ESM pre-training model, the type of mutation residue, and auxiliary features, including structure and function information, are combined as input features, and a focal loss function is designed to address sample imbalance during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and advantages in all evaluation metrics compared with four other advanced methods, namely, ENTPRISE-X, VEST-indel, DDIG-in and CADD. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by attention focus. These could offer useful clues to study the pathogenicity mechanism of human complex diseases for which traditional machine learning methods fall short.
AVAILABILITY AND IMPLEMENTATION: TransPPMP is available at https://github.com/lennylv/TransPPMP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
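The abstract mentions a focal loss for handling class imbalance but does not give its form. As a hedged illustration, the sketch below implements the widely used binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t), for a single prediction. The default values gamma = 2.0 and alpha = 0.25 are common choices assumed here for illustration. They are not stated in the abstract, and the variant used in TransPPMP may differ.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0,
                      alpha: float = 0.25, eps: float = 1e-12) -> float:
    """Focal loss for one sample: p is the predicted probability of class 1, y is 0 or 1."""
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class weight: alpha for positives, 1 - alpha for negatives.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma down-weights samples that are already classified well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 the expression reduces to an alpha-weighted cross-entropy. Larger gamma values shift the training signal toward hard, misclassified samples, which is the property that makes the loss attractive when one class dominates the data.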
Affiliation(s)
- Liangpeng Nie
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Tingfang Wu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Ruji He
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Qiang Lyu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China

5
Masibag AN, Bergin CJ, Haebe JR, Zouggar A, Shah MS, Sandouka T, Mendes da Silva A, Desrochers FM, Fournier-Morin A, Benoit YD. Pharmacological targeting of Sam68 functions in colorectal cancer stem cells. iScience 2021; 24:103442. [PMID: 34877499 PMCID: PMC8633986 DOI: 10.1016/j.isci.2021.103442] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/28/2021] [Revised: 10/09/2021] [Accepted: 11/10/2021] [Indexed: 01/20/2023] Open
Abstract
Cancer stem cells (CSCs) are documented to play a key role in tumorigenesis and therapy resistance. Despite significant progress in clinical oncology, CSC reservoirs remain elusive and difficult to eliminate. Reverse-turn peptidomimetics were characterized as disruptors of CBP/beta-Catenin interactions and represent a promising avenue to curb hyperactive canonical Wnt/beta-Catenin signaling in CSCs. Recent studies suggested Sam68 as a critical mediator of reverse-turn peptidomimetics response in CSC populations. Using computational and biochemical approaches we confirmed Sam68 as a primary target of reverse-turn peptidomimetics. Furthermore, we executed an in silico drug discovery pipeline to identify yet uncharacterized reverse-turn peptidomimetic structures displaying superior anti-CSC activity in transformed pluripotent and colorectal cancer cell models. Thus, we identified YB-0158 as a reverse-turn peptidomimetic small molecule with enhanced translational potential, altering key hallmarks of human colorectal CSCs in patient-derived ex vivo organoids and in vivo serial tumor transplantation.
- Sam68 is a direct protein target of reverse-turn peptidomimetic small molecules
- YB-0158 is a peptidomimetic structure with high predicted affinity for Sam68
- YB-0158 elicits a cancer-selective response impeding main cancer stem cell hallmarks
- YB-0158 blocks cancer stem cell activity in tumor organoids and in vivo systems
Affiliation(s)
- Angelique N Masibag
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Christopher J Bergin
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Joshua R Haebe
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Aïcha Zouggar
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Muhammad S Shah
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Tamara Sandouka
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Amanda Mendes da Silva
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- François M Desrochers
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Aube Fournier-Morin
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Yannick D Benoit
  - Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada

6
Skolnick J, Gao M, Zhou H, Singh S. AlphaFold 2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function. J Chem Inf Model 2021; 61:4827-4831. [PMID: 34586808 DOI: 10.1021/acs.jcim.1c01114] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Indexed: 02/06/2023]
Abstract
AlphaFold 2 (AF2) was the star of CASP14, the last biennial structure prediction experiment. Using novel deep learning, AF2 predicted the structures of many difficult protein targets at or near experimental resolution. Here, we present our perspective of why AF2 works and show that it is a very sophisticated fold recognition algorithm that exploits the completeness of the library of single domain PDB structures. It has also learned local side chain packing rearrangements that enable it to refine proteins to high resolution. The benefits and limitations of its ability to predict the structures of many more proteins at or close to atomic detail are discussed.
Affiliation(s)
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Suresh Singh
  - Twilight Design, 4 Adams Road, Kendall Park, New Jersey 08824, United States

7
Skolnick J, Gao M. The role of local versus nonlocal physicochemical restraints in determining protein native structure. Curr Opin Struct Biol 2020; 68:1-8. [PMID: 33129066 DOI: 10.1016/j.sbi.2020.10.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/17/2020] [Revised: 10/03/2020] [Accepted: 10/05/2020] [Indexed: 12/15/2022]
Abstract
The tertiary structure of a native protein is dictated by the interplay of local secondary structure propensities, hydrogen bonding, and tertiary interactions. It is argued that the space of known protein topologies covers all single domain folds and results from the compactness of the native structure and excluded volume. Protein compactness combined with the chirality of the protein's side chains also yields native-like Ramachandran plots. It is the many-body, tertiary interactions among residues that collectively select for the global structure that a particular protein sequence adopts. This explains why the recent advances in deep-learning approaches that predict protein side-chain contacts, the distance matrix between residues, and sequence alignments are successful. They succeed because they implicitly learned the many-body interactions among protein residues.
Affiliation(s)
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States

8
Calcott MJ, Owen JG, Ackerley DF. Efficient rational modification of non-ribosomal peptides by adenylation domain substitution. Nat Commun 2020; 11:4554. [PMID: 32917865 PMCID: PMC7486941 DOI: 10.1038/s41467-020-18365-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 06/05/2020] [Accepted: 08/19/2020] [Indexed: 12/22/2022] Open
Abstract
Non-ribosomal peptide synthetase (NRPS) enzymes form modular assembly-lines, wherein each module governs the incorporation of a specific monomer into a short peptide product. Modules are comprised of one or more key domains, including adenylation (A) domains, which recognise and activate the monomer substrate; condensation (C) domains, which catalyse amide bond formation; and thiolation (T) domains, which shuttle reaction intermediates between catalytic domains. This arrangement offers prospects for rational peptide modification via substitution of substrate-specifying domains. For over 20 years, it has been considered that C domains play key roles in proof-reading the substrate; a presumption that has greatly complicated rational NRPS redesign. Here we present evidence from both directed and natural evolution studies that any substrate-specifying role for C domains is likely to be the exception rather than the rule, and that novel non-ribosomal peptides can be generated by substitution of A domains alone. We identify permissive A domain recombination boundaries and show that these allow us to efficiently generate modified pyoverdine peptides at high yields. We further demonstrate the transferability of our approach in the PheATE-ProCAT model system originally used to infer C domain substrate specificity, generating modified dipeptide products at yields that are inconsistent with the prevailing dogma.
Affiliation(s)
- Mark J Calcott
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
  - Centre for Biodiscovery and Maurice Wilkins Centre for Molecular Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
- Jeremy G Owen
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
  - Centre for Biodiscovery and Maurice Wilkins Centre for Molecular Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
- David F Ackerley
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
  - Centre for Biodiscovery and Maurice Wilkins Centre for Molecular Biodiscovery, Victoria University of Wellington, Wellington, New Zealand

9
Zhou H, Cao H, Matyunina L, Shelby M, Cassels L, McDonald JF, Skolnick J. MEDICASCY: A Machine Learning Approach for Predicting Small-Molecule Drug Side Effects, Indications, Efficacy, and Modes of Action. Mol Pharm 2020; 17:1558-1574. [PMID: 32237745 PMCID: PMC7319183 DOI: 10.1021/acs.molpharmaceut.9b01248] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/27/2022]
Abstract
To improve the drug discovery yield, a method is needed that can be applied at the beginning of drug discovery and that accurately predicts drug side effects, indications, efficacy, and mode of action based solely on the drug's chemical structure. In contrast, extant predictive methods do not comprehensively address these aspects of drug discovery and rely on features derived from extensive experimental information that is often unavailable for novel molecules. To address these issues, we developed MEDICASCY, a multilabel-based boosted random forest machine learning method that requires only the small molecule's chemical structure to predict drug side effects, indications, efficacy, and probable mode of action targets; nevertheless, it has comparable or even significantly better performance than existing approaches requiring far more information. In retrospective benchmarking on high confidence predictions, MEDICASCY shows about 78% precision and recall for predicting at least one severe side effect and 72% precision for drug efficacy. Experimental validation of MEDICASCY's efficacy predictions on novel molecules shows close to 80% precision for the inhibition of growth in ovarian, breast, and prostate cancer cell lines. Thus, MEDICASCY should improve the success rate for new drug approval. A web service for academic users is available at http://pwp.gatech.edu/cssb/MEDICASCY.
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332
- Hongnan Cao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332
- Lilya Matyunina
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Madelyn Shelby
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Lauren Cassels
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- John F. McDonald
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332

10
Karczyńska AS, Ziȩba K, Uciechowska U, Mozolewska MA, Krupa P, Lubecka EA, Lipska AG, Sikorska C, Samsonov SA, Sieradzan AK, Giełdoń A, Liwo A, Ślusarz R, Ślusarz M, Lee J, Joo K, Czaplewski C. Improved Consensus-Fragment Selection in Template-Assisted Prediction of Protein Structures with the UNRES Force Field in CASP13. J Chem Inf Model 2020; 60:1844-1864. [PMID: 31999919 PMCID: PMC7588044 DOI: 10.1021/acs.jcim.9b00864] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
The method for protein-structure prediction, which combines the physics-based coarse-grained UNRES force field with knowledge-based modeling, has been developed further and tested in the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). The method implements restraints from the consensus fragments common to server models. In this work, the server models to derive fragments have been chosen on the basis of quality assessment; a fully automatic fragment-selection procedure has been introduced, and Dynamic Fragment Assembly pseudopotentials have been fully implemented. The Global Distance Test Score (GDT_TS), averaged over our “Model 1” predictions, increased by over 10 units with respect to CASP12 for the free-modeling category to reach 40.82. Our “Model 1” predictions ranked 20 and 14 for all and free-modeling targets, respectively (upper 20.2% and 14.3% of all models submitted to CASP13 in these categories, respectively), compared to 27 (upper 21.1%) and 24 (upper 18.9%) in CASP12, respectively. For oligomeric targets, the Interface Patch Similarity (IPS) and Interface Contact Similarity (ICS) averaged over our best oligomer models increased from 0.28 to 0.36 and from 12.4 to 17.8, respectively, from CASP12 to CASP13, and top-ranking models of 2 targets (H0968 and T0997o) were obtained (none in CASP12). The improvement of our method in CASP13 over CASP12 was ascribed to the combined effect of the overall enhancement of server-model quality, our success in selecting server models and fragments to derive restraints, and improvements of the restraint and potential-energy functions.
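To give the GDT_TS values in this abstract some context, the sketch below shows how the score is assembled from per-residue deviations. It assumes a list of Cα distances (in Å) between model and reference structure after one fixed superposition. That is a simplification: the official CASP evaluation searches over many superpositions to maximize the residue count at each cutoff, so this function can only underestimate the official score. The function name is an assumption for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within 1, 2, 4 and 8 angstroms of the reference."""
    n_residues = len(distances)
    # Fraction of residues under each distance cutoff.
    fractions = [
        sum(1 for d in distances if d <= cutoff) / n_residues
        for cutoff in cutoffs
    ]
    # Average the four fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)
```

For five residues with deviations of 0.5, 1.5, 3.0, 6.0 and 10.0 Å, the four fractions are 0.2, 0.4, 0.6 and 0.8, giving a score of 50. On this scale the reported free-modeling average of 40.82 corresponds to models in which a substantial share of residues is placed within a few ångströms of the native structure.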
Affiliation(s)
- Karolina Ziȩba
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Urszula Uciechowska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena A Mozolewska
  - Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw PL-02668, Poland
- Paweł Krupa
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw PL-02668, Poland
- Emilia A Lubecka
  - Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, Gdańsk 80-308, Poland
- Agnieszka G Lipska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Celina Sikorska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Sergey A Samsonov
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam K Sieradzan
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Artur Giełdoń
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Rafał Ślusarz
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena Ślusarz
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Jooyoung Lee
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Cezary Czaplewski
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland

11
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. To predict the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, the significantly improved predictive ability is demonstrated in the large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz, a “glass-ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance by DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep-learning accomplishes more accurate contact prediction, systematic clustering reveals that deep-learning predicts coherent, native-like contact patterns compared to co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
12
Zhou H, Cao H, Skolnick J. FINDSITEcomb2.0: A New Approach for Virtual Ligand Screening of Proteins and Virtual Target Screening of Biomolecules. J Chem Inf Model 2018; 58:2343-2354. [PMID: 30278128 DOI: 10.1021/acs.jcim.8b00309] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/20/2022]
Abstract
Computational approaches for predicting protein-ligand interactions can facilitate drug lead discovery and drug target determination. We have previously developed a threading/structural-based approach, FINDSITEcomb, for the virtual ligand screening of proteins that has been extensively experimentally validated. Even when low resolution predicted protein structures are employed, FINDSITEcomb has the advantage of being faster and more accurate than traditional high-resolution structure-based docking methods. It also overcomes the limitations of traditional QSAR methods that require a known set of seed ligands that bind to the given protein target. Here, we further improve FINDSITEcomb by enhancing its template ligand selection from the PDB/DrugBank/ChEMBL libraries of known protein-ligand interactions by (1) parsing the template proteins and their corresponding binding ligands in the DrugBank and ChEMBL libraries into domains so that the ligands with falsely matched domains to the targets will not be selected as template ligands; (2) applying various thresholds to filter out falsely matched template structures in the structure comparison process and thus their corresponding ligands for template ligand selection. With a sequence identity cutoff of 30% of target to templates and modeled target structures, FINDSITEcomb2.0 is shown to significantly improve upon FINDSITEcomb on the DUD-E benchmark set by increasing the 1% enrichment factor from 16.7 to 22.1, with a p-value of 4.3 × 10⁻³ by the Student t-test. With an 80% sequence identity cutoff of target to templates for the DUD-E set and modeled target structures, FINDSITEcomb2.0, having a 1% ROC enrichment factor of 52.39, also outperforms state-of-the-art methods that employ machine learning such as a deep convolutional neural network, CNN, with an enrichment of 29.65. Thus, FINDSITEcomb2.0 represents a significant improvement in the state-of-the-art.
The FINDSITEcomb2.0 web service is freely available for academic users at http://pwp.gatech.edu/cssb/FINDSITE-COMB-2.
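The abstract above quotes 1% enrichment factors without spelling out how such a number is obtained. As a rough illustration only, the sketch below uses the commonly cited definition: the hit rate among the top-ranked fraction of a screened library divided by the hit rate in the whole library. The function name `enrichment_factor`, the input layout (parallel lists of scores and 0/1 activity labels), and the rounding and tie handling are assumptions made for this example; they are not taken from FINDSITEcomb2.0. The "1% ROC enrichment factor" also quoted above is defined at a 1% false-positive rate and is not what this sketch computes.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    scores   -- predicted scores, higher means more likely active (assumed convention)
    labels   -- 1 for a known active, 0 for a decoy, in the same order as scores
    fraction -- top fraction of the library to inspect (0.01 gives EF1%)
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

With this definition, a library containing 1% actives can reach an EF1% of at most 100, which is one way to put the values reported above into context.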
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
- Hongnan Cao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
|
13
|
Chen M, Lin X, Lu W, Schafer NP, Onuchic JN, Wolynes PG. Template-Guided Protein Structure Prediction and Refinement Using Optimized Folding Landscape Force Fields. J Chem Theory Comput 2018; 14:6102-6116. [PMID: 30240202 DOI: 10.1021/acs.jctc.8b00683] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/01/2023]
Abstract
When good structural templates can be identified, template-based modeling is the most reliable way to predict the tertiary structure of proteins. In this study, we combine template-based modeling with a realistic coarse-grained force field, AWSEM, that has been optimized using the principles of energy landscape theory. The Associative memory, Water mediated, Structure and Energy Model (AWSEM) is a coarse-grained force field having both transferable tertiary interactions and knowledge-based local-in-sequence interaction terms. We incorporate template information into AWSEM by introducing soft collective biases to the template structures, resulting in a model that we call AWSEM-Template. Structure prediction tests on eight targets, four of which are in the low sequence identity "twilight zone" of homology modeling, show that AWSEM-Template can achieve high-resolution structure prediction. Our results also confirm that using a combination of AWSEM and a template-guided potential leads to more accurate prediction of protein structures than simply using a template-guided potential alone. Free energy profile analyses demonstrate that the soft collective biases to the template effectively increase funneling toward native-like structures while still allowing significant flexibility so as to allow for correction of discrepancies between the target structure and the template. A further stage of refinement using all-atom molecular dynamics augmented with soft collective biases to the structures predicted by AWSEM-Template leads to a further improvement of both backbone and side-chain accuracy by maintaining sufficient flexibility but at the same time discouraging unproductive unfolding events often seen in unrestrained all-atom refinement simulations. The all-atom refinement simulations also reduce patches of frustration of the initial predictions. 
Some of the backbones found among the structures produced during the initial coarse-grained prediction step already have CE-RMSD values of less than 3 Å with 90% or more of the residues aligned to the experimentally solved structure for all targets. All-atom structures generated during the following all-atom refinement simulations, which started from coarse-grained structures that were chosen without reference to any knowledge about the native structure, have CE-RMSD values of less than 2.5 Å with 90% or more of the residues aligned for 6 out of 8 targets. Clustering low energy structures generated during the initial coarse-grained annealing reliably picks out structures that are within 1 Å of the best sampled structures in 5 out of 8 cases. After the all-atom refinement, structures that are within 1 Å of the best sampled structures can be selected using a simple algorithm based on energetic features alone in 7 out of 8 cases.
Affiliation(s)
- Mingchen Chen
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Bioengineering, Rice University, Houston, Texas 77005, United States
- Xingcheng Lin
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Wei Lu
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Nicholas P Schafer
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Biosciences, Rice University, Houston, Texas 77005, United States
- Peter G Wolynes
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77030, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Biosciences, Rice University, Houston, Texas 77005, United States
|
14
|
Zhou H, Gao M, Skolnick J. ENTPRISE-X: Predicting disease-associated frameshift and nonsense mutations. PLoS One 2018; 13:e0196849. [PMID: 29723276 PMCID: PMC5933770 DOI: 10.1371/journal.pone.0196849] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/14/2017] [Accepted: 04/20/2018] [Indexed: 01/11/2023]
Abstract
To exploit the plethora of information provided by Next Generation Sequencing, the identification of the genetic mutations responsible for disease in general or cancer in particular, among the thousands of neutral germline or somatic variations, is a crucial task. Genome-wide association studies for the detection of disease-associated genes or cancer drivers can only identify common variations or driver genes in a cohort of patients. Thus, they cannot discover unique disease-associated mutations or cancer driver genes on a personal basis. Moreover, even when there are such common variations, their significance is unknown. Here, we extend the machine learning based approach ENTPRISE, developed for predicting the disease association of missense mutations, to frameshift and nonsense mutations. The new approach, ENTPRISE-X, is shown to outperform the state-of-the-art methods VEST-indel and DDIG-in for predicting the disease association of germline frameshift mutations in terms of the balanced measure Matthews correlation coefficient (MCC), with an MCC of 0.586 for ENTPRISE-X versus 0.412 for VEST-indel and 0.321 for DDIG-in. Large scale testing on the ExAC dataset shows that ENTPRISE-X classifies a much lower fraction of variations, 16%, as disease-causing, compared to 26% for VEST-indel and 65% for DDIG-in. A web server for ENTPRISE-X is freely available for academic users at http://cssb2.biology.gatech.edu/entprise-x.
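For readers unfamiliar with the balanced measure quoted above, the following is a minimal sketch of the standard Matthews correlation coefficient computed from a binary confusion matrix. The function name `matthews_cc` and the choice to return 0.0 when the denominator vanishes are conventions assumed for this example; nothing here reproduces the ENTPRISE-X evaluation itself.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn -- correctly predicted disease-associated / neutral variants
    fp, fn -- neutral variants called disease-associated, and vice versa
    Returns a value in [-1, 1]; 1 is perfect, 0 is no better than chance.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return numerator / denominator
```

Because the numerator rewards both true positives and true negatives while penalising both error types, the measure stays informative when the disease-associated and neutral classes differ greatly in size.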
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
16
|
Gupta P, Dash PK. Molecular details of secretory phospholipase A2 from flax (Linum usitatissimum L.) provide insight into its structure and function. Sci Rep 2017; 7:11080. [PMID: 28894144 PMCID: PMC5593939 DOI: 10.1038/s41598-017-10969-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/27/2017] [Accepted: 08/17/2017] [Indexed: 01/19/2023]
Abstract
Secretory phospholipases A2 (sPLA2s) are low molecular weight proteins (12-18 kDa) involved in a suite of plant cellular processes imparting growth and development. Despite their myriad roles in physiological and biochemical processes in plants, detailed analysis of sPLA2 in flax/linseed is meagre. The present work, the first in flax, embodies cloning, expression, purification and molecular characterisation of two distinct sPLA2s (I and II) from flax. PLA2 activity of the cloned sPLA2s was biochemically assayed, authenticating them as bona fide phospholipases A2. Physicochemical properties of both sPLA2s revealed that they are thermostable proteins requiring divalent cations for optimum activity. While structural analysis of both proteins revealed deviations in the amino acid sequence at the C- and N-terminal regions, hydropathic study revealed LusPLA2I as a hydrophobic protein and LusPLA2II as a hydrophilic protein. Structural analysis of flax sPLA2s revealed that the secondary structure of both proteins is dominated by α-helix followed by random coils. Modular superimposition of LusPLA2 isoforms with rice sPLA2 confirmed monomeric structural preservation among plant phospholipases A2 and provided insight into the structure of folded flax sPLA2s.
Affiliation(s)
- Payal Gupta
  - ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012, India.
  - Department of Biotechnology, Kurukshetra University, Thanesar, 136119, India.
- Prasanta K Dash
  - ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012, India.
|
17
|
Ando M, Fiesel FC, Hudec R, Caulfield TR, Ogaki K, Górka-Skoczylas P, Koziorowski D, Friedman A, Chen L, Dawson VL, Dawson TM, Bu G, Ross OA, Wszolek ZK, Springer W. The PINK1 p.I368N mutation affects protein stability and ubiquitin kinase activity. Mol Neurodegener 2017; 12:32. [PMID: 28438176 PMCID: PMC5404317 DOI: 10.1186/s13024-017-0174-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 01/04/2017] [Accepted: 04/14/2017] [Indexed: 01/24/2023]
Abstract
Background: Mutations in PINK1 and PARKIN are the most common causes of recessive early-onset Parkinson’s disease (EOPD). Together, the mitochondrial ubiquitin (Ub) kinase PINK1 and the cytosolic E3 Ub ligase PARKIN direct a complex regulated, sequential mitochondrial quality control. Thereby, damaged mitochondria are identified and targeted for degradation in order to prevent their accumulation and eventually cell death. Homozygous or compound heterozygous loss of either gene function disrupts this protective pathway, though at different steps and by distinct mechanisms. While structure and function of PARKIN variants have been well studied, PINK1 mutations remain poorly characterized, in particular under endogenous conditions. A better understanding of the exact molecular mechanisms underlying the pathogenicity is crucial for rational drug design in the future. Methods: Here, we characterized the pathogenicity of the PINK1 p.I368N mutation on the clinical and genetic as well as on the structural and functional level in patients’ fibroblasts and in cell-based, biochemical assays. Results: Under endogenous conditions, PINK1 p.I368N is expressed, imported, and N-terminally processed in healthy mitochondria similar to PINK1 wild type (WT). Upon mitochondrial damage, however, full-length PINK1 p.I368N is not sufficiently stabilized on the outer mitochondrial membrane (OMM), resulting in loss of mitochondrial quality control. We found that binding of PINK1 p.I368N to the co-chaperone complex HSP90/CDC37 is reduced and stress-induced interaction with TOM40 of the mitochondrial protein import machinery is abolished. Analysis of a structural PINK1 p.I368N model additionally suggested impairments of Ub kinase activity as the ATP-binding pocket was found deformed and the substrate Ub was slightly misaligned within the active site of the kinase. Functional assays confirmed the lack of Ub kinase activity.
Conclusions: Here we demonstrated that mutant PINK1 p.I368N cannot be stabilized on the OMM upon mitochondrial stress and, due to conformational changes in the active site, does not exert kinase activity towards Ub. In patients’ fibroblasts, biochemical assays and by structural analyses, we unraveled two pathomechanisms that lead to loss of function upon mutation of p.I368N and highlight potential strategies for future drug development.
Affiliation(s)
- Maya Ando
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA
- Fabienne C Fiesel
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA
- Roman Hudec
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA
- Thomas R Caulfield
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA
- Kotaro Ogaki
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA
- Paulina Górka-Skoczylas
  - Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland; Institute of Genetics and Biotechnology, Faculty of Biology, Warsaw University, Warsaw, Poland
- Dariusz Koziorowski
  - Department of Neurology, Faculty of Health Science, Medical University of Warsaw, Warsaw, Poland
- Andrzej Friedman
  - Department of Neurology, Faculty of Health Science, Medical University of Warsaw, Warsaw, Poland
- Li Chen
  - Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA, 70130-2685, USA
- Valina L Dawson
  - Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA, 70130-2685, USA; Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Ted M Dawson
  - Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA, 70130-2685, USA; Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Guojun Bu
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA
- Owen A Ross
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA
- Wolfdieter Springer
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA
|
18
|
Coluzza I. Computational protein design: a review. J Phys Condens Matter 2017; 29:143001. [PMID: 28140371 DOI: 10.1088/1361-648x/aa5c76] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 06/06/2023]
Abstract
Proteins are one of the most versatile modular assembling systems in nature. Experimentally, more than 110 000 protein structures have been identified and more are deposited every day in the Protein Data Bank. Such an enormous structural variety is to a first approximation controlled by the sequence of amino acids along the peptide chain of each protein. Understanding how the structural and functional properties of the target can be encoded in this sequence is the main objective of protein design. Unfortunately, rational protein design remains one of the major challenges across the disciplines of biology, physics and chemistry. The implications of solving this problem are enormous and branch into materials science, drug design, evolution and even cryptography. For instance, in the field of drug design an effective computational method to design protein-based ligands for biological targets such as viruses, bacteria or tumour cells, could give a significant boost to the development of new therapies with reduced side effects. In materials science, self-assembly is a highly desired property and soon artificial proteins could represent a new class of designable self-assembling materials. The scope of this review is to describe the state of the art in computational protein design methods and give the reader an outline of what developments could be expected in the near future.
Affiliation(s)
- Ivan Coluzza
  - Computational Physics, Faculty of Physics, University of Vienna, Vienna, Austria
|
19
|
Puschmann A, Fiesel FC, Caulfield TR, Hudec R, Ando M, Truban D, Hou X, Ogaki K, Heckman MG, James ED, Swanberg M, Jimenez-Ferrer I, Hansson O, Opala G, Siuda J, Boczarska-Jedynak M, Friedman A, Koziorowski D, Rudzińska-Bar M, Aasly JO, Lynch T, Mellick GD, Mohan M, Silburn PA, Sanotsky Y, Vilariño-Güell C, Farrer MJ, Chen L, Dawson VL, Dawson TM, Wszolek ZK, Ross OA, Springer W. Heterozygous PINK1 p.G411S increases risk of Parkinson's disease via a dominant-negative mechanism. Brain 2016; 140:98-117. [PMID: 27807026 PMCID: PMC5379862 DOI: 10.1093/brain/aww261] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Received: 03/09/2016] [Revised: 08/31/2016] [Accepted: 09/02/2016] [Indexed: 01/31/2023]
Abstract
See Gandhi and Plun-Favreau (doi:10.1093/aww320) for a scientific commentary on this article. Heterozygous mutations in recessive Parkinson’s disease genes have been postulated to increase disease risk. Puschmann et al. report a genetic association between heterozygous PINK1 p.G411S and Parkinson’s disease. They provide structural and functional explanations for a partial dominant-negative effect of the mutant protein, which impairs wild-type PINK1 activity through hetero-dimerization. It has been postulated that heterozygous mutations in recessive Parkinson’s genes may increase the risk of developing the disease. In particular, the PTEN-induced putative kinase 1 (PINK1) p.G411S (c.1231G>A, rs45478900) mutation has been reported in families with dominant inheritance patterns of Parkinson’s disease, suggesting that it might confer a sizeable disease risk when present on only one allele. We examined families with PINK1 p.G411S and conducted a genetic association study with 2560 patients with Parkinson’s disease and 2145 control subjects. Heterozygous PINK1 p.G411S mutations markedly increased Parkinson’s disease risk (odds ratio = 2.92, P = 0.032); significance remained when supplementing with results from previous studies on 4437 additional subjects (odds ratio = 2.89, P = 0.027). We analysed primary human skin fibroblasts and induced neurons from heterozygous PINK1 p.G411S carriers compared to PINK1 p.Q456X heterozygotes and PINK1 wild-type controls under endogenous conditions. While cells from PINK1 p.Q456X heterozygotes showed reduced levels of PINK1 protein and decreased initial kinase activity upon mitochondrial damage, stress-response was largely unaffected over time, as expected for a recessive loss-of-function mutation. By contrast, PINK1 p.G411S heterozygotes showed no decrease of PINK1 protein levels but a sustained, significant reduction in kinase activity.
Molecular modelling and dynamics simulations as well as multiple functional assays revealed that the p.G411S mutation interferes with ubiquitin phosphorylation by wild-type PINK1 in a heterodimeric complex. This impairs the protective functions of the PINK1/parkin-mediated mitochondrial quality control. Based on genetic and clinical evaluation as well as functional and structural characterization, we established p.G411S as a rare genetic risk factor with a relatively large effect size conferred by a partial dominant-negative function phenotype.
Affiliation(s)
- Andreas Puschmann
  - 1 Lund University, Department of Clinical Sciences Lund, Neurology, Sweden; 2 Department of Neurology, Skåne University Hospital, Sweden; 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Fabienne C Fiesel
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Roman Hudec
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Maya Ando
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Dominika Truban
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Xu Hou
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Kotaro Ogaki
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Michael G Heckman
  - 4 Division of Biomedical Statistics and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA
- Elle D James
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA
- Maria Swanberg
  - 5 Lund University, Department of Experimental Medical Science, Lund, Sweden
- Oskar Hansson
  - 6 Clinical Memory Research Unit, Department of Clinical Sciences Malmö, Lund University, Sweden; 7 Memory Clinic, Skåne University Hospital, Malmö, Sweden
- Grzegorz Opala
  - 8 Department of Neurology, School of Medicine in Katowice, Medical University of Silesia, Katowice, Poland
- Joanna Siuda
  - 8 Department of Neurology, School of Medicine in Katowice, Medical University of Silesia, Katowice, Poland
- Jan O Aasly
  - 10 Department of Neurology, St. Olav's Hospital, and Department of Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
- Timothy Lynch
  - 11 Dublin Neurological Institute at the Mater Misericordiae University Hospital, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- George D Mellick
  - 12 Eskitis Institute for Drug Discovery, Griffith University, Nathan, Queensland, Australia
- Megha Mohan
  - 12 Eskitis Institute for Drug Discovery, Griffith University, Nathan, Queensland, Australia
- Peter A Silburn
  - 12 Eskitis Institute for Drug Discovery, Griffith University, Nathan, Queensland, Australia; 13 University of Queensland, Asia-Pacific Centre for Neuromodulation, Centre for Clinical Research, Brisbane, Queensland, Australia
- Carles Vilariño-Güell
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA; 15 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Matthew J Farrer
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA; 15 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Li Chen
  - 16 Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 17 Solomon H Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 18 Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA 70130-2685, USA
- Valina L Dawson
  - 16 Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 17 Solomon H Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 18 Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA 70130-2685, USA; 19 Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 20 Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Ted M Dawson
  - 16 Neuroregeneration and Stem Cell Programs, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 17 Solomon H Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 18 Adrienne Helis Malvin Medical Research Foundation, New Orleans, LA 70130-2685, USA; 19 Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; 21 Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Owen A Ross
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA; 23 School of Medicine and Medical Science, University College Dublin, Dublin, Ireland; 24 Mayo Graduate School, Neurobiology of Disease, Mayo Clinic, Jacksonville, FL 32224, USA
- Wolfdieter Springer
  - 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL 32224, USA; 24 Mayo Graduate School, Neurobiology of Disease, Mayo Clinic, Jacksonville, FL 32224, USA
|
20
|
Skolnick J, Zhou H. Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods? J Phys Chem B 2016; 121:3546-3554. [PMID: 27748116 DOI: 10.1021/acs.jpcb.6b09517] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]
Abstract
Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionarily distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why certain protein structures are threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive Northwest, Atlanta, Georgia 30318, United States
| | - Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive Northwest, Atlanta, Georgia 30318, United States
| |
|
21
|
Srinivasan B, Zhou H, Mitra S, Skolnick J. Novel small molecule binders of human N-glycanase 1, a key player in the endoplasmic reticulum associated degradation pathway. Bioorg Med Chem 2016; 24:4750-4758. [PMID: 27567076 DOI: 10.1016/j.bmc.2016.08.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/28/2016] [Revised: 08/08/2016] [Accepted: 08/12/2016] [Indexed: 12/30/2022]
Abstract
Peptide:N-glycanase (NGLY1) is an enzyme responsible for cleaving oligosaccharide moieties from misfolded glycoproteins to enable their proper degradation. Deletion and truncation mutations in this gene are responsible for an inherited disorder of the endoplasmic reticulum-associated degradation pathway. However, the literature is unclear whether the disorder is a result of mutations leading to loss-of-function, loss of substrate specificity, loss of protein stability or a combination of these factors. In this communication, without burdening ourselves with the mechanistic underpinning of disease causation because of mutations on the NGLY1 protein, we demonstrate the successful application of virtual ligand screening (VLS) combined with experimental high-throughput validation to the discovery of novel small molecules that show binding to the transglutaminase domain of NGLY1. Attempts at recombinant expression and purification of six different constructs led to successful expression of five, with three constructs purified to homogeneity. Most mutant variants failed to purify, possibly because of misfolding and the resultant exposure of surface hydrophobicity that led to protein aggregation. For the purified constructs, our threading/structure-based VLS algorithm, FINDSITE(comb), was employed to predict ligands that may bind to the protein. Then, the predictions were assessed by high-throughput differential scanning fluorimetry. This led to the identification of nine different ligands that bind to the protein of interest and provide clues to the nature of the pharmacophore that facilitates binding. This is the first study that has identified novel ligands that bind to the NGLY1 protein as a possible starting point in the discovery of ligands with potential therapeutic applications in the treatment of the disorder caused by NGLY1 mutants.
Affiliation(s)
- Bharath Srinivasan
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 950, Atlantic Drive, Atlanta, GA 30332, United States.
| | - Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 950, Atlantic Drive, Atlanta, GA 30332, United States
| | - Sreyoshi Mitra
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 950, Atlantic Drive, Atlanta, GA 30332, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 950, Atlantic Drive, Atlanta, GA 30332, United States.
| |
|
22
|
Mih N, Brunk E, Bordbar A, Palsson BO. A Multi-scale Computational Platform to Mechanistically Assess the Effect of Genetic Variation on Drug Responses in Human Erythrocyte Metabolism. PLoS Comput Biol 2016; 12:e1005039. [PMID: 27467583 PMCID: PMC4965186 DOI: 10.1371/journal.pcbi.1005039] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/22/2016] [Accepted: 06/27/2016] [Indexed: 12/31/2022]
Abstract
Progress in systems medicine brings promise to addressing patient heterogeneity and individualized therapies. Recently, genome-scale models of metabolism have been shown to provide insight into the mechanistic link between drug therapies and systems-level off-target effects while being expanded to explicitly include the three-dimensional structure of proteins. The integration of these molecular-level details, such as the physical, structural, and dynamical properties of proteins, notably expands the computational description of biochemical network-level properties and the possibility of understanding and predicting whole cell phenotypes. In this study, we present a multi-scale modeling framework that describes biological processes which range in scale from atomistic details to an entire metabolic network. Using this approach, we can understand how genetic variation, which impacts the structure and reactivity of a protein, influences both native and drug-induced metabolic states. As a proof-of-concept, we study three enzymes (catechol-O-methyltransferase, glucose-6-phosphate dehydrogenase, and glyceraldehyde-3-phosphate dehydrogenase) and their respective genetic variants which have clinically relevant associations. Using all-atom molecular dynamics simulations enables the sampling of long timescale conformational dynamics of the proteins (and their mutant variants) in complex with their respective native metabolites or drug molecules. We find that changes in a protein's structure due to a mutation influence protein binding affinity to metabolites and/or drug molecules, and inflict large-scale changes in metabolism.
Affiliation(s)
- Nathan Mih
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California, United States of America
| | - Elizabeth Brunk
- Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (EB); (BOP)
| | - Aarash Bordbar
- Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
| | - Bernhard O. Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Department of Pediatrics, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (EB); (BOP)
| |
|
23
|
ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures. PLoS One 2016; 11:e0150965. [PMID: 26982818 PMCID: PMC4794227 DOI: 10.1371/journal.pone.0150965] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 08/04/2015] [Accepted: 02/21/2016] [Indexed: 01/02/2023]
Abstract
The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, and FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Matthews correlation coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/.
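The Matthews correlation coefficient quoted in this abstract is a standard summary of a binary confusion matrix. The sketch below shows the usual formula together with a balance filter of the kind the abstract describes. The function names are hypothetical, and treating the 0.3 and 3 bounds as inclusive is an assumption, since the abstract does not say.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def is_balanced(n_neutral: int, n_disease: int,
                low: float = 0.3, high: float = 3.0) -> bool:
    """Check whether a protein's neutral/disease-associated ratio lies in [low, high].

    Inclusive bounds are an assumption made for this illustration.
    """
    if n_disease == 0:
        return False
    return low <= n_neutral / n_disease <= high
```

A protein with 10 neutral and 10 disease-associated variations would pass the filter, while one with 40 neutral and 10 disease-associated variations would not.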
|
24
|
Roy A, Srinivasan B, Skolnick J. PoLi: A Virtual Screening Pipeline Based on Template Pocket and Ligand Similarity. J Chem Inf Model 2015. [PMID: 26225536 DOI: 10.1021/acs.jcim.5b00232] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/13/2023]
Abstract
Often in pharmaceutical research the goal is to identify small molecules that can interact with and appropriately modify the biological behavior of a new protein target. Unfortunately, most proteins lack both known structures and small molecule binders, prerequisites of many virtual screening, VS, approaches. For such proteins, ligand homology modeling, LHM, that copies ligands from homologous and perhaps evolutionarily distant template proteins, has been shown to be a powerful VS approach to identify possible binding ligands. However, if we want to target a specific pocket for which there is no homologous holo template protein structure, then LHM will not work. To address this issue, in a new pocket-based approach, PoLi, we generalize LHM by exploiting the fact that the number of distinct small molecule ligand-binding pockets in proteins is small. PoLi identifies similar ligand-binding pockets in a holo template protein library, selectively copies relevant parts of template ligands, and uses them for VS. In practice, PoLi is a hybrid structure and ligand-based VS algorithm that integrates 2D fingerprint-based and 3D shape-based similarity metrics for improved virtual screening performance. On standard DUD and DUD-E benchmark databases, using modeled receptor structures, PoLi achieves an average enrichment factor of 13.4 and 9.6, respectively, in the top 1% of the screened library. In contrast, traditional docking-based VS using AutoDock Vina and homology-based VS using FINDSITE(filt) have an average enrichment of 1.6 (3.0) and 9.0 (7.9) on the DUD (DUD-E) sets, respectively. Experimental validation of PoLi predictions on dihydrofolate reductase, DHFR, using differential scanning fluorimetry, DSF, identifies multiple ligands with diverse molecular scaffolds, thus demonstrating the advantage of PoLi over current state-of-the-art VS methods.
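The enrichment factor in the top 1% of a screened library, as reported in this abstract, is conventionally the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The sketch below uses that conventional definition. The function name, the rounding of the cutoff, and the handling of ties are assumptions, not details taken from the PoLi paper.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Enrichment factor in the top `fraction` of a library ranked by score.

    scores: higher means more likely to be active (assumed convention).
    labels: 1 for a known active, 0 for a decoy.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_total != len(labels) or n_actives == 0:
        raise ValueError("need equal-length inputs with at least one active")
    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    # Ratio of the hit rate in the top slice to the hit rate overall.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

Under this definition a random ranking gives a value near 1, and the largest possible value at 1% is 100, reached when the top slice contains only actives.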
Affiliation(s)
- Ambrish Roy
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, Georgia 30318, United States
| | - Bharath Srinivasan
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, Georgia 30318, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, Georgia 30318, United States
| |
|
25
|
Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015; 5:11476. [PMID: 26098304 PMCID: PMC4476419 DOI: 10.1038/srep11476] [Citation(s) in RCA: 218] [Impact Index Per Article: 24.2] [Received: 03/04/2015] [Accepted: 05/19/2015] [Indexed: 11/09/2022]
Abstract
Direct prediction of protein structure from sequence is a challenging problem. An effective approach is to break it up into independent sub-problems. These sub-problems such as prediction of protein secondary structure can then be solved independently. In a previous study, we found that an iterative use of predicted secondary structure and backbone torsion angles can further improve secondary structure and torsion angle prediction. In this study, we expand the iterative features to include solvent accessible surface area and backbone angles and dihedrals based on Cα atoms. By using a deep learning neural network in three iterations, we achieved 82% accuracy for secondary structure prediction, 0.76 for the correlation coefficient between predicted and actual solvent accessible surface area, 19° and 30° for mean absolute errors of backbone φ and ψ angles, respectively, and 8° and 32° for mean absolute errors of Cα-based θ and τ angles, respectively, for an independent test dataset of 1199 proteins. The accuracy of the method is slightly lower for 72 CASP 11 targets but much higher than those of model structures from current state-of-the-art techniques. This suggests the potentially beneficial use of these predicted properties for model assessment and ranking.
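Mean absolute errors for angles such as φ and ψ are only meaningful if the 360° periodicity is taken into account, so that a prediction of 179° against a true value of -179° counts as a 2° error. The sketch below shows one common way to do this. It is an illustration under that assumption and does not claim to reproduce the exact procedure used in the study. The function name is hypothetical.

```python
from typing import Sequence


def angular_mae(predicted: Sequence[float], actual: Sequence[float],
                period: float = 360.0) -> float:
    """Mean absolute error between angles in degrees, respecting periodicity."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = abs(p - a) % period
        # Take the shorter way around the circle.
        total += min(diff, period - diff)
    return total / len(predicted)
```

With this wrapping, no single angle can contribute more than 180° to the average.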
|
26
|
Boles RG, Hornung HA, Moody AE, Ortiz TB, Wong SA, Eggington JM, Stanley CM, Gao M, Zhou H, McLaughlin S, Zare AS, Sheldon KM, Skolnick J, McKernan KJ. Hurt, tired and queasy: Specific variants in the ATPase domain of the TRAP1 mitochondrial chaperone are associated with common, chronic "functional" symptomatology including pain, fatigue and gastrointestinal dysmotility. Mitochondrion 2015; 23:64-70. [PMID: 26022780 DOI: 10.1016/j.mito.2015.05.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/04/2015] [Revised: 05/15/2015] [Accepted: 05/21/2015] [Indexed: 10/23/2022]
Abstract
Functional disorders are common conditions with a substantial impact on a patient's wellbeing, and can be diagnostically elusive. There are bidirectional associations between functional disorders and mitochondrial dysfunction. In this study, provided clinical information and the exon sequence of the TRAP1 mitochondrial chaperone were retrospectively reviewed with a focus on the functional categories of chronic pain, fatigue and gastrointestinal dysmotility. Very-highly conserved TRAP1 variants were identified in 73 of 930 unrelated patients. Functional symptomatology is strongly associated with specific variants in the ATPase binding pocket. In particular, the combined presence of all three functional categories is strongly associated with p.Ile253Val (OR 7.5, P = 0.0001) and with two other interacting variants (OR 18, P = 0.0005). Considering a 1-2% combined variant prevalence and high odds ratios, these variants may be an important factor in the etiology of functional symptomatology.
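The odds ratios (OR) quoted in this abstract compare the odds of the symptom pattern among variant carriers with the odds among non-carriers. The sketch below shows the standard calculation from a 2x2 table. The abstract does not give the underlying counts, so the function and its argument names are purely illustrative and no attempt is made to reproduce the reported values.

```python
def odds_ratio(carriers_with: int, carriers_without: int,
               noncarriers_with: int, noncarriers_without: int) -> float:
    """Odds ratio from a 2x2 table of variant status versus symptom status.

    Raises ValueError if any cell is zero, since the plain ratio is then
    undefined or degenerate (a continuity correction is not applied here).
    """
    cells = (carriers_with, carriers_without,
             noncarriers_with, noncarriers_without)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive")
    # Cross-product ratio: odds among carriers over odds among non-carriers.
    return (carriers_with * noncarriers_without) / (carriers_without * noncarriers_with)
```

An odds ratio of 1 means the variant and the symptom pattern are unassociated in the table; values above 1 mean the pattern is more common among carriers.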
Affiliation(s)
- Richard G Boles
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Holly A Hornung
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Alastair E Moody
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Thomas B Ortiz
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Stacey A Wong
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Julie M Eggington
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Christine M Stanley
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Mu Gao
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th St, Atlanta, GA 30318, United States
| | - Hongyi Zhou
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th St, Atlanta, GA 30318, United States
| | - Stephen McLaughlin
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Amir S Zare
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Katherine M Sheldon
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th St, Atlanta, GA 30318, United States
| | - Kevin J McKernan
- Courtagen Life Sciences, 12 Gill St, Ste. 3700, Woburn, MA 01801, United States
| |
|
27
|
Caulfield TR, Fiesel FC, Moussaud-Lamodière EL, Dourado DFAR, Flores SC, Springer W. Phosphorylation by PINK1 releases the UBL domain and initializes the conformational opening of the E3 ubiquitin ligase Parkin. PLoS Comput Biol 2014; 10:e1003935. [PMID: 25375667 PMCID: PMC4222639 DOI: 10.1371/journal.pcbi.1003935] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Received: 06/02/2014] [Accepted: 09/25/2014] [Indexed: 11/19/2022]
Abstract
Loss-of-function mutations in PINK1 or PARKIN are the most common causes of autosomal recessive Parkinson's disease. Both gene products, the Ser/Thr kinase PINK1 and the E3 Ubiquitin ligase Parkin, functionally cooperate in a mitochondrial quality control pathway. Upon stress, PINK1 activates Parkin and enables its translocation to and ubiquitination of damaged mitochondria to facilitate their clearance from the cell. Though PINK1-dependent phosphorylation of Ser65 is an important initial step, the molecular mechanisms underlying the activation of Parkin's enzymatic functions remain unclear. Using molecular modeling, we generated a complete structural model of human Parkin at all atom resolution. At steady state, the Ub ligase is maintained inactive in a closed, auto-inhibited conformation that results from intra-molecular interactions. Evidently, Parkin has to undergo major structural rearrangements in order to unleash its catalytic activity. As a spark, we have modeled PINK1-dependent Ser65 phosphorylation in silico and provide the first molecular dynamics simulation of Parkin conformations along a sequential unfolding pathway that could release its intertwined domains and enable its catalytic activity. We combined free (unbiased) molecular dynamics simulation, Monte Carlo algorithms, and minimal-biasing methods with cell-based high content imaging and biochemical assays. Phosphorylation of Ser65 results in widening of a newly defined cleft and dissociation of the regulatory N-terminal UBL domain. This motion propagates through further opening conformations that allow binding of an Ub-loaded E2 co-enzyme. Subsequent spatial reorientation of the catalytic centers of both enzymes might facilitate the transfer of the Ub moiety to charge Parkin. Our structure-function study provides the basis to elucidate regulatory mechanisms and activity of the neuroprotective Parkin. 
This may open up new avenues for the development of small molecule Parkin activators through targeted drug design.

Parkinson's disease (PD) is a devastating neurological condition caused by the selective and progressive degeneration of dopaminergic neurons in the brain. Loss-of-function mutations in the PINK1 or PARKIN genes are the most common causes of recessively inherited PD. Together the encoded proteins coordinate a protective cellular quality control pathway that allows elimination of impaired mitochondria in order to prevent further cellular damage and ultimately death. Although it is known that the kinase PINK1 operates upstream and activates the E3 Ubiquitin ligase Parkin, the molecular mechanisms remain elusive. Here, we combined state-of-the-art computational and functional biological methods to demonstrate that Parkin is sequentially activated through PINK1-dependent phosphorylation and subsequent structural rearrangement. The induced motions result in release of Parkin's closed, auto-inhibited conformation to liberate its enzymatic functions. We provide for the first time a complete protein structure of Parkin at an all atom resolution and a comprehensive molecular dynamics simulation of its activation and opening conformations. The generated models will allow uncovering the exact mechanisms of regulation and enzymatic activity of Parkin and potentially the development of novel therapeutics through a structure-function-based drug design.
Affiliation(s)
- Thomas R. Caulfield
- Department of Neuroscience, Mayo Clinic Jacksonville, Florida, United States of America
- * E-mail: (TRC); (WS)
| | - Fabienne C. Fiesel
- Department of Neuroscience, Mayo Clinic Jacksonville, Florida, United States of America
| | | | - Daniel F. A. R. Dourado
- Department of Cell & Molecular Biology, Computational & Systems Biology, Uppsala University, Uppsala, Sweden
| | - Samuel C. Flores
- Department of Cell & Molecular Biology, Computational & Systems Biology, Uppsala University, Uppsala, Sweden
| | - Wolfdieter Springer
- Department of Neuroscience, Mayo Clinic Jacksonville, Florida, United States of America
- Mayo Graduate School, Neurobiology of Disease, Mayo Clinic, Jacksonville, Florida, United States of America
- * E-mail: (TRC); (WS)
| |
|
28
|
Skolnick J, Gao M, Zhou H. On the role of physics and evolution in dictating protein structure and function. Isr J Chem 2014; 54:1176-1188. [PMID: 25484448 PMCID: PMC4255337 DOI: 10.1002/ijch.201400013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
How many of the structural and functional properties of proteins are inherent? Computer simulations provide a powerful tool to address this question. A series of studies on QS, quasi-spherical, compact polypeptides which lack any secondary structure; ART, artificial, proteins comprised of compact homopolypeptides with protein-like secondary structure; and PDB, native, single domain proteins shows that essentially all native global folds, pockets and protein-protein interfaces are in the ART library. This suggests that many protein properties are inherent and that evolution is involved in fine-tuning. The completeness of the space of ligand binding pockets and protein-protein interfaces suggests that promiscuous interactions are intrinsic to proteins and that the capacity to perform the biochemistry of life at low level does not require evolution. If so, this has profound consequences for the origin of life.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
| | - Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
| | - Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
| |
|
29
|
Srinivasan B, Zhou H, Kubanek J, Skolnick J. Experimental validation of FINDSITE(comb) virtual ligand screening results for eight proteins yields novel nanomolar and micromolar binders. J Cheminform 2014; 6:16. [PMID: 24936211 PMCID: PMC4038399 DOI: 10.1186/1758-2946-6-16] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/30/2013] [Accepted: 04/15/2014] [Indexed: 01/09/2023]
Abstract
Background: Identification of ligand-protein binding interactions is a critical step in drug discovery. Experimental screening of large chemical libraries, in spite of its specific role and importance in drug discovery, suffers from the disadvantages of being random, time-consuming and expensive. To accelerate the process, traditional structure- or ligand-based VLS approaches are combined with experimental high-throughput screening, HTS. Often a single protein or, at most, a protein family is considered. Large scale VLS benchmarking across diverse protein families is rarely done, and the reported success rate is very low. Here, we demonstrate the experimental HTS validation of a novel VLS approach, FINDSITEcomb, across a diverse set of medically-relevant proteins. Results: For eight different proteins belonging to different fold-classes and from diverse organisms, the top 1% of FINDSITEcomb’s VLS predictions were tested, and depending on the protein target, 4%-47% of the predicted ligands were shown to bind with μM or better affinities. In total, 47 small molecule binders were identified. Low nanomolar (nM) binders for dihydrofolate reductase and protein tyrosine phosphatases (PTPs) and micromolar binders for the other proteins were identified. Six novel molecules had cytotoxic activity (<10 μg/ml) against the HCT-116 colon carcinoma cell line and one novel molecule had potent antibacterial activity. Conclusions: We show that FINDSITEcomb is a promising new VLS approach that can assist drug discovery.
Affiliation(s)
- Bharath Srinivasan
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250, 14th Street, N.W., Atlanta, GA 30318, USA
| | - Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250, 14th Street, N.W., Atlanta, GA 30318, USA
| | - Julia Kubanek
- School of Biology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Aquatic Chemical Ecology Center, Institute of Bioengineering and Biosciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250, 14th Street, N.W., Atlanta, GA 30318, USA
| |
|
30
|
Khoury GA, Thompson JP, Smadbeck J, Kieslich CA, Floudas CA. Forcefield_PTM: Ab Initio Charge and AMBER Forcefield Parameters for Frequently Occurring Post-Translational Modifications. J Chem Theory Comput 2013; 9:5653-5674. [PMID: 24489522 PMCID: PMC3904396 DOI: 10.1021/ct400556v] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 12/16/2022]
Abstract
In this work, we introduce Forcefield_PTM, a set of AMBER forcefield parameters consistent with ff03 for 32 common post-translational modifications. Partial charges were calculated through ab initio calculations and a two-stage RESP-fitting procedure in an ether-like implicit solvent environment. The charges were found to be generally consistent with others previously reported for phosphorylated amino acids, and trimethyllysine, using different parameterization methods. Pairs of modified and their corresponding unmodified structures were curated from the PDB for both single and multiple modifications. Background structural similarity was assessed in the context of secondary and tertiary structures from the global dataset. Next, the charges derived for Forcefield_PTM were tested on a macroscopic scale using unrestrained all-atom Langevin molecular dynamics simulations in AMBER for 34 (17 pairs of modified/unmodified) systems in implicit solvent. Assessment was performed in the context of secondary structure preservation, stability in energies, and correlations between the modified and unmodified structure trajectories on the aggregate. As an illustration of their utility, the parameters were used to compare the structural stability of the phosphorylated and dephosphorylated forms of OdhI. Microscopic comparisons between quantum and AMBER single point energies along key χ torsions on several PTMs were performed and corrections to improve their agreement in terms of mean squared errors and squared correlation coefficients were parameterized. This forcefield for post-translational modifications in condensed-phase simulations can be applied to a number of biologically relevant and timely applications including protein structure prediction, protein and peptide design, docking, and to study the effect of PTMs on folding and dynamics. 
We make the derived parameters and an associated interactive webtool capable of performing post-translational modifications on proteins using Forcefield_PTM available at http://selene.princeton.edu/FFPTM.
Affiliation(s)
- George A. Khoury
- Department of Chemical and Biological Engineering, Princeton, NJ, USA
| | - Jeff P. Thompson
- Department of Chemical and Biological Engineering, Princeton, NJ, USA
| | - James Smadbeck
- Department of Chemical and Biological Engineering, Princeton, NJ, USA
| | - Chris A. Kieslich
- Department of Chemical and Biological Engineering, Princeton, NJ, USA
| | | |
|
31
|
Skorupka K, Han SK, Nam HJ, Kim S, Faham S. Protein design by fusion: implications for protein structure prediction and evolution. Acta Crystallographica Section D: Biological Crystallography 2013; 69:2451-60. [PMID: 24311586 DOI: 10.1107/s0907444913022701] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/05/2013] [Accepted: 08/12/2013] [Indexed: 01/21/2023]
Abstract
Domain fusion is a useful tool in protein design. Here, the structure of a fusion of the heterodimeric flagella-assembly proteins FliS and FliC is reported. Although the ability of the fusion protein to maintain the structure of the heterodimer may be apparent, threading-based structural predictions do not properly fuse the heterodimer. Additional examples of naturally occurring heterodimers that are homologous to full-length proteins were identified. These examples highlight that the designed protein was engineered by the same tools as used in the natural evolution of proteins and that heterodimeric structures contain a wealth of information, currently unused, that can improve structural predictions.
Affiliation(s)
- Katarzyna Skorupka
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, VA 22093, USA
| | | | | | | | | |
|
32
|
Khoury GA, Smadbeck J, Kieslich CA, Floudas CA. Protein folding and de novo protein design for biotechnological applications. Trends Biotechnol 2013; 32:99-109. [PMID: 24268901 DOI: 10.1016/j.tibtech.2013.10.008] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 09/05/2013] [Revised: 10/10/2013] [Accepted: 10/18/2013] [Indexed: 11/19/2022]
Abstract
In the postgenomic era, the medical/biological fields are advancing faster than ever. However, before the power of full-genome sequencing can be fully realized, the connection between amino acid sequence and protein structure, known as the protein folding problem, needs to be elucidated. The protein folding problem remains elusive, with significant difficulties still arising when modeling amino acid sequences lacking an identifiable template. Understanding protein folding will allow for unforeseen advances in protein design, often referred to as the inverse protein folding problem. Despite challenges in protein folding, de novo protein design has recently demonstrated significant success via computational techniques. We review advances and challenges in protein structure prediction and de novo protein design, and highlight their interplay in successful biotechnological applications.
Affiliation(s)
- George A Khoury
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Chris A Kieslich
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Christodoulos A Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA.
|
33
|
Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 2013; 23:191-7. [PMID: 23415854 DOI: 10.1016/j.sbi.2013.01.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/19/2012] [Revised: 01/04/2013] [Accepted: 01/23/2013] [Indexed: 01/03/2023]
Abstract
The recently developed field of ligand homology modeling (LHM), which extends the ideas of protein homology modeling to the prediction of ligand binding sites and to virtual ligand screening, has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little, if any, diminution in performance, thereby enabling ≈ 75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA.
|
34
|
Power TD, Ivanciuc O, Schein CH, Braun W. Assessment of 3D models for allergen research. Proteins 2013; 81:545-54. [PMID: 23239464 DOI: 10.1002/prot.24239] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/06/2012] [Revised: 11/16/2012] [Accepted: 12/07/2012] [Indexed: 12/27/2022]
Abstract
Allergenic proteins must crosslink specific IgE molecules, bound to the surface of mast cells and basophils, to stimulate an immune response. A structural understanding of the allergen-IgE interface is needed to predict cross-reactivities between allergens and to design hypoallergenic proteins. However, there are less than 90 experimentally determined structures available for the approximately 1500 sequences of allergens and isoallergens cataloged in the Structural Database of Allergenic Proteins. To provide reliable structural data for the remaining proteins, we previously produced more than 500 3D models using an automated procedure, with strict controls on template choice and model quality evaluation. Here, we assessed how well the fold and residue surface exposure of 10 of these models correlated with recently published experimental 3D structures determined by X-ray crystallography or NMR. We also discuss the impact of intrinsically disordered regions on the structural comparison and epitope prediction. Overall, for seven allergens with sequence identities to the original templates higher than 27%, the backbone root-mean square deviations were less than 2 Å between the models and the subsequently determined experimental structures for the ordered regions. Further, the surface exposure of the known IgE epitopes on the models of three major allergens, from peanut (Ara h 1), latex (Hev b 2), and soy (Gly m 4), was very similar to the experimentally determined structures. For the three remaining allergens with lower sequence identities to the modeling templates, the 3D folds were correctly identified. However, the accuracy of those models is not sufficient for a reliable epitope mapping.
Affiliation(s)
- Trevor D Power
- Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555-0857, USA
|
35
|
Zhou H, Skolnick J. FINDSITE(comb): a threading/structure-based, proteomic-scale virtual ligand screening approach. J Chem Inf Model 2012; 53:230-40. [PMID: 23240691 DOI: 10.1021/ci300510n] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 12/15/2022]
Abstract
Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITE(filt), that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITE(filt) with FINDSITE(X) that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITE(comb), is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITE(comb) is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITE(comb) gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. 
The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITE(comb) makes the screening of millions of compounds across entire proteomes feasible. The FINDSITE(comb) web service is freely available for academic users at http://cssb.biology.gatech.edu/skolnick/webservice/FINDSITE-COMB/index.html.
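The enrichment factor quoted in this abstract can be illustrated with a minimal sketch. It assumes the common definition (fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library); the function name `enrichment_factor` and its parameters are hypothetical and are not taken from FINDSITE(comb) itself.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor within the top `fraction` of a ranked library.

    Assumed definition (not taken from the paper's code):
        EF = (actives in top slice / size of top slice)
             / (actives in library / size of library)
    A random ranking gives EF close to 1, so "better than random" means EF > 1.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice, e.g. the top 1% of the library (at least 1 compound).
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)

    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library: 1000 compounds, 10 actives, 5 of which rank in the top 10.
toy_scores = list(range(1000, 0, -1))
toy_active = [i < 5 or 500 <= i < 505 for i in range(1000)]
toy_ef = enrichment_factor(toy_scores, toy_active, fraction=0.01)
```

In this toy case the top 1% holds 10 compounds, half of them active, against a library-wide active rate of 1%, so the enrichment factor is about 50.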
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street, N.W., Atlanta, Georgia 30318, USA
|
36
|
Zhou H, Skolnick J. FINDSITE(X): a structure-based, small molecule virtual screening approach with application to all identified human GPCRs. Mol Pharm 2012; 9:1775-84. [PMID: 22574683 DOI: 10.1021/mp3000716] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022]
Abstract
We have developed FINDSITE(X), an extension of FINDSITE, a protein threading based algorithm for the inference of protein binding sites, biochemical function and virtual ligand screening, that removes the limitation that holo protein structures (those containing bound ligands) of a sufficiently large set of distant evolutionarily related proteins to the target be solved; rather, predicted protein structures and experimental ligand binding information are employed. To provide the predicted protein structures, a fast and accurate version of our recently developed TASSER(VMT), TASSER(VMT)-lite, for template-based protein structural modeling applicable up to 1000 residues is developed and tested, with comparable performance to the top CASP9 servers. Then, a hybrid approach that combines structure alignments with an evolutionary similarity score for identifying functional relationships between target and proteins with binding data has been developed. By way of illustration, FINDSITE(X) is applied to 998 identified human G-protein coupled receptors (GPCRs). First, TASSER(VMT)-lite provides updates of all human GPCR structures previously modeled in our lab. We then use these structures and the new function similarity detection algorithm to screen all human GPCRs against the ZINC8 nonredundant (TC < 0.7) ligand set combined with ligands from the GLIDA database (a total of 88,949 compounds). Testing (excluding GPCRs whose sequence identity > 30% to the target from the binding data library) on a 168 human GPCR set with known binding data, the average enrichment factor in the top 1% of the compound library (EF(0.01)) is 22.7, whereas EF(0.01) by FINDSITE is 7.1. For virtual screening when just the target and its native ligands are excluded, the average EF(0.01) reaches 41.4. We also analyze off-target interactions for the 168 protein test set. 
All predicted structures, virtual screening data and off-target interactions for the 998 human GPCRs are available at http://cssb.biology.gatech.edu/skolnick/webservice/gpcr/index.html .
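The "nonredundant (TC < 0.7)" ligand set mentioned in this abstract can be illustrated with a short sketch. It assumes that TC denotes the Tanimoto coefficient on binary fingerprints and that redundancy is removed greedily in input order; the abstract does not say which fingerprint or filtering procedure was used, so `tanimoto` and `nonredundant_subset` are hypothetical helpers, not the authors' pipeline.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def nonredundant_subset(fingerprints, cutoff=0.7):
    """Greedy filter (an assumption, not the published procedure).

    Keeps a compound only if its Tanimoto coefficient to every compound
    already kept is below `cutoff`. Returns the indices of kept compounds.
    """
    kept = []
    for idx, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[k]) < cutoff for k in kept):
            kept.append(idx)
    return kept


# Toy fingerprints: compounds 1 and 3 are too similar to compound 0.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {7, 8, 9}, {1, 2, 3}]
toy_kept = nonredundant_subset(toy_fps, cutoff=0.7)
```

The result of a greedy filter depends on input order, so a different ordering of the same library can keep a different, equally valid nonredundant subset.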
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street, N.W., Atlanta, Georgia 30318, United States
|