1
|
Chen J, Hou J, Wong KC. Categorical Matrix Completion With Active Learning for High-Throughput Screening. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2261-2270. [PMID: 32203025 DOI: 10.1109/tcbb.2020.2982142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The recent advances in wet-lab automation enable high-throughput experiments to be conducted seamlessly. In particular, the exhaustive enumeration of all possible conditions is always involved in high-throughput screening. Nonetheless, such a screening strategy is hardly believed to be optimal and cost-effective. By incorporating artificial intelligence, we design an open-source model based on categorical matrix completion and active machine learning to guide high-throughput screening experiments. Specifically, we narrow our scope to the high-throughput screening for chemical compound effects on diverse protein sub-cellular locations. In the proposed model, we believe that exploration is more important than exploitation in the long run of high-throughput screening experiments. Therefore, we design several innovations to circumvent the existing limitations. In particular, categorical matrix completion is designed to accurately impute the missing experiments while margin sampling is also implemented for uncertainty estimation. The model is systematically tested on both simulated and real data. The simulation results reflect that our model can be robust to diverse scenarios, while the real data results demonstrate the wet-lab applicability of our model for high-throughput screening experiments. Lastly, we attribute the model success to its exploration ability by revealing the related matrix ranks and distinct experiment coverage comparisons.
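The abstract above names margin sampling as its uncertainty estimate. As a minimal sketch of that general technique (not the authors' implementation), the function below ranks candidate experiments by the gap between their two most probable categories and returns the least certain ones. The function name `margin_sampling`, the `probs` matrix, and the toy values are assumptions made for illustration only.

```python
import numpy as np

def margin_sampling(probs, n_queries=1):
    """Return indices of the n_queries rows with the smallest top-two margin.

    probs: array-like of shape (n_candidates, n_classes) holding predicted
    class probabilities for each unobserved experiment (hypothetical input).
    """
    probs = np.asarray(probs, dtype=float)
    # Sort each row so the last two columns hold the two largest probabilities.
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    # A small margin means the model can barely separate its two best guesses.
    # A stable sort keeps tied candidates in their input order.
    return np.argsort(margins, kind="stable")[:n_queries]

# Toy predicted probabilities for three candidate experiments.
probs = [[0.90, 0.05, 0.05],
         [0.40, 0.35, 0.25],
         [0.50, 0.49, 0.01]]
picked = margin_sampling(probs, n_queries=2)
print(picked)
```

In an active-learning loop, the selected experiments would be run next, their outcomes added to the observed matrix, and the model refitted before the next selection round.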
|
2
|
Tan YS, Mhoumadi Y, Verma CS. Roles of computational modelling in understanding p53 structure, biology, and its therapeutic targeting. J Mol Cell Biol 2020; 11:306-316. [PMID: 30726928 PMCID: PMC6487789 DOI: 10.1093/jmcb/mjz009] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/14/2018] [Revised: 12/14/2018] [Accepted: 01/31/2019] [Indexed: 12/21/2022] Open
Abstract
The transcription factor p53 plays pivotal roles in numerous biological processes, including the suppression of tumours. The rich availability of biophysical data aimed at understanding its structure–function relationships since the 1990s has enabled the application of a variety of computational modelling techniques towards the establishment of mechanistic models. Together they have provided deep insights into the structure, mechanics, energetics, and dynamics of p53. In parallel, the observation that mutations in p53 or changes in its associated pathways characterize several human cancers has resulted in a race to develop therapeutic modulators of p53, some of which have entered clinical trials. This review describes how computational modelling has played key roles in understanding structural-dynamic aspects of p53, formulating hypotheses about domains that are beyond current experimental investigations, and the development of therapeutic molecules that target the p53 pathway.
Affiliation(s)
- Yaw Sing Tan
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore
- Yasmina Mhoumadi
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore
- Chandra S Verma
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore
  - Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore
|
3
|
|
4
|
Gárate-Escamilla AK, El Hassani AH, Andres E. Big data execution time based on Spark Machine Learning Libraries. Proceedings of the 2019 3rd International Conference on Cloud and Big Data Computing 2019. [DOI: 10.1145/3358505.3358519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 08/30/2023]
Affiliation(s)
- Emmanuel Andres
  - Service de Médecine Interne, Diabète et Maladies métaboliques de la Clinique Médicale B, CHRU de Strasbourg, Strasbourg
|
5
|
Unsupervised dimensionality reduction versus supervised regularization for classification from sparse data. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-019-00616-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
|
6
|
Bandaragoda TR, Ting KM, Albrecht D, Liu FT, Zhu Y, Wells JR. Isolation-based anomaly detection using nearest-neighbor ensembles. Comput Intell 2018. [DOI: 10.1111/coin.12156] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 11/29/2022]
Affiliation(s)
- Tharindu R. Bandaragoda
  - Research Centre for Data Analytics and Cognition; La Trobe University; Melbourne VIC Australia
- Kai Ming Ting
  - School of Engineering and Information Technology; Federation University; Ballarat VIC Australia
- David Albrecht
  - Faculty of Information Technology; Monash University; Melbourne VIC Australia
- Fei Tony Liu
  - Faculty of Information Technology; Monash University; Melbourne VIC Australia
- Ye Zhu
  - School of Information Technology; Deakin University; Burwood VIC Australia
- Jonathan R. Wells
  - School of Engineering and Information Technology; Federation University; Ballarat VIC Australia
|
7
|
Danziger SA, Miller LR, Singh K, Whitney GA, Peskind ER, Li G, Lipshutz RJ, Aitchison JD, Smith JJ. An indicator cell assay for blood-based diagnostics. PLoS One 2017; 12:e0178608. [PMID: 28594877 PMCID: PMC5464608 DOI: 10.1371/journal.pone.0178608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2016] [Accepted: 05/16/2017] [Indexed: 11/30/2022] Open
Abstract
We have established proof of principle for the Indicator Cell Assay Platform™ (iCAP™), a broadly applicable tool for blood-based diagnostics that uses specifically-selected, standardized cells as biosensors, relying on their innate ability to integrate and respond to diverse signals present in patients' blood. To develop an assay, indicator cells are exposed in vitro to serum from case or control subjects and their global differential response patterns are used to train reliable, disease classifiers based on a small number of features. In a feasibility study, the iCAP detected pre-symptomatic disease in a murine model of amyotrophic lateral sclerosis (ALS) with 94% accuracy (p-Value = 3.81E-6) and correctly identified samples from a murine Huntington's disease model as non-carriers of ALS. Beyond the mouse model, in a preliminary human disease study, the iCAP detected early stage Alzheimer's disease with 72% cross-validated accuracy (p-Value = 3.10E-3). For both assays, iCAP features were enriched for disease-related genes, supporting the assay's relevance for disease research.
Affiliation(s)
- Samuel A. Danziger
  - Institute for Systems Biology, Seattle, WA, United States of America
  - Center for Infectious Disease Research (formerly Seattle Biomedical Research Institute), Seattle, WA, United States of America
- Leslie R. Miller
  - Institute for Systems Biology, Seattle, WA, United States of America
- Karanbir Singh
  - Institute for Systems Biology, Seattle, WA, United States of America
- Elaine R. Peskind
  - Northwest Network (VISN-20) Mental Illness, Research, Education, and Clinical Center (MIRECC), VA Puget Sound, Seattle, WA, United States of America
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States of America
- Ge Li
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States of America
  - Geriatric Research, Education, and Clinical Center, Veterans Affairs (VA) Puget Sound Health Care System (VA Puget Sound), Seattle, WA, United States of America
- Robert J. Lipshutz
  - Institute for Systems Biology, Seattle, WA, United States of America
  - PreCyte Inc., Seattle, WA, United States of America
- John D. Aitchison
  - Institute for Systems Biology, Seattle, WA, United States of America
  - Center for Infectious Disease Research (formerly Seattle Biomedical Research Institute), Seattle, WA, United States of America
- Jennifer J. Smith
  - Institute for Systems Biology, Seattle, WA, United States of America
  - Center for Infectious Disease Research (formerly Seattle Biomedical Research Institute), Seattle, WA, United States of America
  - PreCyte Inc., Seattle, WA, United States of America
|
8
|
Small Random Forest Models for Effective Chemogenomic Active Learning. Journal of Computer Aided Chemistry 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
|
9
|
Abstract
Deleterious or 'disease-associated' mutations are mutations that lead to disease with high phenotype penetrance: they are inherited in a simple Mendelian manner, or, in the case of cancer, accumulate in somatic cells leading directly to disease. However, in some cases, the amino acid that is substituted resulting in disease is the wild-type native residue in the functionally equivalent protein in another species. Such examples are known as 'compensated pathogenic deviations' (CPDs) because, somewhere in the second species, there must be compensatory mutations that allow the protein to function normally despite having a residue which would cause disease in the first species. Depending on the nature of the mutations, compensation can occur in the same protein, or in a different protein with which it interacts. In principle, compensation can be achieved by a single mutation (most probably structurally close to the CPD), or by the cumulative effect of several mutations. Although it is clear that these effects occur in proteins, compensatory mutations are also important in RNA potentially having an impact on disease. As a much simpler molecule, RNA provides an interesting model for understanding mechanisms of compensatory effects, both by looking at naturally occurring RNA molecules and as a means of computational simulation. This review surveys the rather limited literature that has explored these effects. Understanding the nature of CPDs is important in understanding traversal along fitness landscape valleys in evolution. It could also have applications in treating diseases that result from such mutations.
|
10
|
Temerinac-Ott M, Naik AW, Murphy RF. Deciding when to stop: efficient experimentation to learn to predict drug-target interactions. BMC Bioinformatics 2015; 16:213. [PMID: 26153434 PMCID: PMC4495685 DOI: 10.1186/s12859-015-0650-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/03/2014] [Accepted: 06/26/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Active learning is a powerful tool for guiding an experimentation process. Instead of doing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. Especially for drug discovery and development, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. However, in practice, it is crucial to have a method to evaluate the quality of the current predictions and decide when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved. RESULTS: We compute active learning traces on simulated drug-target matrices in order to determine a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can result in up to 40% savings of the total experiments for highly accurate predictions. CONCLUSIONS: We show that active learning accuracy can be predicted using simulated data and results in substantial savings in the number of experiments required to make accurate drug-target predictions.
Affiliation(s)
- Maja Temerinac-Ott
  - Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany.
- Armaghan W Naik
  - Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
- Robert F Murphy
  - Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany.
  - Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
  - Departments of Biological Sciences, Biomedical Engineering and Machine Learning, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
|
11
|
Wallentine BD, Wang Y, Tretyachenko-Ladokhina V, Tan M, Senear DF, Luecke H. Structures of oncogenic, suppressor and rescued p53 core-domain variants: mechanisms of mutant p53 rescue. Acta Crystallographica. Section D, Biological Crystallography 2013; 69:2146-56. [PMID: 24100332 PMCID: PMC3792646 DOI: 10.1107/s0907444913020830] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 05/22/2013] [Accepted: 07/25/2013] [Indexed: 11/10/2022]
Abstract
To gain insights into the mechanisms by which certain second-site suppressor mutations rescue the function of a significant number of cancer mutations of the tumor suppressor protein p53, X-ray crystallographic structures of four p53 core-domain variants were determined. These include an oncogenic mutant, V157F, two single-site suppressor mutants, N235K and N239Y, and the rescued cancer mutant V157F/N235K/N239Y. The V157F mutation substitutes a smaller hydrophobic valine with a larger hydrophobic phenylalanine within strand S4 of the hydrophobic core. The structure of this cancer mutant shows no gross structural changes in the overall fold of the p53 core domain, only minor rearrangements of side chains within the hydrophobic core of the protein. Based on biochemical analysis, these small local perturbations induce instability in the protein, increasing the free energy by 3.6 kcal mol(-1) (15.1 kJ mol(-1)). Further biochemical evidence shows that each suppressor mutation, N235K or N239Y, acts individually to restore thermodynamic stability to V157F and that both together are more effective than either alone. All rescued mutants were found to have wild-type DNA-binding activity when assessed at a permissive temperature, thus pointing to thermodynamic stability as the critical underlying variable. Interestingly, thermodynamic analysis shows that while N239Y demonstrates stabilization of the wild-type p53 core domain, N235K does not. These observations suggest distinct structural mechanisms of rescue. A new salt bridge between Lys235 and Glu198, found in both the N235K and rescued cancer mutant structures, suggests a rescue mechanism that relies on stabilizing the β-sandwich scaffold. On the other hand, the substitution N239Y creates an advantageous hydrophobic contact between the aromatic ring of this tyrosine and the adjacent Leu137. 
Surprisingly, the rescued cancer mutant shows much larger structural deviations than the cancer mutant alone when compared with wild-type p53. These suppressor mutations appear to rescue p53 function by creating novel intradomain interactions that stabilize the core domain, allowing compensation for the destabilizing V157F mutation.
Affiliation(s)
- Brad D. Wallentine
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- Ying Wang
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- Martha Tan
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- Donald F. Senear
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- Hartmut Luecke
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Physiology and Biophysics, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
  - Center for Biomembrane Systems, University of California, Irvine, Irvine, CA 92697, USA
  - Unidad de Biofisica (CSIC, UPV/EHU) and Departamento de Bioquimica, Universidad del Pais Vasco, 48940 Leioa, Spain
|
12
|
Wassman CD, Baronio R, Demir Ö, Wallentine BD, Chen CK, Hall LV, Salehi F, Lin DW, Chung BP, Hatfield GW, Richard Chamberlin A, Luecke H, Lathrop RH, Kaiser P, Amaro RE. Computational identification of a transiently open L1/S3 pocket for reactivation of mutant p53. Nat Commun 2013; 4:1407. [PMID: 23360998 PMCID: PMC3562459 DOI: 10.1038/ncomms2361] [Citation(s) in RCA: 168] [Impact Index Per Article: 15.3] [Received: 06/06/2012] [Accepted: 12/06/2012] [Indexed: 12/22/2022] Open
Abstract
The tumour suppressor p53 is the most frequently mutated gene in human cancer. Reactivation of mutant p53 by small molecules is an exciting potential cancer therapy. Although several compounds restore wild-type function to mutant p53, their binding sites and mechanisms of action are elusive. Here computational methods identify a transiently open binding pocket between loop L1 and sheet S3 of the p53 core domain. Mutation of residue Cys124, located at the centre of the pocket, abolishes p53 reactivation of mutant R175H by PRIMA-1, a known reactivation compound. Ensemble-based virtual screening against this newly revealed pocket selects stictic acid as a potential p53 reactivation compound. In human osteosarcoma cells, stictic acid exhibits dose-dependent reactivation of p21 expression for mutant R175H more strongly than does PRIMA-1. These results indicate the L1/S3 pocket as a target for pharmaceutical reactivation of p53 mutants. About 40% of human cancers carry missense mutations in the tumour suppressor protein p53. Here the authors identify a transiently open pocket in the protein, and by targeting a small molecule to it, partially restore mutant p53 tumour suppressor activity.
Affiliation(s)
- Christopher D Wassman
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, USA
|
13
|
Geetha Ramani R, Jacob SG. Prediction of P53 mutants (multiple sites) transcriptional activity based on structural (2D&3D) properties. PLoS One 2013; 8:e55401. [PMID: 23468845 PMCID: PMC3572112 DOI: 10.1371/journal.pone.0055401] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 11/19/2012] [Accepted: 12/21/2012] [Indexed: 01/05/2023] Open
Abstract
Prediction of secondary site mutations that reinstate mutated p53 to normalcy has been the focus of intense research in the recent past owing to the fact that p53 mutants have been implicated in more than half of all human cancers and restoration of p53 causes tumor regression. However, laboratory investigations are often laborious and resource intensive, whereas computational techniques could well surmount these drawbacks. In view of this, we formulated a novel approach utilizing computational techniques to predict the transcriptional activity of multiple site (one-site to five-site) p53 mutants. The optimal MCC values obtained by the proposed approach on prediction of one-site, two-site, three-site, four-site and five-site mutants were 0.775, 0.341, 0.784, 0.916 and 0.655, respectively, the highest reported thus far in the literature. We have also demonstrated that 2D and 3D features generate higher prediction accuracy of p53 activity, and our findings revealed the optimal results for prediction of p53 status reported to date. We believe detection of the secondary site mutations that suppress tumor growth may facilitate better understanding of the relationship between p53 structure and function and further knowledge on the molecular mechanisms and biological activity of p53, a targeted source for cancer therapy. We expect that our prediction methods and reported results may provide useful insights on p53 functional mechanisms and generate more avenues for utilizing computational techniques in biological data analysis.
Affiliation(s)
- R. Geetha Ramani
  - Department of Information Science and Technology, College of Engineering, Guindy, Anna University, Chennai, Tamilnadu, India
- Shomona Gracia Jacob
  - Faculty of Information and Communication Engineering, Anna University, Chennai, Tamilnadu, India
|
14
|
Restoring coverage to the Bayesian false discovery rate control procedure. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0503-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
15
|
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]
Abstract
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions of common problems in chemoinformatics: improvement of predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
  - Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
|
16
|
Huang T, Niu S, Xu Z, Huang Y, Kong X, Cai YD, Chou KC. Predicting transcriptional activity of multiple site p53 mutants based on hybrid properties. PLoS One 2011; 6:e22940. [PMID: 21857971 PMCID: PMC3152557 DOI: 10.1371/journal.pone.0022940] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 04/23/2011] [Accepted: 07/01/2011] [Indexed: 11/26/2022] Open
Abstract
Mutated p53, an important tumor suppressor protein, is found in many kinds of human cancers, and restoring active p53 would lead to tumor regression. In this work, we developed a new computational method to predict the transcriptional activity of one-, two-, three- and four-site p53 mutants. With the approach from the general form of pseudo amino acid composition, we used eight types of features to represent the mutation and then selected the optimal prediction features based on the maximum relevance, minimum redundancy, and incremental feature selection methods. The Matthews correlation coefficients (MCC) obtained by using the nearest neighbor algorithm and jackknife cross validation for one-, two-, three- and four-site p53 mutants were 0.678, 0.314, 0.705, and 0.907, respectively. Further analysis of the optimal feature set revealed that the 2D (two-dimensional) structure features composed the largest part of the optimal feature set and may have played the most important roles in all four types of p53 mutant active status prediction. The optimal feature sets, especially those at the top level, also demonstrated that the 3D structure features, conservation, and physicochemical and biochemical properties of amino acids near the mutation site played quite important roles in p53 mutant active status prediction. Our study has provided a new and promising approach for finding functionally important sites and the relevant features for in-depth study of the p53 protein and its action mechanism.
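The evaluation described above pairs a nearest neighbor classifier with jackknife (leave-one-out) cross validation and reports the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch of that evaluation scheme on toy data, not the authors' pipeline; the function names, the Euclidean distance, and the one-dimensional feature values are assumptions for illustration.

```python
import math
import numpy as np

def jackknife_1nn_predictions(X, y):
    """Predict each label from its nearest neighbour among all other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Exclude each sample from its own neighbour search (the held-out step).
    np.fill_diagonal(dists, np.inf)
    return y[np.argmin(dists, axis=1)]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0 and 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy, well-separated data: inactive (0) and active (1) mutants.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
score = mcc(y, jackknife_1nn_predictions(X, y))
print(score)
```

MCC ranges from -1 to 1 and stays informative when the active and inactive classes are unbalanced, which is one common reason it is preferred over plain accuracy in this setting.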
Affiliation(s)
- Tao Huang
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
  - Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
- Shen Niu
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Zhongping Xu
  - Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Yun Huang
  - Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Xiangyin Kong
  - Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
  - State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
  - Centre for Computational Systems Biology, Fudan University, Shanghai, People's Republic of China
  - Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
  - Gordon Life Science Institute, San Diego, California, United States of America
|
17
|
Affiliation(s)
- Robert F Murphy
  - Lane Center for Computational Biology and the Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
|
18
|
Mirian MS, Ahmadabadi MN, Araabi BN, Siegwart RR. Learning active fusion of multiple experts' decisions: an attention-based approach. Neural Comput 2010; 23:558-91. [PMID: 21105824 DOI: 10.1162/neco_a_00079] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/04/2022]
Abstract
In this letter, we propose a learning system, active decision fusion learning (ADFL), for active fusion of decisions. Each decision maker, referred to as a local decision maker, provides its suggestion in the form of a probability distribution over all possible decisions. The goal of the system is to learn the active sequential selection of the local decision makers in order to consult with and thus learn the final decision based on the consultations. These two learning tasks are formulated as learning a single sequential decision-making problem in the form of a Markov decision process (MDP), and a continuous reinforcement learning method is employed to solve it. The states of this MDP are decisions of the attended local decision makers, and the actions are either attending to a local decision maker or declaring final decisions. The learning system is punished for each consultation and wrong final decision and rewarded for correct final decisions. This results in minimizing the consultation and decision-making costs through learning a sequential consultation policy where the most informative local decision makers are consulted and the least informative, misleading, and redundant ones are left unattended. An important property of this policy is that it acts locally. This means that the system handles any nonuniformity in the local decision maker's expertise over the state space. This property has been exploited in the design of local experts. ADFL is tested on a set of classification tasks, where it outperforms two well-known classification methods, Adaboost and bagging, as well as three benchmark fusion algorithms: OWA, Borda count, and majority voting. In addition, the effect of local experts design strategy on the performance of ADFL is studied, and some guidelines for the design of local experts are provided. 
Moreover, evaluating ADFL in some special cases proves that it is able to derive the maximum benefit from the informative local decision makers and to minimize attending to redundant ones.
Affiliation(s)
- Maryam S Mirian
  - Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Eng., University of Tehran, Tehran, Iran.
|
19
|
Swamidass SJ, Bittker JA, Bodycombe NE, Ryder SP, Clemons PA. An economic framework to prioritize confirmatory tests after a high-throughput screen. J Biomol Screen 2010; 15:680-6. [PMID: 20547534 DOI: 10.1177/1087057110372803] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
How many hits from a high-throughput screen should be sent for confirmatory experiments? Analytical answers to this question are derived from statistics alone and aim to fix, for example, the false discovery rate at a predetermined tolerance. These methods, however, neglect local economic context and consequently lead to irrational experimental strategies. In contrast, the authors argue that this question is essentially economic, not statistical, and is amenable to an economic analysis that admits an optimal solution. This solution, in turn, suggests a novel tool for deciding the number of hits to confirm and the marginal cost of discovery, which meaningfully quantifies the local economic trade-off between true and false positives, yielding an economically optimal experimental strategy. Validated with retrospective simulations and prospective experiments, this strategy identified 157 additional actives that had been erroneously labeled inactive in at least one real-world screening experiment.
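The abstract above frames the number of hits to confirm as an economic decision driven by the marginal cost of discovery. One simplified reading of that idea, offered only as a sketch and not as the authors' formulation, is to keep confirming ranked hits while the expected cost of finding one more true active stays at or below the value assigned to a discovery. The function name, the per-hit confirmation probabilities, and the cost and value figures below are all hypothetical.

```python
def hits_to_confirm(confirm_probs, cost_per_test, value_per_active):
    """Count how many ranked hits are worth sending to confirmatory testing.

    confirm_probs: estimated probability that each hit is a true active
    (assumed to come from some upstream model; hypothetical input).
    """
    n = 0
    # Test the most promising hits first.
    for p in sorted(confirm_probs, reverse=True):
        if p <= 0:
            break
        # Expected spend needed to obtain one true active at this depth.
        marginal_cost = cost_per_test / p
        if marginal_cost > value_per_active:
            break
        n += 1
    return n

# Toy numbers: each test costs 10 units, each confirmed active is worth 100.
n_hits = hits_to_confirm([0.9, 0.5, 0.2, 0.05], cost_per_test=10, value_per_active=100)
print(n_hits)
```

Under these toy numbers the marginal cost rises from about 11 to 20 to 50 and then to 200 units, so testing would stop after the third hit. A fixed false discovery rate threshold would not take the cost and value figures into account at all, which is the contrast the abstract draws.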
Affiliation(s)
- S Joshua Swamidass
- Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA.
|
20
|
Predicting positive p53 cancer rescue regions using Most Informative Positive (MIP) active learning. PLoS Comput Biol 2009; 5:e1000498. [PMID: 19756158 PMCID: PMC2742196 DOI: 10.1371/journal.pcbi.1000498] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 04/21/2009] [Accepted: 08/04/2009] [Indexed: 11/19/2022] Open
Abstract
Many protein engineering problems involve finding mutations that produce proteins with a particular function. Computational active learning is an attractive approach to discover desired biological activities. Traditional active learning techniques have been optimized to iteratively improve classifier accuracy, not to quickly discover biologically significant results. We report here a novel active learning technique, Most Informative Positive (MIP), which is tailored to biological problems because it seeks novel and informative positive results. MIP active learning differs from traditional active learning methods in two ways: (1) it preferentially seeks Positive (functionally active) examples; and (2) it may be effectively extended to select gene regions suitable for high throughput combinatorial mutagenesis. We applied MIP to discover mutations in the tumor suppressor protein p53 that reactivate mutated p53 found in human cancers. This is an important biomedical goal because p53 mutants have been implicated in half of all human cancers, and restoring active p53 in tumors leads to tumor regression. MIP found Positive (cancer rescue) p53 mutants in silico using 33% fewer experiments than traditional non-MIP active learning, with only a minor decrease in classifier accuracy. Applying MIP to in vivo experimentation yielded immediate Positive results. Ten different p53 mutations found in human cancers were paired in silico with all possible single amino acid rescue mutations, from which MIP was used to select a Positive Region predicted to be enriched for p53 cancer rescue mutants. In vivo assays showed that the predicted Positive Region: (1) had significantly more (p<0.01) new strong cancer rescue mutants than control regions (Negative, and non-MIP active learning); (2) had slightly more new strong cancer rescue mutants than an Expert region selected for purely biological considerations; and (3) rescued for the first time the previously unrescuable p53 cancer mutant P152L.
Engineering proteins to acquire or enhance a particular useful function is at the core of many biomedical problems. This paper presents Most Informative Positive (MIP) active learning, a novel integrated computational/biological approach designed to help guide biological discovery of novel and informative positive mutants. A classifier, together with modeled structure-based features, helps guide biological experiments and so accelerates protein engineering studies. MIP reduces the number of expensive biological experiments needed to achieve novel and informative positive results. We used the MIP method to discover novel p53 cancer rescue mutants. p53 is a tumor suppressor protein, and destructive p53 mutations have been implicated in half of all human cancers. Second-site cancer rescue mutations restore p53 activity and eventually may facilitate rational design of better cancer drugs. This paper shows that, even in the first round of in vivo experiments, MIP significantly increased the discovery rate of novel and informative positive mutants.
|