1
|
Vishwakarma S, Hernandez-Hernandez S, Ballester PJ. Graph neural networks are promising for phenotypic virtual screening on cancer cell lines. Biol Methods Protoc 2024; 9:bpae065. [PMID: 39502795 PMCID: PMC11537795 DOI: 10.1093/biomethods/bpae065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 08/20/2024] [Accepted: 09/02/2024] [Indexed: 11/08/2024] Open
Abstract
Artificial intelligence is increasingly driving early drug design, offering novel approaches to virtual screening. Phenotypic virtual screening (PVS) aims to predict how cancer cell lines respond to different compounds by focusing on observable characteristics rather than specific molecular targets. Some studies have suggested that deep learning may not be the best approach for PVS. However, these studies are limited by the small number of tested molecules, and they employ neither suitable performance metrics nor dissimilar-molecules splits, which better mimic the challenging chemical diversity of real-world screening libraries. Here we prepared 60 datasets, each containing approximately 30 000-50 000 molecules tested for their growth inhibitory activities on one of the NCI-60 cancer cell lines. We conducted multiple performance evaluations of each of the five machine learning algorithms for PVS on these 60 problem instances. To provide an even more comprehensive evaluation, we used two model validation types: the random split and the dissimilar-molecules split. Overall, about 14 440 training runs across datasets were carried out per algorithm. The models were primarily evaluated using hit rate, a more suitable metric in VS contexts. The results show that all models are more challenged by test molecules that are substantially different from those in the training data. In both validation types, the D-MPNN algorithm, a graph-based deep neural network, was found to be the most suitable for building predictive models for this PVS problem.
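As an illustration of the hit-rate metric mentioned in the abstract, the following minimal sketch computes the fraction of true actives among the top-ranked molecules. The abstract does not give the paper's exact definition, so the top-k formulation, the function name `hit_rate`, and the toy data are assumptions made for illustration only.

```python
def hit_rate(scores, is_active, top_k):
    """Fraction of true actives among the top_k highest-scored molecules.

    Assumed definition for illustration; not taken from the paper.
    """
    if top_k <= 0 or top_k > len(scores):
        raise ValueError("top_k must be between 1 and the number of molecules")
    # Rank molecule indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:top_k]
    return sum(1 for i in selected if is_active[i]) / top_k


# Toy example: five molecules with predicted scores and measured activity.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
is_active = [True, False, False, True, True]
print(hit_rate(scores, is_active, top_k=3))
```

Unlike a global metric such as accuracy, a top-k hit rate only looks at the molecules that would actually be selected for testing, which is one reason such metrics are often preferred in screening settings.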
Affiliation(s)
- Sachin Vishwakarma
- Evotec SAS (France), Toulouse, France
- Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
| | | | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, United Kingdom
| |
|
2
|
Zhao X, Singhal A, Park S, Kong J, Bachelder R, Ideker T. Cancer Mutations Converge on a Collection of Protein Assemblies to Predict Resistance to Replication Stress. Cancer Discov 2024; 14:508-523. [PMID: 38236062 PMCID: PMC10905674 DOI: 10.1158/2159-8290.cd-23-0641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/25/2023] [Accepted: 12/21/2023] [Indexed: 01/19/2024]
Abstract
Rapid proliferation is a hallmark of cancer associated with sensitivity to therapeutics that cause DNA replication stress (RS). Many tumors exhibit drug resistance, however, via molecular pathways that are incompletely understood. Here, we develop an ensemble of predictive models that elucidate how cancer mutations impact the response to common RS-inducing (RSi) agents. The models implement recent advances in deep learning to facilitate multidrug prediction and mechanistic interpretation. Initial studies in tumor cells identify 41 molecular assemblies that integrate alterations in hundreds of genes for accurate drug response prediction. These cover roles in transcription, repair, cell-cycle checkpoints, and growth signaling, of which 30 are shown by loss-of-function genetic screens to regulate drug sensitivity or replication restart. The model translates to cisplatin-treated cervical cancer patients, highlighting an RTK-JAK-STAT assembly governing resistance. This study defines a compendium of mechanisms by which mutations affect therapeutic responses, with implications for precision medicine. SIGNIFICANCE Zhao and colleagues use recent advances in machine learning to study the effects of tumor mutations on the response to common therapeutics that cause RS. The resulting predictive models integrate numerous genetic alterations distributed across a constellation of molecular assemblies, facilitating a quantitative and interpretable assessment of drug response. This article is featured in Selected Articles from This Issue, p. 384.
Affiliation(s)
- Xiaoyu Zhao
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
| | - Akshat Singhal
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
| | - Sungjoon Park
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
| | - JungHo Kong
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
| | - Robin Bachelder
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
| | - Trey Ideker
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
- Department of Bioengineering, University of California, San Diego, La Jolla, California
| |
|
3
|
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
- Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
| | - Jonathan L Hess
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Victoria S Marshe
- Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Michelle M Barnett
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Anne-Christin Hauschild
- Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
| | - Malgorzata Maciukiewicz
- Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
- Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
- Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
| | - Samar S M Elsheikh
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Xiaoyu Men
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Yannis J Trakadis
- Department Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
| | - Michael S Breen
- Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Eric J Barnett
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Yanli Zhang-James
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Mehmet Eren Ahsen
- Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
| | - Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Jiahui Hou
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Asif Salekin
- Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
| | - Ping-I Lin
- Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
- Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
| | | | - Andreas Meyer-Lindenberg
- Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Isabelle Bichindaritz
- Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
- Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
| | - Stephen V Faraone
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Murray J Cairns
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Gaurav Pandey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Daniel J Müller
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
| | - Stephen J Glatt
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
| |
|
4
|
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on the feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions are required for prediction improvement. Also, we found out that even using multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
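The finding about highly dissimilar compounds can be made concrete by measuring how close a query compound is to the training set. The sketch below uses the Tanimoto coefficient on fingerprints represented as sets of on-bit indices. The abstract does not state which similarity measure the authors used, so the measure, the function names, and the toy fingerprints are all assumptions for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: treat similarity as 0.0 by convention here.
        return 0.0
    return len(a & b) / union


def max_similarity_to_training(test_bits, training_fingerprints):
    """Similarity of a query compound to its nearest training compound."""
    return max(tanimoto(test_bits, fp) for fp in training_fingerprints)


# Hypothetical fingerprints, given as sets of on-bit indices.
training = [{1, 2, 3, 4}, {2, 3, 5}]
query = {2, 3, 4, 6}
print(max_similarity_to_training(query, training))
```

A low nearest-neighbour similarity flags a query compound as lying far from the training data, which is the regime in which the abstract reports that predictions fail.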
Affiliation(s)
- Ekaterina A Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026.
| | - Sergey Sosnin
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
| | - Maxim V Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
| |
|
5
|
Cheng X, Dai C, Wen Y, Wang X, Bo X, He S, Peng S. NeRD: a multichannel neural network to predict cellular response of drugs by integrating multidimensional data. BMC Med 2022; 20:368. [PMID: 36244991 PMCID: PMC9575288 DOI: 10.1186/s12916-022-02549-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Considering the heterogeneity of tumors, it is a key issue in precision medicine to predict the drug response of each individual. The accumulation of various types of drug informatics and multi-omics data facilitates the development of efficient models for drug response prediction. However, the selection of high-quality data sources and the design of suitable methods remain a challenge. METHODS In this paper, we design NeRD, a multidimensional data integration model based on the PRISM drug response database, to predict the cellular response of drugs. Four feature extractors, including drug structure extractor (DSE), molecular fingerprint extractor (MFE), miRNA expression extractor (mEE), and copy number extractor (CNE), are designed for different types and dimensions of data. A fully connected network is used to fuse all features and make predictions. RESULTS Experimental results demonstrate the effective integration of the global and local structural features of drugs, as well as the features of cell lines from different omics data. For all metrics tested on the PRISM database, NeRD surpassed previous approaches. We also verified that NeRD has strong reliability in the prediction results of new samples. Moreover, unlike other algorithms, when the amount of training data was reduced, NeRD maintained stable performance. CONCLUSIONS NeRD's feature fusion provides a new idea for drug response prediction, which is of great significance for precise cancer treatment.
Affiliation(s)
- Xiaoxiao Cheng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Yuqi Wen
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Xiaoqi Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Song He
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
| |
|
6
|
Cao H, Zhang Y, Baumbach J, Burton PR, Dwyer D, Koutsouleris N, Matschinske J, Marcon Y, Rajan S, Rieg T, Ryser-Welch P, Späth J, Herrmann C, Schwarz E. dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics 2022; 38:4919-4926. [PMID: 36073911 DOI: 10.1093/bioinformatics/btac616] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 09/06/2022] [Accepted: 09/07/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility. RESULTS Here, we describe the development of "dsMTL", a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. AVAILABILITY dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Youcheng Zhang
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
| | - Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Paul R Burton
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Dominic Dwyer
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
| | - Nikolaos Koutsouleris
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
| | - Julian Matschinske
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | | | - Sivanesan Rajan
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Thilo Rieg
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Patricia Ryser-Welch
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Julian Späth
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | | | - Carl Herrmann
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| |
|
7
|
Automatic identification of drug sensitivity of cancer cell with novel regression-based ensemble convolution neural network model. Soft Comput 2022. [DOI: 10.1007/s00500-022-07098-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
|
8
|
Integration of Omics and Phenotypic Data for Precision Medicine. Methods in Molecular Biology (Clifton, N.J.) 2022; 2486:19-35. [PMID: 35437716 DOI: 10.1007/978-1-0716-2265-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Over the past two decades, biomedical research has been moving toward a big-data-driven approach. The underlying causes of this transition include the ability to gather genetic or molecular profiles of humans faster, the increasing adoption of electronic health record (EHR) systems, and the growing interest in linking omics and phenotypic data for analysis. The integration of an individual's biological data (e.g., genomics, proteomics, metabolomics) and health-care data has created unprecedented opportunities for precision medicine, that is, a medical model that uses a patient's unique information, mainly genetic, to prevent, diagnose, or treat disease. This chapter reviewed the research opportunities and applications of integrating omics and phenotypic data for precision medicine, such as understanding the relationship between genotype and phenotype, disease subtyping, and diagnosis or prediction of adverse outcomes. We reviewed recent advanced methods, particularly the machine learning and deep learning-based approaches used for harnessing and harmonizing multiomics and phenotypic data to address these applications. Finally, we discussed the challenges and future directions.
|
9
|
|
10
|
|
11
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| | - Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
| | - Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| |
|
12
|
Unsupervised Representation Learning for Proteochemometric Modeling. Int J Mol Sci 2021; 22:ijms222312882. [PMID: 34884688 PMCID: PMC8657702 DOI: 10.3390/ijms222312882] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/18/2022] Open
Abstract
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
|
13
|
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progress has been made in deep learning approaches to drug response prediction, deep learning methods show weakness in predicting the response of a drug that does not appear in the training dataset. In particular, all five evaluated deep learning methods performed worse than the similarity-regularized matrix factorization (SRMF) method in our drug blind test. We outline the challenges in applying deep learning approaches to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
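A drug blind test, as referred to in the abstract, evaluates a model on drugs that never appear in training. The sketch below shows one way such a split could be built from (drug, cell line, response) records. The abstract does not describe the authors' exact protocol, so the function `drug_blind_split`, its parameters, and the toy records are assumptions for illustration.

```python
import random


def drug_blind_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, cell_line, response) records so test drugs are unseen in training."""
    drugs = sorted({drug for drug, _cell, _resp in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Hold out at least one drug entirely.
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Hypothetical records: five drugs measured on two cell lines each.
pairs = [
    (drug, cell, 0.0)
    for drug in ["drugA", "drugB", "drugC", "drugD", "drugE"]
    for cell in ["cell1", "cell2"]
]
train, test = drug_blind_split(pairs)
print(len(train), len(test))
```

Splitting by drug rather than by record is the key design choice: a random split over records lets the same drug appear on both sides, which typically makes the task look easier than predicting for a genuinely new compound.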
Affiliation(s)
- Yurui Chen
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| |
|
14
|
Miranda SP, Baião FA, Fleck JL, Piccolo SR. Predicting drug sensitivity of cancer cells based on DNA methylation levels. PLoS One 2021; 16:e0238757. [PMID: 34506489 PMCID: PMC8432830 DOI: 10.1371/journal.pone.0238757] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/21/2020] [Accepted: 06/28/2021] [Indexed: 01/22/2023] Open
Abstract
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide, DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
Affiliation(s)
- Sofia P. Miranda
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Fernanda A. Baião
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Julia L. Fleck
- Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France
| | - Stephen R. Piccolo
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
|
15
|
Guinney J. Preview of "Interpretable systems biomarkers predict response to immune-checkpoint inhibitors". PATTERNS 2021; 2:100313. [PMID: 34430931 PMCID: PMC8369244 DOI: 10.1016/j.patter.2021.100313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Lapuente-Santana et al. (2021) developed Estimate Systems Immune Response (EaSIeR), a method for assessing the immune response to cancer using systems biology traits.
|
16
|
Wei WQ, Zhao J, Roden DM, Peterson JF. Machine Learning Challenges in Pharmacogenomic Research. Clin Pharmacol Ther 2021; 110:552-554. [PMID: 34217153 DOI: 10.1002/cpt.2329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/12/2021] [Accepted: 05/25/2021] [Indexed: 12/23/2022]
Affiliation(s)
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Oates Institute for Experimental Therapeutics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
17
|
Zhou K, Arslanturk S, Craig DB, Heath E, Draghici S. Discovery of primary prostate cancer biomarkers using cross cancer learning. Sci Rep 2021; 11:10433. [PMID: 34001952 PMCID: PMC8128891 DOI: 10.1038/s41598-021-89789-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/18/2021] [Accepted: 04/30/2021] [Indexed: 02/03/2023] Open
Abstract
Prostate cancer (PCa), the second leading cause of cancer death in American men, is a relatively slow-growing malignancy with multiple early treatment options. Yet, a significant number of low-risk PCa patients are over-diagnosed and over-treated with significant and long-term quality of life effects. Further, there is ever increasing evidence of metastasis and higher mortality when hormone-sensitive or castration-resistant PCa tumors are treated indistinctively. Hence, the critical need is to discover clinically-relevant and actionable PCa biomarkers by better understanding the biology of PCa. In this paper, we have discovered novel biomarkers of PCa tumors through cross-cancer learning by leveraging the pathological and molecular similarities in the DNA repair pathways of ovarian, prostate, and breast cancer tumors. Cross-cancer disease learning enriches the study population and identifies genetic/phenotypic commonalities that are important across diseases with pathological and molecular similarities. Our results show that ADIRF, SLC2A5, C3orf86, HSPA1B are among the most significant PCa biomarkers, while MTRNR2L1, EEPD1, TEPP and VN1R2 are jointly important biomarkers across prostate, breast and ovarian cancers. Our validation results have further shown that the discovered biomarkers can predict the disease state better than any randomly selected subset of differentially expressed prostate cancer genes.
Affiliation(s)
- Kaiyue Zhou
- Department of Computer Science, Wayne State University, Detroit, 48201, USA
| | - Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48201, USA.
| | - Douglas B Craig
- Department of Oncology, Wayne State University, Detroit, 48201, USA
- Bioinformatics and Biostatistics Core, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
| | - Elisabeth Heath
- Department of Oncology, Wayne State University, Detroit, 48201, USA
- Molecular Therapeutics Program, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48201, USA
|
18
|
Asada K, Kaneko S, Takasawa K, Machino H, Takahashi S, Shinkai N, Shimoyama R, Komatsu M, Hamamoto R. Integrated Analysis of Whole Genome and Epigenome Data Using Machine Learning Technology: Toward the Establishment of Precision Oncology. Front Oncol 2021; 11:666937. [PMID: 34055633 PMCID: PMC8149908 DOI: 10.3389/fonc.2021.666937] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/11/2021] [Accepted: 04/26/2021] [Indexed: 12/17/2022] Open
Abstract
With the completion of the International Human Genome Project, we have entered what is known as the post-genome era, and efforts to apply genomic information to medicine have become more active. In particular, with the announcement of the Precision Medicine Initiative by U.S. President Barack Obama in his State of the Union address at the beginning of 2015, "precision medicine," which aims to divide patients and potential patients into subgroups with respect to disease susceptibility, has become the focus of worldwide attention. The field of oncology is also actively adopting the precision oncology approach, which is based on molecular profiling, such as genomic information, to select the appropriate treatment. However, the current precision oncology is dominated by a method called targeted-gene panel (TGP), which uses next-generation sequencing (NGS) to analyze a limited number of specific cancer-related genes and suggest optimal treatments, but this method causes the problem that the number of patients who benefit from it is limited. In order to steadily develop precision oncology, it is necessary to integrate and analyze more detailed omics data, such as whole genome data and epigenome data. On the other hand, with the advancement of analysis technologies such as NGS, the amount of data obtained by omics analysis has become enormous, and artificial intelligence (AI) technologies, mainly machine learning (ML) technologies, are being actively used to make more efficient and accurate predictions. In this review, we will focus on whole genome sequencing (WGS) analysis and epigenome analysis, introduce the latest results of omics analysis using ML technologies for the development of precision oncology, and discuss the future prospects.
Affiliation(s)
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Syuzo Kaneko
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoshi Takahashi
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Norio Shinkai
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Ryo Shimoyama
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ryuji Hamamoto
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
|
19
|
Dogu E, Albayrak YE, Tuncay E. Length of hospital stay prediction with an integrated approach of statistical-based fuzzy cognitive maps and artificial neural networks. Med Biol Eng Comput 2021; 59:483-496. [PMID: 33544271 DOI: 10.1007/s11517-021-02327-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/22/2020] [Accepted: 01/24/2021] [Indexed: 10/22/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a global burden, which is estimated to be the third leading cause of death worldwide by 2030. The economic burden of COPD grows continuously because it is not a curable disease. These conditions make COPD an important research field of artificial intelligence (AI) techniques in medicine. In this study, an integrated approach of the statistical-based fuzzy cognitive maps (SBFCM) and artificial neural networks (ANN) is proposed for predicting length of hospital stay of patients with COPD who were admitted to the hospital with an acute exacerbation. The SBFCM method is developed to determine the input variables of the ANN model. The SBFCM conducts statistical analysis to prepare preliminary information for the experts and then collects expert opinions accordingly, to define a conceptual map of the system. The integration of SBFCM and ANN methods provides both statistical data and expert opinion in the prediction model. In the numerical application, the proposed approach outperformed the conventional approach and other machine learning algorithms with 79.95% accuracy, revealing the power of expert opinion involvement in medical decisions. A medical decision support framework is constructed for better prediction of length of hospital stay and more effective hospital management.
Affiliation(s)
- Elif Dogu
- Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey.
| | - Y Esra Albayrak
- Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey
| | - Esin Tuncay
- Yedikule Chest Diseases & Thoracic Surgery Training & Research Hospital, Belgrad Kapi Yolu Cad. No.: 1 34020 Zeytinburnu, Istanbul, Turkey
|
20
|
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
21
|
Xin S, Fang W, Li J, Li D, Wang C, Huang Q, Huang M, Zhuang W, Wang X, Chen L. Impact of STAT1 polymorphisms on crizotinib-induced hepatotoxicity in ALK-positive non-small cell lung cancer patients. J Cancer Res Clin Oncol 2021; 147:725-737. [PMID: 33387041 DOI: 10.1007/s00432-020-03476-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/18/2020] [Accepted: 11/18/2020] [Indexed: 01/01/2023]
Abstract
PURPOSE Crizotinib is the first-line small molecule tyrosine kinase inhibitor for ALK-positive non-small cell lung cancer. In this study, a retrospective pharmacogenomics investigation was conducted to explore the relationship between genes related to RTK downstream signaling pathways and crizotinib-induced hepatic toxicity in ALK-positive NSCLC patients. METHODS The variable importance analysis of random forest algorithm was applied to identify the significant features which contribute to the crizotinib sensitivity in Cancer Cell Line Encyclopedia (CCLE) database. The KEGG and reactome pathway enrichment analysis were conducted with EnrichR. The differential expression genes were identified with R package DESeq2 in CCLE liver derived cell lines between crizotinib sensitive and resistant groups. From 2012 to 2015, 42 NSCLC patients were enrolled in this study. 90 polymorphisms were genotyped using the Sequenom Massarray system. Sequencing of HGFR (c-Met) genes was carried out on the Ion Torrent Proton. RESULTS In total, 66.7% NSCLC patients suffered from crizotinib-induced liver toxicity and 11.9% progressed to severe hepatic toxicity. The features with the top importance from classification and regression random forest model were enriched in RTK downstream signaling pathways (JAK/STAT, RAS/RAF/MAPK, PI3K/AKT pathways) and immune system-related pathways. Collagen family genes (COL1A1, COL1A2, COL6A1, COL5A1) and other extracellular matrix protein (TNC, TAGLN, TENM2, EDIL3, VCAN, CNN1, SH3BP4, TAGLN), which were closely related to MAPK-ERK signaling pathways, were significantly enriched in crizotinib resistant cell lines. In multiple logistic regression, STAT1 rs10208033 (T > C) was significantly associated with crizotinib-induced liver toxicity (OR = 6.733, 95% CI 1.406-32.24, P = 0.017). Compared with non-CC, OR is 5.5 (95% CI 1.219-24.81, P = 0.027) for STAT1 rs10208033 CC genotype to develop crizotinib-induced liver toxicity. Further cell viability test in human fetal hepatocyte line, L-02, reveals that the STAT1 inhibitor might protect hepatocyte cells from the toxicity caused by crizotinib. CONCLUSION Polymorphism of rs10208033 is a potential biomarker for predicting crizotinib-induced hepatotoxicity. These results suggest that STAT1 plays an important role in crizotinib-induced hepatotoxicity. Further studies are needed to confirm our finding and understand the underlying mechanisms.
Affiliation(s)
- Shuang Xin
- Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510060, People's Republic of China
- Department of Medical Oncology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
| | - Wenfeng Fang
- Department of Medical Oncology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
| | - Jianwen Li
- CapitalBio Genomics Co., Ltd., Dongguan, 523808, China
| | - Delan Li
- Chemotherapy Department 2, Zhongshan City People's Hospital, Zhongshan, 528403, People's Republic of China
| | - Changzheng Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
| | - Quanfei Huang
- CapitalBio Genomics Co., Ltd., Dongguan, 523808, China
| | - Min Huang
- Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510060, People's Republic of China
| | - Wei Zhuang
- Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510060, People's Republic of China
| | - Xueding Wang
- Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510060, People's Republic of China.
| | - Likun Chen
- Department of Medical Oncology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China.
|
22
|
Kusch N, Schuppert A. Two-step multi-omics modelling of drug sensitivity in cancer cell lines to identify driving mechanisms. PLoS One 2020; 15:e0238961. [PMID: 33226984 PMCID: PMC7682852 DOI: 10.1371/journal.pone.0238961] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/25/2020] [Accepted: 10/30/2020] [Indexed: 11/18/2022] Open
Abstract
Drug sensitivity prediction models for human cancer cell lines constitute important tools in identifying potential computational biomarkers for responsiveness in a pre-clinical setting. Integrating information derived from a range of heterogeneous data is crucial, but remains non-trivial, as differences in data structures may hinder fitting algorithms from assigning adequate weights to complementary information that is contained in distinct omics data. In order to counteract this effect that tends to lead to just one data type dominating supposedly multi-omics models, we developed a novel tool that enables users to train single-omics models separately in a first step and to integrate them into a multi-omics model in a second step. Extensive ablation studies are performed in order to facilitate an in-depth evaluation of the respective contributions of singular data types and of combinations thereof, effectively identifying redundancies and interdependencies between them. Moreover, the integration of the single-omics models is realized by a range of distinct classification algorithms, thus allowing for a performance comparison. Sets of molecular events and tissue types found to be related to significant shifts in drug sensitivity are returned to facilitate a comprehensive and straightforward analysis of potential computational biomarkers for drug responsiveness. Our two-step approach yields sets of actual multi-omics pan-cancer classification models that are highly predictive for a majority of drugs in the GDSC data base. In the context of targeted drugs with particular modes of action, its predictive performances compare favourably to those of classification models that incorporate multi-omics data in a simple one-step approach. Additionally, case studies demonstrate that it succeeds both in correctly identifying known key biomarkers for sensitivity towards specific drug compounds as well as in providing sets of potential candidates for additional computational biomarkers.
Affiliation(s)
- Nina Kusch
- Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, Aachen, Germany
- Uniklinik Aachen, Aachen, Germany
| | - Andreas Schuppert
- Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, Aachen, Germany
- Uniklinik Aachen, Aachen, Germany
|
23
|
Sharma A, Rani R. Ensembled machine learning framework for drug sensitivity prediction. IET Syst Biol 2020; 14:39-46. [PMID: 31931480 DOI: 10.1049/iet-syb.2018.5094] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022] Open
Abstract
Drug sensitivity prediction is one of the critical tasks involved in drug designing and discovery. Recently several online databases and consortiums have contributed to providing open access to pharmacogenomic data. These databases have helped in developing computational approaches for drug sensitivity prediction. Cancer is a complex disease involving the heterogeneous behaviour of same tumour-type patients towards the same kind of drug therapy. Several methods have been proposed in the literature to predict drug sensitivity. However, these methods are not efficient enough to predict drug sensitivity. The present study has proposed an ensemble learning framework for drug-response prediction using a modified rotation forest. The proposed framework is further compared with three state-of-the-art algorithms and two baseline methods using Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) drug screens. The authors have also predicted missing drug response values in the data set using the proposed approach. The proposed approach outperforms other counterparts even though gene mutation data is not incorporated while designing the approach. An average mean square error of 3.14 and 0.404 is achieved using GDSC and CCLE drug screens, respectively. The obtained results show that the proposed framework has considerable potential to improve anti-cancer drug response prediction.
|
24
|
Abstract
Machine learning has been heavily researched and widely used in many disciplines. However, achieving high accuracy requires a large amount of data that is sometimes difficult, expensive, or impractical to obtain. Integrating human knowledge into machine learning can significantly reduce data requirement, increase reliability and robustness of machine learning, and build explainable machine learning systems. This allows leveraging the vast amount of human knowledge and capability of machine learning to achieve functions and performance not available before and will facilitate the interaction between human beings and machine learning systems, making machine learning decisions understandable to humans. This paper gives an overview of the knowledge and its representations that can be integrated into machine learning and the methodology. We cover the fundamentals, current status, and recent progress of the methods, with a focus on popular and new topics. The perspectives on future directions are also discussed.
Affiliation(s)
- Changyu Deng
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xunbi Ji
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
| | - Colton Rainey
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jianyu Zhang
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
| | - Wei Lu
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Materials Science & Engineering, University of Michigan, Ann Arbor, MI 48109, USA
|
25
|
Cabrera-Andrade A, López-Cortés A, Munteanu CR, Pazos A, Pérez-Castillo Y, Tejera E, Arrasate S, González-Díaz H. Perturbation-Theory Machine Learning (PTML) Multilabel Model of the ChEMBL Dataset of Preclinical Assays for Antisarcoma Compounds. ACS OMEGA 2020; 5:27211-27220. [PMID: 33134682 PMCID: PMC7594149 DOI: 10.1021/acsomega.0c03356] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/13/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
Sarcomas are a group of malignant neoplasms of connective tissue with a different etiology than carcinomas. The efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays with different experimental conditions. For instance, the ChEMBL database contains outcomes of 37,919 different antisarcoma assays with 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model to predict antisarcoma compounds. PTML models use one function of reference that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used a linear discriminant analysis and neural network to train and compare PT and non-PT models. All the explored models have an accuracy of 89.19-95.25% for training and 89.22-95.46% in validation sets. PTML-based strategies have similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.
Affiliation(s)
- Alejandro Cabrera-Andrade
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
- Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
- RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain
| | - Andrés López-Cortés
- RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, Quito 170129, Ecuador
| | - Cristian R. Munteanu
- RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña 15006, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, A Coruña 15071, Spain
| | - Alejandro Pazos
- RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña 15006, Spain
| | - Yunierkis Pérez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
| | - Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
- Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, de los Granados Avenue, Quito 170125, Ecuador
| | - Sonia Arrasate
- Department of Organic Chemistry II and Basque Center for Biophysics, University of Basque Country UPV/EHU, Leioa 48940, Biscay, Spain
| | - Humbert González-Díaz
- Department of Organic Chemistry II and Basque Center for Biophysics, University of Basque Country UPV/EHU, Leioa 48940, Biscay, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao 48011, Biscay, Spain
| |
|
26
|
Koras K, Juraeva D, Kreis J, Mazur J, Staub E, Szczurek E. Feature selection strategies for drug sensitivity prediction. Sci Rep 2020; 10:9377. [PMID: 32523056 PMCID: PMC7287073 DOI: 10.1038/s41598-020-65927-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/13/2019] [Accepted: 05/06/2020] [Indexed: 12/16/2022]
Abstract
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with the best performing example of Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are indicative for therapy design.
Affiliation(s)
- Krzysztof Koras
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Dilafruz Juraeva
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Julian Kreis
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Johanna Mazur
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Eike Staub
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Ewa Szczurek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| |
|
27
|
Cao H, Zhou J, Schwarz E. RMTL: an R library for multi-task learning. Bioinformatics 2020; 35:1797-1798. [PMID: 30256897 DOI: 10.1093/bioinformatics/bty831] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/27/2018] [Revised: 08/20/2018] [Accepted: 09/25/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. RESULTS We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. AVAILABILITY AND IMPLEMENTATION The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Jiayu Zhou
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| |
|
28
|
Wang H, Xi J, Wang M, Li A. Dual-Layer Strengthened Collaborative Topic Regression Modeling for Predicting Drug Sensitivity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:587-598. [PMID: 30106738 DOI: 10.1109/tcbb.2018.2864739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
An effective way to facilitate the development of modern precision oncology is the systematic analysis of the known drug sensitivities that have emerged in recent years. Meanwhile, the screening of drug response in cancer cell lines provides valuable genomic and pharmacological data for high-accuracy prediction. Existing works primarily utilize genomic or functional genomic features to classify or regress the drug response. In this work, by adapting and extending conventional merchandise recommendation methods, we introduce a novel model for accurate drug sensitivity prediction using dual-layer strengthened collaborative topic regression (DS-CTR), which incorporates not only a graphical model to jointly learn drug and cell line features from pharmacogenomics data but also drug and cell line similarity network models to strengthen the correlation of the prediction results. Using the Genomics of Drug Sensitivity in Cancer (GDSC) project as the benchmark dataset, a 5-fold cross-validation experiment demonstrates that the DS-CTR model significantly improves drug response prediction performance compared with four categories of state-of-the-art algorithms in terms of both the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Previously unknown cell-drug associations uncovered by the model are supported by evidence from the literature, which further validates DS-CTR. The model also offers the possibility of making the discovery of new anti-cancer therapeutics in preclinical trials cheaper and faster.
|
29
|
Hamamoto R, Komatsu M, Takasawa K, Asada K, Kaneko S. Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine. Biomolecules 2019; 10:biom10010062. [PMID: 31905969 PMCID: PMC7023005 DOI: 10.3390/biom10010062] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 12/01/2019] [Revised: 12/20/2019] [Accepted: 12/27/2019] [Indexed: 12/14/2022]
Abstract
To clarify the mechanisms of diseases, such as cancer, studies analyzing genetic mutations have been actively conducted for a long time, and a large number of achievements have already been reported. Indeed, genomic medicine is considered the core discipline of precision medicine, and currently, the clinical application of cutting-edge genomic medicine aimed at improving the prevention, diagnosis and treatment of a wide range of diseases is promoted. However, although the Human Genome Project was completed in 2003 and large-scale genetic analyses have since been accomplished worldwide with the development of next-generation sequencing (NGS), explaining the mechanism of disease onset only using genetic variation has been recognized as difficult. Meanwhile, the importance of epigenetics, which describes inheritance by mechanisms other than the genomic DNA sequence, has recently attracted attention, and, in particular, many studies have reported the involvement of epigenetic deregulation in human cancer. So far, given that genetic and epigenetic studies tend to be accomplished independently, physiological relationships between genetics and epigenetics in diseases remain almost unknown. Since this situation may be a disadvantage to developing precision medicine, the integrated understanding of genetic variation and epigenetic deregulation appears to be now critical. Importantly, the current progress of artificial intelligence (AI) technologies, such as machine learning and deep learning, is remarkable and enables multimodal analyses of big omics data. In this regard, it is important to develop a platform that can conduct multimodal analysis of medical big data using AI as this may accelerate the realization of precision medicine. In this review, we discuss the importance of genome-wide epigenetic and multiomics analyses using AI in the era of precision medicine.
Affiliation(s)
- Ryuji Hamamoto
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Correspondence: Tel.: +81-3-3547-5271
| | - Masaaki Komatsu
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Takasawa
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Asada
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Syuzo Kaneko
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| |
|
30
|
Scala G, Federico A, Fortino V, Greco D, Majello B. Knowledge Generation with Rule Induction in Cancer Omics. Int J Mol Sci 2019; 21:E18. [PMID: 31861438 PMCID: PMC6981587 DOI: 10.3390/ijms21010018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/17/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 12/21/2022]
Abstract
The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance for many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as the classification of clinically relevant sub-groups of patients and for the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represents discovered relationships in the form of human readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the usage of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.
Affiliation(s)
- Giovanni Scala
- Department of Biology, University of Naples Federico II, 80126 Naples, Italy
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
| | - Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
| | - Barbara Majello
- Department of Biology, University of Naples Federico II, 80126 Naples, Italy
| |
|
31
|
Brand L, Yang X, Liu K, Elbeleidy S, Wang H, Zhang H, Nie F. Learning Robust Multilabel Sample Specific Distances for Identifying HIV-1 Drug Resistance. J Comput Biol 2019; 27:655-672. [PMID: 31725323 DOI: 10.1089/cmb.2019.0329] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
AIDS is a syndrome caused by the HIV. During the progression of AIDS, a patient's immune system is weakened, which increases the patient's susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. In addition, the virus can also become resistant to other treatments not currently being used through mutations, which is known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multilabel classification problem. Given this multilabel relationship, traditional single-label classification methods often fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this work, we propose a novel multilabel Robust Sample Specific Distance (RSSD) method to identify multiclass HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given drug nucleoside analog and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of ℓ1-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, nongreedy iterative algorithm with rigorously proved convergence. Our new method has been verified on a public HIV type 1 drug resistance data set with over 600 RT sequences and five nucleoside analogs. We compared our method against several state-of-the-art multilabel classification methods, and the experimental results have demonstrated the effectiveness of our proposed method.
Affiliation(s)
- Lodewijk Brand
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Xue Yang
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Kai Liu
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Saad Elbeleidy
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Hua Wang
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Hao Zhang
- Department of Computer Science, Colorado School of Mines, Golden, Colorado
| | - Feiping Nie
- School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, P.R. China
| |
|
32
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022]
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis has helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the most successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predicting drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to a few hundred genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Gerardo Pepe
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Marco Pietrosanto
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Giulio Galvan
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Leonardo Galli
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Antonio Palmeri
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Celgene Institute for Translational Research Europe, Sevilla, Spain
| | - Marco Sciandrone
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Fabrizio Ferrè
- Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
| | - Gabriele Ausiello
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | | |
|
33
|
Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, Rodríguez Martínez M. Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm 2019; 16:4797-4806. [DOI: 10.1021/acs.molpharmaceut.9b00520] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/11/2022]
Affiliation(s)
| | | | - Jannis Born
- IBM Research, 8803 Zürich, Switzerland
- ETH Zürich, 8092 Zürich, Switzerland
- University of Zürich, 8006 Zürich, Switzerland
| | | | | | | |
|
34
|
Ševa J, Wiegandt DL, Götze J, Lamping M, Rieke D, Schäfer R, Jähnichen P, Kittner M, Pallarz S, Starlinger J, Keilholz U, Leser U. VIST - a Variant-Information Search Tool for precision oncology. BMC Bioinformatics 2019; 20:429. [PMID: 31419935 PMCID: PMC6697931 DOI: 10.1186/s12859-019-2958-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/28/2019] [Accepted: 06/18/2019] [Indexed: 02/08/2023]
Abstract
BACKGROUND Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations to disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile. RESULTS VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs and uses machine learning based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST's ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document's content. CONCLUSION Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in the standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, heavily relying on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/.
Affiliation(s)
- Jurica Ševa
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - David Luis Wiegandt
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - Julian Götze
- University Hospital Tübingen, Hoppe-Seyler-Straße 3, Tübingen, 72076, Germany
| | - Mario Lamping
- Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
| | - Damian Rieke
- Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
- Department of Hematology and Medical Oncology, Campus Benjamin Franklin, Charité Universitätsmedizin Berlin, Hindenburgdamm 30, Berlin, 12203, Germany
- Berlin Institute of Health, Kapelle-Ufer 2, Berlin, 10117, Germany
| | - Reinhold Schäfer
- Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
- German Cancer Consortium (DKTK), DKFZ Heidelberg, Im Neuenheimer Feld 280, Heidelberg, 69120, Germany
| | - Patrick Jähnichen
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - Madeleine Kittner
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - Steffen Pallarz
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - Johannes Starlinger
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| | - Ulrich Keilholz
- Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
| | - Ulf Leser
- Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
| |
|
35
|
Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 2019; 35:i501-i509. [PMID: 31510700 PMCID: PMC6612815 DOI: 10.1093/bioinformatics/btz318] [Citation(s) in RCA: 164] [Impact Index Per Article: 32.8] [Indexed: 02/06/2023]
Abstract
MOTIVATION Historically, gene expression has been shown to be the most informative data for drug response prediction. Recent evidence suggests that integrating additional omics can improve the prediction accuracy which raises the question of how to integrate the additional omics. Regardless of the integration strategy, clinical utility and translatability are crucial. Thus, we reasoned a multi-omics approach combined with clinical datasets would improve drug response prediction and clinical relevance. RESULTS We propose MOLI, a multi-omics late integration method based on deep neural networks. MOLI takes somatic mutation, copy number aberration and gene expression data as input, and integrates them for drug response prediction. MOLI uses type-specific encoding sub-networks to learn features for each omics type, concatenates them into one representation and optimizes this representation via a combined cost function consisting of a triplet loss and a binary cross-entropy loss. The former makes the representations of responder samples more similar to each other and different from the non-responders, and the latter makes this representation predictive of the response values. We validate MOLI on in vitro and in vivo datasets for five chemotherapy agents and two targeted therapeutics. Compared to state-of-the-art single-omics and early integration multi-omics methods, MOLI achieves higher prediction accuracy in external validations. Moreover, a significant improvement in MOLI's performance is observed for targeted drugs when training on a pan-drug input, i.e. using all the drugs with the same target compared to training only on drug-specific inputs. MOLI's high predictive power suggests it may have utility in precision oncology. AVAILABILITY AND IMPLEMENTATION https://github.com/hosseinshn/MOLI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
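The combined cost function described in this abstract (a triplet loss plus a binary cross-entropy loss on one shared representation) can be illustrated with a minimal NumPy sketch. This is not the MOLI implementation: the function names, the squared Euclidean distance, the `margin` value, and the weighting factor `gamma` are all assumptions made for illustration.

```python
import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge on squared distances: pull same-class pairs together, push others apart.

    Each argument is an (n_triplets, n_features) array of learned representations.
    The margin is a hypothetical hyperparameter, not a value from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def combined_loss(anchor, positive, negative, probs, labels, gamma=0.5, margin=1.0):
    """Classification loss plus a weighted triplet term (gamma is an assumed weight)."""
    return binary_cross_entropy(probs, labels) + gamma * triplet_loss(
        anchor, positive, negative, margin
    )
```

In this sketch the triplet term is zero once every negative sample is at least `margin` farther from the anchor (in squared distance) than the positive sample, so only the cross-entropy term keeps contributing to the total.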
Affiliation(s)
- Hossein Sharifi-Noghabi
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Vancouver Prostate Centre, Vancouver, BC, Canada
| | - Olga Zolotareva
- International Research Training Group Computational Methods for the Analysis of the Diversity and Dynamics of Genomes and Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Germany
| | - Colin C Collins
- Vancouver Prostate Centre, Vancouver, BC, Canada
- Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
| | - Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Vancouver Prostate Centre, Vancouver, BC, Canada
| |
|
36
|
Guan NN, Zhao Y, Wang CC, Li JQ, Chen X, Piao X. Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization. Molecular Therapy. Nucleic Acids 2019; 17:164-174. [PMID: 31265947 PMCID: PMC6610642 DOI: 10.1016/j.omtn.2019.05.017] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 02/25/2019] [Revised: 05/17/2019] [Accepted: 05/20/2019] [Indexed: 12/14/2022]
Abstract
Precision medicine has become a novel and rising concept, which depends much on the identification of individual genomic signatures for different patients. Cancer cell lines can reflect the “omic” diversity of primary tumors, and on this basis many studies of cancer biology and drug discovery have been carried out, both experimentally and computationally. In this work, we present a novel method that uses weighted graph regularized matrix factorization (WGRMF) to infer anticancer drug response in cell lines. We constructed a p-nearest neighbor graph to sparsify the drug similarity matrix and the cell line similarity matrix, respectively. Using the sparsified matrices in the graph regularization terms, we performed matrix factorization to generate the latent matrices for drugs and cell lines. The graph regularization terms, which include neighbor information, help to exclude noise and improve the prediction accuracy. Ten-fold cross-validation was implemented, and the Pearson correlation coefficient (PCC), root-mean-square error (RMSE), PCCsr, and RMSEsr averaged over all drugs were calculated to evaluate the performance of WGRMF. The results on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset are 0.64 ± 0.16, 1.37 ± 0.35, 0.73 ± 0.14, and 1.71 ± 0.44 for PCC, RMSE, PCCsr, and RMSEsr, respectively. For the Cancer Cell Line Encyclopedia (CCLE) dataset, WGRMF obtained 0.72 ± 0.09, 0.56 ± 0.19, 0.79 ± 0.07, and 0.69 ± 0.19, respectively. The results showed the superiority of WGRMF compared with previous methods. In addition, based on the prediction results using the GDSC dataset, three types of case studies were carried out. The results from both cross-validation and case studies have shown the effectiveness of WGRMF in predicting drug response in cell lines.
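The p-nearest neighbor sparsification step mentioned in this abstract can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name `sparsify_p_nearest` is hypothetical, an edge is kept when either endpoint lists the other among its p most similar neighbors (the paper may weight one-sided and mutual neighbors differently), and self-similarities on the diagonal are dropped.

```python
import numpy as np


def sparsify_p_nearest(similarity, p):
    """Zero out all entries of a square similarity matrix except p-nearest-neighbor edges.

    Assumption: the result is symmetrized by keeping an edge if either row selects it.
    """
    n = similarity.shape[0]
    if not 0 < p < n:
        raise ValueError("p must satisfy 0 < p < number of items")

    sim = similarity.astype(float).copy()
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor

    # Column indices of the p largest similarities in each row.
    nearest = np.argsort(-sim, axis=1)[:, :p]

    keep = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), p)
    keep[rows, nearest.ravel()] = True
    keep |= keep.T  # keep the edge if either endpoint selected it

    return np.where(keep, similarity, 0.0)
```

The same routine would be applied once to the drug similarity matrix and once to the cell line similarity matrix, and the two sparse matrices would then feed the graph regularization terms of the factorization.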
Affiliation(s)
- Na-Na Guan
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China.
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.
| | - Xue Piao
- School of Medical Informatics, Xuzhou Medical University, Xuzhou 221004, China.
| |
Collapse
|
37
|
Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, Lopatkin AJ, Satish S, Nili A, Palsson BO, Walker GC, Collins JJ. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell 2019; 177:1649-1661.e9. [PMID: 31080069 PMCID: PMC6545570 DOI: 10.1016/j.cell.2019.04.016] [Citation(s) in RCA: 188] [Impact Index Per Article: 37.6] [Received: 09/10/2018] [Revised: 03/19/2019] [Accepted: 04/08/2019] [Indexed: 12/13/2022]
Abstract
Current machine learning techniques enable robust association of biological signals with measured phenotypes, but these approaches are incapable of identifying causal relationships. Here, we develop an integrated "white-box" biochemical screening, network modeling, and machine learning approach for revealing causal mechanisms and apply this approach to understanding antibiotic efficacy. We counter-screen diverse metabolites against bactericidal antibiotics in Escherichia coli and simulate their corresponding metabolic states using a genome-scale metabolic network model. Regression of the measured screening data on model simulations reveals that purine biosynthesis participates in antibiotic lethality, which we validate experimentally. We show that antibiotic-induced adenine limitation increases ATP demand, which elevates central carbon metabolism activity and oxygen consumption, enhancing the killing effects of antibiotics. This work demonstrates how prospective network modeling can couple with machine learning to identify complex causal mechanisms underlying drug efficacy.
Affiliation(s)
- Jason H Yang
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Sarah N Wright
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Meagan Hamblin
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Douglas McCloskey
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
- Miguel A Alcantar
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Lars Schrübbers
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
- Allison J Lopatkin
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Sangeeta Satish
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Amir Nili
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Bernhard O Palsson
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Graham C Walker
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- James J Collins
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
|
38
|
Ngo TM, Teo YY. Genomic prediction of tuberculosis drug-resistance: benchmarking existing databases and prediction algorithms. BMC Bioinformatics 2019; 20:68. [PMID: 30736750 PMCID: PMC6368788 DOI: 10.1186/s12859-019-2658-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/13/2018] [Accepted: 01/28/2019] [Indexed: 12/28/2022] Open
Abstract
Background: It is possible to predict whether a tuberculosis (TB) patient will fail to respond to specific antibiotics by sequencing the genome of the infecting Mycobacterium tuberculosis (Mtb) and observing whether the pathogen carries specific mutations at drug-resistance sites. This advancement has led to the collation of TB databases such as PATRIC and ReSeqTB that possess both whole genome sequences and drug resistance phenotypes of infecting Mtb isolates. Bioinformatics tools have also been developed to predict drug resistance from whole genome sequencing (WGS) data. Here, we evaluate the performance of four popular tools (TBProfiler, MyKrobe, KvarQ, PhyResSE) with 6746 isolates compiled from publicly available databases, and subsequently identify highly probable phenotyping errors in the databases by genetically predicting the drug phenotypes using all four tools.
Results: Our results show that these bioinformatics tools generally perform well in predicting the resistance status for two key first-line agents (isoniazid, rifampicin), but the accuracy is lower for second-line injectables and fluoroquinolones. The error rates in the databases are also non-trivial, reaching as high as 31.1% for prothionamide, and phenotypes from ReSeqTB are more susceptible to errors.
Conclusions: The good performance of the automated software for drug resistance prediction from TB WGS data shown in this study further substantiates the usefulness and promise of utilising genetic data to accurately profile TB drug resistance, thereby reducing misdiagnoses arising from error-prone culture-based drug susceptibility testing.
Affiliation(s)
- Tra-My Ngo
  - NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, 119077, Singapore
- Yik-Ying Teo
  - NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, 119077, Singapore
  - Saw Swee Hock School of Public Health, National University of Singapore, 12 Science Drive 2, Singapore, 117549, Singapore
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
  - Life Sciences Institute, National University of Singapore, Singapore, 117456, Singapore
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672, Singapore
|
39
|
Singh H, Rana PS, Singh U. Prediction of drug synergy score using ensemble based differential evolution. IET Syst Biol 2019; 13:24-29. [PMID: 30774113 PMCID: PMC8687263 DOI: 10.1049/iet-syb.2018.5023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/23/2018] [Revised: 07/23/2018] [Accepted: 09/05/2018] [Indexed: 12/23/2022] Open
Abstract
Prediction of drug synergy score is an ill-posed problem. It plays an important role in the medical field for inhibiting specific cancer agents. An efficient regression-based machine learning technique can minimise drug synergy prediction errors. Therefore, in this study, an efficient machine learning technique for drug synergy prediction is designed by using ensemble based differential evolution (DE) to optimise the support vector machine (SVM), because the tuning of the attributes of the SVM kernel regulates the prediction precision. The ensemble based DE employs two trial vector generation techniques and two control attribute settings. The first generation technique uses the best solution and the other does not. The proposed and existing competitive machine learning techniques are applied to drug synergy data. The extensive analysis demonstrates that the proposed technique outperforms others in terms of accuracy, root mean square error and coefficient of correlation.
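The idea of a differential evolution search that mixes two trial-vector strategies (one built on the current best solution, one on a random member) and two control-parameter settings can be sketched as follows. Everything concrete here is an assumption made for illustration: the function name `ensemble_de`, the random choice between strategies and settings, the (F, CR) values, and above all the objective, which is a simple quadratic stand-in for the cross-validated SVM error over two kernel hyperparameters that the abstract describes.

```python
import random


def ensemble_de(objective, bounds, pop_size=20, generations=200,
                settings=((0.5, 0.9), (0.9, 0.5)), seed=0):
    """Minimise `objective` with a small ensemble-style differential evolution.

    Each trial vector randomly picks one of two base strategies (best/1 or
    rand/1) and one of two (F, CR) settings. This mixing rule is an assumption
    of the sketch, not the published algorithm.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        best = pop[min(range(pop_size), key=scores.__getitem__)]
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = (pop[j] for j in rng.sample(others, 3))
            base = best if rng.random() < 0.5 else a   # best/1 vs rand/1
            f, cr = rng.choice(settings)               # control setting
            j_rand = rng.randrange(dim)                # force one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    lo, hi = bounds[d]
                    value = base[d] + f * (b[d] - c[d])
                    trial.append(min(max(value, lo), hi))
                else:
                    trial.append(pop[i][d])
            trial_score = objective(trial)
            if trial_score <= scores[i]:               # greedy selection
                pop[i], scores[i] = trial, trial_score
    k = min(range(pop_size), key=scores.__getitem__)
    return pop[k], scores[k]


def stand_in_error(params):
    """Hypothetical stand-in for a validation error over two hyperparameters."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = ensemble_de(stand_in_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

To tune a real model, `stand_in_error` would be replaced by a function that trains the regressor with the candidate hyperparameters and returns its validation error.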
Affiliation(s)
- Harpreet Singh
  - Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, 147004, India
- Prashant Singh Rana
  - Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, 147004, India
- Urvinder Singh
  - Electronics & Communication Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, 147004, India
|
40
|
Srinivas R, Klimovich PV, Larson EC. Implicit-descriptor ligand-based virtual screening by means of collaborative filtering. J Cheminform 2018; 10:56. [PMID: 30467684 PMCID: PMC6755561 DOI: 10.1186/s13321-018-0310-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/26/2018] [Accepted: 11/13/2018] [Indexed: 12/20/2022] Open
Abstract
Current ligand-based machine learning methods in virtual screening rely heavily on molecular fingerprinting for preprocessing, i.e., explicit description of ligands’ structural and physicochemical properties in a vectorized form. Of particular importance to current methods are the extent to which molecular fingerprints describe a particular ligand and what metric sufficiently captures similarity among ligands. In this work, we propose and evaluate methods that do not require explicit feature vectorization through fingerprinting, but, instead, provide implicit descriptors based only on other known assays. Our methods are based upon well known collaborative filtering algorithms used in recommendation systems. Our implicit descriptor method does not require any fingerprint similarity search, which makes the method free of the bias arising from the empirical nature of the fingerprint models. We show that implicit methods significantly outperform traditional machine learning methods, and the main strengths of implicit methods are their resilience to target-ligand sparsity and high potential for spotting promiscuous ligands.
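As a rough illustration of how collaborative filtering can supply implicit descriptors without fingerprints, the sketch below factorizes a sparse ligand-by-assay activity table with stochastic gradient descent, so that each ligand's latent vector is learned only from its known assay outcomes. The function names, hyperparameters, and toy activity values are hypothetical, and plain matrix factorization is used as a generic stand-in; the abstract does not specify which collaborative filtering algorithms were evaluated.

```python
import math
import random


def predict(P, Q, i, j):
    """Predicted activity of ligand i in assay j from the latent factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def factorize(observed, n_ligands, n_assays, rank=2, epochs=500,
              lr=0.05, reg=0.01, seed=0):
    """Fit latent factors to the observed (ligand, assay) -> activity entries."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_ligands)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_assays)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            err = y - predict(P, Q, i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)   # L2-regularised update
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def rmse_on(observed, P, Q):
    """Root-mean-square error over the observed entries."""
    total = sum((y - predict(P, Q, i, j)) ** 2 for (i, j), y in observed.items())
    return math.sqrt(total / len(observed))


# Hypothetical activity table: 4 ligands x 3 assays, 1.0 = active, 0.0 = inactive.
# Pairs missing from the dict are untested and are what the model would predict.
observed = {
    (0, 0): 1.0, (0, 1): 1.0, (0, 2): 0.0,
    (1, 0): 1.0, (1, 1): 1.0,
    (2, 0): 0.0, (2, 2): 1.0,
    (3, 1): 0.0, (3, 2): 1.0,
}
P, Q = factorize(observed, n_ligands=4, n_assays=3)
untested_score = predict(P, Q, 1, 2)  # ligand 1 was never tested in assay 2
```

Because only observed pairs enter the loss, the approach tolerates a sparse table, which is the property the abstract highlights.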
Affiliation(s)
- Raghuram Srinivas
  - Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA
  - DataScience@SMU, Dallas, 75205, TX, USA
- Pavel V Klimovich
  - Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA
  - The Dedman College Interdisciplinary Institute, 3225 Daniel Avenue, Dallas, TX, 75205, USA
- Eric C Larson
  - Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA
|
41
|
Cao H, Meyer-Lindenberg A, Schwarz E. Comparative Evaluation of Machine Learning Strategies for Analyzing Big Data in Psychiatry. Int J Mol Sci 2018; 19:E3387. [PMID: 30380679 PMCID: PMC6274760 DOI: 10.3390/ijms19113387] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/09/2018] [Revised: 10/22/2018] [Accepted: 10/25/2018] [Indexed: 12/24/2022] Open
Abstract
The requirement of innovative big data analytics has become a critical success factor for research in biological psychiatry. Integrative analyses across distributed data resources are considered essential for untangling the biological complexity of mental illnesses. However, little is known about algorithm properties for such integrative machine learning. Here, we performed a comparative analysis of eight machine learning algorithms for identification of reproducible biological fingerprints across data sources, using five transcriptome-wide expression datasets of schizophrenia patients and controls as a use case. We found that multi-task learning (MTL) with network structure (MTL_NET) showed superior accuracy compared to other MTL formulations as well as single task learning, and tied performance with support vector machines (SVM). Compared to SVM, MTL_NET showed significant benefits regarding the variability of accuracy estimates, as well as its robustness to cross-dataset and sampling variability. These results support the utility of this algorithm as a flexible tool for integrative machine learning in psychiatry.
Affiliation(s)
- Han Cao
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Andreas Meyer-Lindenberg
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Emanuel Schwarz
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
|
42
|
Zhang L, Chen X, Guan NN, Liu H, Li JQ. A Hybrid Interpolation Weighted Collaborative Filtering Method for Anti-cancer Drug Response Prediction. Front Pharmacol 2018; 9:1017. [PMID: 30258362 PMCID: PMC6143790 DOI: 10.3389/fphar.2018.01017] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/09/2018] [Accepted: 08/22/2018] [Indexed: 12/16/2022] Open
Abstract
Individualized therapies ask for the most effective regimen for each patient, while the patients' responses may differ from each other. However, it is impossible to clinically evaluate each patient's response due to the large population. Human cell lines harbor most of the same genetic changes found in patients' tumors, and are thus widely used to help understand initial responses to drugs. Based on the more credible assumption that similar cell lines and similar drugs exhibit similar responses, we formulated drug response prediction as a recommender system problem, and then adopted a hybrid interpolation weighted collaborative filtering (HIWCF) method to predict anti-cancer drug responses of cell lines by incorporating cell line similarity and drug similarity derived from gene expression profiles, drug chemical structure, and drug response similarity. Specifically, we estimated the baseline based on the available responses and shrunk the similarity score for each cell line pair as well as each drug pair. The similarity scores were then shrunk and weighted by the correlation coefficients drawn from the known responses between each pair. Before being used to find the K most similar neighbors for further prediction, they went through the case amplification strategy to emphasize high similarity and neglect low similarity. In the last step for prediction, cell line-oriented and drug-oriented collaborative filtering models were carried out, and the average of predicted values from both models was used as the final predicted sensitivity. Through 10-fold cross validation, this approach was shown to reach accurate and reproducible outcomes for those missing drug sensitivities. We also found that the drug response similarity between cell lines or drugs may play an important role in the prediction. Finally, we discussed the biological outcomes based on the newly predicted response values in the GDSC dataset.
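The last three steps described in this abstract (case amplification, K-nearest-neighbor prediction, and averaging the cell line-oriented and drug-oriented estimates) can be sketched as below. The function names, the amplification exponent `rho`, and the toy data are assumptions for illustration; the baseline estimation and similarity shrinkage steps from the abstract are deliberately left out, so this is not the full HIWCF method.

```python
def amplify(similarity, rho=2.5):
    """Case amplification: with rho > 1, high similarities are emphasised
    relative to low ones (values in (0, 1) shrink, more so when small)."""
    return similarity * abs(similarity) ** (rho - 1)


def knn_predict(values, sims, k, rho=2.5):
    """Weighted mean of the k most similar neighbours with a known response.

    `values[n]` is neighbour n's response (None if unknown) and `sims[n]` is
    its similarity to the target; non-positive similarities are ignored.
    """
    candidates = [(amplify(s, rho), v)
                  for s, v in zip(sims, values) if v is not None and s > 0]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    total = sum(w for w, _ in top)
    if total == 0:
        return None
    return sum(w * v for w, v in top) / total


def hybrid_predict(R, cell_sim, drug_sim, c, d, k=2):
    """Average the cell line-oriented and drug-oriented estimates for R[c][d]."""
    by_cell = knn_predict([R[i][d] if i != c else None for i in range(len(R))],
                          cell_sim[c], k)
    by_drug = knn_predict([R[c][j] if j != d else None for j in range(len(R[c]))],
                          drug_sim[d], k)
    estimates = [e for e in (by_cell, by_drug) if e is not None]
    return sum(estimates) / len(estimates) if estimates else None


# Hypothetical response matrix (rows: cell lines, columns: drugs); None = missing.
R = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 7.0],
]
cell_sim = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
drug_sim = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
estimate = hybrid_predict(R, cell_sim, drug_sim, c=0, d=2)
```

In this toy case the cell line-oriented estimate comes from the one similar cell line with a measured response, the drug-oriented estimate from the two other drugs tested on the same cell line, and the final value is their mean.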
Affiliation(s)
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Na-Na Guan
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
|
43
|
Morin O, Vallières M, Jochems A, Woodruff HC, Valdes G, Braunstein SE, Wildberger JE, Villanueva-Meyer JE, Kearney V, Yom SS, Solberg TD, Lambin P. A Deep Look Into the Future of Quantitative Imaging in Oncology: A Statement of Working Principles and Proposal for Change. Int J Radiat Oncol Biol Phys 2018; 102:1074-1082. [PMID: 30170101 DOI: 10.1016/j.ijrobp.2018.08.032] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 04/27/2018] [Revised: 08/21/2018] [Accepted: 08/21/2018] [Indexed: 12/13/2022]
Abstract
The adoption of enterprise digital imaging, along with the development of quantitative imaging methods and the re-emergence of statistical learning, has opened the opportunity for more personalized cancer treatments through transformative data science research. In the last 5 years, accumulating evidence has indicated that noninvasive advanced imaging analytics (i.e., radiomics) can reveal key components of tumor phenotype for multiple lesions at multiple time points over the course of treatment. Many groups using homegrown software have extracted engineered and deep quantitative features on 3-dimensional medical images for better spatial and longitudinal understanding of tumor biology and for the prediction of diverse outcomes. These developments could augment patient stratification and prognostication, buttressing emerging targeted therapeutic approaches. Unfortunately, the rapid growth in popularity of this immature scientific discipline has resulted in many early publications that miss key information or use underpowered patient data sets, without production of generalizable results. Quantitative imaging research is complex, and key principles should be followed to realize its full potential. The fields of quantitative imaging and radiomics in particular require a renewed focus on optimal study design and reporting practices, standardization, interpretability, data sharing, and clinical trials. Standardization of image acquisition, feature calculation, and statistical analysis (i.e., machine learning) is required for the field to move forward. A new data-sharing paradigm enacted among open and diverse participants (medical institutions, vendors and associations) should be embraced for faster development and comprehensive clinical validation of imaging biomarkers.
In this review and critique of the field, we propose working principles and fundamental changes to the current scientific approach, with the goal of high-impact research and development of actionable prediction models that will yield more meaningful applications of precision cancer medicine.
Affiliation(s)
- Olivier Morin
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Arthur Jochems
  - The D-Lab, Grow Research Institute for Oncology, Maastricht University, Maastricht, The Netherlands
- Henry C Woodruff
  - The D-Lab, Grow Research Institute for Oncology, Maastricht University, Maastricht, The Netherlands
- Gilmer Valdes
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Steve E Braunstein
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Joachim E Wildberger
  - Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, The Netherlands
- Vasant Kearney
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Sue S Yom
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Timothy D Solberg
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, California
- Philippe Lambin
  - The D-Lab, Grow Research Institute for Oncology, Maastricht University, Maastricht, The Netherlands
|
44
|
Kalamara A, Tobalina L, Saez-Rodriguez J. How to find the right drug for each patient? Advances and challenges in pharmacogenomics. Curr Opin Syst Biol 2018; 10:53-62. [PMID: 31763498 PMCID: PMC6855262 DOI: 10.1016/j.coisb.2018.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
Cancer is a highly heterogeneous disease with complex underlying biology. For these reasons, effective cancer treatment is still a challenge. Nowadays, it is clear that a cancer therapy that fits all cases cannot be found, and as a result the design of therapies tailored to the patient's molecular characteristics is needed. Pharmacogenomics aims to study the relationship between an individual's genotype and drug response. Scientists use different biological models, ranging from cell lines to mouse models, as proxies for patients for preclinical and translational studies. The rapid development of "-omics" technologies is increasing the number of features that can be measured in these models, expanding the possibilities of finding predictive biomarkers of drug response. Finding these relationships requires diverse computational approaches ranging from machine learning to dynamic modeling. Despite major advances, we are still far from being able to precisely predict drug efficacy in cancer models, let alone directly on patients. We believe that the new experimental techniques and computational approaches covered in this review will bring us closer to this goal.
Affiliation(s)
- Angeliki Kalamara
  - RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
- Luis Tobalina
  - RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
- Julio Saez-Rodriguez
  - RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
  - Heidelberg University, Faculty of Medicine, Institute of Computational Biomedicine, Heidelberg, Germany
|
45
|
Tan M, Özgül OF, Bardak B, Ekşioğlu I, Sabuncuoğlu S. Drug response prediction by ensemble learning and drug-induced gene expression signatures. Genomics 2018; 111:1078-1088. [PMID: 31533900 DOI: 10.1016/j.ygeno.2018.07.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/16/2018] [Revised: 06/12/2018] [Accepted: 07/03/2018] [Indexed: 12/14/2022]
Abstract
The chemotherapeutic response of cancer cells to a given compound is one of the most fundamental pieces of information required to design anti-cancer drugs. Recently, a considerable amount of drug-induced gene expression data has become publicly available, in addition to cytotoxicity databases. These large sets of data provided an opportunity to apply machine learning methods to predict drug activity. However, due to the complexity of cancer drug mechanisms, none of the existing methods is perfect. In this paper, we propose a novel ensemble learning method to predict drug response. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate predictions by in vitro experiments in addition to the tests on data sets. The predictions of the methods, the signatures and the software are available from http://mtan.etu.edu.tr/drug-response-prediction/.
Affiliation(s)
- Mehmet Tan
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Ozan Fırat Özgül
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Batuhan Bardak
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Işıksu Ekşioğlu
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Suna Sabuncuoğlu
  - Department of Toxicology, Faculty of Pharmacy, Hacettepe University, Ankara, Turkey
|
46
|
Sundin I, Peltola T, Micallef L, Afrabandpey H, Soare M, Mamun Majumder M, Daee P, He C, Serim B, Havulinna A, Heckman C, Jacucci G, Marttinen P, Kaski S. Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge. Bioinformatics 2018; 34:i395-i403. [PMID: 29949984 PMCID: PMC6022689 DOI: 10.1093/bioinformatics/bty257] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/03/2022] Open
Abstract
Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual using high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge offers a promising approach to improve predictions, but collecting such knowledge is laborious if the number of candidate features is very large.
Results: We introduce a probabilistic framework to incorporate expert feedback about the impact of genomic measurements on the outcome of interest and present a novel approach to collect the feedback efficiently, based on Bayesian experimental design. The new approach outperformed other recent alternatives in two medical applications: prediction of metabolic traits and prediction of sensitivity of cancer cells to different drugs, both using genomic features as predictors. Furthermore, the intelligent approach to collect feedback reduced the workload of the expert to approximately 11%, compared to a baseline approach.
Availability and implementation: Source code implementing the introduced computational methods is freely available at https://github.com/AaltoPML/knowledge-elicitation-for-precision-medicine.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iiris Sundin
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Tomi Peltola
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Luana Micallef
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Homayun Afrabandpey
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Marta Soare
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Muntasir Mamun Majumder
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Pedram Daee
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Chen He
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Baris Serim
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Aki Havulinna
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
  - National Institute for Health and Welfare THL, Helsinki, Finland
- Caroline Heckman
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Giulio Jacucci
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Pekka Marttinen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Samuel Kaski
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
|
47
|
Yang M, Simm J, Lam CC, Zakeri P, van Westen GJP, Moreau Y, Saez-Rodriguez J. Linking drug target and pathway activation for effective therapy using multi-task learning. Sci Rep 2018; 8:8322. [PMID: 29844324 PMCID: PMC5974390 DOI: 10.1038/s41598-018-25947-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/01/2018] [Accepted: 05/02/2018] [Indexed: 01/27/2023] Open
Abstract
Despite the abundance of large-scale molecular and drug-response data, the insights gained about the mechanisms underlying treatment efficacy in cancer have in general been limited. Machine learning algorithms applied to those datasets are most often used to provide predictions without interpretation, or reveal single drug-gene associations and fail to derive robust insights. We propose to use Macau, a Bayesian multitask multi-relational algorithm, to generalize from individual drugs and genes and explore the interactions between the drug targets and signaling pathways' activation. A typical insight would be: "Activation of pathway Y will confer sensitivity to any drug targeting protein X". We applied our methodology to the Genomics of Drug Sensitivity in Cancer (GDSC) screening, using gene expression of 990 cancer cell lines, activity scores of 11 signaling pathways derived from the tool PROGENy as cell line input and 228 nominal targets for 265 drugs as drug input. These interactions can guide a tissue-specific combination treatment strategy, for example suggesting to modulate a certain pathway to maximize the drug response for a given tissue. We confirmed in the literature drug combination strategies derived from our results for brain, skin and stomach tissues. Such an analysis of interactions across tissues might help target discovery, drug repurposing and patient stratification strategies.
Affiliation(s)
- Mi Yang
- RWTH Aachen University, Faculty of Medicine, Joint Research Center for Computational Biomedicine, Aachen, Germany
| | - Jaak Simm
- ESAT-STADIUS, KU Leuven B-3001, Heverlee, Belgium
| | - Chi Chung Lam
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC, Leiden, The Netherlands
| | - Pooya Zakeri
- ESAT-STADIUS, KU Leuven B-3001, Heverlee, Belgium
| | - Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC, Leiden, The Netherlands
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven B-3001, Heverlee, Belgium
| | - Julio Saez-Rodriguez
- RWTH Aachen University, Faculty of Medicine, Joint Research Center for Computational Biomedicine, Aachen, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.
| |
|
48
|
Mikulskis P, Hook A, Dundas AA, Irvine D, Sanni O, Anderson D, Langer R, Alexander MR, Williams P, Winkler DA. Prediction of Broad-Spectrum Pathogen Attachment to Coating Materials for Biomedical Devices. ACS Appl Mater Interfaces 2018; 10:139-149. [PMID: 29191009 PMCID: PMC7613461 DOI: 10.1021/acsami.7b14197] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 05/09/2023]
Abstract
Bacterial infections in healthcare settings are a frequent accompaniment to routine procedures such as catheterization and surgical site interventions. Their impact is becoming even more marked as the number of medical devices used to manage chronic health conditions and improve quality of life increases. The resistance of pathogens to multiple antibiotics is also increasing, adding a further layer of complexity to the problem of employing safe and effective medical procedures. One approach to reducing the rate of infections associated with implanted and indwelling medical devices is the use of polymers that resist the formation of bacterial biofilms. To significantly accelerate the discovery of such materials, we show for the first time how state-of-the-art machine learning methods can generate quantitative predictions for the attachment of multiple pathogens to a large library of polymers in a single model. Such models facilitate the design of polymers with very low pathogen attachment across different bacterial species that will be candidate materials for implantable or indwelling medical devices such as urinary catheters, cochlear implants, and pacemakers.
Affiliation(s)
- Paulius Mikulskis
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Andrew Hook
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Adam A. Dundas
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Faculty of Engineering, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Derek Irvine
- Faculty of Engineering, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Olutoba Sanni
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Daniel Anderson
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, Massachusetts 02139-4307, United States
| | - Robert Langer
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, Massachusetts 02139-4307, United States
| | - Morgan R. Alexander
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Corresponding author
| | - Paul Williams
- Centre for Biomolecular Sciences, School of Life Sciences, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - David A. Winkler
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Kingsbury Drive, Melbourne, Victoria 3086, Australia
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- School of Chemical and Physical Sciences, Flinders University, Bedford Park, South Australia 5046, Australia
- Corresponding author
| |
|
49
|
Lee SI, Celik S, Logsdon BA, Lundberg SM, Martins TJ, Oehler VG, Estey EH, Miller CP, Chien S, Dai J, Saxena A, Blau CA, Becker PS. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat Commun 2018; 9:42. [PMID: 29298978 PMCID: PMC5752671 DOI: 10.1038/s41467-017-02465-5] [Citation(s) in RCA: 125] [Impact Index Per Article: 20.8] [Received: 04/15/2016] [Accepted: 11/30/2017] [Indexed: 02/06/2023] Open
Abstract
Cancers that appear pathologically similar often respond differently to the same drug regimens. Methods to better match patients to drugs are in high demand. We demonstrate a promising approach to identify robust molecular markers for targeted treatment of acute myeloid leukemia (AML) by introducing data from 30 AML patients, including genome-wide gene expression profiles and in vitro sensitivity to 160 chemotherapy drugs, and a computational method to identify reliable gene expression markers for drug sensitivity by incorporating multi-omic prior information relevant to each gene’s potential to drive cancer. We show that our method outperforms several state-of-the-art approaches in identifying molecular markers replicated in validation data and in predicting drug sensitivity accurately. Finally, we identify SMARCA4 as a marker and driver of sensitivity to the topoisomerase II inhibitors mitoxantrone and etoposide in AML by showing that cell lines transduced to have high SMARCA4 expression reveal dramatically increased sensitivity to these agents. Identification of markers of drug response is essential for precision therapy. Here the authors introduce an algorithm that uses prior information about each gene’s importance in AML to identify the most predictive gene-drug associations from transcriptome and drug response data from 30 AML samples.
Affiliation(s)
- Su-In Lee
- Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, WA, 98195, USA; Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA, 98195, USA; Center for Cancer Innovation, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Safiye Celik
- Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, WA, 98195, USA
| | | | - Scott M Lundberg
- Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, WA, 98195, USA
| | - Timothy J Martins
- Quellos High Throughput Screening Core, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Vivian G Oehler
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109, USA; Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Elihu H Estey
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109, USA; Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Chris P Miller
- Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Sylvia Chien
- Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Jin Dai
- Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Akanksha Saxena
- Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - C Anthony Blau
- Center for Cancer Innovation, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA; Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| | - Pamela S Becker
- Center for Cancer Innovation, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA; Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109, USA; Division of Hematology, Department of Medicine and Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, USA
| |
|
50
|
Ruffalo M, Stojanov P, Pillutla VK, Varma R, Bar-Joseph Z. Reconstructing cancer drug response networks using multitask learning. BMC Syst Biol 2017; 11:96. [PMID: 29017547 PMCID: PMC5635550 DOI: 10.1186/s12918-017-0471-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/21/2017] [Accepted: 10/02/2017] [Indexed: 01/03/2023]
Abstract
BACKGROUND: Translating in vitro results to clinical tests is a major challenge in systems biology. Here we present a new Multi-Task learning framework which integrates thousands of cell line expression experiments to reconstruct drug-specific response networks in cancer. RESULTS: The reconstructed networks correctly identify several shared key proteins and pathways while simultaneously highlighting many cell-type-specific proteins. We used top proteins from each drug network to predict survival for patients prescribed the drug. CONCLUSIONS: Predictions based on proteins from the in vitro-derived networks significantly outperformed predictions based on known cancer genes, indicating that Multi-Task learning can indeed identify accurate drug response networks.
Affiliation(s)
- Matthew Ruffalo
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Petar Stojanov
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Venkata Krishna Pillutla
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Rohan Varma
- Electrical and Computer Engineering, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|