1
Wenteler A, Cabrera CP, Wei W, Neduva V, Barnes MR. AI approaches for the discovery and validation of drug targets. CAMBRIDGE PRISMS. PRECISION MEDICINE 2024; 2:e7. [PMID: 39258224 PMCID: PMC11383977 DOI: 10.1017/pcm.2024.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 05/04/2024] [Accepted: 05/08/2024] [Indexed: 09/12/2024]
Abstract
Artificial intelligence (AI) holds immense promise for accelerating and improving all aspects of drug discovery, not least target discovery and validation. By integrating a diverse range of biological data modalities, AI enables the accurate prediction of drug target properties, ultimately illuminating biological mechanisms of disease and guiding drug discovery strategies. Despite the indisputable potential of AI in drug target discovery, there are many challenges and obstacles yet to be overcome, including dealing with data biases, model interpretability and generalisability, and the validation of predicted drug targets, to name a few. By exploring recent advancements in AI, this review showcases current applications of AI for drug target discovery and offers perspectives on the future of AI for the discovery and validation of drug targets, paving the way for the generation of novel and safer pharmaceuticals.
Affiliation(s)
- Aaron Wenteler
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - MSD Discovery Centre, London, United Kingdom
- Claudia P Cabrera
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - NIHR Barts Cardiovascular Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Wei Wei
  - MSD Discovery Centre, London, United Kingdom
- Michael R Barnes
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - NIHR Barts Cardiovascular Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
  - The Alan Turing Institute, London, United Kingdom
2
Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. CELL REPORTS METHODS 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
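To make the idea of a Boolean network model concrete, the following is a minimal sketch of a synchronous Boolean network responding to a drug perturbation. The node names, the update rules, and the helpers `step` and `simulate` are hypothetical illustrations and are not taken from the paper; the paper's optimizer, which searches for appropriate regulatory logic, is not reproduced here.

```python
from typing import Callable, Dict

State = Dict[str, bool]

# Hypothetical four-node network; node names and rules are illustrative only.
RULES: Dict[str, Callable[[State], bool]] = {
    "growth_signal": lambda s: s["growth_signal"],             # input node keeps its value
    "drug": lambda s: s["drug"],                               # perturbation is held fixed
    "kinase": lambda s: s["growth_signal"] and not s["drug"],  # the drug inhibits the kinase
    "proliferation": lambda s: s["kinase"],                    # output follows the kinase
}


def step(state: State) -> State:
    """Apply every rule once to the current state (synchronous update)."""
    return {node: rule(state) for node, rule in RULES.items()}


def simulate(state: State, max_steps: int = 20) -> State:
    """Iterate until a fixed point is reached or max_steps is exhausted."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return state


start = {"growth_signal": True, "drug": False, "kinase": False, "proliferation": False}
untreated = simulate(start)
treated = simulate({**start, "drug": True})
```

In this toy network the untreated state settles with `proliferation` switched on and the treated state settles with it switched off, which is the kind of perturbation response a fitted logical model is meant to capture.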
Affiliation(s)
- Yunseong Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Younghyun Han
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Corbin Hopper
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jonghoon Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jae Il Joo
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jeong-Ryeol Gong
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Chun-Kyung Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Seong-Hoon Jang
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Junsoo Kang
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Taeyoung Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Kwang-Hyun Cho
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
3
Abbasi M, Carvalho FG, Ribeiro B, Arrais JP. Predicting drug activity against cancer through genomic profiles and SMILES. Artif Intell Med 2024; 150:102820. [PMID: 38553160 DOI: 10.1016/j.artmed.2024.102820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/09/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
Due to the constant increase in cancer rates, the disease has become a leading cause of death worldwide, heightening the need for its detection and treatment. In the era of personalized medicine, the main goal is to incorporate individual variability in order to choose more precisely which therapy and prevention strategies suit each person. However, predicting the sensitivity of tumors to anticancer treatments remains a challenge. In this work, we propose two deep neural network models to predict the impact of anticancer drugs on tumors through the half-maximal inhibitory concentration (IC50). These models join biological and chemical data to capture relevant features of the genetic profile and the drug compounds, respectively. In order to predict the drug response in cancer cell lines, this study employed different deep learning (DL) methods, resorting to Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). In the first stage, two autoencoders were pre-trained with high-dimensional gene expression and mutation data of tumors. Afterward, this genetic background is transferred to the prediction models that return the IC50 value, which portrays the potency of a substance in inhibiting a cancer cell line. When comparing RSEM expected counts and TPM as methods for representing gene expression data, RSEM was shown to perform better in deep models, and CNN models can obtain better insight from these types of data. Moreover, the obtained results reflect the effectiveness of the extracted deep representations in the prediction of the IC50 value, achieving a mean squared error of 1.06 and surpassing previous state-of-the-art models.
Affiliation(s)
- Maryam Abbasi
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal; Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal
- Filipa G Carvalho
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
- Bernardete Ribeiro
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
- Joel P Arrais
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
4
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
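The difference between the greedy, uncertainty, and combined selection strategies named in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: each candidate cell line is assumed to carry a predicted response score (higher meaning more likely to be a hit) and an uncertainty estimate, and the additive `weight` used to combine the two is a hypothetical choice, not the formula used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical predictions per cell line: (predicted response score, uncertainty).
Predictions = Dict[str, Tuple[float, float]]


def select_greedy(preds: Predictions, k: int) -> List[str]:
    """Pick the k cell lines with the highest predicted response score."""
    return sorted(preds, key=lambda name: preds[name][0], reverse=True)[:k]


def select_uncertainty(preds: Predictions, k: int) -> List[str]:
    """Pick the k cell lines whose predictions are least certain."""
    return sorted(preds, key=lambda name: preds[name][1], reverse=True)[:k]


def select_combined(preds: Predictions, k: int, weight: float = 1.0) -> List[str]:
    """Pick by score plus weighted uncertainty (an assumed way to combine them)."""
    return sorted(
        preds,
        key=lambda name: preds[name][0] + weight * preds[name][1],
        reverse=True,
    )[:k]


candidates: Predictions = {
    "line_A": (0.9, 0.1),
    "line_B": (0.5, 0.8),
    "line_C": (0.2, 0.3),
}
greedy_pick = select_greedy(candidates, 1)
uncertain_pick = select_uncertainty(candidates, 1)
combined_pick = select_combined(candidates, 2)
```

In each round of an active learning loop, the selected cell lines would be screened, their measured responses added to the training set, and the model retrained before the next selection.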
Affiliation(s)
- Priyanka Vasanthakumari
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
  - Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
5
Brindha GR, Rishiikeshwer BS, Santhi B, Nakendraprasath K, Manikandan R, Gandomi AH. Precise prediction of multiple anticancer drug efficacy using multi target regression and support vector regression analysis. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 224:107027. [PMID: 35914385 DOI: 10.1016/j.cmpb.2022.107027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2022] [Revised: 07/08/2022] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVES: The prediction of multiple drug efficacies using machine learning prediction techniques based on clinical and molecular attributes of tumors is a new approach in the field of precision medicine of oncology. The selection of suitable and effective therapeutic drugs among the potential drugs is performed computationally considering the tumor features. In this study, we developed and validated machine learning models to predict the efficacy of five anti-cancer drugs according to the clinical and molecular attributes of 30 oral squamous cell carcinoma (OSCC) cohorts. Ranking of the drugs was achieved using their apoptotic priming. METHODS: We developed multiple drug efficacy prediction models based on three types of tumor characteristics by applying machine learning methods, including multi-target regression (MTR) and support vector regression (SVR). The prediction accuracy of existing machine learning methods was enhanced by introducing novel pre-processing techniques to develop Enhanced MTR (E_MTR), Enhanced Log-based MTR (EL_MTR), Enhanced Multi-target SVR (EM_SVR), and Enhanced Log-based Multi-target SVR (ELM_SVR). As a unique capability, ELM_SVR and EL_MTR rank the drugs based on their predicted efficacy. All the drug efficacy prediction models were built using OSCC real samples and theoretical samples. The best model was selected based on dataset size and evaluation metrics, such as error terms, residuals and parameter tuning, and cross-validated (CV) using 30 real samples and 340 theoretical samples. RESULTS: When 30 real tumor samples were used for the train-test and CV methods, MTR models predicted the efficacy with less error than SVR models. By comparison, when 340 theoretical samples were used for the train-test and CV methods, MTR improved its performance, but SVR predicted the efficacy with zero error. We found that, for small samples, the proposed MTR provided a 0.01 difference between actual apoptotic priming and predicted priming of five drugs. For large samples, the values predicted by the proposed SVR had a difference of 0.00001. The error terms (actual vs. predicted) also reveal that the enhanced log model is suitable when MTR is applied. Meanwhile, the enhanced model is suitable for SVR learning for multiple drug efficacy prediction. It was found that the predicted ranks of the drugs based on the multi-targeted efficacy prediction exactly match the actual rankings. CONCLUSION: We developed efficient statistical and machine learning models using MTR and SVR analysis for anticancer drug efficacy, which will be useful in the field of precision medicine to choose the most suitable drugs in a personalized manner. The performance results of the proposed enhanced ranking techniques are described as follows: i) EL_MTR is the best to predict multiple anticancer drug efficacies and improve the accuracy of ranking drugs, irrespective of sample size; and ii) ELM_SVR performs better than other MTR models with a large sample size and precise ranking process.
Affiliation(s)
- G R Brindha
  - SASTRA Deemed to be University, Thanjavur, Tamilnadu 613401, India
- B Santhi
  - SASTRA Deemed to be University, Thanjavur, Tamilnadu 613401, India
- R Manikandan
  - SASTRA Deemed to be University, Thanjavur, Tamilnadu 613401, India
- Amir H Gandomi
  - Data Science Institute, Faculty of Engineering and Information Systems, University of Technology Sydney, Ultimo, NSW 2007, Australia
6
Ogunleye AZ, Piyawajanusorn C, Gonçalves A, Ghislat G, Ballester PJ. Interpretable Machine Learning Models to Predict the Resistance of Breast Cancer Patients to Doxorubicin from Their microRNA Profiles. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2022; 9:e2201501. [PMID: 35785523 PMCID: PMC9403644 DOI: 10.1002/advs.202201501] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/15/2022] [Revised: 06/02/2022] [Indexed: 05/05/2023]
Abstract
Doxorubicin is a common treatment for breast cancer. However, not all patients respond to this drug, which sometimes causes life-threatening side effects. Accurately anticipating doxorubicin-resistant patients would therefore make it possible to spare them this risk while considering alternative treatments without delay. Stratifying patients based on molecular markers in their pretreatment tumors is a promising approach to advance toward this ambitious goal, but single-gene markers such as HER2 expression have not been shown to be sufficiently predictive. The recent availability of matched doxorubicin-response and diverse molecular profiles across breast cancer patients now permits analysis at a much larger scale. 16 machine learning algorithms and 8 molecular profiles are systematically evaluated on the same cohort of patients. Only 2 of the 128 resulting models are substantially predictive, showing that they can be easily missed by a standard-scale analysis. The best model is a classification and regression tree (CART) nonlinearly combining 4 selected miRNA isoforms to predict doxorubicin response (median Matthews correlation coefficient (MCC) and area under the curve (AUC) of 0.56 and 0.80, respectively). By contrast, HER2 expression is significantly less predictive (median MCC and AUC of 0.14 and 0.57, respectively). As the predictive accuracy of this CART model increases with larger training sets, its update with future data should result in even better accuracy.
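For readers unfamiliar with the Matthews correlation coefficient reported above, the following short sketch computes it from a binary confusion matrix using the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `matthews_corrcoef_from_counts` is a hypothetical helper, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something stated in the abstract.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column gives an MCC of 0.
        return 0.0
    return numerator / denominator


perfect = matthews_corrcoef_from_counts(tp=5, tn=5, fp=0, fn=0)
chance = matthews_corrcoef_from_counts(tp=5, tn=5, fp=5, fn=5)
inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=5, fn=5)
```

Unlike plain accuracy, the MCC stays close to zero for a classifier that simply predicts the majority class, which makes it a reasonable choice when responders and non-responders are imbalanced.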
Affiliation(s)
- Adeolu Z. Ogunleye
  - Cancer Research Center of Marseille (CRCM), INSERM U1068, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Institut Paoli-Calmettes, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Aix-Marseille Université, Marseille F-13284, France
  - Cancer Research Center of Marseille (CRCM), CNRS UMR7258, Marseille F-13009, France
- Chayanit Piyawajanusorn
  - Cancer Research Center of Marseille (CRCM), INSERM U1068, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Institut Paoli-Calmettes, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Aix-Marseille Université, Marseille F-13284, France
  - Cancer Research Center of Marseille (CRCM), CNRS UMR7258, Marseille F-13009, France
- Anthony Gonçalves
  - Cancer Research Center of Marseille (CRCM), INSERM U1068, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Institut Paoli-Calmettes, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Aix-Marseille Université, Marseille F-13284, France
  - Cancer Research Center of Marseille (CRCM), CNRS UMR7258, Marseille F-13009, France
- Ghita Ghislat
  - Cancer Research Center of Marseille (CRCM), INSERM U1068, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Institut Paoli-Calmettes, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Aix-Marseille Université, Marseille F-13284, France
  - Cancer Research Center of Marseille (CRCM), CNRS UMR7258, Marseille F-13009, France
- Pedro J. Ballester
  - Cancer Research Center of Marseille (CRCM), INSERM U1068, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Institut Paoli-Calmettes, Marseille F-13009, France
  - Cancer Research Center of Marseille (CRCM), Aix-Marseille Université, Marseille F-13284, France
  - Cancer Research Center of Marseille (CRCM), CNRS UMR7258, Marseille F-13009, France
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
7
Su Y, Shi Y, Lee W, Cheng L, Guo H. TAHDNet: Time-Aware Hierarchical Dependency Network for Medication Recommendation. J Biomed Inform 2022; 129:104069. [PMID: 35390541 DOI: 10.1016/j.jbi.2022.104069] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/22/2021] [Revised: 03/15/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022]
Abstract
Medication recommendation is a hot topic in research on applying neural networks to healthcare. Although extensive progress has been made, current research still faces the following challenges: (i) existing methods are poor at efficiently capturing and leveraging local and global dependency information from patient visit records; (ii) current time-aware models based on irregular-interval medical records tend to ignore periodic variability in patient conditions, which limits the representation learning capability of these models. Therefore, we propose a Dynamic Time-aware Hierarchical Dependency Network (TAHDNet) for the medication recommendation task to address these challenges. Firstly, we use a Transformer-based model to learn the global information of the whole patient record through a self-supervised pre-training process. Secondly, a 1D-CNN model is used to learn the local dependencies at the visit level. Thirdly, we propose a dynamic time-aware module with a fused temporal decay function to assign different weights among different time intervals dynamically through a key-value attention mechanism. Experimental results on real-world datasets demonstrate the effectiveness of the model proposed in this paper.
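The general idea of weighting past visits by both relevance and elapsed time can be sketched with a time-decayed key-value attention. This is only an illustrative assumption: the abstract does not specify the fused temporal decay function, so a simple linear penalty on the time gap (`decay_rate * gap`) subtracted from the dot-product score is used here, and the function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def time_decayed_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    gaps_days: Sequence[float],
    decay_rate: float = 0.01,
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) over past visits.

    Each visit's score is its query-key dot product minus an assumed
    linear penalty for the time elapsed since that visit.
    """
    scores = [
        sum(q * k for q, k in zip(query, key)) - decay_rate * gap
        for key, gap in zip(keys, gaps_days)
    ]
    # Numerically stable softmax over the penalised scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Two visits with identical keys; the second happened 100 days earlier.
weights, context = time_decayed_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    gaps_days=[0.0, 100.0],
)
```

With identical keys, the more recent visit receives the larger weight, so the context vector leans toward its value; a learned decay function would let the model adapt this trade-off per patient.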
Affiliation(s)
- Yaqi Su
  - School of Software, Shandong University
- Yuliang Shi
  - School of Software, Shandong University; Dareway Software Co., Ltd.
- Wu Lee
  - School of Software, Shandong University
- Lin Cheng
  - School of Software, Shandong University
- Hongmei Guo
  - Department of Periodontology, School and Hospital of Stomatology, Cheeloo College of Medicine, Shandong University; Shandong Key Laboratory of Oral Tissue Regeneration; Shandong Engineering Laboratory for Dental Materials and Oral Tissue Regeneration
8
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
9
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
  - Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
10
Nguyen LC, Naulaerts S, Bruna A, Ghislat G, Ballester PJ. Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles. Biomedicines 2021; 9:biomedicines9101319. [PMID: 34680436 PMCID: PMC8533095 DOI: 10.3390/biomedicines9101319] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/01/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 12/17/2022] Open
Abstract
(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
Affiliation(s)
- Linh C. Nguyen
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université UM105, F-13009 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Department of Life Sciences, University of Science and Technology of Hanoi, Vietnam Academy of Science and Technology, Hanoi 100803, Vietnam
- Stefan Naulaerts
  - Ludwig Institute for Cancer Research, 1200 Brussels, Belgium
  - Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Ghita Ghislat
  - Centre d’Immunologie de Marseille-Luminy, INSERM U1104, CNRS UMR7280, F-13009 Marseille, France
- Pedro J. Ballester
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université UM105, F-13009 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Correspondence: Tel.: +33-(0)4-8697-7201
11
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progress has been made in deep learning approaches to drug response prediction, deep learning methods show weakness in predicting the response of a drug that does not appear in the training dataset. In particular, all five evaluated deep learning methods performed worse than the similarity-regularized matrix factorization (SRMF) method in our drug blind test. We outline the challenges in applying deep learning approaches to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
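A drug blind test of the kind mentioned above requires that no drug in the test set appears in the training set. The following is a minimal sketch of such a split; the record layout `(cell_line, drug, response)`, the function name `drug_blind_split`, and the example values are hypothetical and are not taken from the paper's evaluation protocol.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str, float]  # (cell line, drug, measured response)


def drug_blind_split(
    records: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Split records so that every test drug is absent from the training set."""
    drugs = sorted({drug for _, drug, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Hold out at least one drug, even for very small drug panels.
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[1] not in test_drugs]
    test = [r for r in records if r[1] in test_drugs]
    return train, test


# Made-up example values, for illustration only.
records: List[Record] = [
    ("cell_1", "drug_A", 0.4),
    ("cell_2", "drug_A", 1.2),
    ("cell_1", "drug_B", 2.5),
    ("cell_2", "drug_B", 0.9),
    ("cell_1", "drug_C", 1.7),
    ("cell_2", "drug_C", 3.1),
]
train, test = drug_blind_split(records)
```

Splitting by drug rather than by individual record is what makes the test "blind": a random record-level split would let the model see every drug during training and would overstate how well it generalises to new compounds.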
Affiliation(s)
- Yurui Chen
  - Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
- Louxin Zhang
  - Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
12
Miranda SP, Baião FA, Fleck JL, Piccolo SR. Predicting drug sensitivity of cancer cells based on DNA methylation levels. PLoS One 2021; 16:e0238757. [PMID: 34506489 PMCID: PMC8432830 DOI: 10.1371/journal.pone.0238757] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/21/2020] [Accepted: 06/28/2021] [Indexed: 01/22/2023] Open
Abstract
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide, DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. 
Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
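The subsampling idea in this abstract (training classifiers only on cell lines with relatively extreme responses) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the toy data, and the assumption that a lower response value means a more sensitive cell line are all hypothetical.

```python
from typing import Dict, List, Tuple


def extreme_subset(responses: Dict[str, float], fraction: float) -> List[Tuple[str, int]]:
    """Keep the most sensitive and most resistant cell lines and label them.

    responses: hypothetical mapping of cell-line ID -> drug-response value
               (assumed here: lower value = more sensitive).
    fraction:  share of cell lines kept from EACH tail, in (0, 0.5].
    Returns (cell_line, label) pairs, label 1 = sensitive, 0 = resistant.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(responses) < 2:
        raise ValueError("need at least two cell lines")
    # Rank cell lines from lowest to highest response value.
    ranked = sorted(responses, key=responses.get)
    # Number of cell lines taken from each tail (at least one).
    k = max(1, int(len(ranked) * fraction))
    sensitive = [(name, 1) for name in ranked[:k]]
    resistant = [(name, 0) for name in ranked[-k:]]
    return sensitive + resistant


# Toy, made-up response values for six hypothetical cell lines.
toy = {"A": 0.1, "B": 0.4, "C": 0.5, "D": 0.6, "E": 0.9, "F": 1.5}
subset = extreme_subset(toy, 0.2)
```

Shrinking `fraction` keeps only the most extreme cell lines, which is the setting the abstract reports as most favourable for classification.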
Affiliation(s)
- Sofia P. Miranda
  - Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Fernanda A. Baião
  - Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Julia L. Fleck
  - Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France
- Stephen R. Piccolo
  - Department of Biology, Brigham Young University, Provo, Utah, United States of America

13
Zuo Z, Wang P, Chen X, Tian L, Ge H, Qian D. SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures. BMC Bioinformatics 2021; 22:434. [PMID: 34507532 PMCID: PMC8434731 DOI: 10.1186/s12859-021-04352-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Accepted: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Background One of the major challenges in precision medicine is accurate prediction of an individual patient's response to drugs. A great number of computational methods have been developed to predict compound activity using genomic profiles or chemical structures, but more exploration is needed to combine genetic mutation, gene expression, and cheminformatics in one machine learning model. Results We present a novel deep-learning model that integrates gene expression, genetic mutation, and chemical structure of compounds in a multi-task convolutional architecture. We applied our model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on an oncology genetics database and L1000 landmark genes, and used their expression and mutations as genomic features in model training. We obtained the cheminformatics features for compounds from PubChem or ChEMBL. Our finding is that combining gene expression, genetic mutation, and cheminformatics features greatly enhances the predictive performance. Conclusion We implemented an extended Graph Neural Network for molecular graphs and a Convolutional Neural Network for gene features. By employing multi-tasking and self-attention functions to monitor the similarity between compounds, our model outperforms recently published methods using the same training and testing datasets. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04352-9.
Affiliation(s)
- Zhaorui Zuo
  - Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Penglei Wang
  - Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Xiaowei Chen
  - Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Li Tian
  - Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Hui Ge
  - Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Dahong Qian
  - Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China

14
Rafique R, Islam SR, Kazi JU. Machine learning in the prediction of cancer therapy. Comput Struct Biotechnol J 2021; 19:4003-4017. [PMID: 34377366 PMCID: PMC8321893 DOI: 10.1016/j.csbj.2021.07.003] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 03/14/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 12/15/2022]
Abstract
Resistance to therapy remains a major cause of cancer treatment failures, resulting in many cancer-related deaths. Resistance can occur at any time during the treatment, even at the beginning. The current treatment plan is dependent mainly on cancer subtypes and the presence of genetic mutations. Evidently, the presence of a genetic mutation does not always predict the therapeutic response and can vary for different cancer subtypes. Therefore, there is an unmet need for predictive models to match a cancer patient with a specific drug or drug combination. Recent advancements in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically useable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.
Affiliation(s)
- S.M. Riazul Islam
  - Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
- Julhash U. Kazi
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Corresponding author at: Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Medicon village Building 404:C3, Scheelevägen 8, 22363 Lund, Sweden.

15
Meybodi FY, Eslahchi C. Predicting Anti-Cancer Drug Response by Finding Optimal Subset of Drugs. Bioinformatics 2021; 37:4509-4516. [PMID: 34170297 DOI: 10.1093/bioinformatics/btab466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Revised: 05/26/2021] [Accepted: 06/22/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION One of the most difficult challenges in precision medicine is determining the best treatment strategy for each patient based on personal information. Since drug response prediction in vitro is extremely expensive, time-consuming, and virtually impossible, and because there are so many cell lines and drug data, computational methods are needed. RESULTS MinDrug is a method for predicting anti-cancer drug response that tries to identify the best subset of drugs that are the most similar to other drugs. MinDrug predicts the anti-cancer drug response on a new cell line using information from drugs in this subset and their connections to other drugs. MinDrug employs a heuristic star algorithm to identify an optimal subset of drugs and Elastic-Net regression to predict anti-cancer drug response in a new cell line. To test MinDrug, we use both statistical and biological methods to assess the selected drugs. MinDrug is also compared to four state-of-the-art approaches using various k-fold cross-validations on two large public datasets: GDSC and CCLE. MinDrug outperforms the other approaches in terms of precision, robustness, and speed. Furthermore, we compare the evaluation results of all the approaches on an external dataset whose statistical distribution is not exactly the same as that of the training data. The results show that MinDrug continues to outperform the other approaches. AVAILABILITY MinDrug's source code can be found at https://github.com/yassaee/MinDrug. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fatemeh Yassaee Meybodi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

16
A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests. Genes (Basel) 2021; 12:933. [PMID: 34207374 PMCID: PMC8235396 DOI: 10.3390/genes12060933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 01/01/2023]
Abstract
The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases from the Cancer Cell Line Encyclopaedia (CCLE) project, and the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction to 501 alterations was performed, selecting known driver genes and excluding very frequent/infrequent alterations and redundant ones. For each model, we used the concordance correlation coefficient (CCC) for assessing the predictive performance, and permutation importance for assessing the contribution of each alteration. In a reasonable computational time (56 min), we identified 12 compounds whose response was at least fairly sensitive (CCC > 20) to the alteration profiles. Some diversities were found in the sets of influential alterations, providing clues to discover significant drug-gene interactions. The proposed methodological framework can be helpful for mining pharmacogenomic interactions.
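For readers unfamiliar with the performance metric named in this abstract, the sketch below computes the concordance correlation coefficient in its usual form (Lin's definition, with population variances). That the cited study used exactly this variant is an assumption; the function name and example values are illustrative only.

```python
from statistics import fmean
from typing import Sequence


def concordance_cc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variance and covariance.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = fmean(observed), fmean(predicted)
    var_x = sum((x - mx) ** 2 for x in observed) / n
    var_y = sum((y - my) ** 2 for y in predicted) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov_xy / denom


# Perfect agreement gives 1; a constant offset lowers the score even
# though the Pearson correlation would still be 1.
perfect = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike Pearson correlation, this metric penalises systematic offsets between predicted and observed values, which is why it suits agreement checks on predicted drug-response values.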
17
Nouri-Mahdavi K, Mohammadzadeh V, Rabiolo A, Edalati K, Caprioli J, Yousefi S. Prediction of Visual Field Progression from OCT Structural Measures in Moderate to Advanced Glaucoma. Am J Ophthalmol 2021; 226:172-181. [PMID: 33529590 DOI: 10.1016/j.ajo.2021.01.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/27/2020] [Revised: 01/23/2021] [Accepted: 01/25/2021] [Indexed: 01/29/2023]
Abstract
PURPOSE To test the hypothesis that visual field (VF) progression can be predicted from baseline and longitudinal optical coherence tomography (OCT) structural measurements. DESIGN Prospective cohort study. METHODS A total of 104 eyes (104 patients) with ≥3 years of follow-up and ≥5 VF examinations were enrolled. We defined VF progression based on pointwise linear regression on 24-2 VF (≥3 locations with slope less than or equal to -1.0 dB/year and P < .01). We used elastic net logistic regression (ENR) and machine learning to predict VF progression with demographics, baseline circumpapillary retinal nerve fiber layer (RNFL), macular ganglion cell/inner plexiform layer (GCIPL) thickness, and RNFL and GCIPL change rates at central 24 superpixels and 3 eccentricities, 3.4°, 5.5°, and 6.8°, from fovea and hemimaculas. Areas-under-ROC curves (AUC) were used to compare models. RESULTS Average ± SD follow-up and VF examinations were 4.5 ± 0.9 years and 8.7 ± 1.6, respectively. VF progression was detected in 23 eyes (22%). ENR selected rates of change of superotemporal RNFL sector and GCIPL change rates in 5 central superpixels and at 3.4° and 5.6° eccentricities as the best predictor subset (AUC = 0.79 ± 0.12). Best machine learning predictors consisted of baseline superior hemimacular GCIPL thickness and GCIPL change rates at 3.4° eccentricity and 3 central superpixels (AUC = 0.81 ± 0.10). Models using GCIPL-only structural variables performed better than RNFL-only models. CONCLUSIONS VF progression can be predicted with clinically relevant accuracy from baseline and longitudinal structural data. Further refinement of proposed models would assist clinicians with timely prediction of functional glaucoma progression and clinical decision making.
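The progression criterion stated in this abstract (at least 3 locations with slope ≤ -1.0 dB/year and P < .01) can be written as a simple rule. The sketch assumes the per-location regression slopes and p-values have already been computed elsewhere; the function name and the example numbers are hypothetical.

```python
from typing import Sequence


def is_progressing(
    slopes_db_per_year: Sequence[float],
    p_values: Sequence[float],
    slope_cutoff: float = -1.0,
    p_cutoff: float = 0.01,
    min_locations: int = 3,
) -> bool:
    """Apply the pointwise rule: enough locations worsening fast and significantly.

    slopes_db_per_year: one linear-regression slope per visual-field location.
    p_values:           the matching p-value for each slope (precomputed).
    """
    if len(slopes_db_per_year) != len(p_values):
        raise ValueError("slopes and p-values must have the same length")
    # Count locations that meet both the slope and the significance cutoffs.
    flagged = sum(
        1
        for slope, p in zip(slopes_db_per_year, p_values)
        if slope <= slope_cutoff and p < p_cutoff
    )
    return flagged >= min_locations


# Made-up values for four locations: three meet both cutoffs.
example = is_progressing([-1.2, -1.0, -2.0, 0.1], [0.001, 0.009, 0.005, 0.5])
```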
18
Serafim MSM, Dos Santos Júnior VS, Gertrudes JC, Maltarollo VG, Honorio KM. Machine learning techniques applied to the drug design and discovery of new antivirals: a brief look over the past decade. Expert Opin Drug Discov 2021; 16:961-975. [PMID: 33957833 DOI: 10.1080/17460441.2021.1918098] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Introduction: Drug design and discovery of new antivirals will always be extremely important in medicinal chemistry, taking into account known and new viral diseases that are yet to come. Although machine learning (ML) has been shown to improve predictions of the biological potential of chemicals and to accelerate the discovery of drugs over the past decade, new methods and their combinations have improved its performance and established promising perspectives regarding ML in the search for new antivirals. Areas covered: The authors consider some interesting areas that deal with different ML techniques applied to antivirals. Recent innovative studies on ML and antivirals were selected and analyzed in detail. Also, the authors provide a brief look from the past to the present to detect advances and bottlenecks in the area. Expert opinion: From classical ML techniques, it was possible to boost the searches for antivirals. However, from the emergence of new algorithms and the improvement in old approaches, promising results will be achieved every day, as we have observed in the case of SARS-CoV-2. Recent experience has shown that it is possible to use ML to discover new antiviral candidates from virtual screening and drug repurposing.
Affiliation(s)
- Mateus Sá Magalhães Serafim
  - Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Jadson Castro Gertrudes
  - Departamento de Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto (UFOP), Ouro Preto, Brazil
- Vinícius Gonçalves Maltarollo
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Kathia Maria Honorio
  - Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil

19
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA

20
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022]
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

21
Jamin A, Abraham P, Humeau-Heurtier A. Machine learning for predictive data analytics in medicine: A review illustrated by cardiovascular and nuclear medicine examples. Clin Physiol Funct Imaging 2020; 41:113-127. [PMID: 33316137 DOI: 10.1111/cpf.12686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/05/2019] [Revised: 11/01/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022]
Abstract
Evidence-based medicine allows the physician to evaluate the risk-benefit ratio of a treatment through setting and data. Risk-based choices can be made by the doctor using different information. With the emergence of new technologies, a large amount of data is recorded, offering interesting perspectives with machine learning for predictive data analytics. Machine learning is an ensemble of methods that process data to model a learning problem. Supervised machine learning algorithms consist of using annotated data to construct the model. This category makes it possible to solve predictive data analytics problems. In this paper, we detail the use of supervised machine learning algorithms for predictive data analytics problems in medicine. In the medical field, data can be split into two categories: medical images and other data. For brevity, our review deals with any kind of medical data excluding images. In this article, we offer a discussion around four supervised machine learning approaches: information-based, similarity-based, probability-based and error-based approaches. Each method is illustrated with detailed cardiovascular and nuclear medicine examples. Our review shows that model ensemble (ME) and support vector machine (SVM) methods are the most popular. SVM, ME and artificial neural networks often lead to better results than those given by other algorithms. In the coming years, more studies, more data, more tools and more methods will, for sure, be proposed.
Affiliation(s)
- Antoine Jamin
  - COTTOS Médical, Avrillé, France
  - LERIA-Laboratoire d'Etude et de Recherche en Informatique d'Angers, Univ. Angers, Angers, France
  - LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France
- Pierre Abraham
  - Sports Medicine Department, UMR Mitovasc CNRS 6015 INSERM 1228, Angers University Hospital, Angers, France
- Anne Humeau-Heurtier
  - LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France

22
Exploring the Antitumor Mechanisms of Zingiberis Rhizoma Combined with Coptidis Rhizoma Using a Network Pharmacology Approach. BIOMED RESEARCH INTERNATIONAL 2020; 2020:8887982. [PMID: 33426081 PMCID: PMC7781700 DOI: 10.1155/2020/8887982] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Revised: 10/30/2020] [Accepted: 11/25/2020] [Indexed: 12/24/2022]
Abstract
Background Although the combination of Zingiberis rhizoma (ZR) and Coptidis rhizoma (CR) is a classic traditional Chinese medicine-based herbal pair used for its antitumor effect, the material basis and underlying mechanisms are unclear. Here, a network pharmacology approach was used to elucidate the antitumor mechanisms of ZR-CR. Materials and Methods To predict the targets of ZR-CR in treating tumors, we constructed protein–protein interactions and hub component-target networks and performed pathway and process enrichment and molecular docking analysis. We used a surface plasmon resonance (SPR) assay to validate the predicted component-target affinities. Hub gene expression and survival analysis in patients with tumors were used to predict the clinical significance. Results The active components of ZR-CR—shogaol, daucosterol, ginkgetin, berberine, quercetin, chlorogenic acid, and vanillic acid—exhibited antitumor activities via the MAPK, PI3K-AKT, TNF, FOXO, HIF-1, and VEGF signaling pathways. Molecular docking and SPR analyses suggested direct binding of berberine with AKT1 and TP53; quercetin with EGFR and VEGF165; and ginkgetin, isoginkgetin, and daucosterol with VEGF165 with weak affinities. Gene expression levels of the hub targets of ZR-CR were associated with overall survival and disease-free survival in patients with various tumor types. Conclusions The antitumor components of the ZR-CR herbal pair and the mechanisms underlying their antitumor effects were identified. These antitumor components deserve to be explored further in experimental and clinical studies.
23
Elbadawi M, Gaisford S, Basit AW. Advanced machine-learning techniques in drug discovery. Drug Discov Today 2020; 26:769-777. [PMID: 33290820 DOI: 10.1016/j.drudis.2020.12.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/24/2020] [Revised: 11/16/2020] [Accepted: 12/02/2020] [Indexed: 01/20/2023]
Abstract
The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As their use increases, so do their limitations become apparent. Such limitations include their need for big data, sparsity in data, and their lack of interpretability. It has also become apparent that the techniques are not truly autonomous, requiring retraining even post deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.
Affiliation(s)
- Moe Elbadawi
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Simon Gaisford
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK
- Abdul W Basit
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK

24
Huang LC, Yeung W, Wang Y, Cheng H, Venkat A, Li S, Ma P, Rasheed K, Kannan N. Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model for protein kinase inhibitor response prediction. BMC Bioinformatics 2020; 21:520. [PMID: 33183223 PMCID: PMC7664030 DOI: 10.1186/s12859-020-03842-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/09/2020] [Accepted: 10/27/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
Affiliation(s)
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Wayland Yeung
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Ye Wang
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Huimin Cheng
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Aarya Venkat
  - Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
- Sheng Li
  - Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Ping Ma
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Khaled Rasheed
  - Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
  - Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA

25
Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, Ma J, Ideker T. Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell 2020; 38:672-684.e6. [PMID: 33096023 PMCID: PMC7737474 DOI: 10.1016/j.ccell.2020.09.014] [Citation(s) in RCA: 181] [Impact Index Per Article: 45.3] [Received: 04/08/2020] [Revised: 08/07/2020] [Accepted: 09/22/2020] [Indexed: 12/16/2022]
Abstract
Most drugs entering clinical trials fail, often related to an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. We address these challenges by developing DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of 1,235 tumor cell lines to 684 drugs. Tumor genotypes induce states in cellular subsystems that are integrated with drug structure to predict response to therapy and, simultaneously, learn biological mechanisms underlying the drug response. DrugCell predictions are accurate in cell lines and also stratify clinical outcomes. Analysis of DrugCell mechanisms leads directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. DrugCell provides a blueprint for constructing interpretable models for predictive medicine.
Affiliation(s)
- Brent M Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Samson H Fong
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Kyle S Sanchez
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - John Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jason F Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jianzhu Ma
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
|
26
|
Kibble M, Khan SA, Ammad-ud-din M, Bollepalli S, Palviainen T, Kaprio J, Pietiläinen KH, Ollikainen M. An integrative machine learning approach to discovering multi-level molecular mechanisms of obesity using data from monozygotic twin pairs. Royal Society Open Science 2020; 7:200872. [PMID: 33204460 PMCID: PMC7657920 DOI: 10.1098/rsos.200872] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/10/2020] [Accepted: 09/29/2020] [Indexed: 05/19/2023]
Abstract
We combined clinical, cytokine, genomic, methylation and dietary data from 43 young adult monozygotic twin pairs (aged 22-36 years, 53% female), where 25 of the twin pairs were substantially weight discordant (delta body mass index > 3 kg m⁻²). These measurements were originally taken as part of the TwinFat study, a substudy of The Finnish Twin Cohort study. These five large multivariate datasets (comprising 42, 71, 1587, 1605 and 63 variables, respectively) were jointly analysed using an integrative machine learning method called group factor analysis (GFA) to generate new hypotheses about the multi-molecular-level interactions associated with the development of obesity. New potential links between cytokines and weight gain are identified, as well as associations between dietary, inflammatory and epigenetic factors. This encouraging case study aims to enthuse the research community to boldly attempt new machine learning approaches which have the potential to yield novel and unintuitive hypotheses. The source code of the GFA method is publicly available as the R package GFA.
Affiliation(s)
- Milla Kibble
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Author for correspondence: Milla Kibble
| | - Suleiman A. Khan
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Muhammad Ammad-ud-din
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Sailalitha Bollepalli
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Teemu Palviainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
| | - Kirsi H. Pietiläinen
- Obesity Research Unit, Helsinki University Central Hospital and University of Helsinki, Helsinki, Finland
| | - Miina Ollikainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
27
|
Ahmadi Moughari F, Eslahchi C. ADRML: anticancer drug response prediction using manifold learning. Sci Rep 2020; 10:14245. [PMID: 32859983 PMCID: PMC7456328 DOI: 10.1038/s41598-020-71257-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2020] [Accepted: 08/13/2020] [Indexed: 12/05/2022]
Abstract
One of the prominent challenges in precision medicine is to select the most appropriate treatment strategy for each patient based on personalized information. The availability of massive data about drugs and cell lines makes it possible to propose efficient computational models for predicting anticancer drug response. In this study, we propose ADRML, a model for Anticancer Drug Response Prediction using Manifold Learning, which systematically integrates cell line information with drug information to make accurate predictions about drug response. The proposed model maps the drug response matrix into lower-rank spaces that provide new perspectives on cell lines and drugs. The drug response for a new cell line-drug pair is computed using the low-rank features. The evaluation of ADRML performance on various types of cell line and drug information, in addition to comparisons with previously proposed methods, shows that ADRML provides accurate and robust predictions. Further investigation of the association between drug response and pathway activity scores reveals that the predicted drug responses can shed light on the underlying drug mechanism. The case studies also suggest that the predictions of ADRML for novel cell line-drug pairs are supported by reliable evidence from the literature. Consequently, the evaluations verify that ADRML can be used to accurately predict and impute anticancer drug response.
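The low-rank idea in this abstract can be illustrated with a minimal sketch. The Python below is not ADRML itself: it fits a plain regularised factorisation R ≈ UVᵀ to the observed entries of a toy response matrix by stochastic gradient descent, then uses the factors to impute a held-out cell line-drug pair. The function names, the toy matrix and the hyperparameters are all assumptions made for illustration, and the manifold (similarity) terms that the abstract mentions are omitted.

```python
import random


def factorize(R, rank=1, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit R ~ U V^T on observed entries only (None marks a missing value).

    Hypothetical helper for illustration; not the ADRML algorithm.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = R[i][j] - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularisation.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted response of cell line i to drug j from the low-rank factors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Toy rank-1 response matrix (rows: cell lines, columns: drugs).
# The entry at [1][1] is held out; its underlying value is 2.0 * 0.5 = 1.0.
R = [[1.0, 0.5, 2.0],
     [2.0, None, 4.0],
     [3.0, 1.5, 6.0]]
U, V = factorize(R)
print(predict(U, V, 1, 1))
```

Because the toy matrix is exactly rank 1, the imputed value should land close to the held-out 1.0; real screening data are noisy and would need a held-out evaluation to choose the rank and regularisation strength.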
Affiliation(s)
- Fatemeh Ahmadi Moughari
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
| | - Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
|
28
|
Abstract
Machine learning is a set of techniques that promise to greatly enhance our data-processing capability. In the field of oncology, ML presents itself with a wealth of possible applications to the research and the clinical context, such as automated diagnosis and precise treatment modulation. In this paper, we will review the principal applications of ML techniques in oncology and explore in detail how they work. This will allow us to discuss the issues and challenges that ML faces in this field, and ultimately gain a greater understanding of ML techniques and how they can improve oncological research and practice.
Affiliation(s)
- Cecilia Nardini
- European School of Molecular Medicine (SEMM), 20139 Milan, Italy
|
29
|
Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B, Goldenberg A. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol 2020; 4:19. [PMID: 32566759 PMCID: PMC7296033 DOI: 10.1038/s41698-020-0122-1] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Received: 08/23/2019] [Accepted: 04/17/2020] [Indexed: 12/24/2022]
Abstract
Cancer is a leading cause of death worldwide. Identifying the best treatment using computational models to personalize drug response prediction holds great promise to improve patients' chances of successful recovery. Unfortunately, the computational task of predicting drug response is very challenging, partially due to the limitations of the available data and partially due to algorithmic shortcomings. The recent advances in deep learning may open a new chapter in the search for computational drug response prediction models and ultimately result in more accurate tools for therapy response. This review provides an overview of the computational challenges and advances in drug response prediction, and focuses on comparing the machine learning techniques to be of utmost practical use for clinicians and machine learning non-experts. The incorporation of new data modalities such as single-cell profiling, along with techniques that rapidly find effective drug combinations, will likely be instrumental in improving cancer care.
Affiliation(s)
- George Adam
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
- Department of Computer Science, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
| | - Ladislav Rampášek
- Department of Computer Science, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON Canada
| | - Zhaleh Safikhani
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON Canada
| | - Petr Smirnov
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Ontario Institute for Cancer Research, Toronto, ON Canada
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
- Department of Computer Science, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON Canada
- Ontario Institute for Cancer Research, Toronto, ON Canada
| | - Anna Goldenberg
- Department of Computer Science, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON Canada
|
30
|
Koras K, Juraeva D, Kreis J, Mazur J, Staub E, Szczurek E. Feature selection strategies for drug sensitivity prediction. Sci Rep 2020; 10:9377. [PMID: 32523056 PMCID: PMC7287073 DOI: 10.1038/s41598-020-65927-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/13/2019] [Accepted: 05/06/2020] [Indexed: 12/16/2022]
Abstract
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with the best performing example of Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are indicative for therapy design.
Affiliation(s)
- Krzysztof Koras
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Dilafruz Juraeva
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Julian Kreis
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Johanna Mazur
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Eike Staub
- Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
| | - Ewa Szczurek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland.
|
31
|
Wang H, Xi J, Wang M, Li A. Dual-Layer Strengthened Collaborative Topic Regression Modeling for Predicting Drug Sensitivity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:587-598. [PMID: 30106738 DOI: 10.1109/tcbb.2018.2864739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
An effective way to advance precision medicine in oncology is the systematic analysis of the drug sensitivity data that have emerged in recent years. Screening drug response in cancer cell lines provides valuable genomic and pharmacological data for high-accuracy prediction. Existing work primarily uses genomic or functional genomic features to classify or regress drug response. Here, by adapting and extending conventional merchandise recommendation methods, we introduce a model for drug sensitivity prediction based on dual-layer strengthened collaborative topic regression (DS-CTR). It incorporates both a graphical model that jointly learns drug and cell line features from pharmacogenomics data and a drug and cell line similarity network model that strengthens the correlation of the prediction results. Using the Genomics of Drug Sensitivity in Cancer (GDSC) project as the benchmark dataset, a 5-fold cross-validation experiment demonstrates that DS-CTR significantly improves drug response prediction performance compared with four categories of state-of-the-art algorithms, in terms of both the receiver operating characteristic (ROC) curve and the area under that curve (AUC). Literature evidence for previously unknown cell line-drug associations uncovered by DS-CTR further supports the model. The model may also make the discovery of new anti-cancer therapeutics in preclinical trials cheaper and faster.
|
32
|
Chen J, Zhang L. A survey and systematic assessment of computational methods for drug response prediction. Brief Bioinform 2020; 22:232-246. [PMID: 31927568 DOI: 10.1093/bib/bbz164] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/19/2022]
Abstract
Drug response prediction arises from both basic and clinical research of personalized therapy, as well as drug discovery for cancers. With gene expression profiles and other omics data being available for over 1000 cancer cell lines and tissues, different machine learning approaches have been applied to drug response prediction. These methods appear in a body of literature and have been evaluated on different datasets with only one or two accuracy metrics. We systematically assess 17 representative methods for drug response prediction, which have been developed in the past 5 years, on four large public datasets in nine metrics. This study provides insights and lessons for future research into drug response prediction.
|
33
|
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022]
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
| | - Samuel Kaski
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
|
34
|
Griffiths JI, Cohen AL, Jones V, Salgia R, Chang JT, Bild AH. Opportunities for improving cancer treatment using systems biology. Current Opinion in Systems Biology 2019; 17:41-50. [PMID: 32518857 DOI: 10.1016/j.coisb.2019.10.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Current cancer therapies target a limited set of tumor features, rather than considering the tumor as a whole. Systems biology aims to reveal therapeutic targets associated with a variety of facets in an individual's tumor, such as genetic heterogeneity and its evolution, cancer cell-autonomous phenotypes, and microenvironmental signaling. These disparate characteristics can be reconciled using mathematical modeling that incorporates concepts from ecology and evolution. This provides an opportunity to predict tumor growth and response to therapy, to tailor patient-specific approaches in real time or even prospectively. Importantly, as data regarding patient tumors is often available from only limited time points during treatment, systems-based approaches can address this limitation by interpolating longitudinal events within a principled framework. This review outlines areas in medicine that could benefit from systems biology approaches to deconvolve the complexity of cancer.
Affiliation(s)
- Jason I Griffiths
- Department of Mathematics, University of Utah, Salt Lake City, UT 84112, USA
| | - Adam L Cohen
- Huntsman Cancer Institute, Department of Internal Medicine, University of Utah, Salt Lake City, UT 84112, USA
| | - Veronica Jones
- Department of Surgery, City of Hope National Medical Center, Duarte, CA 91010, USA
| | - Ravi Salgia
- Department of Medical Oncology, City of Hope National Medical Center, Duarte, CA 91010, USA
| | - Jeffrey T Chang
- Department of Integrative Biology and Pharmacology, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Andrea H Bild
- Department of Medical Oncology, Division of Molecular Pharmacology, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA
|
35
|
Koromina M, Pandi MT, Patrinos GP. Rethinking Drug Repositioning and Development with Artificial Intelligence, Machine Learning, and Omics. OMICS: A Journal of Integrative Biology 2019; 23:539-548. [PMID: 31651216 DOI: 10.1089/omi.2019.0151] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 01/01/2023]
Abstract
The pharmaceutical industry and the art and science of drug development are sorely in need of novel transformative technologies in the current age of digital health and artificial intelligence (AI). Often described as game-changing technologies, AI and machine learning algorithms have slowly but surely begun to revolutionize the pharmaceutical industry and drug development over the past 5 years. In this expert review, we describe the most frequently used machine learning algorithms in drug development pipelines and the -omics databases well poised to support machine learning and drug discovery. Subsequently, we analyze the emerging new computational approaches to drug discovery, the in silico pipelines for drug repositioning, and the synergies among -omics system sciences, AI and machine learning. As with system sciences, AI and machine learning embody a system-scale and Big Data-driven vision for drug discovery and development. We conclude with a future outlook on the ways in which machine learning approaches can be implemented to buttress and expedite drug discovery and precision medicine. As AI and machine learning are rapidly entering the pharmaceutical industry and the art and science of drug development, we need to critically examine the attendant prospects and challenges to benefit patients and public health.
Affiliation(s)
- Maria Koromina
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
| | - Maria-Theodora Pandi
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
| | - George P Patrinos
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece; Department of Pathology, College of Medicine and Health Sciences, United Arab Emirates University, Al-Ain, Abu Dhabi; Zayed Center of Health Sciences, United Arab Emirates University, Al-Ain, Abu Dhabi
|
36
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022]
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis has helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the most successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predicting drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to a few hundred genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Gerardo Pepe
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Marco Pietrosanto
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Giulio Galvan
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Leonardo Galli
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Antonio Palmeri
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Celgene Institute for Translational Research Europe, Sevilla, Spain
| | - Marco Sciandrone
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Fabrizio Ferrè
- Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
| | - Gabriele Ausiello
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
|
37
|
Robert BM, Brindha GR, Santhi B, Kanimozhi G, Prasad NR. Computational models for predicting anticancer drug efficacy: A multi linear regression analysis based on molecular, cellular and clinical data of oral squamous cell carcinoma cohort. Computer Methods and Programs in Biomedicine 2019; 178:105-112. [PMID: 31416538 DOI: 10.1016/j.cmpb.2019.06.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/25/2019] [Revised: 04/15/2019] [Accepted: 06/11/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVES Computational prediction of drug responses based on the analysis of multiple clinical features of the tumor is a novel strategy for accomplishing the long-term goal of precision medicine in oncology. Cancer patients will benefit if all the tumor characteristics (data) are computationally accounted for when selecting the most effective and precise therapeutic drug. In this study, we developed and validated a few computational models to predict anticancer drug efficacy based on molecular, cellular and clinical features of a 31-sample oral squamous cell carcinoma (OSCC) cohort. METHODS We developed drug efficacy prediction models using multiple tumor features by employing three statistical methods: multi linear regression (MLR), modified MLR-weighted least square (MLR-WLS) and enhanced MLR-WLS. All three drug efficacy prediction models were then validated using the data of actual OSCC samples (train-test ratio 31:31) and actual vs. hypothetical samples (train-test ratio 31:30). The best statistical model, enhanced MLR-WLS, was then cross-validated (CV) using 341 theoretical tumor data. Finally, the performance of the models was assessed by the level of learning confidence, significance, accuracy and error terms. RESULTS The train-test process for the real tumor samples showed that the MLR-WLS method improved drug efficacy prediction, with very little difference between actual and predicted priming. There was a small difference between actual and predicted apoptotic priming for tumors 6, 8, 21 and 30, whereas for the remaining tumors there was no difference between predicted and actual priming data. The error terms (actual vs. predicted) also indicated the reliability of the enhanced MLR-WLS model for drug efficacy prediction.
CONCLUSION We developed effective computational prediction models using MLR analysis for anticancer drug efficacy, which will be useful in precision medicine for choosing a drug in a personalized manner. We observed that the enhanced MLR-WLS model was the best fit for predicting anticancer drug efficacy, which may have translational applications.
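Weighted least squares, the building block named in the METHODS, has the closed form β = (XᵀWX)⁻¹XᵀWy, where W is a diagonal matrix of per-sample weights. The pure-Python sketch below solves those normal equations for a small design matrix. The abstract does not say how the "modified" and "enhanced" variants choose their weights, so the weights, function names and toy data here are assumptions, not the authors' models.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def weighted_least_squares(X, y, w):
    """Return beta minimising sum_i w[i] * (y[i] - X[i] . beta)^2."""
    n, p = len(X), len(X[0])
    XtWX = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]
    XtWy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtWX, XtWy)


# Toy data: y = 1 + 2x for the first three samples, plus one gross outlier.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # intercept column, feature
y = [1.0, 3.0, 5.0, 100.0]
w = [1.0, 1.0, 1.0, 0.0]  # hypothetical weights: the outlier is ignored
print(weighted_least_squares(X, y, w))  # [1.0, 2.0]
```

Setting every weight to 1 reduces this to ordinary multiple linear regression, which is why down-weighting unreliable samples can change the fit so sharply in small cohorts.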
Affiliation(s)
- Beaulah Mary Robert
- Department of Biochemistry and Biotechnology, Annamalai University, Annamalainagar 608 002, Tamilnadu, India
| | - G R Brindha
- School of Computing, SASTRA Deemed to be University, Tirumalaisamudram, Thanjavur 613401, Tamilnadu, India.
| | - B Santhi
- School of Computing, SASTRA Deemed to be University, Tirumalaisamudram, Thanjavur 613401, Tamilnadu, India
| | - G Kanimozhi
- Department of Biochemistry, Dharmapuramn Gnanambigai Government Arts and Science College for Women, Mayiladuthurai, Tamilnadu, India
| | - Nagarajan Rajendra Prasad
- Department of Biochemistry and Biotechnology, Annamalai University, Annamalainagar 608 002, Tamilnadu, India.
|
38
|
Mannheimer JD, Duval DL, Prasad A, Gustafson DL. A systematic analysis of genomics-based modeling approaches for prediction of drug response to cytotoxic chemotherapies. BMC Med Genomics 2019; 12:87. [PMID: 31208429 PMCID: PMC6580596 DOI: 10.1186/s12920-019-0519-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/10/2018] [Accepted: 04/29/2019] [Indexed: 12/22/2022]
Abstract
BACKGROUND The availability and generation of large amounts of genomic data has led to the development of a new paradigm in cancer treatment emphasizing a precision approach at the molecular and genomic level. Statistical modeling techniques aimed at leveraging broad-scale in vitro, in vivo, and clinical data for precision drug treatment have become an active area of research. As a rapidly developing discipline at the crossroads of medicine, computer science, and mathematics, techniques ranging from well-established methods to those on the cutting edge of artificial intelligence have been utilized. Given the diversity and complexity of these techniques, a systematic understanding of fundamental modeling principles is essential to contextualize influential factors, to better understand results, and to develop new approaches. METHODS Using data available from the Genomics of Drug Sensitivity in Cancer (GDSC) and the NCI60, we explore principal components regression, linear and non-linear support vector regression, and artificial neural networks in combination with different implementations of correlation-based feature selection (CBF) on the prediction of drug response for several cytotoxic chemotherapeutic agents. RESULTS Our results indicate that the regression method and features used have marginal effects on the Spearman correlation between the predicted and measured values as well as on prediction error. Detailed analysis of these results reveals that the bulk relationship between tissue of origin and drug response is a major driving factor in model performance. CONCLUSION These results display one of the challenges in building predictive models for drug response in pan-cancer models: bulk genotypic traits, where the signal-to-noise ratio is high, are the dominant behavior captured in these models. This suggests that improved techniques of feature selection that can discriminate individual cell response from histotype response will yield more successful pan-cancer models.
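The point about tissue of origin driving pooled performance can be made concrete with the Spearman correlation the abstract uses as its metric. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, then compares a pooled score with within-group scores. The two "tissues" and all numbers are invented for illustration and are not taken from GDSC or NCI60.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. predicted responses for two tissues of origin.
measured = {"tissue_A": [1.0, 2.0, 3.0], "tissue_B": [11.0, 12.0, 13.0]}
predicted = {"tissue_A": [3.0, 2.0, 1.0], "tissue_B": [13.0, 12.0, 11.0]}

pooled = spearman(measured["tissue_A"] + measured["tissue_B"],
                  predicted["tissue_A"] + predicted["tissue_B"])
print(round(pooled, 3))  # 0.543
for tissue in measured:
    print(tissue, spearman(measured[tissue], predicted[tissue]))  # -1.0 each
```

In this toy case the predictions rank cell lines in exactly the wrong order within each tissue, yet the pooled correlation is clearly positive because the model separates the two tissues. Reporting within-tissue correlations alongside the pooled value is one simple way to expose that effect.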
Affiliation(s)
- Joshua D. Mannheimer
- School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
- Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
| | - Dawn L. Duval
- Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
- Department of Clinical Sciences, Colorado State University, Fort Collins, 80523 CO USA
| | - Ashok Prasad
- School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, 80523 CO USA
| | - Daniel L. Gustafson
- School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
- Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
- Department of Clinical Sciences, Colorado State University, Fort Collins, 80523 CO USA
- University of Colorado Cancer Center Developmental Therapeutics Program, University of Colorado, Aurora, 80045 CO USA
| |
|
39
|
Knowles DA, Bouchard G, Plevritis S. Sparse discriminative latent characteristics for predicting cancer drug sensitivity from genomic features. PLoS Comput Biol 2019; 15:e1006743. [PMID: 31136571 PMCID: PMC6555538 DOI: 10.1371/journal.pcbi.1006743] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/07/2017] [Revised: 06/07/2019] [Accepted: 12/21/2018] [Indexed: 01/28/2023] Open
Abstract
Drug screening studies typically involve assaying the sensitivity of a range of cancer cell lines across an array of anti-cancer therapeutics. Alongside these sensitivity measurements high dimensional molecular characterizations of the cell lines are typically available, including gene expression, copy number variation and genomic mutations. We propose a sparse multitask regression model which learns discriminative latent characteristics that predict drug sensitivity and are associated with specific molecular features. We use ideas from Bayesian nonparametrics to automatically infer the appropriate number of these latent characteristics. The resulting analysis couples high predictive performance with interpretability since each latent characteristic involves a typically small set of drugs, cell lines and genomic features. Our model uncovers a number of drug-gene sensitivity associations missed by single gene analyses. We functionally validate one such novel association: that increased expression of the cell-cycle regulator C/EBPδ decreases sensitivity to the histone deacetylase (HDAC) inhibitor panobinostat.
Affiliation(s)
- David A. Knowles
- Department of Radiology, Stanford University School of Medicine, Stanford, California, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Gina Bouchard
- Department of Radiology, Stanford University School of Medicine, Stanford, California, USA
| | - Sylvia Plevritis
- Department of Radiology, Stanford University School of Medicine, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
| |
|
40
|
Akay A, Hess H. Deep Learning: Current and Emerging Applications in Medicine and Technology. IEEE J Biomed Health Inform 2019; 23:906-920. [PMID: 30676989 DOI: 10.1109/jbhi.2019.2894713] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/07/2022]
Abstract
Machine learning is enabling researchers to analyze and understand increasingly complex physical and biological phenomena in traditional fields such as biology, medicine, and engineering and emerging fields like synthetic biology, automated chemical synthesis, and biomanufacturing. These fields require new paradigms toward understanding increasingly complex data and converting such data into medical products and services for patients. The move toward deep learning and complex modeling is an attempt to bridge the gap between acquiring massive quantities of complex data, and converting such data into practical insights. Here, we provide an overview of the field of machine learning, its current applications and needs in traditional and emerging fields, and discuss an illustrative attempt at using deep learning to understand swarm behavior of molecular shuttles.
|
41
|
Yan K, Fang X, Xu Y, Liu B. Protein fold recognition based on multi-view modeling. Bioinformatics 2019; 35:2982-2990. [DOI: 10.1093/bioinformatics/btz040] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/06/2018] [Revised: 12/29/2018] [Accepted: 01/16/2019] [Indexed: 12/22/2022] Open
Abstract
Motivation
Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem.
Results
In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ε-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The resulting ensemble method, called MT-fold, incorporates the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an updated and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Xiaozhao Fang
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
| | - Yong Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|
42
|
Zhang Y, Yang Y, Li T, Fujita H. A multitask multiview clustering algorithm in heterogeneous situations based on LLE and LE. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.10.001] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Indexed: 11/26/2022]
|
43
|
Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs. Molecules 2018; 23:molecules23092303. [PMID: 30201875 PMCID: PMC6225236 DOI: 10.3390/molecules23092303] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/12/2018] [Revised: 09/06/2018] [Accepted: 09/07/2018] [Indexed: 12/12/2022] Open
Abstract
Chinese herbal medicine has recently gained worldwide attention. The curative mechanism of Chinese herbal medicine is compared with that of western medicine at the molecular level. The treatment mechanism of most Chinese herbal medicines is still not clear. How do we integrate Chinese herbal medicine compounds with modern medicine? Methods for predicting the drug-like properties of Chinese herbal medicine compounds are particularly important. A growing number of compounds from Chinese herbal sources are now widely used as drug-like compound candidates. An important way for pharmaceutical companies to develop drugs is to discover potentially active compounds in related Chinese herbs. The methods for predicting the drug-like properties of Chinese herbal compounds include the virtual screening method, the pharmacophore model method and the machine learning method. In this paper, we focus on the prediction methods for the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of the above three methods, and then introduce the specific steps of the virtual screening method. Finally, we present the prospect of the joint application of various methods.
|
44
|
Ali M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 2018; 11:31-39. [PMID: 30097794 PMCID: PMC6381361 DOI: 10.1007/s12551-018-0446-z] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Received: 04/17/2018] [Accepted: 07/22/2018] [Indexed: 02/07/2023] Open
Abstract
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input "big data" require the application of dedicated data preprocessing pipelines that often lead to some loss of information and compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles along with knowledge about cancer pathways targeted by anti-cancer compounds when predicting their phenotypic responses.
Affiliation(s)
- Mehreen Ali
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, FI-20014, Turku, Finland
| |
|
45
|
Kalamara A, Tobalina L, Saez-Rodriguez J. How to find the right drug for each patient? Advances and challenges in pharmacogenomics. Current Opinion in Systems Biology 2018; 10:53-62. [PMID: 31763498 PMCID: PMC6855262 DOI: 10.1016/j.coisb.2018.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
Cancer is a highly heterogeneous disease with complex underlying biology. For these reasons, effective cancer treatment is still a challenge. Nowadays, it is clear that a cancer therapy that fits all the cases cannot be found, and as a result the design of therapies tailored to the patient's molecular characteristics is needed. Pharmacogenomics aims to study the relationship between an individual's genotype and drug response. Scientists use different biological models, ranging from cell lines to mouse models, as proxies for patients for preclinical and translational studies. The rapid development of "-omics" technologies is increasing the number of features that can be measured in these models, expanding the possibilities of finding predictive biomarkers of drug response. Finding these relationships requires diverse computational approaches ranging from machine learning to dynamic modeling. Despite major advances, we are still far from being able to precisely predict drug efficacy in cancer models, let alone directly in patients. We believe that the new experimental techniques and computational approaches covered in this review will bring us closer to this goal.
Affiliation(s)
- Angeliki Kalamara
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
| | - Luis Tobalina
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
| | - Julio Saez-Rodriguez
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Heidelberg University, Faculty of Medicine, Institute of Computational Biomedicine, Heidelberg, Germany
| |
|
46
|
Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ. Next-Generation Machine Learning for Biological Networks. Cell 2018; 173:1581-1592. [PMID: 29887378 DOI: 10.1016/j.cell.2018.05.015] [Citation(s) in RCA: 469] [Impact Index Per Article: 78.2] [Received: 10/10/2017] [Revised: 03/10/2018] [Accepted: 05/07/2018] [Indexed: 02/07/2023]
Abstract
Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology.
Affiliation(s)
- Diogo M Camacho
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Katherine M Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Brain & Cognitive Sciences and Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Rani K Powers
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - James C Costello
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
| | - James J Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| |
|
47
|
Chen FS, Jiang HY, Jiang Z. Prediction of drug–pathway interaction pairs with a disease-combined LSA-PU-KNN method. Molecular BioSystems 2017; 13:2583-2591. [DOI: 10.1039/c7mb00441a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/29/2023]
Abstract
This paper proposes a prediction of potential associations between drugs and pathways based on a disease-related LSA-PU-KNN method.
Affiliation(s)
- Fan-Shu Chen
- Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai 200262, China
| | - Hui-Yan Jiang
- Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai 200262, China
| | - Zhenran Jiang
- Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai 200262, China
| |
|