1
Matboli M, Abdelbaky I, Khaled A, Khaled R, Hamady S, Farid LM, Abouelkhair MB, El-Attar NE, Farag Fathallah M, Abd El Hamid MS, Elmakromy GM, Ali M. Machine learning based identification potential feature genes for prediction of drug efficacy in nonalcoholic steatohepatitis animal model. Lipids Health Dis 2024; 23:266. [PMID: 39182075 PMCID: PMC11344433 DOI: 10.1186/s12944-024-02231-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Accepted: 07/30/2024] [Indexed: 08/27/2024] Open
Abstract
BACKGROUND Nonalcoholic Steatohepatitis (NASH) results from complex liver conditions involving metabolic, inflammatory, and fibrogenic processes. Despite its burden, there is no Food and Drug Administration-approved therapy to date. PURPOSE Using machine learning (ML) algorithms, the study aims to identify reliable candidate genes that accurately predict treatment response in a NASH animal model, based on biochemical and molecular markers retrieved with bioinformatics techniques. METHODS NASH-induced rat models were administered various microbiome-targeted therapies and herbal drugs for 12 weeks; these drugs reduced hepatic lipid accumulation, liver inflammation, and histopathological changes. The ML model was trained and tested on the Histopathological NASH score (HPS): an HPS of 0-4 was considered improved NASH and an HPS of 5-8 non-improved, as confirmed by histopathological examination of the rats' livers. The model incorporates 34 features comprising 20 molecular markers (mRNAs, microRNAs, and long non-coding RNAs) and 14 biochemical markers that are highly enriched in NASH pathogenesis. Six different ML models were used to predict NASH improvement, with Gradient Boosting demonstrating the highest accuracy of 98% in predicting NASH drug response. FINDINGS Following a gradual reduction in features, the best performance was obtained with the Random Forest classifier, which yielded an accuracy of 98.4%. The principal selected molecular features included YAP1, LATS1, NF2, SRD5A3-AS1, FOXA2, TEAD2, miR-650, MMP14, ITGB1, and miR-6881-5P, while the biochemical markers comprised triglycerides (TG), ALT, ALP, total bilirubin (T. Bilirubin), alpha-fetoprotein (AFP), and low-density lipoprotein cholesterol (LDL-C).
CONCLUSION This study introduced an ML model incorporating 16 noninvasive features, including molecular and biochemical signatures, which achieved high performance and accuracy in detecting NASH improvement. This model could potentially be used as a diagnostic tool and to identify target therapies.
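Illustrative sketch (not taken from the paper): the abstract defines the prediction target by thresholding the HPS, with 0-4 counted as improved and 5-8 as non-improved, and reports accuracy against that label. The Python below shows one way such labeling and scoring could look. The function names and the example scores are hypothetical; only the two cut-off ranges come from the abstract.

```python
def label_from_hps(hps: int) -> str:
    """Map a histopathological NASH score (0-8) to a binary response label.

    The 0-4 / 5-8 split follows the abstract; the function name is hypothetical.
    """
    if not 0 <= hps <= 8:
        raise ValueError(f"HPS must be between 0 and 8, got {hps}")
    return "improved" if hps <= 4 else "non-improved"


def accuracy(true_labels, predicted_labels) -> float:
    """Fraction of predictions that match the histopathology-derived labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Made-up scores and predictions for illustration only, not data from the study.
scores = [2, 7, 4, 5]
truth = [label_from_hps(s) for s in scores]
predicted = ["improved", "non-improved", "non-improved", "non-improved"]
print(truth, accuracy(truth, predicted))
```

Any classifier (for example the Gradient Boosting or Random Forest models named in the abstract) would supply the `predicted` list; the sketch deliberately leaves the model itself out.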
Affiliation(s)
- Marwa Matboli
- Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Ibrahim Abdelbaky
- Artificial Intelligence Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha City, Egypt
- Abdelrahman Khaled
- Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
- Radwa Khaled
- Biotechnology/Biomolecular Chemistry Department, Faculty of Science, Cairo University, Cairo, Egypt
- Basic Sciences Department, Modern University for Technology and Information, Cairo, Egypt
- Laila M Farid
- Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Noha E El-Attar
- Information System Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha City, Egypt
- Faculty of Artificial Intelligence, Delta University for Science and Technology, Gamasa, 35712, Egypt
- Mohamed Farag Fathallah
- Medical Pathology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Medical Physiology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Manal S Abd El Hamid
- Medical Physiology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Gena M Elmakromy
- Endocrinology & Diabetes Mellitus Unit, Department of Internal Medicine, Badr University in Cairo, Badr City, Egypt
- Marwa Ali
- Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
2
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
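Illustrative sketch (not the authors' implementation): the selection strategies named in the abstract differ mainly in how they rank the untested cell lines. The Python below contrasts random, greedy, and uncertainty-based selection, plus one possible greedy-plus-uncertainty combination. It assumes that a model has already produced a predicted response score and an uncertainty estimate per cell line, and that a higher score means a more likely responder; all function names, the budget-splitting rule, and the example values are hypothetical.

```python
import random


def select_random(candidates, k, seed=0):
    """Baseline: pick k untested cell lines uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(candidates), k)


def select_greedy(predicted, k):
    """Pick the k cell lines with the highest predicted response score."""
    return sorted(predicted, key=predicted.get, reverse=True)[:k]


def select_uncertainty(uncertainty, k):
    """Pick the k cell lines whose predictions are least certain."""
    return sorted(uncertainty, key=uncertainty.get, reverse=True)[:k]


def select_greedy_plus_uncertainty(predicted, uncertainty, k):
    """Assumed combination rule: half the budget greedy, the rest most uncertain."""
    greedy = select_greedy(predicted, k // 2)
    remaining = {c: u for c, u in uncertainty.items() if c not in greedy}
    return greedy + select_uncertainty(remaining, k - len(greedy))


# Made-up model outputs for four untested cell lines (illustration only).
predicted = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.5}
uncertainty = {"A": 0.05, "B": 0.10, "C": 0.30, "D": 0.40}

print(select_greedy(predicted, 2))
print(select_uncertainty(uncertainty, 2))
print(select_greedy_plus_uncertainty(predicted, uncertainty, 2))
print(select_random(predicted, 2))
```

In an active learning loop, the selected cell lines would be screened, their measured responses added to the training data, and the model retrained before the next selection round.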
Affiliation(s)
- Priyanka Vasanthakumari
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
- Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
3
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Shukla M, Xia F, Clyde A, Vasanthakumari P, Doroshow JH, Stevens RL. Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models. Cancers (Basel) 2023; 16:50. [PMID: 38201477 PMCID: PMC10777918 DOI: 10.3390/cancers16010050] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/03/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024] Open
Abstract
Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
Affiliation(s)
- Oleksandr Narykov
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yvonne A. Evrard
- Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Alexander Partin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Austin Clyde
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
- Priyanka Vasanthakumari
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- James H. Doroshow
- Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
4
Guo J, Hu J, Zheng Y, Zhao S, Ma J. Artificial intelligence: opportunities and challenges in the clinical applications of triple-negative breast cancer. Br J Cancer 2023; 128:2141-2149. [PMID: 36871044 PMCID: PMC10241896 DOI: 10.1038/s41416-023-02215-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/13/2022] [Revised: 02/08/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Triple-negative breast cancer (TNBC) accounts for 15-20% of all invasive breast cancers. Owing to its clinical characteristics, such as the lack of effective therapeutic targets, high invasiveness, and high recurrence rate, TNBC is difficult to treat and has a poor prognosis. Currently, with the accumulation of large amounts of medical data and the development of computing technology, artificial intelligence (AI), particularly machine learning, has been applied to various aspects of TNBC research, including early screening, diagnosis, identification of molecular subtypes, personalised treatment, and prediction of prognosis and treatment response. In this review, we discuss the general principles of artificial intelligence, summarise its main applications in the diagnosis and treatment of TNBC, and provide new ideas and a theoretical basis for the clinical diagnosis and treatment of TNBC.
Affiliation(s)
- Jiamin Guo
- Department of Medical Oncology, West China Hospital, Sichuan University, 610041, Chengdu, Sichuan Province, P. R. China
- Junjie Hu
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, 610065, Chengdu, Sichuan Province, P. R. China
- Yichen Zheng
- Department of Medical Oncology, West China Hospital, Sichuan University, 610041, Chengdu, Sichuan Province, P. R. China
- Shuang Zhao
- Department of Radiology, West China Hospital of Sichuan University, 610041, Chengdu, Sichuan Province, P. R. China
- Ji Ma
- Department of Medical Oncology, West China Hospital, Sichuan University, 610041, Chengdu, Sichuan Province, P. R. China
5
Weng S, Hu D, Chen J, Yang Y, Peng D. Prediction of Fatty Liver Disease in a Chinese Population Using Machine-Learning Algorithms. Diagnostics (Basel) 2023; 13:1168. [PMID: 36980476 PMCID: PMC10047083 DOI: 10.3390/diagnostics13061168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 03/13/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND Fatty liver disease (FLD) is an important risk factor for liver cancer and cardiovascular disease and can lead to significant social and economic burden. However, there is currently no nationwide epidemiological survey for FLD in China, making early FLD screening crucial for the Chinese population. Unfortunately, liver biopsy and abdominal ultrasound, the preferred methods for FLD diagnosis, are not practical for primary medical institutions. Therefore, the aim of this study was to develop machine learning (ML) models for screening individuals at high risk of FLD, and to provide a new perspective on early FLD diagnosis. METHODS This study included a total of 30,574 individuals between the ages of 18 and 70 who completed abdominal ultrasound and the related clinical examinations. Among them, 3474 individuals were diagnosed with FLD by abdominal ultrasound. We used 11 indicators to build eight classification models to predict FLD. The model prediction ability was evaluated by the area under the curve, sensitivity, specificity, positive predictive value, negative predictive value, and kappa value. Feature importance analysis was assessed by Shapley value or root mean square error loss after permutations. RESULTS Among the eight ML models, the prediction accuracy of the extreme gradient boosting (XGBoost) model was highest at 89.77%. By feature importance analysis, we found that the body mass index, triglyceride, and alanine aminotransferase play important roles in FLD prediction. CONCLUSION XGBoost improves the efficiency and cost of large-scale FLD screening.
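Illustrative sketch (not from the paper): most of the evaluation measures listed in the abstract can be derived from a single 2x2 confusion matrix. The Python below computes sensitivity, specificity, positive and negative predictive value, and Cohen's kappa from the four cell counts. The function name and the example counts are hypothetical, and the area under the curve is omitted because it needs ranked prediction scores rather than counts.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Confusion-matrix metrics for a binary screening model.

    Assumes every row and column of the 2x2 table is non-empty, so no
    denominator below is zero.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement between model and reference
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Made-up counts for illustration only, not results from the study.
metrics = screening_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a strongly imbalanced cohort such as the one described (3474 FLD cases among 30,574 individuals), accuracy alone can look high for a model that rarely predicts the positive class, which is one reason to report the predictive values and kappa alongside it.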
Affiliation(s)
- Shuwei Weng
- Department of Cardiovascular Medicine, The Second Xiangya Hospital, Central South University, Changsha 410011, China
- Research Institute of Blood Lipid and Atherosclerosis, Central South University, Changsha 410011, China
- Die Hu
- Department of Cardiovascular Medicine, The Second Xiangya Hospital, Central South University, Changsha 410011, China
- Research Institute of Blood Lipid and Atherosclerosis, Central South University, Changsha 410011, China
- Jin Chen
- Department of Cardiovascular Medicine, The Second Xiangya Hospital, Central South University, Changsha 410011, China
- Research Institute of Blood Lipid and Atherosclerosis, Central South University, Changsha 410011, China
- Yanyi Yang
- Health Management Center, The Second Xiangya Hospital, Central South University, Changsha 410011, China
- Daoquan Peng
- Department of Cardiovascular Medicine, The Second Xiangya Hospital, Central South University, Changsha 410011, China
- Research Institute of Blood Lipid and Atherosclerosis, Central South University, Changsha 410011, China
6
Singh DP, Kaushik B. A systematic literature review for the prediction of anticancer drug response using various machine-learning and deep-learning techniques. Chem Biol Drug Des 2023; 101:175-194. [PMID: 36303299 DOI: 10.1111/cbdd.14164] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/02/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
Computational methods have gained prominence in healthcare research. The accessibility of healthcare data has greatly encouraged academicians and researchers to develop implementations that help in the prognosis of cancer drug response. Among various computational methods, machine-learning (ML) and deep-learning (DL) methods provide the most consistent and effectual approaches to handle the serious aftermaths of the deadly disease and the drugs administered to patients. Hence, this systematic literature review has reviewed studies that have investigated drug discovery and prognosis of anticancer drug response using ML and DL algorithms. For this purpose, PRISMA guidelines have been followed to choose research papers from the Google Scholar, PubMed, and ScienceDirect websites. A total of 105 papers that align with the context of this review were chosen. Further, the review also presents the accuracy of the existing ML and DL methods in the prediction of anticancer drug response. The review finds that, despite the availability of various studies, there are certain challenges associated with each method. Thus, future researchers can consider these limitations and challenges to develop a prominent anticancer drug response prediction method, which would be greatly beneficial to medical professionals in administering non-invasive treatment to patients.
Affiliation(s)
- Davinder Paul Singh
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
- Baijnath Kaushik
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
7
Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022; 10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also posed the question of how to compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, but their usefulness is strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages that can be generally classified as either flexible or predefined format harmonizers have been proposed, but none has to date become the gold standard for unification of this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced not only for comparisons of limited transcriptomic datasets. Instead, instruments have been proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable computational cost. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we attempted to overview the landscape of modern approaches and methods in transcriptomic harmonization and focused on the practical aspects of their application.
Affiliation(s)
- Nicolas Borisov
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Anton Buzdin
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
8
Sinha K, Ghosh J, Sil PC. Machine Learning in Drug Metabolism Study. Curr Drug Metab 2022; 23:1012-1026. [PMID: 36578255 DOI: 10.2174/1389200224666221227094144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 10/27/2022] [Accepted: 11/01/2022] [Indexed: 12/30/2022]
Abstract
Metabolic reactions in the body transform the administered drug into metabolites. These metabolites exhibit diverse biological activities. Drug metabolism is the major underlying cause of drug overdose-related toxicity, adverse drug effects and the drug's reduced efficacy. Though metabolic reactions deactivate a drug, drug metabolites are often considered pivotal agents for off-target effects or toxicity. On the other hand, in combination drug therapy, one drug may influence another drug's metabolism and clearance, which is considered one of the primary causes of drug-drug interactions. Today, with the advancement of machine learning, the metabolic fate of a drug candidate can be comprehensively studied throughout the drug development procedure. Naïve Bayes, Logistic Regression, k-Nearest Neighbours, Decision Trees, different Boosting and Ensemble methods, Support Vector Machines and Artificial Neural Network boosted Deep Learning are some of the machine learning algorithms that are being extensively used in such studies. Such tools cover several attributes of drug metabolism, with an emphasis on the prediction of drug-drug interactions, drug-target interactions, clinical drug responses, metabolite predictions, sites of metabolism, etc. These reports are crucial for evaluating metabolic stability and predicting prospective drug-drug interactions, and can help pharmaceutical companies accelerate the drug development process in a less resource-demanding manner than what in vitro studies offer. They could also help medical practitioners use combinatorial drug therapy in a more resourceful manner. Also, with the help of the enormous growth of deep learning, traditional fields of computational drug development like molecular interaction fields, molecular docking, quantitative structure-to-activity relationship (QSAR) studies and quantum mechanical simulations are producing results which were unimaginable a couple of years back.
This review provides a glimpse of a few contextually relevant machine learning algorithms and then focuses on their outcomes in different studies.
Affiliation(s)
- Krishnendu Sinha
- Department of Zoology, Jhargram Raj College, Jhargram-721507, India
- Jyotirmoy Ghosh
- Department of Chemistry, Banwarilal Bhalotia College, Asansol-713303, India
- Parames Chandra Sil
- Department of Division of Molecular Medicine, Bose Institute, Kolkata-700054, India
9
Nwanosike EM, Conway BR, Merchant HA, Hasan SS. Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review. Int J Med Inform 2021; 159:104679. [PMID: 34990939 DOI: 10.1016/j.ijmedinf.2021.104679] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/29/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/11/2022]
Abstract
PURPOSE The advent of clinically adapted machine learning algorithms can solve numerous problems ranging from disease diagnosis and prognosis to therapy recommendations. This systematic review examines the performance of machine learning (ML) algorithms and evaluates the progress made to date towards their implementation in clinical practice. METHODS Databases (PubMed, MEDLINE, Scopus, Google Scholar, Cochrane Library and WHO Covid-19 database) were systematically searched to identify original articles published between January 2011 and October 2021. Studies reporting ML techniques in clinical practice involving humans and ML algorithms with a performance metric were considered. RESULTS Of 873 unique articles identified, 36 studies were eligible for inclusion. The XGBoost (extreme gradient boosting) algorithm showed the highest potential for clinical applications (n = 7 studies); this was followed jointly by the random forest algorithm, logistic regression, and the support vector machine (n = 5 studies each). Prediction of outcomes (n = 33), in particular for inflammatory diseases (n = 7), received the most attention, followed by cancer and neuropsychiatric disorders (n = 5 for each) and Covid-19 (n = 4). Thirty-three of the thirty-six included studies passed more than 50% of the selected quality assessment criteria in the TRIPOD checklist. In contrast, none of the studies could achieve an ideal overall bias rating of 'low' based on the PROBAST checklist. Moreover, only three studies showed evidence of the deployment of ML algorithm(s) in clinical practice. CONCLUSIONS ML is potentially a reliable tool for clinical decision support. Although advocated widely in clinical practice, work is still in progress to validate clinically adapted ML algorithms. Improving quality standards, transparency, and interpretability of ML models will further lower the barriers to acceptability.
Affiliation(s)
- Ezekwesiri Michael Nwanosike
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Barbara R Conway
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Hamid A Merchant
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Syed Shahzad Hasan
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom; School of Biomedical Sciences & Pharmacy, University of Newcastle, Callaghan, Australia
10
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of machine learning and deep learning algorithms in the recent decade has spurred an outburst of their applications in many research fields. In the chemistry domain, machine learning has been widely used to aid in drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, etc. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which are crucial to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review will serve as a brief introduction to the broad field of molecular representations.
Affiliation(s)
- Xin An
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xi Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
11
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022] Open
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has acquired an important computer science component, with the use of machine learning techniques skyrocketing as they have become widely accessible. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage drastically reduces costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field will give us an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. This review will focus mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemical Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Jose Liñares-Blanco
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
- Nereida Rodríguez-Fernández
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco Cedrón
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco J. Novoa
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Adrian Carballal
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Victor Maojo
  - Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
|
12
|
Venezian Povoa L, Ribeiro CHC, da Silva IT. Machine learning predicts treatment sensitivity in multiple myeloma based on molecular and clinical information coupled with drug response. PLoS One 2021; 16:e0254596. [PMID: 34320000 PMCID: PMC8318243 DOI: 10.1371/journal.pone.0254596] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/26/2020] [Accepted: 06/29/2021] [Indexed: 11/18/2022] Open
Abstract
Providing treatment sensitivity stratification at the time of cancer diagnosis allows better allocation of patients to alternative treatment options. Despite many clinical and biological risk markers having been associated with variable survival in cancer, assessing the interplay of these markers through Machine Learning (ML) algorithms still remains to be fully explored. Here, we present a Multi Learning Training approach (MuLT) combining supervised, unsupervised and self-supervised learning algorithms, to examine the predictive value of heterogeneous treatment outcomes for Multiple Myeloma (MM). We show that gene expression values improve the treatment sensitivity prediction and recapitulate genetic abnormalities detected by Fluorescence in situ hybridization (FISH) testing. MuLT performance was assessed by cross-validation experiments, in which it predicted treatment sensitivity with an AUC of 68.70%. Finally, simulations showed numerical evidence that, on average, 17.07% of patients could respond better to a different first-line treatment.
Affiliation(s)
- Lucas Venezian Povoa
  - Aeronautics Institute of Technology (ITA), Bioengineering Lab, São José dos Campos, Brazil
  - Aeronautics Institute of Technology (ITA), Computer Science Division, São José dos Campos, Brazil
  - AC Camargo Cancer Center (ACCCC), International Research and Educational Center, São Paulo, Brazil
  - Federal Institute for Education, Science, and Technology of São Paulo (IFPS), Jacarei, Brazil
- Carlos Henrique Costa Ribeiro
  - Aeronautics Institute of Technology (ITA), Bioengineering Lab, São José dos Campos, Brazil
  - Aeronautics Institute of Technology (ITA), Computer Science Division, São José dos Campos, Brazil
- Israel Tojal da Silva
  - AC Camargo Cancer Center (ACCCC), International Research and Educational Center, São Paulo, Brazil
|
13
|
Shanbhogue H M, Thirumaleshwar S, Kumar Tm P, Kumar S H. Artificial Intelligence in Pharmaceutical Field - A Critical Review. Curr Drug Deliv 2021; 18:1456-1466. [PMID: 34139981 DOI: 10.2174/1567201818666210617100613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Revised: 04/09/2021] [Accepted: 04/17/2021] [Indexed: 12/15/2022]
Abstract
Artificial intelligence is an emerging sector in almost all fields. It is not confined only to a particular category and can be used in various fields like research, technology, and health. AI mainly concentrates on how computers analyze data and mimic the human thought process. As drug development involves high R & D costs and uncertainty in time consumption, artificial intelligence can serve as one of the promising solutions to overcome all these demerits. Due to the availability of enormous data, there are chances of missing out on some crucial details. For solving these issues, algorithms like machine learning, deep learning, and other expert systems are being used. On successful implementation of AI in the pharmaceutical field, the delays in drug development, and failure at the clinical and marketing level can be reduced. This review comprises information regarding the development of AI, its subfields, its overall implementation, and its application in the pharmaceutical sector and provides insights on challenges and limitations concerning AI.
Affiliation(s)
- Maithri Shanbhogue H
  - Department of Pharmaceutics, Industrial Pharmacy Group, JSS College of Pharmacy, Mysuru JSS Academy of Higher Education and Research Sri Shivarathreeshwara Nagara, Mysuru - 570015, Karnataka, India
- Shailesh Thirumaleshwar
  - Department of Pharmaceutics, Industrial Pharmacy Group, JSS College of Pharmacy, Mysuru JSS Academy of Higher Education and Research Sri Shivarathreeshwara Nagara, Mysuru - 570015, Karnataka, India
- Pramod Kumar Tm
  - Department of Pharmaceutics, Industrial Pharmacy Group, JSS College of Pharmacy, Mysuru JSS Academy of Higher Education and Research Sri Shivarathreeshwara Nagara, Mysuru - 570015, Karnataka, India
- Hemanth Kumar S
  - Department of Pharmaceutics, Industrial Pharmacy Group, JSS College of Pharmacy, Mysuru JSS Academy of Higher Education and Research Sri Shivarathreeshwara Nagara, Mysuru - 570015, Karnataka, India
|
14
|
Borisov N, Sergeeva A, Suntsova M, Raevskiy M, Gaifullin N, Mendeleeva L, Gudkov A, Nareiko M, Garazha A, Tkachev V, Li X, Sorokin M, Surin V, Buzdin A. Machine Learning Applicability for Classification of PAD/VCD Chemotherapy Response Using 53 Multiple Myeloma RNA Sequencing Profiles. Front Oncol 2021; 11:652063. [PMID: 33937058 PMCID: PMC8083158 DOI: 10.3389/fonc.2021.652063] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/11/2021] [Accepted: 03/19/2021] [Indexed: 12/17/2022] Open
Abstract
Multiple myeloma (MM) affects ~500,000 people and results in ~100,000 deaths annually, being currently considered treatable but incurable. There are several MM chemotherapy treatment regimens, among which eleven include bortezomib, a proteasome-targeted drug. MM patients respond differently to bortezomib, and new prognostic biomarkers are needed to personalize treatments. However, there is a shortage of clinically annotated MM molecular data that could be used to establish novel molecular diagnostics. We report new RNA sequencing profiles for 53 MM patients annotated with responses to two similar chemotherapy regimens: bortezomib, doxorubicin, dexamethasone (PAD), and bortezomib, cyclophosphamide, dexamethasone (VCD), or with responses to their combinations. Fourteen patients received both PAD and VCD; six received only PAD, and 33 received only VCD. We compared profiles for the good and poor responders and found five genes commonly regulated here and in the previous datasets for other bortezomib regimens (all upregulated in the good responders): FGFR3, MAF, IGHA2, IGHV1-69, and GRB14. Four of these genes are linked with known immunoglobulin locus rearrangements. We then used five machine learning (ML) methods to build a classifier distinguishing good and poor responders for two cohorts: PAD + VCD (53 patients), and separately VCD (47 patients). We showed that the application of FloWPS dynamic data trimming was beneficial for all ML methods tested in both cohorts, and also in the previous MM bortezomib datasets. However, the ML models built for the different datasets did not allow cross-transferring, which may be due to different treatment regimens, experimental profiling methods, and MM heterogeneity.
Affiliation(s)
- Nicolas Borisov
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Anna Sergeeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Maria Suntsova
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
- Mikhail Raevskiy
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Nurshat Gaifullin
  - Department of Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
- Larisa Mendeleeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Alexander Gudkov
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
- Maria Nareiko
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Andrew Garazha
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Victor Tkachev
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Xinmin Li
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Maxim Sorokin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Vadim Surin
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Anton Buzdin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States
|
15
|
Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020; 13:111. [PMID: 32948183 PMCID: PMC7499993 DOI: 10.1186/s12920-020-00759-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Accepted: 07/27/2020] [Indexed: 12/18/2022] Open
Abstract
Background: Machine learning (ML) methods still have limited applicability in personalized oncology due to low numbers of available clinically annotated molecular profiles. This doesn’t allow sufficient training of ML classifiers that could be used for improving molecular diagnostics. Methods: We reviewed published datasets of high throughput gene expression profiles corresponding to cancer patients with known responses to chemotherapy treatments. We browsed Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. Results: We identified data collections suitable to build ML models for predicting responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All the datasets identified were checked for ML applicability and robustness with leave-one-out cross validation. Twenty-three datasets that had balanced numbers of treatment responder and non-responder cases were found suitable for ML. Conclusions: We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Among them, seven datasets included RNA sequencing data (for 645 cases) and the others included microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.
|
16
|
Tkachev V, Sorokin M, Borisov C, Garazha A, Buzdin A, Borisov N. Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology. Int J Mol Sci 2020; 21:ijms21030713. [PMID: 31979006 PMCID: PMC7037338 DOI: 10.3390/ijms21030713] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/23/2019] [Revised: 01/16/2020] [Accepted: 01/17/2020] [Indexed: 12/21/2022] Open
Abstract
(1) Background: Machine learning (ML) methods are rarely used for an omics-based prescription of cancer drugs, due to shortage of case histories with clinical outcome supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods, including linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments for 21 high throughput gene expression datasets (41–235 samples per dataset) totally representing 1778 cancer patients with known responses on chemotherapy treatments. FloWPS essentially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP), where the area under the receiver-operator curve (ROC AUC) for the treatment response classifiers increased from 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
Affiliation(s)
- Victor Tkachev
  - OmicsWayCorp, Walnut, CA 91788, USA; (V.T.); (M.S.); (A.G.)
- Maxim Sorokin
  - OmicsWayCorp, Walnut, CA 91788, USA; (V.T.); (M.S.); (A.G.)
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Constantin Borisov
  - National Research University—Higher School of Economics, 101000 Moscow, Russia;
- Andrew Garazha
  - OmicsWayCorp, Walnut, CA 91788, USA; (V.T.); (M.S.); (A.G.)
- Anton Buzdin
  - OmicsWayCorp, Walnut, CA 91788, USA; (V.T.); (M.S.); (A.G.)
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
- Nicolas Borisov
  - OmicsWayCorp, Walnut, CA 91788, USA; (V.T.); (M.S.); (A.G.)
  - Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
  - Correspondence: Tel.: +7-903-218-7261
|
17
|
Turki T, Taguchi YH. Machine learning algorithms for predicting drugs–tissues relationships. Expert Systems with Applications 2019. [DOI: 10.1016/j.eswa.2019.02.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/23/2023]
|