1
Akyüz K, Cano Abadía M, Goisauf M, Mayrhofer MT. Unlocking the potential of big data and AI in medicine: insights from biobanking. Front Med (Lausanne) 2024; 11:1336588. [PMID: 38357641 PMCID: PMC10864616 DOI: 10.3389/fmed.2024.1336588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 01/19/2024] [Indexed: 02/16/2024]
Abstract
Big data and artificial intelligence are key elements in the medical field, as they are expected to improve accuracy and efficiency in diagnosis and treatment, particularly by identifying biomedically relevant patterns and thereby facilitating progress towards individually tailored preventative and therapeutic interventions. These applications belong to current data-intensive research practice. While combining imaging, pathological, genomic, and clinical data is needed to train algorithms that realize the full potential of these technologies, biobanks often serve as crucial infrastructures for data sharing and data flows. In this paper, we argue that the 'data turn' in the life sciences has increasingly restructured major infrastructures, which were often created for biological samples and associated data, into predominantly data infrastructures. These have evolved and diversified over time in tackling relevant issues such as harmonization and standardization, but also consent practices and risk assessment. In line with this datafication, an increased use of AI-based technologies marks the current developments at the forefront of big data research in life science and medicine, engendering new issues and concerns along with opportunities. At a time when secure health data environments, such as the European Health Data Space, are in the making, we argue that such meta-infrastructures can benefit both from the experience and evolution of biobanking and from the current state of affairs in AI in medicine regarding good governance, social aspects and practices, and critical thinking about data practices, all of which can contribute to the trustworthiness of such meta-infrastructures.
Affiliation(s)
- Kaya Akyüz
- Department of ELSI Services and Research, BBMRI-ERIC, Graz, Austria
2
Salama V, Godinich B, Geng Y, Humbert-Vidan L, Maule L, Wahid KA, Naser MA, He R, Mohamed ASR, Fuller CD, Moreno AC. Artificial Intelligence and Machine Learning in Cancer Related Pain: A Systematic Review. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.06.23299610. [PMID: 38105979 PMCID: PMC10723503 DOI: 10.1101/2023.12.06.23299610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Background/objective Pain is a challenging, multifaceted symptom reported by most cancer patients, resulting in a substantial burden on both patients and healthcare systems. This systematic review aims to explore applications of artificial intelligence/machine learning (AI/ML) in predicting pain-related outcomes and supporting decision-making in cancer pain management. Methods A comprehensive search of the Ovid MEDLINE, EMBASE and Web of Science databases was conducted for studies published up to September 7, 2023, using terms including "Cancer", "Pain", "Pain Management", "Analgesics", "Opioids", "Artificial Intelligence", "Machine Learning", "Deep Learning", and "Neural Networks". Screening was performed using the Covidence screening tool. Only original studies conducted in human cohorts were included. AI/ML models, their validation and performance, and adherence to TRIPOD guidelines were summarized from the final included studies. Results This systematic review included 44 studies from 2006-2023. Most studies were prospective and uni-institutional. The number of AI/ML studies in cancer pain increased over the last 4 years. Nineteen studies used AI/ML to classify cancer patients' pain development after cancer therapy, with a median AUC of 0.80 (range 0.76-0.94). Eighteen studies focused on cancer pain research, with a median AUC of 0.86 (range 0.50-0.99), and 7 focused on applying AI/ML to cancer pain management decisions, with a median AUC of 0.71 (range 0.47-0.89). Multiple ML models were investigated, with a median AUC of 0.77 across all models in all studies. Random forest models demonstrated the highest performance (median AUC 0.81), lasso models had the highest median sensitivity (1), and Support Vector Machine had the highest median specificity (0.74). Overall adherence of included studies to TRIPOD guidelines was 70.7%. Most included studies lacked external validation (14%) and clinical application (23%). Reporting of model calibration was also missing in the majority of studies (5%). Conclusion Implementation of various novel AI/ML tools promises significant advances in the classification, risk stratification, and management decisions for cancer pain. These advanced tools will integrate big health-related data for personalized pain management in cancer patients. Further research focusing on model calibration and rigorous external clinical validation in real healthcare settings is imperative to ensure their practical and reliable application in clinical practice.
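The review summarizes model discrimination as median AUC values. As a minimal sketch of what a single AUC measures, the Python function below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation, with ties counted as one half). The function name and the toy scores are illustrative assumptions, not data or code from the review.

```python
from statistics import median


def auc_mann_whitney(scores, labels):
    """Return the AUC of `scores` for binary `labels` (1 = positive, 0 = negative).

    Every positive case is compared with every negative case: a correctly
    ordered pair counts 1, a tie counts 0.5, and the total is divided by the
    number of positive-negative pairs. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted pain risks for five patients (1 = developed pain).
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = auc_mann_whitney(scores, labels)  # 5 of 6 positive-negative pairs are ordered correctly

# A review-level summary is then the median over per-study AUCs (hypothetical values).
study_aucs = [0.76, 0.80, 0.94]
print(round(auc, 3), median(study_aucs))
```

The pairwise loop is chosen for readability; for large cohorts a rank-based computation gives the same value more efficiently.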
3
Lustberg M, Wu X, Fernández-Martínez JL, de Andrés-Galiana EJ, Philips S, Leibowitz J, Schneider B, Sonis S. Leveraging GWAS data derived from a large cooperative group trial to assess the risk of taxane-induced peripheral neuropathy (TIPN) in patients being treated for breast cancer: Part 2-functional implications of a SNP cluster associated with TIPN risk in patients being treated for breast cancer. Support Care Cancer 2023; 31:178. [PMID: 36809570 DOI: 10.1007/s00520-023-07617-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 01/28/2023] [Indexed: 02/23/2023]
Abstract
INTRODUCTION Using GWAS data derived from a large collaborative trial (ECOG-5103), we identified a cluster of 267 SNPs that predicted CIPN in treatment-naive patients, as reported in Part 1 of this study. To assess the functional and pathological implications of this set, we identified collective gene signatures and evaluated their informational value in defining CIPN's pathogenesis. METHODS In Part 1, we analyzed GWAS data derived from ECOG-5103, first identifying the SNPs most strongly associated with CIPN using Fisher's ratio. After identifying the SNPs that differentiated CIPN-positive from CIPN-negative phenotypes, we ranked them by discriminatory power to produce the SNP cluster that provided the highest predictive accuracy using leave-one-out cross-validation (LOOCV). An uncertainty analysis was included. Using the best predictive SNP cluster, we performed gene attribution for each SNP using the NCBI Phenotype-Genotype Integrator and then assessed functionality by applying GeneAnalytics, Gene Set Enrichment Analysis, and PCViz. RESULTS Using aggregate data derived from the GWAS, we identified a 267-SNP cluster that was associated with a CIPN+ phenotype with an accuracy of 96.1%. We could attribute 173 genes to the 267-SNP cluster. Six long intergenic non-protein coding genes were excluded. Ultimately, the functional analysis was based on 138 genes. Of the 17 pathways identified by GeneAnalytics (GA) software, the irinotecan pharmacokinetic pathway had the highest score. Highly matching gene ontology attributions included flavone metabolic process, flavonoid glucuronidation, xenobiotic glucuronidation, nervous system development, UDP glycosyltransferase activity, retinoic acid binding, protein kinase C binding, and glucuronosyltransferase activity. Gene Set Enrichment Analysis (GSEA) GO terms identified neuron-associated genes as most significant (p = 5.45e-10). Consistent with the GA output, flavone- and flavonoid-associated terms and glucuronidation were noted, as were GO terms associated with neurogenesis. CONCLUSION The application of functional analyses to phenotype-associated SNP clusters provides an independent validation step in assessing the clinical meaningfulness of GWAS-derived data. Functional analyses following gene attribution of a CIPN-predictive SNP cluster identified pathways, gene ontology terms, and a network that were consistent with a neuropathic phenotype.
Affiliation(s)
- Xuan Wu
- Harvard School of Dental Medicine, Boston, MA, USA
- Santosh Philips
- Indiana University School of Medicine, Indianapolis, Indiana, USA
- Jeffrey Leibowitz
- Primary Endpoint Solutions, Waltham, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Bryan Schneider
- Indiana University School of Medicine, Indianapolis, Indiana, USA
- Stephen Sonis
- Harvard School of Dental Medicine, Boston, MA, USA; Primary Endpoint Solutions, Waltham, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA
4
Identification of a SNP cluster associated with taxane-induced peripheral neuropathy risk in patients being treated for breast cancer using GWAS data derived from a large cooperative group trial. Support Care Cancer 2023; 31:139. [PMID: 36707490 DOI: 10.1007/s00520-023-07595-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 01/16/2023] [Indexed: 01/29/2023]
Abstract
BACKGROUND Chemotherapy-induced peripheral neuropathy (CIPN) is a common toxicity of taxanes for which there is no effective intervention. Genomic CIPN risk determination has yielded promising but inconsistent results. The present study assessed the utility of a collective SNP cluster, identified using novel analytics, to describe taxane-associated CIPN risk. METHODS We analyzed GWAS data derived from ECOG-5103, first identifying the SNPs most strongly associated with CIPN using Fisher's ratio (FR). We then rank-ordered the SNPs that discriminated CIPN-positive (CIPN+) from CIPN-negative phenotypes by discriminatory power and developed the SNP cluster that provided the highest predictive accuracy using leave-one-out cross-validation (LOOCV). RESULTS Using aggregated genotype data obtained from the previously reported ECOG-5103 clinical trial (in which two different arrays were used, HumanOmniExpress (727,227 SNPs) and HumanOmni1-Quad1 (1,131,857 SNPs)), we identified a 267-SNP cluster that was associated with a CIPN+ phenotype with an accuracy of 96.1%. CONCLUSIONS A cluster of SNPs was identified that prospectively discriminated patients most likely to develop symptomatic CIPN following taxane exposure as part of a breast cancer chemotherapy regimen. Validation using an independent patient cohort should be performed.
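The METHODS describe a two-step procedure: rank SNPs by Fisher's ratio, then keep the size of top-ranked cluster that maximizes leave-one-out accuracy. The sketch below illustrates that idea on toy genotypes coded as minor-allele counts (0, 1, 2). It assumes a nearest-centroid classifier, because the abstract does not name the classifier, and it ranks SNPs once on the full data set as the abstract's wording suggests; a stricter protocol would re-rank inside each LOOCV fold to avoid optimistic accuracy estimates. All function names and data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Fisher's ratio (mu1 - mu0)^2 / (var1 + var0) for one SNP across patients."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    between = (mean(cases) - mean(controls)) ** 2
    within = pvariance(cases) + pvariance(controls)
    if within == 0:
        # Zero spread in both groups: perfect separation ranks first, a constant SNP scores 0.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_snps(genotypes, labels):
    """Return SNP column indices sorted by decreasing Fisher's ratio."""
    n_snps = len(genotypes[0])
    ratios = [fisher_ratio([row[j] for row in genotypes], labels) for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: ratios[j], reverse=True)


def nearest_centroid(train_x, train_y, x):
    """Predict the class whose mean genotype vector is closest (squared Euclidean)."""
    centroids = {}
    for cls in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, centroids[cls])))


def loocv_accuracy(genotypes, labels, snp_idx):
    """Leave-one-out accuracy of the nearest-centroid classifier on the chosen SNPs."""
    x = [[row[j] for j in snp_idx] for row in genotypes]
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        correct += nearest_centroid(train_x, train_y, x[i]) == labels[i]
    return correct / len(x)


def best_snp_cluster(genotypes, labels):
    """Grow the cluster from the top-ranked SNP; keep the size with the best LOOCV accuracy."""
    order = rank_snps(genotypes, labels)
    best_k, best_acc = 1, -1.0
    for k in range(1, len(order) + 1):
        acc = loocv_accuracy(genotypes, labels, order[:k])
        if acc > best_acc:  # strict ">" prefers the smaller cluster on ties
            best_k, best_acc = k, acc
    return order[:best_k], best_acc


# Toy data: 6 patients x 3 SNPs; label 1 = CIPN+, 0 = CIPN-.
genotypes = [[2, 0, 1], [2, 1, 1], [1, 2, 1], [0, 2, 1], [0, 1, 0], [1, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]
cluster, acc = best_snp_cluster(genotypes, labels)
```

On this toy input the SNP ranking is [0, 2, 1] and only the top-ranked SNP is retained (LOOCV accuracy 4/6), because adding the weaker SNPs does not improve leave-one-out accuracy.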
5
Battineni G, Hossain MA, Chintalapudi N, Amenta F. A Survey on the Role of Artificial Intelligence in Biobanking Studies: A Systematic Review. Diagnostics (Basel) 2022; 12:1179. [PMID: 35626333 PMCID: PMC9140088 DOI: 10.3390/diagnostics12051179] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 02/04/2023]
Abstract
Introduction: In biobanks, participants' biological samples are stored for future research. Applying artificial intelligence (AI) involves analyzing data and predicting pathological outcomes. In AI, models are used to diagnose diseases as well as to classify and predict disease risks. Our research systematically analyzed AI's role in the development of biobanks in the healthcare industry. Methods: The literature search was conducted using three digital reference databases, namely PubMed, CINAHL, and WoS. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines were followed in conducting the systematic review. The search terms included "biobanks", "AI", "machine learning", and "deep learning", as well as combinations such as "biobanks with AI", "deep learning in the biobanking field", and "recent advances in biobanking". Only English-language papers were included, and the Newcastle-Ottawa scale (NOS) was used to assess the quality of the selected works. Only studies in the good-quality range (NOS ≥ 7) were considered for further review. Results: The literature search yielded 239 studies. Based on their relevance to the study's goal, research characteristics, and NOS criteria, we included 18 articles for review. In the last decade, biobanks and artificial intelligence have had a relatively large impact on the medical system. Interestingly, UK biobanks account for the highest percentage of high-quality works, followed by Qatar, South Korea, Singapore, Japan, and Denmark. Conclusions: Translational bioinformatics probably represents a future leader in precision medicine. AI and machine learning applications in biobanking research may contribute to the development of biobanks for the benefit of health services and citizens.
Affiliation(s)
- Gopi Battineni
- Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (M.A.H.); (N.C.); (F.A.)
6
Hassan M, Awan FM, Naz A, deAndrés-Galiana EJ, Alvarez O, Cernea A, Fernández-Brillet L, Fernández-Martínez JL, Kloczkowski A. Innovations in Genomics and Big Data Analytics for Personalized Medicine and Health Care: A Review. Int J Mol Sci 2022; 23:4645. [PMID: 35563034 PMCID: PMC9104788 DOI: 10.3390/ijms23094645] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/28/2022] [Revised: 04/06/2022] [Accepted: 04/18/2022] [Indexed: 02/01/2023]
Abstract
Big data in health care is a fast-growing field and a new paradigm that is transforming case-based studies into large-scale, data-driven research. As big data depends on the advancement of new data standards, technology, and relevant research, the future development of big data applications holds foreseeable promise in the modern-day health care revolution. Enormously large, rapidly growing collections of biomedical omics data (genomics, proteomics, transcriptomics, metabolomics, glycomics, etc.) and clinical data create major challenges and opportunities for their analysis and interpretation and open new computational gateways to address these issues. The design of new robust algorithms best suited to analyzing this big data while taking individual genetic variability into account has enabled the creation of precision (personalized) medicine. We reviewed and highlighted the significance of big data analytics for personalized medicine and health care, focusing mostly on machine learning perspectives on personalized medicine, genomic data models with respect to personalized medicine, the application of data mining algorithms for personalized medicine, and the challenges we currently face in big data analytics.
Affiliation(s)
- Mubashir Hassan
- Institute of Molecular Biology and Biotechnology (IMBB), The University of Lahore (UOL), Lahore 54590, Pakistan
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Faryal Mehwish Awan
- Department of Medical Lab Technology, The University of Haripur, Haripur 22620, Pakistan
- Anam Naz
- Institute of Molecular Biology and Biotechnology (IMBB), The University of Lahore (UOL), Lahore 54590, Pakistan
- Enrique J. deAndrés-Galiana
- Group of Inverse Problems, Optimization and Machine Learning, University of Oviedo, 33003 Oviedo, Spain; (E.J.d.-G.); (J.L.F.-M.)
- Oscar Alvarez
- DeepBioInsights, 38311 La Florida, Spain; (O.A.); (A.C.); (L.F.-B.)
- Ana Cernea
- DeepBioInsights, 38311 La Florida, Spain; (O.A.); (A.C.); (L.F.-B.)
- Juan Luis Fernández-Martínez
- Group of Inverse Problems, Optimization and Machine Learning, University of Oviedo, 33003 Oviedo, Spain; (E.J.d.-G.); (J.L.F.-M.)
- Andrzej Kloczkowski
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43205, USA
7
Shehab M, Abualigah L, Shambour Q, Abu-Hashem MA, Shambour MKY, Alsalibi AI, Gandomi AH. Machine learning in medical applications: A review of state-of-the-art methods. Comput Biol Med 2022; 145:105458. [PMID: 35364311 DOI: 10.1016/j.compbiomed.2022.105458] [Citation(s) in RCA: 90] [Impact Index Per Article: 45.0] [Received: 11/26/2021] [Revised: 03/23/2022] [Accepted: 03/24/2022] [Indexed: 12/11/2022]
Abstract
Machine learning (ML) methods have been used extensively in recent years to solve various complex challenges in application areas such as medical, financial, environmental, marketing, security, and industrial applications. ML methods are characterized by their ability to examine large amounts of data, discover interesting relationships, provide interpretations, and identify patterns. ML can help enhance the reliability, performance, predictability, and accuracy of diagnostic systems for many diseases. This survey provides a comprehensive review of the use of ML in the medical field, highlighting standard technologies and how they affect medical diagnosis. Five major medical applications are discussed in depth, focusing on adapting ML models to solve problems in cancer, medical chemistry, brain, medical imaging, and wearable sensors. Finally, this survey provides valuable references and guidance for researchers, practitioners, and decision-makers framing future research and development directions.
Affiliation(s)
- Mohammad Shehab
- Information Technology, The World Islamic Sciences and Education University, Amman, Jordan.
- Laith Abualigah
- Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan; School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, 11800, Malaysia.
- Qusai Shambour
- Department of Software Engineering, Al-Ahliyya Amman University, Amman, Jordan.
- Muhannad A Abu-Hashem
- Department of Geomatics, Faculty of Architecture and Planning, King Abdulaziz University, Jeddah, Saudi Arabia.
- Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, 2007, Australia.
8
Grigorian N, Baumrucker SJ. Aromatase inhibitor–associated musculoskeletal pain: An overview of pathophysiology and treatment modalities. SAGE Open Med 2022; 10:20503121221078722. [PMID: 35321462 PMCID: PMC8935546 DOI: 10.1177/20503121221078722] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/23/2021] [Accepted: 01/17/2022] [Indexed: 11/23/2022]
Abstract
Since their introduction into clinical use in the 1970s, aromatase inhibitors have been a cornerstone of therapy for estrogen-receptor-positive breast cancer in postmenopausal women. Unfortunately, this therapy depletes estrogen in the body, which can cause unpleasant side effects, including menopausal symptoms such as hot flashes and insomnia, a slightly increased risk of ischemic heart disease, accelerated bone loss leading to a higher risk of osteoporosis, and, most significantly, arthralgias. The joint pain induced by aromatase inhibitor therapy is frequently cited as the leading cause of premature discontinuation: approximately 50% of patients report new-onset or worsening joint pain 1 year after therapy initiation, approximately 30% of patients discontinue therapy after 1 year, and only 50%–68% of patients remain fully compliant with therapy after 3 years. This article describes risk factors for aromatase inhibitor–associated musculoskeletal syndrome, including genetic predispositions correlated with an increased risk of this syndrome, explains the currently understood pathophysiology, and gives an overview of effective treatment options for managing this syndrome.
9
Takei M, Okada N, Nakamura S, Kagawa K, Fujii S, Miki H, Ishizawa K, Abe M, Sato Y. A genome-wide association study predicts the onset of dysgeusia due to anti-cancer drug treatment. Biol Pharm Bull 2021; 45:114-117. [PMID: 34657909 DOI: 10.1248/bpb.b21-00745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Dysgeusia is a major side effect of anti-cancer drug treatment. Because dysgeusia significantly lowers the patient's quality of life, predicting and avoiding its onset in advance is desirable. Accordingly, the aims of the present study were to use a genome-wide association study (GWAS) to identify genes associated with the development of dysgeusia in patients taking anti-cancer drugs and to predict the development of dysgeusia using the associated single nucleotide polymorphisms (SNPs). GWAS was conducted on 76 patients admitted to the Department of Hematology, Tokushima University Hospital. The top two SNPs associated with the development of dysgeusia were then determined by Sanger sequencing in 23 separately collected validation samples. GWAS identified the rs73049478 and rs41396146 SNPs on the RARB gene as associated with dysgeusia development due to the administration of anti-cancer drugs. Evaluation of the two SNPs using the 23 validation samples indicated that the accuracy rate of rs73049478 was relatively high (87.0%). Thus, the findings of the present study suggest that the rs73049478 SNP of RARB can be used to predict the onset of dysgeusia caused by the administration of anti-cancer drugs.
Affiliation(s)
- Minori Takei
- Department of Pharmaceutical Information Science, Institute of Biomedical Sciences, Tokushima University Graduate School
- Naoto Okada
- Department of Pharmacy, Tokushima University Hospital
- Shingen Nakamura
- Department of Community Medicine and Medical Science, Tokushima University Graduate School of Biomedical Sciences
- Kumiko Kagawa
- Department of Hematology, Endocrinology and Metabolism, Institute of Biomedical Sciences, Tokushima University Graduate School
- Shiro Fujii
- Department of Hematology, Endocrinology and Metabolism, Institute of Biomedical Sciences, Tokushima University Graduate School
- Hirokazu Miki
- Division of Transfusion Medicine and Cell Therapy, Tokushima University Hospital
- Keisuke Ishizawa
- Department of Pharmacy, Tokushima University Hospital; Department of Clinical Pharmacology and Therapeutics, Institute of Biomedical Sciences, Tokushima University Graduate School
- Masahiro Abe
- Department of Hematology, Endocrinology and Metabolism, Institute of Biomedical Sciences, Tokushima University Graduate School
- Youichi Sato
- Department of Pharmaceutical Information Science, Institute of Biomedical Sciences, Tokushima University Graduate School
10
Prediction of mucositis risk secondary to cancer therapy: a systematic review of current evidence and call to action. Support Care Cancer 2020; 28:5059-5073. [PMID: 32592033 DOI: 10.1007/s00520-020-05579-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/10/2020] [Accepted: 06/12/2020] [Indexed: 01/25/2023]
Abstract
PURPOSE Despite advances in personalizing the efficacy of cancer therapy, our ability to identify patients at risk of severe treatment side effects and to provide individualized supportive care is limited. This is particularly the case for mucositis (oral and gastrointestinal), for which there are no comprehensive risk evaluation strategies to identify high-risk patients. We, the Multinational Association for Supportive Care in Cancer/International Society for Oral Oncology (MASCC/ISOO) Mucositis Study Group, therefore aimed to systematically review current evidence on the factors that influence mucositis risk, to provide a foundation on which future risk prediction studies can be based. METHODS We identified 11,018 papers from PubMed and Web of Science, with 197 records extracted for full review and 113 meeting the final eligibility criteria. Data were then synthesized into tables to highlight the level of evidence for each risk predictor. RESULTS The strongest level of evidence supported dosimetric parameters as key predictors of mucositis risk. Genetic variants in drug-metabolizing pathways, immune signaling, and cell injury/repair mechanisms were also found to affect mucositis risk. Factors relating to the individual were variably linked to mucositis outcomes, although female sex and smoking status showed some association with mucositis risk. CONCLUSION Mucositis risk reflects the complex interplay between the host, the tumor microenvironment, and treatment specifications, yet the large majority of studies rely on hypothesis-driven, single-candidate approaches. For significant advances in the provision of personalized supportive care, coordinated research efforts with robust multiplexed approaches are strongly advised.
11
Machine learning-based lifetime breast cancer risk reclassification compared with the BOADICEA model: impact on screening recommendations. Br J Cancer 2020; 123:860-867. [PMID: 32565540 PMCID: PMC7463251 DOI: 10.1038/s41416-020-0937-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/02/2019] [Revised: 05/13/2020] [Accepted: 05/29/2020] [Indexed: 12/17/2022]
Abstract
Background The clinical utility of machine-learning (ML) algorithms for breast cancer risk prediction and screening practices is unknown. We compared the classification of lifetime breast cancer risk based on ML and on the BOADICEA model, and explored the differences in risk classification and their clinical impact on screening practices. Methods We used three different ML algorithms and the BOADICEA model to estimate lifetime breast cancer risk in a sample of 112,587 individuals from 2481 families from the Oncogenetic Unit, Geneva University Hospitals. Performance of the algorithms was evaluated using the area under the receiver operating characteristic curve (AU-ROC). Risk reclassification was compared for 36,146 breast cancer-free women aged 20–80. The impact on recommendations for mammography surveillance was based on the Swiss Surveillance Protocol. Results ML-based algorithms achieved higher predictive accuracy (0.843 ≤ AU-ROC ≤ 0.889) than BOADICEA (AU-ROC = 0.639) and reclassified 35.3% of women into different risk categories. The largest reclassification (20.8%) was observed in women characterised as ‘near population’ risk by BOADICEA. Reclassification had the largest impact on the screening practices of women younger than 50. Conclusion ML-based reclassification of lifetime breast cancer risk occurred in approximately one in three women. Reclassification is important for younger women because it affects clinical decision-making on the initiation of screening.
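The overall reclassification figure reported above is a proportion: the share of women whose risk category under the ML model differs from their category under BOADICEA. A minimal Python sketch, assuming each woman already has a category label from both models; the category names other than 'near population' and the toy assignments are hypothetical.

```python
from collections import Counter


def reclassification(baseline, alternative):
    """Return (fraction reclassified, cross-tabulation of (baseline, alternative) pairs).

    `baseline` and `alternative` hold one risk category per woman, in the same order.
    """
    if len(baseline) != len(alternative):
        raise ValueError("both models must classify the same women")
    if not baseline:
        raise ValueError("no women to compare")
    table = Counter(zip(baseline, alternative))
    moved = sum(n for (old, new), n in table.items() if old != new)
    return moved / len(baseline), table


# Hypothetical categories for four women under each model.
boadicea = ["near population", "near population", "moderate", "high"]
ml_model = ["near population", "moderate", "moderate", "moderate"]
rate, table = reclassification(boadicea, ml_model)
print(rate)                                     # 0.5: two of four women changed category
print(table[("near population", "moderate")])   # 1
```

The cross-tabulation keeps the direction of each move, which is the information needed to judge how reclassification changes screening recommendations.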
12
Lu K, Yang K, Niyongabo E, Shu Z, Wang J, Chang K, Zou Q, Jiang J, Jia C, Liu B, Zhou X. Integrated network analysis of symptom clusters across disease conditions. J Biomed Inform 2020; 107:103482. [PMID: 32535270 DOI: 10.1016/j.jbi.2020.103482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/24/2019] [Revised: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 10/24/2022]
Abstract
Identifying symptom clusters (two or more related symptoms) with shared underlying molecular mechanisms has been a vital analysis task for advancing symptom science and precision health. Related studies have applied clustering algorithms (e.g., k-means, latent class models) to detect symptom clusters, mostly from various kinds of clinical data. In addition, they focused on identifying symptom clusters (SCs) for a specific disease and were mainly concerned with clinical regularities for symptom management. Here, we used a network-based clustering algorithm (BigCLAM) to obtain 208 typical SCs across disease conditions from a large-scale symptom network derived from integrated high-quality disease-symptom associations. Furthermore, we evaluated the underlying shared molecular mechanisms of SCs, i.e., shared genes, protein-protein interactions (PPIs) and gene functional annotations, using integrated networks and similarity measures. We found that symptoms in the same SCs tend to share more genes and PPIs and to have higher functional homogeneity. In addition, we found that most SCs contain related symptoms with shared underlying molecular mechanisms (e.g., enriched pathways) across different disease conditions. Our work demonstrates that the integrated network analysis method can be used to identify robust SCs and investigate their molecular mechanisms, which would be valuable for symptom science and precision health.
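The abstract compares gene sharing within symptom clusters using "similarity measures" without specifying them. As one common choice, the sketch below uses Jaccard similarity (shared genes divided by the union of genes) and averages it over all symptom pairs in a cluster. The choice of Jaccard, the symptom names, and the placeholder gene identifiers G1 to G6 are assumptions for illustration only, not associations from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two gene sets: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(cluster, symptom_genes):
    """Average Jaccard similarity over every unordered pair of symptoms in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(symptom_genes[s], symptom_genes[t]) for s, t in pairs) / len(pairs)


# Hypothetical symptom-gene associations with placeholder gene IDs.
symptom_genes = {
    "fatigue": {"G1", "G2", "G3"},
    "pain": {"G1", "G2", "G4"},
    "insomnia": {"G5", "G6"},
}
within = mean_pairwise_jaccard(["fatigue", "pain"], symptom_genes)   # 2 shared / 4 total = 0.5
across = mean_pairwise_jaccard(["pain", "insomnia"], symptom_genes)  # no shared genes = 0.0
```

Comparing the within-cluster average against averages for random symptom pairs is one way to test whether clustered symptoms share more genes than expected.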
Affiliation(s)
- Kezhi Lu
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kuo Yang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Edouard Niyongabo
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Zixin Shu
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jingjing Wang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kai Chang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Qunsheng Zou
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jiyue Jiang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Caiyan Jia
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
- Xuezhong Zhou
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
13
Cernea A, Fernández-Martínez JL, deAndrés-Galiana EJ, Fernández-Ovies FJ, Alvarez-Machancoses O, Fernández-Muñiz Z, Saligan LN, Sonis ST. Robust pathway sampling in phenotype prediction. Application to triple negative breast cancer. BMC Bioinformatics 2020; 21:89. [PMID: 32164540 PMCID: PMC7068866 DOI: 10.1186/s12859-020-3356-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
Background Phenotype prediction problems are usually considered ill-posed, as the number of samples is very limited relative to the number of genetic probes scrutinized. This complicates the sampling of defective genetic pathways because of the high number of possible discriminatory genetic networks involved. In this research, we outline three novel sampling algorithms, namely the Fisher’s ratio sampler, the Holdout sampler and the Random sampler, used to identify, classify and characterize the defective pathways in phenotype prediction problems, and apply each one to the analysis of genetic pathways involved in tumor behavior and outcomes of triple negative breast cancers (TNBC). Altered biological pathways are identified using the most frequently sampled genes and are compared to those obtained via Bayesian Networks (BNs). Results The Random, Fisher’s ratio and Holdout samplers were more accurate and robust than BNs, while providing comparable insights into disease genomics. Conclusions The three samplers tested are good alternatives to Bayesian Networks since they are less computationally demanding. Importantly, this analysis confirms the concept of “biological invariance”, since the altered pathways should be independent of the sampling methodology and the classifier used for their inference. Nevertheless, some modifications are still needed for Bayesian networks to correctly sample the uncertainty space in phenotype prediction problems, since the probabilistic parameterization of the uncertainty space is not unique and the use of the optimum network might falsify the pathway analysis.
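The Fisher’s ratio sampler builds on a per-gene discriminatory score. A common two-class definition of the Fisher's ratio is (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2); the abstract does not give the exact formula or explain how the sampler draws gene networks from the ranking, so the Python sketch below only scores and ranks genes under that assumed definition, using population variances. The gene names and expression values are made up.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Assumed definition: (mean_a - mean_b)^2 / (var_a + var_b)."""
    diff_sq = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Constant within both classes: perfectly separating if the means differ.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / spread


def rank_genes(expression, labels):
    """Rank genes by Fisher's ratio, most discriminatory first.

    `expression` maps gene -> per-sample values; `labels` holds 0/1 per sample.
    """
    scores = {}
    for gene, values in expression.items():
        phenotype = [v for v, y in zip(values, labels) if y == 1]
        control = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = fisher_ratio(phenotype, control)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression of two genes in six samples (1 = phenotype, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
}
print(rank_genes(expression, labels))
```

Here GENE_A scores about 6.75 and GENE_B about 0.75, so GENE_A is ranked first: its class means are further apart relative to the within-class spread.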
Affiliation(s)
- Ana Cernea
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Juan Luis Fernández-Martínez
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Enrique J deAndrés-Galiana
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain; Department of Informatics and Computer Science, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Francisco Javier Fernández-Ovies
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Oscar Alvarez-Machancoses
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Zulima Fernández-Muñiz
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/ Federico García-Lorca, 18, 33007, Oviedo, Spain
- Leorey N Saligan
- National Institutes of Health, National Institute of Nursing Research, Bethesda, MD, USA
- Stephen T Sonis
- Primary Endpoint Solutions, Watertown, MA, USA; Brigham and Women's Hospital and the Dana-Farber Cancer Institute, Boston, MA, USA
14
Ming C, Viassolo V, Probst-Hensch N, Chappuis PO, Dinov ID, Katapodi MC. Machine learning techniques for personalized breast cancer risk prediction: comparison with the BCRAT and BOADICEA models. Breast Cancer Res 2019; 21:75. [PMID: 31221197 PMCID: PMC6585114 DOI: 10.1186/s13058-019-1158-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/22/2018] [Accepted: 05/28/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Comprehensive breast cancer risk prediction models make it possible to identify and target women at high risk while reducing interventions in those at low risk. Breast cancer risk prediction models used in clinical practice have low discriminatory accuracy (0.53-0.64). Machine learning (ML) offers an alternative to standard prediction modeling that may address current limitations and improve the accuracy of these tools. The purpose of this study was to compare the discriminatory accuracy of ML-based estimates against two established methods: the Breast Cancer Risk Assessment Tool (BCRAT) and the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) models. METHODS We quantified and compared the performance of eight different ML methods with that of BCRAT and BOADICEA using eight simulated datasets and two retrospective samples: a random population-based sample of U.S. breast cancer patients and their cancer-free female relatives (N = 1143), and a clinical sample of Swiss breast cancer patients and cancer-free women seeking genetic evaluation and/or testing (N = 2481). RESULTS Predictive accuracy (area under the ROC curve, AU-ROC) reached 88.28% using ML-adaptive boosting and 88.89% using ML-random forest versus 62.40% with BCRAT for the U.S. population-based sample. Predictive accuracy reached 90.17% using ML-adaptive boosting and 89.32% using an ML-Markov chain Monte Carlo generalized linear mixed model versus 59.31% with BOADICEA for the Swiss clinic-based sample. CONCLUSIONS ML algorithms achieved a striking improvement in the accuracy of classifying women with and without breast cancer compared with the state-of-the-art model-based approaches. High-accuracy prediction techniques are important in personalized medicine because they facilitate stratification of prevention strategies and individualized clinical management.
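The RESULTS report discriminatory accuracy as the area under the ROC curve. One standard way to compute it, shown here as a small Python sketch, is the Mann-Whitney formulation: the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The scores below are invented for illustration; the abstract does not describe the study's own evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 for women with breast cancer (cases), 0 for non-cases.
    O(n_pos * n_neg) comparisons; fine for illustration, slow for large samples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # case ranked above non-case
            elif p == n:
                wins += 0.5  # tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for four women (two cases, two non-cases).
print(auroc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the 0.53-0.64 range quoted for models used in clinical practice in context.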
Affiliation(s)
- Chang Ming
- Nursing Science, Faculty of Medicine, University of Basel, Bernoullistrasse 28, Room 118, 4056, Basel, Switzerland
- Valeria Viassolo
- Oncogenetics and Cancer Prevention, Geneva University Hospitals, Geneva, Switzerland
- Nicole Probst-Hensch
- Swiss Tropical and Public Health Institute, University of Basel, Basel, Switzerland
- Pierre O Chappuis
- Oncogenetics and Cancer Prevention, Geneva University Hospitals, Geneva, Switzerland; Genetic Medicine, Geneva University Hospitals, Geneva, Switzerland
- Ivo D Dinov
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, USA; Statistics Online Computational Resource, University of Michigan, Ann Arbor, MI, USA; University of Michigan School of Nursing, Ann Arbor, MI, USA
- Maria C Katapodi
- Nursing Science, Faculty of Medicine, University of Basel, Bernoullistrasse 28, Room 118, 4056, Basel, Switzerland; University of Michigan School of Nursing, Ann Arbor, MI, USA
15
Álvarez-Machancoses Ó, Fernández-Martínez JL. Using artificial intelligence methods to speed up drug discovery. Expert Opin Drug Discov 2019; 14:769-777. [DOI: 10.1080/17460441.2019.1621284] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023]
Affiliation(s)
- Óscar Álvarez-Machancoses
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, Oviedo, Spain
- Juan Luis Fernández-Martínez
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, Oviedo, Spain
16
Therapeutic options for aromatase inhibitor-associated arthralgia in breast cancer survivors: A systematic review of systematic reviews, evidence mapping, and network meta-analysis. Maturitas 2018; 118:29-37. [DOI: 10.1016/j.maturitas.2018.09.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 08/08/2018] [Revised: 09/19/2018] [Accepted: 09/26/2018] [Indexed: 01/08/2023]
17
Sampling Defective Pathways in Phenotype Prediction Problems via the Fisher’s Ratio Sampler. Bioinformatics and Biomedical Engineering 2018. [DOI: 10.1007/978-3-319-78759-6_2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/27/2023]
18
Reinbolt RE, Sonis S, Timmers CD, Fernández-Martínez JL, Cernea A, de Andrés-Galiana EJ, Hashemi S, Miller K, Pilarski R, Lustberg MB. Genomic risk prediction of aromatase inhibitor-related arthralgia in patients with breast cancer using a novel machine-learning algorithm. Cancer Med 2017; 7:240-253. [PMID: 29168353 PMCID: PMC5773952 DOI: 10.1002/cam4.1256] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/11/2017] [Revised: 10/05/2017] [Accepted: 10/13/2017] [Indexed: 02/06/2023]
Abstract
Many breast cancer (BC) patients treated with aromatase inhibitors (AIs) develop aromatase inhibitor‐related arthralgia (AIA). Candidate gene studies to identify AIA risk are limited in scope. We evaluated the potential of a novel analytic algorithm (NAA) to predict AIA using germline single nucleotide polymorphism (SNP) data obtained before treatment initiation. Systematic chart review of 700 AI‐treated patients with stage I‐III BC identified asymptomatic patients (n = 39) and those with clinically significant AIA resulting in AI termination or therapy switch (n = 123). Germline DNA was obtained and SNP genotyping was performed using the Affymetrix UK Biobank Axiom Array, yielding 695,277 SNPs. SNP clusters that most closely defined AIA risk were discovered using an NAA that sequentially combined statistical filtering and a machine‐learning algorithm. The NCBI PhenGenI and Ensembl databases defined gene attribution of the most discriminating SNPs. Phenotype, pathway, and ontologic analyses assessed functional and mechanistic validity. Demographics were similar in cases and controls. A cluster of 70 SNPs, corresponding to 57 genes, was identified. This SNP group predicted AIA occurrence with a maximum accuracy of 75.93%. Strong associations with arthralgia, breast cancer, and estrogen phenotypes were seen in 19/57 genes (33%) and were functionally consistent. Using the NAA, we identified a 70-SNP cluster that predicted AIA risk with fair accuracy. Phenotype, functional, and pathway analysis of attributed genes was consistent with clinical phenotypes. This study is the first to link a specific SNP/gene cluster to AIA risk independent of candidate gene bias.
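The NAA is described only as statistical filtering followed by a machine-learning step. The Python sketch below illustrates that generic two-stage pattern and is not the authors' algorithm. It assumes genotypes coded as minor-allele dosages (0, 1, 2); the filter, which keeps the k SNPs with the largest case-control difference in mean dosage, and the 1-nearest-neighbour classifier are both hypothetical choices. Filtering is repeated inside each leave-one-out fold so that the held-out patient never influences SNP selection, which avoids an optimistic accuracy estimate.

```python
from statistics import mean


def select_snps(genotypes, labels, k):
    """Hypothetical filter: keep the k SNPs whose mean dosage differs most
    between cases and controls.

    genotypes: one list of 0/1/2 dosages per patient; labels: 1 = AIA case, 0 = control.
    Both classes must be present.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    n_snps = len(genotypes[0])
    scores = [
        abs(mean(g[j] for g in cases) - mean(g[j] for g in controls))
        for j in range(n_snps)
    ]
    # Stable sort: ties keep the lower SNP index first.
    return sorted(range(n_snps), key=lambda j: -scores[j])[:k]


def loo_accuracy(genotypes, labels, k):
    """Leave-one-out accuracy of SNP filtering + 1-nearest-neighbour classification."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(genotypes, labels)):
        train_x = genotypes[:i] + genotypes[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        snps = select_snps(train_x, train_y, k)  # filtering sees training data only
        # Manhattan distance on the selected SNPs; min() keeps the first tied neighbour.
        _, predicted = min(
            ((sum(abs(x[j] - held_out[j]) for j in snps), y)
             for x, y in zip(train_x, train_y)),
            key=lambda pair: pair[0],
        )
        correct += predicted == truth
    return correct / len(genotypes)


# Hypothetical toy cohort: SNP 0 separates cases from controls, SNP 1 is noise.
genotypes = [[2, 0], [2, 1], [2, 2], [0, 0], [0, 1], [0, 2]]
labels = [1, 1, 1, 0, 0, 0]
print(loo_accuracy(genotypes, labels, k=1))  # 1.0
```

On real data with hundreds of thousands of SNPs and far fewer patients, the filtering step is what keeps the classifier tractable, and doing it inside each fold is what keeps the reported accuracy honest.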
Affiliation(s)
- Raquel E Reinbolt
- The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Stephen Sonis
- Primary Endpoint Solutions, Watertown, Massachusetts; Brigham and Women's Hospital and the Dana-Farber Cancer Institute, Boston, Massachusetts
- Cynthia D Timmers
- The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Ana Cernea
- Primary Endpoint Solutions, Watertown, Massachusetts; University of Oviedo, Oviedo, Spain
- Sepehr Hashemi
- Primary Endpoint Solutions, Watertown, Massachusetts; Harvard School of Dental Medicine, Boston, Massachusetts
- Karin Miller
- The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Robert Pilarski
- The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Maryam B Lustberg
- The Ohio State University Comprehensive Cancer Center, Columbus, Ohio