1
Baminiwatte R, Torsu B, Scherbakov D, Mollalo A, Obeid JS, Alekseyenko AV, Lenert LA. Machine learning in healthcare citizen science: A scoping review. Int J Med Inform 2024; 195:105766. [PMID: 39740357 DOI: 10.1016/j.ijmedinf.2024.105766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 11/20/2024] [Accepted: 12/15/2024] [Indexed: 01/02/2025]
Abstract
OBJECTIVES This scoping review aims to clarify the definition and trajectory of citizen-led scientific research (so-called citizen science) within the healthcare domain and to examine the degree of integration of machine learning (ML) and the participation levels of citizen scientists in health-related projects. MATERIALS AND METHODS In January and September 2024, we conducted a comprehensive search of PubMed, Scopus, Web of Science, and the EBSCOhost platform for peer-reviewed publications that combine citizen science and ML in healthcare. Articles were excluded if citizens were merely passive data providers or if only professional scientists were involved. RESULTS Of 1,395 articles initially screened, 56 articles spanning 2013 to 2024 met the inclusion criteria. The largest share of research projects was conducted in the U.S. (n = 20, 35.7 %), followed by Germany (n = 6, 10.7 %), with Spain, Canada, and the UK each contributing three studies (5.4 %). Data collection was the primary form of citizen scientist involvement (n = 29, 51.8 %), which included capturing images, sharing data online, and mailing samples. Data annotation was the next most common activity (n = 15, 26.8 %), followed by participation in ML model challenges (n = 8, 14.3 %) and decision-making contributions (n = 3, 5.4 %). Mosquitoes (n = 10, 34.5 %) and air pollution samples (n = 7, 24.2 %) were the main data objects collected by citizens for ML analysis. Classification tasks were the most prevalent ML method (n = 30, 52.6 %), with Convolutional Neural Networks being the most frequently used algorithm (n = 13, 20 %). DISCUSSION AND CONCLUSIONS Citizen science in healthcare is currently an American and European construct with growing expansion in Asia. Citizens are contributing and labeling data for ML methods, but only infrequently analyzing data or leading studies.
Projects that use "crowd-sourced" data and "citizen science" should be differentiated depending on the degree of involvement of citizens.
Affiliation(s)
- Ranga Baminiwatte
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Blessing Torsu
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Dmitry Scherbakov
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Abolfazl Mollalo
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Jihad S Obeid
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Alexander V Alekseyenko
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Leslie A Lenert
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA

2
Tandon S, Sharma M, Kasar P, Kala A. A cloud-based precision oncology framework for whole genome sequence analysis. Comput Biol Chem 2024; 110:108062. [PMID: 38554501 DOI: 10.1016/j.compbiolchem.2024.108062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/05/2024] [Accepted: 03/25/2024] [Indexed: 04/01/2024]
Abstract
Cancer is a widespread disease with a high mortality rate worldwide. Early detection and correct precision treatment, a major concern for cancer patients, can change this picture. By analyzing a patient's genome, clinicians can identify the treatments best suited to that patient, improving outcomes while minimizing the chance of side effects. We have therefore developed a fast, robust, and efficient precision oncology framework based on whole-genome sequencing of the individual's DNA. The platform performs the entire genomic analysis, from quality assessment of the input file through variant annotation and functional prediction, followed by a degree of interpretation. This analysis supports molecular profiling of tumors to identify targetable alterations. It takes a FASTQ or BAM file as input and produces two reports: a primary report containing the patient's details and a summary of the analysis, and a secondary, detailed report covering the many results of the analysis, such as base changes, codon changes, amino acid changes, tumor mutational burden (TMB) analysis, microsatellite instability (MSI) analysis, variant frequencies with their effects and impacts, and affected biomarkers. The framework can be used for cancer treatment guidance, identification and validation of novel biomarkers, oncology research and development, genomic analysis, and gene manipulation.
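The abstract lists TMB analysis among the report contents without giving a formula. A widely used definition, assumed here rather than taken from this paper, divides the number of qualifying somatic mutations by the size of the sequenced territory in megabases. The Python sketch below illustrates only that arithmetic; the `Variant` record, the `COUNTED_EFFECTS` filter, and `tumor_mutation_burden` are hypothetical names, and which variant classes count toward TMB varies between pipelines.

```python
from dataclasses import dataclass

# Hypothetical set of variant effects counted toward TMB; real pipelines and
# assay guidelines differ (some also count synonymous changes).
COUNTED_EFFECTS = {
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
}


@dataclass
class Variant:
    """Hypothetical minimal variant record (normally parsed from an annotated VCF)."""
    chrom: str
    pos: int
    effect: str
    somatic: bool


def tumor_mutation_burden(variants, territory_bp):
    """Return qualifying somatic variants per megabase of sequenced territory."""
    if territory_bp <= 0:
        raise ValueError("territory_bp must be positive")
    count = sum(1 for v in variants if v.somatic and v.effect in COUNTED_EFFECTS)
    return count / (territory_bp / 1_000_000)


calls = [
    Variant("chr1", 1_000, "missense_variant", True),
    Variant("chr2", 2_000, "synonymous_variant", True),  # not counted: synonymous
    Variant("chr3", 3_000, "stop_gained", True),
    Variant("chr4", 4_000, "missense_variant", False),   # not counted: germline
]
print(tumor_mutation_burden(calls, 2_000_000))  # 2 variants over 2 Mb -> 1.0
```

Because the counting rule is an explicit set, a different definition only requires editing `COUNTED_EFFECTS`; for whole-genome data the territory would be the callable genome length rather than an exome or panel size.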
Affiliation(s)
- Saloni Tandon
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Medha Sharma
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Pratik Kasar
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Anirudh Kala
- Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India

3
Liu Y, Xie H, Zhao X, Tang J, Yu Z, Wu Z, Tian R, Chen Y, Chen M, Ntentakis DP, Du Y, Chen T, Hu Y, Zhang S, Lei B, Zhang G. Automated detection of nine infantile fundus diseases and conditions in retinal images using a deep learning system. EPMA J 2024; 15:39-51. [PMID: 38463622 PMCID: PMC10923762 DOI: 10.1007/s13167-024-00350-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 01/21/2024] [Indexed: 03/12/2024]
Abstract
Purpose We developed the Infant Retinal Intelligent Diagnosis System (IRIDS), an automated system to aid early diagnosis and monitoring of infantile fundus diseases and health conditions and to meet the urgent needs of ophthalmologists. Methods We developed IRIDS by combining convolutional neural networks and transformer structures, using a dataset of 7697 retinal images (1089 infants) from four hospitals. It identifies nine fundus diseases and conditions, namely, retinopathy of prematurity (ROP) (mild ROP, moderate ROP, and severe ROP), retinoblastoma (RB), retinitis pigmentosa (RP), Coats disease, coloboma of the choroid, congenital retinal fold (CRF), and normal. IRIDS also includes depth attention modules, ResNet-18 (Res-18), and Multi-Axis Vision Transformer (MaxViT). Performance was compared with that of ophthalmologists using 450 retinal images. IRIDS used a five-fold cross-validation approach to generate the classification results. Results The best values achieved by the baseline models for accuracy, precision, recall, F1-score (F1), kappa, and area under the receiver operating characteristic curve (AUC) were 94.62% (95% CI, 94.34%-94.90%), 94.07% (95% CI, 93.32%-94.82%), 90.56% (95% CI, 88.64%-92.48%), 92.34% (95% CI, 91.87%-92.81%), 91.15% (95% CI, 90.37%-91.93%), and 99.08% (95% CI, 99.07%-99.09%), respectively. IRIDS showed promising results in comparison with ophthalmologists, achieving an average accuracy, precision, recall, F1, kappa, and AUC of 96.45% (95% CI, 96.37%-96.53%), 95.86% (95% CI, 94.56%-97.16%), 94.37% (95% CI, 93.95%-94.79%), 95.03% (95% CI, 94.45%-95.61%), 94.43% (95% CI, 93.96%-94.90%), and 99.51% (95% CI, 99.51%-99.51%), respectively, in multi-label classification on the test dataset, using the Res-18 and MaxViT models. These results suggest that, particularly in terms of AUC, IRIDS achieved performance that warrants further investigation for the detection of retinal abnormalities.
Conclusions IRIDS accurately identifies nine infantile fundus diseases and conditions. It may aid non-ophthalmologist personnel in underserved areas in screening for infantile fundus diseases, thus helping to prevent severe complications. IRIDS serves as an example of integrating artificial intelligence into ophthalmology to achieve better outcomes in predictive, preventive, and personalized medicine (PPPM / 3PM) for infantile fundus diseases. Supplementary Information The online version contains supplementary material available at 10.1007/s13167-024-00350-y.
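The abstract reports five-fold cross-validation but not how images were assigned to folds. Because the 7697 images come from 1089 infants, one common precaution, assumed here and not confirmed by the abstract, is to split by infant so that images of the same child never appear in both the training and test folds. A minimal Python sketch of such a grouped k-fold split follows; the function name `group_k_fold` is hypothetical, and scikit-learn's `GroupKFold` offers a maintained equivalent.

```python
import random
from collections import defaultdict


def group_k_fold(groups, k=5, seed=0):
    """Yield (train_idx, test_idx) lists so that every sample of a group lands in one fold.

    groups: list where groups[i] is the patient (group) id of sample i.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    group_ids = sorted(by_group)                  # deterministic order before shuffling
    random.Random(seed).shuffle(group_ids)
    folds = [group_ids[i::k] for i in range(k)]   # round-robin assignment of groups to folds
    for fold in folds:
        test = sorted(i for g in fold for i in by_group[g])
        test_set = set(test)
        train = [i for i in range(len(groups)) if i not in test_set]
        yield train, test


# Hypothetical example: 10 images from 7 infants (group ids 0-6).
infant_ids = [0, 0, 1, 1, 2, 3, 4, 4, 5, 6]
for train_idx, test_idx in group_k_fold(infant_ids, k=5):
    print(len(train_idx), len(test_idx))
```

Assigning whole groups round-robin after a seeded shuffle keeps folds roughly balanced in patient count, though not necessarily in image count or disease mix; balancing class labels as well would need a stratified grouped variant.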
Affiliation(s)
- Yaling Liu
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Hai Xie
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Xinyu Zhao
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Jiannan Tang
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Zhen Yu
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Zhenquan Wu
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Ruyin Tian
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Yi Chen
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Guizhou Medical University, Guiyang, Guizhou China
- Miaohong Chen
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Guizhou Medical University, Guiyang, Guizhou China
- Dimitrios P. Ntentakis
- Retina Service, Ines and Fred Yeatts Retina Research Laboratory, Angiogenesis Laboratory, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA USA
- Yueshanyi Du
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Tingyi Chen
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Guizhou Medical University, Guiyang, Guizhou China
- Yarou Hu
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Sifan Zhang
- Guizhou Medical University, Guiyang, Guizhou China
- Southern University of Science and Technology School of Medicine, Shenzhen, China
- Baiying Lei
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Guoming Zhang
- Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, 518040 China
- Guizhou Medical University, Guiyang, Guizhou China

4
Doan Thu TN, Nguyen QK, Taylor-Robinson AW. Healthcare in Vietnam: Harnessing Artificial Intelligence and Robotics to Improve Patient Care Outcomes. Cureus 2023; 15:e45006. [PMID: 37829937 PMCID: PMC10565519 DOI: 10.7759/cureus.45006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/11/2023] [Indexed: 10/14/2023]
Abstract
Healthcare in Vietnam is increasingly utilizing artificial intelligence (AI) and robotics to enhance patient care outcomes. The Vietnamese healthcare sector recognizes the potential of AI and is actively exploring its applications in research and clinical practice. AI technologies, such as text mining and machine learning, can be employed to analyze medical data and improve decision-making processes. Robotics, on the other hand, can support various healthcare tasks, including elderly care, rehabilitation, and surgical interventions. Robotic surgery, specifically, is an innovative form of minimally invasive surgery that aims to improve surgical outcomes and enhance the patient experience. The implementation of AI in emergency and trauma settings is still in its early stages, but there is a growing interest in and recognition of its potential benefits. However, there are challenges that need to be addressed, such as the need for appropriate research and training programs to support the adoption and integration of AI in healthcare. Despite these challenges, healthcare professionals in Vietnam are optimistic about the potential of AI to improve acute care surgery and are open to embracing new digital technologies. The use of AI and robotics in healthcare aligns with the broader goal of improving healthcare systems in low- and middle-income countries, including Vietnam, through technological advancements. Overall, AI can play an important role in assisting prognosis and predictive analysis by integrating vast amounts of data. Moreover, the integration of AI and robotics in healthcare in Vietnam has the potential to enhance patient care outcomes, improve decision-making processes, and support healthcare professionals in their practice.
Affiliation(s)
- Tu N Doan Thu
- Business and Management, College of Business and Management, VinUniversity, Hanoi, VNM
- Quan K Nguyen
- Epidemiology and Public Health, College of Health Sciences, VinUniversity, Hanoi, VNM
- Andrew W Taylor-Robinson
- Epidemiology and Public Health, College of Health Sciences, VinUniversity, Hanoi, VNM
- Epidemiology and Public Health, Center for Global Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
- Epidemiology and Public Health, Smart Health Center, VinUniversity, Hanoi, VNM

5
Rafique R, Islam SR, Kazi JU. Machine learning in the prediction of cancer therapy. Comput Struct Biotechnol J 2021; 19:4003-4017. [PMID: 34377366 PMCID: PMC8321893 DOI: 10.1016/j.csbj.2021.07.003] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/14/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 12/15/2022]
Abstract
Resistance to therapy remains a major cause of cancer treatment failures, resulting in many cancer-related deaths. Resistance can occur at any time during the treatment, even at the beginning. The current treatment plan is dependent mainly on cancer subtypes and the presence of genetic mutations. Evidently, the presence of a genetic mutation does not always predict the therapeutic response and can vary for different cancer subtypes. Therefore, there is an unmet need for predictive models to match a cancer patient with a specific drug or drug combination. Recent advancements in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically usable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.
Affiliation(s)
- S.M. Riazul Islam
- Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
- Julhash U. Kazi
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Corresponding author at: Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Medicon village Building 404:C3, Scheelevägen 8, 22363 Lund, Sweden.

6
Marostica E, Barber R, Denize T, Kohane IS, Signoretti S, Golden JA, Yu KH. Development of a Histopathology Informatics Pipeline for Classification and Prediction of Clinical Outcomes in Subtypes of Renal Cell Carcinoma. Clin Cancer Res 2021; 27:2868-2878. [PMID: 33722896 DOI: 10.1158/1078-0432.ccr-20-4119] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/19/2020] [Revised: 01/25/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022]
Abstract
PURPOSE Histopathology evaluation is the gold standard for diagnosing clear cell (ccRCC), papillary, and chromophobe renal cell carcinoma (RCC). However, interrater variability has been reported, and the whole-slide histopathology images likely contain underutilized biological signals predictive of genomic profiles. EXPERIMENTAL DESIGN To address this knowledge gap, we obtained whole-slide histopathology images and demographic, genomic, and clinical data from The Cancer Genome Atlas, the Clinical Proteomic Tumor Analysis Consortium, and Brigham and Women's Hospital (Boston, MA) to develop computational methods for integrating data analyses. Leveraging these large and diverse datasets, we developed fully automated convolutional neural networks to diagnose renal cancers and connect quantitative pathology patterns with patients' genomic profiles and prognoses. RESULTS Our deep convolutional neural networks successfully detected malignancy (AUC in the independent validation cohort: 0.964-0.985), diagnosed RCC histologic subtypes (independent validation AUCs of the best models: 0.953-0.993), and predicted stage I ccRCC patients' survival outcomes (log-rank test P = 0.02). Our machine learning approaches further identified histopathology image features indicative of copy-number alterations (AUC > 0.7 in multiple genes in patients with ccRCC) and tumor mutation burden. CONCLUSIONS Our results suggest that convolutional neural networks can extract histologic signals predictive of patients' diagnoses, prognoses, and genomic variations of clinical importance. Our approaches can systematically identify previously unknown relations among diverse data modalities.
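AUC values appear throughout these abstracts (here for malignancy detection and RCC subtype diagnosis). The AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it illustrates the metric and is not the authors' evaluation code, and its quadratic cost makes it suitable only for small examples.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney formulation: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive score ...
        for n in neg:      # ... with every negative score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values such as 0.964-0.985 indicate near-complete separation of malignant from benign tiles in the validation cohort.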
Affiliation(s)
- Eliana Marostica
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Rebecca Barber
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Department of Computer Science, Princeton University, Princeton, New Jersey
- Thomas Denize
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Sabina Signoretti
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts
- Jeffrey A Golden
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts
- Cedars-Sinai Medical Center, Los Angeles, California
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts

7
Yu KH, Hu V, Wang F, Matulonis UA, Mutter GL, Golden JA, Kohane IS. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med 2020; 18:236. [PMID: 32807164 PMCID: PMC7433108 DOI: 10.1186/s12916-020-01684-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/06/2020] [Accepted: 06/28/2020] [Indexed: 01/02/2023]
Abstract
BACKGROUND Ovarian cancer causes 151,900 deaths per year worldwide. Treatment and prognosis are primarily determined by the histopathologic interpretation in combination with molecular diagnosis. However, the relationship between histopathology patterns and molecular alterations is not fully understood, and it is difficult to predict patients' chemotherapy response using the known clinical and histological variables. METHODS We analyzed the whole-slide histopathology images, RNA-Seq, and proteomics data from 587 primary serous ovarian adenocarcinoma patients and developed a systematic algorithm to integrate histopathology and functional omics findings and to predict patients' response to platinum-based chemotherapy. RESULTS Our convolutional neural networks identified the cancerous regions with areas under the receiver operating characteristic curve (AUCs) > 0.95 and classified tumor grade with AUCs > 0.80. Functional omics analysis revealed that expression levels of proteins participating in innate immune responses and catabolic pathways are associated with tumor grade. Quantitative histopathology analysis successfully stratified patients with different responses to platinum-based chemotherapy (P = 0.003). CONCLUSIONS These results indicate the potential clinical utility of quantitative histopathology evaluation in tumor cell detection and chemotherapy response prediction. The developed algorithm is easily extensible to other tumor types and treatment modalities.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Vincent Hu
- Department of Bioengineering, University of California San Diego, San Diego, CA, USA
- Feiran Wang
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Ursula A Matulonis
- Division of Gynecologic Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- George L Mutter
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Jeffrey A Golden
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

8
Yu KH, Lee TLM, Yen MH, Kou SC, Rosen B, Chiang JH, Kohane IS. Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation. J Med Internet Res 2020; 22:e16709. [PMID: 32755895 PMCID: PMC7439139 DOI: 10.2196/16709] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/17/2019] [Revised: 05/25/2020] [Accepted: 06/11/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. OBJECTIVE The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. METHODS We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. RESULTS Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. CONCLUSIONS We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Department of Statistics, Harvard University, Cambridge, MA, United States
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, United States
- Ming-Hsuan Yen
- Graduate Program of Multimedia Systems and Intelligent Computing, National Cheng Kung University and Academia Sinica, Tainan, Taiwan
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- S C Kou
- Department of Statistics, Harvard University, Cambridge, MA, United States
- Bruce Rosen
- Department of Radiology, Athinoula A Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, United States
- Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, United States
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, United States

9
Yu KH, Wang F, Berry GJ, Ré C, Altman RB, Snyder M, Kohane IS. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J Am Med Inform Assoc 2020; 27:757-769. [PMID: 32364237 PMCID: PMC7309263 DOI: 10.1093/jamia/ocz230] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/27/2019] [Revised: 11/22/2019] [Accepted: 03/05/2020] [Indexed: 12/26/2022]
Abstract
OBJECTIVE Non-small cell lung cancer is a leading cause of cancer death worldwide, and histopathological evaluation plays the primary role in its diagnosis. However, the morphological patterns associated with the molecular subtypes have not been systematically studied. To bridge this gap, we developed a quantitative histopathology analytic framework to identify the types and gene expression subtypes of non-small cell lung cancer objectively. MATERIALS AND METHODS We processed whole-slide histopathology images of lung adenocarcinoma (n = 427) and lung squamous cell carcinoma patients (n = 457) in the Cancer Genome Atlas. We built convolutional neural networks to classify histopathology images, evaluated their performance by the areas under the receiver-operating characteristic curves (AUCs), and validated the results in an independent cohort (n = 125). RESULTS To establish neural networks for quantitative image analyses, we first built convolutional neural network models to identify tumor regions from adjacent dense benign tissues (AUCs > 0.935) and recapitulated expert pathologists' diagnosis (AUCs > 0.877), with the results validated in an independent cohort (AUCs = 0.726-0.864). We further demonstrated that quantitative histopathology morphology features identified the major transcriptomic subtypes of both adenocarcinoma and squamous cell carcinoma (P < .01). DISCUSSION Our study is the first to classify the transcriptomic subtypes of non-small cell lung cancer using fully automated machine learning methods. Our approach does not rely on prior pathology knowledge and can discover novel clinically relevant histopathology patterns objectively. The developed procedure is generalizable to other tumor types or diseases.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Feiran Wang
- Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Gerald J Berry
- Department of Pathology, Stanford University, Stanford, California, USA
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, California, USA
- Russ B Altman
- Biomedical Informatics Program, Stanford University, Stanford, California, USA
- Department of Bioengineering, Stanford University, Stanford, California, USA
- Department of Genetics, Stanford University, Stanford, California, USA
- Michael Snyder
- Department of Genetics, Stanford University, Stanford, California, USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA

10
Zhang Z, Li H, Jiang S, Li R, Li W, Chen H, Bo X. A survey and evaluation of Web-based tools/databases for variant analysis of TCGA data. Brief Bioinform 2020; 20:1524-1541. [PMID: 29617727 PMCID: PMC6781580 DOI: 10.1093/bib/bby023] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 12/11/2017] [Revised: 02/22/2018] [Indexed: 12/28/2022]
Abstract
The Cancer Genome Atlas (TCGA) is a publicly funded project that aims to catalog and discover major cancer-causing genomic alterations with the goal of creating a comprehensive ‘atlas’ of cancer genomic profiles. The availability of this genome-wide information provides an unprecedented opportunity to expand our knowledge of tumourigenesis. Computational analytics and mining are frequently used as effective tools for exploring this byzantine series of biological and biomedical data. However, some of the more advanced computational tools are often difficult to understand or use, thereby limiting their application by scientists who do not have a strong computational background. Hence, it is of great importance to build user-friendly interfaces that allow both computational scientists and life scientists without a computational background to gain greater biological and medical insights. To that end, this survey was designed to systematically present available Web-based tools and facilitate the use of TCGA data for cancer research.
Affiliation(s)
- Zhuo Zhang
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Hao Li
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Shuai Jiang
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Ruijiang Li
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Wanying Li
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Hebing Chen
- Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing 100850, China
11
Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng 2018; 2:719-731. [PMID: 31015651 DOI: 10.1038/s41551-018-0305-z] [Citation(s) in RCA: 975] [Impact Index Per Article: 139.3] [Received: 12/12/2017] [Accepted: 09/05/2018] [Indexed: 02/07/2023]
Abstract
Artificial intelligence (AI) is gradually changing medical practice. With recent progress in digitized data acquisition, machine learning and computing infrastructure, AI applications are expanding into areas that were previously thought to be only the province of human experts. In this Review Article, we outline recent breakthroughs in AI technologies and their biomedical applications, identify the challenges for further progress in medical AI systems, and summarize the economic, legal and social implications of AI in healthcare.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Andrew L Beam
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Boston Children's Hospital, Boston, MA, USA
12
Yu KH, Lee TLM, Chen YJ, Ré C, Kou SC, Chiang JH, Snyder M, Kohane IS. A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-Driven Human Proteome Project. J Proteome Res 2018; 17:4345-4357. [PMID: 30094994 DOI: 10.1021/acs.jproteome.8b00378] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/07/2023]
Abstract
Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States; Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Tsung-Lu Michael Lee
- Department of Information Engineering, Kun Shan University, Tainan City 710, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei City 115, Taiwan
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Samuel C Kou
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701, Taiwan
- Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, California 94305, United States
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
13
Yu KH, Lee TLM, Wang CS, Chen YJ, Ré C, Kou SC, Chiang JH, Kohane IS, Snyder M. Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. J Proteome Res 2018; 17:1383-1396. [PMID: 29505266 DOI: 10.1021/acs.jproteome.7b00772] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
Abstract
There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Tsung-Lu Michael Lee
- Department of Information Engineering, Kun Shan University, Tainan City 710-03, Taiwan
- Chi-Shiang Wang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115-29, Taiwan
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Samuel C. Kou
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
- Isaac S. Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Michael Snyder
- Department of Genetics, Stanford University, Stanford, California 94305, United States
14
Yu KH, Berry GJ, Rubin DL, Ré C, Altman RB, Snyder M. Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma. Cell Syst 2017; 5:620-627.e3. [PMID: 29153840 PMCID: PMC5746468 DOI: 10.1016/j.cels.2017.10.014] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 12/22/2016] [Revised: 07/30/2017] [Accepted: 10/19/2017] [Indexed: 12/16/2022]
Abstract
Adenocarcinoma accounts for more than 40% of lung malignancies, and microscopic pathology evaluation is indispensable for its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. Here, we obtained H&E-stained whole-slide histopathology images, pathology reports, RNA sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas and used these to identify molecular pathways associated with histopathology patterns. We report cell-cycle regulation and nucleotide binding pathways underpinning tumor cell dedifferentiation, and we predicted histology grade using transcriptomics and proteomics signatures (area under curve >0.80). We built an integrative histopathology-transcriptomics model to generate better prognostic predictions for stage I patients (p = 0.0182 ± 0.0021) compared with gene expression or histopathology studies alone, and the results were replicated in an independent cohort (p = 0.0220 ± 0.0070). These results motivate the integration of histopathology and omics data to investigate molecular mechanisms of pathology findings and enhance clinical prognostic prediction.
Affiliation(s)
- Kun-Hsing Yu
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305-5479, USA; Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Gerald J Berry
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Daniel L Rubin
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305-5479, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Radiology, Stanford University, Stanford, CA 94305-5105, USA; Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA 94305-5479, USA
- Christopher Ré
- Department of Computer Science, Stanford University, Stanford, CA 94305-9025, USA
- Russ B Altman
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305-5479, USA; Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA; Department of Computer Science, Stanford University, Stanford, CA 94305-9025, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305-4125, USA
- Michael Snyder
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA