1
Kudo MS, Gomes de Souza VM, Estivallet CLN, de Amorim HA, Kim FJ, Leite KRM, Moraes MC. The value of artificial intelligence for detection and grading of prostate cancer in human prostatectomy specimens: a validation study. Patient Saf Surg 2022; 16:36. [PMID: 36424622 PMCID: PMC9686032 DOI: 10.1186/s13037-022-00345-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 10/23/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND The Gleason grading system is an important clinical tool for diagnosing prostate cancer in pathology images. However, this analysis shows significant variability among pathologists, with possible negative clinical impact. Artificial intelligence methods can be an important support for the pathologist, improving Gleason grade classification. Our purpose is therefore to construct a Convolutional Neural Network (CNN) and evaluate its potential to classify Gleason patterns. METHODS The methodology included 6982 image patches with cancer, extracted from radical prostatectomy specimens previously analyzed by an expert uropathologist. A CNN was constructed to classify the corresponding Gleason pattern. The evaluation was carried out by computing the corresponding three-class confusion matrix and, from it, the percentage precision, sensitivity, and specificity, as well as the overall accuracy. Additionally, k-fold three-way cross-validation was performed to strengthen the evaluation, allowing better interpretation and avoiding possible bias. RESULTS The overall accuracy reached 98% in the training and validation stage and 94% in the test phase. Considering the test samples, the true positive ratio between the pathologist and the computer method was 85%, 93%, and 96% for specific Gleason patterns. Finally, precision, sensitivity, and specificity reached values of up to 97%. CONCLUSION The CNN model presented and evaluated has shown high accuracy, particularly for neighboring and critical Gleason patterns. The outcomes are in line with, and complement, others in the literature. The promising results surpassed the inter-pathologist agreement reported in classical studies, evidencing the potential of this novel technology in daily clinical practice.
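The per-class precision, sensitivity and specificity described in this abstract, plus overall accuracy, can all be read off a three-class confusion matrix by treating each Gleason class one-vs-rest. The sketch below is illustrative only and is not the authors' code; the function name `per_class_metrics` and the convention that rows hold the pathologist's label and columns the model's prediction are assumptions.

```python
def per_class_metrics(cm):
    """One-vs-rest metrics from a square confusion matrix.

    Assumed convention (hypothetical): cm[i][j] counts patches whose
    reference (pathologist) class is i and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                   # everything not involving class k
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return results, accuracy


if __name__ == "__main__":
    # Invented counts for illustration only (not study data).
    metrics, acc = per_class_metrics([[8, 1, 1], [0, 9, 1], [1, 0, 9]])
    print(round(acc, 3))  # 0.867
```

Overall accuracy is simply the diagonal over the total, while each per-class metric depends on which off-diagonal cells are counted as false positives or false negatives for that class.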
Affiliation(s)
- Maíra Suzuka Kudo
- Laboratory of Image and Signal Processing of the Institute of Science and Technology of Federal University of São Paulo (Universidade Federal de São Paulo – UNIFESP), Rua Talim 330 Sala 108 - Jardim Aeroporto, São José dos Campos, SP CEP 12231-280 Brazil
- Vinicius Meneguette Gomes de Souza
- Laboratory of Medical Investigations Number 55 of the Sao Paulo University Medical School – FMUSP, Avenida Dr Arnaldo, 455, sala 2145 – Cerqueira Cesar, Sao Paulo, SP CEP 01246-903 Brazil
- Carmen Liane Neubarth Estivallet
- Laboratory of Medical Investigations Number 55 of the Sao Paulo University Medical School – FMUSP, Avenida Dr Arnaldo, 455, sala 2145 – Cerqueira Cesar, Sao Paulo, SP CEP 01246-903 Brazil
- Henrique Alves de Amorim
- Laboratory of Image and Signal Processing of the Institute of Science and Technology of Federal University of São Paulo (Universidade Federal de São Paulo – UNIFESP), Rua Talim 330 Sala 108 - Jardim Aeroporto, São José dos Campos, SP CEP 12231-280 Brazil
- Fernando J. Kim
- Denver Health Medical Center, University of Colorado Anschutz Medical Center, Aurora, CO USA
- Katia Ramos Moreira Leite
- Laboratory of Medical Investigations Number 55 of the Sao Paulo University Medical School – FMUSP, Avenida Dr Arnaldo, 455, sala 2145 – Cerqueira Cesar, Sao Paulo, SP CEP 01246-903 Brazil
- Matheus Cardoso Moraes
- Laboratory of Image and Signal Processing of the Institute of Science and Technology of Federal University of São Paulo (Universidade Federal de São Paulo – UNIFESP), Rua Talim 330 Sala 108 - Jardim Aeroporto, São José dos Campos, SP CEP 12231-280 Brazil
2
Salem H, Soria D, Lund JN, Awwad A. A systematic review of the applications of Expert Systems (ES) and machine learning (ML) in clinical urology. BMC Med Inform Decis Mak 2021; 21:223. [PMID: 34294092 PMCID: PMC8299670 DOI: 10.1186/s12911-021-01585-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2021] [Accepted: 07/08/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND Testing a hypothesis for a 'factors-outcome effect' is a common quest, but standard statistical regression tools are rendered ineffective by data contaminated with too many noisy variables. Expert Systems (ES) can provide an alternative methodology for analysing data to identify the variables with the highest correlation to the outcome. By applying their effective machine learning (ML) abilities, significant research time and costs can be saved. The study aims to systematically review the applications of ES in urological research and their methodological models for effective multivariate analysis, identifying their domains, development and validity. METHODS The PRISMA methodology was applied to formulate an effective method for data gathering and analysis. The search covered the seven most relevant information sources: WEB OF SCIENCE, EMBASE, BIOSIS CITATION INDEX, SCOPUS, PUBMED, Google Scholar and MEDLINE. Articles were eligible if they applied one of the known ML models to a clear urological research question involving multivariate analysis. Only articles with pertinent research methods in ES models were included. The analysed data included the system model, applications, input/output variables, target user, validation, and outcomes. Both the ML models and the variable analysis were reported comparatively for each system. RESULTS The search identified n = 1087 articles from all databases, of which n = 712 were eligible for examination against the inclusion criteria. A total of 168 systems were finally included and systematically analysed, demonstrating a recent increase in the uptake of ES in academic urology, in particular artificial neural networks (31 systems). Most of the systems were applied in urological oncology (prostate cancer = 15, bladder cancer = 13), where diagnostic, prognostic and survival predictor markers were investigated. Due to the heterogeneity of the models and their statistical tests, a meta-analysis was not feasible.
CONCLUSION ES offer effective ML potential, and their applications in research have demonstrated a valid model for multivariate analysis. The complexity of their development can challenge their uptake in urological clinics, whilst the limitations of the statistical tools in this domain have created a gap for further research. The integration of computer scientists into academic units has promoted the use of ES in clinical urological research.
Affiliation(s)
- Hesham Salem
- Urological Department, NIHR Nottingham Biomedical Research Centre, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- University Hospitals of Derby and Burton NHS Foundation Trust, Royal Derby Hospital, University of Nottingham, Derby, DE22 3DT, UK
- Daniele Soria
- School of Computer Science and Engineering, University of Westminster, London, W1W 6UW, UK
- Jonathan N Lund
- University Hospitals of Derby and Burton NHS Foundation Trust, Royal Derby Hospital, University of Nottingham, Derby, DE22 3DT, UK
- Amir Awwad
- NIHR Nottingham Biomedical Research Centre, Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- Department of Medical Imaging, London Health Sciences Centre, University Hospital, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
3
A Multi-Channel and Multi-Spatial Attention Convolutional Neural Network for Prostate Cancer ISUP Grading. Applied Sciences-Basel 2021. [DOI: 10.3390/app11104321] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Prostate cancer (PCa) is one of the most prevalent cancers worldwide. As the demand for prostate biopsies increases, a worldwide shortage and an uneven geographical distribution of proficient pathologists place a strain on the efficacy of pathological diagnosis. Deep learning (DL) is able to automatically extract features from whole-slide images of prostate biopsies annotated by skilled pathologists and to classify the severity of PCa. A whole-slide image of biopsies contains many irrelevant features that weaken the performance of DL models. To enable DL models to focus more on cancerous tissue, we propose a Multi-Channel and Multi-Spatial (MCMS) Attention module that can be easily plugged into any backbone CNN to enhance feature extraction. Specifically, MCMS learns a channel attention vector that assigns weights to the channels of the feature map by pooling from multiple attention branches with different reduction ratios; similarly, it learns a spatial attention matrix that focuses on the more relevant areas of the image by pooling from multiple convolutional layers with different kernel sizes. The model is verified on the most extensive multi-center PCa dataset, which consists of 11,000 H&E-stained histopathology whole-slide images. Experimental results demonstrate that an MCMS-assisted CNN can effectively boost prediction performance in accuracy (ACC) and quadratic weighted kappa (QWK) compared with prior studies. The proposed model and results can serve as a credible benchmark for future research in automated PCa grading.
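Quadratic weighted kappa (QWK), one of the two metrics named in this abstract, measures agreement between two ordinal gradings while penalizing each disagreement by the squared distance between grades. A minimal pure-Python sketch of the standard definition follows, assuming integer grades 0..num_classes-1; the function name is hypothetical and this is not the authors' evaluation code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Standard QWK: 1 - sum(w * O) / sum(w * E).

    O is the observed confusion matrix, E the matrix expected by chance
    (outer product of the two label histograms, scaled to the same total),
    and w[i][j] = (i - j)^2 / (num_classes - 1)^2.
    """
    n = num_classes
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / total
            num += w * observed[i][j]
            den += w * expected
    if den == 0:
        raise ValueError("QWK is undefined when all labels fall in one class")
    return 1.0 - num / den
```

Perfect agreement gives 1.0, chance-level agreement gives about 0, and systematic reversal of the grade order drives the value negative, which is why QWK is often preferred over plain accuracy for ordinal grading.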
4
Momenzadeh N, Hafezalseheh H, Nayebpour M, Fathian M, Noorossana R. A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100763] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
5
Vittrant B, Leclercq M, Martin-Magniette ML, Collins C, Bergeron A, Fradet Y, Droit A. Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer. Front Genet 2020; 11:550894. [PMID: 33324443 PMCID: PMC7723980 DOI: 10.3389/fgene.2020.550894] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/05/2020] [Accepted: 10/29/2020] [Indexed: 01/31/2023]
Abstract
Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, clinical risk stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate specific antigen (PSA) levels. Transcriptomic data, however, have the potential to enable more precise approaches to predicting the evolution of the disease. Yet high-quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow the discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effects and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. Using a machine learning approach, a total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we identified a signature composed of only three genes (JUN, HES4, PPDPF) that predicts BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently used to predict PCa evolution. This score is in the range of studies that predicted BCR in a single cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of pooling different small datasets into a larger one to identify a predictive genomic signature that would benefit PCa patients.
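This abstract reports accuracy and balanced error rate (BER) side by side. BER averages the error rate within each class, so a classifier cannot look good simply by favoring the majority class (for example, non-recurrence). A short sketch of the usual definition, with a hypothetical function name and invented labels in the usage (not study data):

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates (1 - recall for each class)."""
    classes = sorted(set(y_true))
    errors = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        errors.append(wrong / len(idx))
    return sum(errors) / len(errors)


# Invented labels: 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(balanced_error_rate(y_true, y_pred))  # 0.375, while 1 - accuracy is about 0.333
```

The gap between the two numbers comes from the minority class: one error out of two counts as a 50% error rate for that class, regardless of how many majority-class samples there are.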
Affiliation(s)
- Benjamin Vittrant
- Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Mickael Leclercq
- Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Marie-Laure Martin-Magniette
- Universities of Paris Saclay, Paris, Evry, CNRS, INRAE, Institute of Plant Sciences Paris Saclay (IPS2), 91192, Gif-sur-Yvette, France; UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Colin Collins
- Vancouver Prostate Cancer Centre, Vancouver, BC, Canada; Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Alain Bergeron
- Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Yves Fradet
- Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Arnaud Droit
- Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
6
Eissa A, Elsherbiny A, Zoeir A, Sandri M, Pirola G, Puliatti S, Del Prete C, Sighinolfi MC, Micali S, Rocco B, Bianchi G. Reliability of the different versions of Partin tables in predicting extraprostatic extension of prostate cancer: a systematic review and meta-analysis. Minerva Urol Nefrol 2019; 71:457-478. [DOI: 10.23736/s0393-2249.19.03427-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
7
Eminaga O, Al-Hamad O, Boegemann M, Breil B, Semjonow A. Combination possibility and deep learning model as clinical decision-aided approach for prostate cancer. Health Informatics J 2019; 26:945-962. [PMID: 31238766 DOI: 10.1177/1460458219855884] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
This study aims to introduce, as a proof of concept, a combination model for the classification of prostate cancer using deep learning approaches. We included patients with prostate cancer who underwent surgical treatment, representing the various conditions of disease progression. All possible combinations of significant variables from logistic regression and correlation analyses were determined from the study data sets. The combination possibility and deep learning model was developed to predict these combinations, which represented clinically meaningful patient subgroups. The observed relative frequencies of different tumor stages and of Gleason score (Gls) changes from biopsy to prostatectomy were available for each group. Deep learning models and seven machine learning approaches were compared for the classification performance of Gleason score changes and pT2 stage. Deep models achieved the highest F1 scores for pT2 tumors (0.849) and Gls change (0.574). The combination possibility and deep learning model is a useful decision-aid tool for prostate cancer and for grouping patients with prostate cancer into clinically meaningful groups.
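The F1 scores quoted in this abstract are the harmonic mean of precision and recall for the positive class. A short illustrative sketch (the function name and the counts in the usage are hypothetical, not study data):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; equals 2tp / (2tp + fp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts: precision 0.8, recall 8/12.
print(round(f1_score(tp=8, fp=2, fn=4), 3))  # 0.727
```

Because the harmonic mean is dominated by the smaller of the two rates, a model with high precision but poor recall (or the reverse) still gets a low F1, which makes it a stricter summary than accuracy on imbalanced endpoints such as Gleason score change.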
Affiliation(s)
- Okyaz Eminaga
- Stanford University School of Medicine, USA; University Hospital of Cologne, Germany
8
Onisko A, Druzdzel MJ, Austin RM. Application of Bayesian network modeling to pathology informatics. Diagn Cytopathol 2018; 47:41-47. [PMID: 30451397 DOI: 10.1002/dc.23993] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/19/2018] [Revised: 05/04/2018] [Accepted: 05/30/2018] [Indexed: 11/06/2022]
Abstract
BACKGROUND In the era of extensive data collection, there is a growing need for large-scale data analysis with tools that can handle many variables in one modeling framework. In this article, we present our recent applications of Bayesian network modeling to pathology informatics. METHODS Bayesian networks (BNs) are probabilistic graphical models that represent domain knowledge and allow investigators to process this knowledge following sound rules of probability theory. BNs can be built based on expert opinion as well as learned from accumulating data sets. BN modeling is now recognized as a suitable approach for knowledge representation and reasoning under uncertainty. Over the last two decades, BNs have been successfully applied to many studies on medical prognosis and diagnosis. RESULTS Based on data and expert knowledge, we have constructed several BN models to assess patient risk for subsequent specific histopathologic diagnoses and their related prognosis in gynecological cytopathology and breast pathology. These models include the Pittsburgh Cervical Cancer Screening Model, assessing risk for histopathologic diagnoses of cervical precancer and cervical cancer; modeling of the significance of benign-appearing endometrial cells in Pap tests; diagnostic modeling to determine whether adenocarcinoma in tissue specimens is of endometrial or endocervical origin; and models to assess risk for recurrence of invasive breast carcinoma and ductal carcinoma in situ. CONCLUSIONS Bayesian network models can be used as powerful and flexible risk assessment tools on large clinical datasets and can quantitatively identify the variables that are of greatest significance in predicting specific histopathologic diagnoses and their related prognosis. The resulting BN models are able to provide individualized quantitative risk assessments and prognostication for specific abnormal findings commonly reported in gynecological cytopathology and breast pathology.
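To make "processing knowledge following the rules of probability theory" concrete, the sketch below performs exact inference by enumeration on a hypothetical three-node network, Risk -> Disease -> Test. The structure, variable names and all probabilities are invented for illustration and are not taken from the Pittsburgh models described above.

```python
from itertools import product

# Hypothetical network Risk -> Disease -> Test; every number is invented.
P_RISK = {True: 0.3, False: 0.7}                  # P(risk)
P_DISEASE_GIVEN_RISK = {True: 0.10, False: 0.02}  # P(disease=True | risk)
P_TEST_GIVEN_DISEASE = {True: 0.90, False: 0.05}  # P(test=True | disease)


def joint(risk, disease, test):
    """Joint probability factorized along the network's edges."""
    p = P_RISK[risk]
    pd = P_DISEASE_GIVEN_RISK[risk]
    p *= pd if disease else 1 - pd
    pt = P_TEST_GIVEN_DISEASE[disease]
    p *= pt if test else 1 - pt
    return p


def posterior_disease(test=True):
    """P(disease=True | test) by summing out the unobserved Risk node."""
    num = sum(joint(r, True, test) for r in (True, False))
    den = sum(joint(r, d, test) for r, d in product((True, False), repeat=2))
    return num / den


print(round(posterior_disease(True), 3))   # 0.453
print(round(posterior_disease(False), 4))  # 0.0048
```

Enumeration is exponential in the number of hidden variables, so real BN tools use smarter algorithms (for example, variable elimination or junction trees), but the answer they compute is the same normalized sum.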
Affiliation(s)
- Agnieszka Onisko
- Magee-Womens Hospital, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, 15213; Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, Bialystok, 15-351, Poland
- Marek J Druzdzel
- Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, Bialystok, 15-351, Poland; School of Computing and Information, University of Pittsburgh, 135 N Bellefield Ave, Pittsburgh, Pennsylvania, 15213
- R Marshall Austin
- Magee-Womens Hospital, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, 15213
9
Langarizadeh M, Moghbeli F. Applying Naive Bayesian Networks to Disease Prediction: a Systematic Review. Acta Inform Med 2016; 24:364-369. [PMID: 28077895 PMCID: PMC5203736 DOI: 10.5455/aim.2016.24.364-369] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/20/2016] [Accepted: 10/11/2016] [Indexed: 12/15/2022]
Abstract
INTRODUCTION Naive Bayesian networks (NBNs) are among the simplest and most effective Bayesian networks for prediction. OBJECTIVE This paper aims to review published evidence on the application of NBNs in predicting disease and to show NBNs as the fundamental algorithm with the best performance in comparison with other algorithms. METHODS PubMed was searched electronically for articles published between 2005 and 2015, using a comprehensive electronic search method to identify eligible articles. Inclusion criteria were determined based on NBN and its effects on disease prediction. A total of 99 articles were found. After excluding duplicates (n = 5), the titles and abstracts of 94 articles were screened against the inclusion criteria, and 38 articles remained. These were reviewed in full text and 15 were excluded. Eventually, 23 articles met our eligibility criteria and were included in this study. RESULT In this article, the use of NBN in predicting diseases is described. The results were reported in terms of Accuracy, Sensitivity, Specificity and Area under the ROC curve (AUC). The last column in Table 2 shows the differences between NBNs and other algorithms. DISCUSSION This systematic review (23 studies, 53,725 patients) indicates that predicting diseases with an NBN gave the best performance for most diseases in comparison with the other algorithms. In most cases, NBN worked better than the other algorithms based on the reported accuracy. CONCLUSION The method, termed NBN, is proposed and can efficiently construct a prediction model for disease.
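A naive Bayesian network assumes every feature is conditionally independent given the class, so prediction reduces to a log prior plus a sum of per-feature log likelihoods. The sketch below is a minimal categorical version with Laplace (add-alpha) smoothing; the function names, the smoothing choice and the toy data are assumptions for illustration, not a method taken from the reviewed studies.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)       # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, y)][v] += 1
            feature_values[j].add(v)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(class)
        for j, v in enumerate(row):
            k = len(feature_values[j])
            # Smoothed P(feature_j = v | class); unseen values still get alpha.
            score += math.log((feature_counts[(j, c)][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best


# Invented toy data: (smoking status, marker level) -> outcome.
rows = [("smoker", "high"), ("smoker", "high"), ("non", "low"),
        ("non", "low"), ("smoker", "low")]
labels = ["sick", "sick", "healthy", "healthy", "healthy"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("smoker", "high")))  # sick
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing term keeps a single unseen value from forcing a class probability to zero.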
Affiliation(s)
- Mostafa Langarizadeh
- Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Fateme Moghbeli
- Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
10
De Bari B, Vallati M, Gatta R, Lestrade L, Manfrida S, Carrie C, Valentini V. Development and validation of a machine learning-based predictive model to improve the prediction of inguinal status of anal cancer patients: A preliminary report. Oncotarget 2016; 8:108509-108521. [PMID: 29312547 PMCID: PMC5752460 DOI: 10.18632/oncotarget.10749] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/30/2016] [Accepted: 07/07/2016] [Indexed: 12/23/2022]
Abstract
Introduction The role of prophylactic inguinal irradiation (PII) in the treatment of anal cancer patients is controversial. We developed an innovative algorithm based on Machine Learning (ML) that allows the prescription of PII to be tailored. Results Once verified on the independent testing set, J48 showed the best performance, with specificity, sensitivity, and accuracy rates in predicting relapsing patients of 86.4%, 50.0% and 83.1%, respectively (vs 36.5%, 90.4% and 80.25%, respectively, for LR). Methods We classified 194 anal cancer patients with Logistic Regression (LR) and three other ML techniques based on decision trees (J48, Random Tree and Random Forest), using a large set of clinical and therapeutic variables. We tested the resulting ML algorithms on an independent testing set of 65 anal cancer patients. The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) methodology was used for the development, the quality assurance and the description of the experimental procedures. Conclusion In an internationally approved quality assurance framework, ML seems promising in predicting which patients would or would not benefit from PII. Once confirmed in larger and/or multi-centric databases, ML could support the physician in tailoring the treatment and in deciding whether or not to deliver PII.
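J48 is Weka's implementation of the C4.5 decision-tree learner, which chooses each split by gain ratio: the information gain of a split divided by the entropy of the split itself, so that features with many distinct values are not unfairly favored. The sketch below computes that criterion for one categorical feature; it is a simplified illustration (no continuous attributes, missing values or pruning), uses hypothetical function names, and is not the authors' pipeline.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting `labels` on `feature_values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)  # entropy of the partition itself
    return info_gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0: a perfect split
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0: an uninformative split
```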
Affiliation(s)
- Berardino De Bari
- Radiation Oncology Department, Centre Hospitalier Universitaire Vaudois-CHUV, Lausanne, Switzerland
- Mauro Vallati
- University of Huddersfield, School of Computing and Engineering, Huddersfield, UK
- Roberto Gatta
- Radiation Oncology Department, Catholic University of Sacred Heart, Rome, Italy
- Laëtitia Lestrade
- Service de Radiothérapie, Léon Bérard Cancer Center, Lyon, France; Radiation Oncology Department, Hôpitaux universitaires de Genève-HUG, Geneva, Switzerland
- Stefania Manfrida
- Radiation Oncology Department, Catholic University of Sacred Heart, Rome, Italy
- Christian Carrie
- Service de Radiothérapie, Léon Bérard Cancer Center, Lyon, France
- Vincenzo Valentini
- Radiation Oncology Department, Catholic University of Sacred Heart, Rome, Italy
11
Jäderling F, Nyberg T, Blomqvist L, Bjartell A, Steineck G, Carlsson S. Accurate prediction tools in prostate cancer require consistent assessment of included variables. Scand J Urol 2016; 50:260-6. [DOI: 10.3109/21681805.2016.1145736] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Affiliation(s)
- Fredrik Jäderling
- Department of Diagnostic Radiology, Karolinska University Hospital, Solna, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Tommy Nyberg
- Department of Oncology and Pathology, Division of Clinical Cancer Epidemiology, Karolinska Institutet, Stockholm, Sweden
- Lennart Blomqvist
- Department of Diagnostic Radiology, Karolinska University Hospital, Solna, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Anders Bjartell
- Department of Urology, Skåne University Hospital, Malmö, Sweden
- Gunnar Steineck
- Department of Oncology and Pathology, Division of Clinical Cancer Epidemiology, Karolinska Institutet, Stockholm, Sweden
- Division of Clinical Cancer Epidemiology, Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden
- Stefan Carlsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Urology, Karolinska University Hospital, Solna, Sweden
12
Mansiaux Y, Carrat F. Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections. BMC Med Res Methodol 2014; 14:99. [PMID: 25154404 PMCID: PMC4146451 DOI: 10.1186/1471-2288-14-99] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 04/24/2014] [Accepted: 08/14/2014] [Indexed: 12/19/2022]
Abstract
Background Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome. Methods We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected using two data mining methods, Random Forests (RF) and Boosted Regression Trees (BRT); the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression - UFMLR); and the Least Absolute Shrinkage and Selection Operator (LASSO), with a penalty in multivariate logistic regression to achieve a sparse selection of covariates. We developed permutation tests to assess the statistical significance of associations. We simulated 500 similarly sized datasets to estimate the True (TPR) and False (FPR) Positive Rates associated with these methods. Results Between 3 and 24 covariates (1%-8%) were identified as associated with influenza infection, depending on the method. The pre-seasonal haemagglutination inhibition antibody titer was the only covariate selected by all methods, while 266 (87%) covariates were not selected by any method. At a 5% nominal significance level, the TPR were 85% with RF, 80% with BRT, 26% to 49% with UFMLR, and 71% to 78% with LASSO. Conversely, the FPR were 4% with RF and BRT, 9% to 2% with UFMLR, and 9% to 4% with LASSO. Conclusions Data mining methods and LASSO should be considered valuable methods for detecting independent associations in large epidemiologic datasets.
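The abstract does not say which statistic its permutation tests use, so the sketch below only illustrates the general idea under a stated assumption: the test statistic is the absolute difference in outcome rates between exposed and unexposed subjects. Shuffling the outcome breaks any true exposure-outcome link, and the p-value is the share of shuffles at least as extreme as the observed data. The function name and toy data are hypothetical.

```python
import random


def permutation_p_value(exposure, outcome, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in outcome rates.

    Assumption (not from the paper): the statistic is
    |rate among exposed - rate among unexposed|.
    """
    def stat(y):
        exposed = [o for e, o in zip(exposure, y) if e]
        unexposed = [o for e, o in zip(exposure, y) if not e]
        return abs(sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed))

    observed = stat(outcome)
    rng = random.Random(seed)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # simulate the null hypothesis of no association
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented data: 20 exposed (18 infected), 20 unexposed (2 infected).
exposure = [1] * 20 + [0] * 20
outcome = [1] * 18 + [0] * 2 + [0] * 18 + [1] * 2
print(permutation_p_value(exposure, outcome) < 0.01)  # True
```

The same recipe works with any statistic, including variable-importance scores from RF or BRT, which is what makes permutation testing attractive for methods that lack closed-form p-values.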
Affiliation(s)
- Yohann Mansiaux
- INSERM, UMR_S 1136, Institut Pierre Louis d'Epidémiologie et de Santé Publique, F-75013 Paris, France.
13
Hooper CM, Tanz SK, Castleden IR, Vacher MA, Small ID, Millar AH. SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome. Bioinformatics 2014; 30:3356-64. [PMID: 25150248 DOI: 10.1093/bioinformatics/btu550] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Indexed: 12/11/2022]
Abstract
MOTIVATION Knowing the subcellular location of proteins is critical for understanding their function and developing accurate networks representing eukaryotic biological processes. Many computational tools have been developed to predict proteome-wide subcellular location, and abundant experimental data from green fluorescent protein (GFP) tagging or mass spectrometry (MS) are available in the model plant, Arabidopsis. None of these approaches is error-free, and thus, results are often contradictory. RESULTS To help unify these multiple data sources, we have developed the SUBcellular Arabidopsis consensus (SUBAcon) algorithm, a naive Bayes classifier that integrates 22 computational prediction algorithms, experimental GFP and MS localizations, protein-protein interaction and co-expression data to derive a consensus call and probability. SUBAcon classifies protein location in Arabidopsis more accurately than single predictors. AVAILABILITY SUBAcon is a useful tool for recovering proteome-wide subcellular locations of Arabidopsis proteins and is displayed in the SUBA3 database (http://suba.plantenergy.uwa.edu.au). The source code and input data are available through the SUBA3 server (http://suba.plantenergy.uwa.edu.au//SUBAcon.html) and the Arabidopsis SUbproteome REference (ASURE) training set can be accessed using the ASURE web portal (http://suba.plantenergy.uwa.edu.au/ASURE).
Affiliation(s)
- Cornelia M Hooper
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Sandra K Tanz
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian R Castleden
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Michael A Vacher
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian D Small
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- A Harvey Millar
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
14
Ospina JD, Zhu J, Chira C, Bossi A, Delobel JB, Beckendorf V, Dubray B, Lagrange JL, Correa JC, Simon A, Acosta O, de Crevoisier R. Random forests to predict rectal toxicity following prostate cancer radiation therapy. Int J Radiat Oncol Biol Phys 2014; 89:1024-1031. [PMID: 25035205 DOI: 10.1016/j.ijrobp.2014.04.027] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 12/09/2013] [Revised: 04/14/2014] [Accepted: 04/15/2014] [Indexed: 10/25/2022]
Abstract
PURPOSE To propose a random forest normal tissue complication probability (RF-NTCP) model to predict late rectal toxicity following prostate cancer radiation therapy, and to compare its performance with that of classic NTCP models. METHODS AND MATERIALS Clinical data and dose-volume histograms (DVH) were collected from 261 patients who received 3-dimensional conformal radiation therapy for prostate cancer with at least 5 years of follow-up. The series was split 1000 times into training and validation cohorts. An RF was trained to predict the risk of 5-year overall rectal toxicity and bleeding. Parameters of the Lyman-Kutcher-Burman (LKB) model were identified and a logistic regression model was fit. The performance of all the models was assessed by computing the area under the receiver operating characteristic curve (AUC). RESULTS The 5-year grade ≥2 overall rectal toxicity and grade ≥1 and grade ≥2 rectal bleeding rates were 16%, 25%, and 10%, respectively. The RF-NTCP model showed predictive capability for all 3 toxicity endpoints in both the training and validation cohorts. Age and anticoagulant use were found to be predictors of rectal bleeding. The AUC for RF-NTCP ranged from 0.66 to 0.76, depending on the toxicity endpoint. The AUC values for the LKB-NTCP model were statistically significantly lower, ranging from 0.62 to 0.69. CONCLUSIONS The RF-NTCP model may be a useful new tool for predicting late rectal toxicity; because it can include variables other than DVH, it appears to be a strong competitor to classic NTCP models.
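The abstract benchmarks the RF model against the classic Lyman-Kutcher-Burman (LKB) NTCP model and scores every model by AUC. As a hedged sketch of those two comparison ingredients (not the RF-NTCP model itself), the code below uses the standard LKB formulation: a differential DVH is reduced to a generalised equivalent uniform dose, gEUD = (Σ v_i D_i^(1/n))^n, and NTCP = Φ((gEUD − TD50)/(m·TD50)), where Φ is the standard normal CDF. AUC is computed as the Mann-Whitney pairwise-ranking statistic. The function names are illustrative, and any n, m, and TD50 values used with them here are hypothetical placeholders, not the parameters fitted in the study.

```python
import math
from typing import Sequence


def geud(doses_gy: Sequence[float], volumes: Sequence[float], n: float) -> float:
    """Generalised EUD from a differential DVH: (sum_i v_i * D_i**(1/n))**n.
    Bin volumes are renormalised to fractions that sum to 1."""
    total = sum(volumes)
    return sum((v / total) * d ** (1.0 / n) for d, v in zip(doses_gy, volumes)) ** n


def lkb_ntcp(doses_gy: Sequence[float], volumes: Sequence[float],
             n: float, m: float, td50: float) -> float:
    """LKB NTCP: standard normal CDF of t = (gEUD - TD50) / (m * TD50)."""
    t = (geud(doses_gy, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen toxicity case (label 1) scores higher than a randomly chosen
    non-case (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `lkb_ntcp([80.0], [1.0], n=0.12, m=0.15, td50=80.0)` returns approximately 0.5, since a uniform dose equal to TD50 sits at the midpoint of the sigmoid. The pairwise AUC is O(P·N), which is adequate for a cohort of a few hundred patients; a rank-based formula scales better for larger series.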
Affiliation(s)
- Juan D Ospina
- LTSI, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Escuela de Estadística, Universidad Nacional de Colombia Sede Medellín, Medellín, Colombia
- Jian Zhu
- LTSI, Université de Rennes 1, Rennes, France; Laboratory of Image Science and Technology, Southeast University, Nanjing, PR China; Department of Radiation Physics, Shandong Cancer Hospital and Institute, Jinan, PR China; Centre de Recherche en Information Biomédical Sino-Français, Rennes, France
- Ciprian Chira
- Département de Radiothérapie, Centre Eugène Marquis, Rennes, France
- Alberto Bossi
- Département de Radiothérapie, Institut Gustave-Roussy, Villejuif, France
- Jean B Delobel
- Département de Radiothérapie, Centre Eugène Marquis, Rennes, France
- Bernard Dubray
- Département de Radiothérapie, CRLCC Henri Becquerel, Rouen, France
- Juan C Correa
- Escuela de Estadística, Universidad Nacional de Colombia Sede Medellín, Medellín, Colombia
- Antoine Simon
- LTSI, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre de Recherche en Information Biomédical Sino-Français, Rennes, France
- Oscar Acosta
- LTSI, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France
- Renaud de Crevoisier
- LTSI, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Département de Radiothérapie, Centre Eugène Marquis, Rennes, France; Centre de Recherche en Information Biomédical Sino-Français, Rennes, France.
15
Abstract
Understanding the impact of clinical findings in discriminating between possible causes of a patient's presentation is essential in clinical judgment. A balance beam is a natural physical analogue that can accurately represent the combination of several pieces of evidence with varying ability to discriminate between disease hypotheses. Calculation of Bayes' theorem using log(posterior odds) as a function of log(prior odds) and the logarithms of the evidence's likelihood ratios maps onto the physical forces affecting objects placed on a balance beam. We describe the rules governing the functioning of tokens representing clinical findings in the comparison of 2 competing diseases. The likelihood ratios corresponding to positive (LR+) or negative (LR-) observations for each symptom determine the lateral position at which the symptom's token is placed on the beam, using a weight if the finding is present and a helium balloon if it is absent. We discuss how a balance beam could represent concepts of dynamic specificity (due to changes in competitor diseases' probabilities) and dynamic sensitivity (due to class-conditional independence). Utility-based thresholds for acting on a diagnosis could be represented by moving the balance beam's fulcrum. It is suggested that a balance beam can be a useful aid for students learning clinical diagnosis, allowing them to build on existing intuitive understanding to develop an appreciation of how evidence combines to influence degree of belief. The balance beam could also facilitate exploration of the potential impact of available questions or investigations.
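The balance beam is a physical picture of Bayes' theorem in log-odds form: log(posterior odds) = log(prior odds) + Σ log(LR), where each present finding contributes log(LR+) and each absent finding contributes log(LR−). The minimal sketch below does that arithmetic for two competing diseases, A and B, assuming class-conditional independence so that the terms simply add, and models the movable fulcrum as a hypothetical action threshold on the posterior probability. The function names and numbers are illustrative and are not taken from the article.

```python
import math
from typing import List, Tuple

# Each finding: (LR+, LR-, present?) for disease A versus disease B.
Finding = Tuple[float, float, bool]


def log_posterior_odds(prior_prob_a: float, findings: List[Finding]) -> float:
    """log(posterior odds A:B) = log(prior odds) + sum of log likelihood ratios.
    Present findings add log(LR+); absent findings add log(LR-). Adding the
    terms assumes the findings are class-conditionally independent."""
    log_odds = math.log(prior_prob_a / (1.0 - prior_prob_a))
    for lr_pos, lr_neg, present in findings:
        log_odds += math.log(lr_pos if present else lr_neg)
    return log_odds


def probability(log_odds: float) -> float:
    """Convert log odds back to the probability of disease A."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def act_on_a(log_odds: float, threshold_prob: float) -> bool:
    """'Moving the fulcrum': act on A only if its posterior probability
    exceeds a (hypothetical) utility-based threshold."""
    return probability(log_odds) > threshold_prob
```

With a prior probability of 0.2 for A (odds 0.25), a present finding with LR+ = 4 and an absent finding with LR− = 0.5 give posterior odds 0.25 × 4 × 0.5 = 0.5, a probability of 1/3. That clears a 0.3 action threshold but not a 0.4 one, which is the effect of shifting the fulcrum.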
Affiliation(s)
- Robert M Hamm
- University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA (RMH, WHB)
- William Howard Beasley
- University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA (RMH, WHB), Howard Live Oak, Inc., Norman, OK, USA (WHB)
16
Golino HF, Amaral LSDB, Duarte SFP, Gomes CMA, Soares TDJ, dos Reis LA, Santos J. Predicting increased blood pressure using machine learning. J Obes 2014; 2014:637635. [PMID: 24669313 PMCID: PMC3941962 DOI: 10.1155/2014/637635] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 08/16/2013] [Revised: 10/12/2013] [Accepted: 11/16/2013] [Indexed: 01/21/2023]
Abstract
The present study investigates the prediction of increased blood pressure from body mass index (BMI), waist circumference (WC), hip circumference (HC), and waist-hip ratio (WHR) using a machine learning technique called the classification tree. Data were collected from 400 college students (56.3% women) aged 16 to 63 years. Fifteen trees were computed in the training group for each sex, using different numbers and combinations of predictors. For women, the combination of BMI, WC, and WHR produced the best prediction, as it had the lowest deviance (87.42) and misclassification rate (.19) and the highest pseudo R² (.43). This model had a sensitivity of 80.86% and a specificity of 81.22% in the training set, and 45.65% and 65.15%, respectively, in the test sample. For men, BMI, WC, HC, and WHR gave the best prediction, with the lowest deviance (57.25) and misclassification rate (.16) and the highest pseudo R² (.46). This model had a sensitivity of 72% and a specificity of 86.25% in the training set, and 58.38% and 69.70%, respectively, in the test set. Finally, the classification tree results were compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power.
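The reported metrics follow from simple definitions: WHR is waist circumference divided by hip circumference, sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and the misclassification rate is (FP+FN)/N. The sketch below computes them and adds a single-split "stump", the basic building block of a classification tree, that picks the threshold on one predictor with the fewest misclassified cases. This is a simplification under stated assumptions: the study ranked its trees by deviance, misclassification, and pseudo R², and a real tree grows many splits recursively, whereas this stump uses misclassification only. All names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def waist_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """WHR = waist circumference / hip circumference (same units)."""
    return waist_cm / hip_cm


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # true positive rate

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # true negative rate

    @property
    def misclassification(self) -> float:
        return (self.fp + self.fn) / (self.tp + self.fp + self.tn + self.fn)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally a 2x2 confusion matrix for binary labels (1 = increased BP)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def best_stump(xs: Sequence[float], ys: Sequence[int]) -> Tuple[float, int]:
    """One tree split: the threshold t on a single predictor (predict 1 when
    x > t) that misclassifies the fewest training cases."""
    if len(xs) == 0:
        raise ValueError("need at least one observation")
    best_thr: Optional[float] = None
    best_err = len(ys) + 1
    for thr in sorted(set(xs)):
        errors = sum((x > thr) != (y == 1) for x, y in zip(xs, ys))
        if errors < best_err:  # strict '<' keeps the lowest tied threshold
            best_thr, best_err = thr, errors
    return best_thr, best_err
```

The gap the abstract reports between training and test sensitivity (for women, 80.86% versus 45.65%) is the kind of overfitting that evaluating these same functions on a held-out sample is meant to expose.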
Affiliation(s)
- Hudson Fernandes Golino
- Laboratório de Investigação da Arquitetura Cognitiva, Universidade Federal de Minas Gerais, 30000-000 Belo Horizonte, Minas Gerais, MG, Brazil
- Stenio Fernando Pimentel Duarte
- Núcleo de Pós-Graduação, Pesquisa e Extenção, Faculdade Independente do Nordeste, São Luís Avenue, 1305, 45000-000 Candeias, Vitória da Conquista, BA, Brazil
- Cristiano Mauro Assis Gomes
- Laboratório de Investigação da Arquitetura Cognitiva, Universidade Federal de Minas Gerais, 30000-000 Belo Horizonte, Minas Gerais, MG, Brazil
- Telma de Jesus Soares
- Instituto Multidisciplinar de Saúde, Universidade Federal da Bahia, 40000-000 Bahia, BA, Brazil
- Luciana Araujo dos Reis
- Núcleo de Pós-Graduação, Pesquisa e Extenção, Faculdade Independente do Nordeste, São Luís Avenue, 1305, 45000-000 Candeias, Vitória da Conquista, BA, Brazil
- Joselito Santos
- Núcleo de Pós-Graduação, Pesquisa e Extenção, Faculdade Independente do Nordeste, São Luís Avenue, 1305, 45000-000 Candeias, Vitória da Conquista, BA, Brazil
17
Artificial neural networks and prostate cancer--tools for diagnosis and management. Nat Rev Urol 2013; 10:174-82. [PMID: 23399728 DOI: 10.1038/nrurol.2013.9] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/11/2022]
Abstract
Artificial neural networks (ANNs) are mathematical models that are based on biological neural networks and are composed of interconnected groups of artificial neurons. ANNs are used to map and predict outcomes in complex relationships between given 'inputs' and sought-after 'outputs' and can also be used to find patterns in datasets. In medicine, ANNs have been applied to cancer diagnosis, staging, and recurrence prediction since the mid-1990s, when an enormous effort was initiated, especially in prostate cancer detection. Modern ANNs can incorporate new biomarkers and imaging data to improve their predictive power and can offer a number of advantages as clinical decision-making tools, such as easy handling of distribution-free input parameters. Most importantly, ANNs can capture nonlinear relationships among input data that conventional analyses do not always recognize. In the future, complex medical diagnostic and treatment decisions will be increasingly based on ANNs and other multivariate models.
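The point that ANNs capture nonlinear relationships can be made concrete with a toy example. XOR is not linearly separable, so no single threshold neuron can compute it, but two interconnected layers can. The sketch below uses hand-set weights and a step activation purely for illustration; clinical ANNs such as those reviewed here learn their weights from data (typically by backpropagation with smooth activations), and the weights and names below are hypothetical.

```python
from typing import List, Sequence


def step(z: float) -> int:
    """Threshold activation: the neuron 'fires' (1) when its input is positive."""
    return 1 if z > 0 else 0


def layer(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float]) -> List[int]:
    """One layer of artificial neurons: each neuron weights every input,
    adds its bias, and applies the step activation."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-set weights: hidden neuron 1 acts as OR, hidden neuron 2
# as AND, and the output neuron fires for "OR but not AND", i.e. XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [-0.5]


def xor_net(x1: int, x2: int) -> int:
    """Two-layer network mapping binary inputs to their XOR."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]
```

The hidden layer re-represents the inputs so that the output neuron faces a linearly separable problem; that intermediate re-representation is what lets multilayer networks model interactions a single linear score cannot.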