1
|
Laurie MA, Zhou SR, Islam MT, Shkolyar E, Xing L, Liao JC. Bladder Cancer and Artificial Intelligence: Emerging Applications. Urol Clin North Am 2024; 51:63-75. [PMID: 37945103 PMCID: PMC10697017 DOI: 10.1016/j.ucl.2023.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
Bladder cancer is a common and heterogeneous disease that poses a significant burden to the patient and health care system. Major unmet needs include effective early detection strategy, imprecision of risk stratification, and treatment-associated morbidities. The existing clinical paradigm is imprecise, which results in missed tumors, suboptimal therapy, and disease progression. Artificial intelligence holds immense potential to address many unmet needs in bladder cancer, including early detection, risk stratification, treatment planning, quality assessment, and outcome prediction. Despite recent advances, extensive work remains to affirm the efficacy of artificial intelligence as a decision-making tool for bladder cancer management.
Affiliation(s)
- Mark A Laurie
- Department of Urology, Stanford University School of Medicine, 453 Quarry Road, Mail Code 5656, Palo Alto, CA 94304, USA; Department of Radiation Oncology, Stanford University School of Medicine, 875 Blake Wilbur Drive Room G204, Stanford, CA 94305-5847, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA; Institute for Computational and Mathematical Engineering, Stanford University School of Engineering, Stanford, CA 94305, USA
| | - Steve R Zhou
- Department of Urology, Stanford University School of Medicine, 453 Quarry Road, Mail Code 5656, Palo Alto, CA 94304, USA
| | - Md Tauhidul Islam
- Department of Radiation Oncology, Stanford University School of Medicine, 875 Blake Wilbur Drive Room G204, Stanford, CA 94305-5847, USA
| | - Eugene Shkolyar
- Department of Urology, Stanford University School of Medicine, 453 Quarry Road, Mail Code 5656, Palo Alto, CA 94304, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA
| | - Lei Xing
- Department of Radiation Oncology, Stanford University School of Medicine, 875 Blake Wilbur Drive Room G204, Stanford, CA 94305-5847, USA
| | - Joseph C Liao
- Department of Urology, Stanford University School of Medicine, 453 Quarry Road, Mail Code 5656, Palo Alto, CA 94304, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA.
|
2
|
Ameen YA, Badary DM, Abonnoor AEI, Hussain KF, Sewisy AA. Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images. BMC Bioinformatics 2023; 24:75. [PMID: 36869300 PMCID: PMC9983182 DOI: 10.1186/s12859-023-05199-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 02/21/2023] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND Applying deep learning to digital histopathology is hindered by the scarcity of manually annotated datasets. While data augmentation can ameliorate this obstacle, its methods are far from standardized. Our aim was to systematically explore the effects of skipping data augmentation; applying data augmentation to different subsets of the whole dataset (training set, validation set, test set, two of them, or all of them); and applying data augmentation at different time points (before, during, or after dividing the dataset into three subsets). Different combinations of the above possibilities resulted in 11 ways to apply augmentation. The literature contains no such comprehensive systematic comparison of these augmentation ways. RESULTS Non-overlapping photographs of all tissues on 90 hematoxylin-and-eosin-stained urinary bladder slides were obtained. Then, they were manually classified as either inflammation (5948 images), urothelial cell carcinoma (5811 images), or invalid (3132 images; excluded). If done, augmentation was eight-fold by flipping and rotation. Four convolutional neural networks (Inception-v3, ResNet-101, GoogLeNet, and SqueezeNet), pre-trained on the ImageNet dataset, were fine-tuned to binary classify images of our dataset. This task was the benchmark for our experiments. Model testing performance was evaluated using accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve. Model validation accuracy was also estimated. The best testing performance was achieved when augmentation was done to the remaining data after test-set separation, but before division into training and validation sets. This leaked information between the training and the validation sets, as evidenced by the optimistic validation accuracy. However, this leakage did not cause the validation set to malfunction. Augmentation before test-set separation led to optimistic results. 
Test-set augmentation yielded more accurate evaluation metrics with less uncertainty. Inception-v3 had the best overall testing performance. CONCLUSIONS In digital histopathology, augmentation should include both the test set (after its allocation), and the remaining combined training/validation set (before being split into separate training and validation sets). Future research should try to generalize our results.
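The ordering recommended in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: it assumes the "eight-fold" augmentation means the four 90-degree rotations, each with and without a horizontal flip, and the names `eightfold`, `split_then_augment`, and the `test_fraction` default are hypothetical.

```python
import random


def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in img]


def eightfold(img):
    """Return 8 variants: 4 rotations, each with and without a horizontal flip.

    Assumption: this is what 'eight-fold by flipping and rotation' refers to.
    """
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def split_then_augment(images, test_fraction=0.2, seed=0):
    """Separate the test set first, then augment each part on its own.

    The augmented remainder is returned unsplit; the caller divides it into
    training and validation sets afterwards, as the abstract recommends.
    test_fraction is a hypothetical default, not a value from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    test_aug = [v for img in test for v in eightfold(img)]
    rest_aug = [v for img in rest for v in eightfold(img)]
    return test_aug, rest_aug
```

Because the test images are set aside before any augmentation, no variant of a test image can end up in the training or validation data. Variants of one source image can still land on both sides of the later training/validation split, which is the leakage the abstract describes.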
Affiliation(s)
- Yusra A Ameen
- Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt.
| | - Dalia M Badary
- Department of Pathology, Faculty of Medicine, Assiut University, Asyut, Egypt
| | | | - Khaled F Hussain
- Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt
| | - Adel A Sewisy
- Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt
|
3
|
Doǧan V, Isık T, Kılıç V, Horzum N. A field-deployable water quality monitoring with machine learning-based smartphone colorimetry. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2022; 14:3458-3466. [PMID: 36000587 DOI: 10.1039/d2ay00785a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
Water quality monitoring is an increasing global concern as the pollution of water sources causes adverse effects on economic growth and human health. Traditional approaches to the detection of pollutants are time-consuming and labor-intensive due to the requirement of sophisticated equipment or laboratory settings. Therefore, portable devices featuring rapid response and easy operation are indispensable in water quality monitoring. Herein, smartphone-based colorimetric pollutant quantification is demonstrated in a machine learning (ML) framework. As a proof of concept, the presence of seven ions in water was analyzed using colorimetric strips. The color variation on the strip indicators was captured under eight lighting conditions with five smartphones, providing robustness against the illumination variation and camera optics for ML classifiers. Color and texture features were extracted from the images to train the classifiers. Among the twenty-three classifiers, K-Nearest Neighbors exhibits the best classification performance, leading to the integration with our custom-designed Android application called Hydro Sens. The proposed approach was also tested with real samples taken from local water sources. The results prove that incorporating color strips with ML with a smartphone application can be used for water quality monitoring, which offers promising alternatives for sophisticated equipment that is especially applicable in resource-limited settings.
Affiliation(s)
- Vakkas Doǧan
- Department of Electrical and Electronics Engineering, Izmir Katip Celebi University, 35620 Turkey.
| | - Tuǧba Isık
- Department of Mineral Analysis and Technologies, General Directorate of Mineral Research and Exploration (MTA), Ankara, Turkey
| | - Volkan Kılıç
- Department of Electrical and Electronics Engineering, Izmir Katip Celebi University, 35620 Turkey.
| | - Nesrin Horzum
- Department of Engineering Sciences, Izmir Katip Celebi University, 35620 Izmir, Turkey
|
4
|
Sonabend R, Bender A, Vollmer S. Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures. Bioinformatics 2022; 38:4178-4184. [PMID: 35818973 PMCID: PMC9438958 DOI: 10.1093/bioinformatics/btac451] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2022] [Revised: 06/17/2022] [Accepted: 07/11/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION In this article, we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages. RESULTS Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons or 'C-hacking'. We demonstrate by example how simple it can be to manipulate results and use this to argue for better reporting guidelines and transparency in the literature. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation. AVAILABILITY AND IMPLEMENTATION The code used in the final experiment is available at https://github.com/RaphaelS1/distribution_discrimination.
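To make the concern concrete, the following Python sketch computes a Harrell-style concordance index and shows how the score can change with the rule used to turn a predicted survival curve into a single risk value. It is not the authors' code (theirs is at the repository linked above). The function `harrell_c`, the three toy subjects, and the choice of "one minus survival probability at a fixed time" as the risk transformation are assumptions made for illustration.

```python
def harrell_c(times, events, risks):
    """Concordance index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when the earlier failure was given
    the higher risk; tied risks count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data (hypothetical): observed times, all events observed.
times = [2, 4, 6]
events = [1, 1, 1]

# Predicted survival probabilities at t = 1 and t = 5 for each subject.
# The curves cross, so the risk ranking depends on the time point used.
surv_at_1 = [0.9, 0.7, 0.8]
surv_at_5 = [0.2, 0.5, 0.6]

risk_from_t1 = [1 - s for s in surv_at_1]
risk_from_t5 = [1 - s for s in surv_at_5]

c_t1 = harrell_c(times, events, risk_from_t1)
c_t5 = harrell_c(times, events, risk_from_t5)
```

Here the same three predicted distributions give two different values in `c_t1` and `c_t5`. A report that does not state which transformation was used therefore cannot be compared fairly with another, which is the reporting problem the abstract raises.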
Affiliation(s)
| | - Andreas Bender
- Department of Statistics, LMU Munich, 80539 Bavaria, Germany
| | - Sebastian Vollmer
- Department of Computer Science, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany,Data Science and its Application, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), 67663 Kaiserslautern, Germany,Mathematics Institute, University of Warwick, CV4 7AL Coventry, UK
|
5
|
Research on the Application of Artificial Neural Network-Based Virtual Image Technology in College Tennis Teaching. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4935121. [PMID: 35845874 PMCID: PMC9287108 DOI: 10.1155/2022/4935121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 05/29/2022] [Accepted: 06/08/2022] [Indexed: 11/30/2022]
Abstract
At the same time that my country has shifted from high-speed development to high-quality development, my country has also put forward new requirements for education development. Due to the limited study time during college, each student's study habits and learning process are also different, and the degree of connection between tennis lessons is high, so there will be polarization when learning tennis. With the development of science and technology, more and more technological innovations are integrated into the classroom, and traditional teaching methods can no longer keep up with the pace of the times. Tennis teaching is a subject of equal proportion between theory and practice. The traditional teaching method simplifies the theory, which makes students to have some bad phenomena when they practice. Aiming at this series of problems, this paper uses algorithms such as softmax function and threshold function to construct an application model of virtual image technology based on the artificial neural network in tennis teaching. The research results of the article show that: (1) the average accuracy rate of the method in this paper is 97.22%, and the highest accuracy rate is 99.17%. The average accuracy rate also tends to increase with the increase of sample size; the recall rate is the highest, and the highest recall rate is 99.36%. The average recall rate is 96.77%; the highest correct rate is close to 100% and is significantly higher than the other three methods; the average correct rate reaches 98.8%; the response time is the shortest; the average response time is 33 ms; and the response time increases with the increase of the sample size. (2) After using this model, tennis skills have been improved, with an average of 12 in situ flips, an average of 7 in situ rackets, an average of 5 in situ forehand draws, and an average of 3 in situ backhand draws. 
(3) The average forehand and backhand scores of the class after the experiment were 90 and 86; the average forehand and backhand stability were 8 and 7; and the average forehand and backhand accuracy were 31 and 29, respectively. The average depth of forehand and backhand is 36 and 32. (4) Most of the students are satisfied with this model, and they all choose to strongly agree and relatively agree, and the percentage of very agree that helps stimulate learning has reached 60.52%, and no students choose to disagree very much.
|
6
|
Hou J, Fu S, Wang X, Liu J, Xu Z. A noninvasive artificial neural network model to predict IgA nephropathy risk in Chinese population. Sci Rep 2022; 12:8296. [PMID: 35585099 PMCID: PMC9117316 DOI: 10.1038/s41598-022-11964-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/21/2021] [Accepted: 04/25/2022] [Indexed: 11/23/2022] Open
Abstract
Renal biopsy is the gold standard for Immunoglobulin A nephropathy (IgAN) but poses several problems. Thus, we aimed to establish a noninvasive model for predicting the risk probability of IgAN by analyzing routine and serological parameters. A total of 519 biopsy-diagnosed IgAN and 211 non-IgAN patients were recruited retrospectively. Artificial neural networks and logistic modeling were used. The receiver operating characteristic (ROC) curve and performance characteristics were determined to compare the diagnostic value between the two models. The training and validation sets did not differ significantly in terms of any variables. There were 19 significantly different parameters between the IgAN and non-IgAN groups. After multivariable logistic regression analysis, age, serum albumin, serum IgA, serum immunoglobulin G, estimated glomerular filtration rate, serum IgA/C3 ratio, and hematuria were found to be independently associated with the presence of IgAN. A backpropagation network model based on the above parameters was constructed and applied to the validation cohorts, revealing a sensitivity of 82.68% and a specificity of 84.78%. The area under the ROC curve for this model was higher than that for logistic regression model (0.881 vs. 0.839). The artificial neural network model based on routine markers can be a valuable noninvasive tool for predicting IgAN in screening practice.
Affiliation(s)
- Jie Hou
- Department of Nephrology, The First Hospital of Jilin University, Changchun, 130021, Jilin, China
| | - Shaojie Fu
- Department of Nephrology, The First Hospital of Jilin University, Changchun, 130021, Jilin, China
| | - Xueyao Wang
- Department of Nephrology, The First Hospital of Jilin University, Changchun, 130021, Jilin, China
| | - Juan Liu
- Department of Nephrology, The First Hospital of Jilin University, Changchun, 130021, Jilin, China
| | - Zhonggao Xu
- Department of Nephrology, The First Hospital of Jilin University, Changchun, 130021, Jilin, China.
|
7
|
Arnold MH. Teasing out Artificial Intelligence in Medicine: An Ethical Critique of Artificial Intelligence and Machine Learning in Medicine. JOURNAL OF BIOETHICAL INQUIRY 2021; 18:121-139. [PMID: 33415596 PMCID: PMC7790358 DOI: 10.1007/s11673-020-10080-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/27/2020] [Accepted: 12/23/2020] [Indexed: 05/05/2023]
Abstract
The rapid adoption and implementation of artificial intelligence in medicine creates an ontologically distinct situation from prior care models. There are both potential advantages and disadvantages with such technology in advancing the interests of patients, with resultant ontological and epistemic concerns for physicians and patients relating to the instantiation of AI as a dependent, semi- or fully-autonomous agent in the encounter. The concept of libertarian paternalism potentially exercised by AI (and those who control it) has created challenges to conventional assessments of patient and physician autonomy. The unclear legal relationship between AI and its users cannot be settled presently, and progress in AI and its implementation in patient care will necessitate an iterative discourse to preserve humanitarian concerns in future models of care. This paper proposes that physicians should neither uncritically accept nor unreasonably resist developments in AI but must actively engage and contribute to the discourse, since AI will affect their roles and the nature of their work. One's moral imaginative capacity must be engaged in the questions of beneficence, autonomy, and justice of AI and whether its integration in healthcare has the potential to augment or interfere with the ends of medical practice.
Affiliation(s)
- Mark Henderson Arnold
- School of Rural Health (Dubbo/Orange), Sydney Medical School, Faculty of Medicine and Health, University of Sydney, Sydney, Australia.
- Sydney Health Ethics, School of Public Health, University of Sydney, Sydney, Australia.
|
8
|
Barbieri D, Chawla N, Zaccagni L, Grgurinović T, Šarac J, Čoklo M, Missoni S. Predicting Cardiovascular Risk in Athletes: Resampling Improves Classification Performance. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17217923. [PMID: 33126737 PMCID: PMC7662820 DOI: 10.3390/ijerph17217923] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/21/2020] [Revised: 10/20/2020] [Accepted: 10/25/2020] [Indexed: 11/16/2022]
Abstract
Cardiovascular diseases are the main cause of death worldwide. The aim of the present study is to verify the performances of a data mining methodology in the evaluation of cardiovascular risk in athletes, and whether the results may be used to support clinical decision making. Anthropometric (height and weight), demographic (age and sex) and biomedical (blood pressure and pulse rate) data of 26,002 athletes were collected in 2012 during routine sport medical examinations, which included electrocardiography at rest. Subjects were involved in competitive sport practice, for which medical clearance was needed. Outcomes were negative for the large majority, as expected in an active population. Resampling was applied to balance the positive/negative class ratio. A decision tree and logistic regression were used to classify individuals as either at risk or not. The receiver operating characteristic curve was used to assess classification performances. Data mining and resampling improved cardiovascular risk assessment in terms of increased area under the curve. The proposed methodology can be effectively applied to biomedical data in order to optimize clinical decision making and, at the same time, minimize the amount of unnecessary examinations.
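The abstract does not say which resampling scheme was used. As one possible reading, the Python sketch below balances the two classes by random oversampling of the minority class with replacement. The function name `oversample_minority`, the 0/1 label coding, and the `seed` parameter are hypothetical and not taken from the paper.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match.

    Labels are assumed to be 0 (negative) or 1 (positive). This is plain
    random oversampling; the study may have used a different scheme.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    if len(pos) < len(neg):
        minority, minority_label, deficit = pos, 1, len(neg) - len(pos)
    else:
        minority, minority_label, deficit = neg, 0, len(pos) - len(neg)
    extra = [rng.choice(minority) for _ in range(deficit)]
    out_rows = list(rows) + extra
    out_labels = list(labels) + [minority_label] * deficit
    return out_rows, out_labels
```

In a sketch like this, resampling would normally be applied to the training portion only, so that evaluation still reflects the original class ratio.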
Affiliation(s)
- Davide Barbieri
- Department of Biomedical and Specialty Surgical Sciences, Faculty of Medicine, Pharmacy and Prevention, University of Ferrara, 44121 Ferrara, Italy;
| | - Nitesh Chawla
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN 46556, USA;
| | - Luciana Zaccagni
- Department of Biomedical and Specialty Surgical Sciences, Faculty of Medicine, Pharmacy and Prevention, University of Ferrara, 44121 Ferrara, Italy;
- Biomedical Sport Studies Center, University of Ferrara, 44123 Ferrara, Italy
| | - Tonći Grgurinović
- Polyclinic for Occupational Health and Sports of Zagreb Sports Association with Laboratory of Medical Biochemistry, 10000 Zagreb, Croatia;
| | - Jelena Šarac
- Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia; (J.Š.); (M.Č.)
| | - Miran Čoklo
- Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia; (J.Š.); (M.Č.)
| | - Saša Missoni
- Institute for Anthropological Research, 10000 Zagreb, Croatia;
- School of Medicine, Josip Juraj Strossmayer University of Osijek, 31000 Osijek, Croatia
|
9
|
Ng F, Jiang R, Chow JCL. Predicting radiation treatment planning evaluation parameter using artificial intelligence and machine learning. IOP SCINOTES 2020. [DOI: 10.1088/2633-1357/ab805d] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/11/2022] Open
|
10
|
Gansky SA, Shafik S. At the crossroads of oral health inequities and precision public health. J Public Health Dent 2019; 80 Suppl 1:S14-S22. [PMID: 31063590 DOI: 10.1111/jphd.12316] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/16/2018] [Revised: 01/06/2019] [Accepted: 03/14/2019] [Indexed: 01/21/2023]
Abstract
OBJECTIVES This paper reviews the precision public health literature pertaining to oral health, identifies possible threats that could inadvertently increase health inequities, and proposes potential opportunities that precision public health could utilize to reduce oral health inequities. METHODS The health sciences literature was reviewed and supplemented with new data to identify important issues relating to precision medicine, precision oral health, precision public health, and health equity. RESULTS Examples from general health and oral health were provided to illustrate salient concepts. CONCLUSIONS Future precision public health should utilize multifactorial, multi-level conceptual frameworks and conceptual causal models with upstream social determinants and downstream health effects, as well as a proportionate universalism perspective; and proper analytic methods, including sufficient sample sizes, appropriate statistical competitors, health disparity indices, causal modeling, and internal and external validation.
Affiliation(s)
- Stuart A Gansky
- Division of Oral Epidemiology and Dental Public Health, Center to Address Disparities in Children's Oral Health, University of California, San Francisco, CA, USA
| | - Sarah Shafik
- Division of Oral Epidemiology and Dental Public Health, Center to Address Disparities in Children's Oral Health, University of California, San Francisco, CA, USA
|
11
|
Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med 2018; 284:603-619. [PMID: 30102808 DOI: 10.1111/joim.12822] [Citation(s) in RCA: 374] [Impact Index Per Article: 62.3] [Indexed: 12/15/2022]
Abstract
Machine learning (ML) is a burgeoning field of medicine with huge resources being applied to fuse computer science and statistics to medical problems. Proponents of ML extol its ability to deal with large, complex and disparate data, often found within medicine and feel that ML is the future for biomedical research, personalized medicine, computer-aided diagnosis to significantly advance global health care. However, the concepts of ML are unfamiliar to many medical professionals and there is untapped potential in the use of ML as a research tool. In this article, we provide an overview of the theory behind ML, explore the common ML algorithms used in medicine including their pitfalls and discuss the potential future of ML in medicine.
Affiliation(s)
| | - H K Kok
- Interventional Radiology Service, Northern Hospital Radiology, Epping, Vic, Australia
| | - R V Chandra
- Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia.,Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Vic, Australia
| | - A H Razavi
- School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada.,BCE Corporate Security, Ottawa, ON, Canada
| | - M J Lee
- Department of Radiology, Beaumont Hospital and Royal College of Surgeons in Ireland, Dublin, Ireland
| | - H Asadi
- Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia.,Department of Radiology, Interventional Neuroradiology Service, Austin Health, Heidelberg, Vic, Australia.,School of Medicine, Faculty of Health, Deakin University, Waurn Ponds, Vic, Australia
|
12
|
Editorial Comment. J Urol 2018; 200:1377. [PMID: 30243814 DOI: 10.1016/j.juro.2018.06.103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
13
|
A machine learning approach for prediction of pregnancy outcome following IVF treatment. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3693-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 11/27/2022]
|
14
|
Bannister CA, Halcox JP, Currie CJ, Preece A, Spasić I. A genetic programming approach to development of clinical prediction models: A case study in symptomatic cardiovascular disease. PLoS One 2018; 13:e0202685. [PMID: 30180175 PMCID: PMC6122798 DOI: 10.1371/journal.pone.0202685] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/25/2018] [Accepted: 08/06/2018] [Indexed: 12/22/2022] Open
Abstract
Background Genetic programming (GP) is an evolutionary computing methodology capable of identifying complex, non-linear patterns in large data sets. Despite the potential advantages of GP over more typical, frequentist statistical approach methods, its applications to survival analyses are rare, at best. The aim of this study was to determine the utility of GP for the automatic development of clinical prediction models. Methods We compared GP against the commonly used Cox regression technique in terms of the development and performance of a cardiovascular risk score using data from the SMART study, a prospective cohort study of patients with symptomatic cardiovascular disease. The composite endpoint was cardiovascular death, non-fatal stroke, and myocardial infarction. A total of 3,873 patients aged 19–82 years were enrolled in the study 1996–2006. The cohort was split 70:30 into derivation and validation sets. The derivation set was used for development of both GP and Cox regression models. These models were then used to predict the discrete hazards at t = 1, 3, and 5 years. The predictive ability of both models was evaluated in terms of their risk discrimination and calibration using the validation set. Results The discrimination of both models was comparable. At time points t = 1, 3, and 5 years the C-index was 0.59, 0.69, 0.64 and 0.66, 0.70, 0.70 for the GP and Cox regression models respectively. At the same time points, the calibration of both models, which was assessed using calibration plots and the generalization of the Hosmer-Lemeshow test statistic, was also comparable, but with the Cox model being better calibrated to the validation data. Conclusion Using empirical data, we demonstrated that a prediction model developed automatically by GP has predictive ability comparable to that of manually tuned Cox regression. The GP model was more complex, but it was developed in a fully automated way and comprised fewer covariates. 
Furthermore, it did not require the expertise normally needed for its derivation, thereby alleviating the knowledge elicitation bottleneck. Overall, GP demonstrated considerable potential as a method for the automated development of clinical prediction models for diagnostic and prognostic purposes.
Affiliation(s)
- Christian A. Bannister
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Cochrane Institute of Primary Care & Public Health, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Julian P. Halcox
- Department of Cardiology, Medical School, Swansea University, Swansea, United Kingdom
| | - Craig J. Currie
- Cochrane Institute of Primary Care & Public Health, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Alun Preece
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Irena Spasić
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
|
15
|
Senthil Kumar A, Kumar A, Krishnan R, Chakravarthi B, Deekshatalu BL. Soft Computing in Remote Sensing Applications. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES INDIA SECTION A-PHYSICAL SCIENCES 2017. [DOI: 10.1007/s40010-017-0431-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
|
16
|
Rezaei-Hachesu P, Oliyaee A, Safaie N, Ferdousi R. Comparison of coronary artery disease guidelines with extracted knowledge from data mining. J Cardiovasc Thorac Res 2017; 9:95-101. [PMID: 28740629 PMCID: PMC5516058 DOI: 10.15171/jcvtr.2017.16] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/25/2016] [Accepted: 03/19/2017] [Indexed: 11/09/2022] Open
Abstract
Introduction: Coronary artery disease (CAD) is one of the major causes of disability and death in the world. Accordingly, using a national, up-to-date guideline for heart-related disease is essential. The objectives of this study were to find interesting rules in CAD data and to compare them with guidelines. Methods: In this study, 1993 valid and complete records of patients (from 2009 to 2014) who had suffered from CAD were recruited and analyzed. A total of 25 variables, including a target variable (CAD) and 24 input or predictor variables, were used for knowledge discovery. To compare the extracted knowledge with well-trusted guidelines, the Canadian Cardiovascular Society (CCS) guideline and the US National Institute of Health (NIH) guideline were selected. Valid data mining rules were compared with the guidelines and then ranked by importance. Results: The most significant factor influencing CAD was chest pain. Elderly males (age >54) have a high probability of being diagnosed with CAD. Diagnostic methods listed in the guidelines were confirmed and ranked based on analysis of local CAD patient data. Knowledge discovery revealed that blood tests have more diagnostic value than the other medical tests recommended in the guidelines. Conclusion: Guidelines confirm the results achieved with data mining (DM) techniques and help to rank important risk factors based on national and local information. Evaluation of the extracted rules revealed new patterns for CAD patients.
Affiliation(s)
- Peyman Rezaei-Hachesu
- Health Information Technology Department, School of Management and Medical Informatics, Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Azadeh Oliyaee
- Industrial Engineering Faculty, Sharif University of Technology, Tehran, Iran
| | - Naser Safaie
- Cardiovascular Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Reza Ferdousi
- Health Information Technology Department, School of Management and Medical Informatics, Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
|
17
|
Fei Y, Hu J, Li WQ, Wang W, Zong GQ. Artificial neural networks predict the incidence of portosplenomesenteric venous thrombosis in patients with acute pancreatitis. J Thromb Haemost 2017; 15:439-445. [PMID: 27960048 DOI: 10.1111/jth.13588] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 06/07/2016] [Indexed: 12/18/2022]
Abstract
Essentials: Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural networks modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Summary. Objective: To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and compare the predictive ability of the ANNs with that of logistic regression. Methods: The ANNs and logistic regression modeling were constructed using simple clinical and laboratory data of 72 acute pancreatitis (AP) patients. The ANNs and logistic modeling were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. The accuracy and the performance characteristics were compared between these two approaches by SPSS17.0 software. Results: The training set and validation set did not differ on any of the 11 variables. After training, the back propagation network training error converged to 1 × 10⁻²⁰, and it retained excellent pattern recognition ability. When the ANNs model was applied to the validation set, it revealed a sensitivity of 80%, specificity of 85.7%, a positive predictive value of 77.6% and negative predictive value of 90.7%. The accuracy was 83.3%. Differences could be found between ANNs modeling and logistic regression modeling in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANNs modeling was used to identify PSMVT, the area under receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), which demonstrated better overall properties than logistic regression modeling (AUC = 0.716) (95% CI, 0.679-0.761). Conclusions: ANNs modeling was a more accurate tool than logistic regression in predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability.
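To illustrate how the validation metrics named in this abstract relate to one another, the following Python sketch derives sensitivity, specificity, predictive values, and accuracy from confusion-matrix counts. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives on a held-out validation set.
    """
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true cases detected
        "specificity": tn / (tn + fp),   # fraction of non-cases correctly ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a 24-patient validation set (an assumption,
# not data from the paper).
metrics = diagnostic_metrics(tp=8, fp=2, tn=12, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Predictive values depend on the prevalence of the outcome in the validation set, so they can differ between cohorts even when sensitivity and specificity are unchanged.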
Affiliation(s)
- Y Fei
- Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, Nanjing, China
| | - J Hu
- School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, China
| | - W-Q Li
- Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, Nanjing, China
| | - W Wang
- Department of General Surgery, Bayi Hospital affiliated Nanjing University of Chinese Medicine/the 81st Hospital of P.L.A., Nanjing, China
| | - G-Q Zong
- Department of General Surgery, Bayi Hospital affiliated Nanjing University of Chinese Medicine/the 81st Hospital of P.L.A., Nanjing, China
|
18
|
Nilsaz-Dezfouli H, Abu-Bakar MR, Arasan J, Adam MB, Pourhoseingholi MA. Improving Gastric Cancer Outcome Prediction Using Single Time-Point Artificial Neural Network Models. Cancer Inform 2017; 16:1176935116686062. [PMID: 28469384 PMCID: PMC5392036 DOI: 10.1177/1176935116686062] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/25/2016] [Accepted: 09/18/2016] [Indexed: 12/17/2022]
Abstract
In cancer studies, the prediction of cancer outcome based on a set of prognostic variables has been a long-standing topic of interest. Current statistical methods for survival analysis offer the possibility of modelling cancer survivability but require unrealistic assumptions about the survival time distribution or proportionality of hazard. Therefore, attention must be paid in developing nonlinear models with less restrictive assumptions. Artificial neural network (ANN) models are primarily useful in prediction when nonlinear approaches are required to sift through the plethora of available information. The applications of ANN models for prognostic and diagnostic classification in medicine have attracted a lot of interest. The applications of ANN models in modelling the survival of patients with gastric cancer have been discussed in some studies without completely considering the censored data. This study proposes an ANN model for predicting gastric cancer survivability, considering the censored data. Five separate single time-point ANN models were developed to predict the outcome of patients after 1, 2, 3, 4, and 5 years. The performance of ANN model in predicting the probabilities of death is consistently high for all time points according to the accuracy and the area under the receiver operating characteristic curve.
Affiliation(s)
| | - Mohd Rizam Abu-Bakar
- Institute for Mathematical Research, Universiti Putra Malaysia, Serdang, Malaysia
| | - Jayanthi Arasan
- Institute for Mathematical Research, Universiti Putra Malaysia, Serdang, Malaysia
| | - Mohd Bakri Adam
- Institute for Mathematical Research, Universiti Putra Malaysia, Serdang, Malaysia
| | - Mohamad Amin Pourhoseingholi
- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
19
|
Abstract
Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with the use of validation procedures and evaluating concordance, sensitivity, specificity, and likelihood ratio. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, problems of naïve KDD analyses (Schwarzer et al., 2000) can be carefully avoided.
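For a binary outcome, the concordance statistic mentioned in this abstract equals the area under the receiver operating characteristic curve. A minimal Python sketch, assuming predicted probabilities and 0/1 labels are available as plain lists, is given below; the function name `auc_concordance` and the example values are hypothetical and not drawn from the caries study.

```python
def auc_concordance(scores, labels):
    """Area under the ROC curve computed as pairwise concordance.

    Counts the fraction of (positive, negative) pairs in which the
    positive case received the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_concordance(scores, labels))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for small validation sets; a rank-based formula would scale better for large ones.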
Affiliation(s)
- S A Gansky
- Center for Health and Community, Department of Preventive and Restorative Dental Sciences, Division of Oral Epidemiology and Dental Public Health, University of California, San Francisco, CA 94143-1361, USA.
|
20
|
Haux R, Koch S, Lovell N, Marschollek M, Nakashima N, Wolf KH. Health-Enabling and Ambient Assistive Technologies: Past, Present, Future. Yearb Med Inform 2016; Suppl 1:S76-91. [PMID: 27362588 PMCID: PMC5171510 DOI: 10.15265/iys-2016-s008] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/24/2022]
Abstract
BACKGROUND: During the last decades, health-enabling and ambient assistive technologies became of considerable relevance for new informatics-based forms of diagnosis, prevention, and therapy. OBJECTIVES: To describe the state of the art of health-enabling and ambient assistive technologies in 1992 and today, and its evolution over the last 25 years, as well as to project where the field is expected to be in the next 25 years. In the context of this review, we define health-enabling and ambient assistive technologies as ambiently used sensor-based information and communication technologies, aiming at contributing to a person's health and health care as well as to her or his quality of life. METHODS: Systematic review of all original articles with research focus in all volumes of the IMIA Yearbook of Medical Informatics. Surveying authors independently on key projects and visions as well as on their lessons learned in the context of health-enabling and ambient assistive technologies and summarizing their answers. Surveying authors independently on their expectations for the future and summarizing their answers. RESULTS: IMIA Yearbook papers containing statements on health-enabling and ambient assistive technologies appear first in 2002. These papers form a minor part of published research articles in medical informatics. However, during recent years the number of articles published has increased significantly. Key projects were identified. There was clear progress in the use of technologies. However, proof of diagnostic relevance and therapeutic efficacy remains limited. Reforming health care processes and focusing more on patient needs are required. CONCLUSIONS: Health-enabling and ambient assistive technologies remain an important field for future health care and for interdisciplinary research. More and more publications assume that a person's home and their interaction therein are becoming important components in health care provision, assessment, and management.
Affiliation(s)
- R. Haux
- Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig - Institute of Technology and Hannover Medical School, Germany
| | - S. Koch
- Health Informatics Centre, LIME, Karolinska Institutet, Stockholm, Sweden
| | - N.H. Lovell
- Graduate School of Biomedical Engineering, UNSW, Sydney, Australia
| | - M. Marschollek
- Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig - Institute of Technology and Hannover Medical School, Germany
| | - N. Nakashima
- Medical Information Center, Kyushu University Hospital, Fukuoka, Japan
| | - K.-H. Wolf
- Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig - Institute of Technology and Hannover Medical School, Germany
|
21
|
Coates J, Souhami L, El Naqa I. Big Data Analytics for Prostate Radiotherapy. Front Oncol 2016; 6:149. [PMID: 27379211 PMCID: PMC4905980 DOI: 10.3389/fonc.2016.00149] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 03/29/2016] [Accepted: 05/31/2016] [Indexed: 12/14/2022]
Abstract
Radiation therapy is a first-line treatment option for localized prostate cancer, and radiation-induced normal tissue damage is often the main limiting factor for modern radiotherapy regimens. Conversely, under-dosing of target volumes in an attempt to spare adjacent healthy tissues limits the likelihood of achieving local, long-term control. Thus, the ability to generate personalized data-driven risk profiles for radiotherapy outcomes would provide valuable prognostic information to help guide both clinicians and patients. Big data applied to radiation oncology promises to deliver better understanding of outcomes by harvesting and integrating heterogeneous data types, including patient-specific clinical parameters, treatment-related dose-volume metrics, and biological risk factors. When taken together, such variables make up the basis for a multi-dimensional space (the "RadoncSpace") in which the presented modeling techniques search in order to identify significant predictors. Herein, we review outcome modeling and big data-mining techniques for both tumor control and radiotherapy-induced normal tissue effects. We apply many of the presented modeling approaches to a cohort of hypofractionated prostate cancer patients, taking into account different data types and a large heterogeneous mix of physical and biological parameters. Cross-validation techniques are also reviewed for the refinement of the proposed framework architecture and checking individual model performance. We conclude by considering advanced modeling techniques that borrow concepts from big data analytics, such as machine learning and artificial intelligence, before discussing the potential future impact of systems radiobiology approaches.
Affiliation(s)
- James Coates
- Department of Oncology, University of Oxford, Oxford, UK
| | - Luis Souhami
- Division of Radiation Oncology, McGill University Health Centre, Montreal, QC, Canada
| | - Issam El Naqa
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
|
22
|
Prediction of medial tibiofemoral compartment joint space loss progression using volumetric cartilage measurements: Data from the FNIH OA biomarkers consortium. Eur Radiol 2016; 27:464-473. [PMID: 27221563 DOI: 10.1007/s00330-016-4393-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 08/11/2015] [Revised: 04/28/2016] [Accepted: 05/02/2016] [Indexed: 12/18/2022]
Abstract
OBJECTIVES Investigating the association between baseline cartilage volume measurements (and initial 24th month volume loss) with medial compartment Joint-Space-Loss (JSL) progression (>0.7 mm) during 24-48th months of study. METHODS Case and control cohorts (Biomarkers Consortium subset from the Osteoarthritis Initiative (OAI)) were defined as participants with (n=297) and without (n=303) medial JSL progression (during 24-48th months). Cartilage volume measurements (baseline and 24th month loss) were obtained at five knee plates (medial-tibial, lateral-tibial, medial-femoral, lateral-femoral and patellar), and standardized values were analysed. Multivariate logistic regression was used with adjustment for known confounders. Artificial-Neural-Network analysis was conducted by Multi-Layer-Perceptrons (MLPs) including baseline determinants, and baseline (1) and interval changes (2) in cartilage volumes. RESULTS Larger baseline lateral-femoral cartilage volume was predictive of medial JSL (OR: 1.29 (1.01-1.64)). Greater initial 24th month lateral-femoral cartilage volume-loss (OR: 0.48 (0.27-0.84)) had protective effect on medial JSL during 24-48th months of study. Baseline and interval changes in lateral-femoral cartilage volume, were the most important estimators for medial JSL progression (importance values: 0.191(0.177-0.204), 0.218(0.207-0.228)) in the ANN analyses. CONCLUSIONS Cartilage volumes (both at baseline and their change during the initial 24 months) in the lateral femoral plate were predictive of medial JSL progression. KEY POINTS • Baseline lateral femoral cartilage volume is directly associated with medial JSL progression. • 24-month lateral femoral cartilage loss is inversely associated with medial JSL progression. • Lateral femoral cartilage volume is most important in association with medial JSL progression.
|
23
|
Sapra R, Mehrotra S, Nundy S. Artificial Neural Networks: Prediction of mortality/survival in gastroenterology. ACTA ACUST UNITED AC 2015. [DOI: 10.1016/j.cmrp.2015.05.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
|
24
|
Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the "Statistical Analyses and Methods in the Published Literature" or the SAMPL Guidelines. Int J Nurs Stud 2015; 52:5-9. [PMID: 25441757 DOI: 10.1016/j.ijnurstu.2014.09.006] [Citation(s) in RCA: 259] [Impact Index Per Article: 28.8] [Indexed: 01/13/2023]
Affiliation(s)
- Thomas A Lang
- Tom Lang Communications and Training International, United States
| | - Douglas G Altman
- Centre for Statistics in Medicine, Oxford University, United Kingdom
|
25
|
Racioppi M, Salmaso L, Brombin C, Arboretti R, D'Agostino D, Colombo R, Serretta V, Brausi M, Casetta G, Gontero P, Hurle R, Tenaglia R, Altieri V, Bartoletti R, Maffezzini M, Siracusano S, Morgia G, Bassi PF. The clinical use of statistical permutation test methodology: a tool for identifying predictive variables of outcome. Urol Int 2014; 94:262-9. [PMID: 25171377 DOI: 10.1159/000365292] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/14/2014] [Accepted: 06/16/2014] [Indexed: 11/19/2022]
Abstract
OBJECTIVES: To identify the predictive variables affecting the outcome after radical surgery for bladder cancer by a newer statistical methodology, i.e. nonparametric combination (NPC). METHODS: A multicenter study enrolled 1,312 patients who had undergone radical cystectomy for bladder cancer in 11 Italian oncological centers from January 1982 to December 2002. A statistical analysis of their medical history and diagnostic, pathological and postoperative variables was performed using an NPC test. The patients were included in a comprehensive database with medical history and clinical and pathological data. Five-year survival was used as the dependent variable, and p values were corrected for multiplicity using a closed testing procedure. The newer nonparametric approach was used to evaluate the prognostic importance of the variables. All of the analyses were performed using routines developed in MATLAB© and the significance level was set at α = 0.05. RESULTS: A significant prognostic predictive value (p < 0.01) for tumor clinical staging, hydronephrosis, tumor pathological staging, grading, presence of concomitant carcinoma in situ, regional lymph node involvement, corpora cavernosa invasion, microvascular invasion, lymphatic invasion and prostatic stroma involvement was found. CONCLUSIONS: The NPC test could handle any type of variable (categorical and quantitative) and take into account the multivariate relation among variables. This newer methodology offers a significant contribution in biomedical studies with several endpoints and is recommended in the presence of non-normal data and missing values, as well as for handling high-dimensional data and problems relating to small sample sizes.
Affiliation(s)
- M Racioppi
- Department of Urology, Catholic University of the Sacred Heart, Rome, Italy
|
26
|
Prediction of survival with alternative modeling techniques using pseudo values. PLoS One 2014; 9:e100234. [PMID: 24950066 PMCID: PMC4065009 DOI: 10.1371/journal.pone.0100234] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/04/2013] [Accepted: 05/24/2014] [Indexed: 11/19/2022]
Abstract
Background: The use of alternative modeling techniques for predicting patient survival is complicated by the fact that some alternative techniques cannot readily deal with censoring, which is essential for analyzing survival data. In the current study, we aimed to demonstrate that pseudo values enable statistically appropriate analyses of survival outcomes when used in seven alternative modeling techniques. Methods: In this case study, we analyzed survival of 1282 Dutch patients with newly diagnosed Head and Neck Squamous Cell Carcinoma (HNSCC) with conventional Kaplan-Meier and Cox regression analysis. We subsequently calculated pseudo values to reflect the individual survival patterns. We used these pseudo values to compare recursive partitioning (RPART), neural nets (NNET), logistic regression (LR), general linear models (GLM) and three variants of support vector machines (SVM) with respect to dichotomous 60-month survival, and continuous pseudo values at 60 months or estimated survival time. We used the area under the ROC curve (AUC) and the root of the mean squared error (RMSE) to compare the performance of these models using bootstrap validation. Results: Of a total of 1282 patients, 986 patients died during a median follow-up of 66 months (60-month survival: 52% [95% CI: 50%−55%]). The LR model had the highest optimism-corrected AUC (0.791) to predict 60-month survival, followed by the SVM model with a linear kernel (AUC 0.787). The GLM model had the smallest optimism-corrected RMSE when continuous pseudo values were considered for 60-month survival or the estimated survival time, followed by SVM models with a linear kernel. The estimated importance of predictors varied substantially by the specific aspect of survival studied and modeling technique used. Conclusions: The use of pseudo values makes it readily possible to apply alternative modeling techniques to survival problems, to compare their performance and to search further for promising alternative modeling techniques to analyze survival time.
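The pseudo-value idea described in this abstract can be sketched with the standard jackknife construction: for subject i, the pseudo value at time t is n·S(t) − (n−1)·S₋ᵢ(t), where S is the Kaplan-Meier estimate on all n subjects and S₋ᵢ is the estimate with subject i left out. The Python code below is a minimal illustration under that assumption; the function names and the example data are hypothetical, and the study's own implementation may differ.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times: follow-up time per subject; events: 1 = death, 0 = censored.
    """
    surv = 1.0
    # Distinct event times up to and including t, in increasing order.
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo values for survival at time t, one per subject."""
    n = len(times)
    full = km_survival(times, events, t)
    values = []
    for i in range(n):
        # Leave subject i out and re-estimate survival at t.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        values.append(n * full - (n - 1) * km_survival(loo_times, loo_events, t))
    return values


# Hypothetical follow-up data in months (1 = died, 0 = censored).
times = [2, 3, 4, 6, 8]
events = [1, 0, 1, 1, 0]
print(pseudo_values(times, events, t=5))
```

Without censoring, each pseudo value reduces to the indicator that the subject survived beyond t, which is why these values can stand in for an observed outcome in models that cannot handle censored data directly.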
|
27
|
Piovesan L, Molino G, Terenziani P. An ontological knowledge and multiple abstraction level decision support system in healthcare. ACTA ACUST UNITED AC 2014. [DOI: 10.1186/2193-8636-1-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
|
28
|
Clinical prognostic methods: Trends and developments. J Biomed Inform 2014; 48:1-4. [DOI: 10.1016/j.jbi.2014.02.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/25/2014] [Accepted: 02/28/2014] [Indexed: 02/04/2023]
|
29
|
Wei X, Ai J, Deng Y, Guan X, Johnson DR, Ang CY, Zhang C, Perkins EJ. Identification of biomarkers that distinguish chemical contaminants based on gene expression profiles. BMC Genomics 2014; 15:248. [PMID: 24678894 PMCID: PMC4051169 DOI: 10.1186/1471-2164-15-248] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/30/2013] [Accepted: 03/11/2014] [Indexed: 11/29/2022]
Abstract
Background: High-throughput transcriptomics profiles such as those generated using microarrays have been useful in identifying biomarkers for different classification and toxicity prediction purposes. Here, we investigated the use of microarrays to predict chemical toxicants and their possible mechanisms of action. Results: In this study, in vitro cultures of primary rat hepatocytes were exposed to 105 chemicals and vehicle controls, representing 14 compound classes. We comprehensively compared various methods for normalization of gene expression profiles, feature selection and classification for the classification of these 105 chemicals into 14 compound classes. We found that normalization had little effect on the averaged classification accuracy. Two support vector machine (SVM) methods, LibSVM and sequential minimal optimization, had better classification performance than other methods. SVM recursive feature selection (SVM-RFE) had the highest overfitting rate when an independent dataset was used for a prediction. Therefore, we developed a new feature selection algorithm called gradient method that had a relatively high training classification as well as prediction accuracy with the lowest overfitting rate of the methods tested. Analysis of biomarkers that distinguished the 14 classes of compounds identified a group of genes principally involved in cell cycle function that were significantly downregulated by metal and inflammatory compounds, but were induced by anti-microbial, cancer related drugs, pesticides, and PXR mediators. Conclusions: Our results indicate that using microarrays and a supervised machine learning approach to predict chemical toxicants, their potential toxicity and mechanisms of action is practical and efficient. Choosing the right feature selection and classification algorithms for this multiple category classification and prediction is critical.
Affiliation(s)
| | | | - Youping Deng
- Department of Internal Medicine, Rush University Cancer Center, Rush University Medical Center, Kidston House, 630 S, Hermitage Ave, Room 408, Chicago, IL 60612, USA.
|
30
|
Chatzimichail E, Matthaios D, Bouros D, Karakitsos P, Romanidis K, Kakolyris S, Papashinopoulos G, Rigas A. γ-H2AX: A Novel Prognostic Marker in a Prognosis Prediction Model of Patients with Early Operable Non-Small Cell Lung Cancer. Int J Genomics 2014; 2014:160236. [PMID: 24527431 PMCID: PMC3910456 DOI: 10.1155/2014/160236] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/07/2013] [Revised: 11/03/2013] [Accepted: 12/12/2013] [Indexed: 11/18/2022]
Abstract
Cancer is a leading cause of death worldwide and the prognostic evaluation of cancer patients is of great importance in medical care. The use of artificial neural networks in prediction problems is well established in human medical literature. The aim of the current study was to assess the prognostic value of a series of clinical and molecular variables with the addition of γ-H2AX, a new DNA damage response marker, for the prediction of prognosis in patients with early operable non-small cell lung cancer by comparing the γ-H2AX-based artificial network prediction model with the corresponding LR one. Two prognostic models of 96 patients with 27 input variables were constructed by using the parameter-increasing method in order to compare the predictive accuracy of neural network and logistic regression models. The quality of the models was evaluated by an independent validation data set of 11 patients. Neural networks outperformed logistic regression in predicting the patient's outcome according to the experimental results. To assess the importance of the two factors p53 and γ-H2AX, models without these two variables were also constructed. JR and accuracy of these models were lower than those of the models using all input variables, suggesting that these biological markers are very important for optimal performance of the models. This study indicates that neural networks may represent a potentially more useful decision support tool than conventional statistical methods for predicting the outcome of patients with non-small cell lung cancer and that some molecular markers, such as γ-H2AX, enhance their predictive ability.
Affiliation(s)
- E. Chatzimichail
- Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
| | - D. Matthaios
- Department of Oncology, Democritus University of Thrace, Alexandroupolis, Greece
| | - D. Bouros
- Department of Pneumonology, Democritus University of Thrace, Alexandroupolis, Greece
| | - P. Karakitsos
- Department of Cytopathology, University of Athens Medical School, “Attikon” University Hospital, Athens, Greece
| | - K. Romanidis
- 2nd Department of Surgery, Democritus University of Thrace, Alexandroupolis, Greece
| | - S. Kakolyris
- Department of Oncology, Democritus University of Thrace, Alexandroupolis, Greece
| | - G. Papashinopoulos
- Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
| | - A. Rigas
- Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
|
31
|
A practical guide to epidemiological practice and standards in the identification and validation of diagnostic markers using a bladder cancer example. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2014; 1844:145-55. [DOI: 10.1016/j.bbapap.2013.07.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/01/2012] [Revised: 07/23/2013] [Accepted: 07/30/2013] [Indexed: 12/14/2022]
|
32
|
Sengupta D, Naik PK. SN algorithm: analysis of temporal clinical data for mining periodic patterns and impending augury. J Clin Bioinforma 2013; 3:24. [PMID: 24283349 PMCID: PMC4177143 DOI: 10.1186/2043-9113-3-24] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/27/2013] [Accepted: 11/25/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Electronic health record (EHR) systems have led to the development of a specialized form of clinical database that enables storage of information from a temporal perspective. Mining this form of clinical data has been a big challenge, given the varied temporal points. This study proposes a conjoined solution to analyze the clinical parameters akin to a disease. We have used an "association rule mining algorithm" to discover association rules among clinical parameters that can be augmented with the disease. Furthermore, we have proposed a new algorithm, the SN algorithm, to map clinical parameters along with a disease state at various temporal points. RESULT: The SN algorithm is based on a Jacobian approach, which augurs the state of a disease 'Sn' at a given temporal point 'Tn' by mapping the derivatives with the temporal point 'T0', whose state of disease 'S0' is known. The predictive ability of the proposed algorithm is evaluated in a temporal clinical data set of brain tumor patients. We have obtained a very high prediction accuracy of ~97% for a brain tumor state 'Sn' for any temporal point 'Tn'. CONCLUSION: The results indicate that the methodology followed may be of good value to the diagnostic procedure, especially for analyzing the temporal form of clinical data.
Affiliation(s)
- Dipankar Sengupta
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, H.P., India.
|
33
|
Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT. Bone Marrow Transplant 2013; 49:332-7. [DOI: 10.1038/bmt.2013.146] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Received: 06/01/2013] [Revised: 07/31/2013] [Accepted: 08/03/2013] [Indexed: 01/18/2023]
|
34
|
Tapak L, Mahjub H, Hamidi O, Poorolajal J. Real-data comparison of data mining methods in prediction of diabetes in Iran. Healthc Inform Res 2013; 19:177-85. [PMID: 24175116 PMCID: PMC3810525 DOI: 10.4258/hir.2013.19.3.177] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 05/01/2013] [Revised: 09/08/2013] [Accepted: 09/21/2013] [Indexed: 11/23/2022]
Abstract
OBJECTIVES Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-mean, and random forests) to classify persons with and without diabetes. METHODS The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance obtained through a cross-sectional survey. The obtained sample was based on cluster sampling of the Iran population which was conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve criteria. RESULTS Support vector machines showed the highest total accuracy (0.986) as well as area under the ROC (0.979). Also, this method showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but for all methods, the sensitivity values were very low (less than 0.350). CONCLUSIONS The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
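The pattern reported in this abstract, total accuracy above 85% alongside sensitivity below 0.350, is typical when non-cases greatly outnumber cases. The following is a minimal sketch of how the three criteria relate to confusion-matrix counts. The function name `confusion_metrics` and all counts are hypothetical and chosen for illustration; they are not taken from the study.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual cases that were detected
    specificity = tn / (tn + fp)  # share of actual non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all subjects classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical, illustrative counts (not from the study):
# 100 subjects with diabetes and 900 without.
sens, spec, acc = confusion_metrics(tp=30, fn=70, tn=890, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With these assumed counts the classifier misses 70 of 100 cases yet still reaches a total accuracy of 0.92, because the 900 non-cases dominate the denominator. This is why the abstract reports sensitivity and specificity alongside total accuracy.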
Affiliation(s)
- Lily Tapak
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
|
35
|
Tseng WJ, Hung LW, Shieh JS, Abbod MF, Lin J. Hip fracture risk assessment: artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study. BMC Musculoskelet Disord 2013; 14:207. [PMID: 23855555 PMCID: PMC3723443 DOI: 10.1186/1471-2474-14-207] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/14/2012] [Accepted: 07/12/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Osteoporotic hip fractures with a significant morbidity and excess mortality among the elderly have imposed huge health and economic burdens on societies worldwide. In this age- and sex-matched case control study, we examined the risk factors of hip fractures and assessed the fracture risk by conditional logistic regression (CLR) and ensemble artificial neural network (ANN). The performances of these two classifiers were compared. METHODS The study population consisted of 217 pairs (149 women and 68 men) of fractures and controls with an age older than 60 years. All the participants were interviewed with the same standardized questionnaire including questions on 66 risk factors in 12 categories. Univariate CLR analysis was initially conducted to examine the unadjusted odds ratio of all potential risk factors. The significant risk factors were then tested by multivariate analyses. For fracture risk assessment, the participants were randomly divided into modeling and testing datasets for 10-fold cross validation analyses. The predicting models built by CLR and ANN in modeling datasets were applied to testing datasets for generalization study. The performances, including discrimination and calibration, were compared with non-parametric Wilcoxon tests. RESULTS In univariate CLR analyses, 16 variables achieved significant level, and six of them remained significant in multivariate analyses, including low T score, low BMI, low MMSE score, milk intake, walking difficulty, and significant fall at home. For discrimination, ANN outperformed CLR in both 16- and 6-variable analyses in modeling and testing datasets (p < 0.005). For calibration, ANN outperformed CLR only in 16-variable analyses in modeling and testing datasets (p = 0.013 and 0.047, respectively). CONCLUSIONS The risk factors of hip fracture are more personal than environmental. With adequate model construction, ANN may outperform CLR in both discrimination and calibration.
ANN seems to have not been developed to its full potential and efforts should be made to improve its performance.
Affiliation(s)
- Wo-Jan Tseng
- Department of Orthopaedic Surgery, National Taiwan University Hospital Hsin-Chu Branch, No.25, Ln. 442, Sec. 1, Jingguo Rd., East Dist., 300, Hsinchu, Taiwan
|
36
|
Ayer T, Chen Q, Burnside ES. Artificial neural networks in mammography interpretation and diagnostic decision making. Comput Math Methods Med 2013; 2013:832509. [PMID: 23781276 PMCID: PMC3677609 DOI: 10.1155/2013/832509] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/18/2013] [Accepted: 04/22/2013] [Indexed: 11/27/2022]
Abstract
Screening mammography is the most effective means for early detection of breast cancer. Although general rules for discriminating malignant and benign lesions exist, radiologists are unable to perfectly detect and classify all lesions as malignant and benign, for many reasons which include, but are not limited to, overlap of features that distinguish malignancy, difficulty in estimating disease risk, and variability in recommended management. When predictive variables are numerous and interact, ad hoc decision making strategies based on experience and memory may lead to systematic errors and variability in practice. The integration of computer models to help radiologists increase the accuracy of mammography examinations in diagnostic decision making has gained increasing attention in the last two decades. In this study, we provide an overview of one of the most commonly used models, artificial neural networks (ANNs), in mammography interpretation and diagnostic decision making and discuss important features in mammography interpretation. We conclude by discussing several common limitations of existing research on ANN-based detection and diagnostic models and provide possible future research directions.
Affiliation(s)
- Turgay Ayer
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 765 Ferst Dr., Atlanta, GA 30332, USA.
|
37
|
Abstract
Testicular cancer is rare but is the most common cancer in males between 15 and 34 years of age. Two principal types of testicular cancer are distinguished: seminomas and non-seminomas. If detected early, the overall cure rate for testicular cancer exceeds 90%. In this study, artificial neural network (ANN) analysis was demonstrated as a prognostic tool with regard to five-year recurrence after non-seminoma treatment. Data from 202 patients treated for non-seminoma were available for evaluation and comparison. A total of 32 variables were analysed using the ANN. The ANN approach, as an advanced multivariate data processing method, was demonstrated to provide objective prognostic data. Some of these prognostic factors are consistent with, or were even imperceptible to, previous evaluations by other statistical methods.
|
38
|
Modeling Paradigms for Medical Diagnostic Decision Support: A Survey and Future Directions. J Med Syst 2011; 36:3029-49. [PMID: 21964969 DOI: 10.1007/s10916-011-9780-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 06/15/2011] [Accepted: 09/12/2011] [Indexed: 10/17/2022]
|
39
|
Cammann H, Jung K, Meyer HA, Stephan C. Avoiding pitfalls in applying prediction models, as illustrated by the example of prostate cancer diagnosis. Clin Chem 2011; 57:1490-8. [PMID: 21920913 DOI: 10.1373/clinchem.2011.166959] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
BACKGROUND The use of different mathematical models to support medical decisions is accompanied by increasing uncertainties when they are applied in practice. Using prostate cancer (PCa) risk models as an example, we recommend requirements for model development and draw attention to possible pitfalls so as to avoid the uncritical use of these models. CONTENT We conducted MEDLINE searches for applications of multivariate models supporting the prediction of PCa risk. We critically reviewed the methodological aspects of model development and the biological and analytical variability of the parameters used for model development. In addition, we reviewed the role of prostate biopsy as the gold standard for confirming diagnoses. In addition, we analyzed different methods of model evaluation with respect to their application to different populations. When using models in clinical practice, one must validate the results with a population from the application field. Typical model characteristics (such as discrimination performance and calibration) and methods for assessing the risk of a decision should be used when evaluating a model's output. The choice of a model should be based on these results and on the practicality of its use. SUMMARY To avoid possible errors in applying prediction models (the risk of PCa, for example) requires examining the possible pitfalls of the underlying mathematical models in the context of the individual case. The main tools for this purpose are discrimination, calibration, and decision curve analysis.
Affiliation(s)
- Henning Cammann
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Germany
|
40
|
Tong DL, Schierz AC. Hybrid genetic algorithm-neural network: Feature extraction for unpreprocessed microarray data. Artif Intell Med 2011; 53:47-56. [DOI: 10.1016/j.artmed.2011.06.008] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 04/05/2010] [Revised: 05/11/2011] [Accepted: 06/26/2011] [Indexed: 12/22/2022]
|
41
|
Maroco J, Silva D, Rodrigues A, Guerreiro M, Santana I, de Mendonça A. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Res Notes 2011; 4:299. [PMID: 21849043 PMCID: PMC3180705 DOI: 10.1186/1756-0500-4-299] [Citation(s) in RCA: 166] [Impact Index Per Article: 12.8] [Received: 03/19/2011] [Accepted: 08/17/2011] [Indexed: 12/02/2022] Open
Abstract
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing.
Affiliation(s)
- João Maroco
- Unidade de Investigação em Psicologia e Saúde & Departamento de Estatística, ISPA - Instituto Universitário, Rua Jardim do Tabaco 44, 1149-041 Lisboa, Portugal.
|
42
|
Artificial neural network analysis of circulating tumor cells in metastatic breast cancer patients. Breast Cancer Res Treat 2011; 129:451-8. [PMID: 21710134 DOI: 10.1007/s10549-011-1645-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 04/19/2011] [Accepted: 06/15/2011] [Indexed: 01/17/2023]
Abstract
A cut-off of 5 circulating tumor cells (CTCs) per 7.5 ml of blood in metastatic breast cancer (MBC) patients is highly predictive of outcome. We analyzed the relationship between CTCs as a continuous variable and overall survival in immunohistochemically defined primary tumor molecular subtypes using an artificial neural network (ANN) prognostic tool to determine the shape of the relationship between risk of death and CTC count and to predict individual survival. We analyzed a training dataset of 311 of 517 (60%) consecutive MBC patients who had been treated at MD Anderson Cancer Center from September 2004 to 2009 and who had undergone pre-therapy CTC counts (CellSearch®). Age; estrogen, progesterone receptor, and HER2 status; visceral metastasis; metastatic disease sites; therapy type and line; and CTCs as a continuous value were evaluated using ANN. A model with parameter estimates obtained from the training data was tested in a validation set of the remaining 206 (40%) patients. The model estimates were accurate, with good discrimination and calibration. Risk of death, as estimated by ANN, linearly increased with increasing CTC count in all molecular tumor subtypes but was higher in ER+ and triple-negative MBC than in HER2+. The probabilities of survival for the four subtypes with 0 CTC were as follows: ER+/HER2- 0.947, ER+/HER2+ 0.959, ER-/HER2+ 0.902, and ER-/HER2- 0.875. For patients with 200 CTCs, they were ER+/HER2- 0.439, ER+/HER2+ 0.621, ER-/HER2+ 0.307, ER-/HER2- 0.130. In this large study, ANN revealed a linear increase of risk of death in MBC patients with increasing CTC counts in all tumor subtypes. CTCs' prognostic effect was less evident in HER2+ MBC patients treated with targeted therapy. This study may support the concept that the number of CTCs, along with the biologic characteristics, needs to be carefully taken into account in future analysis.
|
43
|
Yoo I, Alafaireet P, Marinov M, Pena-Hernandez K, Gopidi R, Chang JF, Hua L. Data mining in healthcare and biomedicine: a survey of the literature. J Med Syst 2011; 36:2431-48. [PMID: 21537851 DOI: 10.1007/s10916-011-9710-5] [Citation(s) in RCA: 168] [Impact Index Per Article: 12.9] [Received: 02/07/2011] [Accepted: 04/07/2011] [Indexed: 10/18/2022]
Abstract
As a new concept that emerged in the mid-1990s, data mining can help researchers gain both novel and deep insights and can facilitate unprecedented understanding of large biomedical datasets. Data mining can uncover new biomedical and healthcare knowledge for clinical and administrative decision making as well as generate scientific hypotheses from large experimental data, clinical databases, and/or biomedical literature. This review first introduces data mining in general (e.g., the background, definition, and process of data mining), discusses the major differences between statistics and data mining and then speaks to the uniqueness of data mining in the biomedical and healthcare fields. A brief summary of various data mining algorithms used for classification, clustering, and association as well as their respective advantages and drawbacks is also presented. Suggested guidelines on how to use data mining algorithms in each area of classification, clustering, and association are offered along with three examples of how data mining has been used in the healthcare industry. Given the successful application of data mining by health related organizations that has helped to predict health insurance fraud and under-diagnosed patients, and identify and classify at-risk people in terms of health with the goal of reducing healthcare cost, we introduce how data mining technologies (in each area of classification, clustering, and association) have been used for a multitude of purposes, including research in the biomedical and healthcare fields.
A discussion of the technologies available to enable the prediction of healthcare costs (including length of hospital stay), disease diagnosis and prognosis, and the discovery of hidden biomedical and healthcare patterns from related databases is offered along with a discussion of the use of data mining to discover such relationships as those between health conditions and a disease, relationships among diseases, and relationships among drugs. The article concludes with a discussion of the problems that hamper the clinical use of data mining by health professionals.
Affiliation(s)
- Illhoi Yoo
- Health Management and Informatics Department, University of Missouri School of Medicine, Columbia, MO 65212, USA.
|
44
|
Lancashire LJ, Roberts DL, Dive C, Renehan AG. The development of composite circulating biomarker models for use in anticancer drug clinical development. Int J Cancer 2011; 128:1843-51. [PMID: 20549702 DOI: 10.1002/ijc.25513] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
The development of informative composite circulating biomarkers predicting cancer presence or therapy response is clinically attractive but optimal approaches to modeling are as yet unclear. This study investigated multidimensional relationships within an example panel of serum insulin-like growth factor (IGF) peptides using logistic regression (LR), fractional polynomial (FP) regression, artificial neural networks (ANNs) and support vector machines (SVMs) to derive predictive models for colorectal cancer (CRC). Two phase 2 biomarker validation analyses were performed: controls were ambulant adults (n = 722); cases were: (i) CRC patients (n = 100) and (ii) patients with acromegaly (n = 52), the latter as "positive" discriminators. Serum IGF-I, IGF-II, IGF binding protein (IGFBP)-2 and -3 were measured. Discriminatory characteristics were compared within and between models. For the LR, FP and ANN models, and to a lesser extent SVMs, the addition of covariates at several steps improved discrimination characteristics. The optimum biomarker combination discriminating CRC vs. controls was achieved using ANN models [sensitivity, 94%; specificity, 90%; accuracy, 0.975 (95% CIs: 0.948, 1.000)]. ANN modeling significantly outperformed LR, FP and SVM in terms of discrimination (p < 0.0001) and calibration. The acromegaly analysis demonstrated expected high performance characteristics in the ANN model [accuracy, 0.993 (95% CIs: 0.977, 1.000)]. Curved decision surfaces generated from the ANNs revealed the potential clinical utility. This example demonstrated improved discriminatory characteristics within the composite biomarker ANN model and a final model that outperformed the three other models. This modeling approach forms the basis to evaluate composite biomarkers as pharmacological and predictive biomarkers in future clinical trials.
Affiliation(s)
- Lee J Lancashire
- Clinical and Experimental Pharmacology Group, Paterson Institute for Cancer Research, Manchester, UK
|
45
|
Contribution of artificial intelligence to the knowledge of prognostic factors in Hodgkin's lymphoma. Eur J Cancer Prev 2011; 19:308-12. [PMID: 20473182 DOI: 10.1097/cej.0b013e32833ad353] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
Hodgkin's lymphoma is one of the most curable malignancies and most patients achieve a lasting complete remission. In this study, artificial neural network (ANN) analysis was shown to provide significant factors with regard to 5-year recurrence after lymphoma treatment. Data from 114 patients treated for Hodgkin's disease were available for evaluation and comparison. A total of 31 variables were subjected to ANN analysis. The ANN approach as an advanced multivariate data processing method was shown to provide objective prognostic data. Some of these prognostic factors are consistent or even identical to the factors evaluated earlier by other statistical methods.
|
46
|
Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int J Mach Learn Cyb 2010. [DOI: 10.1007/s13042-010-0004-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 10/19/2022]
|
47
|
Hui EP, Leung LKS, Poon TCW, Mo F, Chan VTC, Ma ATW, Poon A, Hui EK, Mak SS, Lai M, Lei KIK, Ma BBY, Mok TSK, Yeo W, Zee BCY, Chan ATC. Prediction of outcome in cancer patients with febrile neutropenia: a prospective validation of the Multinational Association for Supportive Care in Cancer risk index in a Chinese population and comparison with the Talcott model and artificial neural network. Support Care Cancer 2010; 19:1625-35. [PMID: 20820815 DOI: 10.1007/s00520-010-0993-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 11/16/2009] [Accepted: 08/23/2010] [Indexed: 12/28/2022]
Abstract
PURPOSE We aimed to validate the Multinational Association for Supportive Care in Cancer (MASCC) risk index, and compare it with the Talcott model and artificial neural network (ANN) in predicting the outcome of febrile neutropenia in a Chinese population. METHODS We prospectively enrolled adult cancer patients who developed febrile neutropenia after chemotherapy and risk classified them according to MASCC score and Talcott model. ANN models were constructed and temporally validated in prospectively collected cohorts. RESULTS From October 2005 to February 2008, 227 consecutive patients were enrolled. Serious medical complications occurred in 22% of patients and 4% died. The positive predictive value of low risk prediction was 86% (95% CI = 81-90%) for MASCC score ≥ 21, 84% (79-89%) for Talcott model, and 85% (78-93%) for the best ANN model. The sensitivity, specificity, negative predictive value, and misclassification rate were 81%, 60%, 52%, and 24%, respectively, for MASCC score ≥ 21; and 50%, 72%, 33%, and 44%, respectively, for Talcott model; and 84%, 60%, 58%, and 22%, respectively, for ANN model. The area under the receiver-operating characteristic curve was 0.808 (95% CI = 0.717-0.899) for MASCC, 0.573 (0.455-0.691) for Talcott, and 0.737 (0.633-0.841) for ANN model. In the low risk group identified by MASCC score ≥ 21 (70% of all patients), 12.5% developed complications and 1.9% died, compared with 43.3%, and 9.0%, respectively, in the high risk group (p < 0.0001). CONCLUSIONS The MASCC risk index is prospectively validated in a Chinese population. It demonstrates a better overall performance than the Talcott model and is equivalent to ANN model.
Affiliation(s)
- Edwin Pun Hui
- Department of Clinical Oncology, Prince of Wales Hospital, The Chinese University of Hong Kong, Shatin, Hong Kong, SAR, China
|
48
|
Sedehi M, Mehrabi Y, Kazemnejad A, Joharimajd V, Hadaegh F. RETRACTED ARTICLE: Artificial neural network for prediction of mixed response variables: simulation and application. Neural Comput Appl 2010. [DOI: 10.1007/s00521-010-0436-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
|
49
|
Prediction of clinical conditions after coronary bypass surgery using dynamic data analysis. J Med Syst 2010; 34:229-39. [PMID: 20503607 DOI: 10.1007/s10916-008-9234-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
Abstract
This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients in two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than 9 h. On the basis of five physiological variables (heart rate, systolic arterial blood pressure, systolic pulmonary pressure, blood temperature and oxygen saturation), different dynamic features were extracted, namely the means and standard deviations at different moments in time, coefficients of multivariate autoregressive models and cepstral coefficients. These sets of features served subsequently as inputs for a Gaussian process and the prediction results were compared with the case where only admission data was used for the classification. The dynamic features, especially the cepstral coefficients (aROC: 0.749, Brier score: 0.206), resulted in higher performances when compared to static admission data (aROC: 0.547, Brier score: 0.247). The differences in performance are shown to be significant. In all cases, the Gaussian process classifier outperformed logistic regression.
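For readers less familiar with the Brier score quoted in this abstract, the following minimal sketch shows how it is computed for binary outcomes: the mean squared difference between predicted probability and observed outcome, where lower is better. The function name `brier_score` and the example values are hypothetical illustrations, not data from the study.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical outcomes: 1 = event observed, 0 = event not observed.
outcomes = [1, 0, 1, 0]

# A forecaster that separates the two classes reasonably well.
informative = brier_score([0.9, 0.2, 0.7, 0.4], outcomes)

# A forecaster that always predicts 0.5 scores 0.25 whatever the outcomes are.
uninformative = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)

print(informative, uninformative)
```

Because a constant prediction of 0.5 always yields 0.25, the reported score of 0.247 for static admission data sits close to that uninformative reference, while 0.206 for the cepstral features reflects a modest improvement.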
|
50
|
Comparison between an artificial neural network and logistic regression in predicting acute graft-vs-host disease after unrelated donor hematopoietic stem cell transplantation in thalassemia patients. Exp Hematol 2010; 38:426-33. [DOI: 10.1016/j.exphem.2010.02.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/15/2009] [Revised: 02/24/2010] [Accepted: 02/26/2010] [Indexed: 10/19/2022]
|