1
Someeh N, Mirfeizi M, Asghari-Jafarabadi M, Alinia S, Farzipoor F, Shamshirgaran SM. Predicting mortality in brain stroke patients using neural networks: outcomes analysis in a longitudinal study. Sci Rep 2023; 13:18530. [PMID: 37898678 PMCID: PMC10613278 DOI: 10.1038/s41598-023-45877-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/18/2023] [Accepted: 10/25/2023] [Indexed: 10/30/2023] Open
Abstract
Neural network (NN) modelling has emerged as a promising tool for predicting outcomes in patients with brain stroke (BS) by identifying key risk factors. In this longitudinal study, we enrolled 332 patients from Imam hospital in Ardabil, Iran (mean age 77.4 [SD 10.4] years; 50.6% male). Diagnosis of BS was confirmed using both computerized tomography scan and magnetic resonance imaging. Risk factor data were collected from the hospital's BS registry, and outcome data were collected by telephone follow-up over a period of 10 years. Using a multilayer perceptron NN approach, we analysed the impact of various risk factors on time to mortality and mortality from BS. A total of 100 NN classification algorithms were trained using STATISTICA 13 software, and the optimal model was selected for further analysis based on its diagnostic performance. We also calculated Kaplan-Meier survival probabilities and conducted log-rank tests. The five selected NN models had accuracies of 81-85%, and the optimal model stood out for its superior diagnostic indices. The mortality rate in the training and validation data sets was 7.9 (95% CI 5.7-11.0) per 1000 and 8.2 (7.1-9.6) per 1000, respectively (P = 0.925). The optimal model highlighted significant risk factors for BS mortality, including smoking, lower education, advanced age, lack of physical activity, and a history of diabetes, all carrying substantial importance weights. Our study provides evidence that the NN approach is effective in predicting mortality in patients with BS based on key risk factors and has the potential to enhance the accuracy of prediction. Moreover, our findings could inform more effective prevention strategies for BS, ultimately leading to better patient outcomes.
Affiliation(s)
- Nasrin Someeh
  - Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Mani Mirfeizi
  - Werribie Mercy West Hospital, Werribee, VIC, 3030, Australia
- Mohammad Asghari-Jafarabadi
  - Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
  - Cabrini Research, Cabrini Health, Malvern, VIC, 3144, Australia.
  - School of Public Health and Preventative Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC, 3004, Australia.
  - Department of Psychiatry, School of Clinical Sciences, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC, 3168, Australia.
- Shayesteh Alinia
  - Department of Biostatistics and Epidemiology, School of Medicine, Zanjan University of Medical Sciences, Zanjan, Iran.
- Farshid Farzipoor
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Seyed Morteza Shamshirgaran
  - Department of Statistics and Epidemiology, Faculty of Health Sciences, Neyshabur University of Medical Sciences, Neyshabur, Iran
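The abstract of this entry mentions Kaplan-Meier survival probabilities. As a rough illustration of how such probabilities are computed, here is a minimal Python sketch of the product-limit estimator. It is not the authors' implementation; the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical follow-up times in years; event 0 marks a censored patient.
times = [1, 2, 2, 3, 5, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored patients reduce the number at risk without lowering the curve, which is what distinguishes this estimate from a simple proportion of survivors.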
2
Feng X, Dong Z, Li Y, Cheng Q, Xin Y, Lu Q, Xin R. MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data. Sci Rep 2023; 13:15694. [PMID: 37735183 PMCID: PMC10514077 DOI: 10.1038/s41598-023-42395-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 09/09/2023] [Indexed: 09/23/2023] Open
Abstract
Mass spectrometry technology can realize dynamic detection of many complex matrix samples in a simple, rapid, compassionate, precise, and high-throughput manner and has become an indispensable tool in accurate diagnosis. Mass spectrometry data analysis mainly aims to quantify all metabolites in the organism and to find the relationship between metabolites and physiological and pathological changes. A feature construction method for mass spectrometry data (MSFS) is proposed to construct features from the original mass spectrometry data, so as to reduce noise, reduce the redundancy of the original data, and improve the information content of the data. A chi-square test is used to select the optimal non-redundant feature subset from the high-dimensional features. The optimal feature subset is analyzed visually and mapped back to the corresponding intervals of the original mass spectrum. Ten kinds of supervised learning models are trained, and their classification performance is evaluated with various evaluation indexes. The feasibility of the proposed method is verified on two public mass spectrometry datasets. In the coronary heart disease dataset, during the identification process of mixed batch samples, the classification accuracy on the test set reached 1.000; during the recognition process, the classification accuracy on the test set reached 0.979. On the colorectal liver metastases dataset, the classification accuracy on the test set reached 1.000. This paper attempts to use a new raw mass spectrometry data preprocessing method to realize the alignment operation of the raw mass spectrometry data, which significantly improves the classification accuracy and provides a new idea for mass spectrometry data analysis. Compared with MetaboAnalyst software and existing experimental results, the proposed method obtains better classification results.
Affiliation(s)
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
  - State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun, 130012, People's Republic of China
- Zheyuan Dong
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
- Yingrui Li
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
- Qian Cheng
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
- Yongxian Xin
  - College of Business and Economics, Australian National University, Canberra, ACT, 2601, Australia
- Qiaolin Lu
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, People's Republic of China
- Ruihao Xin
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China.
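The abstract of this entry describes selecting a feature subset with a chi-square test. A minimal Python sketch of that general idea follows. It assumes each constructed feature has been binarized (an assumption made here for simplicity, not something this listing states about the paper), and the names `chi_square_2x2`, `select_top_k`, and the toy features are hypothetical.

```python
def chi_square_2x2(feature, labels):
    """Pearson chi-square statistic for a binary feature against binary labels."""
    n = len(labels)
    # Observed counts, indexed as table[feature_value][label].
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        table[f][y] += 1
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = table[f][0] + table[f][1]
            col_total = table[0][y] + table[1][y]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (table[f][y] - expected) ** 2 / expected
    return stat

def select_top_k(features, labels, k):
    """Rank features by chi-square score and keep the k highest."""
    scores = {name: chi_square_2x2(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data: 4 control samples (label 0) and 4 disease samples (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "bin_a": [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the label exactly
    "bin_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "bin_c": [0, 0, 0, 1, 1, 1, 1, 0],  # partly informative
}
top = select_top_k(features, labels, 2)
print(top)
```

On this toy data the scores are 8.0, 0.0, and 2.0, so the selection keeps `bin_a` and `bin_c`. A score-based ranking like this does not by itself remove redundant features; the non-redundancy step mentioned in the abstract would need additional logic.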
3
Ovarian cancer detection using optimized machine learning models with adaptive differential evolution. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
4
Point-of-care detection assay based on biomarker-imprinted polymer for different cancers: a state-of-the-art review. Polym Bull (Berl) 2022. [DOI: 10.1007/s00289-022-04085-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
5
Pirhadi S, Maghooli K, Moteghaed NY, Garshasbi M, Mousavirad SJ. Biomarker Discovery by Imperialist Competitive Algorithm in Mass Spectrometry Data for Ovarian Cancer Prediction. JOURNAL OF MEDICAL SIGNALS & SENSORS 2021; 11:108-119. [PMID: 34268099 PMCID: PMC8253319 DOI: 10.4103/jmss.jmss_20_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2020] [Revised: 05/14/2020] [Accepted: 07/04/2020] [Indexed: 11/20/2022]
Abstract
Background: Mass spectrometry is a method for identifying proteins and could be used for distinguishing between proteins in healthy and nonhealthy samples. This study was conducted using high-resolution mass spectrometry data of ovarian cancer. Diagnostic and monitoring tests are usually judged by their sensitivity and specificity rates; thus, the aim of this study is to compare mass spectrometry of healthy and cancerous samples in order to find a set of biomarkers or indicators with reasonable sensitivity and specificity rates. Methods: Combination methods were used to choose the optimum feature set: t-test, entropy, Bhattacharyya, and an imperialist competitive algorithm with a K-nearest neighbors classifier. The resulting features from each method were fed to the C5 decision tree with 10-fold cross-validation to classify the data. Results: The most important variables using this method were identified and a set of rules was extracted. Similar to most frequent features, repetitive patterns were not obtained; the generalized rule induction method was used to identify the repetitive patterns. Conclusion: Finally, the resulting features were introduced as biomarkers and compared with other studies. It was found that the resulting features were very similar to those of other studies. In the case of the classifier, higher sensitivity and specificity rates with a lower number of features were achieved when compared with other studies.
Affiliation(s)
- Shiva Pirhadi
  - Department of Biomedical Engineering, Tehran Science and Research Branch, Islamic Azad University, Tehran, Iran
- Keivan Maghooli
  - Department of Biomedical Engineering, Tehran Science and Research Branch, Islamic Azad University, Tehran, Iran
- Niloofar Yousefi Moteghaed
  - Department of Biomedical Engineering and Medical Physics, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Masoud Garshasbi
  - Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
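The abstract of this entry evaluates its classifier with 10-fold cross-validation. The index bookkeeping behind k-fold splitting can be sketched in a few lines of Python. This is a generic, unshuffled splitter written for illustration (the name `k_fold_indices` and the sample count of 25 are hypothetical); it is not taken from the paper, which used a C5 decision tree tool.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(25, 10))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample lands in exactly one test fold, so every sample is used for evaluation once and for training in the remaining folds. In practice the sample order would usually be shuffled, and often stratified by class, before splitting.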
6
Someeh N, Asghari Jafarabadi M, Shamshirgaran SM, Farzipoor F. The outcome in patients with brain stroke: A deep learning neural network modeling. JOURNAL OF RESEARCH IN MEDICAL SCIENCES 2020; 25:78. [PMID: 33088315 PMCID: PMC7554543 DOI: 10.4103/jrms.jrms_268_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2020] [Revised: 04/11/2020] [Accepted: 04/25/2020] [Indexed: 11/19/2022]
Abstract
Background: The artificial intelligence field is attracting ever-increasing interest for enhancing the accuracy of diagnosis and the quality of patient care. A deep learning neural network (DLNN) approach was considered in patients with brain stroke (BS) to predict and classify the outcome from the risk factors. Materials and Methods: A total of 332 patients with BS (mean age: 77.4 [standard deviation: 10.4] years, 50.6% male) from Imam Khomeini Hospital, Ardabil, Iran, during 2008–2018 participated in this prospective study. Data were gathered from the available documents of the BS registry. The diagnosis of BS was based on computerized tomography scans and magnetic resonance imaging. The DLNN strategy was applied to predict the effects of the main risk factors on mortality. The quality of the model was measured by diagnostic indices. Results: For the 81 selected models, the ranges of accuracy, sensitivity, and specificity were 90.5%–99.7%, 83.8%–100%, and 89.8%–99.5%, respectively. Based on the optimal model (tangent hyperbolic activation function with the minimum–maximum hidden units of 10–20, max epochs of 400, momentum of 0.5, and learning rate of 0.1), the most important predictors for BS mortality were time interval after 10 years (accuracy = 92.2%), age category (75.6%), the history of hyperlipoproteinemia (66.9%), and education level (66.9%). The other independent variables were of moderate importance (66.6%); these include sex, employment status, residential place, smoking habits, history of heart disease, cerebrovascular accident type, blood pressure, diabetes, oral contraceptive pill use, and physical activity. Conclusion: The best means of reducing the BS burden is effective BS prevention. The DLNN strategy performed remarkably well in predicting BS mortality based on the main risk factors, with excellent diagnostic accuracy. Moreover, the time interval after 10 years, age, the history of hyperlipoproteinemia, and education level are the most important predictors for BS.
Affiliation(s)
- Nasrin Someeh
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Mohammad Asghari Jafarabadi
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Seyed Morteza Shamshirgaran
  - Department of Statistics and Epidemiology, Faculty of Health Sciences, Neyshabur University of Medical Sciences, Neyshabur, Iran
- Farshid Farzipoor
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
7
Rezaianzadeh A, Dastoorpoor M, Sanaei M, Salehnasab C, Mohammadi MJ, Mousavizadeh A. Predictors of length of stay in the coronary care unit in patient with acute coronary syndrome based on data mining methods. CLINICAL EPIDEMIOLOGY AND GLOBAL HEALTH 2020. [DOI: 10.1016/j.cegh.2019.09.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022] Open
8
Talavera A, Luna A. Machine Learning: A Contribution to Operational Research. IEEE REVISTA IBEROAMERICANA DE TECNOLOGIAS DEL APRENDIZAJE 2020. [DOI: 10.1109/rita.2020.2987700] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
9
Predicting Corporate Financial Sustainability Using Novel Business Analytics. SUSTAINABILITY 2018. [DOI: 10.3390/su11010064] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for their risk management. As a result, building an effective corporate financial distress prediction model has been an important research topic for a long time. Recently, researchers have been working to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little possibility of overfitting if model parameters are properly tuned. Moreover, SVMs generally show high prediction accuracy since they can deal with complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors, such as the parameters of a kernel function and the subsets of appropriate features and instances, are determined by heuristics. In this study, we propose globally optimized SVMs, denoted by GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters altogether. This study introduces a genetic algorithm (GA) in order to simultaneously optimize multiple heterogeneous design factors of SVMs. Our study applies the proposed model to a real-world case of predicting financial distress. Experiments show that the proposed model significantly improves the prediction accuracy of conventional SVMs.
10
Shen R, Li Z, Zhang L, Hua Y, Mao M, Li Z, Cai Z, Qiu Y, Gryak J, Najarian K. Osteosarcoma Patients Classification Using Plain X-Rays and Metabolomic Data. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2018:690-693. [PMID: 30440490 DOI: 10.1109/embc.2018.8512338] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
Osteosarcoma is the most common type of bone cancer. The primary means of osteosarcoma diagnosis is through evaluating plain x-rays. Using image analysis techniques, features that clinicians use to diagnose osteosarcoma can be quantified and studied using computer algorithms. In this paper, we classify benign tumor patients and osteosarcoma patients using both image features and metabolomic data. These two types of feature sets are processed with feature selection algorithms - recursive feature elimination and information gain. The selected features are then assessed by two classification models - random forest and support vector machine (SVM). The performances of the two models are evaluated and compared using receiver operating characteristic curves. The random forest classifier outperformed the SVM, with a sensitivity of .92 and a specificity of .78.
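The abstract of this entry reports classifier quality as sensitivity and specificity. For reference, both quantities follow directly from the confusion counts, as in the minimal Python sketch below. The labels and predictions are hypothetical and are not data from the paper; the function name `sensitivity_specificity` is an illustrative choice.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true cases that were detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly cleared
    return sensitivity, specificity

# Hypothetical example: 4 osteosarcoma cases (1) and 5 benign cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The two numbers trade off against each other as the decision threshold moves, which is why the paper compares models with receiver operating characteristic curves rather than a single operating point.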
11
Smith BR, Ashton KM, Brodbelt A, Dawson T, Jenkinson MD, Hunt NT, Palmer DS, Baker MJ. Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology. Analyst 2018; 141:3668-78. [PMID: 26818218 DOI: 10.1039/c5an02452h] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/21/2022]
Abstract
Fourier transform infrared (FTIR) spectroscopy has long been established as an analytical technique for the measurement of vibrational modes of molecular systems. More recently, FTIR has been used for the analysis of biofluids with the aim of becoming a tool to aid diagnosis. For the clinician, this represents a convenient, fast, non-subjective option for the study of biofluids and the diagnosis of disease states. The patient also benefits from this method, as the procedure for the collection of serum is much less invasive and stressful than traditional biopsy. This is especially true of patients in whom brain cancer is suspected. A brain biopsy is very unpleasant for the patient, potentially dangerous and can occasionally be inconclusive. We therefore present a method for the diagnosis of brain cancer from serum samples using FTIR and machine learning techniques. The scope of the study involved 433 patients from whom were collected 9 spectra each in the range 600-4000 cm⁻¹. To begin the development of the novel method, various pre-processing steps were investigated and ranked in terms of final accuracy of the diagnosis. Random forest machine learning was utilised as a classifier to separate patients into cancer or non-cancer categories based upon the intensities of wavenumbers present in their spectra. Generalised 2D correlational analysis was then employed to further augment the machine learning, and also to establish spectral features important for the distinction between cancer and non-cancer serum samples. Using these methods, sensitivities of up to 92.8% and specificities of up to 91.5% were possible. Furthermore, ratiometrics were also investigated in order to establish any correlations present in the dataset. We show a rapid, computationally light, accurate, statistically robust methodology for the identification of spectral features present in differing disease states. With current advances in IR technology, such as the development of rapid discrete frequency collection, this approach is of importance to enable future clinical translation and enables IR to achieve its potential.
Affiliation(s)
- Benjamin R Smith
  - WestCHEM, Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow, Scotland G1 1XL, UK. and WestCHEM, Department of Pure and Applied Chemistry, University of Strathclyde, Technology and Innovation Centre, 99 George Street, Glasgow G1 1RD, UK.
- Katherine M Ashton
  - Neuropathology, Lancashire Teaching Hospitals NHS Trust, Royal Preston Hospital, Sharoe Green Lane, Fulwood, Preston, PR2 9HT, UK
- Andrew Brodbelt
  - Neurosurgery, The Walton Centre NHS Foundation Trust, Lower Lane, Fazakerley, Liverpool, L9 7LJ, UK
- Timothy Dawson
  - Neuropathology, Lancashire Teaching Hospitals NHS Trust, Royal Preston Hospital, Sharoe Green Lane, Fulwood, Preston, PR2 9HT, UK
- Michael D Jenkinson
  - Neurosurgery, The Walton Centre NHS Foundation Trust, Lower Lane, Fazakerley, Liverpool, L9 7LJ, UK
- Neil T Hunt
  - SUPA, Department of Physics, University of Strathclyde, 107 Rottenrow East, Glasgow, G4 0NG, UK
- David S Palmer
  - WestCHEM, Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow, Scotland G1 1XL, UK.
- Matthew J Baker
  - WestCHEM, Department of Pure and Applied Chemistry, University of Strathclyde, Technology and Innovation Centre, 99 George Street, Glasgow G1 1RD, UK.
12
Reza Soroushmehr SM, Najarian K. Classifying osteosarcoma patients using machine learning approaches. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:82-85. [PMID: 29059816 DOI: 10.1109/embc.2017.8036768] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Metabolomic data analysis presents a unique opportunity to advance our understanding of osteosarcoma, a common bone malignancy for which genomic and proteomic studies have enjoyed limited success. One of the major goals of metabolomic studies is to classify osteosarcoma in early stages, which is required for metastasectomy treatment. In this paper we subject our metabolomic data on osteosarcoma patients collected by the SJTU team to three classification methods: logistic regression, support vector machine (SVM) and random forest (RF). The performances are evaluated and compared using receiver operating characteristic curves. All three classifiers are successful in distinguishing between healthy control and tumor cases, with random forest outperforming the other two for cross-validation in training set (accuracy rate for logistic regression, support vector machine and random forest are 88%, 90% and 97% respectively). Random forest achieved overall accuracy rate of 95% with 0.99 AUC on testing set.
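The abstract of this entry compares classifiers by receiver operating characteristic curves and reports an AUC. One way to read the AUC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it directly from that pairwise definition; the scores are hypothetical, and this O(n²) form is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: higher means "more likely tumor".
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. An AUC of 0.99, as reported for the random forest, means almost every tumor case outranks almost every control.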
13
Integrated Chemometrics and Statistics to Drive Successful Proteomics Biomarker Discovery. Proteomes 2018; 6:proteomes6020020. [PMID: 29701723 PMCID: PMC6027525 DOI: 10.3390/proteomes6020020] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/30/2018] [Revised: 04/19/2018] [Accepted: 04/25/2018] [Indexed: 01/15/2023] Open
Abstract
Protein biomarkers are of great benefit for clinical research and applications, as they are powerful means for diagnosing, monitoring and treatment prediction of different diseases. Even though numerous biomarkers have been reported, the translation to clinical practice is still limited. This is mainly due to: (i) incorrect biomarker selection, (ii) insufficient validation of potential biomarkers, and (iii) insufficient clinical use. In this review, we focus on the biomarker selection process and critically discuss the chemometrical and statistical decisions made in proteomics biomarker discovery to increase the selection of high value biomarkers. The characteristics of the data, the computational resources, the type of biomarker that is searched for and the validation strategy influence the decision making of the chemometrical and statistical methods and a decision made for one component directly influences the choice for another. Incorrect decisions could increase the false positive and negative rate of biomarkers which requires independent confirmation of outcome by other techniques and for comparison between different related studies. There are few guidelines for authors regarding data analysis documentation in peer reviewed journals, making it hard to reproduce successful data analysis strategies. Here we review multiple chemometrical and statistical methods for their value in proteomics-based biomarker discovery and propose to include key components in scientific documentation.
14
Chang J, Paydarfar D. Evolution of extrema features reveals optimal stimuli for biological state transitions. Sci Rep 2018; 8:3403. [PMID: 29467377 PMCID: PMC5821862 DOI: 10.1038/s41598-018-21761-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/17/2017] [Accepted: 02/09/2018] [Indexed: 11/08/2022] Open
Abstract
The ability to define the unique features of an input stimulus needed to control switch-like behavior in biological systems is an important problem in computational biology and medicine. We show in this study how highly complex and intractable optimization problems can be simplified by restricting the search to the signal's extrema as key feature points, and evolving the extrema features towards optimal solutions that closely match solutions derived from gradient-based methods. Our results suggest a model-independent approach for solving a class of optimization problems related to controlling switch-like state transitions.
Affiliation(s)
- Joshua Chang
  - Department of Neurology, University of Massachusetts Medical School, Worcester, Massachusetts, 01604, USA.
  - Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, Texas, 78701, USA.
- David Paydarfar
  - Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, Texas, 78701, USA.
  - The Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, 78701, USA.
15
Dossat N, Mangé A, Solassol J, Jacot W, Lhermitte L, Maudelonde T, Daurès JP, Molinari N. Comparison of Supervised Classification Methods for Protein Profiling in Cancer Diagnosis. Cancer Inform 2017. [DOI: 10.1177/117693510700300023] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022] Open
Abstract
A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the diseases. Recent advances in mass spectrometry and proteomic instrumentation offer a unique chance to rapidly identify these markers. These advances pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of patterns of markers from high-dimensional data, specific to each pathologic state (e.g. normal vs cancer). We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data using surface enhanced laser desorption/ionization (SELDI) technology. The first two steps are the selection of the most discriminating biomarkers with a construction of different classifiers. Finally, we compare and validate their performance and robustness using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperforms the other methods tested.
Affiliation(s)
- Nadège Dossat
  - IURC, Department of Biostatistic, Epidemiology and Clinical Research, Montpellier, France
  - University of Montpellier I, Montpellier, France
- Alain Mangé
  - University of Montpellier I, Montpellier, France
  - CHU Montpellier, Hôpital Arnaud de Villeneuve, Department of Cellular Biology, Montpellier, France
  - INSERM, U540, Montpellier, France
- Jérôme Solassol
  - University of Montpellier I, Montpellier, France
  - CHU Montpellier, Hôpital Arnaud de Villeneuve, Department of Cellular Biology, Montpellier, France
  - INSERM, U540, Montpellier, France
- William Jacot
  - University of Montpellier I, Montpellier, France
  - CHU Montpellier, Hôpital Arnaud de Villeneuve, Department of Thoracic Oncology, Montpellier, France
- Ludovic Lhermitte
  - University of Montpellier I, Montpellier, France
  - CHU Montpellier, Hôpital Arnaud de Villeneuve, Department of Cellular Biology, Montpellier, France
  - INSERM, U540, Montpellier, France
- Thierry Maudelonde
  - University of Montpellier I, Montpellier, France
  - CHU Montpellier, Hôpital Arnaud de Villeneuve, Department of Cellular Biology, Montpellier, France
  - INSERM, U540, Montpellier, France
- Jean-Pierre Daurès
  - IURC, Department of Biostatistic, Epidemiology and Clinical Research, Montpellier, France
  - University of Montpellier I, Montpellier, France
  - Chu Nîmes, Hôspital Caremeau, Department of Medical Information, Nîmes, France
- Nicolas Molinari
  - IURC, Department of Biostatistic, Epidemiology and Clinical Research, Montpellier, France
  - University of Montpellier I, Montpellier, France
  - Chu Nîmes, Hôspital Caremeau, Department of Medical Information, Nîmes, France
16
Serum lipid profile discriminates patients with early lung cancer from healthy controls. Lung Cancer 2017; 112:69-74. [DOI: 10.1016/j.lungcan.2017.07.036] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 11/17/2016] [Revised: 07/11/2017] [Accepted: 07/31/2017] [Indexed: 01/09/2023]
17
Mass spectrometry as a tool for biomarkers searching in gynecological oncology. Biomed Pharmacother 2017; 92:836-842. [PMID: 28601044 DOI: 10.1016/j.biopha.2017.05.146] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/02/2016] [Revised: 05/21/2017] [Accepted: 05/31/2017] [Indexed: 01/10/2023] Open
Abstract
Tumors of the female reproductive tract are an important target for the development of diagnostic, prognostic and therapeutic strategies. Recent research has turned to proteomics based on mass spectrometry techniques, to achieve more effective diagnostic results. Mass spectrometry (MS) enables identification and quantification of multiple molecules simultaneously in a single experiment according to mass to charge ratio (m/z). Several proteomic strategies may be applied to establish the function of a particular protein/peptide or to identify a novel disease and specific biomarkers related to it. Therefore, MS could facilitate treatment in patients with tumors by helping researchers discover new biomarkers and narrowly targeted drugs. This review presents a comprehensive discussion of mass spectrometry as a tool for biomarkers searching that may lead to the discovery of easily available diagnostic tests in gynecological oncology with emphasis on clinical proteomics over the past decade. The article provides an insight into different MS based proteomic approaches.
|
18
|
Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence. Artif Intell Med 2017; 78:47-54. [DOI: 10.1016/j.artmed.2017.06.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 03/01/2017] [Revised: 05/30/2017] [Accepted: 06/04/2017] [Indexed: 11/22/2022]
|
19
|
Kong A, Azencott R. Binary Markov Random Fields and interpretable mass spectra discrimination. Stat Appl Genet Mol Biol 2017; 16. [PMID: 28475101 DOI: 10.1515/sagmb-2016-0019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
For mass spectra acquired from cancer patients by MALDI or SELDI techniques, automated discrimination between cancer types or stages has often been implemented by machine learning algorithms. Nevertheless, these techniques typically lack interpretability in terms of biomarkers. In this paper, we propose a new mass spectra discrimination algorithm by parameterized Markov Random Fields to automatically generate interpretable classifiers with small groups of scored biomarkers. A dataset of 238 MALDI colorectal mass spectra and two datasets of 216 and 253 SELDI ovarian mass spectra respectively were used to test our approach. The results show that our approach reaches accuracies of 81% to 100% to discriminate between patients from different colorectal and ovarian cancer stages, and performs as well as or better than previous studies on similar datasets. Moreover, our approach enables efficient planar displays to visualize mass spectra discrimination and has good asymptotic performance for large datasets. Thus, our classifiers should facilitate the choice and planning of further experiments for biological interpretation of cancer discriminating signatures. In our experiments, the number of mass spectra for each colorectal cancer stage is roughly half of that for each ovarian cancer stage, so that we reach lower discrimination accuracy for colorectal cancer than for ovarian cancer.
|
20
|
Khozeimeh F, Alizadehsani R, Roshanzamir M, Khosravi A, Layegh P, Nahavandi S. An expert system for selecting wart treatment method. Comput Biol Med 2017; 81:167-175. [DOI: 10.1016/j.compbiomed.2017.01.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 08/10/2016] [Revised: 12/31/2016] [Accepted: 01/03/2017] [Indexed: 01/15/2023]
|
21
|
Gadducci A, Cosio S, Zanca G, Genazzani AR. Evolving Role of Serum Biomarkers in the Management of Ovarian Cancer. WOMENS HEALTH 2016; 2:141-58. [DOI: 10.2217/17455057.2.1.141] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
The availability of an ideal serum tumor marker would be of great clinical benefit for both the diagnosis and management of patients with epithelial ovarian cancer. Serum cancer antigen 125 assay significantly increases the diagnostic reliability of ultrasound in discriminating a malignant from a benign ovarian mass, especially in postmenopausal women, and it is the only well validated tumor marker for monitoring disease course. Several other tumor-associated antigens have been assessed, including glycoprotein antigens other than cancer antigen 125, soluble cytokeratin fragments, kallikreins, cytokines and cytokine receptors, vascular endothelial growth factor, D-dimer, and lysophosphatidic acid. This article assesses the potential diagnostic and prognostic role of these novel biomarkers, both alone and in combination with cancer antigen 125. The future for serum tumor marker research is represented by the emerging technology of proteomics, which may allow scientific advances comparable to those achieved with the introduction of monoclonal antibody technology.
Affiliation(s)
- Angiolo Gadducci
- Department of Procreative Medicine, Division of Gynecology and Obstetrics, University of Pisa, Via Roma 56, Pisa, 56127, Italy, Tel.: +39 50 992 609; Fax: +39 50 553 410
- Stefania Cosio
- Department of Procreative Medicine, Division of Gynecology and Obstetrics, University of Pisa, Via Roma 56, Pisa, 56127, Italy, Tel.: +39 50 992 609; Fax: +39 50 553 410
- Giulia Zanca
- Department of Procreative Medicine, Division of Gynecology and Obstetrics, University of Pisa, Via Roma 56, Pisa, 56127, Italy, Tel.: +39 50 992 609; Fax: +39 50 553 410
- Andrea Riccardo Genazzani
- Department of Procreative Medicine, Division of Gynecology and Obstetrics, University of Pisa, Via Roma 56, Pisa, 56127, Italy, Tel.: +39 50 992 609; Fax: +39 50 553 410
|
22
|
Widlak P, Pietrowska M, Polanska J, Marczyk M, Ros-Mazurczyk M, Dziadziuszko R, Jassem J, Rzyman W. Serum mass profile signature as a biomarker of early lung cancer. Lung Cancer 2016; 99:46-52. [PMID: 27565913 DOI: 10.1016/j.lungcan.2016.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 02/19/2016] [Revised: 05/12/2016] [Accepted: 06/11/2016] [Indexed: 01/10/2023]
Abstract
OBJECTIVES Circulating molecular biomarkers of lung cancer may allow the pre-selection of candidates for computed tomography screening or increase its efficacy. We aimed to identify features of serum mass profile distinguishing individuals with early lung cancer from healthy participants of the lung cancer screening program. METHODS Blood samples were collected during a low-dose computed tomography (LD-CT) screening program performed by one institution (Medical University of Gdansk, Poland). MALDI-ToF mass spectrometry was used to characterize the low-molecular-weight (1000-14,000Da) serum fraction. The analysis comprised 95 patients with early stage lung cancer (including 30 screen-detected cases) and a matched group of 285 healthy controls. The cases were split into two independent cohorts (discovery and validation), analyzed separately 6 months apart. RESULTS Several molecular components of serum (putatively components of endogenous peptidome) discriminating patients with early lung cancer from controls were identified in a discovery cohort. This allowed building an effective cancer classifier as a model tuned to maximize negative predictive value, with an area under the curve (AUC) of 0.88, a negative predictive value of 100%, and a positive predictive value of 48%. However, the classifier performed worse in a validation cohort including independent sample sets (AUC 0.73, NPV 88% and PPV 30%). CONCLUSIONS We developed a serum mass profile-based signature identifying patients with early lung cancer. Although this marker has insufficient value as a stand-alone preselecting tool for LD-CT screening, its potential clinical usefulness in evaluation of indeterminate pulmonary nodules deserves further investigation.
Affiliation(s)
- Piotr Widlak
- Maria Skłodowska-Curie Memorial Cancer Center and Institute of Oncology, ul. Wybrzeże Armii Krajowej 15, 44-100 Gliwice, Poland.
- Monika Pietrowska
- Maria Skłodowska-Curie Memorial Cancer Center and Institute of Oncology, ul. Wybrzeże Armii Krajowej 15, 44-100 Gliwice, Poland.
- Joanna Polanska
- Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland.
- Michal Marczyk
- Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland.
- Malgorzata Ros-Mazurczyk
- Maria Skłodowska-Curie Memorial Cancer Center and Institute of Oncology, ul. Wybrzeże Armii Krajowej 15, 44-100 Gliwice, Poland.
- Jacek Jassem
- Medical University of Gdańsk, ul. Dębinki 7, 80-211 Gdańsk, Poland.
- Witold Rzyman
- Medical University of Gdańsk, ul. Dębinki 7, 80-211 Gdańsk, Poland.
|
23
|
Bashir S, Qamar U, Khan FH. A Multicriteria Weighted Vote-Based Classifier Ensemble for Heart Disease Prediction. Comput Intell 2015. [DOI: 10.1111/coin.12070] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022]
Affiliation(s)
- Saba Bashir
- Computer Engineering Department, College of Electrical and Mechanical Engineering; National University of Sciences and Technology (NUST); Islamabad Pakistan
- Usman Qamar
- Computer Engineering Department, College of Electrical and Mechanical Engineering; National University of Sciences and Technology (NUST); Islamabad Pakistan
- Farhan Hassan Khan
- Computer Engineering Department, College of Electrical and Mechanical Engineering; National University of Sciences and Technology (NUST); Islamabad Pakistan
|
24
|
Influence of honeybee sting on peptidome profile in human serum. Toxins (Basel) 2015; 7:1808-20. [PMID: 26008235 PMCID: PMC4448175 DOI: 10.3390/toxins7051808] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/31/2015] [Accepted: 05/15/2015] [Indexed: 02/06/2023] Open
Abstract
The aim of this study was to explore the serum peptide profiles of honeybee-stung and non-stung individuals. Two groups of serum samples obtained from 27 beekeepers were included in our study. The first group of samples was collected within 3 h after a bee sting (stung beekeepers), and the second group was collected from the same individuals at least six weeks after the last bee sting (non-stung beekeepers). Peptide profile spectra were determined using MALDI-TOF mass spectrometry combined with Omix, ZipTips and magnetic beads based on weak-cation exchange (MB-WCX) enrichment strategies in the mass range of 1–10 kDa. The samples were classified, and discriminative models were established by using the quick classifier, genetic algorithm and supervised neural network algorithms. All of the statistical algorithms used in this study distinguished the analyzed groups with high statistical significance, which confirms the influence of honeybee sting on the serum peptidome profile. The results of this study may broaden the understanding of the human organism’s response to honeybee venom. Because our pilot study was carried out on relatively small datasets, further proteomic research on the response to honeybee sting is needed in a larger group of samples.
|
25
|
Bashir S, Qamar U, Khan FH. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting. AUSTRALASIAN PHYSICAL & ENGINEERING SCIENCES IN MEDICINE 2015; 38:305-23. [DOI: 10.1007/s13246-015-0337-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 08/21/2014] [Accepted: 02/24/2015] [Indexed: 11/28/2022]
|
26
|
Mirkes E, Alexandrakis I, Slater K, Tuli R, Gorban A. Computational diagnosis and risk evaluation for canine lymphoma. Comput Biol Med 2014; 53:279-90. [DOI: 10.1016/j.compbiomed.2014.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/20/2014] [Revised: 08/01/2014] [Accepted: 08/07/2014] [Indexed: 10/24/2022]
|
27
|
Yamada S, Kawaguchi A, Kawaguchi T, Fukushima N, Kuromatsu R, Sumie S, Takata A, Nakano M, Satani M, Tonan T, Fujimoto K, Shima H, Kakuma T, Torimura T, Charlton MR, Sata M. Serum albumin level is a notable profiling factor for non-B, non-C hepatitis virus-related hepatocellular carcinoma: A data-mining analysis. Hepatol Res 2014; 44:837-45. [PMID: 23819517 DOI: 10.1111/hepr.12192] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 06/24/2013] [Revised: 06/24/2013] [Accepted: 06/25/2013] [Indexed: 12/12/2022]
Abstract
AIM Various factors underlie the onset of non-B, non-C hepatitis virus-related hepatocellular carcinoma (NBNC-HCC). We aimed to investigate the independent risk factors and profiles associated with NBNC-HCC using a data-mining technique. METHODS We conducted a case-control study and enrolled 223 NBNC-HCC patients and 669 controls from a health checkup database (n = 176 886). Multivariate analysis, random forest analysis and a decision-tree algorithm were employed, respectively, to examine the independent risk factors, to find the factors distinguishing between the case and control groups, and to identify profiles for the incidence of NBNC-HCC. RESULTS In multivariate analysis, besides γ-glutamyltransferase (GGT) levels and the Brinkman index, albumin level was an independent negative risk factor for the incidence of NBNC-HCC (odds ratio = 0.67; 95% confidence interval = 0.60-0.70; P < 0.0001). In random forest analysis, serum albumin level was the highest-ranked variable for distinguishing between the case and control groups (variable importance, 98). A decision-tree algorithm was created for albumin and GGT levels, the aspartate aminotransferase-to-platelet ratio index (APRI) and the Brinkman index. The serum albumin level was selected as the initial split variable, and 82.5% of the subjects with albumin levels of less than 4.01 g/dL were found to have NBNC-HCC. CONCLUSION Data-mining analysis revealed that serum albumin level is an independent risk factor and the most distinguishable factor associated with the incidence of NBNC-HCC. Furthermore, we created an NBNC-HCC profile consisting of albumin and GGT levels, the APRI and the Brinkman index. This profile could be used in the screening strategy for NBNC-HCC.
Affiliation(s)
- Shingo Yamada
- Division of Gastroenterology, Department of Medicine, Kurume University School of Medicine, Japan
|
28
|
Kong A, Gupta C, Ferrari M, Agostini M, Bedin C, Bouamrani A, Tasciotti E, Azencott R. Biomarker Signature Discovery from Mass Spectrometry Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:766-772. [PMID: 26356346 DOI: 10.1109/tcbb.2014.2318718] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Mass spectrometry based high throughput proteomics are used for protein analysis and clinical diagnosis. Many machine learning methods have been used to construct classifiers based on mass spectrometry data, for discrimination between cancer stages. However, the classifiers generated by machine learning such as SVM techniques typically lack biological interpretability. We present an innovative technique for automated discovery of signatures optimized to characterize various cancer stages. We validate our signature discovery algorithm on one new colorectal cancer MALDI-TOF data set, and two well-known ovarian cancer SELDI-TOF data sets. In all of these cases, our signature based classifiers performed either better or at least as well as four benchmark machine learning algorithms including SVM and KNN. Moreover, our optimized signatures automatically select smaller sets of key biomarkers than the black-boxes generated by machine learning, and are much easier to interpret.
|
29
|
Li S, Kang L, Zhao XM. A survey on evolutionary algorithm based hybrid intelligence in bioinformatics. BIOMED RESEARCH INTERNATIONAL 2014; 2014:362738. [PMID: 24729969 PMCID: PMC3963368 DOI: 10.1155/2014/362738] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/03/2013] [Revised: 01/29/2014] [Accepted: 01/29/2014] [Indexed: 11/18/2022]
Abstract
With the rapid advance in genomics, proteomics, metabolomics, and other types of omics technologies during the past decades, a tremendous amount of data related to molecular biology has been produced. It is becoming a big challenge for bioinformaticians to analyze and interpret these data with conventional intelligent techniques, for example, support vector machines. Recently, hybrid intelligent methods, which integrate several standard intelligent approaches, have become more and more popular due to their robustness and efficiency. Specifically, the hybrid intelligent approaches based on evolutionary algorithms (EAs) are widely used in various fields due to the efficiency and robustness of EAs. In this review, we introduce the applications of hybrid intelligent methods, in particular those based on evolutionary algorithms, in bioinformatics. We focus on their applications to three common problems that arise in bioinformatics, that is, feature selection, parameter estimation, and reconstruction of biological networks.
Affiliation(s)
- Shan Li
- Department of Mathematics, Shanghai University, Shanghai 200444, China
- Liying Kang
- Department of Mathematics, Shanghai University, Shanghai 200444, China
- Xing-Ming Zhao
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
|
30
|
Huy NT, Thao NTH, Ha TTN, Lan NTP, Nga PTT, Thuy TT, Tuan HM, Nga CTP, Tuong VV, Dat TV, Huong VTQ, Karbwang J, Hirayama K. Development of clinical decision rules to predict recurrent shock in dengue. CRITICAL CARE : THE OFFICIAL JOURNAL OF THE CRITICAL CARE FORUM 2013; 17:R280. [PMID: 24295509 PMCID: PMC4057383 DOI: 10.1186/cc13135] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/12/2013] [Accepted: 11/01/2013] [Indexed: 11/10/2022]
Abstract
INTRODUCTION Mortality from dengue infection is mostly due to shock. Among dengue patients with shock, approximately 30% have recurrent shock that requires a treatment change. Here, we report development of a clinical rule for use during a patient's first shock episode to predict a recurrent shock episode. METHODS The study was conducted at the Center for Preventive Medicine in Vinh Long province and the Children's Hospital No. 2 in Ho Chi Minh City, Vietnam. We included 444 dengue patients with shock, 126 of whom had recurrent shock (28%). Univariate and multivariate analyses and a preprocessing method were used to evaluate and select 14 clinical and laboratory signs recorded at shock onset. Five variables (admission day, purpura/ecchymosis, ascites/pleural effusion, blood platelet count and pulse pressure) were finally trained and validated with a 10-fold validation strategy repeated 10 times, using a logistic regression model. RESULTS The results showed that shorter admission day (fewer days prior to admission), purpura/ecchymosis, ascites/pleural effusion, low platelet count and narrow pulse pressure were independently associated with recurrent shock. Our logistic prediction model was capable of predicting recurrent shock when compared to the null method (P < 0.05) and was not outperformed by other prediction models. Our final scoring rule provided relatively good accuracy (AUC, 0.73; sensitivity and specificity, 68%). Score points derived from the logistic prediction model revealed identical accuracy with AUCs at 0.73. Using a cutoff value greater than -154.5, our simple scoring rule showed a sensitivity of 68.3% and a specificity of 68.2%. CONCLUSIONS Our simple clinical rule is not intended to replace clinical judgment, but to help clinicians predict recurrent shock during a patient's first dengue shock episode.
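The 10-fold validation with 10 repetitions described in the abstract above can be sketched in plain Python. This is only an illustration of how such splits may be generated: the function name `repeated_kfold`, the random seed, and the unstratified round-robin fold assignment are assumptions, since the abstract does not say how the folds were built, and the model-fitting step is left out.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold validation."""
    rng = random.Random(seed)  # seed is an arbitrary assumption
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        # Deal indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield repeat, fold, train_idx, test_idx

# 444 patients, as reported in the abstract.
splits = list(repeated_kfold(444))
print(len(splits))  # 10 folds x 10 repetitions = 100 train/test splits
```

Each patient is held out exactly once per repetition, so averaging a metric such as AUC over all 100 splits uses every patient as test data 10 times.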
|
31
|
Yang MH, Yang FY, Oyang YJ. Application of density estimation algorithms in analyzing co-morbidities of migraine. Netw Model Anal Health Inform Bioinform 2013; 2:95-107. [PMID: 24392299 PMCID: PMC3873085 DOI: 10.1007/s13721-013-0028-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/18/2012] [Revised: 01/10/2013] [Accepted: 01/21/2013] [Indexed: 11/25/2022]
Abstract
In this study, we propose a density estimation based data analysis procedure to investigate the co-morbid associations between migraine and the suspected diseases. The primary objective of this study was to develop a novel analysis procedure that can discover insightful knowledge from large medical databases. The entire analysis procedure consists of two stages. During the first stage, a kernel density estimation algorithm named relaxed variable kernel density estimation (RVKDE) is invoked to identify the samples of interest. Then, in the second stage, a density estimation algorithm based on generalized Gaussian components and named G2DE is invoked to provide a summarized description of the distribution. The results obtained by applying the proposed two-stage procedure to analyze co-morbidities of migraine revealed that the proposed procedure could effectively identify a number of clusters of samples with distinctive characteristics. The results further revealed that the distinctive characteristics of the clusters extracted by the proposed procedure were in conformity with the observations reported in recently published articles. Accordingly, it is conceivable that the proposed analysis procedure can be exploited to provide valuable clues to pathogenesis and facilitate development of proper treatment strategies.
Affiliation(s)
- Meng-Han Yang
- Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, No. 415 Chien Kung Rd., Kaohsiung, 80778 Taiwan, ROC
- Fu-Yi Yang
- The Department of Neurology, Taipei Tzu Chi General Hospital, No. 289, Jianguo Rd., Xindian District, New Taipei, 23142 Taiwan, ROC
- Yen-Jen Oyang
- Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei City, 10617 Taiwan, ROC
|
32
|
Boccardi C, Rocchiccioli S, Cecchettini A, Mercatanti A, Citti L. An automated plasma protein fractionation design: high-throughput perspectives for proteomic analysis. BMC Res Notes 2012; 5:612. [PMID: 23116412 PMCID: PMC3517536 DOI: 10.1186/1756-0500-5-612] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/21/2012] [Accepted: 10/26/2012] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Human plasma, representing the most complete record of the individual phenotype, is an appealing sample for proteomics analysis in clinical applications. To date, the major obstacle in a proteomics study of plasma is the large dynamic range of protein concentration, and many researchers have focused their efforts on resolving this drawback. FINDINGS In this study, proteins from pooled plasma samples were fractionated according to their chemical characteristics on a home-designed SPE automated platform. The resulting fractions were digested and further resolved by reversed-phase liquid chromatography coupled with MALDI TOF/TOF mass spectrometry. A total of 712 proteins were successfully identified down to a concentration level of ng/mL. Pearson correlation coefficient was used to test reproducibility. CONCLUSIONS Our multidimensional fractionation approach reduced the analysis time (2 days are enough to process 16 plasma samples filling a 96-well plate) compared with conventional gel-electrophoresis or multi-LC column based methods. The robotic processing, which avoids contaminants and variation in sample handling skill, promises highly reproducible specimen analyses (more than 85% Pearson correlation). The automated platform presented here is flexible and easily adapted by changing fractionation elements or detectors.
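The reproducibility check mentioned in the abstract above relies on the Pearson correlation coefficient. A minimal sketch of that calculation follows; the function name `pearson_r` and the two replicate intensity lists are hypothetical, since the abstract does not state which quantities were correlated.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical intensities of the same five features in two replicate runs.
run_a = [10.0, 20.0, 30.0, 40.0, 50.0]
run_b = [12.0, 19.0, 33.0, 38.0, 52.0]
r = pearson_r(run_a, run_b)
print(f"r = {r:.3f}, reproducible at the 0.85 level: {r > 0.85}")
```

Values of r close to 1 indicate that replicate runs rank and scale the features almost identically, which is how a figure such as "more than 85%" would typically be read.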
Affiliation(s)
- Claudia Boccardi
- Institute of Clinical Physiology-CNR, Via Moruzzi 1, 56124 Pisa, Italy
|
33
|
Li L, Chen L, Goldgof D, George F, Chen Z, Rao A, Cragun J, Sutphen R, Lancaster J. Integration of clinical information and gene expression profiles for prediction of chemo-response for ovarian cancer. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2012; 2005:4818-21. [PMID: 17281320 DOI: 10.1109/iembs.2005.1615550] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
Ovarian cancer is the fifth leading cause of cancer death among women in the United States and western Europe. Platinum drugs are the most active agents in epithelial ovarian cancer therapy. In order to improve the prediction of response to platinum-based chemotherapy for advanced-stage ovarian cancers, we describe an integrated model which combines clinical information, tumor and treatment information, with gene expression profiles. This integrated modeling framework is based on the support vector machine classifier, which evaluates the contributions of both clinical and gene expression data. The results show that the integrated model combining clinical information and gene expression profiles improves prediction accuracy compared with predictions made using the gene expression predictor alone.
Affiliation(s)
- Lihua Li
- Department of Radiology. He is now with Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center & Research Institute, University of South Florida. Tampa, FL 33612, USA. (Phone:
|
34
|
Dowd WW. Challenges for Biological Interpretation of Environmental Proteomics Data in Non-model Organisms. Integr Comp Biol 2012; 52:705-20. [DOI: 10.1093/icb/ics093] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 01/17/2023] Open
|
35
|
Van Hulse J, Khoshgoftaar TM, Napolitano A, Wald R. Threshold-based feature selection techniques for high-dimensional bioinformatics data. Netw Model Anal Health Inform Bioinform 2012. [DOI: 10.1007/s13721-012-0006-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 10/28/2022]
|
36
|
Genetic Programming for Biomarker Detection in Mass Spectrometry Data. LECTURE NOTES IN COMPUTER SCIENCE 2012. [DOI: 10.1007/978-3-642-35101-3_23] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023]
|
37
|
Nahar J, Tickle KS, Shawkat Ali AB. Pattern Discovery from Biological Data. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Extracting useful information from structured and unstructured biological data is crucial in the health industry. Examples include medical practitioners' need to identify breast cancer patients at an early stage, estimate the survival time of a heart disease patient, or recognize uncommon disease characteristics that suddenly appear. Currently there is an explosion in the biological data available in databases. But information extraction and true open access to data require time to resolve issues such as ethical clearance. The emergence of novel IT technologies allows health practitioners to carry out comprehensive analyses of medical images, genomes, transcriptomes, and proteomes in health and disease. The information extracted from such technologies may soon exert a dramatic change in the pace of medical research and impact considerably on the care of patients. The current research will review the existing technologies being used in heart and cancer research. Finally this research will provide some possible solutions to overcome the limitations of existing technologies. In summary the primary objective of this research is to investigate how existing modern machine learning techniques (with their strengths and limitations) are being used in the identification of heartbeat-related disease and the early detection of cancer in patients. After an extensive literature review these are the objectives chosen: to develop a new approach to find the association between diseases such as high blood pressure, stroke and heartbeat, to propose an improved feature selection method to analyze huge images and microarray databases for machine learning algorithms in cancer research, to find an automatic distance function selection method for clustering tasks, to discover the most significant risk factors for specific cancers, and to determine the preventive factors for specific cancers that are aligned with the most significant risk factors.
Therefore we propose a research plan to attain these objectives within this chapter. The possible solutions to the above objectives are: new heartbeat identification techniques that show promising associations between heartbeat patterns and diseases, sensitivity-based feature selection methods applied to early cancer patient classification, meta-learning approaches adopted in clustering algorithms to select a distance function automatically, and the Apriori algorithm applied to discover the significant risk and preventive factors for specific cancers. We expect this research to make significant contributions that enable medical professionals to make more accurate diagnoses and provide better patient care. It will also contribute to other areas such as biomedical modeling, medical image analysis and early disease warning.
|
38
|
Widłak P, Pietrowska M, Wojtkiewicz K, Rutkowski T, Wygoda A, Marczak L, Marczyk M, Polańska J, Walaszczyk A, Domińczyk I, Składowski K, Stobiecki M, Polański A. Radiation-related changes in serum proteome profiles detected by mass spectrometry in blood of patients treated with radiotherapy due to larynx cancer. JOURNAL OF RADIATION RESEARCH 2011; 52:575-581. [PMID: 21768750 DOI: 10.1269/jrr.11019] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/31/2023]
Abstract
The study aimed to detect features of human serum proteome that were associated with exposure to ionizing radiation. The analyzed group consisted of 46 patients treated with radical radiotherapy for larynx cancer; patients were irradiated with total doses in a range from 51 to 72 Gy. Three consecutive blood samples were collected from each patient: before the start, 2 weeks after the start, and 4-6 weeks after the end of radiotherapy. The low-molecular-weight fraction of the serum proteome (2,000-13,000 Da) was analyzed by the MALDI-ToF mass spectrometry. Proteome profiles of serum samples collected before the start of radiotherapy and during the early stage of the treatment were similar. In marked contrast, mass profiles of serum samples collected several weeks after the end of the treatment revealed clear changes. We found that 41 out of 312 registered peptide ions changed their abundance significantly when serum samples collected after the final irradiation were compared with samples collected at the two earlier time points. We also found that abundances of certain serum peptides were associated with total doses of radiation received by patients. The results of this pilot study indicate that features of serum proteome analyzed by mass spectrometry have potential applicability as a retrospective marker of exposure to ionizing radiation.
Affiliation(s)
- Piotr Widłak
- Maria Skłodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice, Poland.
|
39
|
Lee SM, Park JS, Norwitz ER, Kim SM, Kim BJ, Park CW, Jun JK, Syn HC. Characterization of discriminatory urinary proteomic biomarkers for severe preeclampsia using SELDI-TOF mass spectrometry. J Perinat Med 2011; 39:391-6. [PMID: 21557676 DOI: 10.1515/jpm.2011.028] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
OBJECTIVE: To analyze the proteomic pattern in urine for distinguishing severe preeclampsia from mild preeclampsia and normotensive controls using surface-enhanced laser desorption ionization time-of-flight mass spectrometry (SELDI-TOF-MS). STUDY DESIGN: Urine samples were collected from women with severe preeclampsia (sPE, n=11), mild preeclampsia (mPE, n=7), and normotensive controls (n=8) and analyzed by SELDI-TOF-MS to identify discriminatory protein peaks in the sPE cohort. A scoring system, designated the Preeclampsia Proteomic Score of Urine (PPSU), was constructed to differentiate sPE from mPE and normotensive controls. RESULTS: Four discriminatory protein peaks were identified (m/z ratio: 4155, 6044, 6663, and 7971), all of which were down-regulated in women with sPE. PPSU scores in women with sPE were significantly lower than those in both mPE and controls (sPE 0 [0-4] vs. mPE 3 [0-4] vs. controls 4 [2-4]; median [range]; P<0.05). PPSU<2 had a sensitivity of 90.9% and specificity of 93.3% in discriminating patients with sPE from mPE and controls. CONCLUSION: Proteomic analysis of urine can accurately distinguish sPE from mPE and normotensive controls.
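The reported sensitivity and specificity can be checked against the group sizes with a short calculation. The sketch below is illustrative only: the abstract does not give the confusion counts, so the values used here (10 of 11 sPE cases below the cut-off, 14 of 15 non-sPE cases at or above it) are an inference from the group sizes and the reported percentages, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions from confusion counts."""
    sensitivity = tp / (tp + fn)  # true positives among all diseased
    specificity = tn / (tn + fp)  # true negatives among all non-diseased
    return sensitivity, specificity


# Assumed counts (not stated in the abstract): 11 sPE cases,
# and 15 non-sPE cases (7 mPE + 8 normotensive controls).
sens, spec = sensitivity_specificity(tp=10, fn=1, tn=14, fp=1)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
# prints: sensitivity=90.9%, specificity=93.3%
```

Under these assumed counts the percentages match the abstract, which suggests one misclassified patient on each side of the PPSU<2 cut-off.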
Affiliation(s)
- Seung Mi Lee
- Department of Obstetrics and Gynecology, Seoul National University College of Medicine, Seoul, Korea
|
40
|
Prilutsky D, Rogachev B, Marks RS, Lobel L, Last M. Classification of infectious diseases based on chemiluminescent signatures of phagocytes in whole blood. Artif Intell Med 2011; 52:153-63. [DOI: 10.1016/j.artmed.2011.04.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/20/2009] [Revised: 04/11/2011] [Accepted: 04/18/2011] [Indexed: 12/21/2022]
|
41
|
Pietrowska M, Polańska J, Walaszczyk A, Wygoda A, Rutkowski T, Składowski K, Marczak Ł, Stobiecki M, Marczyk M, Polański A, Widłak P. Association between plasma proteome profiles analysed by mass spectrometry, a lymphocyte-based DNA-break repair assay and radiotherapy-induced acute mucosal reaction in head and neck cancer patients. Int J Radiat Biol 2011; 87:711-9. [DOI: 10.3109/09553002.2011.556174] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
|
42
|
Data processing pipelines for comprehensive profiling of proteomics samples by label-free LC–MS for biomarker discovery. Talanta 2011; 83:1209-24. [DOI: 10.1016/j.talanta.2010.10.029] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 05/15/2010] [Revised: 10/18/2010] [Accepted: 10/21/2010] [Indexed: 01/30/2023]
|
43
|
Wang H, Huang G. Application of support vector machine in cancer diagnosis. Med Oncol 2010; 28 Suppl 1:S613-8. [PMID: 20842538 DOI: 10.1007/s12032-010-9663-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 06/20/2010] [Accepted: 08/17/2010] [Indexed: 12/18/2022]
Abstract
This study investigated the clinical application of tumor marker detection combined with a support vector machine (SVM) model in the diagnosis of cancer. Tumor marker detection results for colorectal cancer, gastric cancer and lung cancer were collected. With these tumor marker data sets, SVM models for diagnosis with the best kernel function were created, trained and validated by cross-validation. Grid search and cross-validation methods were used to optimize the parameters of the SVM. Diagnostic classifiers such as combined diagnosis test, logistic regression and decision tree were also validated. Sensitivity, specificity, Youden Index and accuracy were used to evaluate the classifiers. Leave-one-out was used as the algorithm test method. For colorectal cancer, the accuracies of the 4 classifiers were 75.8, 76.6, 83.1 and 96.0%, respectively; for gastric cancer, they were 45.7, 64.5, 63.7 and 91.7%; for lung cancer, they were 71.9, 68.6, 75.2 and 97.5%. The SVM classifier had the highest accuracy of the 4 classifiers, which indicates the potential application of an SVM diagnostic model with tumor markers in cancer detection.
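The evaluation scheme described here (leave-one-out testing, then sensitivity, specificity, Youden Index and accuracy) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: a 1-nearest-neighbour rule stands in for the SVM so the example needs no external library, and the two-marker values and all function names are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN stand-in for the SVM)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def leave_one_out(xs, ys):
    """Hold out each sample once and return confusion counts (tp, tn, fp, fn)."""
    tp = tn = fp = fn = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = nearest_neighbour_predict(train_x, train_y, xs[i])
        if pred == 1 and ys[i] == 1:
            tp += 1
        elif pred == 0 and ys[i] == 0:
            tn += 1
        elif pred == 1 and ys[i] == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def summarize(tp, tn, fp, fn):
    """Compute the four indices named in the abstract."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "youden": sens + spec - 1.0,  # Youden Index J = sensitivity + specificity - 1
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical tumor marker values: label 0 = control, 1 = cancer.
xs = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.1), (2.8, 3.3), (3.2, 2.9)]
ys = [0, 0, 0, 1, 1, 1]
print(summarize(*leave_one_out(xs, ys)))
```

Swapping in a real SVM would only change `nearest_neighbour_predict`; the leave-one-out loop and the indices stay the same.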
Affiliation(s)
- Hui Wang
- Department of Nuclear Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 1630 Dongfang Road, Pudong District, Shanghai 200127, China
|
44
|
Zhou M, Guan W, Walker LD, Mezencev R, Benigno BB, Gray A, Fernández FM, McDonald JF. Rapid Mass Spectrometric Metabolic Profiling of Blood Sera Detects Ovarian Cancer with High Accuracy. Cancer Epidemiol Biomarkers Prev 2010; 19:2262-71. [DOI: 10.1158/1055-9965.epi-10-0126] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 11/16/2022] Open
|
45
|
Bougioukos P, Glotsos D, Cavouras D, Daskalakis A, Kalatzis I, Kostopoulos S, Nikiforidis G, Bezerianos A. An intensity-region driven multi-classifier scheme for improving the classification accuracy of proteomic MS-spectra. Comput Methods Programs Biomed 2010; 99:147-153. [PMID: 20004492 DOI: 10.1016/j.cmpb.2009.11.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2008] [Revised: 10/26/2009] [Accepted: 11/04/2009] [Indexed: 05/28/2023]
Abstract
In this study, a pattern recognition system is presented for improving the classification accuracy of MS-spectra by gathering information from different MS-spectra intensity regions using a majority vote ensemble combination. The method starts by automatically breaking down all MS-spectra into common intensity regions. Subsequently, the most informative features (m/z values), which might constitute potentially significant biomarkers, are extracted from each common intensity region over all the MS-spectra. Finally, normal and ovarian cancer MS-spectra are discriminated using a multi-classifier scheme whose members are the Support Vector Machine, the Probabilistic Neural Network and the k-Nearest Neighbour classifiers. Clinical material was obtained from the publicly available ovarian proteomic dataset (8-7-02). To ensure robust and reliable estimates, the proposed pattern recognition system was evaluated using an external cross-validation process. The average overall performance of the system in discriminating normal from cancer ovarian MS-spectra was 97.18% with 98.52% mean sensitivity and 94.84% mean specificity values.
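The majority vote combination at the core of this scheme can be illustrated briefly. The sketch below assumes each member classifier has already produced one label per spectrum; the label lists are invented for illustration, and the abstract does not describe how votes across intensity regions are pooled, so only the basic voting step is shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per classifier.
    """
    combined = []
    for votes in zip(*predictions):  # votes for a single sample
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of the three member classifiers for four spectra.
svm = ["cancer", "normal", "cancer", "normal"]
pnn = ["cancer", "cancer", "cancer", "normal"]
knn = ["normal", "normal", "cancer", "normal"]
print(majority_vote([svm, pnn, knn]))
# prints: ['cancer', 'normal', 'cancer', 'normal']
```

With three voters and two classes a tie cannot occur; an even number of voters would need an explicit tie-breaking rule.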
|
46
|
Pietrowska M, Polanska J, Marczak L, Behrendt K, Nowicka E, Stobiecki M, Polanski A, Tarnawski R, Widlak P. Mass spectrometry-based analysis of therapy-related changes in serum proteome patterns of patients with early-stage breast cancer. J Transl Med 2010; 8:66. [PMID: 20618994 PMCID: PMC2908576 DOI: 10.1186/1479-5876-8-66] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 04/13/2010] [Accepted: 07/11/2010] [Indexed: 12/04/2022] Open
Abstract
Background: The proteomics approach termed proteome pattern analysis has been shown previously to have potential in the detection and classification of breast cancer. Here we aimed to identify changes in serum proteome patterns related to therapy of breast cancer patients. Methods: Blood samples were collected before the start of therapy, after the surgical resection of tumors and one year after the end of therapy in a group of 70 patients diagnosed at early stages of the disease. Patients were treated with surgery either independently (26) or in combination with neoadjuvant chemotherapy (5) or adjuvant radio/chemotherapy (39). The low-molecular-weight fraction of serum proteome was examined using MALDI-ToF mass spectrometry, and then changes in intensities of peptide ions registered in a mass range between 2,000 and 14,000 Da were identified and correlated with clinical data. Results: We found that surgical resection of tumors did not have an immediate effect on the mass profiles of the serum proteome. On the other hand, significant long-term effects were observed in serum proteome patterns one year after the end of basic treatment (we found that about 20 peptides exhibited significant changes in their abundances). Moreover, the significant differences were found primarily in the subgroup of patients treated with adjuvant therapy, but not in the subgroup subjected only to surgery. This suggests that the observed changes reflect overall responses of the patients to the toxic effects of adjuvant radio/chemotherapy. In line with this hypothesis we detected two serum peptides (registered m/z values 2,184 and 5,403 Da) whose changes correlated significantly with the type of treatment employed (their abundances decreased after adjuvant therapy, but increased in patients treated only with surgery). On the other hand, no significant correlation was found between changes in the abundance of any spectral component and clinical features of patients, including staging and grading of tumors. Conclusions: The study establishes a high potential of MALDI-ToF-based analyses for the detection of dynamic changes in the serum proteome related to therapy of breast cancer patients, which revealed the potential applicability of serum proteome pattern analyses in monitoring the toxicity of therapy.
Affiliation(s)
- Monika Pietrowska
- Maria Skłodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice, Poland
|
47
|
Almeida JS, McKillen DJ, Chen YA, Gross PS, Chapman RW, Warr G. Design and calibration of microarrays as universal transcriptomic environmental biosensors. Comp Funct Genomics 2010; 6:132-7. [PMID: 18629225 PMCID: PMC2447521 DOI: 10.1002/cfg.466] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/23/2005] [Accepted: 02/07/2005] [Indexed: 11/15/2022] Open
Affiliation(s)
- J S Almeida
- Department of Biostatistics Bioinformatics, and Epidemiology, Medical University of South Carolina, 135 Cannon Street, Charleston, SC 29425, USA.
|
48
|
Karpievitch YV, Hill EG, Leclerc AP, Dabney AR, Almeida JS. An introspective comparison of random forest-based classifiers for the analysis of cluster-correlated data by way of RF++. PLoS One 2009; 4:e7087. [PMID: 19763254 PMCID: PMC2739274 DOI: 10.1371/journal.pone.0007087] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 07/08/2009] [Accepted: 08/13/2009] [Indexed: 11/19/2022] Open
Abstract
Many mass spectrometry-based studies, as well as other biological experiments produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size which led to poorer classification and variable selection accuracy. Perhaps most importantly our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data. 
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license.
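The idea behind subject-level bootstrapping can be sketched as follows: subjects, rather than individual replicate samples, are drawn with replacement, and every replicate of a drawn subject enters the bag together. This is a simplified illustration of the sampling step only, not the RF++ implementation; the function name and the example data are hypothetical.

```python
import random


def subject_level_bootstrap(samples_by_subject, rng):
    """Draw subjects with replacement, keeping all replicates of each drawn subject.

    Returns (in_bag_rows, out_of_bag_subjects), where each in-bag row is a
    (subject, replicate) pair.
    """
    subjects = list(samples_by_subject)
    drawn = [rng.choice(subjects) for _ in subjects]  # one draw per subject
    in_bag = [(s, rep) for s in drawn for rep in samples_by_subject[s]]
    out_of_bag = sorted(set(subjects) - set(drawn))
    return in_bag, out_of_bag


# Hypothetical data: four subjects, three replicate spectra each.
spectra_by_subject = {
    "s1": ["s1_a", "s1_b", "s1_c"],
    "s2": ["s2_a", "s2_b", "s2_c"],
    "s3": ["s3_a", "s3_b", "s3_c"],
    "s4": ["s4_a", "s4_b", "s4_c"],
}
in_bag, out_of_bag = subject_level_bootstrap(spectra_by_subject, random.Random(0))
print(len(in_bag), out_of_bag)
```

Because replicates of one subject are never split between the in-bag and out-of-bag sets, the out-of-bag error is not computed on near-copies of training samples. That split is the likely source of the underestimated run-time error the abstract reports for the unmodified forest.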
|
49
|
Penno MAS, Ernst M, Hoffmann P. Optimal preparation methods for automated matrix-assisted laser desorption/ionization time-of-flight mass spectrometry profiling of low molecular weight proteins and peptides. Rapid Commun Mass Spectrom 2009; 23:2656-2662. [PMID: 19630030 DOI: 10.1002/rcm.4167] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 05/28/2023]
Abstract
Mass spectrometry (MS) profiling of the proteome and peptidome for disease-associated patterns is a new concept in clinical diagnostics. The technique, however, is highly sensitive to external sources of variation leading to potentially unacceptable numbers of false positive and false negative results. Before MS profiling can be confidently implemented in a medical setting, standard experimental methods must be developed that minimize technical variance. Past studies of variance have focused largely on pre-analytical variation (i.e., sample collection, handling, etc.). Here, we examined how factors at the analytical stage including the matrix and solid-phase extraction influence MS profiling. Firstly, a standard peptide/protein sample was measured automatically by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS across five consecutive days using two different preparation methods, dried droplet and sample/matrix, of four types of matrix: alpha-cyano-4-hydroxycinnamic acid (HCCA), sinapinic acid (SA), 2,5-dihydroxybenzoic acid (DHB) and 2,5-dihydroxyacetophenone (DHAP). The results indicated that the matrix preparation greatly influenced a number of key parameters of the spectra including repeatability (within-day variability), reproducibility (inter-day variability), resolution, signal strength, background intensity and detectability. Secondly, an investigation into the variance associated with C8 magnetic bead extraction of the standard sample prior to automated MS profiling demonstrated that the process did not adversely affect these same parameters. In fact, the spectra were generally more robust following extraction. Thirdly, the best performing matrix preparations were evaluated using C8 magnetic bead extracted human plasma. We conclude that the DHAP prepared according to the dried-droplet method is the most appropriate matrix to use when performing automated MS profiling.
Affiliation(s)
- Megan A S Penno
- Adelaide Proteomics Centre, University of Adelaide, Adelaide, South Australia, Australia.
|
50
|
Guan W, Zhou M, Hampton CY, Benigno BB, Walker LD, Gray A, McDonald JF, Fernández FM. Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines. BMC Bioinformatics 2009; 10:259. [PMID: 19698113 PMCID: PMC2741455 DOI: 10.1186/1471-2105-10-259] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Received: 05/01/2009] [Accepted: 08/22/2009] [Indexed: 12/12/2022] Open
Abstract
Background: The majority of ovarian cancer biomarker discovery efforts focus on the identification of proteins that can improve the predictive power of presently available diagnostic tests. We here show that metabolomics, the study of metabolic changes in biological systems, can also provide characteristic small molecule fingerprints related to this disease. Results: In this work, new approaches to automatic classification of metabolomic data produced from sera of ovarian cancer patients and benign controls are investigated. The performance of support vector machines (SVM) for the classification of liquid chromatography/time-of-flight mass spectrometry (LC/TOF MS) metabolomic data focusing on recognizing combinations or "panels" of potential metabolic diagnostic biomarkers was evaluated. Utilizing LC/TOF MS, sera from 37 ovarian cancer patients and 35 benign controls were studied. Optimum panels of spectral features observed in positive and/or negative ion mode electrospray (ESI) MS with the ability to distinguish between control and ovarian cancer samples were selected using state-of-the-art feature selection methods such as recursive feature elimination and L1-norm SVM. Conclusion: Three evaluation processes (leave-one-out-cross-validation, 12-fold-cross-validation, 52-20-split-validation) were used to examine the SVM models based on the selected panels in terms of their ability to differentiate control vs. disease serum samples. The statistical significance of these feature selection results was comprehensively investigated. Classification of the serum sample test set was over 90% accurate, indicating promise that the above approach may lead to the development of an accurate and reliable metabolomic-based approach for detecting ovarian cancer.
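The three evaluation processes can be expressed as index splits over the 72 serum samples (37 + 35). The sketch below is an illustration under stated assumptions: reading "52-20-split" as 52 training and 20 test samples is an inference from 52 + 20 = 72, the random shuffling is not described in the abstract, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def holdout_split(n_samples, n_train, rng):
    """Return one random (train, test) split with n_train training samples."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


n_samples = 37 + 35  # ovarian cancer patients + benign controls
rng = random.Random(0)
twelve_fold = list(k_fold_indices(n_samples, 12, rng))
leave_one_out = list(k_fold_indices(n_samples, n_samples, rng))  # k = n
train_idx, test_idx = holdout_split(n_samples, 52, rng)  # assumed 52 train / 20 test
print(len(twelve_fold), len(twelve_fold[0][1]), len(leave_one_out), len(test_idx))
# prints: 12 6 72 20
```

Leave-one-out is the special case of k-fold with k equal to the number of samples, so one generator covers two of the three schemes.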
Affiliation(s)
- Wei Guan
- College of Computing, Georgia Institute of Technology, Atlanta GA 30332, USA.
|