1
Vizza P, Aracri F, Guzzi PH, Gaspari M, Veltri P, Tradigo G. Machine learning pipeline to analyze clinical and proteomics data: experiences on a prostate cancer case. BMC Med Inform Decis Mak 2024; 24:93. [PMID: 38584282 PMCID: PMC11000316 DOI: 10.1186/s12911-024-02491-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 03/25/2024] [Indexed: 04/09/2024]
Abstract
Proteomic-based analysis is used to identify biomarkers in blood samples and tissues. Data produced by devices such as mass spectrometers require platforms to identify and quantify proteins (or peptides). Clinical information can be related to mass spectrometry data to identify diseases at an early stage. Machine learning techniques can be used to support physicians and biologists in studying and classifying pathologies. We present the application of machine learning techniques to define a pipeline aimed at studying and classifying proteomics data enriched with clinical information. The pipeline allows users to relate established blood biomarkers to clinical parameters and proteomics data. The proposed pipeline entails three main phases: (i) feature selection, (ii) model training, and (iii) model ensembling. We report the experience of applying this pipeline to prostate-related diseases. Models have been trained on several biological datasets. We report experimental results on two datasets that result from the integration of clinical and mass spectrometry-based data in the contexts of serum and urine analysis. The pipeline receives input data from blood analytes, tissue samples, proteomic analysis, and urine biomarkers. It then trains different models for feature selection, classification, and voting. The pipeline has been applied to two datasets obtained in a 2-year research project that aimed to extract hidden information from mass spectrometry, serum, and urine samples from hundreds of patients. We report results from analyzing a prostate serum dataset with 143 samples, including 79 prostate cancer (PCa) and 84 benign prostatic hyperplasia (BPH) patients, and a urine dataset with 121 samples, including 67 PCa and 54 BPH patients. The pipeline identified interesting peptides in the two datasets: 6 in the first and 2 in the second. The best models for the serum (AUC=0.87, Accuracy=0.83, F1=0.81, Sensitivity=0.84, Specificity=0.81) and urine (AUC=0.88, Accuracy=0.83, F1=0.83, Sensitivity=0.85, Specificity=0.80) datasets showed good predictive performance. We made the pipeline code available on GitHub and we are confident that it will be successfully adopted in similar clinical setups.
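The voting phase and the reported metrics can be illustrated with a minimal sketch. It is not the authors' pipeline code: the function names `majority_vote` and `binary_metrics`, the hard majority-vote rule, and the toy labels are all assumptions made for illustration, and the abstract does not say which voting scheme the pipeline uses.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine one label list per model into a single list by majority vote.

    Hypothetical helper: assumes every model predicts the same samples in
    the same order. Ties resolve to the label that appears first.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive="PCa"):
    """Accuracy, sensitivity, specificity and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels (not data from the study): three models, six samples.
y_true = ["PCa", "PCa", "BPH", "BPH", "PCa", "BPH"]
model_a = ["PCa", "BPH", "BPH", "BPH", "PCa", "PCa"]
model_b = ["PCa", "PCa", "BPH", "PCa", "PCa", "BPH"]
model_c = ["BPH", "PCa", "BPH", "BPH", "PCa", "PCa"]

ensemble = majority_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, ensemble)
```

Using an odd number of models avoids ties in a two-class vote; a weighted or probability-averaging vote would be an equally plausible choice.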
Affiliation(s)
- Patrizia Vizza
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Federica Aracri
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Marco Gaspari
  - Department of Experimental and Clinical Medicine, Magna Græcia University, 88100, Catanzaro, Italy
- Pierangelo Veltri
  - Department of Computers, Modeling, Electronics and Systems Engineering, University of Calabria, 87036, Rende, Italy
- Giuseppe Tradigo
  - Department of Theoretical and Applied Sciences, eCampus University, 22060, Novedrate, CO, Italy
2
Boginskaya I, Safiullin R, Tikhomirova V, Kryukova O, Nechaeva N, Bulaeva N, Golukhova E, Ryzhikov I, Kost O, Afanasev K, Kurochkin I. Human Angiotensin I-Converting Enzyme Produced by Different Cells: Classification of the SERS Spectra with Linear Discriminant Analysis. Biomedicines 2022; 10:1389. [PMID: 35740411 PMCID: PMC9219671 DOI: 10.3390/biomedicines10061389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/07/2022] [Accepted: 06/09/2022] [Indexed: 11/16/2022]
Abstract
Angiotensin I-converting enzyme (ACE) is a peptidase widely present in human tissues and biological fluids. ACE is a glycoprotein containing 17 potential N-glycosylation sites, which can be glycosylated in different ways due to post-translational modification of the protein in different cells. For the first time, surface-enhanced Raman scattering (SERS) spectra of human ACE from lungs, mainly produced by endothelial cells, ACE from heart, produced by endothelial heart cells and myofibroblasts, and ACE from seminal fluid, produced by epithelial cells, have been compared with full assignment. The ability to separate the SERS spectra of the ACEs with high accuracy was demonstrated using the linear discriminant analysis (LDA) method. The intervals in the spectra with maximum contributions of the spectral features were determined and their contribution to the spectrum of each separate ACE was evaluated. About 25 spectral features forming three intervals were enough for successful separation of the spectra of different ACEs. However, more spectral information could be obtained from analysis of 50 spectral features. Band assignment showed that several features did not correlate with band assignments to amino acids or peptides, which indicated the carbohydrate contribution to the final spectra. Analysis of SERS spectra could be beneficial for the detection of tissue-specific ACEs.
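The idea behind LDA can be sketched with a deliberately reduced case: two classes and two features, where the Fisher discriminant direction has a closed form. This is an illustrative simplification, not the study's analysis (which separates three ACE sources using roughly 25 to 50 spectral features); the function names, the class labels "A" and "B", and the toy points are assumptions.

```python
def mean_vector(rows):
    """Per-feature mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mu):
    """2x2 within-class scatter: sum of outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def fisher_lda(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mu_a - mu_b).

    Assumes the pooled scatter matrix Sw is invertible.
    """
    mu_a, mu_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mu_a), scatter_matrix(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Midpoint of the projected class means serves as decision threshold.
    threshold = 0.5 * (dot(w, mu_a) + dot(w, mu_b))
    return w, threshold


def classify(x, w, threshold):
    return "A" if dot(w, x) > threshold else "B"


# Toy 2-feature "spectra" (invented values, not measured intensities).
class_a = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 2.1)]
class_b = [(3.0, 0.5), (3.4, 0.9), (2.8, 0.4), (3.1, 1.0)]
w, threshold = fisher_lda(class_a, class_b)
```

With more classes or features, a linear algebra library would replace the hand-written 2x2 inverse; the projection-then-threshold logic stays the same.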
Affiliation(s)
- Irina Boginskaya
  - Institute for Theoretical and Applied Electromagnetics RAS, 125412 Moscow, Russia
  - Bakulev Scientific Center for Cardiovascular Surgery, Cardiology Department, 121552 Moscow, Russia
- Robert Safiullin
  - Institute for Theoretical and Applied Electromagnetics RAS, 125412 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
- Victoria Tikhomirova
  - Faculty of Chemistry, M.V. Lomonosov Moscow State University, 119991 Moscow, Russia
- Olga Kryukova
  - Faculty of Chemistry, M.V. Lomonosov Moscow State University, 119991 Moscow, Russia
- Natalia Nechaeva
  - Emanuel Institute of Biochemical Physics RAS, 119334 Moscow, Russia
- Naida Bulaeva
  - Bakulev Scientific Center for Cardiovascular Surgery, Cardiology Department, 121552 Moscow, Russia
- Elena Golukhova
  - Bakulev Scientific Center for Cardiovascular Surgery, Cardiology Department, 121552 Moscow, Russia
- Ilya Ryzhikov
  - Institute for Theoretical and Applied Electromagnetics RAS, 125412 Moscow, Russia
  - FMN Laboratory, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Olga Kost
  - Faculty of Chemistry, M.V. Lomonosov Moscow State University, 119991 Moscow, Russia
- Konstantin Afanasev
  - Institute for Theoretical and Applied Electromagnetics RAS, 125412 Moscow, Russia
- Ilya Kurochkin
  - Faculty of Chemistry, M.V. Lomonosov Moscow State University, 119991 Moscow, Russia
  - Emanuel Institute of Biochemical Physics RAS, 119334 Moscow, Russia
3
Pirhadi S, Maghooli K, Moteghaed NY, Garshasbi M, Mousavirad SJ. Biomarker Discovery by Imperialist Competitive Algorithm in Mass Spectrometry Data for Ovarian Cancer Prediction. Journal of Medical Signals & Sensors 2021; 11:108-119. [PMID: 34268099 PMCID: PMC8253319 DOI: 10.4103/jmss.jmss_20_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2020] [Revised: 05/14/2020] [Accepted: 07/04/2020] [Indexed: 11/20/2022]
Abstract
Background: Mass spectrometry is a method for identifying proteins and could be used for distinguishing between proteins in healthy and nonhealthy samples. This study was conducted using high-resolution mass spectrometry data of ovarian cancer. Usually, diagnostic and monitoring tests are assessed according to sensitivity and specificity rates; thus, the aim of this study is to compare mass spectrometry of healthy and cancerous samples in order to find a set of biomarkers or indicators with reasonable sensitivity and specificity rates. Methods: A combination of methods was used to choose the optimum feature set: t-test, entropy, Bhattacharyya distance, and an imperialist competitive algorithm with a K-nearest neighbors classifier. The resulting features from each method were fed to the C5 decision tree with 10-fold cross-validation to classify the data. Results: The most important variables using this method were identified and a set of rules was extracted. Similar to most frequent features, repetitive patterns were not obtained; the generalized rule induction method was used to identify the repetitive patterns. Conclusion: Finally, the resulting features were introduced as biomarkers and compared with other studies. It was found that the resulting features were very similar to those of other studies. In the case of the classifier, higher sensitivity and specificity rates with a lower number of features were achieved when compared with other studies.
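The t-test filter named in the Methods can be sketched as a per-feature ranking. This is a minimal illustration under assumptions: it uses Welch's unequal-variance t statistic (the abstract does not state which t-test variant was applied), the names `welch_t` and `rank_features` are hypothetical, and the intensity values are invented.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    # A zero standard error means both groups are constant: no evidence.
    return (mean(a) - mean(b)) / se if se else 0.0


def rank_features(healthy, cancer):
    """Return feature indices ordered from most to least discriminative.

    Rows are samples; columns are features (for example, m/z bins).
    """
    n_features = len(healthy[0])
    scores = []
    for j in range(n_features):
        t = welch_t([row[j] for row in healthy], [row[j] for row in cancer])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy intensities (not study data): feature 0 differs strongly between
# groups, feature 1 has identical group means, feature 2 differs slightly.
healthy = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.5], [0.9, 4.9, 1.5], [1.1, 5.2, 2.2]]
cancer = [[3.0, 5.0, 2.1], [3.2, 5.3, 2.4], [2.9, 4.8, 1.8], [3.1, 5.1, 2.0]]
ranking = rank_features(healthy, cancer)
```

A filter like this only scores features one at a time; the wrapper search with the imperialist competitive algorithm and K-nearest neighbors described in the abstract evaluates feature subsets jointly and is not reproduced here.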
Affiliation(s)
- Shiva Pirhadi
  - Department of Biomedical Engineering, Tehran Science and Research Branch, Islamic Azad University, Tehran, Iran
- Keivan Maghooli
  - Department of Biomedical Engineering, Tehran Science and Research Branch, Islamic Azad University, Tehran, Iran
- Niloofar Yousefi Moteghaed
  - Department of Biomedical Engineering and Medical Physics, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Masoud Garshasbi
  - Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
4
Lu HC, Patterson NH, Judd AM, Reyzer ML, Sehn JK. Imaging Mass Spectrometry Is an Accurate Tool in Differentiating Clear Cell Renal Cell Carcinoma and Chromophobe Renal Cell Carcinoma: A Proof-of-concept Study. J Histochem Cytochem 2020; 68:403-411. [PMID: 32466698 DOI: 10.1369/0022155420931417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Clear cell renal cell carcinoma (ccRCC) and chromophobe renal cell carcinoma (chRCC) are relatively common tumors that can have significant risk for mortality. Treatment and prognostication in renal cell carcinoma (RCC) are dependent upon correct histologic typing. ccRCC and chRCC are generally straightforward to diagnose based on histomorphology alone. However, high-grade ccRCC and chRCC can sometimes resemble each other morphologically, particularly in small biopsies. Multiple immunostains and/or colloidal iron stain are sometimes required to differentiate the two. Imaging mass spectrometry (IMS) allows simultaneous spatial mapping of thousands of biomarkers, using formalin-fixed paraffin-embedded tissue sections. In this study, we evaluate the ability of IMS to differentiate between World Health Organization/International Society for Urological Pathology grade 3 ccRCC and chRCC. IMS spectra from a training set of 14 ccRCC and 13 chRCC were evaluated via support vector machine algorithm with a linear kernel for machine learning, building a classification model. The classification model was applied to a separate validation set of 6 ccRCC and 6 chRCC, with 19 to 20, 150-μm diameter tumor foci in each case sampled by IMS. Most evaluated tumor foci were classified correctly as ccRCC versus chRCC (99% accuracy, kappa=0.98), demonstrating that IMS is an accurate tool in differentiating high-grade ccRCC and chRCC.
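The kappa statistic reported alongside accuracy corrects observed agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa from a confusion matrix follows; the function name `cohen_kappa` and the example counts are assumptions for illustration and are not the study's confusion matrix.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j] counts items of true class i predicted as class j.
    Assumes chance agreement is below 1 (more than one class is present).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of matching row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are true ccRCC / chRCC, columns are predictions.
toy_matrix = [[45, 5],
              [10, 40]]
toy_kappa = cohen_kappa(toy_matrix)  # observed 0.85, chance 0.50
```

With balanced classes, chance agreement is 0.5, so an accuracy of 0.85 maps to a kappa of 0.7; by the same arithmetic, accuracy near 0.99 gives a kappa near 0.98.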
Affiliation(s)
- Hsiang-Chih Lu
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, MO
- Nathan Heath Patterson
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, TN
- Audra M Judd
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, TN
- Michelle L Reyzer
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, TN
- Jennifer K Sehn
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, MO
5
Tsypin M, Asmellash S, Meyer K, Touchet B, Roder H. Extending the information content of the MALDI analysis of biological fluids via multi-million shot analysis. PLoS One 2019; 14:e0226012. [PMID: 31815946 PMCID: PMC6901224 DOI: 10.1371/journal.pone.0226012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/01/2019] [Accepted: 11/18/2019] [Indexed: 12/31/2022]
Abstract
INTRODUCTION: Reliable measurements of the protein content of biological fluids like serum or plasma can provide valuable input for the development of personalized medicine tests. Standard MALDI analysis typically only shows high abundance proteins, which limits its utility for test development. It also exhibits reproducibility issues with respect to quantitative measurements. In this paper we show how the sensitivity of MALDI profiling of intact proteins in unfractionated human serum can be substantially increased by exposing a sample to many more laser shots than are commonly used. Analytical reproducibility is also improved. METHODS: To assess what is theoretically achievable we utilized spectra from the same samples obtained over many years and combined them to generate MALDI spectral averages of up to 100,000,000 shots for a single sample, and up to 8,000,000 shots for a set of 40 different serum samples. Spectral attributes, such as number of peaks and spectral noise of such averaged spectra were investigated together with analytical reproducibility as a function of the number of shots. We confirmed that results were similar on MALDI instruments from different manufacturers. RESULTS: We observed an expected decrease of noise, roughly proportional to the square root of the number of shots, over the whole investigated range of the number of shots (5 orders of magnitude), resulting in an increase in the number of reliably detected peaks. The reproducibility of the amplitude of these peaks, measured by CV and concordance analysis also improves with very similar dependence on shot number, reaching median CVs below 2% for shot numbers > 4 million. Measures of analytical information content and association with biological processes increase with increasing number of shots. CONCLUSIONS: We demonstrate that substantially increasing the number of laser shots in a MALDI-TOF analysis leads to more informative and reliable data on the protein content of unfractionated serum. This approach has already been used in the development of clinical tests in oncology.
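The square-root dependence in the Results can be made concrete with a short sketch. It assumes ideal averaging of independent shots, so that noise scales as one over the square root of the shot count; the function names and the example numbers are illustrative assumptions, not values from the paper.

```python
import math
from statistics import mean, stdev


def expected_noise_ratio(shots_before, shots_after):
    """Noise after averaging divided by noise before.

    Assumes independent shots, so noise is proportional to 1/sqrt(shots).
    """
    return math.sqrt(shots_before / shots_after)


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * stdev(values) / mean(values)


# Going from 10,000 to 1,000,000 shots (100x more) should cut noise
# to about one tenth under the stated assumption.
ratio = expected_noise_ratio(10_000, 1_000_000)

# Invented replicate peak amplitudes with mean 100 and sample SD 2.
cv = coefficient_of_variation([98.0, 100.0, 102.0])
```

Under the same assumption, each additional order of magnitude in shot count reduces noise by a factor of about 3.16, which is why gains over 5 orders of magnitude are substantial but increasingly expensive to obtain.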
Affiliation(s)
- Maxim Tsypin
  - Biodesix Inc., Boulder, Colorado, United States of America
- Krista Meyer
  - Biodesix Inc., Boulder, Colorado, United States of America
- Heinrich Roder
  - Biodesix Inc., Boulder, Colorado, United States of America