1
Serra A, Cattelani L, Fratello M, Fortino V, Kinaret PAS, Greco D. Supervised Methods for Biomarker Detection from Microarray Experiments. Methods Mol Biol 2022; 2401:101-120. [PMID: 34902125 DOI: 10.1007/978-1-0716-1839-4_8]
Abstract
Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological crosstalk. Biomarker detection from microarray data requires careful consideration from both the biological and the computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling, and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.
Affiliation(s)
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Pia Anneli Sofia Kinaret
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
2
Tutino VM, Poppenberg KE, Li L, Shallwani H, Jiang K, Jarvis JN, Sun Y, Snyder KV, Levy EI, Siddiqui AH, Kolega J, Meng H. Biomarkers from circulating neutrophil transcriptomes have potential to detect unruptured intracranial aneurysms. J Transl Med 2018; 16:373. [PMID: 30593281 PMCID: PMC6310942 DOI: 10.1186/s12967-018-1749-3] [Received: 09/13/2018] [Accepted: 12/17/2018]
Abstract
BACKGROUND Intracranial aneurysms (IAs) are dangerous because of their potential to rupture and cause deadly subarachnoid hemorrhages. Previously, we found significant RNA expression differences in circulating neutrophils between patients with unruptured IAs and aneurysm-free controls. Searching for circulating biomarkers for unruptured IAs, we tested the feasibility of developing classification algorithms that use neutrophil RNA expression levels from blood samples to predict the presence of an IA. METHODS Neutrophil RNA extracted from blood samples from 40 patients (20 with angiography-confirmed unruptured IA, 20 angiography-confirmed IA-free controls) was subjected to next-generation RNA sequencing to obtain neutrophil transcriptomes. In a randomly selected training cohort of 30 of the 40 samples (15 with IA, 15 controls), we performed differential expression analysis. Significantly differentially expressed transcripts (false discovery rate < 0.05, fold change ≥ 1.5) were used to construct prediction models for IA using four well-known supervised machine-learning approaches (diagonal linear discriminant analysis, cosine nearest neighbors, nearest shrunken centroids, and support vector machines). These models were tested in a testing cohort of the remaining 10 neutrophil samples from the 40 patients (5 with IA, 5 controls), and model performance was assessed by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (PCR) was used to corroborate expression differences of a subset of model transcripts in neutrophil samples from a new, separate validation cohort of 10 patients (5 with IA, 5 controls). RESULTS The training cohort yielded 26 highly significantly differentially expressed neutrophil transcripts. Models using these transcripts identified IA patients in the testing cohort with accuracy ranging from 0.60 to 0.90.
The best performing model was the diagonal linear discriminant analysis classifier (area under the ROC curve = 0.80 and accuracy = 0.90). Six of seven differentially expressed genes we tested were confirmed by quantitative PCR using isolated neutrophils from the separate validation cohort. CONCLUSIONS Our findings demonstrate the potential of machine-learning methods to classify IA cases and create predictive models for unruptured IAs using circulating neutrophil transcriptome data. Future studies are needed to replicate these findings in larger cohorts.
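The select-then-classify workflow in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it takes per-transcript p-values and linear-scale group means as given, applies Benjamini-Hochberg adjustment with the abstract's cut-offs (FDR < 0.05, fold change ≥ 1.5, here counted in either direction), and fits a diagonal linear discriminant analysis (DLDA) classifier with equal class priors and a pooled per-feature variance. The names `bh_adjust`, `select_transcripts` and `DLDA`, and all toy data, are hypothetical.

```python
from statistics import mean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(pvalues, mean_case, mean_ctrl, fdr=0.05, min_fc=1.5):
    """Indices with BH-adjusted p < fdr and fold change >= min_fc (either direction)."""
    qvalues = bh_adjust(pvalues)
    selected = []
    for i, (q, a, b) in enumerate(zip(qvalues, mean_case, mean_ctrl)):
        fold_change = max(a / b, b / a)  # assumes strictly positive linear-scale means
        if q < fdr and fold_change >= min_fc:
            selected.append(i)
    return selected


class DLDA:
    """Diagonal LDA: per-class means and a pooled per-feature variance."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        groups = {c: [x for x, label in zip(X, y) if label == c] for c in self.classes}
        self.means = {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}
        dof = len(X) - len(self.classes)
        self.var = []
        for j in range(len(X[0])):
            ss = sum((row[j] - self.means[c][j]) ** 2
                     for c, rows in groups.items() for row in rows)
            self.var.append(max(ss / dof, 1e-12))  # guard against zero variance
        return self

    def predict(self, X):
        # Equal priors assumed: pick the class whose mean is closest in
        # variance-scaled squared distance (covariances are ignored).
        predictions = []
        for x in X:
            scores = {c: sum((xj - mj) ** 2 / vj
                             for xj, mj, vj in zip(x, self.means[c], self.var))
                      for c in self.classes}
            predictions.append(min(scores, key=scores.get))
        return predictions


# Hypothetical toy inputs: four transcripts, then a two-feature training set.
selected = select_transcripts([0.001, 0.20, 0.004, 0.03],
                              [12.0, 5.0, 2.0, 8.0], [6.0, 5.5, 4.0, 7.5])  # → [0, 2]
X_train = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2], [0.0, 3.0], [0.2, 2.8], [-0.2, 3.2]]
y_train = ["IA", "IA", "IA", "control", "control", "control"]
model = DLDA().fit(X_train, y_train)
print(model.predict([[1.9, 1.1], [0.1, 2.9]]))  # → ['IA', 'control']
```

Because DLDA estimates only one mean per class and one variance per feature, it needs far fewer parameters than full LDA, which is one reason it is often tried on small cohorts such as a 30-sample training set.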
Affiliation(s)
- Vincent M. Tutino
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY USA
- Kerry E. Poppenberg
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY USA
- Lu Li
- Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY USA
- Hussain Shallwani
- Department of Neurosurgery, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Kaiyu Jiang
- Genetics, Genomics, and Bioinformatics Program, University at Buffalo, Buffalo, NY USA
- James N. Jarvis
- Genetics, Genomics, and Bioinformatics Program, University at Buffalo, Buffalo, NY USA
- Department of Pediatrics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Yijun Sun
- Genetics, Genomics, and Bioinformatics Program, University at Buffalo, Buffalo, NY USA
- Department of Microbiology and Immunology, University at Buffalo, Buffalo, NY USA
- Kenneth V. Snyder
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Neurosurgery, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Department of Radiology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Elad I. Levy
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Neurosurgery, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Department of Radiology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Adnan H. Siddiqui
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Neurosurgery, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Department of Radiology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- John Kolega
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Pathology and Anatomical Sciences, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Hui Meng
- Canon Stroke and Vascular Research Center, University at Buffalo, Clinical and Translational Research Center, 875 Ellicott Street, Buffalo, NY 14214 USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY USA
- Department of Neurosurgery, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
- Department of Mechanical & Aerospace Engineering, University at Buffalo, Buffalo, NY USA
3
RNA sequencing data from neutrophils of patients with cystic fibrosis reveals potential for developing biomarkers for pulmonary exacerbations. J Cyst Fibros 2018; 18:194-202. [PMID: 29941318 DOI: 10.1016/j.jcf.2018.05.014] [Received: 07/21/2017] [Revised: 05/01/2018] [Accepted: 05/22/2018]
Abstract
BACKGROUND There is no effective way to predict cystic fibrosis (CF) pulmonary exacerbations (CFPE) before they become symptomatic or to assess satisfactory treatment responses. METHODS RNA sequencing of peripheral blood neutrophils from CF patients before and after therapy for CFPE was used to create transcriptome profiles. Transcripts with an average transcripts per million (TPM) level > 1.0 and a false discovery rate (FDR) < 0.05 were used in a cosine K-nearest neighbor (KNN) model. Real-time PCR was used to corroborate RNA sequencing expression differences in both neutrophils and whole blood samples from an independent cohort of CF patients. Furthermore, sandwich ELISA was conducted to assess plasma levels of MRP8/14 complexes in CF patients before and after therapy. RESULTS We found differential expression of 136 transcripts and 83 isoforms when we compared neutrophils from CF patients before and after therapy (>1.5 fold change, FDR-adjusted P < 0.05). The model was able to successfully separate CF flare samples from those taken from the same patients in convalescence with an accuracy of 0.75 in both the training and testing cohorts. Six differentially expressed genes were confirmed by real-time PCR using both isolated neutrophils and whole blood from an independent cohort of CF patients before and after therapy, even though levels of myeloid-related protein MRP8/14 dimers in plasma of CF patients were essentially unchanged by therapy. CONCLUSIONS Our findings demonstrate the potential of machine learning approaches for classifying disease states and thus developing sensitive biomarkers that can be used to monitor pulmonary disease activity in CF.
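A cosine K-nearest-neighbor classifier with the mean-TPM pre-filter from this abstract can be sketched as follows. It is an illustrative reconstruction, not the study's implementation: the FDR step is omitted, `k = 3` with simple majority voting is an assumption, and the TPM matrix and the "flare"/"convalescence" labels are hypothetical.

```python
import math
from collections import Counter


def filter_by_mean_tpm(tpm_rows, min_mean_tpm=1.0):
    """Indices of transcripts whose mean TPM across samples exceeds the cut-off.

    tpm_rows holds one list per sample, with one TPM value per transcript.
    """
    n_samples = len(tpm_rows)
    return [j for j in range(len(tpm_rows[0]))
            if sum(row[j] for row in tpm_rows) / n_samples > min_mean_tpm]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples most cosine-similar to query."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: cosine_similarity(train_X[i], query),
                    reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    # Ties keep first-seen order, i.e. favor the label of the closest neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical TPM matrix: 4 samples x 3 transcripts.
tpm = [[5.0, 0.2, 3.0], [4.0, 0.1, 3.5], [0.5, 0.4, 6.0], [0.6, 0.3, 5.0]]
labels = ["flare", "flare", "convalescence", "convalescence"]
keep = filter_by_mean_tpm(tpm)  # → [0, 2]
X = [[row[j] for j in keep] for row in tpm]
print(cosine_knn_predict(X, labels, [4.5, 3.2]))  # → flare
```

Cosine similarity compares the direction of expression profiles rather than their magnitude, so two samples with proportionally similar expression count as neighbors even if their overall levels differ.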
4
Benson VM, Campagne F. Language workbench user interfaces for data analysis. PeerJ 2015; 3:e800. [PMID: 25755929 PMCID: PMC4349052 DOI: 10.7717/peerj.800] [Received: 10/07/2014] [Accepted: 02/05/2015]
Abstract
Biological data analysis is frequently performed with command line software. While this practice provides considerable flexibility for computationally savvy individuals, such as investigators trained in bioinformatics, it also creates a barrier to the widespread use of data analysis software by investigators trained as biologists and/or clinicians. Workflow systems such as Galaxy and Taverna have been developed to try to provide generic user interfaces that can wrap command line analysis software. These solutions are useful for problems that can be solved with workflows, and that do not require specialized user interfaces. However, some types of analyses can benefit from custom user interfaces. For instance, developing biomarker models from high-throughput data is a type of analysis that can be expressed more succinctly with specialized user interfaces. Here, we show how Language Workbench (LW) technology can be used to model the biomarker development and validation process. We developed a language that models the concepts of Dataset, Endpoint, Feature Selection Method and Classifier. These high-level language concepts map directly to abstractions that analysts who develop biomarker models are familiar with. We found that user interfaces developed in the Meta-Programming System (MPS) LW provide convenient means to configure a biomarker development project, to train models and to view the validation statistics. We discuss several advantages of developing user interfaces for data analysis with an LW, including increased interface consistency, portability and extension by language composition. The language developed during this experiment is distributed as an MPS plugin (available at http://campagnelab.org/software/bdval-for-mps/).
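To make the four language concepts concrete, here is a hypothetical Python data model. It is not the MPS language or the BDVal plugin and does not reproduce their API; the field names (`path`, `positive_label`, `num_features`) and the rule that a project expands into every dataset × endpoint × feature-selection × classifier combination are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    path: str  # location of the expression matrix (hypothetical field)


@dataclass(frozen=True)
class Endpoint:
    name: str            # clinical outcome to predict
    positive_label: str  # label treated as the positive class


@dataclass(frozen=True)
class FeatureSelectionMethod:
    name: str
    num_features: int  # size of the candidate biomarker panel


@dataclass(frozen=True)
class Classifier:
    name: str


Config = Tuple[Dataset, Endpoint, FeatureSelectionMethod, Classifier]


@dataclass
class BiomarkerProject:
    datasets: List[Dataset] = field(default_factory=list)
    endpoints: List[Endpoint] = field(default_factory=list)
    feature_selection: List[FeatureSelectionMethod] = field(default_factory=list)
    classifiers: List[Classifier] = field(default_factory=list)

    def model_configurations(self) -> List[Config]:
        """One entry per dataset x endpoint x feature-selection x classifier."""
        return list(product(self.datasets, self.endpoints,
                            self.feature_selection, self.classifiers))


project = BiomarkerProject(
    datasets=[Dataset("training-set", "data/train.tsv")],
    endpoints=[Endpoint("response", "responder")],
    feature_selection=[FeatureSelectionMethod("t-test", 50),
                       FeatureSelectionMethod("svm-weights", 50)],
    classifiers=[Classifier("svm"), Classifier("knn")],
)
configs = project.model_configurations()  # 1 x 1 x 2 x 2 = 4 configurations
```

Keeping each concept as its own small, immutable type mirrors the abstract's point that these are the abstractions analysts already reason about; a user interface can then edit lists of them rather than command-line flags.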
Affiliation(s)
- Victoria M. Benson
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, NY, United States of America
- Fabien Campagne
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, NY, United States of America
5
Andersen GB, Hager H, Hansen LL, Tost J. Improved reproducibility in genome-wide DNA methylation analysis for PAXgene-fixed samples compared with restored formalin-fixed and paraffin-embedded DNA. Anal Biochem 2015; 468:50-8. [DOI: 10.1016/j.ab.2014.09.012] [Received: 05/27/2014] [Revised: 08/28/2014] [Accepted: 09/09/2014]
6
Chen Z, Liu Z, Deng X, Warden C, Li W, Garcia-Aguilar J. Chromosomal copy number alterations are associated with persistent lymph node metastasis after chemoradiation in locally advanced rectal cancer. Dis Colon Rectum 2012; 55:677-85. [PMID: 22595848 PMCID: PMC3356567 DOI: 10.1097/dcr.0b013e31824f873f]
Abstract
BACKGROUND Lymph node metastasis is an important indicator of oncologic outcome for patients with rectal cancer. Identifying predictive biomarkers of lymph node metastasis could therefore be clinically useful. OBJECTIVE The aim of this study was to assess whether chromosomal copy number alterations can assist in predicting persistent lymph node metastasis in patients with locally advanced rectal cancer treated with preoperative chemoradiation therapy. DESIGN This is a nonrandomized, prospective phase II study. SETTING This study took place in a multi-institutional setting. PATIENTS Ninety-five patients with stage II (cT3-4, cN0) or stage III (any cT, cN1-2) rectal cancer were included. INTERVENTION Patients were treated with preoperative chemoradiation therapy followed by total mesorectal excision. Pretreatment biopsy tumor DNA and surgical margin control DNA were extracted and analyzed by oligonucleotide array-based comparative genomic hybridization. Chromosomal copy number alterations were correlated with persistent lymph node metastasis. Finally, a model for predicting persistent lymph node metastasis was built. MAIN OUTCOME MEASURES The primary outcomes assessed were whether chromosomal copy number alterations are associated with persistent lymph node metastasis in patients with rectal cancer and the accuracy of oligonucleotide array-based comparative genomic hybridization for predicting lymph node metastasis. RESULTS Twenty-five of 95 (26%) patients had lymph node metastasis after chemoradiation. Losses of 28 chromosomal regions, most notably in chromosome 4, were significantly associated with lymph node metastasis. Our predictive model contained 65 probes and predicted persistent lymph node metastasis with 68% sensitivity, 93% specificity, and positive and negative predictive values of 77% and 89%. The use of this model accurately predicted lymph node status (positive or negative) after chemoradiation therapy in 82 of 95 patients (86%). 
LIMITATIONS The patient cohort was not completely homogeneous, which may have influenced their clinical outcome. In addition, although we performed rigorous, statistically sound internal validation, external validation will be important to further corroborate our findings. CONCLUSIONS Copy number alterations can help identify patients with rectal cancer who are at risk of lymph node metastasis after chemoradiation.
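The reported figures can be cross-checked with the standard 2×2 diagnostic-accuracy definitions. The counts below are not given in the abstract; they are the integer values implied by 25 node-positive and 70 node-negative patients together with the reported 68% sensitivity and 93% specificity, and should be read as an inference rather than published data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """2x2 diagnostic-accuracy metrics, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts (not stated in the abstract): 25 node-positive, 70 node-negative.
metrics = diagnostic_metrics(tp=17, fn=8, tn=65, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run as written, this prints 68.0%, 92.9%, 77.3%, 89.0% and 86.3%, which round to the published 68%, 93%, 77%, 89% and 86% (82 of 95), so the abstract's numbers are internally consistent.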
Affiliation(s)
- Zhenbin Chen
- Department of Surgery, City of Hope, Duarte, CA 91010, USA
- Zheng Liu
- Bioinformatics Core, Department of Molecular Medicine, City of Hope, Duarte, CA 91010, USA
- Xutao Deng
- Bioinformatics Core, Department of Molecular Medicine, City of Hope, Duarte, CA 91010, USA
- Charles Warden
- Bioinformatics Core, Department of Molecular Medicine, City of Hope, Duarte, CA 91010, USA
- Wenyan Li
- Department of Surgery, City of Hope, Duarte, CA 91010, USA
- Julio Garcia-Aguilar
- Department of Surgery, City of Hope, Duarte, CA 91010, USA. Corresponding author: Julio Garcia-Aguilar, MD, PhD, Chair, Department of Surgery, City of Hope, 1500 E. Duarte Road, CA 91010. Tel: (626) 471-9309. Fax: (626) 301-8113.
7
Chen Z, Liu Z, Li W, Qu K, Deng X, Varma MG, Fichera A, Pigazzi A, Garcia-Aguilar J. Chromosomal copy number alterations are associated with tumor response to chemoradiation in locally advanced rectal cancer. Genes Chromosomes Cancer 2011; 50:689-99. [PMID: 21584903 DOI: 10.1002/gcc.20891] [Received: 03/02/2011] [Accepted: 04/18/2011]
Abstract
Rectal cancer response to chemoradiation (CRT) varies from no response to a pathologic complete response (pCR). Identifying predictive biomarkers of response would therefore be useful. We assessed whether chromosomal copy number alterations (CNAs) can assist in predicting pCR. Pretreatment tumor biopsies and paired normal surgical tissues from the proximal resection margin were collected from 95 rectal cancer patients treated with preoperative CRT and total mesorectal excision in a prospective Phase II study. Tumor and control DNA were extracted, and oligonucleotide array-based comparative genomic hybridization (aCGH) was used to identify CNAs, which were correlated with pCR. Ingenuity pathway analysis (IPA) was then used to identify functionally relevant genes in aberrant regions. Finally, a predictive model for pCR was built using support vector machine (SVM), and leave-one-out cross validation assessed the accuracy of aCGH. Chromosomal regions most commonly affected by gains were 20q11.21-q13.33, 13q11.32-23, 7p22.3-p22.2, and 8q23.3-q24.3, and losses were present at 18q11.32-q23, 17p13.3-q11.1, 10q23.1, and 4q32.1-q32.3. The 25 (26%) patients who achieved a pCR had significantly fewer high copy gains overall than non-pCR patients (P = 0.01). Loss of chromosomal region 15q11.1-q26.3 was significantly associated with non-pCR (P < 0.00002; Q-bound < 0.0391), while loss of 12p13.31 was significantly associated with pCR (P < 0.0003; Q-bound < 0.097). IPA identified eight genes in the imbalanced chromosomal regions that associated with tumor response. SVM identified 58 probes that predict pCR with 76% sensitivity, 97% specificity, and positive and negative predictive values of 91% and 92%. Our data indicate that chromosomal CNAs can help identify rectal cancer patients more likely to develop a pCR to CRT.
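The SVM with leave-one-out cross validation described in this abstract can be sketched without external libraries. The SVM below is a stand-in, not the authors' model: the abstract does not state the kernel or parameters, so this assumes a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the hinge loss, with the bias folded into the weight vector (and therefore regularized too, a simplification). The labels (+1 for pCR, -1 for non-pCR), the regularization strength `lam`, the epoch count and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.

    Labels must be +1/-1. A constant 1.0 is appended to every sample so the
    bias is learned with the weights. Samples are visited in a fixed cyclic
    order so the sketch is deterministic.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization step: shrink the weights.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step when the margin is violated.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def leave_one_out(X, y, **svm_kwargs):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kwargs)
        predictions.append(svm_predict(w, X[i]))
    return predictions


# Hypothetical, well-separated toy data: +1 = pCR, -1 = non-pCR.
X = [[3.0, 3.0], [4.0, 2.0], [2.0, 4.0], [-3.0, -3.0], [-4.0, -2.0], [-2.0, -4.0]]
y = [1, 1, 1, -1, -1, -1]
preds = leave_one_out(X, y)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Leave-one-out retrains the model once per patient, so every prediction is made on a sample the model did not see during training; with 95 patients that means 95 fits, which is inexpensive for a linear model.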
Affiliation(s)
- Zhenbin Chen
- Department of Surgery, City of Hope, 1500 E. Duarte Road, Duarte, CA 91010, USA