1
Baquer G, Sementé L, García-Altares M, Lee YJ, Chaurand P, Correig X, Ràfols P. rMSIcleanup: an open-source tool for matrix-related peak annotation in mass spectrometry imaging and its application to silver-assisted laser desorption/ionization. J Cheminform 2020; 12:45. [PMID: 33431000 PMCID: PMC7374922 DOI: 10.1186/s13321-020-00449-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/30/2020] [Accepted: 07/13/2020] [Indexed: 11/14/2022] Open
Abstract
Mass spectrometry imaging (MSI) has become a mature, widespread analytical technique to perform non-targeted spatial metabolomics. However, the compounds used to promote desorption and ionization of the analyte during acquisition cause spectral interferences in the low mass range that hinder downstream data processing in metabolomics applications. Thus, it is advisable to annotate and remove matrix-related peaks to reduce the number of redundant and non-biologically-relevant variables in the dataset. We have developed rMSIcleanup, an open-source R package to annotate and remove signals from the matrix, according to the matrix chemical composition and the spatial distribution of its ions. To validate the annotation method, rMSIcleanup was challenged with several images acquired using silver-assisted laser desorption ionization MSI (AgLDI MSI). The algorithm was able to correctly classify m/z signals related to silver clusters. Visual exploration of the data using Principal Component Analysis (PCA) demonstrated that annotation and removal of matrix-related signals improved spectral data post-processing. The results highlight the need for including matrix-related peak annotation tools such as rMSIcleanup in MSI workflows.
Affiliation(s)
- Gerard Baquer
  - Department of Electronic Engineering, Rovira i Virgili University, Tarragona, Spain
- Lluc Sementé
  - Department of Electronic Engineering, Rovira i Virgili University, Tarragona, Spain
- María García-Altares
  - Department of Electronic Engineering, Rovira i Virgili University, Tarragona, Spain
  - Spanish Biomedical Research Centre in Diabetes and Associated Metabolic Disorders (CIBERDEM), 28029, Madrid, Spain
- Young Jin Lee
  - Department of Chemistry, Iowa State University, Ames, IA, 50011, USA
- Pierre Chaurand
  - Department of Chemistry, Université de Montréal, Montreal, QC, H3C 3J7, Canada
- Xavier Correig
  - Department of Electronic Engineering, Rovira i Virgili University, Tarragona, Spain
  - Spanish Biomedical Research Centre in Diabetes and Associated Metabolic Disorders (CIBERDEM), 28029, Madrid, Spain
  - Institut d'Investigació Sanitària Pere Virgili, Tarragona, Spain
- Pere Ràfols
  - Department of Electronic Engineering, Rovira i Virgili University, Tarragona, Spain
  - Spanish Biomedical Research Centre in Diabetes and Associated Metabolic Disorders (CIBERDEM), 28029, Madrid, Spain
  - Institut d'Investigació Sanitària Pere Virgili, Tarragona, Spain
2
Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory. Progress in Artificial Intelligence 2018. [DOI: 10.1007/s13748-018-0148-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
3
Yu Z, Wang Z, You J, Zhang J, Liu J, Wong HS, Han G. A New Kind of Nonparametric Test for Statistical Comparison of Multiple Classifiers Over Multiple Datasets. IEEE Transactions on Cybernetics 2017; 47:4418-4431. [PMID: 28113414 DOI: 10.1109/tcyb.2016.2611020] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 06/06/2023]
Abstract
Nonparametric statistical analysis, such as the Friedman test (FT), is gaining more and more attention due to its useful applications in a lot of experimental studies. However, traditional FT for the comparison of multiple learning algorithms on different datasets adopts the naive ranking approach. The ranking is based on the average accuracy values obtained by the set of learning algorithms on the datasets, which neither considers the differences of the results obtained by the learning algorithms on each dataset nor takes into account the performance of the learning algorithms in each run. In this paper, we will first propose three kinds of ranking approaches, which are the weighted ranking approach, the global ranking approach (GRA), and the weighted GRA. Then, a theoretical analysis is performed to explore the properties of the proposed ranking approaches. Next, a set of the modified FTs based on the proposed ranking approaches are designed for the comparison of the learning algorithms. Finally, the modified FTs are evaluated through six classifier ensemble approaches on 34 real-world datasets. The experiments show the effectiveness of the modified FTs.
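For orientation, the traditional Friedman test that this abstract takes as its baseline can be sketched as below. This is a minimal illustration of the classic rank-based statistic only, not the weighted or global ranking variants proposed in the paper. The function names and accuracy values are hypothetical, and the statistic is computed without a tie correction.

```python
def rank_row(scores):
    """Rank one dataset's scores: highest score gets rank 1, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(acc):
    """acc[d][a] is the average accuracy of algorithm a on dataset d (hypothetical layout).

    Returns the mean rank of each algorithm and the Friedman chi-square statistic
    (no tie correction applied).
    """
    n = len(acc)      # number of datasets
    k = len(acc[0])   # number of algorithms
    rank_sums = [0.0] * k
    for row in acc:
        for a, r in enumerate(rank_row(row)):
            rank_sums[a] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Made-up accuracies: 4 datasets (rows) x 3 algorithms (columns).
example_acc = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.90, 0.70],
    [0.80, 0.75, 0.70],
]
mean_ranks, chi2 = friedman_statistic(example_acc)
print(mean_ranks, chi2)
```

Because each dataset contributes only a rank per algorithm, the size of the accuracy gaps and the run-to-run variation are discarded, which is the limitation the abstract points to.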
4
Multilayer descriptors for medical image classification. Comput Biol Med 2016; 72:239-47. [DOI: 10.1016/j.compbiomed.2015.11.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/30/2014] [Revised: 11/18/2015] [Accepted: 11/19/2015] [Indexed: 11/23/2022]
5
Classifying component failures of a hybrid electric vehicle fleet based on load spectrum data. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-2065-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
6
Saito PTM, Nakamura RYM, Amorim WP, Papa JP, de Rezende PJ, Falcão AX. Choosing the Most Effective Pattern Classification Model under Learning-Time Constraint. PLoS One 2015; 10:e0129947. [PMID: 26114552 PMCID: PMC4483274 DOI: 10.1371/journal.pone.0129947] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 12/08/2014] [Accepted: 05/14/2015] [Indexed: 12/04/2022] Open
Abstract
Nowadays, large datasets are common and demand faster and more effective pattern analysis techniques. However, methodologies to compare classifiers usually do not take into account the learning-time constraints required by applications. This work presents a methodology to compare classifiers with respect to their ability to learn from classification errors on a large learning set, within a given time limit. Faster techniques may acquire more training samples, but only when they are more effective will they achieve higher performance on unseen testing sets. We demonstrate this result using several techniques, multiple datasets, and typical learning-time limits required by applications.
Affiliation(s)
- Priscila T. M. Saito
  - Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, Brazil
- Willian P. Amorim
  - Institute of Computing, Federal University of Mato Grosso do Sul, Campo Grande, Brazil
- João P. Papa
  - Department of Computing, São Paulo State University, Bauru, Brazil
7
Sarikaya A, Albers D, Mitchell J, Gleicher M. Visualizing Validation of Protein Surface Classifiers. Computer Graphics Forum: Journal of the European Association for Computer Graphics 2014; 33:171-180. [PMID: 25342867 PMCID: PMC4204728 DOI: 10.1111/cgf.12373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Many bioinformatics applications construct classifiers that are validated in experiments that compare their results to known ground truth over a corpus. In this paper, we introduce an approach for exploring the results of such classifier validation experiments, focusing on classifiers for regions of molecular surfaces. We provide a tool that allows for examining classification performance patterns over a test corpus. The approach combines a summary view that provides information about an entire corpus of molecules with a detail view that visualizes classifier results directly on protein surfaces. Rather than displaying miniature 3D views of each molecule, the summary provides 2D glyphs of each protein surface arranged in a reorderable, small-multiples grid. Each summary is specifically designed to support visual aggregation to allow the viewer to both get a sense of aggregate properties as well as the details that form them. The detail view provides a 3D visualization of each protein surface coupled with interaction techniques designed to support key tasks, including spatial aggregation and automated camera touring. A prototype implementation of our approach is demonstrated on protein surface classifier experiments.
Affiliation(s)
- A Sarikaya
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- D Albers
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- J Mitchell
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Mathematics, University of Wisconsin-Madison, Madison, WI, USA
- M Gleicher
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
8
Goh LK, Liem N, Vijayaraghavan A, Chen G, Lim PL, Tay KJ, Chang M, Low JSW, Joshi A, Huang HH, Kalaw E, Tan PH, Hsieh WS, Yong WP, Alumkal J, Sim HG. Diagnostic and prognostic utility of a DNA hypermethylated gene signature in prostate cancer. PLoS One 2014; 9:e91666. [PMID: 24626295 PMCID: PMC3953552 DOI: 10.1371/journal.pone.0091666] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/20/2013] [Accepted: 02/13/2014] [Indexed: 12/31/2022] Open
Abstract
We aimed to identify a prostate cancer DNA hypermethylation microarray signature (denoted as PHYMA) that differentiates prostate cancer from benign prostate hyperplasia (BPH), high from low-grade and lethal from non-lethal cancers. This is a non-randomized retrospective study in 111 local Asian men (87 prostate cancers and 24 BPH) treated from 1995 to 2009 in our institution. Archival prostate epithelia were laser-capture microdissected and genomic DNA extracted and bisulfite-converted. Samples were profiled using Illumina GoldenGate Methylation microarray, with raw data processed by GenomeStudio. A classification model was generated using support vector machine, consisting of a 55-probe DNA methylation signature of 46 genes. The model was independently validated on an internal testing dataset which yielded cancer detection sensitivity and specificity of 95.3% and 100% respectively, with overall accuracy of 96.4%. Second validation on another independent western cohort yielded 89.8% sensitivity and 66.7% specificity, with overall accuracy of 88.7%. A PHYMA score was developed for each sample based on the state of methylation in the PHYMA signature. Increasing PHYMA score was significantly associated with higher Gleason score and Gleason primary grade. Men with higher PHYMA scores have poorer survival on univariate (p = 0.0038, HR = 3.89) and multivariate analyses when controlled for (i) clinical stage (p = 0.055, HR = 2.57), and (ii) clinical stage and Gleason score (p = 0.043, HR = 2.61). We further performed bisulfite genomic sequencing on 2 relatively unknown genes to demonstrate robustness of the assay results. PHYMA is thus a signature with high sensitivity and specificity for discriminating tumors from BPH, and has a potential role in early detection and in predicting survival.
Affiliation(s)
- Liang Kee Goh
  - Centre for Quantitative Medicine, Duke-National University of Singapore Graduate Medical School, Singapore, Singapore, Singapore
  - Cancer & Stem Cell Biology, Duke-National University of Singapore Graduate Medical School, Singapore, Singapore, Singapore
  - * E-mail: (LKG); (HGS)
- Natalia Liem
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore, Singapore
- Aadhitthya Vijayaraghavan
  - Centre for Quantitative Medicine, Duke-National University of Singapore Graduate Medical School, Singapore, Singapore, Singapore
- Gengbo Chen
  - Cancer & Stem Cell Biology, Duke-National University of Singapore Graduate Medical School, Singapore, Singapore, Singapore
- Pei Li Lim
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore, Singapore
- Kae-Jack Tay
  - Department of Urology, Singapore General Hospital, Singapore, Singapore, Singapore
- Michelle Chang
  - Department of Urology, Singapore General Hospital, Singapore, Singapore, Singapore
- John Soon Wah Low
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore, Singapore
- Adita Joshi
  - Department of Urology, Singapore General Hospital, Singapore, Singapore, Singapore
- Hong Hong Huang
  - Department of Urology, Singapore General Hospital, Singapore, Singapore, Singapore
- Emarene Kalaw
  - Department of Pathology, Singapore General Hospital, Singapore, Singapore, Singapore
- Puay Hoon Tan
  - Department of Pathology, Singapore General Hospital, Singapore, Singapore, Singapore
- Wen-Son Hsieh
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore, Singapore
- Wei Peng Yong
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore, Singapore
- Joshi Alumkal
  - Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, United States of America
- Hong Gee Sim
  - Department of Urology, Singapore General Hospital, Singapore, Singapore, Singapore
  - * E-mail: (LKG); (HGS)
9
Manning T, Sleator RD, Walsh P. Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics. Bioengineered 2013; 5:80-95. [PMID: 24335433 PMCID: PMC4049912 DOI: 10.4161/bioe.26997] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] Open
Abstract
Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks which are recognized for their noise tolerance and ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs which make them appealing for applications to bioinformatics problems where interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems.
Affiliation(s)
- Timmy Manning
  - Department of Computer Science, Cork Institute of Technology, Cork, Ireland
- Roy D Sleator
  - Department of Biological Sciences, Cork Institute of Technology, Cork, Ireland
- Paul Walsh
  - NSilico Ltd, Rubicon Innovation Centre, Cork, Ireland