51. Integrated Chemometrics and Statistics to Drive Successful Proteomics Biomarker Discovery. Proteomes 2018; 6:proteomes6020020. [PMID: 29701723 PMCID: PMC6027525 DOI: 10.3390/proteomes6020020] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/30/2018] [Revised: 04/19/2018] [Accepted: 04/25/2018] [Indexed: 01/15/2023]
Abstract
Protein biomarkers are of great benefit for clinical research and applications, as they are powerful means for the diagnosis, monitoring and treatment prediction of different diseases. Even though numerous biomarkers have been reported, their translation to clinical practice is still limited. This is mainly due to: (i) incorrect biomarker selection, (ii) insufficient validation of potential biomarkers, and (iii) insufficient clinical use. In this review, we focus on the biomarker selection process and critically discuss the chemometrical and statistical decisions made in proteomics biomarker discovery, with the aim of increasing the selection of high-value biomarkers. The characteristics of the data, the computational resources, the type of biomarker that is searched for and the validation strategy all influence the choice of chemometrical and statistical methods, and a decision made for one component directly influences the choice for another. Incorrect decisions could increase the false positive and false negative rates of biomarkers, which makes independent confirmation of the outcome by other techniques necessary and complicates comparison between related studies. There are few guidelines for authors regarding data analysis documentation in peer-reviewed journals, making it hard to reproduce successful data analysis strategies. Here we review multiple chemometrical and statistical methods for their value in proteomics-based biomarker discovery and propose key components to include in scientific documentation.
52. Rodríguez-Pérez R, Cortés R, Guamán A, Pardo A, Torralba Y, Gómez F, Roca J, Barberà JA, Cascante M, Marco S. Instrumental drift removal in GC-MS data for breath analysis: the short-term and long-term temporal validation of putative biomarkers for COPD. J Breath Res 2018; 12:036007. [PMID: 29292699 DOI: 10.1088/1752-7163/aaa492] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Breath analysis holds the promise of a non-invasive technique for the diagnosis of diverse respiratory conditions, including chronic obstructive pulmonary disease (COPD) and lung cancer. Breath contains small metabolites that may be putative biomarkers of these conditions. However, the discovery of reliable biomarkers is a considerable challenge in the presence of both clinical and instrumental confounding factors. Among the latter, instrumental time drifts are highly relevant, as they call into question the short- and long-term validity of predictive models. In this work we present a methodology to counter instrumental drifts using information from interleaved blanks, for a case study of GC-MS data from breath samples. The proposed method includes feature filtering and additive, multiplicative and multivariate drift corrections, the latter being based on component correction. Biomarker discovery was based on genetic algorithms in a filter configuration, using Fisher's ratio computed in the partial least squares-discriminant analysis subspace as a figure of merit. Using our protocol, we found nine peaks that provide a statistically significant area under the ROC curve of 0.75 for COPD discrimination. The method was successfully validated using blind samples in short-term temporal validation. However, the attempt to use this model for patient screening six months later was not successful. This negative result highlights the importance of increasing validation rigor when reporting biomarker discovery results.
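The abstract names additive and multiplicative corrections based on interleaved blanks but gives no formulas. The following is a minimal NumPy sketch of one generic univariate version, assuming a single feature, blanks whose signal tracks the instrument drift, and linear interpolation between blank injections. The function names are hypothetical, and the multivariate component correction is not shown.

```python
import numpy as np

def blank_drift(order, blank_order, blank_values):
    # Interpolate the blank signal at each sample's injection position.
    # blank_order must be increasing (np.interp requirement); positions
    # outside the blank range take the nearest blank value.
    return np.interp(order, blank_order, blank_values)

def additive_correction(x, order, blank_order, blank_values):
    # Subtract the drift offset relative to the mean blank level.
    drift = blank_drift(order, blank_order, blank_values)
    return np.asarray(x, dtype=float) - (drift - np.mean(blank_values))

def multiplicative_correction(x, order, blank_order, blank_values):
    # Rescale each sample by (mean blank level / interpolated blank level).
    drift = blank_drift(order, blank_order, blank_values)
    return np.asarray(x, dtype=float) * (np.mean(blank_values) / drift)
```

For example, with blanks at injections 0, 10 and 20 reading 100, 110 and 120, a true intensity of 200 observed additively as 195 (injection 5) and 205 (injection 15) is restored to 200 at both positions. Which model fits a given feature has to be decided from the data.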
Affiliation(s)
- Raquel Rodríguez-Pérez: Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
53. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics. Molecules 2017; 23:molecules23010052. [PMID: 29278382 PMCID: PMC5943966 DOI: 10.3390/molecules23010052] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/10/2017] [Revised: 12/15/2017] [Accepted: 12/16/2017] [Indexed: 11/29/2022]
Abstract
Feature selection is an important topic in bioinformatics. Defining informative features from complex, high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the SVM-RFE feature ranking. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of a feature subset can be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
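The plain SVM-RFE ranking that SVM-RFE-OA builds on can be sketched as follows. This minimal NumPy sketch assumes a two-class problem with labels coded -1/+1, and it fits the linear SVM by full-batch subgradient descent on the hinge loss as a stand-in for a proper SVM solver. The overlapping-ratio criterion and the sample screening of M-SVM-RFE-OA are not implemented, and all function names are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Primal linear SVM (hinge loss + L2 penalty) fitted by subgradient descent.
    y must be coded as -1/+1."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples inside or on the wrong side of the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, **svm_kwargs):
    """Return feature indices ordered from most to least important.
    Each round retrains the SVM on the surviving features and removes the
    feature with the smallest squared weight (the SVM-RFE criterion)."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        w, _ = train_linear_svm(X[:, remaining], y, **svm_kwargs)
        worst = int(np.argmin(w ** 2))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # the last feature eliminated is ranked first
```

Removing one feature per round is the simplest choice. With thousands of features, practical implementations usually remove a fraction of the features per round to save retraining time.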
54. Shahrjooihaghighi A, Frigui H, Zhang X, Wei X, Shi B, Trabelsi A. An Ensemble Feature Selection Method for Biomarker Discovery. Proceedings of the ... IEEE International Symposium on Signal Processing and Information Technology 2017; 2017:416-421. [PMID: 30887013 PMCID: PMC6420823 DOI: 10.1109/isspit.2017.8388679] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 04/12/2023]
Abstract
Feature selection in Liquid Chromatography-Mass Spectrometry (LC-MS)-based metabolomics data (biomarker discovery) has become an important topic for machine learning researchers. The high dimensionality and small sample size of LC-MS data make feature selection a challenging task. The goal of biomarker discovery is to select the few most discriminative features among a large number of irrelevant ones. To improve the reliability of the discovered biomarkers, we use an ensemble-based approach. Ensemble learning can improve the accuracy of feature selection by combining multiple algorithms that carry complementary information. In this paper, we propose an ensemble approach to combine the results of filter-based feature selection methods. To evaluate the proposed approach, we compared it with two commonly used methods, the t-test and PLS-DA, using a real data set.
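The abstract does not say how the filter results are combined. One common option is rank aggregation. The plain-Python sketch below assumes that each filter returns one score per feature, with higher scores meaning more discriminative, and that the per-filter ranks are simply averaged (Borda-style). The function names are hypothetical, and this is not necessarily the combination rule used in the paper.

```python
def rank_positions(scores):
    """Map each feature index to its rank (0 = best) under one filter's scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature in enumerate(order):
        ranks[feature] = position
    return ranks

def ensemble_rank(score_lists):
    """Combine several filters by averaging each feature's rank across them.
    Returns feature indices ordered from best to worst mean rank."""
    n_features = len(score_lists[0])
    all_ranks = [rank_positions(scores) for scores in score_lists]
    mean_rank = [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: mean_rank[i])
```

For example, three filters scoring four features as `[5.0, 0.2, 3.1, 0.4]`, `[1.5, 0.1, 2.0, 0.3]` and `[0.8, 0.05, 0.6, 0.1]` give the consensus order `[0, 2, 3, 1]`. Averaging ranks rather than raw scores avoids having to put t-statistics, fold changes and other filter outputs on a common scale.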
Affiliation(s)
- Aliasghar Shahrjooihaghighi: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
- Hichem Frigui: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
- Xiang Zhang: Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
- Xiaoli Wei: Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
- Biyun Shi: Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
- Ameni Trabelsi: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
55. Goh WWB, Sng JCG, Yee JY, See YM, Lee TS, Wong L, Lee J. Can Peripheral Blood-Derived Gene Expressions Characterize Individuals at Ultra-high Risk for Psychosis? Computational Psychiatry (Cambridge, Mass.) 2017; 1:168-183. [PMID: 30090857 PMCID: PMC6067827 DOI: 10.1162/cpsy_a_00007] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 03/20/2017] [Accepted: 06/07/2017] [Indexed: 12/17/2022]
Abstract
The ultra-high risk (UHR) state was originally conceived to identify individuals at imminent risk of developing psychosis. Although recent studies have suggested that most individuals designated UHR do not develop psychosis, they constitute a distinctive group, exhibiting cognitive and functional impairments alongside multiple psychiatric morbidities. UHR characterization using molecular markers may improve understanding, provide novel insight into pathophysiology, and perhaps improve the reliability of psychosis prediction. Whole-blood gene expression data from 56 UHR subjects and 28 healthy controls are checked for the existence of a consistent gene expression profile (signature) underlying UHR, across a variety of normalization and heterogeneity-removal techniques, including simple log-conversion, quantile normalization, gene fuzzy scoring (GFS), and surrogate variable analysis. During functional analysis, consistent and reproducible identification of important genes depends largely on how data are normalized. Normalization techniques that address sample heterogeneity are superior. The best performer, the unsupervised GFS, produced a strong and concise 12-gene signature, enriched for psychosis-associated genes. Importantly, when applied to random subsets of data, classifiers built with GFS are "meaningful" in the sense that classifier models built using genes selected after other forms of normalization do not outperform random ones, but GFS-derived classifiers do. Data normalization can lead to highly disparate interpretations of biological data. Comparative analysis has shown that GFS is efficient at preserving signals while eliminating noise. Using GFS, we demonstrate with confidence that the UHR designation is well correlated with a distinct blood-based gene signature.
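Of the normalization techniques compared, quantile normalization is the simplest to illustrate. The following minimal NumPy sketch assumes a genes-by-samples matrix and breaks ties by position instead of averaging tied values, as some implementations do. The function name is hypothetical. GFS and surrogate variable analysis are not shown.

```python
import numpy as np

def quantile_normalize(X):
    """Give every sample (column) the same value distribution.

    X is a genes x samples matrix. Each value is replaced by the mean,
    across samples, of the values holding the same within-sample rank.
    """
    X = np.asarray(X, dtype=float)
    # Rank of each value within its column (0 = smallest); ties broken by position
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    # Reference distribution: mean of the sorted columns, rank by rank
    reference = np.sort(X, axis=0).mean(axis=1)
    return reference[ranks]
```

After normalization every column holds the same set of values, so differences between samples are carried only by how genes are ordered within each sample.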
Affiliation(s)
- Wilson Wen Bin Goh: School of Biological Sciences, Nanyang Technological University, Singapore; Department of Computer Science, National University of Singapore, Singapore
- Judy Chia-Ghee Sng: Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Jie Yin Yee: Research Division, Institute of Mental Health, Singapore
- Yuen Mei See: Research Division, Institute of Mental Health, Singapore
- Tih-Shih Lee: Neuroscience and Behavioral Disorders Program, Duke–National University of Singapore Medical School, Singapore
- Limsoon Wong: Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Jimmy Lee: Research Division, Institute of Mental Health, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
56. Fluorescence spectroscopy as tool for the geographical discrimination of coffees produced in different regions of Minas Gerais State in Brazil. Food Control 2017. [DOI: 10.1016/j.foodcont.2017.01.020] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/23/2022]
57. Gutenko I, Dmitriev K, Kaufman AE, Barish MA. AnaFe: Visual Analytics of Image-derived Temporal Features-Focusing on the Spleen. IEEE Transactions on Visualization and Computer Graphics 2017; 23:171-180. [PMID: 27514050 DOI: 10.1109/tvcg.2016.2598463] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
We present a novel visualization framework, AnaFe, targeted at observing changes in the spleen over time through multiple image-derived features. Accurate monitoring of progressive changes is crucial for diseases that result in enlargement of the organ. Our system consists of multiple linked views combining visualization of temporal 3D organ data, related measurements, and features. It thus enables the observation of progression and allows for simultaneous comparison within and between subjects. AnaFe offers insights into the overall distribution of robustly extracted and reproducible quantitative imaging features and their changes within the population, and also enables detailed analysis of individual cases. It performs similarity comparison of the temporal series of one subject with all other series in both the sick and healthy groups. We demonstrate our system through two use case scenarios on a population of 189 spleen datasets from 68 subjects with various conditions observed over time.
58. Bari MG, Salekin S, Zhang JM. A Robust and Efficient Feature Selection Algorithm for Microarray Data. Mol Inform 2016; 36. [PMID: 28000384 DOI: 10.1002/minf.201600099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/14/2016] [Accepted: 11/21/2016] [Indexed: 12/20/2022]
Abstract
In past decades, a few synergistic feature selection algorithms have been published, including the Cooperative Index (CI) and K-Top Scoring Pair (k-TSP). These algorithms consider the synergistic behavior of features when they are included in a feature panel. Although these algorithms have shown promising results, there is a lack of comprehensive and fair comparison with other feature selection algorithms across a large number of microarray datasets in terms of classification accuracy and computational complexity. There is a need to evaluate their performance and to reduce the complexity of such algorithms. We compared the performance of synergistic feature selection algorithms with 11 other commonly used algorithms on 22 binary-class microarray gene expression datasets. The evaluation confirms that the classification performance of synergistic algorithms such as CI and k-TSP gradually increases as more features are used in the classifiers. In addition, to cut down computational cost, we proposed a new feature selection ranking score called the Positive Synergy Index (PSI). Testing results show that features selected using PSI, as well as those selected by synergistic feature selection algorithms, provide better performance than all other methods, while PSI has a significantly lower computational complexity than the other synergistic algorithms.
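The pairwise idea behind k-TSP can be sketched as follows. A feature pair (i, j) scores highly when the event x_i < x_j is frequent in one class and rare in the other. k-TSP then keeps the k best-scoring pairs that share no feature. This plain-Python sketch assumes exactly two classes labelled 0 and 1 and omits the cross-validated choice of k and the voting classifier. The function names are hypothetical, and CI and PSI are not shown.

```python
from itertools import combinations

def tsp_score(X, y, i, j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for features i and j."""
    freq = []
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        freq.append(sum(row[i] < row[j] for row in rows) / len(rows))
    return abs(freq[0] - freq[1])

def top_scoring_pairs(X, y, k):
    """Greedily pick the k highest-scoring feature pairs with no shared feature."""
    n_features = len(X[0])
    scored = sorted(
        ((tsp_score(X, y, i, j), i, j) for i, j in combinations(range(n_features), 2)),
        reverse=True,
    )
    chosen, used = [], set()
    for _, i, j in scored:
        if i in used or j in used:
            continue  # each feature may appear in at most one pair
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen
```

Because the score depends only on within-sample orderings, it is unchanged by any monotone per-sample transformation. This is one reason rank-based pair methods tolerate differences in normalization between arrays. Scoring all pairs costs time quadratic in the number of features, which relates to the complexity concern raised in the abstract.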
Affiliation(s)
- Mehrab Ghanat Bari: Dept. of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, 55905
- Sirajul Salekin: Dept. of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249
- Jianqiu Michelle Zhang: Dept. of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249
59. Ma X, Zhu Y, Huang Y, Tegeler T, Gao SJ, Zhang J. Quantitative Proteomic Approach for MicroRNA Target Prediction Based on 18O/16O Labeling. Cancer Inform 2016; 14:163-173. [PMID: 27980386 PMCID: PMC5147440 DOI: 10.4137/cin.s30563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/13/2015] [Revised: 02/01/2016] [Accepted: 02/07/2016] [Indexed: 01/05/2023]
Abstract
MOTIVATION Among the many large-scale proteomic quantification methods, 18O/16O labeling requires neither specific amino acids in peptides nor label incorporation through several cell cycles, as in metabolic labeling. It does not cause significant elution time shifts between heavy- and light-labeled peptides, and its dynamic range of quantification is larger than that of tandem mass spectrometry-based quantification methods. These properties give 18O/16O labeling maximum flexibility in application. However, 18O/16O labeling introduces large quantification variations due to varying labeling efficiency, and there is no processing pipeline that ensures the reliable identification of differentially expressed proteins (DEPs). This motivated us to develop a quantitative proteomic approach based on 18O/16O labeling and to apply it to Kaposi sarcoma-associated herpesvirus (KSHV) microRNA (miR) target prediction. KSHV is a human pathogenic γ-herpesvirus strongly associated with the development of B-cell proliferative disorders, including primary effusion lymphoma. Recent studies suggest that miRs have evolved a highly complex network of interactions with the cellular and viral transcriptomes, yet relatively few KSHV miR targets have been characterized at the functional level. While the new miR target prediction method, photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), allows the identification of thousands of miR targets, the link between miRs and their targets still cannot be determined. We propose to apply the developed proteomic approach to establish such links. METHOD We integrate several 18O/16O data processing algorithms that we published recently and identify the messenger RNAs of downregulated proteins as potential targets in KSHV miR-transfected human embryonic kidney 293T cells.
Various statistical tests are employed for picking DEPs, and we select the best test by examining the enrichment of PAR-CLIP-reported targets with a seed match to the miRs of interest among the top-ranked DEPs returned by each test. Subsequently, the list of DEPs picked by the selected statistical test is filtered with the criteria that they must have downregulated gene expression, must have been reported as targets by the miR target prediction algorithm SVMicrO, and must have been reported as targets by PAR-CLIP. RESULT We test the developed approach on the problem of finding targets of KSHV miR-K1. The RNAs of three DEPs are identified as miR-K1 targets, among which RAB23 and HNRNPU are novel. Results from both Western blotting and luciferase reporter assays confirm the novel targets. These results show that the developed quantitative approach based on 18O/16O labeling can be combined with genomic, PAR-CLIP, and target prediction algorithms for the confident identification of KSHV miR targets. The developed approach could also be used in other applications.
Affiliation(s)
- Xuepo Ma: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, USA
- Ying Zhu: Keck School of Medicine of USC, Los Angeles, CA, USA
- Yufei Huang: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, USA
- Jianqiu Zhang: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, USA
60. Kawahara R, Meirelles GV, Heberle H, Domingues RR, Granato DC, Yokoo S, Canevarolo RR, Winck FV, Ribeiro ACP, Brandão TB, Filgueiras PR, Cruz KSP, Barbuto JA, Poppi RJ, Minghim R, Telles GP, Fonseca FP, Fox JW, Santos-Silva AR, Coletta RD, Sherman NE, Paes Leme AF. Integrative analysis to select cancer candidate biomarkers to targeted validation. Oncotarget 2016; 6:43635-52. [PMID: 26540631 PMCID: PMC4791256 DOI: 10.18632/oncotarget.6018] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/24/2015] [Accepted: 10/17/2015] [Indexed: 01/15/2023]
Abstract
Targeted proteomics has flourished as the method of choice for prospecting for and validating potential candidate biomarkers in many diseases. However, challenges remain due to the lack of standardized routines that can prioritize a limited number of proteins for further validation in human samples. To help researchers identify the candidate biomarkers that best characterize their samples under study, a well-designed integrative analysis pipeline, comprising MS-based discovery, feature selection methods, clustering techniques, bioinformatic analyses and targeted approaches, was applied to discovery-based proteomic data from the secretomes of three classes of human cell lines (carcinoma, melanoma and non-cancerous). Three feature selection algorithms, namely Beta-binomial, Nearest Shrunken Centroids (NSC), and Support Vector Machine-Recursive Feature Elimination (SVM-RFE), indicated a panel of 137 candidate biomarkers for carcinoma and 271 for melanoma, which were differentially abundant between the tumor classes. We further tested the strength of the pipeline in selecting candidate biomarkers by immunoblotting, human tissue microarrays, label-free targeted MS and functional experiments. In conclusion, the proposed integrative analysis was able to pre-qualify and prioritize candidate biomarkers from discovery-based proteomics to targeted MS.
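Of the three feature selectors, NSC is compact enough to sketch. The NumPy code below follows the textbook nearest-shrunken-centroid idea. Standardized deviations of each class centroid from the overall centroid are soft-thresholded by a shrinkage amount `delta`, and features whose deviations shrink to zero in every class are dropped. It assumes a samples-by-features matrix and omits class priors and the cross-validated choice of `delta`. The function name is hypothetical, and this is not the exact implementation used in the study.

```python
import numpy as np

def shrunken_centroids(X, y, delta):
    """Return (shrunken standardized deviations per class, indices of kept features)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    # Pooled within-class standard deviation of each feature
    resid = np.empty_like(X)
    for k in classes:
        mask = y == k
        resid[mask] = X[mask] - X[mask].mean(axis=0)
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s0 = np.median(s)  # offset that damps features with very small variance
    shrunk = {}
    for k in classes:
        mask = y == k
        m_k = np.sqrt(1.0 / mask.sum() - 1.0 / n)
        d = (X[mask].mean(axis=0) - overall) / (m_k * (s + s0))
        # Soft thresholding: move d toward zero by delta, clipping at zero
        shrunk[k] = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    # A feature survives if it is non-zero for at least one class
    kept = np.flatnonzero(np.any([shrunk[k] != 0 for k in classes], axis=0))
    return shrunk, kept
```

Increasing `delta` removes more features. This single knob is what makes NSC convenient for trimming a long discovery list down to a panel small enough for targeted validation.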
Affiliation(s)
- Rebeca Kawahara: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Gabriela V Meirelles: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Henry Heberle: Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, USP, São Carlos, Brazil
- Romênia R Domingues: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Daniela C Granato: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Sami Yokoo: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Rafael R Canevarolo: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil; Centro Infantil Boldrini, Campinas, Brazil
- Flavia V Winck: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
- Ana Carolina P Ribeiro: Instituto do Câncer do Estado de São Paulo, Octavio Frias de Oliveira, São Paulo, Brazil
- Thaís Bianca Brandão: Instituto do Câncer do Estado de São Paulo, Octavio Frias de Oliveira, São Paulo, Brazil
- Paulo R Filgueiras: Instituto de Química, Universidade Estadual de Campinas, UNICAMP, Piracicaba, Brazil
- Karen S P Cruz: Instituto de Ciências Biomédicas, Departamento de Imunologia, Universidade de São Paulo, USP, São Paulo, Brazil
- José Alexandre Barbuto: Instituto de Ciências Biomédicas, Departamento de Imunologia, Universidade de São Paulo, USP, São Paulo, Brazil
- Ronei J Poppi: Instituto de Química, Universidade Estadual de Campinas, UNICAMP, Piracicaba, Brazil
- Rosane Minghim: Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, USP, São Carlos, Brazil
- Guilherme P Telles: Instituto de Computação, Universidade Estadual de Campinas, UNICAMP, Campinas, Brazil
- Felipe Paiva Fonseca: Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas, UNICAMP, Piracicaba, Brazil
- Jay W Fox: W. M. Keck Biomedical Mass Spectrometry Lab, University of Virginia, Charlottesville, Virginia, USA
- Alan R Santos-Silva: Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas, UNICAMP, Piracicaba, Brazil
- Ricardo D Coletta: Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas, UNICAMP, Piracicaba, Brazil
- Nicholas E Sherman: W. M. Keck Biomedical Mass Spectrometry Lab, University of Virginia, Charlottesville, Virginia, USA
- Adriana F Paes Leme: Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências, LNBio, CNPEM, Campinas, Brazil
61. Kaddi CD, Wang MD. Models for predicting stage in head and neck squamous cell carcinoma using proteomic data. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2014:5216-9. [PMID: 25571169 DOI: 10.1109/embc.2014.6944801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Head and neck squamous cell carcinoma (HNSCC) that is detected at an advanced stage is associated with much worse patient outcomes than HNSCC detected at an early stage. This study uses reverse phase protein array (RPPA) data to build predictive models that discriminate between early-stage and advanced-stage HNSCC. Individual and ensemble binary classifiers, using filter-based and wrapper-based feature selection, are used to build several models, which achieve moderate Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) values. This study identifies informative protein feature sets that may contribute to an increased understanding of the molecular basis of HNSCC.
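For readers less familiar with the two reported metrics, here is a plain-Python sketch. MCC is computed from the binary confusion matrix. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The function names are hypothetical, and the example numbers are not from the study.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `auc([0.9, 0.8], [0.7, 0.85])` is 0.75 because three of the four positive-negative pairs are ordered correctly. MCC ranges from -1 to 1, with 0 meaning no better than chance, so "moderate" values sit well between 0 and 1.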
62.
Abstract
Metabolomics-based strategies have become an integral part of modern clinical research, allowing for a better understanding of pathophysiological conditions and disease mechanisms, as well as providing innovative tools for more adequate diagnostic and prognostic approaches. Metabolomics is considered an essential tool in precision medicine, which aims for personalized prevention and tailor-made treatments. Nevertheless, multiple pitfalls may be encountered in clinical metabolomics throughout the entire workflow, hampering the quality of the data and, thus, the biological interpretation. This review describes the challenges underlying metabolomics-based experiments, discussing step by step the potential pitfalls of the analytical process, including study design, sample collection, storage and preparation, chromatographic and electrophoretic separation, detection and data analysis. Moreover, it offers practical solutions and strategies to tackle these challenges, ensuring the generation of high-quality data.
63. Rinaudo P, Boudah S, Junot C, Thévenot EA. biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data. Front Mol Biosci 2016; 3:26. [PMID: 27446929 PMCID: PMC4914951 DOI: 10.3389/fmolb.2016.00026] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 12/21/2016] [Accepted: 06/03/2016] [Indexed: 01/02/2023]
Abstract
High-throughput technologies such as transcriptomics, proteomics, and metabolomics show great promise for the discovery of biomarkers for diagnosis and prognosis. Selection of the most promising candidates between the initial untargeted step and the subsequent validation phases is critical within the pipeline leading to clinical tests. Several statistical and data mining methods have been described for feature selection; in particular, wrapper approaches iteratively assess the performance of the classifier on distinct subsets of variables. Current wrappers, however, do not estimate the significance of the selected features. We therefore developed a new methodology to find the smallest feature subset that significantly contributes to model performance, using a combination of resampling, ranking of variable importance, significance assessment by permutation of the feature values in the test subsets, and half-interval search. We wrapped our biosigner algorithm around three reference binary classifiers (Partial Least Squares-Discriminant Analysis, Random Forest, and Support Vector Machines), whose performance has been shown to depend on the structure of the dataset. Using three real biological and clinical metabolomics and transcriptomics datasets (containing up to 7000 features), complementary signatures were obtained in a few minutes, generally providing higher prediction accuracies than the initial full model. Comparison with alternative feature selection approaches further indicated that our method provides signatures of restricted size and high stability. Finally, by using our methodology to seek metabolites discriminating type 1 from type 2 diabetic patients, several features were selected, including a fragment from the taurochenodeoxycholic bile acid.
Our methodology, implemented in the biosigner R/Bioconductor package and Galaxy/Workflow4metabolomics module, should be of interest for both experimenters and statisticians to identify robust molecular signatures from large omics datasets in the process of developing new diagnostics.
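The significance step relies on permuting feature values in held-out data. The plain-Python sketch below shows only that ingredient: the mean drop in test accuracy when one feature's values are shuffled across test samples. It assumes an already fitted `predict` function that maps a feature row to a class label. The resampling, the permutation-based significance test and the half-interval search of biosigner are not reproduced, and the names are hypothetical; this is not the API of the biosigner R package.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_drop(predict, X_test, y_test, feature, n_repeats=20, seed=0):
    """Mean loss of test accuracy when `feature` is shuffled across test samples."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X_test, y_test)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X_test]
        rng.shuffle(column)  # break the link between this feature and the labels
        X_perm = [list(row) for row in X_test]
        for row, value in zip(X_perm, column):
            row[feature] = value
        drops.append(baseline - accuracy(predict, X_perm, y_test))
    return sum(drops) / n_repeats
```

A feature the model ignores yields a drop of exactly zero, while an informative one yields a positive drop. Comparing such drops against their permutation distribution is what turns importance ranking into a significance statement.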
Affiliation(s)
- Philippe Rinaudo: CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB Gif-sur-Yvette, France
- Samia Boudah: Laboratoire d'Etude du Métabolisme des Médicaments, DSV/iBiTec-S/SPI, MetaboHUB, CEA-Saclay Gif-sur-Yvette, France
- Christophe Junot: Laboratoire d'Etude du Métabolisme des Médicaments, DSV/iBiTec-S/SPI, MetaboHUB, CEA-Saclay Gif-sur-Yvette, France
- Etienne A Thévenot: CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB Gif-sur-Yvette, France
64. Pavelek Z, Vyšata O, Tambor V, Pimková K, Vu DL, Kuča K, Šťourač P, Vališ M. Proteomic analysis of cerebrospinal fluid for relapsing-remitting multiple sclerosis and clinically isolated syndrome. Biomed Rep 2016; 5:35-40. [PMID: 27347402 PMCID: PMC4906564 DOI: 10.3892/br.2016.668] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/12/2016] [Accepted: 04/25/2016] [Indexed: 01/21/2023]
Abstract
Early diagnosis and treatment of multiple sclerosis (MS) in the initial stages of the disease can significantly retard its progression. The aim of the present study was to identify changes in the cerebrospinal fluid proteome of patients with relapsing-remitting MS (RR MS) and clinically isolated syndrome who are at high risk of developing MS (case group) compared with a healthy population (control group), in order to identify potential new markers that could ultimately aid in the early diagnosis of MS. The protein concentration of each of the 11 case and 15 control samples was determined using a bicinchoninic acid assay. Nanoscale liquid chromatography coupled with tandem mass spectrometry was used for protein identification. Proteomics data were processed using the Perseus software suite and R. The results were filtered using the Benjamini-Hochberg procedure for false discovery rate (FDR) correction (FDR<0.05). The results showed that 26 proteins were significantly dysregulated in case samples compared with controls: nine proteins were significantly less abundant in case samples, while the abundance of 17 proteins was significantly increased. Three of the proteins had previously been linked to RR MS: immunoglobulin (Ig) γ-1 chain C region, Ig heavy chain V–III region BRO and Ig κ chain C region. Three proteins that were uniquely expressed in patients with RR MS were identified, and these proteins may serve as prognostic biomarkers for identifying patients at high risk of developing RR MS.
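The Benjamini-Hochberg adjustment behind the FDR<0.05 filter fits in a few lines. The following is a plain-Python sketch, assuming one p-value per protein. The function name and the example p-values are hypothetical and are not from the study.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order.

    With m tests sorted by p-value, the adjusted value at rank r is
    min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values `[0.01, 0.04, 0.03, 0.2]`, the adjusted values are about `[0.04, 0.053, 0.053, 0.2]`, so only the first test passes a 0.05 FDR threshold, even though three raw p-values are below 0.05.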
Affiliation(s)
- Zbyšek Pavelek
  - Department of Neurology, Faculty of Medicine and University Hospital Hradec Králové, Charles University in Prague, CZ-500 05 Hradec Králové, Czech Republic
- Oldřich Vyšata
  - Department of Neurology, Faculty of Medicine and University Hospital Hradec Králové, Charles University in Prague, CZ-500 05 Hradec Králové, Czech Republic
- Vojtěch Tambor
  - Biomedical Research Center, University Hospital Hradec Králové, CZ-500 05 Hradec Králové, Czech Republic
- Kristýna Pimková
  - Biomedical Research Center, University Hospital Hradec Králové, CZ-500 05 Hradec Králové, Czech Republic
- Dai Long Vu
  - Biomedical Research Center, University Hospital Hradec Králové, CZ-500 05 Hradec Králové, Czech Republic
- Kamil Kuča
  - Biomedical Research Center, University Hospital Hradec Králové, CZ-500 05 Hradec Králové, Czech Republic
- Pavel Šťourač
  - Department of Neurology, Faculty of Medicine and University Hospital Brno, CZ-639 00 Brno, Czech Republic
- Martin Vališ
  - Department of Neurology, Faculty of Medicine and University Hospital Hradec Králové, Charles University in Prague, CZ-500 05 Hradec Králové, Czech Republic
65
Abstract
Autoantibodies are a key component in the diagnosis, prognosis and monitoring of various diseases. To discover novel autoantibody targets, highly multiplexed assays based on antigen arrays hold great potential. They make it possible to analyze hundreds of body fluid samples in parallel for their reactivity against thousands of antigens. Here, we provide an overview of the available technologies for producing antigen arrays, highlight some of the technical and methodological considerations, and discuss their applications as discovery tools. Drawing on recent studies that used antigen arrays, we describe how the different types of antigen arrays have delivered, and will continue to deliver, novel insights into autoimmune diseases, among several other conditions.
66
Li Y, Wang L, Ju L, Deng H, Zhang Z, Hou Z, Xie J, Wang Y, Zhang Y. A Systematic Strategy for Screening and Application of Specific Biomarkers in Hepatotoxicity Using Metabolomics Combined With ROC Curves and SVMs. Toxicol Sci 2016; 150:390-9. [PMID: 26781514 DOI: 10.1093/toxsci/kfw001] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/17/2022]
Abstract
Current studies that evaluate toxicity based on metabolomics have primarily focused on screening for biomarkers, while largely neglecting further verification and application of those biomarkers. We therefore used drug-induced hepatotoxicity as an example to establish a systematic strategy for screening specific biomarkers. We then applied these biomarkers to evaluate whether drugs are potentially hepatotoxic. Carbon tetrachloride (5 ml/kg), acetaminophen (1500 mg/kg) and atorvastatin (5 mg/kg) were used to establish rat hepatotoxicity models. Fifteen common biomarkers were screened by multivariate statistical analysis and integrated analysis of the metabolomics data. The receiver operating characteristic (ROC) curve was used to evaluate the sensitivity and specificity of the biomarkers. This yielded 10 specific biomarker candidates with an area under the curve (AUC) greater than 0.7. A support vector machine (SVM) model was then built from the specific biomarker candidate data of hepatotoxic and nonhepatotoxic drugs. The accuracy of the model was 94.90% (92.86% sensitivity and 92.59% specificity), and the results demonstrated that these ten biomarkers are specific. Six drugs were then used to test hepatotoxicity prediction by the SVM model. The predictions were consistent with the biochemical and histopathological results, demonstrating that the model was reliable. This SVM model can therefore be applied to discriminate between hepatotoxic and nonhepatotoxic drugs. The approach not only presents a new strategy for screening specific biomarkers with greater diagnostic significance but also provides a new evaluation pattern for hepatotoxicity. It will be a highly useful tool in toxicity estimation and disease diagnosis.
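The abstract screens candidates by keeping those with an AUC greater than 0.7, but it does not say how the AUC was computed. The sketch below is a minimal version under stated assumptions. It uses the rank (Mann-Whitney) form of the AUC: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. It also assumes that higher feature values indicate hepatotoxicity. A metabolite that decreases with toxicity would need its direction reversed. The function names and the metabolite data are hypothetical.

```python
from itertools import product
from typing import Dict, List


def roc_auc(cases: List[float], controls: List[float]) -> float:
    """AUC as P(case > control), counting ties as 0.5.

    Equals the Mann-Whitney U statistic divided by n_cases * n_controls.
    Assumes larger values point toward the case (toxic) group.
    """
    wins = 0.0
    for c, k in product(cases, controls):
        if c > k:
            wins += 1.0
        elif c == k:
            wins += 0.5
    return wins / (len(cases) * len(controls))


def screen_candidates(
    data: Dict[str, Dict[str, List[float]]], threshold: float = 0.7
) -> Dict[str, float]:
    """Keep features whose AUC exceeds the threshold (0.7 in the abstract)."""
    kept = {}
    for name, groups in data.items():
        auc = roc_auc(groups["case"], groups["control"])
        if auc > threshold:
            kept[name] = auc
    return kept


if __name__ == "__main__":
    # Hypothetical intensities for two metabolites in dosed vs. control rats.
    data = {
        "metabolite_A": {"case": [5.1, 6.0, 5.8, 6.4], "control": [4.2, 4.9, 5.0, 5.3]},
        "metabolite_B": {"case": [1.0, 2.0, 3.0], "control": [1.5, 2.5, 3.5]},
    }
    print(screen_candidates(data))
```

The pairwise loop is O(n_cases × n_controls), which is adequate for small animal cohorts. For large datasets, a rank-sum implementation or an established library routine would be preferable.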
Affiliation(s)
- Yubo Li
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Lei Wang
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Liang Ju
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Haoyue Deng
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Zhenzhu Zhang
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Zhiguo Hou
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Jiabin Xie
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Yuming Wang
  - Tianjin State Key Laboratory of Modern Chinese Medicine, School of Traditional Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Yanjun Zhang
  - Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
67
Ramus C, Hovasse A, Marcellin M, Hesse AM, Mouton-Barbosa E, Bouyssié D, Vaca S, Carapito C, Chaoui K, Bruley C, Garin J, Cianférani S, Ferro M, Van Dorsselaer A, Burlet-Schiltz O, Schaeffer C, Couté Y, Gonzalez de Peredo A. Benchmarking quantitative label-free LC–MS data processing workflows using a complex spiked proteomic standard dataset. J Proteomics 2016; 132:51-62. [DOI: 10.1016/j.jprot.2015.11.011] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 09/04/2015] [Revised: 11/04/2015] [Accepted: 11/08/2015] [Indexed: 10/22/2022]
68
Mayne J, Ning Z, Zhang X, Starr AE, Chen R, Deeke S, Chiang CK, Xu B, Wen M, Cheng K, Seebun D, Star A, Moore JI, Figeys D. Bottom-Up Proteomics (2013-2015): Keeping up in the Era of Systems Biology. Anal Chem 2015; 88:95-121. [PMID: 26558748 DOI: 10.1021/acs.analchem.5b04230] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 02/07/2023]
Affiliation(s)
- Janice Mayne
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Zhibin Ning
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Xu Zhang
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Amanda E Starr
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Rui Chen
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Shelley Deeke
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Cheng-Kang Chiang
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Bo Xu
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Ming Wen
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Kai Cheng
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Deeptee Seebun
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Alexandra Star
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Jasmine I Moore
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Daniel Figeys
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
69
Mazzara S, Sinisi A, Cardaci A, Rossi RL, Muratori L, Abrignani S, Bombaci M. Two of Them Do It Better: Novel Serum Biomarkers Improve Autoimmune Hepatitis Diagnosis. PLoS One 2015; 10:e0137927. [PMID: 26375394 PMCID: PMC4573979 DOI: 10.1371/journal.pone.0137927] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/17/2015] [Accepted: 08/22/2015] [Indexed: 12/18/2022]
Abstract
BACKGROUND Autoimmune hepatitis (AIH) is a chronic liver disease of unknown aetiology, characterized by continuing hepatocellular inflammation and necrosis. Autoantibodies are accessible markers for measuring adaptive immune responses in clinical investigation. Protein microarrays have become an important tool for discriminating the disease state from control groups, although there is no agreed-upon standard for analyzing the results. RESULTS In the present study, sera from 15 patients with AIH and 78 healthy donors (HD) were tested against 1626 proteins on an in-house-developed array. Partial Least Squares Discriminant Analysis (PLS-DA) of the resulting data led to the identification of both new and previously identified proteins. Two new proteins, AHPA9419 and Chondroadherin precursor (UNQ9419 and CHAD, respectively), were confirmed in a validation phase by DELFIA assay using a new cohort of AIH patients. Previously identified candidates were also confirmed in this phase. Receiver operating characteristic analysis was used to evaluate the biomarker candidates. The sensitivity of each autoantigen in AIH ranged from 65 to 88%. When the two new autoantigens were analyzed in combination, the sensitivity increased to 95%. CONCLUSIONS Our findings demonstrate that detecting autoantibodies against the two autoantigens could improve discrimination of AIH patients from control classes. In combination with previously identified autoantigens, they could also be used as diagnostic/prognostic markers.
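The abstract reports that combining the two new autoantigens raised sensitivity to 95%, but it does not state the combination rule. One common rule is an "either-positive" (OR) call: a patient counts as positive if at least one marker reaches its cutoff. This rule can only keep or raise sensitivity relative to each single marker, and it can lower specificity, because controls also get two chances to test positive. The sketch below illustrates that rule only. The OR rule, the cutoffs and the signal values are all hypothetical assumptions, not the study's data or method.

```python
from typing import List, Sequence


def positive_calls(signal: Sequence[float], cutoff: float) -> List[bool]:
    """Call each sample positive when its signal reaches the cutoff."""
    return [x >= cutoff for x in signal]


def either_positive(
    a: Sequence[float], b: Sequence[float], cut_a: float, cut_b: float
) -> List[bool]:
    """Hypothetical OR rule: positive if either autoantibody signal reaches its cutoff."""
    return [x >= cut_a or y >= cut_b for x, y in zip(a, b)]


def sensitivity(calls_in_patients: Sequence[bool]) -> float:
    """Fraction of true patients called positive (inputs are assumed to be patients only)."""
    return sum(calls_in_patients) / len(calls_in_patients)


if __name__ == "__main__":
    # Hypothetical normalized signals for ten AIH patients.
    marker_1 = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.4, 1.0, 0.6]
    marker_2 = [0.5, 1.1, 0.9, 1.2, 0.4, 1.0, 1.3, 0.8, 0.7, 0.9]
    print(sensitivity(positive_calls(marker_1, 1.0)))                   # single marker
    print(sensitivity(positive_calls(marker_2, 1.0)))                   # single marker
    print(sensitivity(either_positive(marker_1, marker_2, 1.0, 1.0)))   # combined
```

Here the two markers detect partly different patients, so the combined call catches more of them than either marker alone. Evaluating the same rule on healthy donors would be needed to see its cost in specificity.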
Affiliation(s)
- Saveria Mazzara
  - Istituto Nazionale Genetica Molecolare “Romeo ed Enrica Invernizzi”, Milan, Italy
- Antonia Sinisi
  - Istituto Nazionale Genetica Molecolare “Romeo ed Enrica Invernizzi”, Milan, Italy
- Angela Cardaci
  - Istituto Nazionale Genetica Molecolare “Romeo ed Enrica Invernizzi”, Milan, Italy
- Luigi Muratori
  - Center for the Study and Treatment of Autoimmune Diseases of the Liver and Biliary System, Policlinico di Sant’Orsola, Department of Medical and Surgical Sciences (DIMEC), Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Sergio Abrignani
  - Istituto Nazionale Genetica Molecolare “Romeo ed Enrica Invernizzi”, Milan, Italy
  - DISSCO, Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Mauro Bombaci
  - Istituto Nazionale Genetica Molecolare “Romeo ed Enrica Invernizzi”, Milan, Italy
70
Hanhineva K, Brunius C, Andersson A, Marklund M, Juvonen R, Keski-Rahkonen P, Auriola S, Landberg R. Discovery of urinary biomarkers of whole grain rye intake in free-living subjects using nontargeted LC-MS metabolite profiling. Mol Nutr Food Res 2015; 59:2315-25. [DOI: 10.1002/mnfr.201500423] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/02/2015] [Revised: 08/05/2015] [Accepted: 08/06/2015] [Indexed: 11/07/2022]
Affiliation(s)
- Kati Hanhineva
  - Department of Clinical Nutrition, Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
- Carl Brunius
  - Department of Food Science, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Agneta Andersson
  - Department of Food, Nutrition and Dietetics, Uppsala University, Uppsala, Sweden
- Matti Marklund
  - Department of Public Health and Caring Sciences, Clinical Nutrition and Metabolism, Uppsala University, Uppsala, Sweden
- Risto Juvonen
  - School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Seppo Auriola
  - School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Rikard Landberg
  - Department of Food Science, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
  - Nutritional Epidemiology Unit, Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
71
Pursiheimo A, Vehmas AP, Afzal S, Suomi T, Chand T, Strauss L, Poutanen M, Rokka A, Corthals GL, Elo LL. Optimization of Statistical Methods Impact on Quantitative Proteomics Data. J Proteome Res 2015; 14:4118-26. [DOI: 10.1021/acs.jproteome.5b00183] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
Affiliation(s)
- Anna Pursiheimo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, FI-20014 Turku, Finland
- Anni P. Vehmas
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Saira Afzal
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Tomi Suomi
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Information Technology, University of Turku, FI-20014 Turku, Finland
- Thaman Chand
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Leena Strauss
  - Department of Physiology and Turku Center for Disease Modeling, Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520 Turku, Finland
- Matti Poutanen
  - Department of Physiology and Turku Center for Disease Modeling, Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520 Turku, Finland
- Anne Rokka
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Garry L. Corthals
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Van’t Hoff Institute for Molecular Sciences, University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
- Laura L. Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, FI-20014 Turku, Finland
72
Jagga Z, Gupta D. Machine learning for biomarker identification in cancer research - developments toward its clinical application. Per Med 2015; 12:371-387. [PMID: 29771660 DOI: 10.2217/pme.15.5] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
Abstract
The patterns identified from systematically collected molecular profiles of patient tumor samples, along with clinical metadata, can assist personalized treatment for the effective management of cancer patients with similar molecular subtypes. There is an unmet need for computational algorithms for cancer diagnosis, prognosis and therapeutics that can identify complex patterns and help with classification. These algorithms should draw on the plethora of cancer research outcomes emerging in the public domain. Machine learning, a branch of artificial intelligence, holds great potential for pattern recognition in cryptic cancer datasets, as is evident from a recent literature survey. In this review, we focus on the current status of machine learning applications in cancer research. We highlight trends and analyze the major achievements, roadblocks and challenges toward its implementation in clinics.
Affiliation(s)
- Zeenia Jagga
  - Bioinformatics Laboratory, Structural & Computational Biology Group, International Centre for Genetic Engineering & Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110 067, India
- Dinesh Gupta
  - Bioinformatics Laboratory, Structural & Computational Biology Group, International Centre for Genetic Engineering & Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110 067, India
73
Pagel O, Loroch S, Sickmann A, Zahedi RP. Current strategies and findings in clinically relevant post-translational modification-specific proteomics. Expert Rev Proteomics 2015; 12:235-53. [PMID: 25955281 PMCID: PMC4487610 DOI: 10.1586/14789450.2015.1042867] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Indexed: 12/13/2022]
Abstract
Mass spectrometry-based proteomics has considerably extended our knowledge of the occurrence and dynamics of protein post-translational modifications (PTMs). So far, quantitative proteomics has mainly been used to study PTM regulation in cell culture models, providing new insights into the role of aberrant PTM patterns in human disease. However, continuous technological and methodological developments have paved the way for an increasing number of PTM-specific proteomic studies using clinical samples, which are often limited in amount. Thus, quantitative proteomics holds great potential to discover, validate and accurately quantify biomarkers in body fluids and primary tissues. A major effort will be to fully integrate robust yet sensitive proteomics technology into clinical environments. Here, we discuss PTMs that are relevant for clinical research, with a focus on phosphorylation, glycosylation and proteolytic cleavage. We also give an overview of current developments and novel findings in mass spectrometry-based PTM research.
Affiliation(s)
- Oliver Pagel
  - Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
- Stefan Loroch
  - Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
- René P Zahedi
  - Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
74
Gromski PS, Muhamadali H, Ellis DI, Xu Y, Correa E, Turner ML, Goodacre R. A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Anal Chim Acta 2015; 879:10-23. [DOI: 10.1016/j.aca.2015.02.012] [Citation(s) in RCA: 509] [Impact Index Per Article: 56.6] [Received: 10/11/2014] [Revised: 02/03/2015] [Accepted: 02/06/2015] [Indexed: 01/14/2023]
76
Heberle H, Meirelles GV, da Silva FR, Telles GP, Minghim R. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics 2015; 16:169. [PMID: 25994840 PMCID: PMC4455604 DOI: 10.1186/s12859-015-0611-3] [Citation(s) in RCA: 1339] [Impact Index Per Article: 148.8] [Received: 01/28/2015] [Accepted: 05/06/2015] [Indexed: 01/12/2023]
Abstract
Background Set comparisons permeate a large number of data analysis workflows, in particular workflows in biological sciences. Venn diagrams are frequently employed for such analysis, but current tools are limited. Results We have developed InteractiVenn, a more flexible tool for interacting with Venn diagrams of up to six sets. It offers a clean interface for Venn diagram construction and enables analysis of set unions while preserving the shape of the diagram. Set unions are useful for revealing differences and similarities among sets and can be guided in our tool by a tree or by a list of set unions. The tool also allows users to retrieve the elements of subsets, save and load sets for further analyses, and export the diagram in vector and image formats. InteractiVenn has been used to analyze two biological datasets, but it may serve set analysis in a broad range of domains. Conclusions InteractiVenn allows set unions in Venn diagrams to be explored thoroughly. This extends the ability to analyze combinations of sets, with additional observations yielded by novel interactions between joined sets. InteractiVenn is freely available online at: www.interactivenn.net.
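InteractiVenn itself is a web tool, so the following is not its code. It is a minimal Python sketch of the two set operations the abstract describes. The first assigns every element to its exact Venn region, meaning the precise combination of sets that contain it. The second merges several sets into their union before recomputing the regions, which is the idea behind a "set union" step. The function names and protein IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each non-empty Venn region (the exact combination of set names an
    element belongs to) to the elements found only in that combination."""
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for element in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if element in s)
        regions[members].add(element)
    return dict(regions)


def merge_sets(
    sets: Dict[str, Set[str]], names: Sequence[str], new_name: str
) -> Dict[str, Set[str]]:
    """Replace the named sets with a single set holding their union."""
    to_merge: List[str] = list(names)
    merged = {k: v for k, v in sets.items() if k not in to_merge}
    merged[new_name] = set().union(*(sets[n] for n in to_merge))
    return merged


if __name__ == "__main__":
    # Hypothetical protein IDs reported by three studies.
    studies = {"A": {"P1", "P2", "P3"}, "B": {"P2", "P3", "P4"}, "C": {"P3", "P5"}}
    for region, members in venn_regions(studies).items():
        print(sorted(region), sorted(members))
    # Join studies A and B, then inspect the regions of the reduced diagram.
    print(venn_regions(merge_sets(studies, ["A", "B"], "A+B")))
```

Keying regions by a `frozenset` of set names keeps the result independent of set order. It also means that only non-empty regions are stored.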
Affiliation(s)
- Henry Heberle
  - Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Av. Trabalhador São-carlense, 400, São Carlos SP, Brazil.
- Felipe R da Silva
  - Embrapa Informática Agropecuária, Av. André Tosello, 209, Campinas SP, Brazil.
- Guilherme P Telles
  - Universidade Estadual de Campinas, Instituto de Computação, Av. Albert Einstein, 1251, Campinas SP, Brazil.
- Rosane Minghim
  - Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Av. Trabalhador São-carlense, 400, São Carlos SP, Brazil.
77
Westbrook JA, Noirel J, Brown JE, Wright PC, Evans CA. Quantitation with chemical tagging reagents in biomarker studies. Proteomics Clin Appl 2015; 9:295-300. [DOI: 10.1002/prca.201400120] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/29/2014] [Revised: 11/08/2014] [Accepted: 12/05/2014] [Indexed: 01/06/2023]
Affiliation(s)
- Jules A. Westbrook
  - Academic Unit of Clinical Oncology, The Medical School, University of Sheffield, Sheffield, UK
- Josselin Noirel
  - Chaire de Bioinformatique, EA4627, Conservatoire National des Arts et Métiers, Paris, France
- Janet E. Brown
  - Academic Unit of Clinical Oncology, The Medical School, University of Sheffield, Sheffield, UK
- Phillip C. Wright
  - Department of Chemical and Biological Engineering, ChELSI Institute, University of Sheffield, Sheffield, UK
- Caroline A. Evans
  - Department of Chemical and Biological Engineering, ChELSI Institute, University of Sheffield, Sheffield, UK
78
Chawade A, Sandin M, Teleman J, Malmström J, Levander F. Data Processing Has Major Impact on the Outcome of Quantitative Label-Free LC-MS Analysis. J Proteome Res 2014; 14:676-87. [DOI: 10.1021/pr500665j] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Affiliation(s)
- Aakash Chawade
  - Department of Immunotechnology, Medicon Village, Lund University, Scheelevägen 2, S-223 63 Lund, Sweden
- Marianne Sandin
  - Department of Immunotechnology, Medicon Village, Lund University, Scheelevägen 2, S-223 63 Lund, Sweden
- Johan Teleman
  - Department of Immunotechnology, Medicon Village, Lund University, Scheelevägen 2, S-223 63 Lund, Sweden
  - Department of Clinical Sciences, Faculty of Medicine, Lund University, SE-221 84 Lund, Sweden
- Johan Malmström
  - Department of Clinical Sciences, Faculty of Medicine, Lund University, SE-221 84 Lund, Sweden
- Fredrik Levander
  - Department of Immunotechnology, Medicon Village, Lund University, Scheelevägen 2, S-223 63 Lund, Sweden
  - Bioinformatics Infrastructure for Life Sciences (BILS), Lund University, P.O. Box 117, 221 00 Lund, Sweden
79
Lin X, Gao J, Zhou L, Yin P, Xu G. A modified k-TSP algorithm and its application in LC-MS-based metabolomics study of hepatocellular carcinoma and chronic liver diseases. J Chromatogr B Analyt Technol Biomed Life Sci 2014; 966:100-8. [PMID: 24939728 DOI: 10.1016/j.jchromb.2014.05.044] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/21/2013] [Revised: 05/19/2014] [Accepted: 05/21/2014] [Indexed: 01/10/2023]
Abstract
In systems biology, the ability to extract meaningful information that reflects the nature of the problem at hand from large amounts of data has become a key issue. The top scoring pairs (TSP) classification method measures the features of a data set in pairs and selects the top-ranked feature pairs to construct the classifier. Because of its simplicity and interpretability, it has been a powerful tool in genomics data analysis. This study examined the relationship between the two features in each pair and modified the ranking criteria of the k-TSP method to measure the discriminative ability of each feature pair more accurately. Accordingly, it provided an improved classification procedure. Tests on eight public data sets showed the validity of the modified method. The modified k-TSP method was applied to our serum metabolomics data, derived from liquid chromatography-mass spectrometry analysis of hepatocellular carcinoma (HCC) and chronic liver diseases. Based on the 27 selected feature pairs, HCC and chronic liver diseases were accurately distinguished using principal component analysis. The feature pairs also revealed certain profound metabolic disturbances related to liver disease development.
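The abstract does not give the modified ranking criteria, so this sketch implements only the standard k-TSP idea that the method builds on. Each feature pair (i, j) is scored by Δ_ij = |P(X_i < X_j | class 1) − P(X_i < X_j | class 2)|. The highest-scoring pairs are then chosen greedily, with each feature used at most once. A new sample is classified by majority vote, where each pair votes for the class in which its observed ordering is more frequent. Any secondary tie-breaking score is omitted here; equal scores simply keep their enumeration order. The function names and data are hypothetical. This is not the authors' modified algorithm.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Sample = Sequence[float]
Pair = Tuple[int, int, bool]  # (feature i, feature j, x_i < x_j favours class 1)


def pair_scores(class1: List[Sample], class2: List[Sample]) -> List[Tuple[float, int, int, bool]]:
    """Score every feature pair by |P(x_i < x_j | c1) - P(x_i < x_j | c2)|, best first."""
    scores = []
    for i, j in combinations(range(len(class1[0])), 2):
        p1 = sum(s[i] < s[j] for s in class1) / len(class1)
        p2 = sum(s[i] < s[j] for s in class2) / len(class2)
        scores.append((abs(p1 - p2), i, j, p1 > p2))
    # Stable sort: equal scores keep their enumeration order.
    scores.sort(key=lambda t: t[0], reverse=True)
    return scores


def train_ktsp(class1: List[Sample], class2: List[Sample], k: int = 3) -> List[Pair]:
    """Greedily pick the k best-scoring pairs that share no feature."""
    chosen: List[Pair] = []
    used = set()
    for _, i, j, favours_class1 in pair_scores(class1, class2):
        if i in used or j in used:
            continue
        chosen.append((i, j, favours_class1))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict_ktsp(pairs: List[Pair], sample: Sample) -> int:
    """Majority vote over the selected pairs; returns 1 or 2."""
    votes_class1 = sum((sample[i] < sample[j]) == fav for i, j, fav in pairs)
    return 1 if 2 * votes_class1 > len(pairs) else 2


if __name__ == "__main__":
    # Hypothetical profiles: features 0, 2, 4 are low and 1, 3, 5 high in class 1.
    c1 = [[1, 8, 2, 9, 3, 7], [2, 7, 1, 8, 3, 9], [3, 9, 2, 7, 1, 8]]
    c2 = [[8, 1, 9, 2, 7, 3], [7, 2, 8, 1, 9, 3], [9, 3, 7, 2, 8, 1]]
    pairs = train_ktsp(c1, c2, k=3)
    print(pairs, predict_ktsp(pairs, [2, 8, 1, 9, 2, 7]))
```

Because the rule depends only on within-sample orderings, it needs no cross-sample normalization. That property is part of what makes TSP-style classifiers attractive for LC-MS data.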
Affiliation(s)
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, 116024 Dalian, China
- Jiuchong Gao
  - School of Computer Science & Technology, Dalian University of Technology, 116024 Dalian, China
- Lina Zhou
  - Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Peiyuan Yin
  - Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Guowang Xu
  - Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China