1
Zhu ZX, Genchev GZ, Wang YM, Ji W, Ren YY, Tian GL, Sriswasdi S, Lu H. Improving the second-tier classification of methylmalonic acidemia patients using a machine learning ensemble method. World J Pediatr 2024. [PMID: 38401044 DOI: 10.1007/s12519-023-00788-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 12/10/2023] [Indexed: 02/26/2024]
Abstract
INTRODUCTION Methylmalonic acidemia (MMA) is a disorder of autosomal recessive inheritance, with an estimated prevalence of 1:50,000. First-tier clinical diagnostic tests often return many false positives [five false positives (FP) for every true positive (TP)]. In this work, our goal was to refine a classification model that can minimize the number of false positives, currently an unmet need in the upstream diagnostics of MMA. METHODS We developed machine learning multivariable screening models for MMA with utility as a second-tier tool for false-positive reduction. We utilized mass spectrometry-based features consisting of 11 amino acids and 31 carnitines derived from dried blood samples of neonatal patients, followed by additional ratio feature construction. Feature selection strategies (selection by filter, recursive feature elimination, and learned vector quantization) were used to determine the input set for evaluating the performance of 14 classification models and to identify a candidate model set for ensemble model development. RESULTS Our work identified computational models that explore metabolic analytes to reduce the number of false positives without compromising sensitivity. The best results [area under the receiver operating characteristic curve (AUROC) of 97%, sensitivity of 92%, and specificity of 95%] were obtained utilizing an ensemble of the algorithms random forest, C5.0, sparse linear discriminant analysis, and autoencoder deep neural network stacked with the algorithm stochastic gradient boosting as the supervisor. The model achieved a good performance trade-off for a screening application with 6% false-positive rate (FPR) at 95% sensitivity, 35% FPR at 99% sensitivity, and 39% FPR at 100% sensitivity. CONCLUSIONS The classification results and approach of this research can be utilized by clinicians globally, to improve the overall discovery of MMA in pediatric patients. The improved method, when adjusted to 100% precision, can be used to further inform the diagnostic journey of MMA and help reduce the burden for patients and their families.
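The abstract above reports performance as a false-positive rate at a fixed sensitivity (for example, 6% FPR at 95% sensitivity). The following is a minimal sketch of how such an operating point can be read off a set of classifier scores. It is not the authors' pipeline: the function name `fpr_at_sensitivity` and the toy scores and labels are hypothetical, chosen only to illustrate the calculation.

```python
def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Return the lowest false-positive rate among all score thresholds
    whose sensitivity is at least target_sensitivity.

    scores: classifier outputs, higher means "more likely affected"
    labels: 1 for a true case, 0 for an unaffected newborn
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    # Try every observed score as a decision threshold (call positive if s >= threshold).
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        if tp / positives >= target_sensitivity:
            fpr = fp / negatives
            best = fpr if best is None else min(best, fpr)
    return best


# Hypothetical toy data: 4 true cases and 6 unaffected samples.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

fpr_75 = fpr_at_sensitivity(scores, labels, 0.75)   # one of six negatives is flagged
fpr_100 = fpr_at_sensitivity(scores, labels, 1.0)   # two of six negatives are flagged
```

On the toy data, demanding full sensitivity doubles the false-positive rate, which mirrors the direction of the trade-off the abstract describes (a higher FPR at 99% and 100% sensitivity than at 95%).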
Affiliation(s)
- Zhi-Xing Zhu
  - Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Center for Biomedical Informatics, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Georgi Z Genchev
  - Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Yan-Min Wang
  - Newborn Screening Center, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Wei Ji
  - Newborn Screening Center, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yong-Yong Ren
  - SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Guo-Li Tian
  - Newborn Screening Center, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Sira Sriswasdi
  - Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Center for Artificial Intelligence in Medicine, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Hui Lu
  - Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Center for Biomedical Informatics, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
2
Zaunseder E, Mütze U, Garbade SF, Haupt S, Feyh P, Hoffmann GF, Heuveline V, Kölker S. Machine Learning Methods Improve Specificity in Newborn Screening for Isovaleric Aciduria. Metabolites 2023; 13:304. [PMID: 36837923 PMCID: PMC9962193 DOI: 10.3390/metabo13020304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 02/10/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023] Open
Abstract
Isovaleric aciduria (IVA) is a rare disorder of leucine metabolism and part of newborn screening (NBS) programs worldwide. However, NBS for IVA is hampered by, first, the increased birth prevalence due to the identification of individuals with an attenuated disease variant (so-called "mild" IVA) and, second, an increasing number of false positive screening results due to the use of pivmecillinam contained in the medication. Recently, machine learning (ML) methods have been analyzed, analogous to new biomarkers or second-tier methods, in the context of NBS. In this study, we investigated the application of machine learning classification methods to improve IVA classification using an NBS data set containing 2,106,090 newborns screened in Heidelberg, Germany. We propose combining two methods, linear discriminant analysis and ridge logistic regression, as an additional step (a digital tier) in traditional NBS. Our results show that this reduces the number of false positives by 69.9%, from 103 to 31, while maintaining 100% sensitivity in cross-validation. The ML methods were able to distinguish mild and classic IVA from normal newborns solely based on the NBS data and revealed that besides isovalerylcarnitine (C5), the metabolite concentration of tryptophan (Trp) is important for improved classification. Overall, applying ML methods to improve the specificity of IVA screening could have a major impact on newborns, as it could reduce the newborns' and families' burden of false positives or over-treatment.
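The 69.9% figure quoted in this abstract follows directly from the two counts it gives (103 false positives before the added step, 31 after). A short check of that arithmetic, with a hypothetical helper name `relative_reduction`:

```python
def relative_reduction(before, after):
    """Fraction by which a count dropped, relative to its starting value."""
    return (before - after) / before


fp_before, fp_after = 103, 31          # counts taken from the abstract
reduction = relative_reduction(fp_before, fp_after)
print(f"{reduction:.1%}")              # prints 69.9%
```

Because the reduction is computed from counts over the same screened population, it applies equally to the number of false positives and to the false-positive rate.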
Affiliation(s)
- Elaine Zaunseder
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, 69120 Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), 69118 Heidelberg, Germany
- Ulrike Mütze
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, 69120 Heidelberg, Germany
- Sven F. Garbade
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, 69120 Heidelberg, Germany
- Saskia Haupt
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, 69120 Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), 69118 Heidelberg, Germany
- Patrik Feyh
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, 69120 Heidelberg, Germany
- Georg F. Hoffmann
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, 69120 Heidelberg, Germany
- Vincent Heuveline
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, 69120 Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), 69118 Heidelberg, Germany
- Stefan Kölker
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, 69120 Heidelberg, Germany
3
Zaunseder E, Haupt S, Mütze U, Garbade SF, Kölker S, Heuveline V. Opportunities and challenges in machine learning-based newborn screening: A systematic literature review. JIMD Rep 2022; 63:250-261. [PMID: 35433168 PMCID: PMC8995842 DOI: 10.1002/jmd2.12285] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/14/2022] [Accepted: 03/17/2022] [Indexed: 01/06/2023] Open
Abstract
The development and continuous optimization of newborn screening (NBS) programs remains an important and challenging task due to the low prevalence of screened diseases and high sensitivity requirements for screening methods. Recently, different machine learning (ML) methods have been applied to support NBS. However, most studies focus only on single diseases or specific ML techniques, making it difficult to draw conclusions on which methods are best to implement. Therefore, we performed a systematic literature review of peer-reviewed publications on ML-based NBS methods. Overall, 125 related papers, published in the past two decades, were collected for the study, and 17 met the inclusion criteria. We analyzed the opportunities and challenges of ML methods for NBS including data preprocessing, classification models and pattern recognition methods based on their underlying approaches, data requirements, interpretability on a modular level, and performance. In general, ML methods have the potential to reduce the false positive rate and identify so far unknown metabolic patterns within NBS data. Our analysis revealed that, among the methods presented, logistic regression analysis and support vector machines seem to be valuable candidates for NBS. However, due to the variety of diseases and methods, a general recommendation for a single method in NBS is not possible. Instead, these methods should be further investigated and compared to other approaches in comprehensive studies as they show promising results in NBS applications.
Affiliation(s)
- Elaine Zaunseder
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Saskia Haupt
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Ulrike Mütze
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, Heidelberg, Germany
- Sven F. Garbade
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, Heidelberg, Germany
- Stefan Kölker
  - Division of Child Neurology and Metabolic Medicine, Center for Child and Adolescent Medicine, Heidelberg University Hospital, Heidelberg, Germany
- Vincent Heuveline
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
4
Messa GM, Napolitano F, Elsea SH, di Bernardo D, Gao X. A Siamese neural network model for the prioritization of metabolic disorders by integrating real and simulated data. Bioinformatics 2020; 36:i787-i794. [PMID: 33381827 DOI: 10.1093/bioinformatics/btaa841] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Untargeted metabolomic approaches hold a great promise as a diagnostic tool for inborn errors of metabolisms (IEMs) in the near future. However, the complexity of the involved data makes its application difficult and time consuming. Computational approaches, such as metabolic network simulations and machine learning, could significantly help to exploit metabolomic data to aid the diagnostic process. While the former suffers from limited predictive accuracy, the latter is normally able to generalize only to IEMs for which sufficient data are available. Here, we propose a hybrid approach that exploits the best of both worlds by building a mapping between simulated and real metabolic data through a novel method based on Siamese neural networks (SNN). RESULTS The proposed SNN model is able to perform disease prioritization for the metabolic profiles of IEM patients even for diseases that it was not trained to identify. To the best of our knowledge, this has not been attempted before. The developed model is able to significantly outperform a baseline model that relies on metabolic simulations only. The prioritization performances demonstrate the feasibility of the method, suggesting that the integration of metabolic models and data could significantly aid the IEM diagnosis process in the near future. AVAILABILITY AND IMPLEMENTATION Metabolic datasets used in this study are publicly available from the cited sources. The original data produced in this study, including the trained models and the simulated metabolic profiles, are also publicly available (Messa et al., 2020).
Affiliation(s)
- Gian Marco Messa
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Francesco Napolitano
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Sarah H Elsea
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Diego di Bernardo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli 80078, Italy; Department of Chemical, Materials and Industrial Production Engineering, University of Naples Federico II, 80125 Naples, Italy
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
5
Abstract
Metabolomics is the quantitative analysis of a large number of low molecular weight metabolites that are intermediate or final products of all the metabolic pathways in a living organism. Any metabolic profiles detectable in a human biological fluid are caused by the interaction between gene expression and the environment. The metabolomics approach offers the possibility to identify variations in metabolite profile that can be used to discriminate disease. This is particularly important for neonatal and pediatric studies, especially for the diagnosis and early identification of severely ill patients. This property is of great clinical importance in view of the newer definitions of health and disease. This review emphasizes the workflow of a typical metabolomics study and summarizes the latest results obtained in neonatal studies with particular interest in prematurity, intrauterine growth retardation, inborn errors of metabolism, perinatal asphyxia, sepsis, necrotizing enterocolitis, kidney disease, bronchopulmonary dysplasia, and cardiac malformation and dysfunction.
6
Najdekr L, Gardlo A, Mádrová L, Friedecký D, Janečková H, Correa ES, Goodacre R, Adam T. Oxidized phosphatidylcholines suggest oxidative stress in patients with medium-chain acyl-CoA dehydrogenase deficiency. Talanta 2015; 139:62-6. [PMID: 25882409 DOI: 10.1016/j.talanta.2015.02.041] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 11/18/2014] [Revised: 02/16/2015] [Accepted: 02/23/2015] [Indexed: 11/30/2022]
Abstract
Inborn errors of metabolism encompass a large group of diseases caused by enzyme deficiencies and are therefore amenable to metabolomics investigations. Medium chain acyl-CoA dehydrogenase deficiency (MCADD) is a defect in β-oxidation of fatty acids, and is one of the most well understood disorders. We report here the use of liquid chromatography-mass spectrometry (LC-MS) based untargeted metabolomics and targeted flow injection analysis-tandem mass spectrometry (FIA-TMS) that led to the discovery of novel compounds of oxidative stress. Dry blood spots of controls (n=25) and patient samples (n=25) were extracted by methanol/water (1/1, v/v) and these supernatants were analyzed by an LC-MS method with detection by an Orbitrap Elite MS. Data were processed by XCMS and CAMERA followed by dimension reduction methods. Patients were clearly distinguished from controls in PCA. The S-plot derived from OPLS-DA indicated that medium-chain acylcarnitines (octanoyl, decenoyl and decanoyl carnitines) as well as three phosphatidylcholines [PC(16:0,9:0(COOH)), PC(18:0,5:0(COOH)) and PC(16:0,8:0(COOH))] were important metabolites for differentiation between patients and healthy controls. In order to biologically validate these discriminatory molecules as indicators for oxidative stress, a second cohort of individuals was analyzed, including MCADD (n=25) and control (n=250) samples. These were measured by a modified newborn screening method using FIA-TMS (API 4000) in MRM mode. Calculated p-values for PC(16:0,9:0(COOH)), PC(18:0,5:0(COOH)) and PC(16:0,8:0(COOH)) were 1.927×10⁻¹⁴, 2.391×10⁻¹⁵ and 3.354×10⁻¹⁵, respectively. These elevated oxidized phospholipids indeed show an increased presence of oxidative stress in MCADD patients as one of the pathophysiological mechanisms of the disease.
Affiliation(s)
- Lukáš Najdekr
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic
- Alžběta Gardlo
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic
- Lucie Mádrová
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic
- David Friedecký
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic; Department of Clinical Biochemistry, University Hospital in Olomouc, I.P. Pavlova 6, 775 20 Olomouc, Czech Republic
- Hana Janečková
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic; Department of Clinical Biochemistry, University Hospital in Olomouc, I.P. Pavlova 6, 775 20 Olomouc, Czech Republic
- Elon S Correa
  - School of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
- Royston Goodacre
  - School of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
- Tomáš Adam
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotínská 5, Olomouc 775 15, Czech Republic; Department of Clinical Biochemistry, University Hospital in Olomouc, I.P. Pavlova 6, 775 20 Olomouc, Czech Republic
7
Mussap M, Antonucci R, Noto A, Fanos V. The role of metabolomics in neonatal and pediatric laboratory medicine. Clin Chim Acta 2013; 426:127-38. [PMID: 24035970 DOI: 10.1016/j.cca.2013.08.020] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 07/02/2013] [Revised: 08/26/2013] [Accepted: 08/26/2013] [Indexed: 12/22/2022]
Abstract
Metabolomics consists of the quantitative analysis of a large number of low molecular mass metabolites involving substrates or products in metabolic pathways existing in all living systems. The analysis of the metabolic profile detectable in a human biological fluid allows to instantly identify changes in the composition of endogenous and exogenous metabolites caused by the interaction between specific physiopathological states, gene expression, and environment. In pediatrics and neonatology, metabolomics offers new encouraging perspectives for the improvement of critically ill patient outcome, for the early recognition of metabolic profiles associated with the development of diseases in the adult life, and for delivery of individualized medicine. In this view, nutrimetabolomics, based on the recognition of specific cluster of metabolites associated with nutrition and pharmacometabolomics, based on the capacity to personalize drug therapy by analyzing metabolic modifications due to therapeutic treatment may open new frontiers in the prevention and in the treatment of pediatric and neonatal diseases. This review summarizes the most relevant results published in the literature on the application of metabolomics in pediatric and neonatal clinical settings. However, there is the urgent need to standardize physiological and preanalytical variables, analytical methods, data processing, and result presentation, before establishing the definitive clinical value of results.
Affiliation(s)
- Michele Mussap
  - Laboratory Medicine Service, IRCCS AOU San Martino-IST, University-Hospital, National Institute for Cancer Research, Genova, Italy
8
Liu KQ, Liu ZP, Hao JK, Chen L, Zhao XM. Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics 2012; 13:126. [PMID: 22676414 PMCID: PMC3443452 DOI: 10.1186/1471-2105-13-126] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 11/20/2011] [Accepted: 05/21/2012] [Indexed: 12/04/2022] Open
Abstract
Background Cancers, a group of multifactorial complex diseases, are generally caused by mutation of multiple genes or dysregulation of pathways. Identifying biomarkers that can characterize cancers would help to understand and diagnose cancers. Traditional computational methods that detect genes differentially expressed between cancer and normal samples fail to work due to small sample sizes and the assumption of independence among genes. On the other hand, genes work in concert to perform their functions. Therefore, it is expected that dysregulated pathways will serve as better biomarkers compared with single genes. Results In this paper, we propose a novel approach to identify dysregulated pathways in cancer based on a pathway interaction network. Our contribution is three-fold. Firstly, we present a new method to construct a pathway interaction network based on gene expression, protein-protein interactions and cellular pathways. Secondly, the identification of dysregulated pathways in cancer is treated as a feature selection problem, which is biologically reasonable and easy to interpret. Thirdly, the dysregulated pathways are identified as subnetworks from the pathway interaction networks, where the subnetworks characterize very well the functional dependency or crosstalk between pathways. The benchmarking results on several distinct cancer datasets demonstrate that our method can obtain more reliable and accurate results compared with existing state of the art methods. Further functional analysis and independent literature evidence also confirm that our identified potential pathogenic pathways are biologically reasonable, indicating the effectiveness of our method. Conclusions Dysregulated pathways can serve as better biomarkers compared with single genes. In this work, by utilizing pathway interaction networks and gene expression data, we propose a novel approach that effectively identifies dysregulated pathways, which can not only be used as biomarkers to diagnose cancers but also serve as potential drug targets in the future.
Affiliation(s)
- Ke-Qin Liu
  - Institute of Systems Biology, Shanghai University, Shanghai 200444, China
9
A filter-based feature selection approach for identifying potential biomarkers for lung cancer. J Clin Bioinforma 2011; 1:11. [PMID: 21884628 PMCID: PMC3164604 DOI: 10.1186/2043-9113-1-11] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 10/08/2010] [Accepted: 03/21/2011] [Indexed: 11/10/2022] Open
Abstract
Background Lung cancer is the leading cause of death from cancer in the world and its treatment is dependent on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic microarray analysis to find genes that show differential expression according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from among a large manifold of genes with differential expression, those specific genes whose differential expression patterns are of optimal value in phenotypic differentiation. One such technique, Biomarker Identifier (BMI), has been developed to identify features with the ability to distinguish between two data groups of interest, which is thus highly applicable for such studies. Results Microarray data with validated genes were used to evaluate the utility of BMI in identifying markers for lung cancer. This data set contains a set of 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer) and 7 genes from these data have been confirmed to be differentially expressed by quantitative PCR. Using this data set, BMI was compared with various well-known feature selection methods and was found to be more successful than other methods in finding useful genes to classify cancerous samples. It is also evident that genes selected by BMI (given the same number of genes and classification algorithms) showed better discriminative power than those from the original study. After pathway analysis on the genes selected by BMI, we have been able to correlate the selected genes with well-known cancer-related pathways. Conclusions Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data.
10
Millonig G, Praun S, Netzer M, Baumgartner C, Dornauer A, Mueller S, Villinger J, Vogel W. Non-invasive diagnosis of liver diseases by breath analysis using an optimized ion-molecule reaction-mass spectrometry approach: a pilot study. Biomarkers 2010; 15:297-306. [PMID: 20151876 DOI: 10.3109/13547501003624512] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Abstract
Breath composition is altered in liver diseases. We tested if ion-molecule-reaction mass spectrometry (IMR-MS) combined with a new statistical modality improves the diagnostic accuracy of breath analysis in liver diseases. We analysed 114 molecules in the breath of 126 individuals (healthy controls, and patients with non-alcoholic and alcoholic fatty liver disease and liver cirrhosis) by IMR-MS. Characteristic exhalation patterns were identified for each group. Combining two to seven molecules in the new stacked feature ranking model reached a diagnostic accuracy (area under the curve) for individual liver diseases between 0.88 and 0.97. IMR-MS followed by sophisticated statistical analysis is a promising tool for liver diagnostics by breath analysis.
Affiliation(s)
- Gunda Millonig
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, Medical University of Innsbruck, Austria
11
Griffiths W, Koal T, Wang Y, Kohl M, Enot D, Deigner HP. Targeted Metabolomics for Biomarker Discovery. Angew Chem Int Ed Engl 2010; 49:5426-45. [DOI: 10.1002/anie.200905579] [Citation(s) in RCA: 259] [Impact Index Per Article: 18.5] [Indexed: 01/06/2023]
12
Griffiths W, Koal T, Wang Y, Kohl M, Enot D, Deigner HP. “Targeted Metabolomics” in der Biomarkerforschung [“Targeted Metabolomics” in biomarker research; article in German]. Angew Chem Int Ed Engl 2010. [DOI: 10.1002/ange.200905579] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
13
Baumgartner C, Lewis GD, Netzer M, Pfeifer B, Gerszten RE. A new data mining approach for profiling and categorizing kinetic patterns of metabolic biomarkers after myocardial injury. Bioinformatics 2010; 26:1745-51. [PMID: 20483816 DOI: 10.1093/bioinformatics/btq254] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION The discovery of new and unexpected biomarkers in cardiovascular disease is a highly data-driven process that requires the complementary power of modern metabolite profiling technologies, bioinformatics and biostatistics. Clinical biomarkers of early myocardial injury are lacking. A prospective biomarker cohort study was carried out to identify, categorize and profile kinetic patterns of early metabolic biomarkers of planned myocardial infarction (PMI) and spontaneous (SMI) myocardial infarction. We applied a targeted mass spectrometry (MS)-based metabolite profiling platform to serial blood samples drawn from carefully phenotyped patients undergoing alcohol septal ablation for hypertrophic obstructive cardiomyopathy serving as a human model of PMI. Patients with SMI and patients undergoing catheterization without induction of myocardial infarction served as positive and negative controls to assess generalizability of markers identified in PMI. RESULTS To identify metabolites of high predictive value in tandem mass spectrometry data, we introduced a new feature selection method for the categorization of metabolic signatures into three classes of weak, moderate and strong predictors, which can be easily applied to both paired and unpaired samples. Our paradigm outperformed standard null-hypothesis significance testing and other popular methods for feature selection in terms of the area under the receiver operating curve and the product of sensitivity and specificity. Our results emphasize that this new method was able to identify, classify and validate alterations of levels in multiple metabolites participating in pathways associated with myocardial injury as early as 10 min after PMI. AVAILABILITY The algorithm as well as supplementary material is available for download at: www.umit.at/page.cfm?vpath=departments/technik/iebe/tools/bi
Affiliation(s)
- Christian Baumgartner
- Research Group for Clinical Bioinformatics, Institute of Electrical, Electronic and Bioengineering, University for Health Sciences, Medical Informatics and Technology (UMIT), A-6060 Hall in Tirol, Austria.
|
14
|
Abstract
Exploiting the potential of omics for clinical diagnosis, prognosis, and therapeutic purposes is currently receiving a lot of attention. In recent years, most of the effort has been put into demonstrating the possible clinical applications of the various omics fields. Cost-effectiveness analysis has so far been rather neglected. The cost of omics-derived applications is still very high, but future technological improvements are likely to overcome this problem. In this chapter, we give a general background of the main omics fields and provide some examples of the most successful applications of omics that might be used in clinical diagnosis and in a therapeutic context.
Affiliation(s)
- Ewa Gubb
- Bioinformatics, Parque Tecnológico de Bizkaia, Derio, Spain
|
15
|
Visvanathan M, Netzer M, Seger M, Adagarla BS, Baumgartner C, Sittampalam S, Lushington GH. Oncogenes and pathway identification using filter-based approaches between various carcinoma types in lung. International Journal of Computational Biology and Drug Design 2009; 2:236-51. [PMID: 20090162 PMCID: PMC2825752 DOI: 10.1504/ijcbdd.2009.030115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/09/2023]
Abstract
Lung cancer accounts for the most cancer-related deaths. The identification of cancer-associated genes and the related pathways is essential to prevent many types of cancer. In this paper, a more systematic approach is considered. First, we performed pathway analysis using the Hyper Geometric Distribution (HGD) and identified significantly overrepresented sets of reactions. Second, we used feature selection based on Particle Swarm Optimisation (PSO), Information Gain (IG) and the Biomarker Identifier (BMI) to identify different types of lung cancer. We also evaluated PSO and developed a new method to determine the BMI thresholds to prioritize genes. We were able to identify sets of key genes that can be found in several pathways. Experimental results show that our method simplifies features effectively and obtains higher classification accuracy than the other methods from the literature.
Affiliation(s)
- Mahesh Visvanathan
- Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66047, USA
- Michael Netzer
- Institute of Electrical, Electronic and Bioengineering, Department of Biomedical Sciences and Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), A-6060 Hall in Tirol, Austria
- Michael Seger
- Institute of Electrical, Electronic and Bioengineering, Department of Biomedical Sciences and Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), A-6060 Hall in Tirol, Austria
- Christian Baumgartner
- Institute of Electrical, Electronic and Bioengineering, Department of Biomedical Sciences and Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), A-6060 Hall in Tirol, Austria, Fax: +43 50 8548 673827
- Sitta Sittampalam
- Therapeutics Discovery and Development, University of Kansas, Lawrence, KS, USA
|
16
|
Netzer M, Millonig G, Osl M, Pfeifer B, Praun S, Villinger J, Vogel W, Baumgartner C. A new ensemble-based algorithm for identifying breath gas marker candidates in liver disease using ion molecule reaction mass spectrometry. Bioinformatics 2009; 25:941-7. [PMID: 19223453 DOI: 10.1093/bioinformatics/btp093] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION Alcoholic fatty liver disease (AFLD) and non-AFLD (NAFLD) can progress to severe liver diseases such as steatohepatitis, cirrhosis and cancer. Thus, the detection of early liver disease is essential; however, minimally invasive diagnostic methods in clinical hepatology still lack specificity. RESULTS Ion molecule reaction mass spectrometry (IMR-MS) was applied to a total of 126 human breath gas samples comprising 91 cases (AFLD, NAFLD and cirrhosis) and 35 healthy controls. A new feature selection modality termed Stacked Feature Ranking (SFR) was developed to identify potential liver disease marker candidates in breath gas samples, relying on the combination of different entropy- and correlation-based feature ranking methods including statistical hypothesis testing using a two-level architecture with a suggestion and a decision layer. We benchmarked SFR against four single feature selection methods, a wrapper and a recently described ensemble method, indicating a significantly higher discriminatory ability of up to 10-15% for the SFR selected gas compounds expressed by the area under the ROC curve (AUC) of 0.85-0.95. Using this approach, we were able to identify unexpected breath gas marker candidates in liver disease of high predictive value. A literature study further supports top-ranked markers to be associated with liver disease. We propose SFR as a powerful tool for biomarker search in breath gas and other biological samples using mass spectrometry. AVAILABILITY The algorithm SFR and IMR-MS datasets are available under http://biomed.umit.at/page.cfm?pageid=526.
Affiliation(s)
- M Netzer
- Research Group for Clinical Bioinformatics, Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), Innsbruck Medical University, Innsbruck, Austria.
|
17
|
Vangala S, Tonelli A. Biomarkers, metabonomics, and drug development: can inborn errors of metabolism help in understanding drug toxicity? AAPS Journal 2007; 9:E284-97. [PMID: 17915830 DOI: 10.1208/aapsj0903031] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
Abstract
Application of "omics" technology during drug discovery and development is rapidly evolving. This review evaluates the current status and future role of "metabonomics" as a tool in the drug development process to reduce the safety-related attrition rates and bridge the gaps between preclinical and clinical, and clinical and market. Particularly, the review looks at the knowledge gap between the pharmaceutical industry and pediatric hospitals, where metabonomics has been successfully applied to screen and treat newborn babies with inborn errors of metabolism. An attempt has been made to relate the clinical pathology associated with inborn errors of metabolism with those of drug-induced pathology. It is proposed that extending the metabonomic biomarkers used in pediatric hospitals, as "advanced clinical chemistry" for preclinical and clinical drug development, is immediately warranted for better safety assessment of drug candidates. The latest advances in mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy should help replace the traditional approaches of laboratory clinical chemistry and move the safety evaluation of drug candidates into the new millennium.
Affiliation(s)
- Subrahmanyam Vangala
- Global Preclinical Development, Johnson & Johnson Pharmaceutical Research and Development, Raritan, NJ, USA.
|
18
|
Ho S, Lukacs Z, Hoffmann GF, Lindner M, Wetter T. Feature construction can improve diagnostic criteria for high-dimensional metabolic data in newborn screening for medium-chain acyl-CoA dehydrogenase deficiency. Clin Chem 2007; 53:1330-7. [PMID: 17513288 DOI: 10.1373/clinchem.2006.081802] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
BACKGROUND In newborn screening with tandem mass spectrometry, multiple intermediary metabolites are quantified in a single analytical run for the diagnosis of fatty-acid oxidation disorders, organic acidurias, and aminoacidurias. Published diagnostic criteria for these disorders normally incorporate a primary metabolic marker combined with secondary markers, often analyte ratios, for which the markers have been chosen to reflect metabolic pathway deviations. METHODS We applied a procedure to extract new markers and diagnostic criteria for newborn screening to the data of newborns with confirmed medium-chain acyl-CoA dehydrogenase deficiency (MCADD) and a control group from the newborn screening program, Heidelberg, Germany. We validated the results with external data of the screening center in Hamburg, Germany. We extracted new markers by performing a systematic search for analyte combinations (features) with high discriminatory performance for MCADD. To select feature thresholds, we applied automated procedures to separate controls and cases on the basis of the feature values. Finally, we built classifiers from these new markers to serve as diagnostic criteria in screening for MCADD. RESULTS On the basis of χ² scores, we identified approximately 800 of >628,000 new analyte combinations with superior discriminatory performance compared with the best published combinations. Classifiers built with the new features achieved diagnostic sensitivities and specificities approaching 100%. CONCLUSION Feature construction methods provide ways to disclose information hidden in the set of measured analytes. Other diagnostic tasks based on high-dimensional metabolic data might also profit from this approach.
Affiliation(s)
- Sirikit Ho
- Division of Metabolic Diseases, Department of General Pediatrics, University Children's Hospital, Heidelberg, Germany.
|
19
|
Current literature in mass spectrometry. Journal of Mass Spectrometry 2006; 41:1654-1665. [PMID: 17136768 DOI: 10.1002/jms.959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
|