1
Rodgers O, Mills C, Watson C, Waterfield T. Role of diagnostic tests for sepsis in children: a review. Arch Dis Child 2024; 109:786-793. [PMID: 38262696 DOI: 10.1136/archdischild-2023-325984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 01/10/2024] [Indexed: 01/25/2024]
Abstract
Paediatric sepsis has a significant global impact and highly heterogeneous clinical presentation. The clinical pathway encompasses recognition, escalation and de-escalation. In each aspect, diagnostics have a fundamental influence over outcomes in children. Biomarkers can aid in creating a larger low-risk group of children from those in the clinical grey area who would otherwise receive antibiotics 'just in case'. Current biomarkers include C reactive protein and procalcitonin, which are limited in their clinical use to guide appropriate and rapid treatment. Biomarker discovery has focused on single biomarkers, which, so far, have not outperformed current biomarkers, as they fail to recognise the complexity of sepsis. The identification of multiple host biomarkers that may form a panel in a clinical test has the potential to recognise the complexity of sepsis and provide improved diagnostic performance. In this review, we discuss novel biomarkers and novel ways of using existing biomarkers in the assessment and management of sepsis along with the significant challenges in biomarker discovery at present. Validation of biomarkers is made less meaningful due to methodological heterogeneity, including variations in sepsis diagnosis, biomarker cut-off values and patient populations. Therefore, the utilisation of platform studies is necessary to improve the efficiency of biomarkers in clinical practice.
Affiliation(s)
- Oenone Rodgers
  - Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, UK
- Clare Mills
  - Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, UK
- Chris Watson
  - Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, UK
- Thomas Waterfield
  - Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast School of Medicine, Dentistry and Biomedical Sciences, Belfast, UK
2
Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024; 53:7641-7656. [PMID: 38934892 DOI: 10.1039/d4cs00460d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Surface enhanced Raman scattering (SERS) spectra of biomaterials such as cells or tissues can be used to obtain biochemical information from nanoscopic volumes in these heterogeneous samples. This tutorial review discusses the factors that determine the outcome of a SERS experiment in complex bioorganic samples. They are related to the SERS process itself, the possibility to selectively probe certain regions or constituents of a sample, and the retrieval of the vibrational information in order to identify molecules and their interaction. After introducing basic aspects of SERS experiments in the context of biocompatible environments, spectroscopy in typical microscopic settings is exemplified, including the possibilities to combine SERS with other linear and non-linear microscopic tools, and to exploit approaches that improve lateral and temporal resolution. In particular the great variation of data in a SERS experiment calls for robust data analysis tools. Approaches will be introduced that have been originally developed in the field of bioinformatics for the application to omics data and that show specific potential in the analysis of SERS data. They include the use of simulated data and machine learning tools that can yield chemical information beyond achieving spectral classification.
Affiliation(s)
- Janina Kneipp
  - Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany
- Stephan Seifert
  - Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Florian Gärber
  - Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
3
Magowan D, Abdulshafea M, Thompson D, Rajamoorthy SI, Owen R, Harris D, Prosser S. Blood-based biomarkers and novel technologies for the diagnosis of colorectal cancer and adenomas: a narrative review. Biomark Med 2024; 18:493-506. [PMID: 38900496 DOI: 10.1080/17520363.2024.2345583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 03/12/2024] [Indexed: 06/21/2024] Open
Abstract
Aim: Blood-based biomarkers have shown promise for diagnosing colorectal cancer (CRC) and adenomas (CRA). This review summarizes recent studies in this area. Methods: A literature search was undertaken for 01/01/2017-01/03/2023. Criteria included CRC, CRA, liquid-biopsy, blood-based tests and diagnosis. Results: 12,378 studies were reduced to 178 for data extraction. Sixty focused on proteomics, 53 on RNA species, 30 on cfDNA methylation, seven on antigens and autoantibodies and 28 on novel techniques. Of these, 169 were case-control studies and nine were cohort studies. The number of participants ranged from 100 to 54,297, with a mean age of 58.26. CRC sensitivity and specificity ranged 9.10-100% and 20.40-100%, respectively. CRA sensitivity and specificity ranged 8.00-95.70% and 4.00-97.00%, respectively. Conclusion: Sensitive and specific blood-based tests exist for CRC and CRA. However, studies demonstrate heterogeneous techniques and reporting quality. Further work should concentrate on validation and meta-analyses.
Affiliation(s)
- Drew Magowan
  - Swansea University, Singleton Park, SA2 8PP, Swansea, UK
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
- Mansour Abdulshafea
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
- Dominic Thompson
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
- Shri-Ishvarya Rajamoorthy
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
- Rhiannon Owen
  - Swansea University, Singleton Park, SA2 8PP, Swansea, UK
- Dean Harris
  - Swansea University, Singleton Park, SA2 8PP, Swansea, UK
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
- Susan Prosser
  - Swansea Bay University Health Board, Department of General Surgery, Morriston Hospital, SA6 6NL, Swansea, UK
4
Álvarez-Machancoses Ó, Faraggi E, deAndrés-Galiana EJ, Fernández-Martínez JL, Kloczkowski A. Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler. Curr Genomics 2024; 25:171-184. [PMID: 39086995 PMCID: PMC11288160 DOI: 10.2174/0113892029236347240308054538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 08/03/2023] [Accepted: 09/22/2023] [Indexed: 08/02/2024] Open
Abstract
Background: Single Amino Acid Polymorphisms (SAPs) or nonsynonymous Single Nucleotide Variants (nsSNVs) are the most common genetic variations. They result from missense mutations, where a single base pair substitution changes the genetic code in such a way that the triplet of bases (codon) at a given position codes for a different amino acid. Since genetic mutations sometimes cause genetic diseases, it is important to understand and predict which variations are harmful and which are neutral (not causing changes in the phenotype). This can be posed as a classification problem. Methods: Computational methods using machine intelligence are gradually replacing repetitive and costly mutagenesis experiments. By and large, the uneven quality, deficiencies, and irregularities of nsSNV datasets degrade the usefulness of artificial intelligence-based methods. Consequently, more robust and accurate approaches are needed to address these problems. In the present paper, we present a consensus classifier built on the holdout sampler, which appears robust and precise and outperforms all other popular methods. Results: We produced 100 holdouts to test the structures and classification variables of diverse classifiers during the training phase. The best-performing holdouts were chosen to develop a consensus classifier, which was tested using a k-fold (1 ≤ k ≤ 5) cross-validation method. We also examined which protein properties have the biggest impact on the precise prediction of the effects of nsSNVs. Conclusion: Our Consensus Holdout Sampler outperforms other popular algorithms and gives excellent results that are highly accurate with low standard deviation. The advantage of our method emerges from using a tree of holdouts, where diverse ML/AI-based programs are sampled in diverse ways.
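The general idea of a holdout-based consensus can be illustrated with a toy sketch. This is not the authors' Consensus Holdout Sampler: the one-dimensional threshold classifier, the function names (`fit_threshold`, `consensus_holdout`), the synthetic data and the choice to keep the ten best holdouts are all assumptions made for illustration. It only shows the pattern of drawing many random holdouts, scoring a model on each held-out part, and letting the best-scoring models vote.

```python
import random
from statistics import mean


def fit_threshold(train):
    """Toy 1-D classifier (an assumption): midpoint between the class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2.0


def accuracy(threshold, data):
    """Fraction of points whose side of the threshold matches their label."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


def consensus_holdout(data, n_holdouts=100, keep=10, test_frac=0.3, seed=0):
    """Draw random holdouts, keep the best-scoring models, return a voter."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_holdouts):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_frac)
        test, train = shuffled[:n_test], shuffled[n_test:]
        threshold = fit_threshold(train)
        scored.append((accuracy(threshold, test), threshold))
    # Highest held-out accuracy first; keep only the top `keep` models.
    scored.sort(reverse=True)
    best = [threshold for _, threshold in scored[:keep]]

    def predict(x):
        # Majority vote across the retained models.
        votes = sum(x > threshold for threshold in best)
        return int(votes * 2 > len(best))

    return predict


# Synthetic, clearly separable data (illustrative only, not from the paper).
data = [(0.1 * i, 0) for i in range(20)] + [(10 + 0.1 * i, 1) for i in range(20)]
predict = consensus_holdout(data)
```

A real application would replace the threshold rule with several different classifiers and feature sets, and would evaluate the resulting consensus on data that played no part in selecting the holdouts.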
Affiliation(s)
- Óscar Álvarez-Machancoses
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Eshel Faraggi
  - School of Science, Indiana University-Purdue University Indianapolis, IN, USA
- Enrique J deAndrés-Galiana
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
  - Department of Computer Science, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Juan L Fernández-Martínez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Andrzej Kloczkowski
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
5
Mróz J, Pelc M, Mitusińska K, Chorostowska-Wynimko J, Jezela-Stanek A. Computational Tools to Assist in Analyzing Effects of the SERPINA1 Gene Variation on Alpha-1 Antitrypsin (AAT). Genes (Basel) 2024; 15:340. [PMID: 38540399 PMCID: PMC10970068 DOI: 10.3390/genes15030340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 02/28/2024] [Accepted: 03/04/2024] [Indexed: 06/14/2024] Open
Abstract
In the rapidly advancing field of bioinformatics, the development and application of computational tools to predict the effects of single nucleotide variants (SNVs) are shedding light on the molecular mechanisms underlying disorders. Also, they hold promise for guiding therapeutic interventions and personalized medicine strategies in the future. A comprehensive understanding of the impact of SNVs in the SERPINA1 gene on alpha-1 antitrypsin (AAT) protein structure and function requires integrating bioinformatic approaches. Here, we provide a guide for clinicians to navigate through the field of computational analyses which can be applied to describe a novel genetic variant. Predicting the clinical significance of SERPINA1 variation allows clinicians to tailor treatment options for individuals with alpha-1 antitrypsin deficiency (AATD) and related conditions, ultimately improving the patient's outcome and quality of life. This paper explores the various bioinformatic methodologies and cutting-edge approaches dedicated to the assessment of molecular variants of genes and their product proteins using SERPINA1 and AAT as an example.
Affiliation(s)
- Jakub Mróz
  - Tunneling Group, Biotechnology Center, Silesian University of Technology, Krzywoustego St. 8, 44-100 Gliwice, Poland
- Magdalena Pelc
  - Department of Genetics and Clinical Immunology, National Institute of Tuberculosis and Lung Diseases, 26 Plocka St., 01-138 Warsaw, Poland
- Karolina Mitusińska
  - Tunneling Group, Biotechnology Center, Silesian University of Technology, Krzywoustego St. 8, 44-100 Gliwice, Poland
- Joanna Chorostowska-Wynimko
  - Department of Genetics and Clinical Immunology, National Institute of Tuberculosis and Lung Diseases, 26 Plocka St., 01-138 Warsaw, Poland
- Aleksandra Jezela-Stanek
  - Department of Genetics and Clinical Immunology, National Institute of Tuberculosis and Lung Diseases, 26 Plocka St., 01-138 Warsaw, Poland
6
LeeVan E, Pinsky P. Predictive Performance of Cell-Free Nucleic Acid-Based Multi-Cancer Early Detection Tests: A Systematic Review. Clin Chem 2024; 70:90-101. [PMID: 37791504 DOI: 10.1093/clinchem/hvad134] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Accepted: 07/24/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND Cancer-screening tests that can detect multiple cancer types, or multi-cancer early detection (MCED) tests, have emerged recently as a potential new tool in decreasing cancer morbidity and mortality. Most MCED assays are based on detecting cell-free tumor DNA (CF-DNA) in the blood. MCEDs offer the potential for screening for cancer organ sites with high mortality, both with and without recommended screening. However, their clinical utility has not been established. Before clinical utility can be established, the clinical validity of MCEDs, i.e., their ability to predict cancer status, must be demonstrated. In this study we performed a systematic review of the predictive ability for cancer of cell-free-nucleic acid-based MCED tests. CONTENT We searched PubMed for relevant publications from January 2017 to February 2023, using MeSH terms related to multi-cancer detection, circulating DNA, and related concepts. Of 1811 publications assessed, 61 were reviewed in depth and 20 are included in this review. For almost all studies, the cancer cases were assessed at time of diagnosis. Most studies reported specificity (generally 95% or higher) and overall sensitivity (73% median). The median number of cancer types assessed per assay was 5. Many studies also reported sensitivity by stage and/or cancer type. Sensitivity generally increased with stage. SUMMARY To date, relatively few published studies have assessed the clinical validity of MCED tests. Most used cancer cases assessed at diagnosis, with generally high specificity and variable sensitivity depending on cancer type and stage. The next steps should be testing in the intended-use population, i.e., asymptomatic persons.
Affiliation(s)
- Elyse LeeVan
  - Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, United States
- Paul Pinsky
  - Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, United States
7
Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023; 5:1195017. [PMID: 37388252 PMCID: PMC10303934 DOI: 10.3389/fdgth.2023.1195017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/01/2023] Open
Abstract
Objectives: The objective of this study is to explore Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim to evaluate how the languages and institutional specificities of Swiss teaching hospitals are likely to affect the quality of the classification in French and German. Methods: In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results: The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions: These results are competitive with manual labeling as measured by the Matthews correlation coefficient and Cohen's Kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize on new unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
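For readers unfamiliar with the agreement measures named in this abstract, the following sketch computes the F1-score, the Matthews correlation coefficient and Cohen's Kappa from a binary confusion matrix, as would apply to a two-class (Progressive/Non-progressive) task. The function name `binary_metrics` and the counts used are illustrative assumptions; they are not data from the study, and the study's multi-class averaging is not reproduced here.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return F1, Matthews correlation coefficient and Cohen's Kappa
    for a 2x2 confusion matrix (all marginal totals assumed non-zero)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # MCC: correlation between predicted and true binary labels.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected)

    return {"f1": f1, "mcc": mcc, "kappa": kappa}


# Hypothetical counts chosen for easy arithmetic, not taken from the paper.
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these balanced counts the F1-score is 0.8 while both MCC and Kappa are 0.6, which illustrates why chance-corrected measures are usually reported alongside F1.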
Affiliation(s)
- Luc Mottin
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
  - Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
  - Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
  - Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
  - Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
  - Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
8
Sandve GK, Greiff V. Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking. Bioinformatics 2022; 38:4994-4996. [PMID: 36073940 PMCID: PMC9620827 DOI: 10.1093/bioinformatics/btac612] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/04/2021] [Revised: 02/18/2022] [Accepted: 09/08/2022] [Indexed: 11/14/2022] Open
Affiliation(s)
- Geir Kjetil Sandve
  - Department of Informatics, University of Oslo, 0316 Oslo, Norway
  - Centre of Bioinformatics, University of Oslo, 0316 Oslo, Norway
  - UiORealArt convergence environment, University of Oslo, 0316 Oslo, Norway
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, 0316 Oslo, Norway
9
Wiegand M, Cowan SL, Waddington CS, Halsall DJ, Keevil VL, Tom BDM, Taylor V, Gkrania-Klotsas E, Preller J, Goudie RJB. Development and validation of a dynamic 48-hour in-hospital mortality risk stratification for COVID-19 in a UK teaching hospital: a retrospective cohort study. BMJ Open 2022; 12:e060026. [PMID: 36691139 PMCID: PMC9445230 DOI: 10.1136/bmjopen-2021-060026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 07/13/2022] [Indexed: 02/02/2023] Open
Abstract
OBJECTIVES To develop a disease stratification model for COVID-19 that updates according to changes in a patient's condition while in hospital to facilitate patient management and resource allocation. DESIGN In this retrospective cohort study, we adopted a landmarking approach to dynamic prediction of all-cause in-hospital mortality over the next 48 hours. We accounted for informative predictor missingness and selected predictors using penalised regression. SETTING All data used in this study were obtained from a single UK teaching hospital. PARTICIPANTS We developed the model using 473 consecutive patients with COVID-19 presenting to a UK hospital between 1 March 2020 and 12 September 2020; and temporally validated using data on 1119 patients presenting between 13 September 2020 and 17 March 2021. PRIMARY AND SECONDARY OUTCOME MEASURES The primary outcome is all-cause in-hospital mortality within 48 hours of the prediction time. We accounted for the competing risks of discharge from hospital alive and transfer to a tertiary intensive care unit for extracorporeal membrane oxygenation. RESULTS Our final model includes age, Clinical Frailty Scale score, heart rate, respiratory rate, oxygen saturation/fractional inspired oxygen ratio, white cell count, presence of acidosis (pH <7.35) and interleukin-6. Internal validation achieved an area under the receiver operating characteristic (AUROC) of 0.90 (95% CI 0.87 to 0.93) and temporal validation gave an AUROC of 0.86 (95% CI 0.83 to 0.88). CONCLUSIONS Our model incorporates both static risk factors (eg, age) and evolving clinical and laboratory data, to provide a dynamic risk prediction model that adapts to both sudden and gradual changes in an individual patient's clinical condition. On successful external validation, the model has the potential to be a powerful clinical risk assessment tool. TRIAL REGISTRATION The study is registered as 'researchregistry5464' on the Research Registry (www.researchregistry.com).
Affiliation(s)
- Martin Wiegand
  - Faculty of Infectious Diseases, London School of Hygiene & Tropical Medicine, London, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Sarah L Cowan
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- David J Halsall
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Victoria L Keevil
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Department of Medicine for the Elderly, Addenbrooke's Hospital, Cambridge, UK
- Brian D M Tom
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Vince Taylor
  - Cancer Research UK, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Jacobus Preller
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
10
Meehan AJ, Lewis SJ, Fazel S, Fusar-Poli P, Steyerberg EW, Stahl D, Danese A. Clinical prediction models in psychiatry: a systematic review of two decades of progress and challenges. Mol Psychiatry 2022; 27:2700-2708. [PMID: 35365801 PMCID: PMC9156409 DOI: 10.1038/s41380-022-01528-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 07/23/2021] [Revised: 03/03/2022] [Accepted: 03/14/2022] [Indexed: 12/13/2022]
Abstract
Recent years have seen the rapid proliferation of clinical prediction models aiming to support risk stratification and individualized care within psychiatry. Despite growing interest, attempts to synthesize current evidence in the nascent field of precision psychiatry have remained scarce. This systematic review therefore sought to summarize progress towards clinical implementation of prediction modeling for psychiatric outcomes. We searched MEDLINE, PubMed, Embase, and PsychINFO databases from inception to September 30, 2020, for English-language articles that developed and/or validated multivariable models to predict (at an individual level) onset, course, or treatment response for non-organic psychiatric disorders (PROSPERO: CRD42020216530). Individual prediction models were evaluated based on three key criteria: (i) mitigation of bias and overfitting; (ii) generalizability, and (iii) clinical utility. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to formally appraise each study's risk of bias. 228 studies detailing 308 prediction models were ultimately eligible for inclusion. 94.5% of developed prediction models were deemed to be at high risk of bias, largely due to inadequate or inappropriate analytic decisions. Insufficient internal validation efforts (within the development sample) were also observed, while only one-fifth of models underwent external validation in an independent sample. Finally, our search identified just one published model whose potential utility in clinical practice was formally assessed. Our findings illustrated significant growth in precision psychiatry with promising progress towards real-world application. Nevertheless, these efforts have been inhibited by a preponderance of bias and overfitting, while the generalizability and clinical utility of many published models has yet to be formally established. 
Through improved methodological rigor during initial development, robust evaluations of reproducibility via independent validation, and evidence-based implementation frameworks, future research has the potential to generate risk prediction tools capable of enhancing clinical decision-making in psychiatric care.
Affiliation(s)
- Alan J Meehan
  - Department of Psychology, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
  - Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
  - Yale Child Study Center, Yale School of Medicine, New Haven, CT, 06520, USA
- Stephanie J Lewis
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Seena Fazel
  - Department of Psychiatry, University of Oxford, Oxford, UK
- Paolo Fusar-Poli
  - Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
  - OASIS Service, South London and Maudsley NHS Foundation Trust, London, UK
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, UK
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
  - Department of Public Health, Erasmus MC, Rotterdam, The Netherlands
- Daniel Stahl
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Andrea Danese
  - Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
  - National and Specialist CAMHS Clinic for Trauma, Anxiety, and Depression, South London and Maudsley NHS Foundation Trust, London, UK
11
Oh M, Zhang L. Generalizing predictions to unseen sequencing profiles via deep generative models. Sci Rep 2022; 12:7151. [PMID: 35504956 PMCID: PMC9065080 DOI: 10.1038/s41598-022-11363-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 04/22/2022] [Indexed: 11/26/2022] Open
Abstract
Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis.
Affiliation(s)
- Min Oh
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
12
Guan BZ, Parmigiani G, Braun D, Trippa L. PREDICTION OF HEREDITARY CANCERS USING NEURAL NETWORKS. Ann Appl Stat 2022; 16:495-520. [PMID: 37873507 PMCID: PMC10593124 DOI: 10.1214/21-aoas1510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions, based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate, due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.
Affiliation(s)
- By Zoe Guan
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center
- Danielle Braun
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Lorenzo Trippa
  - Department of Data Sciences, Dana-Farber Cancer Institute
13
Schwarzer A, Talbot SR, Selich A, Morgan M, Schott JW, Dittrich-Breiholz O, Bastone AL, Weigel B, Ha TC, Dziadek V, Gijsbers R, Thrasher AJ, Staal FJT, Gaspar HB, Modlich U, Schambach A, Rothe M. Predicting genotoxicity of viral vectors for stem cell gene therapy using gene expression-based machine learning. Mol Ther 2021; 29:3383-3397. [PMID: 34174440 PMCID: PMC8636173 DOI: 10.1016/j.ymthe.2021.06.017] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 02/26/2021] [Revised: 05/12/2021] [Accepted: 06/07/2021] [Indexed: 10/21/2022] Open
Abstract
Hematopoietic stem cell gene therapy is emerging as a promising therapeutic strategy for many diseases of the blood and immune system. However, several individuals who underwent gene therapy in different trials developed hematological malignancies caused by insertional mutagenesis. Preclinical assessment of vector safety remains challenging because there are few reliable assays to screen for potential insertional mutagenesis effects in vitro. Here we demonstrate that genotoxic vectors induce a unique gene expression signature linked to stemness and oncogenesis in transduced murine hematopoietic stem and progenitor cells. Based on this finding, we developed the surrogate assay for genotoxicity assessment (SAGA). SAGA classifies integrating retroviral vectors using machine learning to detect this gene expression signature during the course of in vitro immortalization. On a set of benchmark vectors with known genotoxic potential, SAGA achieved an accuracy of 90.9%. SAGA is more robust and sensitive and faster than previous assays and reliably predicts a mutagenic risk for vectors that led to leukemic severe adverse events in clinical trials. Our work provides a fast and robust tool for preclinical risk assessment of gene therapy vectors, potentially paving the way for safer gene therapy trials.
Affiliation(s)
- Adrian Schwarzer
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
  - Department of Hematology, Hemostasis, Oncology and Stem Cell Transplantation, Hannover Medical School, Hannover, Germany
- Steven R Talbot
  - Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
- Anton Selich
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Michael Morgan
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Juliane W Schott
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Antonella L Bastone
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Bettina Weigel
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Teng Cheong Ha
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Violetta Dziadek
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
- Rik Gijsbers
  - Molecular Virology and Gene Therapy, KU Leuven, Leuven, Belgium
- Adrian J Thrasher
  - Molecular and Cellular Immunology Section, UCL Great Ormond Street Institute of Child Health, London, UK
- Frank J T Staal
  - Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden 2333 ZA, the Netherlands
- Hubert B Gaspar
  - Molecular and Cellular Immunology Section, UCL Great Ormond Street Institute of Child Health, London, UK
- Ute Modlich
  - Research Group for Gene Modification in Stem Cells, Division of Veterinary Medicine, Paul Ehrlich Institute, Langen, Germany
- Axel Schambach
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Michael Rothe
  - Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
14
van Beek PE, Andriessen P, Onland W, Schuit E. Prognostic Models Predicting Mortality in Preterm Infants: Systematic Review and Meta-analysis. Pediatrics 2021; 147:peds.2020-020461. [PMID: 33879518 DOI: 10.1542/peds.2020-020461] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 01/27/2021] [Indexed: 11/24/2022] Open
Abstract
CONTEXT: Prediction models can be a valuable tool in performing risk assessment of mortality in preterm infants. OBJECTIVE: Summarizing prognostic models for predicting mortality in very preterm infants and assessing their quality. DATA SOURCES: Medline was searched for all articles (up to June 2020). STUDY SELECTION: All developed or externally validated prognostic models for mortality prediction in liveborn infants born <32 weeks' gestation and/or <1500 g birth weight were included. DATA EXTRACTION: Data were extracted by 2 independent authors. Risk of bias (ROB) and applicability assessment was performed by 2 independent authors using Prediction model Risk of Bias Assessment Tool. RESULTS: One hundred forty-two models from 35 studies reporting on model development and 112 models from 33 studies reporting on external validation were included. ROB assessment revealed high ROB in the majority of the models, most often because of inadequate (reporting of) analysis. Internal and external validation was lacking in 41% and 96% of these models. Meta-analyses revealed an average C-statistic of 0.88 (95% confidence interval [CI]: 0.83-0.91) for the Clinical Risk Index for Babies score, 0.87 (95% CI: 0.81-0.92) for the Clinical Risk Index for Babies II score, and 0.86 (95% CI: 0.78-0.92) for the Score for Neonatal Acute Physiology Perinatal Extension II score. LIMITATIONS: Occasionally, an external validation study was included, but not the development study, because studies developed in the presurfactant era or general NICU population were excluded. CONCLUSIONS: Instead of developing additional mortality prediction models for preterm infants, the emphasis should be shifted toward external validation and consecutive adaption of the existing prediction models.
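For readers unfamiliar with the C-statistic reported in this abstract: for a binary outcome such as mortality, it is the proportion of (event, non-event) pairs in which the event case received the higher predicted risk. The following is a minimal sketch of that definition only; the function name `c_statistic` and the data are illustrative assumptions and are not taken from the review.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    risks: predicted probabilities; outcomes: 1 = event, 0 = no event.
    Pairs with tied predicted risk count as half-concordant.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Illustrative data only (hypothetical, not from any study).
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
outcomes = [1, 0, 1, 0, 0]
print(c_statistic(risks, outcomes))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of events above non-events.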
Affiliation(s)
- Pauline E van Beek
  - Department of Neonatology, Máxima Medical Centre, Veldhoven, Netherlands
- Peter Andriessen
  - Department of Neonatology, Máxima Medical Centre, Veldhoven, Netherlands
  - Department of Applied Physics, School of Medical Physics and Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Wes Onland
  - Department of Neonatology, Amsterdam University Medical Centers and University of Amsterdam, Amsterdam, Netherlands
- Ewoud Schuit
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht and Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Center Utrecht and Utrecht University, Utrecht, Netherlands
15
Zhang Y, Bernau C, Parmigiani G, Waldron L. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models. Biostatistics 2020; 21:253-268. [PMID: 30202918 DOI: 10.1093/biostatistics/kxy044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/08/2018] [Revised: 07/22/2018] [Accepted: 08/04/2018] [Indexed: 11/13/2022] Open
Abstract
Cross-study validation (CSV) of prediction models is an alternative to traditional cross-validation (CV) in domains where multiple comparable datasets are available. Although many studies have noted potential sources of heterogeneity in genomic studies, to our knowledge none have systematically investigated their intertwined impacts on prediction accuracy across studies. We employ a hybrid parametric/non-parametric bootstrap method to realistically simulate publicly available compendia of microarray, RNA-seq, and whole metagenome shotgun microbiome studies of health outcomes. Three types of heterogeneity between studies are manipulated and studied: (i) imbalances in the prevalence of clinical and pathological covariates, (ii) differences in gene covariance that could be caused by batch, platform, or tumor purity effects, and (iii) differences in the "true" model that associates gene expression and clinical factors to outcome. We assess model accuracy, while altering these factors. Lower accuracy is seen in CSV than in CV. Surprisingly, heterogeneity in known clinical covariates and differences in gene covariance structure have very limited contributions in the loss of accuracy when validating in new studies. However, forcing identical generative models greatly reduces the within/across study difference. These results, observed consistently for multiple disease outcomes and omics platforms, suggest that the most easily identifiable sources of study heterogeneity are not necessarily the primary ones that undermine the ability to accurately replicate the accuracy of omics prediction models in new studies. Unidentified heterogeneity, such as could arise from unmeasured confounding, may be more important.
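The contrast between CSV and CV drawn in this abstract can be illustrated with index bookkeeping alone. The sketch below assumes one common form of CSV, in which a model is trained on one study and tested on each other study; whether the authors used exactly this pairing is not stated in the abstract, and the function name `cross_study_pairs` and the study labels are hypothetical.

```python
from itertools import permutations


def cross_study_pairs(study_labels):
    """Yield (train_study, test_study, train_idx, test_idx) for every ordered
    pair of distinct studies: train on one study, test on another."""
    by_study = {}
    for idx, study in enumerate(study_labels):
        by_study.setdefault(study, []).append(idx)
    for train_study, test_study in permutations(sorted(by_study), 2):
        yield train_study, test_study, by_study[train_study], by_study[test_study]


# Hypothetical labels: six samples drawn from three studies.
labels = ["A", "A", "B", "B", "B", "C"]
pairs = list(cross_study_pairs(labels))
for train_study, test_study, train_idx, test_idx in pairs:
    print(train_study, "->", test_study, train_idx, test_idx)
```

In ordinary CV, by contrast, training and test samples come from the same study, so between-study heterogeneity never enters the accuracy estimate.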
Affiliation(s)
- Yuqing Zhang
  - Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA, USA
- Christoph Bernau
  - Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, Germany
- Giovanni Parmigiani
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 3 Blackfan Cir, Boston, MA, USA
  - Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Levi Waldron
  - Graduate School of Public Health and Health Policy, Institute for Implementation Science in Population Health, City University of New York, 55 W 125th St, New York, NY, USA
16
Ubels J, Sonneveld P, van Vliet MH, de Ridder J. Gene Networks Constructed Through Simulated Treatment Learning can Predict Proteasome Inhibitor Benefit in Multiple Myeloma. Clin Cancer Res 2020; 26:5952-5961. [PMID: 32913136 DOI: 10.1158/1078-0432.ccr-20-0742] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/24/2020] [Revised: 05/27/2020] [Accepted: 09/03/2020] [Indexed: 11/16/2022]
Abstract
PURPOSE: Proteasome inhibitors are widely used in treating multiple myeloma, but can cause serious side effects and response varies among patients. It is, therefore, important to gain more insight into which patients will benefit from proteasome inhibitors. EXPERIMENTAL DESIGN: We introduce simulated treatment learned signatures (STLsig), a machine learning method to identify predictive gene expression signatures. STLsig uses genetically similar patients who have received an alternative treatment to model which patients will benefit more from proteasome inhibitors than from an alternative treatment. STLsig constructs gene networks by linking genes that are synergistic in their ability to predict benefit. RESULTS: In a dataset of 910 patients with multiple myeloma, STLsig identified two gene networks that together can predict benefit to the proteasome inhibitor, bortezomib. In class "benefit," we found an HR of 0.47 (P = 0.04) in favor of bortezomib, while in class "no benefit," the HR was 0.91 (P = 0.68). Importantly, we observed a similar performance (HR class benefit, 0.46; P = 0.04) in an independent patient cohort. Moreover, this signature also predicts benefit for the proteasome inhibitor, carfilzomib, indicating it is not specific to bortezomib. No equivalent signature can be found when the genes in the signature are excluded from the analysis, indicating that they are essential. Multiple genes in the signature are linked to working mechanisms of proteasome inhibitors or multiple myeloma disease progression. CONCLUSIONS: STLsig can identify gene signatures that could aid in treatment decisions for patients with multiple myeloma and provide insight into the biological mechanism behind treatment benefit.
Affiliation(s)
- Joske Ubels
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, the Netherlands
  - Oncode Institute, Utrecht, the Netherlands
  - Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
  - SkylineDx, Rotterdam, the Netherlands
- Pieter Sonneveld
  - Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
- Jeroen de Ridder
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, the Netherlands
  - Oncode Institute, Utrecht, the Netherlands
17
Herrmann M, Probst P, Hornung R, Jurinovic V, Boulesteix AL. Large-scale benchmark study of survival prediction methods using multi-omics data. Brief Bioinform 2020; 22:5895463. [PMID: 32823283 PMCID: PMC8138887 DOI: 10.1093/bib/bbaa167] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 03/07/2020] [Revised: 06/25/2020] [Accepted: 07/03/2020] [Indexed: 12/18/2022] Open
Abstract
Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups, especially clinical variables, from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on GitHub.
Affiliation(s)
- Moritz Herrmann
  - Department of Statistics, Ludwig Maximilian University, Munich, 80539, Germany
- Philipp Probst
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Roman Hornung
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Vindi Jurinovic
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
18
Watson OP, Cortes-Ciriano I, Taylor AR, Watson JA. A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics 2020; 35:4656-4663. [PMID: 31070704 PMCID: PMC6853675 DOI: 10.1093/bioinformatics/btz293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/10/2018] [Revised: 03/22/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023] Open
Abstract
Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand. Availability and implementation: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. Supplementary information: Supplementary data are available at Bioinformatics online.
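The idea of splitting on quantiles of the activity distribution, rather than at random, can be sketched in a few lines. This is not the authors' quantile-activity bootstrap (their notebooks at the URL above are the reference); it shows only a single split, and it assumes that training uses the lower-activity molecules and testing uses those above the chosen quantile. The function name `quantile_activity_split`, the parameter `q`, and the activity values are hypothetical.

```python
def quantile_activity_split(activities, q=0.75):
    """Return (train_idx, test_idx): indices of molecules below the q-th
    activity quantile (train) and of the remaining top fraction (test)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Rank molecule indices from lowest to highest measured activity.
    ranked = sorted(range(len(activities)), key=lambda i: activities[i])
    cut = int(len(ranked) * q)
    return ranked[:cut], ranked[cut:]


# Hypothetical activity values for eight molecules.
activities = [5.1, 7.9, 6.0, 8.4, 4.2, 6.7, 7.2, 5.5]
train_idx, test_idx = quantile_activity_split(activities, q=0.75)
print(train_idx, test_idx)
```

Because every test molecule is more active than every training molecule, a model evaluated this way has to extrapolate, which is the behaviour that random partitioning fails to penalize.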
Affiliation(s)
- Isidro Cortes-Ciriano
  - Goring on Thames, Evariste Technologies Ltd., RG8 9AL UK
  - Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Aimee R Taylor
  - Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA 02115 USA
  - Infectious Disease Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
- James A Watson
  - Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3 7LF, UK
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
19
Vollmer S, Mateen BA, Bohner G, Király FJ, Ghani R, Jonsson P, Cumbers S, Jonas A, McAllister KSL, Myles P, Granger D, Birse M, Branson R, Moons KGM, Collins GS, Ioannidis JPA, Holmes C, Hemingway H. Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 2020; 368:l6927. [PMID: 32198138 DOI: 10.1136/bmj.l6927] [Citation(s) in RCA: 155] [Impact Index Per Article: 38.8] [Indexed: 12/11/2022]
Affiliation(s)
- Sebastian Vollmer
  - Alan Turing Institute, Kings Cross, London, UK
  - Departments of Mathematics and Statistics, University of Warwick, Coventry, UK
- Bilal A Mateen
  - Alan Turing Institute, Kings Cross, London, UK
  - Warwick Medical School, University of Warwick, Coventry, UK
  - Kings College Hospital, Denmark Hill, London, UK
- Gergo Bohner
  - Alan Turing Institute, Kings Cross, London, UK
  - Departments of Mathematics and Statistics, University of Warwick, Coventry, UK
- Franz J Király
  - Alan Turing Institute, Kings Cross, London, UK
  - Department of Statistical Science, University College London, London, UK
- Pall Jonsson
  - Science Policy and Research, National Institute for Health and Care Excellence, Manchester, UK
- Sarah Cumbers
  - Health and Social Care Directorate, National Institute for Health and Care Excellence, London, UK
- Adrian Jonas
  - Data and Analytics Group, National Institute for Health and Care Excellence, London, UK
- Puja Myles
  - Clinical Practice Research Datalink, Medicines and Healthcare products Regulatory Agency, London, UK
- David Granger
  - Medicines and Healthcare products Regulatory Agency, London, UK
- Mark Birse
  - Medicines and Healthcare products Regulatory Agency, London, UK
- Richard Branson
  - Medicines and Healthcare products Regulatory Agency, London, UK
- Karel G M Moons
  - Julius Centre for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, Netherlands
- Gary S Collins
  - UK EQUATOR Centre, Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
- John P A Ioannidis
  - Meta-Research Innovation Centre at Stanford, Stanford University, Stanford, CA, USA
- Chris Holmes
  - Alan Turing Institute, Kings Cross, London, UK
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Harry Hemingway
  - Health Data Research UK London, University College London, London, UK
  - Institute of Health Informatics, University College London, London, UK
  - National Institute for Health Research, University College London Hospitals Biomedical Research Centre, University College London, London, UK
20
Shi L, Westerhuis JA, Rosén J, Landberg R, Brunius C. Variable selection and validation in multivariate modelling. Bioinformatics 2019; 35:972-980. [PMID: 30165467 PMCID: PMC6419897 DOI: 10.1093/bioinformatics/bty710] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Received: 11/24/2017] [Revised: 07/04/2018] [Accepted: 08/24/2018] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION: Validation of variable selection and predictive performance is crucial in construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection leads instead to selection bias, thereby increasing the risk of model overfitting and false positive discoveries. Although several algorithms exist to identify a minimal set of most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms combining identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed. RESULTS: We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination in a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omics datasets, MUVR yielded parsimonious models with minimal overfitting and improved model performance compared with state-of-the-art rdCV. Moreover, MUVR showed advantages over other variable selection algorithms, i.e. Boruta and VSURF, including simultaneous variable selection and validation scheme and wider applicability. AVAILABILITY AND IMPLEMENTATION: Algorithms, data, scripts and tutorial are open source and available as an R package ('MUVR') at https://gitlab.com/CarlBrunius/MUVR.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
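The repeated double cross-validation (rdCV) structure named in this abstract nests an inner train/validation loop inside an outer loop whose test fold is never used for tuning. The sketch below generates only the index sets for that nesting. It is not the MUVR algorithm: the recursive variable elimination that MUVR performs in the inner loop is omitted, MUVR itself is an R package, and the function name `repeated_double_cv` with its parameters is a hypothetical illustration in Python.

```python
import random


def repeated_double_cv(n_samples, n_outer=3, n_inner=2, n_repeats=2, seed=0):
    """Yield (repeat, outer, inner, train_idx, validation_idx, test_idx).

    Each repeat reshuffles the samples. The outer loop holds out a test fold;
    the inner loop splits the remaining samples into train and validation
    folds, so tuning never touches the outer test fold.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        outer_folds = [order[f::n_outer] for f in range(n_outer)]
        for o, test in enumerate(outer_folds):
            rest = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
            inner_folds = [rest[f::n_inner] for f in range(n_inner)]
            for v, validation in enumerate(inner_folds):
                train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
                yield rep, o, v, train, validation, test


splits = list(repeated_double_cv(n_samples=12))
print(len(splits))
```

In a full procedure, models would be tuned (and variables selected) on the train/validation folds and scored once on each outer test fold, with results aggregated over repeats.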
Affiliation(s)
- Lin Shi
  - Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
  - Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
- Johan A Westerhuis
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam XH, The Netherlands
  - Metabolomics Center, North-West University, X6001, Potchefstroom, South Africa
- Johan Rosén
  - Swedish National Food Agency, Uppsala, Sweden
- Rikard Landberg
  - Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
  - Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
- Carl Brunius
  - Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
21
Allahyar A, Ubels J, de Ridder J. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction. PLoS Comput Biol 2019; 15:e1006657. [PMID: 30726216 PMCID: PMC6380593 DOI: 10.1371/journal.pcbi.1006657] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/13/2018] [Revised: 02/19/2019] [Accepted: 11/20/2018] [Indexed: 12/13/2022] Open
Abstract
Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and for instance suggest that utilizing random networks performs on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome.
Affiliation(s)
- Amin Allahyar
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Joske Ubels
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Skyline DX, Rotterdam
  - Department of Hematology, Erasmus MC Cancer Institute, Rotterdam
- Jeroen de Ridder
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
22
Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med 2019; 170:W1-W33. [PMID: 30596876 DOI: 10.7326/m18-1377] [Citation(s) in RCA: 682] [Impact Index Per Article: 136.4] [Indexed: 11/22/2022] Open
Abstract
Prediction models in health care use predictors to estimate for an individual the probability that a condition or disease is already present (diagnostic model) or will occur in the future (prognostic model). Publications on prediction models have become more common in recent years, and competing prediction models frequently exist for the same outcome or target population. Health care providers, guideline developers, and policymakers are often unsure which model to use or recommend, and in which persons or settings. Hence, systematic reviews of these studies are increasingly demanded, required, and performed. A key part of a systematic review of prediction models is examination of risk of bias and applicability to the intended population and setting. To help reviewers with this process, the authors developed PROBAST (Prediction model Risk Of Bias ASsessment Tool) for studies developing, validating, or updating (for example, extending) prediction models, both diagnostic and prognostic. PROBAST was developed through a consensus process involving a group of experts in the field. It includes 20 signaling questions across 4 domains (participants, predictors, outcome, and analysis). This explanation and elaboration document describes the rationale for including each domain and signaling question and guides researchers, reviewers, readers, and guideline developers in how to use them to assess risk of bias and applicability concerns. All concepts are illustrated with published examples across different topics. The latest version of the PROBAST checklist, accompanying documents, and filled-in examples can be downloaded from www.probast.org.
Affiliation(s)
- Karel G M Moons
  - Julius Center for Health Sciences and Primary Care and Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands (K.G.M., J.B.R.)
- Robert F Wolff
  - Kleijnen Systematic Reviews, York, United Kingdom (R.F.W., M.W.)
- Richard D Riley
  - Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Keele, United Kingdom (R.D.R.)
- Penny F Whiting
  - Bristol Medical School of the University of Bristol and National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care West, University Hospitals Bristol National Health Service Foundation Trust, Bristol, United Kingdom (P.F.W.)
- Marie Westwood
  - Kleijnen Systematic Reviews, York, United Kingdom (R.F.W., M.W.)
- Gary S Collins
  - Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom (G.S.C.)
- Johannes B Reitsma
  - Julius Center for Health Sciences and Primary Care and Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands (K.G.M., J.B.R.)
- Jos Kleijnen
  - Kleijnen Systematic Reviews, York, United Kingdom, and School for Public Health and Primary Care, Maastricht University, Maastricht, the Netherlands (J.K.)
- Sue Mallett
  - Institute of Applied Health Research, National Institute for Health Research Birmingham Biomedical Research Centre, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom (S.M.)
23
Cui L, Lu H, Lee YH. Challenges and emergent solutions for LC-MS/MS based untargeted metabolomics in diseases. Mass Spectrometry Reviews 2018; 37:772-792. [PMID: 29486047 DOI: 10.1002/mas.21562] [Citation(s) in RCA: 197] [Impact Index Per Article: 32.8] [Received: 05/22/2017] [Accepted: 02/02/2018] [Indexed: 05/03/2023]
Abstract
In the past decade, advances in liquid chromatography-mass spectrometry (LC-MS) have revolutionized untargeted metabolomics analyses. By mining metabolomes more deeply, researchers are now primed to uncover key metabolites and their associations with diseases. The employment of untargeted metabolomics has led to new biomarker discoveries and a better mechanistic understanding of diseases, with applications in precision medicine. However, many major pertinent challenges remain. First, compound identification has been poor, leaving an overwhelming number of unidentified peaks. Second, partial, incomplete metabolomes persist due to factors such as limitations in mass spectrometry data acquisition speeds, the wide range of metabolite concentrations, and cellular/tissue/temporal-specific expression changes that confound our understanding of metabolite perturbations. Third, contextualizing metabolites in pathways and biology is difficult because many metabolites partake in multiple pathways, have species specificity that has yet to be described, or possess unannotated or more complex functions that are not easily characterized through metabolomics analyses. From a translational perspective, information related to novel metabolite biomarkers, metabolic pathways, and drug targets might be sparser than it should be. Thankfully, significant progress has been made and novel solutions are emerging, achieved through sustained academic and industrial community efforts in terms of hardware, computational, and experimental approaches. Given the rapidly growing utility of metabolomics, this review offers new perspectives and aims to increase awareness of the major challenges in LC-MS metabolomics, to the benefit of the metabolomics community and the broader biomedical community that metabolomics aspires to serve.
Affiliation(s)
- Liang Cui
- Translational 'Omics and Biomarkers Group, KK Research Centre, KK Women's and Children's Hospital, Singapore, Singapore
- Infectious Diseases-Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Haitao Lu
- Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Yie Hou Lee
- Translational 'Omics and Biomarkers Group, KK Research Centre, KK Women's and Children's Hospital, Singapore, Singapore
- OBGYN-Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
|
24
|
Abstract
The digital world is generating data at a staggering and still increasing rate. While these "big data" have unlocked novel opportunities to understand public health, they hold still greater potential for research and practice. This review explores several key issues that have arisen around big data. First, we propose a taxonomy of sources of big data to clarify terminology and identify threads common across some subtypes of big data. Next, we consider common public health research and practice uses for big data, including surveillance, hypothesis-generating research, and causal inference, while exploring the role that machine learning may play in each use. We then consider the ethical implications of the big data revolution with particular emphasis on maintaining appropriate care for privacy in a world in which technology is rapidly changing social norms regarding the need for (and even the meaning of) privacy. Finally, we make suggestions regarding structuring teams and training to succeed in working with big data in research and practice.
Affiliation(s)
- Stephen J Mooney
- Harborview Injury Prevention and Research Center, University of Washington, Seattle, Washington 98122, USA
- Vikas Pejaver
- Department of Biomedical Informatics and Medical Education and the eScience Institute, University of Washington, Seattle, Washington 98109, USA
|
25
|
Biomarker Guidelines for High-Dimensional Genomic Studies in Transplantation: Adding Method to the Madness. Transplantation 2018; 101:457-463. [PMID: 28212255 DOI: 10.1097/tp.0000000000001622] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022]
|
26
|
Rahmatallah Y, Khaidakov M, Lai KK, Goyne HE, Lamps LW, Hagedorn CH, Glazko G. Platform-independent gene expression signature differentiates sessile serrated adenomas/polyps and hyperplastic polyps of the colon. BMC Med Genomics 2017; 10:81. [PMID: 29284484 PMCID: PMC5745747 DOI: 10.1186/s12920-017-0317-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/24/2017] [Accepted: 12/14/2017] [Indexed: 12/18/2022] Open
Abstract
Background: Sessile serrated adenomas/polyps are distinguished from hyperplastic colonic polyps subjectively by their endoscopic appearance and histological morphology. However, hyperplastic and sessile serrated polyps can have overlapping morphological features, resulting in sessile serrated polyps being diagnosed as hyperplastic. While sessile serrated polyps can progress into colon cancer, hyperplastic polyps have virtually no risk for colon cancer. Objective measures differentiating these types of polyps would improve cancer prevention and treatment outcome. Methods: An RNA-seq training data set and Affymetrix and Illumina testing data sets were obtained from Gene Expression Omnibus (GEO). RNA-seq single-end reads were filtered with the FastX toolkit. Read mapping to the human genome, gene abundance estimation, and differential expression analysis were performed with the Tophat-Cufflinks pipeline. Background correction, normalization, and probe summarization steps for Affymetrix arrays were performed using the robust multi-array method (RMA). For Illumina arrays, log2-scale expression data was obtained from GEO. Pathway analysis was implemented using the Bioconductor package GSAR. To build a platform-independent molecular classifier that accurately differentiates sessile serrated and hyperplastic polyps we developed a new feature selection step. We also developed a simple procedure to classify new samples as either sessile serrated or hyperplastic with a class probability assigned to the decision, estimated using Cantelli's inequality. Results: The classifier trained on RNA-seq data and tested on two independent microarray data sets resulted in zero and three errors. The classifier was further tested using quantitative real-time PCR expression levels of 45 blinded independent formalin-fixed paraffin-embedded specimens and was highly accurate. Pathway analyses have shown that sessile serrated polyps are distinguished from hyperplastic polyps and normal controls by: up-regulation of pathways implicated in proliferation, inflammation, cell-cell adhesion and down-regulation of serine threonine kinase signaling pathway; differential co-expression of pathways regulating cell division, protein trafficking and kinase activities. Conclusions: Most of the differentially expressed pathways are known as hallmarks of cancer and likely explain why sessile serrated polyps are more prone to neoplastic transformation than hyperplastic polyps. The new molecular classifier includes 13 genes and may facilitate objective differentiation between the two polyp types.
Affiliation(s)
- Yasir Rahmatallah
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Magomed Khaidakov
- The Central Arkansas Veterans Healthcare System, Little Rock, AR, 72205, USA; Department of Medicine, Division of Gastroenterology and Hepatology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Keith K Lai
- Department of Anatomic Pathology, Cleveland Clinic, Cleveland, OH, 44195, USA
- Hannah E Goyne
- Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Laura W Lamps
- Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Curt H Hagedorn
- The Central Arkansas Veterans Healthcare System, Little Rock, AR, 72205, USA; Department of Medicine, Division of Gastroenterology and Hepatology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Galina Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
|
27
|
Hornung R, Causeur D, Bernau C, Boulesteix AL. Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics 2017; 33:397-404. [PMID: 27797760 DOI: 10.1093/bioinformatics/btw650] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/22/2022] Open
Abstract
Motivation: To date most medical tests derived by applying classification methods to high-dimensional molecular data are hardly used in clinical practice. This is partly because the prediction error resulting when applying them to external data is usually much higher than internal error as evaluated through within-study validation procedures. We suggest the use of addon normalization and addon batch effect removal techniques in this context to reduce systematic differences between external data and the original dataset with the aim to improve prediction performance. Results: We evaluate the impact of addon normalization and seven batch effect removal methods on cross-study prediction performance for several common classifiers using a large collection of microarray gene expression datasets, showing that some of these techniques reduce prediction error. Availability and Implementation: All investigated addon methods are implemented in our R package bapred. Contact: hornung@ibe.med.uni-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Roman Hornung
- Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Munich, Germany
- David Causeur
- Applied Mathematics Department, Agrocampus Ouest, Rennes, France
- Anne-Laure Boulesteix
- Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Munich, Germany
|
28
|
Ioannidis JPA, Bossuyt PMM. Waste, Leaks, and Failures in the Biomarker Pipeline. Clin Chem 2017; 63:963-972. [DOI: 10.1373/clinchem.2016.254649] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 08/15/2016] [Accepted: 11/30/2016] [Indexed: 01/05/2023]
Abstract
BACKGROUND
The large, expanding literature on biomarkers is characterized by almost ubiquitous significant results, with claims about the potential importance, but few of these discovered biomarkers are used in routine clinical care.
CONTENT
The pipeline of biomarker development includes several specific stages: discovery, validation, clinical translation, evaluation, implementation (and, in the case of nonutility, deimplementation). Each of these stages can be plagued by problems that cause failures of the overall pipeline. Some problems are nonspecific challenges for all biomedical investigation, while others are specific to the peculiarities of biomarker research. Discovery suffers from poor methods and incomplete and selective reporting. External independent validation is limited. Selection for clinical translation is often shaped by nonrational choices. Evaluation is sparse and the clinical utility of many biomarkers remains unknown. The regulatory environment for biomarkers remains weak and guidelines can reach biased or divergent recommendations. Removing inefficient or even harmful biomarkers that have been entrenched in clinical care can meet with major resistance.
SUMMARY
The current biomarker pipeline is too prone to failures. Consideration of clinical needs should become a starting point for the development of biomarkers. Improvements can include the use of more stringent methodology, better reporting, larger collaborative studies, careful external independent validation, preregistration, rigorous systematic reviews and umbrella reviews, pivotal randomized trials, and implementation and deimplementation studies. Incentives should be aligned toward delivering useful biomarkers.
Affiliation(s)
- John P A Ioannidis
- Departments of Medicine, Health Research and Policy, and Statistics, and the Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA
- Patrick M M Bossuyt
- Department of Clinical Epidemiology, Biostatistics & Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
|
29
|
Marcus MW, Field JK. Is Bootstrapping Sufficient for Validating a Risk Model for Selection of Participants for a Lung Cancer Screening Program? J Clin Oncol 2017; 35:818-819. [DOI: 10.1200/jco.2016.71.3214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open
Affiliation(s)
- Michael W. Marcus
- Michael W. Marcus and John K. Field, The University of Liverpool, Liverpool, United Kingdom
- John K. Field
- Michael W. Marcus and John K. Field, The University of Liverpool, Liverpool, United Kingdom
|
30
|
Characteristics and Validation Techniques for PCA-Based Gene-Expression Signatures. Int J Genomics 2017; 2017:2354564. [PMID: 28265563 PMCID: PMC5317117 DOI: 10.1155/2017/2354564] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/27/2016] [Revised: 12/15/2016] [Accepted: 01/04/2017] [Indexed: 11/30/2022] Open
Abstract
Background. Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. Our hypothesis is that gene signatures can be validated when applied to new datasets, using inherent properties of PCA. Results. This validation is based on four key concepts. Coherence: elements of a gene signature should be correlated beyond chance. Uniqueness: the general direction of the data being examined can drive most of the observed signal. Robustness: if a gene signature is designed to measure a single biological effect, then this signal should be sufficiently strong and distinct compared to other signals within the signature. Transferability: the derived PCA gene signature score should describe the same biology in the target dataset as it does in the training dataset. Conclusions. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those that the signatures were trained upon. Complex signatures, describing multiple independent biological components, are also easily identified.
|
31
|
Timmons JA. Molecular Diagnostics of Ageing and Tackling Age-related Disease. Trends Pharmacol Sci 2016; 38:67-80. [PMID: 27979318 DOI: 10.1016/j.tips.2016.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/31/2016] [Revised: 11/08/2016] [Accepted: 11/08/2016] [Indexed: 10/25/2022]
Abstract
As average life expectancy increases there is a greater focus on health-span and, in particular, how to treat or prevent chronic age-associated diseases. Therapies able to control 'biological age', with the aim of postponing chronic and costly diseases of old age, require an entirely new approach to drug development. Molecular technologies and machine-learning methods have already yielded diagnostics that help guide cancer treatment and cardiovascular procedures. Discovery of valid and clinically informative diagnostics of human biological age (combined with disease-specific biomarkers) has the potential to alter current drug-discovery strategies, aid clinical trial recruitment and maximize healthy ageing. I will review some basic principles that govern the development of 'ageing' diagnostics and how such assays could be used during the drug-discovery or development process. Important logistical and statistical considerations are illustrated by reviewing recent biomarker activity in the field of Alzheimer's disease, as dementia represents the most pressing of priorities for the pharmaceutical industry, as well as the chronic disease in humans most associated with age.
Affiliation(s)
- James A Timmons
- Division of Genetics and Molecular Medicine, King's College London, London, England; XRGenomics Ltd, Scion House, Stirlingshire, Scotland
|
32
|
Abram TJ, Floriano PN, Christodoulides N, James R, Kerr AR, Thornhill MH, Redding SW, Vigneswaran N, Speight PM, Vick J, Murdoch C, Freeman C, Hegarty AM, D'Apice K, Phelan JA, Corby PM, Khouly I, Bouquot J, Demian NM, Weinstock YE, Rowan S, Yeh CK, McGuff HS, Miller FR, Gaur S, Karthikeyan K, Taylor L, Le C, Nguyen M, Talavera H, Raja R, Wong J, McDevitt JT. 'Cytology-on-a-chip' based sensors for monitoring of potentially malignant oral lesions. Oral Oncol 2016; 60:103-11. [PMID: 27531880 DOI: 10.1016/j.oraloncology.2016.07.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/27/2016] [Revised: 06/30/2016] [Accepted: 07/02/2016] [Indexed: 12/11/2022]
Abstract
Despite significant advances in surgical procedures and treatment, long-term prognosis for patients with oral cancer remains poor, with survival rates among the lowest of major cancers. Better methods are desperately needed to identify potential malignancies early, when treatments are more effective. OBJECTIVE: To develop robust classification models from cytology-on-a-chip measurements that mirror the diagnostic performance of the gold standard approach involving tissue biopsy. MATERIALS AND METHODS: Measurements were recorded from 714 prospectively recruited patients with suspicious lesions across 6 diagnostic categories (each confirmed by tissue biopsy histopathology) using a powerful new 'cytology-on-a-chip' approach capable of executing high content analysis at a single cell level. Over 200 cellular features related to biomarker expression, nuclear parameters and cellular morphology were recorded per cell. By cataloging an average of 2000 cells per patient, these efforts resulted in nearly 13 million indexed objects. RESULTS: Binary "low-risk"/"high-risk" models yielded AUC values of 0.88 and 0.84 for training and validation models, respectively, with an accompanying difference in sensitivity+specificity of 6.2%. In terms of accuracy, this model predicted the correct diagnosis approximately 70% of the time, compared to the 69% initial agreement rate of the pool of expert pathologists. Key parameters identified in these models included cell circularity, Ki67 and EGFR expression, nuclear-cytoplasmic ratio, nuclear area, and cell area. CONCLUSIONS: This chip-based approach yields objective data that can be leveraged for diagnosis and management of patients with potentially malignant oral lesions (PMOL) as well as uncovering new molecular-level insights behind cytological differences across the oral epithelial dysplasia (OED) spectrum.
Affiliation(s)
- Timothy J Abram
- Rice University, Department of Bioengineering, Houston, TX, USA
- A Ross Kerr
- New York University College of Dentistry, Department of Oral and Maxillofacial Pathology, Radiology & Medicine, New York, NY, USA
- Martin H Thornhill
- Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
- Spencer W Redding
- The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA
- Nadarajah Vigneswaran
- The University of Texas Health Science Center at Houston, Department of Diagnostic and Biomedical Sciences, Houston, TX, USA
- Paul M Speight
- Academic Unit of Oral & Maxillofacial Pathology, University of Sheffield School of Clinical Dentistry, Sheffield, UK
- Craig Murdoch
- Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
- Christine Freeman
- Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
- Anne M Hegarty
- Unit of Oral Medicine, Charles Clifford Dental Hospital, Sheffield Teaching Hospitals National Health Service Foundation Trust, Sheffield, UK
- Katy D'Apice
- Unit of Oral Medicine, Charles Clifford Dental Hospital, Sheffield Teaching Hospitals National Health Service Foundation Trust, Sheffield, UK
- Joan A Phelan
- New York University College of Dentistry, Department of Oral and Maxillofacial Pathology, Radiology & Medicine, New York, NY, USA
- Patricia M Corby
- New York University School of Medicine, Department of Population Health and Radiation Oncology, New York, NY, USA
- Ismael Khouly
- New York University College of Dentistry, Bluestone Center for Clinical Research, New York, NY, USA
- Jerry Bouquot
- The University of Texas Health Science Center at Houston, Department of Diagnostic and Biomedical Sciences, Houston, TX, USA
- Nagi M Demian
- The University of Texas Health Science Center at Houston, Department of Oral and Maxillofacial Surgery, Houston, TX, USA
- Y Etan Weinstock
- The University of Texas Health Science Center at Houston, Department of Otolaryngology-Head and Neck Surgery, Houston, TX, USA
- Stephanie Rowan
- The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA
- Chih-Ko Yeh
- The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA; South Texas Veterans Health Care System, Geriatric Research, Education, and Clinical Center, San Antonio, TX, USA
- H Stan McGuff
- The University of Texas Health Science Center at San Antonio, Department of Pathology, San Antonio, TX, USA
- Frank R Miller
- The University of Texas Health Science Center at San Antonio, Department of Otolaryngology-Head and Neck Surgery and Cancer Therapy and Research Center, San Antonio, TX, USA
- Surabhi Gaur
- Rice University, Department of Bioengineering, Houston, TX, USA
- Leander Taylor
- Rice University, Department of Bioengineering, Houston, TX, USA
- Cathy Le
- Rice University, Department of Bioengineering, Houston, TX, USA
- Michael Nguyen
- Rice University, Department of Bioengineering, Houston, TX, USA
- Rameez Raja
- Rice University, Department of Bioengineering, Houston, TX, USA
- Jorge Wong
- Rice University, Department of Bioengineering, Houston, TX, USA
- John T McDevitt
- Rice University, Department of Bioengineering, Houston, TX, USA; Rice University, Department of Chemistry, Houston, TX, USA; New York University, Department of Biomaterials, New York, NY, USA
|
33
|
Wishart DS. Emerging applications of metabolomics in drug discovery and precision medicine. Nat Rev Drug Discov 2016; 15:473-84. [PMID: 26965202 DOI: 10.1038/nrd.2016.32] [Citation(s) in RCA: 879] [Impact Index Per Article: 109.9] [Indexed: 02/08/2023]
Abstract
Metabolomics is an emerging 'omics' science involving the comprehensive characterization of metabolites and metabolism in biological systems. Recent advances in metabolomics technologies are leading to a growing number of mainstream biomedical applications. In particular, metabolomics is increasingly being used to diagnose disease, understand disease mechanisms, identify novel drug targets, customize drug treatments and monitor therapeutic outcomes. This Review discusses some of the latest technological advances in metabolomics, focusing on the application of metabolomics towards uncovering the underlying causes of complex diseases (such as atherosclerosis, cancer and diabetes), the growing role of metabolomics in drug discovery and its potential effect on precision medicine.
Affiliation(s)
- David S Wishart
- Department of Biological Sciences, CW 405, Biological Sciences Building, University of Alberta, Edmonton, Alberta, Canada T6G 2E9; Department of Computing Science, 2-21 Athabasca Hall, University of Alberta, Edmonton, Alberta, Canada T6G 2E8; National Institute of Nanotechnology, National Research Council, Edmonton, Alberta, Canada T6G 2M9
|
34
|
Chang Y, Glass K, Liu YY, Silverman EK, Crapo JD, Tal-Singer R, Bowler R, Dy J, Cho M, Castaldi P. COPD subtypes identified by network-based clustering of blood gene expression. Genomics 2016; 107:51-58. [PMID: 26773458 DOI: 10.1016/j.ygeno.2016.01.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/13/2015] [Revised: 12/04/2015] [Accepted: 01/06/2016] [Indexed: 01/22/2023]
Abstract
One of the most common smoking-related diseases, chronic obstructive pulmonary disease (COPD), results from a dysregulated, multi-tissue inflammatory response to cigarette smoke. We hypothesized that systemic inflammatory signals in genome-wide blood gene expression can identify clinically important COPD-related disease subtypes, and we leveraged pre-existing gene interaction networks to guide unsupervised clustering of blood microarray expression data. Using network-informed non-negative matrix factorization, we analyzed genome-wide blood gene expression from 229 former smokers in the ECLIPSE Study, and we identified novel, clinically relevant molecular subtypes of COPD. These network-informed clusters were more stable and more strongly associated with measures of lung structure and function than clusters derived from a network-naïve approach, and they were associated with subtype-specific enrichment for inflammatory and protein catabolic pathways. These clusters were successfully reproduced in an independent sample of 135 smokers from the COPDGene Study.
Affiliation(s)
- Yale Chang
- Department of Computer Science, Northeastern University, Boston, USA
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, USA
- Yang-Yu Liu
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, USA
- Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, USA; Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- James D Crapo
- Department of Medicine, National Jewish Health, Denver, USA
- Russ Bowler
- Department of Medicine, National Jewish Health, Denver, USA
- Jennifer Dy
- Department of Computer Science, Northeastern University, Boston, USA
- Michael Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, USA; Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Peter Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, USA; Division of General Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
|
35
|
Arnold JM, Choi WT, Sreekumar A, Maletić-Savatić M. Analytical strategies for studying stem cell metabolism. ACTA ACUST UNITED AC 2015. [PMID: 26213533 DOI: 10.1007/s11515-015-1357-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Owing to their capacity for self-renewal and pluripotency, stem cells possess untold potential for revolutionizing the field of regenerative medicine through the development of novel therapeutic strategies for treating cancer, diabetes, cardiovascular and neurodegenerative diseases. Central to developing these strategies is improving our understanding of biological mechanisms responsible for governing stem cell fate and self-renewal. Increasing attention is being given to the significance of metabolism, through the production of energy and generation of small molecules, as a critical regulator of stem cell functioning. Rapid advances in the field of metabolomics now allow for in-depth profiling of stem cells both in vitro and in vivo, providing a systems perspective on key metabolic and molecular pathways which influence stem cell biology. Understanding the analytical platforms and techniques that are currently used to study stem cell metabolomics, as well as how new insights can be derived from this knowledge, will accelerate new research in the field and improve future efforts to expand our understanding of the interplay between metabolism and stem cell biology.
Collapse
Affiliation(s)
- James M Arnold
- Department of Molecular and Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA
- William T Choi
- Program in Developmental Biology and Medical Scientist Training Program, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA
- Arun Sreekumar
- Department of Molecular and Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Mirjana Maletić-Savatić
- Program in Developmental Biology and Medical Scientist Training Program, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA; Departments of Pediatrics-Neurology and Neuroscience, and Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
|
36
|
Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162:W1-73. [PMID: 25560730 DOI: 10.7326/m14-0698] [Citation(s) in RCA: 2928] [Impact Index Per Article: 325.3] [Indexed: 02/06/2023]
Abstract
The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) Statement includes a 22-item checklist, which aims to improve the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. This explanation and elaboration document describes the rationale; clarifies the meaning of each item; and discusses why transparent reporting is important, with a view to assessing risk of bias and clinical usefulness of the prediction model. Each checklist item of the TRIPOD Statement is explained in detail and accompanied by published examples of good reporting. The document also provides a valuable reference of issues to consider when designing, conducting, and analyzing prediction model studies. To aid the editorial process and help peer reviewers and, ultimately, readers and systematic reviewers of prediction model studies, it is recommended that authors include a completed checklist in their submission. The TRIPOD checklist can also be downloaded from www.tripod-statement.org.
|
37
|
Collins GS. Statistical flaws in the development of a prediction model. Am J Obstet Gynecol 2015; 212:116. [PMID: 25218126 DOI: 10.1016/j.ajog.2014.09.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/01/2014] [Accepted: 09/04/2014] [Indexed: 10/24/2022]
|
38
|
Shou Y, Robinson DM, Amakye DD, Rose KL, Cho YJ, Ligon KL, Sharp T, Haider AS, Bandaru R, Ando Y, Geoerger B, Doz F, Ashley DM, Hargrave DR, Casanova M, Tawbi HA, Rodon J, Thomas AL, Mita AC, MacDonald TJ, Kieran MW. A five-gene hedgehog signature developed as a patient preselection tool for hedgehog inhibitor therapy in medulloblastoma. Clin Cancer Res 2014; 21:585-93. [PMID: 25473003 DOI: 10.1158/1078-0432.ccr-13-1711] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/16/2022]
Abstract
PURPOSE: Distinct molecular subgroups of medulloblastoma, including hedgehog (Hh) pathway-activated disease, have been reported. We identified and clinically validated a five-gene Hh signature assay that can be used to preselect patients with Hh pathway-activated medulloblastoma. EXPERIMENTAL DESIGN: Gene characteristics of the Hh medulloblastoma subgroup were identified through published bioinformatic analyses. Thirty-two genes shown to be differentially expressed in fresh-frozen and formalin-fixed paraffin-embedded tumor samples and reproducibly analyzed by RT-PCR were measured in matched samples. These data formed the basis for building a multi-gene logistic regression model derived through elastic net methods from which the five-gene Hh signature emerged after multiple iterations. On the basis of signature gene expression levels, the model computed a propensity score to determine Hh activation using a threshold set a priori. The association between Hh activation status and tumor response to the Hh pathway inhibitor sonidegib (LDE225) was analyzed. RESULTS: Five differentially expressed genes in medulloblastoma (GLI1, SPHK1, SHROOM2, PDLIM3, and OTX2) were found to associate with Hh pathway activation status. In an independent validation study, Hh activation status of 25 medulloblastoma samples showed 100% concordance between the five-gene signature and Affymetrix profiling. Further, in medulloblastoma samples from 50 patients treated with sonidegib, all 6 patients who responded were found to have Hh-activated tumors. Three patients with Hh-activated tumors had stable or progressive disease. No patients with Hh-nonactivated tumors responded. CONCLUSIONS: This five-gene Hh signature can robustly identify Hh-activated medulloblastoma and may be used to preselect patients who might benefit from sonidegib treatment.
Affiliation(s)
- Yaping Shou
- Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
- Douglas M Robinson
- Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
- Dereck D Amakye
- Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
- Kristine L Rose
- Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
- Yoon-Jae Cho
- Departments of Neurology and Neurosurgery, Stanford University School of Medicine, Stanford, California
- Keith L Ligon
- Pediatric Neuro-Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts. Department of Pathology, Children's Hospital Boston, Brigham and Women's Hospital, and Harvard Medical School, Boston, Massachusetts. Department of Medical Oncology and Center for Molecular Oncologic Pathology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Thad Sharp
- Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
- Asifa S Haider
- Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
- Raj Bandaru
- Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
- Birgit Geoerger
- Institut Gustave Roussy, University Paris-Sud, Villejuif, France
- François Doz
- Institut Curie and University Paris Descartes, Sorbonne Paris Cité, France
- Hussein A Tawbi
- University of Pittsburgh Cancer Institute and University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Jordi Rodon
- Vall d'Hebron Institut d'Oncologia, and Universitat Autonoma de Barcelona, Barcelona, Spain
- Alain C Mita
- Cancer Therapy and Research Center, University of Texas Health Science Center, San Antonio, Texas
- Tobey J MacDonald
- Children's Healthcare of Atlanta, Aflac Cancer and Blood Disorders Center, Emory University School of Medicine, Atlanta, Georgia
- Mark W Kieran
- Pediatric Neuro-Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts.
|
39
|
De Bin R, Herold T, Boulesteix AL. Added predictive value of omics data: specific issues related to validation illustrated by two case studies. BMC Med Res Methodol 2014; 14:117. [PMID: 25352096 PMCID: PMC4271356 DOI: 10.1186/1471-2288-14-117] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/06/2014] [Accepted: 09/18/2014] [Indexed: 01/06/2023]
Abstract
Background: In recent years, the importance of independent validation of the prediction ability of a new gene signature has been largely recognized. Recently, with the development of gene signatures which integrate rather than replace the clinical predictors in the prediction rule, the focus has moved to the validation of the added predictive value of a gene signature, i.e. to the verification that the inclusion of the new gene signature in a prediction model is able to improve its prediction ability. Methods: The high-dimensional nature of the data from which a new signature is derived raises challenging issues and necessitates the modification of classical methods to adapt them to this framework. Here we show how to validate the added predictive value of a signature derived from high-dimensional data and critically discuss the impact of the choice of methods on the results. Results: The analysis of the added predictive value of two gene signatures developed in two recent studies on the survival of leukemia patients allows us to illustrate and empirically compare different validation techniques in the high-dimensional framework. Conclusions: The issues related to the high-dimensional nature of the omics predictors space affect the validation process. An analysis procedure based on repeated cross-validation is suggested.
Affiliation(s)
- Riccardo De Bin
- Department of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-Universität, Marchioninistr. 15, 81377 München, Germany.
|
40
|
Bernau C, Riester M, Boulesteix AL, Parmigiani G, Huttenhower C, Waldron L, Trippa L. Cross-study validation for the assessment of prediction algorithms. Bioinformatics 2014; 30:i105-12. [PMID: 24931973 PMCID: PMC4058929 DOI: 10.1093/bioinformatics/btu279] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 01/05/2023]
Abstract
Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context. Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor. Contact: levi.waldron@hunter.cuny.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christoph Bernau
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Markus Riester
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Anne-Laure Boulesteix
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Giovanni Parmigiani
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Curtis Huttenhower
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Levi Waldron
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
- Lorenzo Trippa
- Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
|
41
|
Tzoulaki I, Ebbels TMD, Valdes A, Elliott P, Ioannidis JPA. Design and analysis of metabolomics studies in epidemiologic research: a primer on -omic technologies. Am J Epidemiol 2014; 180:129-39. [PMID: 24966222 DOI: 10.1093/aje/kwu143] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Indexed: 01/24/2023]
Abstract
Metabolomics is the field of "-omics" research concerned with the comprehensive characterization of the small low-molecular-weight metabolites in biological samples. In epidemiology, it represents an emerging technology and an unprecedented opportunity to measure environmental and other exposures with improved precision and far less measurement error than with standard epidemiologic methods. Advances in the application of metabolomics in large-scale epidemiologic research are now being realized through a combination of improved sample preparation and handling, automated laboratory and processing methods, and reduction in costs. The number of epidemiologic studies that use metabolic profiling is still limited, but it is fast gaining popularity in this area. In the present article, we present a roadmap for metabolomic analyses in epidemiologic studies and discuss the various challenges these data pose to large-scale studies. We discuss the steps of data preprocessing, univariate and multivariate data analysis, correction for multiplicity of comparisons with correlated data, and finally the steps of cross-validation and external validation. As data from metabolomic studies accumulate in epidemiology, there is a need for large-scale replication and synthesis of findings, increased availability of raw data, and a focus on good study design, all of which will highlight the potential clinical impact of metabolomics in this field.
|
42
|
De Bin R, Sauerbrei W, Boulesteix AL. Investigating the prediction ability of survival models based on both clinical and omics data: two case studies. Stat Med 2014; 33:5310-29. [PMID: 25042390 DOI: 10.1002/sim.6246] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 10/31/2013] [Revised: 04/22/2014] [Accepted: 05/31/2014] [Indexed: 12/25/2022]
Abstract
In biomedical literature, numerous prediction models for clinical outcomes have been developed based either on clinical data or, more recently, on high-throughput molecular data (omics data). Prediction models based on both types of data, however, are less common, although some recent studies suggest that a suitable combination of clinical and molecular information may lead to models with better predictive abilities. This is probably due to the fact that it is not straightforward to combine data with different characteristics and dimensions (poorly characterized high-dimensional omics data, well-investigated low-dimensional clinical data). In this paper, we analyze two publicly available datasets related to breast cancer and neuroblastoma, respectively, in order to show some possible ways to combine clinical and omics data into a prediction model of time-to-event outcome. Different strategies and statistical methods are exploited. The results are compared and discussed according to different criteria, including the discriminative ability of the models, computed on a validation dataset.
Affiliation(s)
- Riccardo De Bin
- Department of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-Universität of Munich, Germany
|
43
|
Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, Brezovsky J, Damborsky J. PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 2014; 10:e1003440. [PMID: 24453961 PMCID: PMC3894168 DOI: 10.1371/journal.pcbi.1003440] [Citation(s) in RCA: 534] [Impact Index Per Article: 53.4] [Received: 08/20/2013] [Accepted: 12/03/2013] [Indexed: 02/07/2023]
Abstract
Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement are hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier, PredictSNP, which achieved significantly improved prediction performance and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.
Affiliation(s)
- Jaroslav Bendl
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
- Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Ondrej Salanda
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Antonin Pavelka
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
- Eric D. Wieben
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
- Jaroslav Zendulka
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
- Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
|
44
|
Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. Lancet 2014; 383:166-75. [PMID: 24411645 PMCID: PMC4697939 DOI: 10.1016/s0140-6736(13)62227-8] [Citation(s) in RCA: 957] [Impact Index Per Article: 95.7] [Indexed: 12/11/2022]
Abstract
Correctable weaknesses in the design, conduct, and analysis of biomedical and public health research studies can produce misleading results and waste valuable resources. Small effects can be difficult to distinguish from bias introduced by study design and analyses. An absence of detailed written protocols and poor documentation of research is common. Information obtained might not be useful or important, and statistical precision or power is often too low or used in a misleading way. Insufficient consideration might be given to both previous and continuing studies. Arbitrary choice of analyses and an overemphasis on random extremes might affect the reported findings. Several problems relate to the research workforce, including failure to involve experienced statisticians and methodologists, failure to train clinical researchers and laboratory scientists in research methods and design, and the involvement of stakeholders with conflicts of interest. Inadequate emphasis is placed on recording of research decisions and on reproducibility of research. Finally, reward systems incentivise quantity more than quality, and novelty more than reliability. We propose potential solutions for these problems, including improvements in protocols and documentation, consideration of evidence from studies in progress, standardisation of research efforts, optimisation and training of an experienced and non-conflicted scientific workforce, and reconsideration of scientific reward systems.
Affiliation(s)
- John P A Ioannidis
- Stanford Prevention Research Center, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA; Division of Epidemiology, School of Medicine, Stanford University, Stanford, CA, USA; Department of Statistics, School of Humanities and Sciences, Stanford University, Stanford, CA, USA; Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA.
- Sander Greenland
- Department of Epidemiology and Department of Statistics, UCLA School of Public Health, Los Angeles, CA, USA
- Mark A Hlatky
- Division of Cardiovascular Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA; Division of Health Services Research, Stanford University, Stanford, CA, USA
- Muin J Khoury
- Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, GA, USA; Epidemiology and Genomics Research Program, National Cancer Institute, Rockville, MD, USA
- Malcolm R Macleod
- Department of Clinical Neurosciences, University of Edinburgh School of Medicine, Edinburgh, UK
- David Moher
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, University of Ottawa, Ottawa, ON, Canada; Department of Epidemiology and Community Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Kenneth F Schulz
- FHI 360, Durham, NC, USA; Department of Obstetrics and Gynecology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Robert Tibshirani
- Department of Health Research and Policy, Stanford University, Stanford, CA, USA; Department of Statistics, School of Humanities and Sciences, Stanford University, Stanford, CA, USA
|
45
|
Fridlyand J, Yeh RF, Mackey H, Bengtsson T, Delmar P, Spaniolo G, Lieberman G. An industry statistician's perspective on PHC drug development. Contemp Clin Trials 2013; 36:624-35. [PMID: 23648396 DOI: 10.1016/j.cct.2013.04.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2013] [Revised: 04/11/2013] [Accepted: 04/25/2013] [Indexed: 10/26/2022]
Abstract
In the past decade, the cost of drug development has increased significantly. The estimates vary widely, but frequently quoted numbers are staggering: it takes 10-15 years and billions of dollars to bring a drug to patients. To a large extent this is due to many long, expensive and ultimately unsuccessful drug trials. While one approach to combat the low yield on investment could be to continue searching for new blockbusters, an alternative method would lead us to focus on testing new targeted treatments that have a strong underlying scientific rationale and are more likely to provide enhanced clinical benefit in population subsets defined by molecular diagnostics. Development of these new treatments, however, cannot follow the usual established path; new strategies and approaches are required for the co-development of novel therapeutics and the diagnostic. In this paper we will review, from the point of view of industry, the approaches to, and challenges of, drug development strategies incorporating predictive biomarkers into clinical programs. We will outline the basic concepts behind co-development with predictive biomarkers and summarize the current regulatory paradigm. We will present guiding principles of personalized health care (PHC) development and review the statistical, strategic, regulatory and operational challenges that statisticians regularly encounter on development programs with a PHC component. Some practical recommendations for team statisticians involved in PHC drug development are included. The majority of the examples and recommendations are drawn from oncology but broader concepts apply across all therapeutic areas.
|
46
|
Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013; 6:5. [PMID: 23448398 PMCID: PMC3606427 DOI: 10.1186/1756-0381-6-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 10/05/2012] [Accepted: 02/11/2013] [Indexed: 12/31/2022]
Abstract
A central challenge in systems biology and medical genetics is to understand how interactions among genetic loci contribute to complex phenotypic traits and human diseases. While most studies have so far relied on statistical modeling and association testing procedures, machine learning and predictive modeling approaches are increasingly being applied to mining genotype-phenotype relationships, including associations that do not necessarily meet statistical significance at the level of individual variants yet still contribute to the combined predictive power at the level of variant panels. Network-based analysis of genetic variants and their interaction partners is another emerging trend by which to explore how sub-network level features contribute to complex disease processes and related phenotypes. In this review, we describe the basic concepts and algorithms behind machine learning-based genetic feature selection approaches, their potential benefits and limitations in the genome-wide setting, and how physical or genetic interaction networks could be used as a priori information for providing improved predictive power and mechanistic insights into the disease networks. These developments are geared toward explaining a part of the missing heritability, and when combined with individual genomic profiling, such systems medicine approaches may also provide a principled means for tailoring personalized treatment strategies in the future.
|
47
|
Verma M, Khoury MJ, Ioannidis JPA. Opportunities and challenges for selected emerging technologies in cancer epidemiology: mitochondrial, epigenomic, metabolomic, and telomerase profiling. Cancer Epidemiol Biomarkers Prev 2013; 22:189-200. [PMID: 23242141 PMCID: PMC3565041 DOI: 10.1158/1055-9965.epi-12-1263] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 02/06/2023]
Abstract
Remarkable progress has been made in the last decade in new methods for biologic measurements using sophisticated technologies that go beyond the established genome, proteome, and gene expression platforms. These methods and technologies create opportunities to enhance cancer epidemiologic studies. In this article, we describe several emerging technologies and evaluate their potential in epidemiologic studies. We review the background, assays, methods, and challenges and offer examples of the use of mitochondrial DNA and copy number assessments, epigenomic profiling (including methylation, histone modification, miRNAs, and chromatin condensation), metabolite profiling (metabolomics), and telomere measurements. We map the volume of literature referring to each one of these measurement tools and the extent to which efforts have been made at knowledge integration (e.g., systematic reviews and meta-analyses). We also clarify strengths and weaknesses of the existing platforms and the range of type of samples that can be tested with each of them. These measurement tools can be used in identifying at-risk populations and providing novel markers of survival and treatment response. Rigorous analytic and validation standards, transparent availability of massive data, and integration in large-scale evidence are essential in fulfilling the potential of these technologies.
Affiliation(s)
- Mukesh Verma
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
|
48
|
Abstract
Deregulation of gene expression, a hallmark of cancer, is caused by both genetic and epigenetic mechanisms. The rapid accumulation of epigenome maps of various cancers suggests a new avenue of research, namely integrating epigenomic data with other types of omic data for cancer diagnosis, prognosis, and biomarker discovery. We introduce the MAPIT algorithm (Multi Analyte Pathway Inference Tool) to enable principled integration of epigenomic, transcriptomic, and protein interactome data. As a proof of principle, we apply MAPIT to glioblastoma multiforme (GBM), the most common and aggressive form of brain tumor. Few predictive markers have been reported for the prognosis of GBM patients. By integrating the mRNA transcriptome, promoter DNA methylome and protein-protein physical interactome, we find ten expression-based and three methylation-based network markers, involving 118 genes. When tested on additional GBM patient samples, the prognostic accuracy of the multi-analyte network markers (73.5%) is 9.7% and 8.6% higher than previous prognostic signatures built on gene expression or DNA methylation alone. Our results highlight the critical role of two novel pathways in the prognosis of GBM patients: small GTPase-mediated protein trafficking and ubiquitination-dependent protein degradation. A better understanding of these two pathways could lead to personalized therapies for subgroups of GBM patients. Our study demonstrates that integrating epigenomic, transcriptomic, and interactomic data can improve the accuracy of network-based prognostic markers and lead to novel mechanistic understanding of cancer.
Affiliation(s)
- Jongkwang Kim
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Long Gao
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America
- Kai Tan
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America
|
49
|
Hwang S. Comparison and evaluation of pathway-level aggregation methods of gene expression data. BMC Genomics 2012; 13 Suppl 7:S26. [PMID: 23282027 PMCID: PMC3521227 DOI: 10.1186/1471-2164-13-s7-s26] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
Background: Microarray experiments produce expression measurements at genomic scale. One way to derive functional understanding of the data is to focus on functional sets of genes, such as pathways, instead of individual genes. While common practice for pathway-level analysis has been functional enrichment analysis, such as over-representation analysis and gene set enrichment analysis, an alternative approach has also been explored. In this approach, gene expression data are first aggregated at the pathway level to transform the original data into a compact representation in which each row corresponds to a pathway instead of a gene. Thereafter, the pathway expression data can be used for differential expression and classification analyses in pathway space, leveraging existing algorithms usually applied to gene expression data. While several studies have proposed pathway-level aggregation methods, it remains unclear how they compare with one another, since the evaluations were done to a limited extent. This study therefore presents a comprehensive evaluation of the six most prominent aggregation methods. Results: The compared methods include five existing methods, namely mean of all member genes (Mean all), mean of condition-responsive genes (Mean CORGs), analysis of sample set enrichment scores (ASSESS), principal component analysis (PCA), and partial least squares (PLS), and a variant of an existing method (Mean top 50%, averaging the top half of member genes). Comprehensive and stringent benchmarking was performed by collecting seven pairs of related but independent datasets encompassing various phenotypes. Aggregation was done in the space of KEGG pathways. Performance of the methods was assessed by classification accuracy validated both internally and externally, and by examining the extent of correlation of pathway signatures between the dataset pairs.
The assessment revealed that (i) the best accuracy and correlation were obtained from ASSESS and Mean top 50%, (ii) Mean all showed the lowest accuracy, and (iii) Mean CORGs and PLS gave rise to the largest extent of discordance in the pathway signature correlation. Conclusions: The two best-performing methods (ASSESS and Mean top 50%) are suggested to be preferred. The benchmarking analysis also suggests that there is both room and necessity for developing a novel method for pathway-level aggregation.
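To make the aggregation step described in this abstract concrete, the following is a minimal illustrative sketch in Python with NumPy, not the implementation used in the study. It turns a genes-by-samples matrix into a pathways-by-samples matrix using three of the simpler strategies named above (mean of all member genes, mean of the top half of member genes, and the first principal component). The function name `aggregate_pathways`, its parameters, and the toy data are hypothetical. In particular, ranking genes by variance across samples for the "top half" variant is an assumption made here for illustration; the abstract does not state the ranking criterion.

```python
import numpy as np


def aggregate_pathways(expr, gene_ids, pathways, method="mean_all"):
    """Aggregate a genes x samples matrix into a pathways x samples matrix.

    expr     : 2-D array, one row per gene, one column per sample
    gene_ids : list of gene identifiers, aligned with the rows of expr
    pathways : dict mapping pathway name -> list of member gene identifiers
    method   : "mean_all", "mean_top_half", or "pca"
    """
    expr = np.asarray(expr, dtype=float)
    row_of = {gene: i for i, gene in enumerate(gene_ids)}
    names, rows = [], []
    for name, members in pathways.items():
        # Keep only member genes that were actually measured.
        idx = [row_of[g] for g in members if g in row_of]
        if not idx:
            continue
        block = expr[idx, :]
        if method == "mean_all":
            score = block.mean(axis=0)
        elif method == "mean_top_half":
            # Assumption: rank member genes by variance across samples.
            k = max(1, int(np.ceil(len(idx) / 2)))
            top = np.argsort(block.var(axis=1))[::-1][:k]
            score = block[top, :].mean(axis=0)
        elif method == "pca":
            # Sample scores on the first principal component of the
            # gene-centred block; the sign of the component is arbitrary.
            centered = block - block.mean(axis=1, keepdims=True)
            _, s, vt = np.linalg.svd(centered, full_matrices=False)
            score = s[0] * vt[0]
        else:
            raise ValueError(f"unknown method: {method}")
        names.append(name)
        rows.append(score)
    return names, np.vstack(rows)


# Toy data: four genes measured in three samples (hypothetical values).
toy_expr = np.array([
    [1.0, 2.0, 3.0],     # gene a
    [3.0, 4.0, 5.0],     # gene b
    [10.0, 10.0, 10.0],  # gene c
    [0.0, 5.0, 10.0],    # gene d
])
toy_genes = ["a", "b", "c", "d"]
toy_pathways = {"p1": ["a", "b"], "p2": ["c", "unmeasured"], "p3": ["a", "d"]}

pathway_names, pathway_expr = aggregate_pathways(toy_expr, toy_genes, toy_pathways)
```

The resulting matrix has one row per pathway, so downstream classification or differential expression code written for gene-level data can be reused unchanged, which is the motivation the abstract gives for this approach.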
Affiliation(s)
- Seungwoo Hwang
- Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea.
|
50
|
Kern SE. Why your new cancer biomarker may never work: recurrent patterns and remarkable diversity in biomarker failures. Cancer Res 2012; 72:6097-101. [PMID: 23172309 DOI: 10.1158/0008-5472.can-12-3232] [Citation(s) in RCA: 159] [Impact Index Per Article: 13.3] [Indexed: 01/01/2023]
Abstract
Less than 1% of published cancer biomarkers actually enter clinical practice. Although best practices for biomarker development are published, optimistic investigators may not appreciate the statistical near-certainty and diverse modes by which the other 99% (likely including your favorite new marker) do indeed fail. Here, patterns of failure were abstracted for classification from publications and an online database detailing marker failures. Failure patterns formed a hierarchical logical structure, or outline, of an emerging, deeply complex, and arguably fascinating science of biomarker failure. A new cancer biomarker under development is likely to have already encountered one or more of the following fatal features encountered by prior markers: lack of clinical significance, hidden structure in the source data, a technically inadequate assay, inappropriate statistical methods, unmanageable domination of the data by normal variation, implausibility, deficiencies in the studied population or in the investigator system, and its disproof or abandonment for cause by others. A greater recognition of the science of biomarker failure and its near-complete ubiquity is constructive and celebrates a seemingly perpetual richness of biologic, technical, and philosophical complexity, the full appreciation of which could improve the management of scarce research resources.
Affiliation(s)
- Scott E Kern
- The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, Department of Oncology, 1650 Orleans Avenue, Baltimore, MD 21287, USA.
|