Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Castaldi PJ, Dahabreh IJ, Ioannidis JPA. An empirical assessment of validation practices for molecular classifiers. Brief Bioinform 2011;12:189-202. [PMID: 21300697 PMCID: PMC3088312 DOI: 10.1093/bib/bbq073] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Revised: 11/02/2010] [Indexed: 12/12/2022] Open



Cited by Other Article(s)

Rodgers O, Mills C, Watson C, Waterfield T. Role of diagnostic tests for sepsis in children: a review. Arch Dis Child 2024;109:786-793. [PMID: 38262696 DOI: 10.1136/archdischild-2023-325984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 01/10/2024] [Indexed: 01/25/2024]

Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024;53:7641-7656. [PMID: 38934892 DOI: 10.1039/d4cs00460d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/28/2024]

Magowan D, Abdulshafea M, Thompson D, Rajamoorthy SI, Owen R, Harris D, Prosser S. Blood-based biomarkers and novel technologies for the diagnosis of colorectal cancer and adenomas: a narrative review. Biomark Med 2024;18:493-506. [PMID: 38900496 DOI: 10.1080/17520363.2024.2345583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Accepted: 03/12/2024] [Indexed: 06/21/2024] Open

Álvarez-Machancoses Ó, Faraggi E, deAndrés-Galiana EJ, Fernández-Martínez JL, Kloczkowski A. Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler. Curr Genomics 2024;25:171-184. [PMID: 39086995 PMCID: PMC11288160 DOI: 10.2174/0113892029236347240308054538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2023] [Revised: 08/03/2023] [Accepted: 09/22/2023] [Indexed: 08/02/2024] Open

Mróz J, Pelc M, Mitusińska K, Chorostowska-Wynimko J, Jezela-Stanek A. Computational Tools to Assist in Analyzing Effects of the SERPINA1 Gene Variation on Alpha-1 Antitrypsin (AAT). Genes (Basel) 2024;15:340. [PMID: 38540399 PMCID: PMC10970068 DOI: 10.3390/genes15030340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2024] [Revised: 02/28/2024] [Accepted: 03/04/2024] [Indexed: 06/14/2024] Open

LeeVan E, Pinsky P. Predictive Performance of Cell-Free Nucleic Acid-Based Multi-Cancer Early Detection Tests: A Systematic Review. Clin Chem 2024;70:90-101. [PMID: 37791504 DOI: 10.1093/clinchem/hvad134] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Accepted: 07/24/2023] [Indexed: 10/05/2023]

Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023;5:1195017. [PMID: 37388252 PMCID: PMC10303934 DOI: 10.3389/fdgth.2023.1195017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/01/2023] Open

Affiliation(s)

Luc Mottin HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Jean-Philippe Goldman Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Christoph Jäggli Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
Rita Achermann Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
Julien Gobeill HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Julien Knafou HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Julien Ehrsam Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Alexandre Wicky Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Camille L. Gérard Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Tanja Schwenk Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
Mélinda Charrier Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Petros Tsantoulis Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Christian Lovis Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Alexander Leichtle Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
Michael K. Kiessling Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
Olivier Michielin Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Sylvain Pradervand Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Vasiliki Foufi Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Patrick Ruch HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland


Sandve GK, Greiff V. Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking. Bioinformatics 2022;38:4994-4996. [PMID: 36073940 PMCID: PMC9620827 DOI: 10.1093/bioinformatics/btac612] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2021] [Revised: 02/18/2022] [Accepted: 09/08/2022] [Indexed: 11/14/2022] Open

Wiegand M, Cowan SL, Waddington CS, Halsall DJ, Keevil VL, Tom BDM, Taylor V, Gkrania-Klotsas E, Preller J, Goudie RJB. Development and validation of a dynamic 48-hour in-hospital mortality risk stratification for COVID-19 in a UK teaching hospital: a retrospective cohort study. BMJ Open 2022;12:e060026. [PMID: 36691139 PMCID: PMC9445230 DOI: 10.1136/bmjopen-2021-060026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Accepted: 07/13/2022] [Indexed: 02/02/2023] Open

Abstract

OBJECTIVES

To develop a disease stratification model for COVID-19 that updates according to changes in a patient's condition while in hospital to facilitate patient management and resource allocation.

DESIGN

In this retrospective cohort study, we adopted a landmarking approach to dynamic prediction of all-cause in-hospital mortality over the next 48 hours. We accounted for informative predictor missingness and selected predictors using penalised regression.

SETTING

All data used in this study were obtained from a single UK teaching hospital.

PARTICIPANTS

We developed the model using 473 consecutive patients with COVID-19 presenting to a UK hospital between 1 March 2020 and 12 September 2020, and temporally validated it using data on 1119 patients presenting between 13 September 2020 and 17 March 2021.

PRIMARY AND SECONDARY OUTCOME MEASURES

The primary outcome is all-cause in-hospital mortality within 48 hours of the prediction time. We accounted for the competing risks of discharge from hospital alive and transfer to a tertiary intensive care unit for extracorporeal membrane oxygenation.

RESULTS

Our final model includes age, Clinical Frailty Scale score, heart rate, respiratory rate, oxygen saturation/fractional inspired oxygen ratio, white cell count, presence of acidosis (pH <7.35) and interleukin-6. Internal validation achieved an area under the receiver operating characteristic curve (AUROC) of 0.90 (95% CI 0.87 to 0.93) and temporal validation gave an AUROC of 0.86 (95% CI 0.83 to 0.88).

CONCLUSIONS

Our model incorporates both static risk factors (eg, age) and evolving clinical and laboratory data, to provide a dynamic risk prediction model that adapts to both sudden and gradual changes in an individual patient's clinical condition. On successful external validation, the model has the potential to be a powerful clinical risk assessment tool.

TRIAL REGISTRATION

The study is registered as 'researchregistry5464' on the Research Registry (www.researchregistry.com).
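As a reading aid for the AUROC values quoted in the Results of this abstract: AUROC can be interpreted as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived, with ties counted as one half. The sketch below computes that quantity by direct pairwise comparison. It is a generic illustration, not the authors' landmarking analysis; the function name `auroc` and the toy scores are assumptions made for this example.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: predicted risks; labels: 1 for the event, 0 for no event.
    Hypothetical helper for illustration only.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event case ranked above non-event case
            elif p == n:
                wins += 0.5   # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy risks: three of the four event/non-event pairs are ordered correctly.
    print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on large cohorts.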


Meehan AJ, Lewis SJ, Fazel S, Fusar-Poli P, Steyerberg EW, Stahl D, Danese A. Clinical prediction models in psychiatry: a systematic review of two decades of progress and challenges. Mol Psychiatry 2022;27:2700-2708. [PMID: 35365801 PMCID: PMC9156409 DOI: 10.1038/s41380-022-01528-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Revised: 03/03/2022] [Accepted: 03/14/2022] [Indexed: 12/13/2022]

Abstract

Recent years have seen the rapid proliferation of clinical prediction models aiming to support risk stratification and individualized care within psychiatry. Despite growing interest, attempts to synthesize current evidence in the nascent field of precision psychiatry have remained scarce. This systematic review therefore sought to summarize progress towards clinical implementation of prediction modeling for psychiatric outcomes. We searched MEDLINE, PubMed, Embase, and PsycINFO databases from inception to September 30, 2020, for English-language articles that developed and/or validated multivariable models to predict (at an individual level) onset, course, or treatment response for non-organic psychiatric disorders (PROSPERO: CRD42020216530). Individual prediction models were evaluated based on three key criteria: (i) mitigation of bias and overfitting; (ii) generalizability, and (iii) clinical utility. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to formally appraise each study's risk of bias. 228 studies detailing 308 prediction models were ultimately eligible for inclusion. 94.5% of developed prediction models were deemed to be at high risk of bias, largely due to inadequate or inappropriate analytic decisions. Insufficient internal validation efforts (within the development sample) were also observed, while only one-fifth of models underwent external validation in an independent sample. Finally, our search identified just one published model whose potential utility in clinical practice was formally assessed. Our findings illustrated significant growth in precision psychiatry with promising progress towards real-world application. Nevertheless, these efforts have been inhibited by a preponderance of bias and overfitting, while the generalizability and clinical utility of many published models have yet to be formally established. Through improved methodological rigor during initial development, robust evaluations of reproducibility via independent validation, and evidence-based implementation frameworks, future research has the potential to generate risk prediction tools capable of enhancing clinical decision-making in psychiatric care.


Oh M, Zhang L. Generalizing predictions to unseen sequencing profiles via deep generative models. Sci Rep 2022;12:7151. [PMID: 35504956 PMCID: PMC9065080 DOI: 10.1038/s41598-022-11363-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Accepted: 04/22/2022] [Indexed: 11/26/2022] Open

Guan BZ, Parmigiani G, Braun D, Trippa L. Prediction of hereditary cancers using neural networks. Ann Appl Stat 2022;16:495-520. [PMID: 37873507 PMCID: PMC10593124 DOI: 10.1214/21-aoas1510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]

Schwarzer A, Talbot SR, Selich A, Morgan M, Schott JW, Dittrich-Breiholz O, Bastone AL, Weigel B, Ha TC, Dziadek V, Gijsbers R, Thrasher AJ, Staal FJT, Gaspar HB, Modlich U, Schambach A, Rothe M. Predicting genotoxicity of viral vectors for stem cell gene therapy using gene expression-based machine learning. Mol Ther 2021;29:3383-3397. [PMID: 34174440 PMCID: PMC8636173 DOI: 10.1016/j.ymthe.2021.06.017] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 05/12/2021] [Accepted: 06/07/2021] [Indexed: 10/21/2022] Open

Affiliation(s)

Adrian Schwarzer Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany; Department of Hematology, Hemostasis, Oncology and Stem Cell Transplantation, Hannover Medical School, Hannover, Germany
Steven R Talbot Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
Anton Selich Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Michael Morgan Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Juliane W Schott Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Oliver Dittrich-Breiholz Research Core Unit Genomics, Hannover Medical School, Hannover, Germany
Antonella L Bastone Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Bettina Weigel Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Teng Cheong Ha Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Violetta Dziadek Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany
Rik Gijsbers Molecular Virology and Gene Therapy, KU Leuven, Leuven, Belgium
Adrian J Thrasher Molecular and Cellular Immunology Section, UCL Great Ormond Street Institute of Child Health, London, UK
Frank J T Staal Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden 2333 ZA, the Netherlands
Hubert B Gaspar Molecular and Cellular Immunology Section, UCL Great Ormond Street Institute of Child Health, London, UK
Ute Modlich Research Group for Gene Modification in Stem Cells, Division of Veterinary Medicine, Paul Ehrlich Institute, Langen, Germany
Axel Schambach Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany; Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
Michael Rothe Institute of Experimental Hematology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625 Hannover, Germany.


van Beek PE, Andriessen P, Onland W, Schuit E. Prognostic Models Predicting Mortality in Preterm Infants: Systematic Review and Meta-analysis. Pediatrics 2021;147:peds.2020-020461. [PMID: 33879518 DOI: 10.1542/peds.2020-020461] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/27/2021] [Indexed: 11/24/2022] Open

Zhang Y, Bernau C, Parmigiani G, Waldron L. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models. Biostatistics 2020;21:253-268. [PMID: 30202918 DOI: 10.1093/biostatistics/kxy044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2018] [Revised: 07/22/2018] [Accepted: 08/04/2018] [Indexed: 11/13/2022] Open

Ubels J, Sonneveld P, van Vliet MH, de Ridder J. Gene Networks Constructed Through Simulated Treatment Learning can Predict Proteasome Inhibitor Benefit in Multiple Myeloma. Clin Cancer Res 2020;26:5952-5961. [PMID: 32913136 DOI: 10.1158/1078-0432.ccr-20-0742] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2020] [Revised: 05/27/2020] [Accepted: 09/03/2020] [Indexed: 11/16/2022]

Herrmann M, Probst P, Hornung R, Jurinovic V, Boulesteix AL. Large-scale benchmark study of survival prediction methods using multi-omics data. Brief Bioinform 2020;22:5895463. [PMID: 32823283 PMCID: PMC8138887 DOI: 10.1093/bib/bbaa167] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2020] [Revised: 06/25/2020] [Accepted: 07/03/2020] [Indexed: 12/18/2022] Open

Abstract

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups (especially clinical variables) from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on Github.


Watson OP, Cortes-Ciriano I, Taylor AR, Watson JA. A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics 2020;35:4656-4663. [PMID: 31070704 PMCID: PMC6853675 DOI: 10.1093/bioinformatics/btz293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Revised: 03/22/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023] Open

Abstract

Motivation

Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs.

Results

The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand.

Availability and implementation

All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap.

Supplementary information

Supplementary data are available at Bioinformatics online.
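To make the contrast drawn in the Results of this abstract concrete, the sketch below compares a random train/test partition with a partition made at a quantile of the activity distribution, so that the test set contains only the most active molecules and the model is forced to extrapolate. This covers only the splitting idea; it is not the authors' quantile-activity bootstrap or their rank-based loss functions. The function names, the 0.8 quantile and the toy activity values are assumptions made for illustration.

```python
import random


def quantile_split(activities, quantile=0.8):
    """Train on items below the activity quantile, test on those above it.

    Returns (train_idx, test_idx) as lists of positions in `activities`.
    """
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    cut = int(round(len(order) * quantile))
    return order[:cut], order[cut:]


def random_split(n, test_fraction=0.2, seed=0):
    """Conventional random partition of n items, for comparison."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[: n - n_test], idx[n - n_test:]


if __name__ == "__main__":
    # Toy activity values (hypothetical, not taken from ChEMBL).
    activities = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
    train, test = quantile_split(activities, quantile=0.8)
    print("quantile split test activities:", [activities[i] for i in test])
    train, test = random_split(len(activities), test_fraction=0.2, seed=1)
    print("random split test activities:", [activities[i] for i in test])
```

With the quantile split, every test activity exceeds every training activity, which is the extrapolation setting the abstract argues a random partition fails to penalize.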


Vollmer S, Mateen BA, Bohner G, Király FJ, Ghani R, Jonsson P, Cumbers S, Jonas A, McAllister KSL, Myles P, Granger D, Birse M, Branson R, Moons KGM, Collins GS, Ioannidis JPA, Holmes C, Hemingway H. Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 2020;368:l6927. [PMID: 32198138 DOI: 10.1136/bmj.l6927] [Citation(s) in RCA: 155] [Impact Index Per Article: 38.8] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Affiliation(s)

Sebastian Vollmer Alan Turing Institute, Kings Cross, London, UK Departments of Mathematics and Statistics, University of Warwick, Coventry, UK
Bilal A Mateen Alan Turing Institute, Kings Cross, London, UK Warwick Medical School, University of Warwick, Coventry, UK Kings College Hospital, Denmark Hill, London, UK
Gergo Bohner Alan Turing Institute, Kings Cross, London, UK Departments of Mathematics and Statistics, University of Warwick, Coventry, UK
Franz J Király Alan Turing Institute, Kings Cross, London, UK Department of Statistical Science, University College London, London, UK
Rayid Ghani University of Chicago, Chicago, IL, USA
Pall Jonsson Science Policy and Research, National Institute for Health and Care Excellence, Manchester, UK
Sarah Cumbers Health and Social Care Directorate, National Institute for Health and Care Excellence, London, UK
Adrian Jonas Data and Analytics Group, National Institute for Health and Care Excellence, London, UK
Katherine S L McAllister Data and Analytics Group, National Institute for Health and Care Excellence, London, UK
Puja Myles Clinical Practice Research Datalink, Medicines and Healthcare products Regulatory Agency, London, UK
David Granger Medicines and Healthcare products Regulatory Agency, London, UK
Mark Birse Medicines and Healthcare products Regulatory Agency, London, UK
Richard Branson Medicines and Healthcare products Regulatory Agency, London, UK
Karel G M Moons Julius Centre for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, Netherlands
Gary S Collins UK EQUATOR Centre, Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
John P A Ioannidis Meta-Research Innovation Centre at Stanford, Stanford University, Stanford, CA, USA
Chris Holmes Alan Turing Institute, Kings Cross, London, UK Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
Harry Hemingway Health Data Research UK London, University College London, London, UK Institute of Health Informatics, University College London, London, UK National Institute for Health Research, University College London Hospitals Biomedical Research Centre, University College London, London, UK


Shi L, Westerhuis JA, Rosén J, Landberg R, Brunius C. Variable selection and validation in multivariate modelling. Bioinformatics 2019;35:972-980. [PMID: 30165467 PMCID: PMC6419897 DOI: 10.1093/bioinformatics/bty710] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Revised: 07/04/2018] [Accepted: 08/24/2018] [Indexed: 12/28/2022] Open

Abstract

MOTIVATION

Validation of variable selection and predictive performance is crucial in construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection leads instead to selection bias, thereby increasing the risk of model overfitting and false positive discoveries. Although several algorithms exist to identify a minimal set of most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms combining identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed.

RESULTS

We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination in a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omics datasets, MUVR yielded parsimonious models with minimal overfitting and improved model performance compared with state-of-the-art rdCV. Moreover, MUVR showed advantages over other variable selection algorithms, i.e. Boruta and VSURF, including a simultaneous variable selection and validation scheme and wider applicability.

AVAILABILITY AND IMPLEMENTATION

Algorithms, data, scripts and tutorial are open source and available as an R package ('MUVR') at https://gitlab.com/CarlBrunius/MUVR.git.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
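The repeated double cross-validation (rdCV) mentioned in this abstract nests one cross-validation loop inside another: the inner loop tunes the model using training data only, and the outer loop scores the tuned model on a fold that played no part in tuning. The sketch below shows that structure with a deliberately trivial model (a mean shrunk by a penalty `lam`). It is not the MUVR algorithm and performs no recursive variable elimination; every name and parameter in it is hypothetical.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_predict(train_y, lam):
    """Toy model: the training mean shrunk towards zero by penalty lam."""
    return sum(train_y) / (len(train_y) + lam)


def mse(pred, ys):
    """Mean squared error of a constant prediction."""
    return sum((y - pred) ** 2 for y in ys) / len(ys)


def double_cv(y, lams, outer_k=5, inner_k=4, repeats=3, seed=0):
    """Repeated double (nested) cross-validation estimate of the MSE."""
    rng = random.Random(seed)
    outer_errors = []
    for _ in range(repeats):
        for test in k_folds(list(range(len(y))), outer_k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            inner_folds = k_folds(train, inner_k, rng)

            def inner_error(lam):
                # Inner loop: validation error computed on training data only.
                total = 0.0
                for val in inner_folds:
                    val_set = set(val)
                    fit = [y[i] for i in train if i not in val_set]
                    total += mse(fit_predict(fit, lam), [y[i] for i in val])
                return total / inner_k

            best_lam = min(lams, key=inner_error)
            # Outer loop: refit with the chosen lam, score on the untouched fold.
            pred = fit_predict([y[i] for i in train], best_lam)
            outer_errors.append(mse(pred, [y[i] for i in test]))
    return sum(outer_errors) / len(outer_errors)


if __name__ == "__main__":
    y = [float(v) for v in range(10, 30)]
    print(double_cv(y, lams=[0.0, 1.0, 10.0]))
```

Because the outer test fold never influences the choice of `lam`, the averaged outer error is not biased by the tuning step, which is the property the abstract relies on when it speaks of minimizing overfitting.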


Allahyar A, Ubels J, de Ridder J. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction. PLoS Comput Biol 2019;15:e1006657. [PMID: 30726216 PMCID: PMC6380593 DOI: 10.1371/journal.pcbi.1006657] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2018] [Revised: 02/19/2019] [Accepted: 11/20/2018] [Indexed: 12/13/2022] Open

Abstract

Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and for instance suggest that utilizing random networks performs on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome.


Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med 2019;170:W1-W33. [PMID: 30596876 DOI: 10.7326/m18-1377] [Citation(s) in RCA: 682] [Impact Index Per Article: 136.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open

Cui L, Lu H, Lee YH. Challenges and emergent solutions for LC-MS/MS based untargeted metabolomics in diseases. Mass Spectrom Rev 2018;37:772-792. [PMID: 29486047 DOI: 10.1002/mas.21562] [Citation(s) in RCA: 197] [Impact Index Per Article: 32.8] [Received: 05/22/2017] [Accepted: 02/02/2018] [Indexed: 05/03/2023]

Abstract

In the past decade, advances in liquid chromatography-mass spectrometry (LC-MS) have revolutionized untargeted metabolomics analyses. By mining metabolomes more deeply, researchers are now primed to uncover key metabolites and their associations with diseases. The employment of untargeted metabolomics has led to new biomarker discoveries and a better mechanistic understanding of diseases with applications in precision medicine. However, many major pertinent challenges remain. First, compound identification has been poor, leaving an overwhelming number of unidentified peaks. Second, partial, incomplete metabolomes persist due to factors such as limitations in mass spectrometry data acquisition speeds, the wide range of metabolite concentrations, and cellular/tissue/temporal-specific expression changes that confound our understanding of metabolite perturbations. Third, contextualizing metabolites in pathways and biology is difficult because many metabolites partake in multiple pathways, have species specificity that has yet to be described, or possess unannotated or more complex functions that are not easily characterized through metabolomics analyses. From a translational perspective, information related to novel metabolite biomarkers, metabolic pathways, and drug targets might be sparser than it should be. Thankfully, significant progress has been made and novel solutions are emerging, achieved through sustained academic and industrial community efforts in terms of hardware, computational, and experimental approaches. Given the rapidly growing utility of metabolomics, this review will offer new perspectives and increase awareness of the major challenges in LC-MS metabolomics, which will significantly benefit the metabolomics community and also the broader biomedical community that metabolomics aspires to serve.

Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annu Rev Public Health 2018;39:95-112. [PMID: 29261408 PMCID: PMC6394411 DOI: 10.1146/annurev-publhealth-040617-014208] [Citation(s) in RCA: 150] [Impact Index Per Article: 25.0] [Indexed: 01/08/2023]

Biomarker Guidelines for High-Dimensional Genomic Studies in Transplantation: Adding Method to the Madness. Transplantation 2018;101:457-463. [PMID: 28212255 DOI: 10.1097/tp.0000000000001622] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022]

Rahmatallah Y, Khaidakov M, Lai KK, Goyne HE, Lamps LW, Hagedorn CH, Glazko G. Platform-independent gene expression signature differentiates sessile serrated adenomas/polyps and hyperplastic polyps of the colon. BMC Med Genomics 2017;10:81. [PMID: 29284484 PMCID: PMC5745747 DOI: 10.1186/s12920-017-0317-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/24/2017] [Accepted: 12/14/2017] [Indexed: 12/18/2022] Open

Abstract

Background

Sessile serrated adenomas/polyps are distinguished from hyperplastic colonic polyps subjectively by their endoscopic appearance and histological morphology. However, hyperplastic and sessile serrated polyps can have overlapping morphological features, resulting in sessile serrated polyps being diagnosed as hyperplastic. While sessile serrated polyps can progress into colon cancer, hyperplastic polyps have virtually no risk for colon cancer. Objective measures differentiating these types of polyps would improve cancer prevention and treatment outcome.

Methods

An RNA-seq training data set and Affymetrix and Illumina testing data sets were obtained from Gene Expression Omnibus (GEO). RNA-seq single-end reads were filtered with the FastX toolkit. Read mapping to the human genome, gene abundance estimation, and differential expression analysis were performed with the Tophat-Cufflinks pipeline. Background correction, normalization, and probe summarization steps for Affymetrix arrays were performed using the robust multi-array average (RMA) method. For Illumina arrays, log₂-scale expression data were obtained from GEO. Pathway analysis was implemented using the Bioconductor package GSAR. To build a platform-independent molecular classifier that accurately differentiates sessile serrated and hyperplastic polyps, we developed a new feature selection step. We also developed a simple procedure to classify new samples as either sessile serrated or hyperplastic with a class probability assigned to the decision, estimated using Cantelli’s inequality.

Results

The classifier trained on RNA-seq data and tested on two independent microarray data sets resulted in zero and three errors. The classifier was further tested using quantitative real-time PCR expression levels of 45 blinded independent formalin-fixed paraffin-embedded specimens and was highly accurate. Pathway analyses have shown that sessile serrated polyps are distinguished from hyperplastic polyps and normal controls by: up-regulation of pathways implicated in proliferation, inflammation, cell-cell adhesion and down-regulation of serine threonine kinase signaling pathway; differential co-expression of pathways regulating cell division, protein trafficking and kinase activities.

Conclusions

Most of the differentially expressed pathways are known as hallmarks of cancer and are likely to explain why sessile serrated polyps are more prone to neoplastic transformation than hyperplastic polyps. The new molecular classifier includes 13 genes and may facilitate objective differentiation between the two polyp types.

Electronic supplementary material

The online version of this article (10.1186/s12920-017-0317-7) contains supplementary material, which is available to authorized users.
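
The Methods paragraph above mentions a class probability estimated with Cantelli's inequality but does not spell out the procedure. As a rough illustration only, the one-sided bound P(X − μ ≥ kσ) ≤ 1/(1 + k²) can be sketched as follows. The function names and the way the bound is turned into a confidence value are assumptions made for this example; they are not taken from the article.

```python
def cantelli_upper_bound(k: float) -> float:
    """Cantelli's one-sided bound: P(X - mu >= k * sigma) <= 1 / (1 + k^2), for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (1.0 + k * k)


def confidence_lower_bound(score: float, other_mean: float, other_std: float) -> float:
    """Hypothetical helper (an assumption, not the authors' procedure).

    Given a new sample's score and the mean / standard deviation of scores in
    the *other* class, return a distribution-free lower bound on the probability
    that an other-class sample scores below `score`.
    """
    k = (score - other_mean) / other_std
    if k <= 0:
        return 0.0  # the bound carries no information at or below the mean
    return 1.0 - cantelli_upper_bound(k)


# Example: a score two standard deviations above the other class's mean.
print(cantelli_upper_bound(2.0))
print(confidence_lower_bound(5.0, 1.0, 2.0))
```

Because the inequality assumes only a finite mean and variance, the resulting bound is conservative compared with a parametric (for example, Gaussian) tail probability.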

Hornung R, Causeur D, Bernau C, Boulesteix AL. Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics 2017;33:397-404. [PMID: 27797760 DOI: 10.1093/bioinformatics/btw650] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/22/2022] Open

Ioannidis JPA, Bossuyt PMM. Waste, Leaks, and Failures in the Biomarker Pipeline. Clin Chem 2017;63:963-972. [DOI: 10.1373/clinchem.2016.254649] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 08/15/2016] [Accepted: 11/30/2016] [Indexed: 01/05/2023]

Abstract

BACKGROUND

The large, expanding literature on biomarkers is characterized by almost ubiquitous significant results, with claims about the potential importance, but few of these discovered biomarkers are used in routine clinical care.

CONTENT

The pipeline of biomarker development includes several specific stages: discovery, validation, clinical translation, evaluation, implementation (and, in the case of nonutility, deimplementation). Each of these stages can be plagued by problems that cause failures of the overall pipeline. Some problems are nonspecific challenges for all biomedical investigation, while others are specific to the peculiarities of biomarker research. Discovery suffers from poor methods and incomplete and selective reporting. External independent validation is limited. Selection for clinical translation is often shaped by nonrational choices. Evaluation is sparse and the clinical utility of many biomarkers remains unknown. The regulatory environment for biomarkers remains weak and guidelines can reach biased or divergent recommendations. Removing inefficient or even harmful biomarkers that have been entrenched in clinical care can meet with major resistance.

SUMMARY

The current biomarker pipeline is too prone to failures. Consideration of clinical needs should become a starting point for the development of biomarkers. Improvements can include the use of more stringent methodology, better reporting, larger collaborative studies, careful external independent validation, preregistration, rigorous systematic reviews and umbrella reviews, pivotal randomized trials, and implementation and deimplementation studies. Incentives should be aligned toward delivering useful biomarkers.

Marcus MW, Field JK. Is Bootstrapping Sufficient for Validating a Risk Model for Selection of Participants for a Lung Cancer Screening Program? J Clin Oncol 2017;35:818-819. [DOI: 10.1200/jco.2016.71.3214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open

Characteristics and Validation Techniques for PCA-Based Gene-Expression Signatures. Int J Genomics 2017;2017:2354564. [PMID: 28265563 PMCID: PMC5317117 DOI: 10.1155/2017/2354564] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/27/2016] [Revised: 12/15/2016] [Accepted: 01/04/2017] [Indexed: 11/30/2022] Open

Timmons JA. Molecular Diagnostics of Ageing and Tackling Age-related Disease. Trends Pharmacol Sci 2016;38:67-80. [PMID: 27979318 DOI: 10.1016/j.tips.2016.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/31/2016] [Revised: 11/08/2016] [Accepted: 11/08/2016] [Indexed: 10/25/2022]

Abram TJ, Floriano PN, Christodoulides N, James R, Kerr AR, Thornhill MH, Redding SW, Vigneswaran N, Speight PM, Vick J, Murdoch C, Freeman C, Hegarty AM, D'Apice K, Phelan JA, Corby PM, Khouly I, Bouquot J, Demian NM, Weinstock YE, Rowan S, Yeh CK, McGuff HS, Miller FR, Gaur S, Karthikeyan K, Taylor L, Le C, Nguyen M, Talavera H, Raja R, Wong J, McDevitt JT. 'Cytology-on-a-chip' based sensors for monitoring of potentially malignant oral lesions. Oral Oncol 2016;60:103-11. [PMID: 27531880 DOI: 10.1016/j.oraloncology.2016.07.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/27/2016] [Revised: 06/30/2016] [Accepted: 07/02/2016] [Indexed: 12/11/2022]

Affiliation(s)

Timothy J Abram Rice University, Department of Bioengineering, Houston, TX, USA
Pierre N Floriano NeoTherma Oncology, Houston, TX, USA
Nicolaos Christodoulides Rice University, Department of Bioengineering, Houston, TX, USA
Robert James Rho Inc., Chapel Hill, NC, USA
A Ross Kerr New York University College of Dentistry, Department of Oral and Maxillofacial Pathology, Radiology & Medicine, New York, NY, USA
Martin H Thornhill Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
Spencer W Redding The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA
Nadarajah Vigneswaran The University of Texas Health Science Center at Houston, Department of Diagnostic and Biomedical Sciences, Houston, TX, USA
Paul M Speight Academic Unit of Oral & Maxillofacial Pathology, University of Sheffield School of Clinical Dentistry, Sheffield, UK
Julie Vick Rho Inc., Chapel Hill, NC, USA
Craig Murdoch Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
Christine Freeman Academic Unit of Oral & Maxillofacial Medicine & Surgery, University of Sheffield School of Clinical Dentistry, Sheffield, UK
Anne M Hegarty Unit of Oral Medicine, Charles Clifford Dental Hospital, Sheffield Teaching Hospitals National Health Service Foundation Trust, Sheffield, UK
Katy D'Apice Unit of Oral Medicine, Charles Clifford Dental Hospital, Sheffield Teaching Hospitals National Health Service Foundation Trust, Sheffield, UK
Joan A Phelan New York University College of Dentistry, Department of Oral and Maxillofacial Pathology, Radiology & Medicine, New York, NY, USA
Patricia M Corby New York University School of Medicine, Department of Population Health and Radiation Oncology, New York, NY, USA
Ismael Khouly New York University College of Dentistry, Bluestone Center for Clinical Research, New York, NY, USA
Jerry Bouquot The University of Texas Health Science Center at Houston, Department of Diagnostic and Biomedical Sciences, Houston, TX, USA
Nagi M Demian The University of Texas Health Science Center at Houston, Department of Oral and Maxillofacial Surgery, Houston, TX, USA
Y Etan Weinstock The University of Texas Health Science Center at Houston, Department of Otolaryngology-Head and Neck Surgery, Houston, TX, USA
Stephanie Rowan The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA
Chih-Ko Yeh The University of Texas Health Science Center at San Antonio, Department of Comprehensive Dentistry and Cancer Therapy and Research Center, San Antonio, TX, USA; South Texas Veterans Health Care System, Geriatric Research, Education, and Clinical Center, San Antonio, TX, USA
H Stan McGuff The University of Texas Health Science Center at San Antonio, Department of Pathology, San Antonio, TX, USA
Frank R Miller The University of Texas Health Science Center at San Antonio, Department of Otolaryngology-Head and Neck Surgery and Cancer Therapy and Research Center, San Antonio, TX, USA
Surabhi Gaur Rice University, Department of Bioengineering, Houston, TX, USA
Kailash Karthikeyan Rice University, Department of Bioengineering, Houston, TX, USA
Leander Taylor Rice University, Department of Bioengineering, Houston, TX, USA
Cathy Le Rice University, Department of Bioengineering, Houston, TX, USA
Michael Nguyen Rice University, Department of Bioengineering, Houston, TX, USA
Humberto Talavera Rice University, Department of Bioengineering, Houston, TX, USA
Rameez Raja Rice University, Department of Bioengineering, Houston, TX, USA
Jorge Wong Rice University, Department of Bioengineering, Houston, TX, USA
John T McDevitt Rice University, Department of Bioengineering, Houston, TX, USA; Rice University, Department of Chemistry, Houston, TX, USA; New York University, Department of Biomaterials, New York, NY, USA.

Wishart DS. Emerging applications of metabolomics in drug discovery and precision medicine. Nat Rev Drug Discov 2016;15:473-84. [PMID: 26965202 DOI: 10.1038/nrd.2016.32] [Citation(s) in RCA: 879] [Impact Index Per Article: 109.9] [Indexed: 02/08/2023]

Chang Y, Glass K, Liu YY, Silverman EK, Crapo JD, Tal-Singer R, Bowler R, Dy J, Cho M, Castaldi P. COPD subtypes identified by network-based clustering of blood gene expression. Genomics 2016;107:51-58. [PMID: 26773458 DOI: 10.1016/j.ygeno.2016.01.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/13/2015] [Revised: 12/04/2015] [Accepted: 01/06/2016] [Indexed: 01/22/2023]

Arnold JM, Choi WT, Sreekumar A, Maletić-Savatić M. Analytical strategies for studying stem cell metabolism. Front Biol (Beijing) 2015. [PMID: 26213533 DOI: 10.1007/s11515-015-1357-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]

Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1-73. [PMID: 25560730 DOI: 10.7326/m14-0698] [Citation(s) in RCA: 2928] [Impact Index Per Article: 325.3] [Indexed: 02/06/2023] Open

Collins GS. Statistical flaws in the development of a prediction model. Am J Obstet Gynecol 2015;212:116. [PMID: 25218126 DOI: 10.1016/j.ajog.2014.09.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/01/2014] [Accepted: 09/04/2014] [Indexed: 10/24/2022]

Shou Y, Robinson DM, Amakye DD, Rose KL, Cho YJ, Ligon KL, Sharp T, Haider AS, Bandaru R, Ando Y, Geoerger B, Doz F, Ashley DM, Hargrave DR, Casanova M, Tawbi HA, Rodon J, Thomas AL, Mita AC, MacDonald TJ, Kieran MW. A five-gene hedgehog signature developed as a patient preselection tool for hedgehog inhibitor therapy in medulloblastoma. Clin Cancer Res 2014;21:585-93. [PMID: 25473003 DOI: 10.1158/1078-0432.ccr-13-1711] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/16/2022]

Abstract

PURPOSE

Distinct molecular subgroups of medulloblastoma, including hedgehog (Hh) pathway-activated disease, have been reported. We identified and clinically validated a five-gene Hh signature assay that can be used to preselect patients with Hh pathway-activated medulloblastoma.

EXPERIMENTAL DESIGN

Gene characteristics of the Hh medulloblastoma subgroup were identified through published bioinformatic analyses. Thirty-two genes shown to be differentially expressed in fresh-frozen and formalin-fixed paraffin-embedded tumor samples and reproducibly analyzed by RT-PCR were measured in matched samples. These data formed the basis for building a multi-gene logistic regression model derived through elastic net methods from which the five-gene Hh signature emerged after multiple iterations. On the basis of signature gene expression levels, the model computed a propensity score to determine Hh activation using a threshold set a priori. The association between Hh activation status and tumor response to the Hh pathway inhibitor sonidegib (LDE225) was analyzed.

RESULTS

Five differentially expressed genes in medulloblastoma (GLI1, SPHK1, SHROOM2, PDLIM3, and OTX2) were found to associate with Hh pathway activation status. In an independent validation study, Hh activation status of 25 medulloblastoma samples showed 100% concordance between the five-gene signature and Affymetrix profiling. Further, in medulloblastoma samples from 50 patients treated with sonidegib, all 6 patients who responded were found to have Hh-activated tumors. Three patients with Hh-activated tumors had stable or progressive disease. No patients with Hh-nonactivated tumors responded.

CONCLUSIONS

This five-gene Hh signature can robustly identify Hh-activated medulloblastoma and may be used to preselect patients who might benefit from sonidegib treatment.

Affiliation(s)

Yaping Shou Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
Douglas M Robinson Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
Dereck D Amakye Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
Kristine L Rose Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
Yoon-Jae Cho Departments of Neurology and Neurosurgery, Stanford University School of Medicine, Stanford, California
Keith L Ligon Pediatric Neuro-Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts. Department of Pathology, Children's Hospital Boston, Brigham and Women's Hospital, and Harvard Medical School, Boston, Massachusetts. Department of Medical Oncology and Center for Molecular Oncologic Pathology, Dana-Farber Cancer Institute, Boston, Massachusetts
Thad Sharp Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
Asifa S Haider Novartis Pharmaceuticals Corporation, East Hanover, New Jersey
Raj Bandaru Novartis Institutes for BioMedical Research, Inc, Cambridge, Massachusetts
Yuichi Ando Nagoya University Hospital, Nagoya, Japan
Birgit Geoerger Institut Gustave Roussy, University Paris-Sud, Villejuif, France
François Doz Institut Curie and University Paris Descartes, Sorbonne Paris Cité, France
David M Ashley Deakin University/Barwon Health, Melbourne, Australia
Darren R Hargrave Great Ormond Street Hospital for Children, London
Michela Casanova Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy
Hussein A Tawbi University of Pittsburgh Cancer Institute and University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
Jordi Rodon Vall d'Hebron Institut d'Oncologia, and Universitat Autonoma de Barcelona, Barcelona, Spain
Anne L Thomas University of Leicester, Leicester, United Kingdom
Alain C Mita Cancer Therapy and Research Center, University of Texas Health Science Center, San Antonio, Texas
Tobey J MacDonald Children's Healthcare of Atlanta, Aflac Cancer and Blood Disorders Center, Emory University School of Medicine, Atlanta, Georgia
Mark W Kieran Pediatric Neuro-Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts.

De Bin R, Herold T, Boulesteix AL. Added predictive value of omics data: specific issues related to validation illustrated by two case studies. BMC Med Res Methodol 2014;14:117. [PMID: 25352096 PMCID: PMC4271356 DOI: 10.1186/1471-2288-14-117] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/06/2014] [Accepted: 09/18/2014] [Indexed: 01/06/2023] Open

Bernau C, Riester M, Boulesteix AL, Parmigiani G, Huttenhower C, Waldron L, Trippa L. Cross-study validation for the assessment of prediction algorithms. Bioinformatics 2014;30:i105-12. [PMID: 24931973 PMCID: PMC4058929 DOI: 10.1093/bioinformatics/btu279] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 01/05/2023]

Abstract

Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.

Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation.

Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.

Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor.

Contact: levi.waldron@hunter.cuny.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
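
The cross-study validation scheme described in the Methods paragraph above (train on one study, validate on another, and compute the C-index for every ordered pair) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the survHD implementation: the two toy studies, the single covariate `x`, and the trivial `fit_direction` "learning algorithm" are all invented for the example.

```python
from itertools import permutations
from statistics import mean


def c_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs in which the subject with
    the earlier observed event also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's (event or censoring) time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def fit_direction(times, events, x):
    """Toy 'learning algorithm' (hypothetical): learn only whether a higher
    covariate value means higher risk (+1) or lower risk (-1)."""
    return 1.0 if c_index(times, events, x) >= 0.5 else -1.0


def cross_study_matrix(studies):
    """Train on study a and validate on study b for every ordered pair a != b."""
    z = {}
    for a, b in permutations(studies, 2):
        sign = fit_direction(*studies[a])
        times, events, x = studies[b]
        z[(a, b)] = c_index(times, events, [sign * v for v in x])
    return z


# Invented toy data: (survival times, event indicators, covariate values).
studies = {
    "A": ([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]),
    "B": ([2, 4, 6, 8], [1, 0, 1, 1], [9, 7, 1, 8]),
}
z = cross_study_matrix(studies)
summary = mean(z.values())  # one simple way to summarize the off-diagonal entries
print(z, summary)
```

The mean of the off-diagonal entries is only one possible summary; the abstract notes that several alternatives for summarizing the pairwise statistics were evaluated.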

Affiliation(s)

Christoph Bernau Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Markus Riester Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Anne-Laure Boulesteix Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Giovanni Parmigiani Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Curtis Huttenhower Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Levi Waldron Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA
Lorenzo Trippa Leibniz Supercomputing Center, Garching, Department for Medical Informatics, Biometry and Epidemiology, Munich, Germany, Cambridge, MA, Dana-Farber Cancer Institute, Boston, Harvard School of Public Health, Boston, USA and City University of New York School of Public Health, Hunter College, New York, USA

Tzoulaki I, Ebbels TMD, Valdes A, Elliott P, Ioannidis JPA. Design and analysis of metabolomics studies in epidemiologic research: a primer on -omic technologies. Am J Epidemiol 2014;180:129-39. [PMID: 24966222 DOI: 10.1093/aje/kwu143] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Indexed: 01/24/2023] Open

De Bin R, Sauerbrei W, Boulesteix AL. Investigating the prediction ability of survival models based on both clinical and omics data: two case studies. Stat Med 2014;33:5310-29. [PMID: 25042390 DOI: 10.1002/sim.6246] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 10/31/2013] [Revised: 04/22/2014] [Accepted: 05/31/2014] [Indexed: 12/25/2022]

Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, Brezovsky J, Damborsky J. PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 2014;10:e1003440. [PMID: 24453961 PMCID: PMC3894168 DOI: 10.1371/journal.pcbi.1003440] [Citation(s) in RCA: 534] [Impact Index Per Article: 53.4] [Received: 08/20/2013] [Accepted: 12/03/2013] [Indexed: 02/07/2023] Open

Affiliation(s)

Jaroslav Bendl Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
Jan Stourac Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
Ondrej Salanda Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Antonin Pavelka Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
Eric D. Wieben Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
Jaroslav Zendulka Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Jan Brezovsky Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic
Jiri Damborsky Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment, Faculty of Science, Masaryk University, Brno, Czech Republic; Center of Biomolecular and Cellular Engineering, International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic

Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. Lancet 2014;383:166-75. [PMID: 24411645 PMCID: PMC4697939 DOI: 10.1016/s0140-6736(13)62227-8] [Citation(s) in RCA: 957] [Impact Index Per Article: 95.7] [Indexed: 12/11/2022]

Fridlyand J, Yeh RF, Mackey H, Bengtsson T, Delmar P, Spaniolo G, Lieberman G. An industry statistician's perspective on PHC drug development. Contemp Clin Trials 2013;36:624-35. [PMID: 23648396 DOI: 10.1016/j.cct.2013.04.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2013] [Revised: 04/11/2013] [Accepted: 04/25/2013] [Indexed: 10/26/2022]

Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013;6:5. [PMID: 23448398 PMCID: PMC3606427 DOI: 10.1186/1756-0381-6-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 10/05/2012] [Accepted: 02/11/2013] [Indexed: 12/31/2022] Open

Verma M, Khoury MJ, Ioannidis JPA. Opportunities and challenges for selected emerging technologies in cancer epidemiology: mitochondrial, epigenomic, metabolomic, and telomerase profiling. Cancer Epidemiol Biomarkers Prev 2013;22:189-200. [PMID: 23242141 PMCID: PMC3565041 DOI: 10.1158/1055-9965.epi-12-1263] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 02/06/2023] Open

Kim J, Gao L, Tan K. Multi-analyte network markers for tumor prognosis. PLoS One 2012;7:e52973. [PMID: 23300836 PMCID: PMC3530467 DOI: 10.1371/journal.pone.0052973] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/14/2012] [Accepted: 11/26/2012] [Indexed: 01/07/2023] Open

Hwang S. Comparison and evaluation of pathway-level aggregation methods of gene expression data. BMC Genomics 2012;13 Suppl 7:S26. [PMID: 23282027 PMCID: PMC3521227 DOI: 10.1186/1471-2164-13-s7-s26] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022] Open

Abstract

Background

Microarray experiments produce expression measurements on a genomic scale. One way to derive functional understanding from the data is to focus on functional sets of genes, such as pathways, instead of individual genes. While pathway-level analysis has commonly relied on functional enrichment analysis, such as over-representation analysis and gene set enrichment analysis, an alternative approach has also been explored. In this approach, gene expression data are first aggregated at the pathway level, transforming the original data into a compact representation in which each row corresponds to a pathway instead of a gene. The pathway expression data can then be used for differential expression and classification analyses in pathway space, leveraging existing algorithms usually applied to gene expression data. Although several studies have proposed pathway-level aggregation methods, it remains unclear how they compare with one another, since the evaluations were limited in extent. This study therefore presents a comprehensive evaluation of the six most prominent aggregation methods.

Results

The compared methods include five existing methods, namely mean of all member genes (Mean all), mean of condition-responsive genes (Mean CORGs), analysis of sample set enrichment scores (ASSESS), principal component analysis (PCA), and partial least squares (PLS), as well as a variant of an existing method (Mean top 50%, averaging the top half of member genes). Comprehensive and stringent benchmarking was performed by collecting seven pairs of related but independent datasets encompassing various phenotypes. Aggregation was done in the space of KEGG pathways. Performance of the methods was assessed by classification accuracy, validated both internally and externally, and by examining the extent to which pathway signatures correlated between the dataset pairs. The assessment revealed that (i) the best accuracy and correlation were obtained from ASSESS and Mean top 50%, (ii) Mean all showed the lowest accuracy, and (iii) Mean CORGs and PLS gave rise to the largest discordance in the pathway signature correlation.

Conclusions

The two best-performing methods (ASSESS and Mean top 50%) are recommended. The benchmarking analysis also suggests that there is both room and need for developing a novel method for pathway-level aggregation.
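As an illustration of what pathway-level aggregation means in practice, the following minimal sketch collapses a gene-by-sample table into one row per pathway using two of the simpler methods named above, Mean all and Mean top 50%. It is not the study's implementation. The abstract does not say how the "top half" of member genes is ranked, so ranking by the absolute value of a Welch-style t statistic between two classes is an assumption here, and the function names and toy data are hypothetical.

```python
from statistics import mean, stdev

def mean_all(expr, pathway):
    """Mean all: average every member gene, sample by sample.

    expr maps gene name -> list of per-sample expression values.
    """
    members = [g for g in pathway if g in expr]
    if not members:
        raise ValueError("no pathway member is present in the expression data")
    n_samples = len(expr[members[0]])
    return [mean(expr[g][j] for g in members) for j in range(n_samples)]

def abs_t_score(values, labels):
    """Absolute Welch-style t statistic between class 0 and class 1.

    ASSUMPTION: this ranking criterion is chosen for illustration only.
    Each class needs at least two samples for stdev to be defined.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def mean_top_half(expr, pathway, labels):
    """Mean top 50%: average only the higher-ranked half of member genes."""
    members = [g for g in pathway if g in expr]
    ranked = sorted(members, key=lambda g: abs_t_score(expr[g], labels), reverse=True)
    keep = ranked[: max(1, (len(ranked) + 1) // 2)]  # round up for odd counts
    return mean_all(expr, keep)

# Hypothetical toy data: four genes, two samples per class.
toy_expr = {
    "g1": [1.0, 2.0, 5.0, 6.0],
    "g2": [3.0, 3.0, 3.0, 3.0],
    "g3": [2.0, 1.0, 8.0, 9.0],
    "g4": [4.0, 5.0, 4.0, 5.0],
}
toy_labels = [0, 0, 1, 1]
toy_pathway = ["g1", "g2", "g3", "g4"]

pathway_mean_all = mean_all(toy_expr, toy_pathway)
pathway_top_half = mean_top_half(toy_expr, toy_pathway, toy_labels)
```

In this toy case the two uninformative genes (g2 and g4) dilute the class difference under Mean all, while the top-half variant keeps only g1 and g3. Note that any label-driven gene selection of this kind would need to be repeated inside each training fold to avoid optimistic accuracy estimates.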

Kern SE. Why your new cancer biomarker may never work: recurrent patterns and remarkable diversity in biomarker failures. Cancer Res 2012;72:6097-101. [PMID: 23172309 DOI: 10.1158/0008-5472.can-12-3232] [Citation(s) in RCA: 159] [Impact Index Per Article: 13.3] [Indexed: 01/01/2023]