1
Ma S, Yu J, Qin X, Liu J. Current status and challenges in establishing reference intervals based on real-world data. Crit Rev Clin Lab Sci 2023; 60:427-441. [PMID: 37038925 DOI: 10.1080/10408363.2023.2195496] [Received: 09/27/2022] [Revised: 01/29/2023] [Accepted: 03/22/2023] [Indexed: 04/12/2023]
Abstract
Reference intervals (RIs) are the cornerstone of test-result evaluation in clinical practice and are invaluable in judging patient health and making clinical decisions. Establishing RIs from clinical laboratory data is a branch of real-world data mining research. Compared with the traditional direct method, this indirect approach is highly practical, widely applicable, and low-cost. Improving the accuracy of RIs requires not only the collection of sufficient data and the use of correct statistical methods, but also proper stratification of heterogeneous subpopulations. This includes establishing age-specific RIs and taking other characteristics of reference individuals into account. Although there are many studies on establishing RIs by indirect methods, it is still very difficult for laboratories to select appropriate statistical methods because formal guidelines are lacking. This review describes the application of real-world data and an approach for establishing indirect reference intervals (iRIs). We summarize the processes for establishing iRIs from real-world data and analyze in detail the principles and applicable scope of indirect-method models. Moreover, we compare different methods for constructing growth curves to establish age-specific RIs, with the aim of providing laboratories with a reference for establishing specific iRIs and offering new insight into clinical laboratory RI research.
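As a minimal illustration of the age stratification discussed in this review, the Python sketch below computes a nonparametric central 95% interval (2.5th to 97.5th percentile) within each age partition. The partition boundaries, the function names and the linear-interpolation percentile rule are assumptions chosen for illustration. The input is assumed to be already cleaned; real-world laboratory data would first need an indirect method to separate out pathological results.

```python
import math
from collections import defaultdict


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between order statistics
    (the same rule as NumPy's default 'linear' method)."""
    if not sorted_vals:
        raise ValueError("empty stratum")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def age_stratified_ri(records, age_edges, lower_p=2.5, upper_p=97.5):
    """Nonparametric reference interval per age stratum.

    records   -- iterable of (age, value) pairs, assumed already cleaned
    age_edges -- sorted (hypothetical) partition boundaries;
                 stratum i covers [age_edges[i], age_edges[i + 1])
    """
    strata = defaultdict(list)
    for age, value in records:
        for lo, hi in zip(age_edges, age_edges[1:]):
            if lo <= age < hi:
                strata[(lo, hi)].append(value)
                break  # ages outside every stratum are silently dropped
    result = {}
    for key in sorted(strata):
        vals = sorted(strata[key])
        result[key] = (percentile(vals, lower_p), percentile(vals, upper_p))
    return result
```

Partitions that contain few results give unstable limits, which is one reason why continuous growth-curve approaches are attractive for analytes with strong age dependence.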
Affiliation(s)
- Sijia Ma
- Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Liaoning Clinical Research Center for Laboratory Medicine, Shenyang, P.R. China
- Juntong Yu
- Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Liaoning Clinical Research Center for Laboratory Medicine, Shenyang, P.R. China
- Xiaosong Qin
- Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Liaoning Clinical Research Center for Laboratory Medicine, Shenyang, P.R. China
- Jianhua Liu
- Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Liaoning Clinical Research Center for Laboratory Medicine, Shenyang, P.R. China
2
Ammer T, Schützenmeister A, Prokosch HU, Rauh M, Rank CM, Zierk J. A pipeline for the fully automated estimation of continuous reference intervals using real-world data. Sci Rep 2023; 13:13440. [PMID: 37596314 PMCID: PMC10439150 DOI: 10.1038/s41598-023-40561-3] [Received: 02/13/2023] [Accepted: 08/12/2023] [Indexed: 08/20/2023]
Abstract
Reference intervals are essential for interpreting laboratory test results. Continuous reference intervals precisely capture physiological age-specific dynamics that occur throughout life, and thus have the potential to improve clinical decision-making. However, established approaches for estimating continuous reference intervals require samples from healthy individuals and are therefore substantially restricted. Indirect methods operating on routine measurements enable the estimation of one-dimensional reference intervals; however, no automated approach exists that integrates the dependency on a continuous covariate such as age. We propose an integrated pipeline for the fully automated estimation of continuous reference intervals, expressed as a generalized additive model for location, scale and shape (GAMLSS) based on discrete model estimates from an indirect method (refineR). The results are free of subjective user input, enable conversion of test results into z-scores, and can be integrated into laboratory information systems. Comparison of our results with established and validated reference intervals from the CALIPER and PEDREF studies and manufacturers' package inserts shows good agreement of reference limits, indicating that the proposed pipeline generates high-quality results. In conclusion, the developed pipeline enables the generation of high-precision percentile charts and continuous reference intervals. It represents the first parameter-less and fully automated solution for the indirect estimation of continuous reference intervals.
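The conversion of test results into z-scores mentioned above can be sketched as follows, assuming the continuous model is parameterised at each age by a Box-Cox power L, a median M and a coefficient of variation S (the LMS parameterisation, which corresponds to the Box-Cox Cole and Green family in GAMLSS). The abstract does not state which distribution family the pipeline uses, so this is an assumption, and the function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # standard-normal 97.5th percentile


def lms_zscore(y, L, M, S):
    """z-score of result y under a Box-Cox (LMS-type) model with
    power L, median M and coefficient of variation S at the patient's age."""
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("the LMS model requires positive y, M and S")
    if abs(L) < 1e-12:  # limiting case L -> 0 is log-normal
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Inverse transform: the result value that corresponds to z-score z."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    base = 1.0 + L * S * z
    if base <= 0:
        raise ValueError("z lies outside the support of this Box-Cox model")
    return M * base ** (1.0 / L)


def reference_limits(L, M, S):
    """Central 95% reference interval implied by the LMS parameters."""
    return lms_centile(-Z_975, L, M, S), lms_centile(Z_975, L, M, S)
```

With hypothetical values L = -0.5, M = 30 and S = 0.25 at a given age, `lms_zscore(45.44, -0.5, 30, 0.25)` returns approximately 1.5, and `reference_limits(-0.5, 30, 0.25)` returns the 2.5th and 97.5th centiles for that age. A continuous model simply supplies different L, M and S values for each age.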
Affiliation(s)
- Tatjana Ammer
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Roche Diagnostics GmbH, Penzberg, Germany
- Hans-Ulrich Prokosch
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Manfred Rauh
- Department of Pediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Loschgestr. 15, 91054, Erlangen, Germany
- Jakob Zierk
- Department of Pediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Loschgestr. 15, 91054, Erlangen, Germany
- Center of Medical Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
3
Speller J, Staerk C, Mayr A. Robust statistical boosting with quantile-based adaptive loss functions. Int J Biostat 2022:ijb-2021-0127. [PMID: 35950232 DOI: 10.1515/ijb-2021-0127] [Received: 12/01/2021] [Accepted: 06/20/2022] [Indexed: 11/15/2022]
Abstract
We combine robust loss functions with statistical boosting algorithms in an adaptive way to perform variable selection and predictive modelling for potentially high-dimensional biomedical data. To achieve robustness against outliers in the outcome variable (vertical outliers), we consider different composite robust loss functions together with base-learners for linear regression. For composite loss functions, such as the Huber loss and the Bisquare loss, a threshold parameter that controls the robustness has to be specified. In the context of boosting algorithms, we propose an approach that adapts the threshold parameter of composite robust losses in each iteration to the current sizes of residuals, based on a fixed quantile level. We compared the performance of our approach with classical M-regression, boosting with standard loss functions, and the lasso with respect to prediction accuracy and variable selection in different simulated settings: the adaptive Huber and Bisquare losses led to better performance when the outcome contained outliers or was affected by specific types of corruption. For non-corrupted data, our approach yielded performance similar to boosting with the efficient L2 loss or the lasso. In the analysis of skewed KRT19 protein expression data based on gene expression measurements from human cancer cell lines (NCI-60 cell line panel), boosting with the new adaptive loss functions also performed favourably compared with standard loss functions or competing robust approaches in terms of prediction accuracy and resulted in very sparse models.
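The adaptive-threshold idea can be illustrated with a small componentwise linear boosting loop in pure Python: in each iteration the Huber threshold delta is reset to the tau-quantile of the current absolute residuals, and the clipped residuals serve as the negative gradient that the single best linear base-learner is fitted to. This is a simplified sketch, not the authors' implementation; the quantile level tau = 0.8, step length nu, number of iterations mstop and all function names are illustrative assumptions, and tuning of the stopping iteration (for example by cross-validation) is omitted.

```python
import math


def quantile(values, tau):
    """Empirical quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * tau
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def huber_negative_gradient(residuals, delta):
    """Negative gradient of the Huber loss: residuals clipped to [-delta, delta]."""
    return [r if abs(r) <= delta else math.copysign(delta, r) for r in residuals]


def adaptive_huber_boost(X, y, tau=0.8, nu=0.1, mstop=500):
    """Componentwise linear boosting with an adaptive Huber threshold.

    X is a list of rows; include a column of 1s so the intercept can be updated.
    Returns (offset, coefficients).
    """
    p = len(X[0])
    offset = quantile(y, 0.5)  # robust starting value: the median of y
    beta = [0.0] * p
    cols = [[row[j] for row in X] for j in range(p)]
    for _ in range(mstop):
        fitted = [offset + sum(b * v for b, v in zip(beta, row)) for row in X]
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        delta = quantile([abs(r) for r in resid], tau)  # threshold follows residual sizes
        u = huber_negative_gradient(resid, delta)
        # fit every single-covariate least-squares learner to u and keep the best one
        best_j, best_coef, best_rss = None, 0.0, math.inf
        for j, xj in enumerate(cols):
            sxx = sum(v * v for v in xj)
            if sxx == 0:
                continue
            coef = sum(a * b for a, b in zip(xj, u)) / sxx
            rss = sum((ui - coef * v) ** 2 for ui, v in zip(u, xj))
            if rss < best_rss:
                best_j, best_coef, best_rss = j, coef, rss
        if best_j is None:
            break
        beta[best_j] += nu * best_coef  # shrunken update of a single coefficient
    return offset, beta
```

Because only one coefficient is updated per iteration, covariates that are never selected keep a coefficient of zero, which is how boosting performs variable selection. A single gross vertical outlier contributes at most delta to the gradient, so it cannot dominate the fit.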
Affiliation(s)
- Jan Speller
- Medical Faculty, Institute of Medical Biometrics, Informatics and Epidemiology (IMBIE), University of Bonn, Bonn, Germany
- Christian Staerk
- Medical Faculty, Institute of Medical Biometrics, Informatics and Epidemiology (IMBIE), University of Bonn, Bonn, Germany
- Andreas Mayr
- Medical Faculty, Institute of Medical Biometrics, Informatics and Epidemiology (IMBIE), University of Bonn, Bonn, Germany
4
Hepp T, Zierk J, Rauh M, Metzler M, Seitz S. Mixture density networks for the indirect estimation of reference intervals. BMC Bioinformatics 2022; 23:307. [PMID: 35906555 PMCID: PMC9336034 DOI: 10.1186/s12859-022-04846-0] [Received: 01/14/2022] [Accepted: 07/15/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Reference intervals represent the expected range of physiological test results in a healthy population and are essential to support medical decision making. Particularly in the context of pediatric reference intervals, where recruitment regulations make prospective studies challenging to conduct, indirect estimation strategies are becoming increasingly important. Established indirect methods enable robust identification of the distribution of "healthy" samples from laboratory databases, which include unlabeled pathologic cases, but are currently severely limited when adjusting for essential patient characteristics such as age. Here, we propose the use of mixture density networks (MDN) to overcome this problem and model all parameters of the mixture distribution in a single step. RESULTS Estimated reference intervals from varying settings with simulated data demonstrate the ability to accurately estimate latent distributions from unlabeled data using different implementations of MDNs. Comparing the performance with alternative estimation approaches further highlights the importance of modeling the mixture component weights as a function of the input in order to avoid biased estimates for all other parameters and the resulting reference intervals. We also provide a strategy to generate partially customized starting weights to improve proper identification of the latent components. Finally, the application to real-world hemoglobin samples provides results in line with current gold standard approaches, but also indicates that adequate regularization strategies need further investigation to prevent overfitting the data.
CONCLUSIONS Mixture density networks provide a promising approach capable of extracting the distribution of healthy samples from unlabeled laboratory databases while simultaneously and explicitly estimating all parameters and component weights as non-linear functions of the covariate(s), thereby allowing the estimation of age-dependent reference intervals in a single step. Further studies on model regularization and asymmetric component distributions are warranted to consolidate our findings and expand the scope of applications.
Affiliation(s)
- Tobias Hepp
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 6, 91054, Erlangen, Germany; Chair of Spatial Data Science and Statistical Learning, Georg-August-Universität Göttingen, Platz der Göttinger Sieben 3, 37073, Göttingen, Germany
- Jakob Zierk
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Manfred Rauh
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Markus Metzler
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Sarem Seitz
- Department of Information Systems and Applied Computer Science, Otto-Friedrich-Universität Bamberg, Kapuzinerstraße 16, 96047, Bamberg, Germany
5
Abstract
Laboratory tests are essential to assess the health status and to guide patient care in individuals of all ages. The interpretation of quantitative test results requires availability of appropriate reference intervals, and reference intervals in children have to account for the extensive physiological dynamics with age in many biomarkers. Creation of reference intervals using conventional approaches requires the sampling of healthy individuals, which is hindered by ethical and practical considerations in children, due to the need for a large number of blood samples from healthy children of all ages, including neonates and young infants. This limits the availability and quality of pediatric reference intervals, and ultimately negatively impacts pediatric clinical decision-making. Data mining approaches use laboratory test results and clinical information from hospital information systems to create reference intervals. The extensive number of available test results from laboratory information systems and advanced statistical methods enable the creation of pediatric reference intervals with an unprecedented age-related accuracy for children of all ages. Ongoing developments regarding the availability and standardization of electronic medical records and of indirect statistical methods will further improve the benefit of data mining for pediatric reference intervals.
Affiliation(s)
- Jakob Zierk
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
- Markus Metzler
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
- Manfred Rauh
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
6
Abstract
The indirect approach to defining reference intervals operates ‘a posteriori’, on stored laboratory data. It relies on being able to separate healthy and diseased populations using clinical techniques, statistical techniques, or both. These techniques are also fundamental in a priori, direct reference interval approaches. The clinical techniques use clinical data stored either in the electronic health record or within the laboratory database to exclude patients with possible disease; they depend on the investigators' understanding of the data and of the pathological impacts on tests. The statistical technique relies on identifying a dominant, apparently healthy, typically Gaussian distribution that is unaffected by the overlapping populations with higher (or lower) results; it depends on having large databases to give confidence in the extrapolation from the narrow portion of the overall distribution that represents unaffected individuals. The statistical issues involved can be complex and can result in unintended bias, particularly when the impacts of disease and the physiological variations in the data are underappreciated.
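The statistical technique described above can be sketched with a probability-plot fit: the ordered results are regressed on standard-normal quantiles, using only a central window of plotting positions that is assumed to be dominated by unaffected individuals, and the intercept and slope are read off as the mean and SD of the dominant Gaussian. This is a simplified, Hoffmann-style illustration rather than any specific published algorithm; the window (0.2, 0.6) and the function names are assumptions.

```python
import statistics
from statistics import NormalDist


def central_qq_fit(values, window=(0.2, 0.6)):
    """Fit a Gaussian to the central part of a distribution by regressing the
    ordered results on standard-normal quantiles (probability-plot regression).

    Only order statistics whose plotting position lies inside `window` are used,
    on the assumption that this region is dominated by unaffected patients.
    Returns (mu, sigma).
    """
    s = sorted(values)
    n = len(s)
    xs, ys = [], []
    for i, v in enumerate(s):
        p = (i + 0.5) / n  # plotting position of the i-th order statistic
        if window[0] <= p <= window[1]:
            xs.append(NormalDist().inv_cdf(p))
            ys.append(v)
    if len(xs) < 2:
        raise ValueError("window contains too few observations")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sigma = sxy / sxx  # slope of the probability plot
    mu = my - sigma * mx  # value at z = 0
    return mu, sigma


def gaussian_ri(mu, sigma, coverage=0.95):
    """Central reference interval of a Gaussian with the given mean and SD."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma
```

If some results come from an overlapping diseased population, the plotting positions of the healthy results are shifted, so the estimated mean and SD are biased unless that fraction is small; this is one concrete form of the unintended bias described above. Skewed analytes would usually be transformed (for example, log-transformed) before fitting.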
Affiliation(s)
- Kenneth A. Sikaris
- Department of Biochemistry, Melbourne Pathology, Collingwood, VIC, Australia
7
Zierk J, Baum H, Bertram A, Boeker M, Buchwald A, Cario H, Christoph J, Frühwald MC, Groß HJ, Groening A, Gscheidmeier T, Hoff T, Hoffmann R, Klauke R, Krebs A, Lichtinghagen R, Mühlenbrock-Lenter S, Neumann M, Nöllke P, Niemeyer CM, Ruf HG, Steigerwald U, Streichert T, Torge A, Yoshimi-Nöllke A, Prokosch HU, Metzler M, Rauh M. High-resolution pediatric reference intervals for 15 biochemical analytes described using fractional polynomials. Clin Chem Lab Med 2021; 59:1267-1278. [PMID: 33565284 DOI: 10.1515/cclm-2020-1371] [Received: 09/09/2020] [Accepted: 01/28/2021] [Indexed: 01/04/2023]
Abstract
OBJECTIVES Assessment of children's laboratory test results requires consideration of the extensive changes that occur during physiological development and result in pronounced sex- and age-specific dynamics in many biochemical analytes. Pediatric reference intervals have to account for these dynamics, but ethical and practical challenges limit the availability of appropriate pediatric reference intervals that cover children from birth to adulthood. We have therefore initiated the multi-center data-driven PEDREF project (Next-Generation Pediatric Reference Intervals) to create pediatric reference intervals using data from laboratory information systems. METHODS We analyzed laboratory test results from 638,683 patients (217,883-982,548 samples per analyte, a median of 603,745 test results per analyte, and 10,298,067 test results in total) performed during patient care in 13 German centers. Test results from children with repeat measurements were discarded, and we estimated the distribution of physiological test results using a validated statistical approach (kosmic). RESULTS We report continuous pediatric reference intervals and percentile charts for alanine transaminase, aspartate transaminase, lactate dehydrogenase, alkaline phosphatase, γ-glutamyltransferase, total protein, albumin, creatinine, urea, sodium, potassium, calcium, chloride, inorganic phosphate, and magnesium. Reference intervals are provided as tables and as fractional polynomial functions (i.e., mathematical equations) that can be integrated into laboratory information systems. Additionally, z-scores and percentiles enable the normalization of test results by age and sex to facilitate their interpretation across age groups. CONCLUSIONS The provided reference intervals and percentile charts enable precise assessment of laboratory test results in children from birth to adulthood.
Our findings highlight the pronounced dynamics in many biochemical analytes in neonates, which require particular consideration in reference intervals to support clinical decision making most effectively.
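Because the reference limits are distributed as fractional polynomial functions of age, a laboratory information system only needs to evaluate such a function at a patient's age. The sketch below evaluates a generic fractional polynomial using the usual conventions (power 0 denotes log x, and a repeated power p contributes an additional term x^p log x). The powers, coefficients and any scaling or shifting of age used by PEDREF are not given in the abstract, so the function names and all example values are hypothetical.

```python
import math


def fp_term(x, power):
    """Basic fractional-polynomial transform: x**power, where power 0 means log(x)."""
    return math.log(x) if power == 0 else x ** power


def fp_eval(x, intercept, terms):
    """Evaluate a fractional polynomial at x > 0.

    terms -- sequence of (power, coefficient) pairs, listed in ascending power order.
    The k-th repeat of the same power is multiplied by log(x)**k
    (x**p, x**p * log(x), x**p * log(x)**2, ...).
    """
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    value = intercept
    prev_power, repeat = None, 0
    for power, coef in terms:
        repeat = repeat + 1 if power == prev_power else 0
        value += coef * fp_term(x, power) * math.log(x) ** repeat
        prev_power = power
    return value
```

For example, `fp_eval(4.0, 0.0, [(-0.5, 2.0), (2, 1.0)])` evaluates 2 * 4**-0.5 + 4**2 = 17.0. In practice, one such function would describe each reference limit (or each distribution parameter) for each sex.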
Affiliation(s)
- Jakob Zierk
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany; Center of Medical Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany
- Hannsjörg Baum
- Institute for Laboratory Medicine, Regionale Kliniken Holding RKH GmbH, Ludwigsburg, Germany
- Martin Boeker
- Institute of Medical Biometry and Statistics, Medical Data Science, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Armin Buchwald
- Institute for Clinical Chemistry and Laboratory Medicine, University of Freiburg, Freiburg, Germany
- Holger Cario
- Department of Pediatrics and Adolescent Medicine, University Medical Centre, Ulm, Germany
- Michael C Frühwald
- Paediatric and Adolescent Medicine, Medical Faculty and University Hospital Augsburg, Augsburg, Germany
- Hans-Jürgen Groß
- Core Facility of Clinical Chemistry, University Medical Centre Ulm, Ulm, Germany
- Thomas Gscheidmeier
- Core Facility of Clinical Chemistry, University Medical Centre Ulm, Ulm, Germany
- Torsten Hoff
- Central Laboratory, Gesundheit Nord - Bremen Hospital Group, Bremen, Germany
- Reinhard Hoffmann
- Institute for Laboratory Medicine and Microbiology, Medical Faculty and University Hospital Augsburg, Augsburg, Germany
- Rainer Klauke
- Institute of Clinical Chemistry, MHH, Hannover, Germany
- Michael Neumann
- Division of Laboratory Medicine, University Hospital of Würzburg, Würzburg, Germany
- Peter Nöllke
- Department of Pediatrics and Adolescent Medicine, Division of Pediatric Hematology and Oncology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Charlotte M Niemeyer
- Department of Pediatrics and Adolescent Medicine, Division of Pediatric Hematology and Oncology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Hans-Georg Ruf
- Institute for Laboratory Medicine and Microbiology, Medical Faculty and University Hospital Augsburg, Augsburg, Germany
- Udo Steigerwald
- Division of Laboratory Medicine, University Hospital of Würzburg, Würzburg, Germany
- Thomas Streichert
- Department of Clinical Chemistry, University Hospital of Cologne, Cologne, Germany
- Antje Torge
- Institute of Clinical Chemistry, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Ayami Yoshimi-Nöllke
- Department of Pediatrics and Adolescent Medicine, Division of Pediatric Hematology and Oncology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Hans-Ulrich Prokosch
- Chair of Medical Informatics, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
- Markus Metzler
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
- Manfred Rauh
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
8
Hepp T, Zierk J, Rauh M, Metzler M, Mayr A. Latent class distributional regression for the estimation of non-linear reference limits from contaminated data sources. BMC Bioinformatics 2020; 21:524. [PMID: 33187469 PMCID: PMC7666475 DOI: 10.1186/s12859-020-03853-3] [Received: 06/22/2020] [Accepted: 10/30/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Medical decision making based on quantitative test results depends on reliable reference intervals, which represent the range of physiological test results in a healthy population. Current methods for the estimation of reference limits focus either on modelling the age-dependent dynamics of different analytes directly in a prospective setting or on extracting independent distributions from contaminated data sources, e.g. data with latent heterogeneity due to unlabeled pathologic cases. In this article, we propose a new method to estimate indirect reference limits with non-linear dependencies on covariates from contaminated datasets by combining the framework of mixture models and distributional regression. RESULTS Simulation results based on mixtures of Gaussian and gamma distributions suggest accurate approximation of the true quantiles that improves with increasing sample size and decreasing overlap between the mixture components. Due to the high flexibility of the framework, initialization of the algorithm requires careful consideration of appropriate starting weights. Estimated quantiles from the extracted distribution of healthy hemoglobin concentrations in boys and girls provide clinically useful pediatric reference limits similar to those obtained with other approaches, which require more samples and are computationally more expensive. CONCLUSIONS Latent class distributional regression models represent the first method to estimate indirect non-linear reference limits from a single model fit, and their general scope of application can be extended to other scenarios with latent heterogeneity.
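The mixture-model component of this approach can be illustrated with a plain expectation-maximisation (EM) fit of a two-component Gaussian mixture, taking the first component as the healthy population and its 2.5th and 97.5th percentiles as reference limits. This sketch deliberately omits the distributional-regression part, so its parameters do not depend on age or any other covariate; it is a simplified illustration rather than the authors' method, and the function names and starting values are assumptions. As the abstract notes, the result is sensitive to the starting values.

```python
import math
from statistics import NormalDist


def em_two_gaussians(x, mu, sd, w_healthy, n_iter=500, tol=1e-9):
    """EM for a two-component Gaussian mixture without covariates.

    mu, sd    -- starting means and SDs as (healthy, pathological) pairs
    w_healthy -- starting mixing weight of the healthy component
    Returns (mu, sd, weights), with index 0 referring to the healthy component.
    """
    mu, sd = list(mu), list(sd)
    w = [w_healthy, 1.0 - w_healthy]
    prev_ll = -math.inf
    for _ in range(n_iter):
        dists = [NormalDist(mu[k], sd[k]) for k in range(2)]
        # E-step: posterior probability that each result belongs to each component
        resp, ll = [], 0.0
        for xi in x:
            dens = [w[k] * dists[k].pdf(xi) for k in range(2)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and SDs
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sd[k] = math.sqrt(max(var, 1e-12))
        if ll - prev_ll < tol:  # EM never decreases the log-likelihood
            break
        prev_ll = ll
    return mu, sd, w


def healthy_ri(mu, sd):
    """2.5th and 97.5th percentiles of the healthy (first) component."""
    d = NormalDist(mu[0], sd[0])
    return d.inv_cdf(0.025), d.inv_cdf(0.975)
```

With starting means (90, 150), SDs (15, 15) and a healthy weight of 0.8, for example, the first component is expected to settle on the lower, healthy population; a poor start can instead swap or merge the components, which is the initialization issue the abstract highlights.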
Affiliation(s)
- Tobias Hepp
- Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 6, 91054, Erlangen, Germany
- Jakob Zierk
- Kinder- und Jugendklinik, Universitätsklinikum Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Manfred Rauh
- Kinder- und Jugendklinik, Universitätsklinikum Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Markus Metzler
- Kinder- und Jugendklinik, Universitätsklinikum Erlangen, Loschgestraße 15, 91054, Erlangen, Germany
- Andreas Mayr
- Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universitätsklinikum Bonn, Venusberg-Campus 1, 53127, Bonn, Germany