1. Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. Cell Genomics 2023; 3:100371. [PMID: 37601973 PMCID: PMC10435382 DOI: 10.1016/j.xgen.2023.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 05/04/2023] [Accepted: 07/07/2023] [Indexed: 08/22/2023]
Abstract
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.
Affiliation(s)
- Yidong Zhang
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
  - Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100006, China
- Xilin Jiang
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0SR, UK
  - Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, UK
- Alexander J. Mentzer
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Gerton Lunter
  - MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9700 RB, the Netherlands
2. Kaplan AD, Greene JD, Liu VX, Ray P. Unsupervised probabilistic models for sequential Electronic Health Records. J Biomed Inform 2022; 134:104163. [PMID: 36038064 PMCID: PMC10588733 DOI: 10.1016/j.jbi.2022.104163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 06/23/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.
Affiliation(s)
- Alan D Kaplan
  - Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States of America
- John D Greene
  - Kaiser Permanente Division of Research, 2000 Broadway, Oakland, CA 94612, United States of America
- Vincent X Liu
  - Kaiser Permanente Division of Research, 2000 Broadway, Oakland, CA 94612, United States of America
- Priyadip Ray
  - Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States of America
3. Manktelow M, Iftikhar A, Bucholc M, McCann M, O'Kane M. Clinical and operational insights from data-driven care pathway mapping: a systematic review. BMC Med Inform Decis Mak 2022; 22:43. [PMID: 35177058 PMCID: PMC8851723 DOI: 10.1186/s12911-022-01756-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/06/2020] [Accepted: 01/11/2022] [Indexed: 01/23/2023] [Open access]
Abstract
Background: Accumulated electronic data from a wide variety of clinical settings has been processed using a range of informatics methods to determine the sequence of care activities experienced by patients. The “as is” or “de facto” care pathways derived can be analysed together with other data to yield clinical and operational information. It seems likely that the needs of both health systems and patients will lead to increasing application of such analyses. A comprehensive review of the literature is presented, with a focus on the study context, types of analysis undertaken, and the utility of the information gained.
Methods: A systematic review was conducted of literature abstracting sequential patient care activities (“de facto” care pathways) from care records. Broad coverage was achieved by initial screening of a Scopus search term, followed by screening of citations (forward snowball) and references (backwards snowball). Previous reviews of related topics were also considered. Studies were initially classified according to the perspective captured in the derived pathways. Concept matrices were then derived, classifying studies according to additional data used and subsequent analysis undertaken, with regard for the clinical domain examined and the knowledge gleaned.
Results: 254 publications were identified. The majority (n = 217) of these studies derived care pathways from data of an administrative/clinical type. 80% (n = 173) applied further analytical techniques, while 60% (n = 131) combined care pathways with enhancing data to gain insight into care processes.
Discussion: Classification of the objectives, analyses and complementary data used in data-driven care pathway mapping illustrates areas of greater and lesser focus in the literature. The increasing tendency for these methods to find practical application in service redesign is explored across the variety of contexts and research questions identified. A limitation of our approach is that the topic is broad, limiting discussion of methodological issues.
Conclusion: This review indicates that methods utilising data-driven determination of de facto patient care pathways can provide empirical information relevant to healthcare planning, management, and practice. It is clear that despite the number of publications found the topic reviewed is still in its infancy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12911-022-01756-2.
Affiliation(s)
- Matthew Manktelow
  - Centre for Personalised Medicine, Clinical Decision Making and Patient Safety, Ulster University, C-TRIC, Altnagelvin Hospital Site, Derry-Londonderry, Northern Ireland
- Aleeha Iftikhar
  - Centre for Personalised Medicine, Clinical Decision Making and Patient Safety, Ulster University, C-TRIC, Altnagelvin Hospital Site, Derry-Londonderry, Northern Ireland
- Magda Bucholc
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Magee, Derry-Londonderry, Northern Ireland
- Michael McCann
  - Department of Computing, Letterkenny Institute of Technology, Co. Donegal, Ireland
- Maurice O'Kane
  - Clinical Chemistry Laboratory, Altnagelvin Hospital, Western Health and Social Care Trust, Derry-Londonderry, Northern Ireland
4. Yu K, Yang Z, Wu C, Huang Y, Xie X. In-hospital resource utilization prediction from electronic medical records with deep learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107052] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
5. Healthcare predictive analytics for disease progression: a longitudinal data fusion approach. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00606-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
6. Wehrhahn C, Leonard S, Rodriguez A, Xifara T. A Bayesian approach to disease clustering using restricted Chinese restaurant processes. Electron J Stat 2020. [DOI: 10.1214/20-ejs1696] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
7
|
|
8. Shen Y, Li Y, Zheng HT, Tang B, Yang M. Enhancing ontology-driven diagnostic reasoning with a symptom-dependency-aware Naïve Bayes classifier. BMC Bioinformatics 2019; 20:330. [PMID: 31196129 PMCID: PMC6567606 DOI: 10.1186/s12859-019-2924-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/09/2018] [Accepted: 05/31/2019] [Indexed: 11/10/2022] [Open access]
Abstract
Background: Ontology has attracted substantial attention from both academia and industry. Handling uncertainty reasoning is important in researching ontology. For example, when a patient is suffering from cirrhosis, the appearance of abdominal vein varices is four times more likely than the presence of bitter taste. Such medical knowledge is crucial for decision-making in various medical applications but is missing from existing medical ontologies. In this paper, we aim to discover medical knowledge probabilities from electronic medical record (EMR) texts to enrich ontologies. First, we build an ontology by identifying meaningful entity mentions from EMRs. Then, we propose a symptom-dependency-aware naïve Bayes classifier (SDNB) that is based on the assumption that there is a level of dependency among symptoms. To ensure the accuracy of the diagnostic classification, we incorporate the probability of a disease into the ontology via innovative approaches.
Results: We conduct a series of experiments to evaluate whether the proposed method can discover meaningful and accurate probabilities for medical knowledge. Based on over 30,000 deidentified medical records, we explore 336 abdominal diseases and 81 related symptoms. Among these 336 gastrointestinal diseases, the probabilities of 31 diseases are obtained via our method. These 31 probabilities of diseases and 189 conditional probabilities between diseases and the symptoms are added into the generated ontology.
Conclusion: In this paper, we propose a medical knowledge probability discovery method that is based on the analysis and extraction of EMR text data for enriching a medical ontology with probability information. The experimental results demonstrate that the proposed method can effectively identify accurate medical knowledge probability information from EMR data. In addition, the proposed method can efficiently and accurately calculate the probability of a patient suffering from a specified disease, thereby demonstrating the advantage of combining an ontology and a symptom-dependency-aware naïve Bayes classifier.
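The probability calculation described in this abstract can be illustrated with a plain naïve Bayes posterior. The sketch below assumes that symptoms are conditionally independent given the disease, so it is the standard baseline and not the authors' symptom-dependency-aware SDNB. The function name, disease names, symptom names, and all probabilities are invented for illustration only.

```python
def naive_bayes_posterior(priors, likelihoods, symptoms):
    """Posterior P(disease | symptoms) under a plain naive Bayes assumption.

    priors:      {disease: P(disease)}
    likelihoods: {disease: {symptom: P(symptom | disease)}}
    symptoms:    list of observed (present) symptoms
    """
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for symptom in symptoms:
            # Independence assumption: multiply per-symptom likelihoods.
            score *= likelihoods[disease][symptom]
        scores[disease] = score
    # Normalize over the listed diseases (assumed exhaustive and exclusive).
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


# Hypothetical numbers, not taken from the paper.
priors = {"disease_a": 0.5, "disease_b": 0.5}
likelihoods = {
    "disease_a": {"symptom_x": 0.8, "symptom_y": 0.5},
    "disease_b": {"symptom_x": 0.2, "symptom_y": 0.5},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["symptom_x", "symptom_y"])
```

With these made-up inputs, `symptom_y` is equally likely under both diseases, so the posterior is driven entirely by `symptom_x`.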
Affiliation(s)
- Ying Shen
  - School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, People's Republic of China
- Hai-Tao Zheng
  - School of Information Science and Technology, Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, People's Republic of China
- Buzhou Tang
  - Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, People's Republic of China
- Min Yang
  - SIAT, Chinese Academy of Sciences, Shenzhen, 518055, People's Republic of China
9. Cui S, Wang D, Wang Y, Yu PW, Jin Y. An improved support vector machine-based diabetic readmission prediction. Computer Methods and Programs in Biomedicine 2018; 166:123-135. [PMID: 30415712 DOI: 10.1016/j.cmpb.2018.10.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 05/30/2018] [Revised: 10/07/2018] [Accepted: 10/12/2018] [Indexed: 06/09/2023]
Abstract
Background and Objective: In healthcare systems, the cost of unplanned readmission accounts for a large proportion of total hospital payment. Hospital-specific readmission rate becomes a critical issue around the world. Quantification and early identification of unplanned readmission risks will improve the quality of care during hospitalization and reduce the occurrence of readmission. In clinical practice, medical workers generally use LACE score method to evaluate patient readmission risks, but this method usually performs poorly. With this in mind, this study presents a novel method combining support vector machine and genetic algorithm to build the risk prediction model, which simultaneously involves feature selection and the processing of imbalanced data. This model aims to provide decision support for clinicians during the discharge management of patients with diabetes.
Method: The experiments were conducted from a set of 8756 medical records with 50 different features about diabetic readmission. After preprocessing the data, an effective SMOTE-based method was proposed to solve the imbalance data problem. Further, in order to improve prediction performance, a hybrid feature selection mechanism was devised to select the important features. Subsequently, an improved support vector machine-based (SVM-based) method was developed and the genetic algorithm was used to tune the sensitive parameter of the algorithm. Finally, the five-fold cross-validation method was applied to compare the performance of proposed method with other methods (LACE score, logistic regression, naïve bayes, decision tree and feed forward neural networks).
Results: Experimental results indicate that the proposed SVM-based method achieves an accuracy of 81.02%, a sensitivity of 82.89%, a specificity of 79.23%, and outperforms other popular algorithms in identifying diabetic patients who may be readmitted.
Conclusions: Our research can improve the performance of clinic decision support systems for diabetic readmission, by which the readmission possibility as well as the waste of medical resources can be reduced.
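The accuracy, sensitivity, and specificity figures quoted in this abstract are standard confusion-matrix ratios. The following minimal sketch shows how they are computed, assuming binary labels with 1 = readmitted and 0 = not readmitted. The function name `binary_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
```

In a five-fold cross-validation such as the one the abstract mentions, these ratios would typically be computed on each held-out fold and then averaged.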
Affiliation(s)
- Shaoze Cui
  - School of Management Science and Engineering, Dalian University of Technology, Dalian 116023, PR China
- Dujuan Wang
  - Business School of Sichuan University, Chengdu 610064, China
- Yanzhang Wang
  - School of Management Science and Engineering, Dalian University of Technology, Dalian 116023, PR China
- Pay-Wen Yu
  - Department of Physical Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Yaochu Jin
  - School of Management Science and Engineering, Dalian University of Technology, Dalian 116023, PR China; Department of Computer Science, University of Surrey, Guildford, Surrey GU2 7XH, United Kingdom
10
|
|