1
Schwartz CI, Farag A, Lopez KD, Moorhead S, Monsen KA. Using Omaha System data to explore relationships between client outcomes, phenotypes, and targeted home intervention approaches: an exemplar examining practice effectiveness for older women with circulation problems. J Am Med Inform Assoc 2023; 30:1773-1783. [PMID: 37335871 PMCID: PMC10586038 DOI: 10.1093/jamia/ocad106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 05/05/2023] [Accepted: 06/07/2023] [Indexed: 06/21/2023] Open
Abstract
BACKGROUND Improved health among older women remains elusive and may be linked to limited knowledge of and interventions targeted to population subgroups. Use of structured community nurse home visit data exploring relationships between client outcomes, phenotypes, and targeted intervention approaches may reveal new understandings of practice effectiveness. MATERIALS AND METHODS Omaha System data of 2363 women 65 years and older with circulation problems receiving at least 2 community nurse home visits were accessed. Previously identified phenotypes (Poor circulation; Irregular heart rate; and Limited symptoms), 7 intervention approaches (High-Surveillance; High-Teaching/Guidance/Counseling; Balanced-All; Balanced-Surveillance-Teaching/Guidance/Counseling; Low-Teaching/Guidance/Counseling-Balanced Other; Low-Surveillance-Mostly-Teaching/Guidance/Counseling-TreatmentProcedure-CaseManagement; and Mostly-TreatmentProcedure+CaseManagement), and client knowledge, behavior, and status outcomes were used. Client-linked intervention approach counts, proportional use per phenotype, and associations with client outcome scores were descriptively analyzed. Associations between intervention approach proportional use by phenotype and outcome scores were analyzed using parallel coordinate graph methodology for intervention approach effectiveness. RESULTS Percent use of intervention approach differed significantly by phenotype. The 2 most widely employed intervention approaches were characterized by either a high use of surveillance interventions or a balanced use of all intervention categories (surveillance, teaching/guidance/counseling, treatment-procedure, case-management). Mean outcome discharge and change scores significantly differed by intervention approach. Proportionally deployed intervention approach patterns by phenotype were associated with small-effect improvements in outcomes.
DISCUSSIONS AND CONCLUSIONS The Omaha System taxonomy supported the management and exploration of large multidimensional community nursing data of older women with circulation problems. This study offers a new way to examine intervention effectiveness using phenotype- and targeted intervention approach-informed structured data.
Affiliation(s)
- Karen A Monsen
  - School of Nursing, University of Minnesota, Minneapolis, Minnesota, USA
2
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023] Open
Abstract
MOTIVATION Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases where it is used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. RESULTS We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
  - King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
3
Yamga E, Mullie L, Durand M, Cadrin-Chenevert A, Tang A, Montagnon E, Chartrand-Lefebvre C, Chassé M. Interpretable clinical phenotypes among patients hospitalized with COVID-19 using cluster analysis. Front Digit Health 2023; 5:1142822. [PMID: 37114183 PMCID: PMC10128042 DOI: 10.3389/fdgth.2023.1142822] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/12/2023] [Accepted: 03/13/2023] [Indexed: 04/29/2023] Open
Abstract
Background Multiple clinical phenotypes have been proposed for coronavirus disease (COVID-19), but few have used multimodal data. Using clinical and imaging data, we aimed to identify distinct clinical phenotypes in patients admitted with COVID-19 and to assess their clinical outcomes. Our secondary objective was to demonstrate the clinical applicability of this method by developing an interpretable model for phenotype assignment. Methods We analyzed data from 547 patients hospitalized with COVID-19 at a Canadian academic hospital. We processed the data by applying a factor analysis of mixed data (FAMD) and compared four clustering algorithms: k-means, partitioning around medoids (PAM), and divisive and agglomerative hierarchical clustering. We used imaging data and 34 clinical variables collected within the first 24 h of admission to train our algorithm. We conducted a survival analysis to compare the clinical outcomes across phenotypes. With the data split into training and validation sets (75/25 ratio), we developed a decision-tree-based model to facilitate the interpretation and assignment of the observed phenotypes. Results Agglomerative hierarchical clustering was the most robust algorithm. We identified three clinical phenotypes: 79 patients (14%) in Cluster 1, 275 patients (50%) in Cluster 2, and 203 (37%) in Cluster 3. Cluster 2 and Cluster 3 were both characterized by a low-risk respiratory and inflammatory profile but differed in terms of demographics. Compared with Cluster 3, Cluster 2 comprised older patients with more comorbidities. Cluster 1 represented the group with the most severe clinical presentation, as inferred by the highest rate of hypoxemia and the highest radiological burden. Intensive care unit (ICU) admission and mechanical ventilation risks were the highest in Cluster 1. 
Using only two to four decision rules, the classification and regression tree (CART) phenotype assignment model achieved an AUC of 84% (95% CI: 81.5-86.5%) on the validation set. Conclusions We conducted a multidimensional phenotypic analysis of adult inpatients with COVID-19 and identified three distinct phenotypes associated with different clinical outcomes. We also demonstrated the clinical usability of this approach, as phenotypes can be accurately assigned using a simple decision tree. Further research is still needed to properly incorporate these phenotypes in the management of patients with COVID-19.
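To make the clustering step in this abstract concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. It is not the authors' pipeline: the average-linkage rule, the Euclidean distance, and the toy two-dimensional points (standing in for FAMD component scores) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise Euclidean distance between two clusters (assumed linkage rule)."""
    total = sum(dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical low-dimensional patient coordinates (not data from the study).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters = agglomerative(points, n_clusters=3)
for label, members in enumerate(clusters, start=1):
    print(f"Cluster {label}: {members}")
```

Each resulting cluster could then serve as the target label for a small decision tree, which is the role the CART model plays in the abstract.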
Affiliation(s)
- Eric Yamga
  - Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Louis Mullie
  - Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Madeleine Durand
  - Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
  - Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
- An Tang
  - Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
  - Department of Radiology and Nuclear Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Emmanuel Montagnon
  - Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
- Carl Chartrand-Lefebvre
  - Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
  - Department of Radiology and Nuclear Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Michaël Chassé
  - Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
  - Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
4
Characterizing fatigue phenotypes with other symptoms and clinically relevant outcomes among people with multiple sclerosis. Qual Life Res 2023; 32:151-160. [PMID: 35982203 DOI: 10.1007/s11136-022-03204-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/14/2022] [Indexed: 01/12/2023]
Abstract
PURPOSE Fatigue is a common symptom of multiple sclerosis (MS) and can adversely affect all aspects of quality of life. The etiology of fatigue remains unclear, and its treatments are suboptimal. Characterizing the phenotypes of fatigued persons with MS may help advance research on fatigue's etiology and identify ways to personalize fatigue interventions to improve quality of life. The purpose of this study was to identify fatigue phenotypes; examine phenotype stability over time; and characterize phenotypes by health and function, social and environmental determinants, psychosocial factors, and engagement in healthy behaviors. METHODS We conducted a longitudinal study over a 3-month period with 289 fatigued participants with MS. To identify fatigue phenotypes and determine transition probabilities, we used latent profile and transition analyses with valid self-report measures of mental and physical fatigue severity, the mental and physical impact of fatigue, depression, anxiety, and sleep quality. We used ANOVAs and effect sizes to characterize differences among phenotypes. RESULTS The best fitting model included six subgroups of participants: Mild Phenotype, Mild-to-Moderate Phenotype, Moderate-to-Severe Phenotype, Severe Phenotype, Fatigue-dominant Phenotype, and Mental Health-dominant Phenotype. The transition analysis indicated that phenotypic membership was highly stable. Variables with a large eta squared effect size included environmental barriers, self-efficacy, and fatigue catastrophizing. CONCLUSION These results indicate that the magnitude of fatigue experienced may be more important to consider than the type of fatigue when characterizing fatigue phenotypes. Future research should explore whether tailoring interventions to environmental barriers, self-efficacy, and fatigue catastrophizing reduces the likelihood of transitioning to a more severe phenotype.
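The eta squared effect size mentioned in this abstract can be illustrated with a short sketch. For a one-way ANOVA it is the between-group sum of squares divided by the total sum of squares. The scores below are invented, and the abstract does not state which threshold the authors treated as "large" (a commonly cited convention puts it at roughly 0.14).

```python
def eta_squared(groups):
    """One-way ANOVA effect size: SS_between / SS_total."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation of every observation around the grand mean.
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    # Variation of each group mean around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total


# Hypothetical self-efficacy scores for three phenotypes (not study data).
scores_by_phenotype = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
effect = eta_squared(scores_by_phenotype)
print(f"eta squared = {effect:.2f}")
```

In this toy example almost all of the variation lies between groups, so the ratio is close to 1; real questionnaire data would usually give much smaller values.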
5
Yang L, Zhang Y, Song Y, Zhang H, Yang R. Canonical transformation for multivariate mixed model association analyses. Theor Appl Genet 2022; 135:2147-2155. [PMID: 35536304 DOI: 10.1007/s00122-022-04103-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Accepted: 04/11/2022] [Indexed: 06/14/2023]
Abstract
As an extension of Single-RunKing for analyzing multiple correlated traits, mvRunKing not only increases the number of phenotypes that can be analyzed, by means of canonical transformation, but also improves statistical power to detect pleiotropic QTNs through joint association analysis. Based on genomic variance-covariance matrices, we simplified the multivariate mixed model association analysis into multiple univariate analyses by using canonical transformation, and then ran the univariate association tests individually in Single-RunKing, which increased the number of phenotypes that could be analyzed. With canonical transformation back to the original scale, the association results are biologically interpretable. In particular, we rapidly estimated genomic variance-covariance matrices with multivariate GEMMA and separately optimized the polygenic variances (or heritabilities) for only the markers that had large effects or higher significance levels in univariate mixed models, greatly improving computing efficiency for multiple univariate association tests. Beyond testing one marker at a time, joint association analysis of quantitative trait nucleotide (QTN) candidates can significantly increase statistical power to detect QTNs. The user-friendly mvRunKing software was developed to efficiently implement multivariate mixed model association analyses.
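The idea of reducing a multivariate mixed model to several univariate ones can be written as a short derivation. This is the textbook simultaneous-diagonalization form of a canonical transformation and is not necessarily the exact formulation used in mvRunKing. The symbols are assumed notation: G is the genomic covariance matrix among traits, R is the residual covariance matrix, and Q is the transformation.

```latex
\begin{aligned}
\mathbf{R} &= \mathbf{L}\mathbf{L}^{\top}
  && \text{(Cholesky factor of the residual covariance)} \\
\mathbf{L}^{-1}\mathbf{G}\,\mathbf{L}^{-\top} &= \mathbf{U}\mathbf{D}\mathbf{U}^{\top}
  && \text{(eigendecomposition, } \mathbf{U}^{\top}\mathbf{U} = \mathbf{I},\ \mathbf{D} \text{ diagonal)} \\
\mathbf{Q} &= \mathbf{U}^{\top}\mathbf{L}^{-1}
  && \text{(canonical transformation)} \\
\mathbf{Q}\mathbf{R}\mathbf{Q}^{\top} &= \mathbf{U}^{\top}\mathbf{L}^{-1}\mathbf{L}\mathbf{L}^{\top}\mathbf{L}^{-\top}\mathbf{U} = \mathbf{I} \\
\mathbf{Q}\mathbf{G}\mathbf{Q}^{\top} &= \mathbf{U}^{\top}\mathbf{U}\mathbf{D}\mathbf{U}^{\top}\mathbf{U} = \mathbf{D} \\
\mathbf{y}^{*} &= \mathbf{Q}\mathbf{y}, \qquad \mathbf{y} = \mathbf{Q}^{-1}\mathbf{y}^{*} = \mathbf{L}\mathbf{U}\,\mathbf{y}^{*}
\end{aligned}
```

Because both covariance matrices are diagonal on the transformed scale, each canonical trait in y* can be analyzed with its own univariate mixed model, and estimates can be mapped back to the original traits through the inverse transformation.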
Affiliation(s)
- Li'ang Yang
  - College of Life Science and College of Animal Scientific and Technology, Northeast Agricultural University, Harbin, 150030, China
- Ying Zhang
  - College of Animal Science and Veterinary Medicine, Heilongjiang Bayi Agricultural University, Daqing, 163319, China
- Yuxin Song
  - Research Centre for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China
- Hengyu Zhang
  - Department of Information and Computing Science, Heilongjiang Bayi Agricultural University, Daqing, 163319, China
- Runqing Yang
  - Research Centre for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China
6
Genetically modified mice for research on human diseases: A triumph for Biotechnology or a work in progress? The EuroBiotech Journal 2022. [DOI: 10.2478/ebtj-2022-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2022] Open
Abstract
Genetically modified mice are engineered as models for human diseases. These mouse models include inbred strains, mutants, gene knockouts, gene knockins, and ‘humanized’ mice. Each mouse model is engineered to mimic a specific disease based on a theory of the genetic basis of that disease. For example, to test the amyloid theory of Alzheimer’s disease, mice with amyloid precursor protein genes are engineered, and to test the tau theory, mice with tau genes are engineered. This paper discusses the importance of mouse models in basic research, drug discovery, and translational research, and examines the question of how to define the “best” mouse model of a disease. The critiques of animal models and the caveats in translating the results from animal models to the treatment of human disease are discussed. Since many diseases are heritable, multigenic, age-related and experience-dependent, resulting from multiple gene-gene and gene-environment interactions, it will be essential to develop mouse models that reflect these genetic, epigenetic and environmental factors from a developmental perspective. Such models would provide further insight into disease emergence, progression and the ability to model two-hit and multi-hit theories of disease. The summary examines the biotechnology for creating genetically modified mice which reflect these factors and how they might be used to discover new treatments for complex human diseases such as cancers, neurodevelopmental and neurodegenerative diseases.
7
Hamad MA, Krauel K, Schanze N, Gauchel N, Stachon P, Nuehrenberg T, Zurek M, Duerschmied D. Platelet Subtypes in Inflammatory Settings. Front Cardiovasc Med 2022; 9:823549. [PMID: 35463762 PMCID: PMC9021412 DOI: 10.3389/fcvm.2022.823549] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/27/2021] [Accepted: 03/09/2022] [Indexed: 12/24/2022] Open
Abstract
In addition to their essential role in hemostasis and thrombosis, platelets also modulate inflammatory reactions and immune responses. This is achieved by specialized surface receptors as well as secretory products including inflammatory mediators and cytokines. Platelets can support and facilitate the recruitment of leukocytes into inflamed tissue. The various properties of platelet function make it less surprising that circulating platelets are different within one individual. Platelets have different physical properties leading to distinct subtypes of platelets based either on their function (procoagulant, aggregatory, secretory) or their age (reticulated/immature, non-reticulated/mature). To understand the significance of platelet phenotypic variation, qualitatively distinguishable platelet phenotypes should be studied in a variety of physiological and pathological circumstances. The advancement in proteomics instrumentation and tools (such as mass spectrometry-driven approaches) improved the ability to perform studies beyond that of foundational work. Despite the wealth of knowledge around molecular processes in platelets, knowledge gaps in understanding platelet phenotypes in health and disease exist. In this review, we report an overview of the role of platelet subpopulations in inflammation and a selection of tools for investigating the role of platelet subpopulations in inflammation.
Affiliation(s)
- Muataz Ali Hamad
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg im Breisgau, Germany
  - Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany
- Krystin Krauel
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
  - Department of Cardiology, Angiology, Haemostaseology, and Medical Intensive Care, University Medical Centre Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Nancy Schanze
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
  - Department of Cardiology, Angiology, Haemostaseology, and Medical Intensive Care, University Medical Centre Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Nadine Gauchel
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
- Peter Stachon
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
- Thomas Nuehrenberg
  - Department of Cardiology and Angiology II, Heart Center, Faculty of Medicine, University of Freiburg, Bad Krozingen, Germany
- Mark Zurek
  - Department of Cardiology and Angiology II, Heart Center, Faculty of Medicine, University of Freiburg, Bad Krozingen, Germany
- Daniel Duerschmied
  - Department of Cardiology and Angiology I, Heart Center, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
  - Department of Cardiology, Angiology, Haemostaseology, and Medical Intensive Care, University Medical Centre Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  - European Center for AngioScience (ECAS) and German Center for Cardiovascular Research (DZHK) Partner Site Heidelberg/Mannheim, Mannheim, Germany
8
Zaghary WA, Elansary MM, Shouman DN, Abdelrahim AA, Abu-Zied KM, Sakr TM. Can nanotechnology overcome challenges facing stem cell therapy? A review. J Drug Deliv Sci Technol 2021. [DOI: 10.1016/j.jddst.2021.102883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
9
Chen L, Li Z, Zeng T, Zhang YH, Li H, Huang T, Cai YD. Predicting gene phenotype by multi-label multi-class model based on essential functional features. Mol Genet Genomics 2021; 296:905-918. [PMID: 33914130 DOI: 10.1007/s00438-021-01789-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/26/2021] [Accepted: 04/13/2021] [Indexed: 12/19/2022]
Abstract
Phenotype is one of the most significant concepts in genetics and is used to describe all the observable characteristics of a research object. Because phenotype reflects the integrated features of genotype and environmental factors, it is hard to define phenotype characteristics and even harder to predict unknown phenotypes. Restricted by current biological techniques, it is still quite expensive and time-consuming to obtain sufficient structural information on large-scale phenotype-associated genes/proteins. Various bioinformatics methods have been presented to solve this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complicated inner structures for phenotype prediction. To further address this issue and improve the efficacy of phenotype prediction on more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene-gene network. All these features are analyzed by feature selection methods (Boruta, minimum redundancy maximum relevance) to generate a feature list. This list is fed into incremental feature selection, incorporating multi-label classifiers built by RAkEL and some classic base classifiers, to build an optimum multi-label multi-class classification model for phenotype prediction. According to recent research, our method has indeed identified many literature-supported genes/proteins and their associated phenotypes, and even some candidate genes with newly re-assigned phenotypes, providing a new computational tool for accurate and effective phenotypic prediction.
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- Zhandong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
- Tao Zeng
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Hao Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
- Tao Huang
  - Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
10
Aslan JE. Platelet Proteomes, Pathways, and Phenotypes as Informants of Vascular Wellness and Disease. Arterioscler Thromb Vasc Biol 2021; 41:999-1011. [PMID: 33441027 PMCID: PMC7980774 DOI: 10.1161/atvbaha.120.314647] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 02/06/2023]
Abstract
Platelets rapidly undergo responsive transitions in form and function to repair vascular endothelium and mediate hemostasis. In contrast, heterogeneous platelet subpopulations with a range of primed or refractory phenotypes gradually arise in chronic inflammatory and other conditions in a manner that may indicate or support disease. Qualitatively distinguishable platelet phenotypes are increasingly associated with a variety of physiological and pathological circumstances; however, the origins and significance of platelet phenotypic variation remain unclear and conceptually vague. As changes in platelet function in disease exhibit many similarities to platelets following the activation of platelet agonist receptors, the intracellular responses of platelets common to hemostasis and inflammation may provide insights to the molecular basis of platelet phenotype. Here, we review concepts around how protein-level relations-from platelet receptors through intracellular signaling events-may help to define platelet phenotypes in inflammation, immune responses, aging, and other conditions. We further discuss how representing systems-wide platelet proteomics data profiles as circuit-like networks of causally related intracellular events, or, pathway maps, may inform molecular definitions of platelet phenotype. In addition to offering insights into platelets as druggable targets, maps of causally arranged intracellular relations underlying platelet function can also advance precision and interceptive medicine efforts by leveraging platelets as accessible, dynamic, endogenous, circulating biomarkers of vascular wellness and disease. Graphic Abstract: A graphic abstract is available for this article.
Affiliation(s)
- Joseph E. Aslan
  - Knight Cardiovascular Institute, School of Medicine, Oregon Health & Science University, Portland, Oregon, USA
  - Department of Chemical Physiology and Biochemistry and School of Medicine, Oregon Health & Science University, Portland, Oregon, USA
  - Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, Oregon, USA
11
Phenotypes for Resistant Bacteria Infections Using an Efficient Subgroup Discovery Algorithm. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-77211-6_27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
12
Phenotyping Women Based on Dietary Macronutrients, Physical Activity, and Body Weight Using Machine Learning Tools. Nutrients 2019; 11:nu11071681. [PMID: 31336626 PMCID: PMC6682952 DOI: 10.3390/nu11071681] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/03/2019] [Revised: 06/11/2019] [Accepted: 07/02/2019] [Indexed: 12/14/2022] Open
Abstract
Nutritional phenotyping can help achieve personalized nutrition, and machine learning tools may offer novel means to achieve phenotyping. The primary aim of this study was to use energy balance components, namely input (dietary energy intake and macronutrient composition) and output (physical activity) to predict energy stores (body weight) as a way to evaluate their ability to identify potential phenotypes based on these parameters. From the Women’s Health Initiative Observational Study (WHI OS), carbohydrates, proteins, fats, fibers, sugars, and physical activity variables, namely energy expended from mild, moderate, and vigorous intensity activity, were used to predict current body weight (both as body weight in kilograms and as a body mass index (BMI) category). Several machine learning tools were used for this prediction. Finally, cluster analysis was used to identify putative phenotypes. For the numerical predictions, the support vector machine (SVM), neural network, and k-nearest neighbor (kNN) algorithms performed modestly, with mean approximate errors (MAEs) of 6.70 kg, 6.98 kg, and 6.90 kg, respectively. For categorical prediction, SVM performed the best (54.5% accuracy), followed closely by the bagged tree ensemble and kNN algorithms. K-means cluster analysis improved prediction using numerical data and identified 10 clusters suggestive of phenotypes, with a minimum MAE of ~1.1 kg. A classifier was used to phenotype subjects into the identified clusters, with MAEs <5 kg for 15% of the test set (n = ~2000). This study highlights the challenges, limitations, and successes in using machine learning tools on self-reported data to identify determinants of energy balance.
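As a rough illustration of how a per-kilogram error figure like the MAEs above could be obtained, the sketch below pairs a tiny k-nearest-neighbor regressor with an error function. It assumes MAE is computed as the mean of absolute differences between predicted and observed weight (the usual mean absolute error). The feature values, weights, and k = 3 are invented for the example and are not taken from the WHI OS data.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    # Rank training rows by squared Euclidean distance to the query.
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the units of the target (kg here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical, already-scaled features (e.g. energy intake, activity) and weights in kg.
train_x = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (6.0, 5.0), (7.0, 5.0)]
train_y = [60.0, 62.0, 64.0, 80.0, 84.0]
test_x = [(2.0, 2.0), (6.0, 6.0)]
test_y = [63.0, 85.0]

predictions = [knn_predict(train_x, train_y, query) for query in test_x]
mae = mean_absolute_error(test_y, predictions)
print(f"predictions = {predictions}, MAE = {mae} kg")
```

In practice the features would need to be put on comparable scales before computing distances, since unscaled variables with large ranges would otherwise dominate the neighbor search.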
13
Harrington KM, Quaden R, Stein MB, Honerlaw JP, Cissell S, Pietrzak RH, Zhao H, Radhakrishnan K, Aslan M, Gaziano JM, Concato J, Gagnon DR, Gelernter J, Cho K. Validation of an Electronic Medical Record-Based Algorithm for Identifying Posttraumatic Stress Disorder in U.S. Veterans. J Trauma Stress 2019; 32:226-237. [PMID: 31009556 PMCID: PMC6699164 DOI: 10.1002/jts.22399] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/08/2018] [Revised: 11/21/2018] [Accepted: 11/27/2018] [Indexed: 12/28/2022]
Abstract
We developed an algorithm for identifying U.S. veterans with a history of posttraumatic stress disorder (PTSD), using the Department of Veterans Affairs (VA) electronic medical record (EMR) system. This work was motivated by the need to create a valid EMR-based phenotype to identify thousands of cases and controls for a genome-wide association study of PTSD in veterans. We used manual chart review (n = 500) as the gold standard. For both the algorithm and chart review, three classifications were possible: likely PTSD, possible PTSD, and likely not PTSD. We used Lasso regression with cross-validation to select statistically significant predictors of PTSD from the EMR and then generate a predicted probability score of being a PTSD case for every participant in the study population (range: 0-1.00). Comparing the performance of our probabilistic approach (Lasso algorithm) to a rule-based approach (International Classification of Diseases [ICD] algorithm), the Lasso algorithm showed modestly higher overall percent agreement with chart review than the ICD algorithm (80% vs. 75%), higher sensitivity (0.95 vs. 0.84), and higher accuracy (AUC = 0.95 vs. 0.90). We applied a 0.7 probability cut-point to the Lasso results to determine final PTSD case-control status for the VA population. The final algorithm had a 0.99 sensitivity, 0.99 specificity, 0.95 positive predictive value, and 1.00 negative predictive value for PTSD classification (grouping possible PTSD and likely not PTSD) as determined by chart review. This algorithm may be useful for other research and quality improvement endeavors within the VA.
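The validation figures in this abstract (sensitivity, specificity, positive and negative predictive value) all derive from a two-by-two comparison of algorithm labels against chart review. The sketch below shows that arithmetic for a probability cut-point of 0.7. The scores and reference labels are made up, and treating a score equal to the cut-point as a case is an assumption, since the abstract does not say whether the boundary is inclusive.

```python
def classify(probabilities, cut_point=0.7):
    """Label a participant as a case when the predicted probability meets the cut-point."""
    return [p >= cut_point for p in probabilities]  # inclusive boundary is an assumption


def confusion_counts(predicted, reference):
    """Count true/false positives and negatives against the reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures derived from the two-by-two table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),          # share of predicted cases that are true cases
        "npv": tn / (tn + fn),          # share of predicted non-cases that are true non-cases
    }


# Hypothetical predicted probabilities and chart-review labels (True = PTSD case).
scores = [0.95, 0.82, 0.71, 0.40, 0.65, 0.10, 0.05, 0.90]
chart_review = [True, True, False, False, True, False, False, True]

predicted = classify(scores)
counts = confusion_counts(predicted, chart_review)
metrics = diagnostic_metrics(*counts)
print(counts, metrics)
```

Raising the cut-point generally trades sensitivity for specificity, which is why a threshold like 0.7 has to be chosen with the intended use of the case definition in mind.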
Affiliation(s)
- Kelly M. Harrington
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Psychiatry, Boston University School of Medicine, Boston, Massachusetts, USA
- Rachel Quaden
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
- Murray B. Stein
  - Psychiatry Service, VA San Diego Healthcare System, San Diego, California, USA
  - Departments of Psychiatry and Family Medicine & Public Health, University of California San Diego, La Jolla, California, USA
- Jacqueline P. Honerlaw
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
- Shadha Cissell
  - Psychiatry Service, VA San Diego Healthcare System, San Diego, California, USA
- Robert H. Pietrzak
  - Psychiatry Service, VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
- Hongyu Zhao
  - VA Clinical Epidemiology Research Center (CERC), VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
- Krishnan Radhakrishnan
  - VA Clinical Epidemiology Research Center (CERC), VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, Kentucky, USA
- Mihaela Aslan
  - VA Clinical Epidemiology Research Center (CERC), VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- John Michael Gaziano
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- John Concato
  - VA Clinical Epidemiology Research Center (CERC), VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- David R. Gagnon
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Joel Gelernter
  - Psychiatry Service, VA Connecticut Healthcare System, West Haven, Connecticut, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
  - Departments of Genetics and Neuroscience, Yale University School of Medicine, New Haven, Connecticut, USA
- Kelly Cho
  - Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
  - Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
|
14
|
Martinez-Carrasco AL, Juarez JM, Campos M, Morales A, Palacios F, Lopez-Rodriguez L. Interpretable Patient Subgrouping Using Trace-Based Clustering. Artif Intell Med 2019. [DOI: 10.1007/978-3-030-21642-9_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
15
|
Sylvestre E, Bouzillé G, Chazard E, His-Mahier C, Riou C, Cuggia M. Combining information from a clinical data warehouse and a pharmaceutical database to generate a framework to detect comorbidities in electronic health records. BMC Med Inform Decis Mak 2018; 18:9. [PMID: 29368609 PMCID: PMC5784648 DOI: 10.1186/s12911-018-0586-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/13/2017] [Accepted: 01/05/2018] [Indexed: 11/25/2022] Open
Abstract
Background Medical coding is used for a variety of activities, from observational studies to hospital billing. However, comorbidities tend to be under-reported by medical coders. The aim of this study was to develop an algorithm to detect comorbidities in electronic health records (EHR) by using a clinical data warehouse (CDW) and a knowledge database. Methods We enriched the Theriaque pharmaceutical database with the French national Comorbidities List to identify drugs associated with at least one major comorbid condition and diagnoses associated with a drug indication. Then, we compared the drug indications in the Theriaque database with the ICD-10 billing codes in EHR to detect potentially missing comorbidities based on drug prescriptions. Finally, we improved comorbidity detection by matching drug prescriptions and laboratory test results. We tested the obtained algorithm by using two retrospective datasets extracted from the Rennes University Hospital (RUH) CDW. The first dataset included all adult patients hospitalized in the ear, nose, throat (ENT) surgical ward between October and December 2014 (ENT dataset). The second included all adult patients hospitalized at RUH between January and February 2015 (general dataset). We reviewed medical records to find written evidence of the suggested comorbidities in current or past stays. Results Among the 22,132 Common Units of Dispensation (CUD) codes present in the Theriaque database, 19,970 drugs (90.2%) were associated with one or several ICD-10 diagnoses, based on their indication, and 11,162 (50.4%) with at least one of the 4878 comorbidities from the comorbidity list. Among the 122 patients of the ENT dataset, 75.4% had at least one drug prescription without corresponding ICD-10 code. The comorbidity diagnoses suggested by the algorithm were confirmed in 44.6% of the cases. Among the 4312 patients of the general dataset, 68.4% had at least one drug prescription without corresponding ICD-10 code. 
The comorbidity diagnoses suggested by the algorithm were confirmed in 20.3% of reviewed cases. Conclusions This simple algorithm based on combining accessible and immediately reusable data from knowledge databases, drug prescriptions and laboratory test results can detect comorbidities.
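The core comparison in the Methods (drug indications versus billed ICD-10 codes) amounts to a set check per prescription. The sketch below is illustrative only: the drug-to-indication map is a hypothetical stand-in, not content from the Theriaque database, and matching on three-character ICD-10 categories is a simplifying assumption.

```python
# Hypothetical drug -> indication map (ICD-10 categories); not taken from Theriaque.
DRUG_INDICATIONS = {
    "metformin": {"E11"},
    "levothyroxine": {"E03"},
    "salbutamol": {"J45", "J44"},
}


def suggest_missing_comorbidities(prescriptions, billed_codes):
    """Return, per drug, its indications when none of them appears among the billed codes."""
    suggestions = {}
    for drug in prescriptions:
        indications = DRUG_INDICATIONS.get(drug, set())
        # A drug with known indications but no matching billed code hints at a missing diagnosis.
        if indications and not (indications & billed_codes):
            suggestions[drug] = sorted(indications)
    return suggestions


# Made-up hospital stay: asthma (J45) is billed, diabetes (E11) is not.
stay = suggest_missing_comorbidities(
    prescriptions=["metformin", "salbutamol", "paracetamol"],
    billed_codes={"J45", "H66"},
)
```

Each suggestion is only a candidate. In the study, suggested comorbidities were confirmed by reviewing the medical record.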
Affiliation(s)
- Emmanuelle Sylvestre
  - INSERM, U1099, F-35000, Rennes, France; Université de Rennes 1, LTSI, F-35000, Rennes, France; CHU Rennes, CIC Inserm 1414, F-35000, Rennes, France; CHU Rennes, Centre de Données Cliniques, F-35000, Rennes, France
- Guillaume Bouzillé
  - INSERM, U1099, F-35000, Rennes, France; Université de Rennes 1, LTSI, F-35000, Rennes, France; CHU Rennes, CIC Inserm 1414, F-35000, Rennes, France; CHU Rennes, Centre de Données Cliniques, F-35000, Rennes, France
- Emmanuel Chazard
  - Département de Santé Publique, Université de Lille EA 2694, CHU Lille, F-59000, Lille, France
- Cécil His-Mahier
  - INSERM, U1099, F-35000, Rennes, France; Université de Rennes 1, LTSI, F-35000, Rennes, France; CHU Rennes, CIC Inserm 1414, F-35000, Rennes, France; CHU Rennes, Centre de Données Cliniques, F-35000, Rennes, France
- Christine Riou
  - INSERM, U1099, F-35000, Rennes, France; Université de Rennes 1, LTSI, F-35000, Rennes, France; CHU Rennes, CIC Inserm 1414, F-35000, Rennes, France; CHU Rennes, Centre de Données Cliniques, F-35000, Rennes, France
- Marc Cuggia
  - INSERM, U1099, F-35000, Rennes, France; Université de Rennes 1, LTSI, F-35000, Rennes, France; CHU Rennes, CIC Inserm 1414, F-35000, Rennes, France; CHU Rennes, Centre de Données Cliniques, F-35000, Rennes, France
|
16
|
He KY, Ge D, He MM. Big Data Analytics for Genomic Medicine. Int J Mol Sci 2017; 18:ijms18020412. [PMID: 28212287 PMCID: PMC5343946 DOI: 10.3390/ijms18020412] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 10/24/2016] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/25/2022] Open
Abstract
Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients’ genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights through examining large-scale various data sets. While integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure exhibit challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from the EHRs for genomic medicine. We introduce possible solutions for different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Affiliation(s)
- Karen Y He
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA.
- Max M He
  - BioSciKin Co., Ltd., Nanjing 210042, China.
  - Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI 53706, USA.
|
17
|
Is Chronic Obstructive Pulmonary Disease Caused by Wood Smoke a Different Phenotype or a Different Entity? ACTA ACUST UNITED AC 2016. [DOI: 10.1016/j.arbr.2016.06.026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
|
18
|
Torres-Duque CA, García-Rodriguez MC, González-García M. Is Chronic Obstructive Pulmonary Disease Caused by Wood Smoke a Different Phenotype or a Different Entity? Arch Bronconeumol 2016; 52:425-31. [PMID: 27207325 DOI: 10.1016/j.arbres.2016.04.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/09/2015] [Revised: 04/03/2016] [Accepted: 04/04/2016] [Indexed: 01/29/2023]
Abstract
Around 40% of the world's population continue using solid fuel, including wood, for cooking or heating their homes. Chronic exposure to wood smoke is a risk factor for developing chronic obstructive pulmonary disease (COPD). In some regions of the world, this can be a more important cause of COPD than exposure to tobacco smoke from cigarettes. Significant differences between COPD associated with wood smoke (W-COPD) and that caused by smoking (S-COPD) have led some authors to suggest that W-COPD should be considered a new COPD phenotype. We present a review of the differences between W-COPD and S-COPD. On the premise that wood smoke and tobacco smoke are not the same and the physiopathological mechanisms they induce may differ, we have analyzed whether W-COPD can be considered as another COPD phenotype or a distinct nosological entity.
|
19
|
Lobo DSS, Aleksandrova L, Knight J, Casey DM, el-Guebaly N, Nobrega JN, Kennedy JL. Addiction-related genes in gambling disorders: new insights from parallel human and pre-clinical models. Mol Psychiatry 2015; 20:1002-10. [PMID: 25266122 DOI: 10.1038/mp.2014.113] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/16/2013] [Revised: 07/30/2014] [Accepted: 08/04/2014] [Indexed: 11/09/2022]
Abstract
Neurobiological research supports the characterization of disordered gambling (DG) as a behavioral addiction. Recently, an animal model of gambling behavior was developed (rat gambling task, rGT), expanding the available tools to investigate DG neurobiology. We investigated whether rGT performance and associated risk gene expression in the rat's brain could provide cross-translational understanding of the neuromolecular mechanisms of addiction in DG. We genotyped tagSNPs (single-nucleotide polymorphisms) in 38 addiction-related genes in 400 DG and 345 non-DG subjects. Genes with P<0.1 in the human association analyses were selected to be investigated in the animal arm to determine whether their mRNA expression in rats was associated with the rat's performance on the rGT. In humans, DG was significantly associated with tagSNPs in DRD3 (rs167771) and CAMK2D (rs3815072). Our results suggest that age and gender might moderate the association between CAMK2D and DG. Moderation effects could not be investigated due to sample power. In the animal arm, only the association between rGT performance and Drd3 expression remained significant after Bonferroni correction for 59 brain regions. As male rats were used, gender effects could not be investigated. Our results corroborate previous findings reporting the involvement of DRD3 receptor in addictions. To our knowledge, the use of human genetics, pre-clinical models and gene expression as a cross-translation paradigm has not previously been attempted in the field of addictions. The cross-validation of human findings in animal models is crucial for improving the translation of basic research into clinical treatments, which could accelerate neurobiological and pharmacological investigations in addictions.
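The Bonferroni correction for 59 brain regions mentioned above divides the significance level by the number of tests. A minimal sketch follows, assuming a family-wise level of 0.05 (the abstract does not state the level used) and entirely made-up p-values.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# 59 made-up p-values, one per brain region; only the first is very small.
p_values = [0.0004] + [0.02] * 58
threshold, flags = bonferroni_significant(p_values)
```

With 59 tests and the assumed level, the per-test threshold is 0.05 / 59, so a nominal p-value of 0.02 no longer counts as significant.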
Affiliation(s)
- D S S Lobo
  - [1] Department of Psychiatry, University of Toronto, Centre for Addiction and Mental Health, Toronto, ON, Canada; [2] Centre for Addiction and Mental Health, Toronto, ON, Canada
- L Aleksandrova
  - [1] Centre for Addiction and Mental Health, Toronto, ON, Canada; [2] Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada
- J Knight
  - [1] Department of Psychiatry, University of Toronto, Centre for Addiction and Mental Health, Toronto, ON, Canada; [2] Centre for Addiction and Mental Health, Toronto, ON, Canada
- D M Casey
  - Mental Health Commission of Canada, Calgary, AB, Canada
- N el-Guebaly
  - Division of Addiction, Department of Psychiatry, University of Calgary, Calgary, AB, Canada
- J N Nobrega
  - [1] Centre for Addiction and Mental Health, Toronto, ON, Canada; [2] Departments of Pharmacology and Toxicology, Psychiatry, and Psychology, University of Toronto, Toronto, ON, Canada
- J L Kennedy
  - [1] Department of Psychiatry, University of Toronto, Centre for Addiction and Mental Health, Toronto, ON, Canada; [2] Centre for Addiction and Mental Health, Toronto, ON, Canada
|
20
|
A genome-wide association study identifies variants in KCNIP4 associated with ACE inhibitor-induced cough. THE PHARMACOGENOMICS JOURNAL 2015; 16:231-7. [PMID: 26169577 PMCID: PMC4713364 DOI: 10.1038/tpj.2015.51] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 12/30/2014] [Revised: 04/13/2015] [Accepted: 06/03/2015] [Indexed: 12/30/2022]
Abstract
The most common side effect of angiotensin-converting enzyme inhibitor (ACEi) drugs is cough. We conducted a genome-wide association study (GWAS) of ACEi-induced cough among 7080 subjects of diverse ancestries in the Electronic Medical Records and Genomics (eMERGE) network. Cases were subjects diagnosed with ACEi-induced cough. Controls were subjects with at least 6 months of ACEi use and no cough. A GWAS (1595 cases and 5485 controls) identified associations on chromosome 4 in an intron of KCNIP4. The strongest association was at rs145489027 (minor allele frequency=0.33, odds ratio (OR)=1.3 (95% confidence interval (CI): 1.2–1.4), P=1.0 × 10⁻⁸). Replication for six single-nucleotide polymorphisms (SNPs) in KCNIP4 was tested in a second eMERGE population (n=926) and in the Genetics of Diabetes Audit and Research in Tayside, Scotland (GoDARTS) cohort (n=4309). Replication was observed at rs7675300 (OR=1.32 (1.01–1.70), P=0.04) in eMERGE and at rs16870989 and rs1495509 (OR=1.15 (1.01–1.30), P=0.03 for both) in GoDARTS. The combined association at rs1495509 was significant (OR=1.23 (1.15–1.32), P=1.9 × 10⁻⁹). These results indicate that SNPs in KCNIP4 may modulate ACEi-induced cough risk.
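For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes both from a 2x2 table using the standard Wald interval on the log scale. It is a simplification: a GWAS would typically estimate per-allele effects by regression with covariates, and the counts here are invented so that only the case and control totals match the abstract.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: cases with / without the exposure (e.g. carrying the minor allele)
    c, d: controls with / without the exposure
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Made-up counts, chosen only so the totals are 1595 cases and 5485 controls.
odds_ratio, lower, upper = odds_ratio_ci(a=900, b=695, c=2700, d=2785)
```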
|
21
|
Germline and somatic genetic predictors of pathological response in neoadjuvant settings of rectal and esophageal cancers: systematic review and meta-analysis. THE PHARMACOGENOMICS JOURNAL 2015; 16:249-65. [PMID: 26122021 DOI: 10.1038/tpj.2015.46] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/23/2015] [Revised: 05/10/2015] [Accepted: 05/21/2015] [Indexed: 12/21/2022]
Abstract
Oncologists have pointed out an urgent need for biomarkers that can be useful for clinical application to predict the susceptibility of patients to preoperative therapy. This review collects, evaluates and combines data on the influence of reported somatic and germline genetic variations on histological tumor regression in neoadjuvant settings of rectal and esophageal cancers. Five hundred and twenty-seven articles were identified, 204 retrieved and 61 studies included. Among 24 and 14 genetic markers reported for rectal and esophageal cancers, respectively, significant associations in meta-analyses were demonstrated for the following markers. In rectal cancer, major response was more frequent in carriers of the TYMS genotype 2R/2R-2R/3R (rs34743033), MTHFR genotype 677C/C (rs1801133), wild-type TP53 and KRAS genes. In esophageal cancer, successful therapy appeared to correlate with wild-type TP53. These results may be useful for future research directions to translate reported data into practical clinical use.
|
22
|
Chen Y, Zhuo X, Lin Y, Huang W, Xiao J, Zeng J, Jiang L, Chen C, Lin H, Dettke M. Association of ABO blood group with P-selectin levels in Chinese Han healthy volunteers. Transfusion 2015; 55:2759-65. [PMID: 26095340 DOI: 10.1111/trf.13212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/27/2015] [Revised: 05/04/2015] [Accepted: 05/25/2015] [Indexed: 12/26/2022]
Abstract
BACKGROUND Recent genome-wide association studies in Caucasians suggested that an association exists between the ABO gene locus and soluble levels of P-selectin (sP-selectin). However, it is unclear if the relationship corresponds to the phenotypic expression of ABO groups or is present in different ethnic groups. The aim of this study was to verify this observation at both genotypic and phenotypic levels in a healthy Chinese population. STUDY DESIGN AND METHODS The ABO blood groups were determined by both phenotypes and genotypes in 440 healthy Chinese Han volunteers, while P-selectin levels were evaluated for sP-selectin and total platelet P-selectin (pP-selectin). RESULTS ABO phenotyping and quantitative analysis of individual sP-selectin plasma levels were combined to demonstrate that individuals phenotypically expressing the A antigen have approximately 20% lower sP-selectin plasma levels than those carrying the B or O phenotype (p < 0.0001), but that no difference exists between A and AB and between B and O phenotypes. Genotyping data revealed that the presence of the A gene could be attributed to the observed difference in phenotype comparison, with no difference between A/A, A/B, and A/O genotypes. There were also no associations between ABO blood groups, either phenotypes or genotypes, and pP-selectin levels. CONCLUSION This study demonstrated an association between sP-selectin levels and ABO groups in a Chinese Han population, implicating its generalizability to other ethnic groups. This finding will improve the understanding of the mechanism of ABO blood group-associated diseases.
Affiliation(s)
- Ying Chen
  - Fujian Provincial Cancer Hospital, the Affiliated Hospital of Fujian Medical University; Fujian Provincial Blood Center, Fuzhou, China
- Xiaofu Zhuo
  - Fujian Provincial Blood Center, Fuzhou, China
- Jingrong Xiao
  - Fujian Provincial Cancer Hospital, the Affiliated Hospital of Fujian Medical University
- Jia Zeng
  - Fujian Provincial Blood Center, Fuzhou, China
- Li Jiang
  - Fujian Provincial Blood Center, Fuzhou, China
- Cen Chen
  - Fujian Provincial Blood Center, Fuzhou, China
- Haijuan Lin
  - Fujian Provincial Blood Center, Fuzhou, China
- Markus Dettke
  - Department of Blood Group Serology and Transfusion Medicine, Medical University of Vienna, Vienna, Austria
|
23
|
Peissig PL, Santos Costa V, Caldwell MD, Rottscheit C, Berg RL, Mendonca EA, Page D. Relational machine learning for electronic health record-driven phenotyping. J Biomed Inform 2014; 52:260-70. [PMID: 25048351 PMCID: PMC4261015 DOI: 10.1016/j.jbi.2014.07.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 02/27/2014] [Revised: 05/21/2014] [Accepted: 07/08/2014] [Indexed: 01/19/2023]
Abstract
OBJECTIVE Electronic health records (EHR) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Development of EHR computer-based phenotyping algorithms requires time and medical insight from clinical experts, who most often can only review a small patient subset representative of the total EHR records, to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can contribute to addressing these issues as a viable approach for EHR-based phenotyping. METHODS Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-Measure, and Area Under the Receiver Operating Characteristic (AUROC) curve statistics were measured for each phenotypic model based on independent manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance. RESULTS We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models for each ML approach were evaluated, resulting in better overall model performance in AUROC using ILP when compared to PART (p=0.039), J48 (p=0.003) and JRIP (p=0.003).
DISCUSSION ILP has the potential to improve phenotyping by independently delivering clinically expert interpretable rules for phenotype definitions, or intuitive phenotypes to assist experts. CONCLUSION Relational learning using ILP offers a viable approach to EHR-driven phenotyping.
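The two-sided sign test named in the Methods compares two approaches by counting on how many phenotypes each one does better. A minimal sketch of the exact test follows; the 8-to-1 split is a hypothetical input chosen for illustration, since the abstract does not report the per-phenotype wins and losses.

```python
from math import comb


def sign_test_two_sided(wins, losses):
    """Two-sided exact sign test p-value; ties are assumed to be dropped beforehand."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, one tail.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# One approach doing better on 8 of 9 phenotypes (hypothetical split).
p = sign_test_two_sided(wins=8, losses=1)
```

With nine phenotypes, an 8-to-1 split gives 2 × 10/512 ≈ 0.039. That happens to equal the value reported for the comparison with PART, but whether this was the actual split is not stated in the abstract.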
Affiliation(s)
- Peggy L Peissig
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA.
- Vitor Santos Costa
  - DCC-FCUP and CRACS INESC-TEC, Department de Ciência de Computadores, Universidade do Porto, Portugal
- Carla Rottscheit
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Richard L Berg
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Eneida A Mendonca
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA; Department of Pediatrics, University of Wisconsin-Madison, USA
- David Page
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA; Department of Computer Sciences, University of Wisconsin-Madison, USA
|
24
|
Wells QS, Farber-Eger E, Crawford DC. Extraction of echocardiographic data from the electronic medical record is a rapid and efficient method for study of cardiac structure and function. J Clin Bioinforma 2014; 4:12. [PMID: 25276338 PMCID: PMC4177384 DOI: 10.1186/2043-9113-4-12] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/23/2014] [Accepted: 09/11/2014] [Indexed: 11/28/2022] Open
Abstract
Background Measures of cardiac structure and function are important human phenotypes that are associated with a range of clinical outcomes. Studying these traits in large populations can be time consuming and costly. Utilizing data from large electronic medical records (EMRs) is one possible solution to this problem. We describe the extraction and filtering of quantitative transthoracic echocardiographic data from the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, a large, racially diverse, EMR-based cohort (n = 15,863). Results There were 6,076 echocardiography reports for 2,834 unique adult subjects. Missing data were uncommon with over 90% of data points present. Data irregularities are primarily related to inconsistent use of measurement units and transcriptional errors. The reported filtering method requires manual review of very few data points (<1%), and filtered echocardiographic parameters are similar to published data from epidemiologic populations of similar ethnicity. Moreover, the cohort is comparable in size, and in some cases larger than community-based cohorts of similar race/ethnicity. Conclusions These results demonstrate that echocardiographic data can be efficiently extracted from EMRs, and suggest that EMR-based cohorts have the potential to make major contributions toward the study of epidemiologic and genotype-phenotype associations for cardiac structure and function in diverse populations.
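The abstract attributes most data irregularities to inconsistent measurement units and transcription errors. One common way to handle this kind of problem is a plausibility filter that converts values recorded in the wrong unit and sets aside the rest for manual review. The sketch below illustrates that idea only; it is not the authors' filtering method, and the plausible range is a made-up placeholder rather than a clinical reference range.

```python
def normalize_to_cm(value, plausible_cm=(2.0, 8.0)):
    """Return a measurement in cm, converting values that look like mm.

    Returns None when the value fits neither unit, marking it for manual review.
    The plausible range is a made-up placeholder, not a clinical reference range.
    """
    low, high = plausible_cm
    if low <= value <= high:
        return value
    if low <= value / 10 <= high:  # likely recorded in mm
        return value / 10
    return None


# Made-up raw values: one in cm, one in mm, and two that fit neither unit.
raw = [4.8, 52.0, 0.0, 510.0]
cleaned = [normalize_to_cm(v) for v in raw]
```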
Affiliation(s)
- Quinn S Wells
  - Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt University Medical Center, 2525 West End Avenue, Suite 300, Nashville TN 37203, USA
- Eric Farber-Eger
  - Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA
- Dana C Crawford
  - Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA; Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA
|
25
|
Liao J, Li X, Wong TY, Wang JJ, Khor CC, Tai ES, Aung T, Teo YY, Cheng CY. Impact of measurement error on testing genetic association with quantitative traits. PLoS One 2014; 9:e87044. [PMID: 24475218 PMCID: PMC3901720 DOI: 10.1371/journal.pone.0087044] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/30/2013] [Accepted: 12/17/2013] [Indexed: 12/23/2022] Open
Abstract
Measurement error of a phenotypic trait reduces the power to detect genetic associations. We examined the impact of sample size, allele frequency and effect size in presence of measurement error for quantitative traits. The statistical power to detect genetic association with phenotype mean and variability was investigated analytically. The non-centrality parameter for a non-central F distribution was derived and verified using computer simulations. We obtained equivalent formulas for the cost of phenotype measurement error. Effects of differences in measurements were examined in a genome-wide association study (GWAS) of two grading scales for cataract and a replication study of genetic variants influencing blood pressure. The mean absolute difference between the analytic power and simulation power for comparison of phenotypic means and variances was less than 0.005, and the absolute difference did not exceed 0.02. To maintain the same power, a one standard deviation (SD) in measurement error of a standard normal distributed trait required a one-fold increase in sample size for comparison of means, and a three-fold increase in sample size for comparison of variances. GWAS results revealed almost no overlap in the significant SNPs (p<10⁻⁵) for the two cataract grading scales while replication results in genetic variants of blood pressure displayed no significant differences between averaged blood pressure measurements and single blood pressure measurements. We have developed a framework for researchers to quantify power in the presence of measurement error, which will be applicable to studies of phenotypes in which the measurement is highly variable.
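The sample-size figures in the abstract can be reproduced with a simple variance argument. The sketch below is one reading that matches the reported numbers, not the authors' derivation: it assumes independent additive error, that power for comparing means depends on the effect divided by the observed variance, and that the sampling variance of a variance estimate scales with the square of the observed variance.

```python
def sample_size_multiplier(error_sd, trait_sd=1.0):
    """Factor by which n must grow to keep power when measurement error is added.

    Assumes independent, additive error, so observed variance = trait + error variance.
    """
    variance_ratio = (trait_sd ** 2 + error_sd ** 2) / trait_sd ** 2
    return {
        "means": variance_ratio,           # test statistic scales with 1 / variance
        "variances": variance_ratio ** 2,  # sampling variance of s^2 scales with variance^2
    }


# Error SD equal to the trait SD, as in the abstract's example.
multipliers = sample_size_multiplier(error_sd=1.0)
```

Under these assumptions the multipliers are 2 for means and 4 for variances, that is, a one-fold and a three-fold increase in sample size respectively.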
Collapse
Affiliation(s)
- Jiemin Liao
- Department of Ophthalmology, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Xiang Li
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Tien-Yin Wong
- Department of Ophthalmology, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Saw Swee Hock School of Public Health, National University Health System, National University of Singapore, Singapore, Singapore
- Jie Jin Wang
- Centre for Vision Research, University of Sydney, Sydney, Australia
- Chiea Chuen Khor
- Department of Ophthalmology, National University of Singapore and National University Health System, Singapore, Singapore
- Division of Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- E. Shyong Tai
- Saw Swee Hock School of Public Health, National University Health System, National University of Singapore, Singapore, Singapore
- Department of Medicine, National University of Singapore and National University Health System, Singapore, Singapore
- Duke-NUS Graduate Medical School, Singapore, Singapore
- Tin Aung
- Department of Ophthalmology, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Yik-Ying Teo
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Saw Swee Hock School of Public Health, National University Health System, National University of Singapore, Singapore, Singapore
- Ching-Yu Cheng
- Department of Ophthalmology, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Saw Swee Hock School of Public Health, National University Health System, National University of Singapore, Singapore, Singapore
- Duke-NUS Graduate Medical School, Singapore, Singapore
26
Rosenman M, He J, Martin J, Nutakki K, Eckert G, Lane K, Gradus-Pizlo I, Hui SL. Database queries for hospitalizations for acute congestive heart failure: flexible methods and validation based on set theory. J Am Med Inform Assoc 2013; 21:345-52. [PMID: 24113802 DOI: 10.1136/amiajnl-2013-001942] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 01/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Electronic health records databases are increasingly used for identifying cohort populations, covariates, or outcomes, but discerning such clinical 'phenotypes' accurately is an ongoing challenge. We developed a flexible method using overlapping (Venn diagram) queries. Here we describe this approach to find patients hospitalized with acute congestive heart failure (CHF), a sampling strategy for one-by-one 'gold standard' chart review, and calculation of positive predictive value (PPV) and sensitivities, with SEs, across different definitions. MATERIALS AND METHODS We used retrospective queries of hospitalizations (2002-2011) in the Indiana Network for Patient Care with any CHF ICD-9 diagnoses, a primary diagnosis, an echocardiogram performed, a B-natriuretic peptide (BNP) drawn, or BNP >500 pg/mL. We used a hybrid between proportional sampling by Venn zone and over-sampling non-overlapping zones. The acute CHF (presence/absence) outcome was based on expert chart review using a priori criteria. RESULTS Among 79,091 hospitalizations, we reviewed 908. A query for any ICD-9 code for CHF had PPV 42.8% (SE 1.5%) for acute CHF and sensitivity 94.3% (1.3%). Primary diagnosis of 428 and BNP >500 pg/mL had PPV 90.4% (SE 2.4%) and sensitivity 28.8% (1.1%). PPV was <10% when there was no echocardiogram, no BNP, and no primary diagnosis. 'False positive' hospitalizations were for other heart disease, lung disease, or other reasons. CONCLUSIONS This novel method successfully allowed flexible application and validation of queries for patients hospitalized with acute CHF.
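The PPV and sensitivity figures reported above are proportions with standard errors. As a rough illustration only, the sketch below computes both from a 2x2 tally of chart-review results using a simple binomial standard error. The function names and the counts are hypothetical, and the sketch ignores the sampling weights that the hybrid Venn-zone sampling described in the abstract would require, so it is not the authors' estimator.

```python
import math


def proportion_with_se(successes: int, total: int) -> tuple[float, float]:
    """Return a proportion and its simple (unweighted) binomial standard error."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    se = math.sqrt(p * (1.0 - p) / total)
    return p, se


def ppv_and_sensitivity(tp: int, fp: int, fn: int) -> dict[str, tuple[float, float]]:
    """Compute PPV and sensitivity, each paired with its standard error.

    tp: query-positive records confirmed by chart review
    fp: query-positive records rejected by chart review
    fn: true cases the query missed
    """
    return {
        "ppv": proportion_with_se(tp, tp + fp),
        "sensitivity": proportion_with_se(tp, tp + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
result = ppv_and_sensitivity(tp=80, fp=20, fn=20)
for name, (estimate, se) in result.items():
    print(f"{name}: {estimate:.3f} (SE {se:.3f})")
```

A stricter query definition typically lowers `fp` (raising PPV) while raising `fn` (lowering sensitivity), which is the trade-off the abstract reports across its overlapping query definitions.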
Affiliation(s)
- Marc Rosenman
- Children's Health Services Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
27
Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, Basford M, Chute CG, Kullo IJ, Li R, Pacheco JA, Rasmussen LV, Spangler L, Denny JC. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc 2013; 20:e147-54. [PMID: 23531748 PMCID: PMC3715338 DOI: 10.1136/amiajnl-2012-000896] [Citation(s) in RCA: 278] [Impact Index Per Article: 25.3] [Received: 02/12/2012] [Accepted: 03/05/2013] [Indexed: 12/13/2022]
Abstract
BACKGROUND Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. OBJECTIVE To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. MATERIALS AND METHODS The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. RESULTS By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. CONCLUSIONS Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
28
Using multiple measures for quantitative trait association analyses: application to estimated glomerular filtration rate. J Hum Genet 2013; 58:461-6. [PMID: 23535967 PMCID: PMC3711970 DOI: 10.1038/jhg.2013.23] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/20/2012] [Revised: 02/27/2013] [Accepted: 03/03/2013] [Indexed: 01/07/2023]
Abstract
Studies of multiple measures of a quantitative trait can have greater precision and thus statistical power compared with single-measure studies, but this has rarely been studied in relation to quantitative trait measurement error models in genetic association studies. Using estimated glomerular filtration rate (eGFR), a quantitative measure of kidney function, as an example, we constructed measurement error models of a quantitative trait with systematic and random error components. We then examined the effects on precision of the parameter estimate between genetic loci and eGFR, resulting from varying the correlation and contribution of the error components. We also compared the empirical results from three genome-wide association studies (GWAS) of kidney function in 9049 European Americans: a single-measure model, a three-measure model of the same biomarker of kidney function and a six-measure model of different biomarkers of kidney function. Simulations showed that given the same amount of overall errors, inclusion of measures with less correlated systematic errors led to greater gain in precision. The empirical GWAS results confirmed that both the three- and six-measure models detected more eGFR-associated genomic loci with stronger statistical association than the single-measure model despite some heterogeneity among the measures. Multiple measures of a quantitative trait can increase the statistical power of a study without additional participant recruitment. However, careful attention must be paid to the correlation of systematic errors and inconsistent associations when different biomarkers or methods are used to measure the quantitative trait.
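The claim that less correlated errors yield a larger precision gain can be illustrated with a textbook identity: if k measures each carry error variance sigma2 and every pair of errors has correlation rho, the error variance of their average is sigma2 * (1 + (k - 1) * rho) / k. The sketch below simply evaluates that identity. It is a simplified stand-in with hypothetical parameter names, not the systematic-plus-random error model the authors constructed.

```python
def error_variance_of_mean(k: int, sigma2: float, rho: float) -> float:
    """Error variance of the average of k equally noisy, equicorrelated measures.

    k:      number of measures averaged
    sigma2: error variance of each single measure
    rho:    common pairwise correlation between the measurement errors
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    # Equicorrelation is only valid for rho in [-1/(k-1), 1] when k > 1.
    if k > 1 and not (-1.0 / (k - 1) <= rho <= 1.0):
        raise ValueError("rho is outside the valid range for k measures")
    return sigma2 * (1.0 + (k - 1) * rho) / k


# Illustrative values only: three measures, unit error variance.
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: variance of mean = {error_variance_of_mean(3, 1.0, rho):.3f}")
```

With uncorrelated errors the variance falls to one third of the single-measure value, while perfectly correlated errors give no gain at all, which matches the direction of the simulation finding in the abstract.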
29
Correction of phenotype misclassification based on high-discrimination genetic predictive risk models. Epidemiology 2013; 23:902-9. [PMID: 23023008 DOI: 10.1097/ede.0b013e31826c3129] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
Misclassification of phenotype status can seriously affect accuracy in association studies, including studies of genetic risk factors. A common problem is the classification of participants as nondiseased because of insufficient diagnostic workup or because participants have not been followed up long enough to develop disease. Some validated predictive models may have high discrimination in predicting disease. We suggest that information from such models can be used to predict the risk that a nondiseased participant will eventually develop disease and to recode the status of participants predicted to be at highest risk. We evaluate conditions under which recoding results in a maximal net improvement in the accuracy of phenotype classification. Net improvement is expected only when the positive likelihood ratio of the predictive model is larger than the inverse of the odds of disease among apparently nondiseased controls. We conducted simulations to probe the impact of reclassification on the power to detect new risk factors under several scenarios of classification accuracy of the previously developed models. We also apply this framework to a validated model of progression to advanced age-related macular degeneration that uses genetic and nongenetic variables (area under the curve = 0.915). In the training cohort (n = 2,937) and a separate validation cohort (n = 1,227), 195-272 and 78-91 nonprogressor participants, respectively, were reclassified as progressors. Correction of phenotype misclassification based on highly informative predictive models may be helpful in identifying additional genetic and other risk factors, when there are validated risk factors that provide strong discriminating ability.
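The net-improvement condition stated above follows from the odds form of Bayes' rule: posterior odds of disease equal prior odds times the positive likelihood ratio, so a test-positive control is more likely diseased than not only when the likelihood ratio exceeds the inverse of the prior odds. The sketch below checks that inequality. The function names and input values are hypothetical, and summarizing the predictive model by a single sensitivity and specificity is an assumption made here for illustration.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 <= specificity < 1")
    return sensitivity / (1.0 - specificity)


def recoding_expected_to_help(sensitivity: float, specificity: float,
                              p_disease_in_controls: float) -> bool:
    """Check whether LR+ exceeds the inverse odds of disease among controls.

    p_disease_in_controls: assumed proportion of apparently nondiseased
    participants who will in fact develop disease (strictly between 0 and 1).
    """
    if not (0.0 < p_disease_in_controls < 1.0):
        raise ValueError("p_disease_in_controls must be in (0, 1)")
    prior_odds = p_disease_in_controls / (1.0 - p_disease_in_controls)
    lr_plus = positive_likelihood_ratio(sensitivity, specificity)
    return lr_plus > 1.0 / prior_odds


# Illustrative values only: the same model helps when latent disease is
# common among controls and does not when it is rare.
print(recoding_expected_to_help(0.90, 0.95, p_disease_in_controls=0.10))
print(recoding_expected_to_help(0.90, 0.95, p_disease_in_controls=0.02))
```

The two calls use the same hypothetical model (LR+ of about 18); only the share of latent disease among controls changes, which moves the threshold from 9 to 49.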
30
Peissig PL, Rasmussen LV, Berg RL, Linneman JG, McCarty CA, Waudby C, Chen L, Denny JC, Wilke RA, Pathak J, Carrell D, Kho AN, Starren JB. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. J Am Med Inform Assoc 2012; 19:225-34. [PMID: 22319176 DOI: 10.1136/amiajnl-2011-000456] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Indexed: 11/04/2022]
Abstract
OBJECTIVE There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. MATERIALS AND METHODS We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. RESULTS An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. DISCUSSION A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. CONCLUSION We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.
Affiliation(s)
- Peggy L Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin 54449, USA.
31
Garlet GP, Trombone APF, Menezes R, Letra A, Repeke CE, Vieira AE, Martins W, Neves LTD, Campanelli AP, Santos CFD, Vieira AR. The use of chronic gingivitis as reference status increases the power and odds of periodontitis genetic studies: a proposal based in the exposure concept and clearer resistance and susceptibility phenotypes definition. J Clin Periodontol 2012; 39:323-32. [PMID: 22324464 DOI: 10.1111/j.1600-051x.2012.01859.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Accepted: 01/15/2012] [Indexed: 11/29/2022]
Abstract
AIM Current literature on chronic periodontitis genetics encompasses numerous single nucleotide polymorphism-focused case-control studies with inconsistent and controversial results, which typically disregard the exposure concept embraced by the case-control definition. Herein, we propose a case-control design reappraisal by clear phenotype selection, where chronic gingivitis represents a genetically resistant phenotype/genotype opposing the susceptible cohort. MATERIAL AND METHODS The hypothesis was tested in healthy, chronic periodontitis and gingivitis groups through real-time PCR-based allelic discrimination of classic variants IL1B-3954, IL6-174, TNFA-308, IL10-592 and TLR4-299. RESULTS Observed allele/genotype frequencies characterize the healthy group with an intermediate genetic profile between periodontitis and gingivitis cohorts. When comparing genotype/allele frequencies in periodontitis versus healthy and periodontitis versus gingivitis scenarios, the number of positive associations (2-4) and the degree of association (p and odds ratio values) were significantly increased by the new approach proposed (periodontitis versus gingivitis), suggesting the association of IL1B-3954, TNFA-308, IL10-592 and TLR4-299 with periodontitis risk. Study power was also significantly improved by the new study design proposed when compared to the traditional approach. CONCLUSIONS The data presented herein support the use of a new case-control study design based on the case-control definition and clear resistance/susceptibility phenotype selection, which can significantly impact the study power and odds of identification of genetic factors involved in PD.
Affiliation(s)
- Gustavo Pompermaier Garlet
- Department of Biological Sciences, School of Dentistry of Bauru, São Paulo University (FOB/USP), Bauru, SP, Brazil
32
Shriner D. Moving toward System Genetics through Multiple Trait Analysis in Genome-Wide Association Studies. Front Genet 2012; 3:1. [PMID: 22303408 PMCID: PMC3266611 DOI: 10.3389/fgene.2012.00001] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 11/04/2011] [Accepted: 01/01/2012] [Indexed: 02/05/2023]
Abstract
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
33
Evangelou E, Fellay J, Colombo S, Martinez-Picado J, Obel N, Goldstein DB, Telenti A, Ioannidis JPA. Impact of phenotype definition on genome-wide association signals: empirical evaluation in human immunodeficiency virus type 1 infection. Am J Epidemiol 2011; 173:1336-42. [PMID: 21490045 DOI: 10.1093/aje/kwr024] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 12/16/2022]
Abstract
Discussion on improving the power of genome-wide association studies to identify candidate variants and genes is generally centered on issues of maximizing sample size; less attention is given to the role of phenotype definition and ascertainment. The authors used genome-wide data from patients infected with human immunodeficiency virus type 1 (HIV-1) to assess whether differences in type of population (622 seroconverters vs. 636 seroprevalent subjects) or the number of measurements available for defining the phenotype resulted in differences in the effect sizes of associations between single nucleotide polymorphisms and the phenotype, HIV-1 viral load at set point. The effect estimate for the top 100 single nucleotide polymorphisms was 0.092 (95% confidence interval: 0.074, 0.110) log10 viral load (log10 copies of HIV-1 per mL of blood) greater in seroconverters than in seroprevalent subjects. The difference was even larger when the authors focused on chromosome 6 variants (0.153 log10 viral load) or on variants that achieved genome-wide significance (0.232 log10 viral load). The estimates of the genetic effects tended to be slightly larger when more viral load measurements were available, particularly among seroconverters and for variants that achieved genome-wide significance. Differences in phenotype definition and ascertainment may affect the estimated magnitude of genetic effects and should be considered in optimizing power for discovering new associations.
Affiliation(s)
- Evangelos Evangelou
- Institute of Microbiology, University Hospital Center, University of Lausanne, Lausanne, Switzerland
34
Winkler AM, Kochunov P, Blangero J, Almasy L, Zilles K, Fox PT, Duggirala R, Glahn DC. Cortical thickness or grey matter volume? The importance of selecting the phenotype for imaging genetics studies. Neuroimage 2009; 53:1135-46. [PMID: 20006715 DOI: 10.1016/j.neuroimage.2009.12.028] [Citation(s) in RCA: 874] [Impact Index Per Article: 58.3] [Received: 09/01/2009] [Revised: 12/02/2009] [Accepted: 12/04/2009] [Indexed: 01/10/2023]
Abstract
Choosing the appropriate neuroimaging phenotype is critical to successfully identify genes that influence brain structure or function. While neuroimaging methods provide numerous potential phenotypes, their role for imaging genetics studies is unclear. Here we examine the relationship between brain volume, grey matter volume, cortical thickness and surface area, from a genetic standpoint. Four hundred and eighty-six individuals from randomly ascertained extended pedigrees with high-quality T1-weighted neuroanatomic MRI images participated in the study. Surface-based and voxel-based representations of brain structure were derived, using automated methods, and these measurements were analysed using a variance-components method to identify the heritability of these traits and their genetic correlations. All neuroanatomic traits were significantly influenced by genetic factors. Cortical thickness and surface area measurements were found to be genetically and phenotypically independent. While both thickness and area influenced volume measurements of cortical grey matter, volume was more closely related to surface area than cortical thickness. This trend was observed for both the volume-based and surface-based techniques. The results suggest that surface area and cortical thickness measurements should be considered separately and preferred over gray matter volumes for imaging genetic studies.
Affiliation(s)
- Anderson M Winkler
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA.
35
Murphy S, Churchill S, Bry L, Chueh H, Weiss S, Lazarus R, Zeng Q, Dubey A, Gainer V, Mendis M, Glaser J, Kohane I. Instrumenting the health care enterprise for discovery research in the genomic era. Genome Res 2009; 19:1675-81. [PMID: 19602638 DOI: 10.1101/gr.094615.109] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 11/24/2022]
Abstract
Tens of thousands of subjects may be required to obtain reliable evidence relating disease characteristics to the weak effects typically reported from common genetic variants. The costs of assembling, phenotyping, and studying these large populations are substantial, recently estimated at three billion dollars for 500,000 individuals. They are also decade-long efforts. We hypothesized that automation and analytic tools can repurpose the informational byproducts of routine clinical care, bringing sample acquisition and phenotyping to the same high-throughput pace and commodity price-point as is currently true of genome-wide genotyping. Described here is a demonstration of the capability to acquire samples and data from densely phenotyped and genotyped individuals in the tens of thousands for common diseases (e.g., in a 1-yr period: N = 15,798 for rheumatoid arthritis; N = 42,238 for asthma; N = 34,535 for major depressive disorder) in one academic health center at an order of magnitude lower cost. Even for rare diseases caused by rare, highly penetrant mutations such as Huntington disease (N = 102) and autism (N = 756), these capabilities are also of interest.
Affiliation(s)
- Shawn Murphy
- Informatics, Partners Healthcare Systems, Boston, Massachusetts 02115, USA
36
Abstract
Studies using genome-wide platforms have yielded an unprecedented number of promising signals of association between genomic variants and human traits. This Review addresses the steps required to validate, augment and refine such signals to identify underlying causal variants for well-defined phenotypes. These steps include: large-scale exact replication across both similar and diverse populations; fine mapping and resequencing; determination of the most informative markers and multiple independent informative loci; incorporation of functional information; and improved phenotype mapping of the implicated genetic effects. Even in cases for which replication proves that an effect exists, confident localization of the causal variant often remains elusive.
37
Bracken MB. Why are so many epidemiology associations inflated or wrong? Does poorly conducted animal research suggest implausible hypotheses? Ann Epidemiol 2009; 19:220-4. [PMID: 19217006 DOI: 10.1016/j.annepidem.2008.11.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 10/07/2008] [Accepted: 11/29/2008] [Indexed: 01/31/2023]
Abstract
There is growing concern among epidemiologists that most discovered associations are either inflated or false. The reasons for this concern have focused on methodological issues in the conduct and publication of epidemiologic research. This commentary suggests that another reason for discrepant findings may be that animal research is producing implausible hypotheses. Many animal studies are methodologically weak, and the animal literature is not systematically reviewed and synthesized. Moreover, most bodies of animal literature may be so heterogeneous that they can be used selectively to support the plausibility of almost any epidemiology study result. Epidemiologists themselves also do not consistently conduct systematic reviews of bodies of biological evidence which might point to sources of bias in an evidence base. Animal research will likely continue to provide the biological basis for epidemiological investigation, but substantial improvement is needed in how it is conducted and synthesized to improve the predictability of animal studies for the human condition.
Affiliation(s)
- Michael B Bracken
- School of Public Health and Medicine, Yale University, New Haven, CT, USA
38
Wineinger NE, Kennedy RE, Erickson SW, Wojczynski MK, Bruder CE, Tiwari HK. Statistical issues in the analysis of DNA Copy Number Variations. Int J Comput Biol Drug Des 2008; 1:368-95. [PMID: 19774103 PMCID: PMC2747762 DOI: 10.1504/ijcbdd.2008.022208] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
Approaches to assess copy number variation have advanced rapidly and are being incorporated into genetic studies. While the technology exists for CNV genotyping, a further understanding and discussion of how to use the CNV data for association analyses is warranted. We present the options available for processing and analysing CNV data. We break these steps down into choice of genotyping platform, normalisation of the array data, calling algorithm, and statistical analysis.
Affiliation(s)
- Nathan E. Wineinger
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA, Fax: 205-975-2540
- Richard E. Kennedy
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA, Fax: 205-975-2540
- Stephen W. Erickson
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA, Fax: 205-975-2540
- Mary K. Wojczynski
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA, Fax: 205-975-2540
- Carl E. Bruder
- Viral Biochemistry, Division of Drug Discovery, Southern Research Institute, Birmingham, Alabama 35205, USA, Fax: (205) 581-2097
- Hemant K. Tiwari
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA, Fax: 205-975-2541