1
McCoy JCS, Spicer JI, Ibbini Z, Tills O. Phenomics as an approach to Comparative Developmental Physiology. Front Physiol 2023; 14:1229500. [PMID: 37645563 PMCID: PMC10461620 DOI: 10.3389/fphys.2023.1229500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 07/24/2023] [Indexed: 08/31/2023] [Open Access]
Abstract
The dynamic nature of developing organisms and how they function presents both opportunity and challenge to researchers, with significant advances in understanding possible by adopting innovative approaches to their empirical study. The information content of the phenotype during organismal development is arguably greater than at any other life stage, incorporating change at a broad range of temporal, spatial and functional scales and is of broad relevance to a plethora of research questions. Yet, effectively measuring organismal development, and the ontogeny of physiological regulations and functions, and their responses to the environment, remains a significant challenge. "Phenomics", a global approach to the acquisition of phenotypic data at the scale of the whole organism, is uniquely suited as an approach. In this perspective, we explore the synergies between phenomics and Comparative Developmental Physiology (CDP), a discipline of increasing relevance to understanding sensitivity to drivers of global change. We then identify how organismal development itself provides an excellent model for pushing the boundaries of phenomics, given its inherent complexity, comparably smaller size, relative to adult stages, and the applicability of embryonic development to a broad suite of research questions using a diversity of species. Collection, analysis and interpretation of whole organismal phenotypic data are the largest obstacle to capitalising on phenomics for advancing our understanding of biological systems. We suggest that phenomics within the context of developing organismal form and function could provide an effective scaffold for addressing grand challenges in CDP and phenomics.
Affiliation(s)
- Oliver Tills
- School of Biological and Marine Sciences, University of Plymouth, Plymouth, United Kingdom
2
Abstract
Developing personalized diagnostic strategies and targeted treatments requires a deep understanding of disease biology and the ability to dissect the relationship between molecular and genetic factors and their phenotypic consequences. However, such knowledge is fragmented across publications, non-standardized repositories, and evolving ontologies describing various scales of biological organization between genotypes and clinical phenotypes. Here, we present PrimeKG, a multimodal knowledge graph for precision medicine analyses. PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs. PrimeKG contains an abundance of 'indications', 'contraindications', and 'off-label use' drug-disease edges that are lacking in other knowledge graphs and can support AI analyses of how drugs affect disease-associated networks. We supplement PrimeKG's graph structure with language descriptions of clinical guidelines to enable multimodal analyses and provide instructions for continual updates of PrimeKG as new data become available.
3
He Y, Cooney CR, Maddock S, Thomas GH. Using pose estimation to identify regions and points on natural history specimens. PLoS Comput Biol 2023; 19:e1010933. [PMID: 36812227 PMCID: PMC9987800 DOI: 10.1371/journal.pcbi.1010933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 03/06/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] [Open Access]
Abstract
A key challenge in mobilising growing numbers of digitised biological specimens for scientific research is finding high-throughput methods to extract phenotypic measurements on these datasets. In this paper, we test a pose estimation approach based on Deep Learning capable of accurately placing point labels to identify key locations on specimen images. We then apply the approach to two distinct challenges that each requires identification of key features in a 2D image: (i) identifying body region-specific plumage colouration on avian specimens and (ii) measuring morphometric shape variation in Littorina snail shells. For the avian dataset, 95% of images are correctly labelled and colour measurements derived from these predicted points are highly correlated with human-based measurements. For the Littorina dataset, more than 95% of landmarks were accurately placed relative to expert-labelled landmarks and predicted landmarks reliably captured shape variation between two distinct shell ecotypes ('crab' vs 'wave'). Overall, our study shows that pose estimation based on Deep Learning can generate high-quality and high-throughput point-based measurements for digitised image-based biodiversity datasets and could mark a step change in the mobilisation of such data. We also provide general guidelines for using pose estimation methods on large-scale biological datasets.
Affiliation(s)
- Yichen He
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield; Alfred Denny Building, University of Sheffield, Sheffield, United Kingdom
- Christopher R. Cooney
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield; Alfred Denny Building, University of Sheffield, Sheffield, United Kingdom
- Steve Maddock
- Department of Computer Science, University of Sheffield; Regent Court, University of Sheffield, Sheffield, United Kingdom
- Gavin H. Thomas
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield; Alfred Denny Building, University of Sheffield, Sheffield, United Kingdom
- Bird Group, Department of Life Sciences, The Natural History Museum at Tring; Tring, United Kingdom
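The landmark accuracy figures quoted in the abstract above (for example, more than 95% of Littorina landmarks placed accurately relative to expert labels) imply some distance tolerance between predicted and expert points. The abstract does not state the exact criterion, so the following Python sketch assumes one plausible choice: a landmark counts as correct when its Euclidean distance to the expert landmark is within a fixed pixel tolerance. The function name, the tolerance and the coordinates are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fraction_within_tolerance(predicted: Sequence[Point],
                              expert: Sequence[Point],
                              tolerance_px: float) -> float:
    """Share of predicted landmarks lying within tolerance_px of the expert landmark."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must contain the same number of landmarks")
    if not predicted:
        raise ValueError("at least one landmark is required")
    # Count landmark pairs whose Euclidean distance is inside the tolerance.
    hits = sum(1 for p, e in zip(predicted, expert) if math.dist(p, e) <= tolerance_px)
    return hits / len(predicted)


# Hypothetical pixel coordinates for three landmarks on one image.
predicted = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
expert = [(0.0, 1.0), (10.0, 10.0), (9.0, 9.0)]
score = fraction_within_tolerance(predicted, expert, tolerance_px=2.0)
```

In practice the tolerance would probably be scaled to specimen size or image resolution rather than fixed in pixels; that choice is left open here.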
4
Koning E, Vorstman J, McIntyre RS, Brietzke E. Characterizing eating behavioral phenotypes in mood disorders: a narrative review. Psychol Med 2022; 52:2885-2898. [PMID: 36004528 PMCID: PMC9693712 DOI: 10.1017/s0033291722002446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2022] [Revised: 07/06/2022] [Accepted: 07/12/2022] [Indexed: 01/05/2023]
Abstract
Mood disorders, including depressive and bipolar disorders, represent a multidimensional and prevalent group of psychiatric illnesses characterized by disturbances in emotion, cognition and metabolism. Maladaptive eating behaviors in mood disorders are diverse and warrant characterization in order to increase the precision of diagnostic criteria, identify subtypes and improve treatment strategies. The current narrative review synthesizes evidence for Eating Behavioral Phenotypes (EBP) in mood disorders as well as advancements in pathophysiological conceptual frameworks relevant to each phenotype. Phenotypes include maladaptive eating behaviors related to appetite, emotion, reward, impulsivity, diet style and circadian rhythm disruption. Potential treatment strategies for each phenotype are also discussed, including psychotherapeutic, pharmacological and nutritional interventions. Maladaptive eating behaviors related to mood disorders are relevant from both clinical and research perspectives, yet have been somewhat overlooked thus far. A better understanding of this aspect of mood disorders holds promise to improve clinical care in this patient group and contribute to the subtyping of these currently subjectively diagnosed and treated disorders.
Affiliation(s)
- Elena Koning
- Centre for Neuroscience Studies (CNS), Queen's University, Kingston, ON, Canada
- Jacob Vorstman
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Roger S. McIntyre
- Mood Disorders Psychopharmacology Unit (MDPU), Toronto Western Hospital, University Health Network, Toronto, ON, Canada
- Elisa Brietzke
- Centre for Neuroscience Studies (CNS), Queen's University, Kingston, ON, Canada
- Department of Psychiatry, Queen's University School of Medicine, Kingston, ON, Canada
5
Zebrafish Larvae Phenotype Classification from Bright-field Microscopic Images Using a Two-Tier Deep-Learning Pipeline. Applied Sciences-Basel 2020. [DOI: 10.3390/app10041247] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Classification of different zebrafish larvae phenotypes is useful for studying the environmental influence on embryo development. However, the scarcity of well-annotated training images and fuzzy inter-phenotype differences hamper the application of machine-learning methods in phenotype classification. This study develops a deep-learning approach to address these challenging problems. A convolutional network model with compressed separable convolution kernels is adopted to address the overfitting issue caused by insufficient training data. A two-tier classification pipeline is designed to improve the classification accuracy based on fuzzy phenotype features. Our method achieved an average accuracy of 91% across all phenotypes and a maximum accuracy of 100% for some phenotypes (e.g., dead and chorion). We also compared our method with state-of-the-art methods on the same dataset. Our method improved accuracy by up to 22% over the existing method. This study offers an effective deep-learning solution for classifying difficult zebrafish larvae phenotypes based on very limited training data.
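The abstract above names a two-tier classification pipeline but does not describe how the tiers are arranged. One common reading is a coarse-to-fine cascade, in which a first classifier assigns a broad class and a second classifier refines only the classes that are hard to separate. The Python sketch below illustrates that reading under stated assumptions: simple threshold functions stand in for the convolutional networks, the labels "dead" and "chorion" come from the abstract, and every other label, feature name and threshold is hypothetical.

```python
from typing import Callable, Dict, Mapping

Sample = Mapping[str, float]
Classifier = Callable[[Sample], str]


def two_tier_classify(sample: Sample,
                      tier1: Classifier,
                      tier2: Dict[str, Classifier]) -> str:
    """Run the coarse classifier, then refine only if a second-tier model exists for that class."""
    coarse = tier1(sample)
    refine = tier2.get(coarse)
    return coarse if refine is None else refine(sample)


def toy_tier1(sample: Sample) -> str:
    # Stand-in for a first-tier network: separates the easy classes.
    if sample["motion"] == 0.0:
        return "dead"
    if sample["enclosed"] > 0.5:
        return "chorion"
    return "hatched"  # hypothetical coarse class passed on to tier two


def toy_hatched_tier2(sample: Sample) -> str:
    # Stand-in for a second-tier network: splits a fuzzy coarse class.
    return "malformed" if sample["curvature"] > 0.3 else "normal"


TIER2 = {"hatched": toy_hatched_tier2}

label = two_tier_classify(
    {"motion": 1.0, "enclosed": 0.1, "curvature": 0.6}, toy_tier1, TIER2
)
```

The appeal of such a cascade is that each second-tier model only has to learn a narrow distinction, which may help when annotated training images are scarce.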
6
Bennett TD, Callahan TJ, Feinstein JA, Ghosh D, Lakhani SA, Spaeder MC, Szefler SJ, Kahn MG. Data Science for Child Health. J Pediatr 2019; 208:12-22. [PMID: 30686480 PMCID: PMC6486872 DOI: 10.1016/j.jpeds.2018.12.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/22/2018] [Revised: 12/11/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Tellen D Bennett
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO.
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- James A Feinstein
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Debashis Ghosh
- CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- Saquib A Lakhani
- Pediatric Genomics Discovery Program, Department of Pediatrics, Yale University School of Medicine, New Haven, CT
- Michael C Spaeder
- Pediatric Critical Care, University of Virginia School of Medicine, Charlottesville, VA
- Stanley J Szefler
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Michael G Kahn
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
7
Farmer JR, Ong MS, Barmettler S, Yonker LM, Fuleihan R, Sullivan KE, Cunningham-Rundles C, Walter JE. Common Variable Immunodeficiency Non-Infectious Disease Endotypes Redefined Using Unbiased Network Clustering in Large Electronic Datasets. Front Immunol 2018; 8:1740. [PMID: 29375540 PMCID: PMC5767273 DOI: 10.3389/fimmu.2017.01740] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 08/24/2017] [Accepted: 11/23/2017] [Indexed: 02/02/2023] [Open Access]
Abstract
Common variable immunodeficiency (CVID) is increasingly recognized for its association with autoimmune and inflammatory complications. Despite recent advances in immunophenotypic and genetic discovery, clinical care of CVID remains limited by our inability to accurately model risk for non-infectious disease development. Herein, we demonstrate the utility of unbiased network clustering as a novel method to analyze inter-relationships between non-infectious disease outcomes in CVID using databases at the United States Immunodeficiency Network (USIDNET), the centralized immunodeficiency registry of the United States, and Partners, a tertiary care network in Boston, MA, USA, with a shared electronic medical record amenable to natural language processing. Immunophenotypes were comparable in terms of native antibody deficiencies, low titer response to pneumococcus, and B cell maturation arrest. However, recorded non-infectious disease outcomes were more substantial in the Partners cohort across the spectrum of lymphoproliferation, cytopenias, autoimmunity, atopy, and malignancy. Using unbiased network clustering to analyze 34 non-infectious disease outcomes in the Partners cohort, we further identified unique patterns of lymphoproliferative (two clusters), autoimmune (two clusters), and atopic (one cluster) disease that were defined as CVID non-infectious endotypes according to discrete and non-overlapping immunophenotypes. Markers were both previously described {high serum IgE in the atopic cluster [odds ratio (OR) 6.5] and low class-switched memory B cells in the total lymphoproliferative cluster (OR 9.2)} and novel [low serum C3 in the total lymphoproliferative cluster (OR 5.1)]. Mortality risk in the Partners cohort was significantly associated with individual non-infectious disease outcomes as well as lymphoproliferative cluster 2, specifically (OR 5.9). In contrast, unbiased network clustering failed to associate known comorbidities in the adult USIDNET cohort. 
Together, these data suggest that unbiased network clustering can be used in CVID to redefine non-infectious disease inter-relationships; however, applicability may be limited to datasets well annotated through mechanisms such as natural language processing. The lymphoproliferative, autoimmune, and atopic Partners CVID endotypes herein described can be used moving forward to streamline genetic and biomarker discovery and to facilitate early screening and intervention in CVID patients at highest risk for autoimmune and inflammatory progression.
Affiliation(s)
- Mei-Sing Ong
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA, United States
- Lael M Yonker
- Massachusetts General Hospital, Boston, MA, United States
- Ramsay Fuleihan
- Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, United States
- Jolan E Walter
- Massachusetts General Hospital, Boston, MA, United States; University of South Florida, St. Petersburg, FL, United States; Johns Hopkins All Children's Hospital, St. Petersburg, FL, United States
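The abstract of this entry reports several odds ratios (for example, OR 6.5 for high serum IgE in the atopic cluster) without stating how they were estimated. As a reminder of what such a number means, the Python sketch below computes the textbook odds ratio from a 2x2 table, together with a Wald-type 95% confidence interval on the log scale. The cell counts are hypothetical and do not come from the paper, and the authors may have used a different estimator.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a, b: marker present, with / without the outcome
    c, d: marker absent,  with / without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimator")
    return (a * d) / (b * c)


def wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 20 of 30 marker-positive patients have the outcome,
# versus 10 of 30 marker-negative patients.
estimate = odds_ratio(20, 10, 10, 20)
low, high = wald_ci(20, 10, 10, 20)
```

With small cohorts the interval is wide, which is worth keeping in mind when comparing odds ratios across clusters.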
8
Glueck M, Gvozdik A, Chevalier F, Khan A, Brudno M, Wigdor D. PhenoStacks: Cross-Sectional Cohort Phenotype Comparison Visualizations. IEEE Transactions on Visualization and Computer Graphics 2017; 23:191-200. [PMID: 27514055 DOI: 10.1109/tvcg.2016.2598469] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
Cross-sectional phenotype studies are used by genetics researchers to better understand how phenotypes vary across patients with genetic diseases, both within and between cohorts. Analyses within cohorts identify patterns between phenotypes and patients (e.g., co-occurrence) and isolate special cases (e.g., potential outliers). Comparing the variation of phenotypes between two cohorts can help distinguish how different factors affect disease manifestation (e.g., causal genes, age of onset, etc.). PhenoStacks is a novel visual analytics tool that supports the exploration of phenotype variation within and between cross-sectional patient cohorts. By leveraging the semantic hierarchy of the Human Phenotype Ontology, phenotypes are presented in context, can be grouped and clustered, and are summarized via overviews to support the exploration of phenotype distributions. The design of PhenoStacks was motivated by formative interviews with genetics researchers: we distil high-level tasks, present an algorithm for simplifying ontology topologies for visualization, and report the results of a deployment evaluation with four expert genetics researchers. The results suggest that PhenoStacks can help identify phenotype patterns, investigate data quality issues, and inform data collection design.
9
Cheng KC, Katz SR, Lin AY, Xin X, Ding Y. Whole-Organism Cellular Pathology: A Systems Approach to Phenomics. Advances in Genetics 2016; 95:89-115. [PMID: 27503355 DOI: 10.1016/bs.adgen.2016.05.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Abstract
Phenotype is defined as the state of an organism resulting from interactions between genes, environment, disease, molecular mechanisms, and chance. The purpose of the emerging field of phenomics is to systematically determine and measure phenotypes across biology for the sake of understanding. Phenotypes can affect more than one cell type and life stage, so ideal phenotyping would include the state of every cell type within the context of both tissue architecture and the whole organism at each life stage. In medicine, high-resolution anatomic assessment of phenotype is obtained from histology. Histology's interpretative power, codified by Virchow as cellular pathology, is derived from its ability to discern diagnostic and characteristic cellular changes in diseased tissues. Cellular pathology is observed in every major human disease and relies on the ability of histology to detect cellular change in any cell type due to unbiased pan-cellular staining, even in optically opaque tissues. Our laboratory has shown that histology is far more sensitive than stereomicroscopy for detecting phenotypes in zebrafish mutants. Those studies have also shown that more complete sampling, greater consistency in sample orientation, and the inclusion of phenotypes extending over longer length scales would provide greater coverage of common phenotypes. We are developing technical approaches to achieve an ideal detection of cellular pathology using an improved form of X-ray microtomography that retains the strengths and addresses the weaknesses of histology as a screening tool. We are using zebrafish as a vertebrate model based on the overlaps between zebrafish and mammalian tissue architecture, and a body size small enough to allow whole-organism, volumetric imaging at cellular resolution. Automation of whole-organism phenotyping would greatly increase the value of phenomics. 
Potential societal benefits would include reduction in the cost of drug development, a reduction in the incidence of unexpected severe drug and environmental toxicity, and more rapid elucidation of the contributions of genes and the environment to phenotypes, including the validation of candidate disease alleles identified in population and personal genetics.
Affiliation(s)
- K C Cheng
- The Pennsylvania State University College of Medicine, Hershey, PA, United States
- S R Katz
- The Pennsylvania State University College of Medicine, Hershey, PA, United States
- A Y Lin
- The Pennsylvania State University College of Medicine, Hershey, PA, United States
- X Xin
- The Pennsylvania State University College of Medicine, Hershey, PA, United States
- Y Ding
- The Pennsylvania State University College of Medicine, Hershey, PA, United States
10
Glueck M, Hamilton P, Chevalier F, Breslav S, Khan A, Wigdor D, Brudno M. PhenoBlocks: Phenotype Comparison Visualizations. IEEE Transactions on Visualization and Computer Graphics 2016; 22:101-110. [PMID: 26529691 DOI: 10.1109/tvcg.2015.2467733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
The differential diagnosis of hereditary disorders is a challenging task for clinicians due to the heterogeneity of phenotypes that can be observed in patients. Existing clinical tools are often text-based and do not emphasize consistency, completeness, or granularity of phenotype reporting. This can impede clinical diagnosis and limit their utility to genetics researchers. Herein, we present PhenoBlocks, a novel visual analytics tool that supports the comparison of phenotypes between patients, or between a patient and the hallmark features of a disorder. An informal evaluation of PhenoBlocks with expert clinicians suggested that the visualization effectively guides the process of differential diagnosis and could reinforce the importance of complete, granular phenotypic reporting.
11
Abstract
This paper provides an overview of recent developments in big data in the context of biomedical and health informatics. It outlines the key characteristics of big data and how medical and health informatics, translational bioinformatics, sensor informatics, and imaging informatics will benefit from an integrated approach of piecing together different aspects of personalized information from a diverse range of data sources, both structured and unstructured, covering genomics, proteomics, metabolomics, as well as imaging, clinical diagnosis, and long-term continuous physiological sensing of an individual. It is expected that recent advances in big data will expand our knowledge for testing new hypotheses about disease management from diagnosis to prevention to personalized treatment. The rise of big data, however, also raises challenges in terms of privacy, security, data ownership, data stewardship, and governance. This paper discusses some of the existing activities and future opportunities related to big data for health, outlining some of the key underlying issues that need to be tackled.
12
Dammann O, Gray P, Gressens P, Wolkenhauer O, Leviton A. Systems Epidemiology: What's in a Name? Online J Public Health Inform 2014; 6:e198. [PMID: 25598870 PMCID: PMC4292535 DOI: 10.5210/ojphi.v6i3.5571] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] [Open Access]
Abstract
Systems biology is an interdisciplinary effort to integrate molecular, cellular, tissue, organ, and organism levels of function into computational models that facilitate the identification of general principles. Systems medicine adds a disease focus. Systems epidemiology adds yet another level consisting of antecedents that might contribute to the disease process in populations. In etiologic and prevention research, systems-type thinking about multiple levels of causation will allow epidemiologists to identify contributors to disease at multiple levels as well as their interactions. In public health, systems epidemiology will contribute to the improvement of syndromic surveillance methods. We encourage the creation of computational simulation models that integrate information about disease etiology, pathogenetic data, and the expertise of investigators from different disciplines.
Affiliation(s)
- O. Dammann
- Dept of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
- Perinatal Epidemiology Unit, Dept. of Gynecology and Obstetrics, Hannover Medical School, Hannover, Germany
- P. Gray
- Dept of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
- P. Gressens
- Inserm, U676, Paris, France
- Department of Perinatal Imaging and Health, Department of Division of Imaging Sciences and Biomedical Engineering, King’s College London, King’s Health Partners, St. Thomas’ Hospital, London, United Kingdom
- O. Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
- A. Leviton
- Neuroepidemiology Unit, Children’s Hospital, Boston, MA
13
Pivovarov R, Albers DJ, Sepulveda JL, Elhadad N. Identifying and mitigating biases in EHR laboratory tests. J Biomed Inform 2014; 51:24-34. [PMID: 24727481 DOI: 10.1016/j.jbi.2014.03.016] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 11/22/2013] [Revised: 03/27/2014] [Accepted: 03/30/2014] [Indexed: 02/08/2023]
Abstract
Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has different interpretation for a chronic kidney disease patient and a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide additional information to the stand-alone numerical value. The second task identifies three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example. We conduct an association study of lipase test values to acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when properly accounting for lipase measurements in different contexts (leveraging the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.
Affiliation(s)
- Rimma Pivovarov
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
- David J Albers
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
- Jorge L Sepulveda
- Department of Pathology and Cell Biology, Columbia University, 630 W. 168th Street, New York, NY, USA.
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
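The abstract of this entry argues that the spacing of a laboratory test through time carries information about the clinical context in which it was ordered. The Python sketch below illustrates the general idea only: each measurement is labelled "dense" when a neighbouring measurement of the same test falls within a chosen gap, and "sparse" otherwise, so that the two groups can be analysed separately. The two-day threshold, the label names and the timestamps are hypothetical and are not the motifs or method reported in the paper.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def label_by_gap(times: Sequence[datetime],
                 dense_gap: timedelta = timedelta(days=2)) -> List[str]:
    """Label each measurement by how close its nearest neighbour in time is.

    Labels are returned in chronological order.
    """
    ordered = sorted(times)
    labels = []
    for i, t in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(t - ordered[i - 1])   # gap to the previous measurement
        if i + 1 < len(ordered):
            gaps.append(ordered[i + 1] - t)   # gap to the next measurement
        dense = any(g <= dense_gap for g in gaps)
        labels.append("dense" if dense else "sparse")
    return labels


# Hypothetical lipase draws for one patient: three on consecutive days,
# then an isolated draw five months later.
draws = [
    datetime(2014, 1, 1),
    datetime(2014, 1, 2),
    datetime(2014, 1, 3),
    datetime(2014, 6, 1),
]
labels = label_by_gap(draws)
```

An association study could then be run within each label rather than on the pooled values, which is the kind of context separation the abstract describes.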
14
Smith HC, Niewohner DJ, Dewey GD, Longo AM, Guy TL, Higgins BR, Daehling SB, Genrich SC, Wentworth CD, Durham Brooks TL. Using flatbed scanners to collect high-resolution time-lapsed images of the Arabidopsis root gravitropic response. J Vis Exp 2014:e50878. [PMID: 24513680 PMCID: PMC4091038 DOI: 10.3791/50878] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] [Open Access]
Abstract
Research efforts in biology increasingly require use of methodologies that enable high-volume collection of high-resolution data. A challenge laboratories can face is the development and attainment of these methods. Observation of phenotypes in a process of interest is a typical objective of research labs studying gene function and this is often achieved through image capture. A particular process that is amenable to observation using imaging approaches is the corrective growth of a seedling root that has been displaced from alignment with the gravity vector. Imaging platforms used to measure the root gravitropic response can be expensive, relatively low in throughput, and/or labor intensive. These issues have been addressed by developing a high-throughput image capture method using inexpensive, yet high-resolution, flatbed scanners. Using this method, images can be captured every few minutes at 4,800 dpi. The current setup enables collection of 216 individual responses per day. The image data collected is of ample quality for image analysis applications.
15
High-throughput phenotyping of plant populations using a personal digital assistant. Methods Mol Biol 2012; 918:97-116. [PMID: 22893288 DOI: 10.1007/978-1-61779-995-2_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
During many biological experiments, voluminous data are acquired, which are best collected with portable data acquisition devices and later analyzed with a personal computer (PC). Public domain software catering to data acquisition and analysis is currently limited. The necessity of phenotyping large plant populations led to the development of the application "PHENOME" to manage the data. PHENOME allows acquisition of phenotypic data using a personal digital assistant (PDA) with a built-in barcode scanner. The acquired data can be exported to a customized database on a PC for further analysis and cataloging. PHENOME can be used for a variety of applications, for example high-throughput phenotyping of a mutagenized or mapping population, or phenotyping of several individuals in one or more ecological niches.
|
16
|
van Triest HJW, Chen D, Ji X, Qi S, Li-Ling J. PhenOMIM: an OMIM-based secondary database purported for phenotypic comparison. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2011:3589-92. [PMID: 22255115 DOI: 10.1109/iembs.2011.6090600] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Phenotypic comparison may provide crucial information for obtaining insights into molecular interactions underlying various diseases. However, few attempts have been made to systematically analyze the phenotypes of hereditary disorders, mainly owing to the poor quality of text descriptions and the lack of a unified system of descriptors. Here we present a secondary database, PhenOMIM, for translating the phenotypic data obtained from the Online Mendelian Inheritance in Man (OMIM) database into a structured form. A web interface has also been developed for visualizing the data and related information from the OMIM and PhenOMIM databases. The data is freely available online for reviewing and commenting purposes and can be found at http://faculty.neu.edu.cn/bmie/han/PhenOMIM/.
Affiliation(s)
- Han J W van Triest
- Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110003, China.
|
17
|
Pérez-Pérez JM, Rubio-Díaz S, Dhondt S, Hernández-Romero D, Sánchez-Soriano J, Beemster GTS, Ponce MR, Micol JL. Whole organ, venation and epidermal cell morphological variations are correlated in the leaves of Arabidopsis mutants. PLANT, CELL & ENVIRONMENT 2011; 34:2200-11. [PMID: 21883289 DOI: 10.1111/j.1365-3040.2011.02415.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
Despite the large number of genes known to affect leaf shape or size, we still have a relatively poor understanding of how leaf morphology is established. For example, little is known about how cell division and cell expansion are controlled and coordinated within a growing leaf to eventually develop into a laminar organ of a definite size. To obtain a global perspective of the cellular basis of variations in leaf morphology at the organ, tissue and cell levels, we studied a collection of 111 non-allelic mutants with abnormally shaped and/or sized leaves, which broadly represent the mutational variations in Arabidopsis thaliana leaf morphology not associated with lethality. We used image-processing techniques on these mutants to quantify morphological parameters running the gamut from the palisade mesophyll and epidermal cells to the venation, whole leaf and rosette levels. We found positive correlations between epidermal cell size and leaf area, which is consistent with Avery's long-standing hypothesis that the epidermis drives leaf growth. In addition, venation parameters were positively correlated with leaf area, suggesting that leaf growth and vein patterning share some genetic controls. Positional cloning of the genes affected by the studied mutations will eventually establish functional links between genotypes, molecular functions, cellular parameters and leaf phenotypes.
Affiliation(s)
- José Manuel Pérez-Pérez
- Instituto de Bioingeniería, Centro de Investigación Operativa, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Alicante, Spain
|
18
|
Gupta M, Cheung CL, Hsu YH, Demissie S, Cupples LA, Kiel DP, Karasik D. Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations. J Bone Miner Res 2011; 26:1261-71. [PMID: 21611967 PMCID: PMC3312758 DOI: 10.1002/jbmr.333] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes: SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk.
Affiliation(s)
- Mayetri Gupta
- Department of Biostatistics, Boston University, Boston, MA, USA
|
19
|
Patel N, Lanktree MB, Hegele RA. Genetic risk factors for stroke in the genome-wide association era. EXPERT OPINION ON MEDICAL DIAGNOSTICS 2011; 5:75-84. [PMID: 23484478 DOI: 10.1517/17530059.2011.540567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
IMPORTANCE OF THE FIELD: Recent genome-wide association studies (GWASs) have renewed interest in genetic determinants of a wide range of complex traits and disorders, including stroke. AREAS COVERED IN THIS REVIEW: This paper reviews the current knowledge of genes that contribute to rare monogenic forms of stroke as well as more common 'garden variety' forms, focusing on the results of GWASs. Potential clinical pharmacogenetic and diagnostic applications of this information are considered. Publications from 1990 to September 2010 were identified through a Medline search using terms 'human stroke' and 'genetics', 'monogenic', 'familial', 'mutation', 'genome-wide association study', 'polymorphism', or 'genotype'. WHAT THE READER WILL GAIN: The review synthesizes and collates the current understanding of genes that are involved across a range of stroke subphenotypes. TAKE HOME MESSAGE: The complexity of stroke will make translation of genetic findings into new diagnostic or therapeutic tools relatively more challenging than for some other conditions and tempers the authors' enthusiasm for the eventual clinical utility of this information.
Affiliation(s)
- Neeraj Patel
- University of Western Ontario, Robarts Research Institute, Schulich School of Medicine and Dentistry, Blackburn Cardiovascular Genetics Laboratory, London, Ontario, Canada N6A 5K8
|
20
|
Canada BA, Thomas GK, Cheng KC, Wang JZ. SHIRAZ: an automated histology image annotation system for zebrafish phenomics. MULTIMEDIA TOOLS AND APPLICATIONS 2011; 51:401-440. [PMID: 21461317 PMCID: PMC3066164 DOI: 10.1007/s11042-010-0638-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Histological characterization is used in clinical and research contexts as a highly sensitive method for detecting the morphological features of disease and abnormal gene function. Histology has recently been accepted as a phenotyping method for the forthcoming Zebrafish Phenome Project, a large-scale community effort to characterize the morphological, physiological, and behavioral phenotypes resulting from the mutations in all known genes in the zebrafish genome. In support of this project, we present a novel content-based image retrieval system for the automated annotation of images containing histological abnormalities in the developing eye of the larval zebrafish.
Affiliation(s)
- Brian A. Canada
- Department of Science and Mathematics, University of South Carolina, Beaufort, SC USA
- James Z. Wang
- College of Information Sciences & Technology, The Pennsylvania State University, University Park, PA USA
|
21
|
Wei WQ, Tao C, Jiang G, Chute CG. A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:857-861. [PMID: 21347100 PMCID: PMC3041302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Current research on high-throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general automatic approach for patient identification. OBJECTIVE: We took advantage of Mayo Clinic electronic clinical notes and proposed a novel method combining NLP, machine learning, and ontology for automatic patient identification. We also investigated the benefits of involving existing SNOMED semantic knowledge in a patient identification task. METHODS: The SVM algorithm was applied to SNOMED concept units extracted from T2DM case/control clinical notes. Precision, recall, and F-score were calculated to evaluate the performance. RESULTS: This approach achieved an F-score above 0.950 for both groups when using all identified concept units as features. Concept units of the semantic type "Disease or Syndrome" contain the most important information for patient identification. Our results also implied that coarse-level concepts contain enough information to classify T2DM cases/controls.
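The evaluation metrics named in this abstract (precision, recall, and F-score) can be illustrated with a minimal sketch. The helper names `confusion_counts` and `precision_recall_f1`, the "case"/"control" labels, and the toy predictions are assumptions made for illustration only; they are not taken from the study.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for five clinical notes (not data from the study).
y_true = ["case", "case", "control", "control", "case"]
y_pred = ["case", "control", "control", "case", "case"]

tp, fp, fn = confusion_counts(y_true, y_pred, positive="case")
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Reporting the metrics per group, as the abstract does for cases and controls, amounts to calling `confusion_counts` once with each label as the positive class.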
Affiliation(s)
- Wei-Qi Wei
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN
|
22
|
Karasik D, Hsu YH, Zhou Y, Cupples LA, Kiel DP, Demissie S. Genome-wide pleiotropy of osteoporosis-related phenotypes: the Framingham Study. J Bone Miner Res 2010; 25:1555-63. [PMID: 20200953 PMCID: PMC3153998 DOI: 10.1002/jbmr.38] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Genome-wide association studies offer an unbiased approach to identify new candidate genes for osteoporosis. We examined the Affymetrix 500K + 50K SNP GeneChip marker sets for associations with multiple osteoporosis-related traits at various skeletal sites, including bone mineral density (BMD, hip and spine), heel ultrasound, and hip geometric indices in the Framingham Osteoporosis Study. We evaluated 433,510 single-nucleotide polymorphisms (SNPs) in 2073 women (mean age 65 years), members of two-generational families. Variance components analysis was performed to estimate phenotypic, genetic, and environmental correlations (rho(P), rho(G), and rho(E)) among bone traits. Linear mixed-effects models were used to test associations between SNPs and multivariable-adjusted trait values. We evaluated the proportion of SNPs associated with pairs of the traits at a nominal significance threshold alpha = 0.01. We found substantial correlation between the proportion of associated SNPs and the rho(P) and rho(G) (r = 0.91 and 0.84, respectively) but much lower with rho(E) (r = 0.38). Thus, for example, hip and spine BMD had 6.8% associated SNPs in common, corresponding to rho(P) = 0.55 and rho(G) = 0.66 between them. Fewer SNPs were associated with both BMD and any of the hip geometric traits (eg, femoral neck and shaft width, section moduli, neck shaft angle, and neck length); rho(G) between BMD and geometric traits ranged from -0.24 to +0.40. In conclusion, we examined relationships between osteoporosis-related traits based on genome-wide associations. Most of the similarity between the quantitative bone phenotypes may be attributed to pleiotropic effects of genes. This knowledge may prove helpful in defining the best phenotypes to be used in genetic studies of osteoporosis.
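How a shared-association proportion of this kind might be computed can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say which denominator was used, so the sketch assumes the proportion is taken over SNPs associated with at least one of the two traits. The function name `shared_snp_proportion` and the toy p-values are hypothetical.

```python
def shared_snp_proportion(pvals_a, pvals_b, alpha=0.01):
    """Fraction of SNPs associated with both traits at nominal threshold alpha.

    Assumption: the denominator is the number of SNPs associated with at
    least one of the two traits; the source does not specify this choice.
    """
    if len(pvals_a) != len(pvals_b):
        raise ValueError("p-value lists must cover the same SNPs in the same order")
    hits_a = [p < alpha for p in pvals_a]
    hits_b = [p < alpha for p in pvals_b]
    both = sum(1 for a, b in zip(hits_a, hits_b) if a and b)
    either = sum(1 for a, b in zip(hits_a, hits_b) if a or b)
    return both / either if either else 0.0


# Hypothetical per-SNP p-values for two traits (not data from the study).
hip_bmd = [0.001, 0.200, 0.005, 0.500]
spine_bmd = [0.002, 0.003, 0.900, 0.600]

proportion = shared_snp_proportion(hip_bmd, spine_bmd)
print(f"shared proportion: {proportion:.3f}")
```

Computing this value for every pair of traits gives the quantity that the abstract then correlates with the phenotypic, genetic, and environmental correlations.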
Affiliation(s)
- David Karasik
- Hebrew SeniorLife Institute for Aging Research and Harvard Medical School, Boston, MA 02131, USA.
|
23
|
Lanktree MB, Dichgans M, Hegele RA. Advances in genomic analysis of stroke: what have we learned and where are we headed? Stroke 2010; 41:825-32. [PMID: 20167918 DOI: 10.1161/strokeaha.109.570523] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
As a result of technological advances, the genomic analysis of stroke has shifted from candidate gene association studies to genome-wide association studies (GWAS). Agnostic GWAS evaluate up to 90% of common genetic variation in a single experiment, creating an improved framework for identifying novel genetic leads for biochemical and cellular mechanisms underlying stroke. Given the ubiquity of the GWAS approach, it has become essential for stroke researchers and clinicians to be able to interpret GWAS results. Thus, we review the basic elements of design, methods, presentation, and interpretation of GWAS in the context of stroke research. In 8 recent stroke GWAS reports, no single locus has been identified in 2 GWAS at a genome-wide level of significance. Additionally, no significant association signal between stroke and a locus with previous evidence from candidate gene studies of stroke has been identified yet. Some caveats of the approach and future directions for stroke genomics are discussed, including the use of intermediate phenotypes, Mendelian randomization, phenomics, and deep resequencing. Intelligent, appropriately powered, multidisciplinary studies incorporating knowledge from clinical medicine, epidemiology, genetics, and molecular biology will be required to fully characterize the genomic contributors to stroke.
Affiliation(s)
- Matthew B Lanktree
- Robarts Research Institute and Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada
|
24
|
Sintchenko V. Informatics for Infectious Disease Research and Control. INFECTIOUS DISEASE INFORMATICS 2010. [PMCID: PMC7120928 DOI: 10.1007/978-1-4419-1327-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. Infectious disease informatics can lead to more targeted and effective approaches for the prevention, diagnosis and treatment of infections through a comprehensive review of the genetic repertoire and metabolic profiles of a pathogen. The developments in informatics have been critical in boosting the translational science and in supporting both reductionist and integrative research paradigms.
|
25
|
Vankadavath RN, Hussain AJ, Bodanapu R, Kharshiing E, Basha PO, Gupta S, Sreelakshmi Y, Sharma R. Computer aided data acquisition tool for high-throughput phenotyping of plant populations. PLANT METHODS 2009; 5:18. [PMID: 20003250 PMCID: PMC2796657 DOI: 10.1186/1746-4811-5-18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2009] [Accepted: 12/10/2009] [Indexed: 05/28/2023]
Abstract
BACKGROUND: The data generated during the course of a biological experiment or study can sometimes be massive, and its management becomes critical for the success of the investigation undertaken. The accumulation and analysis of such large datasets often become tedious for biologists and lab technicians. Most of the current phenotype data acquisition management systems do not cater to the specialized needs of large-scale data analysis. The successful application of genomic tools/strategies to introduce desired traits in plants requires extensive and precise phenotyping of plant populations or gene bank material, thus necessitating an efficient data acquisition system. RESULTS: Here we describe the newly developed software "PHENOME" for high-throughput phenotyping, which allows researchers to accumulate, categorize, and manage large volumes of phenotypic data. In this study, a large number of individual tomato plants were phenotyped with the "PHENOME" application using a Personal Digital Assistant (PDA) with a built-in barcode scanner in concert with a customized database specific for handling large populations. CONCLUSION: The phenotyping of large populations of plants, both in the laboratory and in the field, is very efficiently managed using a PDA. The data is transferred to a specialized database(s) where it can be further analyzed and catalogued. "PHENOME" aids collection and analysis of data obtained in large-scale mutagenesis, assessing quantitative trait loci (QTLs), raising mapping populations, sampling of several individuals in one or more ecological niches, etc.
Affiliation(s)
- Appibhai Jakir Hussain
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
- JK AgriGenetics, Begumpet, Hyderabad 500016, India
- Reddaiah Bodanapu
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
- Eros Kharshiing
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
- Department of Botany, St Edmund's College, Meghalaya 793003, India
- Pinjari Osman Basha
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
- Department of Genetics and Genomics, Yogi Vemana University, Kadapa 516003, India
- Soni Gupta
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
- Rameshwar Sharma
- School of Life Sciences, University of Hyderabad, Hyderabad 500 046, India
|
26
|
Lee E, Jung H, Radivojac P, Kim JW, Lee D. Analysis of AML genes in dysregulated molecular networks. BMC Bioinformatics 2009; 10 Suppl 9:S2. [PMID: 19761572 PMCID: PMC2745689 DOI: 10.1186/1471-2105-10-s9-s2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identifying disease-causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Thus, several computational methods have been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not utilize invaluable disease-specific information available in mRNA expression profiles of patient samples. RESULTS: Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized known mutation genes causally implicated in AML embedded in the subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes reflecting key leukemogenic processes such as myeloid differentiation. CONCLUSION: We showed that the integrative approach, utilizing both gene expression profiles and molecular networks, could identify AML-causing genes, most of which were not detectable with gene expression analysis alone due to the minor changes in mRNA level.
Affiliation(s)
- Eunjung Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea.
|
27
|
Mechanism-anchored profiling derived from epigenetic networks predicts outcome in acute lymphoblastic leukemia. BMC Bioinformatics 2009; 10 Suppl 9:S6. [PMID: 19761576 PMCID: PMC2745693 DOI: 10.1186/1471-2105-10-s9-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Background: Current outcome predictors based on "molecular profiling" rely on gene lists selected without consideration for their molecular mechanisms. This study was designed to demonstrate that we could learn about genes related to a specific mechanism and further use this knowledge to predict outcome in patients, a paradigm shift towards accurate "mechanism-anchored profiling". We propose a novel algorithm, PGnet, which predicts a tripartite mechanism-anchored network associated with epigenetic regulation consisting of phenotypes, genes and mechanisms. Genes termed as GEMs in this network meet all of the following criteria: (i) they are co-expressed with genes known to be involved in the biological mechanism of interest, (ii) they are also differentially expressed between distinct phenotypes relevant to the study, and (iii) as a biomodule, genes correlate with both the mechanism and the phenotype. Results: This proof-of-concept study, which focuses on epigenetic mechanisms, was conducted in a well-studied set of 132 acute lymphoblastic leukemia (ALL) microarrays annotated with nine distinct phenotypes and three measures of response to therapy. We used established parametric and nonparametric statistics to derive the PGnet tripartite network that consisted of 10 phenotypes and 33 significant clusters of GEMs comprising 535 distinct genes. The significance of PGnet was estimated from empirical p-values, and a robust subnetwork derived from ALL outcome data was produced by repeated random sampling. The evaluation of the derived robust network to predict outcome (relapse of ALL) was significant (p = 3%), using one hundred three-fold cross-validations and the shrunken centroids classifier. Conclusion: To our knowledge, this is the first method to predict co-expression networks of genes associated with epigenetic mechanisms and to demonstrate its inherent capability to predict therapeutic outcome. This PGnet approach can be applied to any regulatory mechanisms including transcriptional or microRNA regulation in order to derive predictive molecular profiles that are mechanistically anchored. The implementation of PGnet in R is freely available at .
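The abstract reports that significance was estimated from empirical p-values but does not describe the procedure. As a generic illustration only, and not PGnet's actual implementation, an empirical p-value is commonly computed by comparing an observed statistic with statistics obtained from randomized data. The function name `empirical_p_value`, the add-one correction, and the toy numbers below are assumptions.

```python
def empirical_p_value(observed, null_stats):
    """One-sided empirical p-value with an add-one correction.

    Counts how many null statistics are at least as large as the observed
    one. The +1 in numerator and denominator is a common convention that
    keeps the estimate from being exactly zero; it is assumed here and is
    not taken from the source.
    """
    if not null_stats:
        raise ValueError("at least one null statistic is required")
    at_least_as_extreme = sum(1 for s in null_stats if s >= observed)
    return (at_least_as_extreme + 1) / (len(null_stats) + 1)


# Hypothetical statistics from nine randomized datasets (not from the study).
null_stats = [1, 2, 6, 3, 5, 0, 4, 2, 1]
p = empirical_p_value(5, null_stats)
print(f"empirical p = {p:.2f}")
```

With more randomizations the smallest attainable p-value shrinks, which is why such estimates are usually based on many resampled datasets.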
|
28
|
Automated multidimensional phenotypic profiling using large public microarray repositories. Proc Natl Acad Sci U S A 2009; 106:12323-8. [PMID: 19590007 DOI: 10.1073/pnas.0900883106] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Phenotypes are complex, and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to be associated with similar phenotype patterns, PhenoProfiler supplements the missing quantitative phenotype information for a given microarray dataset based on other well-characterized microarray datasets. We applied our method to 587 human microarray datasets covering >14,000 samples, and confirmed that the predicted phenotype profiles are highly consistent with true phenotype descriptions. PhenoProfiler offers several unique capabilities: (i) automated, multidimensional phenotype profiling, facilitating the analysis and treatment design of complex diseases; (ii) the extrapolation of phenotype profiles beyond provided classes; and (iii) the detection of confounding phenotype factors that could otherwise bias biological inferences. Finally, because no direct comparisons are made between gene expression values from different datasets, the method can use the entire body of cross-platform microarray data. This work has produced a compendium of phenotype profiles for the National Center for Biotechnology Information GEO datasets, which can facilitate an unbiased understanding of the transcriptome-phenome mapping. The continued accumulation of microarray data will further increase the power of PhenoProfiler, by increasing the variety and the quality of phenotypes to be profiled.
|
29
|
Lee E, Jung H, Radivojac P, Kim JW, Lee D. Analysis of AML Genes in Dysregulated Molecular Networks. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2009; 2009:1-18. [PMID: 21347161 PMCID: PMC3041561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
BACKGROUND: Identifying disease-causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Thus, several computational methods have been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not utilize invaluable disease-specific information available in mRNA expression profiles of patient samples. RESULTS: Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized known mutation genes causally implicated in AML embedded in the subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes reflecting key leukemogenic processes such as myeloid differentiation. CONCLUSION: We showed that the integrative approach, utilizing both gene expression profiles and molecular networks, could identify AML-causing genes, most of which were not detectable with gene expression analysis alone due to their minor changes in mRNA.
Affiliation(s)
- Eunjung Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea, Biomedical Research Center, KAIST, Daejeon 305-701, South Korea
- Hyunchul Jung
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Predrag Radivojac
- School of Informatics, Indiana University, Bloomington, IN 47408, USA
- Jong-Won Kim
- Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University, School of Medicine, Seoul 135-710, South Korea
- Doheon Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea (corresponding author)
|
30
|
Plant Phenotyping with Low Cost Digital Cameras and Image Analytics. INFORMATION TECHNOLOGIES IN ENVIRONMENTAL ENGINEERING 2009. [DOI: 10.1007/978-3-540-88351-7_18] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
31
|
Robinson PN, Köhler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet 2008; 83:610-5. [PMID: 18950739 DOI: 10.1016/j.ajhg.2008.09.017] [Citation(s) in RCA: 620] [Impact Index Per Article: 38.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2008] [Revised: 09/24/2008] [Accepted: 09/30/2008] [Indexed: 10/21/2022] Open
Abstract
There are many thousands of hereditary diseases in humans, each of which has a specific combination of phenotypic features, but computational analysis of phenotypic data has been hampered by lack of adequate computational data structures. Therefore, we have developed a Human Phenotype Ontology (HPO) with over 8000 terms representing individual phenotypic anomalies and have annotated all clinical entries in Online Mendelian Inheritance in Man with the terms of the HPO. We show that the HPO is able to capture phenotypic similarities between diseases in a useful and highly significant fashion.
|
32
|
Kokel D, Peterson RT. Chemobehavioural phenomics and behaviour-based psychiatric drug discovery in the zebrafish. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2008; 7:483-90. [PMID: 18784194 DOI: 10.1093/bfgp/eln040] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Despite their ubiquity and impact, psychiatric illnesses and other disorders of the central nervous system remain among the most poorly treated diseases. Most psychiatric medicines were discovered due to serendipitous observations of behavioural phenotypes in humans, rodents and other mammals. Extensive behaviour-based chemical screens would likely identify novel psychiatric drugs. However, large-scale chemical screens in mammals are inefficient and impractical. In contrast, zebrafish are very well suited for high-throughput behaviour-based drug discovery. Furthermore, the vast amounts of data generated from large-scale behavioural screens in zebrafish will facilitate a systems-level analysis of how chemicals affect behaviour. Unlike serendipitous discoveries in mammals, a comprehensive and integrative analysis of zebrafish chemobehavioural phenomics may identify functional relationships that would be missed by more reductionist approaches. Thus, behaviour-based chemical screens in the zebrafish may improve our understanding of neurobiology and accelerate the pace of psychiatric drug discovery.
Affiliation(s)
- David Kokel
- Cardiovascular Research Center and Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 149 13th Street, Charlestown, MA 02129, USA.
|
33
|
Abstract
PURPOSE OF REVIEW: Large-scale genomic studies establish genotype-phenotype associations, but they use phenotypes that represent current views of disease. There is an opportunity to enhance our understanding of genotype-phenotype associations by extending phenotypes into much greater detail ('deep phenotyping'). RECENT FINDINGS: We should engage in deep phenotyping for the following reasons. First, the current emphasis on clinical outcomes, although necessary for the advancement of clinical medicine, is not sufficient. Second, analytical and biological variance embedded in traditional phenotypes dilutes statistical power and strength of association. Finally, even relatively precise phenotypes may vary in terms of underlying pathophysiology across an individual's life history. Deep phenotyping focuses on the biological relevance of pathways and metabolic flux, increasing the 'granularity' of phenotypes. SUMMARY: Focus on medical phenotypes is critical, but long-term interests require additional studies that illuminate underlying biology. Deep phenotyping is less likely to yield dramatic changes in current medical practice but it offers an opportunity to gain scientific insight in an incremental manner and to make progress in redefining clinical outcomes with greater precision. It is expensive, and debate is needed to determine when and how it should be applied.
Affiliation(s)
- Russell P Tracy
- Pathology and Biochemistry, University of Vermont College of Medicine, Burlington, Vermont 05446, USA.
|
34
|
Abstract
The human genome project has stimulated development of impressive repositories of biological knowledge at the genomic level and new knowledge bases are rapidly being developed in a 'bottom-up' fashion. In contrast, higher-level phenomics knowledge bases are underdeveloped, particularly with respect to the complex neuropsychiatric syndrome, symptom, cognitive, and neural systems phenotypes widely acknowledged as critical to advance molecular psychiatry research. This gap limits informatics strategies that could improve both the mining and representation of relevant knowledge, and help prioritize phenotypes for new research. Most existing structured knowledge bases also engage a limited set of contributors, and thus fail to leverage recent developments in social collaborative knowledge-building. We developed a collaborative annotation database to enable representation and sharing of empirical information about phenotypes important to neuropsychiatric research (www.Phenowiki.org). As a proof of concept, we focused on findings relevant to 'cognitive control', a neurocognitive construct considered important to multiple neuropsychiatric syndromes. Currently this knowledge base tabulates empirical findings about heritabilities and measurement properties of specific cognitive task and rating scale indicators (n=449 observations). It is hoped that this new open resource can serve as a starting point that enables broadly collaborative knowledge-building, and help investigators select and prioritize endophenotypes for translational research.
|
35
|
Abstract
Biomedical data useful for data mining are often distributed across multiple databases. These databases may be aggregated using several techniques to create single data sets that may be mined using standard approaches; however, separate databases may, in their design or data representation, capture information that is analytically useful and that is lost on integration. Recent techniques for mining multiple databases simultaneously but separately may preserve and leverage the unique perspectives within each database. This article presents an example, "dual mining," in which concurrent analysis of a target database with a related knowledge base can improve the identification of association patterns in the target most likely to be of interest for further analysis.
Affiliation(s)
- Mir S Siadaty
- Division of Clinical Informatics, Department of Public Health Sciences, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive Charlottesville, VA 22908, USA.
|
36
|
Gundlapalli AV, South BR, Phansalkar S, Kinney AY, Shen S, Delisle S, Perl T, Samore MH. Application of Natural Language Processing to VA Electronic Health Records to Identify Phenotypic Characteristics for Clinical and Research Purposes. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2008; 2008:36-40. [PMID: 21347124 PMCID: PMC3041527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Informatics tools to extract and analyze clinical information on patients have lagged behind data-mining developments in bioinformatics. While the analysis of an individual's partial or complete genotype is nearly a reality, the phenotypic characteristics that accompany the genotype are not well known and largely inaccessible in free-text patient health records. As the adoption of electronic medical records increases, there exists an urgent need to extract pertinent phenotypic information and make it available to clinicians and researchers. This usually requires the data to be in a structured format that is both searchable and amenable to computation. Using inflammatory bowel disease as an example, this study demonstrates the utility of a natural language processing system (MedLEE) in mining clinical notes in the paperless VA Health Care System. This adaptation of MedLEE is useful for identifying patients with specific clinical conditions, as well as those at risk for, or with symptoms suggestive of, those conditions.
Affiliation(s)
- Adi V. Gundlapalli
- Departments of Internal Medicine and Biomedical Informatics, University of Utah School of Medicine
- Brett R. South
- Department of Internal Medicine; Salt Lake VA Health Care System, Salt Lake City, Utah
- Anita Y. Kinney
- Department of Internal Medicine; Salt Lake VA Health Care System, Salt Lake City, Utah
- Shuying Shen
- Department of Internal Medicine; Salt Lake VA Health Care System, Salt Lake City, Utah
- Sylvain Delisle
- Maryland VA Health Care System and University of Maryland School of Medicine
- Trish Perl
- Johns Hopkins Medical Institutions and University, Baltimore, Maryland
- Matthew H. Samore
- Departments of Internal Medicine and Biomedical Informatics, University of Utah School of Medicine; Salt Lake VA Health Care System, Salt Lake City, Utah
|
37
|
Van Vooren S, Coessens B, De Moor B, Moreau Y, Vermeesch JR. Array comparative genomic hybridization and computational genome annotation in constitutional cytogenetics: suggesting candidate genes for novel submicroscopic chromosomal imbalance syndromes. Genet Med 2007; 9:642-9. [PMID: 17873653 DOI: 10.1097/gim.0b013e318145b27b] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Abstract
Genome-wide array comparative genomic hybridization screening is uncovering pathogenic submicroscopic chromosomal imbalances in patients with developmental disorders. In those patients, imbalances appear now to be scattered across the whole genome, and most patients carry different chromosomal anomalies. Screening patients with developmental disorders can be considered a forward functional genome screen. The imbalances pinpoint the location of genes that are involved in human development. Because most imbalances encompass regions harboring multiple genes, the challenge is to (1) identify those genes responsible for the specific phenotype and (2) disentangle the role of the different genes located in an imbalanced region. In this review, we discuss novel tools and relevant databases that have recently been developed to aid this gene discovery process. Identification of the functional relevance of genes will not only deepen our understanding of human development but will, in addition, aid in the data interpretation and improve genetic counseling.
Affiliation(s)
- Steven Van Vooren
- Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
|