1
Tewari S, Toledo Margalef P, Kareem A, Abdul-Hussein A, White M, Wazana A, Davidge ST, Delrieux C, Connor KL. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. J Pers Med 2021; 11:jpm11111064. [PMID: 34834416 PMCID: PMC8621659 DOI: 10.3390/jpm11111064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/20/2021] [Revised: 10/01/2021] [Accepted: 10/18/2021] [Indexed: 01/03/2023] [Open Access]
Abstract
The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, and these were used to develop a revised model that captured 2098 articles, most of which fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis on this labelling yielded mostly fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later life human health.
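The abstract does not name the agreement statistic behind "fair to substantial agreement". As an illustration only, the sketch below assumes Fleiss' kappa (a common choice when many raters label the same items) and the Landis and Koch descriptive bands, which use the same "fair" and "substantial" vocabulary. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was labelled by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:  # every rating used one category; kappa is undefined
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch descriptive band."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy data: 2 topics, 3 hypothetical raters each (not the study's 23 experts).
toy = [["nutrition", "nutrition", "stress"], ["nutrition", "stress", "stress"]]
print(fleiss_kappa(toy), landis_koch(fleiss_kappa(toy)))
```

With real data, each inner list would hold one expert's label per topic cluster, so a 23-rater study would pass lists of length 23.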
Affiliation(s)
- Shrankhala Tewari
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
- Pablo Toledo Margalef
  - CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina
- Ayesha Kareem
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
- Ayah Abdul-Hussein
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
- Marina White
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
- Ashley Wazana
  - Department of Psychiatry, McGill University, Montreal, QC H3A 0G4, Canada
- Sandra T. Davidge
  - Women and Children’s Health Research Institute, University of Alberta, Edmonton, AB T6G 1C9, Canada
- Claudio Delrieux
  - CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina
  - DIEC - Electric and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca B8000, Argentina
- Kristin L. Connor
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
  - Correspondence:
2
Strain J, Spaans F, Serhan M, Davidge ST, Connor KL. Programming of weight and obesity across the lifecourse by the maternal metabolic exposome: A systematic review. Mol Aspects Med 2021; 87:100986. [PMID: 34167845 DOI: 10.1016/j.mam.2021.100986] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/15/2021] [Revised: 05/14/2021] [Accepted: 06/07/2021] [Indexed: 12/11/2022]
Abstract
Exposome research aims to comprehensively understand the multiple environmental exposures that influence human health. To date, much of exposome science has focused on environmental chemical exposures and does not take a lifecourse approach. The rising prevalence of obesity and the limited success in its prevention point to the need for a better understanding of the diverse exposures that are associated with, or protect against, this condition, and of the mechanisms driving its pathogenesis. The objectives of this review were to (1) evaluate the evidence on the maternal metabolic exposome in the programming of offspring growth/obesity and (2) identify and discuss the mechanisms underlying the programming of obesity. A systematic review was conducted following PRISMA guidelines to capture articles that investigated early life metabolic exposures and offspring weight and/or obesity outcomes. Scientific databases were searched using pre-determined indexed search terms, and risk of bias assessments were conducted to determine study quality. A final total of 76 articles were obtained, and extracted data from human and animal studies were visualised using GOfER diagrams. Multiple early life exposures, including maternal obesity, diabetes and adverse nutrition, increase the risk of high weight at birth and postnatally, and excess adipose accumulation in human and animal offspring. The main mechanisms through which the metabolic exposome programmes offspring growth and obesity risk include epigenetic modifications, altered placental function, altered composition of the gut microbiome and breast milk, and metabolic inflammation, with downstream effects on development of the central appetite system, adipose tissues and liver. Understanding early life risks and protectors, and the mechanisms through which the exposome modifies health trajectories, is critical for developing and applying early interventions to prevent offspring obesity later in life.
Affiliation(s)
- Jamie Strain
  - Department of Health Sciences, Carleton University, Ottawa, ON, Canada
- Floor Spaans
  - Department of Obstetrics and Gynaecology, University of Alberta, Edmonton, AB, Canada; Women and Children's Health Research Institute, University of Alberta, Edmonton, AB, Canada
- Mohamed Serhan
  - Department of Health Sciences, Carleton University, Ottawa, ON, Canada
- Sandra T Davidge
  - Department of Obstetrics and Gynaecology, University of Alberta, Edmonton, AB, Canada; Department of Physiology, University of Alberta, Edmonton, AB, Canada; Women and Children's Health Research Institute, University of Alberta, Edmonton, AB, Canada
- Kristin L Connor
  - Department of Health Sciences, Carleton University, Ottawa, ON, Canada
3
Kashyap A, Burris H, Callison-Burch C, Boland MR. The CLASSE GATOR (CLinical Acronym SenSE disambiGuATOR): A Method for predicting acronym sense from neonatal clinical notes. Int J Med Inform 2020; 137:104101. [PMID: 32088556 DOI: 10.1016/j.ijmedinf.2020.104101] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/24/2019] [Revised: 02/12/2020] [Accepted: 02/13/2020] [Indexed: 11/18/2022]
Abstract
OBJECTIVE To develop an algorithm for identifying acronym 'sense' from clinical notes without requiring a clinically annotated training set. MATERIALS AND METHODS Our algorithm is called CLASSE GATOR: Clinical Acronym SenSE disambiGuATOR. CLASSE GATOR extracts acronyms and definitions from PubMed Central (PMC). A logistic regression model is trained using words associated with specific acronym-definition pairs from PMC. CLASSE GATOR uses this library of acronym-definitions and their corresponding word feature vectors to predict the acronym 'sense' from Beth Israel Deaconess (MIMIC-III) neonatal notes. RESULTS We identified 1,257 acronyms and 8,287 definitions including a random definition from 31,764 PMC articles on prenatal exposures and 2,227,674 PMC open access articles. The average number of senses (definitions) per acronym was 6.6 (min = 2, max = 50). The average internal 5-fold cross-validation was 87.9 % (on PMC). We found 727 unique acronyms (57.29 %) from PMC were present in 105,044 neonatal notes (MIMIC-III). We evaluated the performance of acronym prediction using 245 manually annotated clinical notes with 9 distinct acronyms. CLASSE GATOR achieved an overall accuracy of 63.04 % and outperformed random for 8/9 acronyms (88.89 %) when applied to clinical notes. We also compared our algorithm with UMN's acronym set, and found that CLASSE GATOR outperformed random for 63.46 % of 52 acronyms when using logistic regression, 75.00 % when using BERT and 76.92 % when using BioBERT as the prediction algorithm within CLASSE GATOR. CONCLUSIONS CLASSE GATOR is the first automated acronym sense disambiguation method for clinical notes. Importantly, CLASSE GATOR does not require an expensive manually annotated acronym-definition corpus for training.
Affiliation(s)
- Aditya Kashyap
  - Department of Computer Science, University of Pennsylvania, United States
- Heather Burris
  - Department of Pediatrics, Division of Neonatology, Children's Hospital of Philadelphia, United States; Perelman School of Medicine, University of Pennsylvania, United States
- Mary Regina Boland
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Biomedical Informatics, University of Pennsylvania, United States; Center for Excellence in Environmental Toxicology, University of Pennsylvania, United States; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, United States
4
Boland MR, Casal ML, Kraus MS, Gelzer AR. Applied Veterinary Informatics: Development of a Semantic and Domain-Specific Method to Construct a Canine Data Repository. Sci Rep 2019; 9:18641. [PMID: 31819105 PMCID: PMC6901510 DOI: 10.1038/s41598-019-55035-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/05/2019] [Accepted: 11/21/2019] [Indexed: 11/08/2022] [Open Access]
Abstract
Animals are used to study the pathogenesis of various human diseases, but typically as animal models with induced disease. However, companion animals develop disease spontaneously in a way that mirrors disease development in humans. The purpose of this study was to develop a semantic and domain-specific method to enable construction of a data repository from a veterinary hospital that would be useful for future studies. We developed a two-phase method that combines semantic and domain-specific approaches to construct a canine data repository of clinical data collected during routine care at the Matthew J Ryan Veterinary Hospital of the University of Pennsylvania (PennVet). Our framework consists of two phases: (1) a semantic data-cleaning phase and (2) a domain-specific data-cleaning phase. We validated our data repository using a gold standard of known breed predispositions for certain diseases (i.e., mitral valve disease, atrial fibrillation and osteosarcoma). Our two-phase method allowed us to maximize data retention (99.8% of data retained), while ensuring the quality of our result. Our final population contained 84,405 dogs treated between 2000 and 2017 from 194 distinct dog breeds. We observed the expected breed associations with mitral valve disease, atrial fibrillation, and osteosarcoma (P < 0.05) after adjusting for multiple comparisons. Precision ranged from 60.0% to 83.3% for the three diseases (avg. 74.2%) and recall ranged from 31.6% to 83.3% (avg. 53.3%). Our study describes a two-phase method to construct a clinical data repository using canine data obtained during routine clinical care at a veterinary hospital.
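To make the precision and recall figures above concrete, here is a minimal sketch of how such values could be computed when a set of breeds flagged by an analysis is compared against a gold-standard set of known predispositions. This is an assumption about the general shape of the validation, not the authors' code; the function name and the placeholder breed labels are hypothetical.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of flagged breeds that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard breeds that were flagged.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder labels only; these are not breeds or results from the study.
flagged = {"breed_a", "breed_b", "breed_c", "breed_d", "breed_e"}
known = {"breed_a", "breed_b", "breed_c", "breed_f", "breed_g", "breed_h"}

precision, recall = precision_recall(flagged, known)
print(f"precision={precision:.1%} recall={recall:.1%}")
```

In this toy case three of the five flagged labels are in the gold standard and three of the six gold-standard labels are recovered, which shows how precision can exceed recall when an analysis is conservative about what it flags.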
Affiliation(s)
- Mary Regina Boland
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Center for Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Margret L Casal
  - Department of Clinical Studies and Advanced Medicine, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Marc S Kraus
  - Department of Clinical Studies and Advanced Medicine, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Anna R Gelzer
  - Department of Clinical Studies and Advanced Medicine, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA