1
Malec SA, Taneja SB, Albert SM, Shaaban CE, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023; 142:104368. [PMID: 37086959 PMCID: PMC10355339 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
BACKGROUND Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ subject-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables that combine roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge manually. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data. METHODS We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined-role variables in depth. RESULTS Our search identified 128 confounders, including 58 phenotypes, 47 drugs, and 35 genes, as well as 23 collider and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles. CONCLUSION Our findings suggest that combining machine reading and KGs could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression and AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
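The abstract describes translating epidemiological definitions of confounder, mediator and collider into graph queries. Below is a minimal Python sketch of that idea, assuming a toy directed graph and simplified reachability-based definitions: a confounder is an ancestor of the exposure that also reaches the outcome without passing through the exposure; a mediator is a descendant of the exposure and an ancestor of the outcome; a collider is a descendant of both. It is not the authors' query logic, and the variables `C`, `M`, `K`, `R` and every edge are hypothetical. The feedback loop on `R` shows how a variable in a literature-derived graph can end up with combined roles.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Return successor and predecessor adjacency maps for a directed graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    return succ, pred


def reachable(adj, start, blocked=frozenset()):
    """Breadth-first search: all nodes reachable from `start` along `adj`,
    never stepping onto a node in `blocked`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify_roles(edges, exposure, outcome):
    """Assign simplified causal roles to every variable other than the
    exposure and outcome. A variable may receive several roles."""
    succ, pred = build_graph(edges)
    nodes = (set(succ) | set(pred)) - {exposure, outcome}
    anc_x = reachable(pred, exposure)
    anc_y = reachable(pred, outcome)
    # Ancestors of the outcome through paths that avoid the exposure.
    anc_y_not_via_x = reachable(pred, outcome, blocked={exposure})
    desc_x = reachable(succ, exposure)
    desc_y = reachable(succ, outcome)
    roles = {}
    for v in sorted(nodes):
        found = set()
        if v in anc_x and v in anc_y_not_via_x:
            found.add("confounder")  # common cause of exposure and outcome
        if v in desc_x and v in anc_y:
            found.add("mediator")  # lies on a path exposure -> v -> outcome
        if v in desc_x and v in desc_y:
            found.add("collider")  # common effect of exposure and outcome
        roles[v] = found
    return roles


# Hypothetical toy graph; none of these edges are findings of the study.
edges = [
    ("C", "depression"), ("C", "AD"),          # C: common cause
    ("depression", "M"), ("M", "AD"),          # M: intermediate
    ("depression", "K"), ("AD", "K"),          # K: common effect
    ("R", "depression"), ("depression", "R"),  # R: feedback loop ...
    ("R", "AD"),                               # ... that also reaches AD
]

for var, found in classify_roles(edges, "depression", "AD").items():
    print(var, sorted(found))
```

Running it prints one line per candidate variable; `R` is flagged as both confounder and mediator, the kind of combined-role variable that needs special treatment before it is conditioned on.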
Affiliation(s)
- Scott A Malec
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Sanya B Taneja
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Steven M Albert
- Department of Behavioral and Community Health Sciences, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- C Elizabeth Shaaban
- Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Helmet T Karim
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Arthur S Levine
- Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; The Brain Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Paul Munro
- School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Richard D Boyce
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
2
Zheng J, Zhang Y, Rasheed H, Walker V, Sugawara Y, Li J, Leng Y, Elsworth B, Wootton RE, Fang S, Yang Q, Burgess S, Haycock PC, Borges MC, Cho Y, Carnegie R, Howell A, Robinson J, Thomas LF, Brumpton BM, Hveem K, Hallan S, Franceschini N, Morris AP, Köttgen A, Pattaro C, Wuttke M, Yamamoto M, Kashihara N, Akiyama M, Kanai M, Matsuda K, Kamatani Y, Okada Y, Walters R, Millwood IY, Chen Z, Davey Smith G, Barbour S, Yu C, Åsvold BO, Zhang H, Gaunt TR. Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease. Int J Epidemiol 2022; 50:1995-2010. [PMID: 34999880 PMCID: PMC8743120 DOI: 10.1093/ije/dyab203] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 07/03/2021] [Accepted: 09/01/2021] [Indexed: 01/25/2023]
Abstract
BACKGROUND This study aimed to systematically test whether previously reported risk factors for chronic kidney disease (CKD) are causally related to CKD in European and East Asian ancestries using Mendelian randomization. METHODS A total of 45 risk factors with genetic data in European ancestry and 17 risk factors in East Asian participants were identified as exposures from PubMed. We defined CKD by clinical diagnosis or by estimated glomerular filtration rate of <60 ml/min/1.73 m². Ultimately, 51 672 CKD cases and 958 102 controls of European ancestry from CKDGen, UK Biobank and HUNT, and 13 093 CKD cases and 238 118 controls of East Asian ancestry from Biobank Japan, China Kadoorie Biobank and Japan-Kidney-Biobank/ToMMo were included. RESULTS Eight risk factors showed reliable evidence of causal effects on CKD in Europeans, namely genetically predicted body mass index (BMI), hypertension, systolic blood pressure, high-density lipoprotein cholesterol, apolipoprotein A-I, lipoprotein(a), type 2 diabetes (T2D) and nephrolithiasis. In East Asians, BMI, T2D and nephrolithiasis showed evidence of causal effects on CKD. In two independent replication analyses, increased hypertension risk showed reliable evidence of a causal effect on increasing CKD risk in Europeans but, in contrast, showed a null effect in East Asians. Although liability to T2D showed consistent effects on CKD, the effects of glycaemic phenotypes on CKD were weak. Non-linear Mendelian randomization indicated a threshold relationship between genetically predicted BMI and CKD, with increased risk at BMI >25 kg/m². CONCLUSIONS Eight cardiometabolic risk factors showed causal effects on CKD in Europeans and three of them showed evidence of causality in East Asians, providing insights into the design of future interventions to reduce the burden of CKD.
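A common primary estimator in two-sample Mendelian randomization is the fixed-effect inverse-variance weighted (IVW) estimate, computed from per-variant summary statistics as estimate = Σ(bx·by/se²) / Σ(bx²/se²), with standard error 1/sqrt(Σ(bx²/se²)), where bx and by are the variant-exposure and variant-outcome associations and se is the standard error of by. The abstract does not say which estimators the study used, so this is a minimal sketch of the general technique, not the authors' pipeline, and the summary statistics are made up.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization
    estimate and its standard error from per-variant summary statistics.

    beta_exposure: variant-exposure associations (bx)
    beta_outcome:  variant-outcome associations (by)
    se_outcome:    standard errors of the variant-outcome associations
    """
    if not len(beta_exposure) == len(beta_outcome) == len(se_outcome):
        raise ValueError("all inputs must have the same length")
    # Weight of each variant: bx^2 / se^2 (first-order weights).
    denominator = sum(bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome))
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Made-up summary statistics for three variants; every per-variant ratio
# by / bx equals 0.5, so the IVW estimate should be 0.5.
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
se = [0.01, 0.02, 0.01]
est, se_est = ivw_estimate(bx, by, se)
print(f"IVW estimate {est:.3f} (SE {se_est:.4f})")
```

If `by` holds log odds ratios for a binary outcome such as CKD, the estimate is a log odds ratio per unit change in the exposure. The threshold relationship for BMI reported above comes from non-linear Mendelian randomization, which this linear sketch does not cover.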
Affiliation(s)
- Jie Zheng
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Yuemiao Zhang
- Renal Division, Peking University First Hospital, Peking University Institute of Nephrology, Key Laboratory of Renal Disease, Ministry of Health of China, Key Laboratory of Chronic Kidney Disease Prevention and Treatment (Peking University), Ministry of Education, Beijing, P. R. China
- Humaira Rasheed
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Venexia Walker
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yuka Sugawara
- Division of Nephrology and Endocrinology, The University of Tokyo Hospital, Tokyo, Japan
- Jiachen Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, P. R. China
- Yue Leng
- Department of Psychiatry, University of California, San Francisco, CA, USA
- Benjamin Elsworth
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Robyn E Wootton
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Si Fang
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Qian Yang
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Stephen Burgess
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Philip C Haycock
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Maria Carolina Borges
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Yoonsu Cho
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Rebecca Carnegie
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Amy Howell
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Jamie Robinson
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Laurent F Thomas
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Clinical and Molecular Medicine, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Ben Michael Brumpton
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Thoracic Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
- Kristian Hveem
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Stein Hallan
- Department of Clinical and Molecular Medicine, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Nephrology, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
- Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Andrew P Morris
- Division of Musculoskeletal and Dermatological Sciences, University of Manchester, Manchester, UK
- Anna Köttgen
- Institute of Genetic Epidemiology, Department of Biometry, Epidemiology and Medical Bioinformatics, Faculty of Medicine and Medical Center–University of Freiburg, Freiburg, Germany
- Cristian Pattaro
- Eurac Research, Institute for Biomedicine (affiliated with the University of Lübeck), Bolzano, Italy
- Matthias Wuttke
- Institute of Genetic Epidemiology, Department of Biometry, Epidemiology and Medical Bioinformatics, Faculty of Medicine and Medical Center–University of Freiburg, Freiburg, Germany
- Masayuki Yamamoto
- Tohoku Medical Megabank Organization and Tohoku University Graduate School of Medicine, Tohoku University, Sendai, Miyagi, Japan
- Naoki Kashihara
- Department of Nephrology and Hypertension, Kawasaki Medical School, Kurashiki, Okayama, Japan
- Masato Akiyama
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Masahiro Kanai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Koichi Matsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
- Robin Walters
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Iona Y Millwood
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Zhengming Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- George Davey Smith
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- NIHR Biomedical Research Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol, UK
- Sean Barbour
- Division of Nephrology, University of British Columbia, Vancouver, British Columbia, Canada
- British Columbia Provincial Renal Agency, Vancouver, British Columbia, Canada
- Canqing Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, P. R. China
- Bjørn Olav Åsvold
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Endocrinology, Clinic of Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
- Hong Zhang
- Renal Division, Peking University First Hospital, Peking University Institute of Nephrology, Key Laboratory of Renal Disease, Ministry of Health of China, Key Laboratory of Chronic Kidney Disease Prevention and Treatment (Peking University), Ministry of Education, Beijing, P. R. China
- Tom R Gaunt
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- NIHR Biomedical Research Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol, UK
3
Sprooten J, Vankerckhoven A, Vanmeerbeek I, Borras DM, Berckmans Y, Wouters R, Laureano RS, Baert T, Boon L, Landolfo C, Testa AC, Fischerova D, Van Holsbeke C, Bourne T, Chiappa V, Froyman W, Schols D, Agostinis P, Timmerman D, Tejpar S, Vergote I, Coosemans A, Garg AD. Peripherally-driven myeloid NFkB and IFN/ISG responses predict malignancy risk, survival, and immunotherapy regime in ovarian cancer. J Immunother Cancer 2021; 9:jitc-2021-003609. [PMID: 34795003 PMCID: PMC8603275 DOI: 10.1136/jitc-2021-003609] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Accepted: 10/17/2021] [Indexed: 12/21/2022]
Abstract
Background Tumors can influence the peripheral immune macroenvironment, thereby creating opportunities for non-invasive serum/plasma immunobiomarkers for immunostratification and immunotherapy design. However, current approaches for immunobiomarker detection are largely quantitative, which is unreliable for assessing the functional peripheral immunodynamics of patients with cancer. Hence, we aimed to design a functional biomarker modality for capturing peripheral immune signaling in patients with cancer for reliable immunostratification. Methods We used a data-driven in silico framework, integrating existing tumor/blood bulk-RNAseq or single-cell (sc)RNAseq datasets of patients with cancer, to inform the design of an innovative serum-screening modality, the serum-functional immunodynamic status (sFIS) assay. Next, we pursued proof-of-concept analyses via multiparametric serum profiling of patients with ovarian cancer (OV), using the sFIS assay combined with Luminex (cytokines/soluble immune checkpoints), CA125-antigen detection, and whole-blood immune cell counts. The sFIS assay's ability to determine survival benefit or malignancy risk was validated in discovery (n=32) and/or validation (n=699) patient cohorts. Lastly, we used an orthotopic murine metastatic OV model, with anti-OV therapy selection via in silico drug–target screening and murine serum screening via the sFIS assay, to assess suitable in vivo immunotherapy options. Results The in silico data-driven framework predicted that the peripheral immunodynamics of patients with cancer might be best captured by analyzing myeloid nuclear factor kappa-light-chain-enhancer of activated B cells (NFκB) signaling and interferon-stimulated gene (ISG) responses. This informed the conceptualization of an ‘in sitro’ (in vitro+in situ) sFIS assay, in which human myeloid cells were exposed to patients’ serum in vitro to assess serum-induced (si)-NFκB or interferon (IFN)/ISG responses (as active signaling reporter activity) within them, thereby ‘mimicking’ patients’ in situ immunodynamic status. Multiparametric serum profiling of patients with OV established that the sFIS assay can: decode peripheral immunology (by indicating higher enrichment of si-NFκB over si-IFN/ISG responses), estimate survival trends (si-NFκB or si-IFN/ISG responses associating with negative or positive prognosis, respectively), and coestimate malignancy risk (relative to benign/borderline ovarian lesions). Biologically, we documented dominance of pro-tumorigenic, myeloid si-NFκB response^HIGH si-IFN/ISG response^LOW inflammation in the periphery of patients with OV. Finally, in an orthotopic murine metastatic OV model, the sFIS assay predicted a higher capacity of chemo-immunotherapy (paclitaxel–carboplatin plus anti-TNF antibody combination) to achieve a pro-immunogenic peripheral milieu (si-IFN/ISG response^HIGH si-NFκB response^LOW), which aligned with high antitumor efficacy. Conclusions We established the sFIS assay as a novel biomarker resource for serum screening in patients with OV to evaluate peripheral immunodynamics, patient survival trends and malignancy risk, and to design preclinical chemo-immunotherapy strategies.
Affiliation(s)
- Jenny Sprooten
- Laboratory of Cell Stress & Immunity, Department of Cellular & Molecular Medicine, KU Leuven, Leuven, Belgium
- Ann Vankerckhoven
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium
- Isaure Vanmeerbeek
- Laboratory of Cell Stress & Immunity, Department of Cellular & Molecular Medicine, KU Leuven, Leuven, Belgium
- Daniel M Borras
- Laboratory of Cell Stress & Immunity, Department of Cellular & Molecular Medicine, KU Leuven, Leuven, Belgium
- Yani Berckmans
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium
- Roxanne Wouters
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium
- Raquel S Laureano
- Laboratory of Cell Stress & Immunity, Department of Cellular & Molecular Medicine, KU Leuven, Leuven, Belgium
- Thais Baert
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium; Department of Oncology, Leuven Cancer Institute, Laboratory of Gynaecologic Oncology, KU Leuven, Leuven, Belgium
- Chiara Landolfo
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium; Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Queen Charlotte's and Chelsea Hospital, Imperial College, London, UK; Dipartimento Scienze della Salute della Donna e del Bambino, Fondazione Policlinico Universitario A. Gemelli, Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy
- Antonia Carla Testa
- Dipartimento Scienze della Salute della Donna e del Bambino, Fondazione Policlinico Universitario A. Gemelli, Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy; Dipartimento Scienze della Vita e Sanità pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Tom Bourne
- Queen Charlotte's and Chelsea Hospital, Imperial College, London, UK
- Wouter Froyman
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Gynaecology and Obstetrics, UZ Leuven, Leuven, Belgium
- Dominique Schols
- Department of Microbiology, Immunology and Transplantation, Laboratory of Virology and Chemotherapy, Rega Institute, KU Leuven, Leuven, Belgium
- Patrizia Agostinis
- Department of Cellular and Molecular Medicine, Cell Death Research and Therapy Laboratory, KU Leuven, Belgium; VIB Center for Cancer Biology, VIB, Leuven, Belgium
- Dirk Timmerman
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Gynaecology and Obstetrics, UZ Leuven, Leuven, Belgium
- Sabine Tejpar
- Laboratory for Molecular Digestive Oncology, Department of Oncology, KU Leuven, Leuven, Belgium
- Ignace Vergote
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium; Department of Oncology, Leuven Cancer Institute, Laboratory of Gynaecologic Oncology, KU Leuven, Leuven, Belgium; Department of Gynaecology and Obstetrics, UZ Leuven, Leuven, Belgium
- An Coosemans
- Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium
- Abhishek D Garg
- Laboratory of Cell Stress & Immunity, Department of Cellular & Molecular Medicine, KU Leuven, Leuven, Belgium
4
Liu Y, Elsworth B, Erola P, Haberland V, Hemani G, Lyon M, Zheng J, Lloyd O, Vabistsevits M, Gaunt TR. EpiGraphDB: a database and data mining platform for health data science. Bioinformatics 2021; 37:1304-1311. [PMID: 33165574 PMCID: PMC8189674 DOI: 10.1093/bioinformatics/btaa961] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/22/2020] [Revised: 10/03/2020] [Accepted: 11/04/2020] [Indexed: 12/30/2022]
Abstract
MOTIVATION The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research. RESULTS We developed EpiGraphDB (https://epigraphdb.org/), a graph database containing an array of different biomedical and epidemiological relationships and an analytical platform to support their use in human population health data science. In addition, we present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. In the second case study, we illustrate how protein-protein interaction data offer opportunities to identify new drug targets. The final case study integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to 'triangulate' evidence from different sources. AVAILABILITY AND IMPLEMENTATION The EpiGraphDB platform is openly available at https://epigraphdb.org. Code for replicating case study results is available at https://github.com/MRCIEU/epigraphdb as Jupyter notebooks using the API, and https://mrcieu.github.io/epigraphdb-r using the R package. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi Liu
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Benjamin Elsworth
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Pau Erola
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Valeriia Haberland
- Cancer Genetics, Norwich Medical School, University of East Anglia, Norwich, UK
- Gibran Hemani
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Matt Lyon
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- NIHR Bristol Biomedical Research Centre, University of Bristol, Bristol, UK
- Jie Zheng
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Oliver Lloyd
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Marina Vabistsevits
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- Tom R Gaunt
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
- NIHR Bristol Biomedical Research Centre, University of Bristol, Bristol, UK
5
Richardson TG, Zheng J, Gaunt TR. Computational Tools for Causal Inference in Genetics. Cold Spring Harb Perspect Med 2021; 11:a039248. [PMID: 33288654 PMCID: PMC8168525 DOI: 10.1101/cshperspect.a039248] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
The advent of large-scale, phenotypically rich, and readily accessible data provides an unprecedented opportunity for epidemiologists, statistical geneticists, bioinformaticians, and also behavioral and social scientists to investigate the causes and consequences of disease. Computational tools and resources are an integral component of such endeavors and will become increasingly important as these data continue to grow exponentially. In this review, we provide an overview of computational software and databases that have been developed to assist with analyses in causal inference. This includes online tools that can be used to help generate hypotheses, publicly accessible resources that store summary-level information for millions of genetic markers, and computational approaches that can be used to leverage this wealth of data to study causal relationships.
Affiliation(s)
- Tom G Richardson
- MRC Integrative Epidemiology Unit (IEU), Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, United Kingdom
- Jie Zheng
- MRC Integrative Epidemiology Unit (IEU), Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, United Kingdom
- Tom R Gaunt
- MRC Integrative Epidemiology Unit (IEU), Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, United Kingdom
6
Elsworth B, Gaunt TR. MELODI Presto: a fast and agile tool to explore semantic triples derived from biomedical literature. Bioinformatics 2021; 37:583-585. [PMID: 32810207 PMCID: PMC8088324 DOI: 10.1093/bioinformatics/btaa726] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/29/2020] [Revised: 07/09/2020] [Accepted: 08/11/2020] [Indexed: 12/05/2022]
Abstract
SUMMARY The field of literature-based discovery is growing in step with the volume of literature being produced. From modern natural language processing algorithms to high-quality entity tagging, the methods and their impact are developing rapidly. One annotation object that arises from these approaches, the subject-predicate-object triple, is proving to be very useful in representing knowledge. We have implemented efficient search methods and an application programming interface to create fast and convenient functions for using triples extracted from the biomedical literature by SemMedDB. By refining these data, we have identified a set of triples that focus on the mechanistic aspects of the literature, and we provide simple methods to explore both enriched triples from single queries and overlapping triples across two query lists. AVAILABILITY AND IMPLEMENTATION https://melodi-presto.mrcieu.ac.uk/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
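The "overlapping triples across two query lists" idea can be illustrated with a simple A -> B -> C join, in the style of literature-based discovery. This is a minimal sketch under an assumption: an overlap is a pair where the object of a triple whose subject is in the first list equals the subject of a triple whose object is in the second. It is not MELODI Presto's API or its exact algorithm. All terms are placeholders, and the predicate names only follow SemMedDB's uppercase style.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def overlapping_triples(triples, query_a, query_c):
    """Pair triples A -> B with triples B -> C, where A is in query_a and
    C is in query_c; B is the shared intermediate term."""
    # Index the first-list triples by their object (the candidate intermediate).
    a_by_intermediate = defaultdict(list)
    for t in triples:
        if t.subject in query_a:
            a_by_intermediate[t.obj].append(t)
    pairs = []
    for t in triples:
        if t.obj in query_c:
            for left in a_by_intermediate.get(t.subject, []):
                pairs.append((left, t))
    return pairs


# Placeholder terms; predicate names follow SemMedDB's uppercase style.
triples = [
    Triple("exposure_term", "STIMULATES", "gene_1"),
    Triple("exposure_term", "INHIBITS", "gene_2"),
    Triple("gene_1", "CAUSES", "outcome_term"),
    Triple("gene_3", "CAUSES", "outcome_term"),
]
for left, right in overlapping_triples(triples, {"exposure_term"}, {"outcome_term"}):
    print(left, "|", right)
```

Only `gene_1` links the two lists here. `gene_2` has no onward triple to the outcome, and `gene_3` has no incoming triple from the exposure.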
Affiliation(s)
- Benjamin Elsworth
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
- Tom R Gaunt
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
7
Mei S, Huang X, Xie C, Mora A. GREG-studying transcriptional regulation using integrative graph databases. Database (Oxford) 2020; 2020:baz162. [PMID: 32055858 PMCID: PMC7018612 DOI: 10.1093/database/baz162] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/21/2019] [Revised: 12/08/2019] [Accepted: 12/31/2019] [Indexed: 01/08/2023]
Abstract
A gene regulatory process is the result of the concerted action of transcription factors, co-factors, regulatory non-coding RNAs (ncRNAs) and chromatin interactions. Therefore, the combination of protein-DNA, protein-protein, ncRNA-DNA, ncRNA-protein and DNA-DNA data in a single graph database offers new possibilities for generating biological hypotheses. GREG (The Gene Regulation Graph Database) is an integrative database and web resource that allows the user to visualize and explore the network of all the above-mentioned interactions for a query transcription factor, long non-coding RNA, genomic range or DNA annotation, as well as to extract node and interaction information, identify connected nodes and perform advanced graphical queries directly on the regulatory network, in a simple and efficient way. In this article, we introduce GREG together with some application examples (including exploratory research of Nanog's regulatory landscape and the etiology of chronic obstructive pulmonary disease), which we use to demonstrate the advantages of using graph databases in biomedical research. Database URL: https://mora-lab.github.io/projects/greg.html, www.moralab.science/GREG/.
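GREG combines several interaction types in one graph and lets users identify nodes connected to a query. The sketch below is a hypothetical, in-memory illustration of that kind of neighborhood query: a breadth-first search limited to a number of hops, with an optional filter on interaction type. GREG itself runs on a graph database. The class, its methods and the nodes `TF_X`, `gene_a`, `protein_c` and `lncRNA_b` are illustrative placeholders, not GREG's interface or data.

```python
from collections import defaultdict, deque


class RegulatoryGraph:
    """Hypothetical in-memory stand-in for an integrative regulatory graph.
    Nodes are entity names; each edge carries an interaction type."""

    def __init__(self):
        self.adj = defaultdict(list)  # node -> [(neighbor, interaction_type)]

    def add_interaction(self, a, b, kind):
        # Stored in both directions so neighborhood queries ignore direction.
        self.adj[a].append((b, kind))
        self.adj[b].append((a, kind))

    def neighborhood(self, query, max_hops=1, kinds=None):
        """Return {node: hop distance} for nodes within `max_hops` of
        `query`, following only edges whose type is in `kinds` (all
        types when `kinds` is None)."""
        dist = {query: 0}
        queue = deque([query])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor, kind in self.adj[node]:
                if kinds is not None and kind not in kinds:
                    continue
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        del dist[query]
        return dist


g = RegulatoryGraph()
g.add_interaction("TF_X", "gene_a", "protein-DNA")
g.add_interaction("TF_X", "protein_c", "protein-protein")
g.add_interaction("lncRNA_b", "protein_c", "ncRNA-protein")
print(g.neighborhood("TF_X", max_hops=2))
print(g.neighborhood("TF_X", max_hops=2, kinds={"protein-DNA"}))
```

Restricting `kinds` narrows the query to one data layer (for example, only protein-DNA binding). Leaving it unset lets the search cross layers, which is the point of integrating them in a single graph.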
Affiliation(s)
- Songqing Mei
- School of Basic Medical Sciences, Guangzhou Medical University, Panyu Campus of Guangzhou Medical University, Xinzao, 511436 Guangzhou, P.R. China
- Xiaowei Huang
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Panyu Campus of Guangzhou Medical University, Xinzao, 511436 Guangzhou, P.R. China
- Chengshu Xie
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Panyu Campus of Guangzhou Medical University, Xinzao, 511436 Guangzhou, P.R. China
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Panyu Campus of Guangzhou Medical University, Xinzao, 511436 Guangzhou, P.R. China
8
Caufield JH, Ping P. New advances in extracting and learning from protein-protein interactions within unstructured biomedical text data. Emerg Top Life Sci 2019; 3:357-369. [PMID: 33523203 DOI: 10.1042/etls20190003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/05/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 12/14/2022]
Abstract
Protein-protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintaining these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports the assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein-protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of these methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges of applying PPI mining to text concerning protein families, using the multifunctional 14-3-3 protein family as an example.
Affiliation(s)
- J Harry Caufield
- The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Peipei Ping
- The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Department of Medicine/Cardiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Department of Bioinformatics, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Scalable Analytics Institute (ScAi), University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
9
Robles LA, Dawe K, Martin RM, Higgins JPT, Lewis SJ. Does testosterone mediate the relationship between vitamin D and prostate cancer? A systematic review and meta-analysis protocol. Syst Rev 2019; 8:52. [PMID: 30755270 PMCID: PMC6371501 DOI: 10.1186/s13643-018-0908-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/14/2018] [Accepted: 12/06/2018] [Indexed: 12/31/2022]
Abstract
BACKGROUND Evidence from studies on prostate cancer progression has identified vitamin D as a potentially important nutrient. However, the World Cancer Research Fund and American Institute for Cancer Research have reported that the quality of this evidence is limited and warrants further investigation. We plan to use the recently developed WCRF International/University of Bristol mechanistic systematic review framework to determine whether the observed association between vitamin D and prostate cancer exists through a plausible biological pathway. METHODS This protocol sets out how we will perform a systematic review of the literature in human and animal studies. We will search the electronic databases MEDLINE, EMBASE, PubMed, and BIOSIS Citation Index without restrictions on year of publication or language. We will extract data from observational and experimental studies examining two inter-linked pathways in the relationship between vitamin D and prostate cancer progression: (1) vitamin D and testosterone, and (2) testosterone and prostate cancer progression. We focus on testosterone because its actions form a potentially novel intermediate mechanism that was identified via our online literature mining tools. The outcomes of interest include incidence or prevalence of prostate cancer, measures of prostate cancer progression (including biochemical recurrence, local, or distal metastases), and prostate cancer-specific mortality. We will assess study quality and the level of certainty of the evidence. We will analyse data where possible, using meta-analysis with forest plots or albatross plots; otherwise, a narrative synthesis will be performed. DISCUSSION To our knowledge, this will be the first systematic synthesis of the evidence underpinning the vitamin D-testosterone-prostate cancer mechanistic pathway. The results of the review may inform future research, intervention trials, and public health messages.
Affiliation(s)
- Luke A. Robles
- Bristol Medicine School, Population Health Sciences, University of Bristol, Bristol, England
- MRC Integrative Epidemiology Unit (IEU), University of Bristol, Office OG25, Oakfield House, Oakfield Grove, Bristol, BS8 2BN England
- Karen Dawe
- Bristol Medicine School, Population Health Sciences, University of Bristol, Bristol, England
- MRC Integrative Epidemiology Unit (IEU), University of Bristol, Office OG25, Oakfield House, Oakfield Grove, Bristol, BS8 2BN England
- Richard M. Martin
- Bristol Medicine School, Population Health Sciences, University of Bristol, Bristol, England
- MRC Integrative Epidemiology Unit (IEU), University of Bristol, Office OG25, Oakfield House, Oakfield Grove, Bristol, BS8 2BN England
- National Institute for Health Research (NIHR) Bristol Biomedical Research Centre, University of Bristol, Bristol, England
- Julian P. T. Higgins
- Bristol Medicine School, Population Health Sciences, University of Bristol, Bristol, England
- MRC Integrative Epidemiology Unit (IEU), University of Bristol, Office OG25, Oakfield House, Oakfield Grove, Bristol, BS8 2BN England
- Sarah J. Lewis
- Bristol Medicine School, Population Health Sciences, University of Bristol, Bristol, England
- MRC Integrative Epidemiology Unit (IEU), University of Bristol, Office OG25, Oakfield House, Oakfield Grove, Bristol, BS8 2BN England