1
Konopka T, Vestito L, Smedley D. Dimensional reduction of phenotypes from 53 000 mouse models reveals a diverse landscape of gene function. Bioinformatics Advances 2021; 1:vbab026. [PMID: 34870209 PMCID: PMC8633315 DOI: 10.1093/bioadv/vbab026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 09/09/2021] [Accepted: 10/07/2021] [Indexed: 01/27/2023]
Abstract
Animal models have long been used to study gene function and the impact of genetic mutations on phenotype. Through the efforts of thousands of research groups, systematic curation of published literature and high-throughput phenotyping screens, the collective body of knowledge for the mouse now covers the majority of protein-coding genes. Here, we collected data for over 53 000 mouse models with mutations in over 15 000 genomic markers, characterized by more than 254 000 annotations using more than 9000 distinct ontology terms. We investigated dimensional reduction and embedding techniques as a means to facilitate access to this diverse and high-dimensional information. Our analyses provide the first visual maps of the landscape of mouse phenotypic diversity. We also summarize some of the difficulties in producing and interpreting embeddings of sparse phenotypic data. In particular, we show that data preprocessing, filtering and encoding have as much impact on the final embeddings as the process of dimensional reduction. Nonetheless, techniques developed in the context of dimensional reduction create opportunities for exploratory analysis of this large pool of public data, including searching for mouse models suited to studying human diseases. Availability and implementation: Source code for analysis scripts is available on GitHub at https://github.com/tkonopka/mouse-embeddings. The data underlying this article are available in Zenodo at https://doi.org/10.5281/zenodo.4916171. Contact: t.konopka@qmul.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Tomasz Konopka
  - William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK. To whom correspondence should be addressed.
- Letizia Vestito
  - William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK
  - Ear Institute, University College London, WC1X 8EE London, UK
  - Great Ormond Street Institute of Child Health, University College London, WC1N 1EH London, UK
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK
2
Powell DR, Revelli JP, Doree DD, DaCosta CM, Desai U, Shadoan MK, Rodriguez L, Mullens M, Yang QM, Ding ZM, Kirkpatrick LL, Vogel P, Zambrowicz B, Sands AT, Platt KA, Hansen GM, Brommage R. High-Throughput Screening of Mouse Gene Knockouts Identifies Established and Novel High Body Fat Phenotypes. Diabetes Metab Syndr Obes 2021; 14:3753-3785. [PMID: 34483672 PMCID: PMC8409770 DOI: 10.2147/dmso.s322083] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/27/2021] [Accepted: 08/04/2021] [Indexed: 01/05/2023]
Abstract
Purpose: Obesity is a major public health problem. Understanding which genes contribute to obesity may better predict individual risk and allow development of new therapies. Because obesity of a mouse gene knockout (KO) line predicts an association of the orthologous human gene with obesity, we reviewed data from the Lexicon Genome5000™ high-throughput phenotypic screen (HTS) of mouse gene KOs to identify KO lines with high body fat. Materials and methods: KO lines were generated using homologous recombination or gene trapping technologies. HTS body composition analyses were performed on adult wild-type and homozygous KO littermate mice from 3758 druggable mouse genes having a human ortholog. Body composition was measured by either DXA or QMR on chow-fed cohorts from all 3758 KO lines and by QMR on independent high-fat-diet-fed cohorts from 2488 of these KO lines. Where possible, comparisons were made to HTS data from the International Mouse Phenotyping Consortium (IMPC). Results: Body fat data are presented for 75 KO lines. Of 46 KO lines where independent external published and/or IMPC KO lines are reported as obese, 43 had increased body fat. Of the remaining 29 novel high body fat KO lines, Ksr2 and G2e3 are supported by data from additional independent KO cohorts, 6 (Asnsd1, Srpk2, Dpp8, Cxxc4, Tenm3 and Kiss1) are supported by data from additional internal cohorts, and the remaining 21, including Tle4, Ak5, Ntm, Tusc3, Ankk1, Mfap3l, Prok2 and Prokr2, were studied with HTS cohorts only. Conclusion: These data support the finding of high body fat in 43 independent external published and/or IMPC KO lines. A novel obese phenotype was identified in 29 additional KO lines, with 27 still lacking the external confirmation now provided for Ksr2 and G2e3 KO mice. Undoubtedly, many mammalian obesity genes remain to be identified and characterized.
Affiliation(s)
- David R Powell
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Jean-Pierre Revelli
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Deon D Doree
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Christopher M DaCosta
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Urvi Desai
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Melanie K Shadoan
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Lawrence Rodriguez
  - Department of Information Technology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Michael Mullens
  - Department of Information Technology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Qi M Yang
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Zhi-Ming Ding
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Laura L Kirkpatrick
  - Department of Molecular Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Peter Vogel
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Brian Zambrowicz
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
  - Department of Information Technology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
  - Department of Molecular Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Arthur T Sands
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
  - Department of Information Technology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
  - Department of Molecular Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Kenneth A Platt
  - Department of Molecular Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Gwenn M Hansen
  - Department of Molecular Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
- Robert Brommage
  - Department of Pharmaceutical Biology, Lexicon Pharmaceuticals, Inc, The Woodlands, TX, USA
3
Konopka T, Ng S, Smedley D. Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base. PLoS Comput Biol 2021; 17:e1009283. [PMID: 34379637 PMCID: PMC8382188 DOI: 10.1371/journal.pcbi.1009283] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2020] [Revised: 08/23/2021] [Accepted: 07/16/2021] [Indexed: 11/20/2022]
Abstract
Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.
Affiliation(s)
- Tomasz Konopka
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Sandra Ng
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
4
Exploring the Etiological Links behind Neurodegenerative Diseases: Inflammatory Cytokines and Bioactive Kynurenines. Int J Mol Sci 2020; 21:ijms21072431. [PMID: 32244523 PMCID: PMC7177899 DOI: 10.3390/ijms21072431] [Citation(s) in RCA: 129] [Impact Index Per Article: 32.3] [Received: 02/28/2020] [Revised: 03/26/2020] [Accepted: 03/30/2020] [Indexed: 02/07/2023]
Abstract
Alzheimer's disease (AD) and Parkinson's disease (PD) are the most common neurodegenerative diseases (NDs), presenting a broad range of symptoms from motor dysfunctions to psychobehavioral manifestations. A common clinical course is proteinopathy-induced neural dysfunction leading to anatomically corresponding neuropathies. However, current diagnostic criteria based on pathology and symptomatology are of little value for disease prevention and drug development. Surveying the pathomechanisms of NDs, this review incorporates systematic reviews of inflammatory cytokines and the tryptophan metabolites kynurenines (KYNs) in human samples to present an inferential method for exploring potential links behind NDs. The results revealed increases in pro-inflammatory cytokines and neurotoxic KYNs in NDs; increases in anti-inflammatory cytokines in AD, PD, Huntington's disease (HD), Creutzfeldt-Jakob disease, and human immunodeficiency virus (HIV)-associated neurocognitive disorders; and decreases in neuromodulatory KYNs in AD, PD, and HD. The results reinforced a strong link between inflammation and neurotoxic KYNs, confirmed activation of the adaptive immune response, and suggested a possible role in the decrease of neuromodulatory KYNs, all of which may contribute to the development of chronic low-grade inflammation. Commonalities of multifactorial NDs are discussed to highlight a current limitation of diagnostic criteria, the need for preclinical biomarkers, and an approach to searching for the initiating factors of NDs.