1
Nayar G, Altman RB. Heterogeneous network approaches to protein pathway prediction. Comput Struct Biotechnol J 2024; 23:2727-2739. [PMID: 39035835 PMCID: PMC11260399 DOI: 10.1016/j.csbj.2024.06.022] [Received: 03/01/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
Understanding protein-protein interactions (PPIs) and the pathways they comprise is essential for comprehending cellular functions and their links to specific phenotypes. Despite the prevalence of molecular data generated by high-throughput sequencing technologies, a significant gap remains in translating this data into functional information regarding the series of interactions that underlie phenotypic differences. In this review, we present an in-depth analysis of heterogeneous network methodologies for modeling protein pathways, highlighting the critical role of integrating multifaceted biological data. It outlines the process of constructing these networks, from data representation to machine learning-driven predictions and evaluations. The work underscores the potential of heterogeneous networks in capturing the complexity of proteomic interactions, thereby offering enhanced accuracy in pathway prediction. This approach not only deepens our understanding of cellular processes but also opens up new possibilities in disease treatment and drug discovery by leveraging the predictive power of comprehensive proteomic data analysis.
Affiliation(s)
- Gowri Nayar
  - Department of Biomedical Data Science, Stanford University, United States
- Russ B. Altman
  - Department of Biomedical Data Science, Stanford University, United States
  - Department of Genetics, Stanford University, United States
  - Department of Medicine, Stanford University, United States
  - Department of Bioengineering, Stanford University, United States
2
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
3
Cesnik A, Schaffer LV, Gaur I, Jain M, Ideker T, Lundberg E. Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes. Annu Rev Biomed Data Sci 2024; 7:369-389. [PMID: 38748859 PMCID: PMC11343683 DOI: 10.1146/annurev-biodatasci-102423-113534] [Indexed: 06/23/2024]
Abstract
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.
Affiliation(s)
- Anthony Cesnik
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Leah V Schaffer
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ishan Gaur
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Mayank Jain
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Trey Ideker
  - Departments of Computer Science and Engineering and Bioengineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Emma Lundberg
  - Chan Zuckerberg Biohub, San Francisco, California, USA
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Pathology, Stanford University, Palo Alto, California, USA
  - Department of Bioengineering, Stanford University, Stanford, California, USA
4
Sinha S, McLaren E, Mullick M, Singh S, Boland BS, Ghosh P. FORWARD: A Learning Framework for Logical Network Perturbations to Prioritize Targets for Drug Development. bioRxiv: the preprint server for biology 2024:2024.07.16.602603. [PMID: 39071297 PMCID: PMC11275938 DOI: 10.1101/2024.07.16.602603] [Indexed: 07/30/2024]
Abstract
Despite advances in artificial intelligence (AI), target-based drug development remains a costly, complex and imprecise process. We introduce F.O.R.W.A.R.D [Framework for Outcome-based Research and Drug Development], a network-based target prioritization approach, and test its utility in the challenging therapeutic area of Inflammatory Bowel Diseases (IBD), which is a chronic condition of multifactorial origin. F.O.R.W.A.R.D leverages real-world outcomes, using a machine-learning classifier trained on transcriptomic data from seven prospective randomized clinical trials involving four drugs. It establishes a molecular signature of remission as the therapeutic goal and computes, by integrating principles of network connectivity, the likelihood that a drug's action on its target(s) will induce the remission-associated genes. Benchmarking F.O.R.W.A.R.D against 210 completed clinical trials on 52 targets showed a perfect predictive accuracy of 100%. The success of F.O.R.W.A.R.D was achieved despite differences in targets, mechanisms, and trial designs. F.O.R.W.A.R.D-driven in-silico phase '0' trials revealed its potential to inform trial design, justify re-trialing failed drugs, and guide early terminations. With its extendable applications to other therapeutic areas and its iterative refinement with emerging trials, F.O.R.W.A.R.D holds the promise to transform drug discovery by generating foresight from hindsight and impacting research and development as well as human-in-the-loop clinical decision-making.
5
Arowolo O, Suvorov A. Underexplored Molecular Mechanisms of Toxicity. J Xenobiot 2024; 14:939-949. [PMID: 39051348 PMCID: PMC11270369 DOI: 10.3390/jox14030052] [Received: 05/31/2024] [Revised: 07/01/2024] [Accepted: 07/15/2024] [Indexed: 07/27/2024] Open
Abstract
Social biases may concentrate the attention of researchers on a small number of well-known molecules/mechanisms, leaving others underexplored. In accordance with this view, central to mechanistic toxicology is a narrow range of molecular pathways that are assumed to be involved in a significant part of the responses to toxicity. It is unclear, however, if there are other molecular mechanisms which play an important role in toxicity events but are overlooked by toxicology. To identify overlooked genes sensitive to chemical exposures, we used publicly available databases. First, we used data on the published chemical-gene interactions for 17,338 genes to estimate their sensitivity to chemical exposures. Next, we extracted data on publication numbers per gene for 19,243 human genes from the Find My Understudied Genes database. Thresholds were applied to both datasets using our algorithm to identify chemically sensitive and chemically insensitive genes and well-studied and underexplored genes. A total of 1110 underexplored genes highly sensitive to chemical exposures were used in GSEA and Shiny GO analyses to identify enriched biological categories. The metabolism of fatty acids, amino acids, and glucose was identified as an underexplored set of molecular mechanisms sensitive to chemical exposures. These findings suggest that future effort is needed to uncover the role of xenobiotics in the current epidemics of metabolic diseases.
Affiliation(s)
- Alexander Suvorov
  - Department of Environmental Health Sciences, School of Public Health and Health Sciences, University of Massachusetts, 686 North Pleasant Street, Amherst, MA 01003, USA
6
Ooi E, Xiang R, Chamberlain AJ, Goddard ME. Archetypal clustering reveals physiological mechanisms linking milk yield and fertility in dairy cattle. J Dairy Sci 2024; 107:4726-4742. [PMID: 38369117 DOI: 10.3168/jds.2023-23699] [Received: 05/05/2023] [Accepted: 01/11/2024] [Indexed: 02/20/2024]
Abstract
Fertility in dairy cattle has declined as an unintended consequence of single-trait selection for high milk yield. The unfavorable genetic correlation between milk yield and fertility is now well documented; however, the underlying physiological mechanisms are still uncertain. To understand the relationship between these traits, we developed a method that clusters variants with similar patterns of effects and, after the integration of gene expression data, identifies the genes through which they are likely to act. Biological processes that are enriched in the genes of each cluster were then identified. We identified several clusters with unique patterns of effects. One of the clusters included variants associated with increased milk yield and decreased fertility, where the "archetypal" variant (i.e., the one with the largest effect) was associated with the GC gene, whereas others were associated with TRIM32, LRRK2, and U6-associated snRNA. These genes have been linked to transcription and alternative splicing, suggesting that these processes are likely contributors to the unfavorable relationship between the 2 traits. Another cluster, with archetypal variant near DGAT1 and including variants associated with CDH2, BTRC, SFRP2, ZFHX3, and SLITRK5, appeared to affect milk yield but have little effect on fertility. These genes have been linked to insulin, adipose tissue, and energy metabolism. A third cluster, with archetypal variant near ZNF613 and including variants associated with ROBO1, EFNA5, PALLD, GPC6, and PTPRT, was associated with fertility but not milk yield. These genes have been linked to GnRH neuronal migration, embryonic development, or ovarian function. The use of archetypal clustering to group variants with similar patterns of effects may assist in identifying the biological processes underlying correlated traits. The method is hypothesis generating and requires experimental confirmation. However, we have uncovered several novel mechanisms potentially affecting milk production and fertility such as GnRH neuronal migration. We anticipate our method to be a starting point for experimental research into novel pathways, which have been previously unexplored within the context of dairy production.
Affiliation(s)
- E Ooi
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- R Xiang
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- A J Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
- M E Goddard
  - Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
7
Oba GM, Nakato R. Clover: An unbiased method for prioritizing differentially expressed genes using a data-driven approach. Genes Cells 2024; 29:456-470. [PMID: 38602264 PMCID: PMC11163938 DOI: 10.1111/gtc.13119] [Received: 01/05/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/12/2024]
Abstract
Identifying key genes from a list of differentially expressed genes (DEGs) is a critical step in transcriptome analysis. However, current methods, including Gene Ontology analysis and manual annotation, essentially rely on existing knowledge, which is highly biased depending on the extent of the literature. As a result, understudied genes, some of which may be associated with important molecular mechanisms, are often ignored or remain obscure. To address this problem, we propose Clover, a data-driven scoring method to specifically highlight understudied genes. Clover aims to prioritize genes associated with important molecular mechanisms by integrating three metrics: the likelihood of appearing in the DEG list, tissue specificity, and number of publications. We applied Clover to Alzheimer's disease data and confirmed that it successfully detected known associated genes. Moreover, Clover effectively prioritized understudied but potentially druggable genes. Overall, our method offers a novel approach to gene characterization and has the potential to expand our understanding of gene functions. Clover is an open-source software written in Python3 and available on GitHub at https://github.com/G708/Clover.
Affiliation(s)
- Gina Miku Oba
  - Laboratory of Computational Genomics, Institute for Quantitative Biosciences, University of Tokyo, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Tokyo, Japan
- Ryuichiro Nakato
  - Laboratory of Computational Genomics, Institute for Quantitative Biosciences, University of Tokyo, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Tokyo, Japan
8
Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Indexed: 04/01/2024] Open
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
  - Department of Molecular Biosciences, Northwestern University, Evanston, United States
  - Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
9
Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. bioRxiv: the preprint server for biology 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
  - Interdisciplinary Biological Sciences, Northwestern University
  - Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
  - Department of Physics and Astronomy, Northwestern University
  - Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University
  - The Potocsnak Longevity Institute, Northwestern University
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University
10
el Bouhaddani S, Höllerhage M, Uh HW, Moebius C, Bickle M, Höglinger G, Houwing-Duistermaat J. Statistical integration of multi-omics and drug screening data from cell lines. PLoS Comput Biol 2024; 20:e1011809. [PMID: 38295113 PMCID: PMC10878536 DOI: 10.1371/journal.pcbi.1011809] [Received: 07/12/2023] [Revised: 02/20/2024] [Accepted: 01/08/2024] [Indexed: 02/02/2024] Open
Abstract
Data integration methods are used to obtain a unified summary of multiple datasets. For multi-modal data, we propose a computational workflow to jointly analyze datasets from cell lines. The workflow comprises a novel probabilistic data integration method, named POPLS-DA, for multi-omics data. The workflow is motivated by a study on synucleinopathies where transcriptomics, proteomics, and drug screening data are measured in affected LUHMES cell lines and controls. The aim is to highlight potentially druggable pathways and genes involved in synucleinopathies. First, POPLS-DA is used to prioritize genes and proteins that best distinguish cases and controls. For these genes, an integrated interaction network is constructed where the drug screen data is incorporated to highlight druggable genes and pathways in the network. Finally, functional enrichment analyses are performed to identify clusters of synaptic and lysosome-related genes and proteins targeted by the protective drugs. POPLS-DA is compared to other single- and multi-omics approaches. We found that HSPA5, a member of the heat shock protein 70 family, was one of the most targeted genes by the validated drugs, in particular by AT1-blockers. HSPA5 and AT1-blockers have been previously linked to α-synuclein pathology and Parkinson's disease, showing the relevance of our findings. Our computational workflow identified new directions for therapeutic targets for synucleinopathies. POPLS-DA provided a larger interpretable gene set than other single- and multi-omic approaches. An implementation based on R and markdown is freely available online.
Affiliation(s)
- Hae-Won Uh
  - Dept. Data science & Biostatistics, UMC Utrecht, Utrecht, Netherlands
- Claudia Moebius
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Marc Bickle
  - Roche Institute for Translational Bioengineering, Basel, Switzerland
- Günter Höglinger
  - Department of Neurology, Hannover Medical School, Hannover, Germany
  - Department of Neurology, Ludwig-Maximilians-Universität, Munich, Germany
  - German Center for Neurodegenerative Diseases, Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
- Jeanine Houwing-Duistermaat
  - Dept. Data science & Biostatistics, UMC Utrecht, Utrecht, Netherlands
  - Dept. of Mathematics, Radboud University, Nijmegen, Netherlands
11
Ng JWY, Felix JF, Olson DM. A novel approach to risk exposure and epigenetics-the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health. BMC Med 2023; 21:466. [PMID: 38012757 PMCID: PMC10683259 DOI: 10.1186/s12916-023-03168-z] [Received: 02/02/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Each mother-child dyad represents a unique combination of genetic and environmental factors. This constellation of variables impacts the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority as such traits predispose to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles.
METHODS: We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454, Avon Longitudinal Study of Parents and Children (ALSPAC)). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at age 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686).
RESULTS: We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared to previous literature, DNAm patterns exhibited novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures.
CONCLUSIONS: To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm. This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.
Affiliation(s)
- Jane W Y Ng
  - Department of Pediatrics, Cumming School of Medicine, University of Calgary, 28 Oki Drive NW, Calgary, AB, T3B 6A8, Canada
- Janine F Felix
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Postbus, 2040, 3000 CA, Rotterdam, The Netherlands
  - Department of Pediatrics, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- David M Olson
  - Departments of Obstetrics and Gynecology, Physiology, and Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, 220 HMRC, Edmonton, AB, T6G2S2, Canada
12
Gaiteri C, Connell DR, Sultan FA, Iatrou A, Ng B, Szymanski BK, Zhang A, Tasaki S. Robust, scalable, and informative clustering for diverse biological networks. Genome Biol 2023; 24:228. [PMID: 37828545 PMCID: PMC10571258 DOI: 10.1186/s13059-023-03062-0] [Received: 06/09/2022] [Accepted: 09/19/2023] [Indexed: 10/14/2023] Open
Abstract
Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm, SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.
Collapse
Affiliation(s)
- Chris Gaiteri
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA.
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
- Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA.
| | - David R Connell
- Rush University Graduate College, Rush University Medical Center, Chicago, IL, USA
| | - Faraz A Sultan
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Artemis Iatrou
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Harvard University, Belmont, MA, USA
| | - Bernard Ng
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Boleslaw K Szymanski
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY, USA
- Academy of Social Sciences, Łódź, Poland
| | - Ada Zhang
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
|
13
|
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Nicola Bordin
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jon Lees
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- University of Bristol, Bristol, United Kingdom
| | - Harry Scholes
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Shaimaa Hassan
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Helwan University, Faculty of Pharmacy, Cairo, Egypt
| | - Quentin Saintain
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Stephan Kamrad
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Christine Orengo
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jürg Bähler
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
|
14
|
Martínez-Enguita D, Dwivedi SK, Jörnsten R, Gustafsson M. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures. Brief Bioinform 2023; 24:bbad293. [PMID: 37587790 PMCID: PMC10516364 DOI: 10.1093/bib/bbad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/25/2023] [Accepted: 07/29/2023] [Indexed: 08/18/2023] Open
Abstract
Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
Affiliation(s)
- David Martínez-Enguita
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
| | - Sanjiv K Dwivedi
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
| | - Rebecka Jörnsten
- Department of Mathematical Sciences, Chalmers University of Technology, Sweden
| | - Mika Gustafsson
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
|
15
|
Waury K, de Wit R, Verberk IMW, Teunissen CE, Abeln S. Deciphering Protein Secretion from the Brain to Cerebrospinal Fluid for Biomarker Discovery. J Proteome Res 2023; 22:3068-3080. [PMID: 37606934 PMCID: PMC10476268 DOI: 10.1021/acs.jproteome.3c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Indexed: 08/23/2023]
Abstract
Cerebrospinal fluid (CSF) is an essential matrix for the discovery of neurological disease biomarkers. However, the high dynamic range of protein concentrations in CSF hinders the detection of the least abundant protein biomarkers by untargeted mass spectrometry. It is thus beneficial to gain a deeper understanding of the secretion processes within the brain. Here, we aim to explore if and how the secretion of brain proteins to the CSF can be predicted. By combining a curated CSF proteome and the brain elevated proteome of the Human Protein Atlas, brain proteins were classified as CSF or non-CSF secreted. A machine learning model was trained on a range of sequence-based features to differentiate between CSF and non-CSF groups and effectively predict the brain origin of proteins. The classification model achieves an area under the curve of 0.89 if using high confidence CSF proteins. The most important prediction features include the subcellular localization, signal peptides, and transmembrane regions. The classifier generalized well to the larger brain detected proteome and is able to correctly predict novel CSF proteins identified by affinity proteomics. In addition to elucidating the underlying mechanisms of protein secretion, the trained classification model can support biomarker candidate selection.
Affiliation(s)
- Katharina Waury
- Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
| | - Renske de Wit
- Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
| | - Inge M. W. Verberk
- Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, VU University Medical Center, Amsterdam UMC, 1081 HV Amsterdam, The Netherlands
| | - Charlotte E. Teunissen
- Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, VU University Medical Center, Amsterdam UMC, 1081 HV Amsterdam, The Netherlands
| | - Sanne Abeln
- Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
|
16
|
Potter A, Hangas A, Goffart S, Huynen MA, Cabrera-Orefice A, Spelbrink JN. Uncharacterized protein C17orf80 - a novel interactor of human mitochondrial nucleoids. J Cell Sci 2023; 136:jcs260822. [PMID: 37401363 PMCID: PMC10445727 DOI: 10.1242/jcs.260822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/26/2023] [Indexed: 07/05/2023] Open
Abstract
Molecular functions of many human proteins remain unstudied, despite the demonstrated association with diseases or pivotal molecular structures, such as mitochondrial DNA (mtDNA). This small genome is crucial for the proper functioning of mitochondria, the energy-converting organelles. In mammals, mtDNA is arranged into macromolecular complexes called nucleoids that serve as functional stations for its maintenance and expression. Here, we aimed to explore an uncharacterized protein C17orf80, which was previously detected close to the nucleoid components by proximity labelling mass spectrometry. To investigate the subcellular localization and function of C17orf80, we took advantage of immunofluorescence microscopy, interaction proteomics and several biochemical assays. We demonstrate that C17orf80 is a mitochondrial membrane-associated protein that interacts with nucleoids even when mtDNA replication is inhibited. In addition, we show that C17orf80 is not essential for mtDNA maintenance and mitochondrial gene expression in cultured human cells. These results provide a basis for uncovering the molecular function of C17orf80 and the nature of its association with nucleoids, possibly leading to new insights about mtDNA and its expression.
Affiliation(s)
- Alisa Potter
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Anu Hangas
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Steffi Goffart
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Martijn A. Huynen
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Alfredo Cabrera-Orefice
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Johannes N. Spelbrink
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
|
17
|
Hongo JA, de Castro GM, Albuquerque Menezes AP, Rios Picorelli AC, Martins da Silva TT, Imada EL, Marchionni L, Del-Bem LE, Vieira Chaves A, Almeida GMDF, Campelo F, Lobo FP. CALANGO: A phylogeny-aware comparative genomics tool for discovering quantitative genotype-phenotype associations across species. Patterns (N Y) 2023; 4:100728. [PMID: 37409050 PMCID: PMC10318336 DOI: 10.1016/j.patter.2023.100728] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2022] [Revised: 12/08/2022] [Accepted: 03/15/2023] [Indexed: 07/07/2023]
Abstract
Living species vary significantly in phenotype and genomic content. Sophisticated statistical methods linking genes with phenotypes within a species have led to breakthroughs in complex genetic diseases and genetic breeding. Despite the abundance of genomic and phenotypic data available for thousands of species, finding genotype-phenotype associations across species is challenging due to the non-independence of species data resulting from common ancestry. To address this, we present CALANGO (comparative analysis with annotation-based genomic components), a phylogeny-aware comparative genomics tool to find homologous regions and biological roles associated with quantitative phenotypes across species. In two case studies, CALANGO identified both known and previously unidentified genotype-phenotype associations. The first study revealed unknown aspects of the ecological interaction between Escherichia coli, its integrated bacteriophages, and the pathogenicity phenotype. The second identified an association between maximum height in angiosperms and the expansion of a reproductive mechanism that prevents inbreeding and increases genetic diversity, with implications for conservation biology and agriculture.
Affiliation(s)
- Jorge Augusto Hongo
- Instituto de Computação, Universidade Estadual de Campinas, Campinas, Sao Paulo 13083-872, Brazil
| | - Giovanni Marques de Castro
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Alison Pelri Albuquerque Menezes
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Agnello César Rios Picorelli
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Thieres Tayroni Martins da Silva
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Eddie Luidy Imada
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Luiz-Eduardo Del-Bem
- Department of Botany, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Anderson Vieira Chaves
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
| | - Gabriel Magno de Freitas Almeida
- Faculty of Biosciences, Fisheries and Economics, Norwegian College of Fishery Science, UiT The Arctic University of Norway, 9019 Tromsø, Norway
| | - Felipe Campelo
- Department of Computer Science, College of Engineering and Physical Sciences, Aston University, Birmingham B4 7ET, UK
| | - Francisco Pereira Lobo
- Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
|
18
|
Hellinger R, Sigurdsson A, Wu W, Romanova EV, Li L, Sweedler JV, Süssmuth RD, Gruber CW. Peptidomics. Nat Rev Methods Primers 2023; 3:25. [PMID: 37250919 PMCID: PMC7614574 DOI: 10.1038/s43586-023-00205-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Accepted: 02/09/2023] [Indexed: 05/31/2023]
Abstract
Peptides are biopolymers, typically consisting of 2-50 amino acids. They are biologically produced by the cellular ribosomal machinery or by non-ribosomal enzymes and, sometimes, other dedicated ligases. Peptides are arranged as linear chains or cycles, and include post-translational modifications, unusual amino acids and stabilizing motifs. Their structure and molecular size render them a unique chemical space, between small molecules and larger proteins. Peptides have important physiological functions as intrinsic signalling molecules, such as neuropeptides and peptide hormones, for cellular or interspecies communication, as toxins to catch prey or as defence molecules to fend off enemies and microorganisms. Clinically, they are gaining popularity as biomarkers or innovative therapeutics; to date there are more than 60 peptide drugs approved and more than 150 in clinical development. The emerging field of peptidomics comprises the comprehensive qualitative and quantitative analysis of the suite of peptides in a biological sample (endogenously produced, or exogenously administered as drugs). Peptidomics employs techniques of genomics, modern proteomics, state-of-the-art analytical chemistry and innovative computational biology, with a specialized set of tools. The complex biological matrices and often low abundance of analytes typically examined in peptidomics experiments require optimized sample preparation and isolation, including in silico analysis. This Primer covers the combination of techniques and workflows needed for peptide discovery and characterization and provides an overview of various biological and clinical applications of peptidomics.
Affiliation(s)
- Roland Hellinger
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
| | - Arnar Sigurdsson
- Institut für Chemie, Technische Universität Berlin, Berlin, Germany
| | - Wenxin Wu
- School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Elena V Romanova
- Department of Chemistry, University of Illinois, Urbana, IL, USA
| | - Lingjun Li
- School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Christian W Gruber
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
|
19
|
Sadegh S, Skelton J, Anastasi E, Maier A, Adamowicz K, Möller A, Kriege NM, Kronberg J, Haller T, Kacprowski T, Wipat A, Baumbach J, Blumenthal DB. Lacking mechanistic disease definitions and corresponding association data hamper progress in network medicine and beyond. Nat Commun 2023; 14:1662. [PMID: 36966134 PMCID: PMC10039912 DOI: 10.1038/s41467-023-37349-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 03/13/2023] [Indexed: 03/27/2023] Open
Abstract
A long-term objective of network medicine is to replace our current, mainly phenotype-based disease definitions by subtypes of health conditions corresponding to distinct pathomechanisms. For this, molecular and health data are modeled as networks and are mined for pathomechanisms. However, many such studies rely on large-scale disease association data where diseases are annotated using the very phenotype-based disease definitions the network medicine field aims to overcome. This raises the question to which extent the biases mechanistically inadequate disease annotations introduce in disease association data distort the results of studies which use such data for pathomechanism mining. We address this question using global- and local-scale analyses of networks constructed from disease association data of various types. Our results indicate that large-scale disease association data should be used with care for pathomechanism mining and that analyses of such data should be accompanied by close-up analyses of molecular data for well-characterized patient cohorts.
Affiliation(s)
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - James Skelton
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Elisa Anastasi
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Anna Möller
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Nils M Kriege
- Faculty of Computer Science, University of Vienna, Vienna, Austria
- Research Network Data Science, University of Vienna, Vienna, Austria
| | - Jaanika Kronberg
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Toomas Haller
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Anil Wipat
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - David B Blumenthal
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
|
20
|
Holm L, Laiho A, Törönen P, Salgado M. DALI shines a light on remote homologs: One hundred discoveries. Protein Sci 2023; 32:e4519. [PMID: 36419248 PMCID: PMC9793968 DOI: 10.1002/pro.4519] [Citation(s) in RCA: 154] [Impact Index Per Article: 154.0] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 11/20/2022] [Indexed: 11/25/2022]
Abstract
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we are using DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to the previously characterized families, generating a functional hypothesis that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
Affiliation(s)
- Liisa Holm
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
| | - Aleksi Laiho
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
| | - Petri Törönen
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
| | - Marco Salgado
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
|
21
|
Toner TM, Pancholi R, Miller P, Forster T, Coleman HG, Overton IM. Strategies and techniques for quality control and semantic enrichment with multimodal data: a case study in colorectal cancer with eHDPrep. Gigascience 2022; 12:giad030. [PMID: 37171130 PMCID: PMC10176503 DOI: 10.1093/gigascience/giad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 02/19/2023] [Accepted: 04/19/2023] [Indexed: 05/13/2023] Open
Abstract
BACKGROUND: Integration of data from multiple domains can greatly enhance the quality and applicability of knowledge generated in analysis workflows. However, working with health data is challenging, requiring careful preparation in order to support meaningful interpretation and robust results. Ontologies encapsulate relationships between variables that can enrich the semantic content of health datasets to enhance interpretability and inform downstream analyses. FINDINGS: We developed an R package for electronic health data preparation, "eHDPrep," demonstrated upon a multimodal colorectal cancer dataset (661 patients, 155 variables; Colo-661); a further demonstrator is taken from The Cancer Genome Atlas (459 patients, 94 variables; TCGA-COAD). eHDPrep offers user-friendly methods for quality control, including internal consistency checking and redundancy removal with information-theoretic variable merging. Semantic enrichment functionality is provided, enabling generation of new informative "meta-variables" according to ontological common ancestry between variables, demonstrated with SNOMED CT and the Gene Ontology in the current study. eHDPrep also facilitates numerical encoding, variable extraction from free text, completeness analysis, and user review of modifications to the dataset. CONCLUSIONS: eHDPrep provides effective tools to assess and enhance data quality, laying the foundation for robust performance and interpretability in downstream analyses. Application to multimodal colorectal cancer datasets resulted in improved data quality, structuring, and robust encoding, as well as enhanced semantic information. We make eHDPrep available as an R package from CRAN (https://cran.r-project.org/package=eHDPrep) and GitHub (https://github.com/overton-group/eHDPrep).
Affiliation(s)
- Tom M Toner
- Patrick G. Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, Belfast BT9 7AE, UK
| | - Rashi Pancholi
- Patrick G. Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, Belfast BT9 7AE, UK
| | - Paul Miller
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, Belfast BT9 7AE, UK
- The Centre for Secure Information Technologies, Queen’s University Belfast, Belfast BT3 9DT, UK
| | | | - Helen G Coleman
- Patrick G. Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
- Centre for Public Health, Queen’s University Belfast, Belfast BT12 6BA, UK
| | - Ian M Overton
- Patrick G. Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, Belfast BT9 7AE, UK
|
22
|
Yu JSL, Heineike BM, Hartl J, Aulakh SK, Correia-Melo C, Lehmann A, Lemke O, Agostini F, Lee CT, Demichev V, Messner CB, Mülleder M, Ralser M. Inorganic sulfur fixation via a new homocysteine synthase allows yeast cells to cooperatively compensate for methionine auxotrophy. PLoS Biol 2022; 20:e3001912. [PMID: 36455053 PMCID: PMC9757880 DOI: 10.1371/journal.pbio.3001912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/10/2022] [Revised: 12/16/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022] Open
Abstract
The assimilation, incorporation, and metabolism of sulfur is a fundamental process across all domains of life, yet how cells deal with varying sulfur availability is not well understood. We studied an unresolved conundrum of sulfur fixation in yeast, in which organosulfur auxotrophy caused by deletion of the homocysteine synthase Met17p is overcome when cells are inoculated at high cell density. In combining the use of self-establishing metabolically cooperating (SeMeCo) communities with proteomic, genetic, and biochemical approaches, we discovered an uncharacterized gene product YLL058Wp, herein named Hydrogen Sulfide Utilizing-1 (HSU1). Hsu1p acts as a homocysteine synthase and allows the cells to substitute for Met17p by reassimilating hydrosulfide ions leaked from met17Δ cells into O-acetyl-homoserine and forming homocysteine. Our results show that cells can cooperate to achieve sulfur fixation, indicating that the collective properties of microbial communities facilitate their basic metabolic capacity to overcome sulfur limitation.
Affiliation(s)
- Jason S. L. Yu
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Benjamin M. Heineike
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Johannes Hartl
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Simran K. Aulakh
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Clara Correia-Melo
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Andrea Lehmann
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Oliver Lemke
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Federica Agostini
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Cory T. Lee
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Christoph B. Messner
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Michael Mülleder
- Core Facility—High Throughput Mass Spectrometry, Charité Universitätsmedizin, Berlin, Germany
| | - Markus Ralser
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| |
|
23
|
Murray GC, Bais P, Hatton CL, Tadenev ALD, Hoffmann BR, Stodola TJ, Morelli KH, Pratt SL, Schroeder D, Doty R, Fiehn O, John SWM, Bult CJ, Cox GA, Burgess RW. Mouse models of NADK2 deficiency analyzed for metabolic and gene expression changes to elucidate pathophysiology. Hum Mol Genet 2022; 31:4055-4074. [PMID: 35796562 PMCID: PMC9703942 DOI: 10.1093/hmg/ddac151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2021] [Revised: 06/17/2022] [Accepted: 06/30/2022] [Indexed: 11/13/2022] Open
Abstract
NADK2 encodes the mitochondrial form of nicotinamide adenine dinucleotide (NAD) kinase, which phosphorylates NAD. Rare recessive mutations in human NADK2 are associated with a syndromic neurological mitochondrial disease that includes metabolic changes, such as hyperlysinemia and 2,4-dienoyl-CoA reductase (DECR) deficiency. However, the full pathophysiology resulting from NADK2 deficiency is not known. Here, we describe two chemically induced mouse mutations in Nadk2, S326L and S330P, which cause severe neuromuscular disease and shorten lifespan. The S330P allele was characterized in detail and shown to have marked denervation of neuromuscular junctions by 5 weeks of age and muscle atrophy by 11 weeks of age. Cerebellar Purkinje cells also showed progressive degeneration in this model. Transcriptome profiling on brain and muscle was performed at early and late disease stages. In addition, metabolomic profiling was performed on the brain, muscle, liver and spinal cord at the same ages and on plasma at 5 weeks. Combined transcriptomic and metabolomic analyses identified hyperlysinemia, DECR deficiency and generalized metabolic dysfunction in Nadk2 mutant mice, indicating relevance to the human disease. We compared findings from the Nadk2 model to equivalent RNA sequencing and metabolomic datasets from a mouse model of infantile neuroaxonal dystrophy, caused by recessive mutations in Pla2g6. This enabled us to identify disrupted biological processes that are common between these mouse models of neurological disease, as well as those processes that are gene-specific. These findings improve our understanding of the pathophysiology of neuromuscular diseases and describe mouse models that will be useful for future preclinical studies.
Affiliation(s)
- G C Murray
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- The Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
| | - P Bais
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - C L Hatton
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - A L D Tadenev
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - B R Hoffmann
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - T J Stodola
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - K H Morelli
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- The Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
| | - S L Pratt
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- Neuroscience Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
| | - D Schroeder
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - R Doty
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
| | - O Fiehn
- West Coast Metabolomics Center, University of California Davis, 451 Health Science Dr., Davis, CA 95618, USA
| | - S W M John
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- Department of Ophthalmology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10032, USA
| | - C J Bult
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- The Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
| | - G A Cox
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- The Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
- Neuroscience Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
| | - R W Burgess
- The Jackson Laboratory, 600 Main St., Bar Harbor, ME 04609, USA
- The Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
- Neuroscience Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
| |
|
24
|
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T. Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 2022; 50:12058-12070. [PMID: 36477580 PMCID: PMC9757046 DOI: 10.1093/nar/gkac1139] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/06/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022] Open
Abstract
Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.
Affiliation(s)
- Jennifer A Byrne
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, NSW, Australia
| | - Yasunori Park
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Reese A K Richardson
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Pranujan Pathmendra
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Mengyi Sun
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Evanston, USA
- Center for Genetic Medicine, Northwestern University School of Medicine, Chicago, USA
| |
|
25
|
Hagihara H, Shoji H, Kuroiwa M, Graef IA, Crabtree GR, Nishi A, Miyakawa T. Forebrain-specific conditional calcineurin deficiency induces dentate gyrus immaturity and hyper-dopaminergic signaling in mice. Mol Brain 2022; 15:94. [PMID: 36414974 PMCID: PMC9682671 DOI: 10.1186/s13041-022-00981-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/25/2022] [Accepted: 11/12/2022] [Indexed: 11/24/2022] Open
Abstract
Calcineurin (Cn), a phosphatase important for synaptic plasticity and neuronal development, has been implicated in the etiology and pathophysiology of neuropsychiatric disorders, including schizophrenia, intellectual disability, autism spectrum disorders, epilepsy, and Alzheimer's disease. Forebrain-specific conditional Cn knockout mice have been known to exhibit multiple behavioral phenotypes related to these disorders. In this study, we investigated whether Cn mutant mice show pseudo-immaturity of the dentate gyrus (iDG) in the hippocampus, which we have proposed as an endophenotype shared by these disorders. Expression of calbindin and GluA1, typical markers for mature DG granule cells (GCs), was decreased and that of doublecortin, calretinin, phospho-CREB, and dopamine D1 receptor (Drd1), markers for immature GC, was increased in Cn mutants. Phosphorylation of cAMP-dependent protein kinase (PKA) substrates (GluA1, ERK2, DARPP-32, PDE4) was increased and showed higher sensitivity to SKF81297, a Drd1-like agonist, in Cn mutants than in controls. While cAMP/PKA signaling is increased in the iDG of Cn mutants, chronic treatment with rolipram, a selective PDE4 inhibitor that increases intracellular cAMP, ameliorated the iDG phenotype significantly and nesting behavior deficits with nominal significance. Chronic rolipram administration also decreased the phosphorylation of CREB, but not the other four PKA substrates examined, in Cn mutants. These results suggest that Cn deficiency induces pseudo-immaturity of GCs and that cAMP signaling increases to compensate for this maturation abnormality. This study further supports the idea that iDG is an endophenotype shared by certain neuropsychiatric disorders.
Affiliation(s)
- Hideo Hagihara
- Division of Systems Medical Science, Center for Medical Science, Fujita Health University, Toyoake, Aichi 470-1192 Japan
| | - Hirotaka Shoji
- Division of Systems Medical Science, Center for Medical Science, Fujita Health University, Toyoake, Aichi 470-1192 Japan
| | - Mahomi Kuroiwa
- Department of Pharmacology, Kurume University School of Medicine, Kurume, Fukuoka 830-0011 Japan
| | - Isabella A. Graef
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA
| | - Gerald R. Crabtree
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA
| | - Akinori Nishi
- Department of Pharmacology, Kurume University School of Medicine, Kurume, Fukuoka 830-0011 Japan
| | - Tsuyoshi Miyakawa
- Division of Systems Medical Science, Center for Medical Science, Fujita Health University, Toyoake, Aichi 470-1192 Japan
| |
|
26
|
Gable AL, Szklarczyk D, Lyon D, Matias Rodrigues JF, von Mering C. Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments. Brief Bioinform 2022; 23:bbac355. [PMID: 36088548 PMCID: PMC9487593 DOI: 10.1093/bib/bbac355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 07/13/2022] [Accepted: 07/30/2022] [Indexed: 11/14/2022] Open
Abstract
A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
Affiliation(s)
- Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - David Lyon
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | | | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
27
|
WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 2022; 18:e1010378. [PMID: 36040971 PMCID: PMC9426921 DOI: 10.1371/journal.pcbi.1010378] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 03/11/2022] [Accepted: 07/11/2022] [Indexed: 11/19/2022] Open
Abstract
We present WhichTF, a computational method to identify functionally important transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF applies an ontology-guided functional approach to compute novel enrichment by integrating accessibility measurements, high-confidence pre-computed conservation-aware TF binding sites, and putative gene-regulatory models. Comparison with prior sheer abundance-based methods reveals the unique ability of WhichTF to identify context-specific TFs with functional relevance, including NF-κB family members in lymphocytes and GATA factors in cardiac cells. To distinguish the transcriptional regulatory landscape in closely related samples, we apply differential analysis and demonstrate its utility in lymphocyte, mesoderm developmental, and disease cells. We find suggestive, under-characterized TFs, such as RUNX3 in mesoderm development and GLI1 in systemic lupus erythematosus. We also find TFs known for stress response, suggesting routine experimental caveats that warrant careful consideration. WhichTF yields biological insight into known and novel molecular mechanisms of TF-mediated transcriptional regulation in diverse contexts, including human and mouse cell types, cell fate trajectories, and disease-associated cells. Transcription factors (TFs), a class of DNA binding proteins, regulate tissue- and cell-type-specific expression of genes. Identifying the critical TFs in a given cellular context leads to investigating molecular regulatory mechanisms in development, differentiation, and disease. Because there are more than 1,500 human TFs, experimental measurements of genome-wide occupancy across all TFs have been challenging. While computational approaches play pivotal roles, most existing methods rely on statistical enrichment, focusing either on sequence motif similarity recognized by TFs or the similarity of the genomic region of interest with the previously characterized TF occupancy profile. 
Here we propose WhichTF as an alternative, incorporating curated biomedical knowledge from ontology and integrating it with the high-confidence prediction of conserved TF binding sites in user-provided genomic regions of interest. We develop a new WhichTF score to rank TFs and demonstrate its applicability across human and mouse cell types, cellular differentiation trajectories, and disease-associated cells.
|
28
|
Proteome-Wide Differential Effects of Peritoneal Dialysis Fluid Properties in an In Vitro Human Endothelial Cell Model. Int J Mol Sci 2022; 23:ijms23148010. [PMID: 35887356 PMCID: PMC9317527 DOI: 10.3390/ijms23148010] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 07/13/2022] [Accepted: 07/15/2022] [Indexed: 01/27/2023] Open
Abstract
To replace kidney function, peritoneal dialysis (PD) utilizes hyperosmotic PD fluids with specific physico-chemical properties. Their composition induces progressive damage of the peritoneum, leading to vasculopathies, decline of membrane function, and PD technique failure. Clinically used PD fluids differ in their composition but still remain bioincompatible. Using proteomics, we mapped the molecular pathomechanisms induced in human endothelial cells by the different characteristics of widely used PD fluids. Of 7894 identified proteins, 3871 were regulated by at least one of the tested PD fluids and 49 by all of them. The latter subset was enriched for cell junction-associated proteins. The different PD fluids individually perturbed proteins commonly related to cell stress, survival, and immune function pathways. Modeling two major bioincompatibility factors of PD fluids, acidosis and glucose degradation products (GDPs), revealed distinct effects on endothelial cell function and regulation of cellular stress responses. Proteins and pathways most strongly affected were members of the oxidative stress response. Addition of the antioxidant and cytoprotective additive, alanyl-glutamine (AlaGln), to PD fluids led to upregulation of thioredoxin reductase-1, an antioxidant protein, potentially explaining the cytoprotective effect of AlaGln. In conclusion, we mapped out the molecular response of endothelial cells to PD fluids, and provided new evidence for their specific pathomechanisms, crucial for improvement of PD therapies.
|
29
|
Garrido‐Rodriguez M, Zirngibl K, Ivanova O, Lobentanzer S, Saez‐Rodriguez J. Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks. Mol Syst Biol 2022; 18:e11036. [PMID: 35880747 PMCID: PMC9316933 DOI: 10.15252/msb.202211036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 05/12/2022] [Accepted: 05/31/2022] [Indexed: 11/10/2022] Open
Abstract
Signal transduction governs cellular behavior, and its dysregulation often leads to human disease. To understand this process, we can use network models based on prior knowledge, where nodes represent biomolecules, usually proteins, and edges indicate interactions between them. Several computational methods combine untargeted omics data with prior knowledge to estimate the state of signaling networks in specific biological scenarios. Here, we review, compare, and classify recent network approaches according to their characteristics in terms of input omics data, prior knowledge and underlying methodologies. We highlight existing challenges in the field, such as the general lack of ground truth and the limitations of prior knowledge. We also point out new omics developments that may have a profound impact, such as single-cell proteomics or large-scale profiling of protein conformational changes. We provide both an introduction for interested users seeking strategies to study cell signaling on a large scale and an update for seasoned modelers.
Affiliation(s)
- Martin Garrido‐Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Katharina Zirngibl
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Olga Ivanova
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Julio Saez‐Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| |
|
31
|
Willsey HR, Willsey AJ, Wang B, State MW. Genomics, convergent neuroscience and progress in understanding autism spectrum disorder. Nat Rev Neurosci 2022; 23:323-341. [PMID: 35440779 PMCID: PMC10693992 DOI: 10.1038/s41583-022-00576-7] [Citation(s) in RCA: 76] [Impact Index Per Article: 38.0] [Accepted: 02/18/2022] [Indexed: 12/31/2022]
Abstract
More than a hundred genes have been identified that, when disrupted, impart large risk for autism spectrum disorder (ASD). Current knowledge about the encoded proteins - although incomplete - points to a very wide range of developmentally dynamic and diverse biological processes. Moreover, the core symptoms of ASD involve distinctly human characteristics, presenting challenges to interpreting evolutionarily distant model systems. Indeed, despite a decade of striking progress in gene discovery, an actionable understanding of pathobiology remains elusive. Increasingly, convergent neuroscience approaches have been recognized as an important complement to traditional uses of genetics to illuminate the biology of human disorders. These methods seek to identify intersection among molecular-level, cellular-level and circuit-level functions across multiple risk genes and have highlighted developing excitatory neurons in the human mid-gestational prefrontal cortex as an important pathobiological nexus in ASD. In addition, neurogenesis, chromatin modification and synaptic function have emerged as key potential mediators of genetic vulnerability. The continued expansion of foundational 'omics' data sets, the application of higher-throughput model systems and incorporating developmental trajectories and sex differences into future analyses will refine and extend these results. Ultimately, a systems-level understanding of ASD genetic risk holds promise for clarifying pathobiology and advancing therapeutics.
Affiliation(s)
- Helen Rankin Willsey
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - A Jeremy Willsey
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.
| | - Belinda Wang
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Langley Porter Psychiatric Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Matthew W State
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Langley Porter Psychiatric Institute, University of California, San Francisco, San Francisco, CA, USA.
| |
|
32
|
Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J. An open invitation to the Understudied Proteins Initiative. Nat Biotechnol 2022; 40:815-817. [PMID: 35534555 DOI: 10.1038/s41587-022-01316-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/01/2023]
Affiliation(s)
- Georg Kustatscher
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.
| | | | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Emma Lundberg
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA
| | - Markus Ralser
- Department of Biochemistry, Charité University Medicine, Berlin, Germany
- The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
| | - Juri Rappsilber
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
| |
|
33
|
Basili D, Reynolds J, Houghton J, Malcomber S, Chambers B, Liddell M, Muller I, White A, Shah I, Everett LJ, Middleton A, Bender A. Latent Variables Capture Pathway-Level Points of Departure in High-Throughput Toxicogenomic Data. Chem Res Toxicol 2022; 35:670-683. [PMID: 35333521 PMCID: PMC9019810 DOI: 10.1021/acs.chemrestox.1c00444] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2021] [Indexed: 11/28/2022]
Abstract
Estimation of points of departure (PoDs) from high-throughput transcriptomic data (HTTr) represents a key step in the development of next-generation risk assessment (NGRA). Current approaches mainly rely on single key gene targets, which are constrained by the information currently available in the knowledge base and make interpretation challenging as scientists need to interpret PoDs for thousands of genes or hundreds of pathways. In this work, we aimed to address these issues by developing a computational workflow to investigate the pathway concentration-response relationships in a way that is not fully constrained by known biology and also facilitates interpretation. We employed the Pathway-Level Information ExtractoR (PLIER) to identify latent variables (LVs) describing biological activity and then investigated in vitro LVs' concentration-response relationships using the ToxCast pipeline. We applied this methodology to a published transcriptomic concentration-response data set for 44 chemicals in MCF-7 cells and showed that our workflow can capture known biological activity and discriminate between estrogenic and antiestrogenic compounds as well as activity not aligning with the existing knowledge base, which may be relevant in a risk assessment scenario. Moreover, we were able to identify the known estrogen activity in compounds that are not well-established ER agonists/antagonists supporting the use of the workflow in read-across. Next, we transferred its application to chemical compounds tested in HepG2, HepaRG, and MCF-7 cells and showed that PoD estimates are in strong agreement with those estimated using a recently developed Bayesian approach (cor = 0.89) and in weak agreement with those estimated using a well-established approach such as BMDExpress2 (cor = 0.57). 
These results demonstrate the effectiveness of using PLIER in a concentration-response scenario to investigate pathway activity in a way that is not fully constrained by the knowledge base and to ease the biological interpretation and support the development of an NGRA framework with the ability to improve current risk assessment strategies for chemicals using new approach methodologies.
Affiliation(s)
- Danilo Basili
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Joe Reynolds
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Jade Houghton
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Sophie Malcomber
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Bryant Chambers
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
| | - Mark Liddell
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Iris Muller
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Andrew White
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
| | - Logan J. Everett
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
| | - Alistair Middleton
- Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
| | - Andreas Bender
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
|
34
|
Makarious MB, Leonard HL, Vitale D, Iwaki H, Sargent L, Dadu A, Violich I, Hutchins E, Saffo D, Bandres-Ciga S, Kim JJ, Song Y, Maleknia M, Bookman M, Nojopranoto W, Campbell RH, Hashemi SH, Botia JA, Carter JF, Craig DW, Van Keuren-Jensen K, Morris HR, Hardy JA, Blauwendraat C, Singleton AB, Faghri F, Nalls MA. Multi-modality machine learning predicting Parkinson's disease. NPJ Parkinsons Dis 2022; 8:35. [PMID: 35365675 PMCID: PMC8975993 DOI: 10.1038/s41531-022-00288-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/01/2021] [Accepted: 02/01/2022] [Indexed: 02/06/2023] Open
Abstract
Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug-gene interactions. We performed automated ML on multimodal data from the Parkinson's Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson's Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but their accuracy is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals to monitor within a health registry or biobank to prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
Affiliation(s)
- Mary B Makarious
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
| | - Hampton L Leonard
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
| | - Dan Vitale
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
| | - Hirotaka Iwaki
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
| | - Lana Sargent
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- School of Nursing, Virginia Commonwealth University, Richmond, VA, USA
- Geriatric Pharmacotherapy Program, School of Pharmacy, Virginia Commonwealth University, Richmond, VA, USA
| | - Anant Dadu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Ivo Violich
- Institute of Translational Genomics, University of Southern California, Los Angeles, CA, USA
| | - Elizabeth Hutchins
- Neurogenomics Division, Translational Genomics Research Institute (TGen), Phoenix, AZ, USA
| | - David Saffo
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Sara Bandres-Ciga
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Jonggeol Jeff Kim
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London, UK
| | - Yeajin Song
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
| | | | - Matt Bookman
- Verily Life Sciences, South San Francisco, CA, USA
| | | | - Roy H Campbell
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Sayed Hadi Hashemi
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Juan A Botia
- Department of Molecular Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
| | | | - David W Craig
- Institute of Translational Genomics, University of Southern California, Los Angeles, CA, USA
| | | | - Huw R Morris
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
| | - John A Hardy
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
- UK Dementia Research Institute and Department of Neurodegenerative Disease and Reta Lila Weston Institute, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong SAR, China
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Andrew B Singleton
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
| | - Faraz Faghri
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
| | - Mike A Nalls
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International LLC, Glen Echo, MD, USA
| |
|
35
|
Abstract
It is often claimed that only experiments can support strong causal inferences and therefore they should be privileged in the behavioral sciences. We disagree. Overvaluing experiments results in their overuse both by researchers and decision makers and in an underappreciation of their shortcomings. Neglect of other methods often follows. Experiments can suggest whether X causes Y in a specific experimental setting; however, they often fail to elucidate either the mechanisms responsible for an effect or the strength of an effect in everyday natural settings. In this article, we consider two overarching issues. First, experiments have important limitations. We highlight problems with external, construct, statistical-conclusion, and internal validity; replicability; and conceptual issues associated with simple X causes Y thinking. Second, quasi-experimental and nonexperimental methods are absolutely essential. As well as themselves estimating causal effects, these other methods can provide information and understanding that goes beyond that provided by experiments. A research program progresses best when experiments are not treated as privileged but instead are combined with these other methods.
Affiliation(s)
- Ed Diener
- Department of Psychology, University of Utah
- Department of Psychology, University of Virginia
- Gallup, Washington, D.C.
| | - Robert Northcott
- Department of Philosophy, Birkbeck College, University of London
| | | | | |
|
36
|
A framework to score the effects of structural variants in health and disease. Genome Res 2022; 32:766-777. [PMID: 35197310 PMCID: PMC8997355 DOI: 10.1101/gr.275995.121] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/13/2021] [Accepted: 02/22/2022] [Indexed: 11/25/2022]
Abstract
While technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive tool using the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited due to a small number and biased set of annotated pathogenic or benign SVs. We overcome this problem by using a surrogate training-objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human and chimpanzee derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and offline-scoring tool for easy application of CADD-SV.
|
37
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) and emerging methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation, and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
| | - Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
| | - Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
| |
|
38
|
Carr AV, Frey BL, Scalf M, Cesnik AJ, Rolfs Z, Pike KA, Yang B, Keller MP, Jarrard DF, Shortreed MR, Smith LM. MetaNetwork Enhances Biological Insights from Quantitative Proteomics Differences by Combining Clustering and Enrichment Analyses. J Proteome Res 2022; 21:410-419. [PMID: 35073098 PMCID: PMC9150505 DOI: 10.1021/acs.jproteome.1c00756] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Interpreting proteomics data remains challenging due to the large number of proteins that are quantified by modern mass spectrometry methods. Weighted gene correlation network analysis (WGCNA) can identify groups of biologically related proteins using only protein intensity values by constructing protein correlation networks. However, WGCNA is not widespread in proteomic analyses due to challenges in implementing workflows. To facilitate the adoption of WGCNA by the proteomics field, we created MetaNetwork, an open-source, R-based application to perform sophisticated WGCNA workflows with no coding skill requirements for the end user. We demonstrate MetaNetwork's utility by employing it to identify groups of proteins associated with prostate cancer from a proteomic analysis of tumor and adjacent normal tissue samples. We found a decrease in cytoskeleton-related protein expression, a known hallmark of prostate tumors. We further identified changes in module eigenproteins indicative of dysregulation in protein translation and trafficking pathways. These results demonstrate the value of using MetaNetwork to improve the biological interpretation of quantitative proteomics experiments with 15 or more samples.
Affiliation(s)
- Austin V Carr
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Brian L Frey
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Zach Rolfs
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Kyndal A Pike
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Bing Yang
- Department of Urology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Mark P. Keller
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, United States
| | - David F Jarrard
- Department of Urology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States (corresponding author; telephone: 608-263-2594)
| |
|
39
|
Rodriguez-Esteban R. The speed of information propagation in the scientific network distorts biomedical research. PeerJ 2022; 10:e12764. [PMID: 35070506 PMCID: PMC8759377 DOI: 10.7717/peerj.12764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 12/17/2021] [Indexed: 01/07/2023] Open
Abstract
Delays in the propagation of scientific discoveries across scientific communities have been an oft-maligned feature of scientific research for introducing a bias towards knowledge that is produced within a scientist's closest community. The vastness of the scientific literature has been commonly blamed for this phenomenon, despite recent improvements in information retrieval and text mining. Its actual negative impact on scientific progress, however, has never been quantified. This analysis attempts to do so by exploring its effects on biomedical discovery, particularly in the discovery of relations between diseases, genes and chemical compounds. Results indicate that the probability that two scientific facts will enable the discovery of a new fact depends on how far apart these two facts were originally within the scientific landscape. In particular, the probability decreases exponentially with the citation distance. Thus, the direction of scientific progress is distorted based on the location in which each scientific fact is published, representing a path-dependent bias in which originally closely-located discoveries drive the sequence of future discoveries. To counter this bias, scientists should open the scope of their scientific work with modern information retrieval and extraction approaches.
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
| |
|
40
|
Lindlöf A. The Vulnerability of the Developing Brain: Analysis of Highly Expressed Genes in Infant C57BL/6 Mouse Hippocampus in Relation to Phenotypic Annotation Derived From Mutational Studies. Bioinform Biol Insights 2022; 16:11779322211062722. [PMID: 35023907 PMCID: PMC8743926 DOI: 10.1177/11779322211062722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Accepted: 11/03/2021] [Indexed: 12/06/2022] Open
Abstract
The hippocampus has been shown to have a major role in learning and memory, but also to participate in the regulation of emotions. However, its specific role(s) in memory is still unclear. Hippocampal damage or dysfunction mainly results in memory issues, especially in the declarative memory but, in animal studies, has also been shown to lead to hyperactivity and difficulty in inhibiting responses previously taught. The brain structure is affected in neuropathological disorders, such as Alzheimer's, epilepsy, and schizophrenia, and also by depression and stress. The hippocampus structure is far from mature at birth and undergoes substantial development throughout infant and juvenile life. The aim of this study was to survey genes highly expressed throughout the postnatal period in mouse hippocampus and which have also been linked to an abnormal phenotype through mutational studies to achieve a greater understanding about hippocampal functions during postnatal development. Publicly available gene expression data from C57BL/6 mouse hippocampus were analyzed; from a total of 5 time points (at postnatal day 1, 10, 15, 21, and 30), 547 genes highly expressed in all of these time points were selected for analysis. Highly expressed genes are considered to be of potential biological importance and appear to be multifunctional, and hence any dysfunction in such a gene will most likely have a large impact on the development of abilities during the postnatal and juvenile period. Phenotypic annotation data downloaded from the Mouse Genome Informatics database were analyzed for these genes, and the results showed that many of them are important for proper embryo development and infant survival, proper growth, and increase in body size, as well as for voluntary movement functions, motor coordination, and balance. The results also indicated an association with seizures that have primarily been characterized by uncontrolled motor activity and the development of proper grooming abilities. The complete list of genes and their phenotypic annotation data have been compiled in a file for easy access.
|
41
|
Stoeger T, Nunes Amaral LA. The characteristics of early-stage research into human genes are substantially different from subsequent research. PLoS Biol 2022; 20:e3001520. [PMID: 34990452 PMCID: PMC8769369 DOI: 10.1371/journal.pbio.3001520] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/22/2021] [Revised: 01/19/2022] [Accepted: 12/21/2021] [Indexed: 11/19/2022] Open
Abstract
Throughout the last 2 decades, several scholars observed that present day research into human genes rarely turns toward genes that had not already been extensively investigated in the past. Guided by hypotheses derived from studies of science and innovation, we present here a literature-wide data-driven meta-analysis to identify the specific scientific and organizational contexts that coincided with early-stage research into human genes throughout the past half century. We demonstrate that early-stage research into human genes differs in team size, citation impact, funding mechanisms, and publication outlet, but that generalized insights derived from studies of science and innovation only partially apply to early-stage research into human genes. Further, we demonstrate that, presently, genome biology accounts for most of the initial early-stage research, while subsequent early-stage research can engage other life sciences fields. We therefore anticipate that the specificity of our findings will enable scientists and policymakers to better promote early-stage research into human genes and increase overall innovation within the life sciences.
Affiliation(s)
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, Illinois, United States of America
- Center for Genetic Medicine, Northwestern University, Chicago, Illinois, United States of America
| | - Luís A. Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, Illinois, United States of America
- Department of Molecular Bioscience, Northwestern University, Evanston, Illinois, United States of America
- Department of Physics and Astronomy, Northwestern University, Evanston, Illinois, United States of America
- Department of Medicine, Northwestern University School of Medicine, Chicago, Illinois, United States of America
| |
|
42
|
Pust MM, Tümmler B. Bacterial low-abundant taxa are key determinants of a healthy airway metagenome in the early years of human life. Comput Struct Biotechnol J 2021; 20:175-186. [PMID: 35024091 PMCID: PMC8713036 DOI: 10.1016/j.csbj.2021.12.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/11/2021] [Revised: 12/06/2021] [Accepted: 12/06/2021] [Indexed: 11/17/2022] Open
Abstract
The default removal of low-abundance (rare) taxa from microbial community analyses may lead to an incomplete picture of the taxonomic and functional microbial potential within the human habitat. Publicly available shotgun metagenomics data of healthy children and children with cystic fibrosis (CF) were reanalysed to study the development of the rare species biosphere, which was here defined by either the 15th, 25th or 35th species abundance percentile. We found that healthy children contained an age-independent network of abundant (core) and rare species with both entities being essential in maintaining the network structure. The protein sequence usage for more than 100 bacterial metabolic pathways differed between the core and rare species biosphere. In CF children, the background structure was underdeveloped and random forest bootstrapping based on all constituents of the early airway metagenome and host-associated factors indicated that rare taxa were the most important variables in deciding whether a child was healthy or suffered from the life-limiting CF disease. Attempts failed to make the age-independent CF network as robust as the healthy structure when an increasing number of bacterial taxa from the healthy network was incorporated into the CF structure by computer-based model simulations. However, the transfer of a key combination of taxa from the healthy to the CF network structure with high species diversity and low species dominance, correlated with a more robust CF network and a topological approximation of CF and healthy graph structures. Rothia mucilaginosa, Streptococci and rare species were essential in improving the underdeveloped CF network.
Affiliation(s)
- Marie-Madlen Pust
- Department of Paediatric Pneumology, Allergology, and Neonatology, Hannover Medical School (MHH), Germany
- Biomedical Research in Endstage and Obstructive Lung Disease Hannover (BREATH), German Center for Lung Research, Hannover Medical School, Germany
| | - Burkhard Tümmler
- Department of Paediatric Pneumology, Allergology, and Neonatology, Hannover Medical School (MHH), Germany
- Biomedical Research in Endstage and Obstructive Lung Disease Hannover (BREATH), German Center for Lung Research, Hannover Medical School, Germany
| |
|
43
|
Chen J, Geard N, Zobel J, Verspoor K. Automatic consistency assurance for literature-based gene ontology annotation. BMC Bioinformatics 2021; 22:565. [PMID: 34823464 PMCID: PMC8620237 DOI: 10.1186/s12859-021-04479-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 11/15/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. RESULTS: In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide a detailed error analysis demonstrating that the method achieves high precision on more confident predictions. CONCLUSIONS: Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios.
Affiliation(s)
- Jiyu Chen
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
| | - Nicholas Geard
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
| | - Justin Zobel
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
| | - Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
- School of Computing Technologies, RMIT University, Melbourne, VIC, 3000, Australia
| |
|
44
|
Labour classified by cervical dilatation & fetal membrane rupture demonstrates differential impact on RNA-seq data for human myometrium tissues. PLoS One 2021; 16:e0260119. [PMID: 34797869 PMCID: PMC8604334 DOI: 10.1371/journal.pone.0260119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2021] [Accepted: 11/02/2021] [Indexed: 12/13/2022] Open
Abstract
High throughput sequencing has previously identified differentially expressed genes (DEGs) and enriched signalling networks in human myometrium for term (≥37 weeks) gestation labour, when defined as a singular state of activity in comparison with the non-labouring state. However, transcriptome changes that occur during transition from early to established labour (defined as ≤3 and >3 cm cervical dilatation, respectively) and potentially altered by fetal membrane rupture (ROM), when adapting from onset to completion of childbirth, remained to be defined. In the present study, we assessed whether differences for these two clinically observable factors of labour are associated with different myometrial transcriptome profiles. Analysis of our tissue (‘bulk’) RNA-seq data (NCBI Gene Expression Omnibus: GSE80172) with classification of labour into four groups, each compared to the same non-labour group, identified more DEGs for early than established labour; ROM was the strongest up-regulator of DEGs. We propose that the lower DEG frequency for early labour and/or ROM-negative myometrium is attributable to bulk RNA-seq limitations associated with tissue heterogeneity, as well as the possibility that processes other than gene transcription are of more importance at labour onset. Integrative analysis with future data from additional samples, which have at least equivalent refined clinical classification for labour status, and alternative omics approaches will help to explain what truly contributes to transcriptomic changes that are critical for labour onset. Lastly, we identified five DEGs common to all labour groupings; two of which (AREG and PER3) were validated by qPCR and not differentially expressed in placenta and choriodecidua.
|
45
|
Multi-omics mapping of human papillomavirus integration sites illuminates novel cervical cancer target genes. Br J Cancer 2021; 125:1408-1419. [PMID: 34526665 PMCID: PMC8575955 DOI: 10.1038/s41416-021-01545-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/09/2021] [Revised: 08/04/2021] [Accepted: 08/26/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Integration of human papillomavirus (HPV) into the host genome is a dominant feature of invasive cervical cancer (ICC), yet the tumorigenicity of cis genomic changes at integration sites remains largely understudied. METHODS: Combining multi-omics data from The Cancer Genome Atlas with patient-matched long-read sequencing of HPV integration sites, we developed a strategy for using HPV integration events to identify and prioritise novel candidate ICC target genes (integration-detected genes (IDGs)). Four IDGs were then chosen for in vitro functional studies employing small interfering RNA-mediated knockdown in cell migration, proliferation and colony formation assays. RESULTS: PacBio data revealed 267 unique human-HPV breakpoints comprising 87 total integration events in eight tumours. Candidate IDGs were filtered based on the following criteria: (1) proximity to integration site, (2) clonal representation of integration event, (3) tumour-specific expression (Z-score) and (4) association with ICC survival. Four candidates prioritised based on their unknown function in ICC (BNC1, RSBN1, USP36 and TAOK3) exhibited oncogenic properties in cervical cancer cell lines. Further, annotation of integration events provided clues regarding potential mechanisms underlying altered IDG expression in both integrated and non-integrated ICC tumours. CONCLUSIONS: HPV integration events can guide the identification of novel IDGs for further study in cervical carcinogenesis and as putative therapeutic targets.
|
46
|
Stupp D, Sharon E, Bloch I, Zitnik M, Zuk O, Tabach Y. Co-evolution based machine-learning for predicting functional interactions between human genes. Nat Commun 2021; 12:6454. [PMID: 34753957 PMCID: PMC8578642 DOI: 10.1038/s41467-021-26792-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/26/2020] [Accepted: 10/09/2021] [Indexed: 12/20/2022] Open
Abstract
Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and answer evolutionary questions. Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il. With the rise in the number of eukaryotic species being fully sequenced, large-scale phylogenetic profiling can give insights into gene function. Here, the authors describe a machine-learning approach that integrates co-evolution across eukaryotic clades to predict gene function and functional interactions among human genes.
Affiliation(s)
- Doron Stupp
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
| | - Elad Sharon
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
| | - Idit Bloch
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA, 02115, USA
| | - Or Zuk
- Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, 9190501, Israel.
| | - Yuval Tabach
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel.
|
47
|
Dalvie S, Chatzinakos C, Al Zoubi O, Georgiadis F, Lancashire L, Daskalakis NP. From genetics to systems biology of stress-related mental disorders. Neurobiol Stress 2021; 15:100393. [PMID: 34584908 PMCID: PMC8456113 DOI: 10.1016/j.ynstr.2021.100393] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/04/2021] [Revised: 07/22/2021] [Accepted: 09/08/2021] [Indexed: 01/20/2023] Open
Abstract
Many individuals will be exposed to some form of traumatic stress in their lifetime which, in turn, increases the likelihood of developing stress-related disorders such as post-traumatic stress disorder (PTSD), major depressive disorder (MDD) and anxiety disorders (ANX). The development of these disorders is also influenced by genetics, with heritability estimates ranging between ∼30 and 70%. In this review, we provide an overview of the findings of genome-wide association studies for PTSD, depression and ANX, and we observe a clear genetic overlap between these three diagnostic categories. We go on to highlight the results from transcriptomic and epigenomic studies, and, given the multifactorial nature of stress-related disorders, we provide an overview of the gene-environment studies that have been conducted to date. Finally, we discuss systems biology approaches that are now seeing wider utility in determining a more holistic view of these complex disorders.
Affiliation(s)
- Shareefa Dalvie
- South African Medical Research Council (SAMRC), Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC), Unit on Child & Adolescent Health, Department of Paediatrics and Child Health, University of Cape Town, Cape Town, South Africa
| | - Chris Chatzinakos
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | - Obada Al Zoubi
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | - Foivos Georgiadis
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | - Lee Lancashire
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
- Department of Data Science, Cohen Veterans Bioscience, New York, USA
| | - Nikolaos P. Daskalakis
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
|
48
|
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (N Y) 2021; 2:100141. [PMID: 34557778 PMCID: PMC8454663 DOI: 10.1016/j.xinn.2021.100141] [Citation(s) in RCA: 2915] [Impact Index Per Article: 971.7] [Received: 05/08/2021] [Accepted: 06/29/2021] [Indexed: 12/15/2022] Open
Abstract
Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms. clusterProfiler supports exploring functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.
Affiliation(s)
- Tianzhi Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Erqiang Hu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Shuangbin Xu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Meijun Chen
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Pingfan Guo
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Zehan Dai
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Tingze Feng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Lang Zhou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Wenli Tang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Li Zhan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Xiaocong Fu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Shanshan Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
| | - Guangchuang Yu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Guangdong Provincial Key Laboratory of Proteomics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510515, China
|
49
|
Albaradei S, Napolitano F, Thafar MA, Gojobori T, Essack M, Gao X. MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data. Comput Struct Biotechnol J 2021; 19:4404-4411. [PMID: 34429856 PMCID: PMC8368987 DOI: 10.1016/j.csbj.2021.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/17/2021] [Revised: 07/19/2021] [Accepted: 08/06/2021] [Indexed: 02/09/2023] Open
Abstract
Predicting metastasis in the early stages means that clinicians have more time to adjust a treatment regimen to target the primary and metastasized cancer. In this regard, several computational approaches are being developed to identify metastasis early. However, most of the approaches focus on changes on one genomic level only, and they are not being developed from a pan-cancer perspective. Thus, we here present a deep learning (DL)-based model, MetaCancer, that differentiates pan-cancer metastasis status based on three heterogeneous data layers. In particular, we built the DL-based model using 400 patients' data that includes RNA sequencing (RNA-Seq), microRNA sequencing (microRNA-Seq), and DNA methylation data from The Cancer Genome Atlas (TCGA). We quantitatively assess the proposed convolutional variational autoencoder (CVAE) and alternative feature extraction methods. We further show that integrating mRNA, microRNA, and DNA methylation data as features improves our model's performance compared to when we used mRNA data only. In addition, we show that the mRNA-related features make a more significant contribution when attempting to distinguish the primary tumors from metastatic ones computationally. Lastly, we show that our DL model significantly outperformed a machine learning (ML) ensemble method based on various metrics.
Affiliation(s)
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Francesco Napolitano
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
|
50
|
Watson J, Schwartz JM, Francavilla C. Using Multilayer Heterogeneous Networks to Infer Functions of Phosphorylated Sites. J Proteome Res 2021; 20:3532-3548. [PMID: 34164982 PMCID: PMC8256419 DOI: 10.1021/acs.jproteome.1c00150] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/21/2021] [Indexed: 01/23/2023]
Abstract
Mass spectrometry-based quantitative phosphoproteomics has become an essential approach in the study of cellular processes such as signaling. Commonly used methods to analyze phosphoproteomics datasets depend on generic, gene-centric annotations such as Gene Ontology terms, which do not account for the function of a protein in a particular phosphorylation state. Analysis of phosphoproteomics data is hampered by a lack of phosphorylated site-specific annotations. We propose a method that combines shotgun phosphoproteomics data, protein-protein interactions, and functional annotations into a heterogeneous multilayer network. Phosphorylation sites are associated to potential functions using a random walk on the heterogeneous network (RWHN) algorithm. We validated our approach against a model of the MAPK/ERK pathway and functional annotations from PhosphoSitePlus and were able to associate differentially regulated sites on the same proteins to their previously described specific functions. We further tested the algorithm on three previously published datasets and were able to reproduce their experimentally validated conclusions and to associate phosphorylation sites with known functions based on their regulatory patterns. Our approach provides a refinement of commonly used analysis methods and accurately predicts context-specific functions for sites with similar phosphorylation profiles.
Affiliation(s)
- Joanne Watson
- Division of Evolution & Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
- Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
| | - Jean-Marc Schwartz
- Division of Evolution & Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
| | - Chiara Francavilla
- Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
|