1
Seddon AR, Damiano OM, Hampton MB, Stevens AJ. Widespread genomic de novo DNA methylation occurs following CD8+ T cell activation and proliferation. Epigenetics 2024; 19:2367385. [PMID: 38899429 PMCID: PMC11195465 DOI: 10.1080/15592294.2024.2367385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/05/2024] [Indexed: 06/21/2024] Open
Abstract
This research investigates the intricate dynamics of DNA methylation in the hours following CD8+ T cell activation, during a critical yet understudied temporal window. DNA methylation is an epigenetic modification central to regulation of gene expression and directing immune responses. Our investigation spanned 96-h post-activation and unveils a nuanced tapestry of global and site-specific methylation changes. We identified 15,626 significant differentially methylated CpGs spread across the genome, with the most significant changes occurring within the genes ADAM10, ICA1, and LAPTM5. While many changes had modest effect sizes, approximately 120 CpGs exhibited a log2FC above 1.5, with cell activation and proliferation pathways the most affected. Relatively few of the differentially methylated CpGs occurred along adjacent gene regions. The exceptions were seven differentially methylated gene regions, with the Human T cell Receptor Alpha Joining Genes demonstrating consistent methylation change over a 3kb window. We also investigated whether an inflammatory environment could alter DNA methylation during activation, with proliferating cells exposed to the oxidant glycine chloramine. No substantial differential methylation was observed in this context. The temporal perspective of early activation adds depth to the evolving field of epigenetic immunology, offering insights with implications for therapeutic innovation and expanding our understanding of epigenetic modulation in immune function.
Affiliation(s)
- Annika R. Seddon
  - Department of Pathology and Biomedical Science, Mātai Hāora - Centre for Redox Biology and Medicine, University of Otago, Christchurch, New Zealand
- Olivia M. Damiano
  - Department of Pathology and Molecular Medicine, Genetics and Epigenetics Research Group, University of Otago, Wellington, New Zealand
- Mark B. Hampton
  - Department of Pathology and Biomedical Science, Mātai Hāora - Centre for Redox Biology and Medicine, University of Otago, Christchurch, New Zealand
- Aaron J. Stevens
  - Department of Pathology and Molecular Medicine, Genetics and Epigenetics Research Group, University of Otago, Wellington, New Zealand
2
Yazdani K, Mousapour R, Hayes WB. New GO-based measures in multiple network alignment. Bioinformatics 2024; 40:btae476. [PMID: 39082966 PMCID: PMC11310457 DOI: 10.1093/bioinformatics/btae476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/11/2024] [Accepted: 07/30/2024] [Indexed: 08/10/2024] Open
Abstract
MOTIVATION Protein-protein interaction (PPI) networks provide valuable insights into the function of biological systems. Aligning multiple PPI networks may expose relationships beyond those observable by pairwise comparisons. However, assessing the biological quality of multiple network alignments is a challenging problem. RESULTS We propose two new measures to evaluate the quality of multiple network alignments using functional information from Gene Ontology (GO) terms. When aligning multiple real PPI networks across species, we observe that both measures are highly correlated with objective quality indicators, such as common orthologs. Additionally, our measures strongly correlate with an alignment's ability to predict novel GO annotations, which is a unique advantage over existing GO-based measures. AVAILABILITY AND IMPLEMENTATION The scripts and the links to the raw and alignment data can be accessed at https://github.com/kimiayazdani/GO_Measures.git.
Affiliation(s)
- Kimia Yazdani
  - Department of Computer Science, University of California, Irvine, CA 92697-3435, United States
- Reza Mousapour
  - Department of Computer Engineering, Sharif University of Technology, Tehran 1458889694, Iran
- Wayne B Hayes
  - Department of Computer Science, University of California, Irvine, CA 92697-3435, United States
3
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging nearby genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
4
Ludwig J, Mrázek J. OrthoRefine: automated enhancement of prior ortholog identification via synteny. BMC Bioinformatics 2024; 25:163. [PMID: 38664637 PMCID: PMC11044567 DOI: 10.1186/s12859-024-05786-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 04/15/2024] [Indexed: 04/29/2024] Open
Abstract
BACKGROUND Identifying orthologs continues to be an early and imperative step in genome analysis but remains a challenging problem. While synteny (conservation of gene order) has previously been used independently and in combination with other methods to identify orthologs, applying synteny in ortholog identification has yet to be automated in a user-friendly manner. This desire for automation and ease-of-use led us to develop OrthoRefine, a standalone program that uses synteny to refine ortholog identification. RESULTS We developed OrthoRefine to improve the detection of orthologous genes by implementing a look-around window approach to detect synteny. We tested OrthoRefine in tandem with OrthoFinder, one of the most used software for identification of orthologs in recent years. We evaluated improvements provided by OrthoRefine in several bacterial and a eukaryotic dataset. OrthoRefine efficiently eliminates paralogs from orthologous groups detected by OrthoFinder. Using synteny increased specificity and functional ortholog identification; additionally, analysis of BLAST e-value, phylogenetics, and operon occurrence further supported using synteny for ortholog identification. A comparison of several window sizes suggested that smaller window sizes (eight genes) were generally the most suitable for identifying orthologs via synteny. However, larger windows (30 genes) performed better in datasets containing less closely related genomes. A typical run of OrthoRefine with ~ 10 bacterial genomes can be completed in a few minutes on a regular desktop PC. CONCLUSION OrthoRefine is a simple-to-use, standalone tool that automates the application of synteny to improve ortholog detection. OrthoRefine is particularly efficient in eliminating paralogs from orthologous groups delineated by standard methods.
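The look-around window idea in this abstract can be illustrated with a minimal sketch. This is not OrthoRefine's implementation: the function names, the `flank` parameter (genes inspected on each side of the candidate gene), the toy gene orders, and the ratio-based score are all assumptions made for illustration. The sketch scores a candidate ortholog pair by the fraction of neighbours of the first gene whose orthologous group also appears near the second gene, so a paralog located in an unrelated neighbourhood scores low.

```python
from typing import Dict, List


def window(order: List[str], gene: str, flank: int) -> List[str]:
    """Return up to `flank` genes on each side of `gene`, excluding `gene` itself."""
    i = order.index(gene)
    lo = max(0, i - flank)
    return order[lo:i] + order[i + 1:i + 1 + flank]


def synteny_ratio(gene_a: str, gene_b: str,
                  order_a: List[str], order_b: List[str],
                  group_of: Dict[str, str], flank: int) -> float:
    """Fraction of gene_a's neighbours whose orthologous group also occurs near gene_b."""
    neigh_a = window(order_a, gene_a, flank)
    groups_b = {group_of[g] for g in window(order_b, gene_b, flank) if g in group_of}
    if not neigh_a:
        return 0.0
    matched = sum(1 for g in neigh_a if group_of.get(g) in groups_b)
    return matched / len(neigh_a)


# Hypothetical toy data: genome B carries b3 plus a paralog b9 in another neighbourhood.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5", "x1", "b9", "x2"]
group_of = {
    "a1": "G1", "b1": "G1", "a2": "G2", "b2": "G2",
    "a3": "G3", "b3": "G3", "b9": "G3",
    "a4": "G4", "b4": "G4", "a5": "G5", "b5": "G5",
    "x1": "G8", "x2": "G9",
}

syntenic = synteny_ratio("a3", "b3", order_a, order_b, group_of, flank=2)
paralog = synteny_ratio("a3", "b9", order_a, order_b, group_of, flank=2)
print(syntenic, paralog)
```

In this toy case the syntenic pair (a3, b3) shares its whole neighbourhood, while the paralog b9 shares only one neighbouring group, which is the kind of contrast a synteny filter could use to drop paralogs from an orthologous group.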
Affiliation(s)
- J Ludwig
  - Institute of Bioinformatics, The University of Georgia, Athens, GA, 30602, USA
- J Mrázek
  - Department of Microbiology and Institute of Bioinformatics, The University of Georgia, Athens, GA, 30602, USA
5
Helbing DL, Haas F, Cirri E, Rahnis N, Dau TTD, Kelmer Sacramento E, Oraha N, Böhm L, Lajqi T, Fehringer P, Morrison H, Bauer R. Impact of inflammatory preconditioning on murine microglial proteome response induced by focal ischemic brain injury. Front Immunol 2024; 15:1227355. [PMID: 38655254 PMCID: PMC11036884 DOI: 10.3389/fimmu.2024.1227355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 03/11/2024] [Indexed: 04/26/2024] Open
Abstract
Preconditioning with lipopolysaccharide (LPS) induces neuroprotection against subsequent cerebral ischemic injury, mainly involving innate immune pathways. Microglia are resident immune cells of the central nervous system (CNS) that respond early to danger signals through memory-like differential reprogramming. However, the cell-specific molecular mechanisms underlying preconditioning are not fully understood. To elucidate the distinct molecular mechanisms of preconditioning on microglia, we compared cell-specific proteomic profiles with and without LPS preconditioning, followed by transient focal brain ischemia and reperfusion, using an established mouse model. A proteomic workflow, based on isolated microglia obtained from mouse brains by cell sorting and coupled to mass spectrometry for identification and quantification, was applied. Our data confirm that LPS preconditioning induces marked neuroprotection, as indicated by a significant reduction in brain infarct volume. The established brain cell separation method was suitable for obtaining an enriched microglial cell fraction for valid proteomic analysis. The results show a significant impact of LPS preconditioning on microglial proteome patterns by type I interferons, presumably driven by the interferon cluster regulator proteins signal transducer and activator of transcription 1/2 (STAT1/2).
Affiliation(s)
- Dario Lucas Helbing
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
  - Department of Psychiatry and Psychotherapy, Jena University Hospital, Friedrich Schiller University Jena, Jena, Germany
  - Center for Intervention and Research on Adaptive and Maladaptive Brain Circuits Underlying Mental Health (C-I-R-C), Jena-Magdeburg-Halle, Jena, Germany
  - German Center for Mental Health (DZPG), Site Halle-Jena-Magdeburg, Jena, Germany
- Fabienne Haas
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- Emilio Cirri
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
- Norman Rahnis
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
- Nova Oraha
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
- Leopold Böhm
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
  - Department of Microbiology and Hospital Hygiene, Bundeswehr Central Hospital Koblenz, Koblenz, Germany
- Trim Lajqi
  - Department of Neonatology, Heidelberg University Children’s Hospital, Heidelberg, Germany
- Pascal Fehringer
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- Helen Morrison
  - Leibniz Institute on Aging, Fritz Lipmann Institute, Jena, Germany
  - Faculty of Biological Sciences, Friedrich-Schiller University, Jena, Germany
- Reinhard Bauer
  - Institute of Molecular Cell Biology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
6
Hayes WB. Exact p-values for global network alignments via combinatorial analysis of shared GO terms: REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology. J Math Biol 2024; 88:50. [PMID: 38551701 PMCID: PMC10980677 DOI: 10.1007/s00285-024-02058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2020] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no "gold standard" exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown's Method. We note that, just as with BLAST's p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
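The single-term p-value described in this abstract can be sketched under a simplified null model. This is an illustration, not the paper's exact derivation. The assumptions here are that every protein of the smaller network is mapped to a distinct protein of the larger network (`n2` proteins), that random alignments are uniform over such mappings, and that `lam1` and `lam2` proteins carry GO term g in the two networks. Under those assumptions the number of aligned pairs sharing g follows a hypergeometric distribution, and the p-value is its upper tail. The function and parameter names are hypothetical.

```python
from math import comb


def shared_go_term_p_value(k: int, n2: int, lam1: int, lam2: int) -> float:
    """Upper-tail probability P(X >= k) under a uniform random 1-to-1 alignment.

    X counts aligned pairs in which both proteins carry the GO term.
    n2   : number of proteins in the larger network (assumed alignment target)
    lam1 : proteins annotated with the term in the smaller network
    lam2 : proteins annotated with the term in the larger network
    """
    if not (0 <= lam1 <= n2 and 0 <= lam2 <= n2):
        raise ValueError("annotation counts must lie between 0 and n2")
    upper = min(lam1, lam2)           # X can never exceed either annotation count
    total = comb(n2, lam1)            # ways to place the lam1 annotated proteins
    tail = sum(
        comb(lam2, j) * comb(n2 - lam2, lam1 - j)
        for j in range(max(k, 0), upper + 1)
    )
    return tail / total


# Toy example: 4 target proteins, 2 annotated on each side, both pairs share the term.
p = shared_go_term_p_value(k=2, n2=4, lam1=2, lam2=2)
print(p)
```

Summing exact integer binomials before a single division keeps the tail probability free of accumulated floating-point error for small and moderate network sizes.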
Affiliation(s)
- Wayne B Hayes
  - Department of Computer Science, UC Irvine, Irvine, USA
7
Ayub U, Naveed H. GSLAlign: community detection and local PPI network alignment. J Biomol Struct Dyn 2024:1-9. [PMID: 38214492 DOI: 10.1080/07391102.2024.2301757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 12/29/2023] [Indexed: 01/13/2024]
Abstract
High throughput protein-protein interaction (PPI) profiling and computational techniques have resulted in generating a large amount of PPI network data. The study of PPI networks helps in understanding the biological processes of the proteins. The comparative study of the PPI networks helps in identifying the conserved interactions across the species. This article presents a novel local PPI network aligner 'GSLAlign' that consists of two stages. It first detects the communities from the PPI networks by applying the GraphSAGE algorithm using gene expression data. In the second stage, the detected communities are aligned using a community aligner that is based on protein sequence similarity. The community detection algorithm produces more separable and biologically accurate communities as compared to previous community detection algorithms. Moreover, the proposed community alignment algorithm achieves 3-8% better results in terms of semantic similarity as compared to previous local aligners. The average connectivity and coverage of the proposed algorithm are also better than the existing aligners. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Umair Ayub
  - Department of Computer Science, Bahria University, Lahore, Pakistan
- Hammad Naveed
  - National University of Computer and Emerging Sciences, Lahore, Pakistan and Computational Biology Research Lab, National University of Computer and Emerging Sciences, Lahore, Pakistan
8
Altenhoff AM, Warwick Vesztrocy A, Bernard C, Train CM, Nicheperovich A, Prieto Baños S, Julca I, Moi D, Nevers Y, Majidian S, Dessimoz C, Glover NM. OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem. Nucleic Acids Res 2024; 52:D513-D521. [PMID: 37962356 PMCID: PMC10767875 DOI: 10.1093/nar/gkad1020] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023] Open
Abstract
In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Alex Warwick Vesztrocy
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Charles Bernard
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Clement-Marie Train
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Alina Nicheperovich
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Silvia Prieto Baños
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Irene Julca
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- David Moi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
9
Fernando PC, Mabee PM, Zeng E. Protein-protein interaction network module changes associated with the vertebrate fin-to-limb transition. Sci Rep 2023; 13:22594. [PMID: 38114646 PMCID: PMC10730527 DOI: 10.1038/s41598-023-50050-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] Open
Abstract
Evolutionary phenotypic transitions, such as the fin-to-limb transition in vertebrates, result from modifications in related proteins and their interactions, often in response to changing environment. Identifying these alterations in protein networks is crucial for a more comprehensive understanding of these transitions. However, previous research has not attempted to compare protein-protein interaction (PPI) networks associated with evolutionary transitions, and most experimental studies concentrate on a limited set of proteins. Therefore, the goal of this work was to develop a network-based platform for investigating the fin-to-limb transition using PPI networks. Quality-enhanced protein networks, constructed by integrating PPI networks with anatomy ontology data, were leveraged to compare protein modules for paired fins (pectoral fin and pelvic fin) of fishes (zebrafish) to those of the paired limbs (forelimb and hindlimb) of mammals (mouse). This also included prediction of novel protein candidates and their validation by enrichment and homology analyses. Hub proteins such as shh and bmp4, which are crucial for module stability, were identified, and their changing roles throughout the transition were examined. Proteins with preserved roles during the fin-to-limb transition were more likely to be hub proteins. This study also addressed hypotheses regarding the role of non-preserved proteins associated with the transition.
Affiliation(s)
- Pasan C Fernando
  - Department of Plant Sciences, University of Colombo, Colombo, Sri Lanka
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle, 1625 38th St. #100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Departments of Preventive & Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Departments of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Departments of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
10
Joo S, Dhaygude K, Westerberg S, Krebs R, Puhka M, Holmström E, Syrjälä S, Nykänen AI, Lemström K. Transcriptomic Landscape of Circulating Extracellular Vesicles in Heart Transplant Ischemia-Reperfusion. Genes (Basel) 2023; 14:2101. [PMID: 38003044 PMCID: PMC10671425 DOI: 10.3390/genes14112101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/14/2023] [Accepted: 11/16/2023] [Indexed: 11/26/2023] Open
Abstract
Ischemia-reperfusion injury (IRI) is an inevitable event during heart transplantation, which is known to exacerbate damage to the allograft. However, the precise mechanisms underlying IRI remain incompletely understood. Here, we profiled the whole transcriptome of plasma extracellular vesicles (EVs) by RNA sequencing from 41 heart transplant recipients immediately before and at 12 h after transplant reperfusion. We found that the expression of 1317 protein-coding genes in plasma EVs was changed at 12 h after reperfusion. Upregulated genes of plasma EVs were related to metabolism and immune activation, while downregulated genes were related to cell survival and extracellular matrix organization. In addition, we performed correlation analyses between EV transcriptome and intensity of graft IRI (i.e., cardiomyocyte injury), as well as EV transcriptome and primary graft dysfunction, as well as any biopsy-proven acute rejection after heart transplantation. We ultimately revealed that at 12 h after reperfusion, 4 plasma EV genes (ITPKA, DDIT4L, CD19, and CYP4A11) correlated with both cardiomyocyte injury and primary graft dysfunction, suggesting that EVs are sensitive indicators of reperfusion injury reflecting lipid metabolism-induced stress and imbalance in calcium homeostasis. In conclusion, we show that profiling plasma EV gene expression may enlighten the mechanisms of heart transplant IRI.
Affiliation(s)
- SeoJeong Joo
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
- Kishor Dhaygude
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
- Sofie Westerberg
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
- Rainer Krebs
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
- Maija Puhka
  - Institute for Molecular Medicine Finland FIMM, EV and HiPREP Core, University of Helsinki, 00014 Helsinki, Finland
- Emil Holmström
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
- Simo Syrjälä
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
  - Heart and Lung Center, Helsinki University Hospital, University of Helsinki, 00014 Helsinki, Finland
- Antti I. Nykänen
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
  - Heart and Lung Center, Helsinki University Hospital, University of Helsinki, 00014 Helsinki, Finland
- Karl Lemström
  - Translational Immunology Research Program, Transplantation Laboratory, University of Helsinki, 00014 Helsinki, Finland
  - Heart and Lung Center, Helsinki University Hospital, University of Helsinki, 00014 Helsinki, Finland
11
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023; 6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023] Open
Abstract
Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, substantially outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.
Affiliation(s)
- Nabil Ibtehaz
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
12
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. bioRxiv 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, significantly outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.
Affiliation(s)
- Nabil Ibtehaz
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
|
13
|
Jablonski KP, Beerenwinkel N. Coherent pathway enrichment estimation by modeling inter-pathway dependencies using regularized regression. Bioinformatics 2023; 39:btad522. [PMID: 37610338 PMCID: PMC10471899 DOI: 10.1093/bioinformatics/btad522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 07/04/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open
Abstract
MOTIVATION Gene set enrichment methods are a common tool to improve the interpretability of gene lists as obtained, for example, from differential gene expression analyses. They are based on computing whether dysregulated genes are located in certain biological pathways more often than expected by chance. Gene set enrichment tools rely on pre-existing pathway databases such as KEGG, Reactome, or the Gene Ontology. These databases are increasing in size and in the number of redundancies between pathways, which complicates the statistical enrichment computation. RESULTS We address this problem and develop a novel gene set enrichment method, called pareg, which is based on a regularized generalized linear model and directly incorporates dependencies between gene sets related to certain biological functions, for example, due to shared genes, in the enrichment computation. We show that pareg is more robust to noise than competing methods. Additionally, we demonstrate the ability of our method to recover known pathways as well as to suggest novel treatment targets in an exploratory analysis using breast cancer samples from TCGA. AVAILABILITY AND IMPLEMENTATION pareg is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/pareg.html) as well as on https://github.com/cbg-ethz/pareg. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here.
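The notion of genes falling into a pathway "more often than expected by chance" can be illustrated with the classical over-representation test. The sketch below is not pareg, which the abstract describes as a regularized generalized linear model. It is only a minimal hypergeometric baseline of the kind such methods are compared against, and the function name and toy numbers are assumptions made for illustration.

```python
from math import comb


def enrichment_pvalue(n_universe, n_pathway, n_hits, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_universe: number of genes considered in total
    n_pathway:  number of those genes annotated to the pathway
    n_hits:     number of dysregulated genes
    n_overlap:  dysregulated genes that are also in the pathway
    Returns P(X >= n_overlap) when n_hits genes are drawn at random.
    """
    max_overlap = min(n_pathway, n_hits)
    total = comb(n_universe, n_hits)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (hypothetical): 20,000 genes, a 100-gene pathway,
    # 200 dysregulated genes, 10 of which lie in the pathway.
    print(enrichment_pvalue(20000, 100, 200, 10))
```

Because this test treats each pathway independently, overlapping gene sets tend to be reported together, which is the redundancy problem the abstract sets out to address.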
Affiliation(s)
- Kim Philipp Jablonski
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
|
14
|
Grenko CM, Bonnycastle LL, Taylor HJ, Yan T, Swift AJ, Robertson CC, Narisu N, Erdos MR, Collins FS, Taylor DL. Single-cell transcriptomic profiling of human pancreatic islets reveals genes responsive to glucose exposure over 24 hours. bioRxiv 2023:2023.06.06.543931. [PMID: 37333221 PMCID: PMC10274787 DOI: 10.1101/2023.06.06.543931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Disruption of pancreatic islet function and glucose homeostasis can lead to the development of sustained hyperglycemia, beta cell glucotoxicity, and ultimately type 2 diabetes (T2D). In this study, we sought to explore the effects of hyperglycemia on human pancreatic islet (HPI) gene expression by exposing HPIs from two donors to low (2.8mM) and high (15.0mM) glucose concentrations over 24 hours, assaying the transcriptome at seven time points using single-cell RNA sequencing (scRNA-seq). We modeled time as both a discrete and continuous variable to determine momentary and longitudinal changes in transcription associated with islet time in culture or glucose exposure. Across all cell types, we identified 1,528 genes associated with time, 1,185 genes associated with glucose exposure, and 845 genes associated with interaction effects between time and glucose. We clustered differentially expressed genes across cell types and found 347 modules of genes with similar expression patterns across time and glucose conditions, including two beta cell modules enriched in genes associated with T2D. Finally, by integrating genomic features from this study and genetic summary statistics for T2D and related traits, we nominate 363 candidate effector genes that may underlie genetic associations for T2D and related traits.
Affiliation(s)
- Caleb M. Grenko
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lori L. Bonnycastle
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Henry J. Taylor
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Tingfen Yan
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Amy J. Swift
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Catherine C. Robertson
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Narisu Narisu
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Michael R. Erdos
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Francis S. Collins
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- D. Leland Taylor
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
15
|
Liu X, Gao L, Peng Y, Fang Z, Wang J. PheSom: a term frequency-based method for measuring human phenotype similarity on the basis of MeSH vocabulary. Front Genet 2023; 14:1185790. [PMID: 37496714 PMCID: PMC10366691 DOI: 10.3389/fgene.2023.1185790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Background: Phenotype similarity calculation can be used to help improve drug repurposing. In this study, based on the MeSH terms describing the phenotypes deposited in OMIM, we proposed a method, PheSom (Phenotype Similarity On MeSH), to measure the similarity between phenotypes. PheSom counted the number of overlapping MeSH terms between two phenotypes and then weighted every MeSH term within each phenotype according to its term frequency-inverse document frequency (TF-IDF). Phenotype-related genes were used to evaluate the method. Results: A 7,739 × 7,739 similarity score matrix was obtained, and the number of phenotype pairs decreased sharply as the similarity score increased. In addition, the overlap rate of phenotype-related genes rose markedly with the similarity score between phenotypes, which supports the reliability of the method. Conclusion: We anticipate that the method can be applied to identifying novel therapeutic approaches for complex diseases.
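A term-frequency-weighted overlap of MeSH terms can be sketched as follows. The abstract does not give PheSom's exact scoring formula, so the choices here (relative term frequency, natural-log inverse document frequency, cosine similarity) are assumptions, as are the function names and the toy phenotypes.

```python
import math
from collections import Counter


def tfidf_vectors(phenotype_terms):
    """Map each phenotype to a {MeSH term: TF-IDF weight} dictionary.

    phenotype_terms: dict of phenotype name -> list of MeSH terms
    (a term may repeat if it is mentioned several times).
    """
    n_docs = len(phenotype_terms)
    doc_freq = Counter()
    for terms in phenotype_terms.values():
        doc_freq.update(set(terms))  # count each term once per phenotype

    vectors = {}
    for name, terms in phenotype_terms.items():
        counts = Counter(terms)
        total = len(terms)
        vectors[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse weight dictionaries."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical phenotypes annotated with MeSH-like terms.
    phenotypes = {
        "P1": ["Seizures", "Ataxia"],
        "P2": ["Seizures", "Ataxia", "Hearing Loss"],
        "P3": ["Hearing Loss", "Obesity"],
    }
    vecs = tfidf_vectors(phenotypes)
    print(cosine_similarity(vecs["P1"], vecs["P2"]))
    print(cosine_similarity(vecs["P1"], vecs["P3"]))
```

Computing this score for every pair of phenotypes would fill a symmetric matrix of the kind reported in the Results.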
Affiliation(s)
- Xinhua Liu
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ling Gao
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Yonglin Peng
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Zhonghai Fang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ju Wang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
|
16
|
Lu MQ, He YQ, Wu Y, Zhou HX, Jian Y, Gao W, Bao L, Chen WM. Identification of aberrantly expressed lncRNAs and ceRNA networks in multiple myeloma: a combined high-throughput sequencing and microarray analysis. Front Oncol 2023; 13:1160342. [PMID: 37342185 PMCID: PMC10277558 DOI: 10.3389/fonc.2023.1160342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 05/18/2023] [Indexed: 06/22/2023] Open
Abstract
Background: This study aimed to explore the potential effects of long non-coding RNAs (lncRNAs) in multiple myeloma (MM) patients using two detection methods: high-throughput sequencing and microarray. Methods: lncRNAs were detected in 20 newly diagnosed MM patients, with 10 patients analyzed by whole transcriptome-specific RNA sequencing and 10 patients analyzed by microarray (Affymetrix Human Clariom D). The expression levels of lncRNAs, microRNAs, and messenger RNAs (mRNAs) were analyzed, and the differentially expressed lncRNAs identified by both methods were selected. The significant differentially expressed lncRNAs were further validated using PCR. Results: This study established the aberrant expression of certain lncRNAs involved in the occurrence of MM, with AC007278.2 and FAM157C showing the most significant differences. The top 5 common pathways identified by the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were the chemokine signaling pathway, inflammatory mediator regulation, Th17 cell differentiation, apoptosis, and NF-kappa B signaling pathway. Furthermore, three microRNAs (miRNAs) (miR-4772-3p, miR-617, and miR-618) were found to constitute competing endogenous RNA (ceRNA) networks in both sequencing and microarray analyses. Conclusions: Combining the two analyses should substantially improve our understanding of lncRNAs in MM. The differentially expressed lncRNAs that overlap between the two methods may help predict therapeutic targets more precisely.
Affiliation(s)
- Min-Qiu Lu
  - Department of Hematology, Beijing Jishuitan Hospital, Beijing, China
- Yu-Qin He
  - Department of Emergency, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Yin Wu
  - Department of Hematology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Hui-Xing Zhou
  - Department of Hematology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Yuan Jian
  - Department of Hematology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Wen Gao
  - Department of Hematology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
- Li Bao
  - Department of Hematology, Beijing Jishuitan Hospital, Beijing, China
- Wen-Ming Chen
  - Department of Hematology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
|
17
|
Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023; 39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023] Open
Abstract
MOTIVATION Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. RESULTS Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
Affiliation(s)
- Julian Dosch
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Holger Bergmann
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Vinh Tran
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, 60325, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt, 60325, Germany
|
18
|
Ternet C, Junk P, Sevrin T, Catozzi S, Wåhlén E, Heldin J, Oliviero G, Wynne K, Kiel C. Analysis of context-specific KRAS-effector (sub)complexes in Caco-2 cells. Life Sci Alliance 2023; 6:e202201670. [PMID: 36894174 PMCID: PMC9998658 DOI: 10.26508/lsa.202201670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 02/24/2023] [Accepted: 02/27/2023] [Indexed: 03/11/2023] Open
Abstract
Ras is a key switch controlling cell behavior. In the GTP-bound form, Ras interacts with numerous effectors in a mutually exclusive manner, where individual Ras-effectors are likely part of larger cellular (sub)complexes. The molecular details of these (sub)complexes and their alteration in specific contexts are not understood. Focusing on KRAS, we performed affinity purification (AP)-mass spectrometry (MS) experiments of exogenously expressed FLAG-KRAS WT and three oncogenic mutants ("genetic contexts") in the human Caco-2 cell line, each exposed to 11 different culture media ("culture contexts") that mimic conditions relevant in the colon and colorectal cancer. We identified four effectors present in complex with KRAS in all genetic and growth contexts ("context-general effectors"). Seven effectors are found in KRAS complexes in only some contexts ("context-specific effectors"). Analyzing all interactors in complex with KRAS per condition, we find that the culture contexts had a larger impact on interaction rewiring than genetic contexts. We investigated how changes in the interactome impact functional outcomes and created a Shiny app for interactive visualization. We validated some of the functional differences in metabolism and proliferation. Finally, we used networks to evaluate how KRAS-effectors are involved in the modulation of functions by random walk analyses of effector-mediated (sub)complexes. Altogether, our work shows the impact of environmental contexts on network rewiring, which provides insights into tissue-specific signaling mechanisms. This may also explain why KRAS oncogenic mutants may be causing cancer only in specific tissues despite KRAS being expressed in most cells and tissues.
Affiliation(s)
- Camille Ternet
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Charles Institute of Dermatology, School of Medicine, University College Dublin, Dublin 4, Ireland
- Philipp Junk
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Charles Institute of Dermatology, School of Medicine, University College Dublin, Dublin 4, Ireland
- Thomas Sevrin
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Charles Institute of Dermatology, School of Medicine, University College Dublin, Dublin 4, Ireland
- Simona Catozzi
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Charles Institute of Dermatology, School of Medicine, University College Dublin, Dublin 4, Ireland
- Erik Wåhlén
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Johan Heldin
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Giorgio Oliviero
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Kieran Wynne
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Dublin 4, Ireland
- Christina Kiel
  - Department of Molecular Medicine, University of Pavia, Pavia, Italy
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Charles Institute of Dermatology, School of Medicine, University College Dublin, Dublin 4, Ireland
|
19
|
Daniali M, Galer PD, Lewis-Smith D, Parthasarathy S, Kim E, Salvucci DD, Miller JM, Haag S, Helbig I. Enriching representation learning using 53 million patient notes through human phenotype ontology embedding. Artif Intell Med 2023; 139:102523. [PMID: 37100502 PMCID: PMC10782859 DOI: 10.1016/j.artmed.2023.102523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 02/17/2023] [Accepted: 02/23/2023] [Indexed: 03/04/2023]
Abstract
The Human Phenotype Ontology (HPO) is a dictionary of >15,000 clinical phenotypic terms with defined semantic relationships, developed to standardize phenotypic analysis. Over the last decade, the HPO has been used to accelerate the implementation of precision medicine into clinical practice. In addition, recent research in representation learning, specifically in graph embedding, has led to notable progress in automated prediction via learned features. Here, we present a novel approach to phenotype representation by incorporating phenotypic frequencies based on 53 million full-text health care notes from >1.5 million individuals. We demonstrate the efficacy of our proposed phenotype embedding technique by comparing our work to existing phenotypic similarity-measuring methods. Using phenotype frequencies in our embedding technique, we are able to identify phenotypic similarities that surpass current computational models. Furthermore, our embedding technique exhibits a high degree of agreement with domain experts' judgment. By transforming complex and multidimensional phenotypes from the HPO format into vectors, our proposed method enables efficient representation of these phenotypes for downstream tasks that require deep phenotyping. This is demonstrated in a patient similarity analysis and can further be applied to disease trajectory and risk prediction.
Affiliation(s)
- Maryam Daniali
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Peter D Galer
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- David Lewis-Smith
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK
  - Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
- Shridhar Parthasarathy
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Edward Kim
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
- Dario D Salvucci
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
- Jeffrey M Miller
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Scott Haag
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingo Helbig
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
|
20
|
Ferdousi F, Sasaki K, Fukumitsu S, Kuwata H, Nakajima M, Isoda H. A Descriptive Whole-Genome Transcriptomics Study in a Stem Cell-Based Tool Predicts Multiple Tissue-Specific Beneficial Potential and Molecular Targets of Carnosic Acid. Int J Mol Sci 2023; 24:ijms24098077. [PMID: 37175790 PMCID: PMC10179098 DOI: 10.3390/ijms24098077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023] Open
Abstract
Carnosic acid (CA) is a phenolic diterpene widely distributed in herbal plants such as rosemary and sage. Although its medicinal properties, such as antioxidant, antimicrobial, and neuroprotective effects, have been well documented, its relevant biochemical processes and molecular targets have not yet been fully explored. In the present study, we conducted an untargeted whole-genome transcriptomics analysis to investigate CA-induced early biological and molecular events in human amniotic epithelial stem cells (hAESCs) with the aim of exploring its multiple tissue-specific functionalities and potential molecular targets. We found that seven days of CA treatment in hAESCs could induce mesoderm-lineage-specific differentiation. Tissue enrichment analysis revealed that CA significantly enriched lateral plate mesoderm-originated cardiovascular and adipose tissues. Further tissue-specific PPI analysis and kinase and transcription factor enrichment analyses identified potential upstream regulators and molecular targets of CA in a tissue-specific manner. Gene ontology enrichment analyses revealed the metabolic, antioxidant, and antifibrotic activities of CA. Altogether, our comprehensive whole-genome transcriptomics analyses offer a thorough understanding of the possible underlying molecular mechanism of CA.
Affiliation(s)
- Farhana Ferdousi
  - Alliance for Research on the Mediterranean and North Africa (ARENA), University of Tsukuba, Tsukuba 305-8572, Japan
- Kazunori Sasaki
  - Alliance for Research on the Mediterranean and North Africa (ARENA), University of Tsukuba, Tsukuba 305-8572, Japan
  - Open Innovation Laboratory for Food and Medicinal Resource Engineering (FoodMed-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-0821, Japan
- Satoshi Fukumitsu
  - NIPPN Corporation, Tokyo 243-0041, Japan
  - Tsukuba Life Science Innovation Program (T-LSI), University of Tsukuba, Tsukuba 305-8577, Japan
- Mitsutoshi Nakajima
  - Alliance for Research on the Mediterranean and North Africa (ARENA), University of Tsukuba, Tsukuba 305-8572, Japan
  - Open Innovation Laboratory for Food and Medicinal Resource Engineering (FoodMed-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-0821, Japan
  - MED R&D Corporation, Tsukuba 305-8572, Japan
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba 305-8572, Japan
- Hiroko Isoda
  - Alliance for Research on the Mediterranean and North Africa (ARENA), University of Tsukuba, Tsukuba 305-8572, Japan
  - Open Innovation Laboratory for Food and Medicinal Resource Engineering (FoodMed-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-0821, Japan
  - Tsukuba Life Science Innovation Program (T-LSI), University of Tsukuba, Tsukuba 305-8577, Japan
  - MED R&D Corporation, Tsukuba 305-8572, Japan
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba 305-8572, Japan
|
21
|
Kartheeswaran KP, Rayan AXA, Varrieth GT. Enhanced disease-disease association with information enriched disease representation. Math Biosci Eng 2023; 20:8892-8932. [PMID: 37161227 DOI: 10.3934/mbe.2023391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
OBJECTIVE Quantification of disease-disease association (DDA) enables the understanding of disease relationships for discovering disease progression and finding comorbidity. For effective DDA strength calculation, the main challenge is to integrate the various biomedical aspects of DDA to obtain an information-rich disease representation. MATERIALS AND METHODS An enhanced and integrated DDA framework is developed that integrates an enriched literature-based DDA representation with a concept-based one. The literature component of the proposed framework uses PubMed abstracts and consists of an improved neural network model that classifies DDAs for an enhanced literature-based DDA representation. Similarly, an ontology-based joint multi-source association embedding model is proposed in the ontology component using Disease Ontology (DO), UMLS, insurance claims, clinical notes, etc. RESULTS AND DISCUSSION The obtained information-rich disease representation is evaluated on different aspects of DDA datasets such as Gene, Variant, Gene Ontology (GO) and a human-rated benchmark dataset. The DDA scores calculated using the proposed method achieved a high correlation, mainly on the gene-based dataset. The quantified scores also showed a better correlation of 0.821 when evaluated on 213 human-rated disease pairs. In addition, the generated disease representation is shown to have a substantial effect on the correlation of DDA scores for different categories of disease pairs. CONCLUSION The enhanced context and semantic DDA framework provides an enriched disease representation, resulting in highly correlated results with different DDA datasets. We have also presented the biological interpretation of disease pairs. The developed framework can also be used for deriving the strength of other biomedical associations.
|
22
|
Le Priol C, Azencott CA, Gidrol X. Detection of genes with differential expression dispersion unravels the role of autophagy in cancer progression. PLoS Comput Biol 2023; 19:e1010342. [PMID: 36893104 PMCID: PMC9997931 DOI: 10.1371/journal.pcbi.1010342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 02/09/2023] [Indexed: 03/10/2023] Open
Abstract
The majority of gene expression studies focus on the search for genes whose mean expression is different between two or more populations of samples in the so-called "differential expression analysis" approach. However, a difference in variance in gene expression may also be biologically and physiologically relevant. In the classical statistical model used to analyze RNA-sequencing (RNA-seq) data, the dispersion, which defines the variance, is only considered as a parameter to be estimated prior to identifying a difference in mean expression between conditions of interest. Here, we propose to evaluate four recently published methods, which detect differences in both the mean and dispersion in RNA-seq data. We thoroughly investigated the performance of these methods on simulated datasets and characterized parameter settings to reliably detect genes with a differential expression dispersion. We applied these methods to The Cancer Genome Atlas datasets. Interestingly, among the genes with an increased expression dispersion in tumors and without a change in mean expression, we identified some key cellular functions, most of which were related to catabolism and were overrepresented in most of the analyzed cancers. In particular, our results highlight autophagy, whose role in cancerogenesis is context-dependent, illustrating the potential of the differential dispersion approach to gain new insights into biological processes and to discover new biomarkers.
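The distinction between a change in mean and a change in dispersion can be made concrete with the negative binomial model commonly used for RNA-seq counts, in which the variance is μ + φμ² for mean μ and dispersion φ. The sketch below is not one of the four methods evaluated in the paper. It uses a simple method-of-moments estimate of φ, and the function name and the toy counts are assumptions for illustration only.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion under Var = mu + phi * mu**2.

    Returns 0.0 when the sample variance does not exceed the mean
    (no overdispersion relative to a Poisson model).
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # unbiased sample variance
    return max((var - mu) / mu**2, 0.0)


if __name__ == "__main__":
    # Hypothetical counts for one gene: same mean, different spread.
    normal = [90, 100, 110, 95, 105]
    tumor = [20, 180, 60, 140, 100]
    print(mean(normal), mean(tumor))
    print(moment_dispersion(normal), moment_dispersion(tumor))
```

A standard differential expression test would see no difference between these two groups because their means coincide, whereas the dispersion estimate separates them. This is the kind of gene the differential dispersion approach is meant to surface.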
Affiliation(s)
- Christophe Le Priol
  - Univ. Grenoble Alpes, INSERM, CEA-IRIG, Biomics, Grenoble, France
  - * E-mail: (CLP); (XG)
- Chloé-Agathe Azencott
  - Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
  - Institut Curie, Paris, France
  - INSERM U900, Paris, France
- Xavier Gidrol
  - Univ. Grenoble Alpes, INSERM, CEA-IRIG, Biomics, Grenoble, France
  - * E-mail: (CLP); (XG)

23
Khazaal A, Zandavi SM, Smolnikov A, Fatima S, Vafaee F. Pan-Cancer Analysis Reveals Functional Similarity of Three lncRNAs across Multiple Tumors. Int J Mol Sci 2023; 24:ijms24054796. [PMID: 36902227 PMCID: PMC10003012 DOI: 10.3390/ijms24054796] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2023] [Revised: 02/24/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are emerging as key regulators in many biological processes. The dysregulation of lncRNA expression has been associated with many diseases, including cancer. Mounting evidence suggests lncRNAs to be involved in cancer initiation, progression, and metastasis. Thus, understanding the functional implications of lncRNAs in tumorigenesis can aid in developing novel biomarkers and therapeutic targets. Rich cancer datasets, documenting genomic and transcriptomic alterations together with advancement in bioinformatics tools, have presented an opportunity to perform pan-cancer analyses across different cancer types. This study is aimed at conducting a pan-cancer analysis of lncRNAs by performing differential expression and functional analyses between tumor and non-neoplastic adjacent samples across eight cancer types. Among dysregulated lncRNAs, seven were shared across all cancer types. We focused on three lncRNAs, found to be consistently dysregulated among tumors. It has been observed that these three lncRNAs of interest are interacting with a wide range of genes across different tissues, yet enriching substantially similar biological processes, found to be implicated in cancer progression and proliferation.
Affiliation(s)
- Abir Khazaal
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia
  - UNSW Data Science Hub, University of New South Wales, Sydney, NSW 2052, Australia
- Seid Miad Zandavi
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia
  - Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Andrei Smolnikov
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia
- Shadma Fatima
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia
  - Ingham Institute of Applied Medical Research, Sydney, NSW 2170, Australia
- Fatemeh Vafaee
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia
  - UNSW Data Science Hub, University of New South Wales, Sydney, NSW 2052, Australia
  - Correspondence:

24
Bandyopadhyay SS, Halder AK, Saha S, Chatterjee P, Nasipuri M, Basu S. Assessment of GO-Based Protein Interaction Affinities in the Large-Scale Human–Coronavirus Family Interactome. Vaccines (Basel) 2023; 11:vaccines11030549. [PMID: 36992133 DOI: 10.3390/vaccines11030549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 02/19/2023] [Accepted: 02/23/2023] [Indexed: 03/03/2023] Open
Abstract
SARS-CoV-2 is a novel coronavirus that replicates by interacting with host proteins. Identifying virus–host protein–protein interactions could therefore help researchers better understand the transmission behavior of the disease and identify possible COVID-19 drugs. The International Committee on Taxonomy of Viruses has determined that nCoV is 89% genetically similar to the SARS-CoV responsible for the 2003 epidemic. This paper focuses on assessing the host–pathogen protein interaction affinity of the coronavirus family, which has 44 different variants. To this end, a GO-semantic scoring function based on Gene Ontology (GO) graphs is provided for determining the binding affinity of any two proteins at the organism level. Based on the availability of GO annotations for the proteins, 11 of the 44 viral variants are considered, viz., SARS-CoV-2, SARS, MERS, Bat coronavirus HKU3, Bat coronavirus Rp3/2004, Bat coronavirus HKU5, Murine coronavirus, Bovine coronavirus, Rat coronavirus, Bat coronavirus HKU4, and Bat coronavirus 133/2005. The fuzzy scoring function has been applied to the entire host–pathogen network, comprising ~180 million potential interactions generated from 19,281 host proteins and around 242 viral proteins. ~4.5 million potential level-one host–pathogen interactions are computed based on the estimated interaction affinity threshold. The resulting host–pathogen interactome is also validated against state-of-the-art experimental networks. The study has also been extended toward drug repurposing by analyzing the FDA-listed COVID drugs.
Affiliation(s)
- Soumyendu Sekhar Bandyopadhyay
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
  - Department of Computer Science and Engineering, School of Engineering and Technology, Adamas University, Kolkata 700126, India
- Anup Kumar Halder
  - Faculty of Mathematics and Information Sciences, Warsaw University of Technology, 00-662 Warsaw, Poland
- Sovan Saha
  - Department of Computer Science and Engineering (Artificial Intelligence and Machine Learning), Techno Main Salt Lake, Sector V, Kolkata 700091, India
- Piyali Chatterjee
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata 700152, India
- Mita Nasipuri
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India

25
Xue X, Zhang W, Fan A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS One 2023; 18:e0284274. [PMID: 37083829 PMCID: PMC10121005 DOI: 10.1371/journal.pone.0284274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders the prediction accuracy of the current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate the confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). The protein pairs with low similarity values are assumed to be low-confidence links, and the refined PPI networks are constructed by filtering the low-confidence links. Six topology-based centrality methods (the BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements under the original network and refined network. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that the performance of these centrality methods under refined PPI networks is relatively better than that under the original networks. Resnik with a BP annotation term performs best among all five metrics with the three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of the links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice.
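The filtering strategy described in this abstract can be sketched on a toy example. The code below assumes a hypothetical six-term ontology, one annotation term per protein, and an arbitrary threshold of 0.1; it computes Resnik similarity as the information content of the most informative common ancestor and then drops low-similarity links. The term names, proteins, edges, and threshold are all invented for illustration, and real GO-based pipelines aggregate over several terms per protein.

```python
import math

# Hypothetical GO-like DAG: term -> list of parent terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "catabolism": ["metabolism"],
    "autophagy": ["catabolism"],
    "kinase_signaling": ["signaling"],
}

# Hypothetical annotations: one term per protein (a simplification).
ANNOTATIONS = {"P1": "autophagy", "P2": "catabolism",
               "P3": "kinase_signaling", "P4": "autophagy"}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = log(N / n_t), where n_t counts proteins annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for term in ANNOTATIONS.values():
        for a in ancestors(term):
            counts[a] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}

IC = information_content()

def resnik(p, q):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(ANNOTATIONS[p]) & ancestors(ANNOTATIONS[q])
    return max(IC[t] for t in common)

# Hypothetical raw PPI network; links below the threshold are treated as low-confidence.
edges = [("P1", "P4"), ("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
refined = [(p, q) for p, q in edges if resnik(p, q) >= 0.1]
print(refined)
```

Centrality measures would then be computed on `refined` rather than on `edges`, which is the comparison the abstract reports.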
Affiliation(s)
- Xiaoli Xue
  - School of Science, East China Jiaotong University, Nanchang, China
- Wei Zhang
  - School of Science, East China Jiaotong University, Nanchang, China
- Anjing Fan
  - School of Computer and Information Engineering, Anyang Normal University, Anyang, China

26
Moreyra NN, Almeida FC, Allan C, Frankel N, Matzkin LM, Hasson E. Phylogenomics provides insights into the evolution of cactophily and host plant shifts in Drosophila. Mol Phylogenet Evol 2023; 178:107653. [PMID: 36404461 DOI: 10.1016/j.ympev.2022.107653] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Revised: 09/30/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022]
Abstract
Cactophilic species of the Drosophila buzzatii cluster (repleta group) comprise an excellent model group to investigate genomic changes underlying adaptation to extreme climate conditions and host plants. In particular, these species form a tractable system to study the transition from chemically simpler breeding sites (like prickly pears of the genus Opuntia) to chemically more complex hosts (columnar cacti). Here, we report four highly contiguous genome assemblies of three species of the buzzatii cluster. Based on this genomic data and inferred phylogenetic relationships, we identified candidate taxonomically restricted genes (TRGs) likely involved in the evolution of cactophily and cactus host specialization. Functional enrichment analyses of TRGs within the buzzatii cluster identified genes involved in detoxification, water preservation, immune system response, anatomical structure development, and morphogenesis. In contrast, processes that regulate responses to stress, as well as the metabolism of nitrogen compounds, transport, and secretion were found in the set of species that are columnar cacti dwellers. These findings are in line with the hypothesis that those genomic changes brought about key mechanisms underlying the adaptation of the buzzatii cluster species to arid regions in South America.
Affiliation(s)
- Nicolás Nahuel Moreyra
  - Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
  - Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
- Francisca Cunha Almeida
  - Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
  - Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
- Carson Allan
  - Department of Entomology, University of Arizona, Tucson, AZ 85719, USA
- Nicolás Frankel
  - Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
  - Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
- Esteban Hasson
  - Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina
  - Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina

27
Sayols S. rrvgo: a Bioconductor package for interpreting lists of Gene Ontology terms. microPublication Biology 2023; 2023:10.17912/micropub.biology.000811. [PMID: 37151216 PMCID: PMC10155054 DOI: 10.17912/micropub.biology.000811] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 03/20/2023] [Revised: 04/13/2023] [Accepted: 04/17/2023] [Indexed: 05/09/2023]
Abstract
Gene Ontology (GO) annotation is often used to guide the biological interpretation of high-throughput omics experiments, e.g. by analysing lists of differentially regulated genes for enriched GO terms. Due to the hierarchical nature of GOs, the resulting lists of enriched terms are usually redundant and difficult to summarise and interpret. To facilitate the interpretation of large lists of GO terms, I developed rrvgo, a Bioconductor package that aims at simplifying the redundancy of GO lists by grouping similar terms based on their semantic similarity. rrvgo also provides different visualization options to guide the interpretation of the summarized GO terms. Considering that several software tools have been developed for this purpose, rrvgo is unique at combining powerful visualizations in a programmatic interface coupled with up-to-date GO gene annotation provided by the Bioconductor project.
Affiliation(s)
- Sergi Sayols
  - Bioinformatics Core Facility, Institute of Molecular Biology, Mainz, 55128, Germany
  - Correspondence to: Sergi Sayols

28
Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022; 23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
PURPOSE: Autism spectrum disorder (ASD) is the most prevalent disease today. Its causes may be attributed 80% to genetic factors and 20% to environmental factors. In spite of this, the majority of current research is concerned with environmental causes, and only a small proportion with the genetic causes of the disease. Autism is a complex disease, which makes it difficult to identify the genes that cause it. METHODS: A hybrid ensemble-based classification (HEC-ASD) model for predicting ASD genes using gradient boosting machines is proposed. The proposed model utilizes gene ontology (GO) to construct a gene functional similarity matrix using the hybrid gene similarity (HGS) method. HGS measures the semantic similarity between genes effectively. It combines a graph-based method, such as the Wang method, with the number of direct child nodes of a gene's GO term. Moreover, an ensemble gradient boosting classifier is adapted to enhance the prediction of genes, forming a robust classification model. RESULTS: The proposed model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database. The experimental results are promising, as they improve the classification performance for predicting ASD genes. The results are compared with other approaches that used a gene regulatory network (GRN), a protein-protein interaction (PPI) network, or GO. The HEC-ASD model reaches the highest prediction accuracy of 0.88 using ensemble learning classifiers. CONCLUSION: The proposed model demonstrates that an ensemble learning technique using gradient boosting is effective in predicting autism spectrum disorder genes. Moreover, the HEC-ASD model utilizes GO rather than a PPI network or GRN.
Affiliation(s)
- Eman Ismail
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt (GRID: grid.7269.a; ISNI: 0000 0004 0621 1570)
- Walaa Gad
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt (GRID: grid.7269.a; ISNI: 0000 0004 0621 1570)
- Mohamed Hashem
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt (GRID: grid.7269.a; ISNI: 0000 0004 0621 1570)

29
Qu JH, Tarasov KV, Chakir K, Tarasova YS, Riordon DR, Lakatta EG. Proteomic Landscape and Deduced Functions of the Cardiac 14-3-3 Protein Interactome. Cells 2022; 11:cells11213496. [PMID: 36359893 PMCID: PMC9654263 DOI: 10.3390/cells11213496] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/08/2022] [Revised: 10/17/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022] Open
Abstract
Rationale: The 14-3-3 protein family is known to interact with many proteins in non-cardiac cell types to regulate multiple signaling pathways, particularly those relating to energy and protein homeostasis; and the 14-3-3 network is a therapeutic target of critical metabolic and proteostatic signaling in cancer and neurological diseases. Although the heart is critically sensitive to nutrient and energy alterations, and multiple signaling pathways coordinate to maintain the cardiac cell homeostasis, neither the structure of cardiac 14-3-3 protein interactome, nor potential functional roles of 14-3-3 protein–protein interactions (PPIs) in heart has been explored. Objective: To establish the comprehensive landscape and characterize the functional role of cardiac 14-3-3 PPIs. Methods and Results: We evaluated both RNA expression and protein abundance of 14-3-3 isoforms in mouse heart, followed by co-immunoprecipitation of 14-3-3 proteins and mass spectrometry in left ventricle. We identified 52 proteins comprising the cardiac 14-3-3 interactome. Multiple bioinformatic analyses indicated that more than half of the proteins bound to 14-3-3 are related to mitochondria; and the deduced functions of the mitochondrial 14-3-3 network are to regulate cardiac ATP production via interactions with mitochondrial inner membrane proteins, especially those in mitochondrial complex I. Binding to ribosomal proteins, 14-3-3 proteins likely coordinate protein synthesis and protein quality control. Localizations of 14-3-3 proteins to mitochondria and ribosome were validated via immunofluorescence assays. The deduced function of cardiac 14-3-3 PPIs is to regulate cardiac metabolic homeostasis and proteostasis. Conclusions: Thus, the cardiac 14-3-3 interactome may be a potential therapeutic target in cardiovascular metabolic and proteostatic disease states, as it already is in cancer therapy.

30
Kahilainen A, Oostra V, Somervuo P, Minard G, Saastamoinen M. Alternative developmental and transcriptomic responses to host plant water limitation in a butterfly metapopulation. Mol Ecol 2022; 31:5666-5683. [PMID: 34516691 DOI: 10.1111/mec.16178] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/18/2021] [Revised: 08/06/2021] [Accepted: 09/02/2021] [Indexed: 01/13/2023]
Abstract
Predicting how climate change affects biotic interactions poses a challenge. Plant-insect herbivore interactions are particularly sensitive to climate change, as climate-induced changes in plant quality cascade into the performance of insect herbivores. Whereas the immediate survival of herbivore individuals depends on plastic responses to climate change-induced nutritional stress, long-term population persistence via evolutionary adaptation requires genetic variation for these responses. To assess the prospects for population persistence under climate change, it is therefore crucial to characterize response mechanisms to climate change-induced stressors, and quantify their variability in natural populations. Here, we test developmental and transcriptomic responses to water limitation-induced host plant quality change in a Glanville fritillary butterfly (Melitaea cinxia) metapopulation. We combine nuclear magnetic resonance spectroscopy on the plant metabolome, larval developmental assays and an RNA sequencing analysis of the larval transcriptome. We observed that responses to feeding on water-limited plants, in which amino acids and aromatic compounds are enriched, showed marked variation within the metapopulation, with individuals of some families performing better on control and others on water-limited plants. The transcriptomic responses were concordant with the developmental responses: families exhibiting opposite developmental responses also produced opposite transcriptomic responses (e.g. in growth-associated transcripts). The divergent responses in both larval development and transcriptome are associated with differences between families in amino acid catabolism and storage protein production. The results reveal intrapopulation variability in plasticity, suggesting that the Finnish M. cinxia metapopulation harbours potential for buffering against drought-induced changes in host plant quality.
Affiliation(s)
- Aapo Kahilainen
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, Helsinki, FIN-00014, Finland
- Vicencio Oostra
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, Helsinki, FIN-00014, Finland
  - Department of Evolution, Ecology and Behaviour, University of Liverpool, Crown Street, Liverpool, L69 7ZB, United Kingdom
- Panu Somervuo
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, Helsinki, FIN-00014, Finland
- Guillaume Minard
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS, INRAe, VetAgro Sup, UMR Ecologie Microbienne, Villeurbanne, France
- Marjo Saastamoinen
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, Helsinki, FIN-00014, Finland
  - Helsinki Institute of Life Science, University of Helsinki, Finland

31
Zhao Y, Guo Q, Cao S, Tian Y, Han K, Sun Y, Li J, Yang Q, Ji Q, Sederoff R, Li Y. Genome-wide identification of the AlkB homologs gene family, PagALKBH9B and PagALKBH10B regulated salt stress response in Populus. Frontiers in Plant Science 2022; 13:994154. [PMID: 36204058 PMCID: PMC9530910 DOI: 10.3389/fpls.2022.994154] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2022] [Accepted: 08/26/2022] [Indexed: 06/16/2023]
Abstract
The AlkB homologs (ALKBH) gene family regulates N6-methyladenosine (m6A) RNA methylation and is involved in plant growth and the abiotic stress response. Poplar is an important model for studying perennial woody plants. Poplars typically have a long juvenile period of 7-10 years, so studies of flowering or mature wood properties require long periods of time. Consequently, functional studies of the ALKBH genes in Populus species have been limited. Based on sequence similarity with the Arabidopsis thaliana AtALKBHs, 23 PagALKBHs were identified in the genome of the poplar 84K hybrid genotype (P. alba × P. tremula var. glandulosa), and gene structures and conserved domains were confirmed between homologs. The PagALKBH proteins were classified into six groups based on conserved sequences compared with human, Arabidopsis, maize, rice, wheat, tomato, barley, and grape. All PagALKBH homologs showed tissue-specific expression; most were highly expressed in leaves. ALKBH9B and ALKBH10B are m6A demethylases, and overexpression of their homologs PagALKBH9B and PagALKBH10B reduced m6A RNA methylation in transgenic lines. The number of adventitious roots and the biomass accumulation of transgenic lines decreased compared with WT. Therefore, PagALKBH9B and PagALKBH10B mediate m6A RNA demethylation and play a regulatory role in poplar growth and development. Overexpression of PagALKBH9B and PagALKBH10B can reduce the accumulation of H2O2 and oxidative damage by increasing the activities of SOD, POD, and CAT and by enhancing protection of Chl a/b, thereby increasing the salt tolerance of transgenic lines. However, overexpression lines were more sensitive to drought stress due to reduced proline content. This research provides comprehensive information about the PagALKBH gene family and its roles in poplar growth and development and in the response to salt stress.
Affiliation(s)
- Ye Zhao
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Qi Guo
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Sen Cao
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Yanting Tian
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Kunjin Han
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Yuhan Sun
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China
- Juan Li
  - Natural Resources and Planning Bureau of Yanshan County, Cangzhou, Hebei, China
- Qingshan Yang
  - Shandong Academy of Forestry, Jinan, Shandong, China
- Qingju Ji
  - Cangzhou Municipal Forestry Seeding and Cutting Management Center, Cangzhou, China
- Ronald Sederoff
  - Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States
- Yun Li
  - Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, College of Biological Sciences and Technology, National Engineering Research Center of Tree Breeding and Ecological Restoration, Engineering Technology Research Center of Black Locust of National Forestry and Grassland Administration, Beijing Forestry University, Beijing, China

32
Ayub U, Naveed H. BioAlign: An Accurate Global PPI Network Alignment Algorithm. Evol Bioinform Online 2022; 18:11769343221110658. [PMID: 35898232 PMCID: PMC9309777 DOI: 10.1177/11769343221110658] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/17/2021] [Accepted: 06/02/2022] [Indexed: 11/15/2022] Open
Abstract
Motivation: Advances in high-throughput PPI profiling techniques have generated a large amount of PPI data. Aligning PPI networks uncovers relationships between species that can help in understanding biological systems. Such comparative study reveals the conserved biological interactions of proteins across species. It can also help in studying the biological pathways and signaling networks of cells. Although several network alignment algorithms have been developed to study and compare PPI data, developing an aligner that aligns PPI networks with high biological similarity and coverage is still challenging. Results: This paper presents a novel global network alignment algorithm, BioAlign, that incorporates a significant amount of biological information. Existing studies use global sequence and/or 3D-structure similarity to align PPI networks. In contrast, BioAlign uses local sequence similarity, predicted secondary structure motifs, and remote homology in addition to global sequence and 3D-structure similarity. The extra sources of biological information help BioAlign to align proteins with high biological similarity. BioAlign produces significantly better results in terms of AFS and Coverage (6-32 and 7-34 with respect to MF and BP, respectively) than the existing algorithms. BioAlign aligns a much larger number of proteins with high biological similarity than the existing aligners, and it helps in studying functionally similar protein pairs across species.
Affiliation(s)
- Umair Ayub
  - FAST School of Computing, National University of Computer and Emerging Sciences, Lahore, Pakistan
  - Computational Biology Research Lab, Department of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Hammad Naveed
  - FAST School of Computing, National University of Computer and Emerging Sciences, Lahore, Pakistan
  - Computational Biology Research Lab, Department of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan
  - Hammad Naveed, Computational Biology Research Lab, Department of Computing, National University of Computer and Emerging Sciences, 852 Milaad Street, Block B, Faisal Town, Lahore, Pakistan.

33
James K, Alsobhe A, Cockell SJ, Wipat A, Pocock M. Integration of probabilistic functional networks without an external Gold Standard. BMC Bioinformatics 2022; 23:302. [PMID: 35879662 PMCID: PMC9316706 DOI: 10.1186/s12859-022-04834-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, prior to their integration. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss and the need for identifier mapping, which can complicate the network build and affect PFIN performance. Additionally, there are typically no Gold Standard data for non-model organisms. RESULTS: We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae we show that ssNet is easier and faster, overcoming the challenges of data redundancy, Gold Standard bias and ID mapping. In addition, ssNet results in less loss of data and produces a more complete network. CONCLUSIONS: The ssNet method allows PFINs to be built successfully from a single database, while producing network performance comparable to that of networks scored using an external Gold Standard source, with reduced data loss.
Affiliation(s)
- Katherine James
  - Department of Applied Sciences, Northumbria University, Sandyford Rd, Newcastle upon Tyne, NE1 8ST, UK; Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK.
- Aoesha Alsobhe
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK; Saudi Electronic University, Abi Bakr As Siddiq Branch Rd, Riyadh, 1332, Saudi Arabia
- Simon J Cockell
  - School of Biomedical, Nutritional and Sports Science, Faculty of Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
- Anil Wipat
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Matthew Pocock
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
34
Wang S, Atkinson GRS, Hayes WB. SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment. NPJ Syst Biol Appl 2022; 8:25. [PMID: 35859153 PMCID: PMC9300714 DOI: 10.1038/s41540-022-00232-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2019] [Accepted: 05/20/2022] [Indexed: 12/31/2022] Open
Abstract
Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two protein-protein interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of "good" topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting network alignment frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a "proof of concept" on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
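The idea of a network alignment frequency can be illustrated with a minimal Python sketch. It assumes each sampled alignment is available as a dictionary mapping nodes of one network to nodes of the other; the function name `alignment_frequency`, the toy node labels, and the 0.75 threshold are hypothetical choices for illustration and are not taken from SANA.

```python
from collections import Counter

def alignment_frequency(alignments):
    """Fraction of sampled alignments in which each (node_a, node_b) pair is aligned."""
    if not alignments:
        raise ValueError("at least one alignment sample is required")
    counts = Counter()
    for alignment in alignments:
        # Each alignment maps a node of network A to a node of network B.
        counts.update(alignment.items())
    n_samples = len(alignments)
    return {pair: count / n_samples for pair, count in counts.items()}

# Four toy alignment samples between hypothetical networks A and B.
samples = [
    {"a1": "b1", "a2": "b2", "a3": "b3"},
    {"a1": "b1", "a2": "b3", "a3": "b2"},
    {"a1": "b1", "a2": "b2", "a3": "b4"},
    {"a1": "b1", "a2": "b2", "a3": "b3"},
]

naf = alignment_frequency(samples)

# Pairs aligned in at least 75% of samples are treated as robustly aligned.
robust = {pair for pair, freq in naf.items() if freq >= 0.75}
```

Pairs that recur across many independent samples are the ones the abstract describes as robustly aligned; how the samples are generated and how the frequency is turned into a GO prediction are outside the scope of this sketch.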
Affiliation(s)
- Siyue Wang
  - Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
- Giles R S Atkinson
  - Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
- Wayne B Hayes
  - Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA.
35
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA (the Simulated Annealing Network Aligner) significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10^-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities (for example, in the recent, partly synthetic networks of the Integrated Interaction Database), topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Affiliation(s)
- Siyue Wang
  - Department of Computer Science, University of California, Irvine, CA, United States
- Xiaoyin Chen
  - Department of Computer Science, University of California, Irvine, CA, United States
- Brent J Frederisy
  - Department of Computer Science, University of California, Irvine, CA, United States
- Benedict A Mbakogu
  - Department of Computer Science, University of California, Irvine, CA, United States
- Amy D Kanne
  - Department of Computer Science, University of California, Irvine, CA, United States
- Pasha Khosravi
  - Department of Computer Science, University of California, Irvine, CA, United States
- Wayne B Hayes
  - Department of Computer Science, University of California, Irvine, CA, United States.
36
Gu Z, Hübschmann D. simplifyEnrichment: A Bioconductor package for clustering and visualizing functional enrichment results. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00073-0. [PMID: 35680096 PMCID: PMC10373083 DOI: 10.1016/j.gpb.2022.04.008] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 08/24/2021] [Revised: 01/07/2022] [Accepted: 05/08/2022] [Indexed: 10/18/2022]
Abstract
Functional enrichment analysis or gene set enrichment analysis is a basic bioinformatics method that evaluates the biological importance of a list of genes of interest. However, it may produce a long list of significant terms with highly redundant information that is difficult to summarize. Current tools to simplify enrichment results by clustering them into groups either still produce redundancy between clusters or do not retain consistent term similarities within clusters. We propose a new method named binary cut for clustering similarity matrices of functional terms. Through comprehensive benchmarks on both simulated and real-world datasets, we demonstrated that binary cut could efficiently cluster functional terms into groups where terms showed consistent similarities within groups and were mutually exclusive between groups. We compared binary cut clustering on the similarity matrices obtained from different similarity measures and found that semantic similarity worked well with binary cut, while similarity matrices based on gene overlap showed less consistent patterns. We implemented the binary cut algorithm in the R package simplifyEnrichment, which additionally provides functionalities for visualizing, summarizing, and comparing the clustering. The simplifyEnrichment package and the documentation are available at https://bioconductor.org/packages/simplifyEnrichment/.
Affiliation(s)
- Zuguang Gu
  - Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg 69120, Germany.
- Daniel Hübschmann
  - Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg 69120, Germany; Heidelberg Institute of Stem cell Technology and Experimental Medicine (HI-STEM), Heidelberg 69120, Germany; German Cancer Consortium (DKTK), Heidelberg 69120, Germany; Department of Pediatric Immunology, Hematology and Oncology, University Hospital Heidelberg, Heidelberg 69120, Germany.
37
Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. FRONTIERS IN BIOINFORMATICS 2022; 2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022] Open
Abstract
Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
Affiliation(s)
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Sean T. Flannery
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Aashish Jain
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
  - *Correspondence: Daisuke Kihara,
38
Cote-L’Heureux A, Maurer-Alcalá XX, Katz LA. Old genes in new places: A taxon-rich analysis of interdomain lateral gene transfer events. PLoS Genet 2022; 18:e1010239. [PMID: 35731825 PMCID: PMC9255765 DOI: 10.1371/journal.pgen.1010239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 07/05/2022] [Accepted: 05/06/2022] [Indexed: 11/26/2022] Open
Abstract
Vertical inheritance is foundational to Darwinian evolution, but fails to explain major innovations such as the rapid spread of antibiotic resistance among bacteria and the origin of photosynthesis in eukaryotes. While lateral gene transfer (LGT) is recognized as an evolutionary force in prokaryotes, the role of LGT in eukaryotic evolution is less clear. With the exception of the transfer of genes from organelles to the nucleus, a process termed endosymbiotic gene transfer (EGT), the extent of interdomain transfer from prokaryotes to eukaryotes is highly debated. A common critique of studies of interdomain LGT is the reliance on the topology of single-gene trees that attempt to estimate more than one billion years of evolution. We take a more conservative approach by identifying cases in which a single clade of eukaryotes is found in an otherwise prokaryotic gene tree (i.e. exclusive presence). Starting with a taxon-rich dataset of over 13,600 gene families and passing data through several rounds of curation, we identify and categorize the function of 306 interdomain LGT events into diverse eukaryotes, including 189 putative EGTs, 52 LGTs into Opisthokonta (i.e. animals, fungi and their microbial relatives), and 42 LGTs nearly exclusive to anaerobic eukaryotes. To assess differential gene loss as an explanation for exclusive presence, we compare branch lengths within each LGT tree to a set of vertically-inherited genes subsampled to mimic gene loss (i.e. with the same taxonomic sampling) and consistently find shorter relative distance between eukaryotes and prokaryotes in LGT trees, a pattern inconsistent with gene loss. Our methods provide a framework for future studies of interdomain LGT and move the field closer to an understanding of how best to model the evolutionary history of eukaryotes.
Affiliation(s)
- Auden Cote-L’Heureux
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, United States of America
- Laura A. Katz
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, United States of America
  - Program in Organismic Biology and Evolution, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
39
Chen Z, Huang X, Fu R, Zhan A. Neighbours matter: Effects of genomic organization on gene expression plasticity in response to environmental stresses during biological invasions. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY. PART D, GENOMICS & PROTEOMICS 2022; 42:100992. [PMID: 35504120 DOI: 10.1016/j.cbd.2022.100992] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Revised: 04/07/2022] [Accepted: 04/21/2022] [Indexed: 06/14/2023]
Abstract
Gene expression regulation has been widely recognized as an important molecular mechanism underlying phenotypic plasticity in environmental adaptation. However, the effects of genomic organization on gene expression plasticity under environmental stresses during biological invasions remain largely unexplored. Here, we use an invasive model ascidian, Ciona robusta, to investigate how genomic organization affects gene expression in response to salinity stresses during range expansions. Our study showed that neighboring genes were co-expressed and approximately 30% of stress-responsive genes were physically clustered on chromosomes. Such coordinated expression was substantially affected by the physical distance and orientation of genes. Interestingly, the overall expression correlation of neighboring genes was significantly decreased under high salinity stresses, illustrating that co-expression regulation could be disrupted by salinity challenges. Furthermore, the clustering of genes was associated with their functional constraints and expression patterns: operon genes enriched in gene expression machinery had the highest transcriptional activity and expression stability. Notably, our analyses showed that tail-to-tail genes, mainly involved in biological functions related to phosphorylation, homeostatic process, and ion transport, exhibited higher intrinsic expression variability and greater response to salinity challenges. Altogether, the results obtained here provide new insights into the effects of gene organization on gene expression plasticity under environmental challenges, hence improving our knowledge of the mechanisms of rapid environmental adaptation during biological invasions.
Affiliation(s)
- Zaohuang Chen
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, China; University of Chinese Academy of Sciences, Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing 100049, China
- Xuena Huang
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, China
- Ruiying Fu
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, China; University of Chinese Academy of Sciences, Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing 100049, China
- Aibin Zhan
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, China; University of Chinese Academy of Sciences, Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing 100049, China.
40
Yu WH, Hsu CL, Lin CC, Oyang YJ, Juan HF, Huang HC. Stratification of lncRNA modulation networks in breast cancer. BMC Med Genomics 2022; 14:300. [PMID: 35501896 PMCID: PMC9059351 DOI: 10.1186/s12920-022-01236-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 12/02/2022] Open
Abstract
Background: Non-coding RNAs are attracting growing interest, and research on their functions is receiving increasing attention. Long non-coding RNAs (lncRNAs) are defined as non-protein-coding transcripts longer than 200 nucleotides. lncRNAs are known to be associated with cancers and to be dysregulated in them, but most of their functions remain to be studied. A mechanism of RNA regulation, known as competing endogenous RNAs (ceRNAs), has been proposed to explain the complex relationships among mRNAs and lncRNAs, which compete for binding to shared microRNAs (miRNAs). Methods: We proposed an analysis framework to construct the association networks among lncRNAs, mRNAs, and miRNAs based on their expression patterns and to decipher their network modules. Results: We collected a large-scale gene expression dataset of 1,061 samples from breast invasive carcinoma (BRCA) patients, each consisting of the expression profiles of 4,359 lncRNAs, 16,517 mRNAs, and 534 miRNAs, and applied the proposed analysis approach to interrogate them. We have uncovered the underlying ceRNA modules and the key modulatory lncRNAs for different subtypes of breast cancer. Conclusions: We proposed a modulatory analysis to infer the ceRNA effects among mRNAs and lncRNAs and performed functional analysis to reveal the plausible mechanisms of lncRNA modulation in the four breast cancer subtypes. Our results might provide new directions for breast cancer therapeutics, and the proposed method could be readily applied to other diseases. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01236-6.
Affiliation(s)
- Wen-Hsuan Yu
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan; Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong Street, Taipei, 112, Taiwan
- Chia-Lang Hsu
  - Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan; Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
- Chen-Ching Lin
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong Street, Taipei, 112, Taiwan
- Yen-Jen Oyang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Hsueh-Fen Juan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan; Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
- Hsuan-Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong Street, Taipei, 112, Taiwan.
41
Zhang J, Zhu M, Qian Y. protein2vec: Predicting Protein-Protein Interactions Based on LSTM. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1257-1266. [PMID: 32750870 DOI: 10.1109/tcbb.2020.3003941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The semantic similarity of gene ontology (GO) terms is widely used to predict protein-protein interactions (PPIs). Traditional semantic similarity measures are based mainly on manually crafted features, which may ignore important hidden information in the gene ontology. Moreover, those methods usually derive the similarity between proteins from the similarity between GO terms using simple statistical rules, such as MAX and BMA (best-match average), oversimplifying the potentially complex relationship between proteins and the GO terms annotated to them. To overcome these two deficiencies, we propose a new method named protein2vec, which characterizes a protein with a vector based on the GO terms annotated to it and combines the information of both the GO and known PPIs. We first apply a network embedding algorithm to the GO network to generate a feature vector for each GO term. Then, a Long Short-Term Memory (LSTM) network encodes the feature vectors of the GO terms annotated to a protein into another vector (called the protein vector). Finally, two protein vectors are passed to a feedforward neural network to predict the interaction between the two corresponding proteins. The experimental results show that protein2vec outperforms almost all commonly used traditional semantic similarity methods.
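The MAX and BMA rules mentioned in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from protein2vec: the function names, the toy term labels, and the lookup table of term-to-term similarities are all hypothetical. BMA has more than one formulation in the literature; the variant assumed here averages the two directional best-match means.

```python
def max_similarity(terms_a, terms_b, sim):
    """MAX rule: the highest similarity over all cross-protein term pairs."""
    return max(sim(s, t) for s in terms_a for t in terms_b)

def bma_similarity(terms_a, terms_b, sim):
    """BMA rule (assumed variant): mean of the two directional best-match averages."""
    best_for_a = [max(sim(s, t) for t in terms_b) for s in terms_a]
    best_for_b = [max(sim(s, t) for s in terms_a) for t in terms_b]
    mean_a = sum(best_for_a) / len(best_for_a)
    mean_b = sum(best_for_b) / len(best_for_b)
    return (mean_a + mean_b) / 2

# Toy term-to-term similarities; the labels stand in for GO term identifiers.
TOY_TABLE = {
    ("tA", "tX"): 0.8, ("tA", "tY"): 0.2, ("tA", "tZ"): 0.4,
    ("tB", "tX"): 0.1, ("tB", "tY"): 0.6, ("tB", "tZ"): 0.4,
}

def toy_sim(s, t):
    """Symmetric lookup into the toy table; unknown pairs score 0.0."""
    return TOY_TABLE.get((s, t), TOY_TABLE.get((t, s), 0.0))

protein_1_terms = ["tA", "tB"]
protein_2_terms = ["tX", "tY", "tZ"]

max_score = max_similarity(protein_1_terms, protein_2_terms, toy_sim)
bma_score = bma_similarity(protein_1_terms, protein_2_terms, toy_sim)
```

Both rules collapse a whole matrix of term similarities into a single number, which is the simplification the abstract argues against when it proposes learning a protein vector instead.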
42
Pavani KC, Meese T, Pascottini OB, Guan X, Lin X, Peelman L, Hamacher J, Van Nieuwerburgh F, Deforce D, Boel A, Heindryckx B, Tilleman K, Van Soom A, Gadella BM, Hendrix A, Smits K. Hatching is modulated by microRNA-378a-3p derived from extracellular vesicles secreted by blastocysts. Proc Natl Acad Sci U S A 2022; 119:e2122708119. [PMID: 35298333 PMCID: PMC8944274 DOI: 10.1073/pnas.2122708119] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/21/2021] [Accepted: 02/04/2022] [Indexed: 12/17/2022] Open
Abstract
Significance: Hatching from the zona pellucida is a prerequisite for embryo implantation and is less likely to occur in vitro for reasons unknown. Extracellular vesicles (EVs) are secreted by the embryo into the culture medium. Yet the role that embryonic EVs and their cargo microRNAs (miRNAs) play in blastocyst hatching has not been elucidated, partially due to the difficulties of isolating them from low amounts of culture medium. Here, we optimized EV-miRNA isolation from medium conditioned by individually cultured bovine embryos and subsequently showed that miR-378a-3p, which was up-regulated in EVs secreted by blastocysts, plays a crucial role in promoting blastocyst hatching. This demonstrates the regulatory effect of miR-378a-3p on hatching, which is an established embryo quality parameter linked with implantation.
Affiliation(s)
- Krishna Chaitanya Pavani
  - Department of Reproduction, Obstetrics and Herd Health, Faculty of Veterinary Medicine, University of Ghent, B-9820 Merelbeke, Belgium
  - Department for Reproductive Medicine, Ghent University Hospital, 9000 Gent, Belgium
- Tim Meese
  - Department of Pharmaceutics, Faculty of Pharmaceutical Sciences, Ghent University, B-9000 Ghent, Belgium
- Osvaldo Bogado Pascottini
  - Department of Reproduction, Obstetrics and Herd Health, Faculty of Veterinary Medicine, University of Ghent, B-9820 Merelbeke, Belgium
  - Department of Veterinary Sciences, Gamete Research Center, University of Antwerp, 2610 Antwerp, Belgium
- XueFeng Guan
  - Department of Nutrition, Genetics and Ethology, Faculty of Veterinary Medicine, Ghent University, B-9000 Ghent, Belgium
- Xiaoyuan Lin
  - Department of Nutrition, Genetics and Ethology, Faculty of Veterinary Medicine, Ghent University, B-9000 Ghent, Belgium
- Luc Peelman
  - Department of Nutrition, Genetics and Ethology, Faculty of Veterinary Medicine, Ghent University, B-9000 Ghent, Belgium
- Joachim Hamacher
  - Institute of Crop Science and Resource Conservation, Plant Pathology, Rheinische Friedrich-Wilhelms-University of Bonn, D-53115 Bonn, Germany
- Filip Van Nieuwerburgh
  - Department of Pharmaceutics, Faculty of Pharmaceutical Sciences, Ghent University, B-9000 Ghent, Belgium
- Dieter Deforce
  - Department of Pharmaceutics, Faculty of Pharmaceutical Sciences, Ghent University, B-9000 Ghent, Belgium
- Annekatrien Boel
  - Ghent-Fertility and Stem Cell Team, Department for Reproductive Medicine, Ghent University Hospital, 9000 Ghent, Belgium
- Björn Heindryckx
  - Ghent-Fertility and Stem Cell Team, Department for Reproductive Medicine, Ghent University Hospital, 9000 Ghent, Belgium
- Kelly Tilleman
  - Department for Reproductive Medicine, Ghent University Hospital, 9000 Gent, Belgium
- Ann Van Soom
  - Department of Reproduction, Obstetrics and Herd Health, Faculty of Veterinary Medicine, University of Ghent, B-9820 Merelbeke, Belgium
- Bart M. Gadella
  - Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, 3584 CM Utrecht, The Netherlands
- An Hendrix
  - Laboratory of Experimental Cancer Research, Department of Human Structure and Repair, Ghent University, B-9000 Ghent, Belgium
  - Cancer Research Institute Ghent, B-9000 Ghent, Belgium
- Katrien Smits
  - Department of Reproduction, Obstetrics and Herd Health, Faculty of Veterinary Medicine, University of Ghent, B-9820 Merelbeke, Belgium
43
Mallick K, Mallik S, Bandyopadhyay S, Chakraborty S. A Novel Graph Topology-Based GO-Similarity Measure for Signature Detection From Multi-Omics Data and its Application to Other Problems. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:773-785. [PMID: 32866101 DOI: 10.1109/tcbb.2020.3020537] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Large-scale multi-omics data analysis and signature prediction have been a topic of interest in the last two decades. While various traditional clustering/correlation-based methods have been proposed, the overall prediction is not always satisfactory. To address these challenges, we propose a new approach that leverages Gene Ontology (GO) similarity combined with multi-omics data. In this article, a new GO similarity measure, ModSchlicker, is proposed, and the effectiveness of the proposed measure along with other standard measures is reviewed using various graph topology-based Information Content (IC) values of GO terms. The proposed measure is applied to PPI prediction. Furthermore, by involving GO similarity, we propose a new framework for stronger disease-based gene signature detection from multi-omics data. For the first objective, we predict interactions from various benchmark PPI datasets of yeast and human. For the latter, gene expression and methylation profiles are used to identify Differentially Expressed and Methylated (DEM) genes. Thereafter, the GO similarity score along with a statistical method is used to determine the potential gene signature. Interestingly, the proposed method performs better (0.9 avg. accuracy and 0.95 AUC) than other existing related methods in classifying the participating features (genes) of the signature. Moreover, the proposed method is highly useful in other prediction/classification problems for any kind of large-scale omics data.
44
Kemper EK, Zhang Y, Dix MM, Cravatt BF. Global profiling of phosphorylation-dependent changes in cysteine reactivity. Nat Methods 2022; 19:341-352. [PMID: 35228727 PMCID: PMC8920781 DOI: 10.1038/s41592-022-01398-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 06/04/2021] [Accepted: 01/14/2022] [Indexed: 01/11/2023]
Abstract
Proteomics has revealed that the ~20,000 human genes engender a far greater number of proteins, or proteoforms, that are diversified in large part by post-translational modifications (PTMs). How such PTMs affect protein structure and function is an active area of research but remains technically challenging to assess on a proteome-wide scale. Here, we describe a chemical proteomic method to quantitatively relate serine/threonine phosphorylation to changes in the reactivity of cysteine residues, a parameter that can affect the potential for cysteines to be post-translationally modified or engaged by covalent drugs. Leveraging the extensive high-stoichiometry phosphorylation occurring in mitotic cells, we discover numerous cysteines that exhibit phosphorylation-dependent changes in reactivity on diverse proteins enriched in cell cycle regulatory pathways. The discovery of bidirectional changes in cysteine reactivity often occurring in proximity to serine/threonine phosphorylation events points to the broad impact of phosphorylation on the chemical reactivity of proteins and the future potential to create small-molecule probes that differentially target proteoforms with PTMs.
Affiliation(s)
- Esther K Kemper
  - The Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA.
- Yuanjin Zhang
  - The Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Melissa M Dix
  - The Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Benjamin F Cravatt
  - The Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA.
45
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
|
46
|
Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022; 22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open
Abstract
Background: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, semantic similarity measures have the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, less work explores the comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. Methods: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III). Results: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, as measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures. Conclusion: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
|
47
|
Masoomi-Aladizgeh F, McKay MJ, Asar Y, Haynes PA, Atwell BJ. Patterns of gene expression in pollen of cotton (Gossypium hirsutum) indicate downregulation as a feature of thermotolerance. The Plant Journal: for Cell and Molecular Biology 2022; 109:965-979. [PMID: 34837283 DOI: 10.1111/tpj.15608] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2021] [Revised: 11/22/2021] [Accepted: 11/23/2021] [Indexed: 06/13/2023]
Abstract
Reproductive performance in plants is impaired as maximum temperatures consistently approach 40°C. However, the timing of heatwaves critically affects their impact. We studied the molecular responses during pollen maturation in cotton to investigate the vulnerability to high temperature. Tetrads (TEs), uninucleate and binucleate microspores, and mature pollen were subjected to SWATH-MS and RNA-seq analyses after exposure to 38/28°C (day/night) for 5 days. The results indicated that molecular signatures were downregulated progressively in response to heat during pollen development. This was even more evident in leaves, where three-quarters of differentially changed proteins decreased in abundance during heat. Functional analysis showed that translation of genes increased in TEs after exposure to heat; however, the reverse pattern was observed in mature pollen and leaves. For example, proteins involved in transport were highly abundant in TEs whereas in later stages of pollen formation and leaves, heat suppressed synthesis of proteins involved in cell-to-cell communication. Moreover, a large number of heat shock proteins were identified in heat-affected TEs, but these proteins were less abundant in mature pollen and leaves. We speculate that the sensitivity of TE cells to heat is related to high rates of translation targeted to pathways that might not be essential for thermotolerance. Molecular signatures during stages of pollen development after heatwaves could provide markers for future genetic improvement.
Affiliation(s)
- Matthew J McKay
  - Australian Proteome Analysis Facility, Department of Molecular Sciences, Macquarie University, NSW, Australia
- Yasmin Asar
  - School of Life and Environmental Sciences, University of Sydney, NSW, Australia
- Paul A Haynes
  - Department of Molecular Sciences, Macquarie University, NSW, Australia
- Brian J Atwell
  - Department of Biological Sciences, Macquarie University, NSW, Australia
|
48
|
Siddiqui G, Giannangelo C, De Paoli A, Schuh AK, Heimsch KC, Anderson D, Brown TG, MacRaild CA, Wu J, Wang X, Dong Y, Vennerstrom JL, Becker K, Creek DJ. Peroxide Antimalarial Drugs Target Redox Homeostasis in Plasmodium falciparum Infected Red Blood Cells. ACS Infect Dis 2022; 8:210-226. [PMID: 34985858 PMCID: PMC8762662 DOI: 10.1021/acsinfecdis.1c00550] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022]
Abstract
Plasmodium falciparum causes the most lethal form of malaria. Peroxide antimalarials based on artemisinin underpin the frontline treatments for malaria, but artemisinin resistance is rapidly spreading. Synthetic peroxide antimalarials, known as ozonides, are in clinical development and offer a potential alternative. Here, we used chemoproteomics to investigate the protein alkylation targets of artemisinin and ozonide probes, including an analogue of the ozonide clinical candidate, artefenomel. We greatly expanded the list of proteins alkylated by peroxide antimalarials and identified significant enrichment of redox-related proteins for both artemisinins and ozonides. Disrupted redox homeostasis was confirmed by dynamic live imaging of the glutathione redox potential using a genetically encoded redox-sensitive fluorescence-based biosensor. Targeted liquid chromatography-mass spectrometry (LC-MS)-based thiol metabolomics also confirmed changes in cellular thiol levels. This work shows that peroxide antimalarials disproportionately alkylate proteins involved in redox homeostasis and that disrupted redox processes are involved in the mechanism of action of these important antimalarials.
Affiliation(s)
- Ghizal Siddiqui
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Carlo Giannangelo
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Amanda De Paoli
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Anna Katharina Schuh
  - Biochemistry and Molecular Biology, Interdisciplinary Research Center, Justus Liebig University Giessen, 35392 Giessen, Germany
- Kim C. Heimsch
  - Biochemistry and Molecular Biology, Interdisciplinary Research Center, Justus Liebig University Giessen, 35392 Giessen, Germany
- Dovile Anderson
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Timothy G. Brown
  - Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Christopher A. MacRaild
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
- Jianbo Wu
  - College of Pharmacy, University of Nebraska Medical Center, 986125 Nebraska Medical Center, Omaha, Nebraska 68198-6125, United States
- Xiaofang Wang
  - College of Pharmacy, University of Nebraska Medical Center, 986125 Nebraska Medical Center, Omaha, Nebraska 68198-6125, United States
- Yuxiang Dong
  - College of Pharmacy, University of Nebraska Medical Center, 986125 Nebraska Medical Center, Omaha, Nebraska 68198-6125, United States
- Jonathan L. Vennerstrom
  - College of Pharmacy, University of Nebraska Medical Center, 986125 Nebraska Medical Center, Omaha, Nebraska 68198-6125, United States
- Katja Becker
  - Biochemistry and Molecular Biology, Interdisciplinary Research Center, Justus Liebig University Giessen, 35392 Giessen, Germany
- Darren J. Creek
  - Drug Delivery, Disposition and Dynamics, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3052, Australia
|
49
|
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS: To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS: We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
Affiliation(s)
- Juan J. Lastra-Díaz
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
|
50
|
Acharya S, Cui L, Pan Y. A Refined 3-in-1 Fused Protein Similarity Measure: Application in Threshold-Free Hub Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:192-206. [PMID: 32070994 DOI: 10.1109/tcbb.2020.2973563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
An exhaustive literature survey shows that finding protein/gene similarity is an important step towards solving widespread bioinformatics problems, such as predicting protein-protein interactions, analyzing Protein-Protein Interaction Networks (PPINs), gene prioritization, and disease gene/protein detection. In this article, we propose an improved 3-in-1 fused protein similarity measure called FuSim-II. It is built by combining the weighted average of biological knowledge extracted from three genomic/proteomic resources: Gene Ontology (GO), PPIN, and protein sequence. Furthermore, we show the application of the proposed measure in detecting potential hub-proteins from a given PPIN. To this end, we propose a multi-objective clustering-based protein hub detection framework with FuSim-II as the underlying proximity measure. The PPINs of H. sapiens and M. musculus are chosen for experimental purposes. Unlike most existing hub-detection methods, the proposed technique does not require any protein degree cut-off or threshold to define hubs. A thorough assessment of efficiency between the proposed measure and eight existing protein similarity measures, along with eight single/multi-objective clustering methods, has been carried out. Internal cluster validity indices such as Silhouette and Davies-Bouldin (DB) are used for the analytical study. Also, a comparative performance analysis between the proposed and five existing hub-protein detection algorithms is conducted through an essentiality enrichment study. The reported results show the improved performance of FuSim-II over existing protein similarity measures in terms of identifying functionally related proteins as well as relevant hub-proteins. Supplementary material is available at http://csse.szu.edu.cn/staff/cuilz/eng/index.html.
|