1
Lawson S, Donovan D, Lefevre J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS One 2024; 19:e0311433. [PMID: 39361678 PMCID: PMC11449304 DOI: 10.1371/journal.pone.0311433] [Received: 06/05/2024] [Accepted: 09/12/2024] [Indexed: 10/05/2024] [Open Access]
Abstract
The use of graph centrality measures applied to biological networks, such as protein interaction networks, underpins much research into identifying key players within biological processes. This approach, however, is restricted to dyadic interactions, and it is well known that in many instances interactions are polyadic. In this study we illustrate the merit of using hypergraph centrality applied to a hypernetwork as an alternative. Specifically, we review and propose an extension to a recently introduced node and edge nonlinear hypergraph centrality model which provides mutually dependent node and edge centralities. A Saccharomyces cerevisiae protein complex hypernetwork is used as an example application, with nodes representing proteins and hyperedges representing protein complexes. The resulting rankings of the nodes and edges are considered to see if they provide insight into the essentiality of the proteins and complexes. We find that certain variations of the model predict essentiality more accurately and that the degree-based variation illustrates that the centrality-lethality rule extends to a hypergraph setting. In particular, through exploitation of the model's flexibility, we identify small sets of proteins densely populated with essential proteins. One of the key advantages of applying this model to a protein complex hypernetwork is that it also provides a classification method for protein complexes, unlike previous approaches, which are only concerned with classifying proteins.
Affiliation(s)
- Sarah Lawson
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- Diane Donovan
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- James Lefevre
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia

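The mutually dependent node and edge centralities described in the abstract of entry 1 can be illustrated with a small fixed-point iteration. The sketch below is a simplified illustration, not the authors' implementation. It assumes unit node and edge weights, uses hypothetical names (`node_edge_centrality`, `f`, `g`, `phi`, `psi`), and defaults all four functions to the identity, which reduces the model to a linear, eigenvector-style centrality. The toy hypergraph is invented for illustration only.

```python
def identity(t):
    return t


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def node_edge_centrality(n_nodes, hyperedges, f=identity, g=identity,
                         phi=identity, psi=identity,
                         max_iter=500, tol=1e-12):
    """Alternate between edge and node updates until both vectors settle.

    hyperedges is a list of sets of node indices (hypothetical input format).
    Edge score: psi applied to the sum of phi(node score) over its members.
    Node score: g applied to the sum of f(edge score) over edges containing it.
    """
    x = normalize([1.0] * n_nodes)          # node centralities
    y = normalize([1.0] * len(hyperedges))  # hyperedge centralities
    for _ in range(max_iter):
        y_new = normalize([psi(sum(phi(x[i]) for i in edge))
                           for edge in hyperedges])
        x_new = normalize([g(sum(f(y_new[k])
                                 for k, edge in enumerate(hyperedges)
                                 if i in edge))
                           for i in range(n_nodes)])
        change = (sum(abs(a - b) for a, b in zip(x, x_new))
                  + sum(abs(a - b) for a, b in zip(y, y_new)))
        x, y = x_new, y_new
        if change < tol:
            break
    return x, y


# Toy hypergraph: 4 proteins (nodes), 3 complexes (hyperedges).
hyperedges = [{0, 1, 2}, {0, 3}, {0, 1}]
x, y = node_edge_centrality(4, hyperedges)
top_node = max(range(4), key=lambda i: x[i])
top_edge = max(range(len(hyperedges)), key=lambda k: y[k])
print("node centralities:", x)
print("edge centralities:", y)
print("top node:", top_node, "top edge:", top_edge)
```

Passing other positive functions for `f`, `g`, `phi`, and `psi` gives nonlinear variants of this sketch; whether the iteration converges then depends on the functions chosen.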
2
Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024; 112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often face challenges in accurately discerning essential proteins, primarily relying on information derived from protein-protein interaction (PPI) networks. Despite attempts by some researchers to integrate biological data and PPI networks for predicting essential proteins, designing effective integration methods remains a challenge. In response to these challenges, this paper presents the ACDMBI model, specifically designed to overcome the aforementioned issues. ACDMBI is comprised of two key modules: feature extraction and classification. In terms of capturing relevant information, we draw insights from three distinct data sources. Initially, structural features of proteins are extracted from the PPI network through community division. Subsequently, these features are further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Moving forward, protein features are extracted from gene expression data utilizing Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Finally, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate features extracted from three different data sources, crafting a multi-layer deep neural network (DNN) for protein classification prediction. Experimental results on brewing yeast data showcase the ACDMBI model's superior performance, with AUC reaching 0.9533 and AUPR reaching 0.9153. Ablation experiments further reveal that the effective integration of features from diverse biological information significantly boosts the model's performance.
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
- Jialong Tian
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

3
Liu L, Liu Y, Min L, Zhou Z, He X, Xie Y, Cao W, Deng S, Lin X, He X, Chen X. Most Pleiotropic Effects of Gene Knockouts Are Evolutionarily Transient in Yeasts. Mol Biol Evol 2024; 41:msae189. [PMID: 39238468 DOI: 10.1093/molbev/msae189] [Received: 03/20/2024] [Revised: 07/12/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] [Open Access]
Abstract
Pleiotropy, the phenomenon in which a single gene influences multiple traits, is a fundamental concept in genetics. However, the evolutionary mechanisms underlying pleiotropy require further investigation. In this study, we conducted parallel gene knockouts targeting 100 transcription factors in 2 strains of Saccharomyces cerevisiae. We systematically examined and quantified the pleiotropic effects of these knockouts on gene expression levels for each transcription factor. Our results showed that the knockout of a single gene generally affected the expression levels of multiple genes in both strains, indicating various degrees of pleiotropic effects. Strikingly, the pleiotropic effects of the knockouts change rapidly between strains in different genetic backgrounds, and ∼85% of them were nonconserved. Further analysis revealed that the conserved effects tended to be functionally associated with the deleted transcription factors, while the nonconserved effects appeared to be more ad hoc responses. In addition, we measured 184 yeast cell morphological traits in these knockouts and found consistent patterns. In order to investigate the evolutionary processes underlying pleiotropy, we examined the pleiotropic effects of standing genetic variations in a population consisting of ∼1,000 hybrid progenies of the 2 strains. We observed that newly evolved expression quantitative trait loci impacted the expression of a greater number of genes than did old expression quantitative trait loci, suggesting that natural selection is gradually eliminating maladaptive or slightly deleterious pleiotropic responses. Overall, our results show that, although being prevalent for new mutations, the majority of pleiotropic effects observed are evolutionarily transient, which explains how evolution proceeds despite complicated pleiotropic effects.
Affiliation(s)
- Li Liu
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yao Liu
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Lulu Min
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Zhenzhen Zhou
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xingxing He
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- YunHan Xie
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Evolutionary Ecology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Waifang Cao
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Shuyun Deng
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xiaoju Lin
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xionglei He
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Xiaoshu Chen
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, China

4
Patel LA, Cao Y, Mendenhall EM, Benner C, Goren A. The Wild West of spike-in normalization. Nat Biotechnol 2024; 42:1343-1349. [PMID: 39271835 DOI: 10.1038/s41587-024-02377-y] [Indexed: 09/15/2024]
Affiliation(s)
- Lauren A Patel
  - Department of Bioengineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA
- Yuwei Cao
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Christopher Benner
  - Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA
- Alon Goren
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA

5
Zhao Y, Han Z, Zhu X, Chen B, Zhou L, Liu X, Liu H. Yeast Proteins: Proteomics, Extraction, Modification, Functional Characterization, and Structure: A Review. J Agric Food Chem 2024; 72:18774-18793. [PMID: 39146464 DOI: 10.1021/acs.jafc.4c04821] [Indexed: 08/17/2024]
Abstract
Proteins are essential for human tissues and organs, and they require adequate intake for normal physiological functions. With a growing global population, protein demand rises annually. Traditional animal and plant protein sources rely heavily on land and water, making it difficult to meet the increasing demand. The high protein content of yeast and the complete range of amino acids in yeast proteins make it a high-quality source of supplemental protein. Screening of high-protein yeast strains using proteomics is essential to increase the value of yeast protein resources and to promote the yeast protein industry. However, current yeast extraction methods are mainly alkaline solubilization and acid precipitation; therefore, it is necessary to develop more efficient and environmentally friendly techniques. In addition, the functional properties of yeast proteins limit their application in the food industry. To improve these properties, methods must be selected to modify the secondary and tertiary structures of yeast proteins. This paper explores how proteomic analysis can be used to identify nutrient-rich yeast strains, compares the process of preparing yeast proteins, and investigates how modification methods affect the function and structure of yeast proteins. It provides a theoretical basis for solving the problem of inadequate protein intake in China and explores future prospects.
Affiliation(s)
- Yan Zhao
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Zhaowei Han
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Xuchun Zhu
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Bingyu Chen
  - Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Linyi Zhou
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Xiaoyong Liu
  - Henan Agricultural University, Zhengzhou, Henan 450002, China
- Hongzhi Liu
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
  - College of Food and Pharmaceutical Engineering, Guizhou Institute of Technology, Guiyang, Guizhou 550025, China

6
Mucelli X, Huang LS. Naming internal insertion alleles created using CRISPR in Saccharomyces cerevisiae. MicroPubl Biol 2024; 2024:10.17912/micropub.biology.001258. [PMID: 39185013 PMCID: PMC11342080 DOI: 10.17912/micropub.biology.001258] [Received: 06/13/2024] [Revised: 07/23/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
The budding yeast Saccharomyces cerevisiae is a powerful model organism, partly because of the ease of genome alterations due to the combination of a fast generation time and many molecular genetic tools. Recent advances in CRISPR-based systems allow for the easier creation of alleles with internally inserted sequences within the coding regions of genes, such as the internal insertion of sequences that code for epitopes or fluorescent proteins. Here we briefly summarize some existing nomenclature standards and suggest nomenclature guidelines for internal insertion alleles that are informative, consistent, and computable.
7
Romero-Pérez PS, Moran HM, Horani A, Truong A, Manriquez-Sandoval E, Ramirez JF, Martinez A, Gollub E, Hunter K, Lotthammer JM, Emenecker RJ, Boothby TC, Holehouse AS, Fried SD, Sukenik S. Protein surface chemistry encodes an adaptive resistance to desiccation. bioRxiv 2024:2024.07.28.604841. [PMID: 39131385 PMCID: PMC11312438 DOI: 10.1101/2024.07.28.604841] [Indexed: 08/13/2024]
Abstract
Cellular desiccation - the loss of nearly all water from the cell - is a recurring stress in an increasing number of ecosystems that can drive proteome-wide protein unfolding and aggregation. For cells to survive this stress, at least some of the proteome must disaggregate and resume function upon rehydration. The molecular determinants that underlie the ability of proteins to do this remain largely unknown. Here, we apply quantitative and structural proteomic mass spectrometry to desiccated and rehydrated yeast extracts to show that some proteins possess an innate capacity to survive extreme water loss. Structural analysis correlates the ability of proteins to resist desiccation with their surface chemistry. Remarkably, highly resistant proteins are responsible for the production of the cell's building blocks - amino acids, metabolites, and sugars. Conversely, those proteins that are most desiccation-sensitive are involved in ribosome biogenesis and other energy consuming processes. As a result, the rehydrated proteome is preferentially enriched with metabolite and small molecule producers and depleted of some of the cell's heaviest consumers. We propose this functional bias enables cells to kickstart their metabolism and promote cell survival following desiccation and rehydration.
Affiliation(s)
- Haley M. Moran
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Azeem Horani
  - Quantitative and Systems Biology Program, University of California Merced, Merced, CA 95343, USA
- Alexander Truong
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Edgar Manriquez-Sandoval
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- John F. Ramirez
  - Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
- Alec Martinez
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Edith Gollub
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Kara Hunter
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Jeffrey M. Lotthammer
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Ryan J. Emenecker
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Thomas C. Boothby
  - Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
- Alex S. Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Stephen D. Fried
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Shahar Sukenik
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
  - Quantitative and Systems Biology Program, University of California Merced, Merced, CA 95343, USA

8
Chen P, Zhang J. The loci of environmental adaptation in a model eukaryote. Nat Commun 2024; 15:5672. [PMID: 38971805 PMCID: PMC11227561 DOI: 10.1038/s41467-024-50002-y] [Received: 12/19/2023] [Accepted: 06/25/2024] [Indexed: 07/08/2024] [Open Access]
Abstract
While the underlying genetic changes have been uncovered in some cases of adaptive evolution, the lack of a systematic study prevents a general understanding of the genomic basis of adaptation. For example, it is unclear whether protein-coding or noncoding mutations are more important to adaptive evolution and whether adaptations to different environments are brought by genetic changes distributed in diverse genes and biological processes or concentrated in a core set. We here perform laboratory evolution of 3360 Saccharomyces cerevisiae populations in 252 environments of varying levels of stress. We find the yeast adaptations to be primarily fueled by large-effect coding mutations overrepresented in a relatively small gene set, despite prevalent antagonistic pleiotropy across environments. Populations generally adapt faster in more stressful environments, partly because of greater benefits of the same mutations in more stressful environments. These and other findings from this model eukaryote help unravel the genomic principles of environmental adaptation.
Affiliation(s)
- Piaopiao Chen
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
  - College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA

9
Li Z, Wang S, Cui H, Liu X, Zhang Y. Spatiotemporal constrained RNA-protein heterogeneous network for protein complex identification. Brief Bioinform 2024; 25:bbae280. [PMID: 38856171 PMCID: PMC11163383 DOI: 10.1093/bib/bbae280] [Received: 03/19/2024] [Revised: 05/05/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] [Open Access]
Abstract
The identification of protein complexes from protein interaction networks is crucial in the understanding of protein function, cellular processes and disease mechanisms. Existing methods commonly rely on the assumption that protein interaction networks are highly reliable, yet in reality, there is considerable noise in the data. In addition, these methods fail to account for the regulatory roles of biomolecules during the formation of protein complexes, which is crucial for understanding the generation of protein interactions. To this end, we propose a SpatioTemporal constrained RNA-protein heterogeneous network for Protein Complex Identification (STRPCI). STRPCI first constructs a multiplex heterogeneous protein information network to capture deep semantic information by extracting spatiotemporal interaction patterns. Then, it utilizes a dual-view aggregator to aggregate heterogeneous neighbor information from different layers. Finally, through contrastive learning, STRPCI collaboratively optimizes the protein embedding representations under different spatiotemporal interaction patterns. Based on the protein embedding similarity, STRPCI reweights the protein interaction network and identifies protein complexes with core-attachment strategy. By considering the spatiotemporal constraints and biomolecular regulatory factors of protein interactions, STRPCI measures the tightness of interactions, thus mitigating the impact of noisy data on complex identification. Evaluation results on four real PPI networks demonstrate the effectiveness and strong biological significance of STRPCI. The source code implementation of STRPCI is available from https://github.com/LI-jasm/STRPCI.
Affiliation(s)
- Zeqian Li
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Shilong Wang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Hai Cui
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, CA 94305, USA
- Yijia Zhang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China

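The abstract of entry 9 states that STRPCI reweights the protein interaction network based on protein embedding similarity. The sketch below shows one plausible reading of that single step and is not taken from the STRPCI source code. It assumes cosine similarity as the similarity measure, and the function names, protein labels, and embedding vectors are all hypothetical. The contrastive learning and core-attachment stages are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reweight_edges(edges, embeddings):
    """Assign each interaction (p, q) a weight equal to the similarity of its endpoints."""
    return {(p, q): cosine_similarity(embeddings[p], embeddings[q])
            for p, q in edges}


# Made-up embeddings for three hypothetical proteins.
embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
edges = [("A", "B"), ("A", "C")]
weights = reweight_edges(edges, embeddings)
for edge, weight in weights.items():
    print(edge, round(weight, 3))
```

Under this reading, interactions whose endpoints have dissimilar embeddings receive low weights, which is one way a method could down-weight noisy edges before complex detection.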
10
Desroches Altamirano C, Kang MK, Jordan MA, Borianne T, Dilmen I, Gnädig M, von Appen A, Honigmann A, Franzmann TM, Alberti S. eIF4F is a thermo-sensing regulatory node in the translational heat shock response. Mol Cell 2024; 84:1727-1741.e12. [PMID: 38547866 DOI: 10.1016/j.molcel.2024.02.038] [Received: 06/28/2023] [Revised: 12/18/2023] [Accepted: 02/29/2024] [Indexed: 05/05/2024]
Abstract
Heat-shocked cells prioritize the translation of heat shock (HS) mRNAs, but the underlying mechanism is unclear. We report that HS in budding yeast induces the disassembly of the eIF4F complex, where eIF4G and eIF4E assemble into translationally arrested mRNA ribonucleoprotein particles (mRNPs) and HS granules (HSGs), whereas eIF4A promotes HS translation. Using in vitro reconstitution biochemistry, we show that a conformational rearrangement of the thermo-sensing eIF4A-binding domain of eIF4G dissociates eIF4A and promotes the assembly with mRNA into HS-mRNPs, which recruit additional translation factors, including Pab1p and eIF4E, to form multi-component condensates. Using extracts and cellular experiments, we demonstrate that HS-mRNPs and condensates repress the translation of associated mRNA and deplete translation factors that are required for housekeeping translation, whereas HS mRNAs can be efficiently translated by eIF4A. We conclude that the eIF4F complex is a thermo-sensing node that regulates translation during HS.
Affiliation(s)
- Christine Desroches Altamirano
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Moo-Koo Kang
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Mareike A Jordan
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Tom Borianne
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Irem Dilmen
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Maren Gnädig
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Alexander von Appen
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Alf Honigmann
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Titus M Franzmann
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Simon Alberti
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany

11
Saha S, Chatterjee P, Basu S, Nasipuri M. EPI-SF: essential protein identification in protein interaction networks using sequence features. PeerJ 2024; 12:e17010. [PMID: 38495766 PMCID: PMC10944162 DOI: 10.7717/peerj.17010] [Received: 11/24/2023] [Accepted: 02/05/2024] [Indexed: 03/19/2024] [Open Access]
Abstract
Proteins are considered indispensable for facilitating an organism's viability, reproductive capabilities, and other fundamental physiological functions. Conventional biological assays for identifying essential proteins are time-consuming, labor-intensive, and expensive. Therefore, it is widely accepted that employing computational methods is the most expeditious and effective approach to successfully discerning essential proteins. Despite being a popular choice in machine learning (ML) applications, the deep learning (DL) method is not suggested for this specific research work based on sequence features due to the restricted availability of high-quality training sets of positive and negative samples. However, some recent DL work has addressed limited data availability, and this is left as future work. Conventional ML techniques are thus utilized in this work due to their superior performance compared to DL methodologies. Accordingly, a technique called EPI-SF is proposed here, which employs ML to identify essential proteins within the protein-protein interaction network (PPIN). The protein sequence is the primary determinant of protein structure and function. So, initially, relevant protein sequence features are extracted from the proteins within the PPIN. These features are subsequently utilized as input for various machine learning models, including XGBoost Classifier, AdaBoost Classifier, logistic regression (LR), support vector machine (SVM), Decision Tree model (DT), Random Forest model (RF), and Naïve Bayes model (NB). The objective is to detect the essential proteins within the PPIN. The primary investigation examined the performance of various ML models on the yeast PPIN. Among these models, the RF model was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperformed other state-of-the-art methods based on traditional centrality measures, such as betweenness centrality (BC) and closeness centrality (CC), as well as deep learning methods such as DeepEP, as emphasized in the results section. As a result of its favorable performance, EPI-SF is later employed for the prediction of novel essential proteins inside the human PPIN. Due to the tendency of viruses to selectively target essential proteins involved in the transmission of diseases within human PPIN, investigations are conducted to assess the probable involvement of these proteins in COVID-19 and other related severe diseases.
Affiliation(s)
- Sovan Saha
  - Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Techno Main Salt Lake, Kolkata, West Bengal, India
- Piyali Chatterjee
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India
- Subhadip Basu
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India

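The precision, recall, and F1-score reported for the RF model in the abstract of entry 11 can be checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below is only a consistency check on the three reported figures. The function name `f1_score` is defined locally for illustration and does not come from the EPI-SF code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the Random Forest model.
reported_precision = 0.703
reported_recall = 0.720
reported_f1 = 0.711

computed_f1 = f1_score(reported_precision, reported_recall)
print("computed F1:", round(computed_f1, 3))
print("matches reported F1:", round(computed_f1, 3) == reported_f1)
```

The computed value agrees with the reported F1-score to three decimal places, so the three figures are mutually consistent.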
12
Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ. Systematic profiling of ale yeast protein dynamics across fermentation and repitching. G3 (Bethesda) 2024; 14:jkad293. [PMID: 38135291 PMCID: PMC10917522 DOI: 10.1093/g3journal/jkad293] [Received: 10/01/2023] [Revised: 11/28/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023]
Abstract
Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is among the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout 2 fermentation cycles which were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches compared to those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.
Affiliation(s)
- Riddhiman K Garge
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Renee C Geck
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Joseph O Armstrong
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Barbara Dunn
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Daniel R Boutz
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
  - Antibody Discovery and Accelerated Protein Therapeutics, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, Houston, TX 77030, USA
- Anna Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich 8049, Switzerland
- Vy Dang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Pengyao Jiang
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Maitreya J Dunham
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
13
|
Lee B, Hokamp K, Alhussain MM, Bamagoos AA, Fleming AB. The influence of flocculation upon global gene transcription in a yeast CYC8 mutant. Microb Genom 2024; 10:001216. [PMID: 38529898 PMCID: PMC10995634 DOI: 10.1099/mgen.0.001216] [Received: 12/05/2023] [Accepted: 02/29/2024] [Indexed: 03/27/2024] Open
Abstract
The transcriptome from a Saccharomyces cerevisiae tup1 deletion mutant was one of the first comprehensive yeast transcriptomes published. Subsequent transcriptomes from tup1 and cyc8 mutants firmly established the Tup1-Cyc8 complex as predominantly acting as a repressor of gene transcription. However, transcriptomes from tup1/cyc8 gene deletion or conditional mutants would all have been influenced by the striking flocculation phenotypes that these mutants display. In this study, we have separated the impact of flocculation from the transcriptome in a cyc8 conditional mutant to reveal those genes (i) subject solely to Cyc8p-dependent regulation, (ii) regulated by flocculation only and (iii) regulated by Cyc8p and further influenced by flocculation. We reveal a more accurate list of Cyc8p-regulated genes that includes newly identified Cyc8p-regulated genes that were masked by the flocculation phenotype and excludes genes which were indirectly influenced by flocculation and not regulated by Cyc8p. Furthermore, we show evidence that flocculation exerts a complex and potentially dynamic influence upon global gene transcription. These data should be of interest to future studies into the mechanism of action of the Tup1-Cyc8 complex and to studies involved in understanding the development of flocculation and its impact upon cell function.
Affiliation(s)
- Brenda Lee
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
- Karsten Hokamp
  - Department of Genetics, School of Genetics and Microbiology, Smurfit Institute, Trinity College Dublin, Dublin, Ireland
- Mohamed M. Alhussain
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
- Atif A. Bamagoos
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Alastair B. Fleming
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
|
14
|
Sousa AD, Costa AL, Costa V, Pereira C. Prediction and biological analysis of yeast VDAC1 phosphorylation. Arch Biochem Biophys 2024; 753:109914. [PMID: 38290597 DOI: 10.1016/j.abb.2024.109914] [Received: 10/30/2023] [Revised: 01/02/2024] [Accepted: 01/25/2024] [Indexed: 02/01/2024]
Abstract
The mitochondrial outer membrane protein porin 1 (Por1), the yeast orthologue of mammalian voltage-dependent anion channel (VDAC), is the major permeability pathway for the flux of metabolites and ions between cytosol and mitochondria. In yeast, several Por1 phosphorylation sites have been identified. Protein phosphorylation is a major modification regulating a variety of biological activities, but the potential biological roles of Por1 phosphorylation remain unaddressed. In this work, we analysed 10 experimentally observed phosphorylation sites in yeast Por1 using bioinformatics tools. Two of the residues, T100 and S133, predicted to reduce and increase pore permeability, respectively, were validated using biological assays. Accordingly, Por1T100D reduced mitochondrial respiration, while the Por1S133E phosphomimetic mutant increased it. Por1T100A expression also improved respiratory growth, while Por1S133A caused defects in all growth conditions tested, notably in fermenting media. In conclusion, we found that phosphorylation has the potential to modulate Por1, causing a marked effect on mitochondrial function. It can also affect cell morphology and growth both in respiratory and, unpredictably, also in fermenting conditions, expanding our knowledge of the role of Por1 in cell physiology.
Affiliation(s)
- André D Sousa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
- Ana Luisa Costa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
- Vítor Costa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal; ICBAS - Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Portugal
- Clara Pereira
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
|
15
|
Wang J, Li S, Sun Z, Lao Q, Shen B, Li K, Nie Y. Full-length radiograph based automatic musculoskeletal modeling using convolutional neural network. J Biomech 2024; 166:112046. [PMID: 38467079 DOI: 10.1016/j.jbiomech.2024.112046] [Received: 02/08/2023] [Revised: 02/27/2024] [Accepted: 03/07/2024] [Indexed: 03/13/2024]
Abstract
Full-length radiographs contain information from which many anatomical parameters of the pelvis, femur, and tibia may be derived, but only a few anatomical parameters are used for musculoskeletal modeling. This study aimed to develop a fully automatic algorithm that extracts anatomical parameters from a full-length radiograph to generate a musculoskeletal model that is more accurate than a linearly scaled one. A U-Net convolutional neural network was trained to segment the pelvis, femur, and tibia from the full-length radiograph. Eight anatomic parameters (six for length and width, two for angles) were automatically extracted from the bone segmentation masks and used to generate the musculoskeletal model. The Sørensen-Dice coefficient was used to quantify the consistency of automatic bone segmentation masks with manually segmented labels. Maximum distance error, root mean square (RMS) distance error and Jaccard index (JI) were used to evaluate the geometric accuracy of the automatically generated pelvis, femur and tibia models versus CT bone models. Mean Sørensen-Dice coefficients for the pelvis, femur and tibia 2D segmentation masks were 0.9898, 0.9822 and 0.9786, respectively. The algorithm-driven bone models were closer to the 3D CT bone models than the scaled generic models in geometry, with significantly lower maximum distance error (28.3% average decrease from 24.35 mm) and RMS distance error (28.9% average decrease from 9.55 mm) and higher JI (17.2% average increase from 0.46) (P < 0.001). The algorithm-driven musculoskeletal modeling (107.15 ± 10.24 s) was faster than the manual process (870.07 ± 44.79 s) for the same full-length radiograph. This algorithm provides a fully automatic way to generate a musculoskeletal model from a full-length radiograph that achieves an approximately 30% reduction in distance errors, which could enable personalized musculoskeletal simulation based on full-length radiographs for large-scale OA populations.
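The two overlap scores named in this abstract, the Sørensen-Dice coefficient and the Jaccard index, can be illustrated with a minimal sketch. It assumes binary masks flattened to equal-length sequences of 0/1; the function names and the toy masks are hypothetical and are not taken from the paper, which applied Dice to 2D segmentation masks and JI to 3D bone models.

```python
def dice_coefficient(pred, truth):
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def jaccard_index(pred, truth):
    """Jaccard index: |A ∩ B| / |A ∪ B| for two binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union


# Hypothetical toy masks (flattened): 2 overlapping pixels, 3 set in each mask.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dice = dice_coefficient(pred, truth)   # 2*2 / (3+3) = 2/3
jaccard = jaccard_index(pred, truth)   # 2 / 4 = 0.5
```

The two scores are monotonically related by J = D / (2 - D), so they rank segmentations identically even though their numeric values differ.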
Affiliation(s)
- Junqing Wang
  - Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- Shiqi Li
  - Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China; College of Electrical Engineering, Sichuan University, Chengdu, Sichuan Province, China
- Zitong Sun
  - Sichuan University-Pittsburgh Institute (SCUPI), Sichuan University, Chengdu, Sichuan Province, China
- Qicheng Lao
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
- Bin Shen
  - Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- Kang Li
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- Yong Nie
  - Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
|
16
|
Li H, Xie J, Song J, Jin C, Xin H, Pan X, Ke J, Yuan Y, Shen H, Ning G. CRCS: An automatic image processing pipeline for hormone level analysis of Cushing's disease. Methods 2024; 222:28-40. [PMID: 38159688 DOI: 10.1016/j.ymeth.2023.12.003] [Received: 11/08/2023] [Revised: 12/01/2023] [Accepted: 12/25/2023] [Indexed: 01/03/2024] Open
Abstract
Due to the abnormal secretion of adrenocorticotropic hormone (ACTH) by tumors, Cushing's disease leads to hypercortisonemia, a precursor to a series of metabolic disorders and serious complications. After surgical resection, Cushing's disease has a high recurrence rate and a short time to recurrence, and the cause of recurrence remains unknown. Qualitative or quantitative automatic image analysis of histology images could provide insights into Cushing's disease, but to the best of our knowledge no software is yet available. In this study, we propose CRCS, a quantitative image analysis-based pipeline that aims to explore the relationship between the expression level of ACTH in normal cell tissues adjacent to tumor cells and the postoperative prognosis of patients. CRCS mainly consists of image-level clustering, cluster-level multi-modal image registration, patch-level image classification and pixel-level image segmentation on whole slide imaging (WSI). On both image registration and classification tasks, CRCS achieves state-of-the-art performance compared to recently published methods on our collected benchmark dataset. In addition, CRCS achieves an accuracy of 0.83 for postoperative prognosis of 12 cases. CRCS demonstrates great potential for instrumenting automatic diagnosis and treatment for Cushing's disease.
Affiliation(s)
- Haiyue Li
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Jing Xie
  - Department of Pathology, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, 197 Ruijin 2nd Road, Shanghai 200025, China
- Jialin Song
  - The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiao Tong University, Xi'an 710049, China
- Cheng Jin
  - Medical Robot Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Hongyi Xin
  - University of Michigan - Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Jing Ke
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Ye Yuan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hongbin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Guang Ning
  - State Key Laboratory of Medical Genomes, National Clinical Research Center for Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Laboratory of Endocrinology and Metabolism, Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) & Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, China
|
17
|
Gaikani HK, Stolar M, Kriti D, Nislow C, Giaever G. From beer to breadboards: yeast as a force for biological innovation. Genome Biol 2024; 25:10. [PMID: 38178179 PMCID: PMC10768129 DOI: 10.1186/s13059-023-03156-9] [Received: 02/14/2023] [Accepted: 12/21/2023] [Indexed: 01/06/2024] Open
Abstract
The history of the yeast Saccharomyces cerevisiae, aka brewer's or baker's yeast, is intertwined with our own. Initially domesticated 8,000 years ago to provide sustenance to our ancestors, for the past 150 years, yeast has served as a model research subject and a platform for technology. In this review, we highlight many ways in which yeast has served to catalyze the fields of functional genomics, genome editing, gene-environment interaction investigation, proteomics, and bioinformatics, emphasizing how yeast has served as a catalyst for innovation. Several possible futures for this model organism in synthetic biology, drug personalization, and multi-omics research are also presented.
Affiliation(s)
- Hamid Kian Gaikani
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
  - Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Monika Stolar
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
- Divya Kriti
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
- Corey Nislow
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
- Guri Giaever
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
|
18
|
Li G, Luo X, Hu Z, Wu J, Peng W, Liu J, Zhu X. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 2023; 11:55. [PMID: 37981988 PMCID: PMC10654316 DOI: 10.1007/s13755-023-00252-9] [Received: 05/23/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open
Abstract
Essential proteins play a vital role in the development and reproduction of cells. The identification of essential proteins helps to understand the basic survival of cells. Because biological experimental methods for discovering essential proteins are time-consuming, costly and inefficient, computational methods have gained increasing attention. Early methods identified essential proteins mainly through centralities based on protein-protein interaction (PPI) networks, whose identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of false positives in the PPI network. Secondly, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Thirdly, based on the subcellular localization and orthologous data, the protein subcellular localization score and ortholog score are calculated, respectively. Fourthly, by analyzing a large number of methods based on multi-feature fusion, it is found that there is a special relationship among features, called the dominance relationship, and a novel model based on the dominance relationship is proposed. Finally, NSC, subcellular localization score, and ortholog score are fused by the dominance relationship model, and a new method called NSO is proposed. To verify the performance of NSO, it is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that the NSO method has a higher identification rate than the other methods.
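As an illustration of the centrality-based ranking on PPI networks that this abstract takes as its starting point, the following minimal sketch ranks proteins by degree centrality and counts how many top-ranked proteins are known essentials. It is not NSC or NSO, whose formulas the abstract does not give; the protein names, edges, essential set and function names are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct interaction partners per protein in an undirected PPI network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_k(scores, k):
    """Return the k highest-scoring proteins; ties are broken alphabetically."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical toy network and essential-protein labels.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
known_essential = {"P1", "P3"}

scores = degree_centrality(edges)
ranked = top_k(scores, 2)
hits = len(set(ranked) & known_essential)  # essential proteins among the top 2
```

A false-positive edge changes the degree of both endpoints, which is one way to see why noisy PPI data limits purely topological rankings of this kind.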
Affiliation(s)
- Gaoshi Li
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Jiafei Liu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xiaoshu Zhu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
  - School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
|
19
|
Zhao H, Liu G, Cao X. A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics. BMC Bioinformatics 2023; 24:452. [PMID: 38036960 PMCID: PMC10688502 DOI: 10.1186/s12859-023-05583-8] [Received: 04/08/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023] Open
Abstract
BACKGROUND The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins. RESULTS In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. Firstly, SESN utilizes gene expression data to construct PPI sub-networks. Secondly, seed expansion is performed simultaneously in each sub-network, and the expansion process is based on the topological features of predicted essential proteins. Thirdly, the error correction mechanism is based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complex, gene expression data, GO annotations, and subcellular localization, and adopts the biological data with the best experimental results. The output of SESN is a set of predicted essential proteins. CONCLUSIONS The analysis of each component of SESN indicates the effectiveness of all components. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
Affiliation(s)
- He Zhao
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Guixia Liu
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Xintian Cao
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
|
20
|
Sahoo TR, Patra S, Vipsita S. Decision tree classifier based on topological characteristics of subgraph for the mining of protein complexes from large scale PPI networks. Comput Biol Chem 2023; 106:107935. [PMID: 37536230 DOI: 10.1016/j.compbiolchem.2023.107935] [Received: 11/27/2022] [Revised: 06/11/2023] [Accepted: 07/23/2023] [Indexed: 08/05/2023]
Abstract
The growing accessibility of large-scale protein interaction data demands extensive research to understand cell organization and its functioning at the network level. Bioinformatics and data mining researchers have extensively studied network clustering to examine the structural and operational features of protein-protein interaction (PPI) networks. Clustering PPI networks has proven useful in numerous studies over the past two decades for identifying functional modules, understanding the roles of previously unknown proteins, and other purposes. Protein complexes represent one of the essential cellular components for creating biological activities. Inferring protein complexes has been made more accessible by experimental approaches. We offer a novel method that integrates the classification model with local topological data, making it more reliable and efficient. This article describes a decision tree classifier based on topological characteristics of the subgraph for mining protein complexes. The proposed graph-based algorithm is an effective and efficient way to identify protein complexes from large-scale PPI networks. The performance of the proposed algorithm is evaluated on protein-protein interaction networks of yeast and human from the Database of Interacting Proteins (DIP) and the Biological General Repository for Interaction Datasets (BioGRID), using widely accepted benchmark protein complexes from the comprehensive resource of mammalian protein complexes (CORUM) and the comprehensive catalogue of yeast protein complexes (CYC2008). The outcomes demonstrate that our method can outperform the best-performing supervised, semi-supervised, and unsupervised approaches to detecting protein complexes.
Affiliation(s)
- Tushar Ranjan Sahoo
  - Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India
- Sabyasachi Patra
  - Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India
- Swati Vipsita
  - Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India
|
21
|
Chen K, Zhang X, Zhou X, Mi B, Xiao Y, Zhou L, Wu Z, Wu L, Wang X. Privacy preserving federated learning for full heterogeneity. ISA Trans 2023; 141:73-83. [PMID: 37105888 DOI: 10.1016/j.isatra.2023.04.020] [Received: 08/17/2022] [Revised: 04/01/2023] [Accepted: 04/14/2023] [Indexed: 06/19/2023]
Abstract
Federated learning is a novel distributed machine learning paradigm that supports cooperative model training among multiple participant clients, where each client keeps its private data locally to protect its data privacy. However, in practical application domains, federated learning still faces several heterogeneity challenges, such as data heterogeneity, model heterogeneity, and computation heterogeneity, which significantly decrease its global model performance. To the best of our knowledge, existing solutions only focus on one or two challenges in their heterogeneous settings. In this paper, to address the above challenges simultaneously, we present a novel solution called Full Heterogeneous Federated Learning (FHFL). Firstly, we propose a synthetic data generation approach to mitigate the Non-IID data heterogeneity problem. Secondly, we use knowledge distillation to learn from heterogeneous models of participant clients for model aggregation in the central server. Finally, we produce an opportunistic computation schedule strategy to exploit the idle computation resources of fast-computing clients. Experimental results on different datasets show that our FHFL method can achieve excellent model training performance. We believe it will serve as a pioneering work for distributed model training among heterogeneous clients in federated learning.
Affiliation(s)
- Kongyang Chen
  - Institute of Artificial Intelligence and Blockchain, Guangzhou University, China; Pazhou Lab, Guangzhou, China; Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, China
- Xiaoxue Zhang
  - School of Computer Science and Cyber Engineering, Guangzhou University, China
- Xiuhua Zhou
  - School of Computer Science and Cyber Engineering, Guangzhou University, China
- Bing Mi
  - School of Public Finance and Taxation, Guangdong University of Finance and Economics, China
- Yatie Xiao
  - School of Computer Science and Cyber Engineering, Guangzhou University, China
- Lei Zhou
  - Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, China
- Zhen Wu
  - Third Affiliated Hospital, Sun Yat-sen University, China
- Lin Wu
  - Third Affiliated Hospital, Sun Yat-sen University, China
- Xiaoying Wang
  - Third Affiliated Hospital, Sun Yat-sen University, China
|
22
|
Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ. Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching. bioRxiv 2023:2023.09.21.558736. [PMID: 37790497 PMCID: PMC10543003 DOI: 10.1101/2023.09.21.558736] [Indexed: 10/05/2023]
Abstract
Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is amongst the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout two fermentation cycles which were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches compared to those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.
Affiliation(s)
- Riddhiman K. Garge
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
- Renee C. Geck
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Joseph O. Armstrong
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Barbara Dunn
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Daniel R. Boutz
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
  - Houston Methodist Research Institute, Houston, Texas, USA
- Anna Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Vy Dang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
- Pengyao Jiang
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Edward M. Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
- Maitreya J. Dunham
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|
23
|
Han Y, Liu M, Wang Z. Key protein identification by integrating protein complex information and multi-biological features. Mathematical Biosciences and Engineering: MBE 2023; 20:18191-18206. [PMID: 38052554 DOI: 10.3934/mbe.2023808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Identifying key proteins based on protein-protein interaction networks has emerged as a prominent area of research in bioinformatics. However, current methods exhibit certain limitations, such as the omission of subcellular localization information and the disregard for the impact of topological structure noise on the reliability of key protein identification. Moreover, the influence of proteins outside a complex but interacting with proteins inside the complex on complex participation tends to be overlooked. Addressing these shortcomings, this paper presents a novel method for key protein identification that integrates protein complex information with multiple biological features. This approach offers a comprehensive evaluation of protein importance by considering subcellular localization centrality, topological centrality weighted by gene ontology (GO) similarity and complex participation centrality. Experimental results, including traditional statistical metrics, jackknife methodology metric and key protein overlap or difference, demonstrate that the proposed method not only achieves higher accuracy in identifying key proteins compared to nine classical methods but also exhibits robustness across diverse protein-protein interaction networks.
Affiliation(s)
- Yongyin Han
  - School of Computer Science and Technology, China University of Mining and Technology, China
  - Xuzhou College of Industrial Technology, China
- Maolin Liu
  - School of Computer Science and Technology, China University of Mining and Technology, China
- Zhixiao Wang
  - School of Computer Science and Technology, China University of Mining and Technology, China
|
24
|
Chen H, Pelizzola M, Futschik A. Haplotype based testing for a better understanding of the selective architecture. BMC Bioinformatics 2023; 24:322. [PMID: 37633901 PMCID: PMC10463365 DOI: 10.1186/s12859-023-05437-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 08/03/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND: The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. RESULTS: Using simulated data, we show that, compared to SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them with a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits. CONCLUSIONS: Owing to a lighter multiple-testing correction and to noise reduction, haplotype-based testing is able to outperform SNP-based tests in terms of power in most scenarios.
Affiliation(s)
- Haoyu Chen
  - University of Veterinary Medicine Vienna, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
|
25
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
26
|
Wong ED, Miyasato SR, Aleksander S, Karra K, Nash RS, Skrzypek MS, Weng S, Engel SR, Cherry JM. Saccharomyces genome database update: server architecture, pan-genome nomenclature, and external resources. Genetics 2023; 224:iyac191. [PMID: 36607068 PMCID: PMC10158836 DOI: 10.1093/genetics/iyac191] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 11/16/2022] [Revised: 11/16/2022] [Accepted: 12/21/2022] [Indexed: 01/07/2023] Open
Abstract
As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.
Affiliation(s)
- Edith D Wong
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Stuart R Miyasato
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Suzi Aleksander
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kalpana Karra
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Robert S Nash
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Marek S Skrzypek
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shuai Weng
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Stacia R Engel
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- J Michael Cherry
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
|
27
|
Lê-Bury P, Druart K, Savin C, Lechat P, Mas Fiol G, Matondo M, Bécavin C, Dussurget O, Pizarro-Cerdá J. Yersiniomics, a Multi-Omics Interactive Database for Yersinia Species. Microbiol Spectr 2023; 11:e0382622. [PMID: 36847572 PMCID: PMC10100798 DOI: 10.1128/spectrum.03826-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/17/2022] [Accepted: 01/26/2023] [Indexed: 03/01/2023] Open
Abstract
The genus Yersinia includes a large variety of nonpathogenic and life-threatening pathogenic bacteria, which cause a broad spectrum of diseases in humans and animals, such as plague, enteritis, Far East scarlet-like fever (FESLF), and enteric redmouth disease. Like most clinically relevant microorganisms, Yersinia spp. are currently subjected to intense multi-omics investigations whose numbers have increased extensively in recent years, generating massive amounts of data useful for diagnostic and therapeutic developments. The lack of a simple and centralized way to exploit these data led us to design Yersiniomics, a web-based platform allowing straightforward analysis of Yersinia omics data. Yersiniomics contains a curated multi-omics database at its core, gathering 200 genomic, 317 transcriptomic, and 62 proteomic data sets for Yersinia species. It integrates genomic, transcriptomic, and proteomic browsers, a genome viewer, and a heatmap viewer to navigate within genomes and experimental conditions. For streamlined access to structural and functional properties, it directly links each gene to GenBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt, InterPro, IntAct, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and each experiment to Gene Expression Omnibus (GEO), the European Nucleotide Archive (ENA), or the Proteomics Identifications Database (PRIDE). Yersiniomics provides a powerful tool for microbiologists to assist with investigations ranging from specific gene studies to systems biology studies. IMPORTANCE The expanding genus Yersinia is composed of multiple nonpathogenic species and a few pathogenic species, including the deadly etiologic agent of plague, Yersinia pestis. In 2 decades, the number of genomic, transcriptomic, and proteomic studies on Yersinia grew massively, delivering a wealth of data. 
We developed Yersiniomics, an interactive web-based platform, to centralize and analyze omics data sets on Yersinia species. The platform allows user-friendly navigation between genomic data, expression data, and experimental conditions. Yersiniomics will be a valuable tool to microbiologists.
Affiliation(s)
- Pierre Lê-Bury
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Karen Druart
  - Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
- Cyril Savin
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
  - Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
- Pierre Lechat
  - Institut Pasteur, Université Paris Cité, ALPS, Bioinformatic Hub, Paris, France
- Guillem Mas Fiol
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Mariette Matondo
  - Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
- Olivier Dussurget
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Javier Pizarro-Cerdá
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
  - Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
|
28
|
Chen H, Cai Y, Ji C, Selvaraj G, Wei D, Wu H. AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network. Brief Bioinform 2023; 24:6918779. [PMID: 36526282 DOI: 10.1093/bib/bbac523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 10/10/2022] [Accepted: 11/02/2022] [Indexed: 12/23/2022] Open
Abstract
Identifying unknown protein functional modules, such as protein complexes and biological pathways, from protein-protein interaction (PPI) networks gives biologists an opportunity to understand cellular function and organization efficiently. Finding complex nonlinear relationships in the underlying functional modules may involve long chains of PPIs and poses great challenges in a PPI network with an unevenly sparse and dense node distribution. To overcome these challenges, we propose AdaPPI, an adaptive graph convolution network for predicting protein functional modules in PPI networks. We first propose an attributed graph node representation algorithm. It effectively integrates protein gene ontology attributes and network topology, and adaptively aggregates low- or high-order graph structural information according to the node distribution by considering graph node smoothness. Based on the obtained node representations, core clique and expansion algorithms are applied to find functional modules in PPI networks. Comprehensive performance evaluations and case studies indicate that the framework significantly outperforms state-of-the-art methods. We also present potential functional modules based on their confidence.
|
29
|
Chen S, Huang C, Wang L, Zhou S. A disease-related essential protein prediction model based on the transfer neural network. Front Genet 2023; 13:1087294. [PMID: 36685976 PMCID: PMC9845409 DOI: 10.3389/fgene.2022.1087294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2023] Open
Abstract
Essential proteins play important roles in the development and survival of organisms, and mutations in them have been shown to drive common internal diseases with high prevalence rates. Because traditional biological experiments are costly, this paper first designs an improved Transfer Neural Network (TNN) to extract raw features from multiple sources of biological information about proteins and then, based on the newly constructed network, proposes a novel computational model called TNNM to infer essential proteins. Unlike a traditional Markov chain, the Transfer Neural Network uses the gradient descent algorithm to obtain the transition probability matrix automatically, which greatly improves the prediction accuracy of TNNM. Moreover, an additional antecedent memory coefficient and a bias term are introduced into the Transfer Neural Network, further enhancing both the robustness and the non-linear expressive ability of TNNM. Finally, to evaluate the identification performance of TNNM, intensive experiments were carried out separately on two well-known public databases; the results show that TNNM achieves better performance than representative state-of-the-art prediction models in terms of both predictive accuracy and the decline rate of accuracy. Therefore, TNNM may play an important role in key protein prediction in the future.
Affiliation(s)
- Sisi Chen
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
- Chiguo Huang (corresponding author)
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Lei Wang (corresponding author)
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Shunxian Zhou (corresponding author)
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
  - College of Information Science and Engineering, Hunan Women’s University, Changsha, Hunan, China
|
30
|
Xue X, Zhang W, Fan A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS One 2023; 18:e0284274. [PMID: 37083829 PMCID: PMC10121005 DOI: 10.1371/journal.pone.0284274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders the prediction accuracy of the current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate the confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). The protein pairs with low similarity values are assumed to be low-confidence links, and the refined PPI networks are constructed by filtering the low-confidence links. Six topology-based centrality methods (the BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements under the original network and refined network. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that the performance of these centrality methods under refined PPI networks is relatively better than that under the original networks. Resnik with a BP annotation term performs best among all five metrics with the three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of the links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice.
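The refinement strategy summarised in this abstract (drop low-similarity links, then rank proteins by a topological centrality) can be sketched in a few lines. This is only an illustration under stated assumptions: the protein names, similarity scores, and the threshold of 0.5 are hypothetical toy values, the scores stand in for any of the GO-based measures named above, and degree centrality (DC) is used as the simplest of the six centralities; it is not the authors' implementation.

```python
from collections import defaultdict


def refine_network(scored_edges, threshold):
    """Keep only the links whose similarity score is at least `threshold`."""
    return [(u, v) for u, v, score in scored_edges if score >= threshold]


def degree_centrality(edges):
    """Degree centrality (DC): number of distinct neighbours of each protein."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {node: len(adj) for node, adj in neighbours.items()}


# Hypothetical toy network: (protein, protein, semantic similarity score).
scored_edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.8),
    ("P2", "P3", 0.1),   # low-confidence link, filtered out below
    ("P3", "P4", 0.7),
    ("P4", "P5", 0.05),  # low-confidence link, filtered out below
]

# The threshold is an assumed value chosen for illustration only.
refined = refine_network(scored_edges, threshold=0.5)

# Rank proteins by degree in the refined network (ties broken by name).
ranking = sorted(degree_centrality(refined).items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [('P1', 2), ('P3', 2), ('P2', 1), ('P4', 1)]
```

Note that a protein whose links are all filtered out (P5 here) disappears from the refined ranking, which is one practical consequence of choosing the threshold.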
Affiliation(s)
- Xiaoli Xue
  - School of Science, East China Jiaotong University, Nanchang, China
- Wei Zhang
  - School of Science, East China Jiaotong University, Nanchang, China
- Anjing Fan
  - School of Computer and Information Engineering, Anyang Normal University, Anyang, China
|
31
|
孙 玉, 刘 嘉, 孙 泽, 韩 建, 于 宁. [A generative adversarial network-based unsupervised domain adaptation method for magnetic resonance image segmentation]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi = Journal of Biomedical Engineering 2022; 39:1181-1188. [PMID: 36575088 PMCID: PMC9927195 DOI: 10.7507/1001-5515.202203009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 10/23/2022] [Indexed: 12/29/2022]
Abstract
Intelligent medical image segmentation methods have been rapidly developed and applied, but domain shift remains a significant challenge: segmentation performance degrades because of distribution differences between the source domain and the target domain. This paper proposes an unsupervised end-to-end domain adaptation method for medical image segmentation based on the generative adversarial network (GAN). A network training and adjustment model was designed, comprising a segmentation network and a discriminant network. In the segmentation network, the residual module was used as the basic module to increase feature reusability and reduce the difficulty of model optimization. The segmentation network further learned cross-domain features at the image feature level with the help of the discriminant network and a combination of segmentation loss and adversarial loss. The discriminant network was a convolutional neural network that used the labels from the source domain to distinguish whether a segmentation result produced by the generator network came from the source domain or the target domain. The whole training process was unsupervised. The proposed method was tested on a public dataset of knee magnetic resonance (MR) images and on a clinical dataset from our cooperative hospital. With our method, the mean Dice similarity coefficient (DSC) of the segmentation results increased by 2.52% and 6.10% compared with the classical feature-level and image-level domain adaptation methods, respectively. The proposed method effectively improves the domain adaptation ability of the segmentation method, significantly improves the segmentation accuracy of the tibia and femur, and better addresses the domain shift problem in MR image segmentation.
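For readers unfamiliar with the evaluation metric, the Dice similarity coefficient of a predicted mask A and a reference mask B is DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that formula on flattened binary masks; the mask values are hypothetical, and returning 1.0 when both masks are empty is an assumed convention rather than anything stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    pred_fg = {i for i, v in enumerate(pred) if v}      # foreground pixels in prediction
    target_fg = {i for i, v in enumerate(target) if v}  # foreground pixels in reference
    total = len(pred_fg) + len(target_fg)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred_fg & target_fg) / total


# Hypothetical 6-pixel masks: 3 foreground pixels each, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
print(round(dice_coefficient(pred, target), 3))  # 0.667
```

A mean DSC, as reported above, would simply average this value over all evaluated images or structures.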
Affiliation(s)
- 玉波 孙
  - College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
  - Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- 嘉男 刘
  - College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
  - Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- 泽文 孙
  - College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- 建达 韩
  - College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
  - Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
  - Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
- 宁波 于
  - College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
  - Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
  - Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
|
32
|
Wang L, Peng J, Kuang L, Tan Y, Chen Z. Identification of Essential Proteins Based on Local Random Walk and Adaptive Multi-View Multi-Label Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3507-3516. [PMID: 34788220 DOI: 10.1109/tcbb.2021.3128638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Accumulating evidence indicates that essential proteins play vital roles in human physiological processes. Although research on the prediction of essential proteins has developed rapidly in recent years, it still faces various limitations, such as unsatisfactory data suitability and low accuracy of predictive results. In this manuscript, a novel method called RWAMVL is proposed to predict essential proteins based on Random Walk and Adaptive Multi-View multi-label Learning. Because inherent noise is ubiquitous in existing datasets of known protein-protein interactions (PPIs), RWAMVL first obtains a variety of features, including biological features of proteins and topological features of PPI networks, by adaptive multi-view multi-label learning. An improved random walk method is then designed to detect essential proteins based on these features. Finally, to verify the predictive performance of RWAMVL, intensive experiments were carried out to compare it with multiple state-of-the-art predictive methods under different expeditionary frameworks. RWAMVL achieved better prediction accuracy than all of the competing methods, which suggests that it may be a useful tool for predicting key proteins in the future.
|
33
|
DhDIT2 Encodes a Debaryomyces hansenii Cytochrome P450 Involved in Benzo(a)pyrene Degradation-A Proposal for Mycoremediation. J Fungi (Basel) 2022; 8:jof8111150. [PMID: 36354917 PMCID: PMC9698926 DOI: 10.3390/jof8111150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Revised: 10/25/2022] [Accepted: 10/26/2022] [Indexed: 11/17/2022] Open
Abstract
Pollutants, such as polycyclic aromatic hydrocarbons (PAHs), e.g., benzo(a)pyrene (BaP), are common components of contaminating mixtures. Such compounds are ubiquitous, extremely toxic, and they pollute soils and aquatic niches. The need for new microorganism-based remediation strategies prompted researchers to identify the most suitable organisms to eliminate pollutants without interfering with the ecosystem. We analyzed the effect caused by BaP on the growth properties of Candida albicans, Debaryomyces hansenii, Rhodotorula mucilaginosa, and Saccharomyces cerevisiae. Their ability to metabolize BaP was also evaluated. The aim was to identify an optimal candidate to be used as the central component of a mycoremediation strategy. The results show that all four yeast species metabolized BaP by more than 70%, whereas their viability was not affected. The best results were observed for D. hansenii. When an incubation was performed in the presence of a cytochrome P450 (CYP) inhibitor, no BaP degradation was observed. Thus, the initial oxidation step is mediated by a CYP enzyme. Additionally, this study identified the D. hansenii DhDIT2 gene as essential to perform the initial degradation of BaP. Hence, we propose that D. hansenii and a S. cerevisiae expressing the DhDIT2 gene are suitable candidates to degrade BaP in contaminated environments.
|
34
|
Sahoo TR, Vipsita S, Patra S. Complex Prediction in Large PPI Networks Using Expansion and Stripe of Core Cliques. Interdiscip Sci 2022:10.1007/s12539-022-00541-z. [PMID: 36306022 DOI: 10.1007/s12539-022-00541-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
The widespread availability and importance of large-scale protein-protein interaction (PPI) data have prompted a flurry of research efforts to understand the organisation of a cell and its functionality by analysing these data at the network level. In the bioinformatics and data mining fields, network clustering has attracted considerable attention as a way to examine the topological and functional aspects of a PPI network. Over the last decade, numerous studies have shown that clustering PPI networks is an excellent method for discovering functional modules, revealing the functions of unknown proteins, and other tasks. This research proposes a graph mining approach that detects protein complexes using dense neighbourhoods (highly connected regions) in an interaction graph. Our technique first finds the size-3 cliques associated with each edge (protein interaction), and these core cliques are then expanded to form high-density subgraphs. Loosely connected proteins are stripped out of these subgraphs to produce potential protein complexes. Finally, redundancy is removed based on the Jaccard coefficient. Computational results on yeast and human protein interaction datasets are presented to highlight the efficiency of the proposed technique. The protein complexes predicted by the proposed approach have a significantly higher similarity score to the gold standards in the CYC-2008 and CORUM benchmark databases than those of other existing approaches.
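Two of the steps named in this abstract, finding the size-3 cliques attached to each edge and removing redundant clusters with the Jaccard coefficient, are simple enough to sketch. The code below is an illustration only: the toy graph, the function names, and the overlap thresholds are hypothetical, the expansion and stripping steps are omitted because the abstract does not specify them, and the greedy largest-first filter is an assumed choice rather than the authors' procedure.

```python
def build_adjacency(edges):
    """Map each protein to the set of its interaction partners."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_cliques(edges):
    """Size-3 cliques: an edge (u, v) plus any common neighbour w of u and v."""
    adj = build_adjacency(edges)
    cliques = set()
    for u, v in edges:
        if u == v:
            continue  # a self-interaction cannot be part of a triangle
        for w in adj[u] & adj[v]:
            cliques.add(frozenset((u, v, w)))
    return cliques


def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def remove_redundant(clusters, max_overlap):
    """Greedily keep clusters whose Jaccard overlap with every kept cluster is below max_overlap."""
    kept = []
    # Visit larger clusters first; ties are broken alphabetically for determinism.
    for cluster in sorted(clusters, key=lambda c: (-len(c), sorted(c))):
        if all(jaccard(cluster, k) < max_overlap for k in kept):
            kept.append(cluster)
    return kept


# Hypothetical toy interaction graph with two triangles sharing the edge B-C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
cliques = core_cliques(edges)  # {A, B, C} and {B, C, D}; their Jaccard overlap is 0.5
print(remove_redundant(cliques, max_overlap=0.5) == [frozenset("ABC")])  # True
```

With a looser threshold such as 0.6, both triangles would be kept, which shows how the Jaccard cut-off controls how much overlap between predicted complexes is tolerated.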
Affiliation(s)
- Swati Vipsita
- CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
- Sabyasachi Patra
- CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
|
35
|
Jagtap S, Pirayre A, Bidard F, Duval L, Malliaros FD. BRANEnet: embedding multilayer networks for omics data integration. BMC Bioinformatics 2022; 23:429. [PMID: 36245002 PMCID: PMC9575224 DOI: 10.1186/s12859-022-04955-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2022] [Accepted: 08/24/2022] [Indexed: 11/10/2022] Open
Abstract
Background:
Gene expression is regulated at different molecular levels, including chromatin accessibility, transcription, RNA maturation, and transport. These regulatory mechanisms have strong connections with cellular metabolism. In order to study the cellular system and its functioning, omics data at each molecular level can be generated and efficiently integrated. Here, we propose BRANEnet, a novel multi-omics integration framework for multilayer heterogeneous networks. BRANEnet is an expressive, scalable, and versatile method to learn node embeddings, leveraging random walk information within a matrix factorization framework. Our goal is to efficiently integrate multi-omics data to study different regulatory aspects of multilayered processes that occur in organisms. We evaluate our framework using multi-omics data of Saccharomyces cerevisiae, a well-studied yeast model organism.
Results:
We test BRANEnet on transcriptomics (RNA-seq) and targeted metabolomics (NMR) data for wild-type yeast strain during a heat-shock time course of 0, 20, and 120 min. Our framework learns features for differentially expressed bio-molecules showing heat stress response. We demonstrate the applicability of the learned features for targeted omics inference tasks: transcription factor (TF)-target prediction, integrated omics network (ION) inference, and module identification. The performance of BRANEnet is compared to existing network integration methods. Our model outperforms baseline methods by achieving high prediction scores for a variety of downstream tasks.
Supplementary Information:
The online version contains supplementary material available at 10.1186/s12859-022-04955-w.
Affiliation(s)
- Surabhi Jagtap
- Université Paris-Saclay, CentraleSupélec, Inria, 3 Rue Joliot Curie, 91190, Gif-Sur-Yvette, France; IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Aurélie Pirayre
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Frédérique Bidard
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Laurent Duval
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Fragkiskos D Malliaros
- Université Paris-Saclay, CentraleSupélec, Inria, 3 Rue Joliot Curie, 91190, Gif-Sur-Yvette, France
|
36
|
Li Q, Chen X, Lin L, Zhang L, Wang L, Bao J, Zhang D. Transcriptomic Dynamics of Active and Inactive States of Rho GTPase MoRho3 in Magnaporthe oryzae. J Fungi (Basel) 2022; 8:jof8101060. [PMID: 36294629 PMCID: PMC9605073 DOI: 10.3390/jof8101060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 11/17/2022] Open
Abstract
The small Rho GTPase acts as a molecular switch in eukaryotic signal transduction, which plays a critical role in polar cell growth and vesicle trafficking. Previous studies demonstrated that constitutively active (CA) mutant strains (MoRho3-CA) were defective in appressorium formation, while dominant-negative (DN) mutant strains (MoRho3-DN) show defects in polar growth. However, the molecular dynamics of MoRho3-mediated regulatory networks in the pathogenesis of Magnaporthe oryzae still need to be uncovered. Here, we perform comparative transcriptomic profiling of MoRho3-CA and MoRho3-DN mutant strains using a high-throughput RNA sequencing approach. We find that genetic manipulation of MoRho3 significantly disrupts the expression of 28 homologs of Saccharomyces cerevisiae Rho3-interacting proteins, including EXO70, BNI1, and BNI2, in the MoRho3-CA and MoRho3-DN mutant strains. Functional enrichment analyses of up-regulated DEGs reveal a significant enrichment of genes associated with ribosome biogenesis in the MoRho3-CA mutant strain. Down-regulated DEGs in the MoRho3-CA mutant strains show significant enrichment in starch/sucrose metabolism and the ABC transporter pathway. Moreover, analyses of down-regulated DEGs in MoRho3-DN reveal an over-representation of genes enriched in metabolic pathways. In addition, we observe a significant suppression in the expression levels of secreted proteins in both MoRho3-CA and MoRho3-DN mutant strains. Together, our results uncover expression dynamics mediated by two states of the small GTPase MoRho3, demonstrating its crucial roles in regulating the expression of ribosome biogenesis and secreted proteins.
Affiliation(s)
- Qian Li
- Meishan Vocational Technical College, Ministerial and Provincial Joint Innovation Centre for Safety Production of Cross-Strait Crops, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Correspondence: (Q.L.); (D.Z.)
- Xi Chen
- State Key Laboratory for Ecological Pest Control of Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Lianyu Lin
- State Key Laboratory for Ecological Pest Control of Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Lianhu Zhang
- State Key Laboratory for Ecological Pest Control of Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- College of Agronomy, Jiangxi Agricultural University, Nanchang 330045, China
- Li Wang
- Meishan Vocational Technical College, Ministerial and Provincial Joint Innovation Centre for Safety Production of Cross-Strait Crops, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jiandong Bao
- State Key Laboratory for Ecological Pest Control of Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Dongmei Zhang
- State Key Laboratory for Ecological Pest Control of Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Correspondence: (Q.L.); (D.Z.)
|
37
|
Dynamical modeling for non-Gaussian data with high-dimensional sparse ordinary differential equations. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
38
|
Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022] Open
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.
|
39
|
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through biological experiments is more laborious and time-consuming than the computational approaches used in recent times. However, practical implementation of conventional methods sometimes becomes challenging because of their poor performance in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. An effective methodology is proposed in this research, capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement is done using protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identification and pruning of non-essential proteins are equally crucial here. In the initial phase, careful assessment is performed by applying node and edge weights to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight for pruning the non-essential proteins. Once the PPIN has been filtered, the second phase starts with two centrality-based approaches: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), which are successively implemented to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance in comparison to the existing state-of-the-art techniques.
|
40
|
Zhang N, Nanshan M, Cao J. A Joint estimation approach to sparse additive ordinary differential equations. STATISTICS AND COMPUTING 2022; 32:69. [PMID: 36033975 PMCID: PMC9395913 DOI: 10.1007/s11222-022-10117-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Ordinary differential equations (ODEs) are widely used to characterize the dynamics of complex systems in real applications. In this article, we propose a novel joint estimation approach for generalized sparse additive ODEs where observations are allowed to be non-Gaussian. The new method is unified with existing collocation methods by considering the likelihood, ODE fidelity and sparse regularization simultaneously. We design a block coordinate descent algorithm for optimizing the non-convex and non-differentiable objective function. The global convergence of the algorithm is established. The simulation study and two applications demonstrate the superior performance of the proposed method in estimation and improved performance of identifying the sparse structure.
Affiliation(s)
- Nan Zhang
- School of Data Science, Fudan University, Shanghai, China
- Muye Nanshan
- School of Data Science, Fudan University, Shanghai, China
- Jiguo Cao
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
|
41
|
Hamed Mozaffari M, Tay LL. Overfitting One-Dimensional convolutional neural networks for Raman spectra identification. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 272:120961. [PMID: 35124481 DOI: 10.1016/j.saa.2022.120961] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2021] [Revised: 01/11/2022] [Accepted: 01/25/2022] [Indexed: 06/14/2023]
Abstract
Dedicated handheld spectrometers have been adopted by first responders and law enforcement agencies for in situ identification of unknown substances. The real-time spectral matching process is a pixel-by-pixel comparison of the unknown spectra with reference data. In fact, the success rate of this process using a miniaturized portable Raman spectrometer relies mainly on the variety of reference data held in memory. This is a hurdle to the miniaturization and affordability of current handheld spectrometers because of their limited memory and computational power. In this study, we aim to mitigate this issue by utilizing the power of one-dimensional Convolutional Neural Networks (1DCNN) trained on millions of Raman spectra augmented from standard available reference databases. Specifically, an intentionally overfitted 1DCNN model can replace the reference database of handheld spectrometers to reduce memory requirements and increase the speed and accuracy of the identification process. Our experimental results revealed that 1DCNN could identify one pure unknown Raman instance from thousands of classes with high accuracy.
Affiliation(s)
- M Hamed Mozaffari
- Metrology Research Centre, National Research Council Canada, Ottawa, ON K1A0R6, Canada
- Li-Lin Tay
- Metrology Research Centre, National Research Council Canada, Ottawa, ON K1A0R6, Canada
|
42
|
Wu C, Feng Z, Zheng J, Zhang H, Cao J, Yan H. Star topology convolution for graph representation learning. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
We present a novel graph convolutional method called star topology convolution (STC). This method makes graph convolution more similar to conventional convolutional neural networks (CNNs) in Euclidean feature spaces. STC learns subgraphs which have a star topology rather than learning a fixed graph like most spectral methods. Due to the properties of a star topology, STC is graph-scale free (without a fixed graph size constraint). It has fewer parameters in its convolutional filter and is inductive, so it is more flexible and can be applied to large and evolving graphs. The convolutional filter is learnable and localized, similar to CNNs in Euclidean feature spaces, and can share weights across graphs. To test the method, STC was compared with state-of-the-art graph convolutional methods in a supervised learning setting on nine node property prediction benchmark datasets: Cora, Citeseer, Pubmed, PPI, Arxiv, MAG, ACM, DBLP, and IMDB. The experimental results showed that STC achieved state-of-the-art performance on all these datasets and maintained good robustness. In an essential protein identification task, STC outperformed the state-of-the-art essential protein identification methods. An application using pretrained STC as the embedding for feature extraction in some downstream classification tasks was introduced. The experimental results showed that STC can share weights across different graphs and be used as the embedding to improve the performance of downstream tasks.
|
43
|
Yang TH, Lin YC, Hsia M, Liao ZY. SSRTool: a web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability. Comput Struct Biotechnol J 2022; 20:2473-2483. [PMID: 35664227 PMCID: PMC9136272 DOI: 10.1016/j.csbj.2022.05.028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2022] [Revised: 05/13/2022] [Accepted: 05/13/2022] [Indexed: 01/02/2023] Open
Abstract
RNA secondary structures can carry out essential cellular functions alone or interact with one another to form the hierarchical tertiary structures. Experimental structure identification approaches can show the in vitro structures of RNA molecules. However, they usually have limits in the resolution and are costly. In silico structure prediction tools are thus primarily relied on for pre-experiment analysis. Various structure prediction models have been developed over the decades. Since these tools are usually used before knowing the actual RNA structures, evaluating and ranking the pile of secondary structure predictions of a given sequence is essential in computational analysis. In this research, we implemented a web service called SSRTool (RNA Secondary Structure prediction Ranking Tool) to assist in the ranking and evaluation of the generated predicted structures of a given sequence. Based on the computed species-specific interpretability significance in four common RNA structure–function aspects, SSRTool provides three functions along with visualization interfaces: (1) Rank user-generated predictions. (2) Provide an automated streamline of structure prediction and ranking for a given sequence. (3) Infer the functional aspects of a given structure. We demonstrated the applicability of SSRTool via real case studies and reported similar trends between computed species-specific rankings and the corresponding prediction F1 values. The SSRTool web service is available online at https://cobisHSS0.im.nuk.edu.tw/SSRTool/, http://cosbi3.ee.ncku.edu.tw/SSRTool/, or the redirecting site https://github.com/cobisLab/SSRTool/.
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Corresponding author.
- Yu-Cian Lin
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Min Hsia
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Zhan-Yi Liao
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
|
44
|
Zhang J, Zhu M, Qian Y. protein2vec: Predicting Protein-Protein Interactions Based on LSTM. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1257-1266. [PMID: 32750870 DOI: 10.1109/tcbb.2020.3003941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The semantic similarity of gene ontology (GO) terms is widely used to predict protein-protein interactions (PPIs). The traditional semantic similarity measures are based mainly on manually crafted features, which may ignore some important hidden information of the gene ontology. Moreover, those methods usually obtain the similarity between proteins from the similarity between GO terms by some simple statistical rules, such as MAX and BMA (best-match average), oversimplifying the possibly complex relationship between the proteins and the GO terms annotated with them. To overcome these two deficiencies, we propose a new method named protein2vec, which characterizes a protein with a vector based on the GO terms annotated to it and combines the information of both the GO and known PPIs. We first apply a network embedding algorithm to the GO network to generate a feature vector for each GO term. Then, a Long Short-Term Memory (LSTM) network encodes the feature vectors of the GO terms annotated with a protein into another vector (called the protein vector). Finally, two protein vectors are forwarded into a feedforward neural network to predict the interaction between the two corresponding proteins. The experimental results show that protein2vec outperforms almost all commonly used traditional semantic similarity methods.
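To make the two baseline rules mentioned in this abstract concrete, here is a minimal sketch of MAX and BMA applied to a matrix of GO term similarities between two proteins. It uses one common formulation of BMA (the sum of row-wise and column-wise best matches divided by the total number of terms); other variants exist, and the abstract does not say which one the authors compared against. The function names and the similarity values are made up for illustration.

```python
def max_similarity(sim):
    """MAX rule: the single highest term-pair similarity.

    sim[i][j] is the similarity between the i-th GO term of protein A
    and the j-th GO term of protein B.
    """
    return max(max(row) for row in sim)


def bma_similarity(sim):
    """Best-match average (one common variant): average over every term of
    either protein of its best match among the other protein's terms."""
    row_best = [max(row) for row in sim]          # best match for each term of A
    col_best = [max(col) for col in zip(*sim)]    # best match for each term of B
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


# Hypothetical term-similarity matrix: protein A has 3 GO terms, protein B has 2.
sim = [
    [0.8, 0.2],
    [0.4, 0.6],
    [0.1, 0.3],
]
max_score = max_similarity(sim)   # 0.8
bma_score = bma_similarity(sim)   # (0.8 + 0.6 + 0.3 + 0.8 + 0.6) / 5 = 0.62
```

Both rules reduce the whole matrix to one number, which is the simplification the abstract argues against.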
|
45
|
Zhang Z, Luo Y, Jiang M, Wu D, Zhang W, Yan W, Zhao B. An efficient strategy for identifying essential proteins based on homology, subcellular location and protein-protein interaction information. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:6331-6343. [PMID: 35603404 DOI: 10.3934/mbe.2022296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
High-throughput biological experiments are expensive and time-consuming. In the past few years, many computational methods based on biological information have been proposed and widely used to understand the biological background. However, the processing of biological information data inevitably produces false positive and false negative data, such as the noise in protein-protein interaction (PPI) networks and the noise generated by the integration of a variety of biological information. Solving these noise problems is key to essential protein prediction. This paper proposes a model for Identifying Essential Proteins based on non-negative Matrix Symmetric tri-Factorization and multiple biological information (IEPMSF). The model uses only the common-neighbor characteristics of proteins in the PPI network to build a weighted network, and applies non-negative matrix symmetric tri-factorization to find more potential interactions between proteins in the network so as to optimize the weighted network. Then, using the subcellular location and lineal homology information, the starting score of proteins is determined, and the random walk with restart algorithm is applied to the optimized network to score and rank each protein. We tested the proposed prediction model against current representative approaches using a public database. Experiments show that the new method identifies essential proteins with high efficiency. The effectiveness of this method shows that it can largely resolve the noise problems that exist in the multi-source biological information itself and those caused by integrating it.
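The final ranking step described above, a random walk with restart over a weighted network, can be sketched generically as follows. This is not the IEPMSF implementation: the function name, the restart probability of 0.3, the handling of isolated nodes, and the toy network are assumptions, and the paper's edge weights and starting scores (from subcellular location and homology) are not reproduced here.

```python
def random_walk_with_restart(weights, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W_norm^T p + r * p0 until the L1 change is below tol.

    weights[i][j] is the non-negative weight of the edge i -> j
    (a symmetric matrix for an undirected network).
    start holds the non-negative initial scores; it is normalised to sum to 1.
    """
    n = len(weights)
    total = sum(start)
    p0 = [s / total for s in start]
    out = [sum(row) for row in weights]   # total outgoing weight of each node
    p = p0[:]
    for _ in range(max_iter):
        nxt = [restart * p0[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                if out[i] == 0:
                    # Assumed policy: a node without edges sends its mass back to p0.
                    nxt[j] += (1 - restart) * p[i] * p0[j]
                else:
                    nxt[j] += (1 - restart) * p[i] * weights[i][j] / out[i]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 3-protein path network 0 - 1 - 2, with all starting score on protein 0.
weights = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = random_walk_with_restart(weights, start=[1, 0, 0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Proteins are then ranked by their converged score; in this toy case the score decays with distance from the protein that holds the starting score.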
Affiliation(s)
- Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022, China
- Yingchun Luo
- Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, Hunan 410008, China
- Meiping Jiang
- Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, Hunan 410008, China
- Dongjie Wu
- Department of Banking and Finance, Monash University, Clayton, Victoria 3168, Australia
- Wang Zhang
- Department of Optoelectronic Engineering, Jinan University, Guangzhou, Guangdong 510632, China
- Wei Yan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022, China
- Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022, China
|
46
|
Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, Weng S, Cherry JM. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 2022; 220:iyab224. [PMID: 34897464 PMCID: PMC9209811 DOI: 10.1093/genetics/iyab224] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.
Affiliation(s)
- Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Suzi Aleksander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Micheal Alexander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Eric Douglass
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Matt Simison
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Marek S Skrzypek
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
|
47
|
Wang J, Zhang Q, Han J, Zhao Y, Zhao C, Yan B, Dai C, Wu L, Wen Y, Zhang Y, Leng D, Wang Z, Yang X, He S, Bo X. Computational methods, databases and tools for synthetic lethality prediction. Brief Bioinform 2022; 23:6555403. [PMID: 35352098 PMCID: PMC9116379 DOI: 10.1093/bib/bbac106] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 12/17/2022] Open
Abstract
Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade as PARP inhibitors achieve great success in the clinic. The key point to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab-based methods have been developed to screen SL pairs, known SL pairs are less than 0.1% of all potential pairs due to large number of human gene combinations. Computational prediction methods complement wet-lab-based methods to effectively reduce the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.
Affiliation(s)
- Jing Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Qinglong Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Junshan Han
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yanpeng Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Caiyun Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Bowei Yan
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chong Dai
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongming Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoxi Yang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
|
48
|
Chen P, Michel AH, Zhang J. Transposon insertional mutagenesis of diverse yeast strains suggests coordinated gene essentiality polymorphisms. Nat Commun 2022; 13:1490. [PMID: 35314699 PMCID: PMC8938418 DOI: 10.1038/s41467-022-29228-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Accepted: 03/01/2022] [Indexed: 12/18/2022] Open
Abstract
Due to epistasis, the same mutation can have drastically different phenotypic consequences in different individuals. This phenomenon is pertinent to precision medicine as well as antimicrobial drug development, but its general characteristics are largely unknown. We approach this question by genome-wide assessment of gene essentiality polymorphism in 16 Saccharomyces cerevisiae strains using transposon insertional mutagenesis. Essentiality polymorphism is observed for 9.8% of genes, most of which have had repeated essentiality switches in evolution. Genes exhibiting essentiality polymorphism lean toward having intermediate numbers of genetic and protein interactions. Gene essentiality changes tend to occur concordantly among components of the same protein complex or metabolic pathway and among a group of over 100 mitochondrial proteins, revealing molecular machines or functional modules as units of gene essentiality variation. Most essential genes tolerate transposon insertions consistently among strains in one or more coding segments, delineating nonessential regions within essential genes.
Affiliation(s)
- Piaopiao Chen
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Agnès H Michel
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
|
49
|
Lu H, Shang C, Zou S, Cheng L, Yang S, Wang L. A Novel Method for Predicting Essential Proteins by Integrating Multidimensional Biological Attribute Information and Topological Properties. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220304201507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Essential proteins are indispensable to the maintenance of life activities and play essential roles in the areas of synthetic biology. Identification of essential proteins by computational methods has become a hot topic in recent years because of its efficiency.
Objective:
Identification of essential proteins is of important significance and practical use in the areas of synthetic biology, drug targets, and human disease genes.
Method:
In this paper, a method called EOP (Edge clustering coefficient-Orthologous-Protein) is proposed to infer potential essential proteins by combining multidimensional biological attribute information of proteins with topological properties of the protein-protein interaction network.
Results:
The simulation results on the yeast protein interaction network show that this method identifies more essential proteins than the other 12 methods (DC, IC, EC, SC, BC, CC, NC, LAC, PEC, CoEWC, POEM, DWE). In particular, compared with DC (degree centrality), the sensitivity (SN) is 9% higher; the recognition rate is 34% higher when 1% of proteins are taken as candidates, and 36%, 22%, 15%, 11%, and 8% higher when 5%, 10%, 15%, 20%, and 25% are taken as candidates, respectively.
Conclusion:
Experimental results show that our method can achieve satisfactory prediction results, which may provide references for future research.
Affiliation(s)
- Hanyu Lu
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Chen Shang
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Sai Zou
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Lihong Cheng
- College of Foreign Languages, Dalian Jiaotong University, China
- Shikong Yang
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, China
|
50
|
Zhu X, Zhu Y, Tan Y, Chen Z, Wang L. An Iterative Method for Predicting Essential Proteins Based on Multifeature Fusion and Linear Neighborhood Similarity. Front Aging Neurosci 2022; 13:799500. [PMID: 35140599 PMCID: PMC8819145 DOI: 10.3389/fnagi.2021.799500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Accepted: 12/02/2021] [Indexed: 11/13/2022] Open
Abstract
Growing evidence has demonstrated that many biological processes are inseparable from the participation of key proteins. In this paper, a novel iterative method called linear neighborhood similarity-based protein multifeatures fusion (LNSPF) is proposed to identify potential key proteins based on multifeature fusion. In LNSPF, an original protein-protein interaction (PPI) network is first constructed from known protein-protein interaction data downloaded from benchmark databases, and topological features are extracted from it. Next, gene expression data of proteins are used to convert the original PPI network into a weighted PPI network based on the linear neighborhood similarity. After that, subcellular localization and homologous information of proteins are integrated to extract functional features for proteins. Then, based on both the functional and topological features obtained above, an iterative method is designed and carried out to predict potential key proteins. Finally, to evaluate the predictive performance of LNSPF, extensive experiments were conducted, and comparisons between LNSPF and 15 state-of-the-art competing methods demonstrate that LNSPF achieves satisfactory recognition accuracy, markedly better than that achieved by each competing method.
Affiliation(s)
- Xianyou Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Yaocan Zhu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
|