1
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
Affiliation(s)
- Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
2
Campos ACS, Araújo TM, Moraes L, Corrêa dos Santos RA, Goldman GH, Riano-Pachon DM, Oliveira JVDC, Squina FM, Castro IDM, Trópia MJM, da Cunha AC, Rosse IC, Brandão RL. Selected cachaça yeast strains share a genomic profile related to traits relevant to industrial fermentation processes. Appl Environ Microbiol 2024; 90:e0175923. [PMID: 38112453 PMCID: PMC10807443 DOI: 10.1128/aem.01759-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 11/01/2023] [Indexed: 12/21/2023] Open
Abstract
The isolation and selection of yeast strains to improve the quality of cachaça, a Brazilian spirit, have been studied in our research group. Our strategy considers Saccharomyces cerevisiae as the predominant species involved in sugarcane juice fermentation and the presence of different stressors (osmolarity, temperature, ethanol content, and competition with other microorganisms). It also considers producing balanced concentrations of volatile compounds (higher alcohols and acetate and/or ethyl esters), flocculation capacity, and ethanol production. Since the genetic bases behind these traits of interest are not fully established, the whole genome sequencing of 11 different Saccharomyces cerevisiae strains isolated and selected from different places was analyzed to identify the presence of a specific genetic variation common to cachaça yeast strains. We have identified 20,128 single-nucleotide variants shared by all genomes. Of these shared variants, 37 were new variants (six of them missense), and 4,451 were identified as missense variants. We performed a detailed functional annotation (using enrichment analysis, protein-protein interaction network analysis, and database and in-depth literature searches) of these new and missense variants. Many genes carrying these variations were involved in the phenotypes of flocculation, tolerance to fermentative stresses, and production of volatile compounds and ethanol. These results demonstrate the existence of a genetic profile shared by the 11 strains under study that could be associated with the applied selective strategy.
Thus, this study points out genes and variants that may be used as molecular markers for selecting strains well suited to the fermentation process, including genetic improvement by genome editing, ultimately producing high-quality beverages and adding value. IMPORTANCE This work demonstrates the existence of new genetic markers related to different phenotypes used to select yeast strains and mutations in genes directly involved in producing flavoring compounds and ethanol, and others related to flocculation and stress resistance.
Affiliation(s)
- Anna Clara Silva Campos
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Thalita Macedo Araújo
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Área de Ciências Biológicas, Instituto Federal de Minas Gerais, Campus Ouro Preto, Ouro Preto, Minas Gerais, Brazil
- Lauro Moraes
- Laboratório Multiusuário de Bioinformática, Núcleo de Pesquisas em Ciências Biológicas, Universidade Federal de Ouro Preto, Ouro Preto, Brazil
- Renato Augusto Corrêa dos Santos
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto (FCFRP), Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
- Laboratório de Biologia Computacional, Evolutiva e de Sistemas, Centro de Energia Nuclear na Agricultura, Universidade de São Paulo, Piracicaba, São Paulo, Brazil
- Gustavo Henrique Goldman
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto (FCFRP), Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
- Diego Maurício Riano-Pachon
- Laboratório de Biologia Computacional, Evolutiva e de Sistemas, Centro de Energia Nuclear na Agricultura, Universidade de São Paulo, Piracicaba, São Paulo, Brazil
- Ieso de Miranda Castro
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Maria José Magalhães Trópia
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Aureliano Claret da Cunha
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Laboratório de Engenharia de Alimentos, Departamento de Alimentos, Escola de Nutrição, Salvador, Brazil
- Izinara C. Rosse
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
- Laboratório Multiusuário de Bioinformática, Núcleo de Pesquisas em Ciências Biológicas, Universidade Federal de Ouro Preto, Ouro Preto, Brazil
- Rogelio Lopes Brandão
- Laboratório de Biologia Celular e Molecular, Departamento de Farmácia, Escola de Farmácia, Ouro Preto, Brazil
3
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION https://github.com/bixuehua/SSLpheno.
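The sample-selection step described in this abstract (cosine similarity between node feature vectors, then a choice of positive and negative pairs) can be illustrated with a minimal sketch. This is not the SSLpheno implementation: the function names, the rank-based cutoffs `n_pos` and `n_neg`, and the toy feature vectors are all assumptions made for illustration, and the Laplacian-based smoothing that precedes this step in the abstract is omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_pairs(features, n_pos, n_neg):
    """Rank all node pairs by cosine similarity.

    Hypothetical rule: the n_pos most similar pairs become positive
    samples and the n_neg least similar pairs become negative samples.
    """
    scored = sorted(
        ((cosine(features[i], features[j]), i, j)
         for i, j in combinations(range(len(features)), 2)),
        reverse=True,
    )
    positives = [(i, j) for _, i, j in scored[:n_pos]]
    negatives = [(i, j) for _, i, j in scored[-n_neg:]] if n_neg else []
    return positives, negatives


# Toy node features: nodes 0 and 1 point one way, nodes 2 and 3 the other.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pos, neg = select_pairs(features, n_pos=2, n_neg=2)
print("positive pairs:", pos)
print("negative pairs:", neg)
```

In this toy input the two within-group pairs come out as positives, and the orthogonal pair (0, 2) is among the negatives. A rank-based cutoff is only one possible rule; a similarity threshold would work equally well for the illustration.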
Affiliation(s)
- Xuehua Bi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
4
Cai P, Liu S, Zhang D, Xing H, Han M, Liu D, Gong L, Hu QN. SynBioTools: a one-stop facility for searching and selecting synthetic biology tools. BMC Bioinformatics 2023; 24:152. [PMID: 37069545 PMCID: PMC10111727 DOI: 10.1186/s12859-023-05281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/11/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND The rapid development of synthetic biology relies heavily on the use of databases and computational tools, which are also developing rapidly. While many tool registries have been created to facilitate tool retrieval, sharing, and reuse, no relatively comprehensive tool registry or catalog addresses all aspects of synthetic biology. RESULTS We constructed SynBioTools, a comprehensive collection of synthetic biology databases, computational tools, and experimental methods, as a one-stop facility for searching and selecting synthetic biology tools. SynBioTools includes databases, computational tools, and methods extracted from reviews via SCIentific Table Extraction, a scientific table-extraction tool that we built. Approximately 57% of the resources that we located and included in SynBioTools are not mentioned in bio.tools, the dominant tool registry. To improve users' understanding of the tools and to enable them to make better choices, the tools are grouped into nine modules (each with subdivisions) based on their potential biosynthetic applications. Detailed comparisons of similar tools in every classification are included. The URLs, descriptions, source references, and the number of citations of the tools are also integrated into the system. CONCLUSIONS SynBioTools is freely available at https://synbiotools.lifesynther.com/. It provides end-users and developers with a useful resource of categorized synthetic biology databases, tools, and methods to facilitate tool retrieval and selection.
Affiliation(s)
- Pengli Cai
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
- Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Huadong Xing
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Linlin Gong
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
5
Li Y, Ma B, Hua K, Gong H, He R, Luo R, Bi D, Zhou R, Langford PR, Jin H. PPNet: Identifying Functional Association Networks by Phylogenetic Profiling of Prokaryotic Genomes. Microbiol Spectr 2023; 11:e0387122. [PMID: 36602356 PMCID: PMC9927313 DOI: 10.1128/spectrum.03871-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 12/01/2022] [Indexed: 01/06/2023] Open
Abstract
Identification of microbial functional association networks allows interpretation of biological phenomena and a greater understanding of the molecular basis of pathogenicity and also underpins the formulation of control measures. Here, we describe PPNet, a tool that uses genome information and analysis of phylogenetic profiles with binary similarity and distance measures to derive large-scale bacterial gene association networks of a single species. As an exemplar, we have derived a functional association network in the pig pathogen Streptococcus suis using 81 binary similarity and dissimilarity measures, which demonstrates excellent performance based on the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and a derived overall scoring method. Selected network associations were validated experimentally by using bacterial two-hybrid experiments. We conclude that PPNet, which is publicly available (https://github.com/liyangjie/PPNet), can be used to construct microbial association networks from easily acquired genome-scale data. IMPORTANCE This study developed PPNet, the first tool that can be used to infer large-scale bacterial functional association networks of a single species. PPNet includes a method for assigning the uniqueness of a bacterial strain using the average nucleotide identity and the average nucleotide coverage. PPNet collected 81 binary similarity and distance measures for phylogenetic profiling and then evaluated and divided them into four groups. PPNet can effectively capture gene networks that are functionally related to phenotype from publicly available prokaryotic genomes, as well as provide valuable results for downstream analysis and experimental testing.
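The idea of phylogenetic profiling with a binary similarity measure can be sketched in a few lines. The example below is not taken from PPNet: it uses the Jaccard index as one familiar binary similarity measure, and the gene names, presence/absence profiles, function names, and the 0.6 threshold are all hypothetical values chosen for illustration.

```python
def jaccard(profile_a, profile_b):
    """Jaccard index of two presence/absence (0/1) profiles of equal length."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def association_edges(profiles, threshold):
    """Return (gene, gene, score) for every gene pair scoring at or above threshold."""
    genes = sorted(profiles)
    edges = []
    for idx, gene in enumerate(genes):
        for other in genes[idx + 1:]:
            score = jaccard(profiles[gene], profiles[other])
            if score >= threshold:
                edges.append((gene, other, score))
    return edges


# Hypothetical profiles: one column per genome, 1 = gene present, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
    "geneD": [1, 0, 0, 1, 0],
}

edges = association_edges(profiles, threshold=0.6)
for gene, other, score in edges:
    print(f"{gene} - {other}: {score:.2f}")
```

Genes that tend to be present and absent in the same genomes score high and are linked; geneC, whose profile is complementary to the others, stays unconnected. Swapping `jaccard` for another binary measure changes only the scoring function, which is the kind of comparison the abstract describes across its 81 measures.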
Affiliation(s)
- Yangjie Li
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Bin Ma
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Kexin Hua
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Huimin Gong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rongrong He
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rui Luo
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Dingren Bi
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rui Zhou
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Paul R. Langford
- Section of Paediatric Infectious Disease, Imperial College London, St Mary’s Campus, London, United Kingdom
- Hui Jin
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
6
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 1530] [Impact Index Per Article: 1530.0] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022] Open
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Farrokh Mehryary
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
- Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Sampo Pyysalo
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, Seoul 03722, South Korea
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
- Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
- Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
7
Thakur P, Alaba MO, Rauniyar S, Singh RN, Saxena P, Bomgni A, Gnimpieba EZ, Lushbough C, Goh KM, Sani RK. Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow. Microorganisms 2023; 11:119. [PMID: 36677411 PMCID: PMC9867429 DOI: 10.3390/microorganisms11010119] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/19/2022] [Revised: 12/12/2022] [Accepted: 12/14/2022] [Indexed: 01/05/2023] Open
Abstract
A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology-related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis were performed using UniprotKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a similar set of genes as our workflow; among these, hysB and hydA, and sat and dsrB were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped to the pangenome of 63 SRB genomes, which yielded the distribution of these genes across the 63 SRB based on amino acid sequence similarity; the genes were further categorized as core and accessory gene families.
SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might be participating in removing hydrogen from the metals through electron transfer. Moreover, the production of corrosive sulfide from the sulfur metabolism indirectly contributes to the localized pitting of the metals. After the corroboration of text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for gene/protein extraction and could significantly reduce the manual curation time.
Affiliation(s)
- Payal Thakur
- Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Mathew O. Alaba
- Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD 57069, USA
- Shailabh Rauniyar
- Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Ram Nageena Singh
- Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Priya Saxena
- Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Alain Bomgni
- Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD 57069, USA
- Etienne Z. Gnimpieba
- Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD 57069, USA
- 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Carol Lushbough
- Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD 57069, USA
- Kian Mau Goh
- Faculty of Science, Universiti Teknologi Malaysia, Skudai 81310, Johor, Malaysia
- Rajesh Kumar Sani
- Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- BuG ReMeDEE Consortium, South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
- Composite and Nanocomposite Advanced Manufacturing Centre—Biomaterials, Rapid City, SD 57701, USA
8
Santorelli L, Caterino M, Costanzo M. Dynamic Interactomics by Cross-Linking Mass Spectrometry: Mapping the Daily Cell Life in Postgenomic Era. OMICS: A Journal of Integrative Biology 2022; 26:633-649. [PMID: 36445175 DOI: 10.1089/omi.2022.0137] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/03/2022]
Abstract
The majority of processes that occur in daily cell life are modulated by hundreds to thousands of dynamic protein-protein interactions (PPI). The resulting protein complexes constitute a tangled network that, with its continuous remodeling, builds up highly organized functional units. Thus, defining the dynamic interactome of one or more proteins allows determining the full range of biological activities these proteins are capable of. This conceptual approach is poised to gain further traction and significance in the current postgenomic era wherein the treatment of severe diseases needs to be tackled at both genomic and PPI levels. This also holds true for COVID-19, a multisystemic disease affecting biological networks across the biological hierarchy from genome to proteome to metabolome. In this overarching context and the current historical moment of the COVID-19 pandemic where systems biology increasingly comes to the fore, cross-linking mass spectrometry (XL-MS) has become highly relevant, emerging as a powerful tool for PPI discovery and characterization. This expert review highlights the advanced XL-MS approaches that provide in vivo insights into the three-dimensional protein complexes, overcoming the static nature of common interactomics data and embracing the dynamics of the cell proteome landscape. Many XL-MS applications based on the use of diverse cross-linkers, MS detection methods, and predictive bioinformatic tools for single proteins or proteome-wide interactions were shown. We conclude with a future outlook on XL-MS applications in the field of structural proteomics and ways to sustain the remarkable flexibility of XL-MS for dynamic interactomics and structural studies in systems biology and planetary health.
Affiliation(s)
- Lucia Santorelli
- Department of Oncology and Hematology-Oncology, University of Milano, Milan, Italy; IFOM ETS, The AIRC Institute of Molecular Oncology, Milan, Italy
- Marianna Caterino
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy; CEINGE-Biotecnologie Avanzate s.c.ar.l., Naples, Italy
- Michele Costanzo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy; CEINGE-Biotecnologie Avanzate s.c.ar.l., Naples, Italy
9
|
El Kadiri Y, Ratbi I, Sefiani A, Lyahyai J. Novel copy number variation of COLQ gene in a Moroccan patient with congenital myasthenic syndrome: a case report and review of the literature. BMC Neurol 2022; 22:292. [PMID: 35932018 PMCID: PMC9354381 DOI: 10.1186/s12883-022-02822-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 07/29/2022] [Indexed: 11/11/2022] Open
Abstract
Background: Congenital myasthenic syndromes (CMSs) are rare genetic diseases due to abnormalities of the neuromuscular junction leading to permanent or transient muscle fatigability and weakness. To date, 32 genes have been found to be involved in CMSs with autosomal dominant and/or recessive inheritance patterns. CMS with acetylcholinesterase deficiency, in particular, was determined to be due to biallelic mutations of the COLQ gene with early-onset clinical signs. Here, we report clinical features and novel molecular findings of COLQ-related CMS in a Moroccan patient with a review of the literature for this rare form. Case presentation: In this study, we report the case of a 28-month-old Moroccan female patient with hypotonia associated with axial muscle weakness, global motor delay, bilateral ptosis, unilateral partial visual field deficiency with normal ocular motility, and fatigable muscle weakness. Clinical exome sequencing revealed a novel homozygous deletion of exon 13 in the COLQ gene, NM_005677.4(COLQ):c.(814+1_815-1)_(954+1_955-1)del p.(Gly272Aspfs*11). This finding was subsequently confirmed by quantitative real-time PCR (qPCR) in the proband and her parents. In silico analysis of the protein-protein interaction network with the STRING tool revealed that 12 proteins are highly associated with COLQ with an elevated confidence score. Treatment with Salbutamol resulted in clear benefits and recovery. Conclusions: This clinical observation illustrates the important place of next-generation sequencing in the precise molecular diagnosis of heterogeneous forms of CMS, the appropriate management and targeted treatment, and genetic counseling of families, with a better characterization of the mutational profile of this rare disease in the Moroccan population.
Affiliation(s)
- Youssef El Kadiri
- Research Team in Genomics and Molecular Epidemiology of Genetic Diseases, Genomics Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, 10 100 Rabat, Morocco; Department of Medical Genetics, National Institute of Health, BP 769-Agdal, 10 090, Rabat, Morocco
- Ilham Ratbi
- Research Team in Genomics and Molecular Epidemiology of Genetic Diseases, Genomics Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, 10 100 Rabat, Morocco
- Abdelaziz Sefiani
- Research Team in Genomics and Molecular Epidemiology of Genetic Diseases, Genomics Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, 10 100 Rabat, Morocco; Department of Medical Genetics, National Institute of Health, BP 769-Agdal, 10 090, Rabat, Morocco
- Jaber Lyahyai
- Research Team in Genomics and Molecular Epidemiology of Genetic Diseases, Genomics Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, 10 100 Rabat, Morocco
10
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, the mining and effective utilization of the module structure is still challenging in problems such as disease-gene prediction. RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. By a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods and its further performance improvement derived from the parameter estimation. CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in problems such as disease-gene prediction.
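The idea of scoring genes by the abundance of known disease genes in their module, at several scales, can be illustrated with a small sketch. This is not the HyMM implementation: the function names, the toy partitions, and the plain averaging across scales are assumptions made for illustration (the abstract describes a probabilistic rank-integration model instead).

```python
def module_scores(partition, disease_genes):
    """Score each gene by the fraction of known disease genes in its module."""
    scores = {}
    for module in partition:
        fraction = len(module & disease_genes) / len(module)
        for gene in module:
            scores[gene] = fraction
    return scores


def multiscale_scores(partitions, disease_genes):
    """Average the per-module scores over partitions at different scales.

    Plain averaging is a simplifying assumption, not the integration
    model described in the abstract.
    """
    totals = {}
    for partition in partitions:
        for gene, score in module_scores(partition, disease_genes).items():
            totals[gene] = totals.get(gene, 0.0) + score
    return {gene: total / len(partitions) for gene, total in totals.items()}


# Hypothetical toy data: one fine-grained and one coarse partition.
fine = [{"g1", "g2"}, {"g3", "g4"}]
coarse = [{"g1", "g2", "g3", "g4"}]
known_disease_genes = {"g1"}

scores = multiscale_scores([fine, coarse], known_disease_genes)
```

In this toy case, `g2` shares a fine-scale module with the known disease gene and therefore scores higher than `g3` or `g4`, which are linked to it only at the coarse scale.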
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
11
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
12
|
Huang XT, Jia S, Gao L, Wu J. Reconstruction of human protein-coding gene functional association network based on machine learning. Brief Bioinform 2022; 23:6502555. [PMID: 35021191 DOI: 10.1093/bib/bbab552] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2021] [Revised: 11/13/2021] [Accepted: 12/02/2021] [Indexed: 01/02/2023] Open
Abstract
Networks consisting of molecular interactions are intrinsically dynamical systems of an organism. These interactions curated in molecular interaction databases are still not complete and contain false positives introduced by high-throughput screening experiments. In this study, we propose a framework to integrate interactions of functionally associated protein-coding genes from 31 data sources to reconstruct a network with high coverage and quality. For each interaction, 369 features were constructed including properties of both the interaction and the involved genes. The training and validation sets were built on the pathway interactions as positives and the potential negative instances resulting from our proposed semi-supervised strategy. A random forest classification method was then applied to train and predict multiple times to give a score for each interaction. After setting a threshold estimated by a Binomial distribution, a Human protein-coding Gene Functional Association Network (HuGFAN) was reconstructed with 20 383 genes and 1 185 429 high-confidence interactions. Then, HuGFAN was compared with other networks from data sources with respect to network properties, suggesting that HuGFAN is more function and pathway related. Finally, HuGFAN was applied to identify cancer drivers through two well-known network-based methods (DriverNet and HotNet2) to show its outstanding performance compared with other networks. HuGFAN and other supplementary files are freely available at https://github.com/xthuang226/HuGFAN.
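One way to read "a threshold estimated by a Binomial distribution" is as a cutoff on how often an interaction is predicted positive across repeated classifier runs. The sketch below follows that reading only. The abstract does not give the actual procedure, so the null probability `p_null`, the significance level `alpha`, and both function names are assumptions.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def vote_threshold(n_runs, p_null=0.5, alpha=0.05):
    """Smallest number of positive votes that is unlikely under the null.

    Assumes each run votes 'positive' with probability p_null when the
    interaction is not real (a hypothetical null model).
    """
    for k in range(n_runs + 1):
        if binom_tail(k, n_runs, p_null) < alpha:
            return k
    return n_runs + 1  # no vote count reaches significance


# With 10 repeated runs, how many positive votes are needed?
threshold = vote_threshold(10)
```

Under these assumptions, an interaction would be kept only if at least `threshold` of the 10 runs predicted it as positive.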
Affiliation(s)
- Xiao-Tai Huang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Jing Wu
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan, 523808, Guangdong, China
13
|
Badkas A, De Landtsheer S, Sauter T. Construction and contextualization approaches for protein-protein interaction networks. Comput Struct Biotechnol J 2022; 20:3280-3290. [PMID: 35832626 PMCID: PMC9251778 DOI: 10.1016/j.csbj.2022.06.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 06/15/2022] [Accepted: 06/15/2022] [Indexed: 11/17/2022] Open
Abstract
Protein-protein interaction network (PPIN) analysis is a widely used method to study the contextual role of proteins of interest, to predict novel disease genes, disease or functional modules, and to identify novel drug targets. PPIN-based analysis uses both generic and context-specific networks. Multiple contextualization methodologies have been described, such as shortest-path algorithms, neighborhood-based methods, and diffusion/propagation algorithms. This review discusses these methods, provides intuitive representations of PPIN contextualization, and also examines how the quality of such context-specific networks could be improved by considering additional sources of evidence. As a heuristic, we observe that tasks such as identifying disease genes, drug targets, and protein complexes should consider local neighborhoods, while uncovering disease mechanisms and discovering disease-pathways would gain from diffusion-based construction.
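The contrast between neighborhood-based and diffusion-based contextualization can be made concrete on a toy network. The following is a minimal sketch, not code from the review: the adjacency dictionary, the restart probability, and the function names are illustrative assumptions, and random walk with restart stands in for the broader family of diffusion/propagation algorithms.

```python
def neighborhood(adj, seeds):
    """Local contextualization: the seeds plus their direct neighbors."""
    selected = set(seeds)
    for seed in seeds:
        selected.update(adj[seed])
    return selected


def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Diffusion-based contextualization on an undirected network.

    adj maps every node to its neighbors; the toy graph used here has
    no isolated nodes, so every node has at least one neighbor.
    """
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in adj}
        for u, nbrs in adj.items():
            share = (1 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical path-shaped network A - B - C - D - E with seed protein A.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
local = neighborhood(adj, {"A"})
scores = random_walk_with_restart(adj, {"A"})
```

The neighborhood method returns only `A` and `B`, whereas the diffusion scores assign every node a nonzero weight that decays with distance from the seed. This matches the heuristic in the abstract that local methods suit tasks centered on a protein's immediate context, while diffusion reaches further along pathways.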
14
|
Morgan S, Malatras A, Duguez S, Duddy W. Optimized Molecular Interaction Networks for the Study of Skeletal Muscle. J Neuromuscul Dis 2021; 8:S223-S239. [PMID: 34308911 DOI: 10.3233/jnd-210680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Molecular interaction networks (MINs) aim to capture the complex relationships between interacting molecules within a biological system. MINs can be constructed from existing knowledge of molecular functional associations, such as protein-protein binding interactions (PPI) or gene co-expression, and these different sources may be combined into a single MIN. A given MIN may be more or less optimal in its representation of the important functional relationships of molecules in a tissue. OBJECTIVE: The aim of this study was to establish whether a combined MIN derived from different types of functional association could better capture muscle-relevant biology compared to its constituent single-source MINs. METHODS: MINs were constructed from functional association databases for both protein-binding and gene co-expression. The networks were then compared based on the capture of muscle-relevant genes and gene ontology (GO) terms, tested in two different ways using established biological network clustering algorithms. The top performing MINs were combined to test whether an optimal MIN for skeletal muscle could be constructed. RESULTS: The STRING PPI network was the best performing single-source MIN among those tested. Combining STRING with interactions from either the MyoMiner or CoXPRESSdb gene co-expression sources resulted in a combined network with improved performance relative to its constituent networks. CONCLUSION: MINs constructed from multiple types of functional association can better represent the functional relationships of molecules in a given tissue. Such networks may be used to improve the analysis and interpretation of functional genomics data in the study of skeletal muscle and neuromuscular diseases. Networks and clusters described by this study, including the combinations of STRING with MyoMiner or with CoXPRESSdb, are available for download from https://www.sys-myo.com/myominer/download.php.
Affiliation(s)
- Stephen Morgan
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry, Northern Ireland, UK
- Apostolos Malatras
- Department of Biological Sciences, Molecular Medicine Research Center, University of Cyprus, University Avenue, Nicosia, Cyprus
- Stephanie Duguez
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry, Northern Ireland, UK
- William Duddy
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry, Northern Ireland, UK
15
|
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022] Open
Abstract
The prediction of genes related to diseases is important to the study of the diseases due to the high cost and time consumption of biological experiments. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. By a series of experimental evaluations in various types of biological networks, we confirmed the advantage of the multiplex network and the important roles of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms and then gave effective recommendations of NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, China
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
16
|
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 2021; 49:D605-D612. [PMID: 33237311 PMCID: PMC7779004 DOI: 10.1093/nar/gkaa1074] [Citation(s) in RCA: 3974] [Impact Index Per Article: 1324.7] [Received: 09/14/2020] [Revised: 10/20/2020] [Accepted: 11/23/2020] [Indexed: 12/19/2022] Open
Abstract
Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.
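STRING can also be queried programmatically. The sketch below only builds a request URL for the network endpoint of STRING's public REST API and does not contact the server. The endpoint layout and the parameter names (`identifiers`, `species`, `required_score`) reflect that API as commonly documented and should be checked against the current documentation; the helper name, the default score cutoff, and the example proteins are assumptions.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def network_url(identifiers, species, required_score=400, fmt="tsv"):
    """Build a STRING network query URL (hypothetical helper, no request sent).

    identifiers are joined with carriage returns, which urlencode
    percent-encodes as %0D.
    """
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy ID, e.g. 9606 for human
        "required_score": required_score,  # combined score on a 0-1000 scale
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


url = network_url(["TP53", "EGFR"], species=9606)
```

Fetching `url` with any HTTP client would then return the interactions among the listed proteins in tab-separated form, provided the API behaves as assumed here.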
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Annika L Gable
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Katerina C Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- David Lyon
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Sampo Pyysalo
- TurkuNLP Group, Department of Future Technologies, University of Turku, 20014 Turun Yliopisto, Finland
- Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Marc Legeay
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Tao Fang
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
- Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland