1
Jacinto-Méndez D, Granados-Ramírez CG, Carbajal-Tinoco MD. KCD: A prediction web server of knowledge-based circular dichroism. Protein Sci 2024; 33:e4967. [PMID: 38532692 DOI: 10.1002/pro.4967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 03/03/2024] [Accepted: 03/04/2024] [Indexed: 03/28/2024]
Abstract
We present a web server that predicts the far-UV circular dichroism (CD) spectra of proteins by utilizing their three-dimensional (3D) structures from the Protein Data Bank (PDB). The main algorithm is based on the classical theory of optical activity together with a set of atomic complex polarizabilities, which are obtained from the analysis of a series of synchrotron radiation CD spectra and their related 3D structures from the PDB. The results of our knowledge-based CD method (KCD) are in good agreement with measured spectra that could include the effect of D-amino acids. Our method also delivers some of the most accurate predictions, in comparison with the calculated spectra from well-established models. Specifically, using a metric of closeness based on normalized absolute deviations between experimental and calculated spectra, the mean values for a series of 57 test proteins give the following figures for such models: 0.26 KCD, 0.27 PDBMD2CD, 0.30 SESCA, and 0.47 DichroCalc. From another point of view, it is worth mentioning the remarkable capabilities of the recent approaches based on artificial intelligence, which can precisely predict the native structure of proteins. The structure of proteins, however, is flexible and can be modified by a diversity of environmental factors such as interactions with other molecules, mechanical stresses, variations of temperature, pH, or ionic strength. Experimental CD spectra together with reliable predictions can be utilized to assess eventual secondary structural changes. A similar kind of evaluation can be done for the case of an incomplete protein structure that has been reconstructed by using different approaches. The KCD method can be freely accessed from: https://kcd.cinvestav.mx/.
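The abstract does not spell out its closeness metric, so the following is only a minimal sketch of one plausible reading: the summed absolute deviation between experimental and calculated spectra, divided by the summed absolute experimental signal. The function names and the choice of normalization are assumptions, not the published KCD definition.

```python
def normalized_absolute_deviation(experimental, calculated):
    """Sum of |exp - calc| divided by the sum of |exp| (assumed normalization)."""
    if len(experimental) != len(calculated):
        raise ValueError("spectra must be sampled at the same wavelengths")
    scale = sum(abs(e) for e in experimental)
    if scale == 0:
        raise ValueError("experimental spectrum is identically zero")
    deviation = sum(abs(e - c) for e, c in zip(experimental, calculated))
    return deviation / scale


def mean_deviation(spectrum_pairs):
    """Average the per-protein score over (experimental, calculated) pairs."""
    scores = [normalized_absolute_deviation(e, c) for e, c in spectrum_pairs]
    return sum(scores) / len(scores)


# Hypothetical three-point spectra, for illustration only.
measured = [1.0, -2.0, 1.0]
predicted = [1.0, -1.0, 1.0]
print(normalized_absolute_deviation(measured, predicted))
```

Under this reading, a lower mean value over a test set indicates closer agreement, which is consistent with how the abstract ranks the four models.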
Affiliation(s)
- Damián Jacinto-Méndez
- Departamento de Física, Centro de Investigación y de Estudios Avanzados del IPN, Mexico City, Mexico
2
Benova L, Hudec L. Comprehensive Analysis and Evaluation of Anomalous User Activity in Web Server Logs. Sensors (Basel) 2024; 24:746. [PMID: 38339461 PMCID: PMC10856912 DOI: 10.3390/s24030746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
In this study, we present a novel machine learning framework for web server anomaly detection that uniquely combines the Isolation Forest algorithm with expert evaluation, focusing on individual user activities within NGINX server logs. Our approach addresses the limitations of traditional methods by effectively isolating and analyzing subtle anomalies in vast datasets. Initially, the Isolation Forest algorithm was applied to extensive NGINX server logs, successfully identifying outlier user behaviors that conventional methods often overlook. We then employed DBSCAN for detailed clustering of these anomalies, categorizing them based on user request times and types. A key innovation of our methodology is the incorporation of post-clustering expert analysis. Cybersecurity professionals evaluated the identified clusters, adding a crucial layer of qualitative assessment. This enabled the accurate distinction between benign and potentially harmful activities, leading to targeted responses such as access restrictions or web server configuration adjustments. Our approach demonstrates a significant advancement in network security, offering a more refined understanding of user behavior. By integrating algorithmic precision with expert insights, we provide a comprehensive and nuanced strategy for enhancing cybersecurity measures. This study not only advances anomaly detection techniques but also emphasizes the critical need for a multifaceted approach in protecting web server infrastructures.
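Before any outlier detection can run, per-user activity has to be distilled from raw log lines. The sketch below shows one way that step might look, assuming the default NGINX "combined" log format and keying users by client IP. The feature set (request count, distinct paths, error ratio, mean gap between requests) is a hypothetical choice, not the one used in the study. The resulting rows could then be passed to an Isolation Forest and a DBSCAN implementation such as those in scikit-learn.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the start of a line in the NGINX "combined" format (assumed here).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, timestamp, method, path, status), or None if unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return (match.group("ip"), stamp, match.group("method"),
            match.group("path"), int(match.group("status")))


def user_features(lines):
    """Aggregate simple per-IP behaviour features from log lines."""
    per_ip = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            per_ip[record[0]].append(record[1:])
    features = {}
    for ip, records in per_ip.items():
        records.sort(key=lambda r: r[0])  # order requests by timestamp
        times = [r[0] for r in records]
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[ip] = {
            "requests": len(records),
            "distinct_paths": len({r[2] for r in records}),
            "error_ratio": sum(r[3] >= 400 for r in records) / len(records),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return features


# Hypothetical log lines using documentation-reserved IP addresses.
SAMPLE = [
    '203.0.113.5 - - [12/Feb/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.5 - - [12/Feb/2024:10:00:02 +0000] "GET /admin HTTP/1.1" 403 128',
    '198.51.100.7 - - [12/Feb/2024:10:00:05 +0000] "POST /login HTTP/1.1" 200 64',
]
print(user_features(SAMPLE))
```

Keying on IP address is a simplification; behind proxies or NAT, a session cookie or authenticated user ID would identify users more reliably.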
Affiliation(s)
- Lenka Benova
- Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, 842 16 Bratislava, Slovakia;
3
Krishnan SR, Roy A, Gromiha MM. Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning. Brief Bioinform 2024; 25:bbae002. [PMID: 38261341 PMCID: PMC10805179 DOI: 10.1093/bib/bbae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 12/21/2023] [Accepted: 12/24/2023] [Indexed: 01/24/2024] Open
Abstract
Ribonucleic acids (RNAs) play important roles in cellular regulation. Consequently, dysregulation of both coding and non-coding RNAs has been implicated in several disease conditions in the human body. In this regard, there is growing interest in probing the potential of RNAs to act as drug targets in disease conditions. To accelerate this search for disease-associated novel RNA targets and their small molecular inhibitors, machine learning models for binding affinity prediction were developed specific to six RNA subtypes, namely aptamers, miRNAs, repeats, ribosomal RNAs, riboswitches and viral RNAs. We found that differences in RNA sequence composition, flexibility and polar nature of RNA-binding ligands are important for predicting the binding affinity. Our method showed an average Pearson correlation (r) of 0.83 and a mean absolute error of 0.66 upon evaluation using the jack-knife test, indicating the models' reliability despite the low amount of data available for several RNA subtypes. Further, the models were validated with external blind test datasets, on which they outperform other existing quantitative structure-activity relationship (QSAR) models. We have developed a web server to host the models, RNA-Small molecule binding Affinity Predictor, which is freely available at: https://web.iitm.ac.in/bioinfo2/RSAPred/.
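The jack-knife (leave-one-out) evaluation mentioned in the abstract can be illustrated with a deliberately small stand-in model. The sketch below holds out each sample in turn, fits a one-feature least-squares line on the rest, and then reports Pearson r and mean absolute error on the held-out predictions. The linear model and the toy data are assumptions for illustration; the published models use richer features and learners.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mean_absolute_error(a, b):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical descriptor values and binding affinities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
affinity = [2.1, 3.9, 6.2, 7.8, 10.1]
held_out = jackknife_predictions(descriptor, affinity)
print(pearson_r(affinity, held_out), mean_absolute_error(affinity, held_out))
```

Because every prediction comes from a model that never saw the corresponding sample, this scheme makes full use of small datasets, which fits the abstract's remark about limited data for several RNA subtypes.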
Affiliation(s)
- Sowmya R Krishnan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
- Arijit Roy
- TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
- Department of Computer Science, National University of Singapore, Singapore 117543
4
Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024; 25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024] Open
Abstract
The screening of enzymes for catalyzing specific substrate-product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we have developed the Substrate-product Pair-based Enzyme Promiscuity Prediction (SPEPP) model. This innovative approach utilizes transfer learning and transformer architecture to predict enzyme promiscuity, thereby elucidating the intricate interplay between enzymes and substrate-product pairs. SPEPP exhibited robust predictive ability, eliminating the need for prior knowledge of reactions and allowing users to define their own candidate-enzyme libraries. It can be seamlessly integrated into various applications, including metabolic engineering, de novo pathway design, and hazardous material degradation. To better assist metabolic engineers in designing and refining biochemical pathways, particularly those without programming skills, we also designed EnzyPick, an easy-to-use web server for enzyme screening based on SPEPP. EnzyPick is accessible at http://www.biosynther.com/enzypick/.
Affiliation(s)
- Huadong Xing
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
- Yingying Le
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dachuan Zhang
- Institute of Environmental Engineering, ETH Zurich, Laura-Hezner-Weg 7, 8093 Zurich, Switzerland
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
5
Zou J, Qin Z, Li R, Yan X, Huang H, Yang B, Zhou F, Zhang L. iProPhos: A Web-Based Interactive Platform for Integrated Proteome and Phosphoproteome Analysis. Mol Cell Proteomics 2024; 23:100693. [PMID: 38097182 PMCID: PMC10828474 DOI: 10.1016/j.mcpro.2023.100693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 11/06/2023] [Accepted: 12/11/2023] [Indexed: 01/29/2024] Open
Abstract
Large-scale omics studies have generated a wealth of mass spectrometry-based proteomics data, which provide additional insights into disease biology spanning genomic boundaries. However, there is a notable lack of web-based analysis and visualization tools that facilitate the reutilization of these data. Given this challenge, we present iProPhos, a user-friendly web server to deliver interactive and customizable functionalities. iProPhos incorporates a large number of samples, including 1444 tumor samples and 746 normal samples across 12 cancer types, sourced from the Clinical Proteomic Tumor Analysis Consortium. Additionally, users can also upload their own proteomics/phosphoproteomics data for analysis and visualization. In iProPhos, users can perform profiling plotting and differential expression, patient survival, clinical feature-related, and correlation analyses, including protein-protein, mRNA-protein, and kinase-substrate correlations. Furthermore, functional enrichment, protein-protein interaction network, and kinase-substrate enrichment analyses are accessible. iProPhos displays the analytical results in interactive figures and tables with various selectable parameters. It is freely accessible at http://longlab-zju.cn/iProPhos without login requirement. We present two case studies to demonstrate that iProPhos can identify potential drug targets and upstream kinases contributing to site-specific phosphorylation. Ultimately, iProPhos allows end-users to leverage the value of big data in cancer proteomics more effectively and accelerates the discovery of novel therapeutic targets.
Affiliation(s)
- Jing Zou
- The Second Affiliated Hospital and Life Sciences Institute and School of Medicine, The MOE Key Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Zhejiang University, Hangzhou, China
- Ziran Qin
- The Second Affiliated Hospital and Life Sciences Institute and School of Medicine, The MOE Key Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Zhejiang University, Hangzhou, China
- Ran Li
- School of Medicine, Hangzhou City University, Hangzhou, China.
- Xiaohua Yan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Nanchang University, Nanchang, China
- Huizhe Huang
- The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Bing Yang
- The Second Affiliated Hospital and Life Sciences Institute and School of Medicine, The MOE Key Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Zhejiang University, Hangzhou, China; Department of Pharmaceutical Chemistry and the Cardiovascular Research Institute, University of California, San Francisco, California, USA
- Fangfang Zhou
- Institutes of Biology and Medical Sciences, Soochow University, Suzhou, China.
- Long Zhang
- The Second Affiliated Hospital and Life Sciences Institute and School of Medicine, The MOE Key Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Zhejiang University, Hangzhou, China; Cancer Center, Zhejiang University, Hangzhou, China.
6
Nishimura Y, Yamada K, Okazaki Y, Ogata H. DiGAlign: Versatile and Interactive Visualization of Sequence Alignment for Comparative Genomics. Microbes Environ 2024; 39:ME23061. [PMID: 38508742 PMCID: PMC10982109 DOI: 10.1264/jsme2.me23061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 01/08/2024] [Indexed: 03/22/2024] Open
Abstract
With the explosion of available genomic information, comparative genomics has become a central approach to understanding microbial ecology and evolution. We developed DiGAlign (https://www.genome.jp/digalign/), a web server that provides versatile functionality for comparative genomics with an intuitive interface. It allows the user to perform the highly customizable visualization of a synteny map by simply uploading nucleotide sequences of interest, ranging from a specific region to the whole genome landscape of microorganisms and viruses. DiGAlign will serve a wide range of biological researchers, particularly experimental biologists, with multifaceted features that allow the rapid characterization of genomic sequences of interest and the generation of a publication-ready figure.
Affiliation(s)
- Yosuke Nishimura
- Research Center for Bioscience and Nanoscience (CeBN), Research Institute for Marine Resources Utilization (MRU), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka 237–0061, Japan
- Kohei Yamada
- Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611–0011, Japan
- Yusuke Okazaki
- Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611–0011, Japan
- Hiroyuki Ogata
- Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611–0011, Japan
7
Wang L, Zhou Y. MRM-BERT: a novel deep neural network predictor of multiple RNA modifications by fusing BERT representation and sequence features. RNA Biol 2024; 21:1-10. [PMID: 38357904 PMCID: PMC10877979 DOI: 10.1080/15476286.2024.2315384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2024] [Indexed: 02/16/2024] Open
Abstract
RNA modifications play crucial roles in various biological processes and diseases. Accurate prediction of RNA modification sites is essential for understanding their functions. In this study, we propose a hybrid approach that fuses a pre-trained sequence representation with various sequence features to predict multiple types of RNA modifications in one combined prediction framework. We developed MRM-BERT, a deep learning method that combined the pre-trained DNABERT deep sequence representation module and the convolutional neural network (CNN) exploiting four traditional sequence feature encodings to improve the prediction performance. MRM-BERT was evaluated on multiple datasets of 12 commonly occurring RNA modifications, including m6A, m5C, m1A and so on. The results demonstrate that our hybrid model outperforms other models in terms of area under receiver operating characteristic curve (AUC) for all 12 types of RNA modifications. MRM-BERT is available as an online tool (http://117.122.208.21:8501) or source code (https://github.com/abhhba999/MRM-BERT), which allows users to predict RNA modification sites and visualize the results. Overall, our study provides an effective and efficient approach to predict multiple RNA modifications, contributing to the understanding of RNA biology and the development of therapeutic strategies.
Affiliation(s)
- Linshu Wang
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuan Zhou
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China
8
Gautam S, Thakur A, Rajput A, Kumar M. Anti-Dengue: A Machine Learning-Assisted Prediction of Small Molecule Antivirals against Dengue Virus and Implications in Drug Repurposing. Viruses 2023; 16:45. [PMID: 38257744 PMCID: PMC10818795 DOI: 10.3390/v16010045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/24/2024] Open
Abstract
Dengue outbreaks persist across tropical regions worldwide, and no approved antivirals exist, so therapeutics against the virus are urgently needed. In this context, we developed the "Anti-Dengue" algorithm that predicts dengue virus inhibitors using a quantitative structure-activity relationship (QSAR) and machine learning techniques (MLTs). Using the "DrugRepV" database, we extracted chemicals (small molecules) and repurposed drugs targeting the dengue virus with their corresponding IC50 values. Then, molecular descriptors and fingerprints were computed for these molecules using PaDEL software. Further, these molecules were split into training/testing and independent validation datasets. We developed regression-based predictive models employing 10-fold cross-validation using a variety of machine learning approaches, including SVM, ANN, kNN, and RF. The best predictive model yielded a PCC of 0.71 on the training/testing dataset and 0.81 on the independent validation dataset. The created model's reliability and robustness were assessed using Williams plot, scatter plot, decoy set, and chemical clustering analyses. Predictive models were utilized to identify possible drug candidates that could be repurposed. We identified goserelin, gonadorelin, and nafarelin as potential repurposed drugs with high pIC50 values. "Anti-Dengue" may be beneficial in accelerating antiviral drug development against the dengue virus.
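The abstract reports IC50 values as inputs and pIC50 values as outputs. The two are related by the standard definition pIC50 = -log10(IC50 in mol/L). The short sketch below applies that definition, assuming the IC50 is supplied in micromolar; the unit used in the DrugRepV records is not stated in the abstract, and the function name is hypothetical.

```python
from math import log10


def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # 1 uM = 1e-6 mol/L, so -log10(ic50_um * 1e-6) = 6 - log10(ic50_um).
    return 6.0 - log10(ic50_um)


# Hypothetical potencies: lower IC50 means higher pIC50.
for ic50 in (100.0, 1.0, 0.01):
    print(ic50, pic50_from_ic50_um(ic50))
```

Working on the logarithmic scale compresses potencies that span several orders of magnitude into a range that suits regression models.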
Affiliation(s)
- Sakshi Gautam
- Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India; (S.G.); (A.T.); (A.R.)
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Anamika Thakur
- Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India; (S.G.); (A.T.); (A.R.)
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Akanksha Rajput
- Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India; (S.G.); (A.T.); (A.R.)
- Manoj Kumar
- Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India; (S.G.); (A.T.); (A.R.)
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
9
Zhao Y, Yin J, Zhang L, Zhang Y, Chen X. Drug-drug interaction prediction: databases, web servers and computational models. Brief Bioinform 2023; 25:bbad445. [PMID: 38113076 PMCID: PMC10782925 DOI: 10.1093/bib/bbad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 11/14/2023] [Indexed: 12/21/2023] Open
Abstract
In clinical treatment, two or more drugs (i.e. a drug combination) are used simultaneously or successively, primarily to enhance therapeutic efficacy or reduce side effects. However, an inappropriate drug combination may not only fail to improve efficacy but may even lead to adverse reactions. Therefore, following the basic principle of improving efficacy and/or reducing adverse reactions, we should study drug-drug interactions (DDIs) comprehensively and thoroughly so that drug combinations can be used rationally. In this review, we first introduced the basic concept and classification of DDIs. Further, some important publicly available databases and web servers about experimentally verified or predicted DDIs were briefly described. As an effective auxiliary tool, computational models for predicting DDIs can not only save the cost of biological experiments, but also provide relevant guidance for combination therapy to some extent. Therefore, we summarized three types of prediction models (traditional machine learning-based models, deep learning-based models and score function-based models) proposed in recent years and discussed their advantages and limitations. In addition, we pointed out the problems that need to be solved in future research on DDI prediction and provided corresponding suggestions.
Affiliation(s)
- Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yong Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen
- School of Science, Jiangnan University, Wuxi 214122, China
10
Tian H, Xiao S, Jiang X, Tao P. PASSerRank: Prediction of allosteric sites with learning to rank. J Comput Chem 2023; 44:2223-2229. [PMID: 37561047 DOI: 10.1002/jcc.27193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/19/2023] [Accepted: 07/10/2023] [Indexed: 08/11/2023]
Abstract
Allostery plays a crucial role in regulating protein activity, making it a highly sought-after target in drug development. One of the major challenges in allosteric drug research is the identification of allosteric sites. In recent years, many computational models have been developed for accurate allosteric site prediction. Most of these models focus on designing a general rule that can be applied to pockets of proteins from various families. In this study, we present a new approach using the concept of Learning to Rank (LTR). The LTR model ranks pockets based on their relevance to allosteric sites, that is, how well a pocket meets the characteristics of known allosteric sites. After the training and validation on two datasets, the Allosteric Database (ASD) and CASBench, the LTR model was able to rank an allosteric pocket in the top three positions for 83.6% and 80.5% of test proteins, respectively. The model outperforms other common machine learning models with higher F1 scores (0.662 in ASD and 0.608 in CASBench) and Matthews correlation coefficients (0.645 in ASD and 0.589 in CASBench). The trained model is available on the PASSer platform (https://passer.smu.edu) to aid in drug discovery research.
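The three evaluation figures in the abstract (top-three hit rate, F1 score, Matthews correlation coefficient) can be computed from ranked pocket scores and binary labels. The sketch below is a minimal illustration with hypothetical scores and labels. It does not reproduce the LTR model itself, only the bookkeeping around its output.

```python
from math import sqrt


def top_k_hit(scores, labels, k=3):
    """True if any of the k highest-scoring pockets is a known allosteric site."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return any(label == 1 for _, label in ranked[:k])


def top_k_rate(proteins, k=3):
    """Fraction of proteins with an allosteric pocket ranked in the top k."""
    hits = sum(top_k_hit(scores, labels, k) for scores, labels in proteins)
    return hits / len(proteins)


def f1_and_mcc(y_true, y_pred):
    """F1 score and Matthews correlation coefficient for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return f1, mcc


# Hypothetical pocket scores and labels (1 = allosteric) for two proteins.
proteins = [
    ([0.9, 0.1, 0.5, 0.3], [0, 1, 0, 0]),  # allosteric pocket ranked last
    ([0.2, 0.8], [0, 1]),                  # allosteric pocket ranked first
]
print(top_k_rate(proteins, k=3))
print(f1_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC complements F1 here because pockets are heavily imbalanced: most pockets on a protein are not allosteric, and MCC accounts for true negatives while F1 does not.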
Affiliation(s)
- Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, USA
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, USA
- Xi Jiang
- Department of Statistics, Southern Methodist University, Dallas, Texas, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, USA
11
Krause F, Voigt K, Di Ventura B, Öztürk MA. ReverseDock: a web server for blind docking of a single ligand to multiple protein targets using AutoDock Vina. Front Mol Biosci 2023; 10:1243970. [PMID: 37881441 PMCID: PMC10594994 DOI: 10.3389/fmolb.2023.1243970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 09/25/2023] [Indexed: 10/27/2023] Open
Abstract
Several platforms exist to perform molecular docking to computationally predict binders to a specific protein target from a library of ligands. The reverse, that is, docking a single ligand to various protein targets, can currently be done by very few web servers, which limit the search to a small set of pre-selected human proteins. However, the ability to predict in silico which targets a compound identified in a high-throughput drug screen binds would help optimize and reduce the costs of the experimental workflow needed to reveal the molecular mechanism of action of a ligand. Here, we present ReverseDock, a blind docking web server based on AutoDock Vina specifically designed to allow users with no computational expertise to dock a ligand to 100 protein structures of their choice. ReverseDock increases the number and type of proteins a ligand can be docked to, making the task of in silico docking of a ligand to entire families of proteins straightforward. We envision ReverseDock will support researchers by making it possible to run inverse docking computations from a web browser. ReverseDock is available at: https://reversedock.biologie.uni-freiburg.de/.
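In blind docking, the search space is not restricted to a known binding site, so a common approach is to make the search box enclose the whole protein. The abstract does not say how ReverseDock sizes its box, so the sketch below is an assumption: it reads ATOM coordinates from fixed-column PDB lines, pads the bounding box by a hypothetical 5 Å, and prints the `center_*` and `size_*` keys that an AutoDock Vina configuration file accepts.

```python
def blind_docking_box(pdb_lines, padding=5.0):
    """Return (center, size) of a box enclosing all ATOM records plus padding."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # PDB fixed columns 31-38, 39-46 and 47-54 hold x, y and z.
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM records found")
    mins = [min(c[axis] for c in coords) for axis in range(3)]
    maxs = [max(c[axis] for c in coords) for axis in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_box_config(center, size):
    """Format the box as key = value lines for a Vina configuration file."""
    keys = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")
    values = tuple(center) + tuple(size)
    return "\n".join(f"{key} = {value:.3f}" for key, value in zip(keys, values))


# Two hypothetical atoms, laid out in PDB fixed-column format.
ATOM_FORMAT = "ATOM  %5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f"
example = [
    ATOM_FORMAT % (1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    ATOM_FORMAT % (2, "CA", "GLY", "A", 2, 10.0, 20.0, 30.0),
]
print(vina_box_config(*blind_docking_box(example)))
```

A box this large costs more search time per target than a site-specific one, which is one reason a server might cap the number of targets per job.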
Affiliation(s)
- Fabian Krause
- Signaling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
- Institute of Biology II, University of Freiburg, Freiburg, Germany
- Karsten Voigt
- Institute of Biology III, University of Freiburg, Freiburg, Germany
- Barbara Di Ventura
- Signaling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
- Institute of Biology II, University of Freiburg, Freiburg, Germany
- Mehmet Ali Öztürk
- Signaling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
- Institute of Biology II, University of Freiburg, Freiburg, Germany
12
Gao P, Zhao H, Luo Z, Lin Y, Feng W, Li Y, Kong F, Li X, Fang C, Wang X. SoyDNGP: a web-accessible deep learning framework for genomic prediction in soybean breeding. Brief Bioinform 2023; 24:bbad349. [PMID: 37824739 DOI: 10.1093/bib/bbad349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 10/14/2023] Open
Abstract
Soybean is a globally significant crop, playing a vital role in human nutrition and agriculture. Its complex genetic structure and wide trait variation, however, pose challenges for breeders and researchers aiming to optimize its yield and quality. Addressing this biological complexity requires innovative and accurate tools for trait prediction. In response to this challenge, we have developed SoyDNGP, a deep learning-based model that offers significant advancements in the field of soybean trait prediction. Compared to existing methods, such as DeepGS and DNNGP, SoyDNGP has a distinct advantage due to its minimal increase in parameter volume and superior predictive accuracy. In a rigorous comparison covering prediction accuracy and model complexity, SoyDNGP showed improved performance over its counterparts. Furthermore, it effectively predicted complex traits with remarkable precision, demonstrating robust performance across different sample sizes and trait complexities. We also tested the versatility of SoyDNGP across multiple crop species, including cotton, maize, rice and tomato. Our results showed its consistent and comparable performance, emphasizing SoyDNGP's potential as a versatile tool for genomic prediction across a broad range of crops. To enhance its accessibility to users without extensive programming experience, we designed a user-friendly web server, available at http://xtlab.hzau.edu.cn/SoyDNGP. The server provides two features: 'Trait Lookup', offering users the ability to access pre-existing trait predictions for over 500 soybean accessions, and 'Trait Prediction', allowing for the upload of VCF files for trait estimation. By providing a high-performing, accessible tool for trait prediction, SoyDNGP opens up new possibilities in the quest for optimized soybean breeding.
Affiliation(s)
- Pengfei Gao
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Haonan Zhao
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Zheng Luo
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Yifan Lin
- Hubei Hongshan Laboratory, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Wanjie Feng
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Yaling Li
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Fanjiang Kong
- Guangzhou Key Laboratory of Crop Gene Editing, Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Xia Li
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Chao Fang
- Guangzhou Key Laboratory of Crop Gene Editing, Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Xutong Wang
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Hubei Hongshan Laboratory, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
13
Gautam A, Bhowmik D, Basu S, Zeng W, Lahiri A, Huson DH, Paul S. Microbiome Metabolome Integration Platform (MMIP): a web-based platform for microbiome and metabolome data integration and feature identification. Brief Bioinform 2023; 24:bbad325. [PMID: 37771003 DOI: 10.1093/bib/bbad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/12/2023] [Indexed: 09/30/2023] Open
Abstract
A microbial community maintains its ecological dynamics via metabolite crosstalk. Hence, knowledge of the metabolome, alongside its populace, would help us understand the functionality of a community and also predict how it will change in atypical conditions. Methods that employ low-cost metagenomic sequencing data can predict the metabolic potential of a community, that is, its ability to produce or utilize specific metabolites. These, in turn, can potentially serve as markers of biochemical pathways that are associated with different communities. We developed MMIP (Microbiome Metabolome Integration Platform), a web-based analytical and predictive tool that can be used to compare the taxonomic content, diversity variation and the metabolic potential between two sets of microbial communities from targeted amplicon sequencing data. MMIP is capable of highlighting statistically significant taxonomic, enzymatic and metabolic attributes as well as learning-based features associated with one group in comparison with another. Furthermore, MMIP can predict linkages among species or groups of microbes in the community, specific enzyme profiles, compounds or metabolites associated with such a group of organisms. With MMIP, we aim to provide a user-friendly, online web server for performing key microbiome-associated analyses of targeted amplicon sequencing data, predicting metabolite signature, and using learning-based linkage analysis, without the need for initial metabolomic analysis, and thereby helping in hypothesis generation.
Affiliation(s)
- Anupam Gautam
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, Tübingen, Germany
- Debaleena Bhowmik
- Cell Biology and Physiology Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Sayantani Basu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Wenhuan Zeng
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2064: Machine Learning: New Perspectives for Science, University of Tübingen, Tübingen, Germany
- Abhishake Lahiri
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Infectious Diseases and Immunology Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Centre for Health Science and Technology, JIS Institute of Advanced Studies and Research Kolkata, JIS University, West Bengal, India
- Daniel H Huson
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, Tübingen, Germany
- Sandip Paul
- Centre for Health Science and Technology, JIS Institute of Advanced Studies and Research Kolkata, JIS University, West Bengal, India
14
Guo Y, Zhou Q, Wei B, Wang MW, Zhao S. GPCRana: A web server for quantitative analysis of GPCR structures. Structure 2023; 31:1132-1142.e2. [PMID: 37392740 DOI: 10.1016/j.str.2023.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Revised: 05/21/2023] [Accepted: 06/06/2023] [Indexed: 07/03/2023]
Abstract
G protein-coupled receptors (GPCRs) attract tremendous attention from both industrial and academic researchers with currently over 900 released structures. Structural analysis is widely used to understand receptor functionality and pharmacology, but more user-friendly tools are needed. Residue-residue contact score (RRCS) is an atomic distance-based method that allows a quantitative description of GPCR structures. Here, we present GPCRana, a web server that provides a user-friendly interface to analyze GPCR structures. After uploading selected structures, GPCRana immediately generates a comprehensive report covering four aspects: (i) RRCS for all residue pairs incorporated with real-time 3D visualization; (ii) ligand-receptor interactions; (iii) activation pathway analysis; and (iv) RRCS_TMs that indicates the global movements of transmembrane helices. Moreover, conformational changes between two structures can be analyzed. Applying GPCRana on AlphaFold2-predicted models reveals differentiated inter-helical packing forms in a receptor-dependent manner. Our web server offers a fast and precise way to study GPCR structures and is freely available at http://gpcranalysis.com/#/.
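The abstract describes RRCS as an atomic distance-based score. A minimal sketch of such a score follows, assuming a piecewise-linear form in which each atom pair contributes 1 at short range, 0 beyond a cutoff, and a linearly interpolated value in between, summed over all atom pairs of two residues. The cutoffs `d_min=3.23` and `d_max=4.63` (in Å) are default values assumed for illustration and are not given in the abstract; the function names are hypothetical, and GPCRana may apply additional rules (for example, for sequence-adjacent residues) that are not modelled here.

```python
import math

def atom_pair_score(d, d_min=3.23, d_max=4.63):
    """Piecewise-linear contact score for one atom pair at distance d (angstroms).

    d_min and d_max are assumed cutoffs, not values taken from the abstract.
    """
    if d <= d_min:
        return 1.0
    if d >= d_max:
        return 0.0
    return (d_max - d) / (d_max - d_min)

def residue_contact_score(atoms_a, atoms_b):
    """Sum atom-pair scores over all atom pairs of two residues.

    atoms_a, atoms_b: lists of (x, y, z) coordinates in angstroms.
    """
    return sum(
        atom_pair_score(math.dist(a, b))
        for a in atoms_a
        for b in atoms_b
    )
```

Because the score is a sum of graded contributions rather than a binary contact flag, small shifts in side-chain packing between two structures change it continuously, which is what makes a per-residue-pair comparison between conformations quantitative.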
Affiliation(s)
- Yu Guo
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China; Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
- Qingtong Zhou
- Department of Pharmacology, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China; Research Center for Deepsea Bioresources, Sanya, Hainan 572025, China.
- Bin Wei
- Research Center for Deepsea Bioresources, Sanya, Hainan 572025, China
- Ming-Wei Wang
- Department of Pharmacology, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China; Research Center for Deepsea Bioresources, Sanya, Hainan 572025, China; Department of Chemistry, School of Science, The University of Tokyo, Tokyo 113-0033, Japan.
- Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.
15
Wang Z, Ge P, Zhou XL, Shui KM, Geng H, Yang J, Chen JY, Wang J. nASAP: A Nascent RNA Profiling Data Analysis Platform. J Mol Biol 2023; 435:168142. [PMID: 37356907 DOI: 10.1016/j.jmb.2023.168142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 04/19/2023] [Accepted: 04/30/2023] [Indexed: 06/27/2023]
Abstract
Although nascent RNA profiling data are widely used in transcriptional regulation studies, the development and standardization of data processing pipelines lag far behind those for RNA-seq. We are filling this gap by establishing the nASAP web server (https://grobase.top/nasap/) to provide practical quality evaluation and comprehensive analysis of nascent RNA datasets. nASAP provides four customized analysis modules: i) quality assessment, which summarizes the sequencing statistics and mapping ratio and evaluates RNA integrity and mRNA contamination; ii) quantification analysis for mRNAs, lncRNAs and eRNAs; iii) pausing analysis across the whole genome based on sequencing read distribution; and iv) network analysis to better understand the gene regulatory mechanism by obtaining annotated enhancer-promoter interactomes. nASAP is user-friendly and outperforms the existing pipeline for quality control of nascent RNA profiling data. We anticipate that nASAP, which eases both basic and advanced analysis of nascent RNA data, will be extremely useful in various fields.
Affiliation(s)
- Zhi Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Peng Ge
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Xiao-Long Zhou
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Kun-Ming Shui
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Huichao Geng
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Jie Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China.
- Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China.
- Jin Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; NJU Advanced Institute for Life Sciences (NAILS), Jiangsu Engineering Research Center for MicroRNA Biology and Biotechnology, Nanjing University, Nanjing 210023, China.
16
Andreani J, Jiménez-García B, Ohue M. Editorial: Web tools for modeling and analysis of biomolecular interactions Volume II. Front Mol Biosci 2023; 10:1190855. [PMID: 37363399 PMCID: PMC10289181 DOI: 10.3389/fmolb.2023.1190855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023] Open
Affiliation(s)
- Jessica Andreani
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, Japan
17
Li XW, Duan TT, Chu JY, Pan SY, Zeng Y, Hu FF. SCAD-Brain: a public database of single cell RNA-seq data in human and mouse brains with Alzheimer's disease. Front Aging Neurosci 2023; 15:1157792. [PMID: 37251804 PMCID: PMC10213211 DOI: 10.3389/fnagi.2023.1157792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 04/24/2023] [Indexed: 05/31/2023] Open
Affiliation(s)
- Xin-Wen Li
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
- Ting-Ting Duan
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
- Jin-Yu Chu
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
- Shi-Yao Pan
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
- Yan Zeng
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
- Fei-Fei Hu
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, Hubei, China
18
Jung J, Popella L, Do PT, Pfau P, Vogel J, Barquist L. Design and off-target prediction for antisense oligomers targeting bacterial mRNAs with the MASON web server. RNA 2023; 29:570-583. [PMID: 36750372 PMCID: PMC10158992 DOI: 10.1261/rna.079263.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/18/2022] [Accepted: 01/10/2023] [Indexed: 05/06/2023]
Abstract
Antisense oligomers (ASOs), such as peptide nucleic acids (PNAs), designed to inhibit the translation of essential bacterial genes, have emerged as attractive sequence- and species-specific programmable RNA antibiotics. Yet, potential drawbacks include unwanted side effects caused by their binding to transcripts other than the intended target. To facilitate the design of PNAs with minimal off-target effects, we developed MASON (make antisense oligomers now), a web server for the design of PNAs that target bacterial mRNAs. MASON generates PNA sequences complementary to the translational start site of a bacterial gene of interest and reports critical sequence attributes and potential off-target sites. We based MASON's off-target predictions on experiments in which we treated Salmonella enterica serovar Typhimurium with a series of 10-mer PNAs derived from a PNA targeting the essential gene acpP but carrying two serial mismatches. Growth inhibition and RNA-sequencing (RNA-seq) data revealed that PNAs with terminal mismatches are still able to target acpP, suggesting wider off-target effects than anticipated. Comparison of these results to an RNA-seq data set from uropathogenic Escherichia coli (UPEC) treated with eleven different PNAs confirmed that our findings are not unique to Salmonella. We believe that MASON's off-target assessment will improve the design of specific PNAs and other ASOs.
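The two steps named in the abstract, generating a sequence complementary to the translational start site and reporting potential off-target sites, can be sketched as below. This is an illustration under stated assumptions, not MASON's implementation: the window placement (`upstream=5`, `length=10`), the mismatch threshold, and all function names are hypothetical, and the antisense strand is written in RNA letters for simplicity.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense(target):
    """Reverse complement of an RNA target site written 5'->3'."""
    return target.translate(COMPLEMENT)[::-1]

def design_aso(mrna, start_codon_index, upstream=5, length=10):
    """Return (target_site, antisense) for a window around the start codon.

    The window begins `upstream` nt before the start codon; both defaults
    are illustrative assumptions.
    """
    begin = start_codon_index - upstream
    if begin < 0 or begin + length > len(mrna):
        raise ValueError("window falls outside the transcript")
    site = mrna[begin:begin + length]
    return site, antisense(site)

def off_target_sites(site, transcript, max_mismatches=2):
    """List (position, mismatches) for windows of `transcript` that differ
    from `site` at no more than `max_mismatches` positions."""
    hits = []
    for pos in range(len(transcript) - len(site) + 1):
        window = transcript[pos:pos + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

Scanning for windows that match the intended target site is equivalent to scanning for sites the antisense strand could pair with. The abstract's finding that PNAs with terminal mismatches still act on acpP suggests that the position of a mismatch matters, which a plain mismatch count like this one does not capture.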
Affiliation(s)
- Jakob Jung
- Institute for Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Linda Popella
- Institute for Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Phuong Thao Do
- Institute for Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Patrick Pfau
- Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
- Jörg Vogel
- Institute for Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Lars Barquist
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
19
Huang YQ, Sun P, Chen Y, Liu HX, Hao GF, Song BA. Bioinformatics toolbox for exploring target mutation-induced drug resistance. Brief Bioinform 2023; 24:7026012. [PMID: 36738254 DOI: 10.1093/bib/bbad033] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/11/2022] [Revised: 12/25/2022] [Accepted: 01/14/2023] [Indexed: 02/05/2023] Open
Abstract
Drug resistance is increasingly among the main issues affecting human health and threatening agriculture and food security. In particular, developing approaches to overcome target mutation-induced drug resistance has long been an essential part of biological research. During the past decade, many bioinformatics tools have been developed to explore this type of drug resistance, and they have become popular for elucidating drug resistance mechanisms in a low-cost, fast and effective way. However, these resources are scattered and underutilized, and their strengths and limitations have not been systematically analyzed and compared. Here, we systematically surveyed 59 freely available bioinformatics tools for exploring target mutation-induced drug resistance. We analyzed and summarized these resources based on their functionality, data volume, data source, operating principle, performance, etc. We also concisely discussed the strengths, limitations and application examples of these tools. Specifically, we tested some predictive tools and offered some thoughts from the clinician's perspective. Hopefully, this work will provide a useful toolbox for researchers working in the biomedical, pesticide, bioinformatics and pharmaceutical engineering fields, and a good platform for non-specialists to quickly understand drug resistance prediction.
Affiliation(s)
- Yuan-Qin Huang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, P. R. China
- Ping Sun
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, P. R. China
- Yi Chen
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, P. R. China
- Huan-Xiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, P. R. China
- Bao-An Song
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, P. R. China
20
Lai FL, Gao F. Auto-Kla: a novel web server to discriminate lysine lactylation sites using automated machine learning. Brief Bioinform 2023; 24:7068952. [PMID: 36869843 DOI: 10.1093/bib/bbad070] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/05/2022] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 03/05/2023] Open
Abstract
Recently, lysine lactylation (Kla), a novel post-translational modification (PTM), which can be stimulated by lactate, has been found to regulate gene expression and life activities. Therefore, it is imperative to accurately identify Kla sites. Currently, mass spectrometry is the fundamental method for identifying PTM sites. However, it is expensive and time-consuming to achieve this through experiments alone. Herein, we proposed a novel computational model, Auto-Kla, to quickly and accurately predict Kla sites in gastric cancer cells based on automated machine learning (AutoML). With stable and reliable performance, our model outperforms the recently published model in the 10-fold cross-validation. To investigate the generalizability and transferability of our approach, we evaluated the performance of our models trained on two other widely studied types of PTM, including phosphorylation sites in host cells infected with SARS-CoV-2 and lysine crotonylation sites in HeLa cells. The results show that our models achieve comparable or better performance than current outstanding models. We believe that this method will become a useful analytical tool for PTM prediction and provide a reference for the future development of related models. The web server and source code are available at http://tubic.org/Kla and https://github.com/tubic/Auto-Kla, respectively.
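The abstract reports performance under 10-fold cross-validation. For readers unfamiliar with the procedure, the sketch below shows one plain way to generate the ten train/test index splits using only the standard library. It is a generic, unstratified split with an assumed fixed seed; the function name `k_fold_indices` is hypothetical, and the abstract does not say whether Auto-Kla stratifies folds by class or how it shuffles.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once with a fixed seed, then dealt into k folds of
    near-equal size; each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A model is trained on each `train` set and scored on the matching `test` set, and the ten scores are averaged. Every sample is therefore used for testing once and never appears in the training data of the fold that tests it.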
Affiliation(s)
- Fei-Liao Lai
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
21
Sents Z, Stoughton TE, Buecherl L, Thomas PJ, Fontanarrosa P, Myers CJ. SynBioSuite: A Tool for Improving the Workflow for Genetic Design and Modeling. ACS Synth Biol 2023; 12:892-897. [PMID: 36888740 DOI: 10.1021/acssynbio.2c00597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2023]
Abstract
Synthetic biology research has led to the development of many software tools for designing, constructing, editing, simulating, and sharing genetic parts and circuits. Among these tools are SBOLCanvas, iBioSim, and SynBioHub, which can be used in conjunction to create a genetic circuit design following the design-build-test-learn process. However, although automation works within these tools, most of these software tools are not integrated, and the process of transferring information between them is a very manual, error-prone process. To address this problem, this work automates some of these processes and presents SynBioSuite, a cloud-based tool that eliminates many of the drawbacks of the current approach by automating the setup and reception of results for simulating a designed genetic circuit via an application programming interface.
Affiliation(s)
- Zachary Sents
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Thomas E Stoughton
- Department of Computer Science, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Lukas Buecherl
- Biomedical Engineering Program, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Payton J Thomas
- Department of Biomedical Engineering, University of Utah, Salt Lake City, Utah 84112, United States
- Pedro Fontanarrosa
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Chris J Myers
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
22
Barman RK, Chakrabarti AK, Dutta S. Prediction of Phage Virion Proteins Using Machine Learning Methods. Molecules 2023; 28. [PMID: 36903484 DOI: 10.3390/molecules28052238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 01/27/2023] [Accepted: 02/20/2023] [Indexed: 03/04/2023] Open
Abstract
Antimicrobial resistance (AMR) is a major problem and an immediate alternative to antibiotics is the need of the hour. Research on the possible alternative products to tackle bacterial infections is ongoing worldwide. One of the most promising alternatives to antibiotics is the use of bacteriophages (phage) or phage-driven antibacterial drugs to cure bacterial infections caused by AMR bacteria. Phage-driven proteins, including holins, endolysins, and exopolysaccharides, have shown great potential in the development of antibacterial drugs. Likewise, phage virion proteins (PVPs) might also play an important role in the development of antibacterial drugs. Here, we have developed a machine learning-based prediction method to predict PVPs using phage protein sequences. We have employed well-known basic and ensemble machine learning methods with protein sequence composition features for the prediction of PVPs. We found that the gradient boosting classifier (GBC) method achieved the best accuracy of 80% on the training dataset and an accuracy of 83% on the independent dataset. The performance on the independent dataset is better than other existing methods. A user-friendly web server developed by us is freely available to all users for the prediction of PVPs from phage protein sequences. The web server might facilitate the large-scale prediction of PVPs and hypothesis-driven experimental study design.
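The abstract states that the classifiers were trained on protein sequence composition features. One common feature of that kind is amino acid composition, the fraction of each of the 20 standard residues in a sequence; the sketch below computes it. Whether the authors used exactly this feature, or richer ones such as dipeptide composition, is not stated in the abstract, so `aa_composition` should be read as an assumed, illustrative choice.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order

def aa_composition(sequence):
    """Return a 20-element list of residue fractions for a protein sequence.

    Non-standard symbols (e.g. 'X') are ignored when counting.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each protein becomes a fixed-length vector regardless of sequence length, which is what lets a standard classifier such as gradient boosting be trained on sequences of varying size.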
23
Marcet-Houben M, Collado-Cala I, Fuentes-Palacios D, Gómez AD, Molina M, Garisoain-Zafra A, Chorostecki U, Gabaldón T. EvolClustDB: Exploring Eukaryotic Gene Clusters with Evolutionarily Conserved Genomic Neighbourhoods. J Mol Biol 2023:168013. [PMID: 36806474 DOI: 10.1016/j.jmb.2023.168013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 01/24/2023] [Accepted: 02/11/2023] [Indexed: 02/17/2023]
Abstract
Conservation of gene neighbourhood over evolutionary distances is generally indicative of shared regulation or functional association among genes. This concept has been broadly exploited in prokaryotes but its use on eukaryotic genomes has been limited to specific functional classes, such as biosynthetic gene clusters. We here used an evolutionary-based gene cluster discovery algorithm (EvolClust) to pre-compute evolutionarily conserved gene neighbourhoods, which can be searched, browsed and downloaded in EvolClustDB. We inferred ∼35,000 cluster families in 882 different species in genome comparisons of five taxonomically broad clades: Fungi, Plants, Metazoans, Insects and Protists. EvolClustDB allows browsing through the cluster families, as well as searching by protein, species, identifier or sequence. Visualization allows inspecting gene order per species in a phylogenetic context, so that relevant evolutionary events such as gain, loss or transfer, can be inferred. EvolClustDB is freely available, without registration, at http://evolclustdb.org/.
Affiliation(s)
- Marina Marcet-Houben
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Ismael Collado-Cala
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Diego Fuentes-Palacios
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Alicia D Gómez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Manuel Molina
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Andrés Garisoain-Zafra
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Uciel Chorostecki
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Toni Gabaldón
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; Centro de Investigación Biomédica En Red de Enfermedades Infecciosas (CIBERINFEC), Barcelona, Spain.
| |
Collapse
|
24
|
Wang F, Li W, Li B, Xie L, Tong Y, Xu X. cRNAsp12 Web Server for the Prediction of Circular RNA Secondary Structures and Stabilities. Int J Mol Sci 2023; 24:ijms24043822. [PMID: 36835231 PMCID: PMC9959564 DOI: 10.3390/ijms24043822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 01/29/2023] [Accepted: 02/07/2023] [Indexed: 02/17/2023] Open
Abstract
Circular RNAs (circRNAs) are a novel class of non-coding RNA that, unlike linear RNAs, form a covalently closed loop without 5' and 3' ends. Growing evidence shows that circular RNAs play important roles in life processes and have great potential implications in clinical and research fields. The accurate modeling of circRNA structure and stability has a far-reaching impact on our understanding of their functions and our ability to develop RNA-based therapeutics. The cRNAsp12 server offers a user-friendly web interface to predict circular RNA secondary structures and folding stabilities from the sequence. Through the helix-based landscape partitioning strategy, the server generates distinct ensembles of structures and predicts the minimum free energy structure for each ensemble with recursive partition function calculation and backtracking algorithms. For structure predictions in a limited structural ensemble, the server also lets users set structural constraints that force specific base pairs and/or force specific bases to remain unpaired, such that only structures that meet the criteria are enumerated recursively.
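The constraint idea in this abstract (forced base pairs and forced unpaired bases) can be illustrated with a small filter over candidate structures. This is only a sketch: the dot-bracket encoding, the 0-based indexing, and the function names `parse_pairs` and `satisfies_constraints` are assumptions made for illustration and do not describe the cRNAsp12 input format or its recursive enumeration.

```python
def parse_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            pairs.add((stack.pop(), index))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def satisfies_constraints(dot_bracket, forced_pairs, forced_unpaired):
    """Check that every forced pair is present and every forced-unpaired base is free."""
    pairs = parse_pairs(dot_bracket)
    paired_positions = {position for pair in pairs for position in pair}
    return set(forced_pairs) <= pairs and not (set(forced_unpaired) & paired_positions)


# Hypothetical candidates; keep only those that meet the user's constraints.
candidates = ["((..))", "(....)", ".(..)."]
kept = [s for s in candidates if satisfies_constraints(s, {(0, 5)}, {1})]
print(kept)
```

A real predictor would apply such constraints while enumerating structures rather than filtering afterwards; the filter form is used here only because it is short.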
Affiliation(s)
- Fengfei Wang
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Wei Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Baiyi Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Yunguang Tong
- Department of Pharmacy, China Jiliang University, Hangzhou 310000, China
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Correspondence:
25
|
Liu CJ, Hu FF, Xie GY, Miao YR, Li XW, Zeng Y, Guo AY. GSCA: an integrated platform for gene set cancer analysis at genomic, pharmacogenomic and immunogenomic levels. Brief Bioinform 2023; 24:6957252. [PMID: 36549921 DOI: 10.1093/bib/bbac558] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 08/09/2022] [Revised: 11/16/2022] [Indexed: 12/24/2022] Open
Abstract
Cancer initiation and progression are likely caused by the dysregulation of biological pathways. Gene set analysis (GSA) could improve the signal-to-noise ratio and identify potential biological insights at the gene set level. However, platforms exploring cancer multi-omics data using GSA methods are lacking. In this study, we upgraded our GSCALite to GSCA (gene set cancer analysis, http://bioinfo.life.hust.edu.cn/GSCA) for cancer GSA at genomic, pharmacogenomic and immunogenomic levels. In this improved GSCA, we integrated expression, mutation, drug sensitivity and clinical data from four public data sources for 33 cancer types. We introduced useful features to GSCA, including associations of immune infiltration with gene expression and genomic variations, and associations of gene set expression/mutation with clinical outcomes. GSCA has four main functional modules for cancer GSA to explore, analyze and visualize expression, genomic variations, tumor immune infiltration, drug sensitivity and their associations with clinical outcomes. We used case studies of three gene sets, (i) seven cell cycle genes, (ii) tumor suppressor genes of the PI3K pathway and (iii) oncogenes of the PI3K pathway, to demonstrate the advantage of GSCA over single gene analysis. At the gene set level, we found novel associations of gene set expression and mutation with clinical outcomes in different cancer types that were not significant at the single gene level. In conclusion, GSCA is a user-friendly web server and a useful resource for conducting hypothesis tests by using GSA methods at genomic, pharmacogenomic and immunogenomic levels.
Affiliation(s)
- Chun-Jie Liu
- Wuhan University of Science and Technology and Huazhong University of Science and Technology
- Fei-Fei Hu
- Wuhan University of Science and Technology
- Gui-Yan Xie
- Huazhong University of Science and Technology
- Ya-Ru Miao
- Huazhong University of Science and Technology
- Xin-Wen Li
- Wuhan University of Science and Technology
- Yan Zeng
- Wuhan University of Science and Technology
- An-Yuan Guo
- Huazhong University of Science and Technology
26
|
Kumar N, Patiyal S, Choudhury S, Tomer R, Dhall A, Raghava GPS. DMPPred: a tool for identification of antigenic regions responsible for inducing type 1 diabetes mellitus. Brief Bioinform 2023; 24:6911429. [PMID: 36524996 DOI: 10.1093/bib/bbac525] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Revised: 10/27/2022] [Accepted: 11/04/2022] [Indexed: 12/23/2022] Open
Abstract
A number of antigens induce an autoimmune response against β-cells, leading to type 1 diabetes mellitus (T1DM). Recently, several antigen-specific immunotherapies have been developed to treat T1DM. Thus, identification of T1DM associated peptides with antigenic regions or epitopes is important for peptide-based therapeutics (e.g. immunotherapeutics). In this study, for the first time, an attempt has been made to develop a method for predicting, designing, and scanning T1DM associated peptides with high precision. We analysed 815 T1DM associated peptides and observed that these peptides are not associated with a specific class of HLA alleles. Thus, HLA binder prediction methods are not suitable for predicting T1DM associated peptides. First, we developed a similarity/alignment-based method using the Basic Local Alignment Search Tool and achieved a high probability of correct hits with poor coverage. Second, we developed an alignment-free method using machine learning techniques and obtained a maximum area under the receiver operating characteristic curve (AUROC) of 0.89 using dipeptide composition. Finally, we developed a hybrid method that combines the strengths of both alignment-free and alignment-based methods and achieves a maximum AUROC of 0.95 with a Matthews correlation coefficient of 0.81 on an independent dataset. We developed a web server, 'DMPPred', and a stand-alone server for predicting, designing and scanning T1DM associated peptides (https://webs.iiitd.edu.in/raghava/dmppred/).
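For readers unfamiliar with the Matthews correlation coefficient reported above, the following sketch computes it from confusion-matrix counts using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts are hypothetical and are not taken from the DMPPred evaluation.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for the
    otherwise undefined case.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an independent test set (illustration only).
score = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(score, 3))
```

Unlike accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside AUROC for imbalanced peptide datasets.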
Affiliation(s)
- Nishant Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Ritu Tomer
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
27
|
Liu Y, Song F, Li Z, Chen L, Xu Y, Sun H, Chang Y. A comprehensive tool for tumor precision medicine with pharmaco-omics data analysis. Front Pharmacol 2023; 14:1085765. [PMID: 36713829 PMCID: PMC9878337 DOI: 10.3389/fphar.2023.1085765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/04/2023] [Indexed: 01/14/2023] Open
Abstract
Background: Cancer precision medicine is an effective strategy to fight cancers by bridging genomics and drug discovery to provide specific treatment for patients with different genetic characteristics. Although some public databases and modelling frameworks have been developed through studies on drug response, most of them considered only the effects of a drug on cell lines, and estimating the effects on a patient still requires a huge amount of work to integrate data from various databases and calculations, especially concerning precision treatment. Furthermore, not only efficacy but also the adverse effects of drugs on patients should be taken into account during cancer treatment. However, adverse effects, which are essential indicators in drug safety assessment, are always neglected. Method: A holistic estimation explores the efficacy of various drugs by calculating their potency in both reversing and enhancing cancer-associated gene expression changes. In addition, a method for bridging the gap between cell culture and living tissue estimates the effectiveness of a drug on individual patients by mapping various cell lines to each patient according to their genetic mutation similarities. Result: We predicted the efficacy of FDA-recommended drugs, taking into account both efficacy and toxicity, and obtained consistent results. We also provided an intuitive and easy-to-use web server called DBPOM (http://www.dbpom.net/, a comprehensive database of pharmaco-omics for cancer precision medicine), which not only integrates the above methods but also provides calculation results on more than 10,000 small molecule compounds and drugs. As a one-stop web server, DBPOM also lets clinicians and drug researchers analyze the overall effect of a drug or a drug combination on cancer patients as well as the biological functions that they target. DBPOM is now public, free to use with no login requirement, and contains all the data and code.
Conclusion: Both the positive and negative effects of drugs during precision treatment are essential for practical application of drugs. DBPOM based on the two effects will become a vital resource and analysis platform for drug development, drug mechanism studies and the discovery of new therapies.
Affiliation(s)
- Yijun Liu
- School of Artificial Intelligence, Jilin University, Changchun, China
- Fuhu Song
- School of Artificial Intelligence, Jilin University, Changchun, China
- Zhi Li
- Medical Oncology Department, The First Affiliated Hospital of China Medical University, Shenyang, China
- Liang Chen
- Department of Computer Science, College of Engineering, Shantou University, Shantou, China; Key Laboratory of Intelligent Manufacturing Technology of Ministry of Education, Shantou University, Shantou, China
- Ying Xu
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, The University of Georgia, Athens, GA, United States
- Huiyan Sun
- School of Artificial Intelligence, Jilin University, Changchun, China; International Center of Future Science, Jilin University, Changchun, China
- *Correspondence: Huiyan Sun; Yi Chang
- Yi Chang
- School of Artificial Intelligence, Jilin University, Changchun, China; International Center of Future Science, Jilin University, Changchun, China
- *Correspondence: Huiyan Sun; Yi Chang
28
|
Olenyi T, Marquet C, Heinzinger M, Kröger B, Nikolova T, Bernhofer M, Sändig P, Schütze K, Littmann M, Mirdita M, Steinegger M, Dallago C, Rost B. LambdaPP: Fast and accessible protein-specific phenotype predictions. Protein Sci 2023; 32:e4524. [PMID: 36454227 PMCID: PMC9793974 DOI: 10.1002/pro.4524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 11/09/2022] [Accepted: 11/21/2022] [Indexed: 12/04/2022]
Abstract
The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location), and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha-helical and beta-barrel transmembrane segments; signal-peptides; variant effect) in seconds. The structure prediction provided by LambdaPP, which leverages ColabFold and is computed in minutes, is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use at embed.predictprotein.org; the interactive results for the case study can be found at https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org), and can be freely used and distributed under the academic free use license (AFL-2). For high-throughput applications, all methods can be executed locally via the bio-embeddings (bioembeddings.com) Python package or the Docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.
Affiliation(s)
- Tobias Olenyi
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Céline Marquet
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Michael Heinzinger
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Benjamin Kröger
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany
- Tiha Nikolova
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany
- Michael Bernhofer
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Philip Sändig
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany
- Konstantin Schütze
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany
- Maria Littmann
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Korea Artificial Intelligence Institute, Seoul National University, Seoul, South Korea; Korea Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
- Christian Dallago
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany; VantAI, New York, USA
- Burkhard Rost
- TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology, i12, Garching, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany
29
|
Kulandaisamy A, Parvathy Dharshini SA, Gromiha MM. Alz-Disc: A Tool to Discriminate Disease-causing and Neutral Mutations in Alzheimer's Disease. Comb Chem High Throughput Screen 2023; 26:769-777. [PMID: 35619290 DOI: 10.2174/1386207325666220520102316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 03/17/2022] [Accepted: 04/07/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Alzheimer's disease (AD) is the most common neurodegenerative disorder that affects the neuronal system and leads to memory loss. Many coding gene variants are associated with this disease and it is important to characterize their annotations. METHODS: We collected Alzheimer's disease-causing and neutral mutations from different databases. For each mutation, we computed different features from the protein sequence. Further, these features were used to build a Bayes network-based machine-learning algorithm to discriminate between the disease-causing and neutral mutations in AD. RESULTS: We have constructed a comprehensive dataset of 314 Alzheimer's disease-causing and 370 neutral mutations and explored their characteristic features, such as conservation scores, position-specific scoring matrix (PSSM) profiles, changes in hydrophobicity, different amino acid residue substitution matrices and neighboring residue information, for identifying the disease-causing mutations. Utilizing these features, we have developed a disease-specific tool named Alz-disc for discriminating the disease-causing and neutral mutations using sequence information alone. The present method showed an accuracy of 89% on an independent test set, which is 13% higher than available generic methods. This method is freely available as a web server at https://web.iitm.ac.in/bioinfo2/alzdisc/. CONCLUSIONS: This study is useful to annotate the effect of new variants and develop mutation-specific drug design strategies for Alzheimer's disease.
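One of the sequence-derived features named above, the change in hydrophobicity upon mutation, can be sketched as follows. The abstract does not say which hydrophobicity scale Alz-disc uses; the Kyte-Doolittle scale below is an assumption chosen for illustration, and the mutation string and function names are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index (assumed scale; not confirmed by the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a point mutation such as 'A673V' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a point mutation: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def hydrophobicity_change(code: str) -> float:
    """Hydropathy of the mutant residue minus that of the wild-type residue."""
    wild_type, _, mutant = parse_mutation(code)
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


# Hypothetical example: alanine to valine becomes more hydrophobic (positive change).
print(hydrophobicity_change("A673V"))
```

In a classifier such as the one described, a value like this would be one column of the feature vector, next to conservation and PSSM-derived features.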
Affiliation(s)
- A Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- S Akila Parvathy Dharshini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
30
|
Li J, Li Z, Wang Y, Lin H, Wu B. TLSEA: a tool for lncRNA set enrichment analysis based on multi-source heterogeneous information fusion. Front Genet 2023; 14:1181391. [PMID: 37205123 PMCID: PMC10185877 DOI: 10.3389/fgene.2023.1181391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 04/11/2023] [Indexed: 05/21/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) play an important regulatory role in gene transcription and post-transcriptional modification, and lncRNA regulatory dysfunction leads to a variety of complex human diseases. Hence, it might be beneficial to detect the underlying biological pathways and functional categories of genes that encode lncRNA. This can be carried out by using gene set enrichment analysis, which is a pervasive bioinformatic technique that has been widely used. However, accurately performing gene set enrichment analysis of lncRNAs remains a challenge. Most conventional enrichment analysis methods have not exhaustively included the rich association information among genes, which usually affects the regulatory functions of genes. Here, we developed a novel tool for lncRNA set enrichment analysis (TLSEA) to improve the accuracy of gene functional enrichment analysis; it extracts low-dimensional vectors of lncRNAs from two functional annotation networks with a graph representation learning method. A novel lncRNA-lncRNA association network was constructed by merging lncRNA-related heterogeneous information obtained from multiple sources with the different lncRNA-related similarity networks. In addition, the random walk with restart method was adopted to effectively expand the lncRNAs submitted by users according to the lncRNA-lncRNA association network of TLSEA. Furthermore, a case study of breast cancer was performed, which demonstrated that TLSEA could detect breast cancer more accurately than conventional tools. TLSEA can be accessed freely at http://www.lirmed.com:5003/tlsea.
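The random walk with restart step mentioned above can be illustrated with a minimal pure-Python sketch. The toy network, the lncRNA names, the edge weights and the restart probability are all hypothetical, and this is not the TLSEA implementation; it only shows how a walk that keeps returning to the user's seed lncRNAs ranks the remaining nodes by proximity.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: dict node -> dict neighbour -> non-negative weight (every node is a key).
    seeds: nodes submitted by the user; the walker restarts uniformly among them.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for source in nodes:
            mass = (1.0 - restart) * scores[source]
            if out_weight[source] == 0:
                # Isolated node: send its probability mass back to the seeds.
                for n in nodes:
                    updated[n] += mass * start[n]
                continue
            for target, weight in adjacency[source].items():
                updated[target] += mass * weight / out_weight[source]
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical chain-shaped association network: lnc_A - lnc_B - lnc_C - lnc_D.
network = {
    "lnc_A": {"lnc_B": 1.0},
    "lnc_B": {"lnc_A": 1.0, "lnc_C": 1.0},
    "lnc_C": {"lnc_B": 1.0, "lnc_D": 1.0},
    "lnc_D": {"lnc_C": 1.0},
}
scores = random_walk_with_restart(network, seeds={"lnc_A"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Expanding a submitted set then amounts to taking the top-ranked non-seed nodes; how many to take is a separate design choice that the abstract does not specify.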
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- *Correspondence: Jianwei Li
- Zhiguang Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Yinfei Wang
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Hongxin Lin
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Baoqin Wu
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
31
|
Elbourne LDH, Wilson-Mortier B, Ren Q, Hassan KA, Tetu SG, Paulsen IT. TransAAP: an automated annotation pipeline for membrane transporter prediction in bacterial genomes. Microb Genom 2023; 9:mgen000927. [PMID: 36748555 PMCID: PMC9973855 DOI: 10.1099/mgen.0.000927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023] Open
Abstract
Membrane transporters are a large group of proteins that span cell membranes and contribute to critical cell processes, including delivery of essential nutrients, ejection of waste products, and assisting the cell in sensing environmental conditions. Obtaining an accurate and specific annotation of the transporter proteins encoded by a micro-organism can provide details of its likely nutritional preferences and environmental niche(s), and identify novel transporters that could be utilized in small molecule production in industrial biotechnology. The Transporter Automated Annotation Pipeline (TransAAP) (http://www.membranetransport.org/transportDB2/TransAAP_login.html) is a fully automated web service for the prediction and annotation of membrane transport proteins in an organism from its genome sequence, by using comparisons with both curated databases such as the TCDB (Transporter Classification Database) and TDB, as well as selected Pfams and TIGRFAMs of transporter families and other methodologies. TransAAP was used to annotate transporter genes in the prokaryotic genomes in the National Center for Biotechnology Information (NCBI) RefSeq; these are presented in the transporter database TransportDB (http://www.membranetransport.org) website, which has a suite of data visualization and analysis tools. Creation and maintenance of a bioinformatic database specific for transporters in all genomic datasets is essential for microbiology research groups and the general research/biotechnology community to obtain a detailed picture of membrane transporter systems in various environments, as well as comprehensive information on specific membrane transport proteins.
Affiliation(s)
- Liam D. H. Elbourne
- School of Natural Sciences, Macquarie University, Sydney, Australia
- Biomolecular Discovery Research Centre, Macquarie University, Sydney, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
- *Correspondence: Liam D. H. Elbourne
- Qinghu Ren
- Memorial Sloan Kettering Cancer Center, New York, USA
- Karl A. Hassan
- School of Environmental and Life Sciences, Newcastle University, Newcastle, Australia
- Sasha G. Tetu
- School of Natural Sciences, Macquarie University, Sydney, Australia
- Biomolecular Discovery Research Centre, Macquarie University, Sydney, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
- Ian T. Paulsen
- School of Natural Sciences, Macquarie University, Sydney, Australia
- Biomolecular Discovery Research Centre, Macquarie University, Sydney, Australia
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
- *Correspondence: Ian T. Paulsen
32
|
Deutsch N, Pajkos M, Erdős G, Dosztányi Z. DisCanVis: Visualizing integrated structural and functional annotations to better understand the effect of cancer mutations located within disordered proteins. Protein Sci 2023; 32:e4522. [PMID: 36452990 PMCID: PMC9793970 DOI: 10.1002/pro.4522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022]
Abstract
Intrinsically disordered proteins (IDPs) play important roles in a wide range of biological processes and have been associated with various diseases, including cancer. In the last few years, cancer genome projects have systematically collected genetic variations underlying multiple cancer types. In parallel, the number and different types of disordered proteins characterized by experimental methods have also significantly increased. Nevertheless, the role of IDPs in various types of cancer is still not well understood. In this work, we present DisCanVis, a novel visualization tool for cancer mutations with a special focus on IDPs. In order to aid the interpretation of observed mutations, genome level information is combined with information about the structural and functional properties of proteins. The web server enables users to inspect individual proteins, collect examples with existing annotations of protein disorder and associated function or to discover currently uncharacterized examples with likely disease relevance. Through a REST API interface and precompiled tables the analysis can be extended to a group of proteins.
Affiliation(s)
- Norbert Deutsch
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Mátyás Pajkos
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Gábor Erdős
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Zsuzsanna Dosztányi
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
33
|
Zeng W, Gautam A, Huson DH. MuLan-Methyl-multiple transformer-based language models for accurate DNA methylation prediction. Gigascience 2022; 12:giad054. [PMID: 37489753 PMCID: PMC10367125 DOI: 10.1093/gigascience/giad054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/19/2023] [Revised: 05/09/2023] [Accepted: 07/18/2023] [Indexed: 07/26/2023] Open
Abstract
Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the "pretrain and fine-tune" paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA methylation status of each type. The 5 models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.
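The abstract states that the 5 fine-tuned models predict the methylation status collectively but does not say how their outputs are combined. One common option is soft voting, sketched below under that assumption: the per-model probabilities are averaged and thresholded. The probabilities, the threshold and the function name are hypothetical.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average per-model probabilities and call the site methylated above the threshold.

    This is an assumed combination rule for illustration, not the documented
    MuLan-Methyl aggregation.
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= threshold


# Hypothetical outputs of five fine-tuned language models for one candidate site.
mean_probability, is_methylated = soft_vote([0.9, 0.8, 0.4, 0.7, 0.6])
print(mean_probability, is_methylated)
```

Averaging lets a confident majority outweigh a single dissenting model, which is one way joint use of several models can improve on any individual one.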
Affiliation(s)
- Wenhuan Zeng
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- Anupam Gautam
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, University of Tübingen, 72076 Tübingen, Germany
- Daniel H Huson
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, University of Tübingen, 72076 Tübingen, Germany
34
|
Wang S, Tang H, Zhao Y, Zuo L. BayeStab: Predicting effects of mutations on protein stability with uncertainty quantification. Protein Sci 2022; 31:e4467. [PMID: 36217239 PMCID: PMC9601791 DOI: 10.1002/pro.4467] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/01/2022] [Revised: 10/06/2022] [Accepted: 10/06/2022] [Indexed: 11/11/2022]
Abstract
Predicting protein thermostability change upon mutation is crucial for understanding diseases and designing therapeutics. However, accurately estimating Gibbs free energy change of the protein remained a challenge. Some methods struggle to generalize on examples with no homology and produce uncalibrated predictions. Here we leverage advances in graph neural networks for protein feature extraction to tackle this structure-property prediction task. Our method, BayeStab, is then tested on four test datasets, including S669, S611, S350, and Myoglobin, showing high generalization and symmetry performance. Meanwhile, we apply concrete dropout enabled Bayesian neural networks to infer plausible models and estimate uncertainty. By decomposing the uncertainty into parts induced by data noise and model, we demonstrate that the probabilistic method allows insights into the inherent noise of the training datasets, which is closely relevant to the upper bound of the task. Finally, the BayeStab web server is created and can be found at: http://www.bayestab.com. The code for this work is available at: https://github.com/HongzhouTang/BayeStab.
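The split of predictive uncertainty into a data-noise part and a model part can be illustrated with a small sketch. It assumes a common formulation for dropout-based Bayesian networks, in which each stochastic forward pass returns a predicted mean and a predicted variance; this is not necessarily the exact procedure used by BayeStab. The function name `decompose_uncertainty` and the example numbers are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def decompose_uncertainty(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Combine T stochastic forward passes into one prediction.

    means[t]     -- predicted value from pass t (e.g. a stability change)
    variances[t] -- predicted noise variance from pass t

    Returns (predictive mean, model uncertainty, data uncertainty).
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("need one variance per mean and at least one pass")
    predictive_mean = fmean(means)
    # Model (epistemic) part: how much the passes disagree with each other.
    model_uncertainty = pvariance(means)
    # Data (aleatoric) part: the average noise level the network predicts.
    data_uncertainty = fmean(variances)
    return predictive_mean, model_uncertainty, data_uncertainty


# Hypothetical outputs of three dropout passes for one mutation.
mean, model_part, data_part = decompose_uncertainty(
    [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
)
```

Under this formulation, the total predictive variance is the sum of the two parts, and a large data part points to noise in the training labels rather than to a lack of model capacity.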
Affiliation(s)
- Shuyu Wang
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Hongzhou Tang
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Yuliang Zhao
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Lei Zuo
  - Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, Michigan, USA
|
35
|
Carbone J, Ghidini A, Romano A, Gentilucci L, Musiani F. PacDOCK: A Web Server for Positional Distance-Based and Interaction-Based Analysis of Docking Results. Molecules 2022; 27:6884. [PMID: 36296477 DOI: 10.3390/molecules27206884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/25/2022] [Revised: 10/06/2022] [Accepted: 10/12/2022] [Indexed: 11/05/2022]
Abstract
Molecular docking is a key method for structure-based drug design used to predict the conformations assumed by small drug-like ligands when bound to their target. However, the evaluation of molecular docking studies can be hampered by the lack of a free and easy to use platform for the complete analysis of results obtained by the principal docking programs. To this aim, we developed PacDOCK, a freely available and user-friendly web server that comprises a collection of tools for positional distance-based and interaction-based analysis of docking results, which can be provided in several file formats. PacDOCK allows a complete analysis of molecular docking results through root mean square deviation (RMSD) calculation, molecular visualization, and cluster analysis of docked poses. The RMSD calculation compares docked structures with a reference structure, also when atoms are randomly labelled, and their conformational and positional differences can be visualised. In addition, it is possible to visualise a ligand into the target binding pocket and investigate the key receptor–ligand interactions. Moreover, PacDOCK enables the clustering of docking results by identifying a restrained number of clusters from many docked poses. We believe that PacDOCK will contribute to facilitating the analysis of docking results to improve the efficiency of computer-aided drug design.
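As an illustration of the RMSD comparison between a docked pose and a reference structure, here is a minimal sketch. It assumes that the atoms of both structures are already listed in the same order and that no superposition is applied, so it does not cover the case of randomly labelled atoms that PacDOCK is said to handle. The function name `rmsd` and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], pose: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally ordered atom lists."""
    if len(reference) != len(pose) or len(reference) == 0:
        raise ValueError("structures must have the same non-zero atom count")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, pose):
        # Squared Euclidean distance between matched atoms.
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(reference))


# Hypothetical ligand with three atoms; the pose is shifted by 1 unit along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose = [(x + 1.0, y, z) for x, y, z in reference]
deviation = rmsd(reference, pose)
```

When atom labels differ between files, an atom-matching step (for example, by element and connectivity) would have to run before this calculation.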
|
36
|
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contributes to the development of data‐driven computational approaches. Here we propose CSM‐peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti‐angiogenic, anti‐bacterial, anti‐cancer, anti‐inflammatory, anti‐viral, cell‐penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests, and consistent performance on cross‐validation. We anticipate CSM‐peptides to be of great value in helping to screen large libraries to identify novel peptides with therapeutic potential and have made it freely available as a user‐friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H M Rodrigues
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Anjali Garg
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- David Keizer
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E V Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
|
37
|
Zhang ZY, Ning L, Ye X, Yang YH, Futamura Y, Sakurai T, Lin H. iLoc-miRNA: extracellular/intracellular miRNA prediction using deep BiLSTM with attention mechanism. Brief Bioinform 2022; 23:6693601. [PMID: 36070864 DOI: 10.1093/bib/bbac395] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 06/29/2022] [Revised: 07/27/2022] [Accepted: 08/13/2022] [Indexed: 11/13/2022] Open
Abstract
The location of microRNAs (miRNAs) in cells determines their regulatory function. Studies have shown that miRNAs are stable in the extracellular environment that mediates cell-to-cell communication and are located in the intracellular region that responds to cellular stress and environmental stimuli. Although in situ detection techniques for miRNAs have made great contributions to the study of the localization and distribution of miRNAs, research on miRNA subcellular localization and its role is still in progress. Recently, some machine learning-based algorithms have been designed for miRNA subcellular location prediction, but their performance is still far from satisfactory. Here, we present a new data partitioning strategy that categorizes functionally similar locations for the precise and instructive prediction of miRNA subcellular location in Homo sapiens. To characterize the localization signals, we adopted one-hot encoding with post padding to represent the whole miRNA sequences, and proposed a deep bidirectional long short-term memory (BiLSTM) network with multi-head self-attention to model them. The algorithm showed high selectivity in distinguishing extracellular miRNAs from intracellular miRNAs. Moreover, a series of motif analyses were performed to explore the mechanism of miRNA subcellular localization. To make the model more convenient to use, a user-friendly web server named iLoc-miRNA was established (http://iLoc-miRNA.lin-group.cn/).
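One-hot encoding with post padding, as mentioned in the abstract, can be sketched as follows. This is an illustrative example, not the authors' code: the alphabet order `ACGU`, the function name `one_hot_post_pad`, the handling of unknown bases, and the chosen `max_len` are all assumptions.

```python
from typing import List

ALPHABET = "ACGU"  # assumed RNA alphabet and column order


def one_hot_post_pad(seq: str, max_len: int) -> List[List[int]]:
    """Encode an RNA sequence as a max_len x 4 matrix.

    Each base becomes a row with a single 1; rows after the end of the
    sequence are all zeros (post padding). Sequences longer than max_len
    are truncated. Unknown symbols (e.g. N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows: List[List[int]] = []
    for base in seq.upper().replace("T", "U")[:max_len]:
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    # Post padding: append zero rows until the matrix has max_len rows.
    rows.extend([0] * len(ALPHABET) for _ in range(max_len - len(rows)))
    return rows


# Hypothetical 8-nucleotide fragment padded to a fixed length of 10.
encoded = one_hot_post_pad("UGAGGUAG", max_len=10)
```

Padding at the end keeps every input the same shape, which is what a recurrent or attention-based network typically requires for batched training.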
Affiliation(s)
- Zhao-Yue Zhang
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Lin Ning
  - School of Healthcare Technology, Chengdu Neusoft University, 611844, Chengdu, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Yu-He Yang
  - Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yasunori Futamura
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
  - Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
38
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction. Int J Mol Sci 2022; 23:8221. [PMID: 35897818 PMCID: PMC9329987 DOI: 10.3390/ijms23158221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 02/04/2023] Open
Abstract
Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundant presence of circRNAs in human genes and their involvement in diverse physiological processes, the functionality of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs has the potential to demystify the influence of circRNAs on protein synthesis, degradation, destination, their association with different diseases, and potential for drug development. To date, wet-lab experimental approaches are being used to detect sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches for the determination of sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physico-chemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, where it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap the combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of second-order, third-order, and higher-order fusion, going all the way up to seventh-order sequence descriptor fusion.
A comprehensive empirical evaluation of Circ-LocNet over a newly developed benchmark dataset using different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable to predict sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptor fusion in combination with tree-based classifiers most accurately predicts sub-cellular localization of circular RNAs. We anticipate this study will act as a rich baseline and push the development of robust computational methodologies for the accurate sub-cellular localization determination of novel circRNAs.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
  - School of Computer Science & Electrical Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
|
39
|
Duobiene S, Ratautas K, Trusovas R, Ragulis P, Šlekas G, Simniškis R, Račiukaitis G. Development of Wireless Sensor Network for Environment Monitoring and Its Implementation Using SSAIL Technology. Sensors (Basel) 2022; 22:5343. [PMID: 35891024 PMCID: PMC9321793 DOI: 10.3390/s22145343] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/04/2022] [Revised: 07/08/2022] [Accepted: 07/15/2022] [Indexed: 05/27/2023]
Abstract
The Internet of Things (IoT) technology and its applications are turning real-world things into smart objects, integrating everything under a common infrastructure to manage performance through a software application and offering upgrades with integrated web servers in a timely manner. Quality of life, the green economy, and pollution management in society require comprehensive environmental monitoring systems with easy-to-use features and maintenance. This research suggests implementing a wireless sensor network with embedded sensor nodes manufactured using the Selective Surface Activation Induced by Laser (SSAIL) technology. Such technology allows the integration of electrical circuits with free-form plastic sensor housing. In this work, a low-cost asynchronous web server for monitoring temperature and humidity sensors connected to the ESP32 Wi-Fi module has been developed. Data from sensor nodes across the facility are collected and displayed in real-time charts on a web server. Multiple web clients on the same network can access the sensor data. The sensor nodes could be powered by harvesting energy from surrounding sources of electromagnetic radiation. This automated and self-powered system monitors environmental and climatic factors, helps with timely action, and benefits sensor design by allowing antenna and RF-circuit formation on various plastics, even on the body of the device itself. It also provides greater flexibility in hardware modification and rapid large-scale deployment.
Affiliation(s)
- Shathya Duobiene
  - Department of Laser Technologies, FTMC—Center for Physical Sciences and Technology, Savanoriu Ave. 231, LT-02300 Vilnius, Lithuania
- Karolis Ratautas
  - Department of Laser Technologies, FTMC—Center for Physical Sciences and Technology, Savanoriu Ave. 231, LT-02300 Vilnius, Lithuania
- Romualdas Trusovas
  - Department of Laser Technologies, FTMC—Center for Physical Sciences and Technology, Savanoriu Ave. 231, LT-02300 Vilnius, Lithuania
- Paulius Ragulis
  - Department of Physical Technologies, FTMC—Center for Physical Sciences and Technology, Sauletekio Al. 3, LT-02300 Vilnius, Lithuania
- Gediminas Šlekas
  - Department of Physical Technologies, FTMC—Center for Physical Sciences and Technology, Sauletekio Al. 3, LT-02300 Vilnius, Lithuania
- Rimantas Simniškis
  - Department of Physical Technologies, FTMC—Center for Physical Sciences and Technology, Sauletekio Al. 3, LT-02300 Vilnius, Lithuania
- Gediminas Račiukaitis
  - Department of Laser Technologies, FTMC—Center for Physical Sciences and Technology, Savanoriu Ave. 231, LT-02300 Vilnius, Lithuania
|
40
|
Abstract
Cells express thousands of macromolecules, and their functioning relies on multiple networks of intermolecular interactions. These interactions can be experimentally determined at different spatial and temporal resolutions. However, physical interfaces are often not delineated directly, especially in high-throughput experiments. A large fraction of protein-protein interactions involves domains and so-called SLiMs (Short Linear Motifs). Often, SLiMs lie in disordered regions or loops. Their small size, limited sequence conservation, and loosely folded nature prevent straightforward detection. SLiMAn (Short Linear Motif Analysis), a new web server, is provided to support thorough analysis of interactomics data. From a list of putative interactants (e.g., output from an interactomics study), SLiMs (from ELM) and SLiM-recognition domains (from Pfam) are extracted, and putative pairings are displayed. Predicted results can be filtered using motif E-values, IUPred2 scores, or BioGRID interaction matches. When structural templates are available, a given SLiM and its recognition domain can be modeled using SCWRL. We illustrate here the use of SLiMAn on distinct examples, including one real-case study. We foresee wide-ranging applications for SLiMAn in the context of the massive analysis of protein-protein interactions. This new web server is made freely available at https://sliman.cbs.cnrs.fr.
Affiliation(s)
- Victor Reys
  - Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier 34090, France
- Gilles Labesse
  - Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier 34090, France
|
41
|
Bertelli C, Gray KL, Woods N, Lim AC, Tilley KE, Winsor GL, Hoad GR, Roudgar A, Spencer A, Peltier J, Warren D, Raphenya AR, McArthur AG, Brinkman FSL. Enabling genomic island prediction and comparison in multiple genomes to investigate bacterial evolution and outbreaks. Microb Genom 2022; 8. [PMID: 35584003 PMCID: PMC9465072 DOI: 10.1099/mgen.0.000818] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/27/2022] Open
Abstract
Outbreaks of virulent and/or drug-resistant bacteria have a significant impact on human health and major economic consequences. Genomic islands (GIs; defined as clusters of genes of probable horizontal origin) are of high interest because they disproportionately encode virulence factors, some antimicrobial-resistance (AMR) genes, and other adaptations of medical or environmental interest. While microbial genome sequencing has become rapid and inexpensive, current computational methods for GI analysis are not amenable to rapid, accurate, user-friendly and scalable comparative analysis of sets of related genomes. To help fill this gap, we have developed IslandCompare, an open-source computational pipeline for GI prediction and comparison across several to hundreds of bacterial genomes. A dynamic and interactive visualization strategy displays a bacterial core-genome phylogeny, with bacterial genomes linearly displayed at the phylogenetic tree leaves. Genomes are overlaid with GI predictions and AMR determinants from the Comprehensive Antibiotic Resistance Database (CARD), and regions of similarity between the genomes are also displayed. GI predictions are performed using Sigi-HMM and IslandPath-DIMOB, the two most precise GI prediction tools based on nucleotide composition biases, as well as a novel BLAST-based consistency step to improve cross-genome prediction consistency. GIs across genomes sharing sequence similarity are grouped into clusters, further aiding comparative analysis and visualization of acquisition and loss of mobile GIs in specific sub-clades. IslandCompare is open-source software that is containerized for local use and is also available via a user-friendly, web-based interface to allow direct use by bioinformaticians, biologists and clinicians (at https://islandcompare.ca).
Affiliation(s)
- Claire Bertelli
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
  - Institute of Microbiology, Lausanne University Hospital and University of Lausanne, 1011 Lausanne, Switzerland
- Kristen L Gray
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Nolan Woods
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Adrian C Lim
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Keith E Tilley
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Geoffrey L Winsor
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Gemma R Hoad
  - Research Computing Group, Simon Fraser University, Burnaby, BC, Canada
- Ata Roudgar
  - Research Computing Group, Simon Fraser University, Burnaby, BC, Canada
- Adam Spencer
  - Research Computing Group, Simon Fraser University, Burnaby, BC, Canada
- James Peltier
  - Research Computing Group, Simon Fraser University, Burnaby, BC, Canada
- Derek Warren
  - Research Computing Group, Simon Fraser University, Burnaby, BC, Canada
- Amogelang R Raphenya
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, ON, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Andrew G McArthur
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, ON, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
|
42
|
Dhall A, Patiyal S, Raghava GPS. HLAncPred: a method for predicting promiscuous non-classical HLA binding sites. Brief Bioinform 2022; 23:6587168. [PMID: 35580839 DOI: 10.1093/bib/bbac192] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/08/2021] [Revised: 03/23/2022] [Accepted: 04/27/2022] [Indexed: 12/25/2022] Open
Abstract
Human leukocyte antigens (HLA) regulate various innate and adaptive immune responses and play a crucial immunomodulatory role. Recent studies revealed that immunotherapies based on non-classical HLA (HLA-E and HLA-G) have many advantages over traditional HLA-based immunotherapy, particularly against cancer and COVID-19 infection. In the last two decades, several methods have been developed to predict the binders of classical HLA alleles. In contrast, limited attempts have been made to develop methods for predicting non-classical HLA binding peptides, due to the scarcity of experimental data. To assist the scientific community, we have developed an artificial intelligence-based method for predicting binders of class-Ib HLA alleles. All the models were trained and tested on experimentally validated data obtained from the recent release of IEDB. The machine learning models achieved more than 0.98 AUC for HLA-G alleles on the validation dataset. Similarly, our models achieved the highest AUC of 0.96 and 0.94 on the validation dataset for HLA-E*01:01 and HLA-E*01:03, respectively. We have summarized the models developed in the past for non-classical HLA and validated the performance with the models developed in this study. Moreover, to facilitate the community, we have utilized our tool for predicting the potential non-classical HLA binding peptides in the spike protein of different variants of the virus causing COVID-19, including Omicron (B.1.1.529). One of the major challenges in the field of immunotherapy is to identify the promiscuous binders or antigenic regions that can bind to a large number of HLA alleles. To predict the promiscuous binders for the non-classical HLA alleles, we developed a web server, HLAncPred (https://webs.iiitd.edu.in/raghava/hlancpred), and a standalone package.
Affiliation(s)
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
43
|
Chen Y, He Z, Men Y, Dong G, Hu S, Ying X. MetaLogo: a heterogeneity-aware sequence logo generator and aligner. Brief Bioinform 2022; 23:6519790. [PMID: 35108357 PMCID: PMC8921662 DOI: 10.1093/bib/bbab591] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/12/2021] [Revised: 12/16/2021] [Accepted: 12/22/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence logos are used to visually display conservations and variations in short sequences. They can indicate the fixed patterns or conserved motifs in a batch of DNA or protein sequences. However, most of the popular sequence logo generators are based on the assumption that all the input sequences are from the same homologous group, so the heterogeneity among the sequences is overlooked during the sequence logo making process. Heterogeneous groups of sequences may represent clades of different evolutionary origins, or gene families with different functions. Therefore, it is essential to divide the sequences into different phylogenetic or functional groups to reveal their specific sequence motifs and conservation patterns. To solve these problems, we developed MetaLogo, which can automatically cluster the input sequences after multiple sequence alignment and phylogenetic tree construction, and then output sequence logos for multiple groups and align them in one figure. User-defined grouping is also supported by MetaLogo to allow users to investigate functional motifs from a more fine-grained and dynamic perspective. MetaLogo can highlight both the homologous and nonhomologous sites among sequences. MetaLogo can also be used to annotate the evolutionary positions and gene functions of unknown sequences, together with their local sequence characteristics. We provide users with a public MetaLogo web server (http://metalogo.omicsnet.org), a standalone Python package (https://github.com/labomics/MetaLogo), and also a built-in web server available for local deployment. Using MetaLogo, users can draw informative, customized and publishable sequence logos without any programming experience to present and investigate new knowledge on specific sequence sets.
Affiliation(s)
- Yaowen Chen
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Zhen He
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Yahui Men
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Xiaomin Ying
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
|
44
|
Dey S, Prilusky J, Levy ED. QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure. Front Mol Biosci 2022; 8:787510. [PMID: 35071324 PMCID: PMC8769216 DOI: 10.3389/fmolb.2021.787510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022] Open
Abstract
The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a webserver (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. The web server also generates alternative QSs with PISA and processes them the same way as the query submitted to widen the predictions. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.
Affiliation(s)
- Sucharita Dey
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Jaime Prilusky
- Department of Life Sciences and Core Facilities, Weizmann Institute of Science, Rehovot, Israel
| | - Emmanuel D. Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| |
|
45
|
Lomize AL, Todd SC, Pogozheva ID. Spatial arrangement of proteins in planar and curved membranes by PPM 3.0. Protein Sci 2022; 31:209-220. [PMID: 34716622 PMCID: PMC8740824 DOI: 10.1002/pro.4219] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 10/12/2021] [Revised: 10/26/2021] [Accepted: 10/28/2021] [Indexed: 01/03/2023]
Abstract
Cellular protrusions, invaginations, and many intracellular organelles have strongly curved membrane regions. Transmembrane and peripheral membrane proteins that induce, sense, or stabilize such regions cannot be properly fitted into a single flat bilayer. To treat such proteins, we developed a new method and a web tool, PPM 3.0, for positioning proteins in curved or planar, single or multiple membranes. This method determines the energetically optimal spatial position, the hydrophobic thickness, and the radius of intrinsic curvature of a membrane-deforming protein structure by arranging it in a single or several sphere-shaped or planar membrane sections. In addition, it can define the lipid-embedded regions of a protein that simultaneously spans several membranes or determine the optimal position of a peptide in a spherical micelle. The PPM 3.0 web server operates with 17 types of biological membranes and 4 types of artificial bilayers. It is publicly available at https://opm.phar.umich.edu/ppm_server3. PPM 3.0 was applied to identify and characterize arrangements in membranes of 128 proteins with a significant intrinsic curvature, such as BAR domains, annexins, Piezo, and MscS mechanosensitive channels, cation-chloride cotransporters, as well as mitochondrial ATP synthases, calcium uniporters, and TOM complexes. These proteins form large complexes that are mainly localized in mitochondria, plasma membranes, and endosomes. Structures of bacterial drug efflux pumps, AcrAB-TolC, MexAB-OprM, and MacAB-TolC, were positioned in both membranes of the bacterial cell envelope, while structures of multimeric gap-junction channels were arranged in two opposed cellular membranes.
Affiliation(s)
- Andrei L. Lomize
- College of Pharmacy, Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, USA
| | - Spencer C. Todd
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, Michigan, USA
| | - Irina D. Pogozheva
- College of Pharmacy, Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, USA
| |
|
46
|
Törönen P, Holm L. PANNZER-A practical tool for protein function prediction. Protein Sci 2022; 31:118-128. [PMID: 34562305 PMCID: PMC8740830 DOI: 10.1002/pro.4193] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 08/03/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 01/03/2023]
Abstract
The facility of next-generation sequencing has led to an explosion of gene catalogs for novel genomes, transcriptomes and metagenomes, which are functionally uncharacterized. Computational inference has emerged as a necessary substitute for first-hand experimental evidence. PANNZER (Protein ANNotation with Z-scoRE) is a high-throughput functional annotation web server that stands out among similar publicly accessible web servers in supporting submission of up to 100,000 protein sequences at once and providing both Gene Ontology (GO) annotations and free text description predictions. Here, we demonstrate the use of PANNZER and discuss future plans and challenges. We present two case studies to illustrate problems related to data quality and method evaluation. Some commonly used evaluation metrics and evaluation datasets promote methods that favor unspecific and broad functional classes over more informative and specific classes. We argue that this can bias the development of automated function prediction methods. The PANNZER web server and source code are available at http://ekhidna2.biocenter.helsinki.fi/sanspanz/.
Affiliation(s)
- Petri Törönen
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
| | - Liisa Holm
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland; Organismal and Evolutionary Biology Research Program, Faculty of Biosciences, University of Helsinki, Helsinki, Finland
| |
|
47
|
Fei Y, Feng J, Wang R, Zhang B, Zhang H, Huang J. PhasiRNAnalyzer: an integrated analyser for plant phased siRNAs. RNA Biol 2021; 18:1622-1629. [PMID: 33541212 PMCID: PMC8594884 DOI: 10.1080/15476286.2021.1879543] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/08/2020] [Revised: 01/15/2021] [Accepted: 01/18/2021] [Indexed: 10/22/2022] Open
Abstract
Phased siRNAs (phasiRNAs) are a class of small interfering RNAs (siRNAs) which play essential roles in plant development and defence. However, only a few phasiRNAs have been extensively studied, because plant biologists find them difficult to identify and characterize. Herein, we describe a comprehensive and multi-functional web server termed PhasiRNAnalyzer, which is able to identify all crucial components in the plant phasiRNA regulatory pathway (phase-initiator→PHAS gene→phasiRNA cluster→target gene). Currently, PhasiRNAnalyzer exhibits the following advantages: I) It is the most comprehensive platform, hosting 170 plant species with 256 genome data, 438 cDNA data and 271 degradome data. II) It can identify all crucial components in the phasiRNA regulatory pathway, and verify the interactions between phasiRNAs and their target genes based on degradome data. III) It can conveniently perform differential expression analysis of phasiRNAs at each PHAS gene locus between different samples. IV) It provides user-friendly interfaces and introduces several improvements, primarily more accurate and efficient analysis of deep sequencing data. In summary, PhasiRNAnalyzer is a comprehensive and systematic phasiRNA analysis server with high sensitivity and efficiency. It can be freely accessed at https://cbi.njau.edu.cn/PPSA/.
Affiliation(s)
- Yuhan Fei
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Jiejie Feng
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Rui Wang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Baoyi Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Hongsheng Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Ji Huang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Nanjing, China
| |
|
48
|
Shi XX, Wang ZZ, Wang YL, Huang GY, Yang JF, Wang F, Hao GF, Yang GF. PTMdyna: exploring the influence of post-translation modifications on protein conformational dynamics. Brief Bioinform 2021; 23:6394992. [PMID: 34643234 DOI: 10.1093/bib/bbab424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/19/2021] [Revised: 09/02/2021] [Accepted: 09/14/2021] [Indexed: 11/14/2022] Open
Abstract
Protein post-translational modifications (PTMs) play vital roles in cellular regulation, modulating functions by driving changes in protein structure and dynamics. Comprehensively exploring the influence of PTMs on conformational dynamics can facilitate the understanding of the related biological function and molecular mechanism. Currently, a series of excellent computational tools have been designed to analyze the time-dependent structural properties of proteins. However, a protocol for exploring the conformational dynamics of post-translationally modified proteins is still lacking. To fill this gap, we present PTMdyna to visually predict the differences in conformational dynamics between unmodified and modified proteins, thus indicating the influence of a specific PTM. PTMdyna exhibits an AUC of 0.884 when tested on 220 protein-protein complex structures. The case of heterochromatin protein 1α complexed with lysine 9-methylated histone H3, which is critical for genomic stability and cell differentiation, was used to demonstrate its applicability. PTMdyna provides a reliable platform to predict the influence of PTMs on protein dynamics, making it easier to interpret PTM functionality at the structure level. The web server is freely available at http://ccbportal.com/PTMdyna.
Affiliation(s)
- Xing-Xing Shi
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Zhi-Zheng Wang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Yu-Liang Wang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Guang-Yi Huang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Jing-Fang Yang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Fan Wang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| | - Ge-Fei Hao
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China.,State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang, Guizhou, P. R. China
| | - Guang-Fu Yang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei, P. R. China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, Hubei, P. R. China
| |
|
49
|
Jia L, Yao W, Jiang Y, Li Y, Wang Z, Li H, Huang F, Li J, Chen T, Zhang H. Development of interactive biological web applications with R/Shiny. Brief Bioinform 2021; 23:6387320. [PMID: 34642739 DOI: 10.1093/bib/bbab415] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Revised: 09/09/2021] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Development of interactive web applications to deposit, visualize and analyze biological datasets is a major subject of bioinformatics. R is a programming language for data science, which is also one of the most popular languages used in biological data analysis and bioinformatics. However, building interactive web applications was a great challenge for R users before the Shiny package was developed by the RStudio company in 2012. By compiling R code into HTML, CSS and JavaScript code, Shiny has made it incredibly easy to build web applications for the large R community in bioinformatics and even for non-programmers. Over 470 biological web applications have been developed with R/Shiny up to now. To further promote the utilization of R/Shiny, we reviewed the development of biological web applications with R/Shiny, including eminent biological web applications built with R/Shiny, basic steps to build an R/Shiny application, commonly used R packages to build the interface and server of R/Shiny applications, deployment of R/Shiny applications in the cloud and online resources for R/Shiny.
Affiliation(s)
- Lihua Jia
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450002, China.,College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Wen Yao
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Yingru Jiang
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Yang Li
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Zhizhan Wang
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Haoran Li
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Fangfang Huang
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Jiaming Li
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Tiantian Chen
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Huiyong Zhang
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| |
|
50
|
Chen Z, Wang M, De Wilde RL, Feng R, Su M, Torres-de la Roche LA, Shi W. A Machine Learning Model to Predict the Triple Negative Breast Cancer Immune Subtype. Front Immunol 2021; 12:749459. [PMID: 34603338 PMCID: PMC8484710 DOI: 10.3389/fimmu.2021.749459] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/29/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
Background: Immune checkpoint blockade (ICB) has been approved for the treatment of triple-negative breast cancer (TNBC), since it significantly improved the progression-free survival (PFS). However, only about 10% of TNBC patients could achieve the complete response (CR) to ICB because of the low response rate and potential adverse reactions to ICB. Methods: Open datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were downloaded to perform an unsupervised clustering analysis to identify the immune subtype according to the expression profiles. The prognosis, enriched pathways, and the ICB indicators were compared between immune subtypes. Afterward, samples from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset were used to validate the correlation of immune subtype with prognosis. Data from patients who received ICB were selected to validate the correlation of the immune subtype with ICB response. Machine learning models were used to build a visual web server to predict the immune subtype of TNBC patients requiring ICB. Results: A total of eight open datasets including 931 TNBC samples were used for the unsupervised clustering. Two novel immune subtypes (referred to as S1 and S2) were identified among TNBC patients. Compared with S2, S1 was associated with higher immune scores, higher levels of immune cells, and a better prognosis for immunotherapy. In the validation dataset, subtype 1 samples had a better prognosis than subtype 2 samples in both overall survival (OS) (p = 0.00036) and relapse-free survival (RFS) (p = 0.0022). Bioinformatics analysis identified 11 hub genes (LCK, IL2RG, CD3G, STAT1, CD247, IL2RB, CD3D, IRF1, OAS2, IRF4, and IFNG) related to the immune subtype. A robust machine learning model based on the random forest algorithm was established from the 11 hub genes, and it performed reasonably well, with an area under the receiver operating characteristic curve (AUC) of 0.76. An open and free web server based on the random forest model, named triple-negative breast cancer immune subtype (TNBCIS), was developed and is available from https://immunotypes.shinyapps.io/TNBCIS/. Conclusion: TNBC open datasets allowed us to stratify samples into distinct immunotherapy response subgroups according to gene expression profiles. Based on two novel subtypes, candidates for ICB with a higher response rate and better prognosis could be selected by using the free visual online web server that we designed.
Affiliation(s)
- Zihao Chen
- Department of Urology, University of Freiburg, Freiburg, Germany
| | - Maoli Wang
- Department of Breast Surgery, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China
| | - Rudy Leon De Wilde
- University Hospital for Gynecology, Pius-Hospital, University Medicine Oldenburg, Oldenburg, Germany
| | - Ruifa Feng
- Breast Center of The Second Affiliated Hospital of Guilin Medical University, Guilin, China
| | - Mingqiang Su
- Department of Urology, Zigong Hospital, Affiliated to Southwest Medical University, Zigong, China
| | | | - Wenjie Shi
- University Hospital for Gynecology, Pius-Hospital, University Medicine Oldenburg, Oldenburg, Germany
| |
|