1
Ruan X, Xia S, Li S, Su Z, Yang J. Hybrid framework for membrane protein type prediction based on the PSSM. Sci Rep 2024; 14:17156. [PMID: 39060345 PMCID: PMC11282086 DOI: 10.1038/s41598-024-68163-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Accepted: 07/22/2024] [Indexed: 07/28/2024]
Abstract
Membrane proteins are considered the major source of drug targets and are indispensable for drug design and disease prevention. However, traditional biomechanical experiments are costly and time-consuming; thus, computational methods for predicting membrane protein types are gaining popularity. The position-specific scoring matrix (PSSM) is an effective way to describe the evolutionary information in protein sequences. In this study, we propose an improved capsule neural network (ICNN) model, based on a capsule neural network, to extract sufficient relevant information from the PSSM. Furthermore, to exploit the complementarity between traditional machine learning and deep learning, we propose a hybrid framework that combines both approaches to predict protein types. This framework trains 41 baseline models based on the PSSM. The optimal feature subsets, selected after traversal, are fused using a two-level decision-level feature fusion approach. Three combination strategies are then compared within an ensemble learning framework. The experimental results demonstrate that, relying solely on PSSM input, the proposed method not only surpasses the best existing methods by 1.52%, 2.26%, and 2.67% on Dataset1, Dataset2, and Dataset3, respectively, but also exhibits superior generalizability. The code and datasets can be freely downloaded from https://github.com/ruanxiaoli/membrane-protein-types.
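The abstract describes fusing the outputs of many PSSM-based baseline models at the decision level and comparing three combination strategies. The paper's exact strategies are not given here, so the following is only a minimal sketch of three generic decision-level combiners (soft voting, hard voting, weighted averaging), assuming each baseline model emits a class-probability vector. The function names and any weights are hypothetical.

```python
"""Generic decision-level fusion of classifier outputs (illustrative sketch).

Assumption: every baseline model returns a probability vector over the same
ordered set of classes. These three combiners are textbook examples, not
necessarily the strategies used in the cited paper.
"""
from collections import Counter
from typing import Sequence

Probs = Sequence[float]


def soft_vote(model_probs: Sequence[Probs]) -> int:
    """Average the probability vectors and return the arg-max class index."""
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / len(model_probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(model_probs: Sequence[Probs]) -> int:
    """Each model votes for its own arg-max class; ties go to the lowest index."""
    votes = Counter(max(range(len(p)), key=lambda c: p[c]) for p in model_probs)
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


def weighted_vote(model_probs: Sequence[Probs], weights: Sequence[float]) -> int:
    """Weighted average of probability vectors (hypothetical weights, e.g. validation accuracy)."""
    n_classes = len(model_probs[0])
    total = sum(weights)
    avg = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])
```

For example, with probability vectors [0.9, 0.05, 0.05], [0.4, 0.6, 0.0], and [0.45, 0.55, 0.0], `soft_vote` returns class 0 because one very confident model dominates the average, whereas `hard_vote` returns class 1 because two of the three models prefer it. That disagreement is why comparing several combination rules can matter.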
Affiliation(s)
- Xiaoli Ruan
- State Key Laboratory of Public Big Data, Guizhou University, Guizhou, 550000, Guizhou, China.
- Sina Xia
- State Key Laboratory of Public Big Data, Guizhou University, Guizhou, 550000, Guizhou, China
- Shaobo Li
- State Key Laboratory of Public Big Data, Guizhou University, Guizhou, 550000, Guizhou, China
- Zhidong Su
- Department of Electrical and Computer Engineering, University of Oklahoma State, Stillwater, 74078, USA
- Jing Yang
- State Key Laboratory of Public Big Data, Guizhou University, Guizhou, 550000, Guizhou, China
2
Sun A, Li H, Dong G, Zhao Y, Zhang D. DBPboost: A method of classification of DNA-binding proteins based on improved differential evolution algorithm and feature extraction. Methods 2024; 223:56-64. [PMID: 38237792 DOI: 10.1016/j.ymeth.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/29/2023] [Accepted: 01/13/2024] [Indexed: 02/01/2024]
Abstract
DNA-binding proteins are a class of proteins that interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression and maintaining chromosome structure and stability. DNA-binding proteins play a crucial role in cellular and molecular biology, as they are essential for maintaining normal cellular physiological functions and for adapting to environmental changes. The prediction of DNA-binding proteins has been a hot topic in bioinformatics. The key to accurately classifying DNA-binding proteins is to find suitable feature sources and to exploit the information they contain. Although many models for predicting DNA-binding proteins already exist, there is still room for improvement in mining feature-source information and in calculation methods. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The innovations of this study are threefold: the use of eight feature extraction methods; an improved feature selection step, in which a subset of features is selected first and feature selection is performed again after feature fusion; and an optimized differential evolution algorithm for feature fusion, which improves fusion performance. The experimental results show that the model achieves a prediction accuracy of 89.32% and a sensitivity of 89.01% on the UniSwiss dataset, outperforming most existing models.
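DBPboost uses an improved differential evolution (DE) algorithm in feature fusion. The improvements themselves are not described in this abstract, so what follows is only a sketch of the classic DE/rand/1/bin scheme (mutation, binomial crossover, greedy selection) in plain Python, for orientation. Treating the search vector as hypothetical per-feature-group fusion weights scored by a validation loss is an assumption, not the authors' setup.

```python
"""Classic DE/rand/1/bin minimizer (sketch; NOT DBPboost's improved variant).

Hypothetical use: search for fusion weights that minimize a validation loss.
Here the objective is any callable that maps a list of floats to a float.
"""
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    f: float = 0.8,   # differential weight F (mutation scale)
    cr: float = 0.9,  # crossover probability CR
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside a box; returns (best vector, best score)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation v = x_a + F * (x_b - x_c), clipped back into the box.
            mutant = [
                min(max(pop[a][k] + f * (pop[b][k] - pop[c][k]), bounds[k][0]), bounds[k][1])
                for k in range(dim)
            ]
            # Binomial crossover; index j_rand always takes the mutant gene.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[k] if (k == j_rand or rng.random() < cr) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection, updated in place: trial replaces target if no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda idx: scores[idx])
    return pop[best], scores[best]
```

In a fusion setting, `objective` might be one minus the cross-validated accuracy of a classifier trained on the weighted feature groups. That wiring is deliberately left out, since the abstract does not describe it.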
Affiliation(s)
- Ailun Sun
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Hongfei Li
- College of Life Science, Northeast Forestry University, Harbin 150040, China
- Guanghui Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Yuming Zhao
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Dandan Zhang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China.
3
Xue H, Sun Y, Chen J, Tian H, Liu Z, Shen M, Liu L. CAT-CBAM-Net: An Automatic Scoring Method for Sow Body Condition Based on CNN and Transformer. SENSORS (BASEL, SWITZERLAND) 2023; 23:7919. [PMID: 37765975 PMCID: PMC10535612 DOI: 10.3390/s23187919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 09/02/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023]
Abstract
Sow body condition scoring has been confirmed as a vital procedure in sow management. A timely and accurate assessment of a sow's body condition helps determine its nutritional supply and is critical for enhancing sow reproductive performance. Manual body condition scoring is widely used on large-scale sow farms, but it is time-consuming and labor-intensive. To address this problem, a dual neural network-based automatic scoring method for sow body condition was developed in this study. The method aims to improve the capture of both local features and global information in sow images by combining CNN and transformer networks. It also introduces a CBAM (convolutional block attention module) to help the network focus on crucial feature channels while suppressing irrelevant ones. To tackle class imbalance and mislabeling in the body condition data, the original loss function was replaced with an optimized focal loss. In model testing, sow body condition classification achieved an average precision of 91.06%, an average recall of 91.58%, and an average F1 score of 91.31%. Comprehensive comparative experiments suggested that the proposed method yielded the best performance on this dataset. The method developed in this study can score sow body condition automatically and shows promise for broad application.
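The abstract replaces the original loss with an "optimized" focal loss to cope with class imbalance and label noise, but the optimization is not specified. The sketch below implements only the standard multi-class focal loss, FL(p_t) = -α_t (1 - p_t)^γ log p_t, in pure Python. The focusing parameter γ down-weights well-classified examples, and the optional per-class α (a hypothetical weighting here) rebalances classes. With γ = 0 and no α, it reduces to ordinary cross-entropy.

```python
"""Standard multi-class focal loss (sketch; not the paper's optimized variant)."""
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Focal loss for one example; `alpha` is an optional per-class weight list."""
    p_t = softmax(logits)[target]              # probability of the true class
    alpha_t = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of confidently correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Average focal loss over a batch."""
    losses = [focal_loss(z, t, gamma, alpha) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

For a confidently correct example such as logits [4.0, 0.0, 0.0] with target 0, γ = 2 shrinks the loss to well under 1% of the cross-entropy value. A poorly classified example keeps most of its loss, which is how the focal loss shifts training toward hard and minority-class examples.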
Affiliation(s)
- Hongxiang Xue
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China; (H.X.); (Y.S.); (J.C.); (Z.L.)
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- Yuwen Sun
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China; (H.X.); (Y.S.); (J.C.); (Z.L.)
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- Jinxin Chen
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China; (H.X.); (Y.S.); (J.C.); (Z.L.)
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- Haonan Tian
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
- Zihao Liu
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China; (H.X.); (Y.S.); (J.C.); (Z.L.)
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- Mingxia Shen
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
- Longshen Liu
- Key Laboratory of Breeding Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China; (H.T.); (M.S.)
- College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
4
Artificial intelligence-based HDX (AI-HDX) prediction reveals fundamental characteristics to protein dynamics: Mechanisms on SARS-CoV-2 immune escape. iScience 2023; 26:106282. [PMID: 36910327 PMCID: PMC9968663 DOI: 10.1016/j.isci.2023.106282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 01/10/2023] [Accepted: 02/23/2023] [Indexed: 03/03/2023]
Abstract
Three-dimensional structure and dynamics are essential for protein function. Advances in hydrogen-deuterium exchange (HDX) techniques enable probing of protein dynamics under physiologically relevant conditions. HDX coupled with mass spectrometry (HDX-MS) has been broadly applied in the pharmaceutical industry. However, it is challenging to obtain dynamics information at single-amino-acid resolution, and performing the experiments and processing the data is time-consuming. Here, we present the first deep learning model, artificial intelligence-based HDX (AI-HDX), that predicts intrinsic protein dynamics from the protein sequence. It uncovers protein structural dynamics by combining deep learning, experimental HDX, sequence alignment, and protein structure prediction. AI-HDX can be broadly applied to drug discovery, protein engineering, and biomedical studies. As a demonstration, we elucidated receptor-binding domain structural dynamics as a potential mechanism of anti-severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibody efficacy and immune escape. AI-HDX differs fundamentally from current AI tools for protein analysis and may transform protein design for various applications.
5
TMEM244 Is a Long Non-Coding RNA Necessary for CTCL Cell Growth. Int J Mol Sci 2023; 24:ijms24043531. [PMID: 36834942 PMCID: PMC9963807 DOI: 10.3390/ijms24043531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
Transmembrane protein 244 (TMEM244) was annotated as a member of the TMEM family, whose members are components of cell membranes and are involved in many cellular processes. To date, expression of the TMEM244 protein has not been experimentally confirmed, and its function has not been clarified. Recently, expression of the TMEM244 gene was recognized as a diagnostic marker for Sézary syndrome, a rare cutaneous T-cell lymphoma (CTCL). In this study, we aimed to determine the role of the TMEM244 gene in CTCL cells. Two CTCL cell lines were transfected with shRNAs targeting the TMEM244 transcript. The phenotypic effect of TMEM244 knockdown was validated using green fluorescent protein (GFP) growth competition assays and Annexin V/7-AAD staining. Western blot analysis was performed to identify the TMEM244 protein. Our results indicate that TMEM244 is not a protein-coding gene but a long non-coding RNA (lncRNA) that is necessary for the growth of CTCL cells.
6
Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha MM, Zhang Y. Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications. Comput Struct Biotechnol J 2023; 21:1205-1226. [PMID: 36817959 PMCID: PMC9932300 DOI: 10.1016/j.csbj.2023.01.036] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/16/2022] [Revised: 01/16/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Abstract
Membrane proteins mediate a wide spectrum of biological processes, such as signal transduction and cell communication. Due to the arduous and costly nature inherent to the experimental process, membrane proteins have long been devoid of well-resolved atomic-level tertiary structures and, consequently, the understanding of their functional roles underlying a multitude of life activities has been hampered. Currently, computational tools dedicated to furthering the structure-function understanding are primarily focused on utilizing intelligent algorithms to address a variety of site-wise prediction problems (e.g., topology and interaction sites), but are scattered across different computing sources. Moreover, the recent advent of deep learning techniques has immensely expedited the development of computational tools for membrane protein-related prediction problems. Given the growing number of applications optimized particularly by manifold deep neural networks, we herein provide a review on the current status of computational strategies mainly in membrane protein type classification, topology identification, interaction site detection, and pathogenic effect prediction. Meanwhile, we provide an overview of how the entire prediction process proceeds, including database collection, data pre-processing, feature extraction, and method selection. This review is expected to be useful for developing more extendable computational tools specific to membrane proteins.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Headington, Oxford OX3 7LD, UK
- Arulsamy Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Jacklyn Liu
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Kai Hu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India (corresponding author)
- Yuan Zhang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China (corresponding author)
7
Supermolecules as a quality markers of herbal medicinal products. Heliyon 2022; 8:e12497. [PMID: 36568034 PMCID: PMC9767884 DOI: 10.1016/j.heliyon.2022.e12497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/28/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Herbal medicines have contributed greatly to human health worldwide for thousands of years. In particular, traditional Chinese medicine plays an essential role in the prevention and treatment of COVID-19. With the exponentially increasing use of, and global attention to, herbal medicinal products (HMPs), their efficacy and safety have become major public concerns in many countries. In general, the quantification and qualification of quality markers (Q-markers) are the most common way to address this issue. Over the last few decades, small molecules, including flavonoids, terpenes, phenylpropanoids, alkaloids, phenols, and glycosides, have been extensively investigated as Q-markers for HMP quality control. With the development of biotechnology in the last decade, scientists have begun to explore HMP macromolecules, including polysaccharides and DNA, as Q-markers. In recent years, supermolecules with stronger biological activities have been found in HMPs. In this review, we summarize and discuss the current Q-markers for HMP quality control; in particular, we discuss the possibility of using supermolecules as Q-markers on the basis of their structure and activity.
8
Pandey M, Anoosha P, Yesudhas D, Gromiha MM. Identification of potential driver mutations in glioblastoma using machine learning. Brief Bioinform 2022; 23:6764546. [PMID: 36266243 DOI: 10.1093/bib/bbac451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 12/14/2022]
Abstract
Glioblastoma is a fast-growing, aggressive tumor of the brain and spinal cord. Mutations of amino acid residues in target proteins involved in glioblastoma alter protein structure and function and may lead to disease. In this study, we collected a set of 9386 disease-causing (driver) mutations, based on recurrence in patient samples and experimental annotation as pathogenic, and 8728 neutral (passenger) mutations. We observed that Arg is highly preferred at the mutant sites of drivers, whereas Met and Ile show preferences in passengers. Inspecting neighboring residues at the mutant sites revealed that the motifs YP, CP, and GRH are preferred in drivers, whereas SI, IQ, and TVI are dominant in neutral mutations. In addition, we computed other sequence-based features, such as conservation scores, position-specific scoring matrices (PSSMs), and physicochemical properties, and developed a machine learning-based method, GBMDriver (GlioBlastoma Multiforme Drivers), to distinguish between driver and passenger mutations. Our method achieved an accuracy of 73.59% and an AUC of 0.82 in 10-fold cross-validation, and 81.99% and 0.87, respectively, on a blind set of 1809 mutants. The tool is available at https://web.iitm.ac.in/bioinfo2/GBMDriver/index.html. We envisage that the present method will help prioritize driver mutations in glioblastoma and assist in identifying therapeutic targets.
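The accuracy and AUC figures quoted for GBMDriver are standard metrics. As an illustration only (not the tool's code), the sketch below computes two things: accuracy at a fixed score threshold, and ROC AUC via its rank interpretation. That interpretation is the probability that a randomly chosen driver (label 1) scores higher than a randomly chosen passenger (label 0), with ties counted as one half. The 0.5 threshold is an assumption.

```python
"""Accuracy and ROC AUC for a binary driver/passenger classifier (sketch)."""
from typing import Sequence


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of correct calls when score >= threshold is predicted as driver (1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score(positive) > score(negative)); ties contribute 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_pos × n_neg). That is acceptable for a blind set of about 1,800 mutants, but a sort-based computation would normally be used for large datasets. In k-fold cross-validation, these functions would be applied to each held-out fold and the results averaged.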
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- P Anoosha
- Division of Medical Oncology, Department of Internal Medicine, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, USA
- Dhanusha Yesudhas
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
9
Jasni N, Saidin S, Kin WW, Arifin N, Othman N. Entamoeba histolytica: Membrane and Non-Membrane Protein Structure, Function, Immune Response Interaction, and Vaccine Development. MEMBRANES 2022; 12:1079. [PMID: 36363634 PMCID: PMC9695907 DOI: 10.3390/membranes12111079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2022] [Revised: 10/26/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]
Abstract
Entamoeba histolytica is a protozoan parasite that causes amoebiasis. This parasite has caused widespread infection in India, Africa, Mexico, and Central and South America, and it results in 100,000 deaths yearly. An immune response is the body's mechanism for eliminating and fighting substances it recognizes as harmful or foreign. E. histolytica biological membranes are considered foreign and immunogenic to the human body and thereby initiate the body's immune responses. Understanding the immune response and antigen interaction is essential for vaccine development. Thus, this review aims to identify and understand the protein structure and function of the biological membrane and its interaction with the immune response, which could contribute to vaccine development. Furthermore, current trends in vaccine development studies to combat amoebiasis are also reviewed.
Affiliation(s)
- Nurhana Jasni
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Gelugor 11800, Malaysia
- Syazwan Saidin
- Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Tanjung Malim 35900, Malaysia
- Wong Weng Kin
- School of Health Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Norsyahida Arifin
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Gelugor 11800, Malaysia
- Nurulhasanah Othman
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Gelugor 11800, Malaysia
10
Pauwels J, Fijałkowska D, Eyckerman S, Gevaert K. Mass spectrometry and the cellular surfaceome. MASS SPECTROMETRY REVIEWS 2022; 41:804-841. [PMID: 33655572 DOI: 10.1002/mas.21690] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/22/2020] [Revised: 02/05/2021] [Accepted: 02/09/2021] [Indexed: 06/12/2023]
Abstract
The collection of exposed plasma membrane proteins, collectively termed the surfaceome, is involved in multiple vital cellular processes, such as the communication of cells with their surroundings and the regulation of transport across the lipid bilayer. The surfaceome also plays key roles in the immune system by recognizing and presenting antigens, with its possible malfunctioning linked to disease. Surface proteins have long been explored as potential cell markers, disease biomarkers, and therapeutic drug targets. Despite its importance, a detailed study of the surfaceome continues to pose major challenges for mass spectrometry-driven proteomics due to the inherent biophysical characteristics of surface proteins. Their inefficient extraction from hydrophobic membranes to an aqueous medium and their lower abundance compared to intracellular proteins hamper the analysis of surface proteins, which are therefore usually underrepresented in proteomic datasets. To tackle such problems, several innovative analytical methodologies have been developed. This review aims at providing an extensive overview of the different methods for surfaceome analysis, with respective considerations for downstream mass spectrometry-based proteomics.
Affiliation(s)
- Jarne Pauwels
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Eyckerman
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Kris Gevaert
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
11
Wandera KG, Alkhnbashi OS, Bassett HVI, Mitrofanov A, Hauns S, Migur A, Backofen R, Beisel CL. Anti-CRISPR prediction using deep learning reveals an inhibitor of Cas13b nucleases. Mol Cell 2022; 82:2714-2726.e4. [PMID: 35649413 DOI: 10.1016/j.molcel.2022.05.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/22/2021] [Revised: 03/25/2022] [Accepted: 05/03/2022] [Indexed: 11/28/2022]
Abstract
As part of the ongoing bacterial-phage arms race, CRISPR-Cas systems in bacteria clear invading phages whereas anti-CRISPR proteins (Acrs) in phages inhibit CRISPR defenses. Known Acrs have proven extremely diverse, complicating their identification. Here, we report a deep learning algorithm for Acr identification that revealed an Acr against type VI-B CRISPR-Cas systems. The algorithm predicted numerous putative Acrs spanning almost all CRISPR-Cas types and subtypes, including over 7,000 putative type IV and VI Acrs not predicted by other algorithms. By performing a cell-free screen for Acr hits against type VI-B systems, we identified a potent inhibitor of Cas13b nucleases we named AcrVIB1. AcrVIB1 blocks Cas13b-mediated defense against a targeted plasmid and lytic phage, and its inhibitory function principally occurs upstream of ribonucleoprotein complex formation. Overall, our work helps expand the known Acr universe, aiding our understanding of the bacteria-phage arms race and the use of Acrs to control CRISPR technologies.
Affiliation(s)
- Katharina G Wandera
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Omer S Alkhnbashi
- Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Harris V I Bassett
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Sven Hauns
- Universität Freiburg, 79098 Freiburg, Germany
- Anzhela Migur
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Rolf Backofen
- Universität Freiburg, 79098 Freiburg, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, 79098 Freiburg, Germany.
- Chase L Beisel
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany; Medical Faculty, University of Würzburg, 97080 Würzburg, Germany.
12
Rivera AM, Wilburn DB, Swanson WJ. Domain Expansion and Functional Diversification in Vertebrate Reproductive Proteins. Mol Biol Evol 2022; 39:msac105. [PMID: 35587583 PMCID: PMC9154058 DOI: 10.1093/molbev/msac105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The rapid evolution of fertilization proteins has generated remarkable diversity in molecular structure and function. Glycoproteins of vertebrate egg coats contain multiple zona pellucida (ZP)-N domains (1-6 copies) that facilitate multiple reproductive functions, including species-specific sperm recognition. In this report, we integrate phylogenetics and machine learning to investigate how ZP-N domains diversify in structure and function. The most C-terminal ZP-N domain of each paralog is associated with another domain type (ZP-C), and the two together form a "ZP module." All modular ZP-N domains are phylogenetically distinct from nonmodular, or free, ZP-N domains. Machine learning-based classification identifies eight residues that form a stabilizing network in modular ZP-N domains that is absent in free domains. Positive selection is identified in some free ZP-N domains. Our findings support the view that strong purifying selection has conserved an essential structural core in modular ZP-N domains, with the relaxation of this structural constraint allowing free N-terminal domains to diversify functionally.
Affiliation(s)
- Alberto M. Rivera
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Damien B. Wilburn
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Willie J. Swanson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
13
Xia W, Zheng L, Fang J, Li F, Zhou Y, Zeng Z, Zhang B, Li Z, Li H, Zhu F. PFmulDL: a novel strategy enabling multi-class and multi-label protein function annotation by integrating diverse deep learning methods. Comput Biol Med 2022; 145:105465. [PMID: 35366467 DOI: 10.1016/j.compbiomed.2022.105465] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/04/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 02/06/2023]
Abstract
Bioinformatic annotation of protein function is essential but extremely complex, and developing effective prediction methods requires extensive effort. However, existing methods tend to amplify the representativeness of families with large numbers of proteins by misclassifying proteins in families with small numbers of proteins. That is, the ability of existing methods to annotate proteins in 'rare classes' remains limited. Herein, a new protein function annotation strategy integrating multiple deep learning methods, PFmulDL, was constructed. First, a recurrent neural network was integrated, for the first time, with a convolutional neural network to facilitate function annotation. Second, a transfer learning method was introduced into model construction to further improve prediction performance. Third, based on the latest Gene Ontology data, the newly constructed model could annotate the largest number of protein families compared with existing methods. Finally, the new model was found to significantly improve prediction performance for the 'rare classes' without sacrificing that for the 'major classes'. Given the emerging need to improve prediction performance for proteins in 'rare classes', this new strategy would become an essential complement to existing methods for protein function prediction. All models and source code are freely available at https://github.com/idrblab/PFmulDL.
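The point about 'rare classes' can be made concrete. Overall accuracy is dominated by large classes, whereas a macro-averaged metric weights each class equally. What follows is a minimal single-label sketch. PFmulDL itself is multi-class and multi-label, and this sketch does not cover that setting. The class names in the usage note are hypothetical.

```python
"""Overall accuracy vs. macro-averaged recall (single-label sketch)."""
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Recall for every class that occurs in y_true."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


def macro_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class recalls: every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)
```

With eight 'major' and two 'rare' examples and a model that always predicts 'major', accuracy is 0.8 but macro-averaged recall is only 0.5, because recall on the rare class is 0.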
Affiliation(s)
- Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Jiebin Fang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Ying Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
| |
14
Nallasamy V, Seshiah M. Protein Structure Prediction Using Quantile Dragonfly and Structural Class-Based Deep Learning. INT J PATTERN RECOGN 2022. [DOI: 10.1142/s021800142250015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Predicting the three-dimensional structure of a protein has received increasing attention in computational molecular biology. Most recent research has focused on exploring the search space; however, as data grow in size and complexity, protein structure identification and prediction remain at a preliminary stage. This work explores the search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps: protein structure identification and protein structure prediction. In the first step, protein structure identification is performed with a QRD optimization model to identify protein structure with minimum error. Identification is performed first because the raw database contains sequence information but no structural information, so an optimization model is designed to obtain the structural information from the database. Because protein structure gives much more insight than sequence, the actual protein structure is then predicted computationally from the sequence. The second step performs this prediction via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained using the structural class maximum correlation coefficient. Finally, the proposed method is tested on sets containing different numbers of unique proteins and compared with state-of-the-art methods. The results show the potential of the proposed method in terms of error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient and F-measure. They also show that the proposed QRD-SCHDL method attains comparable results and outperforms existing methods in certain cases, indicating the efficiency of the proposed approach.
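The QRD-SCHDL abstract lists the Kappa coefficient among its metrics without naming the variant. Assuming it is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance, a small Python sketch follows; `cohens_kappa` is a hypothetical helper, not code from the paper.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # p_o: fraction of positions where prediction and label agree
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # p_e: agreement expected if predictions were independent of labels
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: one class in both labels and predictions
    return (p_o - p_e) / (1 - p_e)
```

Unlike plain accuracy, kappa is 0 for a predictor that agrees with the labels only as often as chance would predict.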
Affiliation(s)
- Varanavasi Nallasamy
- Department of Computer Science, Periyar University, Salem-636011, Tamil Nadu, India
- Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram-637401, Namakkal, Tamil Nadu, India
15
Chatzigoulas A, Cournia Z. Predicting protein–membrane interfaces of peripheral membrane proteins using ensemble machine learning. Brief Bioinform 2022; 23:6527274. [PMID: 35152294 PMCID: PMC8921665 DOI: 10.1093/bib/bbab518] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/23/2021] [Revised: 10/23/2021] [Accepted: 11/12/2021] [Indexed: 12/13/2022]
Abstract
Abnormal protein–membrane attachment is involved in deregulated cellular pathways and in disease. Therefore, modulating protein–membrane interactions represents a promising new therapeutic strategy for peripheral membrane proteins that have so far been considered undruggable. A major obstacle to this drug design strategy is that the membrane-binding domains of peripheral membrane proteins are usually unknown. Fast and efficient algorithms for predicting the protein–membrane interface would shed light on the accessibility of these interfaces to drug-like molecules. Herein, we describe an ensemble machine learning methodology and algorithm for predicting membrane-penetrating amino acids. We use experimental data available in the literature to train 21 machine learning classifiers and meta-classifiers. Evaluation of the best ensemble classifier yields a macro-averaged F1 score of 0.92 and a Matthews correlation coefficient of 0.84 for correctly predicting membrane-penetrating amino acids on unknown proteins of a validation set. The Python code for predicting protein–membrane interfaces of peripheral membrane proteins is available at https://github.com/zoecournia/DREAMM.
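For reference, the two metrics quoted above can be computed from binary per-residue labels as sketched below, assuming 1 marks a membrane-penetrating residue and 0 a non-penetrating one. This is a generic illustration, not code from the DREAMM repository; the helper name `macro_f1_and_mcc` is hypothetical.

```python
import math


def macro_f1_and_mcc(y_true, y_pred):
    """Macro-averaged F1 and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro average: F1 of each class, weighted equally regardless of class size.
    f1_pos = f1(tp, fp, fn)
    # For the negative class, its true positives are tn, its false positives
    # are fn (predicted 0, actually 1) and its false negatives are fp.
    f1_neg = f1(tn, fn, fp)
    macro_f1 = (f1_pos + f1_neg) / 2

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return macro_f1, mcc
```

Macro averaging weights both classes equally regardless of size, and MCC uses all four confusion-matrix cells, so both are less flattering than plain accuracy when one class dominates.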
Affiliation(s)
- Alexios Chatzigoulas
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece
16
Abstract
Membrane proteins (MPs) play essential roles in numerous cellular processes. Because around 70% of currently marketed drugs target MPs, a detailed understanding of their structure, binding properties, and functional dynamics in a physiologically relevant environment is crucial for understanding this important protein class. Here we summarize the benefits of using lipid nanodiscs for NMR structural investigations and provide a detailed overview of the lipid nanodisc systems currently in use, as well as their applications in solution-state NMR. Despite the increasing use of other structural methods to determine MP structures in lipid nanodiscs, solution NMR has proven to be a versatile tool for probing a wide range of MP features, from the structure determination of small to medium-sized MPs to ligand and partner-protein binding and functionally relevant dynamical signatures in a lipid nanodisc setting. We expand on these topics by discussing recent NMR studies with lipid nanodiscs and outline a key workflow for optimizing the nanodisc incorporation of an MP for subsequent NMR investigations. With this, we hope to provide a comprehensive background that enables an informed assessment of whether lipid nanodiscs are suitable for NMR studies of a particular MP of interest.
Affiliation(s)
- Umut Günsel
- Bavarian NMR Center (BNMRZ) at the Department of Chemistry, Technical University of Munich, Ernst-Otto-Fischer-Strasse 2, 85748 Garching, Germany
- Franz Hagn
- Bavarian NMR Center (BNMRZ) at the Department of Chemistry, Technical University of Munich, Ernst-Otto-Fischer-Strasse 2, 85748 Garching, Germany; Institute of Structural Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
17
iEnhancer-RD: Identification of enhancers and their strength using RKPK features and deep neural networks. Anal Biochem 2021; 630:114318. [PMID: 34364858 DOI: 10.1016/j.ab.2021.114318] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/20/2021] [Revised: 07/02/2021] [Accepted: 07/27/2021] [Indexed: 11/20/2022]
Abstract
Enhancers are regulatory DNA elements involved in gene expression that can increase the transcription rate of genes. However, identifying enhancers with biological experiments is time-consuming and expensive, so more efficient identification methods are urgently needed. In this study, we propose RKPK, a new feature extraction method that combines three feature extraction methods and uses the recursive feature elimination algorithm for feature selection, and we use a deep neural network as the classifier to construct iEnhancer-RD, a computational method for enhancer identification. iEnhancer-RD has a two-layer classification architecture: the first layer (layer I) identifies enhancers from a set of DNA sequences, and the second layer (layer II) divides the identified enhancers into two subgroups, strong and weak enhancers. An independent dataset test indicates that the proposed method is significantly better than most existing methods, achieving accuracies of 78.8% and 70.5% in the two layers, respectively. The iEnhancer-RD architecture is implemented in Python and is available at https://github.com/YangHuan639/iEnhancer-RD.
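The two-layer architecture described above can be sketched as a simple cascade in which only sequences accepted by layer I reach layer II. In this Python sketch, `layer1` and `layer2` are hypothetical stand-ins for trained classifiers that return probabilities, and the 0.5 threshold is an assumption; the RKPK feature extraction and the deep neural network themselves are not reproduced.

```python
from typing import Callable, Sequence

# Hypothetical stand-in type for a trained classifier: it maps a feature
# vector to the probability of the positive class.
Classifier = Callable[[Sequence[float]], float]


def two_layer_predict(x: Sequence[float], layer1: Classifier, layer2: Classifier,
                      threshold: float = 0.5) -> str:
    """Cascade prediction: layer I (enhancer or not), then layer II (strong or weak)."""
    if layer1(x) < threshold:
        return "non-enhancer"
    # Only sequences that layer I accepts as enhancers are passed to layer II.
    return "strong enhancer" if layer2(x) >= threshold else "weak enhancer"
```

Because layer II only ever sees sequences that layer I accepted, its accuracy is reported separately, which matches the per-layer figures in the abstract.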
18
Zhu W, Dong F, Hou B, Kenniard Takudzwa Gwatidzo W, Zhou L, Li G. Segmenting the Semi-Conductive Shielding Layer of Cable Slice Images Using the Convolutional Neural Network. Polymers (Basel) 2020; 12:E2085. [PMID: 32937761 PMCID: PMC7569897 DOI: 10.3390/polym12092085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2020] [Revised: 09/02/2020] [Accepted: 09/08/2020] [Indexed: 11/25/2022]
Abstract
As an important part of aerial insulated cable, the semiconductive shielding layer is made of a typical polymer material and can improve cable transmission; its structural parameters directly affect cable quality. Image processing of the semiconductive layer therefore plays an essential role in measuring these structural parameters. However, images of the semiconductive layer are often disturbed by cutting marks, which seriously affect the measurements. In this paper, a novel method based on a convolutional neural network is proposed for image segmentation. In the proposed strategy, a deep fully convolutional network with skip connections serves as the main framework. Inception structures and residual connections are employed to fuse features extracted from receptive fields of different sizes. Finally, an improved weighted loss function and a refined algorithm are used for pixel classification. Experimental results show that the proposed algorithm outperforms current algorithms.
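The abstract does not say how its weighted loss is "improved". As a point of reference only, the Python sketch below shows a plain class-weighted pixel-wise binary cross-entropy, a common starting point for segmentation losses; the function name `weighted_bce` and the weight parameters are assumptions, not the authors' formulation.

```python
import math


def weighted_bce(probs, targets, pos_weight=1.0, neg_weight=1.0, eps=1e-7):
    """Mean class-weighted binary cross-entropy over a flattened image.

    probs:   predicted foreground probabilities, one per pixel
    targets: ground-truth labels (1 = foreground, 0 = background)
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if t == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1 - p)
    return total / len(probs)
```

Setting `pos_weight` above `neg_weight` increases the penalty for missed foreground pixels, which can help when the structure being segmented occupies a small fraction of the image.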
Affiliation(s)
- Beiping Hou
- School of Automation and Electrical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China; (W.Z.); (F.D.); (W.K.T.G.); (L.Z.); (G.L.)