1
Leyland B, Novichkova E, Dolui AK, Jallet D, Daboussi F, Legeret B, Li Z, Li-Beisson Y, Boussiba S, Khozin-Goldberg I. Acyl-CoA binding protein is required for lipid droplet degradation in the diatom Phaeodactylum tricornutum. Plant Physiol 2024; 194:958-981. [PMID: 37801606 DOI: 10.1093/plphys/kiad525] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Revised: 06/28/2023] [Accepted: 07/15/2023] [Indexed: 10/08/2023]
Abstract
Diatoms (Bacillariophyceae) accumulate neutral storage lipids in lipid droplets during stress conditions, which can be rapidly degraded and recycled when optimal conditions resume. Since nutrient and light availability fluctuate in marine environments, storage lipid turnover is essential for diatom dominance of marine ecosystems. Diatoms have garnered attention for their potential to provide a sustainable source of omega-3 fatty acids. Several independent proteomic studies of lipid droplets isolated from the model oleaginous pennate diatom Phaeodactylum tricornutum have identified a previously uncharacterized protein with an acyl-CoA binding (ACB) domain, Phatrdraft_48778, here referred to as Phaeodactylum tricornutum acyl-CoA binding protein (PtACBP). We report the phenotypic effects of CRISPR-Cas9 targeted genome editing of PtACBP. ptacbp mutants were defective in lipid droplet and triacylglycerol degradation, as well as lipid and eicosapentaenoic acid synthesis, during recovery from nitrogen starvation. Transcription of genes responsible for peroxisomal β-oxidation, triacylglycerol lipolysis, and eicosapentaenoic acid synthesis was inhibited. A lipid-binding assay using a synthetic ACB domain from PtACBP indicated preferential binding specificity toward certain polar lipids. PtACBP fused to eGFP displayed an endomembrane-like pattern, which surrounded the periphery of lipid droplets. PtACBP is likely responsible for intracellular acyl transport, affecting cell division, development, photosynthesis, and stress response. A deeper understanding of the molecular mechanisms governing storage lipid turnover will be crucial for developing diatoms and other microalgae as biotechnological cell factories.
Affiliation(s)
- Ben Leyland
- The Microalgal Biotechnology Laboratory, The French Associates Institute for Agriculture and Biotechnology, Jacob Blaustein Institute for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
- Ekaterina Novichkova
- The Microalgal Biotechnology Laboratory, The French Associates Institute for Agriculture and Biotechnology, Jacob Blaustein Institute for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
- Achintya Kumar Dolui
- The Microalgal Biotechnology Laboratory, The French Associates Institute for Agriculture and Biotechnology, Jacob Blaustein Institute for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
- Denis Jallet
- Toulouse Biotechnology Institute Bio & Chemical Engineering, Institut National de la Recherche Agronomique, Institute National Des Sciences Appliquees, Le Centre national de la recherche scientifique, Toulouse 31077, France
- Fayza Daboussi
- Toulouse Biotechnology Institute Bio & Chemical Engineering, Institut National de la Recherche Agronomique, Institute National Des Sciences Appliquees, Le Centre national de la recherche scientifique, Toulouse 31077, France
- Bertrand Legeret
- Aix-Marseille University, CEA, CNRS, BIAM, Institut de Biosciences et Biotechnologies Aix-Marseille, CEA Cadarache, Saint Paul-Lez-Durance 13108, France
- Zhongze Li
- Aix-Marseille University, CEA, CNRS, BIAM, Institut de Biosciences et Biotechnologies Aix-Marseille, CEA Cadarache, Saint Paul-Lez-Durance 13108, France
- Yonghua Li-Beisson
- Aix-Marseille University, CEA, CNRS, BIAM, Institut de Biosciences et Biotechnologies Aix-Marseille, CEA Cadarache, Saint Paul-Lez-Durance 13108, France
- Sammy Boussiba
- The Microalgal Biotechnology Laboratory, The French Associates Institute for Agriculture and Biotechnology, Jacob Blaustein Institute for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
- Inna Khozin-Goldberg
- The Microalgal Biotechnology Laboratory, The French Associates Institute for Agriculture and Biotechnology, Jacob Blaustein Institute for Desert Research, Ben-Gurion University of the Negev, Sede Boker Campus 84990, Israel
2
Zhang L, Xiao K, Wang X, Kong L. A novel fusion technology utilizing complex network and sequence information for FAD-binding site identification. Anal Biochem 2024; 685:115401. [PMID: 37981176 DOI: 10.1016/j.ab.2023.115401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/08/2023] [Accepted: 11/14/2023] [Indexed: 11/21/2023]
Abstract
Flavin adenine dinucleotide (FAD) binding sites play an increasingly important role as useful targets for inhibiting bacterial infections. To reveal protein topological structural information as a reasonable complement for the identification of FAD-binding sites, we designed a novel fusion technology based on sequence and complex network information. The specially designed feature vectors were combined and fed into CatBoost for model construction. Moreover, because the minority class (positive samples) is more significant for biological research, a random under-sampling technique was applied to address the class imbalance. Compared with previous methods, our methods achieved the best results on two independent test datasets. In particular, the MCC values obtained by FADsite and FADsite_seq were 14.37%-53.37% and 21.81%-60.81% higher than the results of existing methods on Test6, and they showed improvements of 6.03%-21.96% and 19.77%-35.70% on Test4. Meanwhile, statistical tests show that our methods differ significantly from the state-of-the-art methods, and the cross-entropy loss shows that our methods have high certainty. The excellent results demonstrate the effectiveness of using sequence and complex network information in identifying FAD-binding sites. It may be complementary to other biological studies. The data and resource codes are available at https://github.com/Kangxiaoneuq/FADsite.
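The random under-sampling step mentioned in this abstract can be illustrated with a short sketch. It is not taken from the FADsite code: the function name `random_undersample`, the toy residue labels, and the 1:1 target ratio are all assumptions made for illustration, since the abstract does not state the ratio used.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly discard majority-class samples until both classes are equal in size.

    Assumption: binary labels (1 = binding residue, 0 = non-binding) and a 1:1 target ratio.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Identify which class is the minority; keep all of it.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Draw, without replacement, as many majority samples as there are minority samples.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 2 binding residues among 10.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
samples = [f"res{i}" for i in range(len(labels))]
balanced_x, balanced_y = random_undersample(samples, labels)
print(balanced_x, balanced_y)
```

Under-sampling is normally applied to the training set only, so that the test sets keep their natural class imbalance.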
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
- Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Xueting Wang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
3
Hong X, Lv J, Li Z, Xiong Y, Zhang J, Chen HF. Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions. Int J Biol Macromol 2023; 243:125233. [PMID: 37290543 DOI: 10.1016/j.ijbiomac.2023.125233] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/07/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023]
Abstract
Protein phosphorylation, catalyzed by kinases, is an important biochemical process that plays an essential role in multiple cell signaling pathways. Meanwhile, protein-protein interactions (PPI) constitute the signaling pathways. Abnormal phosphorylation status of proteins can regulate protein functions through PPI and evoke severe diseases, such as cancer and Alzheimer's disease. Because experimental evidence is limited and it is costly to experimentally identify novel evidence of phosphorylation regulation of PPI, it is necessary to develop a high-accuracy and user-friendly artificial intelligence method to predict the effect of phosphorylation on PPI. Here, we propose a novel sequence-based machine learning method named PhosPPI, which achieved better identification performance (accuracy and AUC) than the competing predictive methods Betts, HawkDock and FoldX. PhosPPI is now freely available as a web server (https://phosppi.sjtu.edu.cn/). This tool can help users identify functional phosphorylation sites affecting PPI and explore phosphorylation-associated disease mechanisms and drug development.
Affiliation(s)
- Xiaokun Hong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiyang Lv
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhengxin Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200025, China.
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
4
Peng Z, Li Z, Meng Q, Zhao B, Kurgan L. CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information. Brief Bioinform 2023; 24:6858950. [PMID: 36458437 DOI: 10.1093/bib/bbac502] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/14/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
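For readers unfamiliar with the AUC figure quoted in this abstract, the sketch below shows one standard way to compute it: as the fraction of positive/negative pairs in which the positive residue receives the higher score. The function name `roc_auc` and the toy scores are hypothetical and are not drawn from CLIP.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue propensities and binding labels.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
auc = roc_auc(scores, labels)
print(round(auc, 4))
```

The quadratic loop is kept for clarity; a rank-based formulation gives the same value in O(n log n) on large datasets.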
Affiliation(s)
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China; Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China
- Zixia Li
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Qiaozhen Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
5
Yuan T, Werman JM, Yin X, Yang M, Garcia-Diaz M, Sampson NS. Enzymatic β-Oxidation of the Cholesterol Side Chain in Mycobacterium tuberculosis Bifurcates Stereospecifically at Hydration of 3-Oxo-cholest-4,22-dien-24-oyl-CoA. ACS Infect Dis 2021; 7:1739-1751. [PMID: 33826843 PMCID: PMC8204306 DOI: 10.1021/acsinfecdis.1c00069] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
The unique ability of Mycobacterium tuberculosis (Mtb) to utilize host lipids such as cholesterol for survival, persistence, and virulence has made the metabolic pathway of cholesterol an area of great interest for therapeutics development. Herein, we identify and characterize two genes from the Cho-region (genomic locus responsible for cholesterol catabolism) of the Mtb genome, chsH3 (Rv3538) and chsB1 (Rv3502c). Their protein products catalyze two sequential stereospecific hydration and dehydrogenation steps in the β-oxidation of the cholesterol side chain. ChsH3 favors the 22S hydration of 3-oxo-cholest-4,22-dien-24-oyl-CoA in contrast to the previously reported EchA19 (Rv3516), which catalyzes formation of the (22R)-hydroxy-3-oxo-cholest-4-en-24-oyl-CoA from the same enoyl-CoA substrate. ChsB1 is stereospecific and catalyzes dehydrogenation of the ChsH3 product but not the EchA19 product. The X-ray crystallographic structure of the ChsB1 apo-protein was determined at a resolution of 2.03 Å, and the holo-enzyme with bound NAD+ cofactor was determined at a resolution of 2.21 Å. The homodimeric structure is representative of a classical NAD+-utilizing short-chain type alcohol dehydrogenase/reductase, including a Rossmann-fold motif, but exhibits a unique substrate binding site architecture that is of greater length and width than its homologous counterparts, likely to accommodate the bulky steroid substrate. Intriguingly, Mtb utilizes hydratases from the MaoC-like family in sterol side-chain catabolism in contrast to fatty acid β-oxidation in other species that utilize the evolutionarily distinct crotonase family of hydratases.
Affiliation(s)
- Tianao Yuan
- Department of Chemistry, Stony Brook University, Stony Brook, New York 11794-3400, United States
- Joshua M. Werman
- Department of Chemistry, Stony Brook University, Stony Brook, New York 11794-3400, United States
- Xingyu Yin
- Biochemistry and Structural Biology Graduate Program, Stony Brook University, Stony Brook, New York 11794-5215, United States
- Meng Yang
- Department of Chemistry, Stony Brook University, Stony Brook, New York 11794-3400, United States
- Miguel Garcia-Diaz
- Department of Pharmacological Sciences, Stony Brook University, Stony Brook, New York 11794-8651, United States
- Nicole S. Sampson
- Department of Chemistry, Stony Brook University, Stony Brook, New York 11794-3400, United States
6
Su H, Peng Z, Yang J. Recognition of small molecule-RNA binding sites using RNA sequence and structure. Bioinformatics 2021; 37:36-42. [PMID: 33416863 PMCID: PMC8034527 DOI: 10.1093/bioinformatics/btaa1092] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/12/2019] [Revised: 12/12/2020] [Accepted: 12/23/2020] [Indexed: 11/22/2022] Open
Abstract
Motivation: RNA molecules have become attractive small molecule drug targets for treating disease in recent years. Computer-aided drug design can be facilitated by detecting the RNA sites that bind small molecules. However, very limited progress has been reported for the prediction of small molecule–RNA binding sites. Results: We developed a novel method, RNAsite, to predict small molecule–RNA binding sites using sequence profile- and structure-based descriptors. RNAsite was shown to be competitive with the state-of-the-art methods on the experimental structures of two independent test sets. When predicted structure models were used, RNAsite outperformed other methods by a large margin. The possibility of improving RNAsite by geometry-based binding pocket detection was investigated. The influence of RNA structure flexibility and of the conformational changes caused by ligand binding on RNAsite was also discussed. RNAsite is anticipated to be a useful tool for the design of RNA-targeting small molecule drugs. Availability and implementation: http://yanglab.nankai.edu.cn/RNAsite. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
7
Fan BL, Jiang Z, Sun J, Liu R. Systematic characterization and prediction of coenzyme A-associated proteins using sequence and network information. Brief Bioinform 2020; 22:6012866. [PMID: 33253385 DOI: 10.1093/bib/bbaa308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2020] [Revised: 09/08/2020] [Accepted: 10/12/2020] [Indexed: 01/11/2023] Open
Abstract
Coenzyme A-associated proteins (CAPs) are a category of functionally important proteins involved in multiple biological processes through interactions with coenzyme A (CoA). To date, unfortunately, the specific differences between CAPs and other proteins have yet to be systemically investigated. Moreover, there are no computational methods that can be used specifically to predict these proteins. Herein, we characterized CAPs from multifaceted viewpoints and revealed their specific preferences. Compared with other proteins, CAPs were more likely to possess binding regions for CoA and its derivatives, were evolutionarily highly conserved, exhibited ordered and hydrophobic structural conformations, and tended to be densely located in protein-protein interaction networks. Based on these biological insights, we built seven classifiers using predicted CoA-binding residue distributions, word embedding vectors, remote homolog numbers, evolutionary conservation, amino acid composition, predicted structural features and network properties. These classifiers could effectively identify CAPs in Homo sapiens, Mus musculus and Arabidopsis thaliana. The complementarity among the individual classifiers prompted us to build a two-layer stacking model named CAPE for improving prediction performance. We applied CAPE to identify some high-confidence candidates in the three species, which were tightly associated with the known functions of CAPs. Finally, we extended our algorithm to cross-species prediction, thereby developing a generic CAP prediction model. In summary, this work provides a comprehensive survey and an effective predictor for CAPs, which can help uncover the interplay between CoA and functionally relevant proteins.
Affiliation(s)
- Bing-Liang Fan
- College of Informatics, Huazhong Agricultural University
- Zheng Jiang
- College of Informatics, Huazhong Agricultural University
- Jun Sun
- College of Informatics, Huazhong Agricultural University
- Rong Liu
- College of Informatics, Huazhong Agricultural University
8
Xia CQ, Pan X, Shen HB. Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data. Bioinformatics 2020; 36:3018-3027. [PMID: 32091580 DOI: 10.1093/bioinformatics/btaa110] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 10/17/2019] [Revised: 01/19/2020] [Accepted: 02/18/2020] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION: Knowledge of protein-ligand binding residues is important for understanding the functions of proteins and their interaction mechanisms. Given experimentally solved protein structures, accurately identifying the potential binding sites of a specific ligand on a protein is still a challenging problem. Compared with structure-alignment-based methods, machine learning algorithms provide an alternative flexible solution which is less dependent on annotated homogeneous protein structures. Several factors are important for an efficient protein-ligand prediction model, e.g. discriminative feature representation and effective learning architecture to deal with both the large-scale and severely imbalanced data. RESULTS: In this study, we propose a novel deep-learning-based method called DELIA for protein-ligand binding residue prediction. In DELIA, a hybrid deep neural network is designed to integrate 1D sequence-based features with 2D structure-based amino acid distance matrices. To overcome the problem of severe data imbalance between the binding and nonbinding residues, strategies of oversampling in mini-batch, random undersampling and stacking ensemble are designed to enhance the model. Experimental results on five benchmark datasets demonstrate the effectiveness of the proposed DELIA pipeline. AVAILABILITY AND IMPLEMENTATION: The web server of DELIA is available at www.csbio.sjtu.edu.cn/bioinf/delia/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chun-Qiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
9
Guo Z, Hou J, Cheng J. DNSS2: Improved ab initio protein secondary structure prediction using advanced deep learning architectures. Proteins 2020; 89:207-217. [PMID: 32893403 DOI: 10.1002/prot.26007] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/05/2020] [Revised: 07/07/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein secondary structure (alpha-helix, beta-strand and coil) is a crucial step for protein inter-residue contact prediction and ab initio tertiary structure prediction. In a previous study, we developed a deep belief network-based protein secondary structure method (DNSS1) and successfully advanced the prediction accuracy beyond 80%. In this work, we developed multiple advanced deep learning architectures (DNSS2) to further improve secondary structure prediction. The major improvements over the DNSS1 method include (a) designing and integrating six advanced one-dimensional deep convolutional/recurrent/residual/memory/fractal/inception networks to predict 3-state and 8-state secondary structure, and (b) using more sensitive profile features inferred from Hidden Markov model (HMM) and multiple sequence alignment (MSA). Most of the deep learning architectures are novel for protein secondary structure prediction. DNSS2 was systematically benchmarked on independent test data sets against eight state-of-the-art tools and consistently ranked as one of the best methods. In particular, DNSS2 was tested on the protein targets of the 2018 CASP13 experiment and achieved a Q3 score of 81.62%, an SOV score of 72.19%, and a Q8 score of 73.28%. DNSS2 is freely available at: https://github.com/multicom-toolbox/DNSS2.
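The Q3 score reported in this abstract is the percentage of residues whose 3-state secondary structure (helix H, strand E, coil C) is predicted correctly. A minimal sketch follows; the function name `q3_score` and the two toy strings are hypothetical and are not DNSS2 output.

```python
def q3_score(predicted, observed):
    """Percentage of positions where the predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 17-residue example (H = helix, E = strand, C = coil).
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
q3 = q3_score(predicted, observed)
print(f"Q3 = {q3:.2f}%")
```

The same function yields a Q8 score when both strings use the 8-state alphabet, since it only compares labels position by position.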
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
10
Liu Y, Munteanu CR, Kong Z, Ran T, Sahagún-Ruiz A, He Z, Zhou C, Tan Z. Identification of coenzyme-binding proteins with machine learning algorithms. Comput Biol Chem 2019; 79:185-192. [PMID: 30851647 DOI: 10.1016/j.compbiolchem.2019.01.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2018] [Revised: 09/11/2018] [Accepted: 01/25/2019] [Indexed: 01/12/2023]
Abstract
Coenzyme-binding proteins play a vital role in cellular metabolism processes, such as fatty acid biosynthesis, enzyme and gene regulation, lipid synthesis, particular vesicular traffic, and β-oxidation donation of acyl-CoA esters. Based on the theory of Star Graph Topological Indices (SGTIs) of protein primary sequences, we proposed a method to develop a first classification model for predicting proteins with coenzyme-binding properties. To model the properties of coenzyme-binding proteins, we created a dataset containing 2897 proteins, of which 456 had coenzyme-binding activity. The SGTIs of the peptide sequences were calculated with the Sequence to Star Network (S2SNet) application. We used the SGTIs as inputs to several classification techniques in the machine learning software Weka. A Random Forest classifier based on 3 features of the embedded and non-embedded graphs was identified as the best predictive model for coenzyme-binding proteins. The resulting model had a true positive (TP) rate of 91.7%, a false positive (FP) rate of 7.6%, and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.971. Predicting new proteins with coenzyme-binding activity using this model could be useful for further drug development or enzyme metabolism research.
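The TP and FP rates quoted in this abstract are ratios taken from a confusion matrix. The sketch below shows the arithmetic on invented counts; the function name `tp_fp_rates` and the numbers are placeholders for illustration, not the study's actual confusion matrix.

```python
def tp_fp_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion-matrix counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    tpr = tp / (tp + fn)  # share of actual positives that were recovered
    fpr = fp / (fp + tn)  # share of actual negatives that were wrongly flagged
    return tpr, fpr


# Hypothetical counts: 12 binding proteins and 26 non-binding proteins.
tpr, fpr = tp_fp_rates(tp=11, fn=1, fp=2, tn=24)
print(f"TP rate = {tpr:.1%}, FP rate = {fpr:.1%}")
```

On a dataset as imbalanced as the one described (456 positives among 2897 proteins), the two rates are reported together because overall accuracy alone can look high even when few positives are recovered.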
Affiliation(s)
- Yong Liu
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
- Cristian R Munteanu
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña, 15006, Spain
- Zhiwei Kong
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; University of the Chinese Academy of Sciences, Beijing, 100049, PR China
- Tao Ran
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Alberta, T1J 4B1, Canada
- Alfredo Sahagún-Ruiz
- Department of Microbiology and Immunology, Faculty of Veterinary Medicine and Animal Science, National Autonomous University of Mexico, Universidad 3000, Copilco Coyoacán, CP 04510, México D.F., Mexico
- Zhixiong He
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China.
- Chuanshe Zhou
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
- Zhiliang Tan
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
| |
Collapse
|
11
|
Su H, Liu M, Sun S, Peng Z, Yang J. Improving the prediction of protein–nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods. Bioinformatics 2018; 35:930-936. [DOI: 10.1093/bioinformatics/bty756] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 03/02/2018] [Revised: 08/02/2018] [Accepted: 08/28/2018] [Indexed: 12/31/2022] Open
Affiliation(s)
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Mengchen Liu
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Saisai Sun
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
|
12
|
Liu Y, Wang X, Liu B. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields. Int J Mol Sci 2018; 19:E2483. [PMID: 30135358 PMCID: PMC6164615 DOI: 10.3390/ijms19092483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/21/2018] [Revised: 08/14/2018] [Accepted: 08/18/2018] [Indexed: 12/16/2022] Open
Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP-CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
|
13
|
Zhao Z, Peng Z, Yang J. Improving Sequence-Based Prediction of Protein–Peptide Binding Residues by Introducing Intrinsic Disorder and a Consensus Method. J Chem Inf Model 2018; 58:1459-1468. [DOI: 10.1021/acs.jcim.8b00019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/15/2023]
Affiliation(s)
- Zijuan Zhao
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
|