1
Kiseleva OI, Arzumanian VA, Kurbatov IY, Poverennaya EV. In silico and in cellulo approaches for functional annotation of human protein splice variants. Biomeditsinskaia Khimiia 2024; 70:315-328. [PMID: 39324196 DOI: 10.18097/pbmc20247005315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
The elegance of pre-mRNA splicing mechanisms continues to interest scientists more than half a century after the discovery that coding regions in genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants that encode structurally and functionally different protein isoforms in a tissue-specific manner and in association with specific developmental stages of the organism. Alteration of splicing patterns shifts the balance of functionally distinct proteins in living systems, distorts normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies have been conducted in various life sciences disciplines to deepen our understanding of splicing mechanisms and the extent of their impact on the functioning of living systems. This review aims to summarize experimental and computational approaches used to elucidate the functions of splice variants of a single gene, based on our experience accumulated in the laboratory of interactomics of proteoforms at the Institute of Biomedical Chemistry (IBMC) and on best global practices.
Affiliation(s)
- O I Kiseleva
- Institute of Biomedical Chemistry, Moscow, Russia
2
Yuan Q, Tian C, Song Y, Ou P, Zhu M, Zhao H, Yang Y. GPSFun: geometry-aware protein sequence function predictions with language models. Nucleic Acids Res 2024; 52:W248-W255. [PMID: 38738636 PMCID: PMC11223820 DOI: 10.1093/nar/gkae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/22/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024]
Abstract
Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Chong Tian
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yidong Song
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Peihua Ou
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Mingming Zhu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
3
Bonello J, Orengo C. FunPredCATH: An ensemble method for predicting protein function using CATH. Biochimica et Biophysica Acta. Proteins and Proteomics 2024; 1872:140985. [PMID: 38122964 DOI: 10.1016/j.bbapap.2023.140985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
MOTIVATION: The number of unannotated proteins in UniProt grows at a very high rate every year owing to more efficient sequencing methods. However, the experimental annotation of proteins is a lengthy and expensive process. Using computational techniques to narrow the search can speed up the process by providing highly specific Gene Ontology (GO) terms. METHODOLOGY: We propose an ensemble approach that combines three generic base predictors that predict Gene Ontology (BP, CC and MF) terms from sequences across different species. We train our models on UniProtGOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families, which is used as an indicator of confidence when assigning the GO term to an uncharacterised protein. METHODS: In the ensemble, we use a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated by the same GO term. We also developed a set-based method that uses Set Intersection and Set Union to score the occurrence of GO terms within the same CATH FunFam. Finally, we also use FunFams-Plus, a predictor method developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge. EVALUATION: We evaluated the methods against the CAFA3 benchmark and DomFun. We used the Precision, Recall and Fmax metrics and the benchmark datasets that are used in CAFA3 to evaluate our models and compare them to the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods in the different ontologies and benchmarks. CONTRIBUTIONS: FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods. We show that non-IEA models obtain higher Fmax scores than their IEA counterparts, while the models including IEA annotations have higher coverage at the expense of a lower Fmax score.
Affiliation(s)
- Joseph Bonello
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom; Department of Computer Information Systems, University of Malta, Faculty of ICT, Msida, MSD 2080, Malta.
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
4
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023; 6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]
Abstract
Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, substantially outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA.
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
5
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. bioRxiv: the preprint server for biology 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, significantly outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
6
Yan TC, Yue ZX, Xu HQ, Liu YH, Hong YF, Chen GX, Tao L, Xie T. A systematic review of state-of-the-art strategies for machine learning-based protein function prediction. Comput Biol Med 2023; 154:106446. [PMID: 36680931 DOI: 10.1016/j.compbiomed.2022.106446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
New drug discovery is inseparable from the discovery of drug targets, and the vast majority of the known targets are proteins. At the same time, proteins are essential structural and functional elements of living cells necessary for the maintenance of all forms of life. Therefore, protein functions have become the focus of many pharmacological and biological studies. Traditional experimental techniques are no longer adequate for rapidly growing annotation of protein sequences, and approaches to protein function prediction using computational methods have emerged and flourished. A significant trend has been to use machine learning to achieve this goal. In this review, approaches to protein function prediction based on the sequence, structure, protein-protein interaction (PPI) networks, and fusion of multi-information sources are discussed. The current status of research on protein function prediction using machine learning is considered, and existing challenges and prominent breakthroughs are discussed to provide ideas and methods for future studies.
Affiliation(s)
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
7
MSALigMap-A Tool for Mapping Active-Site Amino Acids in PDB Structures onto Known and Novel Unannotated Homologous Sequences with Similar Function. Life (Basel, Switzerland) 2022; 12:life12122082. [PMID: 36556447 PMCID: PMC9784966 DOI: 10.3390/life12122082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 11/30/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]
Abstract
MSALigMap (Multiple Sequence Alignment Ligand Mapping) is a tool for mapping active-site amino-acid residues that bind selected ligands on to target protein sequences of interest. Users can also provide novel sequences (unavailable in public databases) for analysis. MSALigMap is written in Python. There are several tools and servers available for comparing and mapping active-site amino-acid residues among protein structures. However, there has not previously been a tool for mapping ligand binding amino-acid residues onto protein sequences of interest. Using MSALigMap, users can compare multiple protein sequences, such as those from different organisms or clinical strains, with sequences of proteins with crystal structures in PDB that are bound with the ligand/drug and DNA of interest. This allows users to easily map the binding residues and to predict the consequences of different mutations observed in the binding site. The MSALigMap server can be accessed at https://albiorix.bioenv.gu.se/MSALigMap/HomePage.py.
8
GUO Z, JIA Y, HUANG C, ZHOU Y, CHEN X, YIN R, GUO Y, WANG L, YUAN J, WANG J, YAN P, YIN R. Immunogenicity and protection against Glaesserella parasuis serotype 13 infection after vaccination with recombinant protein LolA in mice. J Vet Med Sci 2022; 84:1527-1535. [PMID: 36216558 PMCID: PMC9705812 DOI: 10.1292/jvms.22-0203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 09/19/2022] [Indexed: 01/03/2024]
Abstract
Glaesserella parasuis is a pathogen causing Glässer's disease characterized by fibrinous polyserositis, polyarthritis, and meningitis. Owing to the low cross-immunogenicity of different bacterial antigens in commercial vaccines, finding and identifying effective immunoprotective antigens will facilitate the development of novel subunit vaccines. In this study, LolA, identified by bioinformatics approaches, was cloned and successfully expressed as a recombinant protein in Escherichia coli, and its immunogenicity and protection were evaluated in a mouse model. The results showed that the recombinant protein LolA can stimulate mice to produce high levels of IgG antibodies and confer 50% protection against challenge with the highly virulent G. parasuis CY1201 strain (serotype 13). By testing the cytokine levels of interleukin 4 (IL-4), IL-10, and interferon-γ (IFN-γ), it was found that the recombinant protein LolA can induce both Th1 and Th2 immune responses in mice. These results suggest that the recombinant protein LolA has the potential to serve as an alternative antigen for a novel vaccine to prevent G. parasuis infection.
Affiliation(s)
- Zhongbo GUO
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Yongchao JIA
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Chen HUANG
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Yuanyuan ZHOU
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Xin CHEN
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Ronglan YIN
- Research Academy of Animal Husbandry and Veterinary Medicine Sciences of Jilin Province, Changchun, China
- Ying GUO
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Linxi WANG
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Jing YUAN
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Jingyi WANG
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Ping YAN
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Ronghuan YIN
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, College of Animal Science & Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
9
Rahardja S, Wang M, Nguyen BP, Fränti P, Rahardja S. A lightweight classification of adaptor proteins using transformer networks. BMC Bioinformatics 2022; 23:461. [PMID: 36333658 PMCID: PMC9635127 DOI: 10.1186/s12859-022-05000-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/03/2022] [Accepted: 09/13/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND: Adaptor proteins play a key role in intercellular signal transduction, and dysfunctional adaptor proteins result in diseases. Understanding their structure is the first step to tackling the associated conditions, spurring ongoing interest in research into adaptor proteins with bioinformatics and computational biology. Our study aims to introduce a small, new, and superior model for protein classification, pushing the boundaries with new machine learning algorithms. RESULTS: We propose a novel transformer-based model that includes a convolutional block and a fully connected layer. We input protein sequences from a database, extract PSSM features, and then process them with our deep learning model. The proposed model is efficient and highly compact, achieving state-of-the-art performance in terms of area under the receiver operating characteristic (ROC) curve, Matthews correlation coefficient, and ROC curves. Despite having merely 20 hidden nodes, which translates to approximately 1% of the complexity of the previous best-known methods, the proposed model is still superior in results and computational efficiency. CONCLUSIONS: The proposed model, which takes PSSM profiles as input and comprises convolutional blocks, a transformer, and fully connected layers, is the first transformer model used for recognizing adaptor proteins and outperforms all existing methods.
Affiliation(s)
- Sylwan Rahardja
- School of Computing, University of Eastern Finland, Joensuu, Finland
- Mou Wang
- School of Marine Science and Technology, Northwestern Polytechnical University and Singapore Institute of Technology, 710072 Xi’an, China
- Binh P. Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
- Pasi Fränti
- School of Computing, University of Eastern Finland, Joensuu, Finland
- Susanto Rahardja
- School of Marine Science and Technology, Northwestern Polytechnical University and Singapore Institute of Technology, 710072 Xi’an, China; Singapore Institute of Technology, Singapore, 138683 Singapore
10
Mansoor M, Nauman M, Ur Rehman H, Benso A. Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction. Soft Comput 2022. [DOI: 10.1007/s00500-021-06707-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
11
Torres M, Yang H, Romero AE, Paccanaro A. Protein function prediction for newly sequenced organisms. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00419-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
12
Cheng X, Tian B, Gao C, Gao W, Yan S, Yao H, Wang X, Jiang Y, Hu L, Pan X, Cao J, Lu J, Ma C, Chang C, Zhang H. Identification and expression analysis of candidate genes related to seed dormancy and germination in the wheat GATA family. Plant Physiology and Biochemistry: PPB 2021; 169:343-359. [PMID: 34837867 DOI: 10.1016/j.plaphy.2021.11.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/31/2021] [Revised: 10/21/2021] [Accepted: 11/10/2021] [Indexed: 06/13/2023]
Abstract
GATA transcription factors have been reported to function in plant growth and development and during various biotic/abiotic stresses in Arabidopsis and rice. However, the functions of wheat GATAs, particularly in the regulation of seed dormancy and germination, remain unclear. Here, we identified 78 TaGATAs in wheat and divided them into five subfamilies. Sixty-four paralogous pairs and 52 orthologous pairs were obtained, and Ka/Ks ratios showed that the TaGATAs had undergone strong purifying selection during the evolutionary process. Triplet analysis indicated that a high homologue retention rate could explain the large number of TaGATAs in wheat. Gene structure analysis revealed that most members of the same subfamily had similar structures, and subcellular localization prediction indicated that most TaGATAs were located in the nucleus. Gene ontology annotation results showed that most TaGATAs had molecular functions in DNA and zinc binding, and promoter analysis suggested that they may play important roles in growth, development, and biotic/abiotic stress response. We combined three microarray datasets with qRT-PCR expression data from wheat varieties of contrasting dormancy and pre-harvest sprouting resistance levels during imbibition in order to identify ten candidate genes (TaGATA17/-25/-34/-37/-40/-46/-48/-51/-72/-73) that may be involved in the regulation of seed dormancy and germination in wheat. These findings provide valuable information for further dissection of TaGATA functions in the regulation of seed dormancy and germination, thereby enabling the improvement of wheat pre-harvest sprouting resistance by gene pyramiding.
Affiliation(s)
- Xinran Cheng
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China; National Key Laboratory for Crop Genetics and Germplasm Enhancement, Jiangsu Plant Gene Engineering Research Center, Nanjing Agricultural University, Nanjing, 210095, China
- Bingbing Tian
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Chang Gao
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Wei Gao
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Shengnan Yan
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Hui Yao
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Xuyang Wang
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Yating Jiang
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Leixue Hu
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Xu Pan
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Jiajia Cao
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Jie Lu
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Chuanxi Ma
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China
- Cheng Chang
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China.
- Haiping Zhang
- College of Agronomy, Anhui Agricultural University, Key Laboratory of Wheat Biology and Genetic Improvement on Southern Yellow & Huai River Valley, Ministry of Agriculture and Rural Affairs, Hefei, 230036, Anhui, China.
13
Hu S, Wang Y, Xu Z, Zhou Y, Cao J, Zhang H, Zhou J. Identification of the Bcl-2 and Bax homologs from Rhipicephalus haemaphysaloides and their function in the degeneration of tick salivary glands. Parasit Vectors 2021; 14:386. [PMID: 34348769 PMCID: PMC8336254 DOI: 10.1186/s13071-021-04879-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2021] [Accepted: 07/16/2021] [Indexed: 11/30/2022]
Abstract
Background: The salivary glands of female ticks degenerate rapidly by apoptosis and autophagy after feeding. Bcl-2 family proteins play an important role in the apoptosis pathways, but the functions of these proteins in ticks are unclear. We studied Bcl-2 and Bax homologs from Rhipicephalus haemaphysaloides and determined their functions in the degeneration of the salivary glands. Methods: Two molecules containing conserved BH (Bcl-2 family homology) domains were identified and named RhBcl-2 and RhBax. After protein purification and mouse immunization, specific polyclonal antibodies (PcAb) were created in response to the recombinant proteins. Reverse transcription quantitative PCR (RT-qPCR) and western blot were used to detect the presence of RhBcl-2 and RhBax in ticks. TUNEL assays were used to determine the level of apoptosis in the salivary glands of female ticks at different feeding times after gene silencing. Co-transfection and GST pull-down assays were used to identify interactions between RhBcl-2 and RhBax. Results: The RT-qPCR assay revealed that RhBax gene transcription increased significantly during feeding at all tick developmental stages (engorged larvae, nymphs, and adult females). Transcriptional levels of RhBcl-2 and RhBax increased more significantly in the female salivary glands than in other tissues post engorgement. RhBcl-2 silencing significantly inhibited tick feeding. In contrast, RhBax interference had no effect on tick feeding. TUNEL staining showed that apoptosis levels were significantly reduced after interference with RhBcl-2 expression. Co-transfection and GST pull-down assays showed that RhBcl-2 and RhBax could interact but not combine in the absence of the BH3 domain. Conclusions: This study identified the roles of RhBcl-2 and RhBax in tick salivary gland degeneration and found that the BH3 domain is a key factor in their interactions.
Affiliation(s)
- Shanming Hu
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Yanan Wang
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Zhengmao Xu
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Yongzhi Zhou
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Jie Cao
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Houshuang Zhang
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Jinlin Zhou
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
|
14
|
Yousafi Q, Sarfaraz A, Saad Khan M, Saleem S, Shahzad U, Abbas Khan A, Sadiq M, Ditta Abid A, Sohail Shahzad M, ul Hassan N. In silico annotation of unreviewed acetylcholinesterase (AChE) in some lepidopteran insect pest species reveals the causes of insecticide resistance. Saudi J Biol Sci 2021; 28:2197-2209. [PMID: 33911936 PMCID: PMC8071828 DOI: 10.1016/j.sjbs.2021.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/04/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 02/07/2023] Open
Abstract
Lepidoptera is the second most diverse insect order, outnumbered only by the Coleoptera. Acetylcholinesterase (AChE) is the major target site for insecticides. Extensive use of insecticides to inhibit the function of this enzyme has resulted in the development of insecticide resistance. Complete knowledge of the target proteins is very important for understanding the cause of resistance. Computational annotation of insect acetylcholinesterase can be helpful for the characterization of this important protein. Acetylcholinesterase of fourteen lepidopteran insect pest species was annotated using different bioinformatics tools. AChE in all the species was hydrophilic and thermostable. All the species showed lower values for the instability index except L. orbonalis, S. exigua and T. absoluta. The highest percentages of Arg, Asp, Asn, Gln and Cys were recorded in P. rapae. The high percentage of Cys and Gln might be a reason for insecticide resistance development in P. rapae. Phylogenetic analysis revealed that AChE in T. absoluta, L. orbonalis and S. exigua is closely related and emerged from the same primary branch. Three functional motifs were predicted in eleven species, while only two were found in L. orbonalis, S. exigua and T. absoluta. AChE in eleven species followed the secretory pathway and had signal peptides. No signal peptides were predicted for S. exigua, L. orbonalis and T. absoluta, which follow a non-secretory pathway. Arginine methylation and cysteine palmitoylation were found in all species except S. exigua, L. orbonalis and T. absoluta. A glycosylphosphatidylinositol (GPI) anchor was predicted in only nine species.
Affiliation(s)
- Qudsia Yousafi
- COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Punjab, Pakistan
- Corresponding author.
- Ayesha Sarfaraz
- COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Punjab, Pakistan
- Shahzad Saleem
- COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Punjab, Pakistan
- Umbreen Shahzad
- College of Agriculture, Bahauddin Zakariya University, Bahadur Campus, Layyah, Pakistan
- Azhar Abbas Khan
- College of Agriculture, Bahauddin Zakariya University, Bahadur Campus, Layyah, Pakistan
- Mazhar Sadiq
- COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Punjab, Pakistan
|
15
|
Vihinen M. Functional effects of protein variants. Biochimie 2020; 180:104-120. [PMID: 33164889 DOI: 10.1016/j.biochi.2020.10.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/30/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/11/2022]
Abstract
Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from an evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden.
|
16
|
Meixner E, Goldmann U, Sedlyarov V, Scorzoni S, Rebsamen M, Girardi E, Superti‐Furga G. A substrate-based ontology for human solute carriers. Mol Syst Biol 2020; 16:e9652. [PMID: 32697042 PMCID: PMC7374931 DOI: 10.15252/msb.20209652] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/22/2020] [Revised: 06/24/2020] [Accepted: 06/26/2020] [Indexed: 11/09/2022] Open
Abstract
Solute carriers (SLCs) are the largest family of transmembrane transporters in the human genome, with more than 400 members. Although SLCs mediate critical biological functions and several are important pharmacological targets, a large proportion of them are poorly characterized and have no assigned substrate. A major limitation to systems-level de-orphanization campaigns is the absence of a structured, language-controlled chemical annotation. Here we describe a thorough manual annotation of SLCs based on literature. The annotation of substrates, transport mechanism, coupled ions, and subcellular localization for 446 human SLCs confirmed that ~30% of these were still functionally orphan and lacked known substrates. Application of a substrate-based ontology to transcriptomic datasets identified SLC-specific responses to external perturbations, while a machine-learning approach based on the annotation allowed us to identify potential substrates for several orphan SLCs. The annotation is available at https://opendata.cemm.at/gsflab/slcontology/. Given the increasing availability of large biological datasets and the growing interest in transporters, we expect that the effort presented here will be critical to provide novel insights into the functions of SLCs.
Affiliation(s)
- Eva Meixner
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ulrich Goldmann
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Vitaly Sedlyarov
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Stefania Scorzoni
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Manuele Rebsamen
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Enrico Girardi
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Giulio Superti‐Furga
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
|
17
|
Wang Y, Hu S, Tuerdi M, Yu X, Zhang H, Zhou Y, Cao J, da Silva Vaz I, Zhou J. Initiator and executioner caspases in salivary gland apoptosis of Rhipicephalus haemaphysaloides. Parasit Vectors 2020; 13:288. [PMID: 32503655 PMCID: PMC7275347 DOI: 10.1186/s13071-020-04164-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/17/2020] [Accepted: 06/01/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Apoptosis is fundamental in maintaining cell balance in multicellular organisms, and caspases play a crucial role in apoptosis pathways. Apoptosis has been reported to play an important role in tick salivary gland degeneration. Several different caspases have been found in ticks, but the interactions between them are currently unknown. Here, we report three new caspases isolated from the salivary glands of the tick Rhipicephalus haemaphysaloides.
METHODS: The full-length cDNAs of the RhCaspase 7, 8 and 9 genes were obtained from the transcriptome, and RhCaspases 7, 8 and 9 were expressed in E. coli; after protein purification and immunization in mice, specific polyclonal antibodies (PcAb) were raised against the recombinant proteins. Reverse-transcription quantitative PCR (RT-qPCR) and western blot were used to detect the presence of RhCaspases 7, 8 and 9 in ticks. TUNEL assays were used to determine the apoptosis level in salivary glands at different feeding times after gene silencing. The interactions between RhCaspases 7, 8 and 9 were identified by co-transfection assays.
RESULTS: The transcription of apoptosis-related genes in R. haemaphysaloides salivary glands increased significantly after tick engorgement. Three caspase-like molecules containing conserved caspase domains were identified and named RhCaspases 7, 8 and 9. RhCaspase8 and RhCaspase9 contain a long pro-domain at their N-termini. An RT-qPCR assay demonstrated that the transcription of these three caspase genes increased significantly during the engorged periods of the tick developmental stages (engorged larval, nymph, and adult female ticks). Transcriptional levels of RhCaspases 7, 8 and 9 in salivary glands increased more significantly than in other tissues post-engorgement. RhCaspase9-RNAi treatment significantly inhibited tick feeding. In contrast, knockdown of RhCaspase7 and RhCaspase8 had no influence on tick feeding. Compared to the control group, apoptosis levels were significantly reduced after interfering with RhCaspase 7, 8 and 9 expression. Co-transfection assays showed that RhCaspase7 was cleaved by RhCaspases 8 and 9, demonstrating that RhCaspases 8 and 9 are initiator caspases and RhCaspase7 is an executioner caspase.
CONCLUSIONS: To the best of our knowledge, this is the first study to identify initiator and executioner caspases in ticks, confirm the interaction among them, and associate caspase activation with tick salivary gland degeneration.
Affiliation(s)
- Yanan Wang
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Shanming Hu
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Mayinuer Tuerdi
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Xinmao Yu
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Houshuang Zhang
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Yongzhi Zhou
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Jie Cao
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
- Itabajara da Silva Vaz
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Jinlin Zhou
- Key Laboratory of Animal Parasitology of Ministry of Agriculture, Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Shanghai, 200241, China
|
18
|
Common Functions of Disordered Proteins across Evolutionary Distant Organisms. Int J Mol Sci 2020; 21:ijms21062105. [PMID: 32204351 PMCID: PMC7139818 DOI: 10.3390/ijms21062105] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/11/2020] [Revised: 03/16/2020] [Accepted: 03/17/2020] [Indexed: 12/14/2022] Open
Abstract
Intrinsically disordered proteins and regions typically lack a well-defined structure and thus fall outside the scope of the classic sequence–structure–function relationship. Hence, classic sequence- or structure-based bioinformatic approaches are often not well suited to identify homology or predict the function of unknown intrinsically disordered proteins. Here, we give selected examples of intrinsic disorder in plant proteins and present how protein function is shared, altered or distinct in evolutionarily distant organisms. Furthermore, we explore how examining the specific role of disorder across different phyla can provide a better understanding of the common features that protein disorder contributes to the respective biological mechanism.
|
19
|
Lu S, da Rocha LA, Torquato RJS, da Silva Vaz Junior I, Florin-Christensen M, Tanaka AS. A novel type 1 cystatin involved in the regulation of Rhipicephalus microplus midgut cysteine proteases. Ticks Tick Borne Dis 2020; 11:101374. [PMID: 32008997 DOI: 10.1016/j.ttbdis.2020.101374] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/21/2019] [Revised: 01/03/2020] [Accepted: 01/10/2020] [Indexed: 01/20/2023]
Abstract
Rhipicephalus microplus is a cattle ectoparasite found in tropical and subtropical regions around the world, with great impact on livestock production. R. microplus can also harbor pathogens, such as Babesia sp. and Anaplasma sp., which further compromise cattle production. Blood meal acquisition and digestion are key steps for tick development. In ticks, digestion takes place inside midgut cells and is mediated by aspartic and cysteine peptidases and, therefore, regulated by their inhibitors. Cystatins are a family of cysteine peptidase inhibitors found in several organisms and have been associated in ticks with blood acquisition, blood digestion, modulation of host immune response and tick immunity. In this work, we characterized a novel R. microplus type 1 cystatin, named Rmcystatin-1b. The inhibitor transcripts were found to be highly expressed in the midgut of partially and fully engorged females and they appear to be modulated at different days post-detachment. Purified recombinant Rmcystatin-1b (rRmcystatin-1b) displayed inhibitory activity towards typical cysteine peptidases with high affinity. Moreover, rRmcystatin-1b was able to inhibit native R. microplus cysteine peptidases, and RNAi-mediated knockdown of the cystatin transcripts resulted in increased proteolytic activity. In addition, rRmcystatin-1b was able to interfere with B. bovis growth in vitro. Taken together, our data strongly suggest that Rmcystatin-1b is a regulator of blood digestion in R. microplus midgut.
Affiliation(s)
- Stephen Lu
- Department of Biochemistry, Escola Paulista de Medicina, Universidade de Federal de São Paulo (UNIFESP), São Paulo, SP, Brazil
- Leticia A da Rocha
- Department of Biochemistry, Escola Paulista de Medicina, Universidade de Federal de São Paulo (UNIFESP), São Paulo, SP, Brazil
- Ricardo J S Torquato
- Department of Biochemistry, Escola Paulista de Medicina, Universidade de Federal de São Paulo (UNIFESP), São Paulo, SP, Brazil
- Itabajara da Silva Vaz Junior
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul (UFRGS), RS, Brazil; Faculdade de Veterinária, Universidade Federal do Rio Grande do Sul (UFRGS), RS, Brazil; Instituto Nacional de Ciência e Tecnologia em Entomologia Molecular (INCT-em), RJ, Brazil
- Monica Florin-Christensen
- Instituto de Patobiologia Veterinaria, Centro de Investigaciones en Ciencias Veterinarias y Agronómicas (CICVyA), INTA-Castelar, Los Reseros y Nicolas Repetto s/n, Hurlingham 1686, Argentina; National Council of Scientific and Technological Research (CONICET), Ciudad Autónoma de Buenos Aires C1033AAj, Argentina
- Aparecida S Tanaka
- Department of Biochemistry, Escola Paulista de Medicina, Universidade de Federal de São Paulo (UNIFESP), São Paulo, SP, Brazil; Instituto Nacional de Ciência e Tecnologia em Entomologia Molecular (INCT-em), RJ, Brazil
|
20
|
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022] Open
Abstract
Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate the analysis and enhance accuracy are greatly desired. Although a variety of methods have been developed to raise protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity for controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.
Affiliation(s)
- Jiajun Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
|
21
|
Wu J, Yin Q, Zhang C, Geng J, Wu H, Hu H, Ke X, Zhang Y. Function Prediction for G Protein-Coupled Receptors through Text Mining and Induction Matrix Completion. ACS Omega 2019; 4:3045-3054. [PMID: 31459527 PMCID: PMC6649004 DOI: 10.1021/acsomega.8b02454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/30/2018] [Accepted: 01/11/2019] [Indexed: 06/10/2023]
Abstract
G protein-coupled receptors (GPCRs) constitute the key component of cellular signal transduction. Accurately annotating the biological functions of GPCR proteins is vital to understanding the physiological processes in which they are involved. With the rapid development of text mining technologies and the exponential growth of biomedical literature, it has become urgent to extract biological functional information from the literature for systematically and reliably annotating these known GPCRs. We design a novel three-stage approach, TM-IMC, using text mining and inductive matrix completion, for automated prediction of the gene ontology (GO) terms of the GPCR proteins. Large-scale benchmark tests show that inductive matrix completion models contribute to GPCR-GO association prediction for both molecular function and biological process aspects. Moreover, our detailed data analysis shows that information extracted from GPCR-associated literature indeed contributes to the prediction of GPCR-GO associations. The study demonstrated a new avenue to enhance the accuracy of GPCR function annotation through the combination of text mining and inductive matrix completion over baseline methods in critical assessment of protein function annotation algorithms and literature-based GO annotation methods. Source codes of TM-IMC and the involved datasets can be freely downloaded from https://zhanglab.ccmb.med.umich.edu/TM-IMC for academic purposes.
Affiliation(s)
- Jiansheng Wu
- School of Geographic and Biological Information and School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Qin Yin
- School of Geographic and Biological Information and School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Jingjing Geng
- School of Geographic and Biological Information and School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Haifeng Hu
- School of Geographic and Biological Information and School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Xiaoyan Ke
- Child Mental Health Research Center, Nanjing Brain Hospital, Nanjing Medical University, Nanjing 210029, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
|