1
Jain A, Begum T, Ahmad S. Analysis and Prediction of Pathogen Nucleic Acid Specificity for Toll-like Receptors in Vertebrates. J Mol Biol 2023; 435:168208. [PMID: 37479078 DOI: 10.1016/j.jmb.2023.168208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/20/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023]
Abstract
Identification of key sequence-, expression- and function-related features of nucleic acid-sensing host proteins is of fundamental importance for understanding the dynamics of pathogen-specific host responses. To meet this objective, we considered toll-like receptors (TLRs), a representative class of membrane-bound sensor proteins, from 17 vertebrate species covering mammals, birds, reptiles, amphibians, and fishes in this comparative study. We identified the molecular signatures of host TLRs that are responsible for sensing pathogen nucleic acids or other pathogen-associated molecular patterns (PAMPs), and that potentially play important roles in host defence mechanisms. Interestingly, our findings reveal that such host-specific features are directly related to the strand (single or double) specificity of nucleic acid from pathogens. However, during host-pathogen interactions, such features were unable to explain the pathogenic PAMP (i.e., DNA, RNA or other) selectivity, suggesting a more complex mechanism. Using these features, we developed a number of machine learning models, of which Random Forest achieved high performance (94.57% accuracy) in predicting the strand specificity of TLRs from protein-derived features. We applied the trained model to propose the strand specificity of some previously uncharacterized, fish-specific novel TLRs (TLR18, TLR23, TLR24, TLR25, TLR27).
Affiliation(s)
- Anuja Jain
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India. https://twitter.com/@Anuja334
- Tina Begum
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
2
Link clustering explains non-central and contextually essential genes in protein interaction networks. Sci Rep 2019; 9:11672. [PMID: 31406201 PMCID: PMC6690968 DOI: 10.1038/s41598-019-48273-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/01/2019] [Accepted: 08/01/2019] [Indexed: 01/29/2023] Open
Abstract
Recent studies have shown that many essential genes (EGs) change their essentiality across various contexts. Finding contextual EGs in pathogenic conditions may facilitate the identification of therapeutic targets. We propose link clustering as an indicator of contextual EGs that are non-central in protein-protein interaction (PPI) networks. In various human and yeast PPI networks, we found that 29–47% of EGs were better characterized by link clustering than by centrality. Importantly, non-central EGs were prone to change their essentiality across different human cell lines and between species. Compared with central EGs and non-EGs, non-central EGs had intermediate levels of expression and evolutionary conservation. In addition, non-central EGs exhibited a significant impact on communities at lower hierarchical levels, suggesting that link clustering is associated with contextual essentiality, as it depicts locally important nodes in network structures.
3
Begum T, Ghosh TC, Basak S. Systematic Analyses and Prediction of Human Drug Side Effect Associated Proteins from the Perspective of Protein Evolution. Genome Biol Evol 2017; 9:337-350. [PMID: 28391292 PMCID: PMC5499873 DOI: 10.1093/gbe/evw301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 02/16/2017] [Indexed: 12/20/2022] Open
Abstract
Identifying the factors involved in adverse drug reactions in target proteins is very important for developing therapeutic drugs with minimal or no side effects. In this context, we performed comparative evolutionary rate analyses between genes exhibiting drug side effect(s) (SET) and genes showing no side effect (NSET), with the aim of increasing the prediction accuracy of SET/NSET proteins using evolutionary rate determinants. We found that SET proteins are more conserved than NSET proteins. The rates of evolution of SET and NSET proteins primarily depend upon their noncomplex-forming nature (protein complex association number = 0), phylogenetic age, multifunctionality, membrane localization, and transmembrane helix content, irrespective of their essentiality, total druggability (total number of drugs/target), mRNA expression level, and tissue expression breadth. We also introduced two novel terms, killer druggability (number of drugs with killing side effect(s)/target) and essential druggability (number of drugs targeting essential proteins/target), to explain the evolutionary rate variation between SET and NSET proteins. Interestingly, we noticed that SET proteins are younger than NSET proteins and that multifunctional younger SET proteins are candidates for acquiring killing side effects. We provide evidence that higher killer druggability, multifunctionality, and transmembrane helices support the conservation of SET proteins over NSET proteins in spite of their recent origin. By employing all these entities, our Support Vector Machine model predicts human SET/NSET proteins to a high degree of accuracy (∼86%).
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Tripura University, Suryamaninagar, Tripura, India
- Surajit Basak
- Bioinformatics Centre, Tripura University, Suryamaninagar, Tripura, India; Department of Molecular Biology & Bioinformatics, Tripura University, Suryamaninagar, Tripura, India
4
Mallik S, Kundu S. Modular Organization of Residue-Level Contacts Shapes the Selection Pressure on Individual Amino Acid Sites of Ribosomal Proteins. Genome Biol Evol 2017; 9:916-931. [PMID: 28338825 PMCID: PMC5388290 DOI: 10.1093/gbe/evx036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 02/21/2017] [Indexed: 12/26/2022] Open
Abstract
Understanding the molecular evolution of macromolecular complexes in the light of their structure, assembly, and stability is of central importance. Here, we address how the modular organization of native molecular contacts shapes the selection pressure on individual residue sites of ribosomal complexes. The bacterial ribosomal complex is represented as a residue contact network where nodes represent amino acid/nucleotide residues and edges represent their van der Waals interactions. We find statistically overrepresented native amino acid-nucleotide contacts (OaantC, one amino acid contacts one or multiple nucleotides, internucleotide contacts are disregarded). Contact number is defined as the number of nucleotides contacted. Involvement of individual amino acids in OaantCs with smaller contact numbers is more random, whereas only a few amino acids significantly contribute to OaantCs with higher contact numbers. An investigation of structure, stability, and assembly of bacterial ribosome depicts the involvement of these OaantCs in diverse biophysical interactions stabilizing the complex, including high-affinity protein-RNA contacts, interprotein cooperativity, intersubunit bridge, packing of multiple ribosomal RNA domains, etc. Amino acid-nucleotide constituents of OaantCs with higher contact numbers are generally associated with significantly slower substitution rates compared with that of OaantCs with smaller contact numbers. This evolutionary rate heterogeneity emerges from the strong purifying selection pressure that conserves the respective amino acid physicochemical properties relevant to the stabilizing interaction with OaantC nucleotides. An analysis of relative molecular orientations of OaantC residues and their interaction energetics provides the biophysical ground of purifying selection conserving OaantC amino acid physicochemical properties.
Affiliation(s)
- Saurav Mallik
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata, India
- Sudip Kundu
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata, India
5
Banerjee S, Chakraborty S, De RK. Deciphering the cause of evolutionary variance within intrinsically disordered regions in human proteins. J Biomol Struct Dyn 2016; 35:233-249. [PMID: 26790343 DOI: 10.1080/07391102.2016.1143877] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
Why intrinsically disordered regions evolve within the human proteome has been an interesting question for a decade. To date, it remains an unsolved yet intriguing issue why some disordered regions evolve rapidly while the rest are highly conserved across mammalian species. Identifying the key biological factors responsible for the variation in the conservation rate of different disordered regions within the human proteome may help address this issue. We emphasize that, among the biological features considered in our study (multifunctionality, gene essentiality, protein connectivity, number of unique domains, gene expression level and expression breadth), the number of unique protein domains acts as a strong determinant that negatively influences the conservation of disordered regions. In this context, we argue that proteins having fewer types of domains preferentially need to conserve their disordered regions to enhance their structural flexibility, which in turn facilitates their molecular interactions. In contrast, the selection pressure acting on stretches of disordered regions is not as strong in multi-domain proteins. Therefore, we reasoned that the presence of conserved disordered stretches may compensate for the functions of multiple domains within a single-domain protein. Interestingly, we noticed that unique domain number and expression level influence the evolution of disordered regions differently from that of well-structured ones.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
- Rajat K De
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
6
Acharya D, Ghosh TC. Global analysis of human duplicated genes reveals the relative importance of whole-genome duplicates originated in the early vertebrate evolution. BMC Genomics 2016; 17:71. [PMID: 26801093 PMCID: PMC4724117 DOI: 10.1186/s12864-016-2392-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/28/2015] [Accepted: 01/13/2016] [Indexed: 12/13/2022] Open
Abstract
Background: Gene duplication is a genetic mutation that creates functionally redundant gene copies that are initially relieved from selective pressures and may adapt themselves to new functions with time. The scale of gene duplication may vary from small-scale duplication (SSD) to whole genome duplication (WGD). Studies with yeast revealed ample differences between these duplicates: yeast WGD pairs were functionally more similar, less divergent in subcellular localization and contained a smaller proportion of essential genes. In this study, we explored the differences in evolutionary genomic properties of human SSD and WGD genes, with the identifiable human WGD duplicates coming from the two rounds of whole genome duplication that occurred early in vertebrate evolution. Results: We observed that these two groups of duplicates were also dissimilar in terms of their evolutionary and genomic properties. Interestingly, however, the pattern differs from that observed in yeast. The human WGDs were found to be functionally less similar, to diverge more in subcellular localization and to contain a higher proportion of essential genes than the SSDs, all of which are opposite to the patterns in yeast. Additionally, we found that human WGDs were more divergent in their gene expression profiles, have higher multifunctionality, are more often associated with disease, and are evolutionarily more conserved than human SSDs. Conclusions: Our study suggests that human WGD duplicates are more divergent, which entails the adaptation of WGDs to novel and important functions that consequently lead to their evolutionary conservation.
Affiliation(s)
- Debarun Acharya
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India
- Tapash C Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India.
7
Chakraborty S, Panda A, Ghosh TC. Exploring the evolutionary rate differences between human disease and non-disease genes. Genomics 2015; 108:18-24. [PMID: 26562439 DOI: 10.1016/j.ygeno.2015.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/27/2015] [Revised: 10/29/2015] [Accepted: 11/03/2015] [Indexed: 10/22/2022]
Abstract
Comparisons of evolutionary features between human disease and non-disease genes have wide implications for understanding the genetic basis of human disease. However, it has not yet been resolved whether disease genes evolve at a slower or faster rate than non-disease genes. To resolve this controversy, we integrated human disease genes from several databases and compared their protein evolutionary rates with those of non-disease genes in both the housekeeping and tissue-specific groups. We noticed that in the tissue-specific group, disease genes evolve at a significantly slower rate than non-disease genes. However, we found no significant difference in evolutionary rates between disease and non-disease genes in the housekeeping group. Tissue-specific disease genes have a higher protein complex number and elevated gene expression level, and are also associated with conserved biological processes. Finally, our regression analysis suggested that protein complex number, followed by protein multifunctionality, independently modulates the evolutionary rate of human disease genes.
Affiliation(s)
- Sandip Chakraborty
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Arup Panda
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
8
Mukherjee S, Panda A, Ghosh TC. Elucidating evolutionary features and functional implications of orphan genes in Leishmania major. Infection, Genetics and Evolution 2015; 32:330-7. [PMID: 25843649 DOI: 10.1016/j.meegid.2015.03.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/13/2015] [Revised: 03/25/2015] [Accepted: 03/26/2015] [Indexed: 11/28/2022]
Abstract
Orphan genes are protein-coding genes that lack recognizable homologs in other organisms. These genes were reported to comprise a considerable fraction of coding regions in all sequenced genomes and are thought to be allied with an organism's lineage-specific traits. However, their evolutionary persistence and functional significance still remain elusive. Because they lack homologs in the host genome and probably have lineage-specific functional roles, orphan gene products of pathogenic protozoa might be considered possible therapeutic targets. Leishmania major is an important parasitic protozoan of the genus Leishmania that is associated with the disease cutaneous leishmaniasis. Therefore, evolutionary and functional characterization of orphan genes in this organism may help in understanding the factors governing pathogen evolution and parasitic adaptation. In this study, we systematically identified orphan genes of L. major and employed several in silico analyses to understand their evolutionary and functional attributes. To trace the signatures of molecular evolution, we compared their evolutionary rate with that of non-orphan genes. In agreement with prior observations, we noticed that orphan genes evolve at a higher rate than non-orphan genes. Lower sequence conservation of orphan genes was previously attributed solely to their younger gene age. However, here we observed that, together with gene age, a number of genomic factors (such as expression level, GC content, variation in codon usage) and proteomic factors (such as protein length, intrinsic disorder content, hydropathicity) could independently modulate their evolutionary rate. We considered the interplay of all these factors and analyzed their relative contribution to protein evolutionary rate by regression analysis. On the functional level, we observed that orphan genes are associated with regulatory, growth factor and transport-related processes. Moreover, these genes were found to be enriched with various types of interaction and trafficking motifs, implying their possible involvement in host-parasite interactions. Thus, our comprehensive analysis of L. major orphan genes provided evidence for their extensive roles in host-pathogen interactions and virulence.
Affiliation(s)
- Sumit Mukherjee
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India; Department of Physical Sciences, Indian Institute of Science Education and Research-Kolkata, Mohanpur 741246, Nadia, West Bengal, India
- Arup Panda
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India.
9
Begum T, Ghosh TC. Elucidating the genotype-phenotype relationships and network perturbations of human shared and specific disease genes from an evolutionary perspective. Genome Biol Evol 2014; 6:2741-53. [PMID: 25287147 PMCID: PMC4224346 DOI: 10.1093/gbe/evu220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
To date, numerous studies have attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In the present study, we considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects: specific disease (SPD) genes (one gene: one defect) and shared disease (SHD) genes (one gene: multiple defects). Here, we compared the evolutionary rates of these two groups, SPD genes and SHD genes, with those of ND genes. We observed that the average evolutionary rates are slow in the SHD group, intermediate in the SPD group, and fast in the ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of gene expression levels and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses interestingly suggest the independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that drug-induced network disruption is a combination of several edgetic perturbations and thus has a more severe effect on gene phenotypes.
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
10
Su L, Liu G, Wang H, Tian Y, Zhou Z, Han L, Yan L. GECluster: a novel protein complex prediction method. Biotechnol Biotec Eq 2014; 28:753-761. [PMID: 26019559 PMCID: PMC4433864 DOI: 10.1080/13102818.2014.946700] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/17/2013] [Accepted: 05/26/2014] [Indexed: 11/16/2022] Open
Abstract
Identification of protein complexes is of great importance for understanding cellular organization and function. Traditional computational protein complex prediction methods mainly rely on the topology of protein–protein interaction (PPI) networks but seldom take biological information about proteins (such as Gene Ontology (GO)) into consideration. Meanwhile, environment-relevant analysis of protein complex evolution has been poorly studied, partly due to the lack of high-precision protein complex datasets. In this paper, a combined PPI network, which integrates both GO and the expression values of the relevant protein-coding genes, is introduced to predict protein complexes. A novel protein complex prediction method, GECluster (Gene Expression Cluster), is proposed based on a seed node expansion strategy that uses the combined PPI network. GECluster was applied to a training combined PPI network, and it predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can efficiently improve protein complex prediction accuracy. To study how protein complexes within cells evolve in response to changes in the surrounding environment, GECluster was applied to seven combined PPI networks constructed from a test set covering the yeast response to stress throughout a wine fermentation process. Our results showed that, as alcohol concentration rises, protein complexes within yeast cells gradually evolve from one state to another. In addition, the numbers of core and attachment proteins within a protein complex both changed significantly.
Affiliation(s)
- Lingtao Su
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China
- Han Wang
- College of Computer Science and Information Technology, Northeast Normal University, Changchun, P. R. China
- Yuan Tian
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China
- Zhihui Zhou
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China
- Liang Han
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China
- Lun Yan
- College of Computer Science and Technology, Jilin University, Changchun, P. R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P. R. China