1
Jain A, Begum T, Ahmad S. Analysis and Prediction of Pathogen Nucleic Acid Specificity for Toll-like Receptors in Vertebrates. J Mol Biol 2023; 435:168208. [PMID: 37479078 DOI: 10.1016/j.jmb.2023.168208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/20/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023]
Abstract
Identification of key sequence-, expression- and function-related features of nucleic acid-sensing host proteins is of fundamental importance to understand the dynamics of pathogen-specific host responses. To meet this objective, we considered toll-like receptors (TLRs), a representative class of membrane-bound sensor proteins, from 17 vertebrate species covering mammals, birds, reptiles, amphibians, and fishes in this comparative study. We identified the molecular signatures of host TLRs that are responsible for sensing pathogen nucleic acids or other pathogen-associated molecular patterns (PAMPs), and potentially play important roles in host defence mechanisms. Interestingly, our findings reveal that such host-specific features are directly related to the strand (single or double) specificity of nucleic acid from pathogens. However, during host-pathogen interactions, such features were unable to explain the pathogenic PAMP (i.e., DNA, RNA or other) selectivity, suggesting a more complex mechanism. Using these features, we developed a number of machine learning models, of which Random Forest achieved high performance (94.57% accuracy) in predicting the strand specificity of TLRs from protein-derived features. We applied the trained model to propose the strand specificity of several previously uncharacterized, distinct fish-specific TLRs (TLR18, TLR23, TLR24, TLR25, TLR27).
Affiliation(s)
- Anuja Jain
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India. https://twitter.com/@Anuja334
- Tina Begum
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
2
Exploring Potential Signals of Selection for Disordered Residues in Prokaryotic and Eukaryotic Proteins. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:549-564. [PMID: 33346088 PMCID: PMC8377245 DOI: 10.1016/j.gpb.2020.06.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/12/2019] [Revised: 03/29/2020] [Accepted: 06/10/2020] [Indexed: 11/22/2022]
Abstract
Intrinsically disordered proteins (IDPs) are an important class of proteins in all domains of life for their functional importance. However, how nature has shaped the disorder potential of prokaryotic and eukaryotic proteins is still not clearly known. Randomly generated sequences are free of any selective constraints and are therefore commonly used as null models. Considering different types of random protein models, here we seek to understand how the disorder potential of natural eukaryotic and prokaryotic proteins differs from random sequences. Comparing proteome-wide disorder content between real and random sequences of 12 model organisms, we noticed that eukaryotic proteins are enriched in disordered regions compared to random sequences, but in prokaryotes such regions are depleted. By analyzing the position-wise disorder profile, we show that disorder is generally higher near the N- and C-terminal regions of eukaryotic proteins as compared to the random models; however, either no such trend or only a weak one was found in prokaryotic proteins. Moreover, here we show that this preference is not caused by the amino acid or nucleotide composition at the respective sites. Instead, these regions were found to be endowed with a higher fraction of protein–protein binding sites, suggesting their functional importance. We discuss several possible explanations for this pattern, such as improving the efficiency of protein–protein interaction, ribosome movement during translation, and post-translational modification. However, further studies are needed to clearly understand the biophysical mechanisms causing the trend.
3
Panda A, Acharya D, Chandra Ghosh T. Insights into human intrinsically disordered proteins from their gene expression profile. MOLECULAR BIOSYSTEMS 2018; 13:2521-2530. [PMID: 29051952 DOI: 10.1039/c7mb00311k] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Expression level provides important clues about gene function. Previously, various efforts have been undertaken to profile human genes according to their expression level. Intrinsically disordered proteins (IDPs) do not adopt any rigid conformation under physiological conditions; however, they are considered an important functional class in all domains of life. Based on a human tissue-averaged gene expression level, previous studies showed that IDPs are expressed at a lower level than ordered globular proteins. Here, we examined the gene expression pattern of human ordered and disordered proteins in 32 normal tissues. We noticed that in most of the tissues, ordered and disordered proteins are expressed at a similar level. Moreover, in a number of tissues IDPs were found to be expressed at a higher level than ordered proteins. Rigorous statistical analyses suggested that the lower tissue-averaged gene expression level of IDPs (reported earlier) may be the consequence of their biased gene expression in some specific tissues and their greater protein length. When we considered the gene repertoire of each tissue, we noticed that a number of human tissues (brain, testes, etc.) selectively express a higher fraction of disordered proteins, which help them to maintain higher protein connectivity by forming disordered binding motifs and to sustain their functional specificities. Our results demonstrated that the disordered proteins are indispensable in these tissues for their functional advantages.
Affiliation(s)
- Arup Panda
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India.
4
Begum T, Ghosh TC, Basak S. Systematic Analyses and Prediction of Human Drug Side Effect Associated Proteins from the Perspective of Protein Evolution. Genome Biol Evol 2017; 9:337-350. [PMID: 28391292 PMCID: PMC5499873 DOI: 10.1093/gbe/evw301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 02/16/2017] [Indexed: 12/20/2022] Open
Abstract
Identifying the factors involved in adverse drug reactions of target proteins is important for developing therapeutic drugs with minimal or no side effects. In this context, we have performed a comparative evolutionary rate analysis between the genes exhibiting drug side-effect(s) (SET) and genes showing no side effect (NSET) with an aim to increase the prediction accuracy of SET/NSET proteins using evolutionary rate determinants. We found that SET proteins are more conserved than the NSET proteins. The rates of evolution between SET and NSET proteins primarily depend upon their noncomplex (protein complex association number = 0) forming nature, phylogenetic age, multifunctionality, membrane localization, and transmembrane helix content irrespective of their essentiality, total druggability (total number of drugs/target), m-RNA expression level, and tissue expression breadth. We also introduced two novel terms, killer druggability (number of drugs with killing side effect(s)/target) and essential druggability (number of drugs targeting essential proteins/target), to explain the evolutionary rate variation between SET and NSET proteins. Interestingly, we noticed that SET proteins are younger than NSET proteins and that multifunctional younger SET proteins are candidates for acquiring killing side effects. We provide evidence that higher killer druggability, multifunctionality, and transmembrane helices support the conservation of SET proteins over NSET proteins in spite of their recent origin. By employing all these entities, our Support Vector Machine model predicts human SET/NSET proteins to a high degree of accuracy (∼86%).
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Tripura University, Suryamaninagar, Tripura, India
- Surajit Basak
- Bioinformatics Centre, Tripura University, Suryamaninagar, Tripura, India; Department of Molecular Biology & Bioinformatics, Tripura University, Suryamaninagar, Tripura, India
5
Chakraborty S, Panda A, Ghosh TC. Exploring the evolutionary rate differences between human disease and non-disease genes. Genomics 2015; 108:18-24. [PMID: 26562439 DOI: 10.1016/j.ygeno.2015.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/27/2015] [Revised: 10/29/2015] [Accepted: 11/03/2015] [Indexed: 10/22/2022]
Abstract
Comparisons of evolutionary features between human disease and non-disease genes have wide implications for understanding the genetic basis of human disease genes. However, it has not yet been resolved whether disease genes evolve at a slower or faster rate than non-disease genes. To resolve this controversy, here we integrated human disease genes from several databases and compared their protein evolutionary rates with those of non-disease genes in both housekeeping and tissue-specific groups. We noticed that in the tissue-specific group, disease genes evolve at a significantly slower rate than non-disease genes. However, we found no significant difference in evolutionary rates between disease and non-disease genes in the housekeeping group. Tissue-specific disease genes have a higher protein complex number and an elevated gene expression level, and are also associated with conserved biological processes. Finally, our regression analysis suggested that protein complex number, followed by protein multifunctionality, independently modulates the evolutionary rate of human disease genes.
Affiliation(s)
- Sandip Chakraborty
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Arup Panda
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
6
Hierarchical closeness efficiently predicts disease genes in a directed signaling network. Comput Biol Chem 2014; 53PB:191-197. [PMID: 25462327 DOI: 10.1016/j.compbiolchem.2014.08.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/25/2014] [Revised: 08/13/2014] [Accepted: 08/25/2014] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Many structural centrality measures were proposed to predict putative disease genes on biological networks. Closeness is one of the best-known structural centrality measures, and its effectiveness for disease gene prediction on undirected biological networks has been frequently reported. However, it is not clear whether closeness is effective for disease gene prediction on directed biological networks such as signaling networks. RESULTS: In this paper, we first show that closeness does not significantly outperform other well-known centrality measures such as Degree, Betweenness, and PageRank for disease gene prediction on a human signaling network. In addition, we observed that prediction accuracy by the closeness measure was worse than that by a reachability measure, but closeness could efficiently predict disease genes among a set of genes with the same reachability value. Based on this observation, we devised a novel structural measure, hierarchical closeness, by combining reachability and closeness such that all genes are first ranked by the degree of reachability and then the tied genes are further ranked by closeness. We discovered that hierarchical closeness outperforms other structural centrality measures in disease gene prediction. We also found that the set of highly ranked genes in terms of hierarchical closeness is clearly different from that of hub genes with high connectivity. More interestingly, these findings were consistently reproduced in a random Boolean network model. Finally, we found that genes with relatively high hierarchical closeness are significantly likely to encode proteins in the extracellular matrix and receptor proteins in a human signaling network, supporting the fact that half of all modern medicinal drugs target receptor-encoding genes. CONCLUSION: Taken together, hierarchical closeness proposed in this study is a novel structural measure to efficiently predict putative disease genes in a directed signaling network.
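The ranking rule described in this abstract (reachability first, closeness as the tie-breaker) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: reachability is taken as the number of nodes reachable along edge direction, and closeness as that count divided by the sum of shortest-path hop distances to those nodes. The function names and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every other node reachable along edge direction."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[source]  # the source itself does not count as reachable
    return dist


def hierarchical_closeness_ranking(graph):
    """Rank nodes by reachability, breaking ties by closeness (assumed definitions)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    scores = {}
    for node in nodes:
        dist = bfs_distances(graph, node)
        reach = len(dist)                 # assumed reachability: count of reachable nodes
        total = sum(dist.values())
        closeness = reach / total if total else 0.0  # assumed closeness definition
        scores[node] = (reach, closeness)
    # Higher reachability first, then higher closeness; node name keeps the order deterministic.
    ranking = sorted(nodes, key=lambda n: (-scores[n][0], -scores[n][1], n))
    return ranking, scores


# Toy directed network (hypothetical): E and F both reach two nodes,
# but F reaches them in fewer hops, so it ranks above E.
toy_graph = {"A": ["B", "C"], "B": ["D"], "E": ["B"], "F": ["D", "G"]}
ranking, scores = hierarchical_closeness_ranking(toy_graph)
print(ranking)
```

In this toy network, sorting by closeness alone would place B level with F, whereas the two-level key keeps every node with larger reachability ahead of it, which is the behaviour the abstract attributes to the combined measure.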
7
Begum T, Ghosh TC. Elucidating the genotype-phenotype relationships and network perturbations of human shared and specific disease genes from an evolutionary perspective. Genome Biol Evol 2014; 6:2741-53. [PMID: 25287147 PMCID: PMC4224346 DOI: 10.1093/gbe/evu220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
To date, numerous studies have attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In our present study, we have considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects, that is, specific disease (SPD) gene (one gene: one defect) and shared disease (SHD) gene (one gene: multiple defects). Here, we have compared the evolutionary rates of these two groups of genes, that is, SPD genes and SHD genes, with respect to ND genes. We observed that the average evolutionary rates are slow in the SHD group, intermediate in the SPD group, and fast in the ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of their gene expression levels and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses interestingly suggest the independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that the drug-induced network disruption is a combination of several edgetic perturbations and, thus, has a more severe effect on gene phenotypes.
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
8
Panda A, Podder S, Chakraborty S, Ghosh TC. GC-made protein disorder sheds new light on vertebrate evolution. Genomics 2014; 104:530-7. [PMID: 25240915 DOI: 10.1016/j.ygeno.2014.09.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/05/2014] [Revised: 08/05/2014] [Accepted: 09/10/2014] [Indexed: 10/24/2022]
Abstract
At the emergence of endothermic vertebrates, GC-rich regions of the ectothermic ancestral genomes underwent a significant GC increase. Such an increase was previously postulated to increase the thermodynamic and structural stability of proteins through a selective increase of protein hydrophobicity. Here, we found that an increase in GC content promotes a higher content of disorder-promoting amino acids in the proteins of endothermic vertebrates, and that the increase in hydrophobicity is mainly due to a higher content of the small disorder-promoting amino acid alanine. In endothermic vertebrates, the prevalence of disordered residues was found to promote functional diversity of proteins encoded by GC-rich genes. A higher fraction of disordered residues in this group of proteins was also found to minimize their aggregation tendency. Thus, we propose that the GC transition has favored disordered residues to promote functional diversity in GC-rich genes, and to protect them against functional loss by protein misfolding.
Affiliation(s)
- Arup Panda
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Soumita Podder
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Sandip Chakraborty
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
9
Panda A, Ghosh TC. Prevalent structural disorder carries signature of prokaryotic adaptation to oxic atmosphere. Gene 2014; 548:134-41. [PMID: 24999584 DOI: 10.1016/j.gene.2014.07.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/16/2014] [Revised: 06/27/2014] [Accepted: 07/03/2014] [Indexed: 12/12/2022]
Abstract
Microbes have adopted efficient mechanisms to contend with environmental changes. The emergence of oxygen was a major event that led to an abrupt change in Earth's atmosphere. To adjust to this shift in environmental conditions, ancient microbes must have undergone several modifications. Although some proteomic and genomic attributes were proposed to facilitate survival of microorganisms in the presence of oxygen, the process of adaptation still remains elusive. Recent studies have highlighted that intrinsically disordered proteins play crucial roles in adaptation to a wide range of ecological conditions. Therefore, it is likely that disordered proteins could also play indispensable roles in microbial adaptation to the aerobic environment. To test this hypothesis, we measured the disorder content of 679 prokaryotes from four oxygen requirement groups. Our results revealed that aerobic proteomes are endowed with the highest protein disorder, followed by facultative microbes. Minimal disorder was observed in anaerobic and microaerophilic microbes, with no significant difference in their disorder content. Considering all the potential confounding factors that can modulate protein disorder, here we established that the high protein disorder in aerobic microbes is not a by-product of adaptation to any other selective pressure. On the functional level, we found that the high disorder in aerobic proteomes has been utilized for processes that are important for their aerobic lifestyle. Moreover, aerobic proteomes were found to be enriched with disordered binding sites and to contain transcription factors with high disorder propensity. Based on our results, here we propose that high protein disorder is an adaptive opportunity for aerobic microbes to fit with the genomic and functional complexities of the aerobic lifestyle.
Affiliation(s)
- Arup Panda
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.