Reference Citation Analysis

For: Lee B, Lee D. Protein comparison at the domain architecture level. BMC Bioinformatics 2009;10 Suppl 15:S5. [PMID: 19958515 PMCID: PMC2788356 DOI: 10.1186/1471-2105-10-s15-s5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]

Cited by Other Article(s)

Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023;39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023]

Moussa S, Kilgour M, Jans C, Hernandez-Garcia A, Cuperlovic-Culf M, Bengio Y, Simine L. Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning. J Phys Chem B 2023;127:62-68. [PMID: 36574492 DOI: 10.1021/acs.jpcb.2c05660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]

Sepúlveda V, Maurelia F, González M, Aguayo J, Caprile T. SCO-spondin, a giant matricellular protein that regulates cerebrospinal fluid activity. Fluids Barriers CNS 2021;18:45. [PMID: 34600566 PMCID: PMC8487547 DOI: 10.1186/s12987-021-00277-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/23/2021] [Accepted: 09/11/2021] [Indexed: 12/28/2022]

Abstract

Cerebrospinal fluid is a clear fluid that occupies the ventricular and subarachnoid spaces within and around the brain and spinal cord. Cerebrospinal fluid is a dynamic signaling milieu that transports nutrients, waste materials and neuroactive substances that are crucial for the development, homeostasis and functionality of the central nervous system. The mechanisms that enable cerebrospinal fluid to simultaneously exert these homeostatic/dynamic functions are not fully understood. SCO-spondin is a large glycoprotein secreted into the cerebrospinal fluid from the early stages of development. Its domain architecture resembles a combination of a matricellular protein and the ligand-binding region of the LDL receptor family. The matricellular proteins are a group of extracellular proteins with the capacity to interact with different molecules, such as growth factors, cytokines and cellular receptors, enabling the integration of information to modulate various physiological and pathological processes. In the same way, the LDL receptor family interacts with many ligands, including β-amyloid peptide and different growth factors. This domain similarity suggests that SCO-spondin is a matricellular protein able to bind, modulate, and transport different cerebrospinal fluid molecules. SCO-spondin can be found soluble or polymerized into a dynamic threadlike structure called the Reissner fiber, which extends from the diencephalon to the caudal tip of the spinal cord. The Reissner fiber continuously moves caudally as new SCO-spondin molecules are added at the cephalic end and are disaggregated at the caudal end. This movement, like a conveyor belt, allows the transport of the bound molecules, thereby increasing their lifespan and action radius. The binding of SCO-spondin to some relevant molecules has already been reported; however, in this review we suggest more than 30 possible binding partners, including β-amyloid peptide and several growth factors. This new perspective characterizes SCO-spondin as a regulator of cerebrospinal fluid activity, explaining its high evolutionary conservation, its apparent multifunctionality, and the lethality or severe malformations, such as hydrocephalus and curved body axis, of knockout embryos. Understanding the regulation and identifying binding partners of SCO-spondin are crucial for better comprehension of cerebrospinal fluid physiology.

Strepis N, Naranjo HD, Meier-Kolthoff J, Göker M, Shapiro N, Kyrpides N, Klenk HP, Schaap PJ, Stams AJM, Sousa DZ. Genome-guided analysis allows the identification of novel physiological traits in Trichococcus species. BMC Genomics 2020;21:24. [PMID: 31914924 PMCID: PMC6950789 DOI: 10.1186/s12864-019-6410-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/20/2018] [Accepted: 12/18/2019] [Indexed: 11/29/2022]

Abstract

BACKGROUND

The genus Trichococcus currently contains nine species: T. flocculiformis, T. pasteurii, T. palustris, T. collinsii, T. patagoniensis, T. ilyis, T. paludicola, T. alkaliphilus, and T. shcherbakoviae. In general, Trichococcus species can degrade a wide range of carbohydrates. However, only T. pasteurii and a non-characterized strain of Trichococcus, strain ES5, have the capacity of converting glycerol to mainly 1,3-propanediol. Comparative genomic analysis of Trichococcus species provides the opportunity to further explore the physiological potential and uncover novel properties of this genus.

RESULTS

In this study, a genotype-phenotype comparative analysis of Trichococcus strains was performed. The genome of Trichococcus strain ES5 was sequenced and included in the comparison with the other nine type strains. Genes encoding functions related to e.g. the utilization of different carbon sources (glycerol, arabinan and alginate), antibiotic resistance, tolerance to low temperature and osmoregulation could be identified in all the sequences analysed. T. pasteurii and Trichococcus strain ES5 contain an operon with genes encoding the enzymes necessary for 1,3-PDO production from glycerol. All the analysed genomes comprise genes encoding cold shock domains, but only five of the Trichococcus species can grow at 0 °C. Protein domains associated with osmoregulation mechanisms are encoded in the genomes of all Trichococcus species, except in T. palustris, which had a lower resistance to salinity than the other nine studied Trichococcus strains.

CONCLUSIONS

Genome analysis and comparison of ten Trichococcus strains allowed the identification of physiological traits related to substrate utilization and environmental stress resistance (e.g. to cold and salinity). Some substrates were used by single species, e.g. alginate by T. collinsii and arabinan by T. alkaliphilus. Strain ES5 may represent a subspecies of Trichococcus flocculiformis and, contrary to the type strain (DSM 2094T), is able to grow on glycerol with the production of 1,3-propanediol.

Hernandez-Guerrero R, Galán-Vásquez E, Pérez-Rueda E. The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains. PLoS One 2019;14:e0226604. [PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Accepted: 11/29/2019] [Indexed: 11/19/2022]

Evolution of Protein Domain Architectures. Methods Mol Biol 2019;1910:469-504. [PMID: 31278674 DOI: 10.1007/978-1-4939-9074-0_15] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/04/2022]

Perez-Rueda E, Hernandez-Guerrero R, Martinez-Nuñez MA, Armenta-Medina D, Sanchez I, Ibarra JA. Abundance, diversity and domain architecture variability in prokaryotic DNA-binding transcription factors. PLoS One 2018;13:e0195332. [PMID: 29614096 PMCID: PMC5882156 DOI: 10.1371/journal.pone.0195332] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 11/22/2017] [Accepted: 03/20/2018] [Indexed: 02/04/2023]

Abstract

Gene regulation at the transcriptional level is a central process in all organisms, and DNA-binding transcription factors, known as TFs, play a fundamental role. This class of proteins usually binds at specific DNA sequences, activating or repressing gene expression. In general, TFs are composed of two domains: the DNA-binding domain (DBD) and an extra domain, which in this work we have named “companion domain” (CD). The latter could be involved in one or more functions such as ligand binding, protein-protein interactions or even enzymatic activity. In contrast to DBDs, which have been widely characterized both experimentally and bioinformatically, information on the abundance, distribution, variability and possible role of the CDs is scarce. Here, we investigated these issues associated with the domain architectures of TFs in prokaryotic genomes. To this end, 19 families of TFs in 761 non-redundant bacterial and archaeal genomes were evaluated. We found four main groups based on abundance and distribution in the analyzed genomes: i) LysR and TetR/AcrR; ii) AraC/XylS, SinR, and others; iii) Lrp, Fis, ArsR, and others; and iv) a group that included only two families, ArgR and BirA. Based on a classification of the organisms according to their life-styles, a greater abundance of regulatory families was identified in free-living organisms than in pathogenic, extremophilic or intracellular organisms. Finally, the protein architecture diversity associated with the 19 families, considering a weight score for domain promiscuity, showed which regulatory families were characterized by either a large diversity of CDs, here named “promiscuous” families given the elevated number of variable domains found in those TFs, or a low diversity of CDs. Altogether this information helped us to understand the diversity and distribution of the 19 prokaryotic TF families. Moreover, initial steps were taken to comprehend the variability of the extra domain in those TFs, which eventually might assist in evolutionary and functional studies.

Mata AR, Pacheco CM, Cruz Pérez JF, Sáenz MM, Baca BE. In silico comparative analysis of GGDEF and EAL domain signaling proteins from the Azospirillum genomes. BMC Microbiol 2018;18:20. [PMID: 29523074 PMCID: PMC5845226 DOI: 10.1186/s12866-018-1157-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/04/2017] [Accepted: 02/09/2018] [Indexed: 01/13/2023]

Abstract

BACKGROUND

The cyclic-di-GMP (c-di-GMP) second messenger exemplifies a signaling system that regulates many bacterial behaviors of key importance; among them, c-di-GMP controls the transition between motile and sessile life-styles in bacteria. Cellular c-di-GMP levels in bacteria are regulated by the opposite enzymatic activities of diguanylate cyclases and phosphodiesterases, which are proteins that have GGDEF and EAL domains, respectively. Azospirillum is a genus of plant-growth-promoting bacteria, and members of this genus have beneficial effects in many agronomically and ecologically essential plants. These bacteria also inhabit aquatic ecosystems, and have been isolated from humus-reducing habitats. Bioinformatic and structural approaches were used to identify genes predicted to encode GG[D/E]EF, EAL and GG[D/E]EF-EAL domain proteins from nine genome sequences.

RESULTS

The analyzed sequences revealed that the genomes of A. humicireducens SgZ-5T, A. lipoferum 4B, Azospirillum sp. B510, A. thiophilum BV-ST, A. halopraeferens DSM3675, A. oryzae A2P, and A. brasilense Sp7, Sp245 and Az39 encode 29 to 41 of these predicted proteins. Notably, only 15 proteins were conserved in all nine genomes: eight GGDEF, three EAL and four GGDEF-EAL hybrid domain proteins, all of which corresponded to core genes in the genomes. The predicted proteins exhibited variable lengths, architectures and sensor domains. In addition, the predicted cellular localizations showed that some of the proteins contain transmembrane domains, suggesting that these proteins are anchored to the membrane. Therefore, as reported in other soil bacteria, the Azospirillum genomes encode a large number of proteins that are likely involved in c-di-GMP metabolism. In addition, the data obtained here strongly suggest host specificity and environment-specific adaptation.

CONCLUSIONS

Bacteria of the Azospirillum genus cope with diverse environmental conditions to survive in soil and aquatic habitats and, in certain cases, to colonize and benefit their host plant. Gaining information on the structures of proteins involved in c-di-GMP metabolism in Azospirillum appears to be an important step in determining the c-di-GMP signaling pathways, involved in the transition of a motile cell towards a biofilm life-style, as an example of microbial genome plasticity under diverse in situ environments.

Koehorst JJ, Saccenti E, Schaap PJ, Martins Dos Santos VAP, Suarez-Diez M. Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics. F1000Res 2016;5:1987. [PMID: 27703668 PMCID: PMC5031134 DOI: 10.12688/f1000research.9416.3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 06/26/2017] [Indexed: 11/20/2022]

Doğan T, MacDougall A, Saidi R, Poggioli D, Bateman A, O'Donovan C, Martin MJ. UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB. Bioinformatics 2016;32:2264-71. [PMID: 27153729 PMCID: PMC4965628 DOI: 10.1093/bioinformatics/btw114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/29/2015] [Revised: 01/22/2016] [Accepted: 02/25/2016] [Indexed: 11/17/2022]

Figueiredo HCP, Soares SC, Pereira FL, Dorella FA, Carvalho AF, Teixeira JP, Azevedo VAC, Leal CAG. Comparative genome analysis of Weissella ceti, an emerging pathogen of farm-raised rainbow trout. BMC Genomics 2015;16:1095. [PMID: 26694728 PMCID: PMC4687380 DOI: 10.1186/s12864-015-2324-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/25/2015] [Accepted: 12/15/2015] [Indexed: 11/10/2022]

Abstract

Background

The genus Weissella belongs to the lactic acid bacteria and includes 18 currently identified species, predominantly isolated from fermented food but rarely from cases of bacteremia in animals. Recently, a new species, designated Weissella ceti, has been correlated with hemorrhagic illness in farm-raised rainbow trout in China, Brazil, and the USA, with high transmission and mortality rates during outbreaks. Although W. ceti is an important emerging veterinary pathogen, little is known about its genomic features or virulence mechanisms. To better understand these and to characterize the species, we have previously sequenced the genomes of W. ceti strains WS08, WS74, and WS105, isolated from different rainbow trout farms in Brazil and displaying different pulsed-field gel electrophoresis patterns. Here, we present a comparative analysis of the three previously sequenced genomes of W. ceti strains from Brazil along with W. ceti NC36 from the USA and those of other Weissella species.

Results

Phylogenomic and orthology-based analyses both showed a high similarity in the genetic structure of these W. ceti strains. This structure is corroborated by the highly syntenic order of their genes and the neutral evolution inferred from Tajima’s D. A whole-genome multilocus sequence typing analysis distinguished strains WS08 and NC36 from strains WS74 and WS105. We predicted 10 putative genomic islands (GEI), among which PAIs 3a and 3b are phage sequences that occur only in WS105 and WS74, respectively, whereas PAI 1 is species specific.

Conclusions

We identified several genes putatively involved in the basic processes of bacterial physiology and pathogenesis, including survival in aquatic environments, adherence to the host, spread inside the host, resistance to immune-system-mediated stresses, and antibiotic resistance. These data provide new insights into the molecular epidemiology and host adaptation of this emerging pathogen in aquaculture.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-2324-4) contains supplementary material, which is available to authorized users.

Assessing the Metabolic Diversity of Streptococcus from a Protein Domain Point of View. PLoS One 2015;10:e0137908. [PMID: 26366735 PMCID: PMC4569324 DOI: 10.1371/journal.pone.0137908] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/30/2015] [Accepted: 08/22/2015] [Indexed: 01/17/2023]

Analysis of the protein domain and domain architecture content in fungi and its application in the search of new antifungal targets. PLoS Comput Biol 2014;10:e1003733. [PMID: 25033262 PMCID: PMC4102429 DOI: 10.1371/journal.pcbi.1003733] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 12/11/2013] [Accepted: 06/04/2014] [Indexed: 01/25/2023]

Abstract

Over the past several years fungal infections have shown an increasing incidence in the susceptible population, and caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. Therefore, the identification of new potential antifungal targets is a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) by January 2013. The resulting list of core and exclusive domain and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. In addition, we also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes, and identified those domains and domain architectures only present in fungi. Secondly, we looked for information regarding fungal pathways in public repositories, where proteins containing promiscuous domains could be involved. Three pathways were identified as a result: lovastatin biosynthesis, xylan degradation and biosynthesis of siroheme. Finally, we classified a subset of the studied fungi in five groups depending on their occurrence in clinical samples. We then looked for exclusive domains in the groups that were more relevant clinically and determined which of them had the potential to bind small molecules. Overall, this study provides a comprehensive analysis of the available fungal proteomes and shows three approaches that can be used as a first step in the detection of new antifungal targets.

Some fungi have become pathogenic to plants and, to a lesser extent, to animals. Under certain conditions their presence in the human body can pose a threat to human health, especially for immunocompromised patients. Yet, some fungi can also infect healthy individuals. The low sensitivity of the antifungal drugs available, together with the clinically observed resistance of some fungi, raises the demand for new alternative treatments. Proteins are biological molecules which perform essential functions within living organisms. Many of those functions are attributed to the varying folded structure of each protein. These configurations are composed of functional units (also called domains), each one independently responsible for a fraction of the overall biological function. Understanding how the different block combinations are distributed across members of the same or similar families of organisms is important. For instance, exclusive domain combinations can hold particular acquired functions. Blocks displaying a high mobility can play major roles in the organism's survival. The biological goal of this study was to analyse the functional implications of protein domains and domain combinations in the available fungal proteomes. This information can be used to highlight proteins and pathways that could potentially be used as drug targets.

Terrapon N, Weiner J, Grath S, Moore AD, Bornberg-Bauer E. Rapid similarity search of proteins using alignments of domain arrangements. Bioinformatics 2013;30:274-81. [PMID: 23828785 DOI: 10.1093/bioinformatics/btt379] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/25/2023]

Syamaladevi DP, Joshi A, Sowdhamini R. An alignment-free domain architecture similarity search (ADASS) algorithm for inferring homology between multi-domain proteins. Bioinformation 2013;9:491-9. [PMID: 23861564 PMCID: PMC3705623 DOI: 10.6026/97320630009491] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/19/2012] [Revised: 01/01/2013] [Accepted: 01/02/2013] [Indexed: 11/23/2022]

Wang JJY, Bensmail H, Gao X. Multiple graph regularized protein domain ranking. BMC Bioinformatics 2012;13:307. [PMID: 23157331 PMCID: PMC3583823 DOI: 10.1186/1471-2105-13-307] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 05/22/2012] [Accepted: 10/29/2012] [Indexed: 11/10/2022]

Wang J, Gao X, Wang Q, Li Y. ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 2012;13 Suppl 7:S2. [PMID: 22594999 PMCID: PMC3348016 DOI: 10.1186/1471-2105-13-s7-s2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/29/2022]

Abstract

BACKGROUND

The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measure suffers from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy in retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database.

RESULTS

In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information.

CONCLUSIONS

Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.

Cohen-Gihon I, Fong JH, Sharan R, Nussinov R, Przytycka TM, Panchenko AR. Evolution of domain promiscuity in eukaryotic genomes: a perspective from the inferred ancestral domain architectures. Mol Biosyst 2011;7:784-92. [PMID: 21127809 PMCID: PMC3321261 DOI: 10.1039/c0mb00182a] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/21/2022]

FACT: functional annotation transfer between proteins with similar feature architectures. BMC Bioinformatics 2010;11:417. [PMID: 20696036 PMCID: PMC2931517 DOI: 10.1186/1471-2105-11-417] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 03/15/2010] [Accepted: 08/09/2010] [Indexed: 11/24/2022]

Abstract

Background

The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins undergoes a thorough experimental characterization. For the remainder, bioinformatics annotation tools are the only means to infer their function. Exploiting significant sequence similarities to already characterized proteins, commonly taken as evidence for homology, is the prevalent method to deduce functional equivalence. Such methods fail when homologs are too diverged, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore complementary approaches are required to identify functional equivalents.

Results

We present the Feature Architecture Comparison Tool (FACT, http://www.cibiv.at/FACT) to search for functionally equivalent proteins. FACT uses the similarity between feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarities, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC classified enzymes revealed that FACT, using the full feature set, outperformed the existing architecture-based approaches by identifying significantly more functional equivalents as highest scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest scoring protein of FACT is also the protein with the highest local sequence similarity, it is in 99% of the cases functionally equivalent to the query. We demonstrate the versatility of FACT by identifying a missing link in the yeast glutathione metabolism and also by searching for the human GolgA5 equivalent in Trypanosoma brucei.

Conclusions

FACT facilitates a quick and sensitive search for functionally equivalent proteins in entire proteomes. FACT is complementary to approaches using sequence similarity to identify proteins with the same function. Thus, FACT is particularly useful when functional equivalents need to be identified in evolutionarily distant species, or when functional equivalents are not homologous. The most reliable annotation transfers, however, are achieved when feature architecture similarity and sequence similarity are jointly taken into account.

Ranganathan S. Towards a career in bioinformatics. BMC Bioinformatics 2009;10 Suppl 15:S1. [PMID: 19958508 PMCID: PMC2788349 DOI: 10.1186/1471-2105-10-s15-s1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023]