Reference Citation Analysis

For: Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol 1998;284:1201-10. [PMID: 9837738 DOI: 10.1006/jmbi.1998.2221] [Citation(s) in RCA: 340] [Impact Index Per Article: 12.6] [Indexed: 11/22/2022]

Cited by Other Article(s)

Malkhed V, Mustyala KK, Potlapally SR, Vuruputuri U. Identification of novel leads applying in silico studies for Mycobacterium multidrug resistant (MMR) protein. J Biomol Struct Dyn 2013;32:1889-906. [DOI: 10.1080/07391102.2013.842185] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]

Lapadula WJ, Sánchez Puerta MV, Juri Ayub M. Revising the taxonomic distribution, origin and evolution of ribosome inactivating protein genes. PLoS One 2013;8:e72825. [PMID: 24039805 PMCID: PMC3764214 DOI: 10.1371/journal.pone.0072825] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 01/21/2013] [Accepted: 07/13/2013] [Indexed: 11/24/2022]

Belenki L, Sterzik V, Bohnert M. Similarity analysis of spectra obtained via reflectance spectrometry in legal medicine. Journal of Laboratory Automation 2013;19:110-8. [PMID: 23897013 DOI: 10.1177/2211068213496089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]

Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. Science International 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/03/2022]

Gonzalez MW, Spouge JL. Domain analysis of symbionts and hosts (DASH) in a genome-wide survey of pathogenic human viruses. BMC Res Notes 2013;6:209. [PMID: 23706066 PMCID: PMC3672079 DOI: 10.1186/1756-0500-6-209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2012] [Accepted: 05/17/2013] [Indexed: 11/10/2022]

Abstract

BACKGROUND

In the coevolution of viruses and their hosts, viruses often capture host genes, gaining advantageous functions (e.g. immune system control). Identifying functional similarities shared by viruses and their hosts can help decipher mechanisms of pathogenesis and accelerate virus-targeted drug and vaccine development. Cellular homologs in viruses are usually documented using pairwise-sequence comparison methods. Yet, pairwise-sequence searches have limited sensitivity resulting in poor identification of divergent homologies.

RESULTS

Methods based on profiles from multiple sequences provide a more sensitive alternative to identify similarities in host-pathogen systems. The present work describes a profile-based bioinformatics pipeline that we call the Domain Analysis of Symbionts and Hosts (DASH). DASH provides a web platform for the functional analysis of viral and host genomes. This study uses Human Herpesvirus 8 (HHV-8) as a model to validate the methodology. Our results indicate that HHV-8 shares at least 29% of its genes with humans (fourteen immunomodulatory and ten metabolic genes). DASH also suggests functions for fifty-one additional HHV-8 structural and metabolic proteins. We also perform two other comparative genomics studies of human viruses: (1) a broad survey of eleven viruses of disparate sizes and transcription strategies; and (2) a closer examination of forty-one viruses of the order Mononegavirales. In the survey, DASH detects human homologs in 4/5 DNA viruses. None of the non-retro-transcribing RNA viruses in the survey showed evidence of homology to humans. The order Mononegavirales are also non-retro-transcribing RNA viruses, however, and DASH found homology in 39/41 of them. Mononegaviruses display larger fractions of human similarities (up to 75%) than any of the other RNA or DNA viruses (up to 55% and 29% respectively).

CONCLUSIONS

We conclude that gene sharing probably occurs between humans and both DNA and RNA viruses, in viral genomes of differing sizes, regardless of transcription strategies. Our method (DASH) simultaneously analyzes the genomes of two interacting species thereby mining functional information to identify shared as well as exclusive domains to each organism. Our results validate our approach, showing that DASH has potential as a pipeline for making therapeutic discoveries in other host-symbiont systems. DASH results are available at http://tinyurl.com/spouge-dash.

Schuepbach T, Pagni M, Bridge A, Bougueleret L, Xenarios I, Cerutti L. pfsearchV3: a code acceleration and heuristic to search PROSITE profiles. Bioinformatics 2013;29:1215-7. [PMID: 23505298 PMCID: PMC3634184 DOI: 10.1093/bioinformatics/btt129] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]

Maulik U, Sarkar A. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels. PLoS One 2013;8:e46468. [PMID: 23457439 PMCID: PMC3574063 DOI: 10.1371/journal.pone.0046468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/11/2011] [Accepted: 09/04/2012] [Indexed: 11/18/2022]

Udaka K, Mamitsuka H, Nakaseko Y, Abe N. Prediction of MHC class I binding peptides by a query learning algorithm based on hidden Markov models. J Biol Phys 2013;28:183-94. [PMID: 23345768 DOI: 10.1023/a:1019931731519] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]

Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AFA, Finn RD. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res 2013;41:D70-82. [PMID: 23203985 PMCID: PMC3531169 DOI: 10.1093/nar/gks1265] [Citation(s) in RCA: 215] [Impact Index Per Article: 17.9] [Received: 08/31/2012] [Revised: 11/04/2012] [Accepted: 11/05/2012] [Indexed: 11/28/2022]

Joshi AG, Raghavender US, Sowdhamini R. Improved performance of sequence search approaches in remote homology detection. F1000Res 2013;2:93. [PMID: 25469226 PMCID: PMC4240247 DOI: 10.12688/f1000research.2-93.v2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 06/27/2014] [Indexed: 11/20/2022]

Vyas VK, Ukawala RD, Ghate M, Chintha C. Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci 2012. [PMID: 23204616 PMCID: PMC3507339 DOI: 10.4103/0250-474x.102537] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Indexed: 11/22/2022]

Shih CH, Chang CM, Lin YS, Lo WC, Hwang JK. Evolutionary information hidden in a single protein structure. Proteins 2012;80:1647-57. [DOI: 10.1002/prot.24058] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 08/22/2011] [Revised: 02/07/2012] [Accepted: 02/12/2012] [Indexed: 11/07/2022]

Hobiger K, Utesch T, Mroginski MA, Friedrich T. Coupling of Ci-VSP modules requires a combination of structure and electrostatics within the linker. Biophys J 2012;102:1313-22. [PMID: 22455914 DOI: 10.1016/j.bpj.2012.02.027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 11/05/2011] [Revised: 02/01/2012] [Accepted: 02/08/2012] [Indexed: 11/26/2022]

Gong YN, Chen GW, Shih SR. Characterization of subtypes of the influenza A hemagglutinin (HA) gene using profile hidden Markov models. Journal of Microbiology, Immunology, and Infection = Wei mian yu gan ran za zhi 2011;45:404-10. [PMID: 22197681 DOI: 10.1016/j.jmii.2011.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/29/2011] [Revised: 09/27/2011] [Accepted: 10/16/2011] [Indexed: 11/27/2022]

Hong Y, Kang J, Lee D, van Rossum DB. Adaptive GDDA-BLAST: fast and efficient algorithm for protein sequence embedding. PLoS One 2010;5:e13596. [PMID: 21042584 PMCID: PMC2962639 DOI: 10.1371/journal.pone.0013596] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2010] [Accepted: 09/28/2010] [Indexed: 11/28/2022]

Abstract

A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information that is now available. This problem is illustrated by the fact that most proteins lack comprehensive annotations, even when experimental evidence exists. We previously theorized that embedded-alignment profiles (simply "alignment profiles" hereafter) provide a quantitative method that is capable of relating the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature of alignment profiles lies in the interoperability of data format (e.g., alignment information, physio-chemical information, genomic information, etc.). Indeed, we have demonstrated that the Position Specific Scoring Matrices (PSSMs) are an informative M-dimension that is scored by quantitatively measuring the embedded or unmodified sequence alignments. Moreover, the information obtained from these alignments is informative, and remains so even in the "twilight zone" of sequence similarity (<25% identity). Although our previous embedding strategy was powerful, it suffered from contaminating alignments (embedded AND unmodified) and high computational costs. Herein, we describe the logic and algorithmic process for a heuristic embedding strategy named "Adaptive GDDA-BLAST." Adaptive GDDA-BLAST is, on average, up to 19 times faster than, but has similar sensitivity to our previous method. Further, data are provided to demonstrate the benefits of embedded-alignment measurements in terms of detecting structural homology in highly divergent protein sequences and isolating secondary structural elements of transmembrane and ankyrin-repeat domains. Together, these advances allow further exploration of the embedded alignment data space within sufficiently large data sets to eventually induce relevant statistical inferences. 
We show that sequence embedding could serve as one of the vehicles for measurement of low-identity alignments and for incorporation thereof into high-performance PSSM-based alignment profiles.

A novel secretory poly-cysteine and histidine-tailed metalloprotein (Ts-PCHTP) from Trichinella spiralis (Nematoda). PLoS One 2010;5:e13343. [PMID: 20967224 PMCID: PMC2954182 DOI: 10.1371/journal.pone.0013343] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/14/2010] [Accepted: 09/16/2010] [Indexed: 11/19/2022]

Abstract

Background

Trichinella spiralis is an unusual parasitic intracellular nematode causing dedifferentiation of the host myofiber. Trichinella proteomic analyses have identified proteins that act at the interface between the parasite and the host and are probably important for the infection and pathogenesis. Many parasitic proteins, including a number of metalloproteins are unique for the nematodes and trichinellids and therefore present good targets for future therapeutic developments. Furthermore, detailed information on such proteins and their function in the nematode organism would provide better understanding of the parasite - host interactions.

Methodology/Principal Findings

In this study we report the identification, biochemical characterization and localization of a novel poly-cysteine and histidine-tailed metalloprotein (Ts-PCHTP). The native Ts-PCHTP was purified from T. spiralis muscle larvae that were isolated from infected rats as a model system. The sequence analysis showed no homology with other proteins. Two unique poly-cysteine domains were found in the amino acid sequence of Ts-PCHTP. This protein is also the first reported natural histidine tailed protein. It was suggested that Ts-PCHTP has metal binding properties. Total Reflection X-ray Fluorescence (TXRF) assay revealed that it binds significant concentrations of iron, nickel and zinc at protein:metal ratio of about 1∶2. Immunohistochemical analysis showed that the Ts-PCHTP is localized in the cuticle and in all tissues of the larvae, but that it is not excreted outside the parasite.

Conclusions/Significance

Our data suggest that Ts-PCHTP is the first described member of a novel nematode poly-cysteine protein family and its function could be metal storage and/or transport. Since this protein family is unique for parasites from Superfamily Trichinelloidea its potential applications in diagnostics and treatment could be exploited in future.

Huynen MA, de Hollander M, Szklarczyk R. Mitochondrial proteome evolution and genetic disease. Biochim Biophys Acta Mol Basis Dis 2009;1792:1122-9. [DOI: 10.1016/j.bbadis.2009.03.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 12/30/2008] [Revised: 03/04/2009] [Accepted: 03/20/2009] [Indexed: 11/16/2022]

Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc Natl Acad Sci U S A 2009;106:20216-21. [PMID: 19875695 DOI: 10.1073/pnas.0909775106] [Citation(s) in RCA: 351] [Impact Index Per Article: 21.9] [Indexed: 12/16/2022]

Dlakić M. HHsvm: fast and accurate classification of profile-profile matches identified by HHsearch. Bioinformatics 2009;25:3071-6. [PMID: 19773335 DOI: 10.1093/bioinformatics/btp555] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Tångrot JE, Kågström B, Sauer UH. Accurate domain identification with structure-anchored hidden Markov models, saHMMs. Proteins 2009;76:343-52. [PMID: 19173309 DOI: 10.1002/prot.22349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]

Abstract

The ever increasing speed of DNA sequencing widens the discrepancy between the number of known gene products, and the knowledge of their function and structure. Proper annotation of protein sequences is therefore crucial if the missing information is to be deduced from sequence-based similarity comparisons. These comparisons become exceedingly difficult as the pairwise identities drop to very low values. To improve the accuracy of domain identification, we exploit the fact that the three-dimensional structures of domains are much more conserved than their sequences. Based on structure-anchored multiple sequence alignments of low identity homologues we constructed 850 structure-anchored hidden Markov models (saHMMs), each representing one domain family. Since the saHMMs are highly family specific, they can be used to assign a domain to its correct family and clearly distinguish it from domains belonging to other families, even within the same superfamily. This task is not trivial and becomes particularly difficult if the unknown domain is distantly related to the rest of the domain sequences within the family. In a search with full length protein sequences, harbouring at least one domain as defined by the structural classification of proteins database (SCOP), version 1.71, versus the saHMM database based on SCOP version 1.69, we achieve an accuracy of 99.0%. All of the few hits outside the family fall within the correct superfamily. Compared to Pfam_ls HMMs, the saHMMs obtain about 11% higher coverage. A comparison with BLAST and PSI-BLAST demonstrates that the saHMMs have consistently fewer errors per query at a given coverage. Within our recommended E-value range, the same is true for a comparison with SUPERFAMILY. Furthermore, we are able to annotate 232 proteins with 530 nonoverlapping domains belonging to 102 different domain families among human proteins labelled "unknown" in the NCBI protein database. 
Our results demonstrate that the saHMM database represents a versatile and reliable tool for identification of domains in protein sequences. With the aid of saHMMs, homology on the family level can be assigned, even for distantly related sequences. Due to the construction of the saHMMs, the hits they provide are always associated with high quality crystal structures. The saHMM database can be accessed via the FISH server at http://babel.ucmp.umu.se/fish/.

Nature of the protein universe. Proc Natl Acad Sci U S A 2009;106:11079-84. [PMID: 19541617 DOI: 10.1073/pnas.0905029106] [Citation(s) in RCA: 232] [Impact Index Per Article: 14.5] [Indexed: 11/18/2022]

Lee MM, Chan MK, Bundschuh R. SIB-BLAST: a web server for improved delineation of true and false positives in PSI-BLAST searches. Nucleic Acids Res 2009;37:W53-6. [PMID: 19429693 PMCID: PMC2703926 DOI: 10.1093/nar/gkp301] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]

Brandt BW, Heringa J. webPRC: the Profile Comparer for alignment-based searching of public domain databases. Nucleic Acids Res 2009;37:W48-52. [PMID: 19420063 PMCID: PMC2703954 DOI: 10.1093/nar/gkp279] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]

Koussounadis A, Redfern OC, Jones DT. Improving classification in protein structure databases using text mining. BMC Bioinformatics 2009;10:129. [PMID: 19416501 PMCID: PMC2688513 DOI: 10.1186/1471-2105-10-129] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/27/2008] [Accepted: 05/05/2009] [Indexed: 11/25/2022]

Abstract

BACKGROUND

The classification of protein domains in the CATH resource is primarily based on structural comparisons, sequence similarity and manual analysis. One of the main bottlenecks in the processing of new entries is the evaluation of 'borderline' cases by human curators with reference to the literature, and better tools for helping both expert and non-expert users quickly identify relevant functional information from text are urgently needed. A text based method for protein classification is presented, which complements the existing sequence and structure-based approaches, especially in cases exhibiting low similarity to existing members and requiring manual intervention. The method is based on the assumption that textual similarity between sets of documents relating to proteins reflects biological function similarities and can be exploited to make classification decisions.

RESULTS

An optimal strategy for the text comparisons was identified by using an established gold standard enzyme dataset. Filtering of the abstracts using a machine learning approach to discriminate sentences containing functional, structural and classification information that are relevant to the protein classification task improved performance. Testing this classification scheme on a dataset of 'borderline' protein domains that lack significant sequence or structure similarity to classified proteins showed that although, as expected, the structural similarity classifiers perform better on average, there is a significant benefit in incorporating text similarity in logistic regression models, indicating significant orthogonality in this additional information. Coverage was significantly increased especially at low error rates, which is important for routine classification tasks: 15.3% for the combined structure and text classifier compared to 10% for the structural classifier alone, at 10⁻³ error rate. Finally when only the highest scoring predictions were used to infer classification, an extra 4.2% of correct decisions were made by the combined classifier.

CONCLUSION

We have described a simple text based method to classify protein domains that demonstrates an improvement over existing methods. The method is unique in incorporating structural and text based classifiers directly and is particularly useful in cases where inconclusive evidence from sequence or structure similarity requires laborious manual classification.

Ray S, Bandyopadhyay S, Pal S. Combining Multisource Information Through Functional-Annotation-Based Weighting: Gene Function Prediction in Yeast. IEEE Trans Biomed Eng 2009;56:229-36. [DOI: 10.1109/tbme.2008.2005955] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]

Frech C, Kommenda M, Dorfer V, Kern T, Hintner H, Bauer JW, Onder K. Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis. BMC Bioinformatics 2009;10:21. [PMID: 19152684 PMCID: PMC2637843 DOI: 10.1186/1471-2105-10-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2008] [Accepted: 01/19/2009] [Indexed: 11/10/2022]

Goonesekere NC. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST. Adv Appl Bioinform Chem 2009;2:71-8. [PMID: 21918617 PMCID: PMC3169949 DOI: 10.2147/aabc.s5553] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/02/2022]

Immunogenicity in peptide-immunotherapy: from self/nonself to similar/dissimilar sequences. Advances in Experimental Medicine and Biology 2008;640:198-207. [PMID: 19065793 DOI: 10.1007/978-0-387-09789-3_15] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023]

Protein subfamily assignment using the Conserved Domain Database. BMC Res Notes 2008;1:114. [PMID: 19014584 PMCID: PMC2632666 DOI: 10.1186/1756-0500-1-114] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 09/08/2008] [Accepted: 11/14/2008] [Indexed: 11/10/2022]

Chen Z, Harb OS, Roos DS. In silico identification of specialized secretory-organelle proteins in apicomplexan parasites and in vivo validation in Toxoplasma gondii. PLoS One 2008;3:e3611. [PMID: 18974850 PMCID: PMC2575384 DOI: 10.1371/journal.pone.0003611] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 06/06/2008] [Accepted: 10/06/2008] [Indexed: 12/04/2022]

Frenkel ZM. Does Protein Relatedness Require Sequence Matching? Alignment via Networks in Sequence Space. J Biomol Struct Dyn 2008;26:215-22. [DOI: 10.1080/07391102.2008.10507237] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]

Cavallaro G, Decaria L, Rosato A. Genome-Based Analysis of Heme Biosynthesis and Uptake in Prokaryotic Systems. J Proteome Res 2008;7:4946-54. [DOI: 10.1021/pr8004309] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]

Phylogenetic profiles reveal evolutionary relationships within the "twilight zone" of sequence similarity. Proc Natl Acad Sci U S A 2008;105:13474-9. [PMID: 18765810 DOI: 10.1073/pnas.0803860105] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]

Bernsel A, Viklund H, Elofsson A. Remote homology detection of integral membrane proteins using conserved sequence features. Proteins 2008;71:1387-99. [PMID: 18076048 DOI: 10.1002/prot.21825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]

Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008;Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 761] [Impact Index Per Article: 44.8] [Indexed: 12/25/2022]

Lingner T, Meinicke P. Word correlation matrices for protein sequence analysis and remote homology detection. BMC Bioinformatics 2008;9:259. [PMID: 18522726 PMCID: PMC2438326 DOI: 10.1186/1471-2105-9-259] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 02/20/2008] [Accepted: 06/03/2008] [Indexed: 11/30/2022]

Eddy SR. A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol 2008;4:e1000069. [PMID: 18516236 PMCID: PMC2396288 DOI: 10.1371/journal.pcbi.1000069] [Citation(s) in RCA: 243] [Impact Index Per Article: 14.3] [Received: 12/05/2007] [Accepted: 03/26/2008] [Indexed: 11/19/2022]

Abstract

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.

Sequence database searches are a fundamental tool of molecular biology, enabling researchers to identify related sequences in other organisms, which often provides invaluable clues to the function and evolutionary history of genes. The power of database searches to detect more and more remote evolutionary relationships – essentially, to look back deeper in time – has improved steadily, with the adoption of more complex and realistic models. However, database searches require not just a realistic scoring model, but also the ability to distinguish good scores from bad ones – the ability to calculate the statistical significance of scores. For many models and scoring schemes, accurate statistical significance calculations have either involved expensive computational simulations, or not been feasible at all. Here, I introduce a probabilistic model of local sequence alignment that has readily predictable score statistics for position-specific profile scoring systems, and not just for traditional optimal alignment scores, but also for more powerful log-likelihood ratio scores derived in a full probabilistic inference framework. These results remove one of the main obstacles that have impeded the use of more powerful and biologically realistic statistical inference methods in sequence homology searches.
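The score statistics described in this abstract can be illustrated with a minimal sketch. It assumes bit scores, takes λ = log 2 from the abstract, and treats the location parameters `mu` and `tau` as hypothetical values that would have to be calibrated for each model; the function names are illustrative only and do not come from any published software.

```python
import math

# Slope constant from the abstract: lambda = log 2 for scores measured in bits.
LAMBDA = math.log(2)


def gumbel_pvalue(score, mu, lam=LAMBDA):
    """P(S >= score) for a Gumbel-distributed optimal alignment ("Viterbi") bit score.

    mu is a hypothetical location parameter; it must be calibrated per model.
    """
    # Survival function of the Gumbel distribution: 1 - exp(-exp(-lam * (score - mu))).
    return -math.expm1(-math.exp(-lam * (score - mu)))


def exp_tail_pvalue(score, tau, lam=LAMBDA):
    """P(S >= score) under an exponential high-scoring tail ("Forward" scores).

    tau is a hypothetical location parameter; the formula only describes the tail,
    so the result is capped at 1 for scores below tau.
    """
    return min(1.0, math.exp(-lam * (score - tau)))


def evalue(pvalue, n_targets):
    """Expected number of hits at least this good in a search of n_targets sequences."""
    return pvalue * n_targets


if __name__ == "__main__":
    # With lambda = log 2, each extra bit of score halves the tail P-value.
    p = exp_tail_pvalue(10.0, tau=0.0)
    print(p, evalue(p, n_targets=1024))
```

With λ fixed in this way, only the location parameter remains to be estimated, which is why the abstract describes the resulting E-value determination as efficient.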

Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008;Chapter 5:Unit-5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1820] [Impact Index Per Article: 107.1] [Indexed: 12/28/2022]

Gholami A, Kassis R, Real E, Delmas O, Guadagnini S, Larrous F, Obach D, Prevost MC, Jacob Y, Bourhy H. Mitochondrial dysfunction in lyssavirus-induced apoptosis. J Virol 2008;82:4774-84. [PMID: 18321977 PMCID: PMC2346764 DOI: 10.1128/jvi.02651-07] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 12/13/2007] [Accepted: 02/22/2008] [Indexed: 12/25/2022]

Lee B, Lee D. DAhunter: a web-based server that identifies homologous proteins by comparing domain architecture. Nucleic Acids Res 2008;36:W60-4. [PMID: 18411203 PMCID: PMC2447808 DOI: 10.1093/nar/gkn172] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Lee MM, Chan MK, Bundschuh R. Simple is beautiful: a straightforward approach to improve the delineation of true and false positives in PSI-BLAST searches. Bioinformatics 2008;24:1339-43. [DOI: 10.1093/bioinformatics/btn130] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Durand PM, Coetzer TL. Utility of computational methods to identify the apoptosis machinery in unicellular eukaryotes. Bioinform Biol Insights 2008;2:101-17. [PMID: 19812769 PMCID: PMC2735952 DOI: 10.4137/bbi.s430] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

McGuffin LJ. Aligning sequences to structures. Methods Mol Biol 2008;413:61-90. [PMID: 18075162 DOI: 10.1007/978-1-59745-574-9_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/24/2023]

Do CB, Katoh K. Protein multiple sequence alignment. Methods Mol Biol 2008;484:379-413. [PMID: 18592193 DOI: 10.1007/978-1-59745-398-1_25] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]

Marsden RL, Orengo CA. The Classification of Protein Domains. Bioinformatics 2008;453:123-46. [DOI: 10.1007/978-1-60327-429-6_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Protein Structure Prediction. Bioinformatics 2008;453:33-85. [DOI: 10.1007/978-1-60327-429-6_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Lee MM, Bundschuh R, Chan MK. Distant homology detection using a LEngth and STructure-based sequence Alignment Tool (LESTAT). Proteins 2007;71:1409-19. [DOI: 10.1002/prot.21830] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Frenkel ZM, Trifonov EN. From protein sequence space to elementary protein modules. Gene 2007;408:64-71. [PMID: 18022768 DOI: 10.1016/j.gene.2007.10.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2007] [Revised: 08/14/2007] [Accepted: 10/15/2007] [Indexed: 11/17/2022]

Goonesekere NCW, Lee B. Context-specific amino acid substitution matrices and their use in the detection of protein homologs. Proteins 2007;71:910-9. [DOI: 10.1002/prot.21775] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

Bernardes JS, Dávila AMR, Costa VS, Zaverucha G. Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics 2007;8:435. [PMID: 17999748 PMCID: PMC2245980 DOI: 10.1186/1471-2105-8-435] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2007] [Accepted: 11/09/2007] [Indexed: 11/14/2022] Open