Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Coin L, Bateman A, Durbin R. Enhanced protein domain discovery using taxonomy. BMC Bioinformatics 2004;5:56. [PMID: 15137915 PMCID: PMC434490 DOI: 10.1186/1471-2105-5-56] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/05/2004] [Accepted: 05/11/2004] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Karlin DG. WIV, a protein domain found in a wide number of arthropod viruses, which probably facilitates infection. J Gen Virol 2024;105. [PMID: 38193819 DOI: 10.1099/jgv.0.001948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024] Open

Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018;14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]

Neumann RS, Kumar S, Haverkamp THA, Shalchian-Tabrizi K. BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data. BMC Bioinformatics 2014;15:128. [PMID: 24885091 PMCID: PMC4062517 DOI: 10.1186/1471-2105-15-128] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/07/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022] Open

Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res 2013;42:D222-30. [PMID: 24288371 PMCID: PMC3965110 DOI: 10.1093/nar/gkt1223] [Citation(s) in RCA: 4306] [Impact Index Per Article: 391.5] [Indexed: 01/17/2023] Open

Terrapon N, Gascuel O, Maréchal E, Bréhélin L. Fitting hidden Markov models of protein domains to a target species: application to Plasmodium falciparum. BMC Bioinformatics 2012;13:67. [PMID: 22548871 PMCID: PMC3434054 DOI: 10.1186/1471-2105-13-67] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/24/2011] [Accepted: 05/01/2012] [Indexed: 01/12/2023] Open

Abstract

BACKGROUND

Hidden Markov Models (HMMs) are a powerful tool for protein domain identification. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in newly sequenced organisms. In Pfam, each domain family is represented by a curated multiple sequence alignment from which a profile HMM is built. In spite of their high specificity, HMMs may lack sensitivity when searching for domains in divergent organisms. This is particularly the case for species with a biased amino-acid composition, such as P. falciparum, the main causal agent of human malaria. In this context, fitting HMMs to the specificities of the target proteome can help identify additional domains.

RESULTS

Using P. falciparum as an example, we compare approaches that have been proposed for this problem, and present two alternative methods. Because previous attempts strongly rely on known domain occurrences in the target species or its close relatives, they mainly improve the detection of domains which belong to already identified families. Our methods learn global correction rules that adjust amino-acid distributions associated with the match states of HMMs. These rules are applied to all match states of the whole HMM library, thus enabling the detection of domains from previously absent families. Additionally, we propose a procedure to estimate the proportion of false positives among the newly discovered domains. Starting with the Pfam standard library, we build several new libraries with the different HMM-fitting approaches. These libraries are first used to detect new domain occurrences with low E-values. Second, by applying the Co-Occurrence Domain Discovery (CODD) procedure we have recently proposed, the libraries are further used to identify likely occurrences among potential domains with higher E-values.

CONCLUSION

We show that the new approaches allow identification of several domain families previously absent in the P. falciparum proteome and the Apicomplexa phylum, and identify many domains that are not detected by previous approaches. In terms of the number of newly discovered domains, the new approaches outperform the previous ones when no close species are available or when they are used to identify likely occurrences among potential domains with high E-values. All predictions on P. falciparum have been integrated into a dedicated website which pools all known/new annotations of protein domains and functions for this organism. Software implementing the two proposed approaches is available at the same address: http://www.lirmm.fr/~terrapon/HMMfit/

Charoensawan V, Wilson D, Teichmann SA. Lineage-specific expansion of DNA-binding transcription factor families. Trends Genet 2010;26:388-93. [PMID: 20675012 PMCID: PMC2937223 DOI: 10.1016/j.tig.2010.06.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 02/18/2010] [Revised: 06/11/2010] [Accepted: 06/11/2010] [Indexed: 11/06/2022]

Charoensawan V, Wilson D, Teichmann SA. Genomic repertoires of DNA-binding transcription factors across the tree of life. Nucleic Acids Res 2010;38:7364-77. [PMID: 20675356 PMCID: PMC2995046 DOI: 10.1093/nar/gkq617] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 11/14/2022] Open

Hulo N, Bairoch A, Bulliard V, Cerutti L, Cuche BA, de Castro E, Lachaize C, Langendijk-Genevaux PS, Sigrist CJA. The 20 years of PROSITE. Nucleic Acids Res 2007;36:D245-9. [PMID: 18003654 PMCID: PMC2238851 DOI: 10.1093/nar/gkm977] [Citation(s) in RCA: 326] [Impact Index Per Article: 19.2] [Indexed: 11/18/2022] Open

Boekhorst J, Snel B. Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties. BMC Bioinformatics 2007;8:356. [PMID: 17888146 PMCID: PMC2048517 DOI: 10.1186/1471-2105-8-356] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 11/10/2022] Open

Cheng J. DOMAC: an accurate, hybrid protein domain prediction server. Nucleic Acids Res 2007;35:W354-6. [PMID: 17553833 PMCID: PMC1933197 DOI: 10.1093/nar/gkm390] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] Open

Novatchkova M, Schneider G, Fritz R, Eisenhaber F, Schleiffer A. DOUTfinder--identification of distant domain outliers using subsignificant sequence similarity. Nucleic Acids Res 2006;34:W214-8. [PMID: 16844996 PMCID: PMC1538801 DOI: 10.1093/nar/gkl332] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022] Open

Wistrand M, Sonnhammer ELL. Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics 2005;6:99. [PMID: 15831105 PMCID: PMC1097716 DOI: 10.1186/1471-2105-6-99] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 02/01/2005] [Accepted: 04/15/2005] [Indexed: 11/24/2022] Open

Galperin MY, Koonin EV. 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 2004;32:5452-63. [PMID: 15479782 PMCID: PMC524295 DOI: 10.1093/nar/gkh885] [Citation(s) in RCA: 298] [Impact Index Per Article: 14.9] [Indexed: 11/14/2022] Open