Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012;13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022] Open

For:	Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012;13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022] Open

Number

Cited by Other Article(s)

Neuwald AF, Kolaczkowski BD, Altschul SF. eCOMPASS: evaluative comparison of multiple protein alignments by statistical score. Bioinformatics 2021;37:3456-3463. [PMID: 33983436 PMCID: PMC8545322 DOI: 10.1093/bioinformatics/btab374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2020] [Revised: 03/31/2021] [Accepted: 05/12/2021] [Indexed: 11/21/2022] Open

Pagnuco IA, Revuelta MV, Bondino HG, Brun M, ten Have A. HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold. PLoS One 2018;13:e0193757. [PMID: 29579071 PMCID: PMC5868777 DOI: 10.1371/journal.pone.0193757] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Accepted: 02/04/2018] [Indexed: 11/19/2022] Open

Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016;12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016] [Indexed: 11/25/2022] Open

Abstract

Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.

Collapse

Neuwald AF. Gleaning structural and functional information from correlations in protein multiple sequence alignments. Curr Opin Struct Biol 2016;38:1-8. [PMID: 27179293 DOI: 10.1016/j.sbi.2016.04.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Revised: 04/28/2016] [Accepted: 04/29/2016] [Indexed: 10/24/2022]

Cao L, Graauw MD, Yan K, Winkel L, Verbeek FJ. Hierarchical classification strategy for Phenotype extraction from epidermal growth factor receptor endocytosis screening. BMC Bioinformatics 2016;17:196. [PMID: 27142862 PMCID: PMC4855371 DOI: 10.1186/s12859-016-1053-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2015] [Accepted: 04/13/2016] [Indexed: 11/29/2022] Open

Abstract

BACKGROUND

Endocytosis is regarded as a mechanism of attenuating the epidermal growth factor receptor (EGFR) signaling and of receptor degradation. There is increasing evidence becoming available showing that breast cancer progression is associated with a defect in EGFR endocytosis. In order to find related Ribonucleic acid (RNA) regulators in this process, high-throughput imaging with fluorescent markers is used to visualize the complex EGFR endocytosis process. Subsequently a dedicated automatic image and data analysis system is developed and applied to extract the phenotype measurement and distinguish different developmental episodes from a huge amount of images acquired through high-throughput imaging. For the image analysis, a phenotype measurement quantifies the important image information into distinct features or measurements. Therefore, the manner in which prominent measurements are chosen to represent the dynamics of the EGFR process becomes a crucial step for the identification of the phenotype. In the subsequent data analysis, classification is used to categorize each observation by making use of all prominent measurements obtained from image analysis. Therefore, a better construction for a classification strategy will support to raise the performance level in our image and data analysis system.

RESULTS

In this paper, we illustrate an integrated analysis method for EGFR signalling through image analysis of microscopy images. Sophisticated wavelet-based texture measurements are used to obtain a good description of the characteristic stages in the EGFR signalling. A hierarchical classification strategy is designed to improve the recognition of phenotypic episodes of EGFR during endocytosis. Different strategies for normalization, feature selection and classification are evaluated.

CONCLUSIONS

The results of performance assessment clearly demonstrate that our hierarchical classification scheme combined with a selected set of features provides a notable improvement in the temporal analysis of EGFR endocytosis. Moreover, it is shown that the addition of the wavelet-based texture features contributes to this improvement. Our workflow can be applied to drug discovery to analyze defected EGFR endocytosis processes.

Collapse

Neuwald AF, Altschul SF. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties. PLoS Comput Biol 2016;12:e1004936. [PMID: 27192614 PMCID: PMC4871425 DOI: 10.1371/journal.pcbi.1004936] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2015] [Accepted: 04/24/2016] [Indexed: 11/19/2022] Open

Abstract

We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a "top-down" strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins' structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO's superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/.

Collapse

Neuwald AF. Evaluating, comparing, and interpreting protein domain hierarchies. J Comput Biol 2014;21:287-302. [PMID: 24559108 DOI: 10.1089/cmb.2013.0098] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Neuwald AF. A Bayesian sampler for optimization of protein domain hierarchies. J Comput Biol 2014;21:269-86. [PMID: 24494927 DOI: 10.1089/cmb.2013.0099] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014;16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Lu S, Marchler GH, Song JS, Thanki N, Yamashita RA, Zhang D, Bryant SH. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res 2012. [PMID: 23197659 PMCID: PMC3531192 DOI: 10.1093/nar/gks1243] [Citation(s) in RCA: 649] [Impact Index Per Article: 54.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open