

Cited by Other Article(s)

Kamat G, Shan M, Gutman R. Bayesian record linkage with variables in one file. Stat Med 2023;42:4931-4951. [PMID: 37652076 DOI: 10.1002/sim.9894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 06/12/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]

Andreella A, De Santis R, Vesely A, Finos L. Procrustes-based distances for exploring between-matrices similarity. Stat Methods Appl 2023. [DOI: 10.1007/s10260-023-00689-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]

Andreella A, Finos L. Procrustes Analysis for High-Dimensional Data. Psychometrika 2022;87:1422-1438. [PMID: 35583747 PMCID: PMC9636303 DOI: 10.1007/s11336-022-09859-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/05/2021] [Revised: 03/01/2022] [Indexed: 05/31/2023]

Improving Wildlife Population Inference Using Aerial Imagery and Entity Resolution. J Agric Biol Environ Stat 2022. [DOI: 10.1007/s13253-021-00484-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]

Fallaize CJ, Green PJ, Mardia KV, Barber S. Bayesian protein sequence and structure alignment. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Kasmi Y, Khataby K, Souiri A, Ennaji MM. Coronaviridae: 100,000 Years of Emergence and Reemergence. Emerging and Reemerging Viral Pathogens 2020. [PMCID: PMC7149750 DOI: 10.1016/b978-0-12-819400-3.00007-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 02/05/2023]

Zanella G. Informed Proposals for Local MCMC in Discrete Spaces. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1585255] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 10/27/2022]

Kent JT, Ganeiber AM, Mardia KV. A New Unified Approach for the Simulation of a Wide Class of Directional Distributions. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2017.1390468] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/18/2022]

Eltzner B, Huckemann S, Mardia KV. Torus principal component analysis with applications to RNA structure. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1115] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/19/2022]

Sikaroudi AE, Welch DA, Woehl TJ, Faller R, Evans JE, Browning ND, Park C. Directional Statistics of Preferential Orientations of Two Shapes in Their Aggregate and Its Application to Nanoparticle Aggregation. Technometrics 2018. [DOI: 10.1080/00401706.2017.1366949] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]

Ejlali N, Faghihi MR, Sadeghi M. Bayesian comparison of protein structures using partial Procrustes distance. Stat Appl Genet Mol Biol 2017;16:243-257. [PMID: 28862992 DOI: 10.1515/sagmb-2016-0014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]

Sadinle M. Bayesian Estimation of Bipartite Matchings for Record Linkage. J Am Stat Assoc 2017. [DOI: 10.1080/01621459.2016.1148612] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]

Bayesian selection of graphical regulatory models. Int J Approx Reason 2016. [DOI: 10.1016/j.ijar.2016.05.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]

Herman JL, Novák Á, Lyngsø R, Szabó A, Miklós I, Hein J. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs. BMC Bioinformatics 2015;16:108. [PMID: 25888064 PMCID: PMC4395974 DOI: 10.1186/s12859-015-0516-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/24/2014] [Accepted: 02/24/2015] [Indexed: 11/30/2022]

Abstract

BACKGROUND

A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment.

RESULTS

In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased.

CONCLUSIONS

The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.
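The idea in this abstract, that a DAG built from sampled alignments encodes more alignments than were sampled, can be illustrated with a minimal Python sketch. This is not the WeaveAlign implementation: the column labels ("A" to "F"), the `START`/`END` sentinels, and the helper names `build_dag` and `count_paths` are all hypothetical, and each column is treated as an opaque hashable label rather than a real alignment column.

```python
from collections import Counter, defaultdict
from functools import lru_cache

# Hypothetical sentinel nodes marking the start and end of every alignment.
START, END = "<start>", "<end>"

def build_dag(samples):
    """Build a column DAG from sampled alignments.

    Each sample is a list of hashable column labels (an assumption made
    for this toy example). Returns the edge map and the empirical
    frequency of each column across the samples.
    """
    counts = Counter()
    edges = defaultdict(set)
    for cols in samples:
        counts.update(cols)
        path = [START, *cols, END]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)  # consecutive columns become a directed edge
    freqs = {c: n / len(samples) for c, n in counts.items()}
    return edges, freqs

def count_paths(edges):
    """Count START-to-END paths, i.e. the alignments encoded by the DAG."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == END:
            return 1
        return sum(paths_from(nxt) for nxt in edges[node])
    return paths_from(START)

# Two toy samples that share columns "A" and "D".
samples = [["A", "B", "D", "E"], ["A", "C", "D", "F"]]
edges, freqs = build_dag(samples)
n_paths = count_paths(edges)
print(n_paths, freqs)
```

Here two sampled alignments yield four paths, because the shared column "D" lets the first half of one sample be combined with the second half of the other.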


Najibi S, Faghihi M, Golalizadeh M, Arab S. Bayesian alignment of proteins via Delaunay tetrahedralization. J Appl Stat 2015. [DOI: 10.1080/02664763.2014.995605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]

Herman JL, Challis CJ, Novák Á, Hein J, Schmidler SC. Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 2014;31:2251-66. [PMID: 24899668 PMCID: PMC4137710 DOI: 10.1093/molbev/msu184] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/29/2022]

Shape and object data analysis. Biom J 2014;56:758-60. [DOI: 10.1002/bimj.201300220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/05/2013] [Revised: 01/23/2014] [Accepted: 01/23/2014] [Indexed: 11/07/2022]

Kent JT. Contribution to the Discussion of the Paper Geodesic Monte Carlo on Embedded Manifolds. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Rodriguez A, Schmidler SC. Bayesian protein structure alignment. Ann Appl Stat 2014;8:2068-2095. [PMID: 26925188 DOI: 10.1214/14-aoas780] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]

Titterington DM. Biometrika highlights from volume 28 onwards. Biometrika 2013. [DOI: 10.1093/biomet/ass076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/14/2023]

Su J, Srivastava A, Huffer F. Detection, classification and estimation of individual shapes in 2D and 3D point clouds. Comput Stat Data Anal 2013. [DOI: 10.1016/j.csda.2012.09.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]

Mardia KV. Statistical approaches to three key challenges in protein structural bioinformatics. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/rssc.12003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]

Mardia KV, Fallaize CJ, Barber S, Jackson RM, Theobald DL. Bayesian alignment of similarity shapes. Ann Appl Stat 2013;7:989-1009. [PMID: 24052809 PMCID: PMC3774796 DOI: 10.1214/12-aoas615] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

Mardia KV, Petty EM, Taylor CC. Matching markers and unlabeled configurations in protein gels. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]

Challis CJ, Schmidler SC. A stochastic evolutionary model for protein structure alignment and phylogeny. Mol Biol Evol 2012;29:3575-87. [PMID: 22723302 DOI: 10.1093/molbev/mss167] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]

Czogiel I, Dryden IL, Brignell CJ. Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment. Ann Appl Stat 2011. [DOI: 10.1214/11-aoas486] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

Melnykov V, Maitra R, Nettleton D. Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis. Sankhya B 2011. [DOI: 10.1007/s13571-011-0016-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]

Tancredi A, Liseo B. A hierarchical Bayesian approach to record linkage and population size problems. Ann Appl Stat 2011. [DOI: 10.1214/10-aoas447] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]

Xia H, Ding Y, Mallick BK. Bayesian hierarchical model for combining misaligned two-resolution metrology data. IIE Trans 2011. [DOI: 10.1080/0740817x.2010.521804] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 10/18/2022]

Mardia KV, Nyirongo VB, Fallaize CJ, Barber S, Jackson RM. Hierarchical Bayesian modeling of pharmacophores in bioinformatics. Biometrics 2010;67:611-9. [PMID: 20618307 DOI: 10.1111/j.1541-0420.2010.01460.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]

Mardia KV. Bayesian analysis for bivariate von Mises distributions. J Appl Stat 2010. [DOI: 10.1080/02664760903551267] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]

Micheas AC, Peng Y. Bayesian Procrustes analysis with applications to hydrology. J Appl Stat 2009. [DOI: 10.1080/02664760802653560] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]

Kayano M, Konishi S. Functional principal component analysis via regularized Gaussian basis expansions and its application to unbalanced data. J Stat Plan Inference 2009. [DOI: 10.1016/j.jspi.2008.11.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]

Xie L, Xie L, Bourne PE. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009;25:i305-12. [PMID: 19478004 PMCID: PMC2687974 DOI: 10.1093/bioinformatics/btp220] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022]

Abstract

Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or depends on the bound ligand, which limits these approaches. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.
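To make the significance calculation concrete, here is a minimal Python sketch of turning a match score into a p-value under an extreme value distribution. The abstract does not say which EVD family or parameterization SOIPPA uses, so the Gumbel (type I) form, the function name `gumbel_p_value`, and the location and scale parameters `mu` and `beta` are all assumptions made for illustration.

```python
import math

def gumbel_p_value(score, mu, beta):
    """Upper-tail probability P(S >= score) under a Gumbel distribution.

    mu (location) and beta (scale) are hypothetical fitted parameters;
    the source abstract does not specify them or the EVD family.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    if z < -30:
        # exp(-exp(30)) underflows to 0, so the tail probability is 1.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny p-values.
    return -math.expm1(-math.exp(-z))

# A score far above the fitted location yields a small p-value.
print(gumbel_p_value(20.0, mu=5.0, beta=2.0))
```

Evaluating a closed-form tail probability costs a handful of floating-point operations per score, which suggests why a parametric model of this kind can be far cheaper than a non-parametric estimate.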


Habeck M. Generation of three-dimensional random rotations in fitting and matching problems. Comput Stat 2009. [DOI: 10.1007/s00180-009-0156-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]

Hamelryck T. Probabilistic models and machine learning in structural bioinformatics. Stat Methods Med Res 2009;18:505-26. [PMID: 19153168 DOI: 10.1177/0962280208099492] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]

Ruffieux Y, Green PJ. Alignment of Multiple Configurations Using Hierarchical Models. J Comput Graph Stat 2009. [DOI: 10.1198/jcgs.2009.07048] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]

Mardia KV, Nyirongo VB. Simulating virtual protein Calpha traces with applications. J Comput Biol 2008;15:1209-20. [PMID: 18973436 DOI: 10.1089/cmb.2007.0092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]

Marín JM, Nieto C. Spatial Matching of Multiple Configurations of Points with a Bioinformatics Application. Commun Stat Theory Methods 2008. [DOI: 10.1080/03610920701759669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]

Liu J, Yu W, Wu B, Zhao H. Bayesian Mass Spectra Peak Alignment from Mass Charge Ratios. Cancer Inform 2008. [DOI: 10.1177/117693510800600006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Mardia KV. Comment. J Am Stat Assoc 2007. [DOI: 10.1198/016214507000001210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]

Davies JR, Jackson RM, Mardia KV, Taylor CC. The Poisson Index: a new probabilistic model for protein–ligand binding site similarity. Bioinformatics 2007;23:3001-8. [PMID: 17893083 DOI: 10.1093/bioinformatics/btm470] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]

Dryden IL, Hirst JD, Melville JL. Statistical analysis of unlabeled point sets: comparing molecules in chemoinformatics. Biometrics 2007;63:237-51. [PMID: 17447950 DOI: 10.1111/j.1541-0420.2006.00622.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]

Bayesian refinement of protein functional site matching. BMC Bioinformatics 2007;8:257. [PMID: 17640336 PMCID: PMC1940029 DOI: 10.1186/1471-2105-8-257] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/24/2006] [Accepted: 07/17/2007] [Indexed: 11/21/2022]

Abstract

Background

Matching functional sites is a key problem for the understanding of protein function and evolution. The commonly used graph theoretic approach, and other related approaches, require adjustment of a matching distance threshold a priori according to the noise in atomic positions. This is difficult to pre-determine when matching sites related by varying evolutionary distances and crystallographic precision. Furthermore, sometimes the graph method is unable to identify alternative but important solutions in the neighbourhood of the distance based solution because of strict distance constraints. We consider the Bayesian approach to improve graph based solutions. In principle this approach applies to other methods with strict distance matching constraints. The Bayesian method can flexibly incorporate all types of prior information on specific binding sites (e.g. amino acid types) in contrast to combinatorial formulations.

Results

We present a new meta-algorithm for matching protein functional sites (active sites and ligand binding sites) based on an initial graph matching followed by refinement using a Markov chain Monte Carlo (MCMC) procedure. This procedure is an innovative extension to our recent work. The method accounts for the 3-dimensional structure of the site as well as the physico-chemical properties of the constituent amino acids. The MCMC procedure can lead to a significant increase in the number of significant matches compared to the graph method as measured independently by rigorously derived p-values.

Conclusion

The MCMC refinement step is able to significantly improve graph-based matches. We apply the method to matching NAD(P)(H) binding sites within single Rossmann fold families, between different families in the same superfamily, and in different folds. Within families, sites are often well conserved, but there are examples where significant shape-based matches do not retain similar amino acid chemistry, indicating that even within families the same ligand may be bound using substantially different physico-chemistry. We also show that the procedure finds significant matches between binding sites for the same co-factor in different families and different folds.
