1
Li P, Jiang X, Kambhamettu C, Shatkay H. Compound image segmentation of published biomedical figures. Bioinformatics 2018; 34:1192-1199. [PMID: 29040394 DOI: 10.1093/bioinformatics/btx611] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/10/2017] [Accepted: 09/22/2017] [Indexed: 12/28/2022] Open
Abstract
Motivation: Images convey essential information in biomedical publications. As such, there is a growing interest within the bio-curation and the bio-databases communities, to store images within publications as evidence for biomedical processes and for experimental results. However, many of the images in biomedical publications are compound images consisting of multiple panels, where each individual panel potentially conveys a different type of information. Segmenting such images into constituent panels is an essential first step toward utilizing images. Results: In this article, we develop a new compound image segmentation system, FigSplit, which is based on Connected Component Analysis. To overcome shortcomings typically manifested by existing methods, we develop a quality assessment step for evaluating and modifying segmentations. Two methods are proposed to re-segment the images if the initial segmentation is inaccurate. Experimental results show the effectiveness of our method compared with other methods. Availability and implementation: The system is publicly available for use at: https://www.eecis.udel.edu/~compbio/FigSplit. The code is available upon request. Contact: shatkay@udel.edu. Supplementary information: Supplementary data are available online at Bioinformatics.
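The abstract above names Connected Component Analysis as the basis of FigSplit but gives no implementation details. As a rough illustration only (not the authors' code), the sketch below labels 4-connected foreground regions in a small binary grid and returns their bounding boxes, which is the kind of primitive a panel splitter could build on. The function name `label_components`, the choice of 4-connectivity, and the toy grid are assumptions made for this example.

```python
from collections import deque

def label_components(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions of truthy cells, in row-major order of discovery."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy "figure" (hypothetical): two panels separated by a blank column.
figure = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
panels = label_components(figure)
```

Each bounding box is a candidate panel; a quality assessment step such as the one the abstract mentions would then decide whether the boxes form a plausible layout.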
Affiliation(s)
- Pengyuan Li
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
- Xiangying Jiang
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
- Chandra Kambhamettu
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
- Hagit Shatkay
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
2
Clark SA, Tronrud DE, Karplus PA. Residue-level global and local ensemble-ensemble comparisons of protein domains. Protein Sci 2015; 24:1528-42. [PMID: 26032515 DOI: 10.1002/pro.2714] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/07/2015] [Accepted: 05/13/2015] [Indexed: 02/03/2023]
Abstract
Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons.
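For a single atom, and assuming both ensembles are already superimposed, the four quantities the abstract lists for eeGLOBAL could be computed roughly as follows. This is an illustrative sketch, not the ENSEMBLATOR's code: the function name `atom_variations` and the use of root-mean-square pairwise distance as the "variation" measure are assumptions, since the abstract does not define the measure.

```python
import math
from itertools import combinations, product

def _rms(distances):
    """Root mean square of a list of distances (0.0 if the list is empty)."""
    if not distances:
        return 0.0
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def atom_variations(ens_a, ens_b):
    """ens_a, ens_b: lists of (x, y, z) positions of one atom, one per model.

    Returns (intra_a, intra_b, inter, closest_approach).
    """
    intra_a = _rms([math.dist(p, q) for p, q in combinations(ens_a, 2)])
    intra_b = _rms([math.dist(p, q) for p, q in combinations(ens_b, 2)])
    # All cross-ensemble pairs drive both the inter-ensemble variation
    # and the closest approach between members of the two ensembles.
    cross = [math.dist(p, q) for p, q in product(ens_a, ens_b)]
    return intra_a, intra_b, _rms(cross), min(cross)

# Hypothetical positions of one atom in two two-model ensembles.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
intra_a, intra_b, inter, closest = atom_variations(a, b)
```

In this toy case the closest approach (3.0) is larger than either intra-ensemble variation (1.0), which is the kind of pattern that would flag a real difference between ensembles rather than ordinary spread.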
Affiliation(s)
- Sarah A Clark
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, 97331
- Dale E Tronrud
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, 97331
- P Andrew Karplus
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, 97331
3
Gapsys V, de Groot BL. Optimal superpositioning of flexible molecule ensembles. Biophys J 2013; 104:196-207. [PMID: 23332072 DOI: 10.1016/j.bpj.2012.11.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 08/03/2012] [Revised: 10/31/2012] [Accepted: 11/01/2012] [Indexed: 12/01/2022] Open
Abstract
Analysis of the internal dynamics of a biological molecule requires the successful removal of overall translation and rotation. Particularly for flexible or intrinsically disordered peptides, this is a challenging task due to the absence of a well-defined reference structure that could be used for superpositioning. In this work, we started the analysis with a widely known formulation of an objective for the problem of superimposing a set of multiple molecules as variance minimization over an ensemble. A negative effect of this superpositioning method is the introduction of ambiguous rotations, where different rotation matrices may be applied to structurally similar molecules. We developed two algorithms to resolve the suboptimal rotations. The first approach minimizes the variance together with the distance of a structure to a preceding molecule in the ensemble. The second algorithm seeks minimal variance together with the distance to the nearest neighbors of each structure. The newly developed methods were applied to molecular-dynamics trajectories and normal-mode ensembles of the Aβ peptide, RS peptide, and lysozyme. These new (to our knowledge) superpositioning methods combine the benefits of variance and distance between nearest-neighbor(s) minimization, providing a solution for the analysis of intrinsic motions of flexible molecules and resolving ambiguous rotations.
Affiliation(s)
- Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
4
Ramanathan A, Savol AJ, Agarwal PK, Chennubhotla CS. Event detection and sub-state discovery from biomolecular simulations using higher-order statistics: application to enzyme adenylate kinase. Proteins 2012; 80:2536-51. [PMID: 22733562 DOI: 10.1002/prot.24135] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 01/24/2012] [Revised: 05/08/2012] [Accepted: 06/10/2012] [Indexed: 12/25/2022]
Abstract
Biomolecular simulations at millisecond and longer time-scales can provide vital insights into functional mechanisms. Because post-simulation analyses of such large trajectory datasets can be a limiting factor in obtaining biological insights, there is an emerging need to identify key dynamical events and to relate these events to the biological function online, that is, as simulations are progressing. Recently, we have introduced a novel computational technique, quasi-anharmonic analysis (QAA) (Ramanathan et al., PLoS One 2011;6:e15827), for partitioning the conformational landscape into a hierarchy of functionally relevant sub-states. The unique capabilities of QAA are enabled by exploiting anharmonicity in the form of fourth-order statistics for characterizing atomic fluctuations. In this article, we extend QAA for analyzing long time-scale simulations online. In particular, we present HOST4MD, a higher-order statistical toolbox for molecular dynamics simulations, which (1) identifies key dynamical events as simulations are in progress, (2) explores potential sub-states, and (3) identifies conformational transitions that enable the protein to access those sub-states. We demonstrate HOST4MD on microsecond timescale simulations of the enzyme adenylate kinase in its apo state. HOST4MD identifies several conformational events in these simulations, revealing how the intrinsic coupling between the three subdomains (LID, CORE, and NMP) changes during the simulations. Further, it also identifies an inherent asymmetry in the opening/closing of the two binding sites. We anticipate that HOST4MD will provide a powerful and extensible framework for detecting biophysically relevant conformational coordinates from long time-scale simulations.
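The abstract states that QAA exploits anharmonicity through fourth-order statistics. One common fourth-order statistic is excess kurtosis, which is zero for a Gaussian (harmonic) fluctuation and nonzero otherwise. The sketch below computes it for a one-dimensional coordinate series. It illustrates the statistic only and is not the QAA or HOST4MD algorithm; the function name `excess_kurtosis` and the toy series are assumptions.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("series has zero variance")
    return m4 / (m2 * m2) - 3.0

# Hypothetical coordinate that hops between two states: flatter than Gaussian.
two_state = [-1.0, 1.0] * 50
# Hypothetical coordinate with rare large excursions: heavier tails than Gaussian.
rare_jump = [0.0] * 98 + [5.0, -5.0]

k_two_state = excess_kurtosis(two_state)
k_rare_jump = excess_kurtosis(rare_jump)
```

Both toy series deviate strongly from zero, in opposite directions, which is the sort of signal a second-order (variance-only) analysis would not distinguish.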
Affiliation(s)
- Arvind Ramanathan
- Computational Biology Institute & Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
5
Liu YS, Wang M, Paul JC, Ramani K. 3DMolNavi: a web-based retrieval and navigation tool for flexible molecular shape comparison. BMC Bioinformatics 2012; 13:95. [PMID: 22583488 PMCID: PMC3430558 DOI: 10.1186/1471-2105-13-95] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/22/2011] [Accepted: 04/14/2012] [Indexed: 11/10/2022] Open
Abstract
Background: Many molecules of interest are flexible and undergo significant shape deformation as part of their function, but most existing methods of molecular shape comparison treat them as rigid shapes, which may lead to an incorrect measure of the shape similarity of flexible molecules. Currently, there has been only limited effort on retrieval and navigation for flexible molecular shape comparison, which would improve data retrieval by helping users locate the desired molecule in a convenient way. Results: To address this issue, we develop a web-based retrieval and navigation tool, named 3DMolNavi, for flexible molecular shape comparison. This tool is based on the histogram of Inner Distance Shape Signature (IDSS) for fast retrieval of molecules that are similar to a query molecule, and uses dimensionality reduction to navigate the retrieved results in 2D and 3D spaces. We tested 3DMolNavi in the Database of Macromolecular Movements (MolMovDB) and CATH. Compared to other shape descriptors, it achieves good performance and retrieval results for different classes of flexible molecules. Conclusions: The advantages of 3DMolNavi over other existing software are that it integrates retrieval for flexible molecular shape comparison and enhances navigation for user interaction. 3DMolNavi can be accessed via https://engineering.purdue.edu/PRECISE/3dmolnavi/index.html.
Affiliation(s)
- Yu-Shen Liu
- School of Software, Tsinghua University, Beijing 100084, China.
6
Chang DTH, Wu CY, Fan CY. A study on promoter characteristics of head-to-head genes in Saccharomyces cerevisiae. BMC Genomics 2012; 13 Suppl 1:S11. [PMID: 22369481 PMCID: PMC3303733 DOI: 10.1186/1471-2164-13-s1-s11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022] Open
Abstract
Background: Head-to-head (h2h) genes tend to be associated in expression and in function and have been shown to be conserved in evolution. Currently there are many studies on such h2h gene pairs. We found that previous studies focused heavily on the human genome. Furthermore, they were limited to analyses that require only gene or protein sequences and did not conduct a systematic investigation of other promoter features such as the binding evidence of specific transcription factors (TFs). This is mainly because the resources for higher organisms, although these organisms are of greater interest, are less complete than those for model organisms such as Saccharomyces cerevisiae. The authors of this study recently integrated nine promoter features of 6603 genes of S. cerevisiae from six databases and five papers. These resources are suitable for conducting a comprehensive analysis of h2h genes in S. cerevisiae. Results: This study analyzed various promoter features, including transcription boundaries (TSS, 5'UTR and 3'UTR), TATA box, TF binding evidence, TF regulation evidence, DNA bendability and nucleosome occupancy. The expression profiles and gene ontology (GO) annotations were used to measure whether two genes are associated. Based on these promoter features, we found that i) the frequency of h2h genes was close to the expectation, namely they were not unusually frequent in the genome; ii) the distance between the TSSs of most h2h genes fell into the range of 0-600 bp and was concentrated in 0-200 bp for the highly associated ones; iii) the number of TFs that regulate both h2h genes influenced the co-expression and co-function of the genes, while the number of TFs that bind both h2h genes influenced only the co-expression of the genes; iv) the association of two h2h genes was influenced by the existence of specific TFs such as STP2; v) the association of h2h genes whose bidirectional promoters have no TATA box was slightly higher than that of those with TATA boxes; vi) the association of two h2h genes was not influenced by the DNA bendability and nucleosome occupancy. Conclusions: This study analyzed h2h genes with various promoter features that have not been used in analyzing h2h genes. The results can be applied to other genomes to confirm whether the observations of this study are limited to S. cerevisiae or universal in most organisms.
Affiliation(s)
- Darby Tien-Hao Chang
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan.
7
Liu YS, Li Q, Zheng GQ, Ramani K, Benjamin W. Using diffusion distances for flexible molecular shape comparison. BMC Bioinformatics 2010; 11:480. [PMID: 20868474 PMCID: PMC2949899 DOI: 10.1186/1471-2105-11-480] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 11/04/2009] [Accepted: 09/24/2010] [Indexed: 12/04/2022] Open
Abstract
Background: Many molecules are flexible and undergo significant shape deformation as part of their function, and yet most existing molecular shape comparison (MSC) methods treat them as rigid bodies, which may lead to incorrect shape recognition. Results: In this paper, we present a new shape descriptor, named Diffusion Distance Shape Descriptor (DDSD), for comparing 3D shapes of flexible molecules. The diffusion distance in our work is considered as an average length of paths connecting two landmark points on the molecular shape in a sense of inner distances. The diffusion distance is robust to flexible shape deformation, in particular to topological changes, and it reflects well the molecular structure and deformation without explicit decomposition. Our DDSD is stored as a histogram which is a probability distribution of diffusion distances between all sample point pairs on the molecular surface. Finally, the problem of flexible MSC is reduced to comparison of DDSD histograms. Conclusions: We illustrate that DDSD is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. The presented algorithm is robust and does not require any prior knowledge of the flexible regions.
Affiliation(s)
- Yu-Shen Liu
- School of Software, Tsinghua University, Beijing 100084, China.
8
Mechelke M, Habeck M. Robust probabilistic superposition and comparison of protein structures. BMC Bioinformatics 2010; 11:363. [PMID: 20594332 PMCID: PMC2912885 DOI: 10.1186/1471-2105-11-363] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/12/2009] [Accepted: 07/01/2010] [Indexed: 12/03/2022] Open
Abstract
Background: Protein structure comparison is a central issue in structural bioinformatics. The standard dissimilarity measure for protein structures is the root mean square deviation (RMSD) of representative atom positions such as α-carbons. To evaluate the RMSD the structures under comparison must be superimposed optimally so as to minimize the RMSD. How to evaluate optimal fits becomes a matter of debate, if the structures contain regions which differ largely - a situation encountered in NMR ensembles and proteins undergoing large-scale conformational transitions. Results: We present a probabilistic method for robust superposition and comparison of protein structures. Our method aims to identify the largest structurally invariant core. To do so, we model non-rigid displacements in protein structures with outlier-tolerant probability distributions. These distributions exhibit heavier tails than the Gaussian distribution underlying standard RMSD minimization and thus accommodate highly divergent structural regions. The drawback is that under a heavy-tailed model analytical expressions for the optimal superposition no longer exist. To circumvent this problem we work with a scale mixture representation, which implies a weighted RMSD. We develop two iterative procedures, an Expectation Maximization algorithm and a Gibbs sampler, to estimate the local weights, the optimal superposition, and the parameters of the heavy-tailed distribution. Applications demonstrate that heavy-tailed models capture differences between structures undergoing substantial conformational changes and can be used to assess the precision of NMR structures. By comparing Bayes factors we can automatically choose the most adequate model. Therefore our method is parameter-free. Conclusions: Heavy-tailed distributions are well-suited to describe large-scale conformational differences in protein structures. A scale mixture representation facilitates the fitting of these distributions and enables outlier-tolerant superposition.
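The scale mixture representation described above leads to a weighted RMSD in which strongly displaced atoms receive small weights. The sketch below shows one reweighting step for two structures that are assumed to be superimposed already; it omits the rotation update, the parameter estimation, and the Gibbs sampler. The weight formula `(nu + 3) / (nu + d2 / sigma**2)` is the standard expected weight for a 3D Student-t model and is an assumption about the form, not something taken from the abstract; `nu`, `sigma`, and the function names are hypothetical.

```python
import math

def student_t_weights(sq_dists, nu=1.0, sigma=1.0):
    """Expected scale-mixture weights for 3D displacements (assumed Student-t model)."""
    return [(nu + 3.0) / (nu + d2 / sigma ** 2) for d2 in sq_dists]

def weighted_rmsd(coords_a, coords_b, weights=None):
    """Weighted RMSD between paired (x, y, z) positions; uniform weights by default."""
    sq = [sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in zip(coords_a, coords_b)]
    if weights is None:
        weights = [1.0] * len(sq)
    return math.sqrt(sum(w * d for w, d in zip(weights, sq)) / sum(weights))

# Hypothetical data: three atoms agree closely; the fourth is displaced by 10 units.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (2.1, 0.0, 0.0), (13.0, 0.0, 0.0)]
sq = [sum((p - q) ** 2 for p, q in zip(x, y)) for x, y in zip(a, b)]
w = student_t_weights(sq)

plain = weighted_rmsd(a, b)       # dominated by the single outlier
robust = weighted_rmsd(a, b, w)   # outlier is down-weighted
```

In a full procedure the weights and the superposition would be re-estimated alternately until convergence; this sketch only shows why the weighted form is less sensitive to a divergent region.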
Affiliation(s)
- Martin Mechelke
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstr 35, 72076 Tübingen, Germany
9
Liu YS, Fang Y, Ramani K. IDSS: deformation invariant signatures for molecular shape comparison. BMC Bioinformatics 2009; 10:157. [PMID: 19463181 PMCID: PMC2694795 DOI: 10.1186/1471-2105-10-157] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 03/08/2009] [Accepted: 05/22/2009] [Indexed: 11/18/2022] Open
Abstract
Background: Many molecules of interest are flexible and undergo significant shape deformation as part of their function, but most existing methods of molecular shape comparison (MSC) treat them as rigid bodies, which may lead to an incorrect measure of the shape similarity of flexible molecules. Results: To address the issue we introduce a new shape descriptor, called Inner Distance Shape Signature (IDSS), for describing the 3D shapes of flexible molecules. The inner distance is defined as the length of the shortest path between landmark points within the molecular shape, and it reflects well the molecular structure and deformation without explicit decomposition. Our IDSS is stored as a histogram which is a probability distribution of inner distances between all sample point pairs on the molecular surface. We show that IDSS is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. Our approach reduces the 3D shape comparison problem of flexible molecules to the comparison of IDSS histograms. Conclusion: The proposed algorithm is robust and does not require any prior knowledge of the flexible regions. We demonstrate the effectiveness of IDSS within a molecular search engine application for a benchmark containing abundant conformational changes of molecules. Several thousand such comparisons can be carried out per second. The presented IDSS method can be considered as an alternative and complementary tool for the existing methods for rigid MSC. The binary executable program for Windows platform and database are available from .
Affiliation(s)
- Yu-Shen Liu
- School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA.
10
Fang Y, Liu YS, Ramani K. Three dimensional shape comparison of flexible proteins using the local-diameter descriptor. BMC Struct Biol 2009; 9:29. [PMID: 19435524 PMCID: PMC2685140 DOI: 10.1186/1472-6807-9-29] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/22/2008] [Accepted: 05/12/2009] [Indexed: 11/10/2022]
Abstract
Background: Techniques for inferring the functions of proteins by comparing their shape similarity have been receiving a lot of attention. Proteins are functional units and their shape flexibility plays an essential role in various biological processes. Several shape descriptors have demonstrated the capability of protein shape comparison by treating them as rigid bodies. But this may give rise to an incorrect comparison of flexible protein shapes. Results: We introduce an efficient approach for comparing flexible protein shapes by adapting a local diameter (LD) descriptor. The LD descriptor, developed recently to handle skeleton based shape deformations [1], is adapted in this work to capture the invariant properties of shape deformations caused by the motion of the protein backbone. Every sampled point on the protein surface is assigned a value measuring the diameter of the 3D shape in the neighborhood of that point. The LD descriptor is built in the form of a one dimensional histogram from the distribution of the diameter values. The histogram based shape representation reduces the shape comparison problem of the flexible protein to a simple distance calculation between 1D feature vectors. Experimental results indicate how accurately the LD descriptor treats the protein shape deformation. In addition, we use the LD descriptor for protein shape retrieval and compare its effectiveness with that of conventional shape descriptors. A sensitivity-specificity plot shows that the LD descriptor performs much better than the conventional shape descriptors in terms of consistency over a family of proteins and discernibility across families of different proteins. Conclusion: Our study provides an effective technique for comparing the shape of flexible proteins. The experimental results demonstrate the insensitivity of the LD descriptor to protein shape deformation. The proposed method will be potentially useful for molecule retrieval with similar shapes and rapid structure retrieval for proteins. The demos and supplemental materials are available on .
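The abstract above reduces flexible-shape comparison to a distance between 1D histograms of local-diameter values. The sketch below assumes the diameter values have already been computed for each sampled surface point (that step is not shown) and compares two normalized histograms with an L1 distance. The bin count, value range, L1 metric, function names, and sample values are assumptions for illustration; the abstract does not specify them.

```python
def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp v == hi into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

def l1_distance(h1, h2):
    """Sum of absolute bin differences: 0 for identical, 2 for disjoint histograms."""
    return sum(abs(x - y) for x, y in zip(h1, h2))

# Hypothetical local-diameter samples for three shapes.
shape_a = [1.0, 1.2, 1.1, 3.0, 3.1]
shape_a_bent = [1.1, 1.0, 1.2, 3.1, 3.0]  # same diameters, sampled in another order
shape_b = [4.0, 4.5, 4.2, 4.8, 4.9]

ha, hb, hc = (histogram(s, bins=5, lo=0.0, hi=5.0)
              for s in (shape_a, shape_a_bent, shape_b))
same = l1_distance(ha, hb)
different = l1_distance(ha, hc)
```

Because the histogram discards which surface point produced each value, a deformation that preserves local diameters leaves the descriptor unchanged, which is the invariance property the abstract relies on.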
Affiliation(s)
- Yi Fang
- School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA.