1
Yang B, Bao W, Chen B. PGRNIG: novel parallel gene regulatory network identification algorithm based on GPU. Brief Funct Genomics 2022; 21:441-454. [PMID: 36064791 DOI: 10.1093/bfgp/elac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 12/14/2022] Open
Abstract
Molecular biology has revealed that complex life phenomena can be treated as the result of many gene interactions. Investigating these interactions and understanding the intrinsic mechanisms of biological systems using gene expression data have attracted a lot of attention. As a typical gene regulatory network (GRN) inference method, the S-system has been utilized to deal with small-scale network identification. However, it is extremely difficult to optimize it to infer medium-to-large networks. This paper proposes a novel parallel swarm intelligent algorithm, PGRNIG, to optimize the parameters of the S-system. We employed the clone selection strategy to improve the whale optimization algorithm (CWOA). To enhance the time efficiency of CWOA optimization, we utilized a parallel CWOA (PCWOA) based on the compute unified device architecture (CUDA) platform. Decomposition strategy and L1 regularization were utilized to reduce the search space and complexity of GRN inference. We applied the PGRNIG algorithm on three synthetic datasets and two real time-series expression datasets of the species of Escherichia coli and Saccharomyces cerevisiae. Experimental results show that PGRNIG could infer the gene regulatory network more accurately than other state-of-the-art methods with a convincing computational speed-up. Our findings show that CWOA and PCWOA have faster convergence performances than WOA.
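As a rough illustration of the model this abstract refers to, the S-system describes each gene's rate of change as a power-law production term minus a power-law degradation term. The sketch below evaluates that right-hand side and a per-gene error with an L1 penalty on the kinetic orders. The function names, the squared-error form and the penalty weight `lam` are assumptions made for illustration; the abstract does not give the exact fitness function used by PGRNIG, and the whale optimization and CUDA parts are not reproduced here.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """Evaluate dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx


def l1_penalised_error(observed, predicted, g_row, h_row, lam):
    """Squared error for ONE gene plus an L1 penalty on its kinetic orders.

    Scoring one gene at a time mirrors the decomposition idea: each row of
    g and h can be searched separately. The exact objective is an assumption.
    """
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sparsity = sum(abs(v) for v in g_row) + sum(abs(v) for v in h_row)
    return squared + lam * sparsity
```

An optimizer would propose `alpha`, `beta`, `g` and `h`, integrate `s_system_rhs` over the sampled time points, and minimise the penalised error gene by gene; the L1 term pushes most kinetic orders to zero, which corresponds to a sparse network.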
Affiliation(s)
- Bin Yang: School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Wenzheng Bao: School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
- Baitong Chen: Xuzhou First People's Hospital, Xuzhou 221000, China
2
Mining folded proteomes in the era of accurate structure prediction. PLoS Comput Biol 2022; 18:e1009930. [PMID: 35333855 PMCID: PMC8986115 DOI: 10.1371/journal.pcbi.1009930] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/12/2021] [Revised: 04/06/2022] [Accepted: 02/16/2022] [Indexed: 01/02/2023] Open
Abstract
Protein structure fundamentally underpins the function and processes of numerous biological systems. Fold recognition algorithms offer a sensitive and robust tool to detect structural, and thereby functional, similarities between distantly related homologs. In the era of accurate structure prediction owing to advances in machine learning techniques and a wealth of experimentally determined structures, previously curated sequence databases have become a rich source of biological information. Here, we use bioinformatic fold recognition algorithms to scan the entire AlphaFold structure database to identify novel protein family members, infer function and group predicted protein structures. As an example of the utility of this approach, we identify novel, previously unknown members of various pore-forming protein families, including MACPFs, GSDMs and aerolysin-like proteins.
Virtually every cellular process in all organisms on Earth is driven by molecular nano-machines known as proteins. The diverse functions of proteins are the result of the unique three-dimensional shape adopted by a given protein molecule. It is therefore important to determine the shape of a given protein, which unlike DNA and our genes, cannot be known from its sequence alone. Since two proteins with similar shapes typically have a similar function, knowing a protein shape provides crucial clues about its function. By virtue of decades of experimental work and advances in artificial intelligence, this complex shape can now be computationally predicted for any protein whose composition is known. Scientists have used these and other methods to produce enormous libraries of protein shapes consisting of nearly a million unique entries. However, these libraries are too large and too complex for researchers to ‘read’. We use shape-comparison algorithms to carefully check these shape-libraries to gain insight into the potential function and biological role of previously unknown proteins. Furthermore, we identified new members of protein families using this technique. We show that shape-matching algorithms and computationally generated shape-libraries can be used effectively together to yield new insights and expedite scientific endeavours.
3
Dulcey CE, López de Los Santos Y, Létourneau M, Déziel E, Doucet N. Semi-rational evolution of the 3-(3-hydroxyalkanoyloxy)alkanoate (HAA) synthase RhlA to improve rhamnolipid production in Pseudomonas aeruginosa and Burkholderia glumae. FEBS J 2019; 286:4036-4059. [PMID: 31177633 DOI: 10.1111/febs.14954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/07/2018] [Revised: 04/12/2019] [Accepted: 06/06/2019] [Indexed: 12/15/2022]
Abstract
The 3-(3-hydroxyalkanoyloxy)alkanoate (HAA) synthase RhlA is an essential enzyme involved in the biosynthesis of HAAs in Pseudomonas and Burkholderia species. RhlA modulates the aliphatic chain length in rhamnolipids, conferring distinct physicochemical properties to these biosurfactants exhibiting promising industrial and pharmaceutical value. A detailed molecular understanding of substrate specificity and catalytic performance in RhlA could offer protein engineering tools to develop designer variants involved in the synthesis of novel rhamnolipid mixtures for tailored eco-friendly products. However, current directed evolution progress remains limited due to the absence of high-throughput screening methodologies and lack of an experimentally resolved RhlA structure. In the present work, we used comparative modeling and chimeric-based approaches to perform a comprehensive semi-rational mutagenesis of RhlA from Pseudomonas aeruginosa. Our extensive RhlA mutational variants and chimeric hybrids between the Pseudomonas and Burkholderia homologs illustrate selective modulation of rhamnolipid alkyl chain length in both Pseudomonas aeruginosa and Burkholderia glumae. Our results also demonstrate the implication of a putative cap-domain motif that covers the catalytic site of the enzyme and provides substrate specificity to RhlA. This semi-rational mutant-based survey reveals promising 'hot-spots' for the modulation of RL congener patterns and potential control of enzyme activity, in addition to uncovering residue positions that modulate substrate selectivity between the Pseudomonas and Burkholderia functional homologs. DATABASE: Model data are available in the PMDB database under the accession number PM0081867.
Affiliation(s)
- Carlos Eduardo Dulcey: Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Yossef López de Los Santos: Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Myriam Létourneau: Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Eric Déziel: Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Nicolas Doucet: Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada; PROTEO, the Québec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Canada
4
High-throughput and scalable protein function identification with Hadoop and Map-only pattern of the MapReduce processing model. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1245-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
5
Warris S, Timal NRN, Kempenaar M, Poortinga AM, van de Geest H, Varbanescu AL, Nap JP. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment. PLoS One 2018; 13:e0190279. [PMID: 29293576 PMCID: PMC5749749 DOI: 10.1371/journal.pone.0190279] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/22/2017] [Accepted: 12/11/2017] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less widely than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python. RESULTS The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. In this way, the Python environment will stimulate further extension and use of pyPaSWAS. CONCLUSIONS pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
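For readers unfamiliar with the alignment scheme named in this abstract, the following is a minimal serial sketch of Smith-Waterman local alignment scoring with an affine gap penalty, using the standard Gotoh recurrences. It is not the pyPaSWAS code or API. The function name, the scoring values and the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend` are assumptions chosen for illustration.

```python
def sw_affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of a and b with affine gaps.

    H holds the best score of an alignment ending at (i, j); E and F hold the
    best scores ending in a horizontal or vertical gap. Only two rows are kept.
    """
    neg = float("-inf")
    h_prev = [0] * (len(b) + 1)
    f_prev = [neg] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (len(b) + 1)
        f_cur = [neg] * (len(b) + 1)
        e = neg  # horizontal-gap state, carried along the current row
        for j in range(1, len(b) + 1):
            # Either open a new gap from H or extend the running gap.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Cells on the same anti-diagonal depend only on earlier anti-diagonals, which is the property that GPU and multi-core implementations typically exploit; this sketch stays serial for clarity.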
Affiliation(s)
- Sven Warris: Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands; Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
- N Roshan N Timal: Parallel and Distributed Systems, Delft University of Technology, Delft, the Netherlands
- Marcel Kempenaar: Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
- Arne M Poortinga: Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
- Henri van de Geest: Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
- Ana L Varbanescu: Parallel and Distributed Systems, Delft University of Technology, Delft, the Netherlands
- Jan-Peter Nap: Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands; Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
6
Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D. Graphics processing units in bioinformatics, computational biology and systems biology. Brief Bioinform 2017; 18:870-885. [PMID: 27402792 PMCID: PMC5862309 DOI: 10.1093/bib/bbw058] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 03/30/2016] [Indexed: 01/18/2023] Open
Abstract
Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), which limits their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks of using these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools.
Affiliation(s)
- Marco S Nobile: Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy; SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Paolo Cazzaniga: Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy; SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Andrea Tangherloni: Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- Daniela Besozzi: Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy; SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Corresponding author. Daniela Besozzi, Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy and SYSBIO.IT Centre of Systems Biology, Milano, Italy. Tel.: +39 02 6448 7874. E-mail:
7
Yan X, Li J, Gu Q, Xu J. gWEGA: GPU-accelerated WEGA for molecular superposition and shape comparison. J Comput Chem 2014; 35:1122-30. [PMID: 24729358 DOI: 10.1002/jcc.23603] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/03/2014] [Revised: 03/06/2014] [Accepted: 03/14/2014] [Indexed: 01/13/2023]
Affiliation(s)
- Xin Yan: Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou, Guangdong, 510006, China
8
Mrozek D, Brożek M, Małysiak-Mrozek B. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA. J Mol Model 2014; 20:2067. [PMID: 24481593 PMCID: PMC3936136 DOI: 10.1007/s00894-014-2067-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 08/07/2013] [Accepted: 10/11/2013] [Indexed: 01/16/2023]
Abstract
Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT (“GPU-CASSERT”) parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.
Affiliation(s)
- Dariusz Mrozek: Institute of Informatics, Silesian University of Technology, Gliwice, Poland
9
Going over the three dimensional protein structure similarity problem. Artif Intell Rev 2013. [DOI: 10.1007/s10462-013-9416-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
10
Kirshner DA, Nilmeier JP, Lightstone FC. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB. Nucleic Acids Res 2013; 41:W256-65. [PMID: 23680785 PMCID: PMC3692059 DOI: 10.1093/nar/gkt403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/11/2023] Open
Abstract
The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.
Affiliation(s)
- Felice C. Lightstone
- *To whom correspondence should be addressed. Tel: +1 925 423 8657; Fax: +1 925 423 0785;
11
Park S, Shin SY, Hwang KB. CFMDS: CUDA-based fast multidimensional scaling for genome-scale data. BMC Bioinformatics 2013; 13 Suppl 17:S23. [PMID: 23282007 PMCID: PMC3521231 DOI: 10.1186/1471-2105-13-s17-s23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
Background Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, the classical MDS is a simple and theoretically sound solution for projecting data objects onto a low dimensional space while preserving the original distances among them as much as possible. However, it is not trivial to apply it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers, because of its high computational complexity. Results We implemented a highly-efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of the classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture exploiting the power of the GPU (graphics processing unit). The principle of divide-and-conquer was adopted for circumventing the small memory problem of usual graphics cards. Our application software has been tested on various benchmark datasets including microarrays and compared with the classical MDS algorithms implemented using C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster for large data than such general solutions. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions, as the Pearson's correlation coefficients between them were larger than 0.9. Conclusions CFMDS is an expeditious solution for the data dimensionality reduction problem. It is especially useful for efficient processing of genome-scale data consisting of several thousands of objects in several minutes.
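To make the baseline in this abstract concrete, here is a small pure-Python sketch of classical MDS: square the distances, double-centre them into a Gram matrix, and take the leading eigenvectors scaled by the square roots of their eigenvalues. It does not reproduce CFMDS, its divide-and-conquer approximation or any CUDA code. The function name, the use of power iteration with deflation and the fixed random seed are illustrative assumptions, and the method assumes Euclidean input distances so that the Gram matrix is positive semidefinite.

```python
import math
import random


def classical_mds(dist, k=2, iters=300, seed=0):
    """Embed n objects in k dimensions from an n x n symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J (column means equal row means
    # because the matrix is symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        # Power iteration for the current leading eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflation: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords
```

The n x n matrix and its eigen-decomposition are the memory and time bottleneck for genome-scale inputs, which is the cost the abstract says motivates an approximate, GPU-based solution.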
Affiliation(s)
- Sungin Park: School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Korea
12
Wang JJY, Bensmail H, Gao X. Multiple graph regularized protein domain ranking. BMC Bioinformatics 2012; 13:307. [PMID: 23157331 PMCID: PMC3583823 DOI: 10.1186/1471-2105-13-307] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 05/22/2012] [Accepted: 10/29/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. RESULTS To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. CONCLUSION The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
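The idea of regularizing ranking scores over a similarity graph can be sketched with the well-known manifold-ranking iteration f <- alpha * S * f + (1 - alpha) * y, where S is the symmetrically normalised affinity matrix and y marks the query. This is a stand-in technique, not the MultiG-Rank algorithm: in particular, the graph weights below are fixed by the caller, whereas the abstract describes learning them jointly with the ranking scores. All function and parameter names are hypothetical.

```python
import math


def combine_graphs(graphs, weights):
    """Weighted sum of several n x n affinity matrices (weights fixed here)."""
    n = len(graphs[0])
    return [[sum(mu * g[i][j] for mu, g in zip(weights, graphs))
             for j in range(n)] for i in range(n)]


def graph_regularized_rank(w, query, alpha=0.9, iters=200):
    """Rank all nodes against one query node by diffusing its label over w."""
    n = len(w)
    deg = [sum(row) for row in w]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2); isolated nodes get zeros.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f
```

Sorting the domains by the returned scores gives the ranking. Because the result depends on which graph is used, replacing a single hand-picked graph with a combination of several is the point of the multiple-graph approach the abstract describes.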
Affiliation(s)
- Jim Jing-Yan Wang: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
13
GSA: a GPU-accelerated structure similarity algorithm and its application in progressive virtual screening. Mol Divers 2012; 16:759-69. [DOI: 10.1007/s11030-012-9403-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/20/2012] [Accepted: 10/08/2012] [Indexed: 12/21/2022]
14
Ho HK, Gange G, Kuiper MJ, Ramamohanarao K. BetaSearch: a new method for querying β-residue motifs. BMC Res Notes 2012; 5:391. [PMID: 22839199 PMCID: PMC3532365 DOI: 10.1186/1756-0500-5-391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/10/2012] [Accepted: 06/15/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Searching for structural motifs across known protein structures can be useful for identifying unrelated proteins with similar function and characterising secondary structures such as β-sheets. This is infeasible using conventional sequence alignment because linear protein sequences do not contain spatial information. β-residue motifs are β-sheet substructures that can be represented as graphs and queried using existing graph indexing methods, however, these approaches are designed for general graphs that do not incorporate the inherent structural constraints of β-sheets and require computationally-expensive filtering and verification procedures. 3D substructure search methods, on the other hand, allow β-residue motifs to be queried in a three-dimensional context but at significant computational costs. FINDINGS We developed a new method for querying β-residue motifs, called BetaSearch, which leverages the natural planar constraints of β-sheets by indexing them as 2D matrices, thus avoiding much of the computational complexities involved with structural and graph querying. BetaSearch exhibits faster filtering, verification, and overall query time than existing graph indexing approaches whilst producing comparable index sizes. Compared to 3D substructure search methods, BetaSearch achieves 33 and 240 times speedups over index-based and pairwise alignment-based approaches, respectively. Furthermore, we have presented case-studies to demonstrate its capability of motif matching in sequentially dissimilar proteins and described a method for using BetaSearch to predict β-strand pairing. CONCLUSIONS We have demonstrated that BetaSearch is a fast method for querying substructure motifs. The improvements in speed over existing approaches make it useful for efficiently performing high-volume exploratory querying of possible protein substructural motifs or conformations. 
BetaSearch was used to identify a nearly identical β-residue motif between an entirely synthetic (Top7) and a naturally-occurring protein (Charcot-Leyden crystal protein), as well as identifying structural similarities between biotin-binding domains of avidin, streptavidin and the lipocalin gamma subunit of human C8.
Affiliation(s)
- Hui Kian Ho: Department of Computing and Information Systems, The University of Melbourne, Victoria, Australia
15
Abstract
A computational pipeline, PocketAnnotate, for functional annotation of proteins at the level of binding sites is proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites; and PocketAlign, to obtain a detailed alignment between a pair of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified and matched in real time against the database, and the query substructure is aligned with the promising hits to obtain a set of possible ligands that the given protein could bind to. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained based purely on sequence or fold-level analyses. An attempt has also been made to analyse proteins of no known function from the Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed at http://proline.biochem.iisc.ernet.in/pocketannotate/.
Affiliation(s)
- Praveen Anand: Department of Biochemistry, Indian Institute of Science, Bangalore 560012, Karnataka, India
16
Hirschfeld JA, Lustfeld H. Finding stable minima using a nudged-elastic-band-based optimization scheme. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:056709. [PMID: 23004905 DOI: 10.1103/physreve.85.056709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/13/2012] [Revised: 05/08/2012] [Indexed: 06/01/2023]
Abstract
Optimization is essential in many scientific and economic areas, but it is often too complex to be tackled by simple straightforward calculations or by trial and error. Two well-known methods to find low-lying minima in such complex systems are simulated annealing and the genetic algorithm. In these methods artificial fluctuations control the probability that the system overcomes a local minimum of a certain depth. Here we present a complementary scheme that is based on the nudged-elastic-band method, ordinarily used to find saddle points, and we apply the scheme to find the most stable isomers of the phosphorus molecules P4 and P8 and the corresponding molecules As_n, Sb_n, and Bi_n (n = 4, 8) in the framework of density functional theory. In the case of n = 8 we have found stable and metastable configurations, some of which are new and have similar energies. As a by-product we obtained an upper bound for the energy barriers between these configurations.
Affiliation(s)
- J A Hirschfeld: Forschungszentrum Jülich, Institute for Advanced Simulation, Jülich, Germany
17
Pang B, Zhao N, Becchi M, Korkin D, Shyu CR. Accelerating large-scale protein structure alignments with graphics processing units. BMC Res Notes 2012; 5:116. [PMID: 22357132 PMCID: PMC3309952 DOI: 10.1186/1756-0500-5-116] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 09/01/2011] [Accepted: 02/22/2012] [Indexed: 11/24/2022] Open
Abstract
Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU.
Affiliation(s)
- Bin Pang: Informatics Institute, University of Missouri, Columbia, MO, USA
18
Liu P, Agrafiotis DK, Rassokhin DN, Yang E. Accelerating Chemical Database Searching Using Graphics Processing Units. J Chem Inf Model 2011; 51:1807-16. [DOI: 10.1021/ci200164g] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Affiliation(s)
- Pu Liu: Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, Spring House, Pennsylvania 19477, United States
- Dimitris K. Agrafiotis: Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, Spring House, Pennsylvania 19477, United States
- Dmitrii N. Rassokhin: Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, Spring House, Pennsylvania 19477, United States
- Eric Yang: Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, Spring House, Pennsylvania 19477, United States
19
Farber RM. Topical perspective on massive threading and parallelism. J Mol Graph Model 2011; 30:82-9. [PMID: 21764615 DOI: 10.1016/j.jmgm.2011.06.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 05/09/2011] [Revised: 06/15/2011] [Accepted: 06/17/2011] [Indexed: 10/18/2022]
Abstract
Unquestionably, computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases three orders of magnitude of increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, such as the 1.3-million-thread Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts, be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world, is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future.
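The remark that not all problems scale to massive parallelism can be made concrete with Amdahl's law, which bounds the speedup by the fraction of work that stays serial. The abstract does not name this law; it is brought in here only as a standard illustration, and the function name and example fractions are assumptions.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a program runs in parallel.

    speedup = 1 / ((1 - p) + p / n), with p the parallel fraction and n the
    number of workers (threads, cores or GPU lanes).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

Under this model, a program that is 95% parallel cannot run more than 20 times faster however many threads are added, which is one way to weigh the restructuring cost mentioned above against the achievable gain.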