1
Voelz VA, Pande VS, Bowman GR. Folding@home: Achievements from over 20 years of citizen science herald the exascale era. Biophys J 2023; 122:2852-2863. [PMID: 36945779 PMCID: PMC10398258 DOI: 10.1016/j.bpj.2023.03.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/02/2022] [Revised: 01/26/2023] [Accepted: 03/16/2023] [Indexed: 03/23/2023]
Abstract
Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over 20 years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances this perspective has enabled. As the project's name implies, the early years of Folding@home focused on driving advances in our understanding of protein folding by developing statistical methods for capturing long-timescale processes and facilitating insight into complex dynamical processes. Success laid a foundation for broadening the scope of Folding@home to address other functionally relevant conformational changes, such as receptor signaling, enzyme dynamics, and ligand binding. Continued algorithmic advances, hardware developments such as graphics processing unit (GPU)-based computing, and the growing scale of Folding@home have enabled the project to focus on new areas where massively parallel sampling can be impactful. While previous work sought to expand toward larger proteins with slower conformational changes, new work focuses on large-scale comparative studies of different protein sequences and chemical compounds to better understand biology and inform the development of small-molecule drugs. Progress on these fronts enabled the community to pivot quickly in response to the COVID-19 pandemic, expanding to become the world's first exascale computer and deploying this massive resource to provide insight into the inner workings of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and aid the development of new antivirals. This success provides a glimpse of what is to come as exascale supercomputers come online and as Folding@home continues its work.
Affiliation(s)
- Vincent A Voelz
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania
- Gregory R Bowman
- Departments of Biochemistry & Biophysics and of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania.
2
Thomas T, Roux B. Tyrosine kinases: complex molecular systems challenging computational methodologies. The European Physical Journal B 2021; 94:203. [PMID: 36524055 PMCID: PMC9749240 DOI: 10.1140/epjb/s10051-021-00207-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Accepted: 09/14/2021] [Indexed: 05/28/2023]
Abstract
Classical molecular dynamics (MD) simulations based on atomic models play an increasingly important role in a wide range of applications in physics, biology, and chemistry. Nonetheless, generating genuine knowledge about biological systems using MD simulations remains challenging. Protein tyrosine kinases are important cellular signaling enzymes that regulate cell growth, proliferation, metabolism, differentiation, and migration. Due to the large conformational changes and long timescales involved in their function, these kinases present particularly challenging problems to modern computational and theoretical frameworks aimed at elucidating the dynamics of complex biomolecular systems. Markov state models have achieved limited success in tackling the broader conformational ensemble and biased methods are often employed to examine specific long timescale events. Recent advances in machine learning continue to push the limitations of current methodologies and provide notable improvements when integrated with the existing frameworks. A broad perspective is drawn from a critical review of recent studies.
3
Cooper S, Sterling ALR, Kleffner R, Silversmith WM, Siegel JB. Repurposing Citizen Science Games as Software Tools for Professional Scientists. FDG: Proceedings of the International Conference on the Foundations of Digital Games 2018; 2018:39. [PMID: 30465045 PMCID: PMC6241531 DOI: 10.1145/3235765.3235770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
Scientific software is often developed with professional scientists in mind, resulting in complex tools with a steep learning curve. Citizen science games, however, are designed for citizen scientists: members of the general public. These games maintain scientific accuracy while placing design goals such as usability and enjoyment at the forefront. In this paper, we identify an emerging use of game-based technology: the repurposing of citizen science games as software tools for professional scientists in their work. We discuss our experience in two such repurposings: Foldit, a protein folding and design game, and Eyewire, a web-based 3D neuron reconstruction game. Based on this experience, we provide evidence that the software artifacts produced for citizen science can be useful for professional scientists, and provide an overview of key design principles we found to be useful in the process of repurposing.
4
Cheatham TE, Roe DR. The Impact of Heterogeneous Computing on Workflows for Biomolecular Simulation and Analysis. Comput Sci Eng 2015. [DOI: 10.1109/mcse.2015.7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/11/2022]
5
Abdul-Wahid B, Feng H, Rajan D, Costaouec R, Darve E, Thain D, Izaguirre JA. AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble. J Chem Inf Model 2014; 54:3033-43. [PMID: 25207854 PMCID: PMC4210180 DOI: 10.1021/ci500321g] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022]
Abstract
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is because transitions between metastable states are rarely observed: high energy barriers trap the system in these states. Recently, the weighted ensemble (WE) family of methods has emerged, which can flexibly and efficiently sample conformational space without being trapped and allows calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphics processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein, simulating 1.5 ms over 8 months with a peak aggregate performance of 1000 ns/h. For comparison, a 200 μs simulation was collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy.
Affiliation(s)
- Badi' Abdul-Wahid
- Department of Computer Science and Engineering, University of Notre Dame, South Bend, Indiana 46556, United States
6
McGibbon RT, Pande VS. Learning Kinetic Distance Metrics for Markov State Models of Protein Conformational Dynamics. J Chem Theory Comput 2013; 9:2900-6. [PMID: 26583974 DOI: 10.1021/ct400132h] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 01/28/2023]
Abstract
Statistical modeling of long timescale dynamics with Markov state models (MSMs) has been shown to be an effective strategy for building quantitative and qualitative insight into protein folding processes. Existing methodologies, however, rely on geometric clustering using distance metrics such as root mean square deviation (RMSD), assuming that geometric similarity provides an adequate basis for the kinetic partitioning of phase space. Here, inspired by advances in the machine learning community, we introduce a new approach for learning a distance metric explicitly constructed to model kinetic similarity. This approach enables the construction of models, especially in the regime of high anisotropy in the diffusion constant, with fewer states than was previously possible. Application of this technique to the analysis of two ultralong molecular dynamics simulations of the FiP35 WW domain identifies discrete near-native relaxation dynamics in the millisecond regime that were not resolved in previous analyses.
Affiliation(s)
- Robert T McGibbon
- Department of Chemistry, Stanford University, Stanford, California 94305-4401
- Vijay S Pande
- Department of Chemistry, Stanford University, Stanford, California 94305-4401
7
N-terminal segments modulate the α-helical propensities of the intrinsically disordered basic regions of bZIP proteins. J Mol Biol 2011; 416:287-99. [PMID: 22226835 DOI: 10.1016/j.jmb.2011.12.043] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 10/21/2011] [Revised: 12/16/2011] [Accepted: 12/20/2011] [Indexed: 01/27/2023]
Abstract
Basic region leucine zippers (bZIPs) are modular transcription factors that play key roles in eukaryotic gene regulation. The basic regions of bZIPs (bZIP-bRs) are necessary and sufficient for DNA binding and specificity. Bioinformatic predictions and spectroscopic studies suggest that unbound monomeric bZIP-bRs are uniformly disordered as isolated domains. Here, we test this assumption through a comparative characterization of conformational ensembles for 15 different bZIP-bRs using a combination of atomistic simulations and circular dichroism measurements. We find that bZIP-bRs have quantifiable preferences for α-helical conformations in their unbound monomeric forms. This helicity varies from one bZIP-bR to another despite a significant sequence similarity of the DNA binding motifs (DBMs). Our analysis reveals that intramolecular interactions between DBMs and eight-residue segments directly N-terminal to DBMs are the primary modulators of bZIP-bR helicities. We test the accuracy of this inference by designing chimeras of bZIP-bRs to have either increased or decreased overall helicities. Our results yield quantitative insights regarding the relationship between sequence and the degree of intrinsic disorder within bZIP-bRs, and might have general implications for other intrinsically disordered proteins. Understanding how natural sequence variations lead to modulation of disorder is likely to be important for understanding the evolution of specificity in molecular recognition through intrinsically disordered regions (IDRs).
8
Fomin ES. Consideration of data load time on modern processors for the Verlet table and linked-cell algorithms. J Comput Chem 2011; 32:1386-99. [DOI: 10.1002/jcc.21722] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/07/2010] [Accepted: 10/27/2010] [Indexed: 11/12/2022]
9
Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP. Computational solutions to large-scale data management and analysis. Nat Rev Genet 2010; 11:647-57. [PMID: 20717155 DOI: 10.1038/nrg2857] [Citation(s) in RCA: 248] [Impact Index Per Article: 17.7] [Indexed: 12/18/2022]
Abstract
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist - such as cloud and heterogeneous computing - to successfully tackle our big data problems.
Affiliation(s)
- Eric E Schadt
- Pacific Biosciences, Menlo Park, California 94025, USA.
10
Buch I, Harvey MJ, Giorgino T, Anderson DP, De Fabritiis G. High-Throughput All-Atom Molecular Dynamics Simulations Using Distributed Computing. J Chem Inf Model 2010; 50:397-403. [DOI: 10.1021/ci900455r] [Citation(s) in RCA: 146] [Impact Index Per Article: 10.4] [Indexed: 11/30/2022]
Affiliation(s)
- I. Buch
- Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain, High Performance Computing Service, Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, U.K., and Space Sciences Laboratory, University of California, Berkeley California 94720
- M. J. Harvey
- Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain, High Performance Computing Service, Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, U.K., and Space Sciences Laboratory, University of California, Berkeley California 94720
- T. Giorgino
- Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain, High Performance Computing Service, Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, U.K., and Space Sciences Laboratory, University of California, Berkeley California 94720
- D. P. Anderson
- Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain, High Performance Computing Service, Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, U.K., and Space Sciences Laboratory, University of California, Berkeley California 94720
- G. De Fabritiis
- Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain, High Performance Computing Service, Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, U.K., and Space Sciences Laboratory, University of California, Berkeley California 94720
11
Harvey MJ, Giupponi G, De Fabritiis G. ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale. J Chem Theory Comput 2009; 5:1632-9. [DOI: 10.1021/ct9000685] [Citation(s) in RCA: 627] [Impact Index Per Article: 41.8] [Indexed: 11/30/2022]
Affiliation(s)
- M. J. Harvey
- Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom, Department de Fisica Fundamental, Universitat de Barcelona, Carrer Marti i Franques 1, 08028 Barcelona, Spain, and Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain
- G. Giupponi
- Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom, Department de Fisica Fundamental, Universitat de Barcelona, Carrer Marti i Franques 1, 08028 Barcelona, Spain, and Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain
- G. De Fabritiis
- Information and Communications Technologies, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom, Department de Fisica Fundamental, Universitat de Barcelona, Carrer Marti i Franques 1, 08028 Barcelona, Spain, and Computational Biochemistry and Biophysics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/ Doctor Aiguader 88, 08003 Barcelona, Spain
12
Long-timescale molecular dynamics simulations of protein structure and function. Curr Opin Struct Biol 2009; 19:120-7. [DOI: 10.1016/j.sbi.2009.03.004] [Citation(s) in RCA: 569] [Impact Index Per Article: 37.9] [Received: 12/17/2008] [Revised: 03/05/2009] [Accepted: 03/11/2009] [Indexed: 11/20/2022]