1
Lian Y, Bodian D, Shehu A. Elucidating the Role of Wildtype and Variant FGFR2 Structural Dynamics in (Dys)Function and Disorder. Int J Mol Sci 2024; 25:4523. [PMID: 38674107 PMCID: PMC11050683 DOI: 10.3390/ijms25084523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/12/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
The fibroblast growth factor receptor 2 (FGFR2) gene is one of the most extensively studied genes with many known mutations implicated in several human disorders, including oncogenic ones. Most FGFR2 disease-associated gene mutations are missense mutations that result in constitutive activation of the FGFR2 protein and downstream molecular pathways. Many tertiary structures of the FGFR2 kinase domain are publicly available in the wildtype and mutated forms and in the inactive and activated state of the receptor. The current literature suggests a molecular brake inhibiting the ATP-binding A loop from adopting the activated state. Mutations relieve this brake, triggering allosteric changes between active and inactive states. However, the existing analysis relies on static structures and fails to account for the intrinsic structural dynamics. In this study, we utilize experimentally resolved structures of the FGFR2 tyrosine kinase domain and machine learning to capture the intrinsic structural dynamics, correlate it with functional regions and disease types, and enrich it with predicted structures of variants with currently no experimentally resolved structures. Our findings demonstrate the value of machine learning-enabled characterizations of structure dynamics in revealing the impact of mutations on (dys)function and disorder in FGFR2.
Affiliation(s)
- Yiyang Lian
  - School of Systems Biology, George Mason University, Manassas, VA 20110, USA
- Dale Bodian
  - Diamond Age Data Science, Boston, MA 02143, USA
- Amarda Shehu
  - School of Systems Biology, George Mason University, Manassas, VA 20110, USA
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
2
Zaman AB, Inan TT, De Jong K, Shehu A. Adaptive Stochastic Optimization to Improve Protein Conformation Sampling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2759-2771. [PMID: 34882562 DOI: 10.1109/tcbb.2021.3134103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. AlphaFold2 has now been shown to reveal a high-quality native structure for many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploring a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its search to balance exploration and exploitation under a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
3
Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures. Biomolecules 2022; 12:biom12070908. [PMID: 35883464 PMCID: PMC9313347 DOI: 10.3390/biom12070908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2022] [Revised: 06/14/2022] [Accepted: 06/23/2022] [Indexed: 02/01/2023]
Abstract
With the debut of AlphaFold2, we can now get a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advanced the capability of these models to learn from experimentally-available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role of the composition of the training dataset in the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between data size, quality, and composition and the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variational autoencoder-based model in key, physically-realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.
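The map-based representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes one 3D coordinate per residue (for example, the alpha-carbon) and an 8 Å contact cutoff; both are common conventions rather than details stated in the abstract, and the function names `distance_map` and `contact_map` are hypothetical.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between per-residue 3D positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dmap, threshold=8.0):
    """Binary map: 1 where two distinct residues lie within `threshold`.

    The 8.0 default is an assumed cutoff, not a value taken from the paper.
    """
    n = len(dmap)
    return [
        [1 if i != j and dmap[i][j] <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]

# Toy input: four residues spaced 3.8 units apart on a straight line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dmap = distance_map(coords)
cmap = contact_map(dmap)
```

Both maps are symmetric and invariant to rotation and translation of the structure, which is one reason such representations are convenient inputs for neural networks.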
4
Alam FF, Rahman T, Shehu A. Evaluating Autoencoder-Based Featurization and Supervised Learning for Protein Decoy Selection. Molecules 2020; 25:E1146. [PMID: 32143444 PMCID: PMC7179114 DOI: 10.3390/molecules25051146] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 02/18/2020] [Accepted: 02/25/2020] [Indexed: 11/24/2022]
Abstract
Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that structure encodes function. Research on featurization of protein structure is active, but here we assess the promise of autoencoders. Motivated by rapid progress in neural network research, we investigate and evaluate autoencoders for yielding linear and nonlinear featurizations of protein tertiary structures. An additional reason we focus on autoencoders as the engine to obtain featurizations is the versatility of their architectures and the ease with which changes to architecture yield linear versus nonlinear features. While open-source neural network libraries, such as Keras, which we employ here, greatly facilitate constructing, training, and evaluating autoencoder architectures and conducting model search, autoencoders have not yet gained popularity in the structural biology community. Here we demonstrate their utility in a practical context. Employing autoencoder-based featurizations, we address the classic problem of decoy selection in protein structure prediction. Utilizing off-the-shelf supervised learning methods, we demonstrate that the featurizations are indeed meaningful and allow detecting active tertiary structures, thus opening the way for further avenues of research.
Affiliation(s)
- Fardina Fathmiul Alam
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Taseef Rahman
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
  - Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
  - Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
  - School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
5
Abdelhalim MB, Mabrouk MS, Sayed AY. HPS_PSP: High Performance System for Protein Structure Prediction. J Biol Syst 2019. [DOI: 10.1142/s0218339019500190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Prediction of the least-energy conformation of a protein from its primary structure (chain of amino acids) is an optimization problem associated with a large, complex energy landscape. In this study, a simple 2D hydrophobic–hydrophilic model was used to model the protein sequence, which allows the fast and efficient design of a genetic algorithm-based protein structure prediction approach. A neighborhood search strategy is integrated into the genetic operator and guides it to regions of the search space with good solutions. To prevent convergence to local optima, the proposed method employs a crowding-based parent replacement strategy, which improves the performance of the algorithm and its ability to maintain multiple solutions. The proposed algorithm was tested on a standard benchmark of HP sequences, and comparative results demonstrate that the proposed system beats most of the evolutionary algorithms for seven sequences. It finds the best energy for sequences of length [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] and [Formula: see text].
Affiliation(s)
- Mohamed B. Abdelhalim
  - College of Computing and Information Technology (CCIT), Arab Academy for Science Technology and Maritime Transport (AASTMT) Cairo, Egypt
- Mai S. Mabrouk
  - Biomedical Engineering Department, Misr University for Science and Technology, 6 October City, Giza, Egypt
- Ahmed Y. Sayed
  - Physics and Engineering Mathematics Department, Faculty of Engineering at Mataria, Helwan University, Cairo, Egypt
6
Morris D, Maximova T, Plaku E, Shehu A. Attenuating dependence on structural data in computing protein energy landscapes. BMC Bioinformatics 2019; 20:280. [PMID: 31167640 PMCID: PMC6551245 DOI: 10.1186/s12859-019-2822-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein’s energy landscape holds the key to characterizing the structural dynamics and its regulation of protein function. In practice, the disparate spatio-temporal scales spanned by the slow dynamics challenge both wet and dry laboratories. However, the growing number of deposited structures for proteins central to human biology presents an opportunity to infer the relevant dynamics by exploiting the information that such structures encode about equilibrium dynamics.
Results: Recent computational efforts using extrinsic modes of motion as variables have successfully reconstructed detailed energy landscapes of several medium-size proteins. Here we investigate the extent to which one can reconstruct the energy landscape of a protein in the absence of sufficient wet-laboratory structural data. We do so by integrating intrinsic modes of motion extracted from a single structure in a stochastic optimization framework that supports the plug-and-play of different variable selection strategies. We demonstrate that, while knowledge of more wet-laboratory structures yields better-reconstructed landscapes, precious information can be obtained even when only one structural model is available.
Conclusions: The presented work shows that it is possible to reconstruct the energy landscape of a protein with reasonable detail and accuracy even when the structural information about the protein is limited to one structure. By attenuating the dependence on structural data of methods designed to compute protein energy landscapes, the work opens up interesting avenues of research on structure-based inference of dynamics. Of particular interest are directions of research that will extend such inference to proteins with no experimentally-characterized structures.
Affiliation(s)
- David Morris
  - Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Tatiana Maximova
  - Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Erion Plaku
  - Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, 20064, D.C., USA
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
  - Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA
  - School of Systems Biology, George Mason University, Manassas, 20110, VA, USA
7
Sapin E, Carr DB, De Jong KA, Shehu A. Computing energy landscape maps and structural excursions of proteins. BMC Genomics 2016; 17 Suppl 4:546. [PMID: 27535545 PMCID: PMC5001232 DOI: 10.1186/s12864-016-2798-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Background: Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics.
Methods: We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions.
Results: Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship.
Conclusions: Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-to-basin excursions of a protein with a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained, because the dynamics are not simulated, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in instigating further research in wet laboratories.
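As a rough illustration of querying a nearest-neighbor graph for an energetically-feasible excursion, the sketch below finds the path between two structures whose highest-energy intermediate is as low as possible. This minimax ("lowest barrier") criterion is an assumption chosen for illustration; the abstract does not state which path cost the authors use. The graph, the energies, and the name `lowest_barrier_path` are hypothetical.

```python
import heapq

def lowest_barrier_path(neighbors, energy, start, goal):
    """Dijkstra-style search minimizing the peak energy visited along the path.

    neighbors: dict mapping node -> iterable of adjacent nodes
    energy:    dict mapping node -> energy of that structure
    Returns (peak_energy, path) or None if goal is unreachable.
    """
    best = {start: energy[start]}
    parent = {start: None}
    heap = [(energy[start], start)]
    while heap:
        peak, node = heapq.heappop(heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return peak, path[::-1]
        if peak > best[node]:
            continue  # stale heap entry
        for nxt in neighbors[node]:
            new_peak = max(peak, energy[nxt])
            if new_peak < best.get(nxt, float("inf")):
                best[nxt] = new_peak
                parent[nxt] = node
                heapq.heappush(heap, (new_peak, nxt))
    return None

# Toy landscape: two basins (A, D) joined by a high route via B and a lower one via C.
energy = {"A": -10.0, "B": -2.0, "C": -6.0, "D": -9.0}
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
result = lowest_barrier_path(neighbors, energy, "A", "D")
```

On this toy graph the search prefers the route through `C`, whose peak energy (-6.0) is lower than the peak through `B` (-2.0). Other cost definitions, such as summed energy increases along the path, would fit the same search skeleton.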
Affiliation(s)
- Emmanuel Sapin
  - Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
- Daniel B Carr
  - Department of Statistics, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
- Kenneth A De Jong
  - Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
  - Krasnow Institute for Advanced Study, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
- Amarda Shehu
  - Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
  - Department of Bioengineering, George Mason University, 4400 University Drive, Fairfax, 22030, VA, USA
  - School of Systems Biology, George Mason University, 10900 University Boulevard, Manassas, 20110, VA, USA