1
Shukla K, Idanwekhai K, Naradikian M, Ting S, Schoenberger SP, Brunk E. Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation. J Chem Inf Model 2024; 64:5328-5343. [PMID: 38635316 DOI: 10.1021/acs.jcim.3c01967] [Indexed: 04/19/2024]
Abstract
Research in the human genome sciences generates a substantial amount of genetic data for hundreds of thousands of individuals, which concomitantly increases the number of variants of unknown significance (VUS). Bioinformatic analyses can successfully reveal rare variants and variants with clear associations with disease-related phenotypes. These studies have had a significant impact on how clinical genetic screens are interpreted and how patients are stratified for treatment. There are few, if any, computational methods for variants comparable to biological activity predictions. To address this gap, we developed a machine learning method that uses protein three-dimensional structures from AlphaFold to predict how a variant will influence changes to a gene's downstream biological pathways. We trained state-of-the-art machine learning classifiers to predict which protein regions will most likely impact transcriptional activities of two proto-oncogenes, nuclear factor erythroid 2 (NFE2L2)-related factor 2 (NRF2) and c-Myc. We have identified classifiers that attain accuracies higher than 80%, which have allowed us to identify a set of key protein regions that lead to significant perturbations in c-Myc or NRF2 transcriptional pathway activities.
Affiliation(s)
- Kriti Shukla
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Kelvin Idanwekhai
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Martin Naradikian
  - La Jolla Institute for Immunology, San Diego, California 92093, United States
- Stephanie Ting
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Elizabeth Brunk
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Integrative Program for Biological and Genome Sciences (IBGS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
2
Deichmann M, Hansson FG, Jensen ED. Yeast-based screening platforms to understand and improve human health. Trends Biotechnol 2024:S0167-7799(24)00095-7. [PMID: 38677901 DOI: 10.1016/j.tibtech.2024.04.003] [Received: 12/30/2023] [Revised: 04/01/2024] [Accepted: 04/03/2024] [Indexed: 04/29/2024]
Abstract
Detailed molecular understanding of the human organism is essential to develop effective therapies. Saccharomyces cerevisiae has been used extensively for acquiring insights into important aspects of human health, such as studying genetics and cell-cell communication, elucidating protein-protein interaction (PPI) networks, and investigating human G protein-coupled receptor (hGPCR) signaling. We highlight recent advances and opportunities of yeast-based technologies for cost-efficient chemical library screening on hGPCRs, accelerated deciphering of PPI networks with mating-based screening and selection, and accurate cell-cell communication with human immune cells. Overall, yeast-based technologies constitute an important platform to support basic understanding and innovative applications towards improving human health.
Affiliation(s)
- Marcus Deichmann
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Frederik G Hansson
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Emil D Jensen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
3
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707 PMCID: PMC11058471 DOI: 10.1093/pnasnexus/pgae158] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
Affiliation(s)
- Léa Bou Dagher
  - Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
  - Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
- Dominique Madern
  - University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
- Philippe Malbos
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
  - Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
4
Wang X, Li A, Li X, Cui H. Empowering Protein Engineering through Recombination of Beneficial Substitutions. Chemistry 2024; 30:e202303889. [PMID: 38288640 DOI: 10.1002/chem.202303889] [Received: 01/04/2024] [Indexed: 02/24/2024]
Abstract
Directed evolution stands as a seminal technology for generating novel protein functionalities, a cornerstone in biocatalysis, metabolic engineering, and synthetic biology. Today, with the development of various mutagenesis methods and advanced analytical machines, the challenge of diversity generation and high-throughput screening platforms is largely solved, and one of the remaining challenges is: how to empower the potential of single beneficial substitutions with recombination to achieve the epistatic effect. This review overviews experimental and computer-assisted recombination methods in protein engineering campaigns. In addition, integrated and machine learning-guided strategies were highlighted to discuss how these recombination approaches contribute to generating the screening library with better diversity, coverage, and size. A decision tree was finally summarized to guide the further selection of proper recombination strategies in practice, which was beneficial for accelerating protein engineering.
Affiliation(s)
- Xinyue Wang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Anni Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Xiujuan Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Haiyang Cui
  - School of Life Sciences, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
5
Yang Z, Zheng Y, Gao Q. Lysine lactylation in the regulation of tumor biology. Trends Endocrinol Metab 2024:S1043-2760(24)00025-0. [PMID: 38395657 DOI: 10.1016/j.tem.2024.01.011] [Received: 12/01/2023] [Revised: 01/24/2024] [Accepted: 01/29/2024] [Indexed: 02/25/2024]
Abstract
Lysine lactylation (Kla), a newly discovered post-translational modification (PTM) of lysine residues, is progressively revealing its crucial role in tumor biology. A growing body of evidence supports its capacity of transcriptional regulation through histone modification and modulation of non-histone protein function. It intricately participates in a myriad of events in the tumor microenvironment (TME) by orchestrating the transitions of immune states and augmenting tumor malignancy. Its preferential modification of metabolic proteins underscores its specific regulatory influence on metabolism. This review focuses on the effect and the probable mechanisms of Kla-mediated regulation of tumor metabolism, the upstream factors that determine Kla intensity, and its potential implications for the clinical diagnosis and treatment of tumors.
Affiliation(s)
- Zijian Yang
  - Department of Liver Surgery and Transplantation, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Yingqi Zheng
  - Department of Liver Surgery and Transplantation, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Qiang Gao
  - Department of Liver Surgery and Transplantation, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
  - Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - State Key Laboratory of Genetic Engineering, Fudan University, Shanghai, China
6
Liu J, Yang M, Yu Y, Xu H, Li K, Zhou X. Large language models in bioinformatics: applications and perspectives. arXiv 2024:arXiv:2401.04155v1. [PMID: 38259343 PMCID: PMC10802675] [Indexed: 01/24/2024]
Abstract
Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). Large language models typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will present a summary of the prominent large language models used in natural language processing, such as BERT and GPT, and focus on exploring the applications of large language models at different omics levels in bioinformatics, mainly including applications of large language models in genomics, transcriptomics, proteomics, drug discovery and single cell analysis. Finally, this review summarizes the potential and prospects of large language models in solving bioinformatic problems.
Affiliation(s)
- Jiajia Liu
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Mengyuan Yang
  - School of Life Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China
- Yankai Yu
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
- Haixia Xu
  - The Center of Gerontology and Geriatrics, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Kang Li
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Xiaobo Zhou
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
  - McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA