1
Chu H, Liu T. Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models. Int J Mol Sci 2024; 25:4507. [PMID: 38674091 PMCID: PMC11049818 DOI: 10.3390/ijms25084507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.
Affiliation(s)
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
2
Li G, Zhao H, Cheng Z, Liu J, Li G, Guo Y. Single-cell transcriptomic profiling of heart reveals ANGPTL4 linking fibroblasts and angiogenesis in heart failure with preserved ejection fraction. J Adv Res 2024:S2090-1232(24)00068-7. [PMID: 38346487 DOI: 10.1016/j.jare.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/06/2024] [Accepted: 02/07/2024] [Indexed: 02/19/2024]
Abstract
INTRODUCTION: Despite its high morbidity and mortality, effective therapies for heart failure with preserved ejection fraction (HFpEF) are limited because its pathophysiological basis is poorly understood. OBJECTIVE: This study aimed to characterize the cellular heterogeneity and potential mechanisms of HFpEF at single-cell resolution. METHODS: An HFpEF mouse model was induced by a high-fat diet with N-nitro-L-arginine methyl ester. Cells from the hearts were subjected to single-cell sequencing. Key protein expression was measured with immunohistochemistry and immunofluorescence staining. RESULTS: In HFpEF hearts, myocardial fibroblasts exhibited higher levels of fibrosis. Furthermore, an increased number of fibroblasts differentiated into high-metabolism and high-fibrosis phenotypes. The expression levels of genes encoding certain pro-angiogenic secreted proteins were decreased in the HFpEF group, as confirmed by bulk RNA sequencing. Additionally, the proportion of endothelial cell (EC) lineages in the HFpEF group was significantly reduced, and these EC lineages showed low-angiogenesis and high-apoptosis phenotypes. Interestingly, the fibroblasts in the HFpEF heart might cross-link with the EC lineages via over-secretion of ANGPTL4, thus displaying an anti-angiogenic function. Immunohistochemistry and immunofluorescence staining then revealed reduced vascular density and upregulated ANGPTL4 expression in HFpEF hearts. Finally, we predicted ANGPTL4 as a potential druggable target using DrugnomeAI. CONCLUSION: This study comprehensively characterized the angiogenesis impairment in HFpEF hearts at single-cell resolution and proposed that ANGPTL4 secretion by fibroblasts may be a potential mechanism underlying this angiogenic abnormality.
Affiliation(s)
- Guoxing Li
- Institute of Life Sciences, Chongqing Medical University, 400016, China
- Huilin Zhao
- Institute of Life Sciences, Chongqing Medical University, 400016, China
- Zhe Cheng
- Department of Cardiology, Chongqing University Three Gorges Hospital, Chongqing 404199, China
- Junjin Liu
- Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Gang Li
- Institute of Life Sciences, Chongqing Medical University, 400016, China; Molecular Medicine Diagnostic and Testing Center, Chongqing Medical University, 400016, China
- Yongzheng Guo
- Department of Cardiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
3
Chen J, Epstein MP, Schildkraut JM, Kar SP. Mapping inherited genetic variation with opposite effects on autoimmune disease and cancer identifies candidate drug targets associated with the anti-tumor immune response. medRxiv 2023:2023.12.23.23300491. [PMID: 38234717 PMCID: PMC10793537 DOI: 10.1101/2023.12.23.23300491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Background: Germline alleles near genes that encode certain immune checkpoints (CTLA4, CD200) are associated with autoimmune/autoinflammatory disease and cancer, but in opposite directions. This motivates a systematic search for additional germline alleles that show this pattern, with the aim of identifying potential cancer immunotherapeutic targets using human genetic evidence. Methods: We performed pairwise fixed-effect cross-disorder meta-analyses combining genome-wide association studies (GWAS) for breast, prostate, ovarian and endometrial cancers (240,540 cases/317,000 controls) and seven autoimmune/autoinflammatory diseases (112,631 cases/895,386 controls), coupled with in silico follow-up. To ensure detection of alleles with opposite effects on cancer and autoimmune/autoinflammatory disease, the signs of the beta coefficients in the autoimmune/autoinflammatory GWAS were reversed prior to meta-analysis. Results: Meta-analyses followed by linkage disequilibrium clumping identified 312 unique, independent lead variants with P_meta < 5 × 10^-8 that were associated with at least one of the cancer types at P_cancer < 10^-3 and one of the autoimmune/autoinflammatory diseases at P_auto < 10^-3. At each lead variant, the allele that conferred autoimmune/autoinflammatory disease risk was protective for cancer. Mapping each lead variant to its nearest gene as its putative functional target and focusing on genes with established immunological effects implicated 32 of the nearest genes. Tumor bulk RNA-Seq data highlighted that the tumor expression of each of 5 of the 32 genes (IRF1, IKZF1, SPI1, SH2B3, LAT) was strongly correlated (Spearman's ρ > 0.5) with at least one intra-tumor T/myeloid cell infiltration marker (CD4, CD8A, CD11B, CD45) in every one of the cancer types. Tumor single-cell RNA-Seq data from all cancer types showed that the five genes were more likely to be expressed in intra-tumor immune cells than in malignant cells. The five lead SNPs corresponding to these genes were linked to them via expression quantitative trait locus mechanisms and at least one additional line of functional evidence. Proteins encoded by the genes were predicted to be druggable. Conclusion: We provide population-scale germline genetic and functional genomic evidence to support further evaluation of the proteins encoded by IRF1, IKZF1, SPI1, SH2B3, and LAT as possible targets for cancer immunotherapy.
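The sign-reversal step described in the Methods can be illustrated with a minimal sketch of a two-study inverse-variance fixed-effect meta-analysis for a single variant. This is a generic textbook formulation, not the authors' pipeline; the function name `fixed_effect_opposite` and the example effect sizes are hypothetical.

```python
import math

def fixed_effect_opposite(beta_cancer, se_cancer, beta_auto, se_auto):
    """Inverse-variance fixed-effect meta-analysis of one variant across a
    cancer GWAS and an autoimmune GWAS, with the autoimmune beta sign-reversed
    so that opposite-direction effects reinforce instead of cancelling."""
    beta_auto_flipped = -beta_auto           # reverse the autoimmune effect sign
    w_c = 1.0 / se_cancer ** 2               # inverse-variance weights
    w_a = 1.0 / se_auto ** 2
    beta_meta = (w_c * beta_cancer + w_a * beta_auto_flipped) / (w_c + w_a)
    se_meta = math.sqrt(1.0 / (w_c + w_a))
    z = beta_meta / se_meta
    p_meta = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta_meta, se_meta, p_meta

# Hypothetical variant: protective for cancer, risk-increasing for autoimmunity.
beta_meta, se_meta, p_meta = fixed_effect_opposite(-0.05, 0.01, 0.05, 0.01)
print(beta_meta, se_meta, p_meta)
```

Without the sign reversal, the two equal and opposite effects in this toy example would average to zero and the variant would go undetected.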
Affiliation(s)
- Junyu Chen
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Michael P Epstein
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
- Joellen M Schildkraut
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Siddhartha P Kar
- Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge, UK
- Ovarian Cancer Programme, Cancer Research UK Cambridge Centre, Cambridge, UK
4
Hoseini B, Jaafari MR, Golabpour A, Momtazi-Borojeni AA, Karimi M, Eslami S. Application of ensemble machine learning approach to assess the factors affecting size and polydispersity index of liposomal nanoparticles. Sci Rep 2023; 13:18012. [PMID: 37865639 PMCID: PMC10590434 DOI: 10.1038/s41598-023-43689-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/27/2023] [Indexed: 10/23/2023]
Abstract
Liposome nanoparticles have emerged as promising drug delivery systems due to their unique properties. Assessing particle size and polydispersity index (PDI) is critical for evaluating the quality of these liposomal nanoparticles. However, optimizing these parameters in a laboratory setting is both costly and time-consuming. This study aimed to apply a machine learning technique to assess the impact of specific factors, including sonication time, extrusion temperature, and compositions, on the size and PDI of liposomal nanoparticles. Liposomal solutions were prepared and subjected to sonication with varying values for these parameters. Two compositions: (A) HSPC:DPPG:Chol:DSPE-mPEG2000 at 55:5:35:5 molar ratio and (B) HSPC:Chol:DSPE-mPEG2000 at 55:40:5 molar ratio, were made using remote loading method. Ensemble learning (EL), a machine learning technique, was employed using the Least-squares boosting (LSBoost) algorithm to accurately model the data. The dataset was randomly split into training and testing sets, with 70% allocated for training. The LSBoost algorithm achieved mean absolute errors of 1.652 and 0.0105 for modeling the size and PDI, respectively. Under conditions where the temperature was set at approximately 60 °C, our EL model predicted a minimum particle size of 116.53 nm for composition (A) with a sonication time of approximately 30 min. Similarly, for composition (B), the model predicted a minimum particle size of 129.97 nm with sonication times of approximately 30 or 55 min. In most instances, a PDI of less than 0.2 was achieved. These results highlight the significant impact of optimizing independent factors on the characteristics of liposomal nanoparticles and demonstrate the potential of EL as a decision support system for identifying the best liposomal formulation. We recommend further studies to explore the effects of other independent factors, such as lipid composition and surfactants, on liposomal nanoparticle characteristics.
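The modeling workflow in this abstract (least-squares boosting, a 70/30 random split, and mean absolute error as the metric) can be sketched in plain Python. This is a generic least-squares boosting loop over decision stumps, not the authors' implementation; the data are synthetic, and the feature (a stand-in for sonication time), the target (a stand-in for particle size), and all function names are assumptions made for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Find the single threshold on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:           # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def ls_boost(xs, ys, n_rounds=100, lr=0.1):
    """Least-squares boosting: each round fits a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, lr

def predict(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

# Synthetic toy data (NOT from the study).
random.seed(0)
xs = [random.uniform(0, 60) for _ in range(100)]
ys = [150 - 0.5 * x + random.gauss(0, 2) for x in xs]

idx = list(range(len(xs)))
random.shuffle(idx)
train, test = idx[:70], idx[70:]             # 70% training, 30% testing

model = ls_boost([xs[i] for i in train], [ys[i] for i in train])
boosted_mae = sum(abs(predict(model, xs[i]) - ys[i]) for i in test) / len(test)

train_mean = sum(ys[i] for i in train) / len(train)
baseline_mae = sum(abs(train_mean - ys[i]) for i in test) / len(test)
print(boosted_mae, baseline_mae)
```

Comparing the held-out error against a mean-only baseline is a simple sanity check that the ensemble has learned something from the input factor.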
Affiliation(s)
- Benyamin Hoseini
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahmoud Reza Jaafari
- Nanotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Pharmaceutical Nanotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
- Amin Golabpour
- Department of Health Information Technology, School of Allied Medical Sciences, Shahroud University of Medical Sciences, Shahroud, Iran
- Amir Abbas Momtazi-Borojeni
- Department of Medical Biotechnology, School of Medicine, Neyshabur University of Medical Sciences, Neyshabur, Iran
- Healthy Ageing Research Centre, Neyshabur University of Medical Sciences, Neyshabur, Iran
- Maryam Karimi
- Institute of Human Virology, School of Medicine, University of Maryland, Baltimore, USA
- Saeid Eslami
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
5
Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023]
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features that distinguish "druggable" from "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
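For readers unfamiliar with the AUC metric quoted above, the following minimal sketch computes it from its pairwise-ranking definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The function name `roc_auc` and the toy scores are hypothetical and unrelated to the PINNED model's outputs.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.
    Ties between a positive and a negative score count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: label 1 = drugged protein, label 0 = undrugged protein.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.95 in context.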
Affiliation(s)
- Michael Cunningham
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Danielle Pins
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
- Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
- Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
6
Wang L, Song Y, Wang H, Zhang X, Wang M, He J, Li S, Zhang L, Li K, Cao L. Advances of Artificial Intelligence in Anti-Cancer Drug Design: A Review of the Past Decade. Pharmaceuticals (Basel) 2023; 16:253. [PMID: 37259400 PMCID: PMC9963982 DOI: 10.3390/ph16020253] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/29/2022] [Revised: 01/25/2023] [Accepted: 02/06/2023] [Indexed: 10/13/2023]
Abstract
Anti-cancer drug design has been acknowledged as a complicated, expensive, time-consuming, and challenging task. How to reduce research costs and speed up the development of anti-cancer drugs has become an urgent question for the pharmaceutical industry. Computer-aided drug design methods have played a major role in the development of cancer treatments for over three decades. Recently, artificial intelligence has emerged as a powerful and promising technology for faster, cheaper, and more effective anti-cancer drug design. This narrative review covers a wide range of applications of artificial intelligence-based methods in anti-cancer drug design. We further clarify the fundamental principles of these methods, along with their advantages and disadvantages. Furthermore, we collate a large number of databases, including omics, epigenomics, chemical compound, and drug databases. Other researchers can consult them and adapt them to their own requirements.
Affiliation(s)
- Kang Li
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China
- Lei Cao
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China