1
Pham NT, Terrance AT, Jeon YJ, Rakkiyappan R, Manavalan B. ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Molecular Therapy. Nucleic Acids 2024; 35:102192. [PMID: 38779332 PMCID: PMC11108997 DOI: 10.1016/j.omtn.2024.102192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this study, we have developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy that is capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this ranking to conduct a sequential forward search, which determines the optimal feature set individually for each of the 16 sequence-derived feature descriptors. Utilizing these optimal feature descriptors, we constructed 176 baseline models using 11 popular classifiers. The most efficient baseline models were identified using the two-step feature selection approach, and their predicted scores were integrated and trained with the appropriate classifier to develop the final prediction model. Our rigorous cross-validations and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/.
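The sequential forward search mentioned in this abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (the abstract's ensemble importance scoring is not reproduced here), and `toy_score` is a hypothetical stand-in for whatever cross-validated metric a real pipeline would use; none of these names come from ac4C-AFL itself.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_search(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add features one at a time in ranked order and keep the best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = score_subset(current)
        if score > best_score:
            best_score = score
            best_subset = list(current)
    return best_subset, best_score


def toy_score(subset: List[str]) -> float:
    # Hypothetical scoring function: rewards three "useful" features and
    # penalizes subset size. A real pipeline would use cross-validation here.
    useful = {"f1", "f2", "f3"}
    hits = sum(1 for name in subset if name in useful)
    return hits - 0.1 * len(subset)


if __name__ == "__main__":
    ranking = ["f1", "f2", "f3", "f4", "f5"]
    subset, score = sequential_forward_search(ranking, toy_score)
    print(subset, round(score, 3))
```

In this toy setting the search stops improving once the uninformative features `f4` and `f5` are added, so the first three ranked features are kept.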
Affiliation(s)
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Annie Terrina Terrance
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Young-Jun Jeon
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Rajan Rakkiyappan
- Department of Mathematics, Bharathiar University, Coimbatore, Tamil Nadu 641046, India
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea

2
Li B, Ming D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinformatics 2024; 25:204. [PMID: 38824535 DOI: 10.1186/s12859-024-05820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/29/2024] [Indexed: 06/03/2024] Open
Abstract
BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance.
RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of proteins as a protein graph. In addition to the node features of amino acids extracted by the state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using the latest AlphaFold technology. Rigorous testing on independent eSOL and the Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set.
CONCLUSIONS: GATSol captures three-dimensional features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures into the model by relying only on the input of protein sequences, which simplifies the entire graph neural network prediction process, making it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol.
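As a rough illustration of how a residue distance map can be turned into a protein graph, the sketch below connects residues whose distance falls under a cutoff. The 10 Å cutoff, the toy coordinates, and the function names are assumptions made for illustration; the abstract does not state how GATSol defines its edges.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_edges(dmap: Sequence[Sequence[float]], cutoff: float = 10.0) -> List[Tuple[int, int]]:
    """Undirected edges (i, j), i < j, for residue pairs closer than the cutoff.

    The cutoff value is an illustrative assumption, not a value from the paper.
    """
    n = len(dmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dmap[i][j] <= cutoff]


if __name__ == "__main__":
    # Three hypothetical residue positions in angstroms.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 0.0, 0.0)]
    dmap = distance_map(coords)
    print(contact_edges(dmap))
```

Node features (for example, per-residue language-model embeddings) would then be attached to each index before the graph is passed to a graph attention network.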
Affiliation(s)
- Bin Li
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China.

3
Yang Q, Jin X, Zhou H, Ying J, Zou J, Liao Y, Lu X, Ge S, Yu H, Min X. SurfPro-NN: A 3D point cloud neural network for the scoring of protein-protein docking models based on surfaces features and protein language models. Comput Biol Chem 2024; 110:108067. [PMID: 38714420 DOI: 10.1016/j.compbiolchem.2024.108067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/18/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Protein-protein interactions (PPI) play a crucial role in numerous key biological processes, and the structure of protein complexes provides valuable clues for in-depth exploration of molecular-level biological processes. Protein-protein docking technology is widely used to simulate the spatial structure of proteins. However, there are still challenges in selecting candidate decoys that closely resemble the native structure from protein-protein docking simulations. In this study, we introduce a docking evaluation method based on three-dimensional point cloud neural networks named SurfPro-NN, which represents protein structures as point clouds and learns interaction information from protein interfaces by applying a point cloud neural network. With the continuous advancement of deep learning in the field of biology, a series of knowledge-rich pre-trained models have emerged. We incorporate protein surface representation models and language models into our approach, greatly enhancing feature representation capabilities and achieving superior performance in protein docking model scoring tasks. Through comprehensive testing on public datasets, we find that our method outperforms state-of-the-art deep learning approaches in protein-protein docking model scoring. Not only does it significantly improve performance, but it also greatly accelerates training speed. This study demonstrates the potential of our approach in addressing protein interaction assessment problems, providing strong support for future research and applications in the field of biology.
Affiliation(s)
- Qianli Yang
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
- Xiaocheng Jin
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Haixia Zhou
- School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Junjie Ying
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- JiaJun Zou
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Yiyang Liao
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Xiaoli Lu
- Information and Networking Center, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Shengxiang Ge
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Hai Yu
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
- Xiaoping Min
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.

4
Ma B, Chen H, Gong J, Liu W, Wei X, Zhang Y, Li X, Li M, Wang Y, Shang S, Tian B, Li Y, Wang R, Tan Z. Enhancing Protein Solubility via Glycosylation: From Chemical Synthesis to Machine Learning Predictions. Biomacromolecules 2024; 25:3001-3010. [PMID: 38598264 DOI: 10.1021/acs.biomac.4c00134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2024]
Abstract
Glycosylation is a valuable tool for modulating protein solubility; however, the lack of reliable research strategies has impeded efficient progress in understanding and applying this modification. This study aimed to bridge this gap by investigating the solubility of a model glycoprotein molecule, the carbohydrate-binding module (CBM), through a two-stage process. In the first stage, an approach involving chemical synthesis, comparative analysis, and molecular dynamics simulations of a library of glycoforms was employed to elucidate the effect of different glycosylation patterns on solubility and the key factors responsible for the effect. In the second stage, a predictive mathematical formula, innovatively harnessing machine learning algorithms, was derived to relate solubility to the identified key factors and accurately predict the solubility of the newly designed glycoforms. Demonstrating feasibility and effectiveness, this two-stage approach offers a valuable strategy for advancing glycosylation research, especially for the discovery of glycoforms with increased solubility.
Affiliation(s)
- Bo Ma
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Hedi Chen
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Jinyuan Gong
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Wenqiang Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Xiuli Wei
- Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yajing Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Xin Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Meng Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Yani Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Shiying Shang
- Center of Pharmaceutical Technology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Boxue Tian
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yaohao Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Ruihan Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Chemical Engineering College, Hebei Normal University of Science and Technology, Qinhuangdao 066600, China
- Zhongping Tan
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China

5
Zhang Z, Zhao L, Gao M, Chen Y, Wang J, Wang C. PPII-AEAT: Prediction of protein-protein interaction inhibitors based on autoencoders with adversarial training. Comput Biol Med 2024; 172:108287. [PMID: 38503089 DOI: 10.1016/j.compbiomed.2024.108287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/21/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Protein-protein interactions (PPIs) have shown increasing potential as novel drug targets. The design and development of small molecule inhibitors targeting specific PPIs are crucial for the prevention and treatment of related diseases. Accordingly, effective computational methods are highly desired to meet the emerging need for the large-scale accurate prediction of PPI inhibitors. However, existing machine learning models rely heavily on the manual screening of features and lack generalizability. Here, we propose a new PPI inhibitor prediction method based on autoencoders with adversarial training (named PPII-AEAT) that can adaptively learn molecule representation to cope with different PPI targets. First, Extended-connectivity fingerprints and Mordred descriptors are employed to extract the primary features of small molecular compounds. Then, an autoencoder architecture is trained in three phases to learn high-level representations and predict inhibitory scores. We evaluate PPII-AEAT on nine PPI targets and two different tasks, including the PPI inhibitor identification task and inhibitory potency prediction task. The experimental results show that our proposed PPII-AEAT outperforms state-of-the-art methods.
Affiliation(s)
- Zitong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Mengyao Gao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Yuanlong Chen
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China.

6
Eskandari A, Nezhad NG, Leow TC, Rahman MBA, Oslan SN. Essential factors, advanced strategies, challenges, and approaches involved for efficient expression of recombinant proteins in Escherichia coli. Arch Microbiol 2024; 206:152. [PMID: 38472371 DOI: 10.1007/s00203-024-03871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 12/31/2023] [Accepted: 01/25/2024] [Indexed: 03/14/2024]
Abstract
Producing recombinant proteins is a major accomplishment of biotechnology in the past century. Heterologous hosts, either eukaryotic or prokaryotic, are used for the production of these proteins. Microbial host systems continue to dominate as the most efficient and affordable option for biotherapeutic and food industry production. Hence, it is crucial to analyze the limitations and advantages of microbial hosts to enhance the efficient production of recombinant proteins on a large scale. E. coli is widely used as a host for the production of recombinant proteins. Researchers have identified certain obstacles with this host, and given the growing demand for recombinant protein production, there is an immediate need to improve it. This review first discusses the factors that contribute to recombinant protein expression. Subsequently, it sheds light on innovative approaches aimed at improving the expression of recombinant protein. Lastly, it delves into the obstacles and optimization methods associated with translation, covering both cis-optimization and trans-optimization, the production of soluble recombinant protein, and the engineering of metal ion transport. In this context, a comprehensive description of the distinct features is provided, and this knowledge could potentially enhance the expression of recombinant proteins in E. coli.
Affiliation(s)
- Azadeh Eskandari
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Nima Ghahremani Nezhad
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Thean Chor Leow
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-Ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Siti Nurbaya Oslan
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.
- Enzyme Technology and X-Ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia.

7
Demetriou D, Lockhat Z, Brzozowski L, Saini KS, Dlamini Z, Hull R. The Convergence of Radiology and Genomics: Advancing Breast Cancer Diagnosis with Radiogenomics. Cancers (Basel) 2024; 16:1076. [PMID: 38473432 DOI: 10.3390/cancers16051076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/09/2024] [Accepted: 02/22/2024] [Indexed: 03/14/2024] Open
Abstract
Despite significant progress in the prevention, screening, diagnosis, prognosis, and therapy of breast cancer (BC), it remains a highly prevalent and life-threatening disease affecting millions worldwide. Molecular subtyping of BC is crucial for predictive and prognostic purposes due to the diverse clinical behaviors observed across various types. The molecular heterogeneity of BC poses uncertainties in its impact on diagnosis, prognosis, and treatment. Numerous studies have highlighted genetic and environmental differences between patients from different geographic regions, emphasizing the need for localized research. International studies have revealed that patients with African heritage are often diagnosed at a more advanced stage and exhibit poorer responses to treatment and lower survival rates. Despite these global findings, there is a dearth of in-depth studies focusing on communities in the African region. Early diagnosis and timely treatment are paramount to improving survival rates. In this context, radiogenomics emerges as a promising field within precision medicine. By associating genetic patterns with image attributes or features, radiogenomics has the potential to significantly improve early detection, prognosis, and diagnosis. It can provide valuable insights into potential treatment options and predict the likelihood of survival, progression, and relapse. Radiogenomics allows for visual features and genetic marker linkage that promises to eliminate the need for biopsy and sequencing. The application of radiogenomics not only contributes to advancing precision oncology and individualized patient treatment but also streamlines clinical workflows. This review aims to delve into the theoretical underpinnings of radiogenomics and explore its practical applications in the diagnosis, management, and treatment of BC and to put radiogenomics on a path towards fully integrated diagnostics.
Affiliation(s)
- Demetra Demetriou
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Zarina Lockhat
- Department of Radiology, Faculty of Health Sciences, Steve Biko Academic Hospital, University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Luke Brzozowski
- Translational Research and Core Facilities, University Health Network, Toronto, ON M5G 1L7, Canada
- Kamal S Saini
- Fortrea Inc., 8 Moore Drive, Durham, NC 27709, USA
- Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Zodwa Dlamini
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Rodney Hull
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa

8
Carter CW. Base Pairing Promoted the Self-Organization of Genetic Coding, Catalysis, and Free-Energy Transduction. Life (Basel) 2024; 14:199. [PMID: 38398709 PMCID: PMC10890426 DOI: 10.3390/life14020199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024] Open
Abstract
How Nature discovered genetic coding is a largely ignored question, yet the answer is key to explaining the transition from biochemical building blocks to life. Other, related puzzles also fall inside the aegis enclosing the codes themselves. The peptide bond is unstable with respect to hydrolysis. So, it requires some form of chemical free energy to drive it. Amino acid activation and acyl transfer are also slow and must be catalyzed. All living things must thus also convert free energy and synchronize cellular chemistry. Most importantly, functional proteins occupy only small, isolated regions of sequence space. Nature evolved heritable symbolic data processing to seek out and use those sequences. That system has three parts: a memory of how amino acids behave in solution and inside proteins, a set of code keys to access that memory, and a scoring function. The code keys themselves are the genes for cognate pairs of tRNA and aminoacyl-tRNA synthetases, AARSs. The scoring function is the enzymatic specificity constant, kcat/kM, which measures both catalysis and specificity. The work described here deepens the evidence for and understanding of an unexpected consequence of ancestral bidirectional coding. Secondary structures occur in approximately the same places within antiparallel alignments of their gene products. However, the polar amino acids that define the molecular surface of one are reflected into core-defining non-polar side chains on the other. Proteins translated from base-paired coding strands fold up inside out. Bidirectional genes thus project an inverted structural duality into the proteome. I review how experimental data root the scoring functions responsible for the origins of coding and catalyzed activation of unfavorable chemical reactions in that duality.
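For readers unfamiliar with the specificity constant named in this abstract, the standard Michaelis-Menten relations below show why kcat/KM captures both catalysis and specificity. This is textbook enzyme kinetics offered as background, not a derivation taken from the paper; the symbols E, S, A, and B (enzyme, substrate, and two competing substrates) are introduced here for illustration.

```latex
% Michaelis-Menten rate for enzyme E acting on substrate S
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_M + [S]}

% At low substrate concentration ([S] \ll K_M), the rate is second order,
% with k_cat/K_M as the apparent rate constant
v \approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]_0\,[S]

% For two substrates A and B competing for the same enzyme, the ratio of
% rates depends only on the specificity constants and the concentrations
\frac{v_A}{v_B} = \frac{(k_{\mathrm{cat}}/K_M)_A\,[A]}{(k_{\mathrm{cat}}/K_M)_B\,[B]}
```

The last relation is why the ratio of kcat/KM values, rather than kcat or KM alone, measures how well an enzyme such as an aminoacyl-tRNA synthetase discriminates between cognate and non-cognate substrates.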
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA

9
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)

10
Basith S, Pham NT, Song M, Lee G, Manavalan B. ADP-Fuse: A novel two-layer machine learning predictor to identify antidiabetic peptides and diabetes types using multiview information. Comput Biol Med 2023; 165:107386. [PMID: 37619323 DOI: 10.1016/j.compbiomed.2023.107386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 08/26/2023]
Abstract
Diabetes mellitus has become a major public health concern associated with high mortality and reduced life expectancy and can cause blindness, heart attacks, kidney failure, lower limb amputations, and strokes. A new generation of antidiabetic peptides (ADPs) that act on β-cells or T-cells to regulate insulin production is being developed to alleviate the effects of diabetes. However, the lack of effective peptide-mining tools has hampered the discovery of these promising drugs. Hence, novel computational tools need to be developed urgently. In this study, we present ADP-Fuse, a novel two-layer prediction framework capable of accurately identifying ADPs or non-ADPs and categorizing them into type 1 and type 2 ADPs. First, we comprehensively evaluated 22 peptide sequence-derived features coupled with eight notable machine learning algorithms. Subsequently, the most suitable feature descriptors and classifiers for both layers were identified. The output of these single-feature models, embedded with multiview information, was trained with an appropriate classifier to provide the final prediction. Comprehensive cross-validation and independent tests substantiate that ADP-Fuse surpasses single-feature models and the feature fusion approach for the prediction of ADPs and their types. In addition, the SHapley Additive exPlanation method was used to elucidate the contributions of individual features to the prediction of ADPs and their types. Finally, a user-friendly web server for ADP-Fuse was developed and made publicly accessible (https://balalab-skku.org/ADP-Fuse), enabling the swift screening and identification of novel ADPs and their types. This framework is expected to contribute significantly to antidiabetic peptide identification.
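The two-layer decision flow described in this abstract can be sketched as a simple cascade: the first layer decides ADP versus non-ADP, and only predicted ADPs reach the second layer, which assigns a type. The function names, the 0.5 threshold, and the toy scoring rules below are hypothetical placeholders, not the published ADP-Fuse models.

```python
from typing import Callable, Optional, Tuple


def two_layer_predict(
    peptide: str,
    layer1_prob: Callable[[str], float],
    layer2_prob: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[bool, Optional[str]]:
    """Cascade prediction.

    layer1_prob returns an assumed probability that the peptide is an ADP.
    layer2_prob returns an assumed probability that an ADP is type 1
    (otherwise it is labeled type 2). The threshold is illustrative only.
    """
    if layer1_prob(peptide) < threshold:
        return False, None  # non-ADP: the second layer is never consulted
    adp_type = "type 1" if layer2_prob(peptide) >= threshold else "type 2"
    return True, adp_type


def toy_layer1(peptide: str) -> float:
    # Hypothetical rule standing in for a trained first-layer model.
    return 0.9 if len(peptide) > 5 else 0.1


def toy_layer2(peptide: str) -> float:
    # Hypothetical rule standing in for a trained second-layer model.
    return 0.8 if peptide.startswith("G") else 0.2


if __name__ == "__main__":
    for seq in ["GLPKRWA", "ALPKRWA", "GLP"]:
        print(seq, two_layer_predict(seq, toy_layer1, toy_layer2))
```

Keeping the layers as separate callables mirrors the abstract's point that the most suitable features and classifiers were chosen for each layer independently.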
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Minkyung Song
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea; Department of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon, 16419, Republic of Korea.
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon, 16499, Republic of Korea.
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea.

11
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. RESEARCH (WASHINGTON, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential for research on protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope because of their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, was proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS was unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with the ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
12
|
Cui Z, Wu Y, Zhang QH, Wang SG, He Y, Huang DS. MV-CVIB: a microbiome-based multi-view convolutional variational information bottleneck for predicting metastatic colorectal cancer. Front Microbiol 2023; 14:1238199. [PMID: 37675425 PMCID: PMC10477591 DOI: 10.3389/fmicb.2023.1238199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 08/02/2023] [Indexed: 09/08/2023] Open
Abstract
Introduction Imbalances in gut microbes have been implicated in many human diseases, including colorectal cancer (CRC), inflammatory bowel disease, type 2 diabetes, obesity, autism, and Alzheimer's disease. Compared with other human diseases, CRC is a gastrointestinal malignancy with high mortality and a high probability of metastasis. However, current studies mainly focus on the prediction of colorectal cancer while neglecting the more serious malignancy of metastatic colorectal cancer (mCRC). In addition, the high dimensionality and small sample sizes of gut microbial data make them complex, which increases the difficulty for traditional machine learning models. Methods To address these challenges, we collected and processed 16S rRNA data and calculated abundance data from patients with non-metastatic colorectal cancer (non-mCRC) and mCRC. Unlike the traditional health-disease classification strategy, we adopted a novel disease-disease classification strategy and proposed a microbiome-based multi-view convolutional variational information bottleneck (MV-CVIB). Results The experimental results show that MV-CVIB can effectively predict mCRC. The model achieves AUC values above 0.9 in comparisons with other state-of-the-art models. In addition, MV-CVIB achieved satisfactory predictive performance on multiple published CRC gut microbiome datasets. Discussion Finally, multiple gut microbiota analyses were used to elucidate communities and differences between mCRC and non-mCRC, and the metastatic properties of CRC were assessed by patient age and microbiota expression.
Affiliation(s)
- Zhen Cui
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Yan Wu
- College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Qin-Hu Zhang
- EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- Si-Guo Wang
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Ying He
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
|
13
|
Yang M, Huang ZA, Zhou W, Ji J, Zhang J, He S, Zhu Z. MIX-TPI: a flexible prediction framework for TCR-pMHC interactions based on multimodal representations. Bioinformatics 2023; 39:btad475. [PMID: 37527015 PMCID: PMC10423027 DOI: 10.1093/bioinformatics/btad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 07/05/2023] [Accepted: 07/29/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION The interactions between T-cell receptors (TCR) and peptide-major histocompatibility complex (pMHC) are essential for the adaptive immune system. However, identifying these interactions can be challenging due to the limited availability of experimental data, sequence data heterogeneity, and high experimental validation costs. RESULTS To address this issue, we develop a novel computational framework, named MIX-TPI, to predict TCR-pMHC interactions using amino acid sequences and physicochemical properties. Based on convolutional neural networks, MIX-TPI incorporates sequence-based and physicochemical-based extractors to refine the representations of TCR-pMHC interactions. Each modality is projected into modality-invariant and modality-specific representations to capture the uniformity and diversities between different features. A self-attention fusion layer is then adopted to form the classification module. Experimental results demonstrate the effectiveness of MIX-TPI in comparison with other state-of-the-art methods. MIX-TPI also shows good generalization capability on mutually exclusive evaluation datasets and a paired TCR dataset. AVAILABILITY AND IMPLEMENTATION The source code of MIX-TPI and the test data are available at: https://github.com/Wolverinerine/MIX-TPI.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Zhi-An Huang
- Research Office, City University of Hong Kong (Dongguan), Dongguan 523000, China
- Wei Zhou
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Junkai Ji
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Jun Zhang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China
|
14
|
Rcheulishvili N, Mao J, Papukashvili D, Feng S, Liu C, Wang X, He Y, Wang PG. Design, evaluation, and immune simulation of potentially universal multi-epitope mpox vaccine candidate: focus on DNA vaccine. Front Microbiol 2023; 14:1203355. [PMID: 37547674 PMCID: PMC10403236 DOI: 10.3389/fmicb.2023.1203355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 07/03/2023] [Indexed: 08/08/2023] Open
Abstract
Monkeypox (mpox) is a zoonotic infectious disease caused by the mpox virus. Mpox symptoms are similar to those of smallpox but with lower severity and lower mortality. Although the mpox virus is not as transmissible as some severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, it is still spreading, especially among men who have sex with men (MSM). Thus, taking preventive measures, such as vaccination, is highly recommended. While the smallpox vaccine has demonstrated considerable efficacy against the mpox virus due to the antigenic similarities, the development of a universal anti-mpox vaccine remains a necessary pursuit. Recently, nucleic acid vaccines have garnered special attention owing to their numerous advantages compared to traditional vaccines. Importantly, DNA vaccines have certain advantages over mRNA vaccines. In this study, a potentially universal DNA vaccine candidate against mpox based on conserved epitopes was designed and its efficacy was evaluated via an immunoinformatics approach. The vaccine candidate demonstrated potent humoral and cellular immune responses in silico, indicating the potential efficacy in vivo and the need for further research.
Affiliation(s)
- Yunjiao He
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Peng George Wang
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen, China
|
15
|
Bao W, Gu Y, Chen B, Yu H. Golgi_DF: Golgi proteins classification with deep forest. Front Neurosci 2023; 17:1197824. [PMID: 37250391 PMCID: PMC10213405 DOI: 10.3389/fnins.2023.1197824] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/31/2023] Open
Abstract
Introduction The Golgi apparatus is one of the components of the inner membrane system in eukaryotic cells. Its main function is to send proteins synthesized in the endoplasmic reticulum to specific parts of the cell or to secrete them outside the cell. The Golgi is therefore an important organelle for protein synthesis in eukaryotic cells. Golgi disorders can cause various neurodegenerative and genetic diseases, and the accurate classification of Golgi proteins helps in developing corresponding therapeutic drugs. Methods This paper proposes a novel Golgi protein classification method, Golgi_DF, based on the deep forest algorithm. First, the proteins to be classified are converted into feature vectors containing various information. Second, the synthetic minority oversampling technique (SMOTE) is used to process the samples to be classified. Next, LightGBM is used for feature reduction. Meanwhile, the features can be utilized in the penultimate dense layer. The reconstructed features can then be classified with the deep forest algorithm. Results Golgi_DF can be used to select the important features and identify Golgi proteins. Experiments show that it performs better than other state-of-the-art methods. Golgi_DF is a standalone tool, and all of its source code is publicly available at https://github.com/baowz12345/golgiDF. Discussion Golgi_DF employs reconstructed features to classify Golgi proteins. Such a method may obtain more usable features from the UniRep features.
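The oversampling step named in the abstract can be sketched in a few lines. This is a minimal, standard-library illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the Golgi_DF code; the function name `smote`, its parameters, and the toy points are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority-class neighbours of x (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6)
print(len(synthetic))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies instead of simply duplicating existing rows.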
Affiliation(s)
- Wenzheng Bao
- School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Yujian Gu
- School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Baitong Chen
- Department of Stomatology, Xuzhou First People’s Hospital, Xuzhou, China
- The Affiliated Hospital of China University of Mining and Technology, Xuzhou, China
- Huiping Yu
- Department of Neurosurgery, The Hospital of Joint Logistic, Quanzhou, China
|
16
|
Xi J, Sun D, Chang C, Zhou S, Huang Q. An omics-to-omics joint knowledge association subtensor model for radiogenomics cross-modal modules from genomics and ultrasonic images of breast cancers. Comput Biol Med 2023; 155:106672. [PMID: 36805226 DOI: 10.1016/j.compbiomed.2023.106672] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/15/2022] [Revised: 02/06/2023] [Accepted: 02/10/2023] [Indexed: 02/16/2023]
Abstract
Radiogenomics analysis can provide connections between genomics and radiomics, which makes it possible to infer the genomic features of tumors from their radiogenomic associations through low-cost, non-invasive screening ultrasonic images. Although a number of pioneering approaches have explored the connections between genomic aberrations and ultrasonic features, these studies mainly focus on the relationship between ultrasonic features and only the most popular cancer genes, and they confront two difficulties: missing many-to-many relationships in an omics-to-omics view, and confounding group-specific associations with whole-sample associations. To overcome the difficulty of the omics-to-omics view and the issue of tumor heterogeneity, we propose an omics-to-omics joint knowledge association subtensor model. Specifically, the subtensor factorization framework can successfully discover the joint cross-modal module via an omics-to-omics view, while the sparse weight sample indication strategy can mine sample subgroups from the multi-omic data with tumor heterogeneity. The experimental evaluation shows the jointness of the discovered modules across omics, their association with tumorigenesis contribution, and their relation to cancer-related functions. In summary, our proposed omics-to-omics joint knowledge association subtensor model can serve as an efficient tool for radiogenomic knowledge associations, promoting cross-modal knowledge graph construction in explainable artificial intelligence for cancer diagnosis.
Affiliation(s)
- Jianing Xi
- School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China
- Donghui Sun
- School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China
- Cai Chang
- Department of Ultrasound, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Shichong Zhou
- Department of Ultrasound, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Qinghua Huang
- School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China
|