1. Dong S, Fan C, Wang M, Patil S, Li J, Huang L, Chen Y, Guo H, Liu Y, Pan M, Ma L, Chen F. Development of a carbohydrate-binding protein prediction algorithm using structural features of stacking aromatic rings. Int J Biol Macromol 2024; 281:136553. [PMID: 39401628 DOI: 10.1016/j.ijbiomac.2024.136553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 10/03/2024] [Accepted: 10/11/2024] [Indexed: 10/20/2024]
Abstract
Carbohydrate-protein interactions play fundamental roles in numerous aspects of biological activities, and the search for new carbohydrate (CHO)-binding proteins (CBPs) has long been a research focus. In this study, through the analysis of CBP structures, we identified significant enrichment of aromatic residues in CHO-binding regions. We further summarized the structural features of these aromatic rings within the CHO-stacking region, namely "exposing" and "proximity" features, and developed a screening algorithm that can identify CHO-stacking Trp (tryptophan) residues based on these two features. Our Trp screening algorithm can achieve high accuracy in both CBP (specificity score 0.93) and CBS (carbohydrate-binding site, precision score 0.77) prediction using experimentally determined protein structures. We also applied our screening algorithm to AlphaFold pan-species predicted models and observed significant enrichment of carbohydrate-related functions in predicted CBP candidates across different species. Moreover, through carbohydrate arrays, we experimentally verified the CHO-binding ability of four candidate proteins, which further confirms the robustness of the algorithm. This study provides another perspective on proteome-wide CBP and CBS prediction. Our results not only help to reveal the structural mechanism of CHO-binding, but also provide a pan-species CBP dataset for future CHO-protein interaction exploration.
Affiliation(s)
- Shaowei Dong
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
  - Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Chuiqin Fan
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Manna Wang
  - Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Sandip Patil
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Jun Li
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Liangping Huang
  - Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yuanguo Chen
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Huijie Guo
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Yanbing Liu
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Mengwen Pan
  - Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Lian Ma
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Fuyi Chen
  - Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
2. Chakraborty C, Bhattacharya M, Lee SS, Wen ZH, Lo YH. The changing scenario of drug discovery using AI to deep learning: Recent advancement, success stories, collaborations, and challenges. Mol Ther Nucleic Acids 2024; 35:102295. [PMID: 39257717 PMCID: PMC11386122 DOI: 10.1016/j.omtn.2024.102295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Owing to the transformation of artificial intelligence (AI) tools and technologies, AI-driven drug discovery has come to the forefront. It reduces both time and expenditure, and because of these advantages the pharmaceutical industry is concentrating on AI-driven drug discovery. Several drug molecules have been discovered using AI-based techniques and tools, and several of these have already entered clinical trials. In this review, we first present the data and data resources available in the pharmaceutical sector for AI-driven drug discovery and illustrate some significant AI and machine learning (ML) algorithms and techniques used in this field. We give an overview of deep neural network (NN) models and compare them with artificial NNs. We then illustrate recent advances in the landscape of drug discovery, from AI to deep learning, such as the identification of drug targets; prediction of their structure; estimation of drug-target interaction; estimation of drug-target binding affinity; de novo drug design; prediction of drug toxicity; estimation of absorption, distribution, metabolism, excretion, and toxicity; and estimation of drug-drug interaction. Moreover, we highlight success stories of AI-driven drug discovery and discuss several collaborations and the challenges in this area. The discussions in the article will enrich the pharmaceutical industry.
Affiliation(s)
- Chiranjib Chakraborty
  - Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India
- Manojit Bhattacharya
  - Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha 756020, India
- Sang-Soo Lee
  - Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do 24252, Republic of Korea
- Zhi-Hong Wen
  - Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Yi-Hao Lo
  - Department of Family Medicine, Zuoying Armed Forces General Hospital, Kaohsiung 813204, Taiwan
  - Shu-Zen Junior College of Medicine and Management, Kaohsiung 821004, Taiwan
  - Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 804201, Taiwan
3. He X, Zhao L, Tian Y, Li R, Chu Q, Gu Z, Zheng M, Wang Y, Li S, Jiang H, Jiang Y, Wen L, Wang D, Cheng X. Highly accurate carbohydrate-binding site prediction with DeepGlycanSite. Nat Commun 2024; 15:5163. [PMID: 38886381 PMCID: PMC11183243 DOI: 10.1038/s41467-024-49516-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/10/2024] [Indexed: 06/20/2024] [Open Access]
Abstract
As the most abundant organic substances in nature, carbohydrates are essential for life. Understanding how carbohydrates regulate proteins in the physiological and pathological processes presents opportunities to address crucial biological problems and develop new therapeutics. However, the diversity and complexity of carbohydrates pose a challenge in experimentally identifying the sites where carbohydrates bind to and act on proteins. Here, we introduce a deep learning model, DeepGlycanSite, capable of accurately predicting carbohydrate-binding sites on a given protein structure. Incorporating geometric and evolutionary features of proteins into a deep equivariant graph neural network with the transformer architecture, DeepGlycanSite remarkably outperforms previous state-of-the-art methods and effectively predicts binding sites for diverse carbohydrates. Integrating with a mutagenesis study, DeepGlycanSite reveals the guanosine-5'-diphosphate-sugar-recognition site of an important G-protein coupled receptor. These findings demonstrate DeepGlycanSite is invaluable for carbohydrate-binding site prediction and could provide insights into molecular mechanisms underlying carbohydrate-regulation of therapeutically important proteins.
Affiliation(s)
- Xinheng He
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Lifen Zhao
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yinping Tian
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Rui Li
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Qinyu Chu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Zhiyong Gu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Mingyue Zheng
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Yusong Wang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China
- Shaoning Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Hualiang Jiang
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
  - Lingang Laboratory, Shanghai, China
- Yi Jiang
  - Lingang Laboratory, Shanghai, China
- Liuqing Wen
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xi Cheng
  - State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
4. Bibekar P, Krapp L, Peraro MD. PeSTo-Carbs: Geometric Deep Learning for Prediction of Protein-Carbohydrate Binding Interfaces. J Chem Theory Comput 2024; 20:2985-2991. [PMID: 38602504 PMCID: PMC11044267 DOI: 10.1021/acs.jctc.3c01145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 03/28/2024] [Accepted: 03/29/2024] [Indexed: 04/12/2024]
Abstract
The Protein Structure Transformer (PeSTo), a geometric transformer, has exhibited exceptional performance in predicting protein-protein binding interfaces and distinguishing interfaces with nucleic acids, lipids, small molecules, and ions. In this study, we introduce PeSTo-Carbs, an extension of PeSTo specifically engineered to predict protein-carbohydrate binding interfaces. We evaluate the performance of this approach using independent test sets and compare them with those of previous methods. Furthermore, we highlight the model's capability to specialize in predicting interfaces involving cyclodextrins, a biologically and pharmaceutically significant class of carbohydrates. Our method consistently achieves remarkable accuracy despite the scarcity of available structural data for cyclodextrins.
Affiliation(s)
- Parth Bibekar
  - Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur 741246, India
  - Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Lucien Krapp
  - Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne 1015, Switzerland
- Matteo Dal Peraro
  - Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne 1015, Switzerland
5. Kabir MWU, Alawad DM, Pokhrel P, Hoque MT. DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues. Comput Biol Med 2024; 170:108081. [PMID: 38295475 PMCID: PMC10922697 DOI: 10.1016/j.compbiomed.2024.108081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 01/12/2024] [Accepted: 01/27/2024] [Indexed: 02/02/2024]
Abstract
DNA-binding and RNA-binding proteins are essential to an organism's normal life cycle. These proteins have diverse functions in various biological processes. DNA-binding proteins are crucial for DNA replication, transcription, repair, packaging, and gene expression. Likewise, RNA-binding proteins are essential for the post-transcriptional control of RNAs and RNA metabolism. Identifying DNA- and RNA-binding residues is essential for biological research and for understanding the pathogenesis of many diseases. However, most DNA-binding and RNA-binding proteins still need to be discovered. This research explored various properties of the protein sequences, such as amino acid composition type, Position-Specific Scoring Matrix (PSSM) values of amino acids, Hidden Markov model (HMM) profiles, physicochemical properties, structural properties, torsion angles, and disorder regions. We utilized a sliding window technique to extract more information from a target residue's neighbors. We proposed an optimized Light Gradient Boosting Machine (LightGBM) method, named DRBpred, to predict DNA-binding and RNA-binding residues from the protein sequence. Compared to the state-of-the-art method, DRBpred shows improvements of 112.00 %, 33.33 %, and 6.49 % on the DNA-binding test set and of 112.50 %, 16.67 %, and 7.46 % on the RNA-binding test set in terms of Sensitivity, Matthews Correlation Coefficient (MCC), and AUC, respectively.
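The sliding-window step mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the DRBpred implementation: the helper name `window_features`, the window half-width, and the zero-padding at the sequence ends are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def window_features(per_residue: Sequence[Sequence[float]],
                    half_width: int = 2) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbours.

    Hypothetical helper: positions beyond the sequence ends are filled
    with zeros (an assumed padding scheme; others are possible).
    """
    if not per_residue:
        return []
    n_feat = len(per_residue[0])
    pad = [0.0] * n_feat
    out: List[List[float]] = []
    for i in range(len(per_residue)):
        row: List[float] = []
        # Walk the window centred on residue i, left to right.
        for j in range(i - half_width, i + half_width + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        out.append(row)
    return out


# Toy input (invented values): 4 residues with 1 feature each.
toy = [[1.0], [2.0], [3.0], [4.0]]
windows = window_features(toy, half_width=1)
print(windows[0])  # [0.0, 1.0, 2.0]
```

Each output row has `(2 * half_width + 1)` times as many values as a single residue's feature vector, so a per-residue classifier sees the local sequence context around its target.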
Affiliation(s)
- Md Wasi Ul Kabir
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Duaa Mohammad Alawad
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Pujan Pokhrel
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Md Tamjidul Hoque
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
6. Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
The use of data-driven high-throughput analytical techniques, which has given rise to computational oncology, is undisputed. The widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fueled the advancement in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes concerning different types of cancer research, vast databases of varied information with manifold dimensions are required, and indeed, all this information can only be managed by an automated system developed utilizing ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anti-cancer substances to improve cancer treatment, and also to refine the quality of existing treatment models by being incorporated at all steps of research and development related to cancer and in routine patient care. This review will serve as a consolidation of the advancement and benefits of ML and MM techniques with a special focus on the area of cancer prognosis and anticancer therapy, leading to the identification of challenges (data quantity, ethical consideration, and data privacy) which are yet to be fully addressed in current studies.
Affiliation(s)
- Jasmin Hassan
  - Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh (J.H., S.M.S.)
- Lipika Deka
  - Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
  - Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
  - Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
7. Liu G, Chang Y, Mei X, Chen G, Zhang Y, Jiang X, Tao W, Xue C. Identification and structural characterization of a novel chondroitin sulfate-specific carbohydrate-binding module: The first member of a new family, CBM100. Int J Biol Macromol 2024; 255:127959. [PMID: 37951443 DOI: 10.1016/j.ijbiomac.2023.127959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 11/05/2023] [Accepted: 11/06/2023] [Indexed: 11/14/2023]
Abstract
Chondroitin sulfate is a biologically and commercially important polysaccharide with a variety of applications. Carbohydrate-binding modules (CBMs) are an important class of carbohydrate-binding proteins that could be utilized as promising tools for polysaccharide applications. In the present study, a domain of unknown function was explored from a putative chondroitin sulfate lyase in the PL29 family. Recombinant PhCBM100 demonstrated binding capacity to chondroitin sulfates, with Ka values of (2.1 ± 0.2) × 10⁶ M⁻¹ and (6.0 ± 0.1) × 10⁶ M⁻¹ for chondroitin sulfate A and chondroitin sulfate C, respectively. The 1.55 Å resolution X-ray crystal structure of PhCBM100 exhibited a β-sandwich fold formed by two antiparallel β-sheets. A binding groove in PhCBM100 that interacts with chondroitin sulfate was subsequently identified, and the potential of PhCBM100 for visualization of chondroitin sulfate was evaluated. PhCBM100 is the first characterized chondroitin sulfate-specific CBM, and its novelty supports the proposal of a new CBM family, CBM100.
Affiliation(s)
- Guanchen Liu
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Yaoguang Chang
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Xuanwei Mei
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Guangning Chen
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Yuying Zhang
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Xiaoxiao Jiang
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Wenwen Tao
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
- Changhu Xue
  - College of Food Science and Engineering, Ocean University of China, 1299 Sansha Road, Qingdao 266404, China
8. Guan J, Yao L, Chung CR, Chiang YC, Lee TY. StackTHPred: Identifying Tumor-Homing Peptides through GBDT-Based Feature Selection with Stacking Ensemble Architecture. Int J Mol Sci 2023; 24:10348. [PMID: 37373494 DOI: 10.3390/ijms241210348] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/25/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] [Open Access]
Abstract
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
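For readers unfamiliar with the two metrics reported in this abstract, the following Python sketch shows how accuracy and the Matthews Correlation Coefficient are computed from a binary confusion matrix. The function names and the counts are invented for illustration and are not taken from the StackTHPred paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented toy counts: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
acc_value = accuracy(45, 45, 5, 5)
mcc_value = mcc(45, 45, 5, 5)
print(round(acc_value, 3), round(mcc_value, 3))  # 0.9 0.8
```

MCC ranges from -1 to 1 and, unlike accuracy, uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classifiers trained on imbalanced data.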
Affiliation(s)
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Chia-Ru Chung
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Ying-Chih Chiang
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
9. Corrales-Hernández MG, Villarroel-Hagemann SK, Mendoza-Rodelo IE, Palacios-Sánchez L, Gaviria-Carrillo M, Buitrago-Ricaurte N, Espinosa-Lugo S, Calderon-Ospina CA, Rodríguez-Quintana JH. Development of Antiepileptic Drugs throughout History: From Serendipity to Artificial Intelligence. Biomedicines 2023; 11:1632. [PMID: 37371727 DOI: 10.3390/biomedicines11061632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/07/2023] [Revised: 05/24/2023] [Accepted: 05/31/2023] [Indexed: 06/29/2023] [Open Access]
Abstract
This article provides a comprehensive narrative review of the history of antiepileptic drugs (AEDs) and their development over time. Firstly, it explores the significant role of serendipity in the discovery of essential AEDs that continue to be used today, such as phenobarbital and valproic acid. Subsequently, it delves into the historical progression of crucial preclinical models employed in the development of novel AEDs, including the maximal electroshock stimulation test, pentylenetetrazol-induced test, kindling models, and other animal models. Moving forward, a concise overview of the clinical advancement of major AEDs is provided, highlighting the initial milestones and the subsequent refinement of this process in recent decades, in line with the emergence of evidence-based medicine and the implementation of increasingly rigorous controlled clinical trials. Lastly, the article explores the contributions of artificial intelligence, while also offering recommendations and discussing future perspectives for the development of new AEDs.
Affiliation(s)
- María Gabriela Corrales-Hernández
  - Pharmacology Unit, Department of Biomedical Sciences, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Sebastián Kurt Villarroel-Hagemann
  - Pharmacology Unit, Department of Biomedical Sciences, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Leonardo Palacios-Sánchez
  - Neuroscience Research Group (NeURos), NeuroVitae Center for Neuroscience, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Mariana Gaviria-Carrillo
  - Neuroscience Research Group (NeURos), NeuroVitae Center for Neuroscience, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Santiago Espinosa-Lugo
  - Pharmacology Unit, Department of Biomedical Sciences, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Carlos-Alberto Calderon-Ospina
  - Pharmacology Unit, Department of Biomedical Sciences, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
  - Research Group in Applied Biomedical Sciences (UR Biomed), School of Medicine and Health Sciences, Universidad del Rosario, Bogotá 111221, Colombia
- Jesús Hernán Rodríguez-Quintana
  - Fundacion CardioInfantil-Instituto de Cardiologia, Calle 163a # 13B-60, Bogotá 111156, Colombia
  - Hospital Universitario Mayor Mederi, Calle 24 # 29-45, Bogotá 111411, Colombia
10. Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, Awasthi MK, Sharma A, Jain R. Application of machine learning on understanding biomolecule interactions in cellular machinery. Bioresour Technol 2023; 370:128522. [PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) applications have become ubiquitous in all fields of research, including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of the concepts and mechanics involved. The volume of information on biomolecular structure, binding affinities, and structural fluctuations and movements is enormous, but it can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity, and material science. Hence, the way ahead is for laboratory work and computational techniques to go hand in hand.
Affiliation(s)
- Rewati Dixit
  - Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Khushal Khambhati
  - Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Kolli Venkata Supraja
  - Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Vijai Singh
  - Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Franziska Lederer
  - Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
- Pau-Loke Show
  - Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China
  - Department of Sustainable Engineering, Saveetha School of Engineering, SIMATS, Chennai 602105, India
  - Department of Chemical and Environmental Engineering, University of Nottingham, Malaysia, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
- Mukesh Kumar Awasthi
  - College of Natural Resources and Environment, Northwest A&F University, Yangling 712100, China
- Abhinav Sharma
  - Institute Theory of Polymers, Leibniz Institute for Polymer Research, Hohe Strasse 6, 01069 Dresden, Germany
- Rohan Jain
  - Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
11
|
Abstract
Glycoscience assembles all the scientific disciplines involved in studying various molecules and macromolecules containing carbohydrates and complex glycans. Such an ensemble involves one of the most extensive sets of molecules in quantity and occurrence since they occur in all microorganisms and higher organisms. Once the compositions and sequences of these molecules are established, the determination of their three-dimensional structural and dynamical features is a step toward understanding the molecular basis underlying their properties and functions. The range of the relevant computational methods capable of addressing such issues is anchored by the specificity of stereoelectronic effects from quantum chemistry to mesoscale modeling throughout molecular dynamics and mechanics and coarse-grained and docking calculations. The Review leads the reader through the detailed presentations of the applications of computational modeling. The illustrations cover carbohydrate-carbohydrate interactions, glycolipids, and N- and O-linked glycans, emphasizing their role in SARS-CoV-2. The presentation continues with the structure of polysaccharides in solution and solid-state and lipopolysaccharides in membranes. The full range of protein-carbohydrate interactions is presented, as exemplified by carbohydrate-active enzymes, transporters, lectins, antibodies, and glycosaminoglycan binding proteins. A final section features a list of 150 tools and databases to help address the many issues of structural glycobioinformatics.
Affiliation(s)
- Serge Perez
- Centre de Recherche sur les Macromolecules Vegetales, University of Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble F-38041, France
- Olga Makshakova
- FRC Kazan Scientific Center of Russian Academy of Sciences, Kazan Institute of Biochemistry and Biophysics, Kazan 420111, Russia
12
Wang R, Jin J, Zou Q, Nakai K, Wei L. Predicting protein-peptide binding residues via interpretable deep learning. Bioinformatics 2022; 38:3351-3360. [PMID: 35604077 DOI: 10.1093/bioinformatics/btac352] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/15/2021] [Revised: 04/13/2022] [Accepted: 05/18/2022] [Indexed: 11/14/2022] Open
Abstract
Identifying the protein-peptide binding residues is fundamentally important to understand the mechanisms of protein functions and explore drug discovery. Although several computational methods have been developed, they highly rely on third-party tools or information for feature design, easily resulting in low computational efficacy and suffering from low predictive performance. To address the limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representation from Transformers)-based Contrastive Learning framework to predict the protein-Peptide binding residues based on protein sequences only. PepBCL is an end-to-end predictive model that is independent of designed features. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn high-latent representations of protein sequences relevant for protein structure and functions. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues underlying the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods under benchmarking comparison, and achieves more robust performance. Moreover, we found that we further improve the performance via the integration of traditional features and our learnt features. Our results highlight the flexibility and adaptability of deep learning-based protein language model to capture both conserved and non-conserved sequential characteristics of peptide-binding residues. Interestingly, we demonstrate that peptide-binding residues in local sequential regions have more specific sequential patterns as compared with other protein-ligand binding residues, which potentially provides functional difference. Finally, to facilitate the use of our method, we establish an online predictive platform as the implementation of the proposed PepBCL, which is now available at http://server.wei-group.net/PepBCL/. 
AVAILABILITY: https://github.com/Ruheng-W/PepBCL.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruheng Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Junru Jin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Kenta Nakai
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
13
PCA-MutPred: Prediction of binding free energy change upon missense mutation in protein-carbohydrate complexes. J Mol Biol 2022; 434:167526. [DOI: 10.1016/j.jmb.2022.167526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/04/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 11/22/2022]
14
Machine Learning Based Restaurant Sales Forecasting. Machine Learning and Knowledge Extraction 2022. [DOI: 10.3390/make4010006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
Abstract
To encourage proper employee scheduling for managing crew load, restaurants need accurate sales forecasting. This paper proposes a case study on many machine learning (ML) models using real-world sales data from a mid-sized restaurant. Trendy recurrent neural network (RNN) models are included for direct comparison to many methods. To test the effects of trend and seasonality, we generate three different datasets to train our models with and to compare our results. To aid in forecasting, we engineer many features and demonstrate good methods to select an optimal sub-set of highly correlated features. We compare the models based on their performance for forecasting time steps of one-day and one-week over a curated test dataset. The best results seen in one-day forecasting come from linear models with a sMAPE of only 19.6%. Two RNN models, LSTM and TFT, and ensemble models also performed well with errors less than 20%. When forecasting one-week, non-RNN models performed poorly, giving results worse than 20% error. RNN models extended better with good sMAPE scores giving 19.5% in the best result. The RNN models performed worse overall on datasets with trend and seasonality removed, however many simpler ML models performed well when linearly separating each training instance.
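The sMAPE figures quoted in this abstract can be made concrete with a small sketch. The abstract does not state which sMAPE variant the authors used, so the code below assumes one common definition (absolute error divided by the mean of the absolute actual and forecast values, averaged and expressed in percent). The function name `smape` and the sample numbers are illustrative only and do not come from the paper.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed definition: mean over all points of
    |forecast - actual| / ((|actual| + |forecast|) / 2), times 100.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Convention: a point where both values are zero contributes no error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical daily sales versus a forecast (made-up numbers).
daily_sales = [100.0, 200.0]
predicted = [110.0, 180.0]
print(round(smape(daily_sales, predicted), 2))
```

Because the denominator uses both the actual and the forecast value, over- and under-forecasts of the same size are not penalized identically, which is worth keeping in mind when comparing scores across models.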
15
Lailvaux SP, Mishra A, Pun P, Ul Kabir MW, Wilson RS, Herrel A, Hoque MT. Machine learning accurately predicts the multivariate performance phenotype from morphology in lizards. PLoS One 2022; 17:e0261613. [PMID: 35061733 PMCID: PMC8782310 DOI: 10.1371/journal.pone.0261613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 12/06/2021] [Indexed: 11/18/2022] Open
Abstract
Completing the genotype-to-phenotype map requires rigorous measurement of the entire multivariate organismal phenotype. However, phenotyping on a large scale is not feasible for many kinds of traits, resulting in missing data that can also cause problems for comparative analyses and the assessment of evolutionary trends across species. Measuring the multivariate performance phenotype is especially logistically challenging, and our ability to predict several performance traits from a given morphology is consequently poor. We developed a machine learning model to accurately estimate multivariate performance data from morphology alone by training it on a dataset containing performance and morphology data from 68 lizard species. Our final, stacked model predicts missing performance data accurately at the level of the individual from simple morphological measures. This model performed exceptionally well, even for performance traits that were missing values for >90% of the sampled individuals. Furthermore, incorporating phylogeny did not improve model fit, indicating that the phenotypic data alone preserved sufficient information to predict the performance based on morphological information. This approach can both significantly increase our understanding of performance evolution and act as a bridge to incorporate performance into future work on phenomics.
Affiliation(s)
- Simon P. Lailvaux
- Department of Biological Sciences, The University of New Orleans, New Orleans, LA, United States of America
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, United States of America
- Pooja Pun
- Department of Computer Science, The University of New Orleans, New Orleans, LA, United States of America
- Md Wasi Ul Kabir
- Department of Computer Science, The University of New Orleans, New Orleans, LA, United States of America
- Robbie S. Wilson
- School of Biological Sciences, The University of Queensland, St. Lucia, Queensland, Australia
- Anthony Herrel
- Département Adaptations du Vivant, UMR 7179 C.N.R.S/M.N.H.N., Paris, France
- Md Tamjidul Hoque
- Department of Computer Science, The University of New Orleans, New Orleans, LA, United States of America
16
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 286] [Impact Index Per Article: 95.3] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India.
17
Nance ML, Labonte JW, Adolf-Bryfogle J, Gray JJ. Development and Evaluation of GlycanDock: A Protein-Glycoligand Docking Refinement Algorithm in Rosetta. J Phys Chem B 2021; 125:10.1021/acs.jpcb.1c00910. [PMID: 34133179 PMCID: PMC8742512 DOI: 10.1021/acs.jpcb.1c00910] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Carbohydrate chains are ubiquitous in the complex molecular processes of life. These highly diverse chains are recognized by a variety of protein receptors, enabling glycans to regulate many biological functions. High-resolution structures of protein-glycoligand complexes reveal the atomic details necessary to understand this level of molecular recognition and inform application-focused scientific and engineering pursuits. When experimental challenges hinder high-throughput determination of quality structures, computational tools can, in principle, fill the gap. In this work, we introduce GlycanDock, a residue-centric protein-glycoligand docking refinement algorithm developed within the Rosetta macromolecular modeling and design software suite. We performed a benchmark docking assessment using a set of 109 experimentally determined protein-glycoligand complexes as well as 62 unbound protein structures. The GlycanDock algorithm can sample and discriminate among protein-glycoligand models of native-like structural accuracy with statistical reliability from starting structures of up to 7 Å root-mean-square deviation in the glycoligand ring atoms. We show that GlycanDock-refined models qualitatively replicated the known binding specificity of a bacterial carbohydrate-binding module. Finally, we present a protein-glycoligand docking pipeline for generating putative protein-glycoligand complexes when only the glycoligand sequence and unbound protein structure are known. In combination with other carbohydrate modeling tools, the GlycanDock docking refinement algorithm will accelerate research in the glycosciences.
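The "7 Å root-mean-square deviation in the glycoligand ring atoms" mentioned above refers to a standard distance measure between two sets of matched atomic coordinates. The following is a minimal sketch of that measure only, not the Rosetta or GlycanDock implementation. It assumes both coordinate sets are already in the same reference frame (no superposition is performed) and that atoms are listed in matching order. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the two structures share a reference frame and atom ordering;
    no alignment or superposition is done here.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two made-up ring-atom positions in a reference pose and a displaced pose (in Å).
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced_pose = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(round(rmsd(reference_pose, displaced_pose), 3))
```

Skipping the superposition step suits a docking setting where the protein frame is held fixed and only the ligand moves. When comparing free-standing structures, an optimal alignment would normally be applied first.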
Affiliation(s)
- Morgan L. Nance
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Jason W. Labonte
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Chemistry, Franklin & Marshall College, Lancaster, Pennsylvania 17603, United States
- Department of Chemistry, Gettysburg College, Gettysburg, Pennsylvania 17325, United States
- Jared Adolf-Bryfogle
- Protein Design Lab, Institute for Protein Innovation, Boston, Massachusetts 02115, United States
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, Massachusetts 02115, United States
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Jeffrey J. Gray
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
18
Panta M, Mishra A, Hoque MT, Atallah J. ClassifyTE: A stacking based prediction of hierarchical classification of transposable elements. Bioinformatics 2021; 37:2529-2536. [PMID: 33682878 DOI: 10.1093/bioinformatics/btab146] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/26/2020] [Revised: 02/10/2021] [Accepted: 03/01/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method.
RESULTS: We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68%, and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs.
AVAILABILITY: The source code and data are available at https://github.com/manisa/ClassifyTE.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manisha Panta
- Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, 78363, USA
- Md Tamjidul Hoque
- Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
- Joel Atallah
- Department of Biological Sciences, University of New Orleans, New Orleans, LA 70148, USA
19
Mei X, Chang Y, Shen J, Zhang Y, Xue C. Expression and characterization of a novel alginate-binding protein: A promising tool for investigating alginate. Carbohydr Polym 2020; 246:116645. [PMID: 32747278 DOI: 10.1016/j.carbpol.2020.116645] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/17/2020] [Revised: 05/06/2020] [Accepted: 06/11/2020] [Indexed: 12/26/2022]
Abstract
Alginate is a commercially important polysaccharide widely applied in various industries. Carbohydrate-binding proteins could be utilized as desirable tools in the investigation and further applications of polysaccharides. Few alginate-binding proteins have hitherto been characterized and reported. In the present study, a novel alginate-binding protein ABP_Wf, consisting of two "orphan" carbohydrate-binding modules, was cloned from a predicted alginate utilization locus of the marine bacterium Wenyingzhuangia fucanilytica and expressed in Escherichia coli. ABP_Wf exhibited a specific binding capacity to alginate, and the association constant (Ka) and affinity (KD) were 1.94 × 10³ M⁻¹s⁻¹ and 1.16 × 10⁻⁶ M, respectively. It was confirmed that the binding capacity of ABP_Wf to alginate is attributed to its constituent CBM16 domain rather than the CBM44 domain. The potential of ABP_Wf in the semi-quantitative detection and the in situ visualization of alginate was evaluated, which implied that ABP_Wf could serve as a promising tool for investigating alginate.
Affiliation(s)
- Xuanwei Mei
- College of Food Science and Engineering, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China
- Yaoguang Chang
- College of Food Science and Engineering, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China; Laboratory for Marine Drugs and Bioproducts, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266237, China.
- Jingjing Shen
- College of Food Science and Engineering, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China
- Yuying Zhang
- College of Food Science and Engineering, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China
- Changhu Xue
- College of Food Science and Engineering, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China; Laboratory for Marine Drugs and Bioproducts, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266237, China
20
AIBH: Accurate Identification of Brain Hemorrhage Using Genetic Algorithm Based Feature Selection and Stacking. Machine Learning and Knowledge Extraction 2020. [DOI: 10.3390/make2020005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
Brain hemorrhage is a type of stroke which is caused by a ruptured artery, resulting in localized bleeding in or around the brain tissues. Among a variety of imaging tests, a computerized tomography (CT) scan of the brain enables the accurate detection and diagnosis of a brain hemorrhage. In this work, we developed a practical approach to detect the existence and type of brain hemorrhage in a CT scan image of the brain, called Accurate Identification of Brain Hemorrhage, abbreviated as AIBH. The steps of the proposed method consist of image preprocessing, image segmentation, feature extraction, feature selection, and design of an advanced classification framework. The image preprocessing and segmentation steps involve removing the skull region from the image and finding out the region of interest (ROI) using Otsu’s method, respectively. Subsequently, feature extraction includes the collection of a comprehensive set of features from the ROI, such as the size of the ROI, centroid of the ROI, perimeter of the ROI, the distance between the ROI and the skull, and more. Furthermore, a genetic algorithm (GA)-based feature selection algorithm is utilized to select relevant features for improved performance. These features are then used to train the stacking-based machine learning framework to predict different types of a brain hemorrhage. Finally, the evaluation results indicate that the proposed predictor achieves a 10-fold cross-validation (CV) accuracy (ACC), precision (PR), Recall, F1-score, and Matthews correlation coefficient (MCC) of 99.5%, 99%, 98.9%, 0.989, and 0.986, respectively, on the benchmark CT scan dataset. While comparing AIBH with the existing state-of-the-art classification method of the brain hemorrhage type, AIBH provides an improvement of 7.03%, 7.27%, and 7.38% based on PR, Recall, and F1-score, respectively. 
Therefore, the proposed approach considerably outperforms the existing brain hemorrhage classification approach and can be useful for the effective prediction of brain hemorrhage types from CT scan images (The code and data can be found here: http://cs.uno.edu/~tamjid/Software/AIBH/code_data.zip).
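The evaluation measures listed in this abstract (ACC, PR, Recall, F1-score, MCC) all derive from confusion-matrix counts. As a rough illustration, the sketch below computes them for a binary case. The paper classifies several hemorrhage types, so its multi-class averaging scheme may differ and is not reproduced here. The function name `binary_metrics` and the counts are made up for the example.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, F1 and MCC from binary confusion counts.

    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient uses all four cells of the matrix.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, not taken from the AIBH study.
scores = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy because it stays informative when the classes are imbalanced, whereas accuracy can look high simply by favoring the majority class.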