101
|
Zhu R, Ji C, Wang Y, Cai Y, Wu H. Heterogeneous Graph Convolutional Networks and Matrix Completion for miRNA-Disease Association Prediction. Front Bioeng Biotechnol 2020; 8:901. [PMID: 32974293 PMCID: PMC7468400 DOI: 10.3389/fbioe.2020.00901] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/15/2020] [Accepted: 05/13/2020] [Indexed: 01/21/2023] Open
Abstract
Due to the cost and complexity of biological experiments, many computational methods have been proposed to predict potential miRNA-disease associations by utilizing known miRNA-disease associations and other related information. However, these computational methods face some challenges. First, the relationships between miRNAs and diseases are complex, so a computational model should consider both the local and global influence of neighborhoods in the network. Furthermore, predicting disease-related miRNAs without any known associations is also very important. This study presents a new computational method that constructs a heterogeneous network composed of a miRNA similarity network, a disease similarity network, and a known miRNA-disease association network. The miRNA similarity considers the miRNAs and their possible families and clusters. The information of each node in the heterogeneous network is obtained by aggregating neighborhood information with graph convolutional networks (GCNs), which can pass the information of a node to its intermediate and distant neighbors. Disease-related miRNAs with no known associations can be predicted with the reconstructed heterogeneous matrix. We apply 5-fold cross-validation, leave-one-disease-out cross-validation, and global and local leave-one-out cross-validation to evaluate our method. The corresponding areas under the curves (AUCs) are 0.9616, 0.9946, 0.9656, and 0.9532, confirming that our approach significantly outperforms the state-of-the-art methods. Case studies show that this approach can effectively predict miRNAs for new diseases without any known associated miRNAs.
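As a rough illustration of the neighborhood aggregation described in this abstract, the sketch below stacks a miRNA similarity matrix, a disease similarity matrix, and an association matrix into one heterogeneous adjacency matrix and applies a normalized propagation step. The symmetric normalization with added self-loops, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify the propagation rule, and learned weights and nonlinearities are omitted.

```python
from math import sqrt

def heterogeneous_adjacency(sim_mirna, sim_disease, assoc):
    """Stack [[S_m, A], [A^T, S_d]] into one square matrix (lists of lists)."""
    n_m, n_d = len(sim_mirna), len(sim_disease)
    top = [sim_mirna[i] + assoc[i] for i in range(n_m)]
    bottom = [[assoc[i][j] for i in range(n_m)] + sim_disease[j] for j in range(n_d)]
    return top + bottom

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (an assumed choice)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(norm_adj, features):
    """One aggregation step: each node sums its neighbors' weighted features."""
    n, width = len(features), len(features[0])
    return [[sum(norm_adj[i][k] * features[k][c] for k in range(n)) for c in range(width)]
            for i in range(len(norm_adj))]

# Toy data (hypothetical): 2 miRNAs, 2 diseases; disease 1 has no known miRNA.
sim_mirna = [[1.0, 0.5], [0.5, 1.0]]
sim_disease = [[1.0, 0.8], [0.8, 1.0]]
assoc = [[1.0, 0.0], [0.0, 0.0]]

norm_adj = normalize(heterogeneous_adjacency(sim_mirna, sim_disease, assoc))
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
h1 = propagate(norm_adj, identity)  # one hop
h2 = propagate(norm_adj, h1)        # two hops
```

In this toy network the disease with no known associations (node index 3) receives no signal from miRNA 0 after one step, but does after two, because the information travels through the similar disease. This is the sense in which stacked aggregation reaches intermediate and distant neighbors.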
Affiliation(s)
- Rongxiang Zhu
- Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
- Chaojie Ji
- Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yingying Wang
- Department of Neurology and Stroke Center, The First Affiliated Hospital of Jinan University, Guangzhou, China; Clinical Neuroscience Institute, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Yunpeng Cai
- Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Hongyan Wu
- Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
102
|
Identifying and ranking potential cancer drivers using representation learning on attributed network. Methods 2020; 192:13-24. [PMID: 32758683 DOI: 10.1016/j.ymeth.2020.07.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/04/2020] [Revised: 07/16/2020] [Accepted: 07/29/2020] [Indexed: 12/14/2022] Open
Abstract
Cancer can arise as a consequence of the accumulation of genomic alterations. Only a small part of driver mutations contributes to cancer development and progression. Hence, the identification of genes and alterations that serve as drivers for cancer development plays a critical role in drug design, cancer diagnoses and treatment. In this study, we propose a novel method to identify potential cancer drivers by using a Representation Learning method on Attributed Graphs (called RLAG). It is a first attempt to use both network structure and node attributes to learn feature representation for the genes in the network. Then it leverages these feature vectors to divide the genes into several subgroups. Finally, potential cancer driver genes are prioritized according to ranking scores that measure both genes' properties and their importance in the subgroups. We apply our method to predict driver genes for lung cancer, breast cancer and prostate cancer. The results show that our method outperforms the other three state-of-the-art methods in terms of Precision, Recall and F1-score values.
|
103
|
Feng P, Feng L. Recent Advances on Antioxidant Identification Based on Machine Learning Methods. Curr Drug Metab 2020; 21:804-809. [PMID: 32682368 DOI: 10.2174/1389200221666200719001449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/01/2020] [Revised: 03/17/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarize recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants are also discussed. We hope this review will provide insights into research on antioxidant identification.
Affiliation(s)
- Pengmian Feng
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Lijing Feng
- School of Sciences, North China University of Science and Technology, Tangshan 063000, China
|
104
|
A machine learning approach for mortality prediction only using non-invasive parameters. Med Biol Eng Comput 2020; 58:2195-2238. [PMID: 32691219 DOI: 10.1007/s11517-020-02174-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/23/2019] [Accepted: 03/26/2020] [Indexed: 10/23/2022]
Abstract
At present, traditional scoring methods generally rely on laboratory measurements to predict mortality. This makes early mortality prediction difficult in rural areas that lack professional laboratory staff and medical laboratory equipment. To improve the efficiency, accuracy, and applicability of mortality prediction in remote areas, a novel mortality prediction method based on machine learning algorithms is proposed, which uses only non-invasive parameters readily available from ordinary monitors and manual measurement. A new feature selection method based on the Bayes error rate is developed to select valuable features. Based on non-invasive parameters, four machine learning models were trained for early mortality prediction. The subjects in this study suffered from general critical diseases including but not limited to cancer, bone fracture, and diarrhea. Five traditional scoring methods are compared with these four machine learning models, with and without laboratory measurement variables. Using only the non-invasive parameters, the LightGBM algorithm performs best, with the highest accuracy of 0.797 and an AUC of 0.879. There is no apparent difference between the mortality prediction performance with and without laboratory measurement variables for the four machine learning methods. After reducing the number of feature variables to no more than 50, the machine learning models still outperform the traditional scoring systems, with AUCs higher than 0.83. The machine learning approaches using only non-invasive parameters achieve excellent mortality prediction performance, matching that of approaches using additional laboratory measurements, and can be applied in rural areas and on remote battlefields for mortality risk evaluation.
|
105
|
Lin E, Lin CH, Lane HY. Relevant Applications of Generative Adversarial Networks in Drug Design and Discovery: Molecular De Novo Design, Dimensionality Reduction, and De Novo Peptide and Protein Design. Molecules 2020; 25:E3250. [PMID: 32708785 PMCID: PMC7397124 DOI: 10.3390/molecules25143250] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/25/2020] [Revised: 07/11/2020] [Accepted: 07/14/2020] [Indexed: 01/16/2023] Open
Abstract
A growing body of evidence now suggests that artificial intelligence and machine learning techniques can serve as an indispensable foundation for the process of drug design and discovery. In light of latest advancements in computing technologies, deep learning algorithms are being created during the development of clinically useful drugs for treatment of a number of diseases. In this review, we focus on the latest developments for three particular arenas in drug design and discovery research using deep learning approaches, such as generative adversarial network (GAN) frameworks. Firstly, we review drug design and discovery studies that leverage various GAN techniques to assess one main application such as molecular de novo design in drug design and discovery. In addition, we describe various GAN models to fulfill the dimension reduction task of single-cell data in the preclinical stage of the drug development pipeline. Furthermore, we depict several studies in de novo peptide and protein design using GAN frameworks. Moreover, we outline the limitations in regard to the previous drug design and discovery studies using GAN models. Finally, we present a discussion of directions and challenges for future research.
Affiliation(s)
- Eugene Lin
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Chieh-Hsin Lin
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung 83301, Taiwan
- School of Medicine, Chang Gung University, Taoyuan 33302, Taiwan
- Hsien-Yuan Lane
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, China Medical University Hospital, Taichung 40447, Taiwan
- Brain Disease Research Center, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung 41354, Taiwan
|
106
|
Identification of an Individualized Prognostic Signature Based on the RWSR Model in Early-Stage Bladder Carcinoma. BioMed Research International 2020; 2020:9186546. [PMID: 32596394 PMCID: PMC7293744 DOI: 10.1155/2020/9186546] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/17/2020] [Accepted: 05/11/2020] [Indexed: 12/19/2022]
Abstract
Bladder cancer (BLCA) is the fourth most common cancer among males in the United States and the fourth leading cause of cancer-related death in older males. BLCA has a high recurrence rate, with over 50% of patients having at least one recurrence within five years. Due to the complexity of the molecular mechanisms and the heterogeneity of the cancer, clinicians find it hard to make efficient management decisions for BLCA because they lack a reliable assessment of mortality risk. Meanwhile, no suitable prognostic signature or screening method is currently recommended for early detection, although both are important for early-stage detection and prognosis. In this study, a novel model, named the risk-weighted sparse regression (RWSR) model, is constructed to identify a robust signature for patients with early-stage BLCA. A 17-gene signature is generated and then validated as an independent prognostic factor in BLCA cohorts from the GSE13507 and TCGA_BLCA datasets. A risk score model is also developed from the 17-gene signature and validated. The risk score is likewise confirmed as an independent factor for prognosis prediction through prognosis analysis. Kaplan-Meier analysis with the log-rank test is used to assess survival differences. Furthermore, the predictive capacity of the signature is demonstrated through stratification analysis. Finally, patients are effectively classified by combining the 17-gene signature with stage information, which supports better survival prediction and treatment decisions. In addition, 11 genes in the signature, such as coiled-coil domain containing 73 (CCDC73) and protein kinase, DNA-activated, catalytic subunit (PRKDC), have already been reported in the published literature as prognostic marker genes or as strongly associated with the prognosis and progression of other types of cancer. As a result, this work may help to predict a patient's prognosis more accurately and improve surveillance in the clinical setting, which may provide a quantitative and reliable decision-making basis for the treatment plan.
|
107
|
Savaikar MA, Whitehead T, Roy S, Strong L, Fettig N, Primeau T, Luo J, Li S, Wahl RL, Shoghi KI. Preclinical PERCIST and 25% of SUV max Threshold: Precision Imaging of Response to Therapy in Co-clinical 18F-FDG PET Imaging of Triple-Negative Breast Cancer Patient-Derived Tumor Xenografts. J Nucl Med 2020; 61:842-849. [PMID: 31757841 PMCID: PMC7262224 DOI: 10.2967/jnumed.119.234286] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/22/2019] [Accepted: 10/30/2019] [Indexed: 11/16/2022] Open
Abstract
Numerous recent works highlight the limited utility of established tumor cell lines in recapitulating the heterogeneity of tumors in patients. More realistic preclinical cancer models are thought to be provided by transplantable, patient-derived xenografts (PDXs). The inter- and intratumor heterogeneity of PDXs, however, presents several challenges in developing optimal quantitative pipelines to assess response to therapy. The objective of this work was to develop and optimize image metrics for 18F-FDG PET to assess response to combination docetaxel and carboplatin therapy in a co-clinical trial involving triple-negative breast cancer PDXs. We characterized the reproducibility of standardized uptake value (SUV) metrics to assess response to therapy, and we optimized a preclinical PERCIST paradigm to complement clinical standards. Considerations in this effort included variability in tumor growth rate and tumor size, solid tumors versus tumor heterogeneity and a necrotic phenotype, and optimal selection of tumor slices versus whole tumor. Methods: A test-retest protocol was implemented to optimize the reproducibility of 18F-FDG PET SUV thresholds, SUVpeak metrics, and preclinical PERCIST parameters. In assessing response to therapy, 18F-FDG PET imaging was performed at baseline and 4 d after therapy. The reproducibility, accuracy, variability, and performance of imaging metrics to assess response to therapy were determined. We defined an index called the Quantitative Response Assessment Score to integrate parameters of prediction and precision and thus aid in selecting the optimal image metric to assess response to therapy. Results: Our data suggest that a threshold of 25% of SUVmax (SUV25) was highly reproducible (<9% variability). The concordance and reproducibility of preclinical PERCIST were maximized at α = 0.7 and β = 2.8 and exhibited a high correlation with SUV25 measures of tumor uptake, which in turn correlated with the SUV of metabolic tumor. 
Conclusion: The Quantitative Response Assessment Score favors SUV25 followed by SUVpeak for a sphere with a volume of 14 mm3 (SUVP14) as optimal metrics of response to therapy. Additional studies are warranted to fully characterize the utility of SUV25 and preclinical PERCIST SUVP14 as image metrics for response to therapy across a wide range of therapeutic regimens and PDX models.
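The SUV25 idea in this abstract (a threshold at 25% of SUVmax) can be sketched in a few lines. This is a minimal sketch that assumes the metric is the mean SUV over voxels at or above the threshold; the abstract does not state how the thresholded voxels are summarized, and the function name and voxel values are hypothetical.

```python
def suv_threshold_mean(voxel_suvs, fraction=0.25):
    """Mean SUV over voxels at or above `fraction` of the maximum SUV.

    Assumption: the thresholded region is summarized by its mean uptake.
    """
    suv_max = max(voxel_suvs)
    cutoff = fraction * suv_max
    selected = [v for v in voxel_suvs if v >= cutoff]
    return sum(selected) / len(selected)

# Hypothetical tumor voxels: SUVmax is 8.0, so the 25% cutoff is 2.0.
voxels = [0.5, 1.0, 2.0, 4.0, 8.0]
suv25 = suv_threshold_mean(voxels)  # averages 2.0, 4.0, and 8.0
```

Because the cutoff scales with SUVmax, the selected region adapts to each tumor's own peak uptake rather than relying on a fixed absolute value.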
Affiliation(s)
- Madhusudan A Savaikar
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Timothy Whitehead
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Sudipta Roy
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Lori Strong
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Nicole Fettig
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Tina Primeau
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri
- Jingqin Luo
- Department of Surgery, Washington University School of Medicine, St. Louis, Missouri
- Shunqiang Li
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri
- Richard L Wahl
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Kooresh I Shoghi
- Department of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
|
108
|
Kilic A, Goyal A, Miller JK, Gjekmarkaj E, Tam WL, Gleason TG, Sultan I, Dubrawski A. Predictive Utility of a Machine Learning Algorithm in Estimating Mortality Risk in Cardiac Surgery. Ann Thorac Surg 2020; 109:1811-1819. [DOI: 10.1016/j.athoracsur.2019.09.049] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/01/2019] [Revised: 08/28/2019] [Accepted: 09/12/2019] [Indexed: 10/25/2022]
|
109
|
Dao FY, Lv H, Yang YH, Zulfiqar H, Gao H, Lin H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput Struct Biotechnol J 2020; 18:1084-1091. [PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 02/23/2020] [Revised: 04/20/2020] [Accepted: 04/21/2020] [Indexed: 12/12/2022] Open
Abstract
N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position. It is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical properties. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that the proposed method generalizes well. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.
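Of the three encodings named in this abstract, mono-nucleotide binary encoding is the simplest to illustrate: each nucleotide becomes a 4-bit one-hot vector, and the vectors are concatenated along the sequence. The sketch below assumes the column order A, C, G, U and treats T as U; neither detail is given in the abstract, and the function name is hypothetical.

```python
# Assumed column order: A, C, G, U (the abstract does not specify one).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

def encode_binary(sequence):
    """Concatenate a one-hot vector per nucleotide; DNA-style T is read as U."""
    vector = []
    for base in sequence.upper().replace("T", "U"):
        vector.extend(ONE_HOT[base])
    return vector

encoded = encode_binary("GAU")
# encoded == [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

A sequence window of length n therefore yields a feature vector of length 4n, which can be passed to a feature selector or classifier alongside the other encodings.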
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hasan Zulfiqar
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
110
|
Cesari M, Christensen JAE, Muntean ML, Mollenhauer B, Sixel-Döring F, Sorensen HBD, Trenkwalder C, Jennum P. A data-driven system to identify REM sleep behavior disorder and to predict its progression from the prodromal stage in Parkinson's disease. Sleep Med 2020; 77:238-248. [PMID: 32798136 DOI: 10.1016/j.sleep.2020.04.010] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 10/08/2019] [Revised: 04/04/2020] [Accepted: 04/10/2020] [Indexed: 11/18/2022]
Abstract
OBJECTIVES To investigate electroencephalographic (EEG), electrooculographic (EOG) and micro-sleep abnormalities associated with rapid eye movement (REM) sleep behavior disorder (RBD) and REM behavioral events (RBEs) in Parkinson's disease (PD). METHODS We developed an automated system using only EEG and EOG signals. First, automatic macro- (30-s epochs) and micro-sleep (5-s mini-epochs) staging was performed. Features describing micro-sleep structure, EEG spectral content, EEG coherence, EEG complexity, and EOG energy were derived. All features were input to an ensemble of random forests, giving as outputs the probabilities of having RBD or not (P (RBD) and P (nonRBD), respectively). A patient was classified as having RBD if P (RBD)≥P (nonRBD). The system was applied to 107 de novo PD patients: 54 had normal REM sleep (PDnonRBD), 26 had RBD (PD + RBD), and 27 had at least two RBEs without meeting electromyographic RBD cut-off (PD + RBE). Sleep diagnoses were made with video-polysomnography (v-PSG). RESULTS Considering PDnonRBD and PD + RBD patients only, the system identified RBD with accuracy, sensitivity, and specificity over 80%. Among the features, micro-sleep instability had the highest importance for RBD identification. Considering PD + RBE patients, the ones who developed definite RBD after two years had significantly higher values of P (RBD) at baseline compared to the ones who did not. The former were distinguished from the latter with sensitivity and specificity over 75%. CONCLUSIONS Our method identifies RBD in PD patients using only EEG and EOG signals. Micro-sleep instability could be a biomarker for RBD and for proximity of conversion from RBEs, as prodromal RBD, to definite RBD in PD patients.
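Two mechanical pieces of the pipeline in this abstract lend themselves to a short sketch: cutting a recording into 30-s epochs and 5-s mini-epochs, and the decision rule P(RBD) ≥ P(nonRBD). The sketch assumes that the ensemble's P(RBD) is the average of the individual forests' outputs and that P(nonRBD) = 1 − P(RBD); the sampling rate, probabilities, and function names are hypothetical.

```python
def epoch_bounds(n_samples, fs, epoch_seconds):
    """Return (start, end) sample indices of all complete epochs."""
    step = int(fs * epoch_seconds)
    return [(start, start + step) for start in range(0, n_samples - step + 1, step)]

def has_rbd(forest_probabilities):
    """Apply the rule P(RBD) >= P(nonRBD) to an ensemble's outputs.

    Assumption: the ensemble probability is the mean over the forests.
    """
    p_rbd = sum(forest_probabilities) / len(forest_probabilities)
    p_non_rbd = 1.0 - p_rbd
    return p_rbd >= p_non_rbd

fs = 128                      # hypothetical sampling rate in Hz
n_samples = 60 * fs           # one minute of signal
macro = epoch_bounds(n_samples, fs, 30)  # 30-s epochs
micro = epoch_bounds(n_samples, fs, 5)   # 5-s mini-epochs

patient_a = has_rbd([0.7, 0.6, 0.8])
patient_b = has_rbd([0.2, 0.3, 0.4])
```

Note that the rule assigns ties to RBD, since the comparison is greater-than-or-equal.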
Affiliation(s)
- Matteo Cesari
- Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Julie A E Christensen
- Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark; Danish Center for Sleep Medicine, Department of Clinical Neurophysiology, Rigshospitalet, Glostrup, Denmark
- Brit Mollenhauer
- Paracelsus-Elena Klinik, Kassel, Germany; Department of Neurology, University Medical Center, Goettingen, Germany
- Friederike Sixel-Döring
- Paracelsus-Elena Klinik, Kassel, Germany; Department of Neurology, Philipps University, Marburg, Germany
- Helge B D Sorensen
- Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Poul Jennum
- Danish Center for Sleep Medicine, Department of Clinical Neurophysiology, Rigshospitalet, Glostrup, Denmark
|
111
|
Ahmad W, Arafat E, Taherzadeh G, Sharma A, Dipta SR, Dehzangi A, Shatabda S. Mal-Light: Enhancing Lysine Malonylation Sites Prediction Problem Using Evolutionary-based Features. IEEE Access 2020; 8:77888-77902. [PMID: 33354488 PMCID: PMC7751949 DOI: 10.1109/access.2020.2989713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 05/11/2023]
Abstract
Post-translational modification (PTM) is considered an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM that plays a vital role in a wide range of biological interactions. Moreover, this modification plays a potential role in energy metabolism in different species, including Homo sapiens. The identification of PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide based method. We then use Light Gradient Boosting (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light is able to significantly improve malonylation site prediction performance compared to previous studies found in the literature. Using Mal-Light we achieve a Matthews correlation coefficient (MCC) of 0.74 and 0.60, accuracy of 86.66% and 79.51%, sensitivity of 78.26% and 67.27%, and specificity of 95.05% and 91.75% for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor which is publicly available at: (http://brl.uiu.ac.bd/MalLight/).
Affiliation(s)
- Wakil Ahmad
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Easin Arafat
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD, 20742, USA
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD-4111, Australia
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- CREST, JST, Tokyo, 102-8666, Japan
- Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, 21251, USA
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
|
112
|
Cesari M, Christensen JAE, Sixel-Döring F, Muntean ML, Mollenhauer B, Trenkwalder C, Jennum P, Sorensen HBD. A Clinically Applicable Interactive Micro and Macro-Sleep Staging Algorithm for Elderly and Patients with Neurodegeneration. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2020; 2019:3649-3652. [PMID: 31946667 DOI: 10.1109/embc.2019.8856705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Elderly and patients with neurodegenerative diseases (NDD) often complain about sleep problems and show altered sleep structure. Automated algorithms for efficient and specific sleep staging are needed. We propose a new algorithm using only one electroencephalographic and two electrooculographic channels to score wakefulness, rapid eye movement (REM) sleep and non-REM sleep in a cohort of elderly healthy controls (HC), patients with Parkinson's disease (PD), isolated REM sleep behavior disorder (iRBD), considered the prodromal stage of PD, and patients with PD and RBD (PD+RBD). The proposed method scores both standard 30-s epochs (macro-staging) and 5-s mini-epochs (micro-staging), whose evaluation may help to better understand sleep micro-structure. Moreover, the algorithm is interactive, as it labels the classified sleep epochs as either certain or uncertain, so that experts can manually review the uncertain ones. The algorithm performances were evaluated for macro-sleep staging, where it achieved overall accuracies of 0.87±0.05 in 41 HC, 0.86±0.10 in 57 PD, 0.76±0.10 in 31 iRBD and 0.77±0.10 in 30 PD+RBD patients when all 30-s epochs were considered. The accuracies increased to 0.91±0.05, 0.90±0.08, 0.85±0.09, 0.88±0.08 respectively when considering only the certain ones. The epochs labeled as uncertain were 9.95±4.15%, 11.13±7.86%, 18.39±7.38% and 18.90±8.00% in HC, PD, iRBD and PD+RBD respectively. The proposed interactive micro and macro sleep staging algorithm can be used in clinics to reduce the burden of manual sleep staging in elderly and patients with NDD.
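The interactive aspect described in this abstract (flagging each scored epoch as certain or uncertain, then reporting accuracy over all epochs and over the certain ones) can be sketched as follows. The abstract does not say how certainty is determined; the rule used here, a minimum probability for the winning stage, is purely an assumption, as are the threshold, the function names, and the toy data.

```python
def label_epochs(stage_probabilities, min_confidence=0.7):
    """Return (predicted_stage, is_certain) for each epoch.

    Assumption: an epoch is 'certain' when its top stage probability
    reaches `min_confidence`; the paper's actual criterion may differ.
    """
    labels = []
    for probs in stage_probabilities:
        stage = max(probs, key=probs.get)
        labels.append((stage, probs[stage] >= min_confidence))
    return labels

def accuracy(labels, truth, certain_only=False):
    """Fraction of correctly scored epochs, optionally over certain epochs only."""
    pairs = [(stage, t) for (stage, certain), t in zip(labels, truth)
             if certain or not certain_only]
    return sum(stage == t for stage, t in pairs) / len(pairs)

# Hypothetical classifier outputs for four epochs (W = wake).
probabilities = [
    {"W": 0.90, "REM": 0.05, "NREM": 0.05},
    {"W": 0.40, "REM": 0.35, "NREM": 0.25},
    {"W": 0.10, "REM": 0.10, "NREM": 0.80},
    {"W": 0.20, "REM": 0.50, "NREM": 0.30},
]
truth = ["W", "REM", "NREM", "REM"]

labels = label_epochs(probabilities)
acc_all = accuracy(labels, truth)
acc_certain = accuracy(labels, truth, certain_only=True)
uncertain_share = sum(not certain for _, certain in labels) / len(labels)
```

In this toy case the only error falls on an epoch flagged as uncertain, so accuracy rises when the uncertain epochs are set aside for manual review, mirroring the pattern reported in the abstract.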
|
113
|
Miao YY, Zhao W, Li GP, Gao Y, Du PF. Predicting Endoplasmic Reticulum Resident Proteins Using Auto-Cross Covariance Transformation With a U-Shaped Residue Weight-Transfer Function. Front Genet 2020; 10:1231. [PMID: 31921288 PMCID: PMC6932965 DOI: 10.3389/fgene.2019.01231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2019] [Accepted: 11/06/2019] [Indexed: 11/13/2022] Open
Abstract
Background: The endoplasmic reticulum (ER) is an important organelle in eukaryotic cells. It is involved in many important biological processes, such as cell metabolism, protein synthesis, and post-translational modification. The proteins that reside within the ER are called ER-resident proteins. These proteins are closely related to the biological functions of the ER. The difference between the ER-resident proteins and other non-resident proteins should be carefully studied. Methods: We developed a support vector machine (SVM)-based method. We developed a U-shaped weight-transfer function and used it, along with the positional-specific physiochemical properties (PSPCP), to integrate together sequence order information, signaling peptides information, and evolutionary information. Result: Our method achieved over 86% accuracy in a jackknife test. We also achieved roughly 86% sensitivity and 67% specificity in an independent dataset test. Our method is capable of identifying ER-resident proteins.
Affiliation(s)
- Yang-Yang Miao
  - College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Chemical Engineering, Tianjin University, Tianjin, China
- Wei Zhao
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Guang-Ping Li
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yang Gao
  - School of Medicine, Nankai University, Tianjin, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
|
114
|
Taxonomy dimension reduction for colorectal cancer prediction. Comput Biol Chem 2019; 83:107160. [DOI: 10.1016/j.compbiolchem.2019.107160] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/24/2019] [Revised: 11/02/2019] [Accepted: 11/04/2019] [Indexed: 02/01/2023]
|
115
|
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 2019; 18:463-477. [PMID: 30976107 DOI: 10.1038/s41573-019-0024-5] [Citation(s) in RCA: 931] [Impact Index Per Article: 186.2] [Indexed: 02/07/2023]
Abstract
Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
Affiliation(s)
- Jessica Vamathevan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
- Dominic Clark
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Ian Dunham
  - Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Edgardo Ferran
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- George Lee
  - Bristol-Myers Squibb, Princeton, NJ, USA
- Bin Li
  - Takeda Pharmaceuticals International Co., Cambridge, MA, USA
- Anant Madabhushi
  - Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland Veterans Affair Medical Center, Cleveland, OH, USA
- Michaela Spitzer
  - Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Shanrong Zhao
  - Pfizer Worldwide Research and Development, Cambridge, MA, USA
|
116
|
An Automated ECG Beat Classification System Using Deep Neural Networks with an Unsupervised Feature Extraction Technique. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9142921] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 11/17/2022]
Abstract
An automated classification system based on a Deep Learning (DL) technique for Cardiac Disease (CD) monitoring and detection is proposed in this paper. The proposed DL architecture is divided into Deep Auto-Encoders (DAEs) as an unsupervised form of feature learning and Deep Neural Networks (DNNs) as a classifier. The objective of this study is to improve on previous machine learning techniques that consist of several data processing steps, such as feature extraction and feature selection or feature reduction. Those techniques also required human intervention and expertise to determine robust features, and were time-consuming in the labeling and data processing steps. In contrast, DL embeds feature extraction and feature selection in the DAE pre-training and DNN fine-tuning process, working directly from raw data. Hence, DAEs are able to extract high-level features not only from the training data but also from unseen data. The proposed model uses 10 classes of imbalanced data from ECG signals. Since it is related to the cardiac region, abnormality is usually considered for an early diagnosis of CD. In order to validate the result, the proposed model is compared with shallow models and other DL approaches. The proposed method achieved a promising performance with 99.73% accuracy, 91.20% sensitivity, 93.60% precision, 99.80% specificity, and a 91.80% F1-Score. Moreover, both the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve from the confusion matrix showed that the developed model is a good classifier. The developed model based on unsupervised feature extraction and a deep neural network is ready to be used on a large population before its installation for clinical usage.
|
117
|
Lucas A, Williams AT, Cabrales P. Prediction of Recovery From Severe Hemorrhagic Shock Using Logistic Regression. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2019; 7:1900509. [PMID: 31367491 PMCID: PMC6661015 DOI: 10.1109/jtehm.2019.2924011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/24/2019] [Revised: 06/13/2019] [Accepted: 06/16/2019] [Indexed: 11/09/2022]
Abstract
This paper implements logistic regression models (LRMs) and feature selection to create a predictive model for recovery from hemorrhagic shock (HS) with resuscitation using blood in multiple experimental rat protocols. A total of 61 animals were studied across multiple HS experiments, which encompassed two different HS protocols and two resuscitation protocols using blood stored for short periods using five different techniques. Twenty-seven different systemic hemodynamics, cardiac function, and blood gas parameters were measured in each experiment, of which feature selection deemed only 25% relevant. The reduced feature set was used to train a final logistic regression model. The final test set accuracy is 84%, compared to 74% for a baseline classifier using only MAP and HR measurements. Receiver operating characteristic (ROC) curve analysis and Cohen's kappa statistics were also used as measures of performance, with the final reduced model outperforming the model that included all parameters. Our results suggest that LRMs trained with a combination of systemic hemodynamics, cardiac function, and blood gas parameters measured at multiple timepoints during HS can successfully classify HS recovery groups. Our results show the predictive ability of traditional and novel hemodynamic and cardiac function features and their combinations, many of which had not previously been taken into consideration, for monitoring HS. Furthermore, we have devised an effective methodology for feature selection and shown ways in which the performance of such predictive models should be assessed in future studies.
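Cohen's kappa, one of the performance measures named in this abstract, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two label lists; the function name and the toy labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy recovery labels (1 = recovered, 0 = not recovered); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(y_true, y_pred)
```

Here the raw agreement is 0.8 and the chance agreement is 0.5, so kappa is lower than the accuracy, which is why it is a useful complement when classes are balanced by chance alone.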
Affiliation(s)
- Alfredo Lucas
  - Department of Bioengineering, University of California at San Diego, La Jolla, CA 92092, USA
- Pedro Cabrales
  - Department of Bioengineering, University of California at San Diego, La Jolla, CA 92092, USA
|
118
|
Zhao W, Li GP, Wang J, Zhou YK, Gao Y, Du PF. Predicting protein sub-Golgi locations by combining functional domain enrichment scores with pseudo-amino acid compositions. J Theor Biol 2019; 473:38-43. [DOI: 10.1016/j.jtbi.2019.04.025] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/01/2019] [Revised: 04/22/2019] [Accepted: 04/29/2019] [Indexed: 12/11/2022]
|
119
|
Yi HC, You ZH, Zhou X, Cheng L, Li X, Jiang TH, Chen ZH. ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 17:1-9. [PMID: 31173946 PMCID: PMC6554234 DOI: 10.1016/j.omtn.2019.04.025] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 03/09/2019] [Revised: 04/08/2019] [Accepted: 04/08/2019] [Indexed: 01/10/2023]
Abstract
Cancer is a well-known killer of human beings, which has led to countless deaths and misery. Anticancer peptides open a promising perspective for cancer treatment, and they have various attractive advantages. Conventional wet experiments are expensive and inefficient for finding and identifying novel anticancer peptides. There is an urgent need to develop a novel computational method to predict novel anticancer peptides. In this study, we propose a deep learning long short-term memory (LSTM) neural network model, ACP-DL, to effectively predict novel anticancer peptides. More specifically, to fully exploit peptide sequence information, we developed an efficient feature representation approach by integrating the binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet. Then we implemented a deep LSTM model to automatically learn how to distinguish anticancer peptides from non-anticancer peptides. To our knowledge, this is the first time that the deep LSTM model has been applied to predict anticancer peptides. Cross-validation experiments demonstrated that the proposed ACP-DL remarkably outperformed other comparison methods, with high accuracy and satisfactory specificity on benchmark datasets. In addition, we also contributed two new anticancer peptide benchmark datasets, ACP740 and ACP240, in this work. The source code and datasets are available at https://github.com/haichengyi/ACP-DL.
Affiliation(s)
- Hai-Cheng Yi
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- Xi Zhou
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Li Cheng
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Xiao Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Tong-Hai Jiang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Zhan-Heng Chen
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
|
120
|
Shi JY, Mao KT, Yu H, Yiu SM. Detecting drug communities and predicting comprehensive drug-drug interactions via balance regularized semi-nonnegative matrix factorization. J Cheminform 2019; 11:28. [PMID: 30963300 PMCID: PMC6454721 DOI: 10.1186/s13321-019-0352-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/08/2018] [Accepted: 04/01/2019] [Indexed: 01/09/2023] Open
Abstract
Background: Because drug–drug interactions (DDIs) may cause adverse drug reactions or contribute to complex-disease treatments, it is important to identify DDIs before multiple-drug medications are prescribed. As an alternative to high-cost experimental identification, computational approaches provide much cheaper screening for potential DDIs on a large scale. Nevertheless, most of them only predict whether or not one drug interacts with another, and neglect the enhancive (positive) and depressive (negative) changes of pharmacological effects. Moreover, these comprehensive DDIs do not occur at random, but exhibit a weakly balanced relationship (a structural property of the DDI network), which would help in understanding how high-order DDIs work. Results: This work exploits this intrinsic structural relationship to solve two tasks: drug community detection and comprehensive DDI prediction in the cold-start scenario. Accordingly, we first design a balance regularized semi-nonnegative matrix factorization (BRSNMF) to partition the drugs into communities. Then, to predict enhancive and degressive DDIs in the cold-start scenario, we develop a BRSNMF-based predictive approach, which leverages drug-binding proteins (DBP) as features to associate new drugs (having no known DDI) with other drugs (having known DDIs). Our experiments demonstrate that BRSNMF can generate drug communities that exhibit more reasonable sizes, the property of weak balance, and pharmacological significance. Moreover, they demonstrate the superiority of DBP features and the ability of the BRSNMF-based predictive approach in comprehensive DDI prediction, with 94% accuracy among the top-50 predicted enhancive DDIs and 86% accuracy among the bottom-50 predicted degressive DDIs.
Conclusions: Owing to the regularization of the weak balance property of the comprehensive DDI network into semi-nonnegative matrix factorization, our proposed BRSNMF is able not only to generate better drug communities but also to provide a promising comprehensive DDI prediction in the cold-start scenario. Electronic supplementary material: The online version of this article (10.1186/s13321-019-0352-9) contains supplementary material, which is available to authorized users.
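The weak balance property mentioned in this abstract can be made concrete with a small check on a signed network. The sketch assumes the classical definition of weak structural balance, under which the only forbidden triangle is one with exactly two positive edges and one negative edge; the abstract does not spell out its definition, and the edge list and function name below are hypothetical.

```python
from itertools import combinations


def weak_balance_violations(signed_edges):
    """Return triangles with exactly two positive edges.

    Under the classical notion of weak balance (assumed here), these are
    the only triangles that break the property; all-negative triangles
    are allowed.
    """
    sign = {frozenset(pair): s for pair, s in signed_edges.items()}
    nodes = sorted({n for pair in sign for n in pair})
    violations = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in sign for e in edges):
            positives = sum(sign[e] > 0 for e in edges)
            if positives == 2:
                violations.append((a, b, c))
    return violations


# Toy signed DDI network: +1 = enhancive, -1 = degressive (illustrative only).
ddi = {
    ("A", "B"): +1,
    ("B", "C"): +1,
    ("A", "C"): -1,  # triangle A-B-C has two positive edges and one negative edge
    ("C", "D"): -1,
    ("A", "D"): -1,  # triangle A-C-D is all-negative, allowed under weak balance
}
violations = weak_balance_violations(ddi)
```

A network with few such violations is close to weakly balanced, which is the kind of structure a balance regularizer would be expected to favor.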
Affiliation(s)
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China.
- Kui-Tao Mao
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Hui Yu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
|
121
|
Sattar M, Majid A. Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-018-3468-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
122
|
Khan YD, Batool A, Rasool N, Khan SA, Chou KC. Prediction of Nitrosocysteine Sites Using Position and Composition Variant Features. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180802122953] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/22/2022]
Abstract
S-nitrosylation is one of the most prominent posttranslational modifications of proteins. It involves the addition of a nitric oxide group to cysteine thiols, forming S-nitrosocysteine. Evidence suggests that S-nitrosylation plays a major role in numerous human diseases and disorders. Techniques for robust identification of S-nitrosylated proteins are highly anticipated in biological research and drug discovery. The proposed system uses a novel strategy based on statistical and computational intelligence methods to identify S-nitrosocysteine sites within a given primary protein sequence. For this purpose, a 5-step rule was followed, comprising benchmark dataset creation, mathematical modelling, prediction, evaluation and web-server development. For position-relative feature extraction, statistical moments were used, and a multilayer neural network was trained using gradient descent and adaptive learning algorithms. The results were compared with existing techniques using benchmark datasets. The experiments indicate that the proposed scheme is accurate and effective for the prediction of S-nitrosocysteine sites in protein sequences.
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Aroosa Batool
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Nouman Rasool
  - Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Sher Afzal Khan
  - Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, 21577, Saudi Arabia
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, United States
|
123
|
Khan SA, Khan YD, Ahmad S, Allehaibi KH. N-MyristoylG-PseAAC: Sequence-based Prediction of N-Myristoyl Glycine Sites in Proteins by Integration of PseAAC and Statistical Moments. LETT ORG CHEM 2019. [DOI: 10.2174/1570178616666181217153958] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/24/2023]
Abstract
N-Myristoylation, an irreversible protein modification, occurs by the covalent attachment of myristate to the N-terminal glycine of eukaryotic and viral proteins, and is associated with a variety of pathogens and disease-related proteins. Identification of myristoylation sites through experimental mechanisms can be costly, labour-intensive and time-consuming. Due to the association of N-myristoylation with various diseases, its timely prediction can help in diagnosing and controlling the associated fatal diseases. Herein, we present a method named N-MyristoylG-PseAAC, in which we have incorporated PseAAC with statistical moments for the prediction of N-Myristoyl Glycine (NMG) sites. A benchmark dataset of 893 positive and 1093 negative samples was collected and used in this study. For the feature vector, various position- and composition-relative features were calculated along with the statistical moments. A back-propagation neural network was then trained on the feature vectors, with scaled conjugate gradient descent with adaptive learning as the optimizer. Self-consistency testing and 10-fold cross-validation were performed to evaluate the performance of N-MyristoylG-PseAAC using accuracy metrics. For self-consistency testing, 99.80% Acc, 99.78% Sp, 99.81% Sn and 0.99 MCC were observed, whereas for 10-fold cross-validation, 97.18% Acc, 98.54% Sp, 96.07% Sn and 0.94 MCC were observed. Thus, the proposed predictor can help predict myristoylation sites efficiently and accurately.
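The four metrics reported in this abstract (Acc, Sp, Sn and MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the function name and the counts are hypothetical and do not reproduce the paper's confusion matrix, which the abstract does not give.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity (recall on positives)
    sp = tn / (tn + fp)  # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Illustrative counts only (not taken from the study).
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative samples are unevenly represented, as with the 893 positive and 1093 negative samples here.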
Affiliation(s)
- Sher Afzal Khan
  - Department of Information Technology, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, University of Management Technology, Lahore, Pakistan
- Shakeel Ahmad
  - Department of Computer Sciences, FCITR, King Abdulaziz University, Jeddah, Saudi Arabia
- Khalid H. Allehaibi
  - Department of Computer Sciences, FCIT, King Abdulaziz University, Jeddah, Saudi Arabia
|
124
|
Shi JY, Li JX, Mao KT, Cao JB, Lei P, Lu HM, Yiu SM. Predicting combinative drug pairs via multiple classifier system with positive samples only. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 168:1-10. [PMID: 30527128 DOI: 10.1016/j.cmpb.2018.11.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/01/2018] [Revised: 10/24/2018] [Accepted: 11/12/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE: Due to the synergistic effects of drugs, drug combination is one of the effective approaches for treating complex diseases. However, the identification of drug combinations by dose-response methods is still costly. It is promising to develop supervised learning-based approaches to predict potential drug combinations on a large scale. Nevertheless, these approaches make inadequate use of heterogeneous features, which causes a loss of information useful for classification. Moreover, they have an intrinsic bias, because they treat unknown drug pairs as non-combinations, some of which could be real drug combinations in practice. METHODS: To address the above issues, this work first designs a two-layer multiple classifier system (TLMCS) to effectively integrate heterogeneous features involving anatomical therapeutic chemical codes of drugs, drug-drug interactions, drug-target interactions, gene ontology of drug targets, and side effects. To avoid the bias caused by labelling unknown samples as negative, it then uses one-class support vector machines (which require no negative instances and label only approved drug combinations as positive instances) as the member classifiers in TLMCS. Last, both a 10-fold cross validation (10-CV) and a novel prediction are performed to validate the performance of TLMCS. RESULTS: The comparison with three state-of-the-art approaches under 10-CV exhibits the superiority of TLMCS, which achieves an area under the receiver operating characteristic curve of 0.824 and an area under the precision-recall curve of 0.372. Moreover, the experiment under the novel prediction demonstrates its ability, where 9 out of the top-20 predicted combinative drug pairs are validated by checking the published literature. Furthermore, for each of the newly validated drug combinations, this work analyses the combining mode of the member drugs and investigates their relationship in terms of drug targeting pathways.
CONCLUSIONS: The proposed TLMCS provides an effective framework to integrate those heterogeneous features and is trained on positive samples only, so that the bias of taking unknown drug pairs as negative samples can be avoided. Furthermore, its results in the novel prediction reveal five types of drug combinations and three types of drug relationships in terms of pathways.
Affiliation(s)
- Jian-Yu Shi
  - School of Life Science, Northwestern Polytechnical University, China.
- Jia-Xin Li
  - School of Life Science, Northwestern Polytechnical University, China.
- Kui-Tao Mao
  - School of Computer Science, Northwestern Polytechnical University, China.
- Jiang-Bo Cao
  - School of Life Science, Northwestern Polytechnical University, China.
- Peng Lei
  - Department of Chinese Medicine, Shaanxi Provincial People's Hospital, China.
- Hui-Meng Lu
  - School of Life Science, Northwestern Polytechnical University, China.
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China.
|
125
|
Shi JY, Zhang AQ, Zhang SW, Mao KT, Yiu SM. A unified solution for different scenarios of predicting drug-target interactions via triple matrix factorization. BMC SYSTEMS BIOLOGY 2018; 12:136. [PMID: 30598094 PMCID: PMC6311903 DOI: 10.1186/s12918-018-0663-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Background: During the identification of potential candidates, computational prediction of drug-target interactions (DTIs) is an important step before expensive wet-lab validation. DTI screening considers four scenarios, depending on whether the drug is an existing or a new drug and whether the target is an existing or a new target. However, existing approaches have the following limitations. First, only a few of them can address the most difficult scenario (i.e., predicting interactions between new drugs and new targets). More importantly, none of the existing approaches can provide explicit information for understanding the mechanism of forming interactions, such as the drug-target feature pairs contributing to the interactions. Results: In this paper, we propose a Triple Matrix Factorization-based model (TMF) to tackle these problems. Compared with former state-of-the-art predictive methods, TMF demonstrates significant superiority in predictions on four benchmark datasets over four kinds of screening scenarios. It also performs well when its predicted novel interactions are validated. More importantly, by using PubChem fingerprints of chemical structures as drug features and occurrence frequencies of amino acid trimers as protein features, TMF shows its ability to find the features determining interactions, including dominant feature pairs, frequently occurring substructures, and conserved triplets of amino acids. Conclusions: Our TMF provides a unified framework of DTI prediction for all the screening scenarios. It also offers new insight into the underlying mechanism of DTIs by indicating dominant features, which play important roles in the formation of DTIs. Electronic supplementary material: The online version of this article (10.1186/s12918-018-0663-x) contains supplementary material, which is available to authorized users.
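One way to read the feature-pair idea in this abstract is as a bilinear score: a drug feature vector and a target feature vector are linked through a weight matrix whose entries rate drug-target feature pairs. The sketch below assumes that form (score = x_d^T Θ x_t); the abstract does not state the factorization explicitly, and every name, weight and feature vector here is hypothetical.

```python
def interaction_score(drug_features, theta, target_features):
    """Bilinear score x_d^T * Theta * x_t (assumed form, for illustration)."""
    return sum(
        drug_features[i] * theta[i][j] * target_features[j]
        for i in range(len(drug_features))
        for j in range(len(target_features))
    )


def dominant_feature_pairs(theta, top=2):
    """Indices (drug_feature, target_feature) of the largest-magnitude weights."""
    ranked = sorted(
        ((abs(w), i, j) for i, row in enumerate(theta) for j, w in enumerate(row)),
        reverse=True,
    )
    return [(i, j) for _, i, j in ranked[:top]]


# Hypothetical learned weights: 2 drug features x 3 target features.
theta = [
    [0.8, 0.0, -0.1],
    [0.0, 0.5, 0.0],
]
new_drug = [1, 1]       # e.g. binary fingerprint bits
new_target = [1, 0, 1]  # e.g. presence of amino acid trimers

score = interaction_score(new_drug, theta, new_target)
pairs = dominant_feature_pairs(theta)
```

Because the score depends only on features, a form like this can rate a new drug against a new target without any known interactions, which is the hardest of the four screening scenarios the abstract lists.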
Affiliation(s)
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'An, China.
- An-Qi Zhang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'An, China
- Shao-Wu Zhang
  - School of Automations, Northwestern Polytechnical University, Xi'An, China
- Kui-Tao Mao
  - School of Computer Science, Northwestern Polytechnical University, Xi'An, China
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
|
126
|
Shi JY, Huang H, Li JX, Lei P, Zhang YN, Dong K, Yiu SM. TMFUF: a triple matrix factorization-based unified framework for predicting comprehensive drug-drug interactions of new drugs. BMC Bioinformatics 2018; 19:411. [PMID: 30453924 PMCID: PMC6245591 DOI: 10.1186/s12859-018-2379-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/09/2023] Open
Abstract
Background: A significant number of adverse drug reactions are caused by unexpected drug-drug interactions (DDIs). Identifying DDIs is crucial before multiple drugs are co-prescribed. In clinics or in drug discovery, this task usually incurs high costs and faces numerous limitations, while computational approaches are able to predict potential DDIs effectively by utilizing diverse drug attributes (e.g. side effects). Nevertheless, they cannot predict enhancive and degressive DDIs, which respectively increase and decrease the pharmacological behavior of the interacting drugs. The pharmacological change caused by DDIs is one of the most important factors when making a multi-drug prescription. Results: In this work, we design a Triple Matrix Factorization-based Unified Framework (TMFUF) to address the above issue. By leveraging a group of side effect entries of drugs, TMFUF achieves AUC = 0.842 and AUPR = 0.526 in conventional DDI prediction under the traditional screening task. In comparison with two state-of-the-art approaches, TMFUF demonstrates its superiority with improvements of ~7% in AUC and ~20% in AUPR. More importantly, TMFUF shows its ability in comprehensive DDI prediction under different screening tasks. Finally, applying TMFUF reveals the significant pairs of side effects that contribute to forming enhancive and degressive DDIs, for further clinical validation. Conclusions: First, the proposed TMFUF can predict both conventional binary DDIs and comprehensive DDIs, so that it captures the pharmacological changes caused by DDIs. Furthermore, it provides a unified solution for DDI prediction in two screening scenarios, which involve newly given drugs having no prior interaction. Another advantage is its ability to indicate how significantly pairs of drug features contribute to forming DDIs.
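The AUC figure quoted in this abstract is the area under the ROC curve, which equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair. The sketch below computes it by direct pairwise comparison; the scores and labels are toy values, not results from the paper, and the function name is an assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy predicted scores for six drug pairs; 1 = known DDI, 0 = no known DDI.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = auroc(scores, labels)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based formulas give the same value more efficiently on large screening sets.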
Affiliation(s)
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China.
- Hua Huang
  - School of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, China
- Jia-Xin Li
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Peng Lei
  - Department of Chinese Medicine, Shaanxi Provincial People's Hospital, Xi'an, China
- Yan-Ning Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Kai Dong
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Siu-Ming Yiu
  - Department of Computer Science, the University of Hong Kong, Hong Kong, China.
|
127
|
Khan YD, Rasool N, Hussain W, Khan SA, Chou KC. iPhosY-PseAAC: identify phosphotyrosine sites by incorporating sequence statistical moments into PseAAC. Mol Biol Rep 2018; 45:2501-2509. [DOI: 10.1007/s11033-018-4417-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 09/17/2018] [Accepted: 10/01/2018] [Indexed: 10/28/2022]
|
128
|
Shi JY, Shang XQ, Gao K, Zhang SW, Yiu SM. An Integrated Local Classification Model of Predicting Drug-Drug Interactions via Dempster-Shafer Theory of Evidence. Sci Rep 2018; 8:11829. [PMID: 30087377 PMCID: PMC6081396 DOI: 10.1038/s41598-018-30189-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/29/2018] [Accepted: 07/24/2018] [Indexed: 12/19/2022] Open
Abstract
Drug-drug interactions (DDIs) may trigger adverse drug reactions, which endanger patients. Identifying DDIs before prescribing medications is critical but costly in clinics. Computational approaches, both global model-based and local model-based, are able to screen DDI candidates among a large number of drug pairs by utilizing preliminary characteristics of drugs (e.g. drug chemical structure). However, global model-based approaches are usually slow and do not consider the topological structure of the DDI network, while local model-based approaches have a degree-induced bias: a new drug tends to be linked to drugs that have many DDIs. All of them lack an effective ensemble method to combine results from multiple predictors. To address the first two issues, we propose a local classification-based model (LCM), which considers the topology of the DDI network and relaxes the degree-induced bias. Furthermore, we design a novel supervised fusion rule based on the Dempster-Shafer theory of evidence (LCM-DS), which aggregates the results from multiple LCMs. To make the final prediction, LCM-DS integrates three aspects from multiple classifiers: the posterior probabilities output by individual classifiers, the proximity between their instance decision profiles and their reference profiles, and the quality of their reference profiles. Last, a substantial comparison with three state-of-the-art approaches demonstrates the effectiveness of our LCM, and the comparison with both individual LCM implementations and classical fusion algorithms exhibits the superiority of our LCM-DS.
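For readers unfamiliar with the Dempster-Shafer theory that LCM-DS builds on, the sketch below shows the classical Dempster rule of combination for two mass functions over a two-element frame. It is not the supervised fusion rule proposed in the paper, which the abstract describes only at a high level; the mass values and names are illustrative assumptions.

```python
def dempster_combine(m1, m2):
    """Classical Dempster rule: multiply masses, pool by set intersection,
    and renormalize by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


INTERACT = frozenset({"interact"})
NO_INTERACT = frozenset({"no_interact"})
UNKNOWN = INTERACT | NO_INTERACT  # mass left on the whole frame (ignorance)

# Hypothetical belief masses from two classifiers for one drug pair.
m1 = {INTERACT: 0.6, NO_INTERACT: 0.1, UNKNOWN: 0.3}
m2 = {INTERACT: 0.5, NO_INTERACT: 0.2, UNKNOWN: 0.3}
fused = dempster_combine(m1, m2)
```

Two sources that both lean toward an interaction reinforce each other: the fused mass on the interaction exceeds either input mass, while the mass left on ignorance shrinks.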
Affiliation(s)
- Jian-Yu Shi: School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China
- Xue-Qun Shang: School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Ke Gao: School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Shao-Wu Zhang: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Siu-Ming Yiu: Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China
|
129
|
Yu H, Mao KT, Shi JY, Huang H, Chen Z, Dong K, Yiu SM. Predicting and understanding comprehensive drug-drug interactions via semi-nonnegative matrix factorization. BMC Systems Biology 2018; 12:14. [PMID: 29671393 PMCID: PMC5907306 DOI: 10.1186/s12918-018-0532-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 11/12/2022]
Abstract
Background: Drug-drug interactions (DDIs) can cause unexpected and even adverse drug reactions. It is important to identify DDIs before drugs reach the market. However, preclinical identification of DDIs requires much money and time. Computational approaches have shown that they can predict potential DDIs on a large scale by utilizing pre-market drug properties (e.g. chemical structure). Nevertheless, none of them can predict two comprehensive types of DDIs, enhancive and degressive DDIs, which increase and decrease the behaviors of the interacting drugs respectively. There is also a lack of systematic analysis of the structural relationship among known DDIs. Revealing such a relationship is very important because it helps explain how DDIs occur. Both the prediction of comprehensive DDIs and the discovery of the structural relationship among them provide important guidance when making a co-prescription. Results: In this work, treating a set of comprehensive DDIs as a signed network, we design a novel model (DDINMF) for the prediction of enhancive and degressive DDIs based on semi-nonnegative matrix factorization. Encouragingly, DDINMF achieves both the conventional DDI prediction (AUROC = 0.872 and AUPR = 0.605) and the comprehensive DDI prediction (AUROC = 0.796 and AUPR = 0.579). Compared with two state-of-the-art approaches, DDINMF shows its superiority. Finally, representing DDIs as a binary network and a signed network respectively, an analysis based on NMF reveals crucial knowledge hidden among DDIs. Conclusions: Our approach is able to predict not only conventional binary DDIs but also comprehensive DDIs. More importantly, it reveals several key points about the DDI network: (1) both binary and signed networks show fairly clear clusters, in which both drug degree and the difference between positive degree and negative degree show significant distribution; (2) drugs having large degrees tend to have a larger difference between positive degree and negative degree; (3) though the binary DDI network contains no information about enhancive and degressive DDIs at all, it implies some of their relationship in the comprehensive DDI matrix; (4) the occurrence of signs indicating enhancive and degressive DDIs is not random, because the comprehensive DDI network exhibits structural balance.
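Point (4) above can be made concrete with a small sketch. Assuming the usual triangle-based notion of structural balance in signed networks (a triangle is balanced when the product of its three edge signs is positive), the code below measures the fraction of balanced triangles. The abstract does not say how the authors quantified balance, so this is an assumption. The function name, the toy drugs A to D, and their signs are all hypothetical.

```python
from itertools import combinations

def balanced_triangle_fraction(signs):
    """Fraction of fully connected triples whose sign product is positive.

    `signs` maps frozenset({u, v}) to +1 (enhancive) or -1 (degressive).
    Returns None when the network contains no triangle.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = 0
    total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        if all(e in signs for e in edges):
            total += 1
            # Positive product means zero or two negative edges: balanced.
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] > 0:
                balanced += 1
    return balanced / total if total else None

# Hypothetical signed DDI network over four made-up drugs.
toy_signs = {
    frozenset({"A", "B"}): +1,
    frozenset({"B", "C"}): +1,
    frozenset({"A", "C"}): +1,
    frozenset({"A", "D"}): -1,
    frozenset({"B", "D"}): -1,
    frozenset({"C", "D"}): +1,
}

fraction = balanced_triangle_fraction(toy_signs)
print(fraction)
```

In this toy network two of the four triangles (A-B-C and A-B-D) are balanced. On real data, one would compare the observed fraction against the value obtained after randomly shuffling the signs to judge whether the sign pattern is non-random.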
Affiliation(s)
- Hui Yu: School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Kui-Tao Mao: School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Jian-Yu Shi: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Hua Huang: School of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, China
- Zhi Chen: Department of Critical Care Medicine, People's Hospital of Jiangxi Province, Nan Chang, China
- Kai Dong: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Siu-Ming Yiu: Department of Computer Science, The University of Hong Kong, Hong Kong, China
|
130
|
Li J, Shi X, You Z, Chen Z, Lin Q, Fang M. Using Weighted Extreme Learning Machine Combined with Scale-Invariant Feature Transform to Predict Protein-Protein Interactions from Protein Evolutionary Information. Intelligent Computing Theories and Application 2018:527-532. [DOI: 10.1007/978-3-319-95930-6_49] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2023]
|
131
|
Greiff V, Weber CR, Palme J, Bodenhofer U, Miho E, Menzel U, Reddy ST. Learning the High-Dimensional Immunogenomic Features That Predict Public and Private Antibody Repertoires. The Journal of Immunology 2017; 199:2985-2997. [DOI: 10.4049/jimmunol.1700594] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 04/25/2017] [Accepted: 08/16/2017] [Indexed: 11/19/2022]
|
132
|
Hasan MM, Guo D, Kurata H. Computational identification of protein S-sulfenylation sites by incorporating the multiple sequence features information. Molecular BioSystems 2017; 13:2545-2550. [DOI: 10.1039/c7mb00491e] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/13/2022]
Abstract
Cysteine S-sulfenylation is a major type of posttranslational modification that contributes to protein structure and function regulation in many cellular processes.
Affiliation(s)
- Md. Mehedi Hasan: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Dianjing Guo: School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Hiroyuki Kurata: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan; Biomedical Informatics R&D Center
|