1
Lilhore UK, Simiaya S, Alhussein M, Faujdar N, Dalal S, Aurangzeb K. Optimizing protein sequence classification: integrating deep learning models with Bayesian optimization for enhanced biological analysis. BMC Med Inform Decis Mak 2024; 24:236. [PMID: 39192227 DOI: 10.1186/s12911-024-02631-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Accepted: 08/07/2024] [Indexed: 08/29/2024] Open
Abstract
Efforts to enhance the accuracy of protein sequence classification are of utmost importance in driving forward biological analyses and facilitating significant medical advancements. This study presents a model called ProtICNN-BiLSTM, which combines attention-based Improved Convolutional Neural Networks (ICNN) and Bidirectional Long Short-Term Memory (BiLSTM) units. Our main goal is to improve the accuracy of protein sequence classification by optimizing performance through Bayesian Optimization. ProtICNN-BiLSTM combines the CNN and BiLSTM architectures to capture both local and global protein sequence dependencies. In the proposed model, the ICNN component uses convolutional operations to identify local patterns, while the BiLSTM component captures long-range associations by analyzing sequence data forwards and backwards. Bayesian Optimization tunes the model hyperparameters for efficiency and robustness. The model was extensively validated with PDB-14,189 and other protein data. We found that ProtICNN-BiLSTM outperforms traditional classification models. Bayesian Optimization's fine-tuning and the integration of local and global sequence information make it effective. The precision of ProtICNN-BiLSTM improves comparative protein sequence classification, and the study advances computational bioinformatics for complex biological analysis. This tool could benefit medical and biological research.
Affiliation(s)
- Umesh Kumar Lilhore
  - School of Computing Science and Engineering, Galgotias University, Greater Noida, UP, India
- Sarita Simiaya
  - School of Computing Science and Engineering, Galgotias University, Greater Noida, UP, India
- Musaed Alhussein
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, Riyadh, 11543, Saudi Arabia
- Neetu Faujdar
  - Department of Computer Engineering and Applications, GLA University, 281406, UP, Mathura, India
- Khursheed Aurangzeb
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, Riyadh, 11543, Saudi Arabia
2
Yu S, Peng D, Zhu W, Liao B, Wang P, Yang D, Wu F. Hybrid_DBP: Prediction of DNA-binding proteins using hybrid features and convolutional neural networks. Front Pharmacol 2022; 13:1031759. [PMID: 36299898 PMCID: PMC9589247 DOI: 10.3389/fphar.2022.1031759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 09/27/2022] [Indexed: 11/21/2022] Open
Abstract
DNA-binding proteins (DBP) play an essential role in the genetics and evolution of organisms. A particular DNA sequence could provide underlying therapeutic benefits for hereditary diseases and cancers. Studying these proteins in a timely and effective way helps clarify their mechanisms and their function in disease prevention and treatment. However, identifying DNA-binding protein members from the sequence database is time-consuming, costly, and ineffective. Therefore, efficient methods for improving DBP classification are crucial to disease research. In this paper, we developed a novel predictor, Hybrid_DBP, which identifies potential DBP by using hybrid features and convolutional neural networks. The method combines two feature extraction methods, MonoDiKGap and Kmer, and then uses MRMD2.0 to remove redundant features. According to the results, 94% of DBP were correctly recognized, and the accuracy on the independent test set reached 91.2%. This means Hybrid_DBP can become a useful tool for predicting DBP.
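The Kmer features mentioned in this abstract can be illustrated with a minimal sketch. It assumes Kmer means the normalized frequency of each length-k substring over the 20 standard amino acids; the function name `kmer_frequencies` and the choice k=2 are hypothetical, and the paper's exact encoding (as well as MonoDiKGap and MRMD2.0) is not reproduced here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k=2):
    """Return the normalized k-mer frequency vector of a protein sequence.

    The vector has 20**k entries, ordered lexicographically by k-mer.
    Windows containing non-standard residues are skipped (an assumption).
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


if __name__ == "__main__":
    vector = kmer_frequencies("ACAC", k=2)
    print(len(vector), vector[1], vector[20])
```

A vector like this would typically be concatenated with other descriptors before a feature selection step removes redundant dimensions.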
Affiliation(s)
- Shaoyou Yu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dejun Peng
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - *Correspondence: Wen Zhu
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Peng Wang
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dongxuan Yang
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fangxiang Wu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
3
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Luis Ochoa-Toledo
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Mario Javier Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Atocha Aliseda
  - Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Fernando Pérez-Escamirosa
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Francine Ochoa-Fernández
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Ricardo Zamora-Solís
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Sebastián Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Cristina Revilla-Monsalve
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Nicolás Kemper-Valverde
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Myriam M. Altamirano-Bustamante
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
4
Nguyen TTD, Chen S, Ho QT, Ou YY. Using multiple convolutional window scanning of convolutional neural network for an efficient prediction of ATP-binding sites in transport proteins. Proteins 2022; 90:1486-1492. [PMID: 35246878 DOI: 10.1002/prot.26329] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/04/2021] [Revised: 02/23/2022] [Accepted: 02/25/2022] [Indexed: 12/31/2022]
Abstract
Protein multiple sequence alignment information has long been an important feature for inferring the functions of proteins from related sequences with known functions. It is therefore one of the underlying ideas of AlphaFold 2, a breakthrough study and model for the prediction of three-dimensional structures of proteins from their primary sequence. Our study used protein multiple sequence alignment information in the form of position-specific scoring matrices as input. We also refined the use of a convolutional neural network, a well-known deep-learning architecture with impressive achievements on image and image-like data. Specifically, we revisited the prediction of adenosine triphosphate (ATP)-binding sites with more efficient convolutional neural networks. We applied multiple convolutional window scanning filters of a convolutional neural network to position-specific scoring matrices to extract as much useful information as possible. Furthermore, only the most specific motifs are retained at each feature map output through the one-max pooling layer before going to the next layer. We assumed that this would help us retain the most conserved motifs, which carry discriminative information for prediction. Our experimental results show that a convolutional neural network with relatively few convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining different hyper-parameters. Our experimental results showed that our models were superior to the traditional use of convolutional neural networks on the same datasets as well as to other machine-learning classification algorithms.
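The combination of multiple scanning windows and one-max pooling described in this abstract can be sketched in plain Python. This is an illustrative forward pass only, assuming a PSSM stored as a list of rows (one row per residue), fixed kernels, and a ReLU activation; the names `conv1d_valid` and `multi_window_one_max` are hypothetical, and the authors' actual architecture and hyper-parameters are not reproduced.

```python
def conv1d_valid(pssm, kernel):
    """Slide a (window x columns) kernel along the sequence axis of a PSSM.

    Returns one ReLU-activated score per valid window position.
    """
    window = len(kernel)
    feature_map = []
    for start in range(len(pssm) - window + 1):
        score = sum(
            pssm[start + i][j] * kernel[i][j]
            for i in range(window)
            for j in range(len(kernel[i]))
        )
        feature_map.append(max(score, 0.0))  # ReLU activation (an assumption)
    return feature_map


def multi_window_one_max(pssm, kernels_by_window):
    """Apply kernels of several window sizes and keep one maximum per kernel.

    kernels_by_window maps a window size to a list of kernels of that height.
    One-max pooling reduces every feature map to its single strongest response,
    so the output length equals the total number of kernels.
    """
    features = []
    for window, kernels in sorted(kernels_by_window.items()):
        for kernel in kernels:
            assert len(kernel) == window, "kernel height must match its window size"
            feature_map = conv1d_valid(pssm, kernel)
            features.append(max(feature_map) if feature_map else 0.0)
    return features


if __name__ == "__main__":
    # Toy 3-residue PSSM with 2 columns instead of 20, to keep the example small.
    toy_pssm = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
    toy_kernels = {
        1: [[[1.0, 1.0]]],
        2: [[[0.0, 1.0], [1.0, 0.0]]],
    }
    print(multi_window_one_max(toy_pssm, toy_kernels))
```

Because pooling keeps only the maximum, the feature vector has a fixed length regardless of sequence length, which is what lets different window sizes feed a common prediction layer.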
Affiliation(s)
- Syun Chen
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
- Quang-Thai Ho
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
5
Nguyen TTD, Ho QT, Tarn YC, Ou YY. MFPS_CNN: Multi-filter pattern scanning from position-specific scoring matrix with convolutional neural network for efficient prediction of ion transporters. Mol Inform 2022; 41:e2100271. [PMID: 35322557 DOI: 10.1002/minf.202100271] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2021] [Accepted: 03/23/2022] [Indexed: 11/08/2022]
Abstract
In cellular transportation mechanisms, the movement of ions across the cell membrane and its proper control are important for cells, especially for life processes. Ion transporters/pumps and ion channel proteins work as border guards controlling the incessant traffic of ions across cell membranes. We revisited the classification of transporters and ion channels from membrane proteins with a more efficient deep learning approach. Specifically, we applied multi-window scanning filters of convolutional neural networks to almost full-length position-specific scoring matrices to extract useful information. In this way, we were able to retain important evolutionary information of the proteins. Our experimental results show that a convolutional neural network with a minimum number of convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining different techniques for handling data imbalance and different protein encoding methods. We also showed that our models were superior to traditional deep learning approaches on the same datasets as well as to other machine learning classification algorithms.
6
Guo Z, Lin X, Hui Y, Wang J, Zhang Q, Kong F. Circulating Tumor Cell Identification Based on Deep Learning. Front Oncol 2022; 12:843879. [PMID: 35252012 PMCID: PMC8889528 DOI: 10.3389/fonc.2022.843879] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/27/2021] [Accepted: 01/21/2022] [Indexed: 12/18/2022] Open
Abstract
As a major cause of tumor metastasis, the circulating tumor cell (CTC) is one of the critical biomarkers for cancer diagnosis and prognosis. On the one hand, CTC count is closely related to the prognosis of tumor patients; on the other hand, as a simple blood test with the advantages of safety, low cost and repeatability, the CTC test has an important reference value in determining clinical results and studying the mechanism of drug resistance. However, the determination of CTCs usually requires considerable effort from pathologists and is also error-prone due to inexperience and fatigue. In this study, we developed a novel convolutional neural network (CNN) method to automatically detect CTCs in patients’ peripheral blood based on immunofluorescence in situ hybridization (imFISH) images. We collected the peripheral blood of 776 patients from Chifeng Municipal Hospital in China, and then used Cyttel to remove leukocytes and enrich CTCs. CTCs were identified by imFISH with CD45+, DAPI+ immunofluorescence staining and chromosome 8 centromeric probe (CEP8+). The sensitivity and specificity based on traditional CNN prediction were 95.3% and 91.7% respectively, and the sensitivity and specificity based on transfer learning were 97.2% and 94.0% respectively. The traditional CNN model and transfer learning method introduced in this paper can detect CTCs with high sensitivity, which has a certain clinical reference value for judging prognosis and diagnosing metastasis.
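For readers less familiar with the two metrics reported in this abstract, the following minimal sketch shows how sensitivity and specificity follow from binary labels and predictions. The function name and the toy labels are hypothetical; they are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true positive rate) and specificity (true negative rate).

    Labels are 1 for the positive class (for example, a CTC) and 0 otherwise.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(sensitivity_specificity(truth, preds))
```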
Affiliation(s)
- Zhifeng Guo
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
- Xiaoxi Lin
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
- Yan Hui
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
- Jingchun Wang
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
- Qiuli Zhang
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
- Fanlong Kong
  - Department of Oncology, Chifeng Municipal Hospital, Chifeng, China
7
Ho QT, Le NQK, Ou YY. mCNN-ETC: identifying electron transporters and their functional families by using multiple windows scanning techniques in convolutional neural networks with evolutionary information of protein sequences. Brief Bioinform 2022; 23:6361041. [PMID: 34472594 DOI: 10.1093/bib/bbab352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 11/13/2022] Open
Abstract
In the past decade, convolutional neural networks (CNNs) have been used as powerful tools by scientists to solve visual data tasks. However, many applications of convolutional neural networks to protein function prediction and to extracting useful information from protein sequences have certain limitations. In this research, we propose a new method to address the weaknesses of the previous method. mCNN-ETC is a deep learning model which can transform the protein evolutionary information into image-like data composed of 20 channels, which correspond to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the useful pattern detection ability of the proposed model. Then we filtered specific patterns through the 1-max pooling layer before inputting them into the prediction layer. This research attempts to solve a basic problem in biology in terms of application: predicting electron transporters and classifying their corresponding complexes. The model reached an accuracy of 97.41%, which was nearly 6% higher than its predecessor. We have also published a web server at http://bio219.bioinfo.yzu.edu.tw, which can be used for research purposes free of charge.
Affiliation(s)
- Quang-Thai Ho
  - Computer Science and Engineering Departments at the Yuan Ze University, Taiwan
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan-Ze University, Taiwan
8
Soltész B, Buglyó G, Németh N, Szilágyi M, Pös O, Szemes T, Balogh I, Nagy B. The Role of Exosomes in Cancer Progression. Int J Mol Sci 2021; 23:ijms23010008. [PMID: 35008434 PMCID: PMC8744561 DOI: 10.3390/ijms23010008] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/19/2021] [Revised: 12/05/2021] [Accepted: 12/15/2021] [Indexed: 12/12/2022] Open
Abstract
Early detection, characterization and monitoring of cancer are possible by using extracellular vesicles (EVs) isolated from non-invasively obtained liquid biopsy samples. They play a role in intercellular communication contributing to cell growth, differentiation and survival, thereby affecting the formation of tumor microenvironments and causing metastases. EVs were discovered more than seventy years ago. They have been tested recently as tools of drug delivery to treat cancer. Here we give a brief review on extracellular vesicles, exosomes, microvesicles and apoptotic bodies. Exosomes play an important role by carrying extracellular nucleic acids (DNA, RNA) in cell-to-cell communication causing tumor and metastasis development. We discuss the role of extracellular vesicles in the pathogenesis of cancer and their practical application in the early diagnosis, follow up, and next-generation treatment of cancer patients.
Affiliation(s)
- Beáta Soltész
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
  - Correspondence: Tel.: +36-52416531
- Gergely Buglyó
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
- Nikolett Németh
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
- Melinda Szilágyi
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
- Ondrej Pös
  - Geneton Ltd., 841 04 Bratislava, Slovakia
  - Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia
- Tomas Szemes
  - Geneton Ltd., 841 04 Bratislava, Slovakia
  - Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia
- István Balogh
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
  - Division of Clinical Genetics, Department of Laboratory Medicine, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Bálint Nagy
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Egyetem tér 1, H-4032 Debrecen, Hungary
9
Sikander R, Wang Y, Ghulam A, Wu X. Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network. Front Genet 2021; 12:759384. [PMID: 34917128 PMCID: PMC8670239 DOI: 10.3389/fgene.2021.759384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2021] [Accepted: 10/25/2021] [Indexed: 11/21/2022] Open
Abstract
Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use protein geometric structures only or protein sequences alone to predict enzymatic functions. Thus, their prediction results are unsatisfactory. In this paper, we propose a novel approach for predicting the amino acid sequences of enzymes and non-enzymes via a Convolutional Neural Network (CNN). In the CNN, the roles of enzymes are predicted from multiple sides of biological information, including information on sequences and structures. We propose the use of two-dimensional data via 2DCNN to predict the proteins of enzymes and non-enzymes by using the same fivefold cross-validation function. We also use an independent dataset to test the performance of our model, and the results demonstrate that we are able to solve the overfitting problem. We used the CNN model proposed herein to demonstrate the superiority of our model for classifying an entire set of filters, such as 32, 64, and 128 parameters, with the fivefold validation test set as the independent classification. Via the Dipeptide Deviation from Expected Mean (DDE) matrix, mutation information is extracted from amino acid sequences, and structural information with the distance and angle of amino acids is conveyed. The derived feature maps are then encoded in DDE exploitation. Results on the independent datasets are then compared with two other methods, namely GRU and XGBoost. All analyses were conducted using 32, 64 and 128 filters on our proposed CNN method. The cross-validation datasets achieved an accuracy of 0.8762, whereas the accuracy on the independent datasets was 0.7621. The ROC AUC under fivefold cross-validation was 0.95. The performance of our model and that of other models was compared in terms of sensitivity (0.9028) and specificity (0.8497). The overall accuracy of our model was 0.9133, compared with 0.8310 for the other model.
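The DDE descriptor named in this abstract can be sketched as follows. The sketch assumes the commonly used definition of Dipeptide Deviation from Expected Mean: the observed dipeptide frequency is standardized against a theoretical mean and variance derived from the number of codons (out of 61 sense codons) that encode each amino acid. The function name `dde` is hypothetical, and the paper's arrangement of these values into a two-dimensional input for the CNN is not reproduced.

```python
import math
from itertools import product

# Number of sense codons per amino acid in the standard genetic code (sums to 61).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
AMINO_ACIDS = sorted(CODONS)
TOTAL_CODONS = 61


def dde(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value."""
    sequence = sequence.upper()
    n = len(sequence) - 1  # number of dipeptide positions in the sequence
    if n < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(n):
        dipeptide = sequence[i:i + 2]
        counts[dipeptide] = counts.get(dipeptide, 0) + 1
    features = {}
    for first, second in product(AMINO_ACIDS, repeat=2):
        observed = counts.get(first + second, 0) / n                       # Dc
        expected = (CODONS[first] / TOTAL_CODONS) * (CODONS[second] / TOTAL_CODONS)  # Tm
        variance = expected * (1 - expected) / n                           # Tv
        features[first + second] = (observed - expected) / math.sqrt(variance)
    return features


if __name__ == "__main__":
    values = dde("MKVLAAGIVALL")
    print(len(values), round(values["AA"], 3))
```

Dipeptides that contain non-standard residues are counted toward the number of positions but do not appear in the output, which is a simplification of this sketch.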
Affiliation(s)
- Rahu Sikander
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Yuping Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Ali Ghulam
  - Computerization and Network Section, Sindh Agriculture University, Tando Jam, Pakistan
- Xianjuan Wu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
10
Giri SJ, Dutta P, Halani P, Saha S. MultiPredGO: Deep Multi-Modal Protein Function Prediction by Amalgamating Protein Structure, Sequence, and Interaction Information. IEEE J Biomed Health Inform 2021; 25:1832-1838. [PMID: 32897865 DOI: 10.1109/jbhi.2020.3022806] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Protein is an essential macro-nutrient for perceiving a wide range of biochemical activities and biological regulations in living cells. In this work, we have presented a novel multi-modal approach, named MultiPredGO, for predicting protein functions by utilizing two different kinds of information, namely protein sequence and the protein secondary structure. Here, our contributions are threefold; firstly, along with the protein sequence, we learn the feature representation from the protein structure. Secondly, we develop two different deep learning models after considering the characteristics of the underlying data patterns of the protein sequence and protein 3D structures. Finally, along with these two modalities, we have also utilized protein interaction information for expediting the efficiency of the proposed model in predicting the protein functions. For extracting features from different modalities, we have utilized various variations of the convolutional neural network. As the protein function classes are dependent on each other, we have used a neuro-symbolic hierarchical classification model, which resembles the structure of Gene Ontology (GO), for effectively predicting the dependent protein functions. Finally, to validate the goodness of our proposed method (MultiPredGO), we have compared our results with various uni-modal along with two well-known multi-modal protein function prediction approaches, namely, INGA and DeepGO. Results show that the overall performance of the proposed approach in terms of accuracy, F-measure, precision, and recall metrics are better than those by the state-of-the-art methods. MultiPredGO attains an average 13.05% and 30.87% improvements over the best existing comparing approach (DeepGO) for cellular component and molecular functions, respectively.
11
Ni X, Li D, Dai S, Pan H, Sun H, Ao J, Chen L, Kong H. Development and Evaluation of Nomograms to Predict the Cancer-Specific Mortality and Overall Mortality of Patients with Hepatocellular Carcinoma. Biomed Res Int 2021; 2021:1658403. [PMID: 33860031 PMCID: PMC8024067 DOI: 10.1155/2021/1658403] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/03/2020] [Revised: 01/11/2021] [Accepted: 03/22/2021] [Indexed: 01/10/2023]
Abstract
Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer (PLC). Because of its poor prognosis and survival rate, HCC patients need long-term follow-up. We believe that there are currently no relevant reports or literature about nomograms for predicting the cancer-specific mortality of HCC patients. Therefore, the primary goal of this study was to develop and evaluate nomograms to predict cancer-specific mortality and overall mortality. Data on 45,158 HCC patients were collected from the Surveillance, Epidemiology, and End Results (SEER) program database between 2004 and 2013 and were then used to develop the nomograms. Finally, the performance of the nomograms was evaluated by the concordance index (C-index) and the area under the time-dependent receiver operating characteristic (ROC) curve (td-AUC). The variables selected to develop a nomogram for predicting cancer-specific mortality included marital status, insurance, radiotherapy, surgery, distant metastasis, lymphatic metastasis, tumor size, grade, sex, and the American Joint Committee on Cancer (AJCC) stage, while marital status, radiotherapy, surgery, AJCC stage, grade, race, sex, and age were selected to develop a nomogram for predicting overall mortality. The C-indices for predicted 1-, 3-, and 5-year cancer-specific mortality were 0.792, 0.776, and 0.774; the AUC values for 1-, 3-, and 5-year cancer-specific mortality were 0.830, 0.830, and 0.830. The C-indices for predicted 1-, 3-, and 5-year overall mortality were 0.770, 0.755, and 0.752; the AUC values for predicted 1-, 3-, and 5-year overall mortality were 0.820, 0.820, and 0.830. The results showed that the nomogram predictions agreed well with the observed outcomes. The nomograms could provide clinicians with a personalized predicted risk of death with which to evaluate potential changes in the disease-specific condition, so that clinicians can adjust therapy options in light of the patient's actual condition, which is beneficial to patients.
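The concordance index reported in this abstract can be illustrated with a minimal sketch of Harrell's C for right-censored data. This is not the authors' code; the function name `concordance_index` and the toy follow-up times, event flags, and risk scores are assumptions chosen for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk (higher means earlier death is expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event rather than a censoring time.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported C-indices of roughly 0.75 to 0.79 should be read.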
Affiliation(s)
- Xiaofeng Ni
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
  - Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, Zhejiang Provincial Top Key Discipline in Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Ding Li
  - Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, Zhejiang Provincial Top Key Discipline in Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Shengjie Dai
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Hao Pan
  - Department of Orthopaedics, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Hongwei Sun
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Jianyang Ao
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Lei Chen
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
  - Department of Orthopaedics, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Hongru Kong
  - Department of Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
  - Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, Zhejiang Provincial Top Key Discipline in Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
|
12
|
Ho QT, Nguyen TTD, Khanh Le NQ, Ou YY. FAD-BERT: Improved prediction of FAD binding sites using pre-training of deep bidirectional transformers. Comput Biol Med 2021; 131:104258. [PMID: 33601085 DOI: 10.1016/j.compbiomed.2021.104258] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/15/2020] [Revised: 01/16/2021] [Accepted: 02/03/2021] [Indexed: 02/07/2023]
Abstract
The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process to transfer electrons and other macromolecules throughout the cell. Identifying Flavin Adenine Dinucleotide (FAD) binding sites in the electron transport chain is vital since it helps biological researchers precisely understand how electrons are produced and transported in cells. This study distills and analyzes the contextualized word embedding from pre-trained BERT models to explore similarities between natural language and protein sequences. We thereby propose a new approach based on Pre-training of Bidirectional Encoder Representations from Transformers (BERT), Position-specific Scoring Matrix profiles (PSSM), and the Amino Acid Index database (AAIndex) to predict FAD-binding sites in transport proteins recently found in nature. Our proposed approach achieves 85.14% accuracy and improves accuracy by 11%, with a Matthews correlation coefficient of 0.39, compared to the previous method on the same independent set. We also deploy a web server that identifies FAD-binding sites in electron transporters, available for academics at http://140.138.155.216/fadbert/.
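To show how an accuracy in the 85% range can sit next to a much lower Matthews correlation coefficient, here is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the FAD-BERT study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced test set: 10 binding residues, 90 non-binding.
tp, tn, fp, fn = 5, 85, 5, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.90
mcc = matthews_corrcoef(tp, tn, fp, fn)      # 400 / 900, about 0.44
```

Because binding residues are usually a small minority, accuracy is dominated by the easy negative class, while MCC penalizes errors on both classes.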
Affiliation(s)
- Quang-Thai Ho
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
  - College of Information & Communication Technology, Can Tho University, Viet Nam
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City, 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei City, 106, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
|
13
|
Abstract
Background:
The evolutionary history of organisms can be described by phylogenetic trees. We need to compare the topologies of rooted phylogenetic trees when researching the evolution of a given set of species.
Objective:
Several metrics exist for measuring the dissimilarity between rooted phylogenetic trees, and those metrics are defined in different ways.
Methods:
This paper analyzes those metrics through their definitions and, by means of experiments, through the distance values they compute.
Results:
The results of the experiments show that the distances calculated by the cluster metric, the partition metric, and the equivalent metric fit a Gaussian distribution well, and that the equivalent metric describes the difference between trees better than the others.
Conclusion:
Moreover, the paper presents a tool called CDRPT (Computing Distance for Rooted Phylogenetic Trees). CDRPT is a web server that calculates the distance between trees online. CDRPT can also be used offline by installing application packages for the Windows system, which makes it convenient for researchers. The home page of CDRPT is http://bioinformatics.imu.edu.cn/tree/.
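As a rough illustration of one of the metrics named in this abstract, the sketch below computes a cluster-based distance between two rooted trees as the size of the symmetric difference of their cluster sets. This is the common textbook form and is assumed here; the abstract does not give the exact definition used by CDRPT, and some definitions halve or normalize the value. Trees are represented as nested tuples with hypothetical leaf labels.

```python
def clusters(tree):
    """Return the set of leaf sets (clusters) below each internal node.

    A tree is a nested tuple; anything that is not a tuple is a leaf label.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def cluster_distance(tree_a, tree_b):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Two rooted trees on the same four hypothetical species.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
d = cluster_distance(t1, t2)  # {A,B}, {C,D}, {A,C}, {B,D} differ, so d == 4
```

The root cluster containing all leaves is shared by any two trees on the same species set, so it never contributes to the distance.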
Affiliation(s)
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Xinyue Qi
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Bo Cui
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
14
|
Li J, Chang M, Gao Q, Song X, Gao Z. Lung Cancer Classification and Gene Selection by Combining Affinity Propagation Clustering and Sparse Group Lasso. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017103557] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/03/2023]
Abstract
Background:
Cancer seriously threatens human health. Diagnosing cancer via gene expression analysis is a hot topic in cancer research.
Objective:
The study aimed to accurately diagnose the type of lung cancer and discover the pathogenic genes.
Methods:
In this study, Affinity Propagation (AP) clustering with a similarity score was applied to each type of lung cancer and to normal lung. After grouping genes, sparse group lasso was adopted to construct four binary classifiers, and a voting strategy was used to integrate them.
Results:
This study screened six gene groups that may be associated with different lung cancer subtypes among 73 gene groups, and identified three possible key pathogenic genes: KRAS, BRAF and VDR. Furthermore, this study achieved improved classification accuracies for the minority classes SQ and COID in comparison with four other methods.
Conclusion:
We propose the AP clustering based sparse group lasso (AP-SGL), which provides an alternative for simultaneous diagnosis and gene selection for lung cancer.
Affiliation(s)
- Juntao Li
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Qinghui Gao
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Xuekun Song
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
- Zhiyu Gao
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
|
15
|
Alphonse AS, Mary NAB, Starvin MS. Classification of membrane protein using Tetra Peptide Pattern. Anal Biochem 2020; 606:113845. [PMID: 32739352 DOI: 10.1016/j.ab.2020.113845] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/16/2020] [Revised: 06/17/2020] [Accepted: 06/22/2020] [Indexed: 11/29/2022]
Abstract
Membrane proteins play an important role in the life activities of organisms. The mechanism of cell structures and biological activities can be identified only by knowing the functional types of membrane proteins, which accelerates the process. Therefore, it is necessary to build computational approaches for timely and accurate prediction of the functional types of membrane proteins. The proposed method analyzes the structure of the membrane proteins using a novel Tetra Peptide Pattern (TPP)-based feature extraction technique. A frequency occurrence matrix is created, from which a feature vector is formed. This feature vector captures the pattern among amino acids in a membrane protein sequence. The dimension of the feature vector is reduced using General Kernel-based Supervised Principal Component Analysis (GKSPCA). Stacked Restricted Boltzmann Machines (RBM) in a Deep Belief Network (DBN) are used for classification. The RBM is the building block of the Deep Belief Network. The proposed method achieves good results on two datasets. The performance of the proposed method was analyzed using accuracy, specificity, sensitivity and the Matthews correlation coefficient. The proposed method achieves good results when compared to other state-of-the-art techniques.
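The abstract does not spell out how the Tetra Peptide Pattern matrix is built, so the following is only a sketch of the simplest related idea: counting overlapping four-residue windows in a sequence. The function name `tetrapeptide_counts` and the example sequence are assumptions for illustration, not the authors' TPP procedure.

```python
from collections import Counter


def tetrapeptide_counts(sequence):
    """Count every overlapping window of four residues in a protein sequence."""
    seq = sequence.upper()
    # A sequence shorter than four residues yields an empty counter.
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))


# Hypothetical short sequence with one repeated tetrapeptide.
counts = tetrapeptide_counts("MKVLMKVL")
# Windows: MKVL, KVLM, VLMK, LMKV, MKVL
```

With 20 standard amino acids there are 20^4 = 160,000 possible tetrapeptides, so a count vector of this kind is very sparse, which is consistent with the abstract's use of a dimensionality reduction step before classification.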
Affiliation(s)
- M S Starvin
  - University College of Engineering, Nagercoil, 629004, India
|
16
|
Kumar R, Donakonda S, Müller SA, Bötzel K, Höglinger GU, Koeglsperger T. FGF2 Affects Parkinson's Disease-Associated Molecular Networks Through Exosomal Rab8b/Rab31. Front Genet 2020; 11:572058. [PMID: 33101391 PMCID: PMC7545478 DOI: 10.3389/fgene.2020.572058] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/12/2020] [Accepted: 09/02/2020] [Indexed: 01/24/2023] Open
Abstract
Ras-associated binding (Rab) proteins are small GTPases that regulate the trafficking of membrane components during endocytosis and exocytosis, including the release of extracellular vesicles (EVs). Parkinson’s disease (PD) is one of the most prevalent neurodegenerative disorders in the elderly population, where pathological proteins such as alpha-synuclein (α-Syn) are transmitted in EVs from one neuron to another and ultimately across brain regions, thereby facilitating the spreading of pathology. We recently demonstrated that fibroblast growth factor-2 (FGF2) enhances the release of EVs and delineated the proteomic signature of FGF2-triggered EVs in cultured primary hippocampal neurons. Out of 235 significantly upregulated proteins, we found that FGF2 specifically enriched EVs for the two Rab family members Rab8b and Rab31. Consequently, we investigated the interactions of Rab8b and Rab31 using a network analysis approach in order to estimate the global influence of their enrichment in EVs. To achieve this, we have demarcated a protein–protein interaction network (PPiN) for these Rabs and identified the proteins associated with PD in various cellular components of the central nervous system (CNS), in different brain regions, and in the enteric nervous system (ENS). A total of 126 direct or indirect interactions were reported for the two Rab candidates, out of which 114 are Rab8b interactions and 54 are Rab31 interactions, resulting in individual interaction scores (IS) of 90.48% and 42.86%, respectively. These results demonstrate for the first time the relevance of FGF2-induced Rab-enrichment in EVs and its potential to regulate PD pathophysiology.
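The reported interaction scores are consistent with a simple percentage of each Rab's interactions over the 126 total. The abstract does not define the score explicitly, so the formula below is an inference from the numbers, and the function name `interaction_score` is hypothetical.

```python
def interaction_score(n_interactions, total_interactions):
    """Interactions of one protein as a percentage of all reported interactions."""
    return round(100.0 * n_interactions / total_interactions, 2)


rab8b_score = interaction_score(114, 126)  # 90.48
rab31_score = interaction_score(54, 126)   # 42.86
```

Since 114 + 54 = 168 exceeds 126, the two counts cannot be disjoint, which suggests that some interactions are counted for both proteins.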
Affiliation(s)
- Rohit Kumar
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Faculty of Medicine, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
  - Department of Neurology, Ludwig Maximilian University, Munich, Germany
- Sainitin Donakonda
  - Institute of Immunology and Experimental Oncology, Technical University of Munich, Munich, Germany
- Stephan A Müller
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Kai Bötzel
  - Department of Neurology, Ludwig Maximilian University, Munich, Germany
- Günter U Höglinger
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Faculty of Medicine, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
  - Department of Neurology, Hannover Medical School, Hanover, Germany
- Thomas Koeglsperger
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Department of Neurology, Ludwig Maximilian University, Munich, Germany
|
17
|
He B, Lu Q, Lang J, Yu H, Peng C, Bing P, Li S, Zhou Q, Liang Y, Tian G. A New Method for CTC Images Recognition Based on Machine Learning. Front Bioeng Biotechnol 2020; 8:897. [PMID: 32850745 PMCID: PMC7423836 DOI: 10.3389/fbioe.2020.00897] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/09/2020] [Accepted: 07/13/2020] [Indexed: 12/18/2022] Open
Abstract
Circulating tumor cells (CTCs) derived from primary tumors and/or metastatic tumors are markers for tumor prognosis, and can also be used to monitor therapeutic efficacy and tumor recurrence. CTC enrichment and screening can be automated, but the final counting of CTCs currently requires manual intervention. This not only requires the participation of experienced pathologists, but is also prone to human misjudgment. Medical image recognition based on machine learning can effectively reduce the workload and improve the level of automation, so we use machine learning to identify CTCs. First, we collected the CTC test results of 600 patients. After immunofluorescence staining, each picture presented a positive CTC cell nucleus and several negative controls. The images of CTCs were then segmented by image denoising, image filtering, edge detection, and image expansion and contraction techniques using Python’s OpenCV library. Subsequently, traditional image recognition methods and machine learning were used to identify CTCs. The machine learning algorithms were implemented using convolutional neural networks for training. We took 2300 cells from 600 patients for training and testing. About 1300 cells were used for training and the others were used for testing. The sensitivity and specificity of recognition reached 90.3% and 91.3%, respectively. We will further revise our models, hoping to achieve higher sensitivity and specificity.
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha, China
- Qingqing Lu
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jidong Lang
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Hai Yu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Chao Peng
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha, China
- Shijun Li
  - Department of Pathology, Chifeng Municipal Hospital, Chifeng, China
- Qiliang Zhou
  - Academician Workstation, Changsha Medical University, Changsha, China
- Yuebin Liang
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geng Tian
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
|
18
|
Liang Y, Wang H, Yang J, Li X, Dai C, Shao P, Tian G, Wang B, Wang Y. A Deep Learning Framework to Predict Tumor Tissue-of-Origin Based on Copy Number Alteration. Front Bioeng Biotechnol 2020; 8:701. [PMID: 32850687 PMCID: PMC7419421 DOI: 10.3389/fbioe.2020.00701] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/29/2020] [Accepted: 06/04/2020] [Indexed: 12/18/2022] Open
Abstract
Cancer of unknown primary site (CUPS) is a type of metastatic tumor for which the site of tumor origin cannot be determined. Precise diagnosis of the tissue of origin for metastatic CUPS is crucial for developing treatment schemes to improve patient prognosis. Recently, there have been many studies using various cancer biomarkers to predict the tissue-of-origin (TOO) of CUPS. However, very few of them use copy number alteration (CNA) to trace TOO. In this paper, a two-step computational framework called CNA_origin is introduced to predict the tissue-of-origin of a tumor from its gene CNA levels. CNA_origin sets up a deep-learning network mainly composed of an autoencoder and a convolutional neural network (CNN). Based on real datasets released from a public database, CNA_origin had an overall accuracy of 83.81% on 10-fold cross-validation and 79% on independent datasets for predicting tumor origin, which improved the accuracy by 7.75% and 9.72%, respectively, compared with the method published in a previous paper. Our results suggest that the autoencoder model can extract key characteristics of CNA and that the CNN classifier model developed in this study can predict the origin of tumors robustly and effectively. CNA_origin was written in Python and can be downloaded from https://github.com/YingLianghnu/CNA_origin.
Affiliation(s)
- Ying Liang
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Haifeng Wang
  - Department of Urology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Xiong Li
  - School of Software, East China Jiaotong University, Nanchang, China
- Chan Dai
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Peng Shao
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Geng Tian
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Bo Wang
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Yinglong Wang
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
|
19
|
Jin D, Jiao Y, Ji J, Jiang W, Ni W, Wu Y, Ni R, Lu C, Qu L, Ni H, Liu J, Xu W, Xiao M. Identification of prognostic risk factors for pancreatic cancer using bioinformatics analysis. PeerJ 2020; 8:e9301. [PMID: 32587798 PMCID: PMC7301898 DOI: 10.7717/peerj.9301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2019] [Accepted: 05/15/2020] [Indexed: 12/11/2022] Open
Abstract
Background Pancreatic cancer is one of the most common malignant cancers worldwide. Currently, the pathogenesis of pancreatic cancer remains unclear; thus, it is necessary to explore its precise molecular mechanisms. Methods To identify candidate genes involved in the tumorigenesis and proliferation of pancreatic cancer, the microarray datasets GSE32676, GSE15471 and GSE71989 were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between pancreatic ductal adenocarcinoma (PDAC) and nonmalignant samples were screened by GEO2R. The Database for Annotation Visualization and Integrated Discovery (DAVID) online tool was used to obtain a synthetic set of functional annotation information for the DEGs. A PPI network of the DEGs was established using the Search Tool for the Retrieval of Interacting Genes (STRING) database, and a combined score of more than 0.4 was considered statistically significant for the PPI. Subsequently, we visualized the PPI network using Cytoscape. Functional module analysis was then performed using Molecular Complex Detection (MCODE). Genes with a degree ≥10 were chosen as hub genes, and pathways of the hub genes were visualized using ClueGO and CluePedia. Additionally, GenCLiP 2.0 was used to explore interactions of hub genes. The Literature Mining Gene Networks module was applied to explore the cocitation of hub genes. The Cytoscape plugin iRegulon was employed to analyze transcription factors regulating the hub genes. Furthermore, the expression levels of the 13 hub genes in pancreatic cancer tissues and normal samples were validated using the Gene Expression Profiling Interactive Analysis (GEPIA) platform. Moreover, overall survival and disease-free survival analyses according to the expression of hub genes were performed using Kaplan-Meier curve analysis in the cBioPortal online platform. The relationship between expression level and tumor grade was analyzed using the online database Oncomine. Lastly, eight snap-frozen tumorous and adjacent noncancerous tissues of pancreatic cancer patients were used to detect the CDK1 and CEP55 protein levels by western blot. Conclusions Altogether, the DEGs and hub genes identified in this work can help uncover the molecular mechanisms underlying the tumorigenesis of pancreatic cancer and provide potential targets for the diagnosis and treatment of this disease.
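The hub-gene selection rule described in the Methods (keep interactions scoring above 0.4, then keep genes with degree ≥10) can be sketched as follows. The study itself used STRING and Cytoscape; this standalone function, its name `hub_genes`, and the placeholder partner genes `G0` to `G9` are assumptions for illustration. CDK1 and CEP55 are named in the abstract, but the edges shown are invented.

```python
from collections import defaultdict


def hub_genes(edges, min_score=0.4, min_degree=10):
    """Return genes whose number of distinct partners reaches min_degree.

    edges -- iterable of (gene_a, gene_b, score) interaction records
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b, score in edges:
        # Keep only interactions above the score cutoff; ignore self-loops.
        if score > min_score and gene_a != gene_b:
            neighbours[gene_a].add(gene_b)
            neighbours[gene_b].add(gene_a)
    return sorted(gene for gene, partners in neighbours.items()
                  if len(partners) >= min_degree)


# Invented toy network: CDK1 has ten partners, CEP55 has only one valid edge.
toy_edges = [("CDK1", f"G{i}", 0.9) for i in range(10)]
toy_edges += [("CEP55", "G0", 0.9), ("CEP55", "G1", 0.3)]
hubs = hub_genes(toy_edges)
```

Using a set of partners per gene means duplicate records of the same interaction do not inflate the degree.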
Affiliation(s)
- Dandan Jin
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
  - Clinical Medicine, Medical College, Nantong University, Nantong, China
- Yujie Jiao
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
  - Clinical Medicine, Medical College, Nantong University, Nantong, China
- Jie Ji
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
  - Clinical Medicine, Medical College, Nantong University, Nantong, China
- Wei Jiang
  - Department of Emergency, Affiliated Hospital of Nantong University, Nantong, China
- Wenkai Ni
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Yingcheng Wu
  - Clinical Medicine, Medical College, Nantong University, Nantong, China
- Runzhou Ni
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Cuihua Lu
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Lishuai Qu
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Hongbing Ni
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Jinxia Liu
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
- Weisong Xu
  - Department of Gastroenterology, Second People's Hospital of Nantong, Nantong, China
- MingBing Xiao
  - Department of Gastroenterology, Affiliated Hospital of Nantong University, Nantong, China
  - Research Center of Clinical Medicine, Affiliated Hospital of Nantong University, Nantong, China
|
20
|
Zhu X, Wang X, Zhao H, Pei T, Kuang L, Wang L. BHCMDA: A New Biased Heat Conduction Based Method for Potential MiRNA-Disease Association Prediction. Front Genet 2020; 11:384. [PMID: 32425979 PMCID: PMC7212362 DOI: 10.3389/fgene.2020.00384] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/11/2020] [Accepted: 03/27/2020] [Indexed: 01/04/2023] Open
Abstract
Recent studies have indicated that microRNAs (miRNAs) are closely related to various complex human diseases. Based on the assumption that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, researchers have proposed a variety of computational models that integrate known miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity to discover potential miRNA-disease relationships in biomedical research. Taking account of the limitations of previous computational models, a new computational model based on biased heat conduction for MiRNA-Disease Association prediction (BHCMDA) is proposed in this paper. It achieves an AUC of 0.8890 in LOOCV (Leave-One-Out Cross Validation) and mean AUCs of 0.9060 and 0.8931 under twofold and fivefold cross validation, respectively. In addition, BHCMDA was applied to case studies of three important human cancers, and the simulation results showed that 88% (Esophageal Neoplasms), 92% (Colonic Neoplasms) and 92% (Lymphoma) of the top 50 predicted miRNAs have been confirmed by the experimental literature, which also demonstrates the good performance of BHCMDA. Hence, BHCMDA would be a useful computational resource for potential miRNA-disease association prediction.
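The AUC values quoted in this abstract can be read with the help of a minimal sketch of the rank-based definition: the probability that a randomly chosen known association is scored above a randomly chosen unlabeled pair. This is the general definition of AUC rather than the BHCMDA evaluation code, and the labels and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked in the right order."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores for four candidate miRNA-disease pairs.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])  # 3 of 4 pairs ordered correctly
```

In a leave-one-out setting, each known association is hidden in turn, scored against the unlabeled candidates, and the resulting rankings are pooled before a computation of this kind.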
Affiliation(s)
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Xuzai Wang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Haochen Zhao
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Tingrui Pei
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Linai Kuang
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Lei Wang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
|
21
|
Jiang H, Yang M, Chen X, Li M, Li Y, Wang J. miRTMC: A miRNA Target Prediction Method Based on Matrix Completion Algorithm. IEEE J Biomed Health Inform 2020; 24:3630-3641. [PMID: 32287029 DOI: 10.1109/jbhi.2020.2987034] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
microRNAs (miRNAs) are small non-coding RNAs which modulate the stability of gene targets and their rates of translation into proteins at transcriptional level and post-transcriptional level. miRNA dysfunctions can lead to human diseases because of dysregulation of their targets. Correct miRNA target prediction will lead to better understanding of the mechanisms of human diseases and provide hints on curing them. In recent years, computational miRNA target prediction methods have been proposed according to the interaction rules between miRNAs and targets. However, these methods suffer from high false positive rates due to the complicated relationship between miRNAs and their targets. The rapidly growing number of experimentally validated miRNA targets enables predicting miRNA targets with high precision via accurate data analysis. Taking advantage of these known miRNA targets, a novel recommendation system model (miRTMC) for miRNA target prediction is established using a new matrix completion algorithm. In miRTMC, a heterogeneous network is constructed by integrating the miRNA similarity network, the gene similarity network, and the miRNA-gene interaction network. Our assumption is that the latent factors determining whether a gene is the target of miRNA or not are highly correlated, i.e., the adjacency matrix of the heterogeneous network is low-rank, which is then completed by using a nuclear norm regularized linear least squares model under non-negative constraints. Alternating direction method of multipliers (ADMM) is adopted to numerically solve the matrix completion problem. Our results show that miRTMC outperforms the competing methods in terms of various evaluation metrics. Our software package is available at https://github.com/hjiangcsu/miRTMC.
|
22
|
Soetje B, Fuellekrug J, Haffner D, Ziegler WH. Application and Comparison of Supervised Learning Strategies to Classify Polarity of Epithelial Cell Spheroids in 3D Culture. Front Genet 2020; 11:248. [PMID: 32292417 PMCID: PMC7119422 DOI: 10.3389/fgene.2020.00248] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/10/2019] [Accepted: 03/02/2020] [Indexed: 12/16/2022] Open
Abstract
Three-dimensional culture systems that allow generation of monolayered epithelial cell spheroids are widely used to study epithelial function in vitro. Epithelial spheroid formation is applied to address cellular consequences of (mono)-genetic disorders, that is, ciliopathies, in toxicity testing, or to develop treatment options aimed to restore proper epithelial cell characteristics and function. With the potential of a high-throughput method, the main obstacle to efficient application of the spheroid formation assay so far is the laborious, time-consuming, and bias-prone analysis of spheroid images by individuals. Hundreds of multidimensional fluorescence images are blinded, rated by three persons, and subsequently, differences in ratings are compared and discussed. Here, we apply supervised learning and compare strategies based on machine learning versus deep learning. While deep learning approaches can directly process raw image data, machine learning requires transformed data of features extracted from fluorescence images. We verify the accuracy of both strategies on a validation data set, analyse an experimental data set, and observe that different strategies can be very accurate. Deep learning, however, is less sensitive to overfitting and experimental batch-to-batch variations, thus providing a rather powerful and easily adjustable classification tool.
Affiliation(s)
- Birga Soetje
- Department of Paediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hanover, Germany
- Joachim Fuellekrug
- Molecular Cell Biology Laboratory, Internal Medicine IV, University Hospital Heidelberg, Heidelberg, Germany
- Dieter Haffner
- Department of Paediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hanover, Germany
- Wolfgang H. Ziegler
- Department of Paediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hanover, Germany
|
23
|
Le NQK, Ho QT, Ou YY. Using two-dimensional convolutional neural networks for identifying GTP binding sites in Rab proteins. J Bioinform Comput Biol 2020; 17:1950005. [PMID: 30866734 DOI: 10.1142/s0219720019500057] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict GTP binding sites in Rab proteins, which is one of the most vital molecular functions in life science. A functional loss of GTP binding sites in Rab proteins has been implicated in a variety of human diseases (choroideremia, intellectual disability, cancer, Parkinson's disease). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases and designing drug targets. Our deep learning model with a two-dimensional convolutional neural network and position-specific scoring matrix profiles could identify GTP binding residues with a sensitivity of 92.3%, specificity of 99.8%, accuracy of 99.5%, and MCC of 0.92 on an independent dataset. Compared with other published works, this approach achieved a significant improvement. Throughout the proposed study, we provide an effective model for predicting GTP binding sites in Rab proteins and a basis for further research that can apply deep learning in bioinformatics, especially in nucleotide binding site prediction.
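The four metrics quoted above (sensitivity, specificity, accuracy, MCC) all derive from the same binary confusion matrix. A minimal sketch of the standard definitions follows; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts (illustrative inputs).
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)          # recall on the positive class
    specificity = safe_div(tn, tn + fp)          # recall on the negative class
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, denom)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Because binding residues are typically far rarer than non-binding ones, accuracy and specificity can both be very high while MCC stays noticeably lower, which is why MCC is usually reported alongside them.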
Affiliation(s)
- Nguyen Quoc Khanh Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.
- School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798, Singapore
- Quang-Thai Ho
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.
|
24
|
Le NQK, Ho QT, Yapp EKY, Ou YY, Yeh HY. DeepETC: A deep convolutional neural network architecture for investigating and classifying electron transport chain's complexes. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.070] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
|
25
|
Chandra A, Sharma A, Dehzangi A, Shigemizu D, Tsunoda T. Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix. BMC Mol Cell Biol 2019; 20:57. [PMID: 31856704 PMCID: PMC6923822 DOI: 10.1186/s12860-019-0240-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2019] [Accepted: 11/20/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND The biological process known as post-translational modification (PTM) is a condition whereby proteomes are modified that affects normal cell biology, and hence the pathogenesis. A number of PTMs have been discovered in recent years and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as cost, time consumption and inefficiency involved in the experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far hold limitations to correctly predict this covalent modification. RESULTS We propose a new predictor in this paper called Bigram-PGK which uses evolutionary information of amino acids to try and predict phosphoglycerylated sites. The benchmark dataset which contains experimentally labelled sites is employed for this purpose and profile bigram occurrences are calculated from position specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, such as sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under ROC curve have been reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330, 0.9306, respectively. CONCLUSIONS The proposed predictor, based on the feature of evolutionary information and support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
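To make "profile bigram occurrences" concrete, the sketch below uses one common formulation: for a PSSM `P` with one row per residue and one column per amino acid, the bigram feature for the column pair (m, n) is the sum over positions i of `P[i][m] * P[i+1][n]`. Whether Bigram-PGK uses exactly this formula, and how it normalizes the PSSM beforehand, is an assumption here; the function name `profile_bigram` is hypothetical.

```python
def profile_bigram(pssm):
    """Flattened bigram features from a PSSM given as a list of rows.

    Each row holds one score (or probability) per amino-acid column.
    Returns a list of length k*k, where k is the number of columns
    (400 features for the usual 20 columns).
    """
    if not pssm:
        return []
    k = len(pssm[0])
    bigram = [[0.0] * k for _ in range(k)]
    # Accumulate products of scores at consecutive sequence positions.
    for current_row, next_row in zip(pssm, pssm[1:]):
        for m in range(k):
            for n in range(k):
                bigram[m][n] += current_row[m] * next_row[n]
    # Flatten row by row so feature index = m * k + n.
    return [value for row in bigram for value in row]


# Toy 3-residue "PSSM" with 2 columns, for illustration only.
toy_features = profile_bigram([[1, 0], [0, 1], [1, 0]])
```

The resulting fixed-length vector is what makes sequences of different lengths usable with a classifier such as the support vector machine mentioned in the abstract.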
Affiliation(s)
- Abel Chandra
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji.
- Alok Sharma
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji.
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, 4111, Australia.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.
- CREST, JST, Tokyo, 102-8666, Japan.
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, USA
- Daichi Shigemizu
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.
- CREST, JST, Tokyo, 102-8666, Japan.
- Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan
- Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.
- CREST, JST, Tokyo, 102-8666, Japan.
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 108-8639, Japan
|
26
|
Le NQK, Huynh TT. Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation. Front Physiol 2019; 10:1501. [PMID: 31920706 PMCID: PMC6914855 DOI: 10.3389/fphys.2019.01501] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2019] [Accepted: 11/26/2019] [Indexed: 12/12/2022] Open
Abstract
SNAREs (soluble N-ethylmaleimide-sensitive factor activating protein receptors) are a group of proteins that are crucial for membrane fusion and exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and synaptic transmission, to promote cell membrane integration in eukaryotes. Many studies have determined that SNARE proteins are associated with many human diseases, especially cancer. Therefore, identifying their functions is a challenging problem for scientists seeking to better understand cancer and to design drug targets for treatment. We described each protein sequence based on amino acid embeddings using fastText, which is a natural language processing model that performs well in its field. Because each protein sequence is similar to a sentence with different words, applying a language model to protein sequences is challenging and promising. Once generated, the amino acid embedding features were fed into a deep learning algorithm for prediction. Our model, which combines the fastText model and deep convolutional neural networks, could identify SNARE proteins with an independent test accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and Matthews correlation coefficient (MCC) of 0.86. Our performance results were superior to the state-of-the-art predictor (SNARE-CNN). We suggest this study as a reliable method for biologists for SNARE identification, and it serves as a basis for applying the fastText word embedding model in bioinformatics, especially in protein sequencing prediction.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
- Tuan-Tu Huynh
- Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, Bien Hoa, Vietnam
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
|
27
|
Pirgazi J, Alimoradi M, Esmaeili Abharian T, Olyaee MH. An Efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets. Sci Rep 2019; 9:18580. [PMID: 31819106 PMCID: PMC6901457 DOI: 10.1038/s41598-019-54987-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2019] [Accepted: 11/22/2019] [Indexed: 01/06/2023] Open
Abstract
The feature selection problem is one of the most significant issues in data classification. The purpose of feature selection is to select the smallest number of features in order to increase accuracy and decrease the cost of data classification. In recent years, due to the appearance of high-dimensional datasets with a low number of samples, classification models have encountered the over-fitting problem. Therefore, there is a need for feature selection methods that remove redundant and irrelevant features. Although various methods have recently been proposed for selecting the optimal subset of features with high precision, these methods have encountered problems such as instability, high convergence time, and selection of a semi-optimal solution as the final result. In other words, they have not been able to fully extract the effective features. In this paper, a hybrid method based on the IWSSr method and Shuffled Frog Leaping Algorithm (SFLA) is proposed to select effective features in a large-scale gene dataset. The proposed algorithm is implemented in two phases: filtering and wrapping. In the filter phase, the Relief method is used for weighting features. Then, in the wrapping phase, the SFLA and IWSSr algorithms are used to search for effective features in a feature-rich area. The proposed method is evaluated using some standard gene expression datasets. The experimental results confirm that the proposed approach, in comparison to similar methods, achieves a more compact set of features along with high accuracy. The source code and testing datasets are available at https://github.com/jimy2020/SFLA_IWSSr-Feature-Selection.
Affiliation(s)
- Jamshid Pirgazi
- Faculty of Engineering, Department of Computer Engineering, University of Gonabad, Gonabad, Iran.
- Mohsen Alimoradi
- Faculty of Electronic, Computer & IT Department of Computer, Qazvin Islamic Azad University, Qazvin, Iran
- Tahereh Esmaeili Abharian
- Faculty of Electronic, Computer & IT Department of Computer, Qazvin Islamic Azad University, Qazvin, Iran
- Mohammad Hossein Olyaee
- Faculty of Engineering, Department of Computer Engineering, University of Gonabad, Gonabad, Iran
|
28
|
Hong Y, Hou B, Jiang H, Zhang J. Machine learning and artificial neural network accelerated computational discoveries in materials science. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2019. [DOI: 10.1002/wcms.1450] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Affiliation(s)
- Yang Hong
- Department of Chemistry, University of Nebraska‐Lincoln, Lincoln, Nebraska
- Bo Hou
- Department of Engineering, University of Cambridge, Cambridge, UK
- Hengle Jiang
- Holland Computing Center, University of Nebraska‐Lincoln, Lincoln, Nebraska
- Jingchao Zhang
- Holland Computing Center, University of Nebraska‐Lincoln, Lincoln, Nebraska
|
29
|
Chang W, Liu Y, Xiao Y, Yuan X, Xu X, Zhang S, Zhou S. A Machine-Learning-Based Prediction Method for Hypertension Outcomes Based on Medical Data. Diagnostics (Basel) 2019; 9:diagnostics9040178. [PMID: 31703364 PMCID: PMC6963807 DOI: 10.3390/diagnostics9040178] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2019] [Revised: 11/04/2019] [Accepted: 11/05/2019] [Indexed: 01/28/2023] Open
Abstract
The outcomes of hypertension refer to the death or serious complications (such as myocardial infarction or stroke) that may occur in patients with hypertension. The outcomes of hypertension are very concerning for patients and doctors, and are ideally avoided. However, there is no satisfactory method for predicting the outcomes of hypertension. Therefore, this paper proposes a prediction method for outcomes based on physical examination indicators of hypertension patients. In this work, we divide the patients' outcome prediction into two steps. The first step is to extract the key features from the patients' many physical examination indicators. The second step is to use the key features extracted from the first step to predict the patients' outcomes. To this end, we propose a model combining recursive feature elimination with a cross-validation method and classification algorithm. In the first step, we use the recursive feature elimination algorithm to rank the importance of all features, and then extract the optimal features subset using cross-validation. In the second step, we use four classification algorithms (support vector machine (SVM), C4.5 decision tree, random forest (RF), and extreme gradient boosting (XGBoost)) to accurately predict patient outcomes by using their optimal features subset. The selected model prediction performance evaluation metrics are accuracy, F1 measure, and area under receiver operating characteristic curve. The 10-fold cross-validation shows that C4.5, RF, and XGBoost can achieve very good prediction results with a small number of features, and the classifier after recursive feature elimination with cross-validation feature selection has better prediction performance. Among the four classifiers, XGBoost has the best prediction performance, and its accuracy, F1, and area under receiver operating characteristic curve (AUC) values are 94.36%, 0.875, and 0.927, respectively, using the optimal features subset. 
This article's prediction of hypertension outcomes contributes to the in-depth study of hypertension complications and has strong practical significance.
|
30
|
Kurisu K, Yoshiuchi K, Ogino K, Oda T. Machine learning analysis to identify the association between risk factors and onset of nosocomial diarrhea: a retrospective cohort study. PeerJ 2019; 7:e7969. [PMID: 31687281 PMCID: PMC6825409 DOI: 10.7717/peerj.7969] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 10/01/2019] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Although several risk factors for nosocomial diarrhea have been identified, the details of the association between these factors and onset of nosocomial diarrhea, such as degree of importance or temporal pattern of influence, remain unclear. We aimed to determine the association between risk factors and onset of nosocomial diarrhea using machine learning algorithms. METHODS We retrospectively collected data of patients with acute cerebral infarction. Seven variables, including age, sex, modified Rankin Scale (mRS) score, and number of days of antibiotics, tube feeding, proton pump inhibitors, and histamine 2-receptor antagonist use, were used in the analysis. We split the data into a training dataset and an independent test dataset. Based on the training dataset, we developed a random forest, support vector machine (SVM), and radial basis function (RBF) network model. By calculating an area under the curve (AUC) of the receiver operating characteristic curve using 5-fold cross-validation, we performed feature selection and hyperparameter optimization in each model. According to their final performances, we selected the optimal model and also validated it in the independent test dataset. Based on the selected model, we visualized the variable importance and the association between each variable and the outcome using partial dependence plots. RESULTS Two hundred and eighteen patients were included. In the cross-validation within the training dataset, the random forest model achieved an AUC of 0.944, which was higher than in the SVM and RBF network models. The random forest model also achieved an AUC of 0.832 in the independent test dataset. Tube feeding use days, mRS score, antibiotic use days, age and sex were strongly associated with the onset of nosocomial diarrhea, in this order. Tube feeding use had an inverse U-shaped association with the outcome. The mRS score and age had a convex downward and increasing association, while antibiotic use had a convex upward association with the outcome. CONCLUSION We revealed the degree of importance and temporal pattern of the influence of several risk factors for nosocomial diarrhea, which could help clinicians manage nosocomial diarrhea.
Affiliation(s)
- Ken Kurisu
- Department of Stress Sciences and Psychosomatic Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Department of Infectious Diseases, Showa General Hospital, Tokyo, Japan
- Kazuhiro Yoshiuchi
- Department of Stress Sciences and Psychosomatic Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kei Ogino
- Department of Stress Sciences and Psychosomatic Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Department of Infectious Diseases, Showa General Hospital, Tokyo, Japan
- Toshimi Oda
- Department of Infectious Diseases, Showa General Hospital, Tokyo, Japan
|
31
|
Le NQK, Yapp EKY, Nagasundaram N, Chua MCH, Yeh HY. Computational identification of vesicular transport proteins from sequences using deep gated recurrent units architecture. Comput Struct Biotechnol J 2019; 17:1245-1254. [PMID: 31921391 PMCID: PMC6944713 DOI: 10.1016/j.csbj.2019.09.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2019] [Revised: 09/07/2019] [Accepted: 09/11/2019] [Indexed: 11/20/2022] Open
Abstract
Protein function prediction is one of the most well-studied topics, attracting attention from countless researchers in the field of computational biology. Implementing deep neural networks that help improve the prediction of protein function, however, is still a major challenge. In this research, we propose a new strategy that combines gated recurrent units and position-specific scoring matrix profiles to predict vesicular transportation proteins, which carry out a biological function of great importance. Although this function is difficult to identify, our model is able to achieve accuracies of 82.3% and 85.8% in the cross-validation and independent dataset, respectively. We also address the problem of imbalance in the dataset by tuning the class weight in the deep learning model. The results showed sensitivity, specificity, MCC, and AUC values of 79.2%, 82.9%, 0.52, and 0.861, respectively. Our strategy shows superior results on the same dataset against all other state-of-the-art algorithms. This research provides a technique for the discovery of more proteins, particularly proteins connected with vesicular transport. In addition, our accomplishment could encourage the use of gated recurrent unit architectures in protein function prediction.
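The abstract mentions tuning class weights to counter dataset imbalance but does not give the values used. As an illustration only, the sketch below computes the widely used "balanced" starting point, where each class weight is `n_samples / (n_classes * class_count)`; the function name `balanced_class_weights` and the toy label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count).

    Rare classes get weights above 1, frequent classes below 1, so each
    class contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced label set: 8 negatives, 2 positives (illustrative only).
weights = balanced_class_weights([0] * 8 + [1] * 2)
```

A dictionary of this shape is the kind of input that many deep learning frameworks accept as a per-class loss weight; in practice the values are then adjusted by validation, which is presumably what "tuning" refers to in the abstract.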
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634, Singapore
- N. Nagasundaram
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Matthew Chin Heng Chua
- Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, 119615, Singapore
- Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
|
32
|
Gan M, Li W, Jiang R. EnContact: predicting enhancer-enhancer contacts using sequence-based deep learning model. PeerJ 2019; 7:e7657. [PMID: 31565573 PMCID: PMC6746221 DOI: 10.7717/peerj.7657] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Accepted: 08/10/2019] [Indexed: 01/22/2023] Open
Abstract
Chromatin contacts between regulatory elements are of crucial importance for the interpretation of transcriptional regulation and the understanding of disease mechanisms. However, existing computational methods mainly focus on the prediction of interactions between enhancers and promoters, leaving enhancer-enhancer (E-E) interactions not well explored. In this work, we develop a novel deep learning approach, named Enhancer-enhancer contacts prediction (EnContact), to predict E-E contacts using genomic sequences as input. We statistically demonstrated the predicting ability of EnContact using training sets and testing sets derived from HiChIP data of seven cell lines. We also show that our model significantly outperforms other baseline methods. Besides, our model identifies finer-mapping E-E interactions from region-based chromatin contacts, where each region contains several enhancers. In addition, we identify a class of hub enhancers using the predicted E-E interactions and find that hub enhancers tend to be active across cell lines. We summarize that our EnContact model is capable of predicting E-E interactions using features automatically learned from genomic sequences.
Affiliation(s)
- Mingxin Gan
- Donlinks School of Economics and Management, University of Science and Technology Beijing, Beijing, China
- Wenran Li
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University, Beijing, China
- Department of Statistics, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Rui Jiang
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University, Beijing, China
|
33
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
|
34
|
Le NQK, Huynh TT, Yapp EKY, Yeh HY. Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 177:81-88. [PMID: 31319963 DOI: 10.1016/j.cmpb.2019.05.016] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/18/2019] [Revised: 05/06/2019] [Accepted: 05/16/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVES Clathrin is an adaptor protein that serves as the principal element of the vesicle-coating complex and is important for the membrane cleavage to dispense the invaginated vesicle from the plasma membrane. The functional loss of clathrins has been tied to many human diseases, e.g., neurodegenerative disorders, cancer, and Alzheimer's disease. Therefore, creating a precise model to identify its functions is a crucial step towards understanding human diseases and designing drug targets. METHODS We present a deep learning model using a two-dimensional convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to identify clathrin proteins from high throughput sequences. Traditionally, 2D CNNs take images as input, so we treated the PSSM profile, a 20 × 20 matrix, as an image of 20 × 20 pixels. The input PSSM profile was then connected to our 2D CNN, in which we set a variety of parameters to improve the performance of the model. Based on the 10-fold cross-validation results, a hyper-parameter optimization process was employed to find the best model for our dataset. Finally, an independent dataset was used to assess the predictive ability of the current model. RESULTS Our model could identify clathrin proteins with sensitivity of 92.2%, specificity of 91.2%, accuracy of 91.8%, and MCC of 0.83 in the independent dataset. Compared to state-of-the-art traditional neural networks, our method achieved a significant improvement in all typical measurement metrics. CONCLUSIONS Throughout the proposed study, we provide an effective tool for investigating clathrin proteins and our achievement could promote the use of deep learning in biomedical research. We also provide source codes and dataset freely at https://www.github.com/khanhlee/deep-clathrin/.
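A raw PSSM has one row per residue, so its height varies with sequence length, whereas the abstract describes a fixed 20 × 20 input. The abstract does not say how the fixed size is obtained. One plausible reduction, shown here purely as an assumption, sums all PSSM rows that belong to the same amino acid type; the function name `pssm_to_20x20` and the residue ordering in `AMINO_ACIDS` are illustrative choices, not details from the paper.

```python
# Assumed ordering of the 20 standard amino acids for rows and columns.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_20x20(sequence, pssm):
    """Collapse an L x 20 PSSM into a 20 x 20 matrix.

    Row r of the result is the sum of all PSSM rows whose residue is the
    r-th amino acid in AMINO_ACIDS. Non-standard residues are skipped.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    out = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        r = index.get(residue)
        if r is None:
            continue  # e.g. 'X' or 'U'; ignored in this sketch
        for c in range(20):
            out[r][c] += row[c]
    return out


# Toy input: three residues with constant rows, for illustration only.
image = pssm_to_20x20("AAC", [[1] * 20, [2] * 20, [3] * 20])
```

The 20 × 20 result can then be treated as a single-channel image, which is what allows a standard 2D CNN to consume proteins of any length.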
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798 Singapore.
- Tuan-Tu Huynh
- Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, No. 10 Huynh Van Nghe Road, Bien Hoa, Dong Nai, Vietnam
- Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634 Singapore
- Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798 Singapore.
|
35
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019; 18:3503-3511. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
|
36
|
Cho M, Kim JH, Hong KS, Kim JS, Kong HJ, Kim S. Identification of cecum time-location in a colonoscopy video by deep learning analysis of colonoscope movement. PeerJ 2019; 7:e7256. [PMID: 31392088 PMCID: PMC6673422 DOI: 10.7717/peerj.7256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Accepted: 06/05/2019] [Indexed: 12/11/2022] Open
Abstract
Background Cecal intubation time is an important component of quality colonoscopy. The cecum is the turning point that separates the insertion and withdrawal phases of the colonoscope. For this reason, obtaining information related to the location of the cecum during the endoscopic procedure is very useful. It is also necessary to detect the direction of the colonoscope's movement and the time-location of the cecum. Methods In order to analyze the direction of the scope's movement, the Horn-Schunck algorithm was used to compute the pixel motion change between consecutive frames. Images processed with the Horn-Schunck algorithm were used to train and test a convolutional neural network, which classified them into insertion, withdrawal, and stop movements. Based on the scope's movement, a graph was drawn with a value of +1 for insertion, -1 for withdrawal, and 0 for stop. We regarded the turning point as a cecum candidate point when the total graph area sum in a certain section recorded the lowest. Results A total of 328,927 frame images were obtained from 112 patients. The overall accuracy, drawn from 5-fold cross-validation, was 95.6%. When the value of "t" was 30 s, accuracy of cecum discovery was 96.7%. In order to increase visibility, the movement of the scope was added to the summary report of the colonoscopy video. Insertion, withdrawal, and stop movements were each mapped to a color and expressed at various scales. As the scale increased, the distinction between the insertion phase and the withdrawal phase became clearer. Conclusion Information obtained in this study can be utilized as metadata for proficiency assessment. Since insertion and withdrawal are technically different movements, data on the scope's movement and phase can be quantified and utilized to express patterns unique to the colonoscopist and to assess proficiency. Also, we hope that the findings of this study can contribute to the informatics field of medical records so that medical charts can be transmitted graphically and effectively in the field of colonoscopy.
Affiliation(s)
- Minwoo Cho
- Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, Seoul, South Korea
| | - Jee Hyun Kim
- Department of Gastroenterology, Seoul National University Boramae Medical Center, Seoul, South Korea
| | - Kyoung Sup Hong
- Department of Gastroenterology, Mediplex Sejong Hospital, Incheon, South Korea
| | - Joo Sung Kim
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul, South Korea
| | - Hyoun-Joong Kong
- Department of Biomedical Engineering, Chungnam National University College of Medicine, Daejeon, South Korea
| | - Sungwan Kim
- Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
| |
|
37
|
An Automated ECG Beat Classification System Using Deep Neural Networks with an Unsupervised Feature Extraction Technique. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9142921] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 11/17/2022]
Abstract
An automated classification system based on a Deep Learning (DL) technique for Cardiac Disease (CD) monitoring and detection is proposed in this paper. The proposed DL architecture combines Deep Auto-Encoders (DAEs) as an unsupervised form of feature learning with Deep Neural Networks (DNNs) as a classifier. The objective of this study is to improve on previous machine learning techniques, which consist of several data processing steps such as feature extraction and feature selection or feature reduction. These techniques also required human intervention and expertise to determine robust features, and were time-consuming in the labeling and data processing steps. In contrast, DL embeds feature extraction and feature selection in the DAE pre-training and DNN fine-tuning process, working directly from raw data. Hence, DAEs are able to extract high-level features not only from the training data but also from unseen data. The proposed model uses 10 classes of imbalanced data from ECG signals. Since it is related to the cardiac region, abnormality is usually considered for an early diagnosis of CD. To validate the result, the proposed model is compared with shallow models and other DL approaches. The proposed method achieved a promising performance with 99.73% accuracy, 91.20% sensitivity, 93.60% precision, 99.80% specificity, and a 91.80% F1-Score. Moreover, both the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve from the confusion matrix showed that the developed model is a good classifier. The developed model based on unsupervised feature extraction and deep neural network is ready to be used on a large population before its installation for clinical usage.
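The metrics reported in this abstract can all be derived from confusion-matrix counts. The sketch below shows the standard binary definitions for one class treated one-vs-rest; the counts are hypothetical, and the abstract does not say how the per-class values were averaged across the 10 classes.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    precision = tp / (tp + fp)     # positive predictive value
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for one minority class (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced classes, accuracy and specificity are dominated by the many true negatives, which is one plausible reason why they can sit well above sensitivity and F1.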
|
38
|
Le NQK, Yapp EKY, Yeh HY. ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins. BMC Bioinformatics 2019; 20:377. [PMID: 31277574 PMCID: PMC6612191 DOI: 10.1186/s12859-019-2972-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 03/03/2019] [Accepted: 06/27/2019] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process for transferring electrons and other macromolecules throughout the cell. It is also the major process for extracting energy via redox reactions in the oxidation of sugars. Many studies have implicated electron transport proteins in a variety of human diseases, e.g., diabetes, Parkinson's disease, and Alzheimer's disease. Few bioinformatics studies have been conducted to identify electron transport proteins with high accuracy, and their performance still requires considerable improvement. Here, we present a novel deep neural network architecture to address this problem. RESULTS Most previous studies could not feed the original position-specific scoring matrix (PSSM) profiles into neural networks, which led to a loss of information and prevented the networks from achieving the best results. In this paper, we present a novel approach that uses deep gated recurrent units (GRU) on full PSSMs to resolve this problem. Our approach can precisely predict electron transporters, with cross-validation and independent test accuracies of 93.5% and 92.3%, respectively. Our approach demonstrates superior performance to all of the state-of-the-art predictors on electron transport proteins. CONCLUSIONS Through the proposed study, we provide ET-GRU, a web server for discriminating electron transport proteins in particular and other protein functions in general. Our achievement could also promote the use of GRU in computational biology, especially in protein function prediction.
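To illustrate what running a GRU over a full PSSM means, here is a minimal single-layer sketch in plain Python that consumes one PSSM row (20 scores) per time step. It uses the standard GRU update equations with biases omitted; the hidden size, random weights, and toy matrix are assumptions for illustration and do not reproduce the ET-GRU architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU update for input x (one PSSM row) and previous state h."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(a + b)
                 for a, b in zip(matvec(Wh, x), matvec(Uh, reset_h))]
    # Blend the previous state with the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def encode_pssm(pssm, hidden_size=8, seed=0):
    """Run the GRU over every row of the PSSM and return the final state."""
    rng = random.Random(seed)
    n_in = len(pssm[0])

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Input weights (W*) are hidden x n_in, recurrent weights (U*) are hidden x hidden.
    params = [rand_matrix(hidden_size, n_in if i % 2 == 0 else hidden_size)
              for i in range(6)]
    h = [0.0] * hidden_size
    for row in pssm:
        h = gru_step(row, h, params)
    return h


# Toy PSSM: 5 residues x 20 amino-acid scores (values are made up).
toy_pssm = [[float((i + j) % 5 - 2) for j in range(20)] for i in range(5)]
state = encode_pssm(toy_pssm)
print(len(state))
```

In a trained model the final state would feed a classification layer; here the weights are random, so the output only demonstrates the data flow.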
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore, 639798 Singapore
| | - Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, Singapore, 138634 Singapore
| | - Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore, 639798 Singapore
| |
|
39
|
Sramka M, Slovak M, Tuckova J, Stodulka P. Improving clinical refractive results of cataract surgery by machine learning. PeerJ 2019; 7:e7202. [PMID: 31304064 PMCID: PMC6611496 DOI: 10.7717/peerj.7202] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/22/2019] [Accepted: 05/27/2019] [Indexed: 11/20/2022] Open
Abstract
AIM To evaluate the potential of the Support Vector Machine Regression model (SVM-RM) and Multilayer Neural Network Ensemble model (MLNN-EM) to improve the intraocular lens (IOL) power calculation for clinical workflow. BACKGROUND Current IOL power calculation methods are limited in their accuracy, especially in eyes with unusual ocular dimensions. An improperly calculated IOL power in cataract or refractive lens replacement surgery carries a risk of re-operation or further refractive correction. This may create potential complications and discomfort for the patient. METHODS A dataset containing information about 2,194 eyes was obtained using a data mining process from the Electronic Health Record (EHR) system database of the Gemini Eye Clinic. The dataset was optimized and split into the selection set (used for model design and training) and the verification set (used for evaluation). The set of mean prediction errors (PEs) and the distribution of predicted refractive errors were evaluated for both models and clinical results (CR). RESULTS Both models performed significantly better than the CR for the majority of the evaluated parameters. There was no significant difference between the two evaluated models. In the ±0.50 D PE category, both SVM-RM and MLNN-EM were slightly better than the Barrett Universal II formula, which is often presented as the most accurate calculation formula. CONCLUSION In comparison to the current clinical method, both SVM-RM and MLNN-EM have achieved significantly better results in IOL calculations and therefore have a strong potential to improve clinical cataract refractive outcomes.
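The ±0.50 D PE category mentioned above is the share of eyes whose absolute prediction error is at most 0.50 diopters. A minimal sketch of that calculation follows; the error values are hypothetical and the function name is illustrative.

```python
def share_within(prediction_errors, threshold):
    """Fraction of eyes whose absolute prediction error (in D) is within threshold."""
    within = sum(1 for pe in prediction_errors if abs(pe) <= threshold)
    return within / len(prediction_errors)


# Hypothetical prediction errors in diopters (not data from the study).
errors = [0.10, -0.35, 0.62, -0.48, 0.05, 1.10, -0.20, 0.50]

share_050 = share_within(errors, 0.50)
mean_abs_error = sum(abs(pe) for pe in errors) / len(errors)

print(f"within ±0.50 D: {share_050:.2%}")
print(f"mean absolute error: {mean_abs_error:.3f} D")
```

Reporting several thresholds (for example ±0.25 D, ±0.50 D, and ±1.00 D) alongside the mean error gives a fuller picture of the error distribution than any single number.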
Affiliation(s)
- Martin Sramka
- Department of Circuit Theory/Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
- Research and Development Department, Gemini Eye Clinic, Zlin, Czech Republic
| | - Martin Slovak
- Research and Development Department, Gemini Eye Clinic, Zlin, Czech Republic
| | - Jana Tuckova
- Department of Circuit Theory/Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
| | - Pavel Stodulka
- Research and Development Department, Gemini Eye Clinic, Zlin, Czech Republic
- Department of Ophthalmology/Third Faculty of Medicine, Charles University, Prague, Czech Republic
| |
|
40
|
Gao R, Wang M, Zhou J, Fu Y, Liang M, Guo D, Nie J. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation. Int J Mol Sci 2019; 20:E2845. [PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/05/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023] Open
Abstract
During the past decade, as the number of proteins in the PDB database has gradually increased, traditional methods have struggled to characterize the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representations for predicting enzymatic function have therefore become more important. Most existing methods for predicting enzymatic function use protein geometric structure or protein sequence alone. In this paper, the functions of enzymes are predicted from multiple kinds of biological information, including sequence information and structure information. First, we extract mutation information from the amino acid sequence using the position scoring matrix and express structure information with amino acid distances and angles. Then, we use histograms to represent the extracted sequence and structural features, respectively. Meanwhile, we establish a network model of three parallel Deep Convolutional Neural Networks (DCNN) to learn three features of an enzyme simultaneously for function prediction, and the outputs are fused through two different architectures. Finally, the proposed model was evaluated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, demonstrating an improvement over the previous result.
Affiliation(s)
- Ruibo Gao
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Mengmeng Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Jiaoyan Zhou
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Yuhang Fu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Meng Liang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Dongliang Guo
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Junlan Nie
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| |
|
41
|
Yi HC, You ZH, Zhou X, Cheng L, Li X, Jiang TH, Chen ZH. ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 17:1-9. [PMID: 31173946 PMCID: PMC6554234 DOI: 10.1016/j.omtn.2019.04.025] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 03/09/2019] [Revised: 04/08/2019] [Accepted: 04/08/2019] [Indexed: 01/10/2023]
Abstract
Cancer is a well-known killer of human beings, which has led to countless deaths and misery. Anticancer peptides open a promising perspective for cancer treatment, and they have various attractive advantages. Conventional wet experiments are expensive and inefficient for finding and identifying novel anticancer peptides. There is an urgent need to develop a computational method to predict novel anticancer peptides. In this study, we propose a deep learning long short-term memory (LSTM) neural network model, ACP-DL, to effectively predict novel anticancer peptides. More specifically, to fully exploit peptide sequence information, we developed an efficient feature representation approach by integrating the binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet. Then we implemented a deep LSTM model to automatically learn how to distinguish anticancer peptides from non-anticancer peptides. To our knowledge, this is the first time that a deep LSTM model has been applied to predict anticancer peptides. Cross-validation experiments demonstrated that the proposed ACP-DL remarkably outperformed other comparison methods, with high accuracy and satisfactory specificity on benchmark datasets. In addition, we contributed two new anticancer peptide benchmark datasets, ACP740 and ACP240, in this work. The source code and datasets are available at https://github.com/haichengyi/ACP-DL.
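The two feature types named in this abstract can be sketched as follows. This is a simplified illustration, not the ACP-DL code: the seven-group reduced alphabet, the use of plain k-mer counts instead of the paper's sparse matrix, and the example peptide are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet: group label -> member residues (illustrative only).
REDUCED_GROUPS = {
    "a": "AGV",
    "b": "ILFP",
    "c": "YMTS",
    "d": "HNQW",
    "e": "RK",
    "f": "DE",
    "g": "C",
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def binary_profile(peptide):
    """One-hot encode each residue as a 20-dimensional 0/1 vector."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in peptide]


def reduced_kmer_counts(peptide, k=2):
    """Count overlapping k-mers after mapping residues to the reduced alphabet."""
    reduced = "".join(RESIDUE_TO_GROUP[aa] for aa in peptide)
    counts = {"".join(p): 0 for p in product(sorted(REDUCED_GROUPS), repeat=k)}
    for i in range(len(reduced) - k + 1):
        counts[reduced[i:i + k]] += 1
    return counts


peptide = "GLFDIVKK"  # hypothetical peptide, not from ACP740 or ACP240
profile = binary_profile(peptide)
kmers = reduced_kmer_counts(peptide, k=2)
print(len(profile), len(profile[0]), len(kmers))
```

Reducing the alphabet before counting keeps the k-mer vocabulary small (49 possible 2-mers here instead of 400), which helps when peptides are short and training sets are limited.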
Affiliation(s)
- Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Xi Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Li Cheng
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Xiao Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Tong-Hai Jiang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| |
|
42
|
iN6-methylat (5-step): identifying DNA N6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou’s 5-step rule. Mol Genet Genomics 2019; 294:1173-1182. [DOI: 10.1007/s00438-019-01570-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 02/03/2019] [Accepted: 04/25/2019] [Indexed: 12/21/2022]
|
43
|
Le NQK, Yapp EKY, Ou YY, Yeh HY. iMotor-CNN: Identifying molecular functions of cytoskeleton motor proteins using 2D convolutional neural network via Chou's 5-step rule. Anal Biochem 2019; 575:17-26. [PMID: 30930199 DOI: 10.1016/j.ab.2019.03.017] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 01/24/2019] [Revised: 03/24/2019] [Accepted: 03/25/2019] [Indexed: 12/12/2022]
Abstract
Motor proteins are the driving force behind muscle contraction and are responsible for the active transportation of most proteins and vesicles in the cytoplasm. There are three superfamilies of cytoskeletal motor proteins with various molecular functions and structures: dynein, kinesin, and myosin. The loss of a specific motor protein molecular function has been linked to a variety of human diseases, e.g., Charcot-Marie-Tooth disease and kidney disease. Therefore, creating a precise model to classify motor proteins is essential for helping biologists understand their molecular functions and design drug targets according to their impact on human diseases. Here we attempt to classify cytoskeleton motor proteins using deep learning, which has been increasingly and widely used to address numerous problems in a variety of fields, with state-of-the-art results. Our deep convolutional neural network achieves an independent test accuracy of 97.5%, 96.4%, and 96.1% for each superfamily, respectively. Compared to other state-of-the-art methods, our approach showed a significant improvement in performance across a range of evaluation metrics. Through the proposed study, we provide an effective model for classifying motor proteins and a basis for further research that can enhance the performance of protein function classification using deep learning.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798, Singapore.
| | - Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634, Singapore
| | - Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, 32003, Taiwan
| | - Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798, Singapore.
| |
|
44
|
Le NQK, Yapp EKY, Ho QT, Nagasundaram N, Ou YY, Yeh HY. iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding. Anal Biochem 2019; 571:53-61. [PMID: 30822398 DOI: 10.1016/j.ab.2019.02.017] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 01/22/2019] [Revised: 02/17/2019] [Accepted: 02/19/2019] [Indexed: 12/22/2022]
Abstract
An enhancer is a short (50-1500 bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. A few computational tools have been employed to address this problem, but their performance still requires improvement. In this study, we represent enhancers with word embeddings, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm for classification. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We attain an independent test accuracy of 79% and 63.5% in the two layers, respectively. Our proposed method yields superior performance compared to current predictors on the same dataset. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/.
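Treating a DNA sequence as text usually starts by splitting it into overlapping k-mer "words", which can then be broken into character n-grams to capture sub-word information. The sketch below shows only that tokenization step; the value of k, the n-gram length, and the boundary markers (in the style of fastText-like sub-word models) are assumptions, since the abstract does not specify them.

```python
def dna_to_words(sequence, k=3):
    """Split a DNA sequence into overlapping k-mer 'words'."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def char_ngrams(word, n=2):
    """Sub-word units of a word, with '<' and '>' marking its boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Hypothetical 6-bp fragment; real enhancers are 50-1500 bp long.
words = dna_to_words("acgtac", k=3)
subwords = char_ngrams(words[0], n=2)

print(words)
print(subwords)
```

An embedding model would assign a vector to each word and sub-word unit, and a sequence-level feature vector (for example, the average of its word vectors) could then be passed to a classifier such as the support vector machine mentioned in the abstract.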
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798, Singapore.
| | - Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634, Singapore
| | - Quang-Thai Ho
- Department of Computer Science and Engineering, Yuan Ze University, 32003, Taiwan
| | - N Nagasundaram
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798, Singapore
| | - Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, 32003, Taiwan
| | - Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639798, Singapore.
| |
|
45
|
Le NQK, Nguyen VN. SNARE-CNN: a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data. PeerJ Comput Sci 2019; 5:e177. [PMID: 33816830 PMCID: PMC7924420 DOI: 10.7717/peerj-cs.177] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 11/20/2018] [Accepted: 02/06/2019] [Indexed: 05/04/2023]
Abstract
Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict SNARE proteins, which carry out one of the most vital molecular functions in life science. A functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative diseases, mental illness, and cancer). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases and designing drug targets. Our SNARE-CNN model, which uses two-dimensional convolutional neural networks and position-specific scoring matrix profiles, identified SNARE proteins with a sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and MCC of 0.7 on the cross-validation dataset. We also evaluated the performance of our model on an independent dataset, and the result shows that we are able to solve the overfitting problem. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. Through the proposed study, we provide an effective model for identifying SNARE proteins and a basis for further research that can apply deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN is freely available at https://github.com/khanhlee/snare-cnn.
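The MCC reported above is the Matthews correlation coefficient, which summarizes all four confusion-matrix cells in a single value between -1 and 1. A minimal sketch of the standard formula follows; the counts are hypothetical and are not the SNARE-CNN results.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventionally reported as 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts with far more negatives than positives.
mcc = matthews_corrcoef(tp=40, fp=10, tn=140, fn=10)
print(f"MCC: {mcc:.3f}")
```

Because MCC penalizes errors on both classes, it is often reported next to sensitivity and specificity when the positive class is much smaller than the negative class.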
Affiliation(s)
| | - Van-Nui Nguyen
- University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Vietnam
| |
|
46
|
Jayapriya K, Mary NAB. Employing a novel 2-gram subgroup intra pattern (2GSIP) with stacked auto encoder for membrane protein classification. Mol Biol Rep 2019; 46:2259-2272. [PMID: 30778923 DOI: 10.1007/s11033-019-04680-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/29/2018] [Accepted: 02/07/2019] [Indexed: 12/01/2022]
Abstract
Cell membrane proteins play an essential role in regulating the behaviour of cells. Examination of amino acid sequences can offer useful insights into the tertiary structures of proteins and their biological functions. One of the important problems in amino acid analysis is the difficulty of establishing a digital coding system that better reflects the properties of amino acids and their degeneracy. To overcome this drawback, the proposed method introduces a novel representation of protein sequences that incorporates a new feature named the 2-gram subgroup intra pattern. Classifying the functional types of membrane proteins helps to explain their biological functions. For classification, a Stacked Auto Encoder deep learning method is applied. The performance of the proposed method is evaluated on two benchmark data sets. In the self-consistency test, jackknife test, and independent data set test, and in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient, the proposed method outperformed other existing techniques generally used in the literature.
Affiliation(s)
- K Jayapriya
- Vin Solutions, Tirunelveli, Tamilnadu, India.
| | | |
|
47
|
Jalalian SH, Ramezani M, Jalalian SA, Abnous K, Taghdisi SM. Exosomes, new biomarkers in early cancer detection. Anal Biochem 2019; 571:1-13. [PMID: 30776327 DOI: 10.1016/j.ab.2019.02.013] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 10/13/2018] [Revised: 01/26/2019] [Accepted: 02/13/2019] [Indexed: 02/07/2023]
Abstract
Exosomes are endosome-derived vesicles that play a major role in cell-to-cell communication. Multiple cell types secrete these vesicles to induce and inhibit different cellular and molecular pathways. Cancer-derived exosomes have been shown to affect the development of cancer at different stages and to contribute to the recruitment and reprogramming of both proximal and distal tissues. The growing interest in defining the clinical relevance of these nano-sized particles in cancers has led to the identification of tissue- or disease-specific exosomal contents, such as nucleic acids, proteins, and lipids, as a source of new biomarkers, which points to the diagnostic potential of exosomes in the early detection of cancers. In this review, we discuss some aspects of exosomes, including their contents, applications, and isolation techniques, in the field of early cancer detection. Although exosomes are considered ideal biomarkers in cancer diagnosis due to their unique characteristics, there is still a long way to go in the development of exosome-based assays.
Affiliation(s)
- Seyed Hamid Jalalian
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Students Research Committee, Department of Pharmaceutical Nanotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran; Academic Center for Education, Culture and Research (ACECR)-Mashhad Branch, Mashhad, Iran
| | - Mohammad Ramezani
- Nanotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Seyed Ali Jalalian
- Students Research Committee, Department of Pharmaceutical Nanotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Khalil Abnous
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
| | - Seyed Mohammad Taghdisi
- Targeted Drug Delivery Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Pharmaceutical Biotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran.
| |
|
48
|
A TOPSIS multi-criteria decision method-based intelligent recurrent wavelet CMAC control system design for MIMO uncertain nonlinear systems. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3795-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/28/2022]
|