1. Abbasi Holasou H, Panahi B, Shahi A, Nami Y. Integration of machine learning models with microsatellite markers: New avenue in world grapevine germplasm characterization. Biochem Biophys Rep 2024; 38:101678. [PMID: 38495412 PMCID: PMC10940787 DOI: 10.1016/j.bbrep.2024.101678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/09/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024]
Abstract
Efficient analytical techniques are needed to interpret biological data, generate novel hypotheses, and find critical predictive patterns. Machine learning algorithms offer a new opportunity to develop low-cost, practical solutions in biology. In this study, we proposed a new integrated analytical approach that combines supervised machine learning algorithms with microsatellite data from Vitis populations worldwide. A total of 1378 wild (V. vinifera subsp. sylvestris) and cultivated (V. vinifera subsp. sativa) grapevine accessions were investigated using 20 microsatellite markers. Data cleaning, feature selection, and supervised machine learning classification models, viz. Naive Bayes, Support Vector Machine (SVM), and tree induction methods, were applied to find the most indicative and diagnostic alleles for distinguishing wild from cultivated accessions and for identifying the geographic origin of each population. Our combined approach identified the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our pipeline for classifying and predicting Vitis accessions. Moreover, our study proposed the best combination of markers for distinguishing populations, which can be exploited in future germplasm conservation and breeding programs.
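Naive Bayes is one of the classifiers the abstract uses to link alleles to wild/cultivated status. As a rough illustration only, the sketch below fits a categorical Naive Bayes model with Laplace smoothing to microsatellite genotypes. It assumes each accession is a dict mapping a locus to its diploid genotype written as a string such as "133/141", with no missing loci; the locus names, genotypes, and labels are hypothetical, and this is not the authors' pipeline.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class genotype frequencies at each locus.

    samples: list of dicts, locus name -> genotype string (hypothetical format "133/141")
    labels:  list of class labels, e.g. "wild" or "cultivated"
    Assumes every sample is scored at every locus (no missing data).
    """
    class_counts = Counter(labels)
    genotype_counts = defaultdict(lambda: defaultdict(Counter))  # label -> locus -> genotype -> count
    genotypes_per_locus = defaultdict(set)
    for sample, label in zip(samples, labels):
        for locus, genotype in sample.items():
            genotype_counts[label][locus][genotype] += 1
            genotypes_per_locus[locus].add(genotype)
    return class_counts, genotype_counts, genotypes_per_locus


def predict(model, sample, alpha=1.0):
    """Return the label with the highest log posterior under Laplace smoothing."""
    class_counts, genotype_counts, genotypes_per_locus = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for locus, genotype in sample.items():
            # +1 reserves probability mass for genotypes never seen in training
            k = len(genotypes_per_locus.get(locus, ())) + 1
            count = genotype_counts[label].get(locus, Counter())[genotype]
            score += math.log((count + alpha) / (n_label + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy data: two loci, two accessions per class.
    samples = [
        {"locus_A": "133/133", "locus_B": "226/236"},
        {"locus_A": "133/135", "locus_B": "226/236"},
        {"locus_A": "143/151", "locus_B": "230/240"},
        {"locus_A": "143/151", "locus_B": "232/240"},
    ]
    labels = ["wild", "wild", "cultivated", "cultivated"]
    model = train_naive_bayes(samples, labels)
    print(predict(model, {"locus_A": "133/133", "locus_B": "226/236"}))  # wild
```

Smoothing keeps a single unseen genotype from driving a class probability to zero; real SSR data would also need an explicit policy for missing loci.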
Affiliation(s)
- Hossein Abbasi Holasou
- Department of Plant Breeding and Biotechnology, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Bahman Panahi
- Department of Genomics, Branch for Northwest and West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Ali Shahi
- Faculty of Agriculture (Meshgin Shahr Campus), Mohaghegh Ardabili University, Ardabil, Iran
- Yousef Nami
- Department of Food Biotechnology, Branch for Northwest and West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
2. Mu D, Sun D, Qian X, Ma X, Qiu L, Cheng X, Yu S. Steroid profiling in adrenal disease. Clin Chim Acta 2024; 553:117749. [PMID: 38169194 DOI: 10.1016/j.cca.2023.117749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/26/2023] [Accepted: 12/27/2023] [Indexed: 01/05/2024]
Abstract
The measurement of steroid hormones in blood and urine, which reflects steroid biosynthesis and metabolism, has been recognized as a valuable tool for identifying and distinguishing steroidogenic disorders. Mass spectrometry enables reliable, simultaneous analysis of large panels of steroids, ushering in a new era in the diagnosis of adrenal diseases. However, interpreting complex hormone results requires the expertise and experience of skilled clinicians. In this context, machine learning techniques are gaining worldwide attention in healthcare. The clinical value of combining mass spectrometry-based steroid profiling with machine learning models, an approach also known as steroid metabolomics, has been investigated for identifying and discriminating adrenal disorders such as adrenocortical carcinoma, adrenocortical adenoma, and congenital adrenal hyperplasia. This promising approach is expected to improve clinical decision-making for adrenal diseases. This review focuses on the clinical performance of steroid profiling, measured by mass spectrometry and analyzed with machine learning techniques, in decision-making for adrenal diseases.
Affiliation(s)
- Danni Mu
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China
- Dandan Sun
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China
- Xia Qian
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China
- Xiaoli Ma
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China
- Ling Qiu
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China; State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China.
- Xinqi Cheng
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China.
- Songlin Yu
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China.
3. Choudhary A, Anand A, Singh A, Roy P, Singh N, Kumar V, Sharma S, Baranwal M. Machine learning-based ensemble approach in prediction of lung cancer predisposition using XRCC1 gene polymorphism. J Biomol Struct Dyn 2023:1-10. [PMID: 37545160 DOI: 10.1080/07391102.2023.2242492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 07/23/2023] [Indexed: 08/08/2023]
Abstract
Machine learning approaches have shown promising results in predicting cancer. In the current study, genotype data for five single nucleotide polymorphisms (SNPs) of the DNA repair gene XRCC1 (XRCC1 399, XRCC1 194, XRCC1 206, XRCC1 632, XRCC1 280) from a north Indian population, together with four smoking-status variables, were used as input to the proposed ensemble model to predict individual susceptibility to lung cancer. The prediction accuracy of the proposed ensemble model for cancer predisposition was 85%. Model performance was also evaluated using sensitivity, specificity, precision, and the Gini index, with values in the range 0.83-0.87. The proposed model also outperformed each individual model (LM, SVM, RF, KNN, and a baseline neural net) on all evaluation parameters. Collectively, these results suggest that the proposed ensemble model has potential for predicting cancer risk from XRCC1 SNP data. Communicated by Ramaswamy H. Sarma.
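The abstract reports an ensemble over several base models, evaluated with sensitivity, specificity, and precision. It does not say how the base models are combined, so the sketch below assumes simple hard majority voting (1 = predisposed, 0 = not) and computes the named metrics from confusion counts. The prediction lists are invented, and the Gini index is omitted because the abstract does not define how it was computed.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard-voting ensemble: for each sample, return the most frequent predicted label.

    predictions_per_model: list of equal-length prediction lists, one per base model.
    Use an odd number of models for binary labels to avoid ties; on a tie,
    Counter.most_common returns the label encountered first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


def _safe_div(num, den):
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity and precision for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "precision": _safe_div(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical predictions from three base models on five individuals.
    m1 = [1, 1, 0, 0, 1]
    m2 = [1, 0, 0, 1, 1]
    m3 = [0, 1, 0, 0, 1]
    combined = majority_vote([m1, m2, m3])
    print(combined)  # [1, 1, 0, 0, 1]
    print(binary_metrics([1, 0, 0, 0, 1], combined))
```

Hard voting is the simplest combiner; stacking or probability averaging are common alternatives when base models output calibrated scores.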
Affiliation(s)
- Abhishek Choudhary
- Department of Computer Science, Thapar Institute of Engineering & Technology, India
- Adarsh Anand
- Department of Electronics & Communication Engineering, Thapar Institute of Engineering & Technology, India
- Amrita Singh
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
- Pratima Roy
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
- Navneet Singh
- Department of Pulmonary Medicine, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
- Vinay Kumar
- Department of Electronics & Communication Engineering, Thapar Institute of Engineering & Technology, India
- Siddharth Sharma
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
- Manoj Baranwal
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
4. Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences. Comput Biol Chem 2022; 98:107638. [DOI: 10.1016/j.compbiolchem.2022.107638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2021] [Revised: 12/22/2021] [Accepted: 02/01/2022] [Indexed: 02/07/2023]
5. Alakuş TB, Türkoğlu İ. Comparison of the Performance of Protein Mapping Techniques in Cancer Diagnosis Using Deep Learning [Kanser Teşhisinde Protein Haritalama Tekniklerinin Başarımlarının Derin Öğrenme Kullanılarak Karşılaştırılması]. Fırat Üniversitesi Mühendislik Bilimleri Dergisi 2021; 33:547-565. [DOI: 10.35234/fumbd.881228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Cancer is a heterogeneous disease, made up of many different subtypes, that causes the deaths of many people worldwide. Early diagnosis and prognosis of a cancer type have become a necessity in cancer research because they can facilitate the subsequent clinical follow-up of patients. One of the most widely used methods for this is histological examination. However, this method suffers from considerable inter-observer variability, which makes the examination process long and time-consuming. To overcome this disadvantage, researchers have turned to computational approaches, using protein-protein interactions, protein interaction networks, and molecular fingerprint methods to identify cancerous proteins. Among these approaches, various studies have shown that cancerous cells can also be detected from genomic information. Specific cancer types can be identified from the sequences of cancer-related genes, and machine learning-based approaches have proven effective in this process. In this study, a recurrent neural network architecture, one of the deep learning algorithms, was used to classify human bladder, colon, and prostate cancers based on their protein sequences. The study consists of four stages: data acquisition, numerical encoding of protein sequences, development of the deep learning model, and comparison of the performance of the protein mapping techniques. To convert protein sequences into numerical form, the AESNN1, hydrophobicity, integer, Miyazawa energies, and random coding methods were considered. The highest accuracy for bladder cancer, 87.15%, was obtained with the AESNN1 mapping method, while the highest accuracies for colon and prostate cancer, 94.40% and 75.45%, were obtained with the Miyazawa energies and random coding protein mapping methods, respectively.
This study showed that machine learning and protein mapping techniques are effective in identifying cancerous protein sequences.
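Before a recurrent network can read a protein, each residue must be mapped to a number. As a minimal sketch, the code below implements two of the mappings named in the abstract under stated assumptions: an integer code that numbers the 20 standard amino acids alphabetically by one-letter code, and a hydrophobicity map using the Kyte-Doolittle scale. The study's actual integer assignment and hydrophobicity scale are not given in the abstract, and the AESNN1, Miyazawa-energy, and random codings are not reproduced. Each sequence is truncated or zero-padded to a fixed length so sequences can be batched.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed integer code: 1..20 in alphabetical order of one-letter codes; 0 is kept for padding.
INTEGER_CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values (assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, table, length, pad=0.0):
    """Map each residue through `table`, then truncate or right-pad to `length`.

    Non-standard residues (e.g. 'X') are mapped to the padding value.
    """
    values = [table.get(aa, pad) for aa in sequence.upper()[:length]]
    return values + [pad] * (length - len(values))


if __name__ == "__main__":
    print(encode("MKV", INTEGER_CODE, 5))    # [11, 9, 18, 0.0, 0.0]
    print(encode("MKV", KYTE_DOOLITTLE, 5))  # [1.9, -3.9, 4.2, 0.0, 0.0]
```

Each encoded vector would then be fed to the network as a sequence with one feature per time step.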
6. Ahsan R, Tahsili MR, Ebrahimi F, Ebrahimie E, Ebrahimi M. Image processing unravels the evolutionary pattern of SARS-CoV-2 against SARS and MERS through position-based pattern recognition. Comput Biol Med 2021; 134:104471. [PMID: 34004573 PMCID: PMC8106241 DOI: 10.1016/j.compbiomed.2021.104471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2, severe acute respiratory syndrome coronavirus (SARS), and Middle East respiratory syndrome-related coronavirus (MERS) belong to the family Coronaviridae; the former became a global pandemic (with a low mortality rate), whereas the latter two remained confined to limited regions (with high mortality rates). To investigate possible basic structural differences among the three viruses, their genomic and proteomic sequences were downloaded and converted into polynomial datasets. Seven attribute weighting (feature selection) models were employed to find the key differences in their genomic nucleotide sequences. Most attribute weighting models selected the final nucleotide positions (from position 29,000 to the end of the genome) as significantly different among the three virus classes. The genome and proteome sequences of this hot-zone region (which corresponds to the 3'UTR region and encodes the nucleoprotein (N)), together with the Spike (S) protein sequences (as the most important viral protein), were converted into binary images and analyzed with image processing techniques and a convolutional deep neural network (CNN). Although the predictive accuracy of the CNN for Spike (S) proteins was low (0.48%), machine learning algorithms classified the three members of the Coronaviridae with 100% accuracy based on the 3'UTR region. For the first time, a relationship between possible sequence-level structural differences among coronaviruses and their pathogenesis is reported, which paves the way toward deciphering the high pathogenicity of SARS-CoV-2.
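The abstract describes isolating the region from nucleotide 29,000 to the end of the genome and converting sequences into binary images. The exact image construction is not specified; a common choice, assumed here, is one-hot encoding, where each of four rows corresponds to A, C, G, or T and each column to a sequence position. The helper names are hypothetical, and the toy sequences stand in for downloaded genomes.

```python
NUCLEOTIDES = "ACGT"


def tail_region(genome, start_position=29000):
    """Return the sequence from 1-based position `start_position` to the end of the genome."""
    return genome[start_position - 1:]


def sequence_to_binary_image(seq):
    """One-hot encode a nucleotide sequence as a 4 x len(seq) matrix of 0/1 values.

    Row i marks the positions holding NUCLEOTIDES[i]; ambiguous bases such as 'N'
    produce an all-zero column.
    """
    seq = seq.upper()
    return [[1 if base == nt else 0 for base in seq] for nt in NUCLEOTIDES]


if __name__ == "__main__":
    for row in sequence_to_binary_image("ACGTN"):
        print(row)
```

The resulting matrix can be treated as a single-channel image by a CNN; sequences written with U instead of T would need that substitution first.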
Affiliation(s)
- Reza Ahsan
- Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran
- Faezeh Ebrahimi
- Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran
- Esmaeil Ebrahimie
- Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran (corresponding author); School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia
7. Wang Z, Sun J, Sun Y, Gu Y, Xu Y, Zhao B, Yang M, Yao G, Zhou Y, Li Y, Du D, Zhao H. Machine Learning Algorithm Guiding Local Treatment Decisions to Reduce Pain for Lung Cancer Patients with Bone Metastases, a Prospective Cohort Study. Pain Ther 2021; 10:619-633. [PMID: 33740239 PMCID: PMC8119531 DOI: 10.1007/s40122-021-00251-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/20/2020] [Accepted: 02/23/2021] [Indexed: 01/02/2023]
Abstract
INTRODUCTION As life expectancy increases for lung cancer patients with bone metastases, the need for personalized local treatment to reduce pain is growing.
METHODS Patients were treated by a multidisciplinary team (MDT) with local treatment, including surgery, percutaneous osteoplasty, or radiation. Visual analog scale (VAS) and quality of life (QoL) scores were analyzed, with VAS at 12 weeks after treatment as the main outcome. We developed and tested machine learning models to predict which patients should receive local treatment. Model discrimination was evaluated by the area under the curve (AUC), and the best model was validated prospectively for decision-making accuracy.
RESULTS Under the direction of the MDT, 161 patients in the training set, 32 in the test set, and 36 in the validation set underwent local treatment. At 12 weeks, VAS in the surgery, percutaneous osteoplasty, and radiation groups decreased significantly to 4.78 ± 1.28, 4.37 ± 1.36, and 5.39 ± 1.31, respectively (p < 0.05), with no significant differences among the three datasets; QoL also improved (p < 0.05). A decision tree (DT) model that included VAS, bone metastasis characteristics, Frankel classification, Mirels score, age, driver gene, and aldehyde dehydrogenase 2 and enolase 1 expression had the best AUC for predicting whether patients would receive local treatment: 0.92 (95% CI 0.89-0.94) in the training set, 0.85 (95% CI 0.77-0.94) in the test set, and 0.88 (95% CI 0.81-0.96) in the validation set.
CONCLUSION Local treatment provided significant pain relief and improved QoL, with no significant differences in pain reduction or QoL improvement among the training, test, and validation sets. The DT model was best at determining whether patients should receive local treatment, and our machine learning model can help guide clinicians in making local treatment decisions to reduce pain.
TRIAL REGISTRATION Trial registration number ChiCRT-ROC-16009501.
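A decision tree gave the best AUC in this study, but the abstract does not name the tree algorithm. The sketch below shows the core step of a CART-style tree under that assumption: choosing the threshold on one numeric feature (for example, a VAS score) that minimizes the weighted Gini impurity of the two child nodes. The scores and labels are invented (1 = received local treatment); a full tree repeats this search over every feature and recurses on each child.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the best split `value <= threshold`."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    vas = [2, 3, 4, 6, 7, 8]
    treated = [0, 0, 0, 1, 1, 1]
    print(best_threshold(vas, treated))  # (4, 0.0)
```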
Affiliation(s)
- Zhiyu Wang
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Jing Sun
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yi Sun
- Department of Radiation, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yifeng Gu
- Department of Intervention, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yongming Xu
- Department of Pain, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Bizeng Zhao
- Department of Orthopaedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Mengdi Yang
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Guangyu Yao
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yiyi Zhou
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yuehua Li
- Department of Intervention, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China
- Dongping Du
- Department of Pain, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China.
- Hui Zhao
- Department of Internal Oncology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai, People's Republic of China.
8. Yang L, Liu Q, Zhao Q, Zhu X, Wang L. Machine learning is a valid method for predicting prehospital delay after acute ischemic stroke. Brain Behav 2020; 10:e01794. [PMID: 32812396 PMCID: PMC7559608 DOI: 10.1002/brb3.1794] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/10/2020] [Revised: 07/15/2020] [Accepted: 07/20/2020] [Indexed: 12/27/2022]
Abstract
OBJECTIVES This study aimed to identify factors associated with long onset-to-door time and to establish predictive models that could help assess the probability of prehospital delay in populations at high risk of stroke.
MATERIALS AND METHODS Patients diagnosed with acute ischemic stroke (AIS) and hospitalized between 1 November 2018 and 31 July 2019 were interviewed, and their medical records were extracted for data analysis. Two machine learning algorithms (support vector machine and Bayesian network) were applied, and their predictive performance was compared with that of classical logistic regression models after several variable selection methods were used. The outcome variables were timely admission (onset-to-door time < 3 hr) and prehospital delay (onset-to-door time ≥ 3 hr). We computed the area under the curve (AUC) and the difference in mean AUC values between models.
RESULTS A total of 450 patients with AIS were enrolled: 57 (12.7%) with timely admission and 393 (87.3%) with prehospital delay. All models, whether built by logistic regression or by machine learning, performed well in predicting prehospital delay (mean AUC range: 0.800-0.846). The difference in mean AUC between the best-performing machine learning model and the best-performing logistic regression model was negligible (0.014; 95% CI: 0.013-0.015).
CONCLUSIONS Machine learning algorithms were not inferior to logistic regression models for predicting prehospital delay after stroke. All models provided good discrimination and therefore offer valuable diagnostic tools for predicting prehospital delay.
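The models in this study are compared by AUC. As an illustration, AUC can be computed directly as the probability that a randomly chosen delayed patient (onset-to-door time ≥ 3 hr, labeled 1) receives a higher predicted score than a randomly chosen timely patient, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The function names and toy scores are illustrative, not taken from the paper.

```python
def label_delay(onset_to_door_hours, cutoff=3.0):
    """1 = prehospital delay (>= cutoff hours), 0 = timely admission."""
    return 1 if onset_to_door_hours >= cutoff else 0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    hours = [1.5, 2.0, 4.0, 6.5]
    y = [label_delay(h) for h in hours]  # [0, 0, 1, 1]
    scores = [0.3, 0.1, 0.9, 0.2]
    print(roc_auc(y, scores))  # 0.75
```

Because it compares ranks across pairs, AUC does not depend on the ratio of delayed to timely patients (393 vs 57 in this cohort).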
Affiliation(s)
- Li Yang
- School of Nursing, Qingdao University, Qingdao, China
- Qinqin Liu
- School of Nursing, The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, Harbin, China
- Qiuli Zhao
- School of Nursing, The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, Harbin, China
- Xuemei Zhu
- School of Nursing, The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, Harbin, China
- Ling Wang
- School of Nursing, The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, Harbin, China
9. Machine Learning Analysis of Image Data Based on Detailed MR Image Reports for Nasopharyngeal Carcinoma Prognosis. BioMed Research International 2020; 2020:8068913. [PMID: 32149139 PMCID: PMC7054759 DOI: 10.1155/2020/8068913] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/07/2019] [Accepted: 01/16/2020] [Indexed: 11/17/2022]
Abstract
We aimed to assess the use of an automatic machine learning (AutoML) algorithm based on magnetic resonance (MR) image data to assign prediction scores to patients with nasopharyngeal carcinoma (NPC). We also aimed to develop a 4-group classification system for NPC that would be superior to the current clinical staging system. Between January 2010 and January 2013, 792 patients with newly diagnosed NPC and available MR image data were enrolled. The AutoML algorithm was used, and all statistical analyses were based on 10-fold cross-validation. Primary endpoints were the probabilities of overall survival (OS), distant metastasis-free survival (DMFS), and local-region relapse-free survival (LRFS); their sum was recorded as the final voting score, representing progression-free survival (PFS) for each patient. Compared with the tumor, node, and metastasis (TNM) system-based model, the MR image data-based model had an area under the receiver operating characteristic (ROC) curve of 0.796 (P=0.008) for OS, 0.752 (P=0.053) for DMFS, and 0.721 (P=0.025) for LRFS. In Kaplan-Meier (KM) comparisons, the P values for the II/I, III/II, and IV/III groups were 0.011, 0.010, and <0.001, respectively, with the new machine learning-based scoring system, versus 0.118, 0.121, and <0.001 with the TNM/American Joint Committee on Cancer (AJCC) system. Thus every pair of adjacent curves differed significantly under the new system (P < 0.05), whereas the TNM/AJCC curves did not differ significantly between groups II and I or between groups III and II. In conclusion, the AutoML algorithm demonstrated better prognostic performance than the TNM/AJCC system for NPC. The algorithm shows good potential for clinical application and may help improve counseling and facilitate personalized management of patients with NPC.
The clinical application of our new scoring and staging system may significantly improve precision medicine.
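The abstract defines each patient's final voting score as the sum of the predicted OS, DMFS, and LRFS probabilities and then builds a four-group system. It does not state how the group cut-offs were chosen, so the sketch below assumes equal-sized rank groups (quartiles), with the highest scores (best predicted outcome) assigned to group I. The probabilities are made up.

```python
def voting_score(p_os, p_dmfs, p_lrfs):
    """Final voting score: sum of the three predicted survival probabilities (range 0-3)."""
    return p_os + p_dmfs + p_lrfs


def quartile_groups(scores):
    """Assign groups I-IV by rank quartile; a higher score gives a lower (better) group.

    Assumption: equal-sized rank groups, not the study's actual cut-offs.
    """
    labels = ["I", "II", "III", "IV"]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    groups = [None] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = labels[rank * 4 // len(scores)]
    return groups


if __name__ == "__main__":
    patients = [(0.95, 0.90, 0.92), (0.60, 0.55, 0.70), (0.85, 0.80, 0.75), (0.30, 0.40, 0.50)]
    scores = [voting_score(*p) for p in patients]
    print(quartile_groups(scores))  # ['I', 'III', 'II', 'IV']
```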
10. Lung Cancer Prediction Using Stochastic Diffusion Search (SDS) Based Feature Selection and Machine Learning Methods. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10192-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
11. Shahid AH, Singh M. Computational intelligence techniques for medical diagnosis and prognosis: Problems and current developments. Biocybern Biomed Eng 2019. [DOI: 10.1016/j.bbe.2019.05.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
12. Accuracy Enhanced Lung Cancer Prognosis for Improving Patient Survivability Using Proposed Gaussian Classifier System. J Med Syst 2019; 43:201. [DOI: 10.1007/s10916-019-1297-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/30/2019] [Accepted: 04/16/2019] [Indexed: 10/26/2022]
13. Sattar M, Majid A. Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences. Arabian Journal for Science and Engineering 2019. [DOI: 10.1007/s13369-018-3468-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
14. Xie J, Lu D, Li J, Wang J, Zhang Y, Li Y, Nie Q. Kernel differential subgraph reveals dynamic changes in biomolecular networks. J Bioinform Comput Biol 2017; 16:1750027. [PMID: 29281952 DOI: 10.1142/s0219720017500275] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Many major diseases, including various types of cancer, increasingly threaten human health. However, the mechanisms of the dynamic processes underlying these diseases remain unclear. From the holistic perspective of systems science, complex biological networks can reveal biological phenomena, and changes between networks in different states influence the direction in which living organisms develop. Identifying the kernel differential subgraph (KDS) that drives drastic changes is therefore critical. Existing studies can identify a KDS in networks that share the same nodes; however, networks in different states may lose some nodes or gain new ones. In this paper, we propose a new topology-based KDS (TKDS) method to explore the core module of gene regulatory networks whose nodes differ between states. For common nodes, TKDS computes the differential value (D-value) of their topological change. For nodes that differ between the networks, TKDS identifies the most similar gene pairs and computes their D-value. TKDS thus discovers the essential KDS by considering relationships among both shared and differing nodes. Applying this method to non-small cell lung cancer (NSCLC), we identified 30 genes most likely related to NSCLC and extracted the KDSs in both the cancer and normal states. Two significant functional modules were revealed, and gene ontology (GO) analyses and literature mining indicated that the KDSs are essential to processes in NSCLC. In addition, compared with existing methods, TKDS provides a unique perspective for identifying particular genes and KDSs related to NSCLC. Moreover, TKDS has the potential to predict other critical disease-related genes and modules.
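The abstract distinguishes nodes shared by the normal and cancer networks, scored by a D-value of topological change, from nodes present in only one state, which are first matched to their most similar gene. It defines neither measure, so the sketch below makes two explicit assumptions: the D-value is the absolute change in node degree, and similarity between unmatched nodes is the Jaccard index of their neighbor sets. It illustrates the idea only and is not the published TKDS algorithm; the gene names are placeholders.

```python
from collections import defaultdict


def neighbours(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    nb = defaultdict(set)
    for u, v in edges:
        nb[u].add(v)
        nb[v].add(u)
    return nb


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_node_dvalues(edges_a, edges_b):
    """Assumed D-value for shared nodes: absolute difference in degree."""
    nb_a, nb_b = neighbours(edges_a), neighbours(edges_b)
    return {n: abs(len(nb_a[n]) - len(nb_b[n])) for n in sorted(set(nb_a) & set(nb_b))}


def pair_unique_nodes(edges_a, edges_b):
    """Match each node found only in network A to the B-only node with the most
    similar neighbourhood (Jaccard); return node -> (partner, degree D-value)."""
    nb_a, nb_b = neighbours(edges_a), neighbours(edges_b)
    only_a = sorted(set(nb_a) - set(nb_b))
    only_b = sorted(set(nb_b) - set(nb_a))
    pairs = {}
    for node in only_a:
        if not only_b:
            break
        partner = max(only_b, key=lambda m: jaccard(nb_a[node], nb_b[m]))
        pairs[node] = (partner, abs(len(nb_a[node]) - len(nb_b[partner])))
    return pairs


if __name__ == "__main__":
    normal = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    cancer = [("A", "B"), ("A", "C"), ("A", "E"), ("C", "E")]
    print(common_node_dvalues(normal, cancer))  # {'A': 1, 'B': 1, 'C': 1}
    print(pair_unique_nodes(normal, cancer))    # {'D': ('E', 1)}
```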
Affiliation(s)
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, 99 Shang Da Road, Shanghai 200444, P. R. China
- Dongfang Lu
- School of Computer Engineering and Science, Shanghai University, 99 Shang Da Road, Shanghai 200444, P. R. China
- Jiaxin Li
- School of Computer Engineering and Science, Shanghai University, 99 Shang Da Road, Shanghai 200444, P. R. China
- Jiao Wang
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, 99 Shang Da Road, Shanghai 200444, P. R. China
- Yong Zhang
- Pulmonary Department, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, P. R. China
- Yanhui Li
- Pulmonary Department, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, P. R. China
- Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, California, USA
15. Wang Z, Wen X, Lu Y, Yao Y, Zhao H. Exploiting machine learning for predicting skeletal-related events in cancer patients with bone metastases. Oncotarget 2017; 7:12612-22. [PMID: 26871471 PMCID: PMC4914308 DOI: 10.18632/oncotarget.7278] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/11/2015] [Accepted: 01/24/2016] [Indexed: 12/03/2022]
Abstract
The aim of bone metastasis (BM) treatment is to prevent skeletal-related events (SREs). In clinical practice, physicians can predict the occurrence of SREs only from subjective experience. Machine learning (ML) can be used to build predictive models in the medical field, but no published study has used ML to predict SREs in cancer patients with BM. The purpose of this study was to assess the associations of clinical variables with the occurrence of SREs and then to develop prediction models that help identify SRE risk groups. We analyzed 1143 cancer patients with BM, using SPSS and SPSS Modeler for data analysis and prediction-model development. We compared the performance of logistic regression (LR), decision tree (DT), and support vector machine (SVM) models. The results suggested that the Visual Analog Scale (VAS) score was a key factor for SREs in the LR, DT, and SVM models. Modifiable factors such as Frankel classification, Mirels score, Ca, aminoterminal propeptide of type I collagen (PINP), and bone-specific alkaline phosphatase (BALP) were also identified. The classification accuracies of LR, DT, and SVM were 79.2%, 85.8%, and 88.2%, using 9, 4, and 8 variables, respectively. In conclusion, DT and SVM achieved higher accuracy with fewer variables than LR. ML techniques can be used to build models that predict SREs in cancer patients with BM.
Affiliation(s)
- Zhiyu Wang
- Department of Internal Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Xiaoting Wen
- Department of Internal Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Yaohong Lu
- Department of Internal Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Yang Yao
- Department of Internal Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Hui Zhao
- Department of Internal Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
16. Podolsky MD, Barchuk AA, Kuznetcov VI, Gusarova NF, Gaidukov VS, Tarakanov SA. Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels. Asian Pac J Cancer Prev 2017; 17:835-8. [PMID: 26925688 DOI: 10.7314/apjcp.2016.17.2.835] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 12/23/2022]
Abstract
BACKGROUND Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of the total per year) and deaths (nearly one cancer death in five), because of its high case fatality. Errors in determining lung cancer type or malignant growth reduce treatment efficacy, because the anticancer strategy depends on tumor morphology. MATERIALS AND METHODS We evaluated the effectiveness of machine learning algorithms for lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples, and the task was to classify four cancer types and normal tissue samples. With the University of Michigan data set of 96 samples, the task was binary classification of adenocarcinoma versus non-neoplastic tissue. The University of Toronto data set contains 39 samples, and the task was to detect recurrence; with the Brigham and Women's Hospital data set of 181 samples, the task was binary classification of malignant pleural mesothelioma versus adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier assuming either a normal distribution of attributes or a histogram-based distribution, a support vector machine and a C4.5 decision tree. The effectiveness of the algorithms was evaluated with the Matthews correlation coefficient. RESULTS The support vector machine showed the best results on the Dana-Farber Cancer Institute and Brigham and Women's Hospital data sets. All algorithms except the C4.5 decision tree reached maximum effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results on the University of Toronto data set.
CONCLUSIONS Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
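The Matthews correlation coefficient used above to score the classifiers can be computed from confusion-matrix counts. The Python sketch below shows only the binary form (the five-class Dana-Farber task would need the multiclass generalization), and the helper names `confusion_counts` and `matthews_corrcoef` are illustrative assumptions, not code from the study.

```python
import math
from typing import Hashable, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC in [-1, 1]: 1 = perfect, 0 = chance level, -1 = total disagreement."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any row or column of the confusion matrix is empty, MCC is undefined;
    # returning 0.0 is a common convention.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Usage: a perfect classifier scores 1.0.
print(matthews_corrcoef(*confusion_counts([1, 1, 0, 0], [1, 1, 0, 0])))  # 1.0
```

Unlike plain accuracy, MCC stays near zero for a classifier that only predicts the majority class, which matters for small, imbalanced sets such as the 39-sample recurrence task.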
17
Azzawi H, Hou J, Xiang Y, Alanni R. Lung cancer prediction from microarray data by gene expression programming. IET Syst Biol 2016; 10:168-178. [PMID: 27762231 PMCID: PMC8687242 DOI: 10.1049/iet-syb.2015.0082] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/07/2015] [Revised: 04/20/2016] [Accepted: 04/20/2016] [Indexed: 01/20/2023]
Abstract
Lung cancer is a leading cause of cancer-related death worldwide. Early diagnosis has been shown to greatly improve the chances of treating the disease effectively. Microarray technology offers a promising way to exploit gene expression profiles for cancer diagnosis. In this study, the authors propose a gene expression programming (GEP)-based model to predict lung cancer from microarray data. The authors use two gene selection methods to extract significant lung cancer-related genes and propose a corresponding GEP-based prediction model for each. The GEP models were thoroughly evaluated and compared with three representative machine learning methods (support vector machine, multi-layer perceptron and radial basis function neural network) on real microarray lung cancer datasets. Reliability was assessed by cross-data-set validation. The experimental results show that the GEP model using fewer feature genes outperformed the other models in accuracy, sensitivity, specificity and area under the receiver operating characteristic curve. The authors conclude that the GEP model is a better solution to lung cancer prediction problems.
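Gene expression programming encodes each candidate model as a fixed-length linear string (a K-expression) that is decoded level by level into an expression tree. The Python sketch below illustrates only that decoding and evaluation step. The four-operator function set, the single-letter variables standing in for gene expression levels, and the zero classification threshold are assumptions for illustration; the evolutionary loop (selection, mutation, transposition, recombination) and the authors' actual GEP configuration are not shown.

```python
import operator
from collections import deque
from typing import Callable, Dict, List

# Assumed function set: binary arithmetic operators. Division is "protected"
# (returns 1.0 on a zero denominator), a common GEP convention.
FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,
}


class Node:
    def __init__(self, symbol: str) -> None:
        self.symbol = symbol
        self.children: List["Node"] = []


def decode_kexpression(kexpr: str) -> Node:
    """Build an expression tree from a K-expression, filling it breadth-first.

    Assumes a well-formed gene (tail long enough to close every function);
    symbols left over after the tree is complete are the non-coding region.
    """
    root = Node(kexpr[0])
    pending = deque([root])
    position = 1
    while pending:
        node = pending.popleft()
        if node.symbol in FUNCTIONS:
            for _ in range(2):  # every function in this set takes two arguments
                child = Node(kexpr[position])
                position += 1
                node.children.append(child)
                pending.append(child)
    return root


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Evaluate the tree recursively; terminal symbols are looked up in `env`."""
    if node.symbol in FUNCTIONS:
        left, right = (evaluate(child, env) for child in node.children)
        return FUNCTIONS[node.symbol](left, right)
    return env[node.symbol]


def classify(kexpr: str, env: Dict[str, float], threshold: float = 0.0) -> int:
    """Hypothetical decision rule: class 1 if the model output exceeds the threshold."""
    return int(evaluate(decode_kexpression(kexpr), env) > threshold)


# Usage: "+*aba" decodes to (b * a) + a.
print(evaluate(decode_kexpression("+*aba"), {"a": 2.0, "b": 3.0}))  # 8.0
```

The fixed-length string with a variable-size coding region is what lets GEP apply simple string mutations while always producing a valid tree.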
Affiliation(s)
- Hasseeb Azzawi
- School of Information Technology, Deakin University, Victoria, Australia
- Jingyu Hou
- School of Information Technology, Deakin University, Victoria, Australia
- Yong Xiang
- School of Information Technology, Deakin University, Victoria, Australia
- Russul Alanni
- School of Information Technology, Deakin University, Victoria, Australia
18
Yu Z, Lu H, Si H, Liu S, Li X, Gao C, Cui L, Li C, Yang X, Yao X. A Highly Efficient Gene Expression Programming (GEP) Model for Auxiliary Diagnosis of Small Cell Lung Cancer. PLoS One 2015; 10:e0125517. [PMID: 25996920 PMCID: PMC4440826 DOI: 10.1371/journal.pone.0125517] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/20/2014] [Accepted: 03/24/2015] [Indexed: 01/01/2023]
Abstract
BACKGROUND Lung cancer is an important and common cancer that constitutes a major public health problem, and early detection of small cell lung cancer can significantly improve patient survival. A number of serum biomarkers have been used in the diagnosis of lung cancers; however, they exhibit low sensitivity and specificity. METHODS We used biochemical methods to measure blood levels of lactate dehydrogenase (LDH), C-reactive protein (CRP), Na+, Cl-, carcino-embryonic antigen (CEA) and neuron-specific enolase (NSE) in 145 small cell lung cancer (SCLC) patients, 155 non-small cell lung cancer (NSCLC) patients and 155 normal controls. A gene expression programming (GEP) model and receiver operating characteristic (ROC) curves incorporating these biomarkers were developed for the auxiliary diagnosis of SCLC. RESULTS After appropriate parameter tuning, the GEP model was first built on a training set of 115 SCLC patients and 125 normal controls. The GEP model was then applied to the remaining 60 subjects (the test set) for validation. GEP correctly discriminated 281 of 300 cases, with correct classification rates of 93.75% (225/240) and 93.33% (56/60) for the training and test sets, respectively. Another GEP model incorporating four biomarkers (CEA, NSE, LDH and CRP) showed slightly lower detection sensitivity than the six-biomarker GEP model. We repeated the modeling with an artificial neural network (ANN), and the GEP models were more accurate than the ANN models. Applied to NSCLC patients and normal controls, the six-biomarker GEP model showed lower accuracy than it did for SCLC patients, indicating that the GEP model is suited specifically to SCLC patients. CONCLUSION We have developed a GEP model with high sensitivity and specificity for the auxiliary diagnosis of SCLC.
This GEP model has the potential for wide use in detecting SCLC in less developed regions.
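The reported rates follow directly from the counts in the abstract (115 + 125 = 240 training subjects and 60 test subjects); the pooled rate on the right is derived here and is not stated by the authors:

```latex
\[
\frac{225}{240} = 93.75\%,\qquad
\frac{56}{60} \approx 93.33\%,\qquad
\frac{225 + 56}{240 + 60} = \frac{281}{300} \approx 93.67\%
\]
```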
Affiliation(s)
- Zhuang Yu
- The Affiliated Hospital of Qingdao University, Department of Oncology, Qingdao, Shandong, P.R. China
- Haijiao Lu
- The Affiliated Hospital of Qingdao University, Department of Oncology, Qingdao, Shandong, P.R. China
- Hongzong Si
- Institute for Computational Science and Engineering, Laboratory of New Fibrous Materials and Modern Textile, the Growing Base for State Key Laboratory, Department of Pharmacy, Qingdao University, Qingdao, Shandong, P.R. China
- Shihai Liu
- The Affiliated Hospital of Qingdao University, The Central Laboratory, Qingdao, Shandong, P.R. China
- Xianchao Li
- Department of Pharmacy, Qingdao University, Qingdao, Shandong, P.R. China
- Caihong Gao
- The Affiliated Hospital of Qingdao University, Department of Oncology, Qingdao, Shandong, P.R. China
- Lianhua Cui
- Department of Public Health, Qingdao University Medical College, Qingdao, Shandong, P.R. China
- Chuan Li
- The Affiliated Hospital of Qingdao University, Department of Thoracic Surgery, Qingdao, Shandong, P.R. China
- Xue Yang
- The Affiliated Hospital of Qingdao University, Department of Oncology, Qingdao, Shandong, P.R. China
- Xiaojun Yao
- Department of Chemistry, Lanzhou University, Lanzhou, Gansu, P.R. China
19
Yang R, Zhang C, Gao R, Zhang L. An ensemble method with hybrid features to identify extracellular matrix proteins. PLoS One 2015; 10:e0117804. [PMID: 25680094 PMCID: PMC4334504 DOI: 10.1371/journal.pone.0117804] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/06/2014] [Accepted: 01/02/2015] [Indexed: 12/29/2022]
Abstract
The extracellular matrix (ECM) is a dynamic composite of secreted proteins that play important roles in numerous biological processes such as tissue morphogenesis, differentiation and homeostasis. Furthermore, various diseases are caused by the dysfunction of ECM proteins. Therefore, identifying these important ECM proteins may help in understanding the related biological processes and in drug development. Given the severe class imbalance in the training dataset, a Random Forest-based ensemble method with hybrid features is developed in this paper to identify ECM proteins. The hybrid features incorporate sequence composition, physicochemical properties, and evolutionary and structural information. The Information Gain Ratio and Incremental Feature Selection (IGR-IFS) methods are adopted to select the optimal features. Finally, the resulting predictor, termed IECMP (Identify ECM Proteins), achieves a balanced accuracy of 86.4% in 10-fold cross-validation on the training dataset, much higher than that of other methods (ECMPRED: 71.0%, ECMPP: 77.8%). Moreover, when tested on a common independent dataset, our method also achieves significantly improved performance over ECMPP and ECMPRED. These results indicate that IECMP is an effective method for ECM protein prediction, with a more balanced prediction capability for positive and negative samples. It is anticipated that the proposed method will help decipher the molecular mechanisms of ECM-related biological processes and discover candidate drug targets. For public access, we developed a user-friendly web server for ECM protein identification that is freely accessible at http://iecmp.weka.cc.
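As a minimal sketch of two quantities named above, the Python code below computes the information gain ratio of a discrete feature (the ranking criterion in IGR-IFS) and the balanced accuracy (the mean of sensitivity and specificity). It assumes features have already been discretized; the incremental step of IFS, which adds ranked features one at a time and keeps the subset with the best cross-validated score, is not shown, and none of this is the IECMP code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain about `labels` from a discrete `feature`, divided by the feature's entropy."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # penalizes features with many distinct values
    return gain / split_info if split_info > 0 else 0.0


def balanced_accuracy(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> float:
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    on_neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(p == positive for p in on_pos) / len(on_pos)
    specificity = sum(p != positive for p in on_neg) / len(on_neg)
    return (sensitivity + specificity) / 2


# Usage: always predicting the majority class on 1:9 data gives 90% accuracy
# but only 0.5 balanced accuracy.
print(balanced_accuracy([1] + [0] * 9, [0] * 10))  # 0.5
```

This is why balanced accuracy, rather than raw accuracy, is the informative figure for a heavily imbalanced training set.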
Affiliation(s)
- Runtao Yang
- School of Control Science and Engineering, Shandong University, Jinan, China
- Chengjin Zhang
- School of Control Science and Engineering, Shandong University, Jinan, China
- School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, China
- Rui Gao
- School of Control Science and Engineering, Shandong University, Jinan, China
- Lina Zhang
- School of Control Science and Engineering, Shandong University, Jinan, China
20
New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014; 54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]
21
KayvanJoo AH, Ebrahimi M, Haqshenas G. Prediction of hepatitis C virus interferon/ribavirin therapy outcome based on viral nucleotide attributes using machine learning algorithms. BMC Res Notes 2014; 7:565. [PMID: 25150834 PMCID: PMC4246553 DOI: 10.1186/1756-0500-7-565] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/27/2014] [Accepted: 08/10/2014] [Indexed: 02/07/2023]
Abstract
BACKGROUND Hepatitis C virus (HCV) causes chronic hepatitis C in 2-3% of the world population and remains a major threat to human health worldwide. In the absence of an effective vaccine, therapy is the only option to combat hepatitis C. The combination of interferon-alpha (IFN-alpha) and ribavirin (RBV), alone or together with recently introduced direct-acting antivirals (DAAs), is used to treat patients infected with HCV. The present study used feature selection methods (Gini Index, Chi Squared and machine learning algorithms) and other bioinformatics tools to identify genetic determinants of therapy outcome within the entire HCV nucleotide sequence. RESULTS Using a combination of several algorithms, the present study performed a comprehensive bioinformatics analysis and identified several nucleotide attributes within the full-length nucleotide sequences of HCV subtypes 1a and 1b that correlated with treatment outcome. Feature selection algorithms identified several nucleotide features (e.g. count of hydrogen and CG). Combinations of algorithms using the selected nucleotide attributes distinguished therapy responders from non-responders for HCV subtypes 1a and 1b with accuracies of 75.00% and 85.00%, respectively. In addition, therapy responders and relapsers were categorized with accuracies of 82.50% and 84.17%, respectively. Based on the identified attributes, decision trees were induced to differentiate the therapy response groups. CONCLUSIONS The present study identified new genetic markers that potentially affect the outcome of hepatitis C treatment. In addition, the results suggest new viral genomic attributes that might influence the outcome of the IFN-mediated immune response to HCV infection.
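The Gini index and chi-squared criteria mentioned above both score how strongly a discrete attribute is associated with the outcome class. The Python sketch below assumes nucleotide attributes have already been discretized; the attribute names in the usage example (`CG_count`, `H_count`) and their values are hypothetical, and the authors' actual software and settings are not reproduced.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """1 minus the sum of squared class proportions; 0 for a pure group."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_weight(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Drop in Gini impurity when samples are grouped by the feature's values."""
    n = len(labels)
    after = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        after += (count / n) * gini_impurity(subset)
    return gini_impurity(labels) - after


def chi_squared(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals, label_totals = Counter(feature), Counter(labels)
    statistic = 0.0
    for x, fx in feature_totals.items():
        for y, fy in label_totals.items():
            expected = fx * fy / n
            statistic += (observed[(x, y)] - expected) ** 2 / expected
    return statistic


def rank_features(
    features: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    scorer: Callable[[Sequence[Hashable], Sequence[Hashable]], float],
) -> List[str]:
    """Feature names sorted from most to least associated with the labels."""
    return sorted(features, key=lambda name: scorer(features[name], labels), reverse=True)


# Usage with hypothetical, already-discretized attributes (1 = responder).
attributes = {"CG_count": [0, 0, 1, 1], "H_count": [0, 1, 0, 1]}
response = [0, 0, 1, 1]
print(rank_features(attributes, response, chi_squared))  # ['CG_count', 'H_count']
```

Attributes that rank highly under several such criteria are natural candidates to feed into the downstream classifiers and decision trees.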
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
22
Baker YS, Agrawal R, Foster JA, Beck D, Dozier G. Applying machine learning techniques in detecting bacterial vaginosis. Proceedings. International Conference on Machine Learning and Cybernetics 2014; 2014:241-246. [PMID: 25914861 PMCID: PMC4407517 DOI: 10.1109/icmlc.2014.7009123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
Several diseases arise from changes in the body's microbial communities, and researchers continue to search for the triggers that provoke these changes in the naturally occurring microbiota. Bacterial vaginosis (BV) is one such disease. BV affects approximately 29% of women of childbearing age, and its causes are unknown. This paper seeks to identify the features most important for diagnosis and then to apply classification algorithms to those features. To this end, we conducted two experiments on the data: we isolated the clinical and the medical features from the full set of raw data, and compared accuracy, precision, recall, F-measure and elapsed time for each combination of feature selection and classification method. Classification results were as good or better after feature selection, although the number of features selected varied widely. Comparing the two experiments, the algorithms performed best on the medical dataset.
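The comparison described above (accuracy, precision, recall, F-measure and elapsed time for each pairing of feature selection and classifier) can be sketched as a small evaluation harness. In this Python sketch, `fit_predict` is a hypothetical callable standing in for any classifier, `majority_class` is an illustrative baseline, and the positive class label `1` is an assumption.

```python
import time
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence


def classification_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_measure": f_measure}


def timed_evaluation(
    fit_predict: Callable, X_train, y_train, X_test, y_test, positive: Hashable = 1
) -> Dict[str, float]:
    """Run a hypothetical fit_predict(X_train, y_train, X_test) and record wall-clock seconds."""
    start = time.perf_counter()
    y_pred = fit_predict(X_train, y_train, X_test)
    elapsed = time.perf_counter() - start  # time training + prediction only
    metrics = classification_metrics(y_test, y_pred, positive)
    metrics["seconds"] = elapsed
    return metrics


def majority_class(X_train, y_train, X_test) -> List[Hashable]:
    """Baseline classifier: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Usage: 3 of 5 predictions correct.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])["accuracy"])  # 0.6
```

Running the same harness on the full, clinical-only and medical-only feature sets would give directly comparable rows for each classifier.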
Affiliation(s)
- Yolanda S. Baker
- Department of Computer Systems Technology, North Carolina Agricultural and Technical State University, Greensboro, NC, USA
- Rajeev Agrawal
- Department of Computer Systems Technology, North Carolina Agricultural and Technical State University, Greensboro, NC, USA
- James A. Foster
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, USA
- Daniel Beck
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, USA
- Gerry Dozier
- Department of Computer Science, North Carolina Agricultural and Technical State University, Greensboro, NC, USA
23
Bakhtiarizadeh MR, Moradi-Shahrbabak M, Ebrahimi M, Ebrahimie E. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J Theor Biol 2014; 356:213-22. [PMID: 24819464 DOI: 10.1016/j.jtbi.2014.04.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 01/15/2014] [Revised: 04/03/2014] [Accepted: 04/29/2014] [Indexed: 01/05/2023]
Abstract
Because lipid binding proteins (LBPs) play central roles in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure and function, which results in low accuracy for sequence-homology-based methods. Alternative functional prediction methods that do not depend on sequence similarity are therefore needed. To distinguish LBPs from non-LBPs, this study compared the performance of a support vector machine (SVM) and a neural network. Comprehensive protein features and various techniques were used to create the datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms the neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs, and on average 92.06% (CV) and 92.90% (IE) in classifying the different LBP classes. Increasing the number and range of extracted protein features, together with optimizing the SVM parameters, significantly improved the efficiency of LBP class prediction compared with the only previous report in this field. Altogether, the results show that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool for detecting LBP classes. The proposed approach has the potential to be integrated with, and to improve, common sequence-alignment-based methods.
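Five-fold cross-validation, as used above, partitions the samples so that each one is held out for testing exactly once. The Python sketch below shows an unstratified version; the `fit_predict` callable is hypothetical, and the independent evaluation (IE) test would instead score a model on a separate held-out set that plays no part in training or tuning.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once and deal them into k folds; return (train, test) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(
    fit_predict: Callable, X: Sequence, y: Sequence, k: int = 5, seed: int = 0
) -> float:
    """Mean test-fold accuracy; each sample is predicted by a model that never saw it."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        y_pred = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(pred == y[i] for pred, i in zip(y_pred, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

With strongly unequal class sizes, a stratified split (dealing each class into folds separately) keeps every fold representative.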
Affiliation(s)
- Mohammad Moradi-Shahrbabak
- Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
- Department of Crop Production & Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran; School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
24
Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014; 9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023]
Abstract
The evolution of influenza A virus to expand its host range is a major concern worldwide. The molecular mechanisms by which host range increases are largely unknown. Influenza surface proteins play decisive roles in recognition of host sialic acid receptors and in host range. To uncover the physicochemical attributes that govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physicochemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to find machine learning models that can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by the attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross-validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features associated with HA subtyping. The Random Forest tree induction algorithm and SVM with an RBF kernel (scaled by grid search) achieved a high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models successfully traced the short mutation/reassortment paths by which influenza virus can acquire the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
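The RBF kernel and a grid search over its parameters can be sketched as follows. The exponential ranges for `C` and `gamma` follow values commonly suggested in the LIBSVM practical guide and are an assumption, since the abstract does not give the authors' grid. In practice `score` would return, for example, the 10-fold cross-validated accuracy of an SVM trained with each pair; here it is left as a hypothetical callable.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2); equals 1.0 when x == z."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def parameter_grid() -> List[Tuple[float, float]]:
    """Assumed exponential grid: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3 (step 2^2)."""
    c_values = [2.0 ** e for e in range(-5, 16, 2)]
    gamma_values = [2.0 ** e for e in range(-15, 4, 2)]
    return list(product(c_values, gamma_values))


def grid_search(score: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return the (C, gamma) pair with the highest score, e.g. cross-validated accuracy."""
    return max(parameter_grid(), key=lambda pair: score(*pair))
```

A coarse grid like this is typically followed by a finer search around the best pair.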
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Parisa Aghagolzadeh
- Department of Nephrology, Hypertension, and Clinical Pharmacology, University of Bern, Bern, Switzerland
- Narges Shamabadi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Mohammed Alsharifi
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- David L. Adelson
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- Farhid Hemmatzadeh
- School of Animal and Veterinary Science, The University of Adelaide, Adelaide, Australia
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia