1
Zhao K, Ebrahimie E, Mohammadi-Dehcheshmeh M, Lewsey MG, Zheng L, Hoogenraad NJ. Transcriptomic signature of cancer cachexia by integration of machine learning, literature mining and meta-analysis. Comput Biol Med 2024; 172:108233. [PMID: 38452471 DOI: 10.1016/j.compbiomed.2024.108233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/23/2024] [Accepted: 02/25/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND Cancer cachexia is a severe metabolic syndrome marked by skeletal muscle atrophy. A successful clinical intervention for cancer cachexia is currently lacking. The study of cachexia mechanisms is largely based on preclinical animal models and the availability of high-throughput transcriptomic datasets of cachectic mouse muscles is increasing through the extensive use of next generation sequencing technologies. METHODS Cachectic mouse muscle transcriptomic datasets of ten different studies were combined and mined by seven attribute weighting models, which analysed both categorical variables and numerical variables. The transcriptomic signature of cancer cachexia was identified by attribute weighting algorithms and was used to evaluate the performance of eleven pattern discovery models. The signature was employed to find the best combination of drugs (drug repurposing) for developing cancer cachexia treatment strategies, as well as to evaluate currently used cachexia drugs by literature mining. RESULTS Attribute weighting algorithms ranked 26 genes as the transcriptomic signature of muscle from mice with cancer cachexia. Deep Learning and Random Forest models performed better in differentiating cancer cachexia cases based on muscle transcriptomic data. Literature mining revealed that a combination of melatonin and infliximab has negative interactions with 2 key genes (Rorc and Fbxo32) upregulated in the transcriptomic signature of cancer cachexia in muscle. CONCLUSIONS The integration of machine learning, meta-analysis and literature mining was found to be an efficient approach to identifying a robust transcriptomic signature for cancer cachexia, with implications for improving clinical diagnosis and management of this condition.
Affiliation(s)
- Kening Zhao
- Department of Laboratory Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China; La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia.
- Esmaeil Ebrahimie
- Genomics Research Platform, School of Agriculture, Biomedicine and Environment, La Trobe University, Melbourne, VIC, 3086, Australia; School of Animal and Veterinary Science, The University of Adelaide, Adelaide, SA 5371, Australia; School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia.
- Manijeh Mohammadi-Dehcheshmeh
- Genomics Research Platform, School of Agriculture, Biomedicine and Environment, La Trobe University, Melbourne, VIC, 3086, Australia; School of Animal and Veterinary Science, The University of Adelaide, Adelaide, SA 5371, Australia.
- Mathew G Lewsey
- Australian Research Council Research Hub for Medicinal Agriculture, La Trobe University, AgriBio Building, Bundoora, VIC, 3086, Australia; La Trobe Institute for Sustainable Agriculture and Food, Department of Plant, Animal and Soil Sciences, La Trobe University, AgriBio Building, Bundoora, VIC, 3086, Australia; Australian Research Council Centre of Excellence in Plants for Space, AgriBio Building, La Trobe University, Bundoora, VIC, 3086, Australia.
- Lei Zheng
- Department of Laboratory Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China.
- Nick J Hoogenraad
- La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia; Tumour Targeting Laboratory, Olivia Newton-John Cancer Research Institute, School of Cancer Medicine, La Trobe University, Melbourne, VIC, 3084, Australia.
2
Rezaei Z, Tahmasebi A, Pourabbas B. Using meta-analysis and machine learning to investigate the transcriptional response of immune cells to Leishmania infection. PLoS Negl Trop Dis 2024; 18:e0011892. [PMID: 38190401 PMCID: PMC10798641 DOI: 10.1371/journal.pntd.0011892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 01/19/2024] [Accepted: 12/29/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND Leishmaniasis is a parasitic disease caused by the Leishmania protozoan affecting millions of people worldwide, especially in tropical and subtropical regions. The immune response involves the activation of various cells to eliminate the infection. Understanding the complex interplay between Leishmania and the host immune system is crucial for developing effective treatments against this disease. METHODS This study collected extensive transcriptomic data from macrophages, dendritic cells, and NK cells exposed to Leishmania spp. Our objective was to determine the Leishmania-responsive genes in immune system cells by applying meta-analysis and feature selection algorithms, followed by co-expression analysis. RESULTS The meta-analysis identified 703 differentially expressed genes (DEGs), primarily associated with the immune system and cellular metabolic processes. In addition, we substantiated the significance of transcription factor families, such as bZIP and C2H2 ZF, in the response to Leishmania infection. Furthermore, the feature selection techniques revealed the potential of two genes, G0S2 and CXCL8, as biomarkers and therapeutic targets for Leishmania infection. Lastly, our co-expression analysis revealed seven hub genes (PFKFB3, DIAPH1, BSG, BIRC3, GOT2, EIF3H, and ATF3), chiefly related to signaling pathways. CONCLUSIONS These findings provide valuable insights into the molecular mechanisms underlying the response of immune system cells to Leishmania infection and offer novel potential therapeutic targets.
Affiliation(s)
- Zahra Rezaei
- Professor Alborzi Clinical Microbiology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Ahmad Tahmasebi
- Professor Alborzi Clinical Microbiology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Shiraz Institute for Cancer Research, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Bahman Pourabbas
- Professor Alborzi Clinical Microbiology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
3
Liu Y, Zhuang Y, Yu L, Li Q, Zhao C, Meng R, Zhu J, Guo X. A Machine Learning Framework Based on Extreme Gradient Boosting to Predict the Occurrence and Development of Infectious Diseases in Laying Hen Farms, Taking H9N2 as an Example. Animals (Basel) 2023; 13:1494. [PMID: 37174531 PMCID: PMC10177545 DOI: 10.3390/ani13091494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023] Open
Abstract
The H9N2 avian influenza virus has become one of the dominant subtypes of avian influenza virus in poultry and has caused significant harm to chickens in China, with great economic losses from reduced egg production or high mortality through co-infection with other pathogens. An accurate prediction of H9N2 status based on easily available production data would be essential for preventing and controlling H9N2 outbreaks in advance. This study developed a machine learning framework based on the XGBoost classification algorithm using 3 months of laying rates and mortalities collected from three H9N2-infected laying hen houses with complete onset cycles. The framework automatically predicts the H9N2 status of an individual house for the next 3 days (H9N2 status + 0, H9N2 status + 1, H9N2 status + 2) with five time frames (day + 0, day - 1, day - 2, day - 3, day - 4). The prediction models achieved an accuracy > 90%, a recall > 90%, a precision > 80%, and an area under the receiver operating characteristic curve ≥ 0.85. Models with day + 0 and day - 1 were highly recommended for predicting H9N2 status + 0 and H9N2 status + 1 for the direct or auxiliary monitoring of its occurrence and development. Such a framework could provide new insights into predicting H9N2 outbreaks, and other potential practical applications in disease monitoring are also worth considering.
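One way to read the time-frame setup in this abstract is as a sliding window over daily production records. The sketch below is only an illustration under that assumption: the function name `build_examples`, the `lags` and `horizon` parameters, and the sample values are hypothetical and not taken from the paper. The resulting feature and label pairs could then be passed to a classifier such as XGBoost.

```python
def build_examples(laying_rate, mortality, status, lags, horizon):
    """Pair each day's lagged production data with a future status label.

    Features for day t are the laying rate and mortality of days
    t - lags .. t; the label is the status on day t + horizon.
    """
    if not (len(laying_rate) == len(mortality) == len(status)):
        raise ValueError("all series must have the same length")
    examples = []
    for t in range(lags, len(status) - horizon):
        window = []
        for d in range(t - lags, t + 1):
            window.extend([laying_rate[d], mortality[d]])
        examples.append((window, status[t + horizon]))
    return examples


# Hypothetical five-day record for one house (not data from the study)
laying_rate = [0.90, 0.91, 0.89, 0.80, 0.70]
mortality = [0.001, 0.001, 0.002, 0.010, 0.020]
status = [0, 0, 0, 1, 1]  # 1 = infected

# Assumed reading: "day - 1" time frame (lags=1) predicting "status + 1" (horizon=1)
examples = build_examples(laying_rate, mortality, status, lags=1, horizon=1)
```

Each example here holds four features (two days of laying rate and mortality) and one label taken one day ahead.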
Affiliation(s)
- Yu Liu
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Yanrong Zhuang
- College of Water Resources and Civil Engineering, China Agricultural University, Beijing 100083, China
- Ligen Yu
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Qifeng Li
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Chunjiang Zhao
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Rui Meng
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Jun Zhu
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Xiaoli Guo
- Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
4
Gemler BT, Mukherjee C, Howland CA, Huk D, Shank Z, Harbo LJ, Tabbaa OP, Bartling CM. Function-based classification of hazardous biological sequences: Demonstration of a new paradigm for biohazard assessments. Front Bioeng Biotechnol 2022; 10:979497. [PMID: 36277394 PMCID: PMC9585941 DOI: 10.3389/fbioe.2022.979497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 08/31/2022] [Indexed: 12/04/2022] Open
Abstract
Bioengineering applies analytical and engineering principles to identify functional biological building blocks for biotechnology applications. While these building blocks are leveraged to improve the human condition, the lack of a simple, machine-readable definition of biohazards at the function level is creating a gap for biosafety practices. More specifically, traditional safety practices focus on the biohazards of known pathogens at the organism level and may not accurately consider novel biodesigns with engineered functionalities at the genetic component level. This gap is motivating the need for a paradigm shift from organism-centric procedures to function-centric biohazard identification and classification practices. To address this challenge, we present a novel methodology for classifying biohazards at the individual sequence level, which we then compiled to distinguish the biohazardous property of pathogenicity at the whole genome level. Our methodology is rooted in the compilation of hazardous functions, defined as a set of sequences and associated metadata that describe coarse-level functions associated with pathogens (e.g., adherence, immune subversion). We demonstrate that the resulting database can be used to develop hazardous “fingerprints” based on the functional metadata categories. We verified that these hazardous functions are found at higher levels in pathogens compared to non-pathogens, and hierarchical clustering of the fingerprints can distinguish between these two groups. The methodology presented here defines the hazardous functions associated with bioengineering functional building blocks at the sequence level, which provides a foundational framework for classifying biological hazards at the organism level, thus leading to the improvement and standardization of current biosecurity and biosafety practices.
5
Ahsan R, Ebrahimi F, Ebrahimi M. Classification of imbalanced protein sequences with deep-learning approaches; application on influenza A imbalanced virus classes. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
6
Evaluation of the Effectiveness of Herbal Components Based on Their Regulatory Signature on Carcinogenic Cancer Cells. Cells 2021; 10:3139. [PMID: 34831362 PMCID: PMC8621084 DOI: 10.3390/cells10113139] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2021] [Revised: 11/06/2021] [Accepted: 11/09/2021] [Indexed: 12/28/2022] Open
Abstract
Predicting cancer cells’ response to a plant-derived agent is critical for the drug discovery process. Recent advances in transcriptomics have provided an opportunity to identify regulatory signatures to predict drug activity. In this study, a combination of meta-analysis and machine learning models was used to determine regulatory signatures, focusing on transcription factors (TFs) differentially expressed in cancer cells in response to herbal components. To increase the size of the dataset, six datasets were combined in a meta-analysis from studies that had evaluated gene expression in cancer cell lines before and after herbal extract treatments. Then, categorical feature analysis based on machine learning methods was applied to examine transcription factors in order to find the best signature/pattern capable of discriminating between control and treated groups. This integrative approach could recognize combinations of TFs as predictive biomarkers. The random forest (RF) model produced the best combination rules, including AIP/TFE3/VGLL4/ID1 and AIP/ZNF7/DXO, with the highest modulating capacity. As the RF algorithm combines the output of many trees to build a final model, its predictive rules are more accurate and reproducible than those of single-tree models. The discovered regulatory signature suggests an effective procedure for assessing the efficacy of investigational herbal compounds on particular cells in the drug discovery process.
7
Ebrahimie E, Zamansani F, Alanazi IO, Sabi EM, Khazandi M, Ebrahimi F, Mohammadi-Dehcheshmeh M, Ebrahimi M. Advances in understanding the specificity function of transporters by machine learning. Comput Biol Med 2021; 138:104893. [PMID: 34598069 DOI: 10.1016/j.compbiomed.2021.104893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/16/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/25/2022]
Abstract
Understanding the underlying molecular mechanism of transporter activity is one of the major discussions in structural biology. A transporter can exclusively transport one ion (specific transporter) or multiple ions (general transporter). This study compared categorical and numerical features of general and specific calcium transporters using machine learning and attribute weighting models. To this end, 444 protein features, such as the frequency of dipeptides, organism, and subcellular location, were extracted for general (n = 103) and specific calcium transporters (n = 238). Aliphatic index, subcellular location, organism, glycine frequency, hydrophobic frequency, and the frequency of specific dipeptides such as Ile-Leu, Phe-Val, and Tyr-Gln were the key features in differentiating general from specific calcium transporters. Calcium transporters in the cell outer membranes were specific, while the inner ones were general; additionally, when the hydrophobic frequency or aliphatic index increases, the calcium transporter acts as a general transporter. Random Forest with accuracy criterion showed the highest accuracy (88.88% ± 5.75%) and high AUC (0.964 ± 0.020), based on 5-fold cross-validation. Decision Tree with accuracy criterion was able to predict the specificity of calcium transporters irrespective of the organism and subcellular location. This study demonstrates the precise classification of transporter function based on sequence-derived physicochemical features.
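Two of the sequence-derived features named in this abstract can be computed directly from a protein sequence. The sketch below is a minimal illustration, assuming the aliphatic index follows Ikai's standard definition and that dipeptide frequency is counted over overlapping residue pairs; the abstract does not state which definitions the authors used, and the function names are hypothetical.

```python
from collections import Counter


def dipeptide_frequency(seq, dipeptide):
    """Fraction of overlapping residue pairs in seq equal to dipeptide."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    return pairs.count(dipeptide) / len(pairs)


def aliphatic_index(seq):
    """Ikai's aliphatic index (assumed definition).

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)

    def mole_percent(residue):
        return 100.0 * counts[residue] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy four-residue peptide, one-letter codes (Ile-Leu is "IL")
toy = "AILV"
il_freq = dipeptide_frequency(toy, "IL")
ai = aliphatic_index(toy)
```

Features like these would form columns of the feature table that the attribute weighting and classification models operate on.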
Affiliation(s)
- Esmaeil Ebrahimie
- Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia.
- Fatemeh Zamansani
- Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran.
- Ibrahim O Alanazi
- National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Saudi Arabia.
- Essa M Sabi
- Department of Pathology, Clinical Biochemistry Unit, College of Medicine, King Saud University, Riyadh, 11461, Saudi Arabia.
- Manouchehr Khazandi
- UniSA Clinical and Health Sciences, The University of South Australia, Adelaide, 5000, Australia.
- Faezeh Ebrahimi
- Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran.
- Mansour Ebrahimi
- School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
8
Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
- Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
9
Ahsan R, Ebrahimi M. Image processing techniques represent innovative tools for comparative analysis of proteins. Comput Biol Med 2019; 117:103584. [PMID: 32072976 DOI: 10.1016/j.compbiomed.2019.103584] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/23/2019] [Revised: 12/10/2019] [Accepted: 12/12/2019] [Indexed: 01/09/2023]
Abstract
Different bioinformatic and data-mining approaches have been used for the analysis of proteins. Here, we describe a novel, robust, and reliable approach for comparative analysis of a large number of proteins by combining Image Processing Techniques and Convolutional Deep Neural Network (IPT-CNN). As proof of principle, we used IPT-CNN to predict different subtypes of Influenza A virus (IAV). Over 8000 sequences of surface proteins haemagglutinin (HA) and neuraminidase (NA) from different IAV subtypes were used to create polynomial or binary vector datasets. The datasets were then converted into binary images. Analysis of these images enabled the classification of IAV subtypes with 100% accuracy and, compared to non-image-based approaches, within a shorter time frame. The proteome-based IPT-CNN approach described here may be used for analysis and proteome-based classification of other proteins.
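The abstract does not describe how sequences become binary images. One common possibility, shown here purely as an assumption, is one-hot encoding: each sequence position becomes a row of 20 pixels with a single 1 marking the residue. The function name, the fixed `height` padding, and the residue ordering are all hypothetical choices for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary fixed order


def sequence_to_binary_image(seq, height):
    """One-hot encode a protein sequence as a height x 20 matrix of 0/1 pixels.

    Sequences shorter than height are padded with all-zero rows and longer
    ones are truncated; unknown residue codes also give an all-zero row.
    """
    image = []
    for i in range(height):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq):
            column = AMINO_ACIDS.find(seq[i])
            if column >= 0:
                row[column] = 1
        image.append(row)
    return image


# Toy fragment, not a real HA or NA sequence
image = sequence_to_binary_image("MKX", height=4)
```

Fixing the height gives every sequence the same image size, which a convolutional network requires for batched input.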
Affiliation(s)
- Reza Ahsan
- Department of Information Technology, School of Engineering, University of Qom, Qom, Iran
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran; School of Agriculture and Veterinary Sciences, University of Adelaide, Adelaide, Australia.
10
Panahi B, Frahadian M, Dums JT, Hejazi MA. Integration of Cross Species RNA-seq Meta-Analysis and Machine-Learning Models Identifies the Most Important Salt Stress-Responsive Pathways in Microalga Dunaliella. Front Genet 2019; 10:752. [PMID: 31555319 PMCID: PMC6727038 DOI: 10.3389/fgene.2019.00752] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/06/2019] [Accepted: 07/17/2019] [Indexed: 01/12/2023] Open
Abstract
Photosynthetic microalgae are potential sources of various high-value secondary metabolites. Salinity is a complex stress that influences various metabolite-related pathways in microalgae. To obtain a clear view of the underlying metabolic pathways and resolve contradictory information concerning the transcriptional regulation of Dunaliella species under salt stress, an RNA-seq meta-analysis was conducted along with a systems-level analysis. A p-value combination technique based on Fisher's method was used for cross-species meta-analysis of the transcriptomes of two species, Dunaliella salina and Dunaliella tertiolecta. The potential functional impacts of core meta-genes were surveyed based on gene ontology and network analysis. Supervised machine-learning algorithms were also integrated with the RNA-seq meta-analysis. The analysis shows that lipid and nitrogen metabolism, structural proteins of the photosynthetic apparatus, chaperone-mediated autophagy, and ROS-related genes are the key core elements of the Dunaliella salt stress response system. Cross-talk between Ca2+ signal transduction, lipid accumulation, and the ROS signaling network under salt stress is also proposed. Our novel approach opens new avenues for better understanding of microalgae stress response mechanisms and for selection of candidate gene targets for metabolite production in microalgae.
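Fisher's method, named in this abstract, combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below shows the calculation with the standard library only; the function name and the example p-values are hypothetical, and the paper's actual software pipeline is not specified here.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns the chi-square statistic and the combined p-value.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-species p-values for one orthologous gene
stat, combined = fisher_combined_pvalue([0.01, 0.04])
```

With a single input the combined p-value equals the input, which is a quick sanity check on the closed form.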
Affiliation(s)
- Bahman Panahi
- Department of Genomics, Branch for Northwest & West region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Mohammad Frahadian
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Jacob T Dums
- Department of Plant and Soil Sciences, University of Delaware, Newark, DE, USA
- Mohammad Amin Hejazi
- Department of Food Biotechnology, Branch for Northwest & West region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
11
Kargarfard F, Sami A, Hemmatzadeh F, Ebrahimie E. Identifying mutation positions in all segments of influenza genome enables better differentiation between pandemic and seasonal strains. Gene 2019; 697:78-85. [PMID: 30769139 DOI: 10.1016/j.gene.2019.01.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/24/2018] [Revised: 12/29/2018] [Accepted: 01/17/2019] [Indexed: 01/08/2023]
Abstract
Influenza virus has a negative-sense, single-stranded, segmented RNA genome. In pandemic influenza research, most studies have focused on variations in the surface proteins (hemagglutinin and neuraminidase). However, new findings suggest that all internal and external proteins of influenza viruses can contribute to pandemic emergence, pathogenicity and increased host range. The occurrence of the 2009 influenza pandemic and the availability of many external and internal segments of pandemic and non-pandemic sequences offer a unique opportunity to evaluate the performance of machine learning models in discriminating pandemic from seasonal sequences using mutation positions in all segments. In this study, we hypothesized that identifying mutation positions in all segments (proteins) encoded by the influenza genome would enable pandemic and seasonal strains to be more reliably distinguished. In a large-scale study, we applied a range of data mining techniques to all segments of influenza for rule discovery and discrimination of pandemic from seasonal strains. CBA (classification based on association rule mining), Ripper and Decision tree algorithms were utilized to extract association rules among mutations. CBA outperformed the other models. Our approach could discriminate pandemic sequences from seasonal ones with more than 95% accuracy for PA and NP, 99.33% accuracy for NA and 100% accuracy, precision, specificity and sensitivity (recall) for M1, M2, PB1, NS1, and NS2. The values of precision, specificity, and sensitivity were more than 90% for other segments except PB2. If sequences of all segments of one strain were available, the accuracy of discrimination of pandemic strains was 100%. General rules extracted by rule-based classification approaches, such as M1-V147I, NP-N334H, NS1-V112I, and PB1-L364I, were able to detect pandemic sequences with high accuracy. We observed that mutations in internal proteins of influenza can contribute to distinguishing the pandemic viruses, similar to the external ones.
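Rules such as M1-V147I can be read in standard substitution notation: protein M1, position 147, valine replaced by isoleucine. The sketch below checks whether a set of protein sequences carries the substituted residue at the stated position. It is an illustration only: the function names are hypothetical, and it assumes 1-based positions in the same numbering as the rule, which the abstract does not confirm.

```python
import re

# Protein name, reference residue, 1-based position, substituted residue
RULE_PATTERN = re.compile(r"^([A-Z0-9]+)-([A-Z])(\d+)([A-Z])$")


def parse_rule(rule):
    """Split a rule like 'M1-V147I' into (protein, ref, position, alt)."""
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"unrecognised rule: {rule!r}")
    protein, ref, position, alt = match.groups()
    return protein, ref, int(position), alt


def matches_rule(rule, proteins):
    """True if the named protein carries the substituted residue at the position."""
    protein, _ref, position, alt = parse_rule(rule)
    seq = proteins.get(protein, "")
    return len(seq) >= position and seq[position - 1] == alt


# Toy sequences: 146 alanines followed by the residue of interest
with_mutation = {"M1": "A" * 146 + "I"}
without_mutation = {"M1": "A" * 146 + "V"}
hit = matches_rule("M1-V147I", with_mutation)
miss = matches_rule("M1-V147I", without_mutation)
```

A rule-based classifier in this style would flag a strain when one or more such checks succeed; how rules are weighted or combined is left open here.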
Affiliation(s)
- Fatemeh Kargarfard
- Faculty of Engineering and IT, University of Technology Sydney, New South Wales, Australia; Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
- Ashkan Sami
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
- Farhid Hemmatzadeh
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- Esmaeil Ebrahimie
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia; Genomics Research Platform, La Trobe University, Melbourne, Victoria 3086, Australia; School of Information Technology and Mathematical Sciences, Division of Information Technology Engineering & Environment, University of South Australia, Adelaide, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia.
12
Mohammadi-Dehcheshmeh M, Niazi A, Ebrahimi M, Tahsili M, Nurollah Z, Ebrahimi Khaksefid R, Ebrahimi M, Ebrahimie E. Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis. Frontiers in Plant Science 2018; 9:1550. [PMID: 30483277 PMCID: PMC6240842 DOI: 10.3389/fpls.2018.01550] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/16/2018] [Accepted: 10/03/2018] [Indexed: 05/25/2023]
Abstract
Plant root symbiosis with Arbuscular mycorrhizal (AM) fungi improves uptake of water and mineral nutrients, improving plant development under stressful conditions. Unraveling the unified transcriptomic signature of a successful colonization provides a better understanding of symbiosis. We developed a framework for finding the transcriptomic signature of Arbuscular mycorrhiza colonization and its regulating transcription factors in roots of Medicago truncatula. Expression profiles of roots in response to AM species were collected from four separate studies and were combined by direct merging meta-analysis. Batch effect, the major concern in expression meta-analysis, was reduced by three normalization steps: Robust Multi-array Average algorithm, Z-standardization, and quartiling normalization. Then, expression profile of 33685 genes in 18 root samples of Medicago as numerical features, as well as study ID and Arbuscular mycorrhiza type as categorical features, were mined by seven models: RELIEF, UNCERTAINTY, GINI INDEX, Chi Squared, RULE, INFO GAIN, and INFO GAIN RATIO. In total, 73 genes selected by machine learning models were up-regulated in response to AM (Z-value difference > 0.5). Feature weighting models also documented that this signature is independent from study (batch) effect. The AM inoculation signature obtained was able to differentiate efficiently between AM inoculated and non-inoculated samples. The AP2 domain class transcription factor, GRAS family transcription factors, and cyclin-dependent kinase were among the highly expressed meta-genes identified in the signature. We found high correspondence between the AM colonization signature obtained in this study and independent RNA-seq experiments on AM colonization, validating the repeatability of the colonization signature. 
Promoter analysis of upregulated genes in the transcriptomic signature led to the key regulators of AM colonization, including the essential transcription factors for endosymbiosis establishment and development such as NF-YA factors. The approach developed in this study offers three distinct novel features: (I) it improves direct merging meta-analysis by integrating supervised machine learning models and normalization steps to reduce study-specific batch effects; (II) seven attribute weighting models assessed the suitability of each gene for the transcriptomic signature, which contributes to the robustness of the signature; and (III) the approach is justifiable, easy to apply, and useful in practice. Our integrative framework of meta-analysis, promoter analysis, and machine learning provides a foundation to reveal the transcriptomic signature and regulatory circuits governing Arbuscular mycorrhizal symbiosis and is transferable to other biological settings.
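The normalization and selection steps named in this abstract can be illustrated with a small sketch. It assumes that Z-standardization is applied to each sample separately before samples from different studies are merged, and that a gene enters the signature when its mean Z-value in AM samples exceeds its mean in control samples by more than 0.5. The gene names, the toy numbers, and the functions `zscore_sample` and `upregulated_genes` are hypothetical; the RMA and quantile steps and the seven attribute weighting models are not reproduced here.

```python
from statistics import mean, pstdev


def zscore_sample(values):
    """Rescale one sample's expression values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the sample
    if sigma == 0:
        raise ValueError("sample has zero variance; cannot standardize")
    return [(v - mu) / sigma for v in values]


def upregulated_genes(genes, treated, control, threshold=0.5):
    """Return genes whose mean Z-value in treated samples exceeds control by > threshold."""
    selected = []
    for j, gene in enumerate(genes):
        diff = mean(s[j] for s in treated) - mean(s[j] for s in control)
        if diff > threshold:
            selected.append(gene)
    return selected


# Hypothetical toy data: two studies measured on very different scales.
genes = ["g1", "g2", "g3", "g4"]
study_a = {"AM": [[9.0, 3.0, 5.0, 3.0]], "control": [[4.0, 6.0, 5.0, 5.0]]}
study_b = {"AM": [[90.0, 30.0, 50.0, 30.0]], "control": [[40.0, 60.0, 50.0, 50.0]]}

# Direct merging: standardize every sample, then pool samples across studies.
merged_am = [zscore_sample(s) for study in (study_a, study_b) for s in study["AM"]]
merged_control = [zscore_sample(s) for study in (study_a, study_b) for s in study["control"]]

signature = upregulated_genes(genes, merged_am, merged_control)
print(signature)
```

After standardization the two AM samples have the same Z-values even though their raw scales differ by a factor of ten, which is the sense in which this step reduces a study-specific scale effect in the toy data.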
Affiliation(s)
- Manijeh Mohammadi-Dehcheshmeh
  - Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Ali Niazi
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Zahra Nurollah
  - Department of Biotechnology, Shahrekord University, Shahrekord, Iran
- Reyhaneh Ebrahimi Khaksefid
  - Department of Biotechnology, Shahrekord University, Shahrekord, Iran
  - School of Agriculture Food and Wine, Department of Plant Science, The University of Adelaide, Adelaide, SA, Australia
- Mahdi Ebrahimi
  - Max-Planck-Institute for Informatics, Saarbrucken, Germany
- Esmaeil Ebrahimie
  - Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
  - Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
  - Division of Information Technology, Engineering and the Environment, School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, Australia
  - Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
|
13
|
A large-scale study of indicators of sub-clinical mastitis in dairy cattle by attribute weighting analysis of milk composition features: highlighting the predictive power of lactose and electrical conductivity. J Dairy Res 2018; 85:193-200. [PMID: 29785910 DOI: 10.1017/s0022029918000249] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 01/01/2023]
Abstract
Sub-clinical mastitis (SCM) affects milk composition. In this study, we hypothesise that large-scale mining of milk composition features by pattern recognition models can identify the best predictors of SCM within the milk composition features. To this end, using data mining algorithms, we conducted a large-scale and longitudinal study to evaluate the ability of various milk production parameters as indicators of SCM. SCM is the most prevalent disease of dairy cattle, causing substantial economic loss for the dairy industry. Developing new techniques to diagnose SCM in its early stages improves herd health and is of great importance. Test-day Somatic Cell Count (SCC) is the most common indicator of SCM and the primary mastitis surveillance approach worldwide. However, test-day SCC fluctuates widely between days, causing major concerns for its reliability. Consequently, there would be great benefit to identifying additional efficient indicators from large-scale and longitudinal studies. With this intent, data was collected at every milking (twice per day) for a period of 2 months from a single farm using in-line electronic equipment (346 248 records in total). The following data were analysed: milk volume, protein concentration, lactose concentration, electrical conductivity (EC), milking time and peak flow. Three SCC cut-offs were used to estimate the prevalence of SCM: Australian ≥ 250 000 cells/ml, European ≥200 000 cells/ml and New Zealand ≥ 150 000 cells/ml. At first, 10 different Attribute Weighting Algorithms (AWM) were applied to the data. In the absence of SCC, lactose concentration featured as the most important variable, followed by EC. For the first time, using attribute weighted modelling, we showed that the concentration of lactose in milk can be used as a strong indicator of SCM. 
The development of machine-learning expert systems using two or more milk variables (such as lactose concentration and EC) may produce a predictive pattern for early SCM detection.
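The three SCC cut-offs quoted in this abstract can be turned into a short prevalence calculation. This is a minimal sketch under stated assumptions: the cut-off values come from the abstract, while the function `scm_prevalence`, the dictionary `SCC_CUTOFFS`, and the four toy SCC readings are hypothetical and are not data from the study.

```python
# Cut-offs in cells/ml, as quoted in the abstract.
SCC_CUTOFFS = {
    "Australian": 250_000,
    "European": 200_000,
    "New Zealand": 150_000,
}


def scm_prevalence(scc_values, cutoff):
    """Fraction of test-day records whose SCC is at or above the cut-off."""
    if not scc_values:
        raise ValueError("no SCC records supplied")
    flagged = sum(1 for scc in scc_values if scc >= cutoff)
    return flagged / len(scc_values)


# Hypothetical toy readings (cells/ml), chosen so each cut-off flags a different number.
toy_scc = [90_000, 160_000, 210_000, 260_000]

prevalence = {name: scm_prevalence(toy_scc, cutoff) for name, cutoff in SCC_CUTOFFS.items()}
for name, value in prevalence.items():
    print(f"{name}: {value:.2f}")
```

The same records give a different estimated prevalence under each regional cut-off, which is why the choice of threshold matters when SCC is the reference label for other indicators such as lactose or EC.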
|
14
|
Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016; 17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] [Open Access]
Abstract
BACKGROUND Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. METHODS To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts, so the data form a multi-label dataset, and we used a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. RESULTS We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. CONCLUSIONS Host range identification is facilitated by the high-support combined rules found in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
Affiliation(s)
- Fatemeh Kargarfard
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Ashkan Sami
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Manijeh Mohammadi-Dehcheshmeh
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Esmaeil Ebrahimie
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
  - School of Medicine, Faculty of Health Sciences, The University of Adelaide, Adelaide, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia
  - School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
|
15
|
Epitope Mapping of Avian Influenza M2e Protein: Different Species Recognise Various Epitopes. PLoS One 2016; 11:e0156418. [PMID: 27362795 PMCID: PMC4928777 DOI: 10.1371/journal.pone.0156418] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/20/2015] [Accepted: 05/14/2016] [Indexed: 12/12/2022] [Open Access]
Abstract
A common approach for developing diagnostic tests for influenza virus detection is the use of mouse or rabbit monoclonal and/or polyclonal antibodies against a target antigen of the virus. However, comparative mapping of the target antigen using antibodies from different animal sources has not been evaluated before. This is important because identification of antigenic determinants of the target antigen in different species plays a central role in ensuring the efficiency of a diagnostic test, such as competitive ELISA or immunohistochemistry-based tests. Interest in the matrix 2 ectodomain (M2e) protein of avian influenza virus (AIV) as a candidate for a universal vaccine and also as a marker for detection of virus infection in vaccinated animals (DIVA) is the rationale for the selection of this protein for comparative mapping evaluation. This study aimed to map the epitopes of the M2e protein of avian influenza virus H5N1 using chicken, mouse and rabbit monoclonal or monospecific antibodies. Our findings revealed that rabbit antibodies (rAbs) recognized epitope 6EVETPTRN13 of the M2e, located at the N-terminus of the protein, while mouse (mAb) and chicken antibodies (cAbs) recognized epitope 10PTRNEWECK18, located in the central region of the protein. The findings highlight the difference between the M2e antigenic determinants recognized by different species, which emphasizes the importance of comparative mapping of antibody reactivity from different animals to the same antigen, especially in the case of multi-host infectious agents such as influenza. The findings are of importance for antigenic mapping, as well as diagnostic test and vaccine development.
|
16
|
Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov Today 2016; 21:718-24. [PMID: 26821132 DOI: 10.1016/j.drudis.2016.01.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 11/15/2015] [Revised: 12/05/2015] [Accepted: 01/19/2016] [Indexed: 12/14/2022]
Abstract
Application of computational methods in drug discovery has received increased attention in recent years as a way to accelerate drug target prediction. Based on 443 sequence-derived protein features, we applied the most commonly used machine learning methods to predict whether a protein is druggable and to identify the best-performing algorithm for this task. In addition, feature selection procedures were used to obtain the best performance of each classifier at its optimum number of features. When run on all features, Neural Network was the best classifier, with 89.98% accuracy, based on a k-fold cross-validation test. Among all the algorithms applied, the optimum number of most-relevant features was 130, according to the Support Vector Machine-Feature Selection (SVM-FS) algorithm. This study resulted in the discovery of new drug targets that can potentially be employed in cell signaling pathways, gene expression, and signal transduction. The DrugMiner web tool was developed based on the findings of this study to provide researchers with the ability to predict druggable proteins. DrugMiner is freely available at www.DrugMiner.org.
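The accuracy figure in this abstract comes from a k-fold cross-validation test. The sketch below shows the general shape of that procedure and is not the DrugMiner pipeline: the fold splitter, the one-feature midpoint classifier (`fit_threshold`, `predict_threshold`), and the toy data are all hypothetical stand-ins, and a real evaluation would usually shuffle or stratify the samples first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def fit_threshold(xs, ys):
    """Toy classifier: midpoint between the two class means (assumes class 1 is higher)."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


def cross_validated_accuracy(xs, ys, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, and pool the correct predictions."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test_idx)
    return correct / len(ys)


# Hypothetical single feature with well-separated classes (0 = not druggable, 1 = druggable).
features = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]

accuracy = cross_validated_accuracy(features, labels, 3, fit_threshold, predict_threshold)
print(accuracy)
```

Because each sample is held out once and never used to fit the model that scores it, the pooled accuracy estimates performance on unseen proteins rather than fit to the training set.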
Affiliation(s)
- Ali Akbar Jamali
  - Research Center for Pharmaceutical Nanotechnology (RCPN), Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Saeed Razzaghi
  - Information Technology Center, The University of Zanjan, Zanjan, Iran
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia
- Reza Safdari
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
- Esmaeil Ebrahimie
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Department of Genetics & Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, SA, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, SA, Australia.
|
17
|
Zinati Z, Alemzadeh A, KayvanJoo AH. Computational approaches for classification and prediction of P-type ATPase substrate specificity in Arabidopsis. Physiol Mol Biol Plants 2016; 22:163-174. [PMID: 27186030 PMCID: PMC4840148 DOI: 10.1007/s12298-016-0351-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/02/2015] [Revised: 03/15/2016] [Accepted: 03/28/2016] [Indexed: 06/05/2023]
Abstract
P-type ATPases are a large family of integral membrane proteins. Based on their transport specificities, they include five subfamilies in Arabidopsis: P4 ATPases (phospholipid-transporting ATPases), P3A ATPases (plasma membrane H+ pumps), P2A and P2B ATPases (Ca2+ pumps) and P1B ATPases (heavy metal pumps). Although many different computational methods have been developed to predict the substrate specificity of unknown proteins, further investigation is needed to improve the efficiency and performance of the predictors. In this study, various attribute weighting and supervised clustering algorithms were employed to identify the main amino acid composition attributes that can influence the substrate specificity of ATPase pumps, to classify protein pumps, and to predict the substrate specificity of uncharacterized ATPase pumps. The results of this study indicate that both non-reduced coefficients pertaining to absorption and Cys extinction at 280 nm, the frequencies of hydrogen, Ala, Val, carbon, hydrophilic residues, the counts of Val, Asn, Ser, Arg, Phe, Tyr, hydrophilic residues, Phe-Phe, Ala-Ile, Phe-Leu, Val-Ala and length are the most important amino acid attributes across all attribute weighting models. Here, a predictive model based on a learning algorithm (Naive Bayes) is proposed to predict the substrate specificities of Q9LVV1 and O22180 (P-type ATPase-like proteins) with 100% prediction confidence. For the first time, our analysis demonstrated a promising application of bioinformatics algorithms in classifying ATPase pumps. Moreover, we suggest predictive systems that can assist in predicting the substrate specificity of any new ATPase pump with the maximum possible prediction confidence.
Affiliation(s)
- Zahra Zinati
  - Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Shiraz, Iran
- Abbas Alemzadeh
  - Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran
- Amir Hossein KayvanJoo
  - Bonn-Aachen International Center for Information Technology B-IT, University of Bonn, Bonn, Germany
|
18
|
Torkzaban B, Kayvanjoo AH, Ardalan A, Mousavi S, Mariotti R, Baldoni L, Ebrahimie E, Ebrahimi M, Hosseini-Mazinani M. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations. PLoS One 2015; 10:e0143465. [PMID: 26599001 PMCID: PMC4658005 DOI: 10.1371/journal.pone.0143465] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/30/2015] [Accepted: 11/05/2015] [Indexed: 11/24/2022] [Open Access]
Abstract
Finding efficient analytical techniques is increasingly becoming a bottleneck for the effective use of large biological data. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach to data analysis, applying machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, ‘4-targeted’ and ‘16-targeted’. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions according to their regions of origin. A highlight of this study was the discovery of the best combination of markers for better differentiation of populations via machine learning models, which can be exploited to distinguish among other biological populations.
Affiliation(s)
- Bahareh Torkzaban
  - National Institute of Genetic Engineering & Biotechnology, Tehran, Iran
- Arman Ardalan
  - National Institute of Genetic Engineering & Biotechnology, Tehran, Iran
  - Department of Gene Technology, KTH, Royal Institute of Technology, Science for Life Laboratory, Solna, Sweden
- Soraya Mousavi
  - National Institute of Genetic Engineering & Biotechnology, Tehran, Iran
- Luciana Baldoni
  - CNR, Institute of Biosciences & Bioresources, Perugia, Italy
- Esmaeil Ebrahimie
  - Institute of Biotechnology, College of Agriculture, Shiraz University, Shiraz, Iran
  - Department of Genetics and Evolution, School of Biological Sciences, University of Adelaide, Adelaide, Australia
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia
  - School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
- Mansour Ebrahimi
  - Department of Biology, School of Basic Science, University of Qom, Qom, Iran
  - * E-mail: (MHM); (ME)
- Mehdi Hosseini-Mazinani
  - National Institute of Genetic Engineering & Biotechnology, Tehran, Iran
  - * E-mail: (MHM); (ME)
|
19
|
Kargarfard F, Sami A, Ebrahimie E. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm. J Biomed Inform 2015; 57:181-8. [PMID: 26232668 DOI: 10.1016/j.jbi.2015.07.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 10/03/2014] [Revised: 07/09/2015] [Accepted: 07/27/2015] [Indexed: 10/23/2022]
Abstract
Pandemic influenza is a major concern worldwide. Availability of advanced technologies and the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provide a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules that potentially represent undiscovered antigenic sites in the influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K and M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. The identification of hotspots in influenza sequences is significant because they represent the regions with low antibody reactivity. We argue that the virus breaks the host immune response by mutation at these spots. Based on the discovered rules, we developed the software "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in the discovery of association rules between mutation points during the evolution of pandemic influenza.
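Class association rules of the kind CBA mines are usually scored by support and confidence. The sketch below computes both for a rule of the form "residue at position p implies class label". It illustrates the scoring only, not the CBA algorithm or the authors' software: the function `rule_support_confidence`, the three-residue toy sequences, and the rule itself are hypothetical and do not correspond to positions reported in the paper.

```python
def rule_support_confidence(records, antecedent, consequent_label):
    """Score the rule 'antecedent -> consequent_label' over labelled sequences.

    records: list of (sequence, label) pairs.
    antecedent: dict mapping 1-based position -> required residue.
    Returns (support, confidence), where support is the fraction of all records
    matching both antecedent and label, and confidence is the fraction of
    antecedent matches that carry the label.
    """
    if not records:
        raise ValueError("no records supplied")
    matched_labels = [
        label
        for seq, label in records
        if all(seq[pos - 1] == residue for pos, residue in antecedent.items())
    ]
    both = sum(1 for label in matched_labels if label == consequent_label)
    support = both / len(records)
    confidence = both / len(matched_labels) if matched_labels else 0.0
    return support, confidence


# Hypothetical toy sequences; real HA segments are far longer.
toy_records = [
    ("MKT", "pandemic"),
    ("MKT", "pandemic"),
    ("MRT", "non-pandemic"),
    ("MKA", "non-pandemic"),
]

# Rule under test: K at position 2 -> pandemic.
support, confidence = rule_support_confidence(toy_records, {2: "K"}, "pandemic")
print(support, confidence)
```

A rule-based classifier keeps only rules above chosen support and confidence thresholds, so a rule that matches many sequences but is frequently wrong is dropped.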
Affiliation(s)
- Fatemeh Kargarfard
  - Department of Computer Science and IT, School of Electrical Engineering and Computer Science, Shiraz University, Shiraz, Iran
- Ashkan Sami
  - Department of Computer Science and IT, School of Electrical Engineering and Computer Science, Shiraz University, Shiraz, Iran.
- Esmaeil Ebrahimie
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran; Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, Australia.
|
20
|
New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014; 54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]
|