Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ebrahimi M, Lakizadeh A, Agha-Golzadeh P, Ebrahimie E, Ebrahimi M. Prediction of thermostability from amino acid attributes by combination of clustering with attribute weighting: a new vista in engineering enzymes. PLoS One 2011;6:e23146. [PMID: 21853079 DOI: 10.1371/journal.pone.0023146] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 04/13/2011] [Accepted: 07/06/2011] [Indexed: 11/19/2022] Open

Cited by Other Article(s)

Vassileff N, Spiers JG, Lee JD, Woodruff TM, Ebrahimie E, Mohammadi Dehcheshmeh M, Hill AF, Cheng L. A Panel of miRNA Biomarkers Common to Serum and Brain-Derived Extracellular Vesicles Identified in Mouse Model of Amyotrophic Lateral Sclerosis. Mol Neurobiol 2024;61:5901-5915. [PMID: 38252383 PMCID: PMC11249427 DOI: 10.1007/s12035-023-03857-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024]

Zhao K, Ebrahimie E, Mohammadi-Dehcheshmeh M, Lewsey MG, Zheng L, Hoogenraad NJ. Transcriptomic signature of cancer cachexia by integration of machine learning, literature mining and meta-analysis. Comput Biol Med 2024;172:108233. [PMID: 38452471 DOI: 10.1016/j.compbiomed.2024.108233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/23/2024] [Accepted: 02/25/2024] [Indexed: 03/09/2024]

Yang Y, Zhao J, Zeng L, Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int J Mol Sci 2022;23:ijms231810798. [PMID: 36142711 PMCID: PMC9505338 DOI: 10.3390/ijms231810798] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022] Open

Jafari O, Ebrahimi M, Hedayati SAA, Zeinalabedini M, Poorbagher H, Nasrolahpourmoghadam M, Fernandes JMO. Integration of Morphometrics and Machine Learning Enables Accurate Distinction between Wild and Farmed Common Carp. Life (Basel, Switzerland) 2022;12:life12070957. [PMID: 35888047 PMCID: PMC9315565 DOI: 10.3390/life12070957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 06/16/2022] [Accepted: 06/20/2022] [Indexed: 11/16/2022]

Shahraki MF, Atanaki FF, Ariaeenejad S, Ghaffari MR, Norouzi-Beirami MH, Maleki M, Salekdeh GH, Kavousi K. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification. Biotechnol Bioeng 2022;119:1115-1128. [DOI: 10.1002/bit.28037] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/11/2021] [Revised: 08/18/2021] [Accepted: 12/01/2021] [Indexed: 11/09/2022]

Ghahramani N, Shodja J, Rafat SA, Panahi B, Hasanpur K. Integrative Systems Biology Analysis Elucidates Mastitis Disease Underlying Functional Modules in Dairy Cattle. Front Genet 2021;12:712306. [PMID: 34691146 PMCID: PMC8531812 DOI: 10.3389/fgene.2021.712306] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/20/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022] Open

Abstract

Background: Mastitis is the most prevalent disease in dairy cattle and one of the most significant bovine pathologies affecting milk production, animal health, and reproduction. In addition, mastitis is the most common, expensive, and contagious infection in the dairy industry. Methods: A meta-analysis of microarray and RNA-seq data was conducted to identify candidate genes and functional modules associated with mastitis disease. The results were then applied to systems biology analysis via weighted gene coexpression network analysis (WGCNA), Gene Ontology, enrichment analysis for the Kyoto Encyclopedia of Genes and Genomes (KEGG), and modeling using machine-learning algorithms. Results: Microarray and RNA-seq datasets were generated for 2,089 and 2,794 meta-genes, respectively. Between microarray and RNA-seq datasets, a total of 360 meta-genes were found that were significantly enriched as "peroxisome," "NOD-like receptor signaling pathway," "IL-17 signaling pathway," and "TNF signaling pathway" KEGG pathways. The turquoise module (n = 214 genes) and the brown module (n = 57 genes) were identified as critical functional modules associated with mastitis through WGCNA. PRDX5, RAB5C, ACTN4, SLC25A16, MAPK6, CD53, NCKAP1L, ARHGEF2, COL9A1, and PTPRC genes were detected as hub genes in identified functional modules. Finally, using attribute weighting and machine-learning methods, hub genes that are sufficiently informative in Escherichia coli mastitis were used to optimize predictive models. The constructed model proposed the optimal approach for the meta-genes and validated several high-ranked genes as biomarkers for E. coli mastitis using the decision tree (DT) method. Conclusion: The candidate genes and pathways proposed in this study may shed new light on the underlying molecular mechanisms of mastitis disease and suggest new approaches for diagnosing and treating E. coli mastitis in dairy cattle.

Ebrahimie E, Zamansani F, Alanazi IO, Sabi EM, Khazandi M, Ebrahimi F, Mohammadi-Dehcheshmeh M, Ebrahimi M. Advances in understanding the specificity function of transporters by machine learning. Comput Biol Med 2021;138:104893. [PMID: 34598069 DOI: 10.1016/j.compbiomed.2021.104893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/16/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/25/2022]

Zakipour Z, Alemzadeh A. Molecular evolution of Na, K-ATPase β-subunit. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Ahsan R, Tahsili MR, Ebrahimi F, Ebrahimie E, Ebrahimi M. Image processing unravels the evolutionary pattern of SARS-CoV-2 against SARS and MERS through position-based pattern recognition. Comput Biol Med 2021;134:104471. [PMID: 34004573 PMCID: PMC8106241 DOI: 10.1016/j.compbiomed.2021.104471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 12/16/2022]

Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021;10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]

Machine learning and statistics to qualify environments through multi-traits in Coffea arabica. PLoS One 2021;16:e0245298. [PMID: 33434204 PMCID: PMC7802962 DOI: 10.1371/journal.pone.0245298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2020] [Accepted: 12/25/2020] [Indexed: 11/30/2022] Open

Abstract

Several factors such as genotype, environment, and post-harvest processing can affect the responses of important traits in the coffee production chain. Determining the influence of these factors is of great relevance, as they can be indicators of the characteristics of the coffee produced. The most efficient models choice to be applied should take into account the variety of information and the particularities of each biological material. This study was developed to evaluate statistical and machine learning models that would better discriminate environments through multi-traits of coffee genotypes and identify the main agronomic and beverage quality traits responsible for the variation of the environments. For that, 31 morpho-agronomic and post-harvest traits were evaluated, from field experiments installed in three municipalities in the Matas de Minas region, in the State of Minas Gerais, Brazil. Two types of post-harvest processing were evaluated: natural and pulped. The apparent error rate was estimated for each method. The Multilayer Perceptron and Radial Basis Function networks were able to discriminate the coffee samples in multi-environment more efficiently than the other methods, identifying differences in multi-traits responses according to the production sites and type of post-harvest processing. The local factors did not present specific traits that favored the severity of diseases and differentiated vegetative vigor. Sensory traits acidity and fragrance/aroma score also made little contribution to the discrimination process, indicating that acidity and fragrance/aroma are characteristic of coffee produced and all coffee samples evaluated are of the special type in the Mata of Minas region. The main traits responsible for the differentiation of production sites are plant height, fruit size, and bean production. The sensory trait "Body" is the main one to discriminate the form of post-harvest processing.

Liyaghatdar Z, Pezeshkian Z, Mohammadi-Dehcheshmeh M, Ebrahimie E. Fast school closures correspond with a lower rate of COVID-19 incidence and deaths in most countries. Informatics in Medicine Unlocked 2021;27:100805. [PMID: 34849394 PMCID: PMC8607689 DOI: 10.1016/j.imu.2021.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 11/01/2021] [Accepted: 11/21/2021] [Indexed: 01/31/2023] Open

Abstract

School closures have been used as one of the main nonpharmaceutical interventions to overcome the spread of SARS-CoV-2. Different countries use this intervention with a wide range of time intervals from the date of the first confirmed case or death. This study aimed to investigate whether fast or late school closures affect the cumulative number of COVID-19 cases or deaths. A worldwide population-based observational study has been conducted and a range of attributes were weighted using 10 attribute weighting models against the normalized number of infected cases or death in the form of numeric, binominal and polynomial labels. Statistical analysis was performed for the most weighted and the most common attributes of all types of labels. By the end of March 2021, the school closure data of 198 countries with at least one COVID-19 case were available. The days before the first school closure were one of the most weighted factors in relation to the normalized number of infected cases and deaths in numeric, binomial, and quartile forms. The average of days before the first school closure in the lowest quartile to highest quartile of infected cases (Q1, Q2, Q3 and Q4) was -6.10 [95% CI, -26.5 to 14.2], 9.35 [95% CI, 2.16 to 16.53], 17.55 [95% CI, 5.95 to 29.15], and 16.00 [95% CI, 11.69 to 20.31], respectively. In addition, 188 countries reported at least one death from COVID-19. The average of the days before the first school closure in the lowest quartile of death to highest quartile (Q1, Q2, Q3 and Q4) was -49.4 [95% CI, -76.5 to -22.3], -10.34 [95% CI, -30.12 to 9.44], -18.74 [95% CI, -32.72 to -4.77], and -12.89 [95% CI, -27.84 to 2.06], respectively. Countries that closed schools faster, especially before the detection of any confirmed case or death, had fewer COVID-19 cases or deaths per million of the population on total days of involvement. It can be concluded that rapid prevention policies are the main determinants of the countries' success.

Gado JE, Beckham GT, Payne CM. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J Chem Inf Model 2020;60:4098-4107. [DOI: 10.1021/acs.jcim.0c00489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]

Piroozmand F, Ghadam P, Zarrabi M, Abdi-Ali A. Biochemical and computational study of an alginate lyase produced by Pseudomonas aeruginosa strain S21. Iranian Journal of Basic Medical Sciences 2020;23:454-460. [PMID: 32489560 PMCID: PMC7239423 DOI: 10.22038/ijbms.2020.37277.8874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Lee T, Lee H. Prediction of Alzheimer's disease using blood gene expression data. Sci Rep 2020;10:3485. [PMID: 32103140 PMCID: PMC7044318 DOI: 10.1038/s41598-020-60595-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/02/2019] [Accepted: 02/11/2020] [Indexed: 12/13/2022] Open

Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019;20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]

Yang Y, Ding X, Zhu G, Niroula A, Lv Q, Vihinen M. ProTstab - predictor for cellular protein stability. BMC Genomics 2019;20:804. [PMID: 31684883 PMCID: PMC6830000 DOI: 10.1186/s12864-019-6138-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/22/2019] [Accepted: 09/24/2019] [Indexed: 01/10/2023] Open

Khan MF, Kundu D, Hazra C, Patra S. A strategic approach of enzyme engineering by attribute ranking and enzyme immobilization on zinc oxide nanoparticles to attain thermostability in mesophilic Bacillus subtilis lipase for detergent formulation. Int J Biol Macromol 2019;136:66-82. [DOI: 10.1016/j.ijbiomac.2019.06.042] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/09/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 12/27/2022]

Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Characterization of bovine (Bos taurus) imprinted genes from genomic to amino acid attributes by data mining approaches. PLoS One 2019;14:e0217813. [PMID: 31170205 PMCID: PMC6553745 DOI: 10.1371/journal.pone.0217813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/17/2018] [Accepted: 05/21/2019] [Indexed: 01/05/2023] Open

Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Attribute selection and model evaluation for the maternal and paternal imprinted genes in bovine (Bos Taurus) using supervised machine learning algorithms. J Anim Breed Genet 2019;136:205-216. [DOI: 10.1111/jbg.12379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2018] [Revised: 12/06/2018] [Accepted: 12/06/2018] [Indexed: 11/29/2022]

Li G, Dong Y, Reetz MT. Can Machine Learning Revolutionize Directed Evolution of Selective Enzymes? Adv Synth Catal 2019. [DOI: 10.1002/adsc.201900149] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/01/2023]

Lu M, Dukunde A, Daniel R. Biochemical profiles of two thermostable and organic solvent-tolerant esterases derived from a compost metagenome. Appl Microbiol Biotechnol 2019;103:3421-3437. [PMID: 30809711 DOI: 10.1007/s00253-019-09695-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/01/2018] [Revised: 02/11/2019] [Accepted: 02/12/2019] [Indexed: 12/15/2022]

Kargarfard F, Sami A, Hemmatzadeh F, Ebrahimie E. Identifying mutation positions in all segments of influenza genome enables better differentiation between pandemic and seasonal strains. Gene 2019;697:78-85. [PMID: 30769139 DOI: 10.1016/j.gene.2019.01.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/24/2018] [Revised: 12/29/2018] [Accepted: 01/17/2019] [Indexed: 01/08/2023]

Abstract

Influenza has a negative sense, single-stranded, and segmented RNA. In the context of pandemic influenza research, most studies have focused on variations in the surface proteins (Hemagglutinin and Neuraminidase). However, new findings suggest that all internal and external proteins of influenza viruses can contribute in pandemic emergence, pathogenicity and increasing host range. The occurrence of the 2009 influenza pandemic and the availability of many external and internal segments of pandemic and non-pandemic sequences offer a unique opportunity to evaluate the performance of machine learning models in discrimination of pandemic from seasonal sequences using mutation positions in all segments. In this study, we hypothesized that identifying mutation positions in all segments (proteins) encoded by the influenza genome would enable pandemic and seasonal strains to be more reliably distinguished. In a large scale study, we applied a range of data mining techniques to all segments of influenza for rule discovery and discrimination of pandemic from seasonal strains. CBA (classification based on association rule mining), Ripper and Decision tree algorithms were utilized to extract association rules among mutations. CBA outperformed the other models. Our approach could discriminate pandemic sequences from seasonal ones with more than 95% accuracy for PA and NP, 99.33% accuracy for NA and 100% accuracy, precision, specificity and sensitivity (recall) for M1, M2, PB1, NS1, and NS2. The values of precision, specificity, and sensitivity were more than 90% for other segments except PB2. If sequences of all segments of one strain were available, the accuracy of discrimination of pandemic strains was 100%. General rules extracted by rule base classification approaches, such as M1-V147I, NP-N334H, NS1-V112I, and PB1-L364I, were able to detect pandemic sequences with high accuracy. 
We observed that mutations in the internal proteins of influenza can contribute to distinguishing pandemic viruses, as mutations in the external proteins do.
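
The rule names quoted in this abstract (M1-V147I, NP-N334H, NS1-V112I, PB1-L364I) follow the usual segment-residue-position-residue pattern. The sketch below is only an illustration of how such a rule could be parsed and checked against a protein sequence. It assumes 1-based positions and that the second residue is the one whose presence the rule tests; `MutationRule`, `parse_rule`, and `matches` are hypothetical names, and the test sequences are made up rather than taken from the study.

```python
import re
from typing import Dict, NamedTuple


class MutationRule(NamedTuple):
    """One substitution rule, e.g. M1-V147I (hypothetical helper type)."""
    segment: str   # protein/segment label, e.g. "M1"
    ref: str       # residue before the substitution, e.g. "V"
    position: int  # 1-based residue position (assumed convention)
    alt: str       # residue after the substitution, e.g. "I"


# Segment label, hyphen, one-letter residue, position, one-letter residue.
_RULE_PATTERN = re.compile(r"^([A-Za-z0-9]+)-([A-Z])(\d+)([A-Z])$")


def parse_rule(text: str) -> MutationRule:
    """Parse a rule string such as 'M1-V147I' into its components."""
    match = _RULE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a substitution rule: {text!r}")
    segment, ref, position, alt = match.groups()
    return MutationRule(segment, ref, int(position), alt)


def matches(rule: MutationRule, proteins: Dict[str, str]) -> bool:
    """Return True if the named protein carries the rule's 'alt' residue.

    `proteins` maps segment labels to amino acid sequences. A missing
    segment or a sequence shorter than the position counts as no match.
    """
    sequence = proteins.get(rule.segment)
    if sequence is None or len(sequence) < rule.position:
        return False
    return sequence[rule.position - 1] == rule.alt
```

A rule-based classifier of the kind the abstract describes would combine many such checks with learned weights or orderings; this sketch covers only the single-rule test.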

Alanazi IO, Al Shehri ZS, Ebrahimie E, Giahi H, Mohammadi-Dehcheshmeh M. Non-coding and coding genomic variants distinguish prostate cancer, castration-resistant prostate cancer, familial prostate cancer, and metastatic castration-resistant prostate cancer from each other. Mol Carcinog 2019;58:862-874. [PMID: 30644608 DOI: 10.1002/mc.22975] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/11/2022]

Abstract

A considerable number of deposited variants has provided new possibilities for knowledge discovery in different types of prostate cancer. Here, we analyzed variants located on 3'UTR, 5'UTR, CDs, Intergenic, and Intronic regions in castration-resistant prostate cancer (8496 variants), familial prostate cancer (3241 variants), metastatic castration-resistant prostate cancer (3693 variants), and prostate cancer (16599 variants). Chromosome regions 10p15-p14 and 2p13 were highly enriched (P < 0.00001) for variants located in 3'UTR, 5'UTR, CDs, intergenic, and intronic regions in castration-resistant prostate cancer. In contrast, 10p15-p14, 10q23.3, 12q13.11, 13q12.3, 1q25, and 8p22 regions were enriched (P < 0.001) in familial prostate cancer. In metastatic castration-resistant prostate cancer, 10p15-p14, 10q23.3, 11q22-q23, 14q21.1, and 14q32.13 were highly variant regions (P < 0.001). Chromosome 2 and chromosome 1 hosted many enriched variant regions. AKR1C3, BRCA1, BRCA2, CHGA, CYP19A1, HOXB13, KLK3, and PTEN contained the highest number of 3'UTR, 5'UTR, CDs, Intergenic, and Intronic variants. Network analysis showed that these genes are upstream of important functions including prostate gland development, tumor recurrence, prostate cancer-specific survival, tumor progression, cancer mortality, long-term survival, cancer recurrence, angiogenesis, and AR. Interestingly, all of EGFR, JAK2, NR3C1, PDZD2, and SEMA3C genes had single nucleotide polymorphisms (SNP) in castration-resistant prostate cancer, consistent with high selection pressure on these genes during drug treatment and consequent resistance. High occurrence of variants in 3'UTRs suggests the importance of regulatory variants in different types of prostate cancer; an area that has been neglected compared with coding variants. This study provides a comprehensive overview of genomic regions contributing to different types of prostate cancer.

Mohammadi-Dehcheshmeh M, Niazi A, Ebrahimi M, Tahsili M, Nurollah Z, Ebrahimi Khaksefid R, Ebrahimi M, Ebrahimie E. Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis. Frontiers in Plant Science 2018;9:1550. [PMID: 30483277 PMCID: PMC6240842 DOI: 10.3389/fpls.2018.01550] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/16/2018] [Accepted: 10/03/2018] [Indexed: 05/25/2023]

Abstract

Plant root symbiosis with Arbuscular mycorrhizal (AM) fungi improves uptake of water and mineral nutrients, improving plant development under stressful conditions. Unraveling the unified transcriptomic signature of a successful colonization provides a better understanding of symbiosis. We developed a framework for finding the transcriptomic signature of Arbuscular mycorrhiza colonization and its regulating transcription factors in roots of Medicago truncatula. Expression profiles of roots in response to AM species were collected from four separate studies and were combined by direct merging meta-analysis. Batch effect, the major concern in expression meta-analysis, was reduced by three normalization steps: Robust Multi-array Average algorithm, Z-standardization, and quartiling normalization. Then, expression profile of 33685 genes in 18 root samples of Medicago as numerical features, as well as study ID and Arbuscular mycorrhiza type as categorical features, were mined by seven models: RELIEF, UNCERTAINTY, GINI INDEX, Chi Squared, RULE, INFO GAIN, and INFO GAIN RATIO. In total, 73 genes selected by machine learning models were up-regulated in response to AM (Z-value difference > 0.5). Feature weighting models also documented that this signature is independent from study (batch) effect. The AM inoculation signature obtained was able to differentiate efficiently between AM inoculated and non-inoculated samples. The AP2 domain class transcription factor, GRAS family transcription factors, and cyclin-dependent kinase were among the highly expressed meta-genes identified in the signature. We found high correspondence between the AM colonization signature obtained in this study and independent RNA-seq experiments on AM colonization, validating the repeatability of the colonization signature. 
Promoter analysis of upregulated genes in the transcriptomic signature led to the key regulators of AM colonization, including the essential transcription factors for endosymbiosis establishment and development such as NF-YA factors. The approach developed in this study offers three distinct novel features: (I) it improves direct merging meta-analysis by integrating supervised machine learning models and normalization steps to reduce study-specific batch effects; (II) seven attribute weighting models assess the suitability of each gene for the transcriptomic signature, which contributes to the robustness of the signature; and (III) the approach is justifiable, easy to apply, and useful in practice. Our integrative framework of meta-analysis, promoter analysis, and machine learning provides a foundation to reveal the transcriptomic signature and regulatory circuits governing Arbuscular mycorrhizal symbiosis and is transferable to other biological settings.

Khan MF, Patra S. Deciphering the rationale behind specific codon usage pattern in extremophiles. Sci Rep 2018;8:15548. [PMID: 30341344 PMCID: PMC6195531 DOI: 10.1038/s41598-018-33476-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/09/2018] [Accepted: 09/21/2018] [Indexed: 12/03/2022] Open

A large-scale study of indicators of sub-clinical mastitis in dairy cattle by attribute weighting analysis of milk composition features: highlighting the predictive power of lactose and electrical conductivity. J Dairy Res 2018;85:193-200. [PMID: 29785910 DOI: 10.1017/s0022029918000249] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 01/01/2023]

Abstract

Sub-clinical mastitis (SCM) affects milk composition. In this study, we hypothesise that large-scale mining of milk composition features by pattern recognition models can identify the best predictors of SCM within the milk composition features. To this end, using data mining algorithms, we conducted a large-scale and longitudinal study to evaluate the ability of various milk production parameters as indicators of SCM. SCM is the most prevalent disease of dairy cattle, causing substantial economic loss for the dairy industry. Developing new techniques to diagnose SCM in its early stages improves herd health and is of great importance. Test-day Somatic Cell Count (SCC) is the most common indicator of SCM and the primary mastitis surveillance approach worldwide. However, test-day SCC fluctuates widely between days, causing major concerns for its reliability. Consequently, there would be great benefit to identifying additional efficient indicators from large-scale and longitudinal studies. With this intent, data was collected at every milking (twice per day) for a period of 2 months from a single farm using in-line electronic equipment (346 248 records in total). The following data were analysed: milk volume, protein concentration, lactose concentration, electrical conductivity (EC), milking time and peak flow. Three SCC cut-offs were used to estimate the prevalence of SCM: Australian ≥ 250 000 cells/ml, European ≥200 000 cells/ml and New Zealand ≥ 150 000 cells/ml. At first, 10 different Attribute Weighting Algorithms (AWM) were applied to the data. In the absence of SCC, lactose concentration featured as the most important variable, followed by EC. For the first time, using attribute weighted modelling, we showed that the concentration of lactose in milk can be used as a strong indicator of SCM. 
The development of machine-learning expert systems using two or more milk variables (such as lactose concentration and EC) may produce a predictive pattern for early SCM detection.
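
The three SCC cut-offs named in this abstract can be turned into a simple prevalence estimate. The sketch below is a minimal illustration, assuming prevalence means the fraction of records at or above a cut-off; the function names and the sample readings are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence

# SCC cut-offs quoted in the abstract, in cells/ml.
SCC_CUTOFFS: Dict[str, int] = {
    "Australian": 250_000,
    "European": 200_000,
    "New Zealand": 150_000,
}


def scm_prevalence(scc_values: Sequence[float], cutoff: float) -> float:
    """Fraction of records whose SCC is at or above the cut-off."""
    if not scc_values:
        raise ValueError("scc_values must not be empty")
    flagged = sum(1 for value in scc_values if value >= cutoff)
    return flagged / len(scc_values)


def prevalence_by_standard(scc_values: Sequence[float]) -> Dict[str, float]:
    """Apply every named cut-off to the same set of SCC records."""
    return {
        name: scm_prevalence(scc_values, cutoff)
        for name, cutoff in SCC_CUTOFFS.items()
    }


# Made-up SCC readings for illustration only.
example_scc = [90_000, 150_000, 180_000, 200_000, 260_000]
```

Because the cut-offs are nested, a lower threshold can only flag the same or more records, so the estimated prevalence rises from the Australian to the New Zealand standard on any dataset.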

Farhadian M, Rafat SA, Hasanpur K, Ebrahimi M, Ebrahimie E. Cross-Species Meta-Analysis of Transcriptomic Data in Combination With Supervised Machine Learning Models Identifies the Common Gene Signature of Lactation Process. Front Genet 2018;9:235. [PMID: 30050559 PMCID: PMC6052129 DOI: 10.3389/fgene.2018.00235] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 11/23/2017] [Accepted: 06/13/2018] [Indexed: 01/13/2023] Open

Sharifi S, Pakdel A, Ebrahimi M, Reecy JM, Fazeli Farsani S, Ebrahimie E. Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle. PLoS One 2018;13:e0191227. [PMID: 29470489 PMCID: PMC5823400 DOI: 10.1371/journal.pone.0191227] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 07/25/2017] [Accepted: 12/29/2017] [Indexed: 12/14/2022] Open

Abstract

Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the main agents that cause severe mastitis with clinical signs in dairy cattle. Rapid detection of this disease is important in order to prevent transmission to other cows and to help reduce inappropriate use of antibiotics. With the rapid progress in high-throughput technologies, and accumulation of various kinds of '-omics' data in public repositories, there is an opportunity to retrieve, integrate, and reanalyze these resources to improve the diagnosis and treatment of different diseases and to provide mechanistic insights into host resistance in an efficient way. Meta-analysis is a relatively inexpensive option with good potential to increase the statistical power and generalizability of single-study analysis. In the current meta-analysis, six microarray-based studies that investigated the transcriptome profile of mammary gland tissue after mastitis induced by E. coli infection were used. This meta-analysis not only reinforced the findings of the individual studies, but also revealed several novel terms enriched among up-regulated genes, including response to hypoxia, response to drug, anti-apoptosis, and positive regulation of transcription from RNA polymerase II promoter. Finally, in order to identify small sets of genes that are sufficiently informative in E. coli mastitis, the differentially expressed genes identified by meta-analysis were prioritized using ten different attribute weighting algorithms. Twelve meta-genes were detected by the majority of attribute weighting algorithms (with weight above 0.7) as the most informative genes: CXCL8 (IL8), NFKBIZ, HP, ZC3H12A, PDE4B, CASP4, CXCL2, CCL20, GRO1 (CXCL1), CFB, S100A9, and S100A8. Interestingly, the results demonstrated that all of these genes are key genes in the immune response, inflammation, or mastitis. The decision tree models efficiently discovered the best combination of the meta-genes as a bio-signature and confirmed some of the top-ranked genes (ZC3H12A, CXCL2, GRO, and CFB) as biomarkers for E. coli mastitis (with an average accuracy of 83%). This research indicated that combining two novel data mining tools, meta-analysis and machine learning, increased the power to detect the most informative genes, which can help to improve diagnosis and treatment strategies for E. coli-associated mastitis in cattle.
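
The selection rule described in this abstract (genes given a weight above 0.7 by the majority of attribute weighting algorithms) can be illustrated with a minimal sketch. It assumes that each algorithm's weights are already normalized to the range 0 to 1; the function name `majority_selected`, the algorithm labels, the placeholder gene `GENE_X`, and all weight values are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def majority_selected(weights: Dict[str, Dict[str, float]],
                      threshold: float = 0.7) -> List[str]:
    """Return genes whose weight exceeds `threshold` in more than half of the algorithms.

    `weights` maps algorithm name -> {gene name -> normalized weight in [0, 1]}.
    """
    n_algorithms = len(weights)
    votes: Dict[str, int] = {}
    for per_gene in weights.values():
        for gene, weight in per_gene.items():
            if weight > threshold:
                votes[gene] = votes.get(gene, 0) + 1
    # A gene is kept only when a strict majority of algorithms voted for it.
    return sorted(g for g, v in votes.items() if v > n_algorithms / 2)


# Made-up weights for three hypothetical algorithms (not data from the study).
toy_weights = {
    "info_gain": {"CXCL8": 0.95, "HP": 0.80, "GENE_X": 0.20},
    "gini":      {"CXCL8": 0.90, "HP": 0.60, "GENE_X": 0.75},
    "relief":    {"CXCL8": 0.85, "HP": 0.72, "GENE_X": 0.10},
}
selected = majority_selected(toy_weights)
print(selected)
```

With these toy values, `HP` is kept because two of the three algorithms weight it above 0.7, while `GENE_X` is dropped because only one does.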

Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016;17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] Open

Abstract

BACKGROUND

Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the newly shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range.

METHODS

To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The data therefore form a multi-label dataset, and we used a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment.

RESULT

We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions.

CONCLUSION

Host range identification is facilitated by high support combined rules in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi host viruses, because such viruses are likely to cause pandemic or disastrous epidemics.
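
As a rough illustration of what a "high support" combined rule means here, the sketch below computes support and confidence for a rule of the form "these residues at these positions imply this host". It is not the CBA or Ripper procedure used in the study. The site keys `HA_14` and `NS1_230` reflect one reading of the abstract's HA14V and NS1230S notation and are an assumption, as are the function name `rule_stats` and the toy records. For simplicity each record carries a single host label, whereas the study's dataset is multi-label.

```python
from typing import Dict, List, Tuple

# One record: (site name -> residue observed at that site, host label).
Record = Tuple[Dict[str, str], str]


def rule_stats(records: List[Record],
               antecedent: Dict[str, str],
               host: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> host`.

    support    = fraction of all records that match the antecedent AND the host
    confidence = fraction of antecedent-matching records that have the host
    """
    matching_labels = [label for sites, label in records
                       if all(sites.get(k) == v for k, v in antecedent.items())]
    hits = sum(1 for label in matching_labels if label == host)
    support = hits / len(records) if records else 0.0
    confidence = hits / len(matching_labels) if matching_labels else 0.0
    return support, confidence


# Made-up records for illustration only (not data from the study).
toy_records: List[Record] = [
    ({"HA_14": "V", "NS1_230": "S"}, "avian"),
    ({"HA_14": "V", "NS1_230": "S"}, "avian"),
    ({"HA_14": "V", "NS1_230": "N"}, "human"),
    ({"HA_14": "I", "NS1_230": "S"}, "swine"),
]
support, confidence = rule_stats(
    toy_records, {"HA_14": "V", "NS1_230": "S"}, "avian")
print(support, confidence)
```

Combining two positions in the antecedent is what makes the rule "combinatorial": in the toy data, either position alone matches a non-avian record, while the pair together matches only avian records.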

Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov Today 2016;21:718-24. [PMID: 26821132 DOI: 10.1016/j.drudis.2016.01.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2015] [Revised: 12/05/2015] [Accepted: 01/19/2016] [Indexed: 12/14/2022]

Zinati Z, Alemzadeh A, KayvanJoo AH. Computational approaches for classification and prediction of P-type ATPase substrate specificity in Arabidopsis. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS : AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2016;22:163-174. [PMID: 27186030 PMCID: PMC4840148 DOI: 10.1007/s12298-016-0351-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Revised: 03/15/2016] [Accepted: 03/28/2016] [Indexed: 06/05/2023]

Nasiri J, Naghavi MR, Kayvanjoo AH, Nasiri M, Ebrahimi M. Precision assessment of some supervised and unsupervised algorithms for genotype discrimination in the genus Pisum using SSR molecular data. J Theor Biol 2015;368:122-32. [PMID: 25591889 DOI: 10.1016/j.jtbi.2015.01.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2014] [Revised: 11/06/2014] [Accepted: 01/01/2015] [Indexed: 10/24/2022]

Jemli S, Ayadi-Zouari D, Hlima HB, Bejar S. Biocatalysts: application and engineering for industrial purposes. Crit Rev Biotechnol 2014;36:246-58. [DOI: 10.3109/07388551.2014.950550] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014;54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]

KayvanJoo AH, Ebrahimi M, Haqshenas G. Prediction of hepatitis C virus interferon/ribavirin therapy outcome based on viral nucleotide attributes using machine learning algorithms. BMC Res Notes 2014;7:565. [PMID: 25150834 PMCID: PMC4246553 DOI: 10.1186/1756-0500-7-565] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2014] [Accepted: 08/10/2014] [Indexed: 02/07/2023] Open

Shekoofa A, Emam Y, Shekoufa N, Ebrahimi M, Ebrahimie E. Determining the most important physiological and agronomic traits contributing to maize grain yield through machine learning algorithms: a new avenue in intelligent agriculture. PLoS One 2014;9:e97288. [PMID: 24830330 PMCID: PMC4022653 DOI: 10.1371/journal.pone.0097288] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2013] [Accepted: 04/17/2014] [Indexed: 11/19/2022] Open

Abstract

Prediction is an attempt to accurately forecast the outcome of a specific situation using input information obtained from a set of variables that potentially describe the situation. Predictions can be used to project physiological and agronomic processes, since agronomic traits such as yield can be affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits by screening, clustering, and decision tree models to select the most relevant factors for the prospect of accurately increasing maize grain yield. Decision tree models (with nearly the same performance evaluation) were the most useful tools in understanding the underlying relationships in physiological and agronomic features for selecting the most important and relevant traits (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration) corresponding to the maize grain yield. In particular, the decision tree generated by the C&RT algorithm was the best model for yield prediction based on physiological and agronomical traits, and it can be extensively employed in future breeding programs. No significant differences in the decision tree models were found when feature selection filtering was applied to the data, but a positive effect of feature selection was observed in clustering models. Finally, the results showed that the proposed modeling techniques are useful tools for crop physiologists to search through large datasets seeking patterns for the physiological and agronomic factors, and may assist the selection of the most important traits for the individual site and field. In particular, decision tree models are the method of choice, with the capability of illustrating different pathways of yield increase in breeding programs, governed by their hierarchical structure of feature ranking as well as pattern discovery via various combinations of features.

Bakhtiarizadeh MR, Moradi-Shahrbabak M, Ebrahimi M, Ebrahimie E. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J Theor Biol 2014;356:213-22. [PMID: 24819464 DOI: 10.1016/j.jtbi.2014.04.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Revised: 04/03/2014] [Accepted: 04/29/2014] [Indexed: 01/05/2023]

Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014;9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023] Open

Abstract

The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physicochemical attributes which govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physicochemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting possible future structure of influenza pandemics.
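
Attributes such as the "frequency of Gln" or "percentage of Tyr" mentioned in this abstract are simple per-sequence composition features. The sketch below shows one plausible way to compute them, assuming that "frequency" means a raw residue count and "percentage" means that count divided by sequence length. The function name `composition` and the toy sequence are hypothetical, and the study's 896 attributes cover far more than these two families.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(sequence: str) -> Dict[str, Dict[str, float]]:
    """Return the count and percentage of each standard amino acid in `sequence`.

    Characters outside the 20 standard codes (gaps, X, etc.) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {
        aa: {
            "count": counts[aa],
            "percent": 100.0 * counts[aa] / total if total else 0.0,
        }
        for aa in AMINO_ACIDS
    }


# Toy peptide for illustration only (not a real HA sequence).
toy = composition("MKQQYCQE")
print(toy["Q"], toy["Y"])
```

Each sequence then becomes one row of a numeric table (one column per attribute), which is the form that attribute weighting and classification algorithms expect.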

Predictions of Enzymatic Parameters: A Mini-Review with Focus on Enzymes for Biofuel. Appl Biochem Biotechnol 2013;171:590-615. [DOI: 10.1007/s12010-013-0328-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2013] [Accepted: 06/11/2013] [Indexed: 12/25/2022]

Hosseinzadeh F, Kayvanjoo AH, Ebrahimi M, Goliaei B. Prediction of lung tumor types based on protein attributes by machine learning algorithms. SPRINGERPLUS 2013;2:238. [PMID: 23888262 PMCID: PMC3710575 DOI: 10.1186/2193-1801-2-238] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/16/2013] [Accepted: 03/21/2013] [Indexed: 01/15/2023]

Chow J, Kovacic F, Dall Antonia Y, Krauss U, Fersini F, Schmeisser C, Lauinger B, Bongen P, Pietruszka J, Schmidt M, Menyes I, Bornscheuer UT, Eckstein M, Thum O, Liese A, Mueller-Dieckmann J, Jaeger KE, Streit WR. The metagenome-derived enzymes LipS and LipT increase the diversity of known lipases. PLoS One 2012;7:e47665. [PMID: 23112831 PMCID: PMC3480424 DOI: 10.1371/journal.pone.0047665] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2012] [Accepted: 09/13/2012] [Indexed: 11/18/2022] Open

Affiliation(s)

Jennifer Chow, Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany
Filip Kovacic, Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Yuliya Dall Antonia, European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
Ulrich Krauss, Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Francesco Fersini, European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
Christel Schmeisser, Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany
Benjamin Lauinger, Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Patrick Bongen, Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Joerg Pietruszka, Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Marlen Schmidt, Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
Ina Menyes, Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
Uwe T. Bornscheuer, Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
Marrit Eckstein, Bioprocess Development Consumer Specialties and Biocatalysis Biotechnology, Evonik Industries AG, Essen, Germany
Oliver Thum, Bioprocess Development Consumer Specialties and Biocatalysis Biotechnology, Evonik Industries AG, Essen, Germany
Andreas Liese, Institute of Technical Biocatalysis, Hamburg University of Technology, Hamburg, Germany
Jochen Mueller-Dieckmann, European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
Karl-Erich Jaeger, Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
Wolfgang R. Streit, Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany

Moghadam AA, Taghavi SM, Niazi A, Djavaheri M, Ebrahimie E. Isolation and in silico functional analysis of MtATP6, a 6-kDa subunit of mitochondrial F₁F0-ATP synthase, in response to abiotic stress. GENETICS AND MOLECULAR RESEARCH 2012;11:3547-67. [PMID: 23096681 DOI: 10.4238/2012.october.4.3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]

A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms. PLoS One 2012;7:e44164. [PMID: 22957050 PMCID: PMC3434224 DOI: 10.1371/journal.pone.0044164] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2012] [Accepted: 07/30/2012] [Indexed: 11/19/2022] Open

Hosseinzadeh F, Ebrahimi M, Goliaei B, Shamabadi N. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models. PLoS One 2012;7:e40017. [PMID: 22829872 PMCID: PMC3400626 DOI: 10.1371/journal.pone.0040017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2012] [Accepted: 05/30/2012] [Indexed: 12/03/2022] Open

Abstract

Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, classifying proteins and the prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting, supervised, and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition, and distribution of hydrophobicity as the most important protein attributes in classification of SCLC, NSCLC, and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms, while descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups and distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, compositions of polar dipeptides in SCLC proteins were higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated with leave-one-out and 10-fold cross validation) showed more than 86% accuracy in clustering and predicting three different lung cancer tumors. This is the first report of the application of data mining tools to effectively classify three classes of lung cancer tumors with regard to the importance of dipeptide composition, autocorrelation, and distribution descriptors.
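
Dipeptide composition, one of the attribute families highlighted in this abstract, is commonly defined as the relative frequency of each of the 400 ordered amino acid pairs among the overlapping pairs in a sequence. The sketch below follows that common definition, which may differ in detail from the descriptor software used in the study. The function name `dipeptide_composition` and the toy sequence are hypothetical.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def dipeptide_composition(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of all 400 dipeptides in `sequence`.

    Overlapping adjacent pairs are counted; pairs containing a non-standard
    character are skipped. Frequencies sum to 1 when any valid pair exists.
    """
    seq = sequence.upper()
    comp = {a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in comp]
    for pair in valid:
        comp[pair] += 1.0
    if valid:
        for key in comp:
            comp[key] /= len(valid)
    return comp


# Toy peptide for illustration only.
toy = dipeptide_composition("AKAKA")
print(toy["AK"], toy["KA"])
```

The fixed 400-element output gives every protein the same feature layout regardless of its length, which is what allows sequences of different sizes to be compared by one classifier.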

Bakhtiarizadeh MR, Ebrahimi M, Ebrahimie E. Discovery of EST-SSRs in lung cancer: tagged ESTs with SSRs lead to differential amino acid and protein expression patterns in cancerous tissues. PLoS One 2011;6:e27118. [PMID: 22073269 PMCID: PMC3208562 DOI: 10.1371/journal.pone.0027118] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2011] [Accepted: 10/11/2011] [Indexed: 11/18/2022] Open

Abstract

Tandem repeats are found in both coding and non-coding sequences of higher organisms. These sequences can be used in cancer genetics and diagnosis to unravel the genetic basis of tumor formation and progression. In this study, a possible relationship between SSR distributions and lung cancer was studied by comparative analysis of EST-SSRs in normal and lung cancerous tissues. While the EST-SSR distribution was similar between tumorous tissues, this distribution was different between normal and tumorous tissues. Trinucleotide tandem repeats were highly different; the number of trinucleotides in ESTs of lung cancer was 3 times higher than in normal tissue. Significant negative correlation between normal and cancerous tissue showed that cancerous tissue generates different types of trinucleotides. GGC and CGC were the most frequently expressed trinucleotides in cancerous tissue, but these SSRs were not expressed in normal tissue. Similar to the EST level, the expression pattern of EST-SSR-derived amino acids was significantly different between normal and cancerous tissues. Arg, Pro, Ser, Gly, and Lys were the most abundant amino acids in cancerous tissues, and Leu, Cys, Phe, and His were significantly more abundant in normal tissues than in cancerous tissues. Next, the putative functions of triplet SSR-containing genes were analyzed. In cancerous tissue, EST-SSRs produce different types of proteins. Chromodomain helicase DNA binding proteins were one of the major protein products of EST-SSRs in the cancerous library, while these proteins were not produced from EST-SSRs in normal tissue. For the first time, the findings of this study confirmed that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that tagged ESTs with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. We suggest that EST-SSRs and EST-SSRs differentially expressed in cancerous tissue may be suitable candidate markers for lung cancer diagnosis and prediction.
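
Counting trinucleotide tandem repeats in an EST can be sketched with a regular expression back-reference. The abstract does not state the repeat-number criterion or the SSR detection tool used, so the `min_repeats` default of 4, the function name `count_trinucleotide_ssrs`, and the toy sequence are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_trinucleotide_ssrs(sequence: str, min_repeats: int = 4) -> Counter:
    """Count trinucleotide SSR loci per motif in a DNA sequence.

    A locus is a 3-base motif repeated in tandem at least `min_repeats` times.
    Runs of a single base (e.g. AAAAAAAAAAAA) are skipped because they are
    mononucleotide repeats rather than true trinucleotide repeats.
    """
    seq = sequence.upper()
    # ([ACGT]{3}) captures a motif; \1{n,} requires n further tandem copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_repeats - 1))
    counts: Counter = Counter()
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) > 1:
            counts[motif] += 1
    return counts


# Toy sequence for illustration only: (GGC)x5 and (CGC)x4 with short flanks.
toy_seq = "AT" + "GGC" * 5 + "TA" + "CGC" * 4 + "T"
toy = count_trinucleotide_ssrs(toy_seq)
print(toy)
```

The motif is reported in the reading phase in which it is first matched, so cyclic equivalents such as GGC, GCG, and CGG are not merged here; a fuller analysis would normally canonicalize them before comparing libraries.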
