1
Vassileff N, Spiers JG, Lee JD, Woodruff TM, Ebrahimie E, Mohammadi Dehcheshmeh M, Hill AF, Cheng L. A Panel of miRNA Biomarkers Common to Serum and Brain-Derived Extracellular Vesicles Identified in Mouse Model of Amyotrophic Lateral Sclerosis. Mol Neurobiol 2024; 61:5901-5915. [PMID: 38252383 PMCID: PMC11249427 DOI: 10.1007/s12035-023-03857-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive motor neuron disease characterised by the deposition of aggregated proteins including TAR DNA-binding protein 43 (TDP-43) in vulnerable motor neurons and the brain. Extracellular vesicles (EVs) facilitate the spread of neurodegenerative diseases and can be easily accessed in the bloodstream. This study aimed to identify a panel of EV miRNAs that can capture the pathology occurring in the brain and peripheral circulation. EVs were isolated from the cortex (BDEVs) and serum (serum EVs) of 3-month-old and 6-month-old TDP-43*Q331K and TDP-43*WT mice. Following characterisation and miRNA isolation, the EVs underwent next-generation sequencing where 24 differentially packaged miRNAs were identified in the TDP-43*Q331K BDEVs and 7 in the TDP-43*Q331K serum EVs. Several miRNAs, including miR-183-5p, were linked to ALS. Additionally, miR-122-5p and miR-486b-5p were identified in both panels, demonstrating the ability of the serum EVs to capture the dysregulation occurring in the brain. This is the first study to identify miRNAs common to both the serum EVs and BDEVs in a mouse model of ALS.
Affiliation(s)
- Natasha Vassileff
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria, Australia
- Jereme G Spiers
  - Clear Vision Research, Eccles Institute of Neuroscience, John Curtin School of Medical Research, College of Health and Medicine, The Australian National University, Acton, ACT, Australia
  - School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Acton, ACT, Australia
- John D Lee
  - School of Biomedical Sciences, The University of Queensland, St. Lucia, Australia
- Trent M Woodruff
  - School of Biomedical Sciences, The University of Queensland, St. Lucia, Australia
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Agriculture, Biomedicine and Environment, La Trobe University, Melbourne, VIC, 3000, Australia
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, 5371, Australia
  - School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Andrew F Hill
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria, Australia
  - Institute for Health and Sport, Victoria University, Footscray, Victoria, Australia
- Lesley Cheng
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria, Australia
2
Zhao K, Ebrahimie E, Mohammadi-Dehcheshmeh M, Lewsey MG, Zheng L, Hoogenraad NJ. Transcriptomic signature of cancer cachexia by integration of machine learning, literature mining and meta-analysis. Comput Biol Med 2024; 172:108233. [PMID: 38452471 DOI: 10.1016/j.compbiomed.2024.108233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/23/2024] [Accepted: 02/25/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND: Cancer cachexia is a severe metabolic syndrome marked by skeletal muscle atrophy. A successful clinical intervention for cancer cachexia is currently lacking. The study of cachexia mechanisms is largely based on preclinical animal models and the availability of high-throughput transcriptomic datasets of cachectic mouse muscles is increasing through the extensive use of next generation sequencing technologies. METHODS: Cachectic mouse muscle transcriptomic datasets of ten different studies were combined and mined by seven attribute weighting models, which analysed both categorical variables and numerical variables. The transcriptomic signature of cancer cachexia was identified by attribute weighting algorithms and was used to evaluate the performance of eleven pattern discovery models. The signature was employed to find the best combination of drugs (drug repurposing) for developing cancer cachexia treatment strategies, as well as to evaluate currently used cachexia drugs by literature mining. RESULTS: Attribute weighting algorithms ranked 26 genes as the transcriptomic signature of muscle from mice with cancer cachexia. Deep Learning and Random Forest models performed better in differentiating cancer cachexia cases based on muscle transcriptomic data. Literature mining revealed that a combination of melatonin and infliximab has negative interactions with 2 key genes (Rorc and Fbxo32) upregulated in the transcriptomic signature of cancer cachexia in muscle. CONCLUSIONS: The integration of machine learning, meta-analysis and literature mining was found to be an efficient approach to identifying a robust transcriptomic signature for cancer cachexia, with implications for improving clinical diagnosis and management of this condition.
Affiliation(s)
- Kening Zhao
  - Department of Laboratory Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China; La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia.
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Agriculture, Biomedicine and Environment, La Trobe University, Melbourne, VIC, 3086, Australia; School of Animal and Veterinary Science, The University of Adelaide, Adelaide, SA 5371, Australia; School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia.
- Manijeh Mohammadi-Dehcheshmeh
  - Genomics Research Platform, School of Agriculture, Biomedicine and Environment, La Trobe University, Melbourne, VIC, 3086, Australia; School of Animal and Veterinary Science, The University of Adelaide, Adelaide, SA 5371, Australia.
- Mathew G Lewsey
  - Australian Research Council Research Hub for Medicinal Agriculture, La Trobe University, AgriBio Building, Bundoora, VIC, 3086, Australia; La Trobe Institute for Sustainable Agriculture and Food, Department of Plant, Animal and Soil Sciences, La Trobe University, AgriBio Building, Bundoora, VIC, 3086, Australia; Australian Research Council Centre of Excellence in Plants for Space, AgriBio Building, La Trobe University, Bundoora, VIC, 3086, Australia.
- Lei Zheng
  - Department of Laboratory Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China.
- Nick J Hoogenraad
  - La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia; Tumour Targeting Laboratory, Olivia Newton-John Cancer Research Institute, School of Cancer Medicine, La Trobe University, Melbourne, VIC, 3084, Australia.
3
Yang Y, Zhao J, Zeng L, Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int J Mol Sci 2022; 23:ijms231810798. [PMID: 36142711 PMCID: PMC9505338 DOI: 10.3390/ijms231810798] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022]
Abstract
The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Jianjun Zhao
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lianjie Zeng
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
  - Corresponding author
4
Jafari O, Ebrahimi M, Hedayati SAA, Zeinalabedini M, Poorbagher H, Nasrolahpourmoghadam M, Fernandes JMO. Integration of Morphometrics and Machine Learning Enables Accurate Distinction between Wild and Farmed Common Carp. Life (Basel, Switzerland) 2022; 12:life12070957. [PMID: 35888047 PMCID: PMC9315565 DOI: 10.3390/life12070957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 06/16/2022] [Accepted: 06/20/2022] [Indexed: 11/16/2022]
Abstract
Morphology and feature selection are key approaches to address several issues in fisheries science and stock management, such as the hypothesis of admixture of Caspian common carp (Cyprinus carpio) and farmed carp stocks in Iran. The present study was performed to investigate the population classification of common carp in the southern Caspian basin using data mining algorithms to find the most important characteristic(s) differing between Iranian and farmed common carp. A total of 74 individuals were collected from three locations within the southern Caspian basin and from one farm between November 2015 and April 2016. A dataset of 26 traditional morphometric (TMM) attributes and a dataset of 14 geometric landmark points were constructed and then subjected to various machine learning methods. In general, the machine learning methods had a higher prediction rate with TMM datasets. The highest decision tree accuracy of 77% was obtained by rule and decision tree parallel algorithms, and “head height on eye area” was selected as the best marker to distinguish between wild and farmed common carp. Various machine learning algorithms were evaluated, and we found that the linear discriminant was the best method, with 81.1% accuracy. The results obtained from this novel approach indicate that Darwin’s domestication syndrome is observed in common carp. Moreover, they pave the way for automated detection of farmed fish, which will be most beneficial to detect escapees and improve restocking programs.
Affiliation(s)
- Omid Jafari
  - International Sturgeon Research Institute, Iranian Fisheries Science Research Institute, Agricultural Research, Education and Extension Organization, Rasht 416353464, Iran
  - Correspondence: (O.J.); (J.M.O.F.)
- Mansour Ebrahimi
  - Department of Biology, School of Basic Science, University of Qom, Qom 3716146611, Iran
- Seyed Ali-Akbar Hedayati
  - Department of Fisheries, Faculty of Fisheries and Environmental Sciences, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan 4913815739, Iran
- Mehrshad Zeinalabedini
  - Department of Genomics, Agricultural Biotechnology Research Institute of Iran (ABRII), Karaj 3135933151, Iran
- Hadi Poorbagher
  - Department of Fisheries Sciences, Faculty of Natural Resources, University of Tehran, Karaj 3158777871, Iran
- Maryam Nasrolahpourmoghadam
  - Department of Fisheries Sciences, Faculty of Natural Resources, University of Tehran, Karaj 3158777871, Iran
- Jorge M. O. Fernandes
  - Faculty of Biosciences and Aquaculture, Nord University, 8026 Bodø, Norway
  - Correspondence: (O.J.); (J.M.O.F.)
5
Shahraki MF, Atanaki FF, Ariaeenejad S, Ghaffari MR, Norouzi‐Beirami MH, Maleki M, Salekdeh GH, Kavousi K. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification. Biotechnol Bioeng 2022; 119:1115-1128. [DOI: 10.1002/bit.28037] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/11/2021] [Revised: 08/18/2021] [Accepted: 12/01/2021] [Indexed: 11/09/2022]
Affiliation(s)
- Mehdi Foroozandeh Shahraki
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Fereshteh Fallah Atanaki
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Shohreh Ariaeenejad
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Mohammad Reza Ghaffari
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Mohammad Hossein Norouzi‐Beirami
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
  - Department of Computer Engineering, Osku Branch, Islamic Azad University, Osku, Iran
- Morteza Maleki
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Ghasem Hosseini Salekdeh
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
6
Ghahramani N, Shodja J, Rafat SA, Panahi B, Hasanpur K. Integrative Systems Biology Analysis Elucidates Mastitis Disease Underlying Functional Modules in Dairy Cattle. Front Genet 2021; 12:712306. [PMID: 34691146 PMCID: PMC8531812 DOI: 10.3389/fgene.2021.712306] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/20/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Background: Mastitis is the most prevalent disease in dairy cattle and one of the most significant bovine pathologies affecting milk production, animal health, and reproduction. In addition, mastitis is the most common, expensive, and contagious infection in the dairy industry. Methods: A meta-analysis of microarray and RNA-seq data was conducted to identify candidate genes and functional modules associated with mastitis disease. The results were then applied to systems biology analysis via weighted gene coexpression network analysis (WGCNA), Gene Ontology, enrichment analysis for the Kyoto Encyclopedia of Genes and Genomes (KEGG), and modeling using machine-learning algorithms. Results: Microarray and RNA-seq datasets were generated for 2,089 and 2,794 meta-genes, respectively. Between microarray and RNA-seq datasets, a total of 360 meta-genes were found that were significantly enriched as "peroxisome," "NOD-like receptor signaling pathway," "IL-17 signaling pathway," and "TNF signaling pathway" KEGG pathways. The turquoise module (n = 214 genes) and the brown module (n = 57 genes) were identified as critical functional modules associated with mastitis through WGCNA. PRDX5, RAB5C, ACTN4, SLC25A16, MAPK6, CD53, NCKAP1L, ARHGEF2, COL9A1, and PTPRC genes were detected as hub genes in identified functional modules. Finally, using attribute weighting and machine-learning methods, hub genes that are sufficiently informative in Escherichia coli mastitis were used to optimize predictive models. The constructed model proposed the optimal approach for the meta-genes and validated several high-ranked genes as biomarkers for E. coli mastitis using the decision tree (DT) method. Conclusion: The candidate genes and pathways proposed in this study may shed new light on the underlying molecular mechanisms of mastitis disease and suggest new approaches for diagnosing and treating E. coli mastitis in dairy cattle.
Affiliation(s)
- Nooshin Ghahramani
  - Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Jalil Shodja
  - Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Seyed Abbas Rafat
  - Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Bahman Panahi
  - Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Karim Hasanpur
  - Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
7
Ebrahimie E, Zamansani F, Alanazi IO, Sabi EM, Khazandi M, Ebrahimi F, Mohammadi-Dehcheshmeh M, Ebrahimi M. Advances in understanding the specificity function of transporters by machine learning. Comput Biol Med 2021; 138:104893. [PMID: 34598069 DOI: 10.1016/j.compbiomed.2021.104893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/16/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/25/2022]
Abstract
Understanding the underlying molecular mechanism of transporter activity is one of the major discussions in structural biology. A transporter can exclusively transport one ion (specific transporter) or multiple ions (general transporter). This study compared categorical and numerical features of general and specific calcium transporters using machine learning and attribute weighting models. To this end, 444 protein features, such as the frequency of dipeptides, organism, and subcellular location, were extracted for general (n = 103) and specific calcium transporters (n = 238). Aliphatic index, subcellular location, organism, Ile-Leu frequency, glycine frequency, hydrophobic frequency, and specific dipeptides such as Ile-Leu, Phe-Val, and Tyr-Gln were the key features in differentiating general from specific calcium transporters. Calcium transporters in the cell outer membranes were specific, while the inner ones were general; additionally, when the hydrophobic frequency or aliphatic index is increased, the calcium transporter acts as a general transporter. Random Forest with accuracy criterion showed the highest accuracy (88.88% ± 5.75%) and high AUC (0.964 ± 0.020), based on 5-fold cross-validation. Decision Tree with accuracy criterion was able to predict the specificity of calcium transporters irrespective of the organism and subcellular location. This study demonstrates the precise classification of transporter function based on sequence-derived physicochemical features.
Affiliation(s)
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia.
- Fatemeh Zamansani
  - Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran.
- Ibrahim O Alanazi
  - National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Saudi Arabia.
- Essa M Sabi
  - Department of Pathology, Clinical Biochemistry Unit, College of Medicine, King Saud University, Riyadh, 11461, Saudi Arabia.
- Manouchehr Khazandi
  - UniSA Clinical and Health Sciences, The University of South Australia, Adelaide, 5000, Australia.
- Faezeh Ebrahimi
  - Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran.
- Mansour Ebrahimi
  - School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
8
Zakipour Z, Alemzadeh A. Molecular evolution of Na, K-ATPase β-subunit. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
9
Ahsan R, Tahsili MR, Ebrahimi F, Ebrahimie E, Ebrahimi M. Image processing unravels the evolutionary pattern of SARS-CoV-2 against SARS and MERS through position-based pattern recognition. Comput Biol Med 2021; 134:104471. [PMID: 34004573 PMCID: PMC8106241 DOI: 10.1016/j.compbiomed.2021.104471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2, Severe Acute Respiratory Syndrome (SARS), and the Middle East respiratory syndrome-related coronavirus (MERS) viruses are from the Coronaviridae family; the former became a global pandemic (with low mortality rate) while the latter were confined to a limited region (with high mortality rates). To investigate the possible structural differences at basic levels for the three viruses, genomic and proteomic sequences were downloaded and converted to polynomial datasets. Seven attribute weighting (feature selection) models were employed to find the key differences in their genome's nucleotide sequence. Most attribute weighting models selected the final nucleotide sequences (from 29,000th nucleotide positions to the end of the genome) as significantly different among the three virus classes. The genome and proteome sequences of this hot zone area (which corresponds to the 3'UTR region and encodes for nucleoprotein (N)) and Spike (S) protein sequences (as the most important viral protein) were converted into binary images and were analyzed by image processing techniques and Convolutional deep Neural Network (CNN). Although the predictive accuracy of CNN for Spike (S) proteins was low (0.48%), the machine-based learning algorithms were able to classify the three members of Coronaviridae viruses with 100% accuracy based on 3'UTR region. For the first time ever, the relationship between the possible structural differences of coronaviruses at the sequential levels and their pathogenesis is reported, which paves the road to deciphering the high pathogenicity of the SARS-CoV-2 virus.
Affiliation(s)
- Reza Ahsan
  - Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran
- Faezeh Ebrahimi
  - Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia
- Mansour Ebrahimi
  - Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Corresponding author. Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
10
Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021; 10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
The design of synthetic proteins with the desired function is a long-standing goal in biomolecular science, with broad applications in biochemical engineering, agriculture, medicine, and public health. Rational de novo design and experimental directed evolution have achieved remarkable successes but are challenged by the requirement to find functional "needles" in the vast "haystack" of protein sequence space. Data-driven models for fitness landscapes provide a predictive map between protein sequence and function and can prospectively identify functional candidates for experimental testing to greatly improve the efficiency of this search. This Viewpoint reviews the applications of machine learning and, in particular, deep learning as part of data-driven protein engineering platforms. We highlight recent successes, review promising computational methodologies, and provide an outlook on future challenges and opportunities. The article is written for a broad audience comprising both polymer and protein scientists and computer and data scientists interested in an up-to-date review of recent innovations and opportunities in this rapidly evolving field.
Affiliation(s)
- Andrew L. Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Rama Ranganathan
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
  - Center for Physics of Evolving Systems, University of Chicago, Chicago, Illinois 60637, United States
  - Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
11
Machine learning and statistics to qualify environments through multi-traits in Coffea arabica. PLoS One 2021; 16:e0245298. [PMID: 33434204 PMCID: PMC7802962 DOI: 10.1371/journal.pone.0245298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2020] [Accepted: 12/25/2020] [Indexed: 11/30/2022]
Abstract
Several factors such as genotype, environment, and post-harvest processing can affect the responses of important traits in the coffee production chain. Determining the influence of these factors is of great relevance, as they can be indicators of the characteristics of the coffee produced. The choice of the most efficient models to apply should take into account the variety of information and the particularities of each biological material. This study was developed to evaluate statistical and machine learning models that would better discriminate environments through multi-traits of coffee genotypes and identify the main agronomic and beverage quality traits responsible for the variation of the environments. For that, 31 morpho-agronomic and post-harvest traits were evaluated, from field experiments installed in three municipalities in the Matas de Minas region, in the State of Minas Gerais, Brazil. Two types of post-harvest processing were evaluated: natural and pulped. The apparent error rate was estimated for each method. The Multilayer Perceptron and Radial Basis Function networks were able to discriminate the coffee samples in multi-environment more efficiently than the other methods, identifying differences in multi-traits responses according to the production sites and type of post-harvest processing. The local factors did not present specific traits that favored the severity of diseases and differentiated vegetative vigor. Sensory traits acidity and fragrance/aroma score also made little contribution to the discrimination process, indicating that acidity and fragrance/aroma are characteristic of coffee produced and all coffee samples evaluated are of the special type in the Matas de Minas region. The main traits responsible for the differentiation of production sites are plant height, fruit size, and bean production. The sensory trait "Body" is the main one to discriminate the form of post-harvest processing.
12
Liyaghatdar Z, Pezeshkian Z, Mohammadi-Dehcheshmeh M, Ebrahimie E. Fast school closures correspond with a lower rate of COVID-19 incidence and deaths in most countries. Informatics in Medicine Unlocked 2021; 27:100805. [PMID: 34849394 PMCID: PMC8607689 DOI: 10.1016/j.imu.2021.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 11/01/2021] [Accepted: 11/21/2021] [Indexed: 01/31/2023]
Abstract
School closures have been used as one of the main nonpharmaceutical interventions to overcome the spread of SARS-CoV-2. Different countries use this intervention with a wide range of time intervals from the date of the first confirmed case or death. This study aimed to investigate whether fast or late school closures affect the cumulative number of COVID-19 cases or deaths. A worldwide population-based observational study was conducted, and a range of attributes were weighted using 10 attribute weighting models against the normalized number of infected cases or deaths in the form of numeric, binominal and polynomial labels. Statistical analysis was performed for the most weighted and the most common attributes of all types of labels. By the end of March 2021, the school closure data of 198 countries with at least one COVID-19 case were available. The number of days before the first school closure was one of the most weighted factors in relation to the normalized number of infected cases and deaths in numeric, binomial, and quartile forms. The average number of days before the first school closure in the lowest quartile to highest quartile of infected cases (Q1, Q2, Q3 and Q4) was -6.10 [95% CI, -26.5 to 14.2], 9.35 [95% CI, 2.16 to 16.53], 17.55 [95% CI, 5.95 to 29.15], and 16.00 [95% CI, 11.69 to 20.31], respectively. In addition, 188 countries reported at least one death from COVID-19. The average number of days before the first school closure in the lowest quartile to highest quartile of deaths (Q1, Q2, Q3 and Q4) was -49.4 [95% CI, -76.5 to -22.3], -10.34 [95% CI, -30.12 to 9.44], -18.74 [95% CI, -32.72 to -4.77], and -12.89 [95% CI, -27.84 to 2.06], respectively. Countries that closed schools faster, especially before the detection of any confirmed case or death, had fewer COVID-19 cases or deaths per million of the population on total days of involvement. It can be concluded that rapid prevention policies are the main determinants of the countries' success.
Affiliation(s)
- Zahra Liyaghatdar
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran. Corresponding author
| | - Zahra Pezeshkian
- Department of Animal Sciences, University of Guilan, Rasht, Iran
| | - Manijeh Mohammadi-Dehcheshmeh
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, 5371, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Esmaeil Ebrahimie
- La Trobe Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, VIC, 3086, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran
| |
|
13
|
Gado JE, Beckham GT, Payne CM. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J Chem Inf Model 2020; 60:4098-4107. [DOI: 10.1021/acs.jcim.0c00489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
Affiliation(s)
- Japheth E. Gado
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Gregg T. Beckham
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Christina M. Payne
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
| |
|
14
|
Piroozmand F, Ghadam P, Zarrabi M, Abdi-Ali A. Biochemical and computational study of an alginate lyase produced by Pseudomonas aeruginosa strain S21. IRANIAN JOURNAL OF BASIC MEDICAL SCIENCES 2020; 23:454-460. [PMID: 32489560 PMCID: PMC7239423 DOI: 10.22038/ijbms.2020.37277.8874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
OBJECTIVES Alginates play a key role in mucoid Pseudomonas aeruginosa colonization, biofilm formation, and driving out of cationic antibiotics. P. aeruginosa alginate lyase (AlgL) is a periplasmic enzyme that is necessary for alginate synthesis and secretion. It also has a role in depolymerization of alginates. Using AlgLs in cystic fibrosis patients along with antibiotics enhances bacterial killing and host healing. In this study, we investigated the different biochemical properties of a newly isolated AlgL from P. aeruginosa S21 to complete the databank of AlgLs. MATERIALS AND METHODS The enzyme was extracted from the periplasmic space of the bacteria by the heat shock method. Using the TBA method, the enzyme activity and biochemical properties were assessed. The mutability of P. aeruginosa S21 AlgL to increase its thermal stability was investigated. The most favorable mutations were studied computationally. The molecular dynamics simulation (MDS) package GROMACS was used for determining the effect of S34R mutation on enzyme's thermal stability. RESULTS Data showed that this enzyme has the best activity at 37 °C and pH 7.5 and it can degrade mannuronate blocks, guluronate blocks, and sodium alginate. After 7 hr at 80 °C, 45% of the enzyme activity was retained. This enzyme needed 15 min to completely degrade accessible sodium alginate. Tris buffer, pH 8.5 and Britton-Robinson buffer, pH 7.0 were the preferable buffers for the enzyme activity. MDS of native and mutated enzymes showed desirable results. CONCLUSION P. aeruginosa S21 AlgL can be used in medical and industrial applications to degrade alginates.
Affiliation(s)
- Firoozeh Piroozmand
- Department of Biotechnology, Faculty of Biological Sciences, Alzahra University, Tehran, Iran
| | - Parinaz Ghadam
- Department of Biotechnology, Faculty of Biological Sciences, Alzahra University, Tehran, Iran. Corresponding author: Parinaz Ghadam, Department of Biotechnology, Faculty of Biological Sciences, Alzahra University, Tehran, Iran. Tel: +98-21-88044051
| | - Mahboobe Zarrabi
- Department of Biotechnology, Faculty of Biological Sciences, Alzahra University, Tehran, Iran
| | - Ahya Abdi-Ali
- Department of Microbiology, Faculty of Biological Sciences, Alzahra University, Tehran, Iran
| |
|
15
|
Lee T, Lee H. Prediction of Alzheimer's disease using blood gene expression data. Sci Rep 2020; 10:3485. [PMID: 32103140 PMCID: PMC7044318 DOI: 10.1038/s41598-020-60595-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/02/2019] [Accepted: 02/11/2020] [Indexed: 12/13/2022]
Abstract
Identification of AD (Alzheimer's disease)-related genes obtained from blood samples is crucial for early AD diagnosis. We used three public datasets, ADNI, AddNeuroMed1 (ANM1), and ANM2, for this study. Five feature selection methods and five classifiers were used to curate AD-related genes and discriminate AD patients, respectively. In the internal validation (five-fold cross-validation within each dataset), the best average values of the area under the curve (AUC) were 0.657, 0.874, and 0.804 for ADNI, ANM1, and ANM2, respectively. In the external validation (training and test sets from different datasets), the best AUCs were 0.697 (training: ADNI to testing: ANM1), 0.764 (ADNI to ANM2), 0.619 (ANM1 to ADNI), 0.79 (ANM1 to ANM2), 0.655 (ANM2 to ADNI), and 0.859 (ANM2 to ANM1), respectively. These results suggest that although the classification performance of ADNI is relatively lower than that of ANM1 and ANM2, classifiers trained using blood gene expression can be used to classify AD for other data sets. In addition, pathway analysis showed that AD-related genes were enriched with inflammation, mitochondria, and Wnt signaling pathways. Our study suggests that blood gene expression data are useful in predicting the AD classification.
Affiliation(s)
- Taesic Lee
- Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Hyunju Lee
- Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, South Korea.
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, South Korea.
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea.
| |
|
16
|
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular topic. METHODS We systematically review the recent work on identifying drug targets from the view of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories: biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. RESULTS Machine learning algorithms are the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| |
|
17
|
Yang Y, Ding X, Zhu G, Niroula A, Lv Q, Vihinen M. ProTstab - predictor for cellular protein stability. BMC Genomics 2019; 20:804. [PMID: 31684883 PMCID: PMC6830000 DOI: 10.1186/s12864-019-6138-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/22/2019] [Accepted: 09/24/2019] [Indexed: 01/10/2023]
Abstract
BACKGROUND Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties has not kept pace with the increase in new sequence data, and therefore even basic properties are not known for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. RESULTS We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities. CONCLUSIONS The Pearson's correlation coefficient was 0.793 in 10-fold cross validation and 0.763 in an independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated ProTstab to have superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities to protein chain lengths of isoforms and to localizations of proteins.
Affiliation(s)
- Yang Yang
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden
- Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
| | - Xuesong Ding
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Guanchen Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Abhishek Niroula
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden
| | - Qiang Lv
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, Lund, Sweden.
| |
|
18
|
Khan MF, Kundu D, Hazra C, Patra S. A strategic approach of enzyme engineering by attribute ranking and enzyme immobilization on zinc oxide nanoparticles to attain thermostability in mesophilic Bacillus subtilis lipase for detergent formulation. Int J Biol Macromol 2019; 136:66-82. [DOI: 10.1016/j.ijbiomac.2019.06.042] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/09/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 12/27/2022]
|
19
|
Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Characterization of bovine (Bos taurus) imprinted genes from genomic to amino acid attributes by data mining approaches. PLoS One 2019; 14:e0217813. [PMID: 31170205 PMCID: PMC6553745 DOI: 10.1371/journal.pone.0217813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/17/2018] [Accepted: 05/21/2019] [Indexed: 01/05/2023]
Abstract
Genomic imprinting results in monoallelic expression of genes in mammals and flowering plants. Understanding the function of imprinted genes improves our knowledge of the regulatory processes in the genome. In this study, we have employed classification and clustering algorithms with attribute weighting to specify the unique attributes of both imprinted (monoallelic) and biallelic expressed genes. We have obtained characteristics of 22 known monoallelically expressed (imprinted) and 8 biallelic expressed genes that have been experimentally validated alongside 208 randomly selected genes in bovine (Bos taurus). Attribute weighting methods and various supervised and unsupervised algorithms in machine learning were applied. Unique characteristics were discovered and used to distinguish mono- and biallelic expressed genes from each other in bovine. To estimate classification accuracy, 10-fold cross-validation was used for each combination of attribute weighting (feature selection) and machine learning algorithm. Our approach was able to accurately predict mono- and biallelic genes using the genomics and proteomics attributes.
Affiliation(s)
- Keyvan Karami
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Saeed Zerehdaran
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Ali Javadmanesh
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mohammad Mahdi Shariati
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Hossein Fallahi
- Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
| |
|
20
|
Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Attribute selection and model evaluation for the maternal and paternal imprinted genes in bovine (Bos Taurus) using supervised machine learning algorithms. J Anim Breed Genet 2019; 136:205-216. [DOI: 10.1111/jbg.12379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2018] [Revised: 12/06/2018] [Accepted: 12/06/2018] [Indexed: 11/29/2022]
Affiliation(s)
- Keyvan Karami
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Saeed Zerehdaran
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Ali Javadmanesh
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | | | - Hossien Fallahi
- Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
| |
|
21
|
Li G, Dong Y, Reetz MT. Can Machine Learning Revolutionize Directed Evolution of Selective Enzymes? Adv Synth Catal 2019. [DOI: 10.1002/adsc.201900149] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/01/2023]
Affiliation(s)
- Guangyue Li
- State Key Laboratory for Biology of Plant Diseases and Insect Pests/Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, People's Republic of China
| | - Yijie Dong
- State Key Laboratory for Biology of Plant Diseases and Insect Pests/Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, People's Republic of China
| | - Manfred T. Reetz
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Fachbereich Chemie der Philipps-Universität, Hans-Meerwein-Strasse, 35032 Marburg, Germany
| |
|
22
|
Lu M, Dukunde A, Daniel R. Biochemical profiles of two thermostable and organic solvent-tolerant esterases derived from a compost metagenome. Appl Microbiol Biotechnol 2019; 103:3421-3437. [PMID: 30809711 DOI: 10.1007/s00253-019-09695-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/01/2018] [Revised: 02/11/2019] [Accepted: 02/12/2019] [Indexed: 12/15/2022]
Abstract
Owing to the functional versatility and potential applications in industry, interest in lipolytic enzymes tolerant to organic solvents is increasing. In this study, functional screening of a compost soil metagenome resulted in identification of two lipolytic genes, est1 and est2, encoding 270 and 389 amino acids, respectively. The two genes were heterologously expressed and characterized. Est1 and Est2 are thermostable enzymes with optimal enzyme activities at 80 and 70 °C, respectively. A second-order rotatable design, which allows establishing the relationship between multiple variables with the obtained responses, was used to explore the combined effects of temperature and pH on esterase stability. The response curve indicated that Est1, and particularly Est2, retained high stability within a broad range of temperature and pH values. Furthermore, the effects of organic solvents on Est1 and Est2 activities and stabilities were assessed. Notably, Est2 activity was significantly enhanced (two- to tenfold) in the presence of ethanol, methanol, isopropanol, and 1-propanol over a concentration range between 6 and 30% (v/v). For the short-term stability (2 h of incubation), Est2 exhibited high tolerance against 60% (v/v) of ethanol, methanol, isopropanol, DMSO, and acetone, while Est1 activity resisted these solvents only at lower concentrations (below 30%, v/v). Est2 also displayed high stability towards some water-immiscible organic solvents, such as ethyl acetate, diethyl ether, and toluene. With respect to long-term stability, Est2 retained most of its activity after 26 days of incubation in the presence of 30% (v/v) ethanol, methanol, isopropanol, DMSO, or acetone. All of these features indicate that Est1 and Est2 possess application potential.
Affiliation(s)
- Mingji Lu
- Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Grisebachstraße 8, 37077, Göttingen, Germany
| | - Amélie Dukunde
- Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Grisebachstraße 8, 37077, Göttingen, Germany
| | - Rolf Daniel
- Department of Genomic and Applied Microbiology, Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University of Göttingen, Grisebachstraße 8, 37077, Göttingen, Germany.
| |
|
23
|
Kargarfard F, Sami A, Hemmatzadeh F, Ebrahimie E. Identifying mutation positions in all segments of influenza genome enables better differentiation between pandemic and seasonal strains. Gene 2019; 697:78-85. [PMID: 30769139 DOI: 10.1016/j.gene.2019.01.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/24/2018] [Revised: 12/29/2018] [Accepted: 01/17/2019] [Indexed: 01/08/2023]
Abstract
Influenza has a negative-sense, single-stranded, segmented RNA genome. In the context of pandemic influenza research, most studies have focused on variations in the surface proteins (Hemagglutinin and Neuraminidase). However, new findings suggest that all internal and external proteins of influenza viruses can contribute to pandemic emergence, pathogenicity and increasing host range. The occurrence of the 2009 influenza pandemic and the availability of many external and internal segments of pandemic and non-pandemic sequences offer a unique opportunity to evaluate the performance of machine learning models in discrimination of pandemic from seasonal sequences using mutation positions in all segments. In this study, we hypothesized that identifying mutation positions in all segments (proteins) encoded by the influenza genome would enable pandemic and seasonal strains to be more reliably distinguished. In a large-scale study, we applied a range of data mining techniques to all segments of influenza for rule discovery and discrimination of pandemic from seasonal strains. CBA (classification based on association rule mining), Ripper and Decision tree algorithms were utilized to extract association rules among mutations. CBA outperformed the other models. Our approach could discriminate pandemic sequences from seasonal ones with more than 95% accuracy for PA and NP, 99.33% accuracy for NA and 100% accuracy, precision, specificity and sensitivity (recall) for M1, M2, PB1, NS1, and NS2. The values of precision, specificity, and sensitivity were more than 90% for other segments except PB2. If sequences of all segments of one strain were available, the accuracy of discrimination of pandemic strains was 100%. General rules extracted by rule-based classification approaches, such as M1-V147I, NP-N334H, NS1-V112I, and PB1-L364I, were able to detect pandemic sequences with high accuracy.
We observed that mutations on internal proteins of influenza can contribute to distinguishing the pandemic viruses, similar to the external ones.
Affiliation(s)
- Fatemeh Kargarfard
- Faculty of Engineering and IT, University of Technology Sydney, New South Wales, Australia; Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
| | - Ashkan Sami
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
| | - Farhid Hemmatzadeh
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
| | - Esmaeil Ebrahimie
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia; Genomics Research Platform, La Trobe University, Melbourne, Victoria 3086, Australia; School of Information Technology and Mathematical Sciences, Division of Information Technology Engineering & Environment, University of South Australia, Adelaide, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia.
| |
|
24
|
Alanazi IO, Al Shehri ZS, Ebrahimie E, Giahi H, Mohammadi-Dehcheshmeh M. Non-coding and coding genomic variants distinguish prostate cancer, castration-resistant prostate cancer, familial prostate cancer, and metastatic castration-resistant prostate cancer from each other. Mol Carcinog 2019; 58:862-874. [PMID: 30644608 DOI: 10.1002/mc.22975] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/11/2022]
Abstract
A considerable number of deposited variants has provided new possibilities for knowledge discovery in different types of prostate cancer. Here, we analyzed variants located on 3'UTR, 5'UTR, CDs, Intergenic, and Intronic regions in castration-resistant prostate cancer (8496 variants), familial prostate cancer (3241 variants), metastatic castration-resistant prostate cancer (3693 variants), and prostate cancer (16599 variants). Chromosome regions 10p15-p14 and 2p13 were highly enriched (P < 0.00001) for variants located in 3'UTR, 5'UTR, CDs, intergenic, and intronic regions in castration-resistant prostate cancer. In contrast, 10p15-p14, 10q23.3, 12q13.11, 13q12.3, 1q25, and 8p22 regions were enriched (P < 0.001) in familial prostate cancer. In metastatic castration-resistant prostate cancer, 10p15-p14, 10q23.3, 11q22-q23, 14q21.1, and 14q32.13 were highly variant regions (P < 0.001). Chromosome 2 and chromosome 1 hosted many enriched variant regions. AKR1C3, BRCA1, BRCA2, CHGA, CYP19A1, HOXB13, KLK3, and PTEN contained the highest number of 3'UTR, 5'UTR, CDs, Intergenic, and Intronic variants. Network analysis showed that these genes are upstream of important functions including prostate gland development, tumor recurrence, prostate cancer-specific survival, tumor progression, cancer mortality, long-term survival, cancer recurrence, angiogenesis, and AR. Interestingly, all of EGFR, JAK2, NR3C1, PDZD2, and SEMA3C genes had single nucleotide polymorphisms (SNP) in castration-resistant prostate cancer, consistent with high selection pressure on these genes during drug treatment and consequent resistance. High occurrence of variants in 3'UTRs suggests the importance of regulatory variants in different types of prostate cancer; an area that has been neglected compared with coding variants. This study provides a comprehensive overview of genomic regions contributing to different types of prostate cancer.
Affiliation(s)
- Ibrahim O Alanazi
- National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
| | - Zafer S Al Shehri
- Clinical Laboratory Department, College of Applied Medical Sciences, Shaqra University, KSA, Al dawadmi, Saudi Arabia
| | - Esmaeil Ebrahimie
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia, Australia; School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran; Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
| | - Hassan Giahi
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Manijeh Mohammadi-Dehcheshmeh
- Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, Australia
| |
|
25
|
Mohammadi-Dehcheshmeh M, Niazi A, Ebrahimi M, Tahsili M, Nurollah Z, Ebrahimi Khaksefid R, Ebrahimi M, Ebrahimie E. Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis. FRONTIERS IN PLANT SCIENCE 2018; 9:1550. [PMID: 30483277 PMCID: PMC6240842 DOI: 10.3389/fpls.2018.01550] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/16/2018] [Accepted: 10/03/2018] [Indexed: 05/25/2023]
Abstract
Plant root symbiosis with Arbuscular mycorrhizal (AM) fungi improves uptake of water and mineral nutrients, improving plant development under stressful conditions. Unraveling the unified transcriptomic signature of a successful colonization provides a better understanding of symbiosis. We developed a framework for finding the transcriptomic signature of Arbuscular mycorrhiza colonization and its regulating transcription factors in roots of Medicago truncatula. Expression profiles of roots in response to AM species were collected from four separate studies and were combined by direct merging meta-analysis. Batch effect, the major concern in expression meta-analysis, was reduced by three normalization steps: Robust Multi-array Average algorithm, Z-standardization, and quartiling normalization. Then, expression profile of 33685 genes in 18 root samples of Medicago as numerical features, as well as study ID and Arbuscular mycorrhiza type as categorical features, were mined by seven models: RELIEF, UNCERTAINTY, GINI INDEX, Chi Squared, RULE, INFO GAIN, and INFO GAIN RATIO. In total, 73 genes selected by machine learning models were up-regulated in response to AM (Z-value difference > 0.5). Feature weighting models also documented that this signature is independent from study (batch) effect. The AM inoculation signature obtained was able to differentiate efficiently between AM inoculated and non-inoculated samples. The AP2 domain class transcription factor, GRAS family transcription factors, and cyclin-dependent kinase were among the highly expressed meta-genes identified in the signature. We found high correspondence between the AM colonization signature obtained in this study and independent RNA-seq experiments on AM colonization, validating the repeatability of the colonization signature. 
Promoter analysis of upregulated genes in the transcriptomic signature led to the key regulators of AM colonization, including the essential transcription factors for endosymbiosis establishment and development such as NF-YA factors. The approach developed in this study offers three distinct novel features: (I) it improves direct merging meta-analysis by integrating supervised machine learning models and normalization steps to reduce study-specific batch effects; (II) seven attribute weighting models assessed the suitability of each gene for the transcriptomic signature, which contributes to the robustness of the signature; (III) the approach is justifiable, easy to apply, and useful in practice. Our integrative framework of meta-analysis, promoter analysis, and machine learning provides a foundation to reveal the transcriptomic signature and regulatory circuits governing Arbuscular mycorrhizal symbiosis and is transferable to other biological settings.
Affiliation(s)
- Manijeh Mohammadi-Dehcheshmeh
- Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Ali Niazi
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | | | | | - Zahra Nurollah
- Department of Biotechnology, Shahrekord University, Shahrekord, Iran
| | - Reyhaneh Ebrahimi Khaksefid
- Department of Biotechnology, Shahrekord University, Shahrekord, Iran
- School of Agriculture Food and Wine, Department of Plant Science, The University of Adelaide, Adelaide, SA, Australia
| | - Mahdi Ebrahimi
- Max-Planck-Institute for Informatics, Saarbrucken, Germany
| | - Esmaeil Ebrahimie
- Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- Division of Information Technology, Engineering and the Environment, School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, Australia
- Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
| |
|
26
|
Khan MF, Patra S. Deciphering the rationale behind specific codon usage pattern in extremophiles. Sci Rep 2018; 8:15548. [PMID: 30341344 PMCID: PMC6195531 DOI: 10.1038/s41598-018-33476-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2018] [Accepted: 09/21/2018] [Indexed: 12/03/2022] Open
Abstract
Protein stability is affected at different hierarchies: gene, RNA, amino acid sequence and structure. The gene is the first level, contributing via varying codon compositions. Codon selectivity of an organism differs between normal and extremophilic milieus. The present work attempts to detail the codon usage pattern of six extremophilic classes and their harmony. Homologous gene datasets of thermophile-mesophile, psychrophile-mesophile, thermophile-psychrophile, acidophile-alkaliphile, halophile-nonhalophile and barophile-nonbarophile were analysed to filter statistically significant attributes. Relative abundance analysis, 1–9 scale ranking, nucleotide compositions, attribute weighting and machine learning algorithms were employed to arrive at the findings. AGG in thermophiles and barophiles, CAA in mesophiles and psychrophiles, TGG in acidophiles, GAG in alkaliphiles and GAC in halophiles had the highest preference. A preference for GC-rich and G/C-ending codons was observed in halophiles and barophiles, whereas a decreasing trend was seen in psychrophiles and alkaliphiles. GC-rich codons were found to decrease and G/C-ending codons to increase in thermophiles, whereas acidophiles showed equal contents of GC-rich and G/C-ending codons. Codon usage patterns exhibited harmony among different extremophiles, which is detailed. However, the codon attribute preferences and selectivity of extremophiles varied in comparison to non-extremophiles. The findings can be instrumental in codon optimization for heterologous expression of extremophilic proteins.
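The codon-level quantities named in this abstract (codon counts, G/C-ending codons, GC-rich codons) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up, and treating a codon as "GC-rich" when at least two of its three bases are G or C is an assumption, since the abstract does not give the authors' definition.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc_ending_fraction(counts):
    """Fraction of codons whose third base is G or C."""
    total = sum(counts.values())
    gc_end = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc_end / total if total else 0.0


def gc_rich_fraction(counts):
    """Fraction of codons with at least two G/C bases (assumed definition)."""
    total = sum(counts.values())
    rich = sum(
        n for codon, n in counts.items()
        if sum(base in "GC" for base in codon) >= 2
    )
    return rich / total if total else 0.0


# Hypothetical toy sequence: ATG AGG CAA GAC TGG GAG TAA
cds = "ATGAGGCAAGACTGGGAGTAA"
counts = codon_counts(cds)
print(counts.most_common())
print(gc_ending_fraction(counts), gc_rich_fraction(counts))
```

Comparing these fractions between homologous genes from two groups (for example thermophile versus mesophile) would give the kind of attribute the abstract describes feeding into attribute weighting.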
Affiliation(s)
- Mohd Faheem Khan
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
| | - Sanjukta Patra
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India.
| |
|
27
|
A large-scale study of indicators of sub-clinical mastitis in dairy cattle by attribute weighting analysis of milk composition features: highlighting the predictive power of lactose and electrical conductivity. J DAIRY RES 2018; 85:193-200. [PMID: 29785910 DOI: 10.1017/s0022029918000249] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Sub-clinical mastitis (SCM) affects milk composition. In this study, we hypothesise that large-scale mining of milk composition features by pattern recognition models can identify the best predictors of SCM within the milk composition features. To this end, using data mining algorithms, we conducted a large-scale and longitudinal study to evaluate the ability of various milk production parameters to act as indicators of SCM. SCM is the most prevalent disease of dairy cattle, causing substantial economic loss for the dairy industry. Developing new techniques to diagnose SCM in its early stages improves herd health and is of great importance. Test-day Somatic Cell Count (SCC) is the most common indicator of SCM and the primary mastitis surveillance approach worldwide. However, test-day SCC fluctuates widely between days, causing major concerns for its reliability. Consequently, there would be great benefit to identifying additional efficient indicators from large-scale and longitudinal studies. With this intent, data were collected at every milking (twice per day) for a period of 2 months from a single farm using in-line electronic equipment (346 248 records in total). The following data were analysed: milk volume, protein concentration, lactose concentration, electrical conductivity (EC), milking time and peak flow. Three SCC cut-offs were used to estimate the prevalence of SCM: Australian ≥250 000 cells/ml, European ≥200 000 cells/ml and New Zealand ≥150 000 cells/ml. At first, 10 different Attribute Weighting Algorithms (AWM) were applied to the data. In the absence of SCC, lactose concentration featured as the most important variable, followed by EC. For the first time, using attribute weighted modelling, we showed that the concentration of lactose in milk can be used as a strong indicator of SCM.
The development of machine-learning expert systems using two or more milk variables (such as lactose concentration and EC) may produce a predictive pattern for early SCM detection.
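The three SCC cut-offs quoted in this abstract translate directly into a prevalence calculation. The sketch below applies each cut-off to a list of SCC readings; the cut-off values come from the abstract, while the function name and the readings themselves are invented purely for illustration.

```python
# SCC cut-offs (cells/ml) as quoted in the abstract.
SCC_CUTOFFS = {
    "Australian": 250_000,
    "European": 200_000,
    "New Zealand": 150_000,
}


def scm_prevalence(scc_values, cutoff):
    """Fraction of records whose SCC is at or above the cut-off."""
    if not scc_values:
        return 0.0
    flagged = sum(1 for value in scc_values if value >= cutoff)
    return flagged / len(scc_values)


# Hypothetical SCC readings (cells/ml); not data from the study.
scc_records = [90_000, 150_000, 180_000, 200_000,
               260_000, 400_000, 120_000, 240_000]

prevalence = {
    name: scm_prevalence(scc_records, cutoff)
    for name, cutoff in SCC_CUTOFFS.items()
}
print(prevalence)
```

The same records yield a different prevalence under each standard, which is why the choice of cut-off matters when SCC is used as the class label for attribute weighting.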
|
28
|
Farhadian M, Rafat SA, Hasanpur K, Ebrahimi M, Ebrahimie E. Cross-Species Meta-Analysis of Transcriptomic Data in Combination With Supervised Machine Learning Models Identifies the Common Gene Signature of Lactation Process. Front Genet 2018; 9:235. [PMID: 30050559 PMCID: PMC6052129 DOI: 10.3389/fgene.2018.00235] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2017] [Accepted: 06/13/2018] [Indexed: 01/13/2023] Open
Abstract
Lactation, a physiologically complex process, takes place in the mammary gland after parturition. The expression profile of the effective genes in lactation has not been comprehensively elucidated. Herein, a meta-analysis using publicly available microarray data was conducted to identify the differentially expressed genes (DEGs) between pre- and post-peak milk production. Three microarray datasets of Rat, Bos taurus, and Tammar wallaby were used. Samples related to pre-peak (n = 85) and post-peak (n = 24) milk production were selected. Meta-analysis revealed 31 DEGs across the studied species. Interestingly, 10 genes, including MRPS18B, SF1, UQCRC1, NUCB1, RNF126, ADSL, TNNC1, FIS1, HES5 and THTPA, were not detected in the original studies, which highlights the power of meta-analysis in biosignature discovery. Common target and regulator analysis highlighted the high connectivity of CTNNB1, CDD4 and LPL as gene network hubs. As the data originally came from three different species, 10 attribute weighting (machine learning) algorithms were applied to check the effects of heterogeneous data sources on DEGs. Attribute weighting results showed that the type of organism had little or no effect on the selected gene list. Systems biology analysis suggested that these DEGs affect milk production by improving immune system performance and mammary cell growth. This is the first study employing both meta-analysis and machine learning approaches for comparative analysis of the gene expression pattern of mammary glands at two important time points of the lactation process. The findings may pave the way to the use of publicly available data to elucidate the underlying molecular mechanisms of physiologically complex traits such as lactation in mammals.
Affiliation(s)
- Mohammad Farhadian
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
| | - Seyed A Rafat
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
| | - Karim Hasanpur
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
| | | | - Esmaeil Ebrahimie
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia.,Institute of Biotechnology, Shiraz University, Shiraz, Iran.,Division of Information Technology, Engineering and the Environment, School of Information Technology & Mathematical Sciences, University of South Australia, Adelaide, SA, Australia.,School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, SA, Australia
| |
|
29
|
Sharifi S, Pakdel A, Ebrahimi M, Reecy JM, Fazeli Farsani S, Ebrahimie E. Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle. PLoS One 2018; 13:e0191227. [PMID: 29470489 PMCID: PMC5823400 DOI: 10.1371/journal.pone.0191227] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2017] [Accepted: 12/29/2017] [Indexed: 12/14/2022] Open
Abstract
Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the main agents that cause severe mastitis disease with clinical signs in dairy cattle. Rapid detection of this disease is important to prevent transmission to other cows and helps to reduce inappropriate use of antibiotics. With the rapid progress in high-throughput technologies, and accumulation of various kinds of '-omics' data in public repositories, there is an opportunity to retrieve, integrate, and reanalyze these resources to improve the diagnosis and treatment of different diseases and to provide mechanistic insights into host resistance in an efficient way. Meta-analysis is a relatively inexpensive option with good potential to increase the statistical power and generalizability of single-study analysis. In the current meta-analysis research, six microarray-based studies that investigate the transcriptome profile of mammary gland tissue after induced mastitis by E. coli infection were used. This meta-analysis not only reinforced the findings in individual studies, but also revealed several novel terms, including responses to hypoxia, response to drug, anti-apoptosis and positive regulation of transcription from RNA polymerase II promoter, enriched by up-regulated genes. Finally, in order to identify the small sets of genes that are sufficiently informative in E. coli mastitis, the differentially expressed genes introduced by meta-analysis were prioritized using ten different attribute weighting algorithms. Twelve meta-genes were detected by the majority of attribute weighting algorithms (with weight above 0.7) as the most informative genes, including CXCL8 (IL8), NFKBIZ, HP, ZC3H12A, PDE4B, CASP4, CXCL2, CCL20, GRO1 (CXCL1), CFB, S100A9, and S100A8. Interestingly, the results demonstrated that all of these genes are key genes in the immune response, inflammation or mastitis.
The decision tree models efficiently discovered the best combination of the meta-genes as a bio-signature and confirmed some of the top-ranked genes (ZC3H12A, CXCL2, GRO, CFB) as biomarkers for E. coli mastitis (with an average accuracy of 83%). This research indicated that combining two novel data mining tools, meta-analysis and machine learning, increased the power to detect the most informative genes, which can help to improve the diagnosis and treatment strategies for E. coli associated with mastitis in cattle.
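The selection rule described in this abstract (genes given a weight above 0.7 by the majority of attribute weighting algorithms) can be sketched as a simple vote count. The function name, the algorithm labels, the gene placeholders, and every weight below are hypothetical, and the sketch assumes weights have already been normalized to the range 0 to 1.

```python
def consensus_features(weights_by_algorithm, weight_cutoff=0.7):
    """Return features whose weight exceeds `weight_cutoff` in more than
    half of the algorithms, sorted by name."""
    n_algorithms = len(weights_by_algorithm)
    votes = {}
    for weights in weights_by_algorithm.values():
        for feature, weight in weights.items():
            if weight > weight_cutoff:
                votes[feature] = votes.get(feature, 0) + 1
    return sorted(f for f, v in votes.items() if v > n_algorithms / 2)


# Hypothetical normalized weights from three placeholder algorithms.
weights_by_algorithm = {
    "algo_a": {"gene_1": 0.90, "gene_2": 0.80, "gene_3": 0.20},
    "algo_b": {"gene_1": 0.75, "gene_2": 0.40, "gene_3": 0.10},
    "algo_c": {"gene_1": 0.95, "gene_2": 0.71, "gene_3": 0.80},
}

print(consensus_features(weights_by_algorithm))
```

Requiring agreement across several weighting schemes is one way to reduce dependence on the bias of any single scheme, which appears to be the motivation for using ten algorithms in the study.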
Affiliation(s)
- Somayeh Sharifi
- Department of Animal Science, College of Agriculture, Isfahan University of Technology, Isfahan, Iran
- Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
| | - Abbas Pakdel
- Department of Animal Science, College of Agriculture, Isfahan University of Technology, Isfahan, Iran
| | | | - James M. Reecy
- Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
| | | | - Esmaeil Ebrahimie
- School of Medicine, The University of Adelaide, Adelaide, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Division of Information Technology, Engineering and the Environment, School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
- School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, South Australia, Australia
| |
|
30
|
Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016; 17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. METHODS To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The dataset is therefore multi-labeled, and we utilized a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. RESULT We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. CONCLUSION Host range identification is facilitated by the high-support combined rules in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
Affiliation(s)
- Fatemeh Kargarfard
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
| | - Ashkan Sami
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
| | - Manijeh Mohammadi-Dehcheshmeh
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Esmaeil Ebrahimie
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- School of Medicine, Faculty of Health Sciences, The University of Adelaide, Adelaide, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
- School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia
- School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
| |
|
31
|
Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov Today 2016; 21:718-24. [PMID: 26821132 DOI: 10.1016/j.drudis.2016.01.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2015] [Revised: 12/05/2015] [Accepted: 01/19/2016] [Indexed: 12/14/2022]
Abstract
Application of computational methods in drug discovery has received increased attention in recent years as a way to accelerate drug target prediction. Based on 443 sequence-derived protein features, we applied the most commonly used machine learning methods to predict whether a protein is druggable and to identify the superior algorithm for this task. In addition, feature selection procedures were used to obtain the best performance of each classifier according to the optimum number of features. When run on all features, Neural Network was the best classifier, with 89.98% accuracy, based on a k-fold cross-validation test. Among all the algorithms applied, the optimum number of most-relevant features was 130, according to the Support Vector Machine-Feature Selection (SVM-FS) algorithm. This study resulted in the discovery of new drug targets that can potentially be employed in cell signaling pathways, gene expression, and signal transduction. The DrugMiner web tool was developed based on the findings of this study to provide researchers with the ability to predict druggable proteins. DrugMiner is freely available at www.DrugMiner.org.
Affiliation(s)
- Ali Akbar Jamali
- Research Center for Pharmaceutical Nanotechnology (RCPN), Tabriz University of Medical Sciences, Tabriz, Iran
| | - Reza Ferdousi
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Saeed Razzaghi
- Information Technology Center, The University of Zanjan, Zanjan, Iran
| | - Jiuyong Li
- School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia
| | - Reza Safdari
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
| | - Esmaeil Ebrahimie
- School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Department of Genetics & Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, SA, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, SA, Australia.
| |
|
32
|
Zinati Z, Alemzadeh A, KayvanJoo AH. Computational approaches for classification and prediction of P-type ATPase substrate specificity in Arabidopsis. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS : AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2016; 22:163-174. [PMID: 27186030 PMCID: PMC4840148 DOI: 10.1007/s12298-016-0351-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Revised: 03/15/2016] [Accepted: 03/28/2016] [Indexed: 06/05/2023]
Abstract
As an extended gamut of integral membrane (extrinsic) proteins, and based on their transporting specificities, P-type ATPases include five subfamilies in Arabidopsis, inter alia, P4ATPases (phospholipid-transporting ATPase), P3AATPases (plasma membrane H(+) pumps), P2A and P2BATPases (Ca(2+) pumps) and P1B ATPases (heavy metal pumps). Although many different computational methods have been developed to predict substrate specificity of unknown proteins, further investigation is needed to improve the efficiency and performance of the predictors. In this study, various attribute weighting and supervised clustering algorithms were employed to identify the main amino acid composition attributes that can influence the substrate specificity of ATPase pumps, classify protein pumps and predict the substrate specificity of uncharacterized ATPase pumps. The results of this study indicate that both non-reduced coefficients pertaining to absorption and Cys extinction within 280 nm, the frequencies of hydrogen, Ala, Val, carbon, hydrophilic residues, the counts of Val, Asn, Ser, Arg, Phe, Tyr, hydrophilic residues, Phe-Phe, Ala-Ile, Phe-Leu, Val-Ala and length are specified as the most important amino acid attributes through applying the whole attribute weighting models. Here, learning algorithms engineered in a predictive machine (Naive Bayes) are proposed to predict the Q9LVV1 and O22180 substrate specificities (P-type ATPase like proteins) with 100% prediction confidence. For the first time, our analysis demonstrated a promising application of bioinformatics algorithms in classifying ATPase pumps. Moreover, we suggest predictive systems that can assist in predicting the substrate specificity of any new ATPase pumps with the maximum possible prediction confidence.
Affiliation(s)
- Zahra Zinati
- Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Shiraz, Iran
| | - Abbas Alemzadeh
- Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran
| | - Amir Hossein KayvanJoo
- Bonn-Aachen International Center for Information Technology B-IT, University of Bonn, Bonn, Germany
| |
|
33
|
Nasiri J, Naghavi MR, Kayvanjoo AH, Nasiri M, Ebrahimi M. Precision assessment of some supervised and unsupervised algorithms for genotype discrimination in the genus Pisum using SSR molecular data. J Theor Biol 2015; 368:122-32. [PMID: 25591889 DOI: 10.1016/j.jtbi.2015.01.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2014] [Revised: 11/06/2014] [Accepted: 01/01/2015] [Indexed: 10/24/2022]
Abstract
For the first time, prediction accuracies of some supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection containing 20 cultivars and 57 wild samples. In general, according to the 10 attribute weighting models, the SSR alleles of PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes for discriminating among eight different species and subspecies of the genus Pisum. In addition, K-Medoids unsupervised clustering run on the Chi squared dataset exhibited the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) was obtained when the K-Means model was run on the FCdb database. Irrespective of some fluctuations, the overall accuracies of tree induction models were significantly high for many algorithms, and the attributes PSBLOX13.2-3 and PEAPHTAP could successfully separate Pisum fulvum accessions and cultivars from the others when two selected decision trees were taken into account. Meanwhile, the other supervised algorithms used exhibited reliable overall accuracies, although in some rare cases they yielded low accuracies. Our results, altogether, demonstrate promising applications of both supervised and unsupervised algorithms as suitable data mining tools for accurate fingerprinting of different species and subspecies of the genus Pisum, a fundamental priority task in breeding programs of the crop.
Affiliation(s)
- Jaber Nasiri
- Department of Agronomy and Plant Breeding, Division of Molecular Plant Genetics, College of Agricultural & Natural Resources, University of Tehran, Karaj, Tehran, Iran.
| | - Mohammad Reza Naghavi
- Department of Agronomy and Plant Breeding, College of Agricultural & Natural Resources, University of Tehran, Karaj, Tehran, Iran.
| | | | - Mojtaba Nasiri
- School of Life Sciences, Biomedical Science, Division of Molecular Biology, University of Sussex, Falmer, Brighton, UK.
| | - Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
| |
|
34
|
Jemli S, Ayadi-Zouari D, Hlima HB, Bejar S. Biocatalysts: application and engineering for industrial purposes. Crit Rev Biotechnol 2014; 36:246-58. [DOI: 10.3109/07388551.2014.950550] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
|
35
|
New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014; 54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]
|
36
|
KayvanJoo AH, Ebrahimi M, Haqshenas G. Prediction of hepatitis C virus interferon/ribavirin therapy outcome based on viral nucleotide attributes using machine learning algorithms. BMC Res Notes 2014; 7:565. [PMID: 25150834 PMCID: PMC4246553 DOI: 10.1186/1756-0500-7-565] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2014] [Accepted: 08/10/2014] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Hepatitis C virus (HCV) causes chronic hepatitis C in 2-3% of the world population and remains one of the health-threatening human viruses worldwide. In the absence of an effective vaccine, a therapeutic approach is the only option to combat hepatitis C. The interferon-alpha (IFN-alpha) and ribavirin (RBV) combination, alone or together with recently introduced direct-acting antivirals (DAA), is used to treat patients infected with HCV. The present study utilized feature selection methods (Gini Index, Chi Squared and machine learning algorithms) and other bioinformatics tools to identify genetic determinants of therapy outcome within the entire HCV nucleotide sequence. RESULTS Using a combination of several algorithms, the present study performed a comprehensive bioinformatics analysis and identified several nucleotide attributes within the full-length nucleotide sequences of HCV subtypes 1a and 1b that correlated with treatment outcome. Feature selection algorithms identified several nucleotide features (e.g. count of hydrogen and CG). The combination of algorithms used the selected nucleotide attributes and discriminated HCV subtype 1a and 1b therapy responders from non-responders with an accuracy of 75.00% and 85.00%, respectively. In addition, therapy responders and relapsers were categorized with an accuracy of 82.50% and 84.17%, respectively. Based on the identified attributes, decision trees were induced to differentiate different therapy response groups. CONCLUSIONS The present study identified new genetic markers that potentially impact the outcome of hepatitis C treatment. In addition, the results suggest new viral genomic attributes that might influence the outcome of IFN-mediated immune response to HCV infection.
Affiliation(s)
| | - Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
| | | |
|
37
|
Shekoofa A, Emam Y, Shekoufa N, Ebrahimi M, Ebrahimie E. Determining the most important physiological and agronomic traits contributing to maize grain yield through machine learning algorithms: a new avenue in intelligent agriculture. PLoS One 2014; 9:e97288. [PMID: 24830330 PMCID: PMC4022653 DOI: 10.1371/journal.pone.0097288] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2013] [Accepted: 04/17/2014] [Indexed: 11/19/2022] Open
Abstract
Prediction is an attempt to accurately forecast the outcome of a specific situation while using input information obtained from a set of variables that potentially describe the situation. Predictions can be used to project physiological and agronomic processes; in this regard, agronomic traits such as yield can be affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits by screening, clustering, and decision tree models to select the most relevant factors for the prospect of accurately increasing maize grain yield. Decision tree models (with nearly the same performance evaluation) were the most useful tools in understanding the underlying relationships in physiological and agronomic features for selecting the most important and relevant traits (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration) corresponding to the maize grain yield. In particular, the decision tree generated by the C&RT algorithm was the best model for yield prediction based on physiological and agronomical traits, and it can be extensively employed in future breeding programs. No significant differences in the decision tree models were found when feature selection filtering was applied to the data, but a positive effect of feature selection was observed in clustering models. Finally, the results showed that the proposed modelling techniques are useful tools for crop physiologists to search through large datasets seeking patterns for the physiological and agronomic factors, and may assist the selection of the most important traits for the individual site and field. In particular, decision tree models are the method of choice, with the capability of illustrating different pathways of yield increase in breeding programs, governed by their hierarchical structure of feature ranking as well as pattern discovery via various combinations of features.
Affiliation(s)
- Avat Shekoofa
- Department of Crop Science, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Yahya Emam
- Department of Crop Production and Plant Breeding, Shiraz University, Shiraz, Iran
| | - Navid Shekoufa
- Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran
| | - Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
| | - Esmaeil Ebrahimie
- Department of Crop Production and Plant Breeding, Shiraz University, Shiraz, Iran
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
| |
|
38
|
Bakhtiarizadeh MR, Moradi-Shahrbabak M, Ebrahimi M, Ebrahimie E. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J Theor Biol 2014; 356:213-22. [PMID: 24819464 DOI: 10.1016/j.jtbi.2014.04.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Revised: 04/03/2014] [Accepted: 04/29/2014] [Indexed: 01/05/2023]
Abstract
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology-based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate and improve the common sequence alignment-based methods.
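The five-fold cross-validation protocol mentioned in this abstract can be sketched with the Python standard library alone. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM and neural network used in the study; that substitution, the function names, and the two-feature toy data are all assumptions for illustration, not the authors' setup.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x (stand-in classifier)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: dist2(centroids[label], x))


def cross_validated_accuracy(X, y, k=5):
    """Hold out each fold once, train on the remaining folds, and pool the hits."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_x = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            if nearest_centroid_predict(train_x, train_y, X[i]) == y[i]:
                correct += 1
    return correct / len(X)


# Hypothetical two-feature toy data with alternating class labels.
X = [[1.0, 1.2], [5.0, 5.1], [0.8, 1.0], [5.2, 4.9], [1.1, 0.9],
     [4.8, 5.0], [0.9, 1.1], [5.1, 5.2], [1.2, 1.0], [4.9, 4.8]]
y = ["LBP", "non-LBP"] * 5

print(cross_validated_accuracy(X, y, k=5))
```

The folds here are contiguous and unshuffled, so the toy labels are interleaved to keep both classes in every training split; with real data a shuffled or stratified split would normally be preferred.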
Affiliation(s)
- Mohammad Moradi-Shahrbabak
- Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
- Department of Crop Production & Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran; School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia.
|
39
|
Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the underlying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014; 9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023] Open
Abstract
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics.
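The abstract above names amino acid frequencies and percentages (for example of Gln, Tyr and Cys) among the key attributes. As a minimal sketch of how such composition attributes could be derived from a sequence, the following uses only the Python standard library. The function name `composition_features`, the feature keys and the toy sequence are hypothetical illustrations, not the authors' pipeline, which computed 896 characteristics per sequence.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return the count and percentage of each standard amino acid.

    Hypothetical helper for illustration only; non-standard symbols
    (gaps, X, etc.) are ignored when computing totals.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    features = {}
    for aa in AMINO_ACIDS:
        features[f"count_{aa}"] = counts[aa]
        # Guard against empty input so the percentage is always defined.
        features[f"percent_{aa}"] = 100.0 * counts[aa] / total if total else 0.0
    return features


# Toy sequence (not a real HA sequence): 8 residues, 3 of them Gln (Q).
example = composition_features("MKQQCYEQ")
print(example["count_Q"], example["percent_Q"])
```

A table of such features, one row per sequence, is the kind of input that attribute weighting and classification models could then be trained on.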
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Parisa Aghagolzadeh
- Department of Nephrology, Hypertension, and Clinical Pharmacology, University of Bern, Bern, Switzerland
- Narges Shamabadi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Mohammed Alsharifi
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- David L. Adelson
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- Farhid Hemmatzadeh
- School of Animal and Veterinary Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
|
40
|
Predictions of Enzymatic Parameters: A Mini-Review with Focus on Enzymes for Biofuel. Appl Biochem Biotechnol 2013; 171:590-615. [DOI: 10.1007/s12010-013-0328-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/09/2013] [Accepted: 06/11/2013] [Indexed: 12/25/2022]
|
41
|
Hosseinzadeh F, Kayvanjoo AH, Ebrahimi M, Goliaei B. Prediction of lung tumor types based on protein attributes by machine learning algorithms. SPRINGERPLUS 2013; 2:238. [PMID: 23888262 PMCID: PMC3710575 DOI: 10.1186/2193-1801-2-238] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/16/2013] [Accepted: 03/21/2013] [Indexed: 01/15/2023]
Abstract
Early diagnosis of lung cancers and distinction between the tumor types (Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC)) are very important to increase the survival rate of patients. Herein, we propose a diagnostic system based on sequence-derived structural and physicochemical attributes of proteins that are involved in both types of tumors via feature extraction, feature selection and prediction models. A total of 1497 protein attributes were computed, and important features were selected by 12 attribute weighting models. Finally, machine learning models consisting of seven SVM models, three ANN models and two NB models were applied to the original database and to newly created ones from the attribute weighting models; model accuracies were calculated through 10-fold cross validation and wrapper validation (just for SVM algorithms). In line with our previous findings, dipeptide composition, autocorrelation and distribution descriptor were the most important protein features selected by bioinformatics tools. The algorithms' performance in lung cancer tumor type prediction increased when they were applied to datasets created by attribute weighting models rather than the original dataset. Wrapper-Validation performed better than X-Validation; the best cancer type prediction resulted from SVM and SVM Linear models (82%). The best accuracy of ANN was obtained when the Neural Net model was applied to the SVM dataset (88%). This is the first report suggesting that the combination of protein features and attribute weighting models with machine learning algorithms can be effectively used to predict the type of lung cancer tumors (SCLC and NSCLC).
Affiliation(s)
- Faezeh Hosseinzadeh
- Laboratory of biophysics and molecular biology, Institute of Biophysics and Biochemistry (IBB), University of Tehran, Tehran, Iran
|
42
|
Chow J, Kovacic F, Dall Antonia Y, Krauss U, Fersini F, Schmeisser C, Lauinger B, Bongen P, Pietruszka J, Schmidt M, Menyes I, Bornscheuer UT, Eckstein M, Thum O, Liese A, Mueller-Dieckmann J, Jaeger KE, Streit WR. The metagenome-derived enzymes LipS and LipT increase the diversity of known lipases. PLoS One 2012; 7:e47665. [PMID: 23112831 PMCID: PMC3480424 DOI: 10.1371/journal.pone.0047665] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 06/21/2012] [Accepted: 09/13/2012] [Indexed: 11/18/2022] Open
Abstract
Triacylglycerol lipases (EC 3.1.1.3) catalyze both hydrolysis and synthesis reactions with a broad spectrum of substrates rendering them especially suitable for many biotechnological applications. Most lipases used today originate from mesophilic organisms and are susceptible to thermal denaturation whereas only few possess high thermotolerance. Here, we report on the identification and characterization of two novel thermostable bacterial lipases identified by functional metagenomic screenings. Metagenomic libraries were constructed from enrichment cultures maintained at 65 to 75 °C and screened resulting in the identification of initially 10 clones with lipolytic activities. Subsequently, two ORFs were identified encoding lipases, LipS and LipT. Comparative sequence analyses suggested that both enzymes are members of novel lipase families. LipS is a 30.2 kDa protein and revealed a half-life of 48 h at 70 °C. The lipT gene encoded for a multimeric enzyme with a half-life of 3 h at 70 °C. LipS had an optimum temperature at 70 °C and LipT at 75 °C. Both enzymes catalyzed hydrolysis of long-chain (C(12) and C(14)) fatty acid esters and additionally hydrolyzed a number of industry-relevant substrates. LipS was highly specific for (R)-ibuprofen-phenyl ester with an enantiomeric excess (ee) of 99%. Furthermore, LipS was able to synthesize 1-propyl laurate and 1-tetradecyl myristate at 70 °C with rates similar to those of the lipase CalB from Candida antarctica. LipS represents the first example of a thermostable metagenome-derived lipase with significant synthesis activities. Its X-ray structure was solved with a resolution of 1.99 Å revealing an unusually compact lid structure.
Affiliation(s)
- Jennifer Chow
- Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany
- Filip Kovacic
- Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Yuliya Dall Antonia
- European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
- Ulrich Krauss
- Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Francesco Fersini
- European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
- Christel Schmeisser
- Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany
- Benjamin Lauinger
- Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Patrick Bongen
- Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Joerg Pietruszka
- Institute of Bioorganic Chemistry, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Marlen Schmidt
- Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
- Ina Menyes
- Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
- Uwe T. Bornscheuer
- Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
- Marrit Eckstein
- Bioprocess Development Consumer Specialties and Biocatalysis Biotechnology, Evonik Industries AG, Essen, Germany
- Oliver Thum
- Bioprocess Development Consumer Specialties and Biocatalysis Biotechnology, Evonik Industries AG, Essen, Germany
- Andreas Liese
- Institute of Technical Biocatalysis, Hamburg University of Technology, Hamburg, Germany
- Jochen Mueller-Dieckmann
- European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany
- Karl-Erich Jaeger
- Institute of Molecular Enzyme Technology, Heinrich Heine University Duesseldorf, Research Center Juelich, Juelich, Germany
- Wolfgang R. Streit
- Department of Microbiology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Hamburg, Germany
|
43
|
Moghadam AA, Taghavi SM, Niazi A, Djavaheri M, Ebrahimie E. Isolation and in silico functional analysis of MtATP6, a 6-kDa subunit of mitochondrial F₁F₀-ATP synthase, in response to abiotic stress. GENETICS AND MOLECULAR RESEARCH 2012; 11:3547-67. [PMID: 23096681 DOI: 10.4238/2012.october.4.3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/03/2022]
Abstract
Mitochondrial F(1)F(0)-ATP synthase is a key enzymatic complex of energy metabolism that provides ATP for the cell. Subunits of this enzyme are overexpressed under stress conditions. Little is known about the structure and regulatory mechanism of the F(0) portion of this enzyme. We isolated the full-length coding sequence of the RMtATP6 gene from rice and wheat, and partial sequences from Aegilops crassa and Triticum monococcum (Poaceae). We found that the sequence of rice RMtATP6 is 1965 bp long and contains two exons and one intron in the 3'-UTR. Then, we analyzed the 2000-bp region upstream of the initiation codon ATG of RMtATP6 and AtMtATP6 as the promoter. The RMtATP6 coding sequence was found to be highly conserved in the different plant species, possibly because of its key role under stress conditions. Promoter analysis demonstrated that RMtATP6 and AtMtATP6 include cis-acting elements such as ABRE, MYC/MYB, and GT element in the upstream region, which respond to the stress hormone abscisic acid and might indicate vital roles in biotic and abiotic tolerance as an early-stress responsive gene. A mitochondrial signal peptide of 30 amino acids in length and an N-terminal cleavage site between amino acids 20 and 21 were discovered in RMtATP6. In addition, we found a transmembrane domain with an alpha helix structure that possibly passed through the mitochondrial inner membrane and established the 6-kDa subunit in the F(0) portion of the enzyme complex. Apparently, under stress conditions, with increasing ATP consumption by the cell, the 6-kDa subunit accumulates; by switching on F(1)F(0)-ATP synthase it provides additional energy needed for cell homeostasis.
Affiliation(s)
- A A Moghadam
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
|
44
|
A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms. PLoS One 2012; 7:e44164. [PMID: 22957050 PMCID: PMC3434224 DOI: 10.1371/journal.pone.0044164] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 06/10/2012] [Accepted: 07/30/2012] [Indexed: 11/19/2022] Open
Abstract
Various methods have been used to identify cultivars of olive trees; herein we used different bioinformatics algorithms to propose new tools to classify 10 cultivars of olive based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset was fully able to cluster each olive cultivar into the right class. All trees (176) induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) were also able to predict the right class of olive cultivars with 100% accuracy. For the first time, our results showed that data mining techniques can be effectively used to distinguish between plant cultivars, and the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
|
45
|
Hosseinzadeh F, Ebrahimi M, Goliaei B, Shamabadi N. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models. PLoS One 2012; 7:e40017. [PMID: 22829872 PMCID: PMC3400626 DOI: 10.1371/journal.pone.0040017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/27/2012] [Accepted: 05/30/2012] [Indexed: 12/03/2022] Open
Abstract
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, classifying proteins and the prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting, supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in classification of SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed by most tree induction algorithms, while descriptors of hydrophobicity distribution were high in protein sequences COMMON in both groups and distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, compositions of polar dipeptide in SCLC proteins were higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (calculated on leave-one-out and 10-fold cross validation) showed more than 86% accuracy in clustering and predicting three different lung cancer tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors regarding the importance of dipeptide composition, autocorrelation and distribution descriptor has been reported.
Affiliation(s)
- Faezeh Hosseinzadeh
- Student at Laboratory of Biophysics and Molecular Biology, Institute of Biophysics and Biochemistry, University of Tehran, Tehran, Iran
- Mansour Ebrahimi
- Department of Biology at Basic science School & Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
- Bahram Goliaei
- Department of Medical Physics, Iran University of Medical Science, Tehran, Iran
- Narges Shamabadi
- Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
|
46
|
Bakhtiarizadeh MR, Ebrahimi M, Ebrahimie E. Discovery of EST-SSRs in lung cancer: tagged ESTs with SSRs lead to differential amino acid and protein expression patterns in cancerous tissues. PLoS One 2011; 6:e27118. [PMID: 22073269 PMCID: PMC3208562 DOI: 10.1371/journal.pone.0027118] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/08/2011] [Accepted: 10/11/2011] [Indexed: 11/18/2022] Open
Abstract
Tandem repeats are found in both coding and non-coding sequences of higher organisms. These sequences can be used in cancer genetics and diagnosis to unravel the genetic basis of tumor formation and progression. In this study, a possible relationship between SSR distributions and lung cancer was studied by comparative analysis of EST-SSRs in normal and lung cancerous tissues. While the EST-SSR distribution was similar between tumorous tissues, this distribution was different between normal and tumorous tissues. Trinucleotide tandem repeats were highly different; the number of trinucleotides in ESTs of lung cancer was 3 times higher than in normal tissue. Significant negative correlation between normal and cancerous tissue showed that cancerous tissue generates different types of trinucleotides. GGC and CGC were the most frequently expressed trinucleotides in cancerous tissue, but these SSRs were not expressed in normal tissue. Similar to the EST level, the expression pattern of EST-SSR-derived amino acids was significantly different between normal and cancerous tissues. Arg, Pro, Ser, Gly, and Lys were the most abundant amino acids in cancerous tissues, and Leu, Cys, Phe, and His were significantly more abundant in normal tissues than in cancerous tissues. Next, the putative functions of triplet SSR-containing genes were analyzed. In cancerous tissue, EST-SSRs produce different types of proteins. Chromodomain helicase DNA binding proteins were one of the major protein products of EST-SSRs in the cancerous library, while these proteins were not produced from EST-SSRs in normal tissue. For the first time, the findings of this study confirmed that EST-SSRs in normal lung tissues are different from those in unhealthy tissues, and tagged ESTs with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue.
We suggest that EST-SSRs and EST-SSRs differentially expressed in cancerous tissue may be suitable candidate markers for lung cancer diagnosis and prediction.
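The comparison described above rests on locating trinucleotide tandem repeats in EST sequences and tallying them per motif. The following is a minimal sketch of that counting step using a regular expression. The minimum repeat number (`MIN_REPEATS = 4`), the function names and the toy sequences are assumptions made for illustration; the abstract does not state the SSR detection tool or thresholds the authors used.

```python
import re
from collections import Counter

# Assumed threshold (not taken from the paper): a 3-base motif must occur
# at least this many times in a row to be reported as an SSR.
MIN_REPEATS = 4


def find_trinucleotide_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (motif, start, repeat_count) for each trinucleotide SSR.

    The motif is reported in the reading phase where the scan first
    matches, so GGC, GCG and CGG can describe the same repeat tract.
    """
    seq = sequence.upper()
    # One captured 3-mer followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # AAA, CCC, ... are mononucleotide runs, not trinucleotide SSRs
        hits.append((motif, match.start(), len(match.group(0)) // 3))
    return hits


def count_motifs(sequences, min_repeats=MIN_REPEATS):
    """Tally how many SSR tracts of each motif occur across many sequences."""
    tally = Counter()
    for seq in sequences:
        for motif, _start, _repeats in find_trinucleotide_ssrs(seq, min_repeats):
            tally[motif] += 1
    return tally


# Toy sequences, not real ESTs.
hits = find_trinucleotide_ssrs("TTGGCGGCGGCGGCGGCAA")
tally = count_motifs(["TTGGCGGCGGCGGCGGCAA", "ACGTACGT", "AAAAAAAAAAAA"])
print(hits, dict(tally))
```

Running the tally separately on a normal-tissue library and a cancerous-tissue library would give per-motif counts that can then be compared, which is the kind of contrast the abstract reports for GGC and CGC.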
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology & Bioinformatics Research Group, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
|