1
|
Jafari O, Ebrahimi M, Hedayati SAA, Zeinalabedini M, Poorbagher H, Nasrolahpourmoghadam M, Fernandes JMO. Integration of Morphometrics and Machine Learning Enables Accurate Distinction between Wild and Farmed Common Carp. Life (Basel, Switzerland) 2022; 12:life12070957. [PMID: 35888047 PMCID: PMC9315565 DOI: 10.3390/life12070957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 06/16/2022] [Accepted: 06/20/2022] [Indexed: 11/16/2022]
Abstract
Morphology and feature selection are key approaches to address several issues in fisheries science and stock management, such as the hypothesis of admixture of Caspian common carp (Cyprinus carpio) and farmed carp stocks in Iran. The present study was performed to investigate the population classification of common carp in the southern Caspian basin using data mining algorithms to find the most important characteristic(s) differing between Iranian and farmed common carp. A total of 74 individuals were collected from three locations within the southern Caspian basin and from one farm between November 2015 and April 2016. A dataset of 26 traditional morphometric (TMM) attributes and a dataset of 14 geometric landmark points were constructed and then subjected to various machine learning methods. In general, the machine learning methods had a higher prediction rate with TMM datasets. The highest decision tree accuracy of 77% was obtained by rule and decision tree parallel algorithms, and “head height on eye area” was selected as the best marker to distinguish between wild and farmed common carp. Various machine learning algorithms were evaluated, and we found that the linear discriminant was the best method, with 81.1% accuracy. The results obtained from this novel approach indicate that Darwin’s domestication syndrome is observed in common carp. Moreover, they pave the way for automated detection of farmed fish, which will be most beneficial to detect escapees and improve restocking programs.
Affiliation(s)
- Omid Jafari
- International Sturgeon Research Institute, Iranian Fisheries Science Research Institute, Agricultural Research, Education and Extension Organization, Rasht 416353464, Iran
- Correspondence: (O.J.); (J.M.O.F.)
| | - Mansour Ebrahimi
- Department of Biology, School of Basic Science, University of Qom, Qom 3716146611, Iran;
| | - Seyed Ali-Akbar Hedayati
- Department of Fisheries, Faculty of Fisheries and Environmental Sciences, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan 4913815739, Iran;
| | - Mehrshad Zeinalabedini
- Department of Genomics, Agricultural Biotechnology Research Institute of Iran (ABRII), Karaj 3135933151, Iran;
| | - Hadi Poorbagher
- Department of Fisheries Sciences, Faculty of Natural Resources, University of Tehran, Karaj 3158777871, Iran; (H.P.); (M.N.)
| | - Maryam Nasrolahpourmoghadam
- Department of Fisheries Sciences, Faculty of Natural Resources, University of Tehran, Karaj 3158777871, Iran; (H.P.); (M.N.)
| | - Jorge M. O. Fernandes
- Faculty of Biosciences and Aquaculture, Nord University, 8026 Bodø, Norway
- Correspondence: (O.J.); (J.M.O.F.)
|
2
|
Machine learning and statistics to qualify environments through multi-traits in Coffea arabica. PLoS One 2021; 16:e0245298. [PMID: 33434204 PMCID: PMC7802962 DOI: 10.1371/journal.pone.0245298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2020] [Accepted: 12/25/2020] [Indexed: 11/30/2022] Open
Abstract
Several factors such as genotype, environment, and post-harvest processing can affect the responses of important traits in the coffee production chain. Determining the influence of these factors is of great relevance, as they can be indicators of the characteristics of the coffee produced. The choice of the most efficient models should take into account the variety of information and the particularities of each biological material. This study was developed to evaluate statistical and machine learning models that would better discriminate environments through multi-traits of coffee genotypes and to identify the main agronomic and beverage quality traits responsible for the variation of the environments. For that, 31 morpho-agronomic and post-harvest traits were evaluated, from field experiments installed in three municipalities in the Matas de Minas region, in the State of Minas Gerais, Brazil. Two types of post-harvest processing were evaluated: natural and pulped. The apparent error rate was estimated for each method. The Multilayer Perceptron and Radial Basis Function networks were able to discriminate the coffee samples in multi-environment more efficiently than the other methods, identifying differences in multi-trait responses according to the production sites and type of post-harvest processing. The local factors did not present specific traits that favored the severity of diseases and differentiated vegetative vigor. The sensory traits acidity and fragrance/aroma score also made little contribution to the discrimination process, indicating that acidity and fragrance/aroma are characteristic of the coffee produced and that all coffee samples evaluated are of the special type in the Matas de Minas region. The main traits responsible for the differentiation of production sites are plant height, fruit size, and bean production. The sensory trait "Body" is the main one to discriminate the form of post-harvest processing.
|
3
|
Wu CY, Chan CH, Dubey NK, Wei HJ, Lu JH, Chang CC, Cheng HC, Ou KL, Deng WP. Highly Expressed FOXF1 Inhibit Non-Small-Cell Lung Cancer Growth via Inducing Tumor Suppressor and G1-Phase Cell-Cycle Arrest. Int J Mol Sci 2020; 21:ijms21093227. [PMID: 32370197 PMCID: PMC7246752 DOI: 10.3390/ijms21093227] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/12/2020] [Revised: 04/29/2020] [Accepted: 04/30/2020] [Indexed: 12/13/2022] Open
Abstract
Cancer pathogenesis results from genetic alteration-induced high or low transcriptional programs, which become highly dependent on regulators of gene expression. However, their role in progressive regulation of non-small-cell lung cancer (NSCLC) and how these dependencies may offer opportunities for novel therapeutic options remain to be understood. Previously, we identified forkhead box F1 (FOXF1) as a reprogramming mediator which leads to stemness when mesenchymal stem cells fuse with lung cancer cells, and we now examine its effect on lung cancer through establishing lowly and highly expressing FOXF1 NSCLC engineered cell lines. Higher expression of FOXF1 was enabled in cell lines through lentiviral transduction, and their viability, proliferation, and anchorage-dependent growth were assessed. Flow cytometry and Western blot were used to analyze cellular percentage in cell-cycle phases and levels of cellular cyclins, respectively. In mice, tumorigenic behavior of FOXF1 was investigated. We found that FOXF1 was downregulated in lung cancer tissues and cancer cell lines. Cell proliferation and ability of migration, anchorage-independent growth, and transformation were inhibited in H441-FOXF1H and H1299-FOXF1H, with upregulated tumor suppressor p21 and suppressed cellular cyclins, leading to cell-cycle arrest at the gap 1 (G1) phase. Mice injected with H441-FOXF1H and H1299-FOXF1H showed reduced tumor size. In conclusion, highly expressing FOXF1 inhibited NSCLC growth via activating tumor suppressor p21 and G1 cell-cycle arrest, thus offering a potentially novel therapeutic strategy for lung cancer.
Affiliation(s)
- Chia-Yu Wu
- Division of Oral and Maxillofacial Surgery, Department of Dentistry, Taipei Medical University Hospital, Taipei 11031, Taiwan;
- School of Dental Technology, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Chun-Hao Chan
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Stem Cell Research Center, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Navneet Kumar Dubey
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Stem Cell Research Center, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Hong-Jian Wei
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Stem Cell Research Center, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Jui-Hua Lu
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Stem Cell Research Center, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Chun-Chao Chang
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Taipei Medical University Hospital, Taipei 11031, Taiwan;
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Hsin-Chung Cheng
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Department of Dentistry, Taipei Medical University Hospital, Taipei 11031, Taiwan
| | - Keng-Liang Ou
- Department of Dentistry, Taipei Medical University-Shuang Ho Hospital, New Taipei City 23561, Taiwan;
- 3D Global Biotech Inc., New Taipei City 22175, Taiwan
| | - Win-Ping Deng
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan; (C.-H.C.); (N.K.D.); (H.-J.W.); (J.-H.L.); (H.-C.C.)
- Stem Cell Research Center, College of Oral Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Graduate Institute of Basic Medicine, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Correspondence:
|
4
|
|
5
|
Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Characterization of bovine (Bos taurus) imprinted genes from genomic to amino acid attributes by data mining approaches. PLoS One 2019; 14:e0217813. [PMID: 31170205 PMCID: PMC6553745 DOI: 10.1371/journal.pone.0217813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/17/2018] [Accepted: 05/21/2019] [Indexed: 01/05/2023] Open
Abstract
Genomic imprinting results in monoallelic expression of genes in mammals and flowering plants. Understanding the function of imprinted genes improves our knowledge of the regulatory processes in the genome. In this study, we have employed classification and clustering algorithms with attribute weighting to specify the unique attributes of both imprinted (monoallelic) and biallelic expressed genes. We have obtained characteristics of 22 known monoallelically expressed (imprinted) and 8 biallelic expressed genes that have been experimentally validated alongside 208 randomly selected genes in bovine (Bos taurus). Attribute weighting methods and various supervised and unsupervised machine learning algorithms were applied. Unique characteristics were discovered and used to distinguish mono and biallelic expressed genes from each other in bovine. To estimate classification accuracy, 10-fold cross-validation was used for each combination of attribute weighting (feature selection) method and machine learning algorithm. Our approach was able to accurately predict mono and biallelic genes using the genomics and proteomics attributes.
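The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. The function name `k_fold_indices` is hypothetical, the sample count of 238 simply adds the 22, 8 and 208 genes quoted above (the abstract does not say they were pooled this way), and the split is sequential with no shuffling or stratification, which the authors may well have used.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (train, test) index folds.

    Illustrative only: no shuffling and no class stratification.
    """
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical usage: 238 = 22 + 8 + 208 genes, an assumption about pooling.
folds = k_fold_indices(238, k=10)
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy over the ten folds gives one estimate per combination of feature-selection method and classifier.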
Affiliation(s)
- Keyvan Karami
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Saeed Zerehdaran
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Ali Javadmanesh
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mohammad Mahdi Shariati
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Hossein Fallahi
- Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
|
6
|
Karami K, Zerehdaran S, Javadmanesh A, Shariati MM, Fallahi H. Attribute selection and model evaluation for the maternal and paternal imprinted genes in bovine (Bos Taurus) using supervised machine learning algorithms. J Anim Breed Genet 2019; 136:205-216. [DOI: 10.1111/jbg.12379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2018] [Revised: 12/06/2018] [Accepted: 12/06/2018] [Indexed: 11/29/2022]
Affiliation(s)
- Keyvan Karami
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Saeed Zerehdaran
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Ali Javadmanesh
- Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
| | | | - Hossien Fallahi
- Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
|
7
|
Alanazi IO, Al Shehri ZS, Ebrahimie E, Giahi H, Mohammadi-Dehcheshmeh M. Non-coding and coding genomic variants distinguish prostate cancer, castration-resistant prostate cancer, familial prostate cancer, and metastatic castration-resistant prostate cancer from each other. Mol Carcinog 2019; 58:862-874. [PMID: 30644608 DOI: 10.1002/mc.22975] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/11/2022]
Abstract
A considerable number of deposited variants has provided new possibilities for knowledge discovery in different types of prostate cancer. Here, we analyzed variants located on 3'UTR, 5'UTR, CDs, Intergenic, and Intronic regions in castration-resistant prostate cancer (8496 variants), familial prostate cancer (3241 variants), metastatic castration-resistant prostate cancer (3693 variants), and prostate cancer (16599 variants). Chromosome regions 10p15-p14 and 2p13 were highly enriched (P < 0.00001) for variants located in 3'UTR, 5'UTR, CDs, intergenic, and intronic regions in castration-resistant prostate cancer. In contrast, 10p15-p14, 10q23.3, 12q13.11, 13q12.3, 1q25, and 8p22 regions were enriched (P < 0.001) in familial prostate cancer. In metastatic castration-resistant prostate cancer, 10p15-p14, 10q23.3, 11q22-q23, 14q21.1, and 14q32.13 were highly variant regions (P < 0.001). Chromosome 2 and chromosome 1 hosted many enriched variant regions. AKR1C3, BRCA1, BRCA2, CHGA, CYP19A1, HOXB13, KLK3, and PTEN contained the highest number of 3'UTR, 5'UTR, CDs, Intergenic, and Intronic variants. Network analysis showed that these genes are upstream of important functions including prostate gland development, tumor recurrence, prostate cancer-specific survival, tumor progression, cancer mortality, long-term survival, cancer recurrence, angiogenesis, and AR. Interestingly, all of EGFR, JAK2, NR3C1, PDZD2, and SEMA3C genes had single nucleotide polymorphisms (SNP) in castration-resistant prostate cancer, consistent with high selection pressure on these genes during drug treatment and consequent resistance. High occurrence of variants in 3'UTRs suggests the importance of regulatory variants in different types of prostate cancer; an area that has been neglected compared with coding variants. This study provides a comprehensive overview of genomic regions contributing to different types of prostate cancer.
Affiliation(s)
- Ibrahim O Alanazi
- National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
| | - Zafer S Al Shehri
- Clinical Laboratory Department, College of Applied Medical Sciences, Shaqra University, KSA, Al dawadmi, Saudi Arabia
| | - Esmaeil Ebrahimie
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
| | - Hassan Giahi
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Manijeh Mohammadi-Dehcheshmeh
- Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, Australia
|
8
|
K T, N KV, S S. Distribution based Fuzzy Estimate Spectral Clustering for Cancer Detection with Protein Sequence and Structural Motifs. Asian Pac J Cancer Prev 2018; 19:1935-1940. [PMID: 30051675 PMCID: PMC6165630 DOI: 10.22034/apjcp.2018.19.7.1935] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
Objective: In biological data analysis, protein sequence and structural motifs are widespread amino-acid sequence patterns that are used as tools for detecting cancer at an earlier stage. To improve cancer detection with minimum space and time complexity, the Distribution based Fuzzy Estimate Spectral Clustering (DFESC) technique is developed. Methods: Initially, the protein sequence motifs are taken from the dataset to form the clusters. Distribution based spectral clustering is applied to group the protein sequences by measuring the generalized Jaccard similarity between each pair of protein sequences. To improve the clustering accuracy, a soft computing technique, namely fuzzy logic, is applied to calculate the membership value of each sequence motif. Results: The outcome showed that the presented DFESC technique effectively identifies cancer in terms of clustering accuracy, false positive rate, cancer detection time, and space complexity. Conclusion: Based on the observations, evaluation of the DFESC technique provides improved results for premature detection of cancer using protein sequence and structural motifs.
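The generalized Jaccard similarity named in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how sequences are turned into vectors, so the sketch assumes plain residue-count vectors, and the function name `generalized_jaccard` is hypothetical. The measure itself is the standard sum of element-wise minima divided by the sum of element-wise maxima.

```python
from collections import Counter


def generalized_jaccard(seq_a: str, seq_b: str) -> float:
    """Generalized Jaccard similarity of two sequences.

    Assumption: each sequence is represented by its residue counts.
    Similarity = sum(min(a_i, b_i)) / sum(max(a_i, b_i)).
    """
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    symbols = set(counts_a) | set(counts_b)
    numerator = sum(min(counts_a[s], counts_b[s]) for s in symbols)
    denominator = sum(max(counts_a[s], counts_b[s]) for s in symbols)
    # Two empty sequences are treated as identical by convention.
    return numerator / denominator if denominator else 1.0


print(generalized_jaccard("AAB", "ABB"))  # 0.5
```

A pairwise matrix of such similarities is the kind of affinity input a spectral clustering step would consume.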
Affiliation(s)
- Thenmozhi K
- Department of Computer Applications, Selvam College of Technology, Namakkal, TamilNadu, India,For Correspondence:
| | | | - Shanthi S
- Department of Computer Applications, Kongu Engineering College, Erode, TamilNadu, India
|
9
|
Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016; 17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. METHODS To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The dataset is therefore multi-labeled, and we utilized a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. RESULT We found informative rules in all segments that are common within the same host class but varied between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. CONCLUSION Host range identification is facilitated by high support combined rules in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi host viruses, because such viruses are likely to cause pandemic or disastrous epidemics.
Affiliation(s)
- Fatemeh Kargarfard
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
| | - Ashkan Sami
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
| | - Manijeh Mohammadi-Dehcheshmeh
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Esmaeil Ebrahimie
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- School of Medicine, Faculty of Health Sciences, The University of Adelaide, Adelaide, Australia
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
- School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia
- School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
|
10
|
Pradeepkiran JA, Sainath SB, Kumar KK, Balasubramanyam L, Prabhakar KV, Bhaskar M. CGMD: An integrated database of cancer genes and markers. Sci Rep 2015; 5:12035. [PMID: 26160459 PMCID: PMC4498195 DOI: 10.1038/srep12035] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/28/2015] [Accepted: 06/12/2015] [Indexed: 11/09/2022] Open
Abstract
Integrating cancer genes and markers with experimental evidence might provide valuable information for the further investigation of crosstalk between tumor genes and markers in cancer biology. To achieve this objective, we developed a database known as the Cancer Gene Marker Database (CGMD), which integrates data on tumor genes and markers based on experimental evidence. The major goal of CGMD is to provide the following: 1) current systematic treatment approaches and recent advances in different cancer treatments; 2) the aggregation of different genes and markers by their molecular characteristics and pathway associations; and 3) free access to the data compiled by CGMD at http://cgmd.in/. The database consists of 309 genes and 206 markers, as well as a list of 40 different human cancers, with detailed descriptions of all characterized markers. CGMD provides complete cancer annotations and molecular descriptions of cancer genes and markers such as CpG islands, promoters, exons, PDB structures, active sites and domains.
Affiliation(s)
- Jangampalli Adi Pradeepkiran
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati-517502, Andhra Pradesh, India
| | - Sri Bhashyam Sainath
- [1] Department of Biotechnology, Vikrama Simhapuri University, Nellore-524 003, Andhra Pradesh, India [2] CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
| | - Konidala Kramthi Kumar
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati-517502, Andhra Pradesh, India
| | - Lokanada Balasubramanyam
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati-517502, Andhra Pradesh, India
| | - Kodali Vidya Prabhakar
- Department of Biotechnology, Vikrama Simhapuri University, Nellore-524 003, Andhra Pradesh, India
| | - Matcha Bhaskar
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati-517502, Andhra Pradesh, India
|
11
|
New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014; 54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]
|
12
|
Saaber F, Chen Y, Cui T, Yang L, Mireskandari M, Petersen I. Expression of desmogleins 1-3 and their clinical impacts on human lung cancer. Pathol Res Pract 2014; 211:208-13. [PMID: 25468811 DOI: 10.1016/j.prp.2014.10.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/03/2014] [Revised: 10/14/2014] [Accepted: 10/23/2014] [Indexed: 11/16/2022]
Abstract
AIMS Desmogleins (DSGs) are components of the cell-cell connecting desmosomes. Desmosomal proteins have been found dysregulated in various cancers. Here we studied the role of DSGs in human lung cancer. METHODS Expression of DSG1-3 mRNA in lung cancer cell lines and human bronchial epithelial cells (HBEC) was examined by real time RT-PCR. Methylation status of DSG1-2 was evaluated by demethylation test and bisulfite sequencing (BS). Moreover, DSG1-3 protein expression was analysed in 112 primary lung tumor samples by immunohistochemistry (IHC) on tissue microarrays. RESULTS It turned out that DSG1-3 was downregulated in most of the lung cancer cell lines. Reexpression of DSG2 and DSG3 was found in several cancer cell lines after demethylation treatment with 5-aza-2'-deoxycytidine (DAC), a DNA methyltransferase inhibitor. Complete or partial methylation of DSG2 promoter region was detected in 5 out of 6 cancer cell lines by BS. In primary lung tumors, higher protein expression of DSG2 and DSG3 correlated to the diagnosis of squamous cell lung carcinoma (SCC) (P=0.009 and P<0.001, respectively), additionally, a lower expression of DSG3 was significantly linked to higher tumor grade (P=0.012). CONCLUSIONS Our data suggest that downregulation of DSG2 and DSG3 could be partially explained by DNA methylation. DSG2 and DSG3 might be potential diagnostic markers for SCC, and DSG3 could be a potential differentiation marker for lung cancer.
Affiliation(s)
- Friederike Saaber
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany
| | - Yuan Chen
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany
| | - Tiantian Cui
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany
| | - Linlin Yang
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany
| | - Masoud Mireskandari
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany
| | - Iver Petersen
- Institute of Pathology, Jena University Hospital, Ziegelmühlenweg 1, 07740 Jena, Germany.
|
13
|
KayvanJoo AH, Ebrahimi M, Haqshenas G. Prediction of hepatitis C virus interferon/ribavirin therapy outcome based on viral nucleotide attributes using machine learning algorithms. BMC Res Notes 2014; 7:565. [PMID: 25150834 PMCID: PMC4246553 DOI: 10.1186/1756-0500-7-565] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/27/2014] [Accepted: 08/10/2014] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Hepatitis C virus (HCV) causes chronic hepatitis C in 2-3% of world population and remains one of the health threatening human viruses, worldwide. In the absence of an effective vaccine, therapeutic approach is the only option to combat hepatitis C. Interferon-alpha (IFN-alpha) and ribavirin (RBV) combination alone or in combination with recently introduced new direct-acting antivirals (DAA) is used to treat patients infected with HCV. The present study utilized feature selection methods (Gini Index, Chi Squared and machine learning algorithms) and other bioinformatics tools to identify genetic determinants of therapy outcome within the entire HCV nucleotide sequence. RESULTS Using combination of several algorithms, the present study performed a comprehensive bioinformatics analysis and identified several nucleotide attributes within the full-length nucleotide sequences of HCV subtypes 1a and 1b that correlated with treatment outcome. Feature selection algorithms identified several nucleotide features (e.g. count of hydrogen and CG). Combination of algorithms utilized the selected nucleotide attributes and predicted HCV subtypes 1a and 1b therapy responders from non-responders with an accuracy of 75.00% and 85.00%, respectively. In addition, therapy responders and relapsers were categorized with an accuracy of 82.50% and 84.17%, respectively. Based on the identified attributes, decision trees were induced to differentiate different therapy response groups. CONCLUSIONS The present study identified new genetic markers that potentially impact the outcome of hepatitis C treatment. In addition, the results suggest new viral genomic attributes that might influence the outcome of IFN-mediated immune response to HCV infection.
Affiliation(s)
| | - Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
|
14
|
Bakhtiarizadeh MR, Moradi-Shahrbabak M, Ebrahimi M, Ebrahimie E. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J Theor Biol 2014; 356:213-22. [PMID: 24819464 DOI: 10.1016/j.jtbi.2014.04.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 01/15/2014] [Revised: 04/03/2014] [Accepted: 04/29/2014] [Indexed: 01/05/2023]
Abstract
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods.
Affiliation(s)
- Mohammad Moradi-Shahrbabak
- Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
- Department of Crop Production & Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran; School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia.
|
15
|
Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014; 9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023] Open
Abstract
The evolution of the influenza A virus to increase its host range is a major concern worldwide. The molecular mechanisms that increase host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physicochemical attributes that govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physicochemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models that can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by the attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross-validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), the percentage/frequency of Tyr, the percentage of Cys, and the frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed a high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which an influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
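The simplest attributes named in this abstract, per-residue frequency and percentage, can be sketched as follows. This covers only amino acid composition, not the full set of 896 characteristics, and it assumes that "frequency" means a raw count and "percentage" means the share of standard residues in the sequence. The function name `composition_features` and the feature keys are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return per-residue counts and percentages for a protein sequence.

    Assumption: non-standard symbols (X, gaps, etc.) are ignored when
    computing the percentage denominator.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    features = {}
    for aa in AMINO_ACIDS:
        features[f"count_{aa}"] = counts[aa]
        features[f"percent_{aa}"] = 100.0 * counts[aa] / total if total else 0.0
    return features
```

Calling `composition_features` on each sequence yields one fixed-length feature dictionary per protein, which is the tabular shape that attribute weighting and classification algorithms expect.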
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Parisa Aghagolzadeh
- Department of Nephrology, Hypertension, and Clinical Pharmacology, University of Bern, Bern, Switzerland
- Narges Shamabadi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Mohammed Alsharifi
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- David L. Adelson
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- Farhid Hemmatzadeh
- School of Animal and Veterinary Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
|
16
|
Hosseinzadeh F, Kayvanjoo AH, Ebrahimi M, Goliaei B. Prediction of lung tumor types based on protein attributes by machine learning algorithms. SPRINGERPLUS 2013; 2:238. [PMID: 23888262 PMCID: PMC3710575 DOI: 10.1186/2193-1801-2-238] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/16/2013] [Accepted: 03/21/2013] [Indexed: 01/15/2023]
Abstract
Early diagnosis of lung cancer and distinction between the tumor types, Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC), are very important for increasing the survival rate of patients. Herein, we propose a diagnostic system based on sequence-derived structural and physicochemical attributes of proteins involved in both types of tumors, using feature extraction, feature selection and prediction models. A total of 1497 protein attributes were computed, and important features were selected by 12 attribute weighting models. Machine learning models, consisting of seven SVM models, three ANN models and two NB models, were then applied to the original database and to the new datasets created by the attribute weighting models; model accuracies were calculated through 10-fold cross-validation and wrapper validation (the latter for SVM algorithms only). In line with our previous findings, dipeptide composition, autocorrelation and distribution descriptors were the most important protein features selected by bioinformatics tools. The algorithms' performance in lung cancer tumor type prediction increased when they were applied to datasets created by attribute weighting models rather than to the original dataset. Wrapper-Validation performed better than X-Validation; the best cancer type prediction resulted from the SVM and SVM Linear models (82%). The best ANN accuracy was obtained when the Neural Net model was applied to the SVM dataset (88%). This is the first report suggesting that the combination of protein features and attribute weighting models with machine learning algorithms can be effectively used to predict the type of lung cancer tumors (SCLC and NSCLC).
Affiliation(s)
- Faezeh Hosseinzadeh
- Laboratory of biophysics and molecular biology, Institute of Biophysics and Biochemistry (IBB), University of Tehran, Tehran, Iran
|
17
|
Molecular classification of non-small-cell lung cancer: diagnosis, individualized treatment, and prognosis. Front Med 2013; 7:157-71. [PMID: 23681892 DOI: 10.1007/s11684-013-0272-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 02/01/2013] [Accepted: 04/19/2013] [Indexed: 12/16/2022]
Abstract
Non-small-cell lung cancer (NSCLC) is the most common cause of premature death among the malignant diseases worldwide. The current staging criteria do not fully capture the complexity of this disease. Molecular biology techniques, particularly gene expression microarrays, proteomics, and next-generation sequencing, have recently been developed to facilitate its molecular classification effectively. The underlying etiology, pathogenesis, therapeutics, and prognosis of NSCLC based on an improved molecular classification scheme may promote individualized treatment and improve clinical outcomes. This review focuses on the molecular classification of NSCLC based on gene expression microarray technology reported during the past decade, as well as its applications for improving the diagnosis, staging and treatment of NSCLC, including the discovery of prognostic markers or potential therapeutic targets. We highlight some of the recent studies that may refine the identification of NSCLC subtypes using novel techniques such as epigenetics, proteomics, or deep sequencing.
|
18
|
Ramani RG, Jacob SG. Improved classification of lung cancer tumors based on structural and physicochemical properties of proteins using data mining models. PLoS One 2013; 8:e58772. [PMID: 23505559 PMCID: PMC3591381 DOI: 10.1371/journal.pone.0058772] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/22/2012] [Accepted: 02/06/2013] [Indexed: 11/22/2022] Open
Abstract
Detecting divergence between oncogenic tumors plays a pivotal role in cancer diagnosis and therapy. This research work focused on designing a computational strategy to predict the class of lung cancer tumors from the structural and physicochemical properties (1497 attributes) of protein sequences obtained from genes defined by microarray analysis. The proposed methodology involved the use of hybrid feature selection techniques (gain ratio and correlation based subset evaluators with Incremental Feature Selection) followed by Bayesian Network prediction to discriminate lung cancer tumors as Small Cell Lung Cancer (SCLC), Non-Small Cell Lung Cancer (NSCLC) and the COMMON classes. Moreover, this methodology eliminated the need for extensive data cleansing strategies on the protein properties and revealed the optimal and minimal set of features that contributed to lung cancer tumor classification with an improved accuracy compared to previous work. We also attempted to predict via supervised clustering the possible clusters in the lung tumor data. Our results revealed that supervised clustering algorithms exhibited poor performance in differentiating the lung tumor classes. Hybrid feature selection identified the distribution of solvent accessibility, polarizability and hydrophobicity as the highest-ranked features, and Incremental Feature Selection combined with Bayesian Network prediction generated the optimal jackknife cross-validation accuracy of 87.6%. Precise categorization of oncogenic genes causing SCLC and NSCLC based on the structural and physicochemical properties of their protein sequences is expected to unravel the functionality of proteins that are essential in maintaining the genomic integrity of a cell and also act as an informative source for drug design, targeting essential protein properties and their composition that are found to exist in lung cancer tumors.
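The Incremental Feature Selection step mentioned in this abstract can be sketched generically: features are added one at a time in ranked order, and the prefix with the best evaluation score is kept. The sketch below assumes the ranking (for example, by gain ratio) and the scoring function (for example, cross-validated classifier accuracy) are supplied by the caller. The name `incremental_feature_selection` and its parameters are hypothetical and do not reproduce the authors' implementation.

```python
def incremental_feature_selection(ranked_features, score_subset):
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    score_subset: caller-supplied function mapping a list of feature
        names to a numeric score (higher is better).
    """
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = score_subset(current)
        # Keep the smallest prefix that reaches the highest score.
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

Because ties do not replace the current best, the function favors the shortest prefix that reaches the top score, which matches the goal of a minimal feature set.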
Affiliation(s)
- R. Geetha Ramani
- Department of Information Science and Technology, College of Engineering, Guindy, Anna University, Chennai, Tamilnadu, India
- Shomona Gracia Jacob
- Faculty of Information and Communication Engineering, Anna University, Chennai, Tamilnadu, India
|
19
|
A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms. PLoS One 2012; 7:e44164. [PMID: 22957050 PMCID: PMC3434224 DOI: 10.1371/journal.pone.0044164] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 06/10/2012] [Accepted: 07/30/2012] [Indexed: 11/19/2022] Open
Abstract
Various methods have been used to identify olive cultivars; herein, we used different bioinformatics algorithms to propose new tools for classifying 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset assigned every olive cultivar to its correct class. All 176 trees induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) also predicted the correct class of olive cultivars with 100% accuracy. For the first time, our results show that data mining techniques can be effectively used to distinguish between plant cultivars, and that the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
|