1
Berrar D, Lopes P, Dubitzky W. Incorporating domain knowledge in machine learning for soccer outcome prediction. Mach Learn 2018. [DOI: 10.1007/s10994-018-5747-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
2
Giannini R, Torregrossa L, Gottardi S, Fregoli L, Borrelli N, Savino M, Macerola E, Vitti P, Miccoli P, Basolo F. Digital gene expression profiling of a series of cytologically indeterminate thyroid nodules. Cancer Cytopathol 2015; 123:461-70. [PMID: 26033834 DOI: 10.1002/cncy.21564] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/03/2015] [Revised: 04/24/2015] [Accepted: 04/30/2015] [Indexed: 12/29/2022]
Abstract
BACKGROUND Fine-needle aspiration cytology (FNAC) has been widely accepted as the most crucial step in the preoperative assessment of thyroid nodules. Testing for the expression of specific genes should improve the accuracy of FNAC diagnosis, especially when it is performed in samples with indeterminate cytology. METHODS In total, 69 consecutive FNACs that had both cytologic and histologic diagnoses were collected, and expression levels of 34 genes were determined in RNA extracted from FNAC cells by using a custom digital mRNA counting assay. A supervised k-nearest neighbor (K-nn) learning approach was used to build a 2-class prediction model based on a subset of 27 benign and 26 malignant FNAC samples. Then, the K-nn models were used to classify the 16 indeterminate FNAC samples. RESULTS Malignant and benign thyroid nodules had different gene expression profiles. The K-nn approach was able to correctly classify 10 FNAC samples as benign, whereas only 1 sample was grouped in the malignant class. Two malignant FNAC samples were incorrectly classified as benign, and 3 of 16 samples were unclassified. CONCLUSIONS Although the current data will require further confirmation in a larger number of cases, the preliminary results indicate that testing for specific gene expression appears to be useful for distinguishing between benign and malignant lesions. The results from this study indicate that, in indeterminate FNAC samples, testing for cancer-specific gene expression signatures, together with mutational analyses, could improve diagnostic accuracy for patients with thyroid nodules.
Affiliation(s)
- Riccardo Giannini
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
- Lorenzo Fregoli
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
- Nicla Borrelli
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
- Elisabetta Macerola
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
- Paolo Vitti
- Department of Experimental and Clinical Medicine, University of Pisa, Pisa, Italy
- Paolo Miccoli
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
- Fulvio Basolo
- Department of Surgical, Medical, and Molecular Pathology and Critical Care, University of Pisa, Pisa, Italy
3
Chakraborty D, Maulik U. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning. IEEE Journal of Translational Engineering in Health and Medicine 2014; 2:4300211. [PMID: 27170887 PMCID: PMC4848046 DOI: 10.1109/jtehm.2014.2375820] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/08/2014] [Revised: 09/20/2014] [Accepted: 11/22/2014] [Indexed: 11/07/2022]
Abstract
Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expression. Microarray-based gene expression profiling can be used to identify genes whose expression changes in response to pathogens or other organisms by comparing gene expression in infected cells or tissues with that in uninfected ones. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers that are dysregulated between normal and malignant tissues. However, small sample size remains a bottleneck in designing successful classification methods. On the other hand, an adequate number of microarray data that do not have clinical knowledge can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S(3)VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S(3)VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers.
4
Gevaert O, De Moor B. Prediction of cancer outcome using DNA microarray technology: past, present and future. ACTA ACUST UNITED AC 2013; 3:157-65. [PMID: 23485162 DOI: 10.1517/17530050802680172] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/25/2022]
Abstract
BACKGROUND The use of DNA microarray technology to predict cancer outcome already has a history of almost a decade. Although many breakthroughs have been made, the promise of individualized therapy is still not fulfilled. In addition, new technologies are emerging that also show promise in outcome prediction of cancer patients. OBJECTIVE The impact of DNA microarray and other 'omics' technologies on the outcome prediction of cancer patients was investigated. Whether integration of omics data results in better predictions was also examined. METHODS DNA microarray technology was focused on as a starting point because this technology is considered to be the most mature technology from all omics technologies. Next, emerging technologies that may accomplish the same goals but have been less extensively studied are described. CONCLUSION Besides DNA microarray technology, other omics technologies have shown promise in predicting the cancer outcome or have potential to replace microarray technology in the near future. Moreover, it is shown that integration of multiple omics data can result in better predictions of cancer outcome; but, owing to the lack of comprehensive studies, validation studies are required to verify which omics has the most information and whether a combination of multiple omics data improves predictive performance.
Affiliation(s)
- Olivier Gevaert
- Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD-Sista, Kasteelpark Arenberg 10, 3001 Leuven, Belgium +32 16 328646 ; +32 16 32 ;
5
Nazeer KAA, Sebastian MP, Kumar SDM. A novel harmony search-K means hybrid algorithm for clustering gene expression data. Bioinformation 2013; 9:84-8. [PMID: 23390351 PMCID: PMC3563403 DOI: 10.6026/97320630009084] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/18/2012] [Accepted: 12/21/2012] [Indexed: 11/24/2022]
Abstract
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
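The dependence of k-means on its initial centroids, which the abstract above names as a drawback, can be illustrated with a minimal sketch. This is plain Lloyd's k-means on hypothetical 1-D toy data, not the HSKH algorithm; the function name `kmeans_1d` and the data points are assumptions made only for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's k-means on 1-D data, started from the given centroids.

    Returns (final_centroids, sse). Illustrative helper only.
    """
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: move each centroid to its cluster mean (kept if empty).
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, sse


# Hypothetical toy data with three obvious groups.
points = [0, 1, 10, 11, 20, 21]
good_centroids, good_sse = kmeans_1d(points, [0, 10, 20])
poor_centroids, poor_sse = kmeans_1d(points, [0, 1, 10])
print(good_sse, poor_sse)
```

Both runs converge, but the second start leaves two centroids stuck on the first group and a much larger squared error, which is the kind of local optimum that a global search over initializations is meant to avoid.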
Affiliation(s)
- KA Abdul Nazeer
- Department of Computer Science and Engineering, National Institute of Technology Calicut, India- 673 601
- MP Sebastian
- Information Technology and Systems Area, Indian Institute of Management Kozhikode, India- 673 570
- SD Madhu Kumar
- Department of Computer Science and Engineering, National Institute of Technology Calicut, India- 673 601
6
Use of yeast chemigenomics and COXEN informatics in preclinical evaluation of anticancer agents. Neoplasia 2011; 13:72-80. [PMID: 21253455 DOI: 10.1593/neo.101214] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 08/22/2010] [Revised: 10/07/2010] [Accepted: 10/09/2010] [Indexed: 11/18/2022]
Abstract
Bladder cancer metastasis is virtually incurable with current platinum-based chemotherapy. We used the novel COXEN informatic approach for in silico drug discovery and identified NSC-637993 and NSC-645809 (C1311), both imidazoacridinones, as agents with high-predicted activity in human bladder cancer. Because even highly effective monotherapy is unlikely to cure most patients with metastasis and NSC-645809 is undergoing clinical trials in other tumor types, we sought to develop the basis for use of C1311 in rational combination with other agents in bladder cancer. Here, we demonstrate in 40 human bladder cancer cells that the in vitro cytotoxicity profile for C1311 correlates with that of NSC-637993 and compares favorably to that of standard of care chemotherapeutics. Using genome-wide patterns of synthetic lethality of C1311 with open reading frame knockouts in budding yeast, we determined that combining C1311 with a taxane could provide mechanistically rational combinations. To determine the preclinical relevance of these yeast findings, we evaluated C1311 singly and in doublet combination with paclitaxel in human bladder cancer in the in vivo hollow fiber assay and observed efficacy. By applying COXEN to gene expression data from 40 bladder cancer cell lines and 30 human tumors with associated clinical response data to platinum-based chemotherapy, we provide evidence that signatures of C1311 sensitivity exist within nonresponders to this regimen. Coupling COXEN and yeast chemigenomics provides rational combinations with C1311 and tumor genomic signatures that can be used to select bladder cancer patients for clinical trials with this agent.
7
Chuang LY, Yang CH, Li JC, Yang CH. A hybrid BPSO-CGA approach for gene selection and classification of microarray data. J Comput Biol 2011; 19:68-82. [PMID: 21210743 DOI: 10.1089/cmb.2010.0064] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
Abstract
Microarray analysis promises to detect variations in gene expression and changes in the transcription rates of an entire genome in vivo. Microarray gene expression profiles indicate the relative abundance of mRNA corresponding to the genes. The selection of relevant genes from microarray data poses a formidable challenge to researchers due to the high dimensionality of features, the multiclass categories involved, and the usually small sample size. A classification process is often employed which decreases the dimensionality of the microarray data. In order to correctly analyze microarray data, the goal is to find an optimal subset of features (genes) which adequately represents the original set of features. A hybrid method of binary particle swarm optimization (BPSO) and a combat genetic algorithm (CGA) is used to perform the microarray data selection. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier. The proposed BPSO-CGA approach is applied to ten microarray data sets from the literature. The experimental results indicate that the proposed method not only effectively reduces the number of gene expression levels, but also achieves a low classification error rate.
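The role of K-NN with LOOCV as the evaluator of a candidate gene subset can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `knn_predict` and `loocv_error`, the Euclidean distance, and the toy samples are all hypothetical, and the BPSO-CGA search that would propose the feature subsets is not shown.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to `query`."""
    by_distance = sorted(
        train,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(sample[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def loocv_error(samples, feature_idx, k=1):
    """Leave-one-out error rate of k-NN restricted to the selected features."""
    projected = [([x[i] for i in feature_idx], y) for x, y in samples]
    errors = 0
    for i, (features, label) in enumerate(projected):
        # Hold out sample i and classify it from all the others.
        rest = projected[:i] + projected[i + 1:]
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(projected)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [
    ([0.1, 5.0], "A"), ([0.2, 1.0], "A"), ([0.3, 9.0], "A"),
    ([0.9, 5.1], "B"), ([1.0, 1.1], "B"), ([1.1, 9.1], "B"),
]
error_informative = loocv_error(samples, [0])
error_misleading = loocv_error(samples, [1])
print(error_informative, error_misleading)
```

A wrapper-style search would call something like `loocv_error` once per candidate subset and prefer subsets with a lower error and fewer features.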
Affiliation(s)
- Li-Yeh Chuang
- Department of Chemical Engineering, and Institute of Biotechnology and Chemical Engineering, I-Shou University Kaohsiung, Taiwan
8
Chuang LY, Yang CH, Yang CH. Tabu search and binary particle swarm optimization for feature selection using microarray data. J Comput Biol 2010; 16:1689-703. [PMID: 20047491 DOI: 10.1089/cmb.2007.0211] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In cancer type classification research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied to 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.
Affiliation(s)
- Li-Yeh Chuang
- Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
9
Yao B, Li S. ANMM4CBR: a case-based reasoning method for gene expression data classification. Algorithms Mol Biol 2010; 5:14. [PMID: 20051140 PMCID: PMC2843690 DOI: 10.1186/1748-7188-5-14] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 08/04/2009] [Accepted: 01/06/2010] [Indexed: 11/20/2022]
Abstract
BACKGROUND Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms. METHOD In order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data. RESULTS The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contain a high level of noise. AVAILABILITY The source code is attached as an additional file of this paper.
Affiliation(s)
- Bangpeng Yao
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China
- Shao Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China
10
Biddiss EA, Chau TT. Multivariate prediction of upper limb prosthesis acceptance or rejection. Disabil Rehabil Assist Technol 2009; 3:181-92. [PMID: 19238719 DOI: 10.1080/17483100701869826] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
Abstract
OBJECTIVE To develop a model for prediction of upper limb prosthesis use or rejection. DESIGN A questionnaire exploring factors in prosthesis acceptance was distributed internationally to individuals with upper limb absence through community-based support groups and rehabilitation hospitals. SUBJECTS A total of 191 participants (59 prosthesis rejecters and 132 prosthesis wearers) were included in this study. METHODS A logistic regression model, a C5.0 decision tree, and a radial basis function neural network were developed and compared in terms of sensitivity (prediction of prosthesis rejecters), specificity (prediction of prosthesis wearers), and overall cross-validation accuracy. RESULTS The logistic regression and neural network provided comparable overall accuracies of approximately 84 +/- 3%, specificity of 93%, and sensitivity of 61%. Fitting time-frame emerged as the predominant predictor. Individuals fitted within two years of birth (congenital) or six months of amputation (acquired) were 16 times more likely to continue prosthesis use. CONCLUSIONS To increase rates of prosthesis acceptance, clinical directives should focus on timely, client-centred fitting strategies and the development of improved prostheses and healthcare for individuals with high-level or bilateral limb absence. Multivariate analyses are useful in determining the relative importance of the many factors involved in prosthesis acceptance and rejection.
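As a quick consistency check on the figures reported above, overall accuracy can be recovered as the group-size-weighted mean of sensitivity and specificity. The sketch below uses only the numbers stated in the abstract and assumes the reported rates apply to the full sample of 191 participants; the variable names are illustrative.

```python
# Group sizes reported in the abstract.
rejecters, wearers = 59, 132
# Reported rates: sensitivity refers to rejecters, specificity to wearers.
sensitivity, specificity = 0.61, 0.93

# Expected number of correctly classified participants in each group.
correct = sensitivity * rejecters + specificity * wearers
accuracy = correct / (rejecters + wearers)
print(round(accuracy, 3))
```

The result of roughly 83% lies inside the reported range of approximately 84 +/- 3%, so the three figures are mutually consistent under this assumption.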
11
TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 2009; 10:56. [PMID: 19210774 PMCID: PMC2653487 DOI: 10.1186/1471-2105-10-56] [Citation(s) in RCA: 142] [Impact Index Per Article: 9.5] [Received: 05/28/2008] [Accepted: 02/11/2009] [Indexed: 02/03/2023]
Abstract
Background Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification is an essential task in the analysis of metagenomics data sets that is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy until rank class. For longer fragments ≥ 3 Kbp accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, thus classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparable high specificity results up to rank class and low false negative rates are also obtained. Conclusion An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved to be competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date.
12
Yang CS, Chuang LY, Ke CH, Yang CH. A Combination of Shuffled Frog-Leaping Algorithm and Genetic Algorithm for Gene Selection. Journal of Advanced Computational Intelligence and Intelligent Informatics 2008. [DOI: 10.20965/jaciii.2008.p0218] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
Microarray data on gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that effectively achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA), and chooses genes (features) related to classification. The K-nearest neighbor (KNN) with leave-one-out cross validation (LOOCV) is used to evaluate classification accuracy. We apply a novel hybrid approach based on SFLA-GA and KNN classification to 11 classification problems from the literature. Experimental results show that classification accuracy obtained using selected features was higher than the accuracy of datasets without feature selection.