1
Performance Analysis of Ovarian Cancer Detection and Classification for Microarray Gene Data. BioMed Research International 2022; 2022:6750457. [PMID: 35872866 PMCID: PMC9307352 DOI: 10.1155/2022/6750457] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/14/2022] [Accepted: 06/30/2022] [Indexed: 11/18/2022]
Abstract
Ovarian cancer is the third most common gynecologic cancer, after cervical and uterine cancer, and is a serious concern for women. In ovarian cancer, abnormal cells form and spread throughout the body. Ovarian cancer microarray data can be used for diagnosis and prognosis. Such data typically contain tens of thousands of genes, so selecting the most critical genes or attributes in the dataset is necessary to reduce computational complexity. Because microarray datasets have few samples and many features, classifier performance suffers; dimensionality reduction is therefore essential to retain the genes relevant to disease classification. In this research, the ANOVA method is first used for gene selection, and then two clustering-based methods (Fuzzy C Means and the Softmax Discriminant Algorithm, SDA) and three transform-based feature extraction methods (Hilbert Transform, Fast Fourier Transform (FFT), and Discrete Cosine Transform (DCT)) are used to refine the selected genes further. Six classifiers then classify the features as normal or abnormal. The NLR classifier gives the highest accuracy, 92%, for SDA features, and KNN gives the lowest accuracy, 55%, for SDA, Hilbert, and DCT features. With correlation distance feature selection, the NLR classifier attains the lowest accuracy of 53%, and the GMM classifier obtains the highest accuracy of 88%.
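The first step of this pipeline, ANOVA-based gene selection, can be sketched as ranking every gene by its one-way ANOVA F statistic and keeping the top-ranked genes. The snippet below is a minimal NumPy sketch, not the authors' code: it assumes samples are rows and genes are columns, and the function names and the `n_top` cut-off are hypothetical.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic for every gene (column) of X.

    X: (n_samples, n_genes) expression matrix; y: class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    # F = between-group mean square / within-group mean square
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_genes(X, y, n_top):
    """Column indices of the n_top genes with the largest F statistic (hypothetical helper)."""
    scores = anova_f_scores(X, y)
    return np.argsort(scores)[::-1][:n_top]
```

Genes whose between-class variance is large relative to their within-class variance score highest; the feature extraction and classification stages would then operate only on the retained columns.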
2
Osorio M, Martinez E, Naranjo T, Castro C. Recent Advances in Polymer Nanomaterials for Drug Delivery of Adjuvants in Colorectal Cancer Treatment: A Scientific-Technological Analysis and Review. Molecules 2020; 25:E2270. [PMID: 32408538 PMCID: PMC7288015 DOI: 10.3390/molecules25102270] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/19/2020] [Revised: 04/30/2020] [Accepted: 05/01/2020] [Indexed: 12/22/2022]
Abstract
Colorectal cancer (CRC) is the cancer type with the second-highest morbidity. Recently, a large number of bioactive compounds and encapsulation techniques have been developed. This paper therefore reviews drug delivery strategies for chemotherapy adjuvant treatments for CRC, including an initial scientific-technological analysis of the papers and patents related to cancer, CRC, and adjuvant treatments. For 2018, a total of 167,366 cancer-related papers and 306,240 patents were found. Adjuvant treatments represented 39.3% of all CRC patents, indicating the importance of adjuvants in patient prognosis. Chemotherapy adjuvants can be divided into two groups: natural and synthetic (5-fluorouracil and its derivatives). Both groups can be encapsulated using polymers, and polymer-based drug delivery systems can be classified according to polymer nature. Among these, anionic polymers have garnered the most attention because they are pH responsive. The use of polymers tailors the desorption profile, improving drug bioavailability and enhancing the local treatment of CRC via oral administration. Finally, it can be concluded that antioxidants are emerging compounds that can complement current chemotherapy treatments. In the long term, encapsulated antioxidants will replace synthetic drugs and will play an important role in curing CRC.
Affiliation(s)
- Marlon Osorio
  - School of Engineering, Universidad Pontificia Bolivariana, Circular 1 # 70-01, Medellín 050031, Colombia
- Estefanía Martinez
  - School of Engineering, Universidad Pontificia Bolivariana, Circular 1 # 70-01, Medellín 050031, Colombia
- Tonny Naranjo
  - School of Health Sciences, Universidad Pontificia Bolivariana, Calle 78 B # 72 A-109, Medellín 050034, Colombia
  - Medical and Experimental Mycology Group, Corporación para Investigaciones Biológicas, Carrera 72 A # 78 B-141, Medellín 050034, Colombia
- Cristina Castro
  - School of Engineering, Universidad Pontificia Bolivariana, Circular 1 # 70-01, Medellín 050031, Colombia
3
Han X, Li D, Liu P, Wang L. Feature selection by recursive binary gravitational search algorithm optimization for cancer classification. Soft Comput 2020. [DOI: 10.1007/s00500-019-04203-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
4
Incorporating Particle Swarm Optimization into Improved Bacterial Foraging Optimization Algorithm Applied to Classify Imbalanced Data. Symmetry (Basel) 2020. [DOI: 10.3390/sym12020229] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Abstract
In this paper, particle swarm optimization (PSO) is incorporated into an improved bacterial foraging optimization (BFO) algorithm, which is applied to classifying imbalanced data, to address the tendency of the original BFO to fall into local optima. The borderline synthetic minority oversampling technique (Borderline-SMOTE) and Tomek links are used to pre-process the imbalanced data, and the proposed algorithm is then used to classify it. The proposed algorithm makes three changes. First, the chemotaxis process is improved: PSO performs an initial search, and its results are treated as bacteria, which improves the global search ability of BFO. Second, the reproduction operation is improved, including the cost-based criterion for selecting which bacteria survive. Finally, the elimination and dispersal operation is improved, and a population evolution factor is introduced to prevent the population from stagnating and falling into a local optimum. Three data sets are used to test the performance of the proposed algorithm, and the simulation results show that its classification accuracy is better than that of existing approaches.
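The Tomek-link cleaning step mentioned above can be illustrated on its own. A pair of samples forms a Tomek link when they carry different labels and each is the other's nearest neighbour. This NumPy sketch finds such pairs and drops the majority-class member of each, which is one common variant; the abstract does not say which variant the authors used, and neither Borderline-SMOTE oversampling nor the PSO-BFO classifier is shown. Function names are hypothetical.

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two samples have different labels and
    each is the other's nearest neighbour under Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; a point is not its own neighbour.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links

def remove_tomek_majority(X, y, majority_label):
    """Drop the majority-class member of every Tomek link (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(y)) if k not in drop]
    return X[keep], y[keep]
```

Removing these borderline majority samples widens the gap between classes, which is why Tomek-link cleaning is often paired with an oversampler such as SMOTE.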
5
Zararsız G, Goksuluk D, Korkmaz S, Eldem V, Zararsiz GE, Duru IP, Ozturk A. A comprehensive simulation study on classification of RNA-Seq data. PLoS One 2017; 12:e0182507. [PMID: 28832679 PMCID: PMC5568128 DOI: 10.1371/journal.pone.0182507] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/18/2017] [Accepted: 07/19/2017] [Indexed: 02/02/2023]
Abstract
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging, powerful approach for diagnosis, disease classification, and monitoring at the molecular level, as well as for identifying potential disease markers. Most statistical methods proposed for the classification of gene-expression data either assume a continuous scale (e.g., microarray data) or require a normal distribution. Hence, these methods cannot be applied directly to RNA-Seq data, which violate both their data-structure and distributional assumptions. However, these algorithms can be applied to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to transform the data so that they more closely resemble microarray data and then apply microarray-based classifiers. In this study, we compared several classifiers, including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method, on model performance. A comprehensive simulation study was conducted, and its results were compared with those from two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and the number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention to data overdispersion.
We conclude that, as a count-based classifier, the power-transformed PLDA and, as microarray-based classifiers, vst- or rlog-transformed RF and SVM may be good choices for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
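To make the count-based route concrete, here is a simplified Poisson classifier in the spirit of PLDA: each class gets a per-gene Poisson rate, each sample a size factor for library depth, and a new sample is assigned to the class with the highest Poisson log-likelihood plus log prior. It is a sketch under stated assumptions (a pseudo-count `beta`, size factors taken as library size over mean library size, no power transformation), not the MLSeq implementation; all names are hypothetical.

```python
import numpy as np

def fit_poisson_classifier(X, y, beta=1.0):
    """Estimate per-class, per-gene Poisson rates from a raw count matrix.

    X: (n_samples, n_genes) counts; y: class labels. beta is an assumed
    pseudo-count that keeps every rate strictly positive.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    lib = X.sum(axis=1)
    mean_lib = lib.mean()
    size = lib / mean_lib                      # size factor s_i per sample
    classes = np.unique(y)
    # lambda_kj: expected count of gene j in class k for a sample with s_i = 1
    rates = np.array([(X[y == c].sum(axis=0) + beta) / (size[y == c].sum() + beta)
                      for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, rates, priors, mean_lib

def predict_poisson(model, X_new):
    classes, rates, priors, mean_lib = model
    X_new = np.asarray(X_new, dtype=float)
    s = X_new.sum(axis=1) / mean_lib           # size factor of each new sample
    mu = s[:, None, None] * rates[None, :, :]  # expected counts, (n_new, n_classes, n_genes)
    # Poisson log-likelihood up to terms that do not depend on the class
    scores = (X_new[:, None, :] * np.log(mu) - mu).sum(axis=2) + np.log(priors)
    return classes[scores.argmax(axis=1)]
```

The size factor matters because raw counts scale with sequencing depth; note that a new sample with a total count of zero would make `np.log(mu)` undefined, so such samples need to be filtered out first.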
Affiliation(s)
- Gökmen Zararsız
  - Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark, 38039, Kayseri, Turkey
  - Department of Biostatistics, Erciyes University, Kayseri, Turkey
- Dincer Goksuluk
  - Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark, 38039, Kayseri, Turkey
  - Department of Biostatistics, Hacettepe University, Ankara, Turkey
- Selcuk Korkmaz
  - Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark, 38039, Kayseri, Turkey
  - Department of Biostatistics, Hacettepe University, Ankara, Turkey
- Vahap Eldem
  - Department of Biology, Istanbul University, Istanbul, Turkey
- Ahmet Ozturk
  - Department of Biostatistics, Erciyes University, Kayseri, Turkey
6
Anand D, Pandey B, Pandey DK. Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology. International Journal of E-Health and Medical Communications 2017. [DOI: 10.4018/ijehmc.2017040103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The genetic diagnosis of neuromuscular disorders is an active area of research. Microarrays are used to detect changes in genes for accurate diagnosis. However, the number of genes in gene expression data is very large compared with the number of samples, so it needs to be reduced for correct diagnosis. In this paper, the authors build an intelligent integrated model for the clustering and diagnosis of neuromuscular diseases. The Wilcoxon signed-rank test is used to preselect the genes. K-means and hierarchical clustering algorithms with different distance metrics are employed to cluster the genes. Three classifiers, namely linear discriminant analysis, quadratic discriminant analysis, and k-nearest neighbor, are used. A balanced facioscapulohumeral muscular dystrophy dataset is used to evaluate the integrated techniques. A comparative analysis of these integrated algorithms shows that combining hierarchical clustering using the cosine distance metric with k-nearest neighbor gives the best performance measures.
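The best-performing combination reported above pairs cosine distance with k-nearest neighbour classification. A minimal NumPy sketch of k-NN voting under cosine distance (1 minus cosine similarity) follows; it assumes samples are rows over already preselected genes and omits both the Wilcoxon preselection and the hierarchical clustering step. Names and the default `k=3` are hypothetical.

```python
import numpy as np
from collections import Counter

def cosine_distance_matrix(A, B):
    """1 - cosine similarity between every row of A and every row of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_unit @ B_unit.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k training samples with the smallest cosine distance."""
    d = cosine_distance_matrix(X_test, X_train)
    nearest = np.argsort(d, axis=1)[:, :k]
    y_train = np.asarray(y_train)
    return [Counter(y_train[row]).most_common(1)[0][0] for row in nearest]
```

Cosine distance compares the direction of expression profiles rather than their magnitude, so two samples with proportionally scaled expression are treated as identical.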
Affiliation(s)
- Divya Anand
  - Department of Computer Science and Engineering, Lovely Professional University, Phagwara, India
- Babita Pandey
  - Department of Computer Applications, Lovely Professional University, Phagwara, India
7
Motieghader H, Najafi A, Sadeghi B, Masoudi-Nejad A. A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. Informatics in Medicine Unlocked 2017. [DOI: 10.1016/j.imu.2017.10.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 01/06/2023]
8
Kim YM, Delen D. Medical informatics research trend analysis: A text mining approach. Health Informatics J 2016; 24:432-452. [PMID: 30376768 DOI: 10.1177/1460458216678443] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 12/26/2022]
Abstract
The objective of this research is to identify the major subject areas of medical informatics and explore how they change over time. As such, it can inform the field about where medical informatics research has been and where it is heading. By identifying subject areas, this study also identifies the development trends and boundaries of medical informatics as an academic field. To conduct the study, we first identified 26,307 articles in PubMed archives that were published in the top medical informatics journals between 2002 and 2013. Then, employing a semi-automated, text-mining-based analytic approach, we clustered the major research topics by analyzing the most frequently appearing subject terms extracted from the abstracts of these articles. The results indicated that some subject areas, such as the biomedical area, are declining, while others, such as health information technology (HIT), Internet-enabled research, and electronic medical/health records (EMR/EHR), are growing. The changes within the research subject areas can largely be attributed to the increasing capabilities and use of HIT. The Internet, for example, has changed the way medical research is conducted in the health care field. While discovering new medical knowledge through clinical and biological experiments is important, the use of EMR/EHR has enabled researchers to discover novel medical insights buried deep inside massive data sets; hence, data analytics research has become a common complement in the medical field and is rapidly growing in popularity.
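The core of the trend analysis, counting how often subject terms occur in abstracts per period and comparing periods, can be sketched in plain Python. This is an assumption-laden simplification: it uses raw substring matching against a hand-picked vocabulary and does not reproduce the authors' term extraction or topic clustering. All names are hypothetical.

```python
from collections import Counter

def term_counts_by_period(records, vocabulary):
    """Count how many abstracts in each period mention each subject term.

    records: iterable of (period, abstract_text) pairs, e.g. (2002, "...").
    vocabulary: subject terms to track, written in lower case.
    """
    counts = {}
    for period, text in records:
        text = text.lower()
        bucket = counts.setdefault(period, Counter())
        for term in vocabulary:
            if term in text:  # naive substring match (assumption)
                bucket[term] += 1
    return counts

def trend(counts, term, first, last):
    """Change in the number of abstracts mentioning `term` between two periods."""
    return counts.get(last, Counter())[term] - counts.get(first, Counter())[term]
```

Substring matching is crude (a short term such as `ehr` also matches inside longer words), so a real analysis would tokenize the text and normalise terms before counting.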
9
Intraoperative Diagnosis Support Tool for Serous Ovarian Tumors Based on Microarray Data Using Multicategory Machine Learning. Int J Gynecol Cancer 2016; 26:104-13. [PMID: 26512784 DOI: 10.1097/igc.0000000000000566] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
OBJECTIVES Serous borderline ovarian tumors (SBOTs) are a subtype of serous ovarian carcinoma with atypical proliferation. Frozen-section diagnosis has been used as an intraoperative diagnostic tool to support fertility-sparing surgery, diagnosing SBOTs with an accuracy of 48% to 79%. Using DNA microarray technology, we designed multicategory classification models to support frozen-section diagnosis within 30 minutes. MATERIALS AND METHODS We systematically evaluated 6 machine learning algorithms and 3 feature selection methods using 5-fold cross-validation and a grid search on microarray data obtained from the National Center for Biotechnology Information. To validate the models and selected biomarkers, expression profiles were analyzed in tissue samples obtained from the Yonsei University College of Medicine. RESULTS The optimal machine learning model achieved a best accuracy of 97.3%. In addition, 5 features, including the expression of the putative biomarkers SNTN and AOX1, were selected to differentiate between normal, SBOT, and serous ovarian carcinoma groups. Different expression levels of SNTN and AOX1 were validated by real-time quantitative reverse-transcription polymerase chain reaction, Western blotting, and immunohistochemistry. A multinomial logistic regression model using SNTN and AOX1 alone was used to construct a simple-to-use equation that gave a diagnostic test accuracy of 91.9%. CONCLUSIONS We identified 2 biomarkers, SNTN and AOX1, that are likely involved in the pathogenesis and progression of ovarian tumors. Applying the equation together with expression analysis of SNTN and AOX1 would offer a new, accurate tool for diagnosing ovarian tumor subclasses alongside frozen-section diagnosis within 30 minutes.
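The abstract reports a multinomial logistic regression on SNTN and AOX1 alone. The general form of such a model, with normal tissue taken as the reference class, can be written as follows; the coefficients β are placeholders because the fitted values are not given in this abstract, and the choice of reference class is an assumption.

```latex
\begin{aligned}
\eta_k(\mathbf{x}) &= \beta_{k0} + \beta_{k1}\, x_{\mathrm{SNTN}} + \beta_{k2}\, x_{\mathrm{AOX1}},
  \qquad k \in \{\mathrm{SBOT}, \mathrm{SOC}\} \\
P(y = k \mid \mathbf{x}) &= \frac{e^{\eta_k(\mathbf{x})}}
  {1 + e^{\eta_{\mathrm{SBOT}}(\mathbf{x})} + e^{\eta_{\mathrm{SOC}}(\mathbf{x})}} \\
P(y = \mathrm{normal} \mid \mathbf{x}) &= \frac{1}
  {1 + e^{\eta_{\mathrm{SBOT}}(\mathbf{x})} + e^{\eta_{\mathrm{SOC}}(\mathbf{x})}}
\end{aligned}
```

A sample is assigned to the class with the largest probability; the three probabilities always sum to one, which is what makes a two-gene equation usable as a bedside rule once the β values are fitted.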
10
Mishra S, Mishra D. Adaptive multi-classifier fusion approach for gene expression dataset based on probabilistic theory. J Korean Stat Soc 2015. [DOI: 10.1016/j.jkss.2014.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
11
Park JS, Choi SB, Chung JW, Kim SW, Kim DW. Classification of serous ovarian tumors based on microarray data using multicategory support vector machines. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2015; 2014:3430-3. [PMID: 25570728 DOI: 10.1109/embc.2014.6944360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Ovarian cancer, the most fatal of the reproductive cancers, is the fifth leading cause of cancer death among women in the United States. Serous borderline ovarian tumors (SBOTs) are considered to be earlier or less malignant forms of serous ovarian carcinomas (SOCs). SBOTs are asymptomatic, and progression to advanced stages is common. Using DNA microarray technology, we designed multicategory classification models to discriminate ovarian cancer subclasses. To develop multicategory classification models with optimal parameters and features, we systematically evaluated three machine learning algorithms and three feature selection methods using five-fold cross-validation and a grid search. The study included 22 subjects with normal ovarian surface epithelial cells, 12 with SBOTs, and 79 with SOCs, using microarray data with 54,675 probe sets obtained from the National Center for Biotechnology Information Gene Expression Omnibus repository. The optimal model, one-versus-rest support vector machines with signal-to-noise feature selection, gave an accuracy of 97.3%, relative classifier information of 0.916, and a kappa index of 0.941. In addition, 5 features, including the expression of the putative biomarkers SNTN and AOX1, were selected to differentiate between normal, SBOT, and SOC groups. An accurate diagnosis of ovarian tumor subclasses by multicategory machine learning would be cost-effective and simple to perform, and would enable more effective subclass-targeted therapy.
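The optimal model above uses the signal-to-noise ratio as its feature selection method. A common (Golub-style) definition scores gene j for one class against the rest as (mean_pos - mean_rest) / (std_pos + std_rest). The NumPy sketch below applies it one-versus-rest and keeps the union of each class's top genes; the use of population standard deviations and the union rule are assumptions, since the abstract does not give the exact variant. Names are hypothetical.

```python
import numpy as np

def signal_to_noise(X, y, positive):
    """Signal-to-noise score of every gene for class `positive` versus all others.

    s_j = (mean_pos_j - mean_rest_j) / (std_pos_j + std_rest_j)
    (population standard deviations; genes with zero spread in both groups
    would divide by zero and should be filtered out beforehand)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, rest = X[y == positive], X[y != positive]
    return (pos.mean(axis=0) - rest.mean(axis=0)) / (pos.std(axis=0) + rest.std(axis=0))

def top_genes_one_vs_rest(X, y, n_per_class):
    """Union of the n_per_class genes with the largest |s2n| for each class."""
    selected = set()
    for c in np.unique(y):
        score = np.abs(signal_to_noise(X, y, c))
        selected.update(int(j) for j in np.argsort(score)[::-1][:n_per_class])
    return sorted(selected)
```

Scoring one class against the rest matches the one-versus-rest SVM setup: each binary sub-problem gets the genes that best separate its own class.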
12
Tsai MH, Chen MY, Huang SG, Hung YC, Wang HC. A bio-inspired computing model for ovarian carcinoma classification and oncogene detection. Bioinformatics 2014; 31:1102-10. [PMID: 25429060 DOI: 10.1093/bioinformatics/btu782] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/15/2014] [Accepted: 11/19/2014] [Indexed: 02/07/2023]
Abstract
MOTIVATION Ovarian cancer was the fifth leading cause of cancer deaths among women in the western world in 2013. In ovarian cancer, benign tumors turn malignant, but the point of transition is difficult to predict and diagnose. The 5-year survival rate across all types of ovarian cancer is 44%, but it can be improved to 92% if the cancer is found and treated before it spreads beyond the ovary. However, only 15% of all ovarian cancers are found at this early stage. Therefore, the ability to identify and diagnose ovarian cancer automatically, precisely, and efficiently as the tissue changes from benign to invasive is important for clinical treatment and for increasing the cure rate. This study proposes a new ovarian carcinoma classification model using two algorithms: a novel discretization of food sources for an artificial bee colony (DfABC) and a support vector machine (SVM). Oncogene detection using this method is also investigated, for the first time in the literature. RESULTS A novel bio-inspired computing model and hybrid algorithms combining DfABC and SVM were applied to ovarian carcinoma and oncogene classification. This study used the human ovarian cDNA expression database to collect 41 patient samples and 9600 genes in each pathological stage. Feature selection methods were used to detect and extract 15 notable oncogenes. We then used the DfABC-SVM model to examine these 15 oncogenes, dividing them into eight different classifications according to their gene expression at various pathological stages. The average accuracy of the eight classification experiments was 94.76%. This research also found some oncogenes that had not been discovered or indicated in previous scientific studies. The main contribution of this research is the proof that these newly discovered oncogenes are highly related to ovarian or other cancers. AVAILABILITY AND IMPLEMENTATION http://mht.mis.nchu.edu.tw/moodle/course/view.php?id=7.
Affiliation(s)
- Meng-Hsiun Tsai
  - Department of Management Information System and Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan; Department of Information Management, National Taichung University of Science and Technology, Taichung City 404, Taiwan; Institute of Nanotechnology, National Chiao Tung University, Hsinchu City 300, Taiwan; and Department of Obstetrics and Gynecology, China Medical University and Hospital, Taichung City 404, Taiwan
- Mu-Yen Chen
  - Department of Management Information System and Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan; Department of Information Management, National Taichung University of Science and Technology, Taichung City 404, Taiwan; Institute of Nanotechnology, National Chiao Tung University, Hsinchu City 300, Taiwan; and Department of Obstetrics and Gynecology, China Medical University and Hospital, Taichung City 404, Taiwan
- Steve G Huang
  - Department of Management Information System and Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan; Department of Information Management, National Taichung University of Science and Technology, Taichung City 404, Taiwan; Institute of Nanotechnology, National Chiao Tung University, Hsinchu City 300, Taiwan; and Department of Obstetrics and Gynecology, China Medical University and Hospital, Taichung City 404, Taiwan
- Yao-Ching Hung
  - Department of Management Information System and Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan; Department of Information Management, National Taichung University of Science and Technology, Taichung City 404, Taiwan; Institute of Nanotechnology, National Chiao Tung University, Hsinchu City 300, Taiwan; and Department of Obstetrics and Gynecology, China Medical University and Hospital, Taichung City 404, Taiwan
- Hsin-Chieh Wang
  - Department of Management Information System and Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan; Department of Information Management, National Taichung University of Science and Technology, Taichung City 404, Taiwan; Institute of Nanotechnology, National Chiao Tung University, Hsinchu City 300, Taiwan; and Department of Obstetrics and Gynecology, China Medical University and Hospital, Taichung City 404, Taiwan
13
NIM: a node influence based method for cancer classification. Computational and Mathematical Methods in Medicine 2014; 2014:826373. [PMID: 25180045 PMCID: PMC4144086 DOI: 10.1155/2014/826373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2014] [Revised: 06/16/2014] [Accepted: 06/23/2014] [Indexed: 11/18/2022]
Abstract
The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinically based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancer using DNA microarrays. Our main idea is to approach cancer classification using gene expression data from a graph-based view. Based on a new node influence model that we propose, this paper presents a novel high-accuracy method for cancer classification composed of four parts: first, calculating the similarity matrix of all samples; second, computing the node influence of the training samples; third, obtaining the similarity between every test sample and each class as a weighted sum of node influence and the similarity matrix; and finally, classifying each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.
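The four steps listed above can be sketched as a pipeline, but the abstract does not define the node influence model itself. In the NumPy sketch below, the Gaussian similarity kernel, the definition of a training sample's influence (its mean similarity to the other members of its class), and the per-class averaging in step 3 are all hypothetical placeholders, not NIM; only the step structure follows the text.

```python
import numpy as np

def similarity_matrix(A, B, sigma=1.0):
    """Gaussian similarity exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B (assumed kernel)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

def node_influence(S_train, y_train):
    """HYPOTHETICAL influence: mean similarity of each training sample to its classmates."""
    y = np.asarray(y_train)
    infl = np.ones(len(y))
    for i in range(len(y)):
        same = y == y[i]
        same[i] = False
        if same.any():
            infl[i] = S_train[i, same].mean()
    return infl

def nim_like_predict(X_train, y_train, X_test, sigma=1.0):
    y = np.asarray(y_train)
    S_train = similarity_matrix(X_train, X_train, sigma)  # step 1 (training part)
    infl = node_influence(S_train, y)                      # step 2
    S_test = similarity_matrix(X_test, X_train, sigma)     # step 1 (test-to-train part)
    classes = np.unique(y)
    # Step 3: influence-weighted sum of similarities to each class, averaged per class.
    scores = np.stack([(S_test[:, y == c] * infl[y == c]).sum(axis=1) / (y == c).sum()
                       for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]                  # step 4
```

Weighting by influence lets well-embedded, representative training samples count for more than isolated ones when a test sample is compared with a class.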
14
Wang D, Quek C, See Ng G. Ovarian cancer diagnosis using a hybrid intelligent system with simple yet convincing rules. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.12.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
15
16
Lin SW, Ying KC, Lee CY, Lee ZJ. An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.05.004] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Indexed: 10/28/2022]
17
18
Nahar J, Tickle KS, Shawkat Ali AB. Pattern Discovery from Biological Data. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Extracting useful information from structured and unstructured biological data is crucial in the health industry. Examples include medical practitioners' need to identify breast cancer patients at an early stage, estimate the survival time of a heart disease patient, or recognize uncommon disease characteristics that suddenly appear. There is currently an explosion in the biological data available in databases, but information extraction and true open access to the data require time to resolve issues such as ethical clearance. The emergence of novel IT technologies allows health practitioners to carry out comprehensive analyses of medical images, genomes, transcriptomes, and proteomes in health and disease. The information extracted with such technologies may soon dramatically change the pace of medical research and considerably affect patient care. The current research reviews the existing technologies used in heart and cancer research and, finally, provides some possible solutions to overcome their limitations. In summary, the primary objective of this research is to investigate how existing modern machine learning techniques (with their strengths and limitations) are being used in the identification of heartbeat-related disease and the early detection of cancer. After an extensive literature review, the following objectives were chosen: to develop a new approach to find the association between diseases such as high blood pressure, stroke, and heartbeat; to propose an improved feature selection method to analyze large image and microarray databases for machine learning algorithms in cancer research; to find an automatic distance function selection method for clustering tasks; to discover the most significant risk factors for specific cancers; and to determine the preventive factors for specific cancers that are aligned with the most significant risk factors.
Therefore, we propose a research plan to attain these objectives within this chapter. The possible solutions for these objectives are as follows: new heartbeat identification techniques show promising association with heartbeat patterns and diseases; sensitivity-based feature selection methods will be applied to early cancer patient classification; meta-learning approaches will be adopted in clustering algorithms to select an automatic distance function; and the Apriori algorithm will be applied to discover the significant risk and preventive factors for specific cancers. We expect this research to make significant contributions that enable medical professionals to make more accurate diagnoses and provide better patient care. It will also contribute to other areas such as biomedical modeling, medical image analysis, and early disease warning.
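The chapter proposes applying the Apriori algorithm to find significant risk and preventive factors. A minimal pure-Python sketch of Apriori frequent-itemset mining (join, prune, support counting) follows; association-rule generation is omitted, and representing each patient record as a set of factor labels is an assumption for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    transactions: list of sets of items (e.g. factors recorded per patient).
    Support is the fraction of transactions that contain the itemset.
    """
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent
```

The prune step is what makes Apriori efficient: a candidate is discarded without counting its support whenever one of its subsets is already known to be infrequent.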
19
Muselli M, Bertoni A, Frasca M, Beghini A, Ruffino F, Valentini G. A mathematical model for the validation of gene selection methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1385-1392. [PMID: 21778526 DOI: 10.1109/tcbb.2010.83] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Gene selection methods aim to determine biologically relevant subsets of genes in DNA microarray experiments. However, assessing and validating them is a major difficulty because the subset of biologically relevant genes is usually unknown. To solve this problem, a novel procedure for generating biologically plausible synthetic gene expression data is proposed. It is based on a mathematical model that represents gene expression signatures and expression profiles through Boolean threshold functions. The results show that the proposed procedure can be successfully adopted to analyze the quality of statistical and machine learning-based gene selection algorithms.
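The idea of validating gene selection on synthetic data with a known ground truth can be illustrated with a toy generator. In this plain-Python sketch, each sample is a random binary expression profile and its label is a Boolean threshold function of a few designated relevant genes, so a selection method can be scored by how many of those genes it recovers. It is a simplification, not the authors' model; the uniform random profiles, the weights, and the threshold are hypothetical.

```python
import random

def threshold_label(profile, relevant, weights, theta):
    """Boolean threshold function: 1 if sum_j w_j * x_j >= theta over the relevant genes."""
    return int(sum(w * profile[j] for j, w in zip(relevant, weights)) >= theta)

def synthetic_dataset(n_samples, n_genes, relevant, weights, theta, seed=0):
    """Random binary expression profiles labelled by a known threshold function.

    Because `relevant` is fixed in advance, a gene selection method can be
    scored by how many of these genes it recovers.
    """
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [threshold_label(x, relevant, weights, theta) for x in X]
    return X, y

def recovery_rate(selected, relevant):
    """Fraction of the truly relevant genes found by a selection method."""
    return len(set(selected) & set(relevant)) / len(relevant)
```

Genes outside `relevant` carry no information about the label by construction, so any of them chosen by a selection method is a known false positive.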
20
Han B, Li L, Chen Y, Zhu L, Dai Q. A two step method to identify clinical outcome relevant genes with microarray data. J Biomed Inform 2011; 44:229-38. [DOI: 10.1016/j.jbi.2010.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/29/2010] [Revised: 10/06/2010] [Accepted: 11/29/2010] [Indexed: 12/29/2022]
21
Saraswathi S, Sundaram S, Sundararajan N, Zimmermann M, Nilsen-Hamilton M. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:452-463. [PMID: 21233525 DOI: 10.1109/tcbb.2010.13] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 05/30/2023]
Abstract
A combination of Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which is then used to build a classifier, yielding an algorithm (ICGA-PSO-ELM) that can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified by this study that encode secreted proteins might provide important insights into the nature of the critical biological features in the microenvironment of each tumor type that allow these cells to thrive and proliferate.
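The ELM classifier at the core of this approach can be sketched in a few lines. The sketch covers only the ELM component. The ICGA gene selection and the PSO tuning described in the abstract are not reproduced. The input-to-hidden weights are random and never trained, and only the output weights are solved in closed form with a pseudoinverse. The sigmoid activation, hidden-layer size, seed, and class name are assumptions made for illustration.

```python
import numpy as np

class ExtremeLearningMachine:
    """Single-hidden-layer ELM: random, untrained input-to-hidden weights;
    hidden-to-output weights solved by least squares (pseudoinverse)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights beta = pinv(H) @ T (Moore-Penrose pseudoinverse).
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated synthetic classes with 5 features each (toy data).
    X = np.vstack([rng.normal(-2, 0.5, (40, 5)), rng.normal(2, 0.5, (40, 5))])
    y = np.array([0] * 40 + [1] * 40)
    model = ExtremeLearningMachine(n_hidden=20).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Solving only the output layer is what makes ELM training fast. In the cited work that speed matters, because PSO must evaluate many candidate ELM configurations over the genes proposed by ICGA.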
22
Gustafsson MG, Wallman M, Wickenberg Bolin U, Göransson H, Fryknäs M, Andersson CR, Isaksson A. Improving Bayesian credibility intervals for classifier error rates using maximum entropy empirical priors. Artif Intell Med 2010; 49:93-104. [DOI: 10.1016/j.artmed.2010.02.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/18/2009] [Revised: 12/07/2009] [Accepted: 02/16/2010] [Indexed: 10/19/2022]
23
Yang P, Xu L, Zhou BB, Zhang Z, Zomaya AY. A particle swarm based hybrid system for imbalanced medical data sampling. BMC Genomics 2009; 10 Suppl 3:S34. [PMID: 19958499 PMCID: PMC2788388 DOI: 10.1186/1471-2164-10-s3-s34] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 01/13/2023]
Abstract
BACKGROUND: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and are then combined with the minority class to form a balanced dataset. RESULTS: One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also demonstrate good generalization across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system. CONCLUSION: The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly across different datasets, the proposed hybrid system has a better generalization property, which alleviates the method-data dependency problem. From the biological perspective, the system provides an indication for further investigation of the highly ranked samples, which may result in the discovery of new conditions or disease subtypes.
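The final step described in the abstract, ranking majority-class samples by merit and combining the top-ranked ones with the minority class, can be sketched as follows. This is a simplified illustration. It assumes each majority sample already has a single aggregated merit score, whereas the cited system derives its rankings from PSO with multiple classifiers and metrics, and that part is not reproduced. The function name, the 0/1 label convention, and the choice to keep exactly as many majority samples as there are minority samples are all assumptions.

```python
def balance_by_ranking(majority, minority, merit):
    """Keep the len(minority) highest-merit majority samples and combine them
    with every minority sample to form a balanced dataset.

    `merit[i]` is an assumed, precomputed score for majority[i]; producing
    these scores (via PSO and evaluation fusion in the cited work) is out of
    scope here. Labels: 0 = majority class, 1 = minority class (assumed)."""
    k = len(minority)
    ranked = sorted(range(len(majority)), key=lambda i: merit[i], reverse=True)
    kept = [majority[i] for i in ranked[:k]]
    X = kept + list(minority)
    y = [0] * len(kept) + [1] * len(minority)
    return X, y

if __name__ == "__main__":
    majority = ["a", "b", "c", "d", "e"]     # toy majority samples
    minority = ["p", "q"]                    # toy minority samples
    merit = [0.1, 0.9, 0.4, 0.8, 0.2]        # hypothetical merit scores
    print(balance_by_ranking(majority, minority, merit))
    # -> (['b', 'd', 'p', 'q'], [0, 0, 1, 1])
```

Selecting majority samples by merit, rather than at random, is what distinguishes this kind of informed undersampling from plain random undersampling. The quality of the result depends entirely on how the merit scores are produced.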
Affiliation(s)
- Pengyi Yang
- School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia.
24
Valentini G, Tagliaferri R, Masulli F. Computational intelligence and machine learning in bioinformatics. Artif Intell Med 2008; 45:91-6. [PMID: 18929473 DOI: 10.1016/j.artmed.2008.08.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]