Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Nagi S, Bhattacharyya DK. Classification of microarray cancer data using ensemble approach. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/s13721-013-0034-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Number

Cited by Other Article(s)

Choi TJ, An HE, Kim CB. Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles. Life (Basel) 2022;12:1443. [PMID: 36143479 PMCID: PMC9503646 DOI: 10.3390/life12091443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 09/14/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open

Interval modelling in optimization of k‐NN classifiers for large number of attributes in data sets on an example of DNA microarrays. INT J INTELL SYST 2021. [DOI: 10.1002/int.22679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Bose S, Das C, Banerjee A, Ghosh K, Chattopadhyay M, Chattopadhyay S, Barik A. An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples. PeerJ Comput Sci 2021;7:e671. [PMID: 34616883 PMCID: PMC8459790 DOI: 10.7717/peerj-cs.671] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 07/20/2021] [Indexed: 06/13/2023]

Abstract

BACKGROUND

Machine learning is one kind of machine intelligence technique that learns from data and detects inherent patterns from large, complex datasets. Due to this capability, machine learning techniques are widely used in medical applications, especially where large-scale genomic and proteomic data are used. Cancer classification based on bio-molecular profiling data is a very important topic for medical applications since it improves the diagnostic accuracy of cancer and enables a successful culmination of cancer treatments. Hence, machine learning techniques are widely used in cancer detection and prognosis.

METHODS

In this article, a new ensemble machine learning classification model named Multiple Filtering and Supervised Attribute Clustering algorithm based Ensemble Classification model (MFSAC-EC) is proposed which can handle class imbalance problem and high dimensionality of microarray datasets. This model first generates a number of bootstrapped datasets from the original training data where the oversampling procedure is applied to handle the class imbalance problem. The proposed MFSAC method is then applied to each of these bootstrapped datasets to generate sub-datasets, each of which contains a subset of the most relevant/informative attributes of the original dataset. The MFSAC method is a feature selection technique combining multiple filters with a new supervised attribute clustering algorithm. Then for every sub-dataset, a base classifier is constructed separately, and finally, the predictive accuracy of these base classifiers is combined using the majority voting technique forming the MFSAC-based ensemble classifier. Also, a number of most informative attributes are selected as important features based on their frequency of occurrence in these sub-datasets.

RESULTS

To assess the performance of the proposed MFSAC-EC model, it is applied on different high-dimensional microarray gene expression datasets for cancer sample classification. The proposed model is compared with well-known existing models to establish its effectiveness with respect to other models. From the experimental results, it has been found that the generalization performance/testing accuracy of the proposed classifier is significantly better compared to other well-known existing models. Apart from that, it has been also found that the proposed model can identify many important attributes/biomarker genes.

Collapse

Panta M, Mishra A, Hoque MT, Atallah J. ClassifyTE: A stacking based prediction of hierarchical classification of transposable elements. Bioinformatics 2021;37:2529-2536. [PMID: 33682878 DOI: 10.1093/bioinformatics/btab146] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Revised: 02/10/2021] [Accepted: 03/01/2021] [Indexed: 11/14/2022] Open

Mishra A, Khanal R, Kabir WU, Hoque T. AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques. Artif Intell Med 2021;113:102034. [PMID: 33685590 DOI: 10.1016/j.artmed.2021.102034] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2020] [Revised: 01/19/2021] [Accepted: 02/09/2021] [Indexed: 12/25/2022]

Abstract

Identification of RNA-binding proteins (RBPs) that bind to ribonucleic acid molecules is an important problem in Computational Biology and Bioinformatics. It becomes indispensable to identify RBPs as they play crucial roles in post-transcriptional control of RNAs and RNA metabolism as well as have diverse roles in various biological processes such as splicing, mRNA stabilization, mRNA localization, and translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from the sequence using computational methods can be useful to annotate RBPs and assist the experimental design efficiently. In this work, we present a method called AIRBP, which is designed using an advanced machine learning technique, called stacking, to effectively predict RBPs by utilizing features extracted from evolutionary information, physiochemical properties, and disordered properties. Moreover, our method, AIRBP, use the majority vote from RBPPred, DeepRBPPred, and the stacking model for the prediction for RBPs. The results show that AIRBP attains Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Mathews Correlation Coefficient (MCC) of 95.84 %, 94.71 %, 0.928, and 0.899, respectively, based on the training dataset, using 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test set reveals that it achieves ACC, BACC, F1-score, and MCC of 94.36 %, 94.28 %, 0.897, and 0.860, for Human test set; 91.25 %, 93.00 %, 0.896, and 0.835 for S. cerevisiae test set; and 90.60 %, 90.41 %, 0.934, and 0.775 for A. thaliana test set, respectively. These results indicate that the AIRBP outperforms the existing Deep- and TriPepSVM methods. Therefore, the proposed better-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from the sequence and help gain valuable insight to treat critical diseases. Availability: Code-data is available here: http://cs.uno.edu/∼tamjid/Software/AIRBP/code_data.zip.

Collapse

Chatzimparmpas A, Martins RM, Kucher K, Kerren A. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021;27:1547-1557. [PMID: 33048687 DOI: 10.1109/tvcg.2020.3030352] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Mishra A, Kabir MWU, Hoque MT. diSBPred: A machine learning based approach for disulfide bond prediction. Comput Biol Chem 2021;91:107436. [PMID: 33550156 DOI: 10.1016/j.compbiolchem.2021.107436] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Revised: 12/28/2020] [Accepted: 01/10/2021] [Indexed: 12/25/2022]

Abstract

The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In protein, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational space searching by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. The examination of the relevance of the features employed in this study and the features utilized in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study improve about 7.39 % in jackknife validation balanced accuracy. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29 % and 94.20 %, respectively. Altogether, our predictor achieves an improvement of 43.25 % based on balanced accuracy compared to the existing NNA based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.

Collapse

Khanal J, Lim DY, Tayara H, Chong KT. i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome. Genomics 2020;113:582-592. [PMID: 33010390 DOI: 10.1016/j.ygeno.2020.09.054] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Revised: 09/09/2020] [Accepted: 09/23/2020] [Indexed: 01/09/2023]

Ghosh KK, Ghosh S, Sen S, Sarkar R, Maulik U. A two-stage approach towards protein secondary structure classification. Med Biol Eng Comput 2020;58:1723-1737. [PMID: 32472446 DOI: 10.1007/s11517-020-02194-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2019] [Accepted: 05/20/2020] [Indexed: 12/11/2022]

Abstract

Protein secondary structure (PSS) describes the local folded structures which get formed inside a polypeptide due to interactions among atoms of the backbone. Generally, globular proteins are divided into four classes, namely all-α, all-β, α + β, and α/β. As nearly 90% of proteins fall into the said four classes, these are mostly considered for the purpose of computational classification of proteins. Classification of PSS is important for different biological functions that include protein fold recognition, tertiary structure prediction, prediction of DNA-binding sites, and reduction of the conformation search space among others. In this paper, we have proposed a machine learning-based model for secondary structure classification of proteins into four classes: all-α, all-β, α + β, and α/β. In doing so, we have considered both sequence-based and structure-based features. At first, mutual information (MI), a filter-based feature selection method, is used to remove the redundant features, and then these selected features are used to train three different classifiers-random forest, K-nearest neighbor (KNN), and multi-layer perceptron (MLP). After that, some standard classifier combination approaches are applied to integrate the decision made by the said classifiers and it has been found that weighted product rule performs the best among all. The overall accuracies obtained using the proposed model on the four standard datasets, namely 640, 1189, 25pdb, and fc699 are 86.89%, 92.93%, 91.38%, and 94.87% respectively. The proposed model outperforms some state-of-the-art methods considered here for comparison. Significantly high classification accuracy produced by our proposed model on four datasets is attributed to the development of a comprehensive feature set (by eliminating redundant features through feature selection technique) which is then passed through an ensemble consists of three different classifiers. Assigning different weights to the outcome of different classifiers thus proved to be useful in designing the model for predicting the secondary structure of proteins based on its sequence-based and structure-based features. Graphical abstract.

Collapse

An N, Ding H, Yang J, Au R, Ang TFA. Deep ensemble learning for Alzheimer's disease classification. J Biomed Inform 2020;105:103411. [PMID: 32234546 PMCID: PMC9760486 DOI: 10.1016/j.jbi.2020.103411] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Revised: 02/29/2020] [Accepted: 03/23/2020] [Indexed: 01/01/2023]

AIBH: Accurate Identification of Brain Hemorrhage Using Genetic Algorithm Based Feature Selection and Stacking. MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2020. [DOI: 10.3390/make2020005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Abstract Brain hemorrhage is a type of stroke which is caused by a ruptured artery, resulting in localized bleeding in or around the brain tissues. Among a variety of imaging tests, a computerized tomography (CT) scan of the brain enables the accurate detection and diagnosis of a brain hemorrhage. In this work, we developed a practical approach to detect the existence and type of brain hemorrhage in a CT scan image of the brain, called Accurate Identification of Brain Hemorrhage, abbreviated as AIBH. The steps of the proposed method consist of image preprocessing, image segmentation, feature extraction, feature selection, and design of an advanced classification framework. The image preprocessing and segmentation steps involve removing the skull region from the image and finding out the region of interest (ROI) using Otsu’s method, respectively. Subsequently, feature extraction includes the collection of a comprehensive set of features from the ROI, such as the size of the ROI, centroid of the ROI, perimeter of the ROI, the distance between the ROI and the skull, and more. Furthermore, a genetic algorithm (GA)-based feature selection algorithm is utilized to select relevant features for improved performance. These features are then used to train the stacking-based machine learning framework to predict different types of a brain hemorrhage. Finally, the evaluation results indicate that the proposed predictor achieves a 10-fold cross-validation (CV) accuracy (ACC), precision (PR), Recall, F1-score, and Matthews correlation coefficient (MCC) of 99.5%, 99%, 98.9%, 0.989, and 0.986, respectively, on the benchmark CT scan dataset. While comparing AIBH with the existing state-of-the-art classification method of the brain hemorrhage type, AIBH provides an improvement of 7.03%, 7.27%, and 7.38% based on PR, Recall, and F1-score, respectively. Therefore, the proposed approach considerably outperforms the existing brain hemorrhage classification approach and can be useful for the effective prediction of brain hemorrhage types from CT scan images (The code and data can be found here: http://cs.uno.edu/~tamjid/Software/AIBH/code_data.zip). Collapse

Gattani S, Mishra A, Hoque MT. StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence. Carbohydr Res 2019;486:107857. [DOI: 10.1016/j.carres.2019.107857] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 10/05/2019] [Accepted: 10/23/2019] [Indexed: 11/26/2022]

Kwon H, Park J, Lee Y. Stacking Ensemble Technique for Classifying Breast Cancer. Healthc Inform Res 2019;25:283-288. [PMID: 31777671 PMCID: PMC6859259 DOI: 10.4258/hir.2019.25.4.283] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2019] [Revised: 10/04/2019] [Accepted: 10/06/2019] [Indexed: 11/23/2022] Open

Flot M, Mishra A, Kuchi AS, Hoque MT. StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence. Methods Mol Biol 2019;1958:101-122. [PMID: 30945215 DOI: 10.1007/978-1-4939-9161-7_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Computer Aided Diagnosis System for Detection of Cancer Cells on Cytological Pleural Effusion Images. BIOMED RESEARCH INTERNATIONAL 2018;2018:6456724. [PMID: 30533436 PMCID: PMC6250027 DOI: 10.1155/2018/6456724] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/04/2018] [Revised: 09/11/2018] [Accepted: 10/16/2018] [Indexed: 12/17/2022]

Abstract

Cytological screening plays a vital role in the diagnosis of cancer from the microscope slides of pleural effusion specimens. However, this manual screening method is subjective and time-intensive and it suffers from inter- and intra-observer variations. In this study, we propose a novel Computer Aided Diagnosis (CAD) system for the detection of cancer cells in cytological pleural effusion (CPE) images. Firstly, intensity adjustment and median filtering methods were applied to improve image quality. Cell nuclei were extracted through a hybrid segmentation method based on the fusion of Simple Linear Iterative Clustering (SLIC) superpixels and K-Means clustering. A series of morphological operations were utilized to correct segmented nuclei boundaries and eliminate any false findings. A combination of shape analysis and contour concavity analysis was carried out to detect and split any overlapped nuclei into individual ones. After the cell nuclei were accurately delineated, we extracted 14 morphometric features, 6 colorimetric features, and 181 texture features from each nucleus. The texture features were derived from a combination of color components based first order statistics, gray level cooccurrence matrix and gray level run-length matrix. A novel hybrid feature selection method based on simulated annealing combined with an artificial neural network (SA-ANN) was developed to select the most discriminant and biologically interpretable features. An ensemble classifier of bagged decision trees was utilized as the classification model for differentiating cells into either benign or malignant using the selected features. The experiment was carried out on 125 CPE images containing more than 10500 cells. The proposed method achieved sensitivity of 87.97%, specificity of 99.40%, accuracy of 98.70%, and F-score of 87.79%.

Collapse

Mishra A, Pokhrel P, Hoque MT. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics 2018;35:433-441. [DOI: 10.1093/bioinformatics/bty653] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022] Open

Iqbal S, Hoque MT. PBRpredict-Suite: a suite of models to predict peptide-recognition domain residues from protein sequence. Bioinformatics 2018;34:3289-3299. [DOI: 10.1093/bioinformatics/bty352] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Accepted: 04/29/2018] [Indexed: 01/10/2023] Open

Sahu B, Dehuri S, Jagadev AK. Feature selection model based on clustering and ranking in pipeline for microarray data. INFORMATICS IN MEDICINE UNLOCKED 2017. [DOI: 10.1016/j.imu.2017.07.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022] Open

Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles. BIOMED RESEARCH INTERNATIONAL 2016;2016:4596326. [PMID: 27999797 PMCID: PMC5143691 DOI: 10.1155/2016/4596326] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/03/2016] [Revised: 10/08/2016] [Accepted: 10/20/2016] [Indexed: 12/23/2022]

Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics 2016;17:331. [PMID: 27586051 PMCID: PMC5009551 DOI: 10.1186/s12859-016-1228-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2015] [Accepted: 08/26/2016] [Indexed: 02/07/2023] Open

Significant patterns for oral cancer detection: association rule on clinical examination and history data. ACTA ACUST UNITED AC 2014. [DOI: 10.1007/s13721-014-0050-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]