1. Almutaani M, Turki T, Taguchi YH. Novel large empirical study of deep transfer learning for COVID-19 classification based on CT and X-ray images. Sci Rep 2024; 14:26520. [PMID: 39489731 PMCID: PMC11532342 DOI: 10.1038/s41598-024-76498-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 10/14/2024] [Indexed: 11/05/2024]
Abstract
The early and highly accurate prediction of COVID-19 based on medical images can speed up the diagnostic process and thereby mitigate disease spread; therefore, developing AI-based models for this task is essential. To our knowledge, the presented work is the first to expand the model space and identify a better-performing model among 10,000 constructed deep transfer learning (DTL) models, as follows. First, we downloaded and processed 4481 CT and X-ray images of COVID-19 and non-COVID-19 patients from the Kaggle repository. Second, we provided the processed images as inputs to four deep learning models (ConvNeXt, EfficientNetV2, DenseNet121, and ResNet34) pre-trained on more than a million images from the ImageNet database, in which we froze the convolutional and pooling layers of the feature extraction part while unfreezing and training the densely connected classifier with the Adam optimizer. Third, we generated all combinations of two, three, and four of the four DTL models and took the majority vote of each, resulting in $\binom{4}{2}+\binom{4}{3}+\binom{4}{4}=6+4+1=11$ DTL models. Then, we combined these 11 DTL models, followed by consecutively generating and taking the majority vote of [Formula: see text] DTL models. Finally, we selected [Formula: see text] DTL models from [Formula: see text]. Experimental results on the whole datasets using five-fold cross-validation demonstrate that the best generated DTL model, named HC, achieved the best AUC of 0.909 on the CT dataset, while on the X-ray dataset ConvNeXt yielded a marginally higher AUC of 0.933 than HX (0.93). These promising results set the foundation for promoting the large generation of models (LGM) in AI.
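The majority-vote combination step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each base model is assumed to output hard 0/1 labels (1 = COVID-19), a strict majority is assumed to be required for the positive class (so ties in two- and four-model ensembles fall back to 0, since the abstract does not say how ties are broken), and the prediction values are invented. Only the four model names come from the abstract.

```python
from itertools import combinations
from math import comb


def majority_vote(predictions):
    """Element-wise majority label across models.

    Assumption: class 1 needs a strict majority, so ties (possible with an
    even number of models) resolve to class 0.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    return [
        1 if 2 * sum(p[i] for p in predictions) > n_models else 0
        for i in range(n_samples)
    ]


def build_ensembles(base):
    """Majority-vote every combination of 2..len(base) models.

    `base` maps a model name to its list of 0/1 predictions; the result maps
    a '+'-joined ensemble name to the ensemble's predictions.
    """
    names = sorted(base)
    ensembles = {}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            ensembles["+".join(combo)] = majority_vote([base[n] for n in combo])
    return ensembles


# Hypothetical predictions (1 = COVID-19) for five images.
base_preds = {
    "ConvNeXt":       [1, 0, 1, 1, 0],
    "EfficientNetV2": [1, 0, 0, 1, 0],
    "DenseNet121":    [0, 0, 1, 1, 1],
    "ResNet34":       [1, 1, 1, 0, 0],
}
ensembles = build_ensembles(base_preds)
print(len(ensembles), comb(4, 2) + comb(4, 3) + comb(4, 4))  # 11 11
```

With four base models the enumeration yields the 11 ensembles mentioned in the abstract. Feeding those 11 ensemble outputs back into `build_ensembles` would enumerate $2^{11}-1-11=2036$ ensembles of size two or more; whether this matches the authors' second stage is not stated in the abstract.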
Affiliation(s)
- Mansour Almutaani
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
- Turki Turki
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
- Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, 112-8551, Japan
2. Alghamdi S, Turki T. A novel interpretable deep transfer learning combining diverse learnable parameters for improved T2D prediction based on single-cell gene regulatory networks. Sci Rep 2024; 14:4491. [PMID: 38396138 PMCID: PMC10891129 DOI: 10.1038/s41598-024-54923-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/18/2024] [Indexed: 02/25/2024]
Abstract
Accurate deep learning (DL) models for predicting type 2 diabetes (T2D) must not only perform well on the discrimination task but also learn useful feature representations. However, existing DL tools are far from perfect and do not provide appropriate interpretations that could explain and promote superior performance in the target task. We therefore provide an interpretable approach for our deep transfer learning (DTL) models to overcome these drawbacks, which works as follows. We utilize several pre-trained models, including SEResNet152 and SEResNeXT101. Then, we transfer knowledge from the pre-trained models by keeping the weights in the convolutional base (i.e., the feature extraction part) while modifying the classification part with the Adam optimizer to classify healthy controls and T2D based on single-cell gene regulatory network (SCGRN) images. Other DTL models work in a similar manner but keep the weights of the bottom layers of the feature extraction part unaltered while updating the weights of the subsequent layers through training from scratch. Experimental results on all 224 SCGRN images using five-fold cross-validation show that our model (TFeSEResNeXT101) achieved the highest average balanced accuracy (BAC) of 0.97, significantly outperforming the baseline, which had an average BAC of 0.86. Moreover, a simulation study demonstrated that this superiority is attributable to the distributional conformance of the model weight parameters obtained with the Adam optimizer when coupled with weights from a pre-trained model.
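Balanced accuracy (BAC), the metric reported above, is the mean of sensitivity and specificity in the binary case, so a classifier cannot score well simply by favouring the majority class. A minimal pure-Python sketch, assuming labels 1 = T2D and 0 = healthy control and assuming the "average BAC" is the mean of per-fold scores; the fold predictions below are invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Binary balanced accuracy: mean of sensitivity and specificity.

    Assumes both classes occur in y_true (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on class 1 (T2D)
    specificity = tn / (tn + fp)  # recall on class 0 (healthy)
    return (sensitivity + specificity) / 2


def average_bac(folds):
    """Mean BAC over cross-validation folds given (y_true, y_pred) pairs."""
    scores = [balanced_accuracy(t, p) for t, p in folds]
    return sum(scores) / len(scores)


# Hypothetical predictions for two folds (1 = T2D, 0 = healthy control).
folds = [
    ([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]),
    ([1, 0, 0, 1], [1, 0, 0, 1]),
]
print(round(average_bac(folds), 4))  # 0.8667
```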
Affiliation(s)
- Sumaya Alghamdi
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
- Department of Computer Science, Albaha University, 65799, Albaha, Saudi Arabia
- Turki Turki
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
3. Yin C, Yan B. Machine learning in basic scientific research on oral diseases. DIGITAL MEDICINE 2023; 9. [DOI: 10.1097/dm-2023-00001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
4. Lu Q, Chen F, Li Q, Chen L, Tong L, Tian G, Zhou X. A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data. Front Oncol 2022; 12:832567. [PMID: 35530331 PMCID: PMC9071249 DOI: 10.3389/fonc.2022.832567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Accepted: 03/21/2022] [Indexed: 11/17/2022]
Abstract
Cancer of unknown primary site (CUP) is a heterogeneous group of cancers whose tissue of origin remains unknown after detailed investigation by conventional clinical methods. CUP accounts for roughly 3%–5% of all human malignancies. CUP patients are usually treated with broad-spectrum chemotherapy, which often leads to a poor prognosis. Recent studies suggest that treatment targeting the primary lesion of CUP would significantly improve patient prognosis. Therefore, there is an urgent need for an efficient method to accurately identify the tissue of origin of CUP in clinical cancer research. In this work, we developed a novel framework that uses Extreme Gradient Boosting (XGBoost) to trace the primary site of CUP based on microarray-based gene expression data. First, we downloaded the microarray-based gene expression profiles of 59,385 genes for 57,08 samples from The Cancer Genome Atlas (TCGA) and 6,364 genes for 3,101 samples from the Gene Expression Omnibus (GEO). Each dataset was divided into training and independent testing data at a ratio of 4:1. Then, we identified 200 and 290 genes in the TCGA and GEO training data, respectively, to train XGBoost models for identifying the primary site of CUP. The overall 5-fold cross-validation accuracies of our method were 96.9% and 95.3% on the TCGA and GEO training datasets, respectively. On the independent test sets, the macro-precision reached 96.75% and 98.8% for TCGA and GEO, respectively. Experimental results demonstrated that the XGBoost framework not only can reduce the cost of clinical cancer traceability but is also highly efficient, which might make it useful in clinical practice.
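Macro-precision averages the per-class precision over all candidate primary sites, so a rare tissue of origin counts as much as a common one. The sketch below assumes single-label predictions and, following one common convention, scores a class that is never predicted as 0; the tissue labels and predictions are hypothetical.

```python
def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision TP / (TP + FP).

    Assumed convention: a class that is never predicted gets precision 0.
    Classes are taken from both y_true and y_pred.
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions = []
    for c in classes:
        # True labels of all samples predicted as class c.
        predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if predicted_c:
            precisions.append(sum(1 for t in predicted_c if t == c) / len(predicted_c))
        else:
            precisions.append(0.0)
    return sum(precisions) / len(precisions)


# Hypothetical primary-site labels for six test samples.
y_true = ["lung", "lung", "breast", "colon", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "colon", "colon", "lung"]
print(round(macro_precision(y_true, y_pred), 4))  # 0.6667
```

Here breast and lung are each predicted twice with one hit (precision 0.5) and colon twice with two hits (precision 1.0), giving (0.5 + 1.0 + 0.5) / 3. scikit-learn's `precision_score(y_true, y_pred, average="macro")` computes the same average.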
Affiliation(s)
- Qingfeng Lu
- Oncology Department, Daqing Oilfield General Hospital, Daqing, China
- Fengxia Chen
- Department of Thoracic Surgery, Hainan General Hospital, Haikou, China
- Qianyue Li
- Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Chen
- Department of Emergency, Qingdao Eighth People's Hospital, Qingdao, China
- Ling Tong
- Department of Pathology, Chifeng Municipal Hospital, Chifeng Clinical Medical School of Inner Mongolia Medical University, Chifeng, China
- Geng Tian
- Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiaohong Zhou
- Second Division of Cancer, Jiamusi Cancer Hospital, Jiamusi, China
5. Liu W, Fang X, Zhou Y, Dou L, Dou T. Machine learning-based investigation of the relationship between gut microbiome and obesity status. Microbes Infect 2021; 24:104892. [PMID: 34678464 DOI: 10.1016/j.micinf.2021.104892] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/03/2021] [Revised: 09/30/2021] [Accepted: 10/10/2021] [Indexed: 02/06/2023]
Abstract
Gut microbiota is believed to play a crucial role in obesity. However, consistent findings among published studies on the microbiome-obesity interaction are relatively rare, and one of the underlying causes could be the limited sample size of cohort studies. To identify gut microbiota changes between normal-weight and obese individuals, fecal samples along with phenotype information from 2262 Chinese individuals were collected and analyzed. Compared with normal-weight individuals, obese individuals exhibited lower species diversity and higher diversity of metabolic pathways. In addition, various machine learning models were employed to quantify the relationship between the gut microbiome and both obesity status and body mass index (BMI) values, of which the support vector machine model achieved the best performance, with a classification accuracy of 0.716 and an R² score of 0.485. In addition to two well-established obesity-associated species, three species with potential as obesity-related biomarkers were identified: Bacteroides caccae, Odoribacter splanchnicus, and Roseburia hominis. Further analyses of functional pathways also revealed several pathways enriched in obese individuals. Collectively, our data demonstrate a close relationship between obesity and gut microbiota in a large-scale Chinese population. These findings may provide potential targets for the prevention and treatment of obesity.
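The R² score reported for BMI prediction is the coefficient of determination, R² = 1 − SS_res / SS_tot, i.e., the fraction of the variance in BMI that the model explains; a value of 0.485 means the model explains roughly half of that variance. A minimal sketch of the metric only (not the study's SVM pipeline), using made-up BMI values:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted BMI values (kg/m^2).
bmi_true = [21.0, 22.0, 23.0, 24.0]
bmi_pred = [21.0, 22.0, 23.0, 25.0]
print(round(r2_score(bmi_true, bmi_pred), 4))  # 0.8
```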
Affiliation(s)
- Wanjun Liu
- School of Life and Pharmaceutical Sciences, Dalian University of Technology, Panjin 124221, China; Department of Scientific Research, KMHD, Shenzhen 518126, China
- Xiaojie Fang
- Guangdong Provincial Hospital of Chinese Medicine, Guangzhou 510120, China
- Yong Zhou
- Department of Scientific Research, KMHD, Shenzhen 518126, China
- Lihong Dou
- The First People's Hospital of Jiashan, Zhejiang 314100, China
- Tongyi Dou
- School of Life and Pharmaceutical Sciences, Dalian University of Technology, Panjin 124221, China
6. Shaikh TA, Ali R. An automated machine learning tool for breast cancer diagnosis for healthcare professionals. Health Syst (Basingstoke) 2021; 11:303-333. [DOI: 10.1080/20476965.2021.1966324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Tawseef Ayoub Shaikh
- Department Of Computer Science & Engineering, Baba Ghulam Shah Badshah University Rajouri, Rajouri, J&K, India
- Rashid Ali
- Department Of Computer Engineering, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
7. Vuong TTL, Kim K, Song B, Kwak JT. Joint categorical and ordinal learning for cancer grading in pathology images. Med Image Anal 2021; 73:102206. [PMID: 34399153 DOI: 10.1016/j.media.2021.102206] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/03/2020] [Revised: 07/26/2021] [Accepted: 07/28/2021] [Indexed: 02/07/2023]
Abstract
Cancer grading in pathology image analysis is one of the most critical tasks, since it is related to patient outcomes and treatment planning. Traditionally, it has been treated as a categorical problem, ignoring the natural ordering among cancer grades, i.e., the higher the grade, the more aggressive the cancer and the worse the outcome. Herein, we propose a joint categorical and ordinal learning framework for cancer grading in pathology images. The approach simultaneously performs categorical classification and ordinal classification and aims to leverage the distinctive features learned in the two tasks. Moreover, we propose a new loss function for the ordinal classification task that offers improved contrast between correctly classified and misclassified examples. The proposed method is evaluated on multiple collections of colorectal and prostate pathology images that underwent different acquisition and processing procedures. Both quantitative and qualitative assessments of the experimental results confirm the effectiveness and robustness of the proposed method compared with other competing methods. The results suggest that the proposed approach could improve the histopathologic analysis of cancer grades in pathology images.
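To make the categorical/ordinal distinction concrete, here is a minimal sketch of a joint loss for K grades: a softmax cross-entropy over grades plus an ordinal term built from K − 1 cumulative binary targets ("is the grade greater than k?"). This cumulative-binary formulation is a generic, widely used choice swapped in purely for illustration; it is not the new ordinal loss proposed by Vuong et al., and the weighting parameter `alpha` is hypothetical.

```python
import math


def ordinal_targets(grade, num_grades):
    """K-1 cumulative binary targets: element k is 1.0 if grade > k, else 0.0."""
    return [1.0 if grade > k else 0.0 for k in range(num_grades - 1)]


def categorical_ce(logits, grade):
    """Softmax cross-entropy for one example, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[grade]


def ordinal_bce(ord_logits, grade):
    """Sum of binary cross-entropies (with logits) on the cumulative targets."""
    targets = ordinal_targets(grade, len(ord_logits) + 1)
    # Numerically stable BCE-with-logits: max(z, 0) - z*t + log(1 + exp(-|z|)).
    return sum(max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
               for z, t in zip(ord_logits, targets))


def joint_loss(cat_logits, ord_logits, grade, alpha=1.0):
    """Categorical loss plus an alpha-weighted ordinal loss (alpha is hypothetical)."""
    return categorical_ce(cat_logits, grade) + alpha * ordinal_bce(ord_logits, grade)


# With all-zero logits for 4 grades the loss is ln 4 + 3 ln 2 = 5 ln 2.
print(round(joint_loss([0.0] * 4, [0.0] * 3, grade=2), 4))  # 3.4657
```

Under this formulation a predicted grade can be decoded as the number of ordinal outputs whose sigmoid exceeds 0.5, so an error of one grade flips fewer binary targets than an error of three grades, which is how the ordering enters the loss.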
Affiliation(s)
- Trinh Thi Le Vuong
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
- Kyungeun Kim
- Department of Pathology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea
- Boram Song
- Department of Pathology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea
- Jin Tae Kwak
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
8. Alabi RO, Youssef O, Pirinen M, Elmusrati M, Mäkitie AA, Leivo I, Almangush A. Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future-A systematic review. Artif Intell Med 2021; 115:102060. [PMID: 34001326 DOI: 10.1016/j.artmed.2021.102060] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 05/25/2020] [Revised: 01/27/2021] [Accepted: 03/23/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND Oral cancer can show heterogeneous patterns of behavior. For proper and effective management of oral cancer, early diagnosis and accurate prediction of prognosis are important. To achieve this, artificial intelligence (AI) and its subfield, machine learning, have been touted for their potential to revolutionize cancer management through improved diagnostic precision and prediction of outcomes. Yet, to date, they have made only a few contributions to actual medical practice or patient care. OBJECTIVES This study provides a systematic review of diagnostic and prognostic applications of machine learning in oral squamous cell carcinoma (OSCC) and also highlights some of the limitations and concerns of clinicians regarding the implementation of machine learning-based models in daily clinical practice. DATA SOURCES We searched the OvidMedline, PubMed, Scopus, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) databases from inception until February 2020 for articles that used machine learning for diagnostic or prognostic purposes in OSCC. ELIGIBILITY CRITERIA Only original studies that examined the application of machine learning models for prognostic and/or diagnostic purposes were considered. DATA EXTRACTION Articles were extracted independently by two researchers (A.R. & O.Y.) using predefined study selection criteria. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) in the searching and screening processes. We also used the Prediction model Risk of Bias Assessment Tool (PROBAST) to assess the risk of bias (ROB) and quality of the included studies. RESULTS A total of 41 published studies were found to have used machine learning to aid in the diagnosis and/or prognosis of OSCC. The majority of these studies used support vector machine (SVM) and artificial neural network (ANN) algorithms. Their specificity ranged from 0.57 to 1.00, sensitivity from 0.70 to 1.00, and accuracy from 63.4% to 100.0%. The main limitations and concerns can be grouped as either challenges inherent to the science of machine learning or challenges relating to clinical implementation. CONCLUSION Machine learning models have been reported to show promising performance for diagnostic and prognostic analyses in studies of oral cancer. These models should be further developed to enhance explainability and interpretability, and externally validated for generalizability, in order to be safely integrated into daily clinical practice. Regulatory frameworks for the adoption of these models in clinical practice are also necessary.
Affiliation(s)
- Rasheed Omobolaji Alabi
- Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland.
- Omar Youssef
- Department of Pathology, University of Helsinki, Helsinki, Finland; Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Department of Public Health, University of Helsinki, Helsinki, Finland; Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Mohammed Elmusrati
- Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Antti A Mäkitie
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Otorhinolaryngology - Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; Division of Ear, Nose and Throat Diseases, Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Ilmo Leivo
- University of Turku, Institute of Biomedicine, Pathology, Turku, Finland
- Alhadi Almangush
- Department of Pathology, University of Helsinki, Helsinki, Finland; Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland; University of Turku, Institute of Biomedicine, Pathology, Turku, Finland; Faculty of Dentistry, Misurata University, Misurata, Libya
9. Turki T, Taguchi YH. Discriminating the single-cell gene regulatory networks of human pancreatic islets: A novel deep learning application. Comput Biol Med 2021; 132:104257. [PMID: 33740535 DOI: 10.1016/j.compbiomed.2021.104257] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/01/2020] [Revised: 02/01/2021] [Accepted: 02/03/2021] [Indexed: 12/24/2022]
Abstract
Analysis of single-cell pancreatic data can play an important role in understanding various metabolic diseases and health conditions. Owing to the sparsity and noise present in such single-cell gene expression data, inferring single-cell gene regulatory networks remains challenging. Since recent studies have reported reliable inference of single-cell gene regulatory networks (SCGRNs), the current study focused on discriminating the SCGRNs of type 2 diabetes (T2D) patients from those of healthy controls. Accurately distinguishing SCGRNs of healthy pancreas from those of T2D pancreas would make it possible to annotate, organize, visualize, and identify common patterns of SCGRNs in metabolic diseases. Such annotated SCGRNs could play an important role in accelerating the building of large data repositories. This study aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures (VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169), trained each of them on the dataset, and evaluated their predictions on a test set. All DL architectures were evaluated on a single NVIDIA GeForce RTX 2080Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of the VGG19 DL model in automatically classifying SCGRNs derived from single-cell pancreatic data.
Affiliation(s)
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
- Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, 112-8551, Japan
10. Minimum Relevant Features to Obtain Explainable Systems for Predicting Cardiovascular Disease Using the Statlog Data Set. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11031285] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Learning systems have focused on creating models that achieve the best results in error metrics. Recently, the focus has shifted toward improving the interpretation and explanation of results. The need for interpretation is greater when these models are used to support decision making, and in some areas, such as medicine, it becomes an indispensable requirement. The goal of this study was to define a simple process for constructing a system that can be easily interpreted, based on two principles: (1) reducing the number of attributes without degrading the performance of the prediction system and (2) selecting a technique to interpret the final prediction system. To illustrate this process, we selected the problem of predicting cardiovascular disease, using the well-known Statlog (Heart) data set from the University of California, Irvine (UCI) Machine Learning Repository. We analyzed the trade-off between making predictions easier to interpret, by reducing the number of features used to classify health status, and the resulting cost in accuracy. We analyzed a large set of classification techniques and performance metrics, demonstrating that it is possible to construct explainable and reliable models that provide high-quality predictive performance.
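Principle (1), reducing attributes without degrading performance, can be sketched as greedy backward elimination: repeatedly drop the feature whose removal hurts the score least, and stop once the score would fall more than a tolerance below the full-feature baseline. This is one common wrapper strategy assumed here for illustration; the abstract does not say which reduction method the study used. The `score` callable stands in for something like a cross-validated accuracy, and the attribute names, weights, and `toy_score` function are hypothetical.

```python
def backward_elimination(features, score, tolerance=0.01):
    """Greedy backward elimination.

    `score(subset)` returns a performance value (higher is better). Features
    are removed one at a time while the score stays within `tolerance` of the
    score obtained with all features.
    """
    selected = list(features)
    baseline = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        best_score, best_drop = max(candidates)
        if best_score < baseline - tolerance:
            break  # any further removal would cost too much accuracy
        selected.remove(best_drop)
    return selected


# Hypothetical per-feature contributions to a toy accuracy score.
weights = {"thal": 0.08, "vessels": 0.06, "chest_pain": 0.05,
           "age": 0.004, "cholesterol": 0.002}


def toy_score(subset):
    return 0.70 + sum(weights[f] for f in subset)


print(backward_elimination(list(weights), toy_score))
# ['thal', 'vessels', 'chest_pain']
```

In a real pipeline each call to `score` would retrain and cross-validate a classifier, so the cost grows roughly quadratically with the number of attributes; with the 13 Statlog attributes that remains affordable.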
11. Mei HX, Cheng JH, Li YZ, Ma HS, Zhang KW, Shou YK, Li Y. [Advances in the application of machine learning in maxillofacial cysts and tumors]. HUA XI KOU QIANG YI XUE ZA ZHI = HUAXI KOUQIANG YIXUE ZAZHI = WEST CHINA JOURNAL OF STOMATOLOGY 2020; 38:687-691. [PMID: 33377348 PMCID: PMC7738924 DOI: 10.7518/hxkq.2020.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2019] [Revised: 01/19/2020] [Indexed: 02/05/2023]
Abstract
The application of artificial intelligence in medicine has gradually attracted attention as the field has developed. Many studies have shown that machine learning has a wide range of applications in stomatology, especially in the clinical diagnosis and treatment of maxillofacial cysts and tumors. This article reviews the application of machine learning to maxillofacial cysts and tumors, with the aim of providing new methods for the diagnosis of oral and maxillofacial diseases.
Affiliation(s)
- Hong-Xiang Mei
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
- Jun-Hao Cheng
- College of Computer Science, Sichuan University, Chengdu 610041, China
- Yi-Zhou Li
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
- Huang-Shui Ma
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
- Kai-Wen Zhang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
- Yu-Ke Shou
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
- Yang Li
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
12. Shaikh TA, Ali R. An intelligent healthcare system for optimized breast cancer diagnosis using harmony search and simulated annealing (HS-SA) algorithm. INFORMATICS IN MEDICINE UNLOCKED 2020. [DOI: 10.1016/j.imu.2020.100408] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
13. Galvez JM, Castillo-Secilla D, Herrera LJ, Valenzuela O, Caba O, Prados JC, Ortuno FM, Rojas I. Towards Improving Skin Cancer Diagnosis by Integrating Microarray and RNA-Seq Datasets. IEEE J Biomed Health Inform 2019; 24:2119-2130. [PMID: 31871000 DOI: 10.1109/jbhi.2019.2953978] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
Many clinical studies have revealed high biological similarity among different skin pathological states. These similarities make the efficient diagnosis of skin cancer difficult and motivate the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. Integrating multiple heterogeneous transcriptomic datasets requires several properly designed pipeline stages, from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will be needed to find better biomarkers that recognize specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 a priori healthy skin conditions, 2 cataloged precancerous skin diseases, and 6 cancerous skin states. Their diagnostic power on new samples was extensively tested with previously trained classification models. Robust performance metrics, such as the overall and mean multiclass F1-scores, exceeded 94% and 80%, respectively. Clinicians should pay special attention to the highlighted multiclass DEGs that show large gene expression changes among these states, and should seek to understand their biological relationship to the different skin pathological states.
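The abstract reports both an "overall" and a "mean" multiclass F1-score. Assuming these correspond to micro-averaged F1 (counts pooled over all samples, which equals accuracy for single-label data) and macro-averaged F1 (unweighted mean of per-state F1), the sketch below shows how the two can diverge when one state is poorly recognized. The state labels and predictions are hypothetical, not the study's ten states.

```python
def multiclass_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)  # pooled counts
    macro = sum(per_class_f1) / len(per_class_f1)        # unweighted mean
    return micro, macro


# Hypothetical labels for six samples spanning four skin states.
y_true = ["healthy", "healthy", "AK", "BCC", "BCC", "melanoma"]
y_pred = ["healthy", "healthy", "BCC", "BCC", "BCC", "melanoma"]
micro, macro = multiclass_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.8333 0.7
```

The single misclassified "AK" sample costs the micro average one sample out of six, but it drives that state's F1 to 0, which pulls the macro average down by a quarter of the per-state range.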
15. Hosni M, Abnane I, Idri A, Carrillo de Gea JM, Fernández Alemán JL. Reviewing ensemble classification methods in breast cancer. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 177:89-112. [PMID: 31319964 DOI: 10.1016/j.cmpb.2019.05.019] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/07/2019] [Revised: 05/16/2019] [Accepted: 05/18/2019] [Indexed: 05/09/2023]
Abstract
CONTEXT Ensemble methods combine more than one single technique to solve the same task. This approach was designed to overcome the weaknesses of single techniques and consolidate their strengths. Ensemble methods are now widely used to carry out prediction tasks (e.g., classification and regression) in several fields, including bioinformatics. Researchers have particularly begun to employ ensemble techniques to improve research into breast cancer, as this is the most frequent type of cancer and accounts for most of the deaths among women. OBJECTIVE AND METHOD The goal of this study is to analyse the state of the art in ensemble classification methods applied to breast cancer with regard to 9 aspects: publication venues, medical tasks tackled, empirical and research types adopted, types of ensembles proposed, single techniques used to construct the ensembles, validation framework adopted to evaluate the proposed ensembles, tools used to build the ensembles, and optimization methods used for the single techniques. This paper was undertaken as a systematic mapping study. RESULTS A total of 193 papers published from 2000 onwards were selected from four online databases: IEEE Xplore, ACM Digital Library, Scopus and PubMed. This study found that, of the six existing medical tasks, diagnosis was the most frequently researched, and that the experiment-based empirical type and the evaluation-based research type were the dominant approaches adopted in the selected studies. Homogeneous ensembles were the most widely used type for the classification task. With regard to single techniques, decision trees, support vector machines and artificial neural networks were the most frequently adopted to build ensemble classifiers. As for the evaluation framework, the Wisconsin Breast Cancer dataset was the most frequently used by researchers to perform their experiments, while the most common validation method was k-fold cross-validation. Several tools are available to perform experiments related to ensemble classification methods, such as Weka and R Software. Few researchers took into account the optimisation of the single techniques of which their proposed ensembles were composed, and grid search was the method most frequently adopted to tune the parameter settings of a single classifier. CONCLUSION This paper reports an in-depth study of the application of ensemble methods to breast cancer. Our results reveal several gaps and issues, and we therefore provide recommendations to researchers in the field of breast cancer research. Moreover, after analysing the papers found in this systematic mapping study, we discovered that the majority report positive results concerning the accuracy of ensemble classifiers compared with single classifiers. To aggregate the evidence reported in the literature, it will therefore be necessary to perform a systematic literature review and meta-analysis, in which an in-depth analysis could confirm the superiority of ensemble classifiers over classical techniques.
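k-fold cross-validation, the validation method this mapping study found most common, partitions the samples into k disjoint folds and uses each fold once as the test set while training on the rest. The minimal sketch below uses contiguous folds without shuffling or stratification (both are usually advisable in practice, especially for imbalanced medical data); fold sizes differ by at most one, the same convention scikit-learn's `KFold` uses.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs. The first
    n_samples % k folds get one extra sample, so sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(11, 5)
print([len(test) for _, test in folds])  # [3, 2, 2, 2, 2]
```

A model's cross-validated score is then the mean of the k per-fold test scores, and a grid search simply repeats this loop for every candidate parameter setting and keeps the best-scoring one.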
Affiliation(s)
- Mohamed Hosni
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Ibtissam Abnane
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Ali Idri
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Juan M Carrillo de Gea
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.
16. Way GP, Greene CS. Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021348] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications for extracting pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their application to the growing volume of bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.
Affiliation(s)
- Gregory P. Way
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA