101
Benchmarking Eliminative Radiomic Feature Selection for Head and Neck Lymph Node Classification. Cancers (Basel) 2022; 14:cancers14030477. [PMID: 35158745 PMCID: PMC8833684 DOI: 10.3390/cancers14030477] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Revised: 01/13/2022] [Accepted: 01/16/2022] [Indexed: 12/12/2022] Open
Abstract
Simple Summary: Pathologic cervical lymph nodes (LN) in head and neck squamous cell carcinoma (HNSCC) worsen prognosis. Current radiologic criteria for LN classification are primarily shape-based. Radiomics is an emerging data-driven technique for extracting, processing, and analyzing features and is potentially capable of LN classification. Currently available sets of features are too complex for clinical applicability. We identified the combination of sparse discriminant analysis and genetic algorithms as a potentially useful algorithm for eliminative feature selection. In this retrospective cohort study of 252 LNs with over 30,000 extracted features, this algorithm retained a classification accuracy of up to 90% with only 10% of the original number of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified algorithm and features need further exploration of their potential as prospective classifiers for LNs in HNSCC.
Abstract: In head and neck squamous cell carcinoma (HNSCC), pathologic cervical lymph nodes (LN) remain important negative predictors. Current criteria for LN classification in contrast-enhanced computed tomography scans (contrast-CT) are shape-based; contrast-CT imagery allows extraction of additional quantitative data (“features”). The data-driven technique to extract, process, and analyze features from contrast-CTs is termed “radiomics”. Features extracted from contrast-CTs at various levels are typically redundant and correlated. Current sets of features for LN classification are too complex for clinical application. Effective eliminative feature selection (EFS) is a crucial preprocessing step to reduce the complexity of the sets identified. We aimed to explore EFS algorithms for their potential to identify sets of features that are as small as feasible and yet retain as much accuracy as possible for LN classification. In this retrospective cohort study, which adhered to the STROBE guidelines, a total of 252 LNs were classified as “non-pathologic” (n = 70), “pathologic” (n = 182) or “pathologic with extracapsular spread” (n = 52) by two experienced head-and-neck radiologists based on established criteria, which served as a reference. The combination of sparse discriminant analysis and genetic optimization retained up to 90% of the classification accuracy with only 10% of the original number of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified EFS algorithm and the identified features need further exploration to assess their potential to prospectively classify LNs in HNSCC.
102
Jiao Z, Chen S, Shi H, Xu J. Multi-Modal Feature Selection with Feature Correlation and Feature Structure Fusion for MCI and AD Classification. Brain Sci 2022; 12:80. [PMID: 35053823 PMCID: PMC8773824 DOI: 10.3390/brainsci12010080] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/14/2021] [Revised: 12/24/2021] [Accepted: 12/29/2021] [Indexed: 11/16/2022] Open
Abstract
Feature selection for multiple types of data has been widely applied in mild cognitive impairment (MCI) and Alzheimer's disease (AD) classification research. Combining multi-modal data for classification can better realize the complementarity of valuable information. To improve the classification performance of feature selection on multi-modal data, we propose a multi-modal feature selection algorithm using feature correlation and feature structure fusion (FC2FS). First, we construct feature correlation regularization by fusing a similarity matrix between multi-modal feature nodes. Then, based on manifold learning, we employ feature matrix fusion to construct feature structure regularization and learn the local geometric structure of the feature nodes. Finally, the two regularizations are embedded in a multi-task learning model that introduces a low-rank constraint, the multi-modal features are selected, and the final features are linearly fused and input into a support vector machine (SVM) for classification. Different controlled experiments were set up to verify the validity of the proposed method, which was applied to MCI and AD classification. The accuracies for normal controls versus Alzheimer's disease, normal controls versus late mild cognitive impairment, normal controls versus early mild cognitive impairment, and early mild cognitive impairment versus late mild cognitive impairment reach 91.85 ± 1.42%, 85.33 ± 2.22%, 78.29 ± 2.20%, and 77.67 ± 1.65%, respectively. This method makes up for the shortcomings of traditional subject-based multi-modal feature selection and fully considers the relationship between feature nodes and the local geometric structure of the feature space. Our study not only enhances the interpretation of feature selection but also improves the classification performance, which offers some reference value for the identification of MCI and AD.
Affiliation(s)
- Zhuqing Jiao
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
- Siwei Chen
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
- Haifeng Shi
  - Department of Radiology, Changzhou Second People’s Hospital, Nanjing Medical University, Changzhou 213003, China
  - School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213164, China
- Jia Xu
  - School of Medicine, Ningbo University, Ningbo 315211, China
103
How can dense results be differentiated in comprehensive evaluations? A hybrid information filtering model. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
104
Evaluation of Feature Selection Methods on Psychosocial Education Data Using Additive Ratio Assessment. ELECTRONICS 2021. [DOI: 10.3390/electronics11010114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Artificial intelligence, particularly machine learning, is the fastest-growing research trend in educational fields. Machine learning shows an impressive performance in many prediction models, including psychosocial education. The capability of machine learning to discover hidden patterns in large datasets encourages researchers to invent data with high-dimensional features. However, not all features are needed by machine learning, and in many cases, high-dimensional features decrease the performance of machine learning. Feature selection is one of the appropriate approaches to reducing the features so that machine learning works efficiently. Various selection methods have been proposed, but research to determine the essential feature subset in psychosocial education has not been established thus far. This research investigated and proposed methods to determine the best feature selection method in the domain of psychosocial education. We used a multi-criteria decision-making (MCDM) approach with Additive Ratio Assessment (ARAS) to rank seven feature selection methods. The proposed model evaluated the best feature selection method using nine criteria from the performance metrics provided by machine learning. The experimental results showed that ARAS is promising for evaluating and recommending the best feature selection method for psychosocial education data using the teacher’s psychosocial risk levels dataset.
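The ranking step described in this abstract can be illustrated with a minimal sketch of ARAS as it is commonly defined: add an ideal alternative, normalise each criterion, apply weights, and divide each alternative's score by the ideal score. The three methods, three criteria, scores, and weights below are hypothetical placeholders, not the seven methods or nine criteria of the study, and the function name `aras_rank` is an assumption made for illustration.

```python
def aras_rank(matrix, weights, benefit):
    """Return the ARAS utility degree K_i of each alternative (row of matrix)."""
    n_criteria = len(weights)
    # Row 0 is the ideal alternative: the best observed value per criterion.
    ideal = [
        max(row[j] for row in matrix) if benefit[j] else min(row[j] for row in matrix)
        for j in range(n_criteria)
    ]
    extended = [ideal] + [list(row) for row in matrix]
    # Invert cost criteria so that larger is always better.
    for row in extended:
        for j in range(n_criteria):
            if not benefit[j]:
                row[j] = 1.0 / row[j]
    # Normalise by column sums, weight, and add up per alternative.
    col_sums = [sum(row[j] for row in extended) for j in range(n_criteria)]
    scores = [
        sum(weights[j] * row[j] / col_sums[j] for j in range(n_criteria))
        for row in extended
    ]
    # Utility degree: score relative to the ideal alternative.
    return [s / scores[0] for s in scores[1:]]


# Hypothetical data: accuracy (benefit), F1 (benefit), runtime in seconds (cost).
names = ["filter_A", "wrapper_B", "embedded_C"]
table = [[0.90, 0.88, 2.0], [0.92, 0.90, 20.0], [0.91, 0.87, 5.0]]
utilities = aras_rank(table, weights=[0.5, 0.3, 0.2], benefit=[True, True, False])
ranking = sorted(zip(names, utilities), key=lambda pair: pair[1], reverse=True)
for name, k in ranking:
    print(f"{name}: K = {k:.3f}")
```

A utility degree of 1 means an alternative matches the ideal on every criterion, so values closer to 1 rank higher.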
105
Pes B, Lai G. Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study. PeerJ Comput Sci 2021; 7:e832. [PMID: 35036539 PMCID: PMC8725666 DOI: 10.7717/peerj-cs.832] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/15/2021] [Accepted: 12/06/2021] [Indexed: 05/28/2023]
Abstract
High dimensionality and class imbalance have been largely recognized as important issues in machine learning. A vast amount of literature has indeed investigated suitable approaches to address the multiple challenges that arise when dealing with high-dimensional feature spaces (where each problem instance is described by a large number of features). As well, several learning strategies have been devised to cope with the adverse effects of imbalanced class distributions, which may severely impact on the generalization ability of the induced models. Nevertheless, although both the issues have been largely studied for several years, they have mostly been addressed separately, and their combined effects are yet to be fully understood. Indeed, little research has been so far conducted to investigate which approaches might be best suited to deal with datasets that are, at the same time, high-dimensional and class-imbalanced. To make a contribution in this direction, our work presents a comparative study among different learning strategies that leverage both feature selection, to cope with high dimensionality, as well as cost-sensitive learning methods, to cope with class imbalance. Specifically, different ways of incorporating misclassification costs into the learning process have been explored. Also different feature selection heuristics have been considered, both univariate and multivariate, to comparatively evaluate their effectiveness on imbalanced data. The experiments have been conducted on three challenging benchmarks from the genomic domain, gaining interesting insight into the beneficial impact of combining feature selection and cost-sensitive learning, especially in the presence of highly skewed data distributions.
Affiliation(s)
- Barbara Pes
  - Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy
- Giuseppina Lai
  - Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy
106
Bhattacharjee S, Ikromjanov K, Carole KS, Madusanka N, Cho NH, Hwang YB, Sumon RI, Kim HC, Choi HK. Cluster Analysis of Cell Nuclei in H&E-Stained Histological Sections of Prostate Cancer and Classification Based on Traditional and Modern Artificial Intelligence Techniques. Diagnostics (Basel) 2021; 12:diagnostics12010015. [PMID: 35054182 PMCID: PMC8774423 DOI: 10.3390/diagnostics12010015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] Open
Abstract
Biomarker identification is very important to differentiate the grade groups in the histopathological sections of prostate cancer (PCa). Assessing the cluster of cell nuclei is essential for pathological investigation. In this study, we present a computer-based method for cluster analyses of cell nuclei and performed traditional (i.e., unsupervised method) and modern (i.e., supervised method) artificial intelligence (AI) techniques for distinguishing the grade groups of PCa. Two datasets on PCa were collected to carry out this research. Histopathology samples were obtained from whole slides stained with hematoxylin and eosin (H&E). In this research, state-of-the-art approaches were proposed for color normalization, cell nuclei segmentation, feature selection, and classification. A traditional minimum spanning tree (MST) algorithm was employed to identify the clusters and better capture the proliferation and community structure of cell nuclei. K-medoids clustering and stacked ensemble machine learning (ML) approaches were used to perform traditional and modern AI-based classification. The binary and multiclass classification was derived to compare the model quality and results between the grades of PCa. Furthermore, a comparative analysis was carried out between traditional and modern AI techniques using different performance metrics (i.e., statistical parameters). Cluster features of the cell nuclei can be useful information for cancer grading. However, further validation of cluster analysis is required to accomplish astounding classification results.
Affiliation(s)
- Kobiljon Ikromjanov
  - Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Kouayep Sonia Carole
  - Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Nuwan Madusanka
  - School of Computing & IT, Sri Lanka Technological Campus, Paduka 10500, Sri Lanka
- Nam-Hoon Cho
  - Department of Pathology, Yonsei University Hospital, Seoul 03722, Korea
- Yeong-Byn Hwang
  - Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Rashadul Islam Sumon
  - Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Hee-Cheol Kim
  - Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Heung-Kook Choi
  - Department of Computer Engineering, u-AHRC, Inje University, Gimhae 50834, Korea
  - Correspondence: Tel.: +82-10-6733-3437
107
Syed FH, Tahir MA, Rafi M, Shahab MD. Feature selection for semi-supervised multi-target regression using genetic algorithm. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02291-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/27/2023]
108
Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021; 22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open
Abstract
Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.
Affiliation(s)
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Hao Huang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- Chunlei Liu
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
109
López-Dorado A, Pérez J, Rodrigo M, Miguel-Jiménez J, Ortiz M, de Santiago L, López-Guillén E, Blanco R, Cavalliere C, Morla EMS, Boquete L, Garcia-Martin E. Diagnosis of multiple sclerosis using multifocal ERG data feature fusion. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2021; 76:157-167. [PMID: 34867127 PMCID: PMC8475498 DOI: 10.1016/j.inffus.2021.05.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/29/2020] [Revised: 11/15/2020] [Accepted: 05/17/2021] [Indexed: 05/16/2023]
Abstract
The purpose of this paper is to implement a computer-aided diagnosis (CAD) system for multiple sclerosis (MS) based on analysing the outer retina as assessed by multifocal electroretinograms (mfERGs). MfERG recordings taken with the RETI-port/scan 21 (Roland Consult) device from 15 eyes of patients diagnosed with incipient relapsing-remitting MS and without prior optic neuritis, and from 6 eyes of control subjects, are selected. The mfERG recordings are grouped (whole macular visual field, five rings, and four quadrants). For each group, the correlation with a normative database of adaptively filtered signals, based on empirical mode decomposition (EMD), and three features from the continuous wavelet transform (CWT) domain are obtained. Of the initial 40 features, the 4 most relevant are selected in two stages: a) using a filter method and b) using a wrapper feature selection method. The Support Vector Machine (SVM) is used as a classifier. With the optimal CAD configuration, a Matthews correlation coefficient value of 0.89 (accuracy = 0.95, specificity = 1.0 and sensitivity = 0.93) is obtained. This study identified an outer retina dysfunction in patients with recent MS by analysing the outer retina responses in the mfERG and employing an SVM as a classifier. In conclusion, a promising new electrophysiological-biomarker method based on feature fusion for MS diagnosis was identified.
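The reported metrics can be cross-checked with a short worked example. The confusion matrix below is an assumption, not a figure from the paper: it supposes that all 21 eyes (15 MS, 6 control) were scored in a single evaluation with 14 MS eyes detected, 1 missed, and every control eye classified correctly. Under that assumption the rounded values agree with the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Assumed counts (hypothetical): 15 MS eyes and 6 control eyes.
tp, fn = 14, 1   # MS eyes detected / missed
tn, fp = 6, 0    # control eyes correctly / wrongly classified

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
coefficient = mcc(tp, tn, fp, fn)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"accuracy    = {accuracy:.2f}")
print(f"MCC         = {coefficient:.2f}")
```

With such a small and imbalanced sample, the MCC is a useful complement to accuracy because it uses all four cells of the confusion matrix.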
Affiliation(s)
- A. López-Dorado
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- J. Pérez
  - Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
  - Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
- M.J. Rodrigo
  - Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
  - Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
  - RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
- J.M. Miguel-Jiménez
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- M. Ortiz
  - School of Physics, University of Melbourne, VIC 3010, Australia
- L. de Santiago
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- E. López-Guillén
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- R. Blanco
  - Department of Surgery, Medical and Social Sciences, University of Alcalá, Alcalá de Henares, Spain
  - RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
- C. Cavalliere
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- E. Mª Sánchez Morla
  - Department of Psychiatry, Hospital 12 de Octubre Research Institute (i+12), 28041 Madrid, Spain
  - Faculty of Medicine, Complutense University of Madrid, 28040 Madrid, Spain
  - CIBERSAM: Biomedical Research Networking Centre in Mental Health, 28029 Madrid, Spain
- L. Boquete
  - Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
  - RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
- E. Garcia-Martin
  - Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
  - Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
  - RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
110
Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques? REMOTE SENSING 2021. [DOI: 10.3390/rs13234832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
This study analyzed highly correlated, feature-rich datasets from hyperspectral remote sensing data using multiple statistical and machine-learning methods. The effect of filter-based feature selection methods on predictive performance was compared. In addition, the effect of multiple expert-based and data-driven feature sets, derived from the reflectance data, was investigated. Defoliation of trees (%), derived from in situ measurements from fall 2016, was modeled as a function of reflectance. Variable importance was assessed using permutation-based feature importance. Overall, the support vector machine (SVM) outperformed other algorithms, such as random forest (RF), extreme gradient boosting (XGBoost), and lasso (L1) and ridge (L2) regressions by at least three percentage points. The combination of certain feature sets showed small increases in predictive performance, while no substantial differences between individual feature sets were observed. For some combinations of learners and feature sets, filter methods achieved better predictive performances than using no feature selection. Ensemble filters did not have a substantial impact on performance. The most important features were located around the red edge. Additional features in the near-infrared region (800–1000 nm) were also essential to achieve the overall best performances. Filter methods have the potential to be helpful in high-dimensional situations and are able to improve the interpretation of feature effects in fitted models, which is an essential constraint in environmental modeling studies. Nevertheless, more training data and replication in similar benchmarking studies are needed to be able to generalize the results.
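Permutation-based feature importance, which this abstract uses to assess variable importance, can be sketched in a few lines: shuffle one feature column at a time and record how much the model's error grows. The toy data, the fixed linear "model", and the function names below are hypothetical and stand in for the fitted learners and reflectance features of the study.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model on rows X with targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (hypothetical): the target depends only on the first feature.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [3.0 * row[0] for row in X]


def toy_model(row):
    """Stand-in for a fitted learner; it ignores the second feature."""
    return 3.0 * row[0]


scores = permutation_importance(toy_model, X, y)
print(scores)
```

Because the stand-in model never reads the second feature, shuffling it leaves the error unchanged, while shuffling the first feature raises it.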
111
Mahendran N, P M DRV. A deep learning framework with an embedded-based feature selection approach for the early detection of the Alzheimer's disease. Comput Biol Med 2021; 141:105056. [PMID: 34839903 DOI: 10.1016/j.compbiomed.2021.105056] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2021] [Revised: 11/20/2021] [Accepted: 11/20/2021] [Indexed: 12/29/2022]
Abstract
Ageing is associated with various ailments including Alzheimer's disease (AD), which is a progressive form of dementia. AD symptoms develop over a period of years and, unfortunately, there is no cure. Existing AD treatments can only slow down the progression of symptoms, and thus it is critical to diagnose the disease at an early stage. To help improve the early diagnosis of AD, a deep learning-based classification model with an embedded feature selection approach was used to classify AD patients. An AD DNA methylation data set (64 records with 34 cases and 34 controls) from the Gene Expression Omnibus (GEO) database was used for the analysis. Before selecting the relevant features, the data were preprocessed by performing quality control, normalization and downstream analysis. As the number of associated CpG sites was huge, four embedded-based feature selection models were compared and the best method was used for the proposed classification model. An Enhanced Deep Recurrent Neural Network (EDRNN) was implemented and compared to other existing classification models, including a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Deep Recurrent Neural Network (DRNN). The results showed a significant improvement in the classification accuracy of the proposed model as compared to the other methods.
Affiliation(s)
- Nivedhitha Mahendran
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Durai Raj Vincent P M
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
112
Siddhartha M, Kumar V, Nath R. Early-stage diagnosis of chronic kidney disease using majority vote – Grey Wolf optimization (MV-GWO). HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
113
Chaddad A, Li J, Lu Q, Li Y, Okuwobi IP, Tanougast C, Desrosiers C, Niazi T. Can Autism Be Diagnosed with Artificial Intelligence? A Narrative Review. Diagnostics (Basel) 2021; 11:2032. [PMID: 34829379 PMCID: PMC8618159 DOI: 10.3390/diagnostics11112032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/04/2021] [Revised: 10/31/2021] [Accepted: 10/31/2021] [Indexed: 11/16/2022] Open
Abstract
Radiomics with deep learning models have become popular in computer-aided diagnosis and have outperformed human experts on many clinical tasks. Specifically, radiomic models based on artificial intelligence (AI) are using medical data (i.e., images, molecular data, clinical variables, etc.) for predicting clinical tasks such as autism spectrum disorder (ASD). In this review, we summarized and discussed the radiomic techniques used for ASD analysis. Currently, the limited radiomic work of ASD is related to the variation of morphological features of brain thickness that is different from texture analysis. These techniques are based on imaging shape features that can be used with predictive models for predicting ASD. This review explores the progress of ASD-based radiomics with a brief description of ASD and the current non-invasive technique used to classify between ASD and healthy control (HC) subjects. With AI, new radiomic models using the deep learning techniques will be also described. To consider the texture analysis with deep CNNs, more investigations are suggested to be integrated with additional validation steps on various MRI sites.
Affiliation(s)
- Ahmad Chaddad
  - School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China
  - The Laboratory for Imagery, Vision and Artificial Intelligence, École de Technologie Supérieure (ETS), Montreal, QC H3C 1K3, Canada
- Jiali Li
  - School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China
- Qizong Lu
  - School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China
- Yujie Li
  - School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China
- Idowu Paul Okuwobi
  - School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China
- Camel Tanougast
  - Laboratoire de Conception, Optimisation et Modélisation des Systèmes, University of Lorraine, 57070 Metz, France
- Christian Desrosiers
  - The Laboratory for Imagery, Vision and Artificial Intelligence, École de Technologie Supérieure (ETS), Montreal, QC H3C 1K3, Canada
- Tamim Niazi
  - Lady Davis Institute for Medical Research, McGill University, Montreal, QC H3T 1E2, Canada
114
A highly predictive autoantibody-based biomarker panel for prognosis in early-stage NSCLC with potential therapeutic implications. Br J Cancer 2021; 126:238-246. [PMID: 34728792 PMCID: PMC8770460 DOI: 10.1038/s41416-021-01572-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/19/2021] [Revised: 09/12/2021] [Accepted: 09/30/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer-related death worldwide. Surgical resection remains the definitive curative treatment for early-stage disease offering an overall 5-year survival rate of 62%. Despite careful case selection, a significant proportion of early-stage cancers relapse aggressively within the first year post-operatively. Identification of these patients is key to accurate prognostication and understanding the biology that drives early relapse might open up potential novel adjuvant therapies. METHODS We performed an unsupervised interrogation of >1600 serum-based autoantibody biomarkers using an iterative machine-learning algorithm. RESULTS We identified a 13 biomarker signature that was highly predictive for survivorship in post-operative early-stage lung cancer; this outperforms currently used autoantibody biomarkers in solid cancers. Our results demonstrate significantly poor survivorship in high expressers of this biomarker signature with an overall 5-year survival rate of 7.6%. CONCLUSIONS We anticipate that the data will lead to the development of an off-the-shelf prognostic panel and further that the oncogenic relevance of the proteins recognised in the panel may be a starting point for a new adjuvant therapy.
115
Ouchani M, Gharibzadeh S, Jamshidi M, Amini M. A Review of Methods of Diagnosis and Complexity Analysis of Alzheimer's Disease Using EEG Signals. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5425569. [PMID: 34746303 PMCID: PMC8566072 DOI: 10.1155/2021/5425569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2021] [Revised: 06/20/2021] [Accepted: 10/18/2021] [Indexed: 01/27/2023]
Abstract
This study concentrates on recent research on EEG signals for Alzheimer's diagnosis, identifying and comparing key steps of EEG-based Alzheimer's disease (AD) detection, such as EEG signal acquisition, preprocessing, feature extraction, and classification methods. Furthermore, by highlighting general approaches, variations, and agreement in the use of EEG, it identifies shortcomings and guidelines for multiple experimental stages, ranging from demographic characteristics to outcomes monitoring, for future research. Two main targets have been defined based on the article's purpose: (1) discriminative (or detection), i.e., look for differences in EEG-based features across groups, such as MCI, moderate Alzheimer's disease, extreme Alzheimer's disease, other forms of dementia, and stable normal elderly controls; and (2) progression determination, i.e., look for correlations between EEG-based features and clinical markers linked to MCI-to-AD conversion and Alzheimer's disease intensity progression. Limitations mentioned in the reviewed papers were also gathered and explored in this study, with the goal of gaining a better understanding of the problems that need to be addressed in order to advance the use of EEG in Alzheimer's disease research.
Affiliation(s)
- Mahshad Ouchani
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Shahriar Gharibzadeh
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Mahdieh Jamshidi
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Morteza Amini
- Shahid Beheshti University, Tehran, Iran
- Institute for Cognitive Science Studies (ICSS), Tehran, Iran
|
116
|
Li Y, Li G, Guo L. Feature Selection for Regression Based on Gamma Test Nested Monte Carlo Tree Search. ENTROPY 2021; 23:e23101331. [PMID: 34682055 PMCID: PMC8535147 DOI: 10.3390/e23101331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Revised: 10/06/2021] [Accepted: 10/07/2021] [Indexed: 12/03/2022]
Abstract
This paper investigates the nested Monte Carlo tree search (NMCTS) for feature selection on regression tasks. NMCTS starts out with an empty subset and uses search results of lower nesting level simulation. Level 0 is based on random moves until the path reaches the leaf node. In order to accomplish feature selection on the regression task, the Gamma test is introduced to play the role of the reward function at the end of the simulation. The concept Vratio of the Gamma test is also combined with the original UCT-tuned1 and the design of stopping conditions in the selection and simulation phases. The proposed GNMCTS method was tested on seven numeric datasets and compared with six other feature selection methods. It shows better performance than the vanilla MCTS framework and maintains the relevant information in the original feature space. The experimental results demonstrate that GNMCTS is a robust and effective tool for feature selection. It can accomplish the task well in a reasonable computation budget.
Affiliation(s)
- Ying Li
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
- Guohe Li
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
- Correspondence:
- Lingun Guo
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
- College of Software, Henan Normal University, Xinxiang 453007, China
|
117
|
Jiang Z, Zhang Y, Wang J. A multi-surrogate-assisted dual-layer ensemble feature selection algorithm. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
118
|
Bommert A, Welchowski T, Schmid M, Rahnenführer J. Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Brief Bioinform 2021; 23:6366322. [PMID: 34498681 PMCID: PMC8769710 DOI: 10.1093/bib/bbab354] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Revised: 08/05/2021] [Accepted: 08/10/2021] [Indexed: 11/30/2022] Open
Abstract
Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practitioners. We analyze the accuracy of predictive models that employ the features selected by the filter methods. Also, we consider the run time, the number of selected features for fitting models with high predictive accuracy as well as the feature selection stability. We conclude that the simple variance filter outperforms all other considered filter methods. This filter selects the features with the largest variance and does not take into account the survival outcome. Also, we identify the correlation-adjusted regression scores filter as a more elaborate alternative that allows fitting models with similar predictive accuracy. Additionally, we investigate the filter methods based on feature rankings, finding groups of similar filters.
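The variance filter described in this abstract is simple enough to sketch. The following is a minimal illustration, not the authors' benchmark code: the function name `variance_filter`, the toy matrix `expression`, and the choice of `k` are assumptions made for the example. It ranks features by variance alone and never looks at the survival outcome.

```python
from statistics import pvariance


def variance_filter(rows, k):
    """Return the (sorted) indices of the k columns with the largest variance.

    rows: list of samples, each a list of numeric feature values.
    The outcome variable is deliberately not an argument: this filter ignores it.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 samples, 3 features (values invented for illustration).
expression = [
    [1.0, 10.0, 5.0],
    [1.1, 20.0, 5.0],
    [0.9, 30.0, 5.5],
    [1.0, 40.0, 4.5],
]
selected = variance_filter(expression, k=2)
print(selected)  # indices of the two highest-variance features
```

Because the ranking depends on the scale of each feature, a filter like this is usually applied to data that share a common scale, as normalized gene expression values typically do.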
Affiliation(s)
- Andrea Bommert
- Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227, Dortmund, Germany
- Thomas Welchowski
- Institute of Medical Biometry, Informatics and Epidemiology (IMBIE), Medical Faculty, University of Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Matthias Schmid
- Institute of Medical Biometry, Informatics and Epidemiology (IMBIE), Medical Faculty, University of Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227, Dortmund, Germany
|
119
|
Sundaram S, Zeid A. Smart Prognostics and Health Management (SPHM) in Smart Manufacturing: An Interoperable Framework. SENSORS (BASEL, SWITZERLAND) 2021; 21:5994. [PMID: 34577203 PMCID: PMC8472989 DOI: 10.3390/s21185994] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/22/2021] [Revised: 08/28/2021] [Accepted: 09/01/2021] [Indexed: 11/18/2022]
Abstract
Advances in the manufacturing industry have led to modern approaches such as Industry 4.0, Cyber-Physical Systems, Smart Manufacturing (SM) and Digital Twins. The traditional manufacturing architecture that consisted of hierarchical layers has evolved into a hierarchy-free network in which all the areas of a manufacturing enterprise are interconnected. The field devices on the shop floor generate large amounts of data that can be useful for maintenance planning. Prognostics and Health Management (PHM) approaches use this data and help us in fault detection and Remaining Useful Life (RUL) estimation. Although there is a significant amount of research primarily focused on tool wear prediction and Condition-Based Monitoring (CBM), there is not much importance given to the multiple facets of PHM. This paper conducts a review of PHM approaches, the current research trends and proposes a three-phased interoperable framework to implement Smart Prognostics and Health Management (SPHM). The uniqueness of SPHM lies in its framework, which makes it applicable to any manufacturing operation across the industry. The framework consists of three phases: Phase 1 consists of the shopfloor setup and data acquisition steps, Phase 2 describes steps to prepare and analyze the data and Phase 3 consists of modeling, predictions and deployment. The first two phases of SPHM are addressed in detail and an overview is provided for the third phase, which is a part of ongoing research. As a use-case, the first two phases of the SPHM framework are applied to data from a milling machine operation.
Affiliation(s)
- Abe Zeid
- College of Engineering, Northeastern University, Boston, MA 02135, USA
|
120
|
Hamid TMTA, Sallehuddin R, Yunos ZM, Ali A. Ensemble Based Filter Feature Selection with Harmonize Particle Swarm Optimization and Support Vector Machine for Optimal Cancer Classification. MACHINE LEARNING WITH APPLICATIONS 2021. [DOI: 10.1016/j.mlwa.2021.100054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022] Open
|
121
|
Topolski M. Application of Feature Extraction Methods for Chemical Risk Classification in the Pharmaceutical Industry. SENSORS 2021; 21:s21175753. [PMID: 34502644 PMCID: PMC8434006 DOI: 10.3390/s21175753] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Revised: 08/20/2021] [Accepted: 08/21/2021] [Indexed: 11/25/2022]
Abstract
The features that are used in the classification process are acquired from sensor data on the production site (associated with toxic, physicochemical properties) and also a dataset associated with cybersecurity that may affect the above-mentioned risk. These are large datasets, so it is important to reduce them. The author’s motivation was to develop a method of assessing the dimensionality of features based on correlation measures and the discriminant power of features allowing for a more accurate reduction of their dimensions compared to the classical Kaiser criterion and assessment of scree plot. The method proved to be promising. The results obtained in the experiments demonstrate that the quality of classification after extraction is better than using classical criteria for estimating the number of components and features. Experiments were carried out for various extraction methods, demonstrating that the rotation of factors according to centroids of a class in this classification task gives the best risk assessment of chemical threats. The classification quality increased by about 7% compared to a model where feature extraction was not used and resulted in an improvement of 4% compared to the classical PCA method with the Kaiser criterion, with an evaluation of the scree plot. Furthermore, it has been shown that there is a certain subspace of cybersecurity features, which complemented with the features of the concentration of volatile substances, affects the risk assessment of chemical hazards. The identified cybersecurity factors are the number of packets lost, incorrect Logins, incorrect sensor responses, increased email spam, and excessive traffic in the computer network. To visualize the speed of classification in real-time, simulations were carried out for various systems used in Industry 4.0.
Affiliation(s)
- Mariusz Topolski
- Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
|
122
|
Khaleghi MK, Savizi ISP, Lewis NE, Shojaosadati SA. Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 2021; 16:e2100212. [PMID: 34390201 DOI: 10.1002/biot.202100212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/06/2022]
Abstract
Recent noteworthy advances in the development of high-performing microbial and mammalian strains have enabled the sustainable production of bio-economically valuable substances such as bio-compounds, biofuels, and biopharmaceuticals. However, to obtain an industrially viable mass-production scheme, much time and effort are required. The robust and rational design of fermentation processes requires analysis and optimization of different extracellular conditions and medium components, which have a massive effect on growth and productivity. In this regard, knowledge- and data-driven modeling methods have received much attention. Constraint-based modeling (CBM) is a knowledge-driven mathematical approach that has been widely used in fermentation analysis and optimization due to its capabilities of predicting the cellular phenotype from genotype through high-throughput means. On the other hand, machine learning (ML) is a data-driven statistical method that identifies the data patterns within sophisticated biological systems and processes, where there is inadequate knowledge to represent underlying mechanisms. Furthermore, ML models are becoming a viable complement to constraint-based models in a reciprocal manner when one is used as a pre-step of another. As a result, a more predictable model is produced. This review highlights the applications of CBM and ML independently and the combination of these two approaches for analyzing and optimizing fermentation parameters.
Affiliation(s)
- Mohammad Karim Khaleghi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Iman Shahidi Pour Savizi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Nathan E Lewis
- Department of Bioengineering, University of California, San Diego, USA; Department of Pediatrics, University of California, San Diego, USA
- Seyed Abbas Shojaosadati
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
|
123
|
Degeest A, Frénay B, Verleysen M. Reading grid for feature selection relevance criteria in regression. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.04.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
124
|
Usmani S, Saboor A, Haris M, Khan MA, Park H. Latest Research Trends in Fall Detection and Prevention Using Machine Learning: A Systematic Review. SENSORS 2021; 21:s21155134. [PMID: 34372371 PMCID: PMC8347190 DOI: 10.3390/s21155134] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Revised: 07/16/2021] [Accepted: 07/24/2021] [Indexed: 12/15/2022]
Abstract
Falls are unusual actions that cause a significant health risk among older people. The growing percentage of people of old age requires urgent development of fall detection and prevention systems. The emerging technology focuses on developing such systems to improve quality of life, especially for the elderly. A fall prevention system tries to predict and reduce the risk of falls. In contrast, a fall detection system observes the fall and generates a help notification to minimize the consequences of falls. A plethora of technical and review papers exist in the literature with a primary focus on fall detection. Similarly, several studies are relatively old, with a focus on wearables only, and use statistical and threshold-based approaches with a high false alarm rate. Therefore, this paper presents the latest research trends in fall detection and prevention systems using Machine Learning (ML) algorithms. It uses recent studies and analyzes datasets, age groups, ML algorithms, sensors, and location. Additionally, it provides a detailed discussion of the current trends of fall detection and prevention systems with possible future directions. This overview can help researchers understand the current systems and propose new methodologies by improving the highlighted issues.
Affiliation(s)
- Sara Usmani
- School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan; (S.U.); (M.H.)
- Abdul Saboor
- Department of Electrical Engineering (ESAT), Katholieke Universiteit (KU) Leuven, 3000 Leuven, Belgium
- Muhammad Haris
- School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan; (S.U.); (M.H.)
- Muneeb A. Khan
- Department of Software, Sangmyung University, Cheonan 31066, Korea
- Heemin Park
- Department of Software, Sangmyung University, Cheonan 31066, Korea
- Correspondence:
|
125
|
Biological knowledge-slanted random forest approach for the classification of calcified aortic valve stenosis. BioData Min 2021; 14:35. [PMID: 34301292 PMCID: PMC8305490 DOI: 10.1186/s13040-021-00269-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2021] [Accepted: 07/18/2021] [Indexed: 11/29/2022] Open
Abstract
Background Calcific aortic valve stenosis (CAVS) is a fatal disease and there is no pharmacological treatment to prevent the progression of CAVS. This study aims to identify genes potentially implicated with CAVS in patients with congenital bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV) in comparison with patients having normal valves, using a knowledge-slanted random forest (RF). Results This study implemented a knowledge-slanted random forest (RF) using information extracted from a protein-protein interactions network to rank genes in order to modify their selection probability to draw the candidate split-variables. A total of 15,191 genes were assessed in 19 valves with CAVS (BAV, n = 10; TAV, n = 9) and 8 normal valves. The performance of the model was evaluated using accuracy, sensitivity, and specificity to discriminate cases with CAVS. A comparison with conventional RF was also performed. The performance of this proposed approach reported improved accuracy in comparison with conventional RF to classify cases separately with BAV and TAV (Slanted RF: 59.3% versus 40.7%). When patients with BAV and TAV were grouped against patients with normal valves, the addition of prior biological information was not relevant with an accuracy of 92.6%. Conclusion The knowledge-slanted RF approach reflected prior biological knowledge, leading to better precision in distinguishing between cases with BAV, TAV, and normal valves. The results of this study suggest that the integration of biological knowledge can be useful during difficult classification tasks. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00269-4.
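The three evaluation measures named in this abstract follow directly from confusion-matrix counts. The sketch below is illustrative only: the function name and the particular split of errors are assumptions. With 27 valves (19 CAVS, 8 normal), an accuracy of 92.6% corresponds to 25 correct classifications, but the abstract does not say how the two errors were distributed, so the counts here are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: diseased cases classified correctly / incorrectly.
    tn/fp: normal cases classified correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical split of errors: 19 CAVS valves and 8 normal valves, 25 of 27 correct.
metrics = classification_metrics(tp=18, fn=1, tn=7, fp=1)
print({name: round(value, 3) for name, value in metrics.items()})
```

With so few normal valves, a single misclassified control moves specificity by 12.5 percentage points, which is worth keeping in mind when reading the reported figures.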
|
126
|
Outlier Detection Based Feature Selection Exploiting Bio-Inspired Optimization Algorithms. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11156769] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The curse of dimensionality problem occurs when the data are high-dimensional. It affects the learning process and reduces the accuracy. Feature selection is one of the dimensionality reduction approaches that mainly contribute to solving the curse of the dimensionality problem by selecting the relevant features. Irrelevant features are the dependent and redundant features that cause noise in the data and then reduce its quality. The main well-known feature-selection methods are wrapper and filter techniques. However, wrapper feature selection techniques are computationally expensive, whereas filter feature selection methods suffer from multicollinearity. In this research study, four new feature selection methods based on outlier detection using the Projection Pursuit method are proposed. Outlier detection involves identifying abnormal data (irrelevant features of the transpose matrix obtained from the original dataset matrix). The concept of outlier detection using projection pursuit has proved its efficiency in many applications but has not yet been used as a feature selection approach. To the author’s knowledge, this study is the first of its kind. Experimental results on nineteen real datasets using three classifiers (k-NN, SVM, and Random Forest) indicated that the suggested methods enhanced the classification accuracy rate by an average of 6.64% when compared to the classification accuracy without applying feature selection. It also outperformed the state-of-the-art methods on most of the used datasets with an improvement rate ranging between 0.76% and 30.64%. Statistical analysis showed that the results of the proposed methods are statistically significant.
|
127
|
Abstract
Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has been thus far conducted on which approaches might be best suited to deal with datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work attempts to give a contribution to this challenging research area by studying the effectiveness of hybrid learning strategies that involve the integration of feature selection techniques, to reduce the data dimensionality, with proper methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach, when compared to using only feature selection or imbalance learning methods alone.
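One common cost-sensitive device is to weight each class inversely to its frequency, so that errors on the minority class cost more during training. The sketch below shows that heuristic under stated assumptions: the abstract does not specify which weighting scheme the authors used, and the function name and toy labels are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive proportionally larger weights; a perfectly
    balanced dataset yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 8 majority-class and 2 minority-class samples.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

In a hybrid pipeline of the kind the abstract describes, weights like these would be passed to the classifier after the feature selection step has reduced the dimensionality.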
|
128
|
Particle Swarm Optimization and Multiple Stacked Generalizations to Detect Nitrogen and Organic-Matter in Organic-Fertilizer Using Vis-NIR. SENSORS 2021; 21:s21144882. [PMID: 34300620 PMCID: PMC8309747 DOI: 10.3390/s21144882] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Revised: 07/13/2021] [Accepted: 07/16/2021] [Indexed: 11/29/2022]
Abstract
Organic fertilizer is a key component of agricultural sustainability and significantly contributes to the improvement of soil fertility. Nutrients such as organic matter and nitrogen in organic fertilizers positively affect plant growth but cause environmental problems when used in large amounts. Fast detection of nitrogen (N) and organic matter (OM) is therefore important. This paper examines the feasibility of a framework that combines particle swarm optimization (PSO) and two multiple stacked generalizations to determine the amount of nitrogen and organic matter in organic fertilizer using visible near-infrared spectroscopy (Vis-NIR). The first multiple stacked generalization for classification coupled with PSO (FSGC-PSO) was used for feature selection, while the second stacked generalization for regression (SSGR) improved the detection of nitrogen and organic matter. The root mean square error (RMSE) and the coefficient of determination (R2) for the calibration and prediction sets were used to gauge the different models. The obtained FSGC-PSO subset combined with SSGR achieved significantly better prediction results than conventional methods such as Ridge, support vector machine (SVM), and partial least square (PLS) for both nitrogen (R2p = 0.9989, root mean square error of prediction (RMSEP) = 0.031 and limit of detection (LOD) = 2.97) and organic matter (R2p = 0.9972, RMSEP = 0.051 and LOD = 2.97). Therefore, our approach is a promising way to monitor and evaluate the amount of N and OM in organic fertilizer.
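The two model-quality measures used in this abstract can be computed as follows. This is a minimal sketch using the standard definitions; the function names and the `measured`/`predicted` values are made up for illustration and are not data from the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference measurements and model predictions.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(measured, predicted), 4), round(r_squared(measured, predicted), 4))
```

Applied to a held-out prediction set, these are the quantities the abstract reports as RMSEP and R2p.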
|
129
|
Sheikhi G, Altınçay H. A novel dissimilarity metric based on feature‐to‐feature scatter frequencies for clustering‐based feature selection in biomedical data. Comput Intell 2021. [DOI: 10.1111/coin.12470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Ghazaal Sheikhi
- Department of Computer Engineering Final International University Kyrenia North Cyprus Turkey
- Hakan Altınçay
- Department of Computer Engineering Eastern Mediterranean University Famagusta North Cyprus Turkey
|
130
|
Krarti M, Aldubyan M. Review analysis of COVID-19 impact on electricity demand for residential buildings. RENEWABLE & SUSTAINABLE ENERGY REVIEWS 2021; 143:110888. [PMID: 36310544 PMCID: PMC9586839 DOI: 10.1016/j.rser.2021.110888] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2020] [Revised: 02/18/2021] [Accepted: 02/25/2021] [Indexed: 05/02/2023]
Abstract
In this paper, a systematic review analysis of fully enforced stay at home orders and government lockdowns is presented. The main goal of the analysis is to identify the impacts of stay home living patterns on energy consumption of residential buildings. Specifically, metered data collected from various reported sources are reviewed and analyzed to assess the changes in overall electricity demand for various countries and US states. Weather adjusted time series data of electricity demand before and after COVID-19 lockdowns are used to determine the magnitude of changes in electricity demand and residential energy use patterns. The analysis results indicate that while overall electricity demand is lower because of lockdowns that impact commercial buildings and manufacturing sectors, the energy consumption for the housing sector has increased by as much as 30% during the full 2020 lockdown period. Analysis of reported end-use data indicates that most of the increase in household energy demand is due to higher occupancy patterns during daytime hours, resulting in increased use of energy intensive systems such as heating, air conditioning, lighting, and appliances. Several energy efficiency and renewable energy solutions are presented to cost-effectively mitigate the increase in energy demands due to extended stayhome living patterns.
Affiliation(s)
- Moncef Krarti
- University of Colorado Boulder, CO, USA
- KAPSARC, Riyadh, Saudi Arabia
|
131
|
Robust variable selection for model-based learning in presence of adulteration. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
132
|
Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021; 627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]
Abstract
This paper introduces a new hybrid approach (DBH) for solving gene selection problem that incorporates the strengths of two existing metaheuristics: binary dragonfly algorithm (BDF) and binary black hole algorithm (BBHA). This hybridization aims to identify a limited and stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods have encountered challenges in extracting disease-related information from a vast amount of redundant genes. The proposed approach first applies the minimum redundancy maximum relevancy (MRMR) filter method to reduce the dimensionality of feature space and then utilizes the suggested hybrid DBH algorithm to determine a smaller set of significant genes. The proposed approach was evaluated on eight benchmark gene expression datasets, and then, was compared against the latest state-of-art techniques to demonstrate algorithm efficiency. The comparative study shows that the proposed approach achieves a significant improvement as compared with existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the suggested method was examined on real RNA-Seq coronavirus-related gene expression data of asthmatic patients for selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to classify infected patients from uninfected in order to identify subgroups at risk for COVID-19. The result denotes that the suggested MRMR-DBH approach represents a very promising framework for finding a new combination of most discriminative genes with high classification accuracy.
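The first stage of the pipeline described here, the minimum redundancy maximum relevancy filter, can be sketched as a greedy loop. The version below is a simplified illustration and not the authors' implementation: it substitutes absolute Pearson correlation for the mutual information that mRMR normally uses, it omits the DBH metaheuristic stage entirely, and all names and data values are assumptions.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def mrmr(columns, target, k):
    """Greedy mRMR: repeatedly pick the feature maximizing relevance minus redundancy.

    Assumption: |Pearson r| stands in for mutual information in both terms.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = mean(abs(pearson(columns[j], columns[s])) for s in selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: feature 1 is relevant but nearly duplicates feature 0;
# feature 2 is weaker but carries different information.
target = [0, 0, 1, 1, 0, 1]
columns = [
    [0.1, 0.2, 0.9, 1.0, 0.0, 0.8],
    [0.3, 0.1, 0.7, 1.1, 0.2, 0.6],
    [1, 0, 1, 0, 0, 1],
]
chosen = mrmr(columns, target, k=2)
print(chosen)
```

In this toy case the second pick is the weaker but less redundant feature rather than the near-duplicate, which is the behaviour that distinguishes mRMR from a plain relevance ranking.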
Affiliation(s)
- Elnaz Pashaei
- Department of Software Engineering, Istanbul Aydin University, Istanbul, Turkey.
- Elham Pashaei
- Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
|
133
|
Novel Prediction Model for Steel Mechanical Properties with MSVR Based on MIC and Complex Network Clustering. METALS 2021. [DOI: 10.3390/met11050747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Traditional mechanical properties prediction models are mostly based on experience and mechanism, which neglect the linear and nonlinear relationships between process parameters. Aiming at the high-dimensional data collected in the complex industrial process of steel production, a new prediction model is proposed. The multidimensional support vector regression (MSVR)-based model is combined with the feature selection method, which involves maximum information coefficient (MIC) correlation characterization and complex network clustering. Firstly, MIC is used to measure the correlation between process parameters and mechanical properties, based on which a complex network is constructed and hierarchical clustering is performed. Secondly, we evaluate all parameters and select a representative one for each partition as the input of the subsequent model based on the centrality and influence indicators. Finally, an actual steel production case is used to train the MSVR prediction model. The prediction results show that our proposed framework can capture effective features from the full parameters in terms of higher prediction accuracy and is less time-consuming compared with the Pearson-based subset, full-parameter subset, and empirical subset input. The feature selection method based on MIC can dig out some nonlinear relationships which cannot be found by Pearson coefficient.
|
134
|
P A, G SS, Srivastava G, Maddikunta PKR, Gadekallu TR. A Two-stage Text Feature Selection Algorithm for Improving Text Classification. ACM T ASIAN LOW-RESO 2021. [DOI: 10.1145/3425781] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
As the number of digital text documents increases on a daily basis, the classification of text is becoming a challenging task. Each text document consists of a large number of words (or features) that drive down the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm designed to reduce a large number of features to improve the accuracy of the text classification algorithm. The proposed algorithm uses noun-based filtering, a word ranking that enhances the performance of the text classification algorithm. Experiments are carried out on three benchmark datasets, and the results show that the proposed classification algorithm has achieved the maximum accuracy when compared to the existing algorithms. The proposed algorithm is compared to Term Frequency-Inverse Document Frequency, Balanced Accuracy Measure, GINI Index, Information Gain, and Chi-Square. The experimental results clearly show the strength of the proposed algorithm.
Affiliation(s)
- Ashokkumar P
  - Sri Ramachandra College of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu
- Siva Shankar G
  - Sri Ramachandra College of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu
- Gautam Srivastava
  - Department of Mathematics and Computer Science, Brandon University
  - Research Center for Interneural Computing, China Medical University, Taichung, Taiwan, Republic of China
|
135
|
Computational methods for integrative evaluation of confidence, accuracy, and reaction time in facial affect recognition in schizophrenia. SCHIZOPHRENIA RESEARCH-COGNITION 2021; 25:100196. [PMID: 33996517 PMCID: PMC8093458 DOI: 10.1016/j.scog.2021.100196] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/14/2020] [Revised: 03/06/2021] [Accepted: 03/10/2021] [Indexed: 11/21/2022]
Abstract
People with schizophrenia (SZ) process emotions less accurately than do healthy comparators (HC), and studies of emotion recognition have expanded beyond accuracy to performance variables like reaction time (RT) and confidence. These domains are typically evaluated independently, but complex inter-relationships can be evaluated through machine learning at an item-by-item level. Using a mix of ranking and machine learning tools, we investigated item-by-item discrimination of facial affect with two emotion recognition tests (BLERT and ER-40) between SZ and HC. The best performing multi-domain model for ER-40 had a large effect size in differentiating SZ and HC (d = 1.24) compared to a standard comparison of accuracy alone (d = 0.48); smaller increments in effect sizes were evident for the BLERT (d = 0.87 vs. d = 0.58). Almost half of the selected items were confidence ratings. Within SZ, machine learning models with ER-40 (generally accuracy and reaction time) items predicted severity of depression and overconfidence in social cognitive ability, but not psychotic symptoms. Pending independent replication, the results support machine learning, and the inclusion of confidence ratings, in characterizing the social cognitive deficits in SZ. This moderate-sized study (n = 372) included subjects with schizophrenia (SZ, n = 218) and healthy controls (HC, n = 154). This paper explores the value of integrative evaluation of confidence, accuracy, and reaction time by way of machine learning in understanding the unique aspects of facial affect recognition in schizophrenia. Machine learning models separated schizophrenia from healthy comparators better than standard statistical comparison, and confidence ratings contributed disproportionately to this separation. Machine learning approaches provide a novel way to analyze item-by-item associations with social cognition measures, or potentially other tests, where multiple overlapping dimensions exist.
Aberrant confidence ratings interact with performance variables in complex ways to contribute to social cognitive deficits in schizophrenia.
|
136
|
Kim YJ, Jeon JS, Cho SE, Kim KG, Kang SG. Prediction Models for Obstructive Sleep Apnea in Korean Adults Using Machine Learning Techniques. Diagnostics (Basel) 2021; 11:diagnostics11040612. [PMID: 33808100 PMCID: PMC8066462 DOI: 10.3390/diagnostics11040612] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/11/2021] [Revised: 03/24/2021] [Accepted: 03/26/2021] [Indexed: 12/01/2022] Open
Abstract
This study aimed to investigate the applicability of machine learning to predict obstructive sleep apnea (OSA) among individuals with suspected OSA in South Korea. A total of 92 clinical variables for OSA were collected from 279 South Koreans (OSA, n = 213; no OSA, n = 66), from which seven major clinical indices were selected. The data were randomly divided into training data (OSA, n = 149; no OSA, n = 46) and test data (OSA, n = 64; no OSA, n = 20). Using the seven clinical indices, the OSA prediction models were trained using four types of machine learning models—logistic regression, support vector machine (SVM), random forest, and XGBoost (XGB)—and each model was validated using the test data. In the validation, the SVM showed the best OSA prediction result with a sensitivity, specificity, and area under curve (AUC) of 80.33%, 86.96%, and 0.87, respectively, while the XGB showed the lowest OSA prediction performance with a sensitivity, specificity, and AUC of 78.69%, 73.91%, and 0.80, respectively. The machine learning algorithms showed high OSA prediction performance using data from South Koreans with suspected OSA. Hence, machine learning will be helpful in clinical applications for OSA prediction in the Korean population.
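The sensitivity and specificity quoted above are simple ratios over a held-out test set. As a minimal sketch of how such figures are derived, the following uses hypothetical toy labels (not the study's data) and a helper name, `sensitivity_specificity`, that is an assumption made for illustration:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = OSA, 0 = no OSA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # OSA correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # OSA missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-OSA correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-OSA wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy predictions, chosen only to make the arithmetic easy to follow.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With an imbalanced cohort such as the one described (many more OSA than non-OSA cases), reporting both ratios alongside AUC is more informative than overall accuracy alone.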
Affiliation(s)
- Young Jae Kim
  - Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
- Ji Soo Jeon
  - Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
- Seo-Eun Cho
  - Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
- Kwang Gi Kim
  - Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
  - Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
- Seung-Gul Kang
  - Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
  - Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
|
137
|
Piles M, Bergsma R, Gianola D, Gilbert H, Tusell L. Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning. Front Genet 2021; 12:611506. [PMID: 33692825 PMCID: PMC7938892 DOI: 10.3389/fgene.2021.611506] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/29/2020] [Accepted: 01/20/2021] [Indexed: 11/25/2022] Open
Abstract
Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and reduce computation time and resources. In genomics, FS allows identifying relevant markers and designing low-density SNP chips to evaluate selection candidates. In this research, several univariate and multivariate FS algorithms combined with various parametric and non-parametric learners were applied to the prediction of feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the best combination of feature selector, SNP subset size, and learner leading to accurate and stable (i.e., less sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark. Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combination of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection performed with the filter methods. Data represented 5,708 individual records of residual feed intake to be predicted from the animal’s own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted data in a 10-fold cross-validation. The best prediction in terms of accuracy and stability was obtained with SVM and GB using 500 or more SNPs [0.28 (0.02) and 0.27 (0.04) for SVM and GB with 1,000 SNPs, respectively]. With larger subset sizes (1,000–1,500 SNPs), the filter method had no influence on prediction quality, which was similar to that attained with a random selection. 
With 50–250 SNPs, the FS method had a huge impact on prediction quality: it was very poor for tree-based methods combined with any learner, but good and similar to what was obtained with larger SNP subsets when spearcor or mrmr were implemented with or without embedded methods. Those filters also led to very stable results, suggesting their potential use for designing low-density SNP chips for genome-based evaluation of feed efficiency.
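A univariate rank-correlation filter of the kind labelled spearcor above can be sketched in a few lines. This is an illustrative, standard-library-only version under stated assumptions: the function names, the 0/1/2 genotype coding, and the toy data are all hypothetical and are not taken from the study's pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top_k_by_spearman(features, target, k):
    """Return the k feature names with the largest absolute Spearman correlation."""
    ranked = sorted(features, key=lambda name: abs(spearman(features[name], target)),
                    reverse=True)
    return ranked[:k]


# Hypothetical genotypes (0/1/2 allele counts) and a toy phenotype.
genotypes = {
    "snp_a": [0, 1, 2, 2, 1, 0],
    "snp_b": [1, 1, 1, 1, 1, 1],  # monomorphic marker, carries no signal
    "snp_c": [2, 0, 1, 0, 2, 1],
}
phenotype = [0.1, 0.5, 0.9, 1.1, 0.4, 0.2]
selected = top_k_by_spearman(genotypes, phenotype, k=2)
print(selected)
```

To avoid optimistic bias, a filter like this would normally be refitted inside each cross-validation fold, on the training portion only, rather than once on the full data.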
Affiliation(s)
- Miriam Piles
  - Animal Breeding and Genetics Program, Institute of Agriculture and Food Research and Technology (IRTA), Barcelona, Spain
- Rob Bergsma
  - Topigs Norsvin Research Center, Beuningen, Netherlands
- Daniel Gianola
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States; Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, United States
- Hélène Gilbert
  - GenPhySE, INRAE, Université de Toulouse, Castanet-Tolosan, France
- Llibertat Tusell
  - Animal Breeding and Genetics Program, Institute of Agriculture and Food Research and Technology (IRTA), Barcelona, Spain; GenPhySE, INRAE, Université de Toulouse, Castanet-Tolosan, France
|
138
|
Feature Subset Selection for Malware Detection in Smart IoT Platforms. SENSORS 2021; 21:s21041374. [PMID: 33669191 PMCID: PMC7919840 DOI: 10.3390/s21041374] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/04/2021] [Revised: 02/01/2021] [Accepted: 02/10/2021] [Indexed: 11/16/2022]
Abstract
Malicious software (“malware”) has become a serious cybersecurity issue in the Android ecosystem. Given the fast evolution of Android malware releases, it is practically not feasible to manually detect malware apps in the Android ecosystem. As a result, machine learning has become a fledgling approach for malware detection. Since machine learning performance is largely influenced by the availability of high-quality and relevant features, feature selection approaches play a key role in machine-learning-based detection of malware. In this paper, we formulate the feature selection problem as a quadratic programming problem and analyse how commonly used filter-based feature selection methods work, with emphasis on Android malware detection. We compare and contrast several feature selection methods along several factors, including the composition of relevant features selected. We empirically evaluate the feature subset selection algorithms and compare their predictive accuracy and execution time using several learning algorithms. The results of the experiments confirm that feature selection is necessary for improving the accuracy of the learning models as well as decreasing the run time. The results also show that the performance of the feature selection algorithms varies from one learning algorithm to another and that no one feature selection approach performs better than the other approaches all the time.
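The abstract does not state the quadratic program itself. As a hedged illustration only, one formulation commonly used in the literature for quadratic-programming feature selection assigns a non-negative weight $x_i$ to each of $M$ features and trades off redundancy against relevance; the symbols $Q$, $f$, and $\alpha$ below are assumptions introduced here, not notation from the paper:

```latex
\min_{x \in \mathbb{R}^{M}} \; \frac{1-\alpha}{2}\, x^{\top} Q\, x \;-\; \alpha\, f^{\top} x
\quad \text{subject to} \quad x_i \ge 0 \;\; (i = 1,\dots,M), \qquad \sum_{i=1}^{M} x_i = 1
```

Here $Q$ would be a symmetric matrix of pairwise feature similarity (redundancy), $f$ a vector of feature-to-class relevance scores, and $\alpha \in [0, 1]$ the trade-off between the two terms. Features are then ranked by their optimal weights $x_i$.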
|
139
|
Ben Brahim A. Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-04971-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
|
140
|
Hameed SS, Hassan R, Hassan WH, Muhammadsharif FF, Latiff LA. HDG-select: A novel GUI based application for gene selection and classification in high dimensional datasets. PLoS One 2021; 16:e0246039. [PMID: 33507983 PMCID: PMC7842997 DOI: 10.1371/journal.pone.0246039] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/18/2020] [Accepted: 01/12/2021] [Indexed: 11/24/2022] Open
Abstract
The selection and classification of genes is essential for identifying genes related to a specific disease. Developing a user-friendly application that combines statistical rigor with machine learning functionality to help biomedical researchers and end users is of great importance. In this work, a novel stand-alone application based on a graphical user interface (GUI) is developed to perform the full functionality of gene selection and classification in high dimensional datasets. The so-called HDG-select application is validated on eleven high dimensional datasets in CSV and GEO SOFT formats. The proposed tool uses the efficient combined filter-GBPSO-SVM algorithm and has been made freely available to users. It was found that the proposed HDG-select outperformed other tools reported in the literature and presented competitive performance, accessibility, and functionality.
Affiliation(s)
- Shilan S. Hameed
  - Computer Systems and Networks (CSN), Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
  - Directorate of Information Technology, Koya University, Koya, Kurdistan Region-F.R., Iraq
- Rohayanti Hassan
  - School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
- Wan Haslina Hassan
  - Computer Systems and Networks (CSN), Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
- Fahmi F. Muhammadsharif
  - Department of Physics, Faculty of Science and Health, Koya University, Koya, Kurdistan Region-F.R., Iraq
- Liza Abdul Latiff
  - U-BAN Research Group, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
|
141
|
Sharma A, Colonna G. System-Wide Pollution of Biomedical Data: Consequence of the Search for Hub Genes of Hepatocellular Carcinoma Without Spatiotemporal Consideration. Mol Diagn Ther 2021; 25:9-27. [PMID: 33475988 PMCID: PMC7847983 DOI: 10.1007/s40291-020-00505-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 12/01/2020] [Indexed: 12/17/2022]
Abstract
Biomedical institutions rely on data evaluation and are turning into data factories. Big-data storage centers, supercomputing systems, and increased algorithmic efficiency allow us to analyze the ever-increasing amount of data generated every day in biomedical research centers. In network science, the principal intrinsic problem is how to integrate the data and information from different experiments on genes or proteins. Data curation is an essential process in annotating new functional data to known genes or proteins, undertaken by a biobank curator, which is then reflected in the calculated networks. We provide an example of how protein-protein networks today have space-time limits. The next step is the integration of data and information from different biobanks. Omics data and networks are essential parts of this step but also have flawed protocols and errors. Consider data from patients with cancer: from biopsy procedures to experimental tests, to archiving methods and computational algorithms, these are continuously handled so require critical and continuous "updates" to obtain reproducible, reliable, and correct results. We show, as a second example, how all this distorts studies in cellular hepatocellular carcinoma. It is not unlikely that these flawed data have been polluting biobanks for some time before stringent conditions for the veracity of data were implemented in Big data. Therefore, all this could contribute to errors in future medical decisions.
Affiliation(s)
- Ankush Sharma
  - Department of Biosciences, University of Oslo, Oslo, Norway.
  - Department of Informatics, University of Oslo, Oslo, Norway.
  - Institute of Cancer Research, Institute of Clinical medicine, University of Oslo, Oslo, Norway.
- Giovanni Colonna
  - Medical Informatics, AOU-Vanvitelli, Università della Campania, Naples, Italy
|
142
|
Ruuskanen MO, Åberg F, Männistö V, Havulinna AS, Méric G, Liu Y, Loomba R, Vázquez-Baeza Y, Tripathi A, Valsta LM, Inouye M, Jousilahti P, Salomaa V, Jain M, Knight R, Lahti L, Niiranen TJ. Links between gut microbiome composition and fatty liver disease in a large population sample. Gut Microbes 2021; 13:1-22. [PMID: 33651661 PMCID: PMC7928040 DOI: 10.1080/19490976.2021.1888673] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 08/17/2020] [Revised: 01/14/2021] [Accepted: 01/28/2021] [Indexed: 02/08/2023] Open
Abstract
Fatty liver disease is the most common liver disease in the world. Its connection with the gut microbiome has been known for at least 80 y, but this association remains mostly unstudied in the general population because of underdiagnosis and small sample sizes. To address this knowledge gap, we studied the link between the Fatty Liver Index (FLI), a well-established proxy for fatty liver disease, and gut microbiome composition in a representative, ethnically homogeneous population sample of 6,269 Finnish participants. We based our models on biometric covariates and gut microbiome compositions from shallow metagenome sequencing. Our classification models could discriminate between individuals with a high FLI (≥60, indicates likely liver steatosis) and low FLI (<60) in internal cross-region validation, consisting of 30% of the data not used in model training, with an average AUC of 0.75 and AUPRC of 0.56 (baseline at 0.30). In addition to age and sex, our models included differences in 11 microbial groups from class Clostridia, mostly belonging to orders Lachnospirales and Oscillospirales. Our models were also predictive of the high FLI group in a different Finnish cohort, consisting of 258 participants, with an average AUC of 0.77 and AUPRC of 0.51 (baseline at 0.21). Pathway analysis of representative genomes of the positively FLI-associated taxa in (NCBI) Clostridium subclusters IV and XIVa indicated the presence of, e.g., ethanol fermentation pathways. These results support several findings from smaller case-control studies, such as the role of endogenous ethanol producers in the development of the fatty liver.
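The AUPRC "baseline" quoted above corresponds to the share of positive (high-FLI) cases, since an uninformative ranking achieves an expected precision equal to the class prevalence at every recall level. The sketch below computes AUC, a step-wise AUPRC (average precision), and that baseline on hypothetical toy scores. The function names and data are assumptions for illustration and do not reproduce the study's models.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def prevalence(y_true):
    """Expected AUPRC of an uninformative classifier."""
    return sum(y_true) / len(y_true)


# Hypothetical toy data: 3 of 10 cases are positive, so the baseline is 0.30.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
base = prevalence(y_true)
print(f"AUC={auc:.3f}, AUPRC={ap:.3f}, baseline={base:.2f}")
```

Because the baseline moves with prevalence, AUPRC values from cohorts with different shares of positive cases (such as the two cohorts described above) are best read relative to their own baselines.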
Affiliation(s)
- Matti O. Ruuskanen
  - Department of Internal Medicine, University of Turku, Turku, Finland
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
- Fredrik Åberg
  - Transplantation and Liver Surgery Clinic, Helsinki University Hospital, University of Helsinki, Helsinki, Finland
  - Transplant Institute, Sahlgrenska University Hospital, Gothenburg, Sweden
- Ville Männistö
  - Department of Medicine, Kuopio University Hospital, University of Eastern Finland, Kuopio, Finland
  - Department of Experimental Vascular Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Aki S. Havulinna
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
  - Institute for Molecular Medicine Finland, FIMM - HiLIFE, Helsinki, Finland
- Guillaume Méric
  - Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia
- Yang Liu
  - Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Department of Clinical Pathology, The University of Melbourne, Melbourne, Victoria, Australia
- Rohit Loomba
  - Department of Medicine, NAFLD Research Center, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Yoshiki Vázquez-Baeza
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Anupriya Tripathi
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
  - Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Liisa M. Valsta
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
- Michael Inouye
  - Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Department of Public Health and Primary Care, Cambridge University, Cambridge, UK
- Pekka Jousilahti
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
- Veikko Salomaa
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
- Mohit Jain
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, California, USA
- Rob Knight
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, California, USA
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Teemu J. Niiranen
  - Department of Internal Medicine, University of Turku, Turku, Finland
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
  - Division of Medicine, Turku University Hospital, Turku, Finland
|
143
|
Rasheed I, Banka H, Khan HM. A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text. STUDIES IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1007/978-3-030-50641-4_1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
|
144
|
Using a Text Mining Approach to Hear Voices of Customers from Social Media toward the Fast-Food Restaurant Industry. SUSTAINABILITY 2020. [DOI: 10.3390/su13010268] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/16/2022]
Abstract
Due to the COVID-19 pandemic, the sales of fast-food businesses have dropped sharply. Customer satisfaction has always been one of the key factors for the sustainable development of enterprises, and in the fast-food restaurant business, gaining knowledge of customer satisfaction is a critical task. Moreover, text reviews in social media have become an important reference source for customers’ decisions in buying services and products. Therefore, the main purpose of this study is to explore whether customer voices from social media reviews are different during the COVID-19 outbreak and to propose a new method to reduce interpersonal contact when collecting data. A text mining scheme that includes the least absolute shrinkage and selection operator (LASSO) and decision trees (DT) is presented to discover, from unstructured online customer reviews, the factors essential to increasing customer satisfaction. Finally, three real-world review sets were employed to validate the effectiveness of the presented text mining scheme. Experimental results can help companies to properly adapt to similar epidemic situations in the future and facilitate their sustainable development.
|
145
|
Tusell L, Bergsma R, Gilbert H, Gianola D, Piles M. Machine Learning Prediction of Crossbred Pig Feed Efficiency and Growth Rate From Single Nucleotide Polymorphisms. Front Genet 2020; 11:567818. [PMID: 33391339 PMCID: PMC7775539 DOI: 10.3389/fgene.2020.567818] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/30/2020] [Accepted: 11/17/2020] [Indexed: 11/24/2022] Open
Abstract
This research assessed the ability of a Support Vector Machine (SVM) regression model to predict pig crossbred (CB) performance from various sources of phenotypic and genotypic information for improving crossbreeding performance at reduced genotyping cost. Data consisted of average daily gain (ADG) and residual feed intake (RFI) records and genotypes of 5,708 purebred (PB) boars and 5,007 CB pigs. Prediction models were fitted using individual PB genotypes and phenotypes (trn.1); genotypes of PB sires and average of CB records per PB sire (trn.2); and individual CB genotypes and phenotypes (trn.3). The average of CB offspring records was the trait to be predicted from PB sire’s genotype using cross-validation. Single nucleotide polymorphisms (SNPs) were ranked based on the Spearman Rank correlation with the trait. Subsets with an increasing number (from 50 to 2,000) of the most informative SNPs were used as predictor variables in SVM. Prediction performance was the median of the Spearman correlation (SC, interquartile range in brackets) between observed and predicted phenotypes in the testing set. The best predictive performances were obtained when sire phenotypic information was included in trn.1 (0.22 [0.03] for RFI with SVM and 250 SNPs, and 0.12 [0.05] for ADG with SVM and 500–1,000 SNPs) or when trn.3 was used (0.29 [0.16] with Genomic best linear unbiased prediction (GBLUP) for RFI, and 0.15 [0.09] for ADG with just 50 SNPs). Animals from the last two generations were assigned to the testing set and remaining animals to the training set. Individual’s PB own phenotype and genotype improved the prediction ability of CB offspring of young animals for ADG but not for RFI. The highest SC was 0.34 [0.21] and 0.36 [0.22] for RFI and ADG, respectively, with SVM and 50 SNPs. Predictive performance using CB data for training leads to a SC of 0.34 [0.19] with GBLUP and 0.28 [0.18] with SVM and 250 SNPs for RFI and 0.34 [0.15] with SVM and 500 SNPs for ADG. 
Results suggest that PB candidates could be evaluated for CB performance with SVM and low-density SNP chip panels after collecting their own RFI or ADG performances or even earlier, after being genotyped using a reference population of CB animals.
Affiliation(s)
- Llibertat Tusell
  - GenPhySE, Université de Toulouse, National Research Institute for Agriculture, Food and the Environment (INRAE), Castanet-Tolosan, France
- Rob Bergsma
  - Topigs Norsvin Research Center, Beuningen, Netherlands
- Hélène Gilbert
  - GenPhySE, Université de Toulouse, National Research Institute for Agriculture, Food and the Environment (INRAE), Castanet-Tolosan, France
- Daniel Gianola
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States; Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, United States
- Miriam Piles
  - Animal Breeding and Genetics Program, Institute of Agriculture and Food Research and Technology (IRTA), Barcelona, Spain
|
146
|
Ben Azzouz F, Michel B, Lasla H, Gouraud W, François AF, Girka F, Lecointre T, Guérin-Charbonnel C, Juin PP, Campone M, Jézéquel P. Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches. Comput Biol Med 2020; 129:104171. [PMID: 33316552 DOI: 10.1016/j.compbiomed.2020.104171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/23/2020] [Revised: 12/01/2020] [Accepted: 12/05/2020] [Indexed: 12/12/2022]
Abstract
Triple-negative breast cancer (TNBC) heterogeneity represents one of the main obstacles to precision medicine for this disease. Recent concordant transcriptomics studies have shown that TNBC could be divided into at least three subtypes with potential therapeutic implications. Although a few studies have been conducted to predict TNBC subtype using transcriptomics data, the subtyping was partially sensitive and limited by batch effect and dependence on a given dataset, which may penalize the switch to routine diagnostic testing. Therefore, we sought to build an absolute predictor (i.e., intra-patient diagnosis) based on machine learning algorithms with a limited number of probes. To that end, we started by introducing probe binary comparison for each patient (indicators). We based the predictive analysis on this transformed data. Probe selection first involved combining both filter and wrapper methods for variable selection using cross-validation. We tested three prediction models (random forest, gradient boosting [GB], and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation consistently allowed us to choose the best model. The results showed that the fifty selected indicators highlighted the biological characteristics associated with each TNBC subtype. The GB model based on this subset of indicators performed better than the other models.
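One plausible reading of "probe binary comparison" is a within-patient indicator that records, for each pair of probes, whether the first is expressed above the second. The sketch below follows that reading as an assumption; the abstract does not give the exact definition, and the function name, probe names, and values are hypothetical. It also shows why such indicators can be robust to batch effects: any strictly increasing per-sample transformation leaves them unchanged.

```python
from itertools import combinations


def pairwise_indicators(expression):
    """Map one patient's probe values to 0/1 indicators, one per probe pair.

    expression: dict of probe name -> expression value for a single patient.
    Returns a dict keyed "a>b" whose value is 1 if probe a exceeds probe b.
    """
    probes = sorted(expression)
    return {
        f"{a}>{b}": int(expression[a] > expression[b])
        for a, b in combinations(probes, 2)
    }


# Hypothetical expression values for a single patient.
patient = {"probe_1": 5.2, "probe_2": 7.9, "probe_3": 6.1}

# Simulate a batch effect as a strictly increasing transformation of the sample.
shifted = {name: 2.0 * value + 3.0 for name, value in patient.items()}

indicators = pairwise_indicators(patient)
print(indicators)
print(pairwise_indicators(shifted) == indicators)
```

The number of indicators grows quadratically with the number of probes, which is one reason a selection step such as the filter and wrapper combination described above is needed before model fitting.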
Collapse
Affiliation(s)
- Fadoua Ben Azzouz
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France
- Bertrand Michel
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France; Ecole Centrale de Nantes, 1 Rue de La Noë, 44300, Nantes, France; Laboratoire de Mathématiques Jean Leray, BP 92208, 2 Rue de La Houssinière, 44322, Nantes Cedex 03, France
- Hamza Lasla
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France
- Wilfried Gouraud
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France
- Fabien Girka
- Ecole Centrale de Nantes, 1 Rue de La Noë, 44300, Nantes, France
- Théo Lecointre
- Ecole Centrale de Nantes, 1 Rue de La Noë, 44300, Nantes, France
- Catherine Guérin-Charbonnel
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France
- Philippe P Juin
- SIRIC ILIAD, Nantes, Angers, France; CRCINA, INSERM, CNRS, Université de Nantes, Université D'Angers, Institut de Recherche en Santé-Université de Nantes, 8 Quai Moncousu - BP 70721, 44007, Nantes Cedex 1, France
- Mario Campone
- SIRIC ILIAD, Nantes, Angers, France; CRCINA, INSERM, CNRS, Université de Nantes, Université D'Angers, Institut de Recherche en Santé-Université de Nantes, 8 Quai Moncousu - BP 70721, 44007, Nantes Cedex 1, France; Oncologie Médicale, Institut de Cancérologie de L'Ouest - René Gauducheau, Bd Jacques Monod, 44805, Saint Herblain Cedex, France
- Pascal Jézéquel
- Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France; CRCINA, INSERM, CNRS, Université de Nantes, Université D'Angers, Institut de Recherche en Santé-Université de Nantes, 8 Quai Moncousu - BP 70721, 44007, Nantes Cedex 1, France
147
O’Connor D, Pinto MV, Sheerin D, Tomic A, Drury RE, Channon‐Wells S, Galal U, Dold C, Robinson H, Kerridge S, Plested E, Hughes H, Stockdale L, Sadarangani M, Snape MD, Rollier CS, Levin M, Pollard AJ. Gene expression profiling reveals insights into infant immunological and febrile responses to group B meningococcal vaccine. Mol Syst Biol 2020; 16:e9888. [PMID: 33210468 PMCID: PMC7674973 DOI: 10.15252/msb.20209888] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/26/2020] [Revised: 10/06/2020] [Accepted: 10/08/2020] [Indexed: 12/12/2022] Open
Abstract
Neisseria meningitidis is a major cause of meningitis and septicaemia. A MenB vaccine (4CMenB) was licensed by the European Medicines Agency in January 2013. Here we describe the blood transcriptome and proteome following infant immunisations with or without concomitant 4CMenB, to gain insight into the molecular mechanisms underlying post-vaccination reactogenicity and immunogenicity. Infants were randomised to receive control immunisations (PCV13 and DTaP-IPV-Hib) with or without 4CMenB at 2 and 4 months of age. Blood gene expression and plasma proteins were measured prior to, then 4 h, 24 h, 3 days or 7 days post-vaccination. 4CMenB vaccination was associated with increased expression of ENTPD7 and increased concentrations of 4 plasma proteins: CRP, G-CSF, IL-1RA and IL-6. Post-vaccination fever was associated with increased expression of SELL, involved in neutrophil recruitment. A murine model dissecting the vaccine components found the concomitant regimen to be associated with increased gene perturbation compared with 4CMenB vaccine alone with enhancement of pathways such as interleukin-3, -5 and GM-CSF signalling. Finally, we present transcriptomic profiles predictive of immunological and febrile responses following 4CMenB vaccine.
Affiliation(s)
- Daniel O’Connor
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Marta Valente Pinto
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Dylan Sheerin
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Adriana Tomic
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Institute of Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Ruth E Drury
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Samuel Channon‐Wells
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Ushma Galal
- Nuffield Department of Primary Health Care, Clinical Trials Unit, University of Oxford, Oxford, UK
- Christina Dold
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Hannah Robinson
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Simon Kerridge
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Emma Plested
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Harri Hughes
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Lisa Stockdale
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Matthew D Snape
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Christine S Rollier
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Michael Levin
- Division of Infectious Diseases, Department of Medicine, Imperial College London, London, UK
- Andrew J Pollard
- Department of Paediatrics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
148
Antonakoudis A, Barbosa R, Kotidis P, Kontoravdi C. The era of big data: Genome-scale modelling meets machine learning. Comput Struct Biotechnol J 2020; 18:3287-3300. [PMID: 33240470 PMCID: PMC7663219 DOI: 10.1016/j.csbj.2020.10.011] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 08/01/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 12/15/2022] Open
Abstract
With omics data being generated at an unprecedented rate, genome-scale modelling has become pivotal in its organisation and analysis. However, machine learning methods have been gaining ground in cases where knowledge is insufficient to represent the mechanisms underlying such data or as a means for data curation prior to attempting mechanistic modelling. We discuss the latest advances in genome-scale modelling and the development of optimisation algorithms for network and error reduction, intracellular constraining and applications to strain design. We further review applications of supervised and unsupervised machine learning methods to omics datasets from microbial and mammalian cell systems and present efforts to harness the potential of both modelling approaches through hybrid modelling.
Affiliation(s)
- Cleo Kontoravdi
- Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
149
Wang X, Zhao K, Zhou X, Street N. Predicting User Posting Activities in Online Health Communities with Deep Learning. ACM Transactions on Management Information Systems 2020. [DOI: 10.1145/3383780] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/20/2022]
Abstract
Online health communities (OHCs) represent a great source of social support for patients and their caregivers. Better predictions of user activities in OHCs can help improve user engagement and retention, which are important for managing and sustaining a successful OHC. This article proposes a general framework to predict OHC user posting activities. Deep learning methods are adopted to learn from users’ temporal trajectories in both the volume and content of posts published over time. Experiments based on data from a popular OHC for cancer survivors demonstrate that the proposed approach can improve the performance of user activity predictions. In addition, several topics of users’ posts are found to have a strong impact on predicting users’ activities in the OHC.
Affiliation(s)
- Xun Zhou
- University of Iowa, Iowa City, IA
150
Zhou Y, Xu X, Song L, Wang C, Guo J, Yi Z, Li W. The application of artificial intelligence and radiomics in lung cancer. Precision Clinical Medicine 2020; 3:214-227. [PMID: 35694416 PMCID: PMC8982538 DOI: 10.1093/pcmedi/pbaa028] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/10/2020] [Revised: 08/13/2020] [Accepted: 08/14/2020] [Indexed: 02/05/2023] Open
Abstract
Lung cancer is one of the leading causes of death worldwide, and there is an urgent need for its precision medical management. Artificial intelligence (AI), which comprises numerous advanced techniques, has been widely applied in the field of medical care. Meanwhile, radiomics based on traditional machine learning also performs well in mining information from medical images. With the integration of AI and radiomics, great progress has been made in the early diagnosis, specific characterization, and prognosis of lung cancer, which has attracted attention all over the world. In this study, we give a brief review of the current application of AI and radiomics for precision medical management in lung cancer.
Affiliation(s)
- Yaojie Zhou
- Department of Respiratory and Critical Care Medicine, West China School of Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Xiuyuan Xu
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
- Lujia Song
- West China School of Public Health, Sichuan University, Chengdu 610041, China
- Chengdi Wang
- Department of Respiratory and Critical Care Medicine, West China School of Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Jixiang Guo
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
- Zhang Yi
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
- Weimin Li
- Department of Respiratory and Critical Care Medicine, West China School of Medicine, West China Hospital, Sichuan University, Chengdu 610041, China