1
Tang X, Mo Z, Chang C, Qian X. Group-shrinkage feature selection with a spatial network for mining DNA methylation data. Comput Biol Med 2023; 154:106573. [PMID: 36706568 DOI: 10.1016/j.compbiomed.2023.106573] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 01/05/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Identifying disease-related biomarkers from high-dimensional DNA methylation data helps in reducing early screening costs and inferring pathogenesis mechanisms. Good discovery results have been achieved through spatial correlation methods of methylation sites, group-based regularization, and network constraints. However, these methods still have some key limitations, as they cannot exclude isolated differential sites and only consider adjacent site ordering. Therefore, we propose a group-shrinkage feature selection algorithm to encourage the selection of clustered sites and discourage the selection of isolated differential sites. Specifically, a network-guided group-shrinkage strategy is developed to penalize weakly correlated isolated methylation sites through a network structure constraint. The spatial network is constructed based on spatial correlation information of DNA methylation sites, where this information accounts for the uneven site distribution. The experimental simulations and applications demonstrated that the proposed method outperforms the advanced regularization methods, especially in rejecting isolated methylation sites; hence this study provides an efficient and clinically valuable method for biomarker candidate discovery in DNA methylation data. Additionally, the proposed method exhibits enhanced reliability due to introducing biological prior knowledge into a regularization-based feature selection framework and could promote more research in the integration between biological prior knowledge and classical feature selection methods, thus facilitating their clinical application. Our source code will be released at https://github.com/SJTUBME-QianLab/Group-shrinkage-Spatial-Network once this manuscript is accepted for publication.
Affiliation(s)
- Xinlu Tang
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhanfeng Mo
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore.
- Cheng Chang
  - Department of Nuclear Medicine, Shanghai Chest Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200030, China.
- Xiaohua Qian
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
2
Yousef M, Ozdemir F, Jaber A, Allmer J, Bakir-Gungor B. PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach. BMC Bioinformatics 2023; 24:60. [PMID: 36823571 PMCID: PMC9947447 DOI: 10.1186/s12859-023-05187-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 02/14/2023] [Indexed: 02/25/2023] Open
Abstract
BACKGROUND Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases. RESULTS PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research. 
CONCLUSIONS PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.
Affiliation(s)
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel.
- Fatma Ozdemir
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey; University Institute of Digital Communication Systems, Ruhr-University, Bochum, Germany
- Amhar Jaber
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
3
Sancho-Albero M, Ayaz N, Sebastian V, Chirizzi C, Encinas-Gimenez M, Neri G, Chaabane L, Luján L, Martin-Duque P, Metrangolo P, Santamaría J, Baldelli Bombelli F. Superfluorinated Extracellular Vesicles for In Vivo Imaging by 19F-MRI. ACS APPLIED MATERIALS & INTERFACES 2023; 15:8974-8985. [PMID: 36780137 PMCID: PMC9951174 DOI: 10.1021/acsami.2c20566] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Accepted: 01/25/2023] [Indexed: 06/01/2023]
Abstract
Extracellular vesicles (EVs) play a crucial role in cell-to-cell communication and have great potential as efficient delivery vectors. However, a better understanding of EV in vivo behavior is hampered by the limitations of current imaging tools. In addition, chemical labels present the risk of altering the EV membrane features and, thus, in vivo behavior. 19F-MRI is a safe bioimaging technique providing selective images of exogenous probes. Here, we present the first example of fluorinated EVs containing PERFECTA, a branched molecule with 36 magnetically equivalent 19F atoms. A PERFECTA emulsion is given to the cells, and PERFECTA-containing EVs are naturally produced. PERFECTA-EVs maintain the physicochemical features, morphology, and biological fingerprint as native EVs but exhibit an intense 19F-NMR signal and excellent 19F relaxation times. In vivo 19F-MRI and tumor-targeting capabilities of stem cell-derived PERFECTA-EVs are also proved. We propose PERFECTA-EVs as promising biohybrids for imaging biodistribution and delivery of EVs throughout the body.
Affiliation(s)
- María Sancho-Albero
  - Instituto de Nanociencia y Materiales de Aragón (INMA), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
  - Department of Chemical Engineering and Environmental Technologies, University of Zaragoza, 50009 Zaragoza, Spain
  - Networking Research Center on Bioengineering Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
- Nazeeha Ayaz
  - Laboratory of Supramolecular and Bio-Nanomaterials (SupraBioNano Lab), Department of Chemistry, Materials and Chemical Engineering, “Giulio Natta”, Politecnico di Milano, 20131 Milan, Italy
- Victor Sebastian
  - Instituto de Nanociencia y Materiales de Aragón (INMA), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
  - Department of Chemical Engineering and Environmental Technologies, University of Zaragoza, 50009 Zaragoza, Spain
  - Networking Research Center on Bioengineering Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
- Cristina Chirizzi
  - Laboratory of Supramolecular and Bio-Nanomaterials (SupraBioNano Lab), Department of Chemistry, Materials and Chemical Engineering, “Giulio Natta”, Politecnico di Milano, 20131 Milan, Italy
  - Experimental Neurology (INSPE) and Experimental Imaging Center (CIS), Neuroscience Division, IRCCS Ospedale San Raffaele, 20132 Milan, Italy
- Miguel Encinas-Gimenez
  - Instituto de Nanociencia y Materiales de Aragón (INMA), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
  - Department of Chemical Engineering and Environmental Technologies, University of Zaragoza, 50009 Zaragoza, Spain
  - Networking Research Center on Bioengineering Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
- Giulia Neri
  - Laboratory of Supramolecular and Bio-Nanomaterials (SupraBioNano Lab), Department of Chemistry, Materials and Chemical Engineering, “Giulio Natta”, Politecnico di Milano, 20131 Milan, Italy
- Linda Chaabane
  - Experimental Neurology (INSPE) and Experimental Imaging Center (CIS), Neuroscience Division, IRCCS Ospedale San Raffaele, 20132 Milan, Italy
- Lluís Luján
  - Department of Animal Pathology, University of Zaragoza, 50009 Zaragoza, Spain
  - Instituto Universitario de Investigación Mixto Agroalimentario de Aragón (IA2), University of Zaragoza, 50009 Zaragoza, Spain
- Pilar Martin-Duque
  - Networking Research Center on Bioengineering Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
  - Instituto Aragonés de Ciencias de la Salud (IACS)/IIS Aragón, Zaragoza 5009, Spain
  - Fundación Araid, 50018 Zaragoza, Spain
- Pierangelo Metrangolo
  - Laboratory of Supramolecular and Bio-Nanomaterials (SupraBioNano Lab), Department of Chemistry, Materials and Chemical Engineering, “Giulio Natta”, Politecnico di Milano, 20131 Milan, Italy
- Jesús Santamaría
  - Instituto de Nanociencia y Materiales de Aragón (INMA), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
  - Department of Chemical Engineering and Environmental Technologies, University of Zaragoza, 50009 Zaragoza, Spain
  - Networking Research Center on Bioengineering Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
- Francesca Baldelli Bombelli
  - Laboratory of Supramolecular and Bio-Nanomaterials (SupraBioNano Lab), Department of Chemistry, Materials and Chemical Engineering, “Giulio Natta”, Politecnico di Milano, 20131 Milan, Italy
4
Buch G, Schulz A, Schmidtmann I, Strauch K, Wild PS. A systematic review and evaluation of statistical methods for group variable selection. Stat Med 2023; 42:331-352. [PMID: 36546512 DOI: 10.1002/sim.9620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Revised: 10/27/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022]
Abstract
This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge and data-driven approaches. The first encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the second category. The identified methods are briefly explained and their performance compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and research question.
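As a hedged illustration of why group-level selection keeps or drops whole groups rather than individual variables, the sketch below applies block soft-thresholding, the standard proximal step behind the group lasso penalty. It is not one of the 14 R methods covered by the review; the group names, coefficient values, and the penalty `lam` are made up for the example.

```python
import math

def group_soft_threshold(coefs, lam):
    """Proximal operator of lam * ||coefs||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= lam:
        # The whole group is shrunk to zero, so it is deselected as a unit.
        return [0.0] * len(coefs)
    scale = 1.0 - lam / norm
    # Every coefficient in a surviving group is scaled by the same factor.
    return [scale * c for c in coefs]

# Hypothetical coefficient groups (for example, variables grouped by pathway).
groups = {"pathway_a": [3.0, 4.0], "pathway_b": [0.3, 0.4]}
lam = 1.0  # illustrative penalty strength

shrunk = {name: group_soft_threshold(c, lam) for name, c in groups.items()}
selected = [name for name, c in shrunk.items() if any(v != 0.0 for v in c)]
```

Because all coefficients in a group share one shrinkage factor, a weak variable inside a strong group survives. That is consistent with the review's finding that group-level methods identify relevant groups well but are weaker at picking out individual variables, which bi-level penalties address.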
Affiliation(s)
- Gregor Buch
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Andreas Schulz
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Irene Schmidtmann
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Konstantin Strauch
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Philipp S Wild
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany; Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; Institute of Molecular Biology (IMB), Mainz, Germany
5
Zhao Q, Bai J, Chen Y, Liu X, Zhao S, Ling G, Jia S, Zhai F, Xiang R. An optimized herbal combination for the treatment of liver fibrosis: Hub genes, bioactive ingredients, and molecular mechanisms. JOURNAL OF ETHNOPHARMACOLOGY 2022; 297:115567. [PMID: 35870684 DOI: 10.1016/j.jep.2022.115567] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2022] [Revised: 06/30/2022] [Accepted: 07/15/2022] [Indexed: 06/15/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Liver fibrosis is a chronic liver disease that can lead to cirrhosis, liver failure, and hepatocellular carcinoma, and it is associated with long-term adverse outcomes and mortality. As a primary resource for complementary and alternative medicine, traditional Chinese medicine (TCM) has accumulated a large number of effective formulas for the treatment of liver fibrosis in clinical practice. However, studies on how to systematically optimize TCM formulas are still lacking. AIM OF THE REVIEW To provide a methodological reference for the systematic optimization of TCM formulae against liver fibrosis and explored the underlying molecular mechanisms; To provide an efficient method for searching for lead compounds from natural sources and developing from herbal medicines; To enable clinicians and patients to make more reasonable choices and promote the effective treatment toward those patients with liver fibrosis. MATERIALS AND METHODS TCM formulas related to treating liver fibrosis were collected from the Web of Science, PubMed, the China National Knowledge Infrastructure (CNKI), Wan Fang, and the Chinese Scientific Journals Database (VIP). Furthermore, the TCM compatibility patterns were mined using association analysis. The core TCM combinations were found by designing an optimized formulas algorithm. Finally, the hub target proteins, potential molecular mechanisms, and active compounds were explored through integrative pharmacology and docking-based inverse virtual screening (IVS) approaches. RESULTS We found that the herbs for reinforcing deficiency, activating blood, removing blood stasis, and clearing heat were the basis of TCM formulae patterns. Furthermore, the combination of Salviae Miltiorrhizae (Salvia miltiorrhiza Bunge; Chinese salvia/Danshen), Astragali Radix (Astragalus membranaceus (Fisch.) Bunge; Astragalus/Huangqi), and Radix Bupleuri (Bupleurum chinense DC.; Bupleurum/Chaihu) was identified as core groups. 
A total of six targets (TNF, STAT3, EGFR, IL2, ICAM1, PTGS2) play a pivotal role in TCM-mediated liver fibrosis inhibition. (-)-Cryptotanshinone, Tanshinaldehyde, Ononin, Thymol, Daidzein, and Formononetin were identified as active compounds in TCM. And mechanistically, TCM could affect the development of liver fibrosis by regulating inflammation, immunity, angiogenesis, antioxidants, and involvement in TNF, MicroRNAs, Jak-STAT, NF-kappa B, and C-type lectin receptors (CLRs) signaling pathways. Molecular docking results showed that key components had good potential to bind to the target genes. CONCLUSION In summary, this study provides a methodological reference for the systematic optimization of TCM formulae and exploration of underlying molecular mechanisms.
Affiliation(s)
- Qianqian Zhao
  - Faculty of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Jinwei Bai
  - School of Medical Equipment, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Yiwei Chen
  - Faculty of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Xin Liu
  - Faculty of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Shangfeng Zhao
  - Faculty of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Guixia Ling
  - School of Medical Equipment, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Shubing Jia
  - Faculty of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Fei Zhai
  - School of Medical Equipment, Shenyang Pharmaceutical University, Shenyang, 110016, China.
- Rongwu Xiang
  - School of Medical Equipment, Shenyang Pharmaceutical University, Shenyang, 110016, China; Liaoning Professional Technology Innovation Center on Medical Big Data and Artificial Intelligence, Shenyang, 110016, China.
6
Dhuli K, Bonetti G, Anpilogov K, Herbst KL, Connelly ST, Bellinato F, Gisondi P, Bertelli M. Validating methods for testing natural molecules on molecular pathways of interest in silico and in vitro. JOURNAL OF PREVENTIVE MEDICINE AND HYGIENE 2022; 63:E279-E288. [PMID: 36479497 PMCID: PMC9710400 DOI: 10.15167/2421-4248/jpmh2022.63.2s3.2770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Differentially expressed genes can serve as drug targets and are used to predict drug response and disease progression. In silico drug analysis based on the expression of these genetic biomarkers allows the detection of putative therapeutic agents, which could be used to reverse a pathological gene expression signature. Indeed, a set of bioinformatics tools can increase the accuracy of drug discovery, helping in biomarker identification. Once a drug target is identified, in vitro cell line models of disease are used to evaluate and validate the therapeutic potential of putative drugs and novel natural molecules. This study describes the development of efficacious PCR primers that can be used to identify gene expression of specific genetic pathways, which can lead to the identification of natural molecules as therapeutic agents in specific molecular pathways. For this study, genes involved in health conditions and processes were considered. In particular, the expression of genes involved in obesity, xenobiotics metabolism, endocannabinoid pathway, leukotriene B4 metabolism and signaling, inflammation, endocytosis, hypoxia, lifespan, and neurotrophins was evaluated. Exploiting the expression of specific genes in different cell lines can be useful for evaluating in vitro the therapeutic effects of small natural molecules.
Affiliation(s)
- Kristjana Dhuli
  - MAGI’S LAB, Rovereto (TN), Italy
  - Correspondence: Kristjana Dhuli, MAGI’S LAB, Rovereto (TN), 38068, Italy. E-mail:
- Karen L. Herbst
  - Total Lipedema Care, Beverly Hills California and Tucson Arizona, USA
- Stephen Thaddeus Connelly
  - San Francisco Veterans Affairs Health Care System, Department of Oral & Maxillofacial Surgery, University of California, San Francisco, CA, USA
- Francesco Bellinato
  - Section of Dermatology and Venereology, Department of Medicine, University of Verona, Verona, Italy
- Paolo Gisondi
  - Section of Dermatology and Venereology, Department of Medicine, University of Verona, Verona, Italy
- Matteo Bertelli
  - MAGI’S LAB, Rovereto (TN), Italy
  - MAGI EUREGIO, Bolzano, BZ, Italy
  - MAGISNAT, Peachtree Corners (GA), USA
7
Evaluation of Feature Selection Methods for Mammographic Breast Cancer Diagnosis in a Unified Framework. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6079163. [PMID: 34646886 PMCID: PMC8505067 DOI: 10.1155/2021/6079163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2020] [Revised: 07/10/2020] [Accepted: 07/18/2020] [Indexed: 11/17/2022]
Abstract
Over recent years, feature selection (FS) has gained more attention in intelligent diagnosis. This study is aimed at evaluating FS methods in a unified framework for mammographic breast cancer diagnosis. After FS methods generated rank lists according to feature importance, the framework added features incrementally as the input of random forest which performed as the classifier for breast lesion classification. In this study, 10 FS methods were evaluated and the digital database for screening mammography (1104 benign and 980 malignant lesions) was analyzed. The classification performance was quantified with the area under the curve (AUC), and accuracy, sensitivity, and specificity were also considered. Experimental results suggested that both infinite latent FS method (AUC, 0.866 ± 0.028) and RELIEFF (AUC, 0.855 ± 0.020) achieved good prediction (AUC ≥ 0.85) when 6 features were used, followed by correlation-based FS method (AUC, 0.867 ± 0.023) using 7 features and WILCOXON (AUC, 0.887 ± 0.019) using 8 features. The reliability of the diagnosis models was also verified, indicating that correlation-based FS method was generally superior over other methods. Identification of discriminative features among high-throughput ones remains an unavoidable challenge in intelligent diagnosis, and extra efforts should be made toward accurate and efficient feature selection.
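The evaluation loop described in this abstract (rank the features, then feed the top-k features to a classifier and record the AUC for growing k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the toy data are invented, features are ranked by how far their single-feature AUC lies from 0.5 (a quantity equivalent to a normalized Mann-Whitney U statistic), and a plain mean of the selected features stands in for the random forest classifier. A real evaluation would also score on held-out data rather than on the samples used for ranking.

```python
def auc(scores, labels):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_features(X, y):
    """Order feature indices by |single-feature AUC - 0.5|, most informative first."""
    n_features = len(X[0])
    strength = [abs(auc([row[j] for row in X], y) - 0.5) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -strength[j])

def incremental_auc(X, y, ranking):
    """AUC of a placeholder scorer as features are added one at a time."""
    curve = []
    for k in range(1, len(ranking) + 1):
        top = ranking[:k]
        # Placeholder for a trained classifier: mean of the selected features.
        scores = [sum(row[j] for j in top) / k for row in X]
        curve.append((k, auc(scores, y)))
    return curve

# Invented toy data: 6 lesions, 3 features, label 1 = malignant.
X = [[0.9, 0.2, 0.5], [0.8, 0.6, 0.1], [0.7, 0.4, 0.9],
     [0.3, 0.5, 0.4], [0.2, 0.3, 0.8], [0.1, 0.7, 0.2]]
y = [1, 1, 1, 0, 0, 0]

ranking = rank_features(X, y)
curve = incremental_auc(X, y, ranking)
```

Plotting `curve` gives the AUC-versus-feature-count profile that the abstract summarizes when it reports, for each method, how many features were needed to reach a given AUC.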
8
Raja Sree S, Kunthavai A. Hubness weighted svm ensemble for prediction of breast cancer subtypes. Technol Health Care 2021; 30:565-578. [PMID: 34397436 DOI: 10.3233/thc-212825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
BACKGROUND Breast cancer is a major disease causing panic among women worldwide. Since gene mutations are the root cause for cancer development, analyzing gene expressions can give more insights into various phenotypes of cancer treatments. Breast cancer subtype prediction from gene expression data can provide more information for cancer treatment decisions. OBJECTIVE Gene expression data are complex to analyze because of their high dimensionality. Machine learning algorithms such as k-Nearest Neighbors, Support Vector Machine (SVM) and Random Forest are used with selection of features for prediction of breast cancer subtypes. The prediction accuracy of the existing methods is affected by the high-dimensional nature of gene expressions. The objective of the work is to propose an efficient algorithm for the prediction of breast cancer subtypes from gene expression. METHODS For subtype prediction, a novel Hubness Weighted Support Vector Machine algorithm (HWSVM) using bad hubness score as a weight measure to handle the outliers in the data has been proposed. Based on the various subtypes, features are projected into seven different feature sets and an Ensemble-based Hubness Aware Weighted Support Vector Machine (HWSVMEns) is implemented for breast cancer subtype prediction. RESULTS The proposed algorithms have been compared with the classical SVM and other traditional algorithms such as Random Forest and k-Nearest Neighbor, and also with various gene selection methods. CONCLUSIONS Experimental results show that the proposed HWSVM outperforms other algorithms in terms of accuracy, precision, recall and F1 score due to the hubness weightage scheme and the ensemble approach. The experiments have shown an average accuracy of 92% across various gene expression datasets.
Affiliation(s)
- S Raja Sree
  - Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, India
- A Kunthavai
  - Department of Computer Science and Engineering, Coimbatore Institute of Technology, Coimbatore, India
10
Tian S, Wang C, Suarez-Farinas M. GEE-TGDR: A Longitudinal Feature Selection Algorithm and Its Application to lncRNA Expression Profiles for Psoriasis Patients Treated with Immune Therapies. BIOMED RESEARCH INTERNATIONAL 2021; 2021:8862895. [PMID: 33928163 PMCID: PMC8053058 DOI: 10.1155/2021/8862895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2020] [Revised: 03/06/2021] [Accepted: 03/29/2021] [Indexed: 01/06/2023]
Abstract
With the fast evolution of high-throughput technology, longitudinal gene expression experiments have become affordable and increasingly common in biomedical fields. The generalized estimating equation (GEE) approach is a widely used statistical method for the analysis of longitudinal data. Feature selection is imperative in longitudinal omics data analysis. Among a variety of existing feature selection methods, an embedded method, threshold gradient descent regularization (TGDR), stands out due to its excellent characteristics. An alignment of GEE with TGDR is a promising area for the purpose of identifying relevant markers that can explain the dynamic changes of outcomes across time. We proposed a novel feature selection algorithm for longitudinal outcomes, GEE-TGDR. In the GEE-TGDR method, the corresponding quasilikelihood function of a GEE model is the objective function to be optimized, and the optimization and feature selection are accomplished by the TGDR method. Long noncoding RNAs (lncRNAs) are posttranscriptional and epigenetic regulators; they have lower expression levels and are more tissue-specific than protein-coding genes. So far, the implication of lncRNAs in psoriasis remains largely unexplored and poorly understood even though some evidence in the literature supports that lncRNAs and psoriasis are highly associated. In this study, we applied the GEE-TGDR method to a lncRNA expression dataset that examined the response of psoriasis patients to immune treatments. As a result, a list of 10 relevant lncRNAs was identified with a predictive accuracy of 70%, which is superior to the accuracies achieved by two competitive methods, and with meaningful biological interpretation. A widespread application of the GEE-TGDR method in omics longitudinal data analysis is anticipated.
Affiliation(s)
- Suyan Tian
  - Division of Clinical Research, First Hospital of Jilin University, Changchun, Jilin, China 130021
- Chi Wang
  - Department of Internal Medicine, College of Medicine, University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
  - Markey Cancer Center, University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
- Mayte Suarez-Farinas
  - Department of Population Health Science & Policy, The Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
  - Department of Genetics and Genomics, The Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
11
Wu X, Wang L, Feng F, Tian S. Weighted gene expression profiles identify diagnostic and prognostic genes for lung adenocarcinoma and squamous cell carcinoma. J Int Med Res 2019; 48:300060519893837. [PMID: 31854219 PMCID: PMC7607763 DOI: 10.1177/0300060519893837] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023] Open
Abstract
OBJECTIVE To construct a diagnostic signature to distinguish lung adenocarcinoma from lung squamous cell carcinoma and a prognostic signature to predict the risk of death for patients with nonsmall-cell lung cancer, with satisfactory predictive performances, good stabilities, small sizes and meaningful biological implications. METHODS Pathway-based feature selection methods utilize pathway information as a priori to provide insightful clues on potential biomarkers from the biological perspective, and such incorporation may be realized by adding weights to test statistics or gene expression values. In this study, weighted gene expression profiles were generated using the GeneRank method and then the LASSO method was used to identify discriminative and prognostic genes. RESULTS The five-gene diagnostic signature including keratin 5 (KRT5), mucin 1 (MUC1), triggering receptor expressed on myeloid cells 1 (TREM1), complement C3 (C3) and transmembrane serine protease 2 (TMPRSS2) achieved a predictive error of 12.8% and a Generalized Brier Score of 0.108, while the five-gene prognostic signature including alcohol dehydrogenase 1C (class I), gamma polypeptide (ADH1C), alpha-2-glycoprotein 1, zinc-binding (AZGP1), clusterin (CLU), cyclin dependent kinase 1 (CDK1) and paternally expressed 10 (PEG10) obtained a log-rank P-value of 0.03 and a C-index of 0.622 on the test set. CONCLUSIONS Besides good predictive capacity, model parsimony and stability, the identified diagnostic and prognostic genes were highly relevant to lung cancer. A large-sized prospective study to explore the utilization of these genes in a clinical setting is warranted.
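To make the "weighted gene expression profile" idea concrete, the sketch below runs a PageRank-style iteration in the spirit of GeneRank on a toy gene network and then scales each gene's expression by its rank. Several parts are assumptions for illustration only: the three-gene network, the damping factor `d`, the expression values, and in particular the multiplicative weighting, since the abstract does not state how the weights were applied. The subsequent LASSO step is not shown.

```python
def generank(adjacency, ex, d=0.5, n_iter=200):
    """Iterate r_j = (1 - d) * ex_j + d * sum_i a_ij * r_i / deg_i."""
    n = len(ex)
    total = sum(ex)
    ex = [e / total for e in ex]            # normalize the expression evidence
    deg = [sum(row) for row in adjacency]   # number of neighbours per gene
    r = ex[:]
    for _ in range(n_iter):
        r = [
            (1 - d) * ex[j]
            + d * sum(adjacency[i][j] * r[i] / deg[i]
                      for i in range(n) if deg[i] > 0)
            for j in range(n)
        ]
    return r

# Hypothetical pathway graph: gene B is linked to both A and C.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
evidence = [1.0, 1.0, 1.0]  # e.g. absolute test statistics per gene (made up)

ranks = generank(adjacency, evidence)

# Assumed weighting scheme: multiply each gene's expression by its rank.
expression = [[5.0, 2.0, 1.0],   # sample 1
              [4.0, 3.0, 2.0]]   # sample 2
weighted = [[x * w for x, w in zip(sample, ranks)] for sample in expression]
```

With equal evidence for all three genes, the better-connected gene B receives the largest rank, so its expression values are up-weighted relative to A and C before any downstream selection.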
Affiliation(s)
- Xing Wu
  - Department of Teaching, The First Hospital of Jilin University, Changchun, Jilin Province, China
- Linlin Wang
  - Department of Ultrasound, China-Japan Union Hospital of Jilin University, Changchun, Jilin Province, China
- Fan Feng
  - School of Mathematics, Jilin University, Changchun, Jilin Province, China
- Suyan Tian
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, Jilin Province, China