1
Longkumer I, Mazumder DH. A novel parallel feature rank aggregation algorithm for gene selection applied to microarray data classification. Comput Biol Chem 2024; 112:108182. [PMID: 39197395 DOI: 10.1016/j.compbiolchem.2024.108182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/07/2024] [Accepted: 08/22/2024] [Indexed: 09/01/2024]
Abstract
Microarray data often comprises numerous genes, yet not all genes are relevant for predicting cancer. Feature selection is therefore a crucial step for reducing the high dimensionality of this kind of data. While no single feature selection method consistently outperforms others across diverse domains, combining multiple feature selectors or rankers tends to produce more effective results than relying on a single ranker alone. However, this approach can be computationally expensive, particularly when handling a large number of features. Hence, this paper presents a parallel feature rank aggregation method that uses Borda count as the rank aggregator. The concept of vertically partitioning the data along the feature space was adapted to ease the parallel execution of the aggregation task. Features were selected based on the final aggregated rank list, and their classification performance was evaluated. The model's execution time was also observed across multiple worker nodes of the cluster. The experiment was conducted on six benchmark microarray datasets. The results show the capability of the proposed distributed framework compared to the sequential version in all cases. They also illustrate the improved accuracy of the proposed method and its ability to select a minimal number of genes.
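The aggregation idea in this abstract can be illustrated with a minimal Python sketch. It assumes each ranker has already produced a full ranking of the same features (best first), and it splits the feature set into blocks so that the Borda points of each block can be summed independently. The function names, the thread pool, and the round-robin partitioning are illustrative assumptions, not the authors' implementation or cluster setup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(features, n_parts):
    """Split the feature list into n_parts blocks (round-robin, hypothetical choice)."""
    return [features[i::n_parts] for i in range(n_parts)]


def borda_scores_for(block, positions, n_features):
    """Sum Borda points for one block: a feature at 0-based position i earns n - i."""
    return {f: sum(n_features - pos[f] for pos in positions) for f in block}


def parallel_borda(rank_lists, n_parts=2):
    """Aggregate several full rankings with Borda count, one worker per feature block."""
    features = sorted(rank_lists[0])
    n = len(features)
    # Position of every feature in every ranker's list, computed once.
    positions = [{f: i for i, f in enumerate(r)} for r in rank_lists]
    blocks = partition(features, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(lambda b: borda_scores_for(b, positions, n), blocks))
    scores = {}
    for part in partials:
        scores.update(part)
    # Highest total first; ties broken by feature name for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


rank_lists = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
aggregated = parallel_borda(rank_lists, n_parts=2)
```

Because each feature's Borda total depends only on that feature's positions, the partitioned result matches a sequential pass over all features; a real deployment would distribute the blocks across worker nodes rather than threads.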
Affiliation(s)
- Imtisenla Longkumer
- National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India
2
Thimoteo RRC, Neto PN, Costa DSS, da Mota Ramalho Costa F, Brito DC, Costa PRR, de Almeida Simão T, Dias AG, Justo G. Microarray data analysis of antileukemic action of Cinnamoylated benzaldehyde LQB-461 in Jurkat cell line. Mol Biol Rep 2024; 51:187. [PMID: 38270684 DOI: 10.1007/s11033-023-09030-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/28/2023] [Indexed: 01/26/2024]
Abstract
BACKGROUND Leukemias are the most common type of childhood cancer worldwide. Current treatments have strong side effects for patients, and there is still a high rate of development of resistance to multidrug therapy. Previously, our research group developed a structure-activity study with novel synthetic molecules analogous to LQB-278, described as an essential molecule with in vitro antileukemic action. Among these analogs, LQB-461 stood out, presenting more significant antileukemic action than its derivative LQB-278, with cytostatic and cytotoxic effects through apoptosis, induction of caspase-3, and an increased sub-G1 phase in cell cycle analysis. METHODS AND RESULTS To further study the mechanism of action of LQB-461 in Jurkat cells in vitro, a microarray assay was carried out, which confirmed the importance of the apoptosis pathway in LQB-461 activity. Through real-time PCR, we validated an increased expression of the CDKN1A and BAX genes, essential mediators of the intrinsic apoptosis pathway. For the extrinsic apoptosis pathway, we found an increased expression of the Fas receptor by flow cytometry, showing the presence of one population more sensitive and another more resistant to death. Considering the importance of autophagy in cellular resistance, western blotting demonstrated that LQB-461 decreased the expression of LC-3 protein, an autophagic marker. CONCLUSIONS These results suggest that the synthetic molecule LQB-461 induces cell death by apoptosis in Jurkat cells through intrinsic and extrinsic pathways and inhibits autophagy, overcoming some mechanisms of cell resistance related to this process, which differentiates LQB-461 from other drugs used for leukemia treatment.
Affiliation(s)
- Debora S S Costa
- Instituto de Pesquisas Biomédicas - HNMD Marinha do Brazil, Rio de Janeiro, RJ, Brazil
- Paulo R R Costa
- Laboratório de Química Bioorgânica, UFRJ, Rio de Janeiro, RJ, Brazil
- Ayres G Dias
- Departamento de Química Orgânica, UERJ, Rio de Janeiro, RJ, Brazil
- Graça Justo
- Departamento de Bioquímica, UERJ, Rio de Janeiro, RJ, Brazil
3
Guo X, Hu J, Yu H, Wang M, Yang B. A new population initialization of metaheuristic algorithms based on hybrid fuzzy rough set for high-dimensional gene data feature selection. Comput Biol Med 2023; 166:107538. [PMID: 37857136 DOI: 10.1016/j.compbiomed.2023.107538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2023] [Revised: 09/06/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023]
Abstract
In modern medicine and biology, vast amounts of highly complex genetic data are available. However, dealing with such high-dimensional data poses challenges due to increased processing complexity and size. Identifying critical genes to reduce data dimensionality is essential. The filter-wrapper hybrid method is a commonly used approach in feature selection. Most of these methods employ filters such as MRMR and ReliefF, but the performance of these simple filters is limited. Rough set methods, on the other hand, are a type of filter method that outperforms traditional filters. At the same time, many studies have pointed out the crucial importance of good initialization strategies for the performance of metaheuristic algorithms (a type of wrapper-based method). Combining these two points, this paper proposes a novel filter-wrapper hybrid method for high-dimensional feature selection. Specifically, we use a variant of bWOA (binary Whale Optimization Algorithm) based on Hybrid Fuzzy Rough Set to perform attribute reduction, and the reduced attributes are used as prior knowledge to initialize the population. We then employ metaheuristics for further feature selection based on this initialized population. We conducted experiments using five different algorithms on 14 UCI datasets. The experimental results show that, after applying the initialization method proposed in this article, the performance of the five enhanced algorithms improved significantly. In particular, fuzzy_bMFO, the bMFO variant that uses our initialization method, outperformed six currently advanced algorithms, indicating that our initialization method for metaheuristic algorithms is suitable for high-dimensional feature selection tasks.
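The idea of using reduced attributes as prior knowledge for population initialization can be sketched as follows. This is a minimal illustration under stated assumptions: the reduct is taken as given (no fuzzy rough set computation is shown), and the function name `init_population` and the inclusion probabilities `p_in` and `p_out` are hypothetical, not values from the paper.

```python
import random


def init_population(n_features, reduct, pop_size, p_in=0.9, p_out=0.1, seed=0):
    """Build binary individuals biased toward features in the reduct.

    Each individual is a 0/1 mask over features. Features in the reduct are
    switched on with probability p_in, all others with probability p_out.
    Both probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    reduct = set(reduct)
    population = []
    for _ in range(pop_size):
        individual = [
            1 if rng.random() < (p_in if j in reduct else p_out) else 0
            for j in range(n_features)
        ]
        population.append(individual)
    return population


# Example: 20 features, with features 1, 5, and 7 assumed to form the reduct.
population = init_population(n_features=20, reduct=[1, 5, 7], pop_size=10)
```

A metaheuristic wrapper would then evolve these masks instead of starting from uniformly random ones; setting `p_in=1.0` and `p_out=0.0` reproduces the reduct exactly in every individual.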
Affiliation(s)
- Xuanming Guo
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- Jiao Hu
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- Helong Yu
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Mingjing Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325000, China.
- Bo Yang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
4
O’Connor LM, O’Connor BA, Zeng J, Lo CH. Data Mining of Microarray Datasets in Translational Neuroscience. Brain Sci 2023; 13:1318. [PMID: 37759919 PMCID: PMC10527016 DOI: 10.3390/brainsci13091318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 09/04/2023] [Accepted: 09/10/2023] [Indexed: 09/29/2023] Open
Abstract
Data mining involves the computational analysis of a plethora of publicly available datasets to generate new hypotheses that can be further validated by experiments for the improved understanding of the pathogenesis of neurodegenerative diseases. Although the number of sequencing datasets is on the rise, microarray analysis conducted on diverse biological samples represents a large collection of datasets, with multiple web-based programs that enable efficient and convenient data analysis. In this review, we first discuss the selection of biological samples associated with neurological disorders, and the possibility of combining datasets from various types of samples to conduct an integrated analysis in order to achieve a holistic understanding of the alterations in the examined biological system. We then summarize key approaches and studies that have made use of the data mining of microarray datasets to obtain insights into translational neuroscience applications, including biomarker discovery, therapeutic development, and the elucidation of the pathogenic mechanisms of neurodegenerative diseases. We further discuss the gap to be bridged between microarray and sequencing studies to improve the utilization and combination of different types of datasets, together with experimental validation, for more comprehensive analyses. We conclude by providing future perspectives on integrating multi-omics to advance precision phenotyping and personalized medicine for neurodegenerative diseases.
Affiliation(s)
- Lance M. O’Connor
- College of Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Blake A. O’Connor
- School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA
- Jialiu Zeng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
- Chih Hung Lo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
5
Xu C, Zhang R, Duan M, Zhou Y, Bao J, Lu H, Wang J, Hu M, Hu Z, Zhou F, Zhu W. A polygenic stacking classifier revealed the complicated platelet transcriptomic landscape of adult immune thrombocytopenia. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 28:477-487. [PMID: 35505964 PMCID: PMC9046129 DOI: 10.1016/j.omtn.2022.04.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2021] [Accepted: 04/01/2022] [Indexed: 01/19/2023]
Abstract
Immune thrombocytopenia (ITP) is an autoimmune disease with the typical symptom of a low platelet count in blood. ITP demonstrated age and sex biases in both occurrence and prognosis, and adult ITP was mainly induced by the living environment. The current diagnosis guideline lacks the integration of molecular heterogeneity. This study recruited the largest cohort of platelet transcriptome samples. A comprehensive procedure of feature selection, feature engineering, and stacking classification was carried out to detect the ITP biomarkers using RNA sequencing (RNA-seq) transcriptomes. The 40 detected biomarkers were loaded to train the final ITP detection model, with an overall accuracy of 0.974. The biomarkers suggested that ITP onset may be associated with various transcribed components, including protein-coding genes, long intergenic non-coding RNA (lincRNA) genes, and pseudogenes with apparent transcription. The delivered ITP detection model may also be utilized as a complementary ITP diagnosis tool. The code and the example dataset are freely available at http://www.healthinformaticslab.org/supp/resources.php
Affiliation(s)
- Chengfeng Xu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Ruochi Zhang
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Meiyu Duan
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yongming Zhou
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jizhang Bao
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Hao Lu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jie Wang
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Minghui Hu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Zhaoyang Hu
- Fun-Med Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China
- Corresponding author Zhaoyang Hu, PhD, Fengneng Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China.
- Fengfeng Zhou
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Corresponding author Fengfeng Zhou, PhD, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.
- Wenwei Zhu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Corresponding author Wenwei Zhu, PhD, Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China.
6
Riaz U, Razzaq FA, Hu S, Valdés-Sosa PA. Stepwise Covariance-Free Common Principal Components (CF-CPC) With an Application to Neuroscience. Front Neurosci 2021; 15:750290. [PMID: 34867161 PMCID: PMC8636064 DOI: 10.3389/fnins.2021.750290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Accepted: 10/15/2021] [Indexed: 11/30/2022] Open
Abstract
Finding the common principal components (CPCs) for ultra-high dimensional data is a multivariate technique used to discover the latent structure of the covariance matrices of shared variables measured in k ≥ 2 conditions. Common eigenvectors are assumed for the covariance matrices of all conditions, with only the eigenvalues being specific to each condition. Stepwise CPC computes a limited number of these CPCs sequentially, as the name indicates, and is therefore less time-consuming. This method becomes infeasible when the number of variables p is ultra-high, since storing k covariance matrices requires O(kp²) memory. Many dimensionality reduction algorithms have been improved to avoid explicit covariance calculation and storage (covariance-free). Here we propose a covariance-free stepwise CPC, which only requires O(kn) memory, where n is the total number of examples. Thus for n ≪ p, the new algorithm shows apparent advantages. It computes components quickly, with low consumption of machine resources. We validate our method, CF-CPC, with the classical Iris data. We then show that CF-CPC allows extracting the shared anatomical structure of EEG and MEG source spectra across a frequency range of 0.01-40 Hz.
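The general covariance-free idea mentioned in this abstract can be illustrated with a small Python sketch. It is plain single-matrix PCA by power iteration, not the authors' CF-CPC algorithm: it only shows that the product of the covariance matrix with a vector can be computed as Xᵀ(Xv) from the centered n × p data matrix, so the p × p covariance matrix is never formed. All function names, the iteration count, and the random start vector are illustrative assumptions.

```python
import math
import random


def matvec(X, v):
    """Compute X v for an n x p matrix X (list of rows) and a length-p vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def rmatvec(X, u):
    """Compute X^T u for an n x p matrix X and a length-n vector u."""
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, x in enumerate(row):
            out[j] += x * ui
    return out


def leading_direction(X, n_iter=200, seed=0):
    """Leading principal direction via power iteration on X^T (X v).

    Extra working memory is O(n + p); no p x p covariance matrix is stored.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        w = rmatvec(Xc, matvec(Xc, v))  # proportional to (covariance) v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy data lying exactly along the direction (3, 4).
X = [[3.0 * t, 4.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction = leading_direction(X)
```

For the toy data the recovered unit vector is (0.6, 0.8) up to sign; the same two matrix-vector products are what keep memory low when p is far larger than n.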
Affiliation(s)
- Usama Riaz
- The Clinical Hospital of Chengdu Brain Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fuleah A. Razzaq
- The Clinical Hospital of Chengdu Brain Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Shiang Hu
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Hefei, China
- Pedro A. Valdés-Sosa
- The Clinical Hospital of Chengdu Brain Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Cuban Neuroscience Center, Havana, Cuba
7
Liang C, Raza SHA, Naqvi MAR, Feng Y, Khan R, Mohammedsaleh ZM, Shater AF, Al-Ahmadi BM, Saleh FM, Bilal MA, Zan L. Construction of Adipogenic ceRNA Network Based on lncRNA Expression Profile of Adipogenic Differentiation of Human MSC Cells. Biochem Genet 2021; 60:543-557. [PMID: 34302581 DOI: 10.1007/s10528-021-10115-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]
Abstract
Long non-coding RNA (lncRNA) expression profile data for ten samples of human mesenchymal stem cells (MSCs) at days 0, 3, and 6 of adipogenic differentiation were obtained from the GEO database, followed by gene ID conversion, BLAST comparison, and annotation. Group A (day 3 of differentiation as the treatment group and day 0 as the control group) yielded a total of 1180 mRNAs and 185 lncRNAs; group B (day 6 of differentiation as the treatment group and day 0 as the control group) yielded a total of 1376 mRNAs and 206 lncRNAs. We then processed the differential lncRNAs and mRNAs obtained in the two groups and obtained 113 shared differential lncRNAs, which were used to predict the targeted miRNAs, giving a total of 815 lncRNA-miRNA pairs. The targeted mRNAs were further predicted and combined with the grouped differential mRNAs to obtain 64 differential mRNAs. In the end, we obtained 216 ceRNAs containing 26 lncRNAs, 27 miRNAs, and 64 mRNAs. We found that the mRNAs in the ceRNA network were mainly enriched in 45 Gene Ontology (GO) terms, mainly including glucose homeostasis and the response to insulin stimulation, and in 69 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. These mainly include pathways related to lipid metabolism, such as adenosine 5'-monophosphate (AMP)-activated protein kinase (AMPK), Rap1, cAMP, mitogen-activated protein kinase (MAPK), Ras, hypoxia inducible factor-1 (HIF-1), PI3K-Akt, and insulin signaling. In total, we identified 216 ceRNA regulatory relationships related to obesity research. Our research provides a clearer direction for understanding the molecular mechanism of obesity and for the future screening and determination of drug targets and biomarkers.
Affiliation(s)
- Chengcheng Liang
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, People's Republic of China
- Sayed Haidar Abbas Raza
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, People's Republic of China
- Yanrong Feng
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, People's Republic of China
- Rajwali Khan
- Department of Livestock Management, Breeding and Genetics, The University of Agriculture Peshawar, Peshawar, Pakistan
- Zuhair M Mohammedsaleh
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, University of Tabuk, Tabuk, 71491, Kingdom of Saudi Arabia
- Abdullah F Shater
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, University of Tabuk, Tabuk, Kingdom of Saudi Arabia
- Bassam M Al-Ahmadi
- Biology department, Faculty of Science, Taibah University, Medina, Kingdom of Saudi Arabia
- Fayez M Saleh
- Department of Medical Microbiology, Faculty of Medicine, University of Tabuk, Tabuk, 71491, Kingdom of Saudi Arabia
- Muhammad Ahsan Bilal
- Department of Dermatology, Hospital, Xian Jiaotong University, 157 Xiwu Road, Xi'an, 710004, Shaanxi Province, China
- Linsen Zan
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, People's Republic of China
- National Beef Cattle Improvement Center, Northwest A&F University, Yangling, 712100, Shaanxi, China
8
Sarajlic P, Plunde O, Franco-Cereceda A, Bäck M. Artificial Intelligence Models Reveal Sex-Specific Gene Expression in Aortic Valve Calcification. JACC Basic Transl Sci 2021; 6:403-412. [PMID: 34095631 PMCID: PMC8165113 DOI: 10.1016/j.jacbts.2021.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/19/2020] [Revised: 02/03/2021] [Accepted: 02/03/2021] [Indexed: 12/13/2022]
Abstract
Differences in the clinical presentation and physiology of aortic stenosis in men and women complicate the management of the condition. By combining traditional inferential statistics, artificial intelligence predictive modeling, and genetic pathway analysis, one can gain further insight into sex-specific gene expression patterns, potentially driving the valvular phenotype differences between the sexes. Results from this study, implementing a mixed and comprehensive methodological approach, offer a foundation for further exploration of potential drug targets.
Male and female aortic stenosis patients have distinct valvular phenotypes, increasing the complexity of evaluating valvular pathophysiology. In this study, we present cutting-edge artificial intelligence analyses of transcriptome-wide array data from stenotic aortic valves to highlight differences in gene expression patterns between the sexes, using both sex-differentiated transcripts and unbiased gene selections. This approach enabled the development of efficient models with high predictive ability and the identification of the most significant sex-dependent contributors to calcification. In addition, analyses of function-related gene groups revealed enriched fibrotic pathways among female patients. Ultimately, we demonstrate that artificial intelligence models can be used to accurately predict aortic valve calcification by carefully analyzing sex-specific gene transcripts.
Affiliation(s)
- Philip Sarajlic
- Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Oscar Plunde
- Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Anders Franco-Cereceda
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Theme Heart and Vessels, Division of Valvular and Coronary Disease, Karolinska University Hospital, Stockholm, Sweden
- Magnus Bäck
- Department of Medicine, Karolinska Institutet, Stockholm, Sweden; Theme Heart and Vessels, Division of Valvular and Coronary Disease, Karolinska University Hospital, Stockholm, Sweden