1
Borah K, Das HS, Seth S, Mallick K, Rahaman Z, Mallik S. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis. Funct Integr Genomics 2024; 24:139. [PMID: 39158621 DOI: 10.1007/s10142-024-01415-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. High-dimensional NGS data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, presents significant challenges for reducing feature dimensionality. The high dimensionality of NGS data also poses challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed here, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review helps readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
Affiliation(s)
- Kasmika Borah
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Himanish Shekhar Das
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Soumita Seth
  - Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India
- Koushik Mallick
  - Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ, 85721, USA
2
Wu J, Möhle L, Brüning T, Eiriz I, Rafehi M, Stefan K, Stefan SM, Pahnke J. A Novel Huntington's Disease Assessment Platform to Support Future Drug Discovery and Development. Int J Mol Sci 2022; 23:14763. [PMID: 36499090 PMCID: PMC9740291 DOI: 10.3390/ijms232314763] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/31/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 11/29/2022]
Abstract
Huntington's disease (HD) is a lethal neurodegenerative disorder without effective therapeutic options. The inefficient translation from preclinical and clinical research into clinical use is mainly attributed to the lack of (i) understanding of disease initiation, progression, and involved molecular mechanisms; (ii) knowledge of the possible HD target space and general data awareness; (iii) detailed characterizations of available disease models; (iv) more suitable models; and (v) reliable and sensitive biomarkers. To generate robust HD-like symptoms in a mouse model, the neomycin resistance cassette was excised from zQ175 mice, generating a new line: zQ175Δneo. We comprehensively describe the dynamics of behavioral, neuropathological, and immunohistological changes from 15 to 57 weeks of age. Specifically, zQ175Δneo mice showed early astrogliosis from 15 weeks; growth retardation, body weight loss, and anxiety-like behaviors from 29 weeks; motor deficits and reduced muscular strength from 36 weeks; and finally slight microgliosis at 57 weeks of age. Additionally, we collected the entire bioactivity network of small-molecule HD modulators in a multitarget dataset (HD_MDS), in which we uncovered 358 unique compounds addressing over 80 different pharmacological targets and pathways. Our data will support future drug discovery approaches and may serve as a useful assessment platform for drug discovery and development against HD.
Affiliation(s)
- Jingyun Wu
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
- Luisa Möhle
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
- Thomas Brüning
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
- Iván Eiriz
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
- Muhammad Rafehi
  - Institute of Clinical Pharmacology, University Medical Center Göttingen, Robert-Koch-Str. 40, 37075 Göttingen, Germany
- Katja Stefan
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
- Sven Marcel Stefan
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
  - Pahnke Lab (Drug Development and Chemical Biology), Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Ratzeburger Allee 160, 23538 Lübeck, Germany
  - Correspondence: (J.P.); (S.M.S.); Tel.: +47-23-071-466 (J.P.)
- Jens Pahnke
  - Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway; www.pahnkelab.eu
  - Pahnke Lab (Drug Development and Chemical Biology), Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Ratzeburger Allee 160, 23538 Lübeck, Germany
  - Department of Pharmacology, Faculty of Medicine, University of Latvia, Jelgavas iela 4, 1004 Rīga, Latvia
  - Department of Neurobiology, The Georg S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
  - Correspondence: (J.P.); (S.M.S.); Tel.: +47-23-071-466 (J.P.)
3
Lai J, Chen H, Li T, Yang X. Adaptive graph learning for semi-supervised feature selection with redundancy minimization. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]