1
Alshawaqfeh M, Rababah S, Hayajneh A, Gharaibeh A, Serpedin E. MetaAnalyst: a user-friendly tool for metagenomic biomarker detection and phenotype classification. BMC Med Res Methodol 2022; 22:336. [PMID: 36577938 PMCID: PMC9795700 DOI: 10.1186/s12874-022-01812-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND Many metagenomic studies have linked imbalances in microbial abundance profiles to a wide range of diseases. These studies suggest using microbial abundance profiles as potential markers for metagenomic-associated conditions. Because biomarkers are important for understanding disease progression and developing possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. In addition, no all-in-one solution is available for running and comparing several metagenomic biomarker detection tools simultaneously, and most of these tools present the suggested biomarkers without any statistical evaluation of their quality. RESULTS To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design so that a dataset can be studied smoothly under different scenarios, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes as publication-quality plots with various formatting capabilities, as well as Excel sheets. CONCLUSIONS The utility of this tool has been verified by studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst and its user manual are available at https://github.com/mshawaqfeh/MetaAnalyst .
Affiliation(s)
- Mustafa Alshawaqfeh
  - School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (grid.440896.7; 0000 0004 0418 154X)
- Salahelden Rababah
  - School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (grid.440896.7; 0000 0004 0418 154X)
  - Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY, USA (grid.264260.4; 0000 0001 2164 4508)
- Abdullah Hayajneh
  - Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA (grid.264756.4; 0000 0004 4687 2082)
- Ammar Gharaibeh
  - School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (grid.440896.7; 0000 0004 0418 154X)
- Erchin Serpedin
  - Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA (grid.264756.4; 0000 0004 4687 2082)

2
Forouzandeh A, Rutar A, Kalmady SV, Greiner R. Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets. PLoS One 2022; 17:e0252697. [PMID: 35901020 PMCID: PMC9333302 DOI: 10.1371/journal.pone.0252697] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/18/2021] [Accepted: 06/29/2022] [Indexed: 11/19/2022]
Abstract
Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible – subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
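The reproducibility problem described in this abstract can be illustrated with a small split-half experiment. The sketch below is not the paper's Reproducibility Score or its over-bound and under-bound estimators, which the abstract does not specify. It assumes a Welch t-statistic cutoff as a stand-in for the univariate significance test and uses the Jaccard index as the overlap measure; all function names and the `t_threshold` value are hypothetical.

```python
import random
from statistics import fmean, variance

def welch_t(a, b):
    """Welch t-statistic between two samples (0.0 if both have zero variance)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (fmean(a) - fmean(b)) / se if se > 0 else 0.0

def select_biomarkers(X, y, t_threshold=3.0):
    """Return indices of features whose |t| exceeds an assumed cutoff."""
    selected = set()
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if len(pos) < 2 or len(neg) < 2:
            continue  # not enough samples in one group to estimate a variance
        if abs(welch_t(pos, neg)) > t_threshold:
            selected.add(j)
    return selected

def jaccard(a, b):
    """Overlap of two sets in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def split_half_overlap(X, y, n_repeats=50, seed=0):
    """Average Jaccard overlap of biomarker sets found on disjoint halves."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    overlaps = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        half = len(idx) // 2
        first, second = idx[:half], idx[half:]
        s1 = select_biomarkers([X[i] for i in first], [y[i] for i in first])
        s2 = select_biomarkers([X[i] for i in second], [y[i] for i in second])
        overlaps.append(jaccard(s1, s2))
    return fmean(overlaps)
```

A value near 1 means both halves of the data nominate nearly the same features, while a value near 0 reflects the small overlap between related studies that the abstract describes.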
Affiliation(s)
- Amir Forouzandeh
  - Department of Computing Science, University of Alberta, Edmonton, Canada
- Alex Rutar
  - Department of Pure Math, University of Waterloo, Waterloo, ON, Canada
- Sunil V. Kalmady
  - Department of Computing Science, University of Alberta, Edmonton, Canada
  - Canadian VIGOUR Centre, University of Alberta, Edmonton, Canada
- Russell Greiner
  - Department of Computing Science, University of Alberta, Edmonton, Canada
  - Alberta Machine Intelligence Institute, Edmonton, Canada

3
Nagpal S, Singh R, Taneja B, Mande SS. MarkerML – Marker feature identification in metagenomic datasets using interpretable machine learning. J Mol Biol 2022; 434:167589. [DOI: 10.1016/j.jmb.2022.167589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 12/29/2022]

4
Abstract
Measurement of biological systems containing biomolecules and bioparticles is a key task in the fields of analytical chemistry, biology, and medicine. Driven by the complex nature of biological systems and unprecedented amounts of measurement data, artificial intelligence (AI) in measurement science has rapidly advanced from the use of silicon-based machine learning (ML) for data mining to the development of molecular computing with improved sensitivity and accuracy. This review presents an overview of fundamental ML methodologies and discusses their applications in disease diagnostics, biomarker discovery, and imaging analysis. We next provide the working principles of molecular computing using logic gates and arithmetical devices, which can be employed for in situ detection, computation, and signal transduction for biological systems. This review concludes by summarizing the strengths and limitations of AI-involved biological measurement in fundamental and applied research.
Affiliation(s)
- Chao Liu
  - CAS Key Laboratory of Standardization and Measurement for Nanotechnology, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jiashu Sun
  - CAS Key Laboratory of Standardization and Measurement for Nanotechnology, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China

5
Huang L, Xu C, Yang W, Yu R. A machine learning framework to determine geolocations from metagenomic profiling. Biol Direct 2020; 15:27. [PMID: 33225966 PMCID: PMC7682025 DOI: 10.1186/s13062-020-00278-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/23/2020] [Accepted: 10/28/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Studies on metagenomic data of environmental microbial samples found that microbial communities seem to be geolocation-specific, and the microbiome abundance profile can be a differentiating feature to identify samples' geolocations. In this paper, we present a machine learning framework to determine the geolocations from metagenomics profiling of microbial samples. RESULTS Our method was applied to the multi-source microbiome data from MetaSUB (The Metagenomics and Metadesign of Subways and Urban Biomes) International Consortium for the CAMDA 2019 Metagenomic Forensics Challenge (the Challenge). The goal of the Challenge is to predict the geographical origins of mystery samples by constructing microbiome fingerprints. First, we extracted features from metagenomic abundance profiles. We then randomly split the training data into training and validation sets and trained the prediction models on the training set. Prediction performance was evaluated on the validation set. By using logistic regression with L2 normalization, the prediction accuracy of the model reaches 86%, averaged over 100 random splits of training and validation datasets. The testing data consists of samples from cities that do not occur in the training data. To predict the "mystery" cities that are not sampled before for the testing data, we first defined biological coordinates for sampled cities based on the similarity of microbial samples from them. Then we performed affine transform on the map such that the distance between cities measures their biological difference rather than geographical distance. After that, we derived the probabilities of a given testing sample from unsampled cities based on its predicted probabilities on sampled cities using Kriging interpolation. Results show that this method can successfully assign high probabilities to the true cities-of-origin of testing samples.
CONCLUSION Our framework shows good performance in predicting the geographic origin of metagenomic samples for cities where training data are available. Furthermore, we demonstrate the potential of the proposed method to predict metagenomic samples' geolocations for samples from locations that are not in the training dataset.
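The validation protocol in this abstract (train on a random split, score on the held-out part, average over many splits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it is a binary classifier rather than a multi-city one, it reads "L2 normalization" as an L2 penalty on the weights, and the function names, learning rate, penalty strength and split fraction are all hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg_l2(X, y, lam=0.1, lr=0.3, epochs=200):
    """Batch gradient descent on mean log-loss + (lam/2)*||w||^2 (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label (score >= 0 means 1) is correct."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += int((score >= 0) == (yi == 1))
    return hits / len(y)

def mean_split_accuracy(X, y, n_splits=100, val_frac=0.3, seed=0):
    """Average validation accuracy over repeated random train/validation splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_val = max(1, int(len(idx) * val_frac))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        val, train = idx[:n_val], idx[n_val:]
        w, b = train_logreg_l2([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(w, b, [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / n_splits
```

Averaging over many splits reduces the dependence of the reported accuracy on any single lucky or unlucky partition, which is presumably why the abstract reports a mean over 100 splits.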
Affiliation(s)
- Lihong Huang
  - School of Informatics, Xiamen University, Xiamen, China
- Rongshan Yu
  - School of Informatics, Xiamen University, Xiamen, China
  - Aginome Scientific Pte. Ltd., Xiamen, China

6
Li C, Xu J. Feature selection with the Fisher score followed by the Maximal Clique Centrality algorithm can accurately identify the hub genes of hepatocellular carcinoma. Sci Rep 2019; 9:17283. [PMID: 31754223 PMCID: PMC6872594 DOI: 10.1038/s41598-019-53471-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/15/2018] [Accepted: 11/01/2019] [Indexed: 02/08/2023]
Abstract
This study aimed to select the feature genes of hepatocellular carcinoma (HCC) with the Fisher score algorithm and to identify hub genes with the Maximal Clique Centrality (MCC) algorithm. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed to examine the enrichment of terms. Gene set enrichment analysis (GSEA) was used to identify the classes of genes that are overrepresented. Following the construction of a protein-protein interaction network with the feature genes, hub genes were identified with the MCC algorithm. The Kaplan–Meier plotter was utilized to assess the prognosis of patients based on expression of the hub genes. The feature genes were closely associated with cancer and the cell cycle, as revealed by GO, KEGG and GSEA enrichment analyses. Survival analysis showed that the overexpression of the Fisher score–selected hub genes was associated with decreased survival time (P < 0.05). Weighted gene co-expression network analysis (WGCNA), Lasso, ReliefF and random forest were used for comparison with the Fisher score algorithm. The comparison among these approaches showed that the Fisher score algorithm is superior to the Lasso and ReliefF algorithms in terms of hub gene identification and has similar performance to the WGCNA and random forest algorithms. Our results demonstrated that the Fisher score followed by the application of the MCC algorithm can accurately identify hub genes in HCC.
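For readers unfamiliar with the Fisher score, the sketch below shows one common textbook form of it: the between-class scatter of a feature divided by its within-class scatter. The abstract does not give the exact variant used in the paper, so the formula, the function name `fisher_scores`, and the handling of zero within-class variance are assumptions made for illustration.

```python
from statistics import fmean, pvariance

def fisher_scores(X, y):
    """Per-feature Fisher score: sum_k n_k*(mu_k - mu)^2 / sum_k n_k*var_k."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between, within = 0.0, 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Assumed convention: perfectly separated features rank first.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

# Toy expression matrix: feature 0 separates the classes, feature 1 does not.
X_demo = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
y_demo = [0, 0, 0, 1, 1, 1]
demo_scores = fisher_scores(X_demo, y_demo)
```

Features are then ranked by score and the top-ranked ones kept as candidate feature genes before any network analysis such as MCC.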
Affiliation(s)
- Chengzhang Li
  - College of Life Science, Henan Normal University, Xinxiang, 453007, Henan Province, China
  - State Key Laboratory Cultivation Base for Cell Differentiation Regulation, Henan Normal University, Xinxiang, 453007, Henan Province, China
  - Department of Physiology and Neurobiology, School of Basic Medical Sciences, Xinxiang Medical University, Xinxiang, 453003, Henan Province, China
- Jiucheng Xu
  - Engineering Lab of Intelligence Business & Internet of Things, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, Henan Province, China
  - State Key Laboratory Cultivation Base for Cell Differentiation Regulation, Henan Normal University, Xinxiang, 453007, Henan Province, China

7
Tang J, Wang Y, Fu J, Zhou Y, Luo Y, Zhang Y, Li B, Yang Q, Xue W, Lou Y, Qiu Y, Zhu F. A critical assessment of the feature selection methods used for biomarker discovery in current metaproteomics studies. Brief Bioinform 2019; 21:1378-1390. [DOI: 10.1093/bib/bbz061] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 03/27/2019] [Revised: 04/14/2019] [Indexed: 02/06/2023]
Abstract
Microbial communities (MCs) have a great impact on mediating complex disease indications, biogeochemical cycling and agricultural productivity, which makes metaproteomics a powerful technique for quantifying the diverse and dynamic composition of proteins or peptides. The key role of biostatistical strategies in MC studies is reported to be underestimated; in particular, the appropriate application of feature selection methods (FSMs) is largely ignored. Although extensive efforts have been devoted to assessing the performance of FSMs, previous studies focused only on their classification accuracy without considering their ability to correctly and comprehensively identify the spiked proteins. In this study, the performance of 14 FSMs was comprehensively assessed based on two key criteria (sample classification and spiked protein discovery) using a variety of metaproteomics benchmarks. First, the classification accuracies of the 14 FSMs were evaluated. Then, their ability to identify proteins of different spiked concentrations was assessed. Finally, seven FSMs (FC, LMEB, OPLS-DA, PLS-DA, SAM, SVM-RFE and T-Test) were identified as performing consistently superior or good under both criteria, with PLS-DA performing consistently superior. In summary, this study serves as a comprehensive analysis of the performance of current FSMs and could provide a valuable guideline for researchers in metaproteomics.
Affiliation(s)
- Jing Tang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Department of Bioinformatics, Chongqing Medical University, Chongqing, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Bo Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Qingxia Yang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Yan Lou
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Yunqing Qiu
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China

8
Feng X, Li J, Li H, Chen H, Li F, Liu Q, You ZH, Zhou F. Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers. Front Genet 2019; 10:212. [PMID: 30984234 PMCID: PMC6448048 DOI: 10.3389/fgene.2019.00212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/18/2018] [Accepted: 02/27/2019] [Indexed: 12/27/2022]
Abstract
Patients of different ages have different rates of cell development and metabolism. As a result, age should be an essential part of how a disease diagnosis model is trained and optimized. Unfortunately, most existing studies have not taken age into account. This study demonstrated that disease diagnosis models could be improved merely by applying individual models to patients of different age groups. Both the transcriptomes and the methylomes of the TCGA breast cancer dataset (TCGA-BRCA) were used for feature selection and classification. Our experimental data strongly suggest that disease diagnosis modeling should integrate patient age into the whole experimental design.
Affiliation(s)
- Xin Feng
  - BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Jialiang Li
  - BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Han Li
  - BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Hang Chen
  - BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Fei Li
  - BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Quewang Liu
  - BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Fengfeng Zhou
  - BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
  - BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China

9
Disease Prediction Using Metagenomic Data Visualizations Based on Manifold Learning and Convolutional Neural Network. FUTURE DATA AND SECURITY ENGINEERING 2019. [DOI: 10.1007/978-3-030-35653-8_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]

10
AlShawaqfeh MK, Wajid B, Minamoto Y, Markel M, Lidbury JA, Steiner JM, Serpedin E, Suchodolski JS. A dysbiosis index to assess microbial changes in fecal samples of dogs with chronic inflammatory enteropathy. FEMS Microbiol Ecol 2018; 93:4443197. [PMID: 29040443 DOI: 10.1093/femsec/fix136] [Citation(s) in RCA: 168] [Impact Index Per Article: 24.0] [Received: 07/06/2017] [Accepted: 10/10/2017] [Indexed: 01/12/2023]
Abstract
Recent studies have identified various bacterial groups that are altered in dogs with chronic inflammatory enteropathies (CE) compared to healthy dogs. The study aim was to use quantitative PCR (qPCR) assays to confirm these findings in a larger number of dogs, and to build a mathematical algorithm to report these microbiota changes as a dysbiosis index (DI). Fecal DNA from 95 healthy dogs and 106 dogs with histologically confirmed CE was analyzed. Samples were grouped into a training set and a validation set. Various mathematical models and combinations of qPCR assays were evaluated to find the model with the highest discriminatory power. The final qPCR panel consisted of eight bacterial groups: total bacteria, Faecalibacterium, Turicibacter, Escherichia coli, Streptococcus, Blautia, Fusobacterium and Clostridium hiranonis. The qPCR-based DI was built on the nearest centroid classifier, and reports the degree of dysbiosis as a single numerical value that measures the closeness, in the ℓ2-norm, of the test sample to the mean prototype of each class. A negative DI indicates normobiosis, whereas a positive DI indicates dysbiosis. For a threshold of 0, the DI based on the combined dataset achieved 74% sensitivity and 95% specificity to separate healthy and CE dogs.
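The sign convention in this abstract can be made concrete with a small nearest-centroid sketch. It assumes the index is the Euclidean distance from a sample to the healthy centroid minus its distance to the CE centroid, which reproduces the stated behaviour (negative when the sample lies closer to the healthy prototype); the abstract does not spell out the exact formula, any scaling, or how the qPCR values are preprocessed, so the function names and the toy numbers are hypothetical.

```python
import math

def centroid(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def l2_distance(a, b):
    """Euclidean (l2-norm) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dysbiosis_index(sample, healthy_centroid, disease_centroid):
    """Assumed form: d(sample, healthy) - d(sample, disease).

    Negative values mean the sample is closer to the healthy prototype,
    positive values mean it is closer to the disease prototype.
    """
    return l2_distance(sample, healthy_centroid) - l2_distance(sample, disease_centroid)

# Toy two-feature example (made-up numbers, not qPCR data from the study).
healthy_c = centroid([[1.0, 1.0], [1.0, 3.0]])   # -> [1.0, 2.0]
disease_c = centroid([[5.0, 1.0], [5.0, 3.0]])   # -> [5.0, 2.0]
di_near_healthy = dysbiosis_index([2.0, 2.0], healthy_c, disease_c)  # 1 - 3 = -2
di_near_disease = dysbiosis_index([4.0, 2.0], healthy_c, disease_c)  # 3 - 1 = 2
```

With a threshold of 0, classifying by the sign of this value is the same decision rule as a nearest centroid classifier.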
Affiliation(s)
- M K AlShawaqfeh
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-4474, USA
- B Wajid
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
  - Department of Electrical Engineering, University of Engineering and Technology, 54890 Lahore, Pakistan
- Y Minamoto
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
- M Markel
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
- J A Lidbury
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
- J M Steiner
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA
- E Serpedin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-4474, USA
- J S Suchodolski
  - Gastrointestinal Laboratory, Texas A&M University, College Station, TX 77843-4474, USA

11
Alshawaqfeh M, Bashaireh A, Serpedin E, Suchodolski J. Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm. BMC Bioinformatics 2017; 18:328. [PMID: 28693478 PMCID: PMC5504766 DOI: 10.1186/s12859-017-1738-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/24/2017] [Accepted: 06/22/2017] [Indexed: 12/13/2022]
Abstract
Background Biomarker detection presents itself as a major means of translating biological data into clinical applications. Due to the recent advances in high throughput sequencing technologies, an increased number of metagenomics studies have suggested dysbiosis in microbial communities as a potential biomarker for certain diseases. The reproducibility of the results drawn from metagenomic data is crucial for clinical applications and to prevent incorrect biological conclusions. The variability in the sample size and the subjects participating in the experiments induces diversity, which may drastically change the outcome of biomarker detection algorithms. Therefore, a robust biomarker detection algorithm that ensures the consistency of the results irrespective of the natural diversity present in the samples is needed. Results Toward this end, this paper proposes a novel Regularized Low Rank-Sparse Decomposition (RegLRSD) algorithm. RegLRSD models the bacterial abundance data as a superposition of a sparse matrix and a low-rank matrix, which account for the differentially and non-differentially abundant microbes, respectively. Hence, the biomarker detection problem is cast as a matrix decomposition problem. In order to yield more consistent and solid biological conclusions, RegLRSD incorporates the prior knowledge that the irrelevant microbes do not exhibit significant variation between samples belonging to different phenotypes. Moreover, an efficient algorithm to extract the sparse matrix is proposed. Comprehensive comparisons of RegLRSD with the state-of-the-art algorithms on three realistic datasets are presented. The obtained results demonstrate that RegLRSD consistently outperforms the other algorithms in terms of reproducibility performance and provides a marker list with high classification accuracy.
Conclusions The proposed RegLRSD algorithm for biomarker detection provides high reproducibility and classification accuracy performance regardless of the dataset complexity and the number of selected biomarkers. This renders RegLRSD as a reliable and powerful tool for identifying potential metagenomic biomarkers.
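For orientation, the low-rank plus sparse model mentioned in this abstract is usually written in the generic robust-PCA style shown below. This is the standard unregularized formulation, given here only as background; the abstract does not state the RegLRSD objective, and the extra regularization term that encodes the prior on irrelevant microbes is not reproduced. The symbols $X$ (abundance matrix), $L$, $S$ and $\lambda$ are notation assumed for this sketch.

```latex
% Generic low-rank + sparse decomposition (not the RegLRSD objective itself):
% X : microbes-by-samples abundance matrix
% L : low-rank part (non-differentially abundant microbes)
% S : sparse part (differentially abundant microbes, i.e. candidate biomarkers)
\begin{equation*}
  \min_{L,\,S} \; \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
  \quad \text{subject to} \quad X = L + S ,
\end{equation*}
% where \lVert L \rVert_{*} is the nuclear norm (sum of singular values),
% \lVert S \rVert_{1} is the entrywise l1 norm, and \lambda > 0 trades off the two.
```

Under this reading, microbes whose rows of $S$ carry non-negligible entries are the candidates reported as biomarkers.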
Affiliation(s)
- Mustafa Alshawaqfeh
  - Bioinformatics and Genomic Signal Processing Lab, ECEN Dept., Texas A&M University, College Station, 77843-3128, TX, USA
- Ahmad Bashaireh
  - Bioinformatics and Genomic Signal Processing Lab, ECEN Dept., Texas A&M University, College Station, 77843-3128, TX, USA
- Erchin Serpedin
  - Bioinformatics and Genomic Signal Processing Lab, ECEN Dept., Texas A&M University, College Station, 77843-3128, TX, USA
- Jan Suchodolski
  - College of Veterinary Medicine and Biomedical Sciences, Gastrointestinal Laboratory, Texas A&M University, College Station, 77843-3128, TX, USA