1
Kryukov M, Moriarty KP, Villamea M, O'Dwyer I, Chow O, Dormont F, Hernandez R, Bar-Joseph Z, Rufino B. Proxy endpoints - bridging clinical trials and real world data. J Biomed Inform 2024; 158:104723. [PMID: 39299565 DOI: 10.1016/j.jbi.2024.104723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/15/2024] [Accepted: 09/03/2024] [Indexed: 09/22/2024]
Abstract
OBJECTIVE Disease severity scores, or endpoints, are routinely measured during Randomized Controlled Trials (RCTs) to closely monitor the effect of treatment. In real-world clinical practice, although a larger set of patients is observed, the specific RCT endpoints are often not captured, which makes it hard to utilize real-world data (RWD) to evaluate drug efficacy in larger populations. METHODS To overcome this challenge, we developed an ensemble technique which learns proxy models of disease endpoints in RWD. Using a multi-stage learning framework applied to RCT data, we first identify features considered significant drivers of disease available within RWD. To create endpoint proxy models, we use Explainable Boosting Machines (EBMs) which allow for both end-user interpretability and modeling of non-linear relationships. RESULTS We demonstrate our approach on two diseases, rheumatoid arthritis (RA) and atopic dermatitis (AD). As we show, our combined feature selection and prediction method achieves good results for both disease areas, improving upon prior methods proposed for predictive disease severity scoring. CONCLUSION Having disease severity over time for a patient is important to further disease understanding and management. Our results open the door to more use cases in the space of RA and AD such as treatment effect estimates or prognostic scoring on RWD. Our framework may be extended beyond RA and AD to other diseases where the severity score is not well measured in electronic health records.
Affiliation(s)
- Maxim Kryukov
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
- Kathleen P Moriarty
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
- Ingrid O'Dwyer
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
- Ohn Chow
- Clinical Immunology and Inflammation, R&D, Sanofi, 450 Water St, Cambridge, MA 02141, United States.
- Flavio Dormont
- Clinical Real World Evidence, R&D, Sanofi, 46 Av. de la Grande Armée, Paris, 75017, Île-de-France, France.
- Ramon Hernandez
- Clinical Real World Evidence, R&D, Sanofi, 46 Av. de la Grande Armée, Paris, 75017, Île-de-France, France.
- Ziv Bar-Joseph
- Data & Computational Science, R&D, Sanofi, 450 Water St, Cambridge, MA 02141, United States.
- Brandon Rufino
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
2
Liu Y, Ji Y, Chen J, Zhang Y, Li X, Li X. Pioneering noninvasive colorectal cancer detection with an AI-enhanced breath volatilomics platform. Theranostics 2024; 14:4240-4255. [PMID: 39113791 PMCID: PMC11303087 DOI: 10.7150/thno.94950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 05/02/2024] [Indexed: 08/10/2024] Open
Abstract
Background: The sensitivity and specificity of current breath biomarkers are often inadequate for effective cancer screening, particularly in colorectal cancer (CRC). While a few exhaled biomarkers in CRC exhibit high specificity, they lack the requisite sensitivity for early-stage detection, thereby limiting improvements in patient survival rates. Methods: In this study, we developed an advanced Mass Spectrometry-based volatilomics platform, complemented by an enhanced breath sampler. The platform integrates artificial intelligence (AI)-assisted algorithms to detect multiple volatile organic compounds (VOCs) biomarkers in human breath. Subsequently, we applied this platform to analyze 364 clinical CRC and normal exhaled samples. Results: The diagnostic signatures, including 2-methyl, octane, and butyric acid, generated by the platform effectively discriminated CRC patients from normal controls with high sensitivity (89.7%), specificity (86.8%), and accuracy (AUC = 0.91). Furthermore, the metastatic signature correctly identified over 50% of metastatic patients who tested negative for carcinoembryonic antigen (CEA). Fecal validation indicated that elevated breath biomarkers correlated with an inflammatory response guided by Bacteroides fragilis in CRC. Conclusion: This study introduces a sophisticated AI-aided Mass Spectrometry-based platform capable of identifying novel and feasible breath biomarkers for early-stage CRC detection. The promising results position the platform as an efficient noninvasive screening test for clinical applications, offering potential advancements in early detection and improved survival rates for CRC patients.
Affiliation(s)
- Yongqian Liu
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
- Yongyan Ji
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
- Jian Chen
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
- Yixuan Zhang
- Department of Gastroenterology, Huadong Hospital, Fudan University, Shanghai 200040, P.R. China
- Xiaowen Li
- Department of Gastroenterology, Huadong Hospital, Fudan University, Shanghai 200040, P.R. China
- Xiang Li
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
3
Flynn CD, Chang D. Artificial Intelligence in Point-of-Care Biosensing: Challenges and Opportunities. Diagnostics (Basel) 2024; 14:1100. [PMID: 38893627 PMCID: PMC11172335 DOI: 10.3390/diagnostics14111100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
The integration of artificial intelligence (AI) into point-of-care (POC) biosensing has the potential to revolutionize diagnostic methodologies by offering rapid, accurate, and accessible health assessment directly at the patient level. This review paper explores the transformative impact of AI technologies on POC biosensing, emphasizing recent computational advancements, ongoing challenges, and future prospects in the field. We provide an overview of core biosensing technologies and their use at the POC, highlighting ongoing issues and challenges that may be solved with AI. We follow with an overview of AI methodologies that can be applied to biosensing, including machine learning algorithms, neural networks, and data processing frameworks that facilitate real-time analytical decision-making. We explore the applications of AI at each stage of the biosensor development process, highlighting the diverse opportunities beyond simple data analysis procedures. We include a thorough analysis of outstanding challenges in the field of AI-assisted biosensing, focusing on the technical and ethical challenges regarding the widespread adoption of these technologies, such as data security, algorithmic bias, and regulatory compliance. Through this review, we aim to emphasize the role of AI in advancing POC biosensing and inform researchers, clinicians, and policymakers about the potential of these technologies in reshaping global healthcare landscapes.
Affiliation(s)
- Connor D. Flynn
- Department of Chemistry, Weinberg College of Arts & Sciences, Northwestern University, Evanston, IL 60208, USA
- Dingran Chang
- Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA
4
Blutt SE, Coarfa C, Neu J, Pammi M. Multiomic Investigations into Lung Health and Disease. Microorganisms 2023; 11:2116. [PMID: 37630676 PMCID: PMC10459661 DOI: 10.3390/microorganisms11082116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/08/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023] Open
Abstract
Diseases of the lung account for more than 5 million deaths worldwide and are a healthcare burden. Improving clinical outcomes, including mortality and quality of life, involves a holistic understanding of the disease, which can be provided by the integration of lung multi-omics data. An enhanced understanding of comprehensive multiomic datasets provides opportunities to leverage those datasets to inform the treatment and prevention of lung diseases by classifying severity, prognostication, and discovery of biomarkers. The main objective of this review is to summarize the use of multiomics investigations in lung disease, including multiomics integration and the use of machine learning computational methods. This review also discusses lung disease models, including animal models, organoids, and single-cell lines, to study multiomics in lung health and disease. We provide examples of lung diseases where multi-omics investigations have provided deeper insight into etiopathogenesis and have resulted in improved preventative and therapeutic interventions.
Affiliation(s)
- Sarah E. Blutt
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Cristian Coarfa
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Josef Neu
- Department of Pediatrics, Section of Neonatology, University of Florida, Gainesville, FL 32611, USA
- Mohan Pammi
- Department of Pediatrics, Section of Neonatology, Baylor College of Medicine and Texas Children’s Hospital, Houston, TX 77030, USA
5
Mansur A, Vrionis A, Charles JP, Hancel K, Panagides JC, Moloudi F, Iqbal S, Daye D. The Role of Artificial Intelligence in the Detection and Implementation of Biomarkers for Hepatocellular Carcinoma: Outlook and Opportunities. Cancers (Basel) 2023; 15:2928. [PMID: 37296890 PMCID: PMC10251861 DOI: 10.3390/cancers15112928] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/15/2023] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/12/2023] Open
Abstract
Liver cancer is a leading cause of cancer-related death worldwide, and its early detection and treatment are crucial for improving morbidity and mortality. Biomarkers have the potential to facilitate the early diagnosis and management of liver cancer, but identifying and implementing effective biomarkers remains a major challenge. In recent years, artificial intelligence has emerged as a promising tool in the cancer sphere, and recent literature suggests that it is very promising in facilitating biomarker use in liver cancer. This review provides an overview of the status of AI-based biomarker research in liver cancer, with a focus on the detection and implementation of biomarkers for risk prediction, diagnosis, staging, prognostication, prediction of treatment response, and recurrence of liver cancers.
Affiliation(s)
- Arian Mansur
- Harvard Medical School, Boston, MA 02115, USA; (A.M.); (J.C.P.)
- Andrea Vrionis
- Morsani College of Medicine, University of South Florida Health, Tampa, FL 33602, USA; (A.V.); (J.P.C.)
- Jonathan P. Charles
- Morsani College of Medicine, University of South Florida Health, Tampa, FL 33602, USA; (A.V.); (J.P.C.)
- Kayesha Hancel
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; (K.H.); (F.M.); (S.I.)
- Farzad Moloudi
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; (K.H.); (F.M.); (S.I.)
- Shams Iqbal
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; (K.H.); (F.M.); (S.I.)
- Dania Daye
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; (K.H.); (F.M.); (S.I.)
6
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
7
Silva PP, Gaudillo JD, Vilela JA, Roxas-Villanueva RML, Tiangco BJ, Domingo MR, Albia JR. A machine learning-based SNP-set analysis approach for identifying disease-associated susceptibility loci. Sci Rep 2022; 12:15817. [PMID: 36138111 PMCID: PMC9499949 DOI: 10.1038/s41598-022-19708-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/23/2022] [Accepted: 09/02/2022] [Indexed: 11/17/2022] Open
Abstract
Identifying disease-associated susceptibility loci is one of the most pressing and crucial challenges in modeling complex diseases. Existing approaches to biomarker discovery are subject to several limitations, including underpowered detection, neglect of variant interactions, and restrictive dependence on prior biological knowledge. Addressing these challenges necessitates more ingenious ways of approaching the "missing heritability" problem. This study aims to discover disease-associated susceptibility loci by augmenting a previous genome-wide association study (GWAS) through the integration of random forest and cluster analysis. The proposed integrated framework is applied to hepatitis B virus surface antigen (HBsAg) seroclearance GWAS data. Multiple cluster analyses were performed on (1) single nucleotide polymorphisms (SNPs) considered significant by GWAS and (2) SNPs with the highest feature importance scores obtained using random forest. The resulting SNP-sets from the cluster analyses were subsequently tested for trait association. Three susceptibility loci possibly associated with HBsAg seroclearance were identified: (1) SNP rs2399971, (2) gene LINC00578, and (3) locus 11p15. SNP rs2399971 is a biomarker reported in the literature to be significantly associated with HBsAg seroclearance in patients who had received antiviral treatment. The latter two loci are linked with diseases influenced by the presence of hepatitis B virus infection. These findings demonstrate the potential of the proposed integrated framework in identifying disease-associated susceptibility loci. With further validation, the results herein could aid in better understanding complex disease etiologies and provide inputs for a more advanced disease risk assessment for patients.
Affiliation(s)
- Princess P Silva
- Data-Driven Research Laboratory (DARELab), Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Computational Interdisciplinary Research Laboratory (CINTERLabs), University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Joverlyn D Gaudillo
- Data-Driven Research Laboratory (DARELab), Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Computational Interdisciplinary Research Laboratory (CINTERLabs), University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Domingo AI Research Center (DARC Labs), 1606, Pasig City, Philippines
- Julianne A Vilela
- Philippine Genome Center Program for Agriculture, Office of the Vice Chancellor for Research and Extension, University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Ranzivelle Marianne L Roxas-Villanueva
- Data-Driven Research Laboratory (DARELab), Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Computational Interdisciplinary Research Laboratory (CINTERLabs), University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Beatrice J Tiangco
- National Institute of Health, UP College of Medicine, Taft Avenue, 1000, Manila, Philippines
- Division of Medicine, The Medical City, 1605, Pasig, Philippines
- Mario R Domingo
- Domingo AI Research Center (DARC Labs), 1606, Pasig City, Philippines
- Jason R Albia
- Data-Driven Research Laboratory (DARELab), Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, 4031, Los Baños, Laguna, Philippines
- Domingo AI Research Center (DARC Labs), 1606, Pasig City, Philippines
- Venn Biosciences Corporation Dba InterVenn Biosciences, Metro Manila, Philippines
8
Liu R, Zhan Y, Liu X, Zhang Y, Gui L, Qu Y, Nan H, Jiang Y. Stacking Ensemble Method for Gestational Diabetes Mellitus Prediction in Chinese Pregnant Women: A Prospective Cohort Study. J Healthc Eng 2022; 2022:8948082. [PMID: 36147870 PMCID: PMC9489389 DOI: 10.1155/2022/8948082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 07/28/2022] [Indexed: 11/18/2022]
Abstract
Gestational diabetes mellitus (GDM) is closely related to adverse pregnancy outcomes and other diseases. Early intervention in pregnant women who are at high risk of developing GDM could help prevent adverse health consequences. The study aims to develop a simple model using the stacking ensemble method to predict GDM for women in the first trimester based on easily available factors. We used the data from the Chinese Pregnant Women Cohort Study from July 2017 to November 2018. A total of 6,848 pregnant women in the first trimester were included in the analysis. Logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost) were considered as base learners. Optimal feature subsets for each learner were chosen by using recursive feature elimination with cross-validation. Then, we built a pipeline to process imbalanced data, tune hyperparameters, and evaluate model performance. The learners with the best hyperparameters were employed in the first layer of the proposed stacking method. Their predictions were obtained using optimal feature subsets and served as the meta-learner's inputs. Another LR was used as a meta-learner to obtain the final prediction results. Accuracy, specificity, error rate, and other metrics were calculated to evaluate the performance of the models. A paired samples t-test was performed to compare the model performance. In total, 967 (14.12%) women developed GDM. For base learners, the RF model had the highest accuracy (0.638 (95% confidence interval (CI) 0.628-0.648)) and specificity (0.683 (0.669-0.698)) and lowest error rate (0.362 (0.352-0.372)). The stacking method effectively improved the accuracy (0.666 (95% CI 0.663-0.670)) and specificity (0.725 (0.721-0.729)) and decreased the error rate (0.333 (0.330-0.337)). The differences in the performance between the stacking method and RF were statistically significant. Our proposed stacking method based on easily available factors has better performance than other learners such as RF.
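The evaluation metrics named in this abstract (accuracy, specificity, error rate) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' pipeline; the function name `binary_metrics` and the counts in the example are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        # Error rate is the complement of accuracy.
        "error_rate": (fp + fn) / total,
        # Specificity: share of true negatives among all actual negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Sensitivity: share of true positives among all actual positives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=60, fp=30, tn=70, fn=40)
print(metrics)
```

Because accuracy and error rate are complements, a reported pair such as 0.666 and 0.333 differs from a sum of 1 only by rounding.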
Affiliation(s)
- Ruiyi Liu
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yongle Zhan
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Xuan Liu
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yifang Zhang
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Luting Gui
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yimin Qu
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Hairong Nan
- Department of Endocrinology, Shenzhen Longhua Maternity and Child Healthcare Hospital, Shenzhen, China
- Yu Jiang
- Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
9
Chowdhury A, Razzaque RR, Muhtadi S, Shafiullah A, Ul Islam Abir E, Garra BS, Kaisar Alam S. Ultrasound classification of breast masses using a comprehensive Nakagami imaging and machine learning framework. Ultrasonics 2022; 124:106744. [PMID: 35390626 DOI: 10.1016/j.ultras.2022.106744] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/12/2021] [Revised: 03/22/2022] [Accepted: 03/31/2022] [Indexed: 06/14/2023]
Abstract
In this study we investigate the potential of parametric images formed from ultrasound B-mode scans using the Nakagami distribution for non-invasive classification of breast lesions and characterization of breast tissue. Through a sliding window technique, we generated seven types of Nakagami images for each patient scan in our dataset using basic as well as derived parameters of the Nakagami distribution. To determine the suitable window size for image generation, we conducted an empirical analysis using 4 windows: 3 column windows of lengths 0.1875 mm, 0.45 mm and 0.75 mm and widths of 0.002 mm, along with the standard square window with sides equal to three times the pulse length of incident ultrasound. From the parametric image sets generated using each window, we extracted a total of 72 features that consisted of morphometric, elemental and hybrid features. To our knowledge, no other study has conducted such a comprehensive analysis of Nakagami parametric images for the classification of breast lesions. Feature selection was performed to find the most useful subset of features from each of the parametric image sets for the classification of breast cancer. Analyzing the classification accuracy and Area under the Receiver Operating Characteristic (ROC) Curve (AUC) of the selected feature subsets, we determined that the selected features acquired from Nakagami parametric images generated using a column window of length 0.75 mm provide the best results for characterization of breast lesions. This optimal feature set provided a classification accuracy of 93.08%, an AUC of 0.9712, a False Negative Rate (FNR) of 0%, and a very low False Positive Rate (FPR) of 8.65%. Our results indicate that the high accuracy of such a procedure may assist in the diagnosis of breast cancer by helping to reduce false positive diagnoses.
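To illustrate the sliding window idea, the sketch below estimates the two basic Nakagami parameters inside each window of a one-dimensional envelope signal. It assumes the common moment-based estimators, scale Ω = E[R²] and shape m = (E[R²])² / Var(R²); the abstract does not state which estimator or window layout the authors used, and the function names, the 1-D simplification, and the sample values are all hypothetical.

```python
from statistics import fmean, pvariance


def nakagami_moments(envelope):
    """Moment-based estimates (m, omega) for one window of envelope samples."""
    squared = [r * r for r in envelope]
    omega = fmean(squared)                    # scale: mean of R^2
    variance = pvariance(squared, mu=omega)   # population variance of R^2
    if variance == 0:
        return float("inf"), omega            # constant envelope: m is unbounded
    return omega * omega / variance, omega


def sliding_nakagami(signal, window, step=1):
    """Slide a window along the signal and estimate (m, omega) at each position."""
    if window < 2 or window > len(signal):
        raise ValueError("window must be between 2 and len(signal)")
    return [
        nakagami_moments(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, step)
    ]


# Hypothetical envelope samples, for illustration only.
envelope = [1.0, 2.0, 1.0, 2.0, 3.0, 1.0]
parametric = sliding_nakagami(envelope, window=4)
print(parametric)
```

On a real B-mode image the same estimate would be made over two-dimensional windows, and each (m, Ω) pair would become one pixel of the parametric image.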
Affiliation(s)
- Ahmad Chowdhury
- Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh
- Rezwana R Razzaque
- Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh
- Sabiq Muhtadi
- Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh
- Ahmad Shafiullah
- Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh
- Ehsan Ul Islam Abir
- Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh
- Brian S Garra
- Division of Imaging, Diagnostics and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, United States
- S Kaisar Alam
- Imagine Consulting LLC, Dayton, NJ, United States; Prep Excellence LLC, Dayton, NJ, United States; The Center for Computational Biomedicine Imaging and Modelling (CBIM), Rutgers University, Piscataway, NJ, United States
10
Li L, Ching WK, Liu ZP. Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods. Comput Biol Chem 2022; 100:107747. [DOI: 10.1016/j.compbiolchem.2022.107747] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Revised: 06/17/2022] [Accepted: 07/25/2022] [Indexed: 11/03/2022]
11
Jin Z, Li N. Diagnosis of each main coronary artery stenosis based on whale optimization algorithm and stacking model. Math Biosci Eng 2022; 19:4568-4591. [PMID: 35430828 DOI: 10.3934/mbe.2022211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Cardiovascular disease is currently one of the diseases with high morbidity and mortality worldwide. One of the main types is coronary artery disease (CAD), which occurs when one or more of the three main arteries, the left anterior descending (LAD) artery, the left circumflex (LCX) artery, and the right coronary artery (RCA), are narrowed. In this paper, we introduce a computer-aided diagnosis model, which uses the k-nearest neighbor (KNN)-based whale optimization algorithm (WOA) for feature selection and combines it with a stacking model for CAD diagnosis and prediction. In WOA, the values in the solution vectors are all continuous, and a threshold is set for binary conversion to obtain the optimal feature subsets of each main coronary artery. Then we develop a two-layer stacking model based on the selected feature subsets to diagnose LAD, LCX and RCA. With the proposed method, we select 17 features for each main artery diagnosis, and the classification accuracy on LAD, LCX, and RCA test sets is 89.68, 88.71 and 85.81%, respectively. On the Z-Alizadeh Sani dataset, we compare the proposed feature selection method with other metaheuristics and compare the performance of WOA based on different wrappers. The experimental results show that the KNN-based WOA method selects the optimal feature subsets, and the classification performance of the stacking model is better than other machine learning algorithms.
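The binary-conversion step described in this abstract can be sketched as follows: each continuous value in a solution vector is compared with a threshold, and features whose value exceeds it are kept. This is only an illustration of the thresholding idea, not the authors' implementation; the threshold of 0.5, the function names, and the feature names are assumptions made for the example.

```python
def to_feature_mask(position, threshold=0.5):
    """Convert a continuous solution vector into a boolean feature mask."""
    return [value > threshold for value in position]


def select_features(names, position, threshold=0.5):
    """Return the feature names whose position value exceeds the threshold."""
    if len(names) != len(position):
        raise ValueError("names and position must have the same length")
    mask = to_feature_mask(position, threshold)
    return [name for name, keep in zip(names, mask) if keep]


# Hypothetical feature names and solution vector, for illustration only.
names = ["age", "bmi", "hypertension", "ldl"]
position = [0.91, 0.12, 0.55, 0.50]
selected = select_features(names, position)
print(selected)
```

In a wrapper setting, each candidate mask would then be scored by training a classifier such as KNN on the selected columns, and that score would serve as the fitness guiding the search.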
Affiliation(s)
- Ziyu Jin
- College of Sciences, Northeastern University, Shenyang 110819, China
- Ning Li
- College of Sciences, Northeastern University, Shenyang 110819, China
12
Cai J, Qiu J, Wang H, Sun J, Ji Y. Identification of potential biomarkers in ovarian carcinoma and an evaluation of their prognostic value. Ann Transl Med 2021; 9:1472. [PMID: 34734024 PMCID: PMC8506714 DOI: 10.21037/atm-21-4606] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/26/2021] [Accepted: 09/16/2021] [Indexed: 11/06/2022]
Abstract
Background: Ovarian cancer is one of the most common malignant tumors in female genital organs, and its incidence rate is high. However, the pathogenesis and prognostic markers of ovarian cancer are unclear. This study sought to screen potential markers of ovarian cancer and to explore their prognostic value. Methods: The Cancer Genome Atlas, Gene Expression Omnibus, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases were used in this study. The least absolute shrinkage and selection operator (LASSO), multivariate Cox regression and stepwise regression analysis were chosen to screen genes and construct a risk model. Gene Set Enrichment Analysis (GSEA) and an immune-infiltration analysis were performed. Results: One hundred thirty-two co-expressed genes were found. They were involved in metabolism, protein phosphorylation, mitochondria, and immune signaling pathways. Twelve genes significantly related to the survival of ovarian cancer were identified. Eight risk genes (i.e., CACNB1, FAM120B, HOXB2, MED19, PTPN2, SMU1, WAC.AS1, and BCL2L11) were further screened and used to construct the risk model. The risk status might be an independent prognostic factor of ovarian cancer, and most of the biological functions of genes expressed in high-risk ovarian cancer were related to synapse, adhesion, and immune-related functions. The clusters of CD4+ T cells and M2 macrophages were high in high-risk status samples. Conclusions: In ovarian cancer, the abnormal expression of 8 genes, including CACNB1, FAM120B, HOXB2, MED19, PTPN2, SMU1, WAC.AS1, and BCL2L11, is closely related to ovarian cancer progression, and these genes can serve as independent prognosis markers of ovarian cancer.
Affiliation(s)
- Junyan Cai
- Department of Rehabilitation, Affiliated Hospital of Nantong University, Nantong, China
- Jiayi Qiu
- Medical College, Nantong University, Nantong, China
- Hongliang Wang
- Department of Neurology, Nantong Sixth People's Hospital, Nantong, China
- Jiacheng Sun
- Xinglin College, Nantong University, Nantong, China
- Yanan Ji
- Key Laboratory of Neuroregeneration of Jiangsu and Ministry of Education, Nantong University, Nantong, China