1
Lu W, Ding C, Zhu M. Discrimination of coal geographical origins through HS-GC-IMS assisted with machine learning algorithms in larceny case. J Chromatogr A 2024; 1735:465330. [PMID: 39232421 DOI: 10.1016/j.chroma.2024.465330] [Received: 06/23/2024] [Revised: 08/14/2024] [Accepted: 08/30/2024] [Indexed: 09/06/2024]
Abstract
The process of globalization and industrialization has resulted in a rise in the theft of coal and other related products, thereby becoming a focal point for forensic science. This situation has engendered an escalated demand for effective detection and monitoring technologies. The precise identification of coal trace evidence presents a challenge with current methods, owing to its minute quantity, fine texture, and intricate composition. In this study, we integrated machine learning with the identification of volatiles to accurately differentiate coal geographical origins through the application of headspace-gas chromatography-ion mobility spectrometry (HS-GC-IMS). The topographic distribution of volatiles in coals was visually depicted to elucidate the subtle distinctions through spectra and fingerprint analysis. Additionally, four supervised machine learning algorithms were developed to quantitatively predict the geographical origins of natural coals utilizing the HS-GC-IMS dataset, and these were subsequently compared with unsupervised models. Remarkable volatile compounds were identified through the quantitative analysis and optimal Random Forest model, which offered a rapid readout and achieved an average accuracy of 100 % in coal identification. Our findings indicate that the integration of HS-GC-IMS and machine learning is anticipated to enhance the efficiency and accuracy of coal geographical traceability, thereby providing a foundation for litigation and trials.
Affiliation(s)
- Wenhui Lu
  - Shanghai Key Laboratory of Forensic Medicine and Key Laboratory of Forensic Science, Ministry of Justice, Shanghai 200063, PR China; Characteristic Laboratory of Forensic Science in Universities of Shandong Province, Shandong University of Political Science and Law, Jinan, Shandong 250014, PR China.
- Chunli Ding
  - Characteristic Laboratory of Forensic Science in Universities of Shandong Province, Shandong University of Political Science and Law, Jinan, Shandong 250014, PR China
- Mingshuo Zhu
  - Yankuang Technology Co., Ltd., Shandong Energy Group Co., Ltd., Jinan, Shandong 250101, PR China
2
Yue C, Xue H. Construction and validation of a nomogram model for lymph node metastasis of stage II-III gastric cancer based on machine learning algorithms. Front Oncol 2024; 14:1399970. [PMID: 39439953 PMCID: PMC11493538 DOI: 10.3389/fonc.2024.1399970] [Received: 03/12/2024] [Accepted: 09/17/2024] [Indexed: 10/25/2024] Open
Abstract
Background: Gastric cancer, a pervasive malignancy globally, often presents with regional lymph node metastasis (LNM), profoundly impacting prognosis and treatment options. Existing clinical methods for determining the presence of LNM are not precise enough, necessitating the development of an accurate risk prediction model. Objective: Our primary objective was to employ machine learning algorithms to identify risk factors for LNM and establish a precise prediction model for stage II-III gastric cancer. Methods: A study was conducted at Renji Hospital Affiliated to Shanghai Jiao Tong University School of Medicine between May 2010 and December 2022. This retrospective study analyzed 1147 surgeries for gastric cancer and explored the clinicopathological differences between LNM and non-LNM cohorts. Utilizing univariate logistic regression and two machine learning methodologies, least absolute shrinkage and selection operator (LASSO) and random forest (RF), we identified vascular invasion, maximum tumor diameter, percentage of monocytes, hematocrit (HCT), and lymphocyte-monocyte ratio (LMR) as salient factors and consolidated them into a nomogram model. The area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curves were used to evaluate the test efficacy of the nomogram. Shapley Additive Explanation (SHAP) values were utilized to illustrate the predictive impact of each feature on the model's output. Results: Significant differences in tumor characteristics were discerned between LNM and non-LNM cohorts through appropriate statistical methods. A nomogram, incorporating vascular invasion, maximum tumor diameter, percentage of monocytes, HCT, and LMR, was developed and exhibited satisfactory predictive capabilities with an AUC of 0.787 (95% CI: 0.749-0.824) in the training set and 0.753 (95% CI: 0.694-0.812) in the validation set. Calibration curves and decision curves affirmed the nomogram's predictive accuracy. Conclusion: Leveraging machine learning algorithms, we devised a nomogram for precise LNM risk prognostication in stage II-III gastric cancer, offering a valuable tool for tailored risk assessment in clinical decision-making.
Affiliation(s)
- Huiping Xue
  - Department of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
3
Dai Y, Zou F, Zou B. High-dimensional Biomarker Identification for Scalable and Interpretable Disease Prediction via Machine Learning Models. bioRxiv 2024:2024.10.04.616748. [PMID: 39416165 PMCID: PMC11482776 DOI: 10.1101/2024.10.04.616748] [Indexed: 10/19/2024]
Abstract
Omics data generated from high-throughput technologies and clinical features jointly impact many complex human diseases. Identifying key biomarkers and clinical risk factors is essential for understanding disease mechanisms and advancing early disease diagnosis and precision medicine. However, the high-dimensionality and intricate associations between disease outcomes and omics profiles present significant analytical challenges. To address these, we propose an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream advanced machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test, forming the comprehensive High-dimensional Feature Importance Test (HiFIT) framework. Through extensive numerical simulations and real-world applications, we demonstrate HiFIT's superior performance in both outcome prediction and feature importance identification. An R package implementing HiFIT is available on GitHub (https://github.com/BZou-lab/HiFIT).
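The abstract above mentions a permutation-based feature importance test. As a rough illustration of the general idea only, and not of the HiFIT R package itself, the following Python sketch scores each feature by how much a model's mean squared error grows when that feature's column is shuffled. The function name `permutation_importance`, the toy data, and the stand-in `model` are all assumptions made for this example; the sketch computes importance scores and does not perform the hypothesis test the authors describe.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in squared error when one feature column is shuffled.

    Generic illustration of permutation importance; hypothetical helper,
    not taken from the HiFIT package.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (assumed for illustration): the outcome depends on feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def model(row):
    # Stand-in for a fitted model; it ignores feature 1 entirely.
    return 3.0 * row[0]


scores = permutation_importance(model, X, y)
```

In this toy setting, shuffling feature 0 degrades the predictions while shuffling feature 1 leaves them unchanged, so only the first score is positive.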
Affiliation(s)
- Yifan Dai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Baiming Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
4
Sharma A, Lysenko A, Jia S, Boroevich KA, Tsunoda T. Advances in AI and machine learning for predictive medicine. J Hum Genet 2024; 69:487-497. [PMID: 38424184 PMCID: PMC11422165 DOI: 10.1038/s10038-024-01231-y] [Received: 10/31/2023] [Revised: 02/04/2024] [Accepted: 02/12/2024] [Indexed: 03/02/2024]
Abstract
The field of omics, driven by advances in high-throughput sequencing, faces a data explosion. This abundance of data offers unprecedented opportunities for predictive modeling in precision medicine, but also presents formidable challenges in data analysis and interpretation. Traditional machine learning (ML) techniques have been partly successful in generating predictive models for omics analysis but exhibit limitations in handling potential relationships within the data for more accurate prediction. This review explores a revolutionary shift in predictive modeling through the application of deep learning (DL), specifically convolutional neural networks (CNNs). Using transformation methods such as DeepInsight, omics data with independent variables in tabular (table-like, including vector) form can be turned into image-like representations, enabling CNNs to capture latent features effectively. This approach not only enhances predictive power but also leverages transfer learning, reducing computational time, and improving performance. However, integrating CNNs in predictive omics data analysis is not without challenges, including issues related to model interpretability, data heterogeneity, and data size. Addressing these challenges requires a multidisciplinary approach, involving collaborations between ML experts, bioinformatics researchers, biologists, and medical doctors. This review illuminates these complexities and charts a course for future research to unlock the full predictive potential of CNNs in omics data analysis and related fields.
Affiliation(s)
- Alok Sharma
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
  - Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia.
- Artem Lysenko
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Shangru Jia
  - Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Keith A Boroevich
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Tatsuhiko Tsunoda
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
  - Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
5
Tivey A, Lee RJ, Clipson A, Hill SM, Lorigan P, Rothwell DG, Dive C, Mouliere F. Mining nucleic acid "omics" to boost liquid biopsy in cancer. Cell Rep Med 2024; 5:101736. [PMID: 39293399 PMCID: PMC11525024 DOI: 10.1016/j.xcrm.2024.101736] [Received: 04/29/2024] [Revised: 07/22/2024] [Accepted: 08/21/2024] [Indexed: 09/20/2024]
Abstract
Treatments for cancer patients are becoming increasingly complex, and there is a growing desire from clinicians and patients for biomarkers that can account for this complexity to support informed decisions about clinical care. To achieve precision medicine, the new generation of biomarkers must reflect the spatial and temporal heterogeneity of cancer biology both between patients and within an individual patient. Mining the different layers of 'omics in a multi-modal way from a minimally invasive, easily repeatable, liquid biopsy has increasing potential in a range of clinical applications, and for improving our understanding of treatment response and resistance. Here, we detail the recent developments and methods allowing exploration of genomic, epigenomic, transcriptomic, and fragmentomic layers of 'omics from liquid biopsy, and their integration in a range of applications. We also consider the specific challenges that are posed by the clinical implementation of multi-omic liquid biopsies.
Affiliation(s)
- Ann Tivey
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK; Division of Cancer Sciences, University of Manchester, Manchester, UK
- Rebecca J Lee
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK; Division of Cancer Sciences, University of Manchester, Manchester, UK
- Alexandra Clipson
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Steven M Hill
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Paul Lorigan
  - Division of Cancer Sciences, University of Manchester, Manchester, UK
- Dominic G Rothwell
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Caroline Dive
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Florent Mouliere
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK.
6
Wang F, Jia K, Li Y. Integrative deep learning with prior assisted feature selection. Stat Med 2024; 43:3792-3814. [PMID: 38923006 DOI: 10.1002/sim.10148] [Received: 11/23/2023] [Revised: 04/23/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Integrative analysis has emerged as a prominent tool in biomedical research, offering a solution to the "small $n$ and large $p$" challenge. Leveraging the powerful capabilities of deep learning in extracting complex relationships between genes and diseases, our objective in this study is to incorporate deep learning into the framework of integrative analysis. Recognizing the redundancy within candidate features, we introduce a dedicated feature selection layer in the proposed integrative deep learning method. To further improve the performance of feature selection, an ensemble learning method draws on the rich body of previous research to identify "prior information". This leads to the proposed prior assisted integrative deep learning (PANDA) method. We demonstrate the superiority of the PANDA method through a series of simulation studies, showing its clear advantages over competing approaches in both feature selection and outcome prediction. Finally, a skin cutaneous melanoma (SKCM) dataset is extensively analyzed by the PANDA method to show its practical application.
Affiliation(s)
- Feifei Wang
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
- Ke Jia
  - School of Statistics, Renmin University of China, Beijing, China
- Yang Li
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
7
Loh HW, Ooi CP, Oh SL, Barua PD, Tan YR, Acharya UR, Fung DSS. ADHD/CD-NET: automated EEG-based characterization of ADHD and CD using explainable deep neural network technique. Cogn Neurodyn 2024; 18:1609-1625. [PMID: 39104684 PMCID: PMC11297883 DOI: 10.1007/s11571-023-10028-2] [Received: 08/08/2023] [Revised: 10/04/2023] [Accepted: 10/23/2023] [Indexed: 08/07/2024] Open
Abstract
This study examines attention deficit hyperactivity disorder (ADHD), a childhood neurodevelopmental disorder, alongside its comorbidity, conduct disorder (CD), a behavioral disorder. Because ADHD and CD share commonalities, distinguishing them is difficult, which increases the risk of misdiagnosis. It is crucial that these two conditions are not mistakenly identified as the same because the treatment plan varies depending on whether the patient has CD or ADHD. Hence, this study proposes an electroencephalogram (EEG)-based deep learning system known as ADHD/CD-NET that is capable of objectively distinguishing ADHD, ADHD + CD, and CD. The 12-channel EEG signals were first segmented and converted into channel-wise continuous wavelet transform (CWT) correlation matrices. The resulting matrices were then used to train the convolutional neural network (CNN) model, and the model's performance was evaluated using 10-fold cross-validation. Gradient-weighted class activation mapping (Grad-CAM) was also used to provide explanations for the prediction results made by the 'black box' CNN model. An internal private dataset (45 ADHD, 62 ADHD + CD and 16 CD) and an external public dataset (61 ADHD and 60 healthy controls) were used to evaluate ADHD/CD-NET. ADHD/CD-NET achieved classification accuracy, sensitivity, specificity, and precision of 93.70%, 90.83%, 95.35% and 91.85% for the internal evaluation, and 98.19%, 98.36%, 98.03% and 98.06% for the external evaluation. Grad-CAM also identified significant channels that contributed to the diagnosis outcome. Therefore, ADHD/CD-NET can perform temporal localization and choose significant EEG channels for diagnosis, thus providing objective analysis for mental health professionals and clinicians to consider when making a diagnosis. Supplementary Information: The online version contains supplementary material available at 10.1007/s11571-023-10028-2.
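For readers unfamiliar with the four evaluation metrics reported in the abstract above, the sketch below shows their standard definitions in terms of confusion-matrix counts for a two-class problem. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how the authors aggregated these metrics across the three internal classes.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives
    and false negatives. Hypothetical helper for illustration.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Made-up counts, not data from the paper.
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

For a multi-class task, one common convention is to compute these per class in a one-vs-rest fashion and then average; whether the study did so is not stated in the abstract.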
Affiliation(s)
- Hui Wen Loh
  - School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
- Chui Ping Ooi
  - School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
- Shu Lih Oh
  - Cogninet Australia, Sydney, NSW 2010 Australia
- Prabal Datta Barua
  - Cogninet Australia, Sydney, NSW 2010 Australia
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007 Australia
  - School of Business (Information System), University of Southern Queensland, Darling Heights, Australia
  - Australian International Institute of Higher Education, Sydney, NSW 2000 Australia
  - School of Science & Technology, University of New England, Armidale, Australia
  - School of Biosciences, Taylor’s University, Selangor, Malaysia
  - School of Computing, SRM Institute of Science and Technology, Kattankulathur, India
  - School of Science and Technology, Kumamoto University, Kumamoto, Japan
  - Sydney School of Education and Social work, University of Sydney, Camperdown, Australia
- Yi Ren Tan
  - Developmental Psychiatry, Institute of Mental Health, Singapore, Singapore
- U. Rajendra Acharya
  - School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Darling Heights, Australia
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Australia
  - Centre for Health Research, University of Southern Queensland, Springfield, Australia
- Daniel Shuen Sheng Fung
  - Developmental Psychiatry, Institute of Mental Health, Singapore, Singapore
  - Lee Kong Chian School of Medicine, DUKE NUS Medical School, Yong Loo Lin School of Medicine, Nanyang Technological University, National University of Singapore, Singapore, Singapore
8
Tang YC, Li R, Tang J, Zheng WJ, Jiang X. SAFER: sub-hypergraph attention-based neural network for predicting effective responses to dose combinations. BMC Bioinformatics 2024; 25:250. [PMID: 39080535 PMCID: PMC11290087 DOI: 10.1186/s12859-024-05873-9] [Received: 04/23/2024] [Accepted: 07/17/2024] [Indexed: 08/02/2024] Open
Abstract
BACKGROUND: The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) Existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved. (2) Many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and higher-order relationships. These limitations constrain the applicability of current methods. RESULTS: We introduce SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weight for the lung cancer cell line highlighted JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L that have been implicated in lung fibrosis. CONCLUSIONS: SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks previously inaccessible avenues of investigation compared to earlier models. Furthermore, the SAFER framework can be leveraged by future inquiries to investigate molecular networks that uniquely characterize individual patients and can be applied to prioritize personalized effective treatment based on safe dose combinations.
Affiliation(s)
- Yi-Ching Tang
  - Department of Health Data Science and Artificial Intelligence, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin Street, Houston, TX, USA
- Rongbin Li
  - Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland
- W Jim Zheng
  - Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoqian Jiang
  - Department of Health Data Science and Artificial Intelligence, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin Street, Houston, TX, USA.
9
Waqas A, Tripathi A, Ramachandran RP, Stewart PA, Rasool G. Multimodal data integration for oncology in the era of deep neural networks: a review. Front Artif Intell 2024; 7:1408843. [PMID: 39118787 PMCID: PMC11308435 DOI: 10.3389/frai.2024.1408843] [Received: 03/28/2024] [Accepted: 07/09/2024] [Indexed: 08/10/2024] Open
Abstract
Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. The integration of these diverse data types for personalized cancer care and predictive modeling holds the promise of enhancing the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning, demonstrating significant success. This review presents the foundational principles of multimodal learning including oncology data modalities, taxonomy of multimodal learning, and fusion strategies. We delve into the recent advancements in GNNs and Transformers for the fusion of multimodal data in oncology, spotlighting key studies and their pivotal findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexities, alongside the opportunities it presents for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, our goal is to underline the transformative potential of multimodal GNNs and Transformers. Through technological advancements and the methodological innovations presented in this review, we aim to chart a course for future research in this promising field. 
This review may be the first that highlights the current state of multimodal modeling applications in cancer using GNNs and transformers, presents comprehensive multimodal oncology data sources, and sets the stage for multimodal evolution, encouraging further exploration and development in personalized cancer care.
Affiliation(s)
- Asim Waqas
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
  - Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, United States
- Aakash Tripathi
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
- Ravi P. Ramachandran
  - Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ, United States
- Paul A. Stewart
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States
- Ghulam Rasool
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
10
Zhou Y, Geng P, Zhang S, Xiao F, Cai G, Chen L, Lu Q. Multimodal functional deep learning for multiomics data. Brief Bioinform 2024; 25:bbae448. [PMID: 39285512 PMCID: PMC11405129 DOI: 10.1093/bib/bbae448] [Received: 06/24/2024] [Revised: 08/03/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
With rapidly evolving high-throughput technologies and consistently decreasing costs, collecting multimodal omics data in large-scale studies has become feasible. Although studying multiomics provides a new comprehensive approach to understanding the complex biological mechanisms of human diseases, the high dimensionality of omics data and the complexity of the interactions among various omics levels in contributing to disease phenotypes present tremendous analytical challenges. There is a great need for novel analytical methods to address these challenges and to facilitate multiomics analyses. In this paper, we propose a multimodal functional deep learning (MFDL) method for the analysis of high-dimensional multiomics data. The MFDL method models the complex relationships between multiomics variants and disease phenotypes through the hierarchical structure of deep neural networks and handles high-dimensional omics data using the functional data analysis technique. Furthermore, MFDL leverages the structure of the multimodal model to capture interactions between different types of omics data. Through simulation studies and real-data applications, we demonstrate the advantages of MFDL in terms of prediction accuracy and its robustness to the high dimensionality and noise within the data.
Affiliation(s)
- Yuan Zhou
  - Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL 32611, USA
- Pei Geng
  - Department of Mathematics and Statistics, University of New Hampshire, 33 Academic Way, Durham, NH 03824, USA
- Shan Zhang
  - Department of Statistics and Probability, Michigan State University, 619 Red Cedar Road, East Lansing, MI 48824, USA
- Feifei Xiao
  - Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL 32611, USA
- Guoshuai Cai
  - Department of Surgery, University of Florida, Gainesville, 1600 SW Archer Rd, FL 32611, USA
- Li Chen
  - Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL 32611, USA
- Qing Lu
  - Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL 32611, USA
11
John Martin JJ, Song Y, Hou M, Zhou L, Liu X, Li X, Fu D, Li Q, Cao H, Li R. Multi-Omics Approaches in Oil Palm Research: A Comprehensive Review of Metabolomics, Proteomics, and Transcriptomics Based on Low-Temperature Stress. Int J Mol Sci 2024; 25:7695. [PMID: 39062936 PMCID: PMC11277459 DOI: 10.3390/ijms25147695] [Received: 06/01/2024] [Revised: 07/05/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024] Open
Abstract
Oil palm (Elaeis guineensis Jacq.) is a typical tropical oil crop grown at temperatures of 26-28 °C, providing approximately 35% of the world's total vegetable oil. Its growth and productivity are significantly affected by low-temperature stress, which inhibits growth and causes substantial yield losses. To comprehend the intricate molecular mechanisms underlying the response and acclimation of oil palm under low-temperature stress, multi-omics approaches, including metabolomics, proteomics, and transcriptomics, have emerged as powerful tools. This comprehensive review aims to provide an in-depth analysis of recent advancements in multi-omics studies on oil palm under low-temperature stress. It covers the key findings from omics-based research, highlighting changes in metabolite profiles, protein expression, and gene transcription, as well as the potential of integrating multi-omics data to reveal novel insights into the molecular networks and regulatory pathways involved in the response to low-temperature stress. This review also emphasizes the challenges and prospects of multi-omics approaches in oil palm research, providing a roadmap for future investigations. Overall, a better understanding of the molecular basis of the response of oil palm to low-temperature stress will facilitate the development of effective breeding and biotechnological strategies to improve the crop's resilience and productivity in changing climate scenarios.
Affiliation(s)
- Jerome Jeyakumar John Martin
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Yuqiao Song
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
  - School of Life Sciences, Henan University, Kaifeng 475001, China
- Mingming Hou
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
  - School of Life Sciences, Henan University, Kaifeng 475001, China
- Lixia Zhou
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Xiaoyu Liu
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Xinyu Li
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Dengqiang Fu
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Qihong Li
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Hongxing Cao
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
- Rui Li
  - National Key Laboratory for Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China; (J.J.J.M.); (Y.S.); (M.H.); (L.Z.); (X.L.); (X.L.); (D.F.); (Q.L.)
  - Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang 571339, China
|
12
|
Sharma SD, Bluett J. Towards Personalized Medicine in Rheumatoid Arthritis. Open Access Rheumatol 2024; 16:89-114. [PMID: 38779469 PMCID: PMC11110814 DOI: 10.2147/oarrr.s372610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 05/03/2024] [Indexed: 05/25/2024] Open
Abstract
Rheumatoid arthritis (RA) is a chronic, incurable, multisystem, inflammatory disease characterized by synovitis and extra-articular features. Although several advanced therapies targeting inflammatory mechanisms underlying the disease are available, no advanced therapy is universally effective. Therefore, a ceiling of treatment response is currently accepted where no advanced therapy is superior to another. The current challenge for medical research is the discovery and integration of predictive markers of drug response that can be used to personalize medicine so that the patient is started on "the right drug at the right time". This review article summarizes our current understanding of predicting response to anti-rheumatic drugs in RA, obstacles impeding the development of personalized medicine approaches and future research priorities to overcome these barriers.
Affiliation(s)
- Seema D Sharma
  - Centre for Musculoskeletal Research, Division of Musculoskeletal & Dermatological Sciences, School of Biological Sciences, University of Manchester, Manchester, UK
- James Bluett
  - Centre for Musculoskeletal Research, Division of Musculoskeletal & Dermatological Sciences, School of Biological Sciences, University of Manchester, Manchester, UK
|
13
|
Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng 2024; 52:1159-1183. [PMID: 38383870 DOI: 10.1007/s10439-024-03459-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 01/24/2024] [Indexed: 02/23/2024]
Abstract
As the amount and complexity of biomedical data continue to increase, machine learning methods are becoming a popular tool in creating prediction models for the underlying biomedical processes. Although all machine learning methods aim to fit models to data, the methodologies used can vary greatly and may seem daunting at first. A comprehensive review of various machine learning algorithms for biomedical applications is presented. The key concepts of machine learning are supervised and unsupervised learning, feature selection, and evaluation metrics. Technical insights on the major machine learning methods such as decision trees, random forests, support vector machines, and k-nearest neighbors are analyzed. Next, dimensionality reduction methods such as principal component analysis and t-distributed stochastic neighbor embedding, and their applications in biomedical data analysis, are reviewed. Moreover, feedforward neural networks, convolutional neural networks, and recurrent neural networks are the networks predominantly utilized in biomedical applications. In addition, the identification of emerging directions in machine learning methodology will serve as a useful reference for individuals involved in biomedical research, clinical practice, and related professions who are interested in understanding and applying machine learning algorithms in their research or practice.
Affiliation(s)
- V A Binson
  - Department of Electronics Engineering, Saintgits College of Engineering, Kottayam, India
- Sania Thomas
  - Department of Computer Science and Engineering, Saintgits College of Engineering, Kottayam, India
- M Subramoniam
  - Department of Electronics Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
- J Arun
  - Centre for Waste Management-International Research Centre, Sathyabama Institute of Science and Technology, Chennai, 600119, India
- S Naveen
  - Department of Automobile Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- S Madhu
  - Department of Automobile Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India.
|
14
|
Ewald JD, Zhou G, Lu Y, Kolic J, Ellis C, Johnson JD, Macdonald PE, Xia J. Web-based multi-omics integration using the Analyst software suite. Nat Protoc 2024; 19:1467-1497. [PMID: 38355833 DOI: 10.1038/s41596-023-00950-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 11/21/2023] [Indexed: 02/16/2024]
Abstract
The growing number of multi-omics studies demands clear conceptual workflows coupled with easy-to-use software tools to facilitate data analysis and interpretation. This protocol covers three key components involved in multi-omics analysis, including single-omics data analysis, knowledge-driven integration using biological networks and data-driven integration through joint dimensionality reduction. Using the dataset from a recent multi-omics study of human pancreatic islet tissue and plasma samples, the first section introduces how to perform transcriptomics/proteomics data analysis using ExpressAnalyst and lipidomics data analysis using MetaboAnalyst. On the basis of significant features detected in these workflows, the second section demonstrates how to perform knowledge-driven integration using OmicsNet. The last section illustrates how to perform data-driven integration from the normalized omics data and metadata using OmicsAnalyst. The complete protocol can be executed in ~2 h. Compared with other available options for multi-omics integration, the Analyst software suite described in this protocol enables researchers to perform a wide range of omics data analysis tasks via a user-friendly web interface.
Affiliation(s)
- Jessica D Ewald
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Guangyan Zhou
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Yao Lu
  - Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
- Jelena Kolic
  - Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Cara Ellis
  - Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- James D Johnson
  - Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Patrick E Macdonald
  - Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- Jianguo Xia
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada.
  - Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada.
|
15
|
Tang YC, Li R, Tang J, Zheng WJ, Jiang X. SAFER: sub-hypergraph attention-based neural network for predicting effective responses to dose combinations. RESEARCH SQUARE 2024:rs.3.rs-4308618. [PMID: 38746131 PMCID: PMC11092851 DOI: 10.21203/rs.3.rs-4308618/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Background The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) Existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved. (2) Many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and context-dependent networks. This limitation constrains the applicability of current methods. Results We introduced SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weight for the lung cancer cell line highlighted JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L that have been implicated in lung fibrosis. Conclusions SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks previously inaccessible avenues of investigation compared to earlier models. Finally, the SAFER framework can be leveraged by future inquiries to investigate molecular networks that uniquely characterize individual patients.
Affiliation(s)
- Yi-Ching Tang
  - Center for Safe Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, United States
- Rongbin Li
  - Center for Safe Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, United States
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- W Jim Zheng
  - Center for Safe Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, United States
- Xiaoqian Jiang
  - Center for Safe Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, United States
|
16
|
Zhang W, Mou M, Hu W, Lu M, Zhang H, Zhang H, Luo Y, Xu H, Tao L, Dai H, Gao J, Zhu F. MOINER: A Novel Multiomics Early Integration Framework for Biomedical Classification and Biomarker Discovery. J Chem Inf Model 2024; 64:2720-2732. [PMID: 38373720 DOI: 10.1021/acs.jcim.4c00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. One commonly used integration method is early integration through concatenation of multiple dimensionally reduced omics matrices, owing to its simplicity and ease of implementation. However, this approach is seriously limited by information loss and a lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations of omics features, which makes it significantly outperform the existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and the model for MOINER are freely available at https://github.com/idrblab/MOINER.
Affiliation(s)
- Wei Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wei Hu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongquan Xu
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Haibin Dai
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
17
|
Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024; 7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024] Open
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
Affiliation(s)
- Konrad Herbst
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Taiyao Wang
  - Division of Systems Engineering, Boston University, Boston, MA, USA
- Elena J Forchielli
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Meghan Thommes
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA.
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, MA, USA.
  - Biological Design Center, Boston University, Boston, MA, USA.
  - Department of Biology, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.
|
18
|
Srinivasan Y, Liu A, Rameau A. Machine learning in the evaluation of voice and swallowing in the head and neck cancer patient. Curr Opin Otolaryngol Head Neck Surg 2024; 32:105-112. [PMID: 38116798 DOI: 10.1097/moo.0000000000000948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
PURPOSE OF REVIEW The purpose of this review is to present recent advances and limitations in machine learning applied to the evaluation of speech, voice, and swallowing in head and neck cancer. RECENT FINDINGS Novel machine learning models incorporating diverse data modalities with improved discriminatory capabilities have been developed for predicting toxicities following head and neck cancer therapy, including dysphagia, dysphonia, xerostomia, and weight loss as well as guiding treatment planning. Machine learning has been applied to the care of posttreatment voice and swallowing dysfunction by offering objective and standardized assessments and aiding innovative technologies for functional restoration. Voice and speech are also being utilized in machine learning algorithms to screen laryngeal cancer. SUMMARY Machine learning has the potential to help optimize, assess, predict, and rehabilitate voice and swallowing function in head and neck cancer patients as well as aid in cancer screening. However, existing studies are limited by the lack of sufficient external validation and generalizability, insufficient transparency and reproducibility, and no clear superior predictive modeling strategies. Algorithms and applications will need to be trained on large multiinstitutional data sets, incorporate sociodemographic data to reduce bias, and achieve validation through clinical trials for optimal performance and utility.
Affiliation(s)
- Yashes Srinivasan
  - Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York
- Amy Liu
  - University of California, San Diego, School of Medicine, San Diego, California, USA
- Anaïs Rameau
  - Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York
|
19
|
Cusworth S, Gkoutos GV, Acharjee A. A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data. BMC Med Inform Decis Mak 2024; 24:90. [PMID: 38549123 PMCID: PMC10979623 DOI: 10.1186/s12911-024-02487-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] Open
Abstract
Class imbalance remains a large problem in high-throughput omics analyses, causing bias towards the over-represented class when training machine learning-based classifiers. Oversampling is a common method used to balance classes, allowing for better generalization of the training data. More naive approaches can introduce other biases into the data, being especially sensitive to inaccuracies in the training data, a problem considering the characteristically noisy data obtained in healthcare. This is especially a problem with high-dimensional data. A generative adversarial network-based method is proposed for creating synthetic samples from small, high-dimensional data, to improve upon other more naive generative approaches. The method was compared with 'synthetic minority over-sampling technique' (SMOTE) and 'random oversampling' (RO). Generative methods were validated by training classifiers on the balanced data.
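For orientation, the 'random oversampling' (RO) baseline named in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `random_oversample`, the toy feature vectors, and the class labels are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: three "control" samples and one "case" sample.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8]]
y = ["control", "control", "control", "case"]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Because RO only copies existing samples, any noise in the minority class is copied as well, which is the kind of sensitivity to inaccurate training data the abstract attributes to more naive approaches.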
Affiliation(s)
- Samuel Cusworth
  - Institute of Applied Health Research, University of Birmingham, Birmingham, UK
  - NIHR Blood and Transplant Research Unit (BTRU) in Precision Transplant and Cellular Therapeutics, University of Birmingham, Birmingham, UK
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, UK
  - MRC Health Data Research UK (HDR), Midlands Site, UK
  - Centre for Health Data Research, University of Birmingham, B15 2TT, Birmingham, UK
  - NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK
- Animesh Acharjee
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, UK.
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, UK.
  - MRC Health Data Research UK (HDR), Midlands Site, UK.
  - Centre for Health Data Research, University of Birmingham, B15 2TT, Birmingham, UK.
|
20
|
Munk K, Ilina D, Ziemba L, Brader G, Molin EM. Holomics - a user-friendly R shiny application for multi-omics data integration and analysis. BMC Bioinformatics 2024; 25:93. [PMID: 38438871 PMCID: PMC10913680 DOI: 10.1186/s12859-024-05719-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 02/26/2024] [Indexed: 03/06/2024] Open
Abstract
An organism's observable traits, or phenotype, result from intricate interactions among genes, proteins, metabolites and the environment. External factors, such as associated microorganisms, along with biotic and abiotic stressors, can significantly impact this complex biological system, influencing processes like growth, development and productivity. A comprehensive analysis of the entire biological system and its interactions is thus crucial to identify key components that support adaptation to stressors and to discover biomarkers applicable in breeding programs or disease diagnostics. Since the genomics era, several other 'omics' disciplines have emerged, and recent advances in high-throughput technologies have facilitated the generation of additional omics datasets. While traditionally analyzed individually, the last decade has seen an increase in multi-omics data integration and analysis strategies aimed at achieving a holistic understanding of interactions across different biological layers. Despite these advances, the analysis of multi-omics data is still challenging due to their scale, complexity, high dimensionality and multimodality. To address these challenges, a number of analytical tools and strategies have been developed, including clustering and differential equations, which require advanced knowledge in bioinformatics and statistics. Therefore, this study recognizes the need for user-friendly tools by introducing Holomics, an accessible and easy-to-use R shiny application with multi-omics functions tailored for scientists with limited bioinformatics knowledge. Holomics provides a well-defined workflow, starting with the upload and pre-filtering of single-omics data, which are then further refined by single-omics analysis focusing on key features. Subsequently, these reduced datasets are subjected to multi-omics analyses to unveil correlations between 2-n datasets. 
This paper concludes with a real-world case study where microbiomics, transcriptomics and metabolomics data from previous studies that elucidate factors associated with improved sugar beet storability are integrated using Holomics. The results are discussed in the context of the biological background, underscoring the importance of multi-omics insights. This example not only highlights the versatility of Holomics in handling different types of omics data, but also validates its consistency by reproducing findings from preceding single-omics studies.
Affiliation(s)
- Katharina Munk
  - Center for Health & Bioresources, AIT Austrian Institute of Technology, Konrad-Lorenz-Straße 24, 3430, Tulln, Austria
- Daria Ilina
  - Center for Health & Bioresources, AIT Austrian Institute of Technology, Konrad-Lorenz-Straße 24, 3430, Tulln, Austria
- Lisa Ziemba
  - Center for Health & Bioresources, AIT Austrian Institute of Technology, Konrad-Lorenz-Straße 24, 3430, Tulln, Austria
- Günter Brader
  - Center for Health & Bioresources, AIT Austrian Institute of Technology, Konrad-Lorenz-Straße 24, 3430, Tulln, Austria
- Eva M Molin
  - Center for Health & Bioresources, AIT Austrian Institute of Technology, Konrad-Lorenz-Straße 24, 3430, Tulln, Austria.
|
21
|
Zheng X, Pan F, Naumovski N, Wei Y, Wu L, Peng W, Wang K. Precise prediction of metabolites patterns using machine learning approaches in distinguishing honey and sugar diets fed to mice. Food Chem 2024; 430:136915. [PMID: 37515908 DOI: 10.1016/j.foodchem.2023.136915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 07/13/2023] [Accepted: 07/15/2023] [Indexed: 07/31/2023]
Abstract
As a natural sweetener produced by honey bees, honey has been recognized as being healthier for consumption than table sugar. Our previous study also indicated that metabolite profiles in mice fed honey and mixed sugar diets are different. However, the batch-to-batch consistency of the metabolic differences between the two diet types still needs to be examined. Here, machine learning (ML) algorithms were applied to complement and calibrate HPLC-QTOF/MS-based untargeted metabolomics data. Data were generated from three batches of mice that received the same treatment, which allows further mining of the metabolite biomarkers. Random Forest and Extra-Trees models could better discriminate between honey and mixed sugar dietary patterns under five-fold cross-validation. Finally, the SHapley Additive exPlanations tool identified phosphatidylethanolamine and phosphatidylcholine as reliable metabolic biomarkers to discriminate the honey diet from the mixed sugar diet. This study provides new ideas for metabolomic analysis of larger data sets.
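As a rough illustration of the five-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into five folds so that each sample is held out exactly once. The function name `k_fold_indices` and the sample count of 20 are hypothetical; the abstract does not state how the folds were actually built (for example, whether they were stratified or grouped by batch).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]
        # Training set is every fold except the held-out one.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical example: 20 samples split into 5 folds of 4.
splits = k_fold_indices(20, k=5)
```

A classifier would be trained on each `train_idx` and scored on the matching `test_idx`, with the five scores averaged. With multi-batch data such as that described here, keeping whole batches together in a fold is one way to probe batch-to-batch consistency, though that is a design suggestion rather than something the abstract reports.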
Collapse
Affiliation(s)
- Xing Zheng
- State Key Laboratory of Resource Insects, Institute of Apiculture Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China
| | - Fei Pan
- State Key Laboratory of Resource Insects, Institute of Apiculture Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China
| | - Nenad Naumovski
- University of Canberra Health Research Institute (UCHRI), University of Canberra, Locked Bag 1, Bruce, Canberra, ACT 2601, Australia
| | - Yue Wei
- College of Science & Technology, Hebei Agricultural University, Huanghua, Hebei 061100, China
| | - Liming Wu
- State Key Laboratory of Resource Insects, Institute of Apiculture Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China
| | - Wenjun Peng
- State Key Laboratory of Resource Insects, Institute of Apiculture Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China.
| | - Kai Wang
- State Key Laboratory of Resource Insects, Institute of Apiculture Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China.
| |
|
22
|
Wani NA, Kumar R, Bedi J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 243:107879. [PMID: 37897989 DOI: 10.1016/j.cmpb.2023.107879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 10/17/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Artificial intelligence (AI) has several uses in the healthcare industry, including healthcare management, medical forecasting, practical decision-making, and diagnosis. AI technologies have reached human-like performance, but their use is limited since they are still largely viewed as opaque black boxes. This distrust remains the primary factor for their limited real-world application, particularly in healthcare. As a result, there is a need for interpretable predictors that provide better predictions and also explain their predictions. METHODS This study introduces "DeepXplainer", a new interpretable hybrid deep learning-based technique for detecting lung cancer and providing explanations of the predictions. This technique is based on a convolutional neural network and XGBoost. XGBoost is used for class label prediction after "DeepXplainer" has automatically learned the features of the input using its many convolutional layers. To explain the predictions, an explainable artificial intelligence method known as "SHAP" is implemented. RESULTS The open-source "Survey Lung Cancer" dataset was processed using this method. On multiple parameters, including accuracy, sensitivity, and F1-score, the proposed method outperformed the existing methods. The proposed method obtained an accuracy of 97.43%, a sensitivity of 98.71%, and an F1-score of 98.08. After the model has made predictions with this high degree of accuracy, each prediction is explained by implementing an explainable artificial intelligence method at both the local and global levels. CONCLUSIONS A deep learning-based classification model for lung cancer is proposed with three primary components: one for feature learning, another for classification, and a third for providing explanations for the predictions made by the proposed hybrid (ConvXGB) model. The proposed "DeepXplainer" has been evaluated using a variety of metrics, and the results demonstrate that it outperforms the current benchmarks. By providing explanations for its predictions, the proposed approach may help doctors detect and treat lung cancer more effectively.
Affiliation(s)
- Niyaz Ahmad Wani
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India.
| | - Ravinder Kumar
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India.
| | - Jatin Bedi
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India.
| |
|
23
|
Seo S, Lee JW. Applications of Big Data and AI-Driven Technologies in CADD (Computer-Aided Drug Design). Methods Mol Biol 2024; 2714:295-305. [PMID: 37676605 DOI: 10.1007/978-1-0716-3441-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
In the field of computer-aided drug design (CADD), there has been dramatic progress in the development of big data and AI-driven methodologies. The expensive and time-consuming process of drug design is related to biomedical complexity. CADD can be used to apply effective and efficient strategies to overcome obstacles in the field of drug design in order to properly design and develop a new medicine. To prepare the raw data for consistent and repeatable applications of big data and AI methodologies, data pre-processing methods are introduced. Big data and AI technologies can be used to develop drugs in areas including predicting absorption, distribution, metabolism, excretion, and toxicity properties as well as finding binding sites in target proteins and conducting structure-based virtual screenings. The accurate and thorough analysis of large amounts of biomedical data as well as the design of prediction models in the area of drug design is made possible by data pre-processing and applications of big data and AI skills. In the biomedical big data era, knowledge on the biological, chemical, or pharmacological structures of biomedical entities relevant to drug design should be analyzed with significant big data and AI approaches.
Affiliation(s)
- Seongmin Seo
- Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
| | - Jai Woo Lee
- Department of Big Data Science, College of Public Policy, Korea University, Sejong, Republic of Korea.
| |
|
24
|
Chen L, Yuan L, Sun T, Liu R, Huang Q, Deng S. The performance of VCS(volume, conductivity, light scatter) parameters in distinguishing latent tuberculosis and active tuberculosis by using machine learning algorithm. BMC Infect Dis 2023; 23:881. [PMID: 38104064 PMCID: PMC10725592 DOI: 10.1186/s12879-023-08531-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 08/11/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND Tuberculosis is a chronic infectious disease caused by Mycobacterium tuberculosis (MTB) and is the ninth leading cause of death worldwide. Distinguishing active TB from latent TB remains difficult, yet it is very important for individualized management and treatment. METHODS A total of 220 subjects, including active TB patients (ATB, n = 97) and latent TB patients (LTB, n = 113), were recruited in this study. Forty-six features covering routine blood indicators and the VCS parameters (volume, conductivity, light scatter) of neutrophils (NE), monocytes (MO), and lymphocytes (LY) were collected, and classification models were constructed with four machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used to estimate the models' predictive performance for identifying active and latent tuberculosis infection. RESULTS After verification, among the four classifiers, LR and RF had the best performance in the training set (AUROC = 1, AUPRC = 1), followed by SVM (AUROC = 0.967, AUPRC = 0.971) and KNN (AUROC = 0.943, AUPRC = 0.959). In the testing set, LR had the best performance (AUROC = 0.977, AUPRC = 0.957), followed by SVM (AUROC = 0.962, AUPRC = 0.949), RF (AUROC = 0.903, AUPRC = 0.922), and KNN (AUROC = 0.883, AUPRC = 0.901). CONCLUSIONS The machine learning algorithm classifier based on leukocyte VCS parameters is of great value in identifying active and latent tuberculosis infection.
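For readers unfamiliar with the AUROC metric reported here, the following is a minimal sketch of how it can be computed from labels and classifier scores using the rank (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive sample receives the higher score. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active TB, 0 = latent TB.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auroc = auroc(labels, scores)
```

An AUROC of 1 means every positive sample outscored every negative one; a gap between training and testing values, as reported for RF above, is the kind of difference this metric makes visible.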
Affiliation(s)
- Lijiao Chen
- Department of Laboratory Medicine, Daping Hospital, Army Medical University, Chongqing, 400042, P.R. China
| | - Lingke Yuan
- Science in Computational Finance, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Tingting Sun
- College of Medical Technology, Chongqing Medical and Pharmaceutical College, Chongqing, China
| | - Ruiqing Liu
- Department of Laboratory Medicine, Daping Hospital, Army Medical University, Chongqing, 400042, P.R. China
| | - Qing Huang
- Department of Laboratory Medicine, Daping Hospital, Army Medical University, Chongqing, 400042, P.R. China.
| | - Shaoli Deng
- Department of Laboratory Medicine, Daping Hospital, Army Medical University, Chongqing, 400042, P.R. China.
| |
|
25
|
Gammaldi N, Pezzini F, Michelucci E, Di Giorgi N, Simonati A, Rocchiccioli S, Santorelli FM, Doccini S. Integrative human and murine multi-omics: Highlighting shared biomarkers in the neuronal ceroid lipofuscinoses. Neurobiol Dis 2023; 189:106349. [PMID: 37952681 DOI: 10.1016/j.nbd.2023.106349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 11/14/2023] Open
Abstract
Neuronal ceroid lipofuscinosis (NCL) is a group of neurodegenerative disorders whose molecular mechanisms remain largely unknown. Omics approaches are among the methods that generate new information on modifying factors and molecular signatures. Moreover, omics data integration can address the need to progressively expand knowledge around the disease and pinpoint specific proteins to promote as candidate biomarkers. In this work, we integrated a total of 62 proteomic and transcriptomic datasets originating from humans and mice, employing a new approach able to define dysregulated processes across species, stages and NCL forms. Moreover, we selected a pool of differentially expressed proteins and genes as species- and form-related biomarkers of disease status/progression and evaluated local and spatial differences in most affected brain regions. Our results offer promising targets for potential new therapeutic strategies and reinforce the hypothesis of a connection between NCLs and other forms of dementia, particularly Alzheimer's disease.
Affiliation(s)
- N Gammaldi
- Department of Neurosciences, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy; Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy
| | - F Pezzini
- Department of Surgery, Dentistry, Paediatrics and Gynaecology, University of Verona, Verona, Italy
| | - E Michelucci
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
| | - N Di Giorgi
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
| | - A Simonati
- Department of Surgery, Dentistry, Paediatrics and Gynaecology, University of Verona, Verona, Italy
| | - S Rocchiccioli
- Clinical Physiology-National Research Council (IFC-CNR), Pisa, Italy
| | - F M Santorelli
- Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy
| | - S Doccini
- Molecular Medicine for Neurodegenerative and Neuromuscular Diseases Unit, IRCCS Stella Maris Foundation - Pisa, Italy.
| |
|
26
|
Spick M, Muazzam A, Pandha H, Michael A, Gethings LA, Hughes CJ, Munjoma N, Plumb RS, Wilson ID, Whetton AD, Townsend PA, Geifman N. Multi-omic diagnostics of prostate cancer in the presence of benign prostatic hyperplasia. Heliyon 2023; 9:e22604. [PMID: 38076065 PMCID: PMC10709398 DOI: 10.1016/j.heliyon.2023.e22604] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/06/2023] [Revised: 11/01/2023] [Accepted: 11/15/2023] [Indexed: 09/11/2024] Open
Abstract
There is an unmet need for improved diagnostic testing and risk prediction for cases of prostate cancer (PCa) to improve care and reduce overtreatment of indolent disease. Here we have analysed the serum proteome and lipidome of 262 study participants by liquid chromatography-mass spectrometry, including participants diagnosed with PCa, benign prostatic hyperplasia (BPH), or otherwise healthy volunteers, with the aim of improving biomarker specificity. Although a two-class machine learning model separated PCa from controls with sensitivity of 0.82 and specificity of 0.95, adding BPH resulted in a statistically significant decline in specificity for prostate cancer to 0.76, with half of BPH cases being misclassified by the model as PCa. A small number of biomarkers differentiating between BPH and prostate cancer were identified, including proteins in MAP Kinase pathways, as well as in lipids containing oleic acid; these may offer a route to greater specificity. These results highlight, however, that whilst there are opportunities for machine learning, these will only be achieved by use of appropriate training sets that include confounding comorbidities, especially when calculating the specificity of a test.
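The effect described here, where adding a confounding class to the non-cancer group lowers specificity, follows directly from the definition specificity = TN / (TN + FP). The sketch below works through it with hypothetical counts chosen only for illustration (100 healthy controls, 100 BPH cases, half of the BPH cases misclassified as PCa); they are not the study's data and do not reproduce its reported values.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives called positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives called negative."""
    return tn / (tn + fp)


# Hypothetical two-class setting: 100 healthy controls, 5 wrongly called PCa.
spec_two_class = specificity(tn=95, fp=5)

# Hypothetical three-group setting: add 100 BPH cases to the negatives,
# half of which the model misclassifies as PCa.
spec_with_bph = specificity(tn=95 + 50, fp=5 + 50)
```

With these illustrative counts, specificity falls from 0.95 to 0.725 even though the model itself is unchanged, which is why the choice of negative class in the training and evaluation sets matters.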
Affiliation(s)
- Matt Spick
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7YH, United Kingdom
| | - Ammara Muazzam
- The Hospital for Sick Children (SickKids), 555 University Ave, Toronto, ON M5G 1X8, Canada
- Division of Cancer Sciences, Manchester Cancer Research Center, Manchester Academic Health Sciences Center, University of Manchester, Manchester, M20 4GJ, United Kingdom
| | - Hardev Pandha
- School of Biosciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom
| | - Agnieszka Michael
- School of Biosciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom
| | - Lee A. Gethings
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7YH, United Kingdom
- Waters Corporation, Wilmslow, Cheshire, SK9 4AX, United Kingdom
- Manchester Institute of Biotechnology, Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PL, United Kingdom
| | | | | | - Robert S. Plumb
- Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, Burlington Danes Building, Du Cane Road, London, W12 0NN, United Kingdom
| | - Ian D. Wilson
- Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, Burlington Danes Building, Du Cane Road, London, W12 0NN, United Kingdom
| | - Anthony D. Whetton
- Veterinary Health Innovation Engine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7YH, United Kingdom
- School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7YH, United Kingdom
- Division of Cancer Sciences, Manchester Cancer Research Center, Manchester Academic Health Sciences Center, University of Manchester, Manchester, M20 4GJ, United Kingdom
| | - Paul A. Townsend
- School of Biosciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom
- Division of Cancer Sciences, Manchester Cancer Research Center, Manchester Academic Health Sciences Center, University of Manchester, Manchester, M20 4GJ, United Kingdom
| | - Nophar Geifman
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, GU2 7YH, United Kingdom
| |
|
27
|
Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
| | - Tong Wang
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
| | - Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
| |
|
28
|
Venkatesan VK, Kuppusamy Murugesan KR, Chandrasekaran KA, Thyluru Ramakrishna M, Khan SB, Almusharraf A, Albuali A. Cancer Diagnosis through Contour Visualization of Gene Expression Leveraging Deep Learning Techniques. Diagnostics (Basel) 2023; 13:3452. [PMID: 37998588 PMCID: PMC10670706 DOI: 10.3390/diagnostics13223452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 10/30/2023] [Accepted: 11/04/2023] [Indexed: 11/25/2023] Open
Abstract
Prompt diagnostics and appropriate cancer therapy necessitate the use of gene expression databases. The integration of analytical methods can enhance detection precision by capturing intricate patterns and subtle connections in the data. This study proposes a diagnostic-integrated approach combining Empirical Bayes Harmonization (EBS), Jensen-Shannon Divergence (JSD), deep learning, and contour mathematics for cancer detection using gene expression data. EBS preprocesses the gene expression data, while JSD measures the distributional differences between cancerous and non-cancerous samples, providing invaluable insights into gene expression patterns. Deep learning (DL) models are employed for automatic deep feature extraction and to discern complex patterns from the data. Contour mathematics is applied to visualize decision boundaries and regions in the high-dimensional feature space. JSD imparts significant information to the deep learning model, directing it to concentrate on pertinent features associated with cancerous samples. Contour visualization elucidates the model's decision-making process, bolstering interpretability. The amalgamation of JSD, deep learning, and contour mathematics in gene expression dataset analysis diagnostics presents a promising pathway for precise cancer detection. This method taps into the prowess of deep learning for feature extraction while employing JSD to pinpoint distributional differences and contour mathematics for visual elucidation. The outcomes underscore its potential as a formidable instrument for cancer detection, furnishing crucial insights for timely diagnostics and tailor-made treatment strategies.
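As a rough illustration of the Jensen-Shannon Divergence named in this abstract, the sketch below computes it for two discrete distributions using the standard definition JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = ½ (P + Q). How the authors bin or normalize gene expression values is not stated here, so the two three-bin histograms are purely hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # The mixture m is positive wherever p or q is, so KL against m is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical expression histograms (low / medium / high) for one gene.
non_cancerous = [0.7, 0.2, 0.1]
cancerous = [0.1, 0.3, 0.6]
gene_jsd = js_divergence(non_cancerous, cancerous)
```

Unlike the KL divergence, this measure is symmetric and stays finite when a bin is empty in one group, which makes it convenient for ranking features by how differently they are distributed in two sample groups.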
Affiliation(s)
- Vinoth Kumar Venkatesan
- School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology, Vellore 632014, India;
| | - Karthick Raghunath Kuppusamy Murugesan
- Department of Computer Science and Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bangalore 562112, India; (K.R.K.M.); (M.T.R.)
| | | | - Mahesh Thyluru Ramakrishna
- Department of Computer Science and Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bangalore 562112, India; (K.R.K.M.); (M.T.R.)
| | - Surbhi Bhatia Khan
- Department of Data Science, School of Science Engineering and Environment, University of Salford, Manchester M5 4WT, UK
- Department of Engineering and Environment, University of Religions and Denominations, Qom 37491-13357, Iran
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos P.O. Box 13-5053, Lebanon
| | - Ahlam Almusharraf
- Department of Business Administration, College of Business and Administration, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia;
| | - Abdullah Albuali
- Department of Computer Science, School of Computer Science and Information Technology, King Faisal University, Hofuf 11671, Saudi Arabia;
| |
|
29
|
Loh HW, Ooi CP, Oh SL, Barua PD, Tan YR, Molinari F, March S, Acharya UR, Fung DSS. Deep neural network technique for automated detection of ADHD and CD using ECG signal. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 241:107775. [PMID: 37651817 DOI: 10.1016/j.cmpb.2023.107775] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/24/2023] [Revised: 08/09/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND AND OBJECTIVE Attention Deficit Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder in children and adolescents that can lead to long-term challenges in life outcomes if left untreated. ADHD is also frequently associated with Conduct Disorder (CD), and multiple studies have found similarities in clinical signs and behavioral symptoms between the two conditions, making differentiation between ADHD, ADHD comorbid with CD (ADHD+CD), and CD a subjective diagnosis. Therefore, the goal of this pilot study is to create the first explainable deep learning (DL) model for objective ECG-based ADHD/CD diagnosis, as having an objective biomarker may improve diagnostic accuracy. METHODS The dataset used in this study consists of ECG data collected from 45 ADHD, 62 ADHD+CD, and 16 CD patients at the Child Guidance Clinic in Singapore. The ECG data were segmented into 2 s epochs and directly used to train our 1-dimensional (1D) convolutional neural network (CNN) model. RESULTS The proposed model yielded 96.04% classification accuracy, 96.26% precision, 95.99% sensitivity, and 96.11% F1-score. The Gradient-weighted class activation mapping (Grad-CAM) function was also used to highlight the important ECG characteristics at specific time points that most impact the classification score. CONCLUSION In addition to the model performance achieved with our suggested DL method, the Grad-CAM implementation offers vital temporal information that clinicians and other mental healthcare professionals can use to make sound medical judgments. We hope that this pilot study will encourage larger-scale research with a larger biosignal dataset, allowing biosignal-based computer-aided diagnostic (CAD) tools to be implemented in healthcare and ambulatory settings, as ECG can be easily obtained via wearable devices such as smartwatches.
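The segmentation step in METHODS (cutting a recording into 2 s epochs) can be sketched as follows. The abstract does not give the sampling rate or say whether epochs overlap, so the 100 Hz rate, the non-overlapping windows, the dropped trailing remainder, and the function name `segment_epochs` are all assumptions made for illustration.

```python
def segment_epochs(signal, sampling_rate_hz, epoch_seconds=2.0):
    """Split a 1D signal into consecutive, non-overlapping fixed-length epochs.

    Samples left over at the end that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = int(round(sampling_rate_hz * epoch_seconds))
    if samples_per_epoch <= 0:
        raise ValueError("epoch length must cover at least one sample")
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Hypothetical recording: 10.5 s of samples at an assumed 100 Hz.
recording = [0.0] * 1050
epochs = segment_epochs(recording, sampling_rate_hz=100)
```

Each resulting epoch has the same length, which is what allows the segments to be fed directly to a 1D CNN with a fixed input size.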
Affiliation(s)
- Hui Wen Loh
- School of Science and Technology, Singapore University of Social Sciences, Singapore
| | - Chui Ping Ooi
- School of Science and Technology, Singapore University of Social Sciences, Singapore
| | - Shu Lih Oh
- Cogninet Australia, Sydney, NSW 2010, Australia
| | - Prabal Datta Barua
- Cogninet Australia, Sydney, NSW 2010, Australia; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia; School of Business (Information System), University of Southern Queensland, Australia; Australian International Institute of Higher Education, Sydney, NSW 2000, Australia; School of Science & Technology, University of New England, Australia; School of Biosciences, Taylor's University, Malaysia; School of Computing, SRM Institute of Science and Technology, India; School of Science and Technology, Kumamoto University, Japan; Sydney School of Education and Social work, University of Sydney, Australia
| | - Yi Ren Tan
- Developmental Psychiatry, Institute of Mental Health, Singapore
| | - Filippo Molinari
- Biolab, Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Torino, Italy
| | - Sonja March
- Centre for Health Research and School of Psychology and Wellbeing, University of Southern Queensland, Springfield, Australia
| | - U Rajendra Acharya
- School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, Australia.
| | - Daniel Shuen Sheng Fung
- Developmental Psychiatry, Institute of Mental Health, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, DUKE NUS Medical School, Yong Loo Lin School of Medicine, National University of Singapore
| |
|
30
|
Layton AT. "Hi, how can i help you?": embracing artificial intelligence in kidney research. Am J Physiol Renal Physiol 2023; 325:F395-F406. [PMID: 37589052 DOI: 10.1152/ajprenal.00177.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 08/08/2023] [Accepted: 08/09/2023] [Indexed: 08/18/2023] Open
Abstract
In recent years, biology and precision medicine have benefited from major advancements in generating large-scale molecular and biomedical datasets and in analyzing those data using advanced machine learning algorithms. Machine learning applications in kidney physiology and pathophysiology include segmenting kidney structures from imaging data and predicting conditions like acute kidney injury or chronic kidney disease using electronic health records. Despite the potential of machine learning to revolutionize nephrology by providing innovative diagnostic and therapeutic tools, its adoption in kidney research has been slower than in other organ systems. Several factors contribute to this underutilization. The complexity of the kidney as an organ, with intricate physiology and specialized cell populations, makes it challenging to extrapolate bulk omics data to specific processes. In addition, kidney diseases often present with overlapping manifestations and morphological changes, making diagnosis and treatment complex. Moreover, kidney diseases receive less funding compared with other pathologies, leading to lower awareness and limited public-private partnerships. To promote the use of machine learning in kidney research, this review provides an introduction to machine learning and reviews its notable applications in renal research, such as morphological analysis, omics data examination, and disease diagnosis and prognosis. Challenges and limitations associated with data-driven predictive techniques are also discussed. The goal of this review is to raise awareness and encourage the kidney research community to embrace machine learning as a powerful tool that can drive advancements in understanding kidney diseases and improving patient care.
Affiliation(s)
- Anita T Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, Canada
- Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- School of Pharmacology, University of Waterloo, Waterloo, Ontario, Canada
| |
|
31
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Joseph H. Lubin
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | | | - Guanyang Wang
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| |
|
32
|
Wang H, Xia Z, Xu Y, Sun J, Wu J. The predictive value of machine learning and nomograms for lymph node metastasis of prostate cancer: a systematic review and meta-analysis. Prostate Cancer Prostatic Dis 2023; 26:602-613. [PMID: 37488275 DOI: 10.1038/s41391-023-00704-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/28/2023] [Revised: 07/10/2023] [Accepted: 07/17/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND: In clinical practice, a variety of nomograms are currently used to predict lymph node metastasis (LNM) of prostate cancer. Some researchers have also introduced machine learning (ML) into the prediction of LNM of prostate cancer. However, the predictive value of nomograms and ML remains controversial. This systematic review and meta-analysis was therefore performed to explore the predictive value of the currently recommended nomograms and of newly developed ML models for LNM in prostate cancer patients. EVIDENCE ACQUISITION: Cochrane, PubMed, Embase, and Web of Science were searched up to November 1, 2022. The risk of bias in the included studies was evaluated using the Prediction model Risk of Bias Assessment Tool (PROBAST). The concordance index (C-index), sensitivity, and specificity were adopted to evaluate the predictive accuracy of the models. RESULTS: Thirty-one studies (18,803 patients) were included. Seven kinds of currently recommended nomograms, dominated by the Briganti nomogram and the MSKCC nomogram, were covered in the included studies. For newly developed ML models, the C-index for LNM prediction was 0.846 [95% CI (0.818, 0.873)] in the training set and 0.862 [95% CI (0.819, 0.905)] in the validation set. Most ML models in the training set were based on logistic regression (LR), which had a sensitivity of 0.78 [95% CI (0.70, 0.85)] and a specificity of 0.85 [95% CI (0.77, 0.90)] in the training set, and a sensitivity of 0.81 [95% CI (0.67, 0.89)] and a specificity of 0.82 [95% CI (0.75, 0.88)] in the validation set. For the recommended nomograms, the C-index in the validation set was 0.745 [95% CI (0.701, 0.790)] for the Briganti nomogram and 0.714 [95% CI (0.662, 0.765)] for the MSKCC nomogram. CONCLUSION: The predictive accuracy of ML is superior to that of existing clinically recommended nomograms, and existing nomograms can be updated as appropriate for special situations.
Affiliation(s)
- Hao Wang
  - Department of Urology, Nanchong Central Hospital, The Second Clinical College, North Sichuan Medical College (University), Nanchong, 637000, Sichuan, China
- Zhongyou Xia
  - Department of Urology, Nanchong Central Hospital, The Second Clinical College, North Sichuan Medical College (University), Nanchong, 637000, Sichuan, China
- Yulai Xu
  - Department of Urology, Nanchong Central Hospital, The Second Clinical College, North Sichuan Medical College (University), Nanchong, 637000, Sichuan, China
- Jing Sun
  - Department of Urology, Nanchong Central Hospital, The Second Clinical College, North Sichuan Medical College (University), Nanchong, 637000, Sichuan, China
- Ji Wu
  - Department of Urology, Nanchong Central Hospital, The Second Clinical College, North Sichuan Medical College (University), Nanchong, 637000, Sichuan, China
|
33
|
Blutt SE, Coarfa C, Neu J, Pammi M. Multiomic Investigations into Lung Health and Disease. Microorganisms 2023; 11:2116. [PMID: 37630676 PMCID: PMC10459661 DOI: 10.3390/microorganisms11082116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/08/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023] Open
Abstract
Diseases of the lung account for more than 5 million deaths worldwide and are a healthcare burden. Improving clinical outcomes, including mortality and quality of life, involves a holistic understanding of the disease, which can be provided by the integration of lung multi-omics data. An enhanced understanding of comprehensive multiomic datasets provides opportunities to leverage those datasets to inform the treatment and prevention of lung diseases by classifying severity, prognostication, and discovery of biomarkers. The main objective of this review is to summarize the use of multiomics investigations in lung disease, including multiomics integration and the use of machine learning computational methods. This review also discusses lung disease models, including animal models, organoids, and single-cell lines, to study multiomics in lung health and disease. We provide examples of lung diseases where multi-omics investigations have provided deeper insight into etiopathogenesis and have resulted in improved preventative and therapeutic interventions.
Affiliation(s)
- Sarah E. Blutt
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Cristian Coarfa
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Josef Neu
  - Department of Pediatrics, Section of Neonatology, University of Florida, Gainesville, FL 32611, USA
- Mohan Pammi
  - Department of Pediatrics, Section of Neonatology, Baylor College of Medicine and Texas Children’s Hospital, Houston, TX 77030, USA
|
34
|
Röbeck P, Franzén B, Cantera-Ahlman R, Dragomir A, Auer G, Jorulf H, Jacobsson SP, Viktorsson K, Lewensohn R, Häggman M, Ladjevardi S. Multiplex protein analysis and ensemble machine learning methods of fine needle aspirates from prostate cancer patients reveal potential diagnostic signatures associated with tumour grade. Cytopathology 2023; 34:286-294. [PMID: 36840380 DOI: 10.1111/cyt.13226] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/01/2022] [Revised: 02/06/2023] [Accepted: 02/16/2023] [Indexed: 02/26/2023]
Abstract
BACKGROUND: Improved molecular diagnosis is needed in prostate cancer (PC). Fine needle aspiration (FNA) is a minimally invasive biopsy technique, less traumatic than core needle biopsy, and could be useful for the diagnosis of PC. Molecular biomarkers (BMs) in FNA samples can be assessed for prediction, e.g., of immunotherapy efficacy before treatment as well as at treatment decision time points during disease progression. METHODS: In the present pilot study, the expression levels of 151 BM proteins were analysed by proximity extension assay in FNA samples from 16 patients, including benign prostate lesions (n = 3) and cancers (n = 13). An ensemble data analysis strategy was applied using several machine learning models. RESULTS: Twelve potentially predictive BM proteins correlating with International Society of Urological Pathology grade groups were identified, among them vimentin, tissue factor pathway inhibitor 2, and integrin beta-5. The validity of the results was supported by network analysis that showed functional associations between most of the identified putative BMs. We also showed that multiple immune checkpoint targets can be assessed (e.g., PD-L1, CD137, and Galectin-9), which may support the selection of immunotherapy in advanced PC. Results are promising but need further validation in a larger cohort. CONCLUSIONS: Our pilot study represents a "proof of concept" and shows that multiplex profiling of potential diagnostic and predictive BM proteins is feasible on tumour material obtained by FNA sampling of prostate cancer. Moreover, our results demonstrate that an ensemble data analysis strategy may facilitate the identification of BM signatures in pilot studies when the patient cohort is limited.
Affiliation(s)
- Pontus Röbeck
  - Department of Urology, Uppsala University, Uppsala, Sweden
  - Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
- Bo Franzén
  - Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
- Rafaele Cantera-Ahlman
  - Department of Urology, Uppsala University, Uppsala, Sweden
  - Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
- Anca Dragomir
  - Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Gert Auer
  - Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
- Håkan Jorulf
  - Department of Urology, Uppsala University, Uppsala, Sweden
  - Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
- Sven P Jacobsson
  - Department of Analytical Chemistry, Stockholm University, Stockholm, Sweden
- Kristina Viktorsson
  - Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
- Rolf Lewensohn
  - Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
  - Theme Cancer, Medical Unit Head and Neck, Lung, and Skin Tumors, Thoracic Oncology Center, Karolinska University Hospital, Solna, Sweden
- Michael Häggman
  - Department of Urology, Uppsala University, Uppsala, Sweden
  - Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
- Sam Ladjevardi
  - Department of Urology, Uppsala University, Uppsala, Sweden
  - Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
|
35
|
Petrini I, Cecchini RL, Mascaró M, Ponzoni I, Carballido JA. Papillary Thyroid Carcinoma: A thorough Bioinformatic Analysis of Gene Expression and Clinical Data. Genes (Basel) 2023; 14:1250. [PMID: 37372430 DOI: 10.3390/genes14061250] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/06/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023] Open
Abstract
The likelihood of being diagnosed with thyroid cancer has increased in recent years; it is the fastest-growing cancer diagnosis in the United States, and its incidence has tripled in the last three decades. In particular, Papillary Thyroid Carcinoma (PTC) is the most common type of cancer affecting the thyroid. It is a slow-growing cancer and, thus, it can usually be cured. However, given the worrying increase in the diagnosis of this type of cancer, the discovery of new genetic markers for accurate treatment and prognosis is crucial. In the present study, the aim is to identify putative genes that may be specifically relevant in PTC through bioinformatic analysis of several public gene expression datasets and clinical information. Two datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) dataset were studied. Statistics and machine learning methods were sequentially employed to retrieve a final small cluster of genes of interest: PTGFR, ZMAT3, GABRB2, and DPP6. Kaplan-Meier plots were employed to relate expression levels to overall survival and relapse-free survival. Furthermore, a manual bibliographic search for each gene was carried out, and a Protein-Protein Interaction (PPI) network was built to verify existing associations among them, followed by a new enrichment analysis. The results revealed that all the genes are highly relevant in the context of thyroid cancer and, of particular interest, PTGFR and DPP6 have not yet been associated with the disease, making them worthy of further investigation as to their relationship to PTC.
Affiliation(s)
- Iván Petrini
  - Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina
- Rocío L Cecchini
  - Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina
  - Institute for Computer Science and Engineering (UNS-CONICET), Bahía Blanca 8000, Argentina
- Marilina Mascaró
  - Departamento de Biología, Bioquímica y Farmacia, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina
- Ignacio Ponzoni
  - Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina
  - Institute for Computer Science and Engineering (UNS-CONICET), Bahía Blanca 8000, Argentina
- Jessica A Carballido
  - Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina
  - Institute for Computer Science and Engineering (UNS-CONICET), Bahía Blanca 8000, Argentina
|
36
|
Chong D, Jones NC, Schittenhelm RB, Anderson A, Casillas-Espinosa PM. Multi-omics Integration and Epilepsy: Towards a Better Understanding of Biological Mechanisms. Prog Neurobiol 2023:102480. [PMID: 37286031 DOI: 10.1016/j.pneurobio.2023.102480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/09/2023]
Abstract
The epilepsies are a group of complex neurological disorders characterised by recurrent seizures. Approximately 30% of patients fail to respond to anti-seizure medications, despite the recent introduction of many new drugs. The molecular processes underlying epilepsy development are not well understood and this knowledge gap impedes efforts to identify effective targets and develop novel therapies against epilepsy. Omics studies allow a comprehensive characterisation of a class of molecules. Omics-based biomarkers have led to clinically validated diagnostic and prognostic tests for personalised oncology, and more recently for non-cancer diseases. We believe that, in epilepsy, the full potential of multi-omics research is yet to be realised and we envisage that this review will serve as a guide to researchers planning to undertake omics-based mechanistic studies.
Affiliation(s)
- Debbie Chong
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia
- Nigel C Jones
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Ralf B Schittenhelm
  - Monash Proteomics & Metabolomics Facility and Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia
- Alison Anderson
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Pablo M Casillas-Espinosa
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
|
37
|
Pinho SA, Anjo SI, Cunha-Oliveira T. Metabolic Priming as a Tool in Redox and Mitochondrial Theragnostics. Antioxidants (Basel) 2023; 12:1072. [PMID: 37237939 PMCID: PMC10215850 DOI: 10.3390/antiox12051072] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/11/2023] [Revised: 05/05/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023] Open
Abstract
Theragnostics is a promising approach that integrates diagnostics and therapeutics into a single personalized strategy. To conduct effective theragnostic studies, it is essential to create an in vitro environment that accurately reflects the in vivo conditions. In this review, we discuss the importance of redox homeostasis and mitochondrial function in the context of personalized theragnostic approaches. Cells have several ways to respond to metabolic stress, including changes in protein localization, density, and degradation, which can promote cell survival. However, disruption of redox homeostasis can lead to oxidative stress and cellular damage, which are implicated in various diseases. Models of oxidative stress and mitochondrial dysfunction should be developed in metabolically conditioned cells to explore the underlying mechanisms of diseases and develop new therapies. By choosing an appropriate cellular model, adjusting cell culture conditions and validating the cellular model, it is possible to identify the most promising therapeutic options and tailor treatments to individual patients. Overall, we highlight the importance of precise and individualized approaches in theragnostics and the need to develop accurate in vitro models that reflect the in vivo conditions.
Affiliation(s)
- Sónia A. Pinho
  - CNC-Center for Neuroscience and Cell Biology, CIBB-Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, 3060-197 Cantanhede, Portugal
  - PDBEB - PhD Programme in Experimental Biology and Biomedicine, Institute of Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
  - IIIUC, University of Coimbra, 3004-504 Coimbra, Portugal
- Sandra I. Anjo
  - CNC-Center for Neuroscience and Cell Biology, CIBB-Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, 3060-197 Cantanhede, Portugal
  - IIIUC, University of Coimbra, 3004-504 Coimbra, Portugal
- Teresa Cunha-Oliveira
  - CNC-Center for Neuroscience and Cell Biology, CIBB-Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, 3060-197 Cantanhede, Portugal
  - IIIUC, University of Coimbra, 3004-504 Coimbra, Portugal
|
38
|
Wang X, Zhang X, Li H, Zhang M, Liu Y, Li X. Application of machine learning algorithm in prediction of lymph node metastasis in patients with intermediate and high-risk prostate cancer. J Cancer Res Clin Oncol 2023:10.1007/s00432-023-04816-w. [PMID: 37127828 PMCID: PMC10374763 DOI: 10.1007/s00432-023-04816-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 04/23/2023] [Indexed: 05/03/2023]
Abstract
PURPOSE: This study aims to establish the best prediction model of lymph node metastasis (LNM) in patients with intermediate- and high-risk prostate cancer (PCa) through machine learning (ML), and to provide clinicians with guidance for accurate clinical diagnosis and precise treatment. METHODS: A total of 24,470 patients with intermediate- and high-risk PCa were included in this study. A multivariate logistic regression model was used to screen the independent risk factors of LNM. In parallel, six algorithms, namely random forest (RF), naive Bayesian classifier (NBC), xgboost (XGB), gradient boosting machine (GBM), logistic regression (LR) and decision tree (DT), were used to establish risk prediction models. A prediction model was then established from the best-performing ML algorithm, and its performance was evaluated in three respects: area under curve (AUC), sensitivity and specificity. RESULTS: In multivariate logistic regression analysis, T stage, PSA, Gleason score and bone metastasis were independent predictors of LNM in patients with intermediate- and high-risk PCa. A comprehensive comparison of model performance on the training set and test set showed that the GBM model had the best prediction performance (F1 score = 0.838, AUROC = 0.804). Finally, we developed a preliminary calculator model that can quickly and accurately calculate the regional LNM in patients with intermediate- and high-risk PCa. CONCLUSION: T stage, PSA, Gleason score and bone metastasis were independent risk factors for predicting LNM in patients with intermediate- and high-risk PCa. The prediction models established in this study perform well, with the GBM model performing best.
Affiliation(s)
- Xiangrong Wang
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Xiangxiang Zhang
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Hengping Li
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Mao Zhang
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Yang Liu
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Xuanpeng Li
  - Department of Urology, Gansu Provincial Hospital, Lanzhou, Gansu, China
|
39
|
Pandey D, Onkara Perumal P. A scoping review on deep learning for next-generation RNA-Seq. data analysis. Funct Integr Genomics 2023; 23:134. [PMID: 37084004 DOI: 10.1007/s10142-023-01064-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 03/24/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023]
Abstract
In the last decade, transcriptome research using next-generation sequencing (NGS) technologies has gathered considerable momentum amongst functional genomics scientists, particularly amongst clinical and biomedical research groups. The progressive adoption of NGS technologies has produced an abundance of next-generation transcriptomic data in public databases, holding a wealth of new knowledge. Nevertheless, knowledge discovery from these RNA-Seq data requires extensive bioinformatics know-how as well as elaborate data analysis software packages suited to the type and context of the analysis. Several reliability and reproducibility concerns continue to impede RNA-Seq data analysis. Characteristic challenges include data quality, hardware and networking provisions, the selection and prioritisation of data analysis tools and, most significantly, the implementation of robust machine learning algorithms that make full use of these experimental transcriptomic data. Over the years, numerous machine learning algorithms have been implemented for improved transcriptomic data analysis, predominantly using shallow learning approaches. More recently, deep learning algorithms have become more mainstream, and their application to next-generation RNA-Seq data analysis could be revolutionary in the biomedical domain in the coming years. In this scoping review, we attempt to determine the size and nature of the existing literature on deep learning and NGS RNA-Seq data analysis. Contemporary topics in next-generation RNA-Seq data analysis based on deep learning algorithms are critically reviewed, with an emphasis on open-source resources.
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
- P Onkara Perumal
  - Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
|
40
|
Omar M, Dinalankara W, Mulder L, Coady T, Zanettini C, Imada EL, Younes L, Geman D, Marchionni L. Using biological constraints to improve prediction in precision oncology. iScience 2023; 26:106108. [PMID: 36852282 PMCID: PMC9958363 DOI: 10.1016/j.isci.2023.106108] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Revised: 12/20/2022] [Accepted: 01/28/2023] [Indexed: 02/05/2023] Open
Abstract
Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms to gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training to features capturing specific biological mechanisms; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had similar or better testing performance than their agnostic counterparts, with enhanced interpretability. Our findings support the use of biological constraints to develop robust gene signatures with high translational potential.
Affiliation(s)
- Mohamed Omar
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Wikum Dinalankara
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Lotte Mulder
  - Technical University Delft, 2628 CD Delft, the Netherlands
- Tendai Coady
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Claudio Zanettini
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Eddie Luidy Imada
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Laurent Younes
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Donald Geman
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
|
41
|
Bensoussan Y, Vanstrum EB, Johns MM, Rameau A. Artificial Intelligence and Laryngeal Cancer: From Screening to Prognosis: A State of the Art Review. Otolaryngol Head Neck Surg 2023; 168:319-329. [PMID: 35787073 DOI: 10.1177/01945998221110839] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 03/14/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022]
Abstract
OBJECTIVE: This state of the art review aims to examine contemporary advances in applications of artificial intelligence (AI) to the screening, detection, management, and prognostication of laryngeal cancer (LC). DATA SOURCES: Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and IEEE. REVIEW METHODS: A structured review of the current literature (up to January 2022) was performed. Search terms related to topics of AI in LC were identified and queried by 2 independent reviewers. Citations of selected studies and review articles were also evaluated to ensure comprehensiveness. CONCLUSIONS: AI applications in LC have encompassed a variety of data modalities, including radiomics, genomics, acoustics, clinical data, and videomics, to support screening, diagnosis, therapeutic decision making, and prognosis. However, most studies remain at the proof-of-concept level, as AI algorithms are trained on single-institution databases with limited data sets and a single data modality. IMPLICATIONS FOR PRACTICE: AI algorithms in LC will need to be trained on large multi-institutional data sets and integrate multimodal data for optimal performance and clinical utility from screening to prognosis. Of the data types reviewed, genomics has the most potential to provide generalizable models thanks to the availability of large multi-institutional open access genomic data sets. Voice acoustic data represent an inexpensive and accurate biomarker, which is easy and noninvasive to capture, offering a unique opportunity for screening and monitoring of LC, especially in low-resource settings.
Affiliation(s)
- Yael Bensoussan
  - Department of Otolaryngology-Head and Neck Surgery, University of South Florida, Tampa, Florida, USA
- Erik B Vanstrum
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Michael M Johns
  - Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles, California, USA
- Anaïs Rameau
  - Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
|
42
|
Kortebein S, Gu S, Dai K, Zhao E, Riska K, Kaylie D, Hoa M. MRI Screening in Vestibular Schwannoma: A Deep Learning-based Analysis of Clinical and Audiometric Data. OTOLOGY & NEUROTOLOGY OPEN 2023; 3:e028. [PMID: 38516318 PMCID: PMC10950172 DOI: 10.1097/ono.0000000000000028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 12/05/2022] [Indexed: 03/23/2024]
Abstract
Objective: To find a more objective method of assessing which patients should be screened for a vestibular schwannoma (VS) with magnetic resonance imaging (MRI), using a deep-learning algorithm to assess clinical and audiometric data. Materials and Methods: Clinical and audiometric data were collected for 592 patients, with and without VS confirmed by MRI, who received an audiogram between January 2015 and 2020 at Duke University Health Center. These data were analyzed using a deep learning-based analysis to determine if the need for MRI screening could be assessed more objectively with adequate sensitivity and specificity. Results: Patients with VS showed slightly elevated, but not statistically significant, mean thresholds compared to those without. Tinnitus, gradual hearing loss, and aural fullness were more common in patients with VS. Of these, only the presence of tinnitus was statistically significant. Several machine learning algorithms were used to incorporate and model the collected clinical and audiometric data, but none were able to distinguish ears with and without confirmed VS. When tumor size was taken into account, the analysis was still unable to distinguish a difference. Conclusions: Using audiometric and clinical data, deep learning-based analyses failed to produce an adequately sensitive and specific model for the detection of patients with VS. This suggests that a specific pattern of audiometric asymmetry and clinical symptoms may not necessarily be predictive of the presence/absence of VS to a level at which clinicians would be comfortable forgoing an MRI.
Affiliation(s)
- Sarah Kortebein
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, NC
- Shoujun Gu
  - Auditory Development and Restoration Program, NIDCD Otolaryngology Surgeon-Scientist Program, Division of Intramural Research, NIDCD/NIH, Bethesda, MD
- Kathy Dai
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, NC
- Elizabeth Zhao
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, NC
- Kristal Riska
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, NC
- David Kaylie
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, NC
- Michael Hoa
  - Auditory Development and Restoration Program, NIDCD Otolaryngology Surgeon-Scientist Program, Division of Intramural Research, NIDCD/NIH, Bethesda, MD
|
43
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
|
44
|
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:diagnostics13040664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to demonstrate the potential of DL techniques in omics data analysis and to identify the key challenges that must be addressed before this potential can be fully realized. Surveying the existing literature shows that several elements are essential for comprehending these studies, for example, the clinical applications and the datasets used. The published literature also highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four Internet search engines: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. The inclusion and exclusion criteria were specified, and a total of 65 articles were added to the final list. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 of the 65 articles were review publications based on single- and multi-omics data from the proposed taxonomy. Finally, only a small number of articles (7/65) focused on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study distinctly reflects different observations on omics with DL model areas. 
We believe that the result of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Affiliation(s)
- Mazin Abed Mohammed
- College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
- eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
- Correspondence: (M.A.M.); (B.G.Z.)
- Karrar Hameed Abdulkareem
- College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
- College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
- Ahmed M. Dinar
- Computer Engineering Department, University of Technology-Iraq, Baghdad 19006, Iraq
|
45
|
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
|
46
|
Tan P, Chen X, Zhang H, Wei Q, Luo K. Artificial intelligence aids in development of nanomedicines for cancer management. Semin Cancer Biol 2023; 89:61-75. [PMID: 36682438 DOI: 10.1016/j.semcancer.2023.01.005] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Received: 10/17/2022] [Revised: 12/28/2022] [Accepted: 01/18/2023] [Indexed: 01/21/2023]
Abstract
Over the last decade, nanomedicine has experienced unprecedented development in the diagnosis and management of diseases. A number of nanomedicines have been approved for clinical use, which has demonstrated the potential value of the clinical transition of nanotechnology-modified medicines from bench to bedside. The application of artificial intelligence (AI) in the development of nanotechnology-based products could transform the healthcare sector by enabling the acquisition and analysis of large datasets and by tailoring precision nanomedicines for cancer management. AI-enabled nanotechnology could improve the accuracy of molecular profiling and early diagnosis of patients, and optimize the design pipeline of nanomedicines by tuning their properties, achieving effective drug synergy, and decreasing nanotoxicity, thereby enhancing the targetability, personalized dosing, and treatment potency of nanomedicines. Herein, the advances in AI-enabled nanomedicines in cancer management are elaborated, and their application in diagnosis, monitoring, and therapy, as well as in precision medicine development, is discussed.
Affiliation(s)
- Ping Tan
- Department of Urology, and Department of Radiology, Institute of Urology, and Huaxi MR Research Center (HMRRC), Animal Experimental Center, National Clinical Research Center for Geriatrics, Frontiers Science Center for Disease-Related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Xiaoting Chen
- Department of Urology, and Department of Radiology, Institute of Urology, and Huaxi MR Research Center (HMRRC), Animal Experimental Center, National Clinical Research Center for Geriatrics, Frontiers Science Center for Disease-Related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Hu Zhang
- Amgen Bioprocessing Centre, Keck Graduate Institute, Claremont, CA 91711, USA
- Qiang Wei
- Department of Urology, and Department of Radiology, Institute of Urology, and Huaxi MR Research Center (HMRRC), Animal Experimental Center, National Clinical Research Center for Geriatrics, Frontiers Science Center for Disease-Related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Kui Luo
- Department of Urology, and Department of Radiology, Institute of Urology, and Huaxi MR Research Center (HMRRC), Animal Experimental Center, National Clinical Research Center for Geriatrics, Frontiers Science Center for Disease-Related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
|
47
|
He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023; 88:187-200. [PMID: 36596352 DOI: 10.1016/j.semcancer.2022.12.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 11/17/2022] [Revised: 12/16/2022] [Accepted: 12/29/2022] [Indexed: 01/02/2023]
Abstract
With biotechnological advancements, innovative omics technologies are constantly emerging that have enabled researchers to access multi-layer information from the genome, epigenome, transcriptome, proteome, metabolome, and more. A wealth of omics technologies, including bulk and single-cell omics approaches, have made it possible to characterize different molecular layers at unprecedented scale and resolution, providing a holistic view of tumor behavior. Multi-omics analysis allows systematic interrogation of various molecular information at each biological layer while posing tricky challenges regarding how to extract valuable insights from the exponentially increasing amount of multi-omics data. Therefore, efficient algorithms are needed to reduce the dimensionality of the data while simultaneously dissecting the mysteries behind the complex biological processes of cancer. Artificial intelligence has demonstrated the ability to analyze complementary multi-modal data streams within the oncology realm. The coincident development of multi-omics technologies and artificial intelligence algorithms has fuelled the development of cancer precision medicine. Here, we present state-of-the-art omics technologies and outline a roadmap of multi-omics integration analysis using an artificial intelligence strategy. The advances made using artificial intelligence-based multi-omics approaches are described, especially concerning early cancer screening, diagnosis, response assessment, and prognosis prediction. Finally, we discuss the challenges faced in multi-omics analysis, along with tentative future trends in this field. With the increasing application of artificial intelligence in multi-omics analysis, we anticipate a paradigm shift toward precision medicine driven by artificial intelligence-based multi-omics technologies.
Affiliation(s)
- Xiujing He
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Xiaowei Liu
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Fengli Zuo
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Hubing Shi
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Jing Jing
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
|
48
|
Zhi Y, Li M, Lv G. Into the multi-omics era: Progress of T cells profiling in the context of solid organ transplantation. Front Immunol 2023; 14:1058296. [PMID: 36798139 PMCID: PMC9927650 DOI: 10.3389/fimmu.2023.1058296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 01/20/2023] [Indexed: 02/04/2023] Open
Abstract
T cells are the common type of lymphocyte mediating allograft rejection, which remains an impediment to long-term allograft survival. However, the heterogeneity of T cells, in terms of differentiation and activation status, effector function, and highly diverse T cell receptors (TCRs), has precluded us from tracking these T cells and thereby comprehending their fate in recipients, owing to the limitations of traditional detection approaches. Recently, with the widespread development of single-cell techniques, the identification and characterization of T cells have been performed at single-cell resolution, which has contributed to a deeper comprehension of T cell heterogeneity through measurements in a single cell, such as gene expression, DNA methylation, chromatin accessibility, surface proteins, and TCR. Although these approaches can provide valuable insights into an individual cell independently, a comprehensive understanding can be obtained when joint analysis is applied. Multi-omics techniques have been implemented in characterizing T cells in health and disease, including transplantation. This review focuses on the thesis, challenges, and advances in these technologies and highlights their application to the study of alloreactive T cells to improve the understanding of T cell heterogeneity in solid organ transplantation.
Affiliation(s)
- Yao Zhi
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
- Mingqian Li
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
- Guoyue Lv
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
|
49
|
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely used strategies and currently holds a prominent place in the field of biotechnology, as its potential is manifesting through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago initiated the generation of large omics datasets, which has helped in gaining a better understanding of cellular behavior. The progress of metabolic engineering based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefly introduces the premise of ML and then covers various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. 
Finally, this review addresses the probable challenges of applying ML in metabolic engineering which will guide the researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
|
50
|
Early Diagnosis of Brain Diseases Using Artificial Intelligence and EV Molecular Data: A Proposed Noninvasive Repeated Diagnosis Approach. Cells 2022; 12:cells12010102. [PMID: 36611896 PMCID: PMC9818301 DOI: 10.3390/cells12010102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022] Open
Abstract
Brain-derived extracellular vesicles (BDEVs) are released from the central nervous system. Brain-related research and diagnostic techniques involving BDEVs have rapidly emerged as a means of diagnosing brain disorders because they are minimally invasive and enable repeatable measurements based on body fluids. However, EVs from various cells and organs are mixed in the blood, which is a potential obstacle for brain diagnostic systems using BDEVs. Therefore, it is important to screen appropriate brain EV markers to isolate BDEVs in blood. Here, we established a strategy for screening potential BDEV biomarkers. We propose that, by collecting various molecular data from the BDEVs, the sensitivity and specificity of the diagnostic system could be enhanced using machine learning and AI analysis. This BDEV-based diagnostic strategy could be used to diagnose various brain diseases and will help prevent disease through early diagnosis and early treatment.
|