51. Barash M, McNevin D, Fedorenko V, Giverts P. Machine learning applications in forensic DNA profiling: A critical review. Forensic Sci Int Genet 2024; 69:102994. [PMID: 38086200; DOI: 10.1016/j.fsigen.2023.102994] [Received: 07/29/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 01/29/2024]
Abstract
Machine learning (ML) is a range of powerful computational algorithms capable of generating predictive models via intelligent autonomous analysis of relatively large and often unstructured data. ML has become an integral part of our daily lives with a plethora of applications, including web, business, automotive industry, clinical diagnostics, scientific research, and more recently, forensic science. In the field of forensic DNA, the manual analysis of complex data can be challenging, time-consuming, and error-prone. The integration of novel ML-based methods may aid in streamlining this process while maintaining the high accuracy and reproducibility required for forensic tools. Due to the relative novelty of such applications, the forensic community is largely unaware of ML capabilities and limitations. Furthermore, computer science and ML professionals are often unfamiliar with the forensic science field and its specific requirements. This manuscript offers a brief introduction to the capabilities of machine learning methods and their applications in the context of forensic DNA analysis and offers a critical review of the current literature in this rapidly developing field.
Affiliation(s)
- Mark Barash
  - Department of Justice Studies, San José State University, San Jose, CA, United States; Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Dennis McNevin
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Vladimir Fedorenko
  - The Educational and Scientific Laboratory of Forensic Materials Engineering of the Saratov State University, Russia
- Pavel Giverts
  - Division of Identification and Forensic Science, Israel Police HQ, Haim Bar-Lev Road, Jerusalem, Israel
52. Chen Y, Mancini M, Zhu X, Akata Z. Semi-Supervised and Unsupervised Deep Visual Learning: A Survey. IEEE Trans Pattern Anal Mach Intell 2024; 46:1327-1347. [PMID: 36006881; DOI: 10.1109/tpami.2022.3201576] [Indexed: 06/15/2023]
Abstract
State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review recent advanced deep learning algorithms for semi-supervised learning (SSL) and unsupervised learning (UL) in visual recognition from a unified perspective. To offer a holistic understanding of the state of the art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and their applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on critical future research directions.
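One widely used semi-supervised approach is pseudo-labelling (self-training), in which a model's own confident predictions on unlabeled data are reused as training targets. The minimal Python sketch below shows only that selection step. It is not code from the survey: the `predict_proba` callable, the 0.95 confidence threshold, and the toy logistic model are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: maps one example to a list of class probabilities.
ProbaFn = Callable[[Sequence[float]], List[float]]


def pseudo_label(
    predict_proba: ProbaFn,
    unlabeled: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> List[Tuple[Sequence[float], int]]:
    """Keep only unlabeled examples whose top predicted class clears the threshold."""
    selected: List[Tuple[Sequence[float], int]] = []
    for x in unlabeled:
        probs = predict_proba(x)
        # Index of the most probable class, used as the pseudo-label.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((x, best))
    return selected


def toy_predict_proba(x: Sequence[float]) -> List[float]:
    """Toy two-class model (logistic on the first feature), for illustration only."""
    p1 = 1.0 / (1.0 + math.exp(-x[0]))
    return [1.0 - p1, p1]


# Confident examples at either extreme are kept; the ambiguous one near 0 is dropped.
print(pseudo_label(toy_predict_proba, [[-5.0], [0.1], [4.0]]))  # [([-5.0], 0), ([4.0], 1)]
```

In a full self-training loop the selected pairs would be added to the labeled set and the model retrained; the threshold trades the number of pseudo-labels against their noise.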
53. Fang C, Dziedzic A, Zhang L, Oliva L, Verma A, Razak F, Papernot N, Wang B. Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data. EBioMedicine 2024; 101:105006. [PMID: 38377795; PMCID: PMC10884342; DOI: 10.1016/j.ebiom.2024.105006] [Received: 10/15/2023] [Revised: 01/26/2024] [Accepted: 01/28/2024] [Indexed: 02/22/2024]
Abstract
BACKGROUND Machine Learning (ML) has demonstrated great potential in medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions or jurisdictions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is difficult but crucial to allow multiple parties to collaboratively train an ML model that leverages the private datasets available at each party, without directly sharing those datasets or compromising their privacy through the collaboration. METHODS In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). This framework offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets (i.e., no data centralization); (2) it safeguards patients' privacy by limiting the potential privacy leakage arising from any content shared across the parties during the training process; and (3) it facilitates ML model training without relying on a centralized party/server. FINDINGS We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. The ML models trained with the DeCaPH framework show less than a 3.2% drop in model performance compared with those trained by the non-privacy-preserving collaborative framework. Meanwhile, the average vulnerability of the models trained with DeCaPH to privacy attacks decreased by up to 16%.
In addition, models trained with our DeCaPH framework achieve better performance than those models trained solely with the private datasets from individual parties without collaboration and those trained with the previous privacy-preserving collaborative training framework under the same privacy guarantee by up to 70% and 18.2% respectively. INTERPRETATION We demonstrate that the ML models trained with DeCaPH framework have an improved utility-privacy trade-off, showing DeCaPH enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability. FUNDING This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294), Canadian Institute for Advanced Research (CIFAR) AI Catalyst Grants, CIFAR AI Chair programs, Temerty Professor of AI Research and Education in Medicine, University of Toronto, Amazon, Apple, DARPA through the GARD project, Intel, Meta, the Ontario Early Researcher Award, and the Sloan Foundation. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
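To make training "without data centralization" and "without relying on a centralized party/server" concrete, here is a deliberately simplified Python sketch of one collaborative round. It is not DeCaPH's published algorithm: it assumes each party has already computed a local model update, picks a random party as that round's aggregator, and applies per-party L2 clipping plus Gaussian noise in the style of differentially private training. Secure aggregation, privacy accounting, and the model itself are omitted, and all function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def clip(update: Sequence[float], max_norm: float) -> Vector:
    """Scale an update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def collaborative_round(
    local_updates: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Tuple[int, Vector]:
    """One illustrative round: returns (aggregating party, noisy averaged update)."""
    # No fixed server: a party is drawn at random to aggregate this round.
    leader = rng.randrange(len(local_updates))
    # Each party clips its own update; only updates, never raw records, are shared.
    clipped = [clip(u, max_norm) for u in local_updates]
    dim = len(clipped[0])
    summed = [sum(c[i] for c in clipped) for i in range(dim)]
    # Gaussian noise scaled to the clipping bound limits what the sum reveals about any one party.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return leader, [v / len(local_updates) for v in noisy]


# Two parties; with noise_multiplier=0 the result is just the mean of the clipped updates.
leader, avg = collaborative_round(
    [[3.0, 4.0], [0.0, 1.0]], max_norm=1.0, noise_multiplier=0.0, rng=random.Random(0)
)
```

Raising `noise_multiplier` strengthens privacy at the cost of a noisier update, which is the utility-privacy trade-off the abstract refers to.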
Affiliation(s)
- Congyu Fang
  - Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada
- Adam Dziedzic
  - Vector Institute, Toronto, Canada; CISPA Helmholtz Center for Information Security, Germany; Department of Electrical and Computer Engineering, University of Toronto, Canada
- Lin Zhang
  - Peter Munk Cardiac Centre, University Health Network, Canada; Simon Fraser University, Canada
- Laura Oliva
  - Peter Munk Cardiac Centre, University Health Network, Canada
- Amol Verma
  - St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada
- Fahad Razak
  - St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada
- Nicolas Papernot
  - Department of Computer Science, University of Toronto, Canada; Vector Institute, Toronto, Canada; Department of Electrical and Computer Engineering, University of Toronto, Canada
- Bo Wang
  - Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Canada
54. Bai Y, Lin H, Wang C, Wang Q, Qu J. Digitalizing river aquatic ecosystems. J Environ Sci (China) 2024; 137:677-680. [PMID: 37980050; DOI: 10.1016/j.jes.2023.03.012] [Received: 10/20/2022] [Revised: 03/07/2023] [Accepted: 03/08/2023] [Indexed: 11/20/2023]
Abstract
Traditional river health assessment relies on limited water quality indices and representative organism activity, but does not comprehensively obtain biotic and abiotic information of the ecosystem. Here, we propose a new approach to evaluate the ecological and health risks of river aquatic ecosystems. First, detailed physicochemical and biological characterization of a river ecosystem can be obtained through pollutant determination (especially emerging pollutants) and DNA/RNA sequencing. Second, supervised machine learning can be applied to perform classification analysis of characterization data and ascertain river ecosystem ecology and health. Our proposed methodology transforms river ecosystem health assessment and can be applied in river management.
Affiliation(s)
- Yaohui Bai
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hui Lin
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chenchen Wang
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China
- Qiaojuan Wang
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiuhui Qu
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; Center for Water and Ecology, Tsinghua University, Beijing 100084, China
55. Hazan JM, Amador R, Ali-Nasser T, Lahav T, Shotan SR, Steinberg M, Cohen Z, Aran D, Meiri D, Assaraf YG, Guigó R, Bester AC. Integration of transcription regulation and functional genomic data reveals lncRNA SNHG6's role in hematopoietic differentiation and leukemia. J Biomed Sci 2024; 31:27. [PMID: 38419051; PMCID: PMC10900714; DOI: 10.1186/s12929-024-01015-8] [Received: 07/21/2023] [Accepted: 02/22/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) are pivotal players in cellular processes, and their unique cell-type specific expression patterns render them attractive biomarkers and therapeutic targets. Yet, the functional roles of most lncRNAs remain enigmatic. To address the need to identify new druggable lncRNAs, we developed a comprehensive approach integrating transcription factor binding data with other genetic features to generate a machine learning model, which we have called INFLAMeR (Identifying Novel Functional LncRNAs with Advanced Machine Learning Resources). METHODS INFLAMeR was trained on high-throughput CRISPR interference (CRISPRi) screens across seven cell lines, and the algorithm was based on 71 genetic features. To validate the predictions, we selected candidate lncRNAs in the human K562 leukemia cell line and determined the impact of their knockdown (KD) on cell proliferation and chemotherapeutic drug response. We further performed transcriptomic analysis for candidate genes. Based on these findings, we assessed the lncRNA small nucleolar RNA host gene 6 (SNHG6) for its role in myeloid differentiation. Finally, we established a mouse K562 leukemia xenograft model to determine whether SNHG6 KD attenuates tumor growth in vivo. RESULTS The INFLAMeR model successfully reconstituted CRISPRi screening data and predicted functional lncRNAs that were previously overlooked. Intensive cell-based and transcriptomic validation of nearly fifty genes in K562 revealed cell type-specific functionality for 85% of the predicted lncRNAs. In this respect, our cell-based and transcriptomic analyses predicted a role for SNHG6 in hematopoiesis and leukemia. Consistent with its predicted role in hematopoietic differentiation, SNHG6 transcription is regulated by hematopoiesis-associated transcription factors. SNHG6 KD reduced the proliferation of leukemia cells and sensitized them to differentiation. 
Treatment of K562 leukemic cells with hemin and PMA, respectively, demonstrated that SNHG6 inhibits red blood cell differentiation but strongly promotes megakaryocyte differentiation. Using a xenograft mouse model, we demonstrate that SNHG6 KD attenuated tumor growth in vivo. CONCLUSIONS Our approach not only improved the identification and characterization of functional lncRNAs through genomic approaches in a cell type-specific manner, but also identified new lncRNAs with roles in hematopoiesis and leukemia. Such approaches can be readily applied to identify novel targets for precision medicine.
Affiliation(s)
- Joshua M Hazan
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Raziel Amador
  - Centre for Genomic Regulation (CRG), Doctor Aiguader 88, 08003, Barcelona, Catalonia, Spain
  - Universitat de Barcelona (UB), Barcelona, Catalonia, Spain
- Tahleel Ali-Nasser
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Tamar Lahav
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Stav Roni Shotan
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Miryam Steinberg
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Ziv Cohen
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
  - The Taub Faculty of Computer Science, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Dvir Aran
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
  - The Taub Faculty of Computer Science, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- David Meiri
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Yehuda G Assaraf
  - The Fred Wyszkowski Cancer Research Laboratory, Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), Doctor Aiguader 88, 08003, Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Assaf C Bester
  - Department of Biology, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
56. Rahit KMTH, Avramovic V, Chong JX, Tarailo-Graovac M. GPAD: a natural language processing-based application to extract the gene-disease association discovery information from OMIM. BMC Bioinformatics 2024; 25:84. [PMID: 38413851; PMCID: PMC10898068; DOI: 10.1186/s12859-024-05693-x] [Received: 08/15/2023] [Accepted: 02/09/2024] [Indexed: 02/29/2024]
Abstract
BACKGROUND Thousands of genes have been associated with different Mendelian conditions. One valuable source for tracking these gene-disease associations (GDAs) is the Online Mendelian Inheritance in Man (OMIM) database. However, most of the information in OMIM is textual and heterogeneous (e.g. summarized by different experts), which complicates automated reading and understanding of the data. Here, we used Natural Language Processing (NLP) to build a tool, Gene-Phenotype Association Discovery (GPAD), that syntactically processes OMIM text and extracts the data of interest. RESULTS GPAD applies a series of language-based techniques to text obtained from the OMIM API to extract information related to GDA discovery. GPAD can report when a particular gene was associated with a specific phenotype, as well as the type of validation for that association: through model organisms or through cohort-based patient-matching approaches. GPAD-extracted data were validated against published reports and compared with a large language model. Using GPAD's extracted data, we analysed trends in GDA discoveries, noting a significant increase in their rate after the introduction of exome sequencing, rising from an average of about 150-250 discoveries each year. Contrary to hopes of resolving most GDAs for Mendelian disorders by now, our data indicate a substantial decline in discovery rates over the past five years (2017-2022). This decline appears to be linked to the increasing need for larger cohorts to substantiate GDAs. The rising use of zebrafish and Drosophila as model organisms to provide evidential support for GDAs is also observed. CONCLUSIONS GPAD's real-time analysis capacity offers an up-to-date view of GDA discovery and could help in planning and managing research strategies. In the future, this solution could be extended or modified to capture other information in OMIM and the scientific literature.
Affiliation(s)
- K M Tahsin Hassan Rahit
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Vladimir Avramovic
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Jessica X Chong
  - Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, 98195, USA
  - Brotman-Baty Institute, Seattle, WA, 98195, USA
- Maja Tarailo-Graovac
  - Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
57. Li G, Li C, Wang C, Wang Z. Suboptimal capability of individual machine learning algorithms in modeling small-scale imbalanced clinical data of local hospital. PLoS One 2024; 19:e0298328. [PMID: 38394317; PMCID: PMC10890755; DOI: 10.1371/journal.pone.0298328] [Received: 09/06/2023] [Accepted: 01/22/2024] [Indexed: 02/25/2024]
Abstract
In recent years, artificial intelligence (AI) has shown promising applications in various scientific domains, including biochemical analysis research. However, the effectiveness of AI in modeling small-scale, imbalanced datasets remains an open question in such fields. This study explores the capabilities of eight basic AI algorithms, including ridge regression, logistic regression, random forest regression, and others, in modeling a small, imbalanced clinical dataset (total n = 387, class 0 = 27, class 1 = 360) related to the records of the biochemical blood tests from the patients with multiple wasp stings (MWS). Through rigorous evaluation using k-fold cross-validation and comprehensive scoring, we found that none of the models could effectively model the data. Even after fine-tuning the hyperparameters of the best-performing models, the results remained below acceptable thresholds. The study highlights the challenges of applying AI to small-scale datasets with imbalanced groups in biochemical or clinical research and emphasizes the need for novel algorithms tailored to small-scale data. The findings also call for further exploration into techniques such as transfer learning and data augmentation, and they underline the importance of understanding the minimum dataset scale required for effective AI modeling in biochemical contexts.
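The class counts reported here (27 vs. 360) make one difficulty easy to quantify. The stdlib-only Python sketch below deals each class round-robin into k folds, a simplified, unshuffled stand-in for stratified k-fold cross-validation, and computes the accuracy of always predicting the majority class. The choice of k = 5 and the stratification itself are assumptions for illustration; the abstract does not state which k or splitting scheme the authors used, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Split sample indices into k test folds, dealing each class out round-robin."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def majority_baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy of a 'model' that always predicts the most frequent class."""
    return max(Counter(labels).values()) / len(labels)


labels = [0] * 27 + [1] * 360  # class sizes reported in the abstract (n = 387)
folds = stratified_folds(labels, k=5)  # k = 5 is an assumption for illustration
minority_per_fold = [sum(1 for i in fold if labels[i] == 0) for fold in folds]
print(minority_per_fold)  # [6, 6, 5, 5, 5]
print(round(majority_baseline_accuracy(labels), 3))  # 0.93
```

Each test fold then holds only 5 or 6 minority cases, and the trivial majority-class predictor already reaches about 93% accuracy, which is one reason accuracy alone is a weak yardstick for data this imbalanced.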
Affiliation(s)
- Gang Li
  - Department of ICU, 3201 Hospital, Hanzhong, Shaanxi, China
- Chenbi Li
  - Department of ICU, 3201 Hospital, Hanzhong, Shaanxi, China
- Chengli Wang
  - Department of ICU, 3201 Hospital, Hanzhong, Shaanxi, China
- Zeheng Wang
  - Data61, CSIRO, Clayton, VIC, Australia
  - Manufacturing, CSIRO, West Lindfield, NSW, Australia
58. Fong WJ, Tan HM, Garg R, Teh AL, Pan H, Gupta V, Krishna B, Chen ZH, Purwanto NY, Yap F, Tan KH, Chan KYJ, Chan SY, Goh N, Rane N, Tan ESE, Jiang Y, Han M, Meaney M, Wang D, Keppo J, Tan GCY. Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation. Front Neuroinform 2024; 17:1244336. [PMID: 38449836; PMCID: PMC10915285; DOI: 10.3389/fninf.2023.1244336] [Received: 06/22/2023] [Accepted: 10/18/2023] [Indexed: 03/08/2024]
Abstract
Introduction Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing, where more precise dosing is needed. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and, consequently, prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort. Methods Buffy coat DNA methylation was quantified using the Illumina Infinium MethylationEPIC BeadChip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 Mb of the CYP2D6 gene, as well as the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. In the Elastic Net and XGBoost models, optimal hyperparameters were searched for using 10-fold cross-validation. Root Mean Square Error and R-squared values were obtained to assess each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites. Results Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites, and a number of top variables were identified for each model.
Discussion The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
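The evaluation protocol described above (a 75%/25% train/test split, scored with Root Mean Square Error and R-squared) can be sketched in a few lines of stdlib Python. This is an illustrative sketch of how such scoring works, not the authors' code; the function names and the fixed seed are hypothetical, and in the study the models themselves (Linear Regression, Elastic Net, XGBoost) would be fit on the training portion before predicting the test portion.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed and hold out test_fraction of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, test = train_test_split(range(100))
print(len(train), len(test))  # 75 25
```

R-squared computed on a held-out set can be negative when predictions are worse than simply predicting the test-set mean, so it is most informative when reported alongside RMSE.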
Affiliation(s)
- Wei Jing Fong
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Hong Ming Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Rishabh Garg
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Ai Ling Teh
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Hong Pan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Varsha Gupta
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bernadus Krishna
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Zou Hui Chen
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Fabian Yap
  - KK Women's and Children's Hospital, Singapore, Singapore
- Kok Hian Tan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Kok Yen Jerry Chan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Shiao-Yng Chan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National University Hospital, Singapore, Singapore
- Nikita Rane
  - Institute of Mental Health, Singapore, Singapore
- Mei Han
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Michael Meaney
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Dennis Wang
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Jussi Keppo
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Geoffrey Chern-Yee Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
  - Institute of Mental Health, Singapore, Singapore
59. Lv Q, Liu Y, Sun Y, Wu M. Insight into deep learning for glioma IDH medical image analysis: A systematic review. Medicine (Baltimore) 2024; 103:e37150. [PMID: 38363910; PMCID: PMC10869095; DOI: 10.1097/md.0000000000037150] [Received: 10/18/2023] [Accepted: 01/11/2024] [Indexed: 02/18/2024]
Abstract
BACKGROUND Deep learning techniques have shown enormous potential in medical image analysis, particularly in digital pathology. Concurrently, molecular markers have gained increasing significance over the past decade in the management of glioma patients, providing novel insights into diagnosis and more personalized treatment options. Deep learning combined with imaging and molecular analysis enables more accurate prognostication, better-informed treatment planning, and accurate prediction of biomarkers such as IDH for gliomas. This systematic review examines the development of deep learning techniques for IDH prediction using histopathology images from 2019 to 2023. METHOD The study adhered to the PRISMA reporting requirements, and databases including PubMed, Google Scholar, Google Search, and preprint repositories (such as arXiv) were systematically queried for pertinent literature spanning the period from 2019 to the 30th of 2023. Search phrases related to deep learning, digital pathology, glioma, and IDH were used in combination. RESULTS Fifteen papers met the inclusion criteria, which specifically encompassed studies using deep learning to analyze hematoxylin and eosin images to determine IDH status in patients with gliomas. CONCLUSIONS Classifiers built on digital pathology images perform exceptionally well at predicting IDH status, and predictive performance improves when an appropriate deep learning model is used. However, external validation is needed to demonstrate their robustness and generalizability, and larger, multicenter samples are needed to evaluate performance more comprehensively and confirm clinical benefits.
Affiliation(s)
- Qingqing Lv
  - Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410008, Hunan, China
  - The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, 410078, Hunan, China
- Yihao Liu
  - Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410008, Hunan, China
  - The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, 410078, Hunan, China
- Yingnan Sun
  - Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410008, Hunan, China
- Minghua Wu
  - Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410008, Hunan, China
  - The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, 410078, Hunan, China
60. Shahjahan, Dey JK, Dey SK. Translational bioinformatics approach to combat cardiovascular disease and cancers. Adv Protein Chem Struct Biol 2024; 139:221-261. [PMID: 38448136; DOI: 10.1016/bs.apcsb.2023.11.006] [Indexed: 03/08/2024]
Abstract
Bioinformatics is an interdisciplinary science that draws on biology, chemistry, physics, statistics, mathematics, and computer science as key fields for answering complex physiological questions. The key aim of bioinformatics is to store, analyze, organize, and retrieve essential information about the genome, proteome, transcriptome, and metabolome, as well as whole organisms, in order to investigate biological systems and their dynamics. The outcome of bioinformatics depends on the type, quantity, and quality of the raw data provided and on the algorithm used to analyze them. Despite the availability of several approved medicines, cardiovascular disorders (CVDs) and cancers are the two leading causes of human death. Uncovering the unknown aspects of these two non-communicable disorders is essential for discovering new pathways, finding new drug targets, and eventually developing new drugs to combat them successfully. Because all these goals involve complex investigation and handling of various macro- and small molecules of the human body, bioinformatics plays a key role in these processes. Results from such investigations have direct human applications, and thus we call this field translational bioinformatics. This book chapter therefore deals with the diverse scope and applications of translational bioinformatics in finding cures for, diagnosing, and understanding the mechanisms of CVDs and cancers. Developing complex algorithms, whether short or long, to address such problems is very common in translational bioinformatics. Structure-based drug discovery and AI-guided invention of novel antibodies, with very high accuracy and speed and at considerably low investment, are among the striking features of translational bioinformatics and its applications to CVDs and cancers.
Affiliation(s)
- Shahjahan
- Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India
- Joy Kumar Dey
- Central Council for Research in Homoeopathy, Ministry of Ayush, Govt. of India, New Delhi, Delhi, India
- Sanjay Kumar Dey
- Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India.
61
Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024]
Abstract
The role of data-driven high-throughput analytical techniques, which have given rise to computational oncology, is undisputed, and the widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fueled advances in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes across different types of cancer research, vast databases of varied, high-dimensional information are required, and all this information can only be managed by automated systems built with ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anticancer substances to improve cancer treatment, and to refine existing treatment models by being incorporated at all steps of cancer research and development and in routine patient care. This review consolidates the advances and benefits of ML and MM techniques, with a special focus on cancer prognosis and anticancer therapy, and identifies challenges (data quantity, ethical considerations, and data privacy) that have yet to be fully addressed in current studies.
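The abstract mentions MM being used to relate pharmacokinetics to pharmacodynamics. As a minimal illustration only, not a model taken from the review, the sketch below assumes the textbook case of an intravenous bolus in a one-compartment model, C(t) = (Dose/V) exp(-k t), linked to an Emax effect model, E = E0 + Emax C / (EC50 + C). All parameter values are hypothetical.

```python
import math


def concentration(dose_mg: float, volume_l: float, k_el_per_h: float, t_h: float) -> float:
    """Plasma concentration (mg/L) after an IV bolus in a one-compartment model
    with first-order elimination rate constant k_el_per_h."""
    return (dose_mg / volume_l) * math.exp(-k_el_per_h * t_h)


def emax_effect(conc: float, e0: float, emax: float, ec50: float) -> float:
    """Drug effect from the hyperbolic Emax pharmacodynamic model."""
    return e0 + emax * conc / (ec50 + conc)


# Hypothetical parameters, chosen only to show how PK output feeds the PD model
dose, volume, k_el = 100.0, 10.0, 0.2
for t in (0, 2, 6, 12):
    c = concentration(dose, volume, k_el, t)
    print(f"t={t:>2} h  C={c:5.2f} mg/L  effect={emax_effect(c, 0.0, 100.0, 5.0):5.1f}")
```

Linking the two functions this way shows why PK/PD modeling matters for dosing: the effect decays more slowly than the concentration while C stays above EC50, because the Emax curve saturates.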
Affiliation(s)
- Jasmin Hassan
- Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh; (J.H.); (S.M.S.)
- Lipika Deka
- Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
- Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
- Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
62
Wong EY, Chu TN, Ladi-Seyedian SS. Genomics and Artificial Intelligence: Prostate Cancer. Urol Clin North Am 2024; 51:27-33. [PMID: 37945100 DOI: 10.1016/j.ucl.2023.06.006] [Indexed: 11/12/2023]
Abstract
Artificial intelligence (AI) is revolutionizing prostate cancer genomics research. By leveraging machine learning and deep learning algorithms, researchers can rapidly analyze vast genomic datasets to identify patterns and correlations that may be missed by traditional methods. These AI-driven insights can lead to the discovery of novel biomarkers, enhance the accuracy of diagnosis, and predict disease progression and treatment response. As such, AI is becoming an indispensable tool in the pursuit of personalized medicine for prostate cancer.
Affiliation(s)
- Elyssa Y Wong
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation & Education, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
- Timothy N Chu
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation & Education, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
- Seyedeh-Sanam Ladi-Seyedian
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation & Education, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA.
63
Gunter NB, Gebre RK, Graff-Radford J, Heckman MG, Jack CR, Lowe VJ, Knopman DS, Petersen RC, Ross OA, Vemuri P, Ramanan VK. Machine Learning Models of Polygenic Risk for Enhanced Prediction of Alzheimer Disease Endophenotypes. Neurol Genet 2024; 10:e200120. [PMID: 38250184 PMCID: PMC10798228 DOI: 10.1212/nxg.0000000000200120] [Received: 07/17/2023] [Accepted: 11/01/2023] [Indexed: 01/23/2024]
Abstract
Background and Objectives Alzheimer disease (AD) has a polygenic architecture, for which genome-wide association studies (GWAS) have helped elucidate sequence variants (SVs) influencing susceptibility. Polygenic risk score (PRS) approaches show promise for generating summary measures of inherited risk for clinical AD based on the effects of APOE and other GWAS hits. However, existing PRS approaches, based on traditional regression models, explain only modest variation in AD dementia risk and AD-related endophenotypes. We hypothesized that machine learning (ML) models of polygenic risk (ML-PRS) could outperform standard regression-based PRS methods and therefore have the potential for greater clinical utility. Methods We analyzed combined data from the Mayo Clinic Study of Aging (n = 1,791) and the Alzheimer's Disease Neuroimaging Initiative (n = 864). An AD PRS was computed for each participant using the top common SVs obtained from a large AD dementia GWAS. In parallel, ML models were trained using those SV genotypes, with amyloid PET burden as the primary outcome. Secondary outcomes included amyloid PET positivity and clinical diagnosis (cognitively unimpaired vs impaired). We compared performance between ML-PRS and standard PRS across 100 training sessions with different data splits. In each session, data were split into 80% training and 20% testing, and then five-fold cross-validation was used within the training set to ensure the best model was produced for testing. We also applied permutation importance techniques to assess which genetic factors contributed most to outcome prediction. Results ML-PRS models outperformed the AD PRS (r2 = 0.28 vs r2 = 0.24 in test set) in explaining variation in amyloid PET burden. Among ML approaches, methods accounting for nonlinear genetic influences were superior to linear methods. ML-PRS models were also more accurate when predicting amyloid PET positivity (area under the curve [AUC] = 0.80 vs AUC = 0.63) and the presence of cognitive impairment (AUC = 0.75 vs AUC = 0.54) compared with the standard PRS. Discussion We found that ML-PRS approaches improved upon standard PRS for prediction of AD endophenotypes, partly related to improved accounting for nonlinear effects of genetic susceptibility alleles. Further adaptations of the ML-PRS framework could help to close the gap of remaining unexplained heritability for AD and therefore facilitate more accurate presymptomatic and early-stage risk stratification for clinical decision-making.
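The Methods describe a standard PRS built from top GWAS variants and an evaluation protocol of 100 sessions, each with an 80/20 train/test split and five-fold cross-validation inside the training set. The sketch below illustrates both pieces under stated assumptions: the PRS uses the conventional weighted sum of effect-allele dosages (0, 1, or 2) times GWAS effect sizes, and the splitting helpers `holdout_split` and `k_folds` are hypothetical plain-Python functions that only produce index sets. Model fitting is deliberately left out; this is not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def polygenic_risk_score(dosages: Sequence[int], betas: Sequence[float]) -> float:
    """Conventional PRS: effect-allele dosages (0, 1, 2) weighted by GWAS effect sizes."""
    if len(dosages) != len(betas):
        raise ValueError("one effect size per variant is required")
    return sum(d * b for d, b in zip(dosages, betas))


def holdout_split(n: int, test_frac: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices reproducibly and hold out test_frac of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_folds(train_idx: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition training indices into k (fit, validate) pairs for cross-validation."""
    folds = [train_idx[i::k] for i in range(k)]
    return [
        ([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
        for i in range(k)
    ]


# 100 sessions, each with an 80/20 split and 5-fold CV inside the training portion
n_samples = 1791 + 864  # combined cohort size reported in the abstract
for session in range(100):
    train_idx, test_idx = holdout_split(n_samples, test_frac=0.2, seed=session)
    for fit_idx, val_idx in k_folds(train_idx, k=5):
        pass  # fit candidate models on fit_idx and score them on val_idx (omitted)
    # the selected model would then be refit on train_idx and scored once on test_idx
```

Keeping `test_idx` untouched until the final evaluation is what allows held-out figures such as the reported test-set r2 to be compared between the ML-PRS and the standard PRS.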
Affiliation(s)
- Nathaniel B Gunter
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Robel K Gebre
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Jonathan Graff-Radford
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Michael G Heckman
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Clifford R Jack
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Val J Lowe
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- David S Knopman
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Ronald C Petersen
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Owen A Ross
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Prashanthi Vemuri
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
- Vijay K Ramanan
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
64
Cho H, She J, De Marchi D, El-Zaatari H, Barnes EL, Kahkoska AR, Kosorok MR, Virkud AV. Machine Learning and Health Science Research: Tutorial. J Med Internet Res 2024; 26:e50890. [PMID: 38289657 PMCID: PMC10865203 DOI: 10.2196/50890] [Received: 07/15/2023] [Revised: 11/30/2023] [Accepted: 12/21/2023] [Indexed: 02/01/2024]
Abstract
Machine learning (ML) has seen impressive growth in health science research due to its capacity for handling complex data to perform a range of tasks, including unsupervised learning, supervised learning, and reinforcement learning. To aid health science researchers in understanding the strengths and limitations of ML and to facilitate its integration into their studies, we present here a guideline for integrating ML into an analysis through a structured framework, covering steps from framing a research question to study design and analysis techniques for specialized data types.
Affiliation(s)
- Hunyong Cho
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jane She
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Daniel De Marchi
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Helal El-Zaatari
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Edward L Barnes
- Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Center for Gastrointestinal Biology and Diseases, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Anna R Kahkoska
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Division of Endocrinology and Metabolism, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Center for Aging and Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Michael R Kosorok
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Arti V Virkud
- Kidney Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
65
Lebatteux D, Soudeyns H, Boucoiran I, Gantt S, Diallo AB. Machine learning-based approach KEVOLVE efficiently identifies SARS-CoV-2 variant-specific genomic signatures. PLoS One 2024; 19:e0296627. [PMID: 38241279 PMCID: PMC10798494 DOI: 10.1371/journal.pone.0296627] [Received: 02/21/2022] [Accepted: 12/07/2023] [Indexed: 01/21/2024]
Abstract
Machine learning has been shown to be effective at identifying distinctive genomic signatures among viral sequences. These signatures are defined as pervasive motifs in the viral genome that allow discrimination between species or variants. In the context of SARS-CoV-2, identifying these signatures can assist taxonomic and phylogenetic studies, improve the recognition and definition of emerging variants, and aid in characterizing the functional properties of polymorphic gene products. In this paper, we assess KEVOLVE, an approach based on a genetic algorithm with a machine-learning kernel, to identify multiple genomic signatures based on minimal sets of k-mers. In a comparative study in which we analyzed a large SARS-CoV-2 genome dataset, KEVOLVE was more effective at identifying variant-discriminative signatures than several gold-standard statistical tools. Subsequently, these signatures were characterized using a new extension of KEVOLVE (KANALYZER) to highlight variations of the discriminative signatures among different classes of variants, their genomic locations, and the mutations involved. Most of the identified signatures were associated with mutations among the different variants whose functional and pathological impact is known from the available literature. Here we show that KEVOLVE is a robust machine learning approach for identifying discriminative signatures among SARS-CoV-2 variants, which are frequently also biologically relevant, while bypassing multiple sequence alignments. The source code of the method and additional resources are available at: https://github.com/bioinfoUQAM/KEVOLVE.
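KEVOLVE represents genomes through sets of k-mers and searches for minimal discriminative sets with a genetic algorithm. The sketch below covers only the feature-extraction step that such an approach depends on: counting overlapping k-mers and building a samples-by-k-mers matrix for a chosen k-mer set. It is not KEVOLVE's implementation (see the linked repository for that), and the k-mer "signature" in the usage lines is hypothetical.

```python
from collections import Counter
from typing import List

NUCLEOTIDES = set("ACGT")


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window that contains a non-ACGT symbol (e.g. N)."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= NUCLEOTIDES:
            counts[kmer] += 1
    return counts


def kmer_matrix(sequences: List[str], k: int, vocabulary: List[str]) -> List[List[int]]:
    """Build a samples x k-mers count matrix restricted to the k-mers in vocabulary."""
    rows = []
    for s in sequences:
        c = kmer_counts(s, k)
        rows.append([c.get(kmer, 0) for kmer in vocabulary])
    return rows


# Usage with toy sequences and a hypothetical 3-mer signature set
signature = ["ACG", "TAC"]
print(kmer_matrix(["ACGTACG", "TTACGNN"], 3, signature))
```

A matrix like this is what a downstream classifier (the "machine-learning kernel") would consume; restricting the columns to a small vocabulary is what makes a candidate signature cheap to evaluate.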
Affiliation(s)
- Dylan Lebatteux
- Department of Computer Science, Université du Québec à Montréal, Montréal, Québec, Canada
- Hugo Soudeyns
- CHU Sainte-Justine Research Centre, Montréal, Québec, Canada
- Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
- Department of Pediatrics, Faculty of Medicine, Université du Québec à Montréal, Montréal, Québec, Canada
- Isabelle Boucoiran
- Department of Obstetrics and Gynecology, Faculty of Medicine, Université de Montréal, Montreal, Quebec, Canada
- Soren Gantt
- CHU Sainte-Justine Research Centre, Montréal, Québec, Canada
- Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
66
Yu X, Zhao H, Wang R, Chen Y, Ouyang X, Li W, Sun Y, Peng A. Cancer epigenetics: from laboratory studies and clinical trials to precision medicine. Cell Death Discov 2024; 10:28. [PMID: 38225241 PMCID: PMC10789753 DOI: 10.1038/s41420-024-01803-z] [Received: 10/13/2023] [Revised: 12/23/2023] [Accepted: 01/04/2024] [Indexed: 01/17/2024]
Abstract
Epigenetic dysregulation is a common feature of a myriad of human diseases, particularly cancer. Defining the epigenetic defects associated with malignant tumors has become a focus of cancer research resulting in the gradual elucidation of cancer cell epigenetic regulation. In fact, most stages of tumor progression, including tumorigenesis, promotion, progression, and recurrence are accompanied by epigenetic alterations, some of which can be reversed by epigenetic drugs. The main objective of epigenetic therapy in the era of personalized precision medicine is to detect cancer biomarkers to improve risk assessment, diagnosis, and targeted treatment interventions. Rapid technological advancements streamlining the characterization of molecular epigenetic changes associated with cancers have propelled epigenetic drug research and development. This review summarizes the main mechanisms of epigenetic dysregulation and discusses past and present examples of epigenetic inhibitors in cancer diagnosis and treatment, with an emphasis on the development of epigenetic enzyme inhibitors or drugs. In the final part, the prospect of precise diagnosis and treatment is considered based on a better understanding of epigenetic abnormalities in cancer.
Affiliation(s)
- Xinyang Yu
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China
- Hao Zhao
- Department of Spinal Surgery, Yichang Central People's Hospital Affiliated with China Three Gorges University, Yichang, Hubei, 443000, China
- Ruiqi Wang
- Department of Pharmacy, Zhuhai People's Hospital, Zhuhai People's Hospital (Zhuhai Clinical Medical College of Jinan University), Zhuhai, Guangdong, 519000, China
- Yingyin Chen
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China
- Xumei Ouyang
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China
- Wenting Li
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China
- Yihao Sun
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China.
- Anghui Peng
- Guangdong Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai Institute of Translational Medicine, (Zhuhai People's Hospital Zhuhai Clinical Medical College of Jinan University), Zhuhai, 519000, China.
67
Sen SK, Green ED, Hutter CM, Craven M, Ideker T, Di Francesco V. Opportunities for basic, clinical, and bioethics research at the intersection of machine learning and genomics. Cell Genom 2024; 4:100466. [PMID: 38190108 PMCID: PMC10794834 DOI: 10.1016/j.xgen.2023.100466] [Received: 11/04/2022] [Revised: 07/14/2023] [Accepted: 11/20/2023] [Indexed: 01/09/2024]
Abstract
The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) consulted researchers working in these two fields to identify common challenges and gather recommendations for better supporting genomic research that uses ML approaches. The recommendations included increasing the amount and variety of training datasets by integrating genomic data with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.
Affiliation(s)
- Shurjo K Sen
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
- Eric D Green
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Carolyn M Hutter
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mark Craven
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53792, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
- Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Valentina Di Francesco
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
68
Barnett EJ, Onete DG, Salekin A, Faraone SV. Genomic Machine Learning Meta-regression: Insights on Associations of Study Features With Reported Model Performance. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:169-177. [PMID: 38109236 DOI: 10.1109/tcbb.2023.3343808] [Indexed: 12/20/2023]
Abstract
Many studies have been conducted with the goal of correctly predicting diagnostic status of a disorder using the combination of genomic data and machine learning. It is often hard to judge which components of a study led to better results and whether better reported results represent a true improvement or an uncorrected bias inflating performance. We extracted information about the methods used and other differentiating features in genomic machine learning models. We used these features in linear regressions predicting model performance. We tested for univariate and multivariate associations as well as interactions between features. Of the models reviewed, 46% used feature selection methods that can lead to data leakage. Across our models, the number of hyperparameter optimizations reported, data leakage due to feature selection, model type, and modeling an autoimmune disorder were significantly associated with an increase in reported model performance. We found a significant, negative interaction between data leakage and training size. Our results suggest that methods susceptible to data leakage are prevalent among genomic machine learning research, resulting in inflated reported performance. Best practice guidelines that promote the avoidance and recognition of data leakage may help the field avoid biased results.
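The leakage described here typically arises when features are ranked using all samples, including those later used for testing. A minimal sketch of the leakage-free alternative follows, assuming a simple univariate filter (absolute Pearson correlation with the outcome); the filter and the function names are illustrative choices, not methods from the paper. The ranking only ever reads rows listed in `train_idx`, so test-set labels cannot influence which features are kept.

```python
from statistics import mean
from typing import List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(X: List[List[float]], y: List[float], train_idx: List[int], k: int) -> List[int]:
    """Rank features by |r| with the outcome using training rows only; keep the top k."""
    y_tr = [y[i] for i in train_idx]
    scores = []
    for j in range(len(X[0])):
        col = [X[i][j] for i in train_idx]
        scores.append((abs_corr(col, y_tr), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest |r| first, ties by column index
    return [j for _, j in scores[:k]]


# Toy data: feature 1 only "looks" predictive because of the two test rows (4 and 5)
X = [[1, 0], [2, 0], [3, 0], [4, 0], [0, 1], [0, 2]]
y = [1, 2, 3, 4, 10, 20]
print(select_top_k(X, y, [0, 1, 2, 3], 1))          # selection on training rows only
print(select_top_k(X, y, list(range(6)), 1))        # leaky selection on all rows
```

In a cross-validated pipeline the same call would be repeated inside each fold with that fold's training indices, which is the practice the abstract's best-practice recommendation points toward.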
69
Li J, Varghese RS, Ressom HW. RNA-Seq Data Analysis. Methods Mol Biol 2024; 2822:263-290. [PMID: 38907924 DOI: 10.1007/978-1-0716-3918-4_18] [Indexed: 06/24/2024]
Abstract
RNA-Seq data analysis is a vital part of genomics research, turning vast and complex datasets into meaningful biological insights. It is a field marked by rapid evolution and ongoing innovation, necessitating a thorough understanding for anyone seeking to unlock the potential of RNA-Seq data. In this chapter, we describe the intricate landscape of RNA-seq data analysis, presenting a comprehensive pipeline that covers the entirety of this complex process. Beginning with quality control, the chapter underscores the paramount importance of ensuring the integrity of RNA-seq data, as it lays the groundwork for subsequent analyses. Preprocessing is then addressed, in which the raw sequence data undergo necessary modifications and enhancements, setting the stage for the alignment phase. This phase involves mapping the processed sequences to a reference genome, a step pivotal for decoding the origins and functions of these sequences. Moving to the heart of RNA-seq analysis, the chapter then explores differential expression analysis, the process of identifying genes that exhibit varying expression levels across different conditions or sample groups. Recognizing the biological context of these differentially expressed genes is pivotal; hence, the chapter transitions into functional analysis. Here, methods and tools such as Gene Ontology and pathway analyses help contextualize the roles and interactions of the identified genes within broader biological frameworks. However, the chapter does not stop at conventional analysis methods. Embracing the evolving paradigms of data science, it delves into machine learning applications for RNA-seq data, introducing advanced techniques in dimension reduction and both unsupervised and supervised learning. These approaches allow patterns and relationships to be discerned in the data that might be imperceptible through traditional methods.
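Differential expression analysis starts from raw read counts that must be normalized for sequencing depth before samples can be compared. As a toy illustration, not the chapter's pipeline, the sketch below applies counts-per-million (CPM) normalization and computes a per-gene log2 fold change with a pseudocount. Dedicated tools such as DESeq2 and edgeR instead fit statistical count models and test for significance; this only shows the normalization and effect-size arithmetic, and the counts are made up.

```python
import math
from typing import List


def cpm(counts: List[int]) -> List[float]:
    """Counts per million: scale one sample's raw read counts by its library size."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c * 1e6 / total for c in counts]


def log2_fold_change(group_a: List[List[int]], group_b: List[List[int]],
                     pseudocount: float = 1.0) -> List[float]:
    """Per-gene log2((mean CPM in B + pc) / (mean CPM in A + pc)).

    Each inner list holds one sample's counts, with genes in the same order."""
    a = [cpm(s) for s in group_a]
    b = [cpm(s) for s in group_b]
    out = []
    for g in range(len(a[0])):
        mean_a = sum(s[g] for s in a) / len(a)
        mean_b = sum(s[g] for s in b) / len(b)
        out.append(math.log2((mean_b + pseudocount) / (mean_a + pseudocount)))
    return out


# Hypothetical counts for 3 genes: two control and two treated samples
control = [[500, 1200, 30], [450, 1000, 25]]
treated = [[900, 1100, 5], [1000, 1300, 8]]
print([round(v, 2) for v in log2_fold_change(control, treated)])
```

Normalizing before averaging matters: without CPM scaling, a sample that was simply sequenced more deeply would appear to over-express every gene.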
Affiliation(s)
- James Li
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Rency S Varghese
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Habtom W Ressom
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
70
Jia K, Wang Y, Cao Q, Wang Y. Extensive prediction of drug response in mutation-subtype-specific LUAD with machine learning approach. Oncol Res 2023; 32:409-419. [PMID: 38186568 PMCID: PMC10765129 DOI: 10.32604/or.2023.042863] [Received: 06/14/2023] [Accepted: 09/25/2023] [Indexed: 01/09/2024]
Abstract
Background Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer death worldwide. Therapeutic failure in lung adenocarcinoma (LUAD) is heavily influenced by drug resistance. This challenge stems from the diverse cell populations within the tumor, each having unique genetic, epigenetic, and phenotypic profiles. Such variations lead to varied therapeutic responses, thereby contributing to tumor relapse and disease progression. Methods The Genomics of Drug Sensitivity in Cancer (GDSC) database was used in this investigation to obtain the mRNA expression dataset, genomic mutation profile, and drug sensitivity information of NSCLC. Machine learning (ML) methods, including Random Forest (RF), Artificial Neural Network (ANN), and Support Vector Machine (SVM), were used to predict the response status for each compound based on mRNA and mutation features determined using statistical methods. The most suitable method for each drug was proposed by comparing the prediction accuracy of the different ML methods, and the selected mRNA and mutation features were identified as molecular features of the drug-responsive cancer subtype. Finally, the prognostic influence of these molecular features on the mutational subtypes of LUAD was assessed in publicly available datasets. Results Our analyses yielded 1,564 gene features and 45 mutational features for 46 drugs. Applying the ML approach to predict the response to each drug revealed outstanding performance for SVM in predicting Afuresertib response (area under the curve [AUC] 0.875) using CIT, GAS2L3, STAG3L3, ATP2B4-mut, and IL15RA-mut as molecular features. Furthermore, the ANN algorithm using 9 mRNA features demonstrated the highest prediction performance (AUC 0.780) for Gefitinib with CCL23-mut.
Conclusion This work extensively investigated the mRNA and mutation signatures associated with drug response in LUAD using a machine-learning approach and proposed a priority algorithm to predict drug response for different drugs.
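The AUC values quoted above (0.875 and 0.780) summarize how well a classifier ranks responders above non-responders. A minimal, dependency-free sketch of that metric, assuming binary response labels and real-valued model scores; the function name and toy data are hypothetical and not taken from the study, which does not publish its evaluation code in the abstract:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney probability.

    labels: 1 = responder (positive class), 0 = non-responder.
    scores: model scores; higher means "more likely to respond".
    Returns the fraction of (responder, non-responder) pairs ranked correctly,
    counting ties as half-correct.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: four cell lines with known response and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders.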
Affiliation(s)
- Kegang Jia
- Department of Thoracic Surgery, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Yawei Wang
- School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Qi Cao
- Department of Assisted Reproductive Medicine, Sichuan Provincial Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Youyu Wang
- Department of Thoracic Surgery, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
71
Li R, Chen S, Matsumoto H, Gouda M, Gafforov Y, Wang M, Liu Y. Predicting rice diseases using advanced technologies at different scales: present status and future perspectives. ABIOTECH 2023; 4:359-371. [PMID: 38106429 PMCID: PMC10721578 DOI: 10.1007/s42994-023-00126-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/27/2023] [Accepted: 10/30/2023] [Indexed: 12/19/2023]
Abstract
The past few years have witnessed significant progress in emerging disease detection techniques for accurately and rapidly tracking rice diseases and predicting potential solutions. In this review, we focus on image processing techniques that use machine learning (ML) and deep learning (DL) models for rice diseases at multiple scales. We also summarize applications of different detection techniques, including genomic, physiological, and biochemical approaches, and present the state of the art in optical sensing of pathogen-plant interaction phenotypes. This review serves as a resource for researchers seeking effective solutions to the challenges of high-throughput data and model recognition for the early detection of problems affecting rice crops through ML and DL models.
Affiliation(s)
- Ruyue Li
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058 China
- College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058 China
- Sishi Chen
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058 China
- Haruna Matsumoto
- State Key Laboratory of Rice Biology, and Ministry of Agricultural and Rural Affairs Laboratory of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, 310058 China
- Mostafa Gouda
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058 China
- Department of Nutrition and Food Science, National Research Centre, Giza, 12622 Egypt
- Yusufjon Gafforov
- Central Asian Center for Development Studies, New Uzbekistan University, Tashkent, 100000 Uzbekistan
- Mengcen Wang
- State Key Laboratory of Rice Biology, and Ministry of Agricultural and Rural Affairs Laboratory of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, 310058 China
- Global Education Program for AgriScience Frontiers, Graduate School of Agriculture, Hokkaido University, Sapporo, 060-8589 Japan
- Yufei Liu
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, 310058 China
72
Chao H, Zhang S, Hu Y, Ni Q, Xin S, Zhao L, Ivanisenko VA, Orlov YL, Chen M. Integrating omics databases for enhanced crop breeding. J Integr Bioinform 2023; 20:jib-2023-0012. [PMID: 37486120 PMCID: PMC10777369 DOI: 10.1515/jib-2023-0012] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/27/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
Crop plant breeding involves selecting and developing new plant varieties with desirable traits such as increased yield, improved disease resistance, and enhanced nutritional value. With the development of high-throughput technologies such as genomics, transcriptomics, and metabolomics, crop breeding has entered a new era. However, using these technologies effectively requires integrating multi-omics data from different databases. Such integration provides a comprehensive understanding of the biological processes underlying plant traits and their interactions. This review highlights the importance of integrating omics databases in crop plant breeding, discusses available omics data and databases, describes integration challenges, and outlines recent developments and potential benefits. Taken together, the integration of omics databases is a critical step toward enhancing crop plant breeding and improving global food security.
Affiliation(s)
- Haoyu Chao
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Shilong Zhang
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Yueming Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Qingyang Ni
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Saige Xin
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Liang Zhao
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Vladimir A. Ivanisenko
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia
- Yuriy L. Orlov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, Moscow 117198, Russia
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow 119991, Russia
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
73
Gao S, Chen S, Yang M, Wu J, Chen S, Li H. Mining salt stress-related genes in Spartina alterniflora via analyzing co-evolution signal across 365 plant species using phylogenetic profiling. ABIOTECH 2023; 4:291-302. [PMID: 38106430 PMCID: PMC10721760 DOI: 10.1007/s42994-023-00125-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/23/2023] [Indexed: 12/19/2023]
Abstract
With the increasing number of sequenced species, phylogenetic profiling (PP) has become a powerful method for predicting functional genes from co-evolutionary information. However, its potential in plant genomics has not yet been fully explored. Here, we combined machine learning and PP to identify salt stress-related genes in the halophytic grass Spartina alterniflora, using evolutionary information from 365 plant species. Our results showed that genes highly co-evolved with known salt stress-related genes are enriched in biological processes of ion transport, detoxification, and metabolic pathways. Among the ion-transport genes, five identified genes encoding two sodium and three potassium transporters were validated as able to take up Na+. In addition, we identified two orthologs of the trichome-related AtR3-MYB genes, SaCPC1 and SaCPC2, which may be involved in salinity responses. Genes co-evolved with the SaCPCs were enriched in functions related to the circadian rhythm and abiotic stress responses. Overall, this work demonstrates the feasibility of mining salt stress-related genes using evolutionary information, highlighting the potential of PP as a valuable tool for plant functional genomics. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00125-5.
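Phylogenetic profiling represents each gene as a vector recording whether an ortholog is present in each sequenced species; genes whose profiles resemble those of known salt stress-related genes become candidate co-evolving partners. A minimal sketch of that idea, assuming binary presence/absence profiles and Jaccard similarity as the co-evolution score. The study combined profiling with machine learning across 365 species, and the abstract does not specify its scoring; the metric, gene names, and profiles here are illustrative assumptions only:

```python
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_by_coevolution(query: List[int],
                        profiles: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank candidate genes by profile similarity to a known stress-related gene."""
    scored = [(gene, jaccard(query, profile)) for gene, profile in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical profiles across six species (1 = ortholog present, 0 = absent).
known_gene = [1, 1, 0, 1, 0, 1]
candidates = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 0, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}
for gene, score in rank_by_coevolution(known_gene, candidates):
    print(gene, round(score, 2))
```

With hundreds of species the same comparison is simply run over longer vectors; real pipelines often also correct for the species phylogeny, which this sketch ignores.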
Affiliation(s)
- Shang Gao
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
- Shoukun Chen
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
- Hainan Yazhou Bay Seed Laboratory, Sanya, 572024 China
- Maogeng Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
- Key Laboratory of Plant Molecular & Developmental Biology, College of Life Sciences, Yantai University, Yantai, 264005 China
- Jinran Wu
- The Institute for Learning Sciences and Teacher Education, Australian Catholic University, Brisbane, QLD 4001 Australia
- Shihua Chen
- Key Laboratory of Plant Molecular & Developmental Biology, College of Life Sciences, Yantai University, Yantai, 264005 China
- Huihui Li
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024 China
74
Lyu K, Xiao J, Lyu S, Liu R. Comparative Analysis of Transposable Elements in Strawberry Genomes of Different Ploidy Levels. Int J Mol Sci 2023; 24:16935. [PMID: 38069258 PMCID: PMC10706760 DOI: 10.3390/ijms242316935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/25/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023]
Abstract
Transposable elements (TEs) make up a large portion of plant genomes and play a vital role in genome structure, function, and evolution. Cultivated strawberry (Fragaria × ananassa) is one of the most important fruit crops, and its octoploid genome was formed through several rounds of genome duplication from diploid ancestors. Here, we built a pan-genome TE library for the Fragaria genus using ten published strawberry genomes at different ploidy levels, including seven diploids, one tetraploid, and two octoploids, and performed a comparative analysis of TE content in these genomes. TEs comprise 51.83% (F. viridis) to 60.07% (F. nilgerrensis) of the genomes. Long terminal repeat retrotransposons (LTR-RTs) are the predominant TE type in the Fragaria genomes (20.16% to 34.94%), particularly in F. iinumae (34.94%). Estimating TE content and LTR-RT insertion times revealed that species-specific TEs have shaped each strawberry genome. Additionally, the copy number of different LTR-RT families inserted in the last one million years reflects the genetic distance between Fragaria species. Comparing the cultivated strawberry subgenomes with extant diploid species showed that F. vesca and F. iinumae, but not F. viridis, are likely the diploid ancestors of the cultivated strawberry. These findings provide new insights into TE variation in strawberry genomes and its role in strawberry genome evolution.
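LTR-RT insertion times are commonly estimated from the divergence between an element's two LTRs, which are identical at the moment of insertion, using T = K / (2r), where K is the divergence between the LTRs and r is the substitution rate per site per year. A minimal sketch, assuming pre-aligned, gap-free LTRs and a Jukes-Cantor correction; the abstract does not state which divergence model or rate the authors used, so the rate below is only a placeholder and the sequences are hypothetical:

```python
import math


def ltr_insertion_time(ltr5: str, ltr3: str, rate: float) -> float:
    """Estimate LTR-RT insertion time (years) as T = K / (2 * rate).

    K is the Jukes-Cantor-corrected divergence between the 5' and 3' LTRs;
    rate is the substitution rate per site per year (supplied by the caller).
    Assumes the LTRs are already aligned, equal-length, and gap-free.
    """
    if len(ltr5) != len(ltr3) or not ltr5:
        raise ValueError("LTRs must be aligned and of equal, non-zero length")
    p = sum(a != b for a, b in zip(ltr5, ltr3)) / len(ltr5)  # raw p-distance
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    k = -0.75 * math.log(1 - 4 * p / 3)  # Jukes-Cantor corrected divergence
    return k / (2 * rate)


# Hypothetical aligned LTR pair (100 sites, one mismatch) and placeholder rate.
five_prime = "ACGTACGTAC" * 10
three_prime = five_prime[:-1] + "G"
print(f"{ltr_insertion_time(five_prime, three_prime, 1.3e-8):.3g} years")
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion, so their divergence grows at twice the per-lineage rate.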
Affiliation(s)
- Keliang Lyu
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jiajing Xiao
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Shiheng Lyu
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Renyi Liu
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
75
Toussaint PA, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A. Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform 2023; 25:bbad453. [PMID: 38113073 PMCID: PMC10729786 DOI: 10.1093/bib/bbad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 07/28/2023] [Accepted: 11/08/2023] [Indexed: 12/21/2023]
Abstract
Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared within their respective research communities. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, bioRxiv, medRxiv, and arXiv. Based on keywording, we developed a coding scheme with 10 facets covering the studies' AI methods, explainability methods, and omics data. Our mapping study included 405 papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic, or metabolomic data by means of neural networks, tree-based methods, statistical methods, and other AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we derive eight research directions, discuss their potential for the field, and provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
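Feature relevance, the most frequently used post-hoc explainability method among the mapped papers, scores each input by how strongly a model depends on it. One widely used variant is permutation importance: shuffle one feature column and measure how much the model's accuracy drops. A minimal sketch, assuming a toy classifier and hypothetical data; the mapping study surveys many feature-relevance methods and does not prescribe this one:

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(model: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], int], X: Sequence[Row],
                           y: Sequence[int], n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical transparent classifier: uses feature 0 only, ignores feature 1."""
    return int(row[0] > 0.5)


X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 2.0], [0.7, 4.0], [0.3, 0.5]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores feature 1, shuffling it never changes a prediction and its importance is exactly zero, while shuffling feature 0 lowers accuracy.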
Affiliation(s)
- Philipp A Toussaint
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- HIDSS4Health – Helmholtz Information and Data Science School for Health, Karlsruhe, Heidelberg, Germany
- Florian Leiser
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Scott Thiebes
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Matthias Schlesner
- Biomedical Informatics, Data Mining and Data Analytics, Faculty of Applied Computer Science and Medical Faculty, University of Augsburg, Augsburg, Germany
- Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Translational Oncology, National Center for Tumor Diseases, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ali Sunyaev
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
76
Sun Y, Zhao Z, Tong H, Sun B, Liu Y, Ren N, You S. Machine Learning Models for Inverse Design of the Electrochemical Oxidation Process for Water Purification. Environ Sci Technol 2023; 57:17990-18000. [PMID: 37189261 DOI: 10.1021/acs.est.2c08771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
In this study, a machine learning (ML) framework is developed for target-oriented inverse design of the electrochemical oxidation (EO) process for water purification. Trained on data describing pollutant characteristics and reaction conditions, the XGBoost model performed best in predicting the reaction rate (k), with an external-validation R² (R²ext) of 0.84 and RMSE (RMSEext) of 0.79. Based on 315 data points collected from the literature, current density, pollutant concentration, and gap energy (Egap) were identified as the most influential parameters for the inverse design of the EO process. In particular, adding reaction conditions as model input features provided more information and enlarged the data set, improving model accuracy. Feature importance was analyzed with Shapley additive explanations (SHAP) to reveal data patterns and interpret the features. The ML-based inverse design was then generalized to a random case to tailor the optimum conditions, with phenol and 2,4-dichlorophenol (2,4-DCP) serving as model pollutants. Experimental verification showed that the predicted k values were close to the experimental values, with relative errors below 5%. This study marks a shift from the conventional trial-and-error mode to a data-driven mode for the research and development of the EO process, offering a time-saving, labor-efficient, and environmentally friendly target-oriented strategy that makes electrochemical water purification more efficient, economical, and sustainable in the context of global carbon peaking and carbon neutrality.
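The external-validation metrics (R²ext, RMSEext) and the below-5% relative-error check reported above follow standard definitions. A minimal sketch, assuming the metrics are computed on a held-out set of rate values k; the numbers are hypothetical, and the XGBoost model and SHAP analysis themselves are not reproduced:

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_error(predicted: float, experimental: float) -> float:
    """|predicted - experimental| / |experimental|."""
    return abs(predicted - experimental) / abs(experimental)


# Hypothetical external-test values of k (not data from the study).
k_exp = [1.0, 2.0, 3.0, 4.0]
k_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(k_exp, k_pred), 3), round(rmse(k_exp, k_pred), 3))
print(relative_error(0.0206, 0.0200) < 0.05)  # 3% relative error passes a 5% check
```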
Affiliation(s)
- Ye Sun
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Zhiyuan Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Hailong Tong
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Baiming Sun
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Yanbiao Liu
- College of Environmental Science and Engineering, Textile Pollution Controlling Engineering Center of the Ministry of Ecology and Environment, Donghua University, Shanghai 201620, China
- Nanqi Ren
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Shijie You
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
77
Arjmandi M, Fattahi M, Motevassel M, Rezaveisi H. Evaluating algorithms of decision tree, support vector machine and regression for anode side catalyst data in proton exchange membrane water electrolysis. Sci Rep 2023; 13:20309. [PMID: 37985795 PMCID: PMC10662483 DOI: 10.1038/s41598-023-47174-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/09/2023] [Indexed: 11/22/2023]
Abstract
Because chemical compounds and fossil fuels cause a range of problems with wide environmental impact, including acid rain and polar ice melting, much research has focused on replacing nonrenewable energy sources with renewable ones to produce clean fuels. Among these, hydrogen is a quintessential clean fuel that has attracted substantial attention because it can be produced using electricity from low-carbon sources such as nuclear and solar energy. This is achieved with a proton exchange membrane water electrolysis (PEMWE) system, widely recognized as one of the most efficient and economically viable technologies for splitting water into hydrogen and oxygen. In this study, the key parameters affecting the anode-side catalyst in PEMWE were identified and analyzed with machine-learning (ML) algorithms through a data science (DS) procedure. Among the ML models compared, Decision Tree models with maximum depths of 3 and 4 emerged as the best choices, reaching 100% accuracy on both Dataset 1 and Dataset 2. The Support Vector Machine (SVM) model also improved, with accuracy rising from 0.79 on Dataset 1 to 0.82 on Dataset 2, whereas the accuracy of the remaining models decreased. This underscores the pivotal role of the data generation process in making the models more faithful to real-world scenarios.
Affiliation(s)
- Mahdi Arjmandi
- Chemical Engineering Department, Abadan Faculty of Petroleum Engineering, Petroleum University of Technology, Abadan, Iran
- Moslem Fattahi
- Chemical Engineering Department, Abadan Faculty of Petroleum Engineering, Petroleum University of Technology, Abadan, Iran
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada
- Mohsen Motevassel
- Chemical Engineering Department, Abadan Faculty of Petroleum Engineering, Petroleum University of Technology, Abadan, Iran
- Hosna Rezaveisi
- Chemical Engineering Department, Faculty of Engineering, Razi University, Kermanshah, Iran
78
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
The data explosion driven by advances in genomic research, such as high-throughput sequencing, constantly challenges the conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect deep learning to act as a superhuman intelligence that interprets the genome beyond the limits of our current knowledge. A powerful deep learning model should rely on the insightful use of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective, so as to match each task with an appropriate deep learning architecture, and we comment on practical considerations in developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various areas of genomic research and point out current challenges and potential research directions for future genomics applications. We believe that the combined use of ever-growing, diverse data and the rapid iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
- Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
- School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
79
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/04/2023]
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches to Schizosaccharomyces pombe to obtain broad clues to protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses revealed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods that exploit protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed strong concordance with existing PomBase GO annotations and protein networks, and integrated analyses revealed 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource for uncovering new protein functions.
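Phenotype-correlation networks link genes whose deletion mutants show similar fitness profiles across conditions, so a poorly characterized gene inherits functional hints from correlated, well-studied genes ("guilt by association"). A minimal sketch, assuming Pearson correlation over per-condition fitness scores and an arbitrary cut-off of 0.8; the gene names, fitness values, and threshold are hypothetical, not the study's parameters:

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length fitness profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)


def guilt_by_association(query: List[float], known: Dict[str, List[float]],
                         threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return known genes whose profiles correlate with the query at or above threshold."""
    scored = [(gene, pearson(query, profile)) for gene, profile in known.items()]
    hits = [(gene, r) for gene, r in scored if r >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fitness scores of deletion mutants across five conditions.
unstudied = [1.0, 0.2, 0.9, 0.1, 1.0]
known = {
    "known_gene_1": [0.9, 0.3, 1.0, 0.2, 0.95],
    "known_gene_2": [0.2, 1.0, 0.1, 0.9, 0.3],
}
for gene, r in guilt_by_association(unstudied, known):
    print(f"{gene}: r = {r:.2f}")
```

Only the positively correlated gene passes the cut-off; an unstudied gene would then be provisionally assigned the functions annotated to its correlated neighbors.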
Affiliation(s)
- María Rodríguez-López
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- University of Bristol, Bristol, United Kingdom
- Harry Scholes
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
80
Wu K, Xu C, Li T, Ma H, Gong J, Li X, Sun X, Hu X. Application of Nanotechnology in Plant Genetic Engineering. Int J Mol Sci 2023; 24:14836. [PMID: 37834283 PMCID: PMC10573821 DOI: 10.3390/ijms241914836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/20/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023]
Abstract
The ever-increasing demand for food from a growing global population requires advanced agricultural practices to improve grain yield, increase crop resilience to unpredictable extreme weather, and reduce production losses caused by insects and pathogens. To meet these needs, genome engineering technology has been applied to various plant species, and several generations of genome engineering methods have been developed to date. Among these methods, the new mainstream technology is clustered regularly interspaced short palindromic repeats (CRISPR) with associated nucleases. One of the most important steps in genome engineering is delivering gene cassettes into plant cells. Conventional delivery systems have several shortcomings, including labor- and time-consuming procedures, potential tissue damage, and low transformation efficiency. Taking advantage of nanotechnology, nanoparticle-mediated gene delivery offers technical advantages over conventional approaches owing to its high efficiency and adaptability across plant species. In this review, we summarize the evolution of plant biomolecular delivery methods and discuss their characteristics and limitations. We focus on cutting-edge nanotechnology-based delivery systems and review different types of nanoparticles, the preparation of nanomaterials, the mechanisms of nanoparticle transport, and advanced applications in plant genome engineering. On the basis of established methods, we conclude that combining genome editing, nanoparticle-mediated gene transformation, and de novo regeneration technologies can efficiently accelerate crop improvement in the future.
Affiliation(s)
- Kexin Wu
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Changbin Xu
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Tong Li
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Haijie Ma
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Jinli Gong
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Xiaolong Li
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Xuepeng Sun
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
- Xiaoli Hu
- Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
- Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Hangzhou 311300, China
81
Chalka A, Dallman TJ, Vohra P, Stevens MP, Gally DL. The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA. Microb Genom 2023; 9:001116. [PMID: 37843883 PMCID: PMC10634445 DOI: 10.1099/mgen.0.001116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 10/02/2023] [Indexed: 10/17/2023]
Abstract
Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and compared them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. 
Notably, IGRs were the best-performing feature across multiple models, which may be because IGRs act both as a representation of their flanking genes, equivalent to PVs, and as a record of genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of human STm infections in the USA originate from bovine sources, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models using IGRs and PVs as features compared with SNP-based and core-genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
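To make "intergenic regions as features" concrete, the minimal sketch below derives IGRs from gene coordinates on one contig. It then encodes each isolate's IGR variants as a binary presence/absence vector, the kind of table a Random Forest could be trained on. This is an illustration under stated assumptions, not the authors' pipeline. The function names, the 1-based inclusive coordinates, the restriction to gaps flanked by genes on both sides, and the allele labels are all hypothetical. The classifier itself is omitted.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # assumed convention: 1-based, inclusive (start, end)


def intergenic_regions(genes: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive genes on one contig.

    Only gaps flanked by a gene on both sides are reported; overlapping
    genes are merged so that they never produce a gap.
    """
    ordered = sorted(genes)
    if not ordered:
        return []
    regions: List[Interval] = []
    cursor = ordered[0][1] + 1  # first position after the covered stretch so far
    for start, end in ordered[1:]:
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    return regions


def presence_absence_matrix(
    isolates: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each isolate's set of (hypothetical) IGR allele IDs as a 0/1 vector."""
    columns = sorted(set().union(*isolates.values()))
    rows = {
        name: [1 if allele in alleles else 0 for allele in columns]
        for name, alleles in isolates.items()
    }
    return columns, rows


if __name__ == "__main__":
    genes = [(1, 100), (150, 300), (280, 400), (410, 500)]
    print(intergenic_regions(genes))  # [(101, 149), (401, 409)]
    cols, rows = presence_absence_matrix(
        {"isolate_a": {"igr1_v1", "igr2_v1"}, "isolate_b": {"igr1_v2"}}
    )
    print(cols)  # ['igr1_v1', 'igr1_v2', 'igr2_v1']
    print(rows["isolate_a"], rows["isolate_b"])  # [1, 0, 1] [0, 1, 0]
```

Each column of the resulting matrix stands for one IGR sequence variant. Because variants of the same region become separate columns, promoter-level changes are kept alongside the gene-level signal.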
Affiliation(s)
- Antonia Chalka
- The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Tim J. Dallman
- Institute for Risk Assessment Sciences (IRAS), University of Utrecht, Heidelberglaan, Utrecht, Netherlands
- Prerna Vohra
- The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Mark P. Stevens
- The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- David L. Gally
- The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
82
Mizrahi L, Choudhary A, Ofer P, Goldberg G, Milanesi E, Kelsoe JR, Gurwitz D, Alda M, Gage FH, Stern S. Immunoglobulin genes expressed in lymphoblastoid cell lines discern and predict lithium response in bipolar disorder patients. Mol Psychiatry 2023; 28:4280-4293. [PMID: 37488168 PMCID: PMC10827667 DOI: 10.1038/s41380-023-02183-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 07/03/2023] [Accepted: 07/06/2023] [Indexed: 07/26/2023]
Abstract
Bipolar disorder (BD) is a neuropsychiatric mood disorder manifested by recurrent episodes of mania and depression. More than half of BD patients do not respond to lithium, the first-line treatment drug, which complicates the clinical management of BD. Given its unknown etiology, it is pertinent to understand the genetic signatures that lead to variability in lithium response. In lymphoblastoid cell lines (LCLs) from 10 controls and 19 BD patients, we discovered a set of differentially expressed genes (DEGs), belonging mainly to the immunoglobulin gene family, that could serve as potential biomarkers to diagnose and treat BD. Importantly, we trained machine learning algorithms on our datasets that predicted the lithium response of BD subtypes with minimal errors, even when applied to a different cohort of 24 BD patients acquired by a different laboratory. This proves the scalability of our methodology for predicting lithium response in BD and for making prompt and suitable decisions on therapeutic interventions.
Affiliation(s)
- Liron Mizrahi
- Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, Haifa, 3498838, Israel
- Ashwani Choudhary
- Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, Haifa, 3498838, Israel
- Polina Ofer
- Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, Haifa, 3498838, Israel
- Gabriela Goldberg
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Elena Milanesi
- Victor Babes National Institute of Pathology, Bucharest, 050096, Romania
- John R Kelsoe
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, 92093, USA
- David Gurwitz
- Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, 69978, Israel
- Martin Alda
- Department of Psychiatry, Dalhousie University, Halifax, NS, B3H 2E2, Canada
- Fred H Gage
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Shani Stern
- Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, Haifa, 3498838, Israel.
83
He M, Tang B, Xiao Y, Tang S. Transmission dynamics informed neural network with application to COVID-19 infections. Comput Biol Med 2023; 165:107431. [PMID: 37696183 DOI: 10.1016/j.compbiomed.2023.107431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/26/2023] [Accepted: 08/28/2023] [Indexed: 09/13/2023]
Abstract
Since the end of 2019, COVID-19 has surged repeatedly, with most countries/territories experiencing multiple waves, and mechanism-based epidemic models have played important roles in understanding the transmission mechanisms behind these waves. However, capturing temporal changes in the transmissibility of COVID-19 across multiple waves remains an ill-posed problem for traditional mechanism-based epidemic compartment models. The transmission rate is usually assumed to be a specific piecewise function, and more parameters must be added to the model once multiple epidemic waves are involved, which poses a huge challenge for parameter estimation. Meanwhile, data-driven deep neural networks fail to discover the driving factors of repeated outbreaks and lack interpretability. In this study, we aim to develop a data-driven method that projects time-dependent parameters while retaining the advantages of mechanism-based models. To this end, we propose a transmission dynamics informed neural network (TDINN) that encodes the SEIRD compartment model into deep neural networks. We show that the proposed TDINN algorithm performs very well when fitting COVID-19 epidemic data with multiple waves, taking as examples the epidemics in the United States, Italy, South Africa, and Kenya, and several outbreaks of the Omicron variant in China. In addition, numerical simulations show that the trained TDINN can also serve as a predictive model to capture the future development of the COVID-19 epidemic. We find that the transmission rate inferred by the TDINN fluctuates frequently, and that a feedback loop between epidemic shifts and changes in transmissibility drives the occurrence of multiple waves. We observe a long delay in response to the implementation of control interventions in the four countries, whereas in the outbreaks in China the transmission rate usually declines as soon as control interventions are implemented.
Further simulations show that a 17-day delay in responding to the implementation of control interventions led to a roughly four-fold increase in daily reported cases in one epidemic wave in Italy. This suggests that a rapid response to policies that strengthen control interventions can be effective in flattening the epidemic curve or avoiding subsequent epidemic waves. We observe that the transmission rate in the outbreaks in China was already decreasing before control interventions were strengthened, providing evidence that a growing epidemic can drive voluntary behavioural changes to protect against infection.
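The TDINN encodes an SEIRD compartment model whose transmission rate varies over time. The sketch below illustrates only that mechanistic component, with no neural network. It integrates a textbook SEIRD system with forward Euler steps and a user-supplied transmission rate β(t). The equations are the standard SEIRD form and are assumed, not copied from the paper. The function names, the parameter values, and the step function used to mimic an intervention delay are all hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float, float]  # (S, E, I, R, D)


def simulate_seird(
    beta: Callable[[float], float],
    sigma: float,
    gamma: float,
    mu: float,
    y0: State,
    days: int,
    steps_per_day: int = 10,
) -> List[State]:
    """Integrate a standard SEIRD model with forward Euler; return one state per day.

    dS/dt = -beta(t) S I / N
    dE/dt =  beta(t) S I / N - sigma E
    dI/dt =  sigma E - (gamma + mu) I
    dR/dt =  gamma I
    dD/dt =  mu I
    N is the constant total population, deaths included.
    """
    s, e, i, r, d = y0
    n = s + e + i + r + d
    dt = 1.0 / steps_per_day
    trajectory: List[State] = [(s, e, i, r, d)]
    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * dt
            infections = beta(t) * s * i / n  # new exposures per unit time
            ds = -infections
            de = infections - sigma * e
            di = sigma * e - (gamma + mu) * i
            dr = gamma * i
            dd = mu * i
            s, e, i, r, d = (s + dt * ds, e + dt * de, i + dt * di,
                             r + dt * dr, d + dt * dd)
        trajectory.append((s, e, i, r, d))
    return trajectory


def stepped_beta(before: float, after: float, switch_day: float) -> Callable[[float], float]:
    """Hypothetical intervention: transmission drops from `before` to `after` on switch_day."""
    return lambda t: before if t < switch_day else after


if __name__ == "__main__":
    y0 = (1_000_000.0 - 10.0, 0.0, 10.0, 0.0, 0.0)
    early = simulate_seird(stepped_beta(0.5, 0.1, 20), 0.2, 0.1, 0.01, y0, days=150)
    late = simulate_seird(stepped_beta(0.5, 0.1, 37), 0.2, 0.1, 0.01, y0, days=150)
    n = sum(y0)
    print("ever infected, prompt response:", round(n - early[-1][0]))
    print("ever infected, 17-day delay:  ", round(n - late[-1][0]))
```

Comparing the two runs mirrors, in a toy setting, the paper's point that a delayed response inflates case counts. The printed numbers are illustrative and are not the study's results. In the TDINN, β(t) would be inferred by the network rather than fixed as a step function.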
Affiliation(s)
- Mengqi He
- School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China
- Biao Tang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China.
- Yanni Xiao
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Sanyi Tang
- School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China
84
Nazir A, Memon Z, Sadiq T, Rahman H, Khan IU. A Novel Feature-Selection Algorithm in IoT Networks for Intrusion Detection. Sensors (Basel) 2023; 23:8153. [PMID: 37836983 PMCID: PMC10575335 DOI: 10.3390/s23198153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/19/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]
Abstract
The Internet of Things (IoT) and network-enabled smart devices are crucial to the digitally interconnected society of the present day. However, increased reliance on IoT devices increases their susceptibility to malicious activity within network traffic, posing significant challenges to cybersecurity. As a result, both system administrators and end users are negatively affected by these malevolent behaviours. Intrusion-detection systems (IDSs) are commonly deployed as a defence mechanism against cyber attacks to mitigate such risks. An IDS plays a crucial role in identifying and preventing cyber hazards within IoT networks. However, developing an efficient and rapid IDS for detecting cyber attacks remains a challenging area of research. Moreover, IDS datasets contain many features, so feature selection (FS) is required to design an effective and timely IDS. The FS procedure seeks to eliminate irrelevant and redundant features from large IDS datasets, thereby improving the overall performance of the intrusion-detection system. In this paper, we propose a hybrid wrapper-based feature-selection algorithm based on the concepts of a Cellular Automata (CA) engine and Tabu Search (TS)-based aspiration criteria. We used a Random Forest (RF) ensemble learning classifier to evaluate the fitness of the selected features. The proposed algorithm, CAT-S, was tested on the TON_IoT dataset. The simulation results demonstrate that CAT-S enhances classification accuracy while simultaneously reducing the number of features and the false positive rate.
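The Tabu Search part of a wrapper feature selector can be illustrated with a small, deterministic sketch. The search explores single-feature flips and forbids recently flipped features for `tenure` iterations. An aspiration criterion overrides the tabu status when a move beats the best subset found so far. This is a generic tabu search, not CAT-S: the Cellular Automata engine is omitted. The toy `fitness` function stands in, hypothetically, for the Random Forest evaluation the paper uses, and all names here are illustrative.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet, Tuple

Subset = FrozenSet[int]


def tabu_feature_selection(
    n_features: int,
    fitness: Callable[[Subset], float],
    iterations: int = 50,
    tenure: int = 3,
) -> Tuple[Subset, float]:
    """Wrapper-style feature selection by tabu search over single-feature flips."""
    current: Subset = frozenset()
    best, best_score = current, fitness(current)
    tabu: Deque[int] = deque(maxlen=tenure)  # recently flipped feature indices
    for _ in range(iterations):
        candidates = []
        for j in range(n_features):
            neighbour = current ^ {j}  # add or remove feature j
            score = fitness(neighbour)
            # Aspiration criterion: a tabu move is allowed if it beats the best so far.
            if j not in tabu or score > best_score:
                candidates.append((score, -j, neighbour))
        if not candidates:
            break
        # Best-scoring admissible move; ties go to the lowest feature index.
        score, neg_j, neighbour = max(candidates, key=lambda c: (c[0], c[1]))
        current = neighbour
        tabu.append(-neg_j)
        if score > best_score:
            best, best_score = neighbour, score
    return best, best_score


if __name__ == "__main__":
    informative = {0, 2}  # hypothetical "useful" features

    def toy_fitness(subset: Subset) -> float:
        # Reward informative features and penalise every extra feature.
        return len(subset & informative) - 0.3 * len(subset - informative)

    print(tabu_feature_selection(5, toy_fitness))  # (frozenset({0, 2}), 2.0)
```

The search keeps moving after reaching the optimum, because it always takes the best admissible move. The tabu list stops it from immediately undoing recent flips.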
Affiliation(s)
- Anjum Nazir
- Department of Computer Science, National University of Computer and Emerging Sciences (NUCES—FAST), Karachi 75123, Pakistan
- Zulfiqar Memon
- Department of Computer Science, National University of Computer and Emerging Sciences (NUCES—FAST), Karachi 75123, Pakistan
- Touseef Sadiq
- Centre for Artificial Intelligence Research, Department of Information and Communication Technology, University of Agder, Jon Lilletuns vei 9, 4879 Grimstad, Norway
- Hameedur Rahman
- Department of Computer Games Development, Faculty of Computing & AI, Air University, E9, Islamabad 44400, Pakistan
- Inam Ullah Khan
- Department of Electronic Engineering, School of Engineering & Applied Sciences (SEAS), Isra University, Islamabad Campus, Islamabad 44400, Pakistan
85
Rehman S, Ahmad Z, Ramakrishnan M, Kalendar R, Zhuge Q. Regulation of plant epigenetic memory in response to cold and heat stress: towards climate resilient agriculture. Funct Integr Genomics 2023; 23:298. [PMID: 37700098 DOI: 10.1007/s10142-023-01219-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/03/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 09/14/2023]
Abstract
Plants have evolved to adapt to and grow in hot and cold climatic conditions, and some also adapt to daily and seasonal temperature changes. Epigenetic modifications play an important role in regulating plant tolerance under such conditions. DNA methylation and post-translational modifications of histone proteins influence gene expression during plant developmental stages and under stress conditions, including cold and heat stress. While short-term modifications are common, some modifications may persist and result in stress memory that can be inherited by subsequent generations. Understanding the mechanisms by which epigenomes respond to stress, and the factors that trigger stress memory, is crucial for developing climate-resilient agriculture, but such an integrated view is currently limited. This review focuses on plant epigenetic stress memory during cold and heat stress. It also discusses the potential of machine learning to modify stress memory through epigenetics to develop climate-resilient crops.
Affiliation(s)
- Shamsur Rehman
- Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China
- Zishan Ahmad
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
- Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Muthusamy Ramakrishnan
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
- Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Ruslan Kalendar
- Helsinki Institute of Life Science HiLIFE, Biocenter 3, Viikinkaari 1, FI-00014 University of Helsinki, Helsinki, Finland.
- Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan.
- Qiang Zhuge
- Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China.
86
Chang Q, Yan Z, Zhou M, Qu H, He X, Zhang H, Baskaran L, Al'Aref S, Li H, Zhang S, Metaxas DN. Mining multi-center heterogeneous medical data with distributed synthetic learning. Nat Commun 2023; 14:5510. [PMID: 37679325 PMCID: PMC10484909 DOI: 10.1038/s41467-023-40687-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 08/03/2023] [Indexed: 09/09/2023]
Abstract
Overcoming barriers to the use of multi-center data for medical analytics is challenging because of privacy protection requirements and data heterogeneity in the healthcare system. In this study, we propose the Distributed Synthetic Learning (DSL) architecture to learn across multiple medical centers while ensuring the protection of sensitive personal information. DSL enables the building of a homogeneous dataset of entirely synthetic medical images via a form of GAN-based synthetic learning. The proposed DSL architecture has the following key functionalities: multi-modality learning, missing-modality completion learning, and continual learning. We systematically evaluate the performance of DSL on different medical applications using cardiac computed tomography angiography (CTA), brain tumor MRI, and histopathology nuclei datasets. Extensive experiments, assessed with an ideal synthetic quality metric called Dist-FID, demonstrate the superior performance of DSL as a provider of high-quality synthetic medical images. We show that DSL can be adapted to heterogeneous data and remarkably outperforms the real misaligned modalities segmentation model by 55% and the temporal datasets segmentation model by 8%.
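The abstract does not define Dist-FID. For reference, its name alludes to the standard Fréchet Inception Distance (FID). FID compares Gaussian fits to the feature embeddings of real (r) and generated (g) images, with means μ and covariances Σ. How Dist-FID adapts this to the distributed setting is not stated here, so read the formula below as background rather than as the paper's metric. Lower values mean that the feature statistics of the synthetic images are closer to those of the real ones.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```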
Affiliation(s)
- Qi Chang
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Mu Zhou
- SenseBrain Research, Princeton, NJ, USA
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Hui Qu
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Xiaoxiao He
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Han Zhang
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Lohendran Baskaran
- Department of Cardiovascular Medicine, National Heart Centre Singapore, and Duke-National University Of Singapore, Singapore, Singapore
- Subhi Al'Aref
- Department of Medicine, Division of Cardiology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Hongsheng Li
- Chinese University of Hong Kong, Hong Kong SAR, China.
- Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China.
- Shaoting Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai, China.
- Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China.
- SenseTime, Shanghai, China.
- Dimitris N Metaxas
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA.
87
Lakiotaki K, Papadovasilakis Z, Lagani V, Fafalios S, Charonyktakis P, Tsagris M, Tsamardinos I. Automated machine learning for genome wide association studies. Bioinformatics 2023; 39:btad545. [PMID: 37672022 PMCID: PMC10562960 DOI: 10.1093/bioinformatics/btad545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 06/29/2023] [Accepted: 09/05/2023] [Indexed: 09/07/2023]
Abstract
MOTIVATION: Genome-wide association studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice.
RESULTS: We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance than standard GWAS methods and computes an individual risk prediction score. It also generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures.
AVAILABILITY AND IMPLEMENTATION: Code for this study is available at https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are available at https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data for studying population structure can be found at https://easygwas.ethz.ch/data/public/dataset/view/1/.
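An estimate of out-of-sample predictive power is typically obtained by resampling. The sketch below is a generic example of that idea, not the JADBio protocol, whose details the abstract does not give. It estimates accuracy by shuffled k-fold cross-validation, so that each sample is predicted exactly once by a model fitted without it. The helper names and the toy genotype-dosage data are hypothetical. An AutoML system that selects among many configurations also has to correct for the resulting optimism, which this sketch does not address.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[List[int]], int]


def k_fold_accuracy(
    features: Sequence[List[int]],
    labels: Sequence[int],
    fit: Callable[[List[List[int]], List[int]], Model],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Shuffled k-fold cross-validated accuracy; each sample is tested exactly once."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        train = [j for j in order if j not in held_out]
        predict = fit([features[j] for j in train], [labels[j] for j in train])
        correct += sum(predict(features[j]) == labels[j] for j in test_fold)
    return correct / len(labels)


def fit_majority(xs: List[List[int]], ys: List[int]) -> Model:
    """Baseline: always predict the most frequent training label."""
    majority = max(set(ys), key=ys.count)
    return lambda x: majority


def fit_threshold(xs: List[List[int]], ys: List[int]) -> Model:
    """Hypothetical single-SNP rule: learn the dosage cut-off (0, 1 or 2) for 'case'."""
    best_t = max((0, 1, 2),
                 key=lambda t: sum((x[0] >= t) == bool(y) for x, y in zip(xs, ys)))
    return lambda x: int(x[0] >= best_t)


if __name__ == "__main__":
    # Toy data: dosage of one SNP per person; label 1 = case, 0 = control.
    dosages = [[2], [1], [2], [1], [2], [0], [0], [0], [0], [0]]
    status = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(k_fold_accuracy(dosages, status, fit_threshold))  # 1.0
    print(k_fold_accuracy(dosages, status, fit_majority))
```

Running the majority-class baseline next to the candidate model is a simple sanity check on any cross-validated estimate.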
Affiliation(s)
- Zaharias Papadovasilakis
- Department of Computer Science, University of Crete, Heraklion, Greece
- JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
- Laboratory of Immune Regulation and Tolerance, School of Medicine, University of Crete, Heraklion, Greece
- Vincenzo Lagani
- Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology KAUST, Thuwal 23952, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal 23952, Saudi Arabia
- Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Stefanos Fafalios
- Department of Computer Science, University of Crete, Heraklion, Greece
- JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
- Paulos Charonyktakis
- JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
- Michail Tsagris
- Department of Computer Science, University of Crete, Heraklion, Greece
- Department of Economics, University of Crete, Heraklion, Greece
- Ioannis Tsamardinos
- Department of Computer Science, University of Crete, Heraklion, Greece
- JADBio Gnosis DA S.A., Science and Technology Park of Crete, GR-70013 Heraklion, Greece
88
Bustos-Aibar M, Aguilera CM, Alcalá-Fdez J, Ruiz-Ojeda FJ, Plaza-Díaz J, Plaza-Florido A, Tofe I, Gil-Campos M, Gacto MJ, Anguita-Ruiz A. Shared gene expression signatures between visceral adipose and skeletal muscle tissues are associated with cardiometabolic traits in children with obesity. Comput Biol Med 2023; 163:107085. [PMID: 37399741 DOI: 10.1016/j.compbiomed.2023.107085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 04/28/2023] [Accepted: 05/27/2023] [Indexed: 07/05/2023]
Abstract
Obesity in children is related to the development of cardiometabolic complications later in life, and molecular changes in visceral adipose tissue (VAT) and skeletal muscle tissue (SMT) have been shown to be fundamental to this process. The aim of this study is to unveil the gene expression architecture of both tissues in a cohort of Spanish boys with obesity, using a clustering method known as weighted gene co-expression network analysis. For this purpose, we followed a multi-objective analytic pipeline consisting of three main approaches:
- identification of gene co-expression clusters associated with childhood obesity, individually in VAT and SMT (intra-tissue, approach I);
- identification of gene co-expression clusters associated with obesity-related metabolic alterations, individually in VAT and SMT (intra-tissue, approach II); and
- identification of gene co-expression clusters associated with obesity-related metabolic alterations simultaneously in VAT and SMT (inter-tissue, approach III).
In both tissues, we identified independent and inter-tissue gene co-expression signatures associated with obesity and cardiovascular risk, some of which passed multiple-testing correction filters. Within these signatures, we identified central hub genes (e.g., NDUFB8, GUCY1B1, KCNMA1, NPR2, PPP3CC) participating in relevant metabolic pathways that passed multiple-testing correction filters. We also identified the central hub genes PIK3R2, PPP3C and PTPN5, which are associated with MAPK signaling and insulin resistance terms. This is the first time that these genes have been associated with childhood obesity in both tissues. Therefore, they could be potential novel molecular targets for drugs and health interventions, opening new lines of research on personalized care in this pathology. This work generates interesting hypotheses about the transcriptomic alterations underlying metabolic health alterations in obesity in the pediatric population.
Affiliation(s)
- Mireia Bustos-Aibar
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, University of Granada, 18071, Granada, Spain.
- Concepción M Aguilera
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, University of Granada, 18071, Granada, Spain; Biomedical Research Networking Center for Physiopathology of Obesity and Nutrition, Carlos III Health Institute, 28029, Madrid, Spain.
- Jesús Alcalá-Fdez
- Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071, Granada, Spain.
- Francisco J Ruiz-Ojeda
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, University of Granada, 18071, Granada, Spain; RG Adipocytes and Metabolism, Institute for Diabetes and Obesity, Helmholtz Diabetes Center at the Helmholtz Zentrum München, Neuherberg, 85764, Munich, Germany.
- Julio Plaza-Díaz
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, University of Granada, 18071, Granada, Spain; Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON K1H 8L1, Ontario, Canada.
- Abel Plaza-Florido
- PROmoting FITness and Health through physical activity research group, Sport and Health University Research Institute, Department of Physical Education and Sports, University of Granada, 18071, Granada, Spain; Pediatric Exercise and Genomics Research Center, Department of Pediatrics, School of Medicine, University of California at Irvine, Irvine, 92617, CA, United States.
- Inés Tofe
- Biomedical Research Networking Center for Physiopathology of Obesity and Nutrition, Carlos III Health Institute, 28029, Madrid, Spain; University Clinical Hospital, Institute Maimónides of Biomedicine Investigation of Córdoba, University of Córdoba, 14004, Córdoba, Spain.
- Mercedes Gil-Campos
- Biomedical Research Networking Center for Physiopathology of Obesity and Nutrition, Carlos III Health Institute, 28029, Madrid, Spain; University Clinical Hospital, Institute Maimónides of Biomedicine Investigation of Córdoba, University of Córdoba, 14004, Córdoba, Spain.
- María J Gacto
- Department of Software Engineering, University of Granada, 18071, Granada, Spain.
- Augusto Anguita-Ruiz
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, University of Granada, 18071, Granada, Spain; Barcelona Institute for Global Health, ISGlobal, 08003, Barcelona, Spain.
89
Prusokiene A, Prusokas A, Retkute R. Machine learning based lineage tree reconstruction improved with knowledge of higher level relationships between cells and genomic barcodes. NAR Genom Bioinform 2023; 5:lqad077. [PMID: 37608801 PMCID: PMC10440785 DOI: 10.1093/nargab/lqad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 06/26/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023]
Abstract
Tracking cells as they divide and progress through differentiation is a fundamental step in understanding many biological processes, such as the development of organisms and the progression of diseases. In this study, we investigate a machine learning approach to reconstructing lineage trees in experimental systems based on mutating synthetic genomic barcodes. We refine a previously proposed methodology by embedding information on higher-level relationships between cells and single-cell barcode values into a feature space. We test the performance of the algorithm on shallow trees (up to 100 cells) and deep trees (up to 10 000 cells). Our proposed algorithm can improve tree reconstruction accuracy compared with reconstructions based on a maximum parsimony method, but at the cost of longer computation time.
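A maximum parsimony baseline scores a candidate lineage tree by the minimum number of barcode state changes the tree requires. For a fixed rooted binary tree, Fitch's small-parsimony algorithm computes that score, as the sketch below shows. The abstract does not say which parsimony implementation was used, so this is a generic illustration. Trees are encoded, hypothetically, as nested tuples of cell names, and barcodes as equal-length strings with one character per site. The search over tree topologies is omitted.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) subtree


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one site: (candidate root states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No common state: take the union and count one change on this branch point.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, barcodes: Dict[str, str]) -> int:
    """Total Fitch cost summed over all barcode sites (sites assumed independent)."""
    n_sites = len(next(iter(barcodes.values())))
    return sum(
        fitch_site(tree, {cell: code[site] for cell, code in barcodes.items()})[1]
        for site in range(n_sites)
    )


if __name__ == "__main__":
    barcodes = {"c1": "AX", "c2": "AX", "c3": "BY", "c4": "BY"}
    print(parsimony_score((("c1", "c2"), ("c3", "c4")), barcodes))  # 2
    print(parsimony_score((("c1", "c3"), ("c2", "c4")), barcodes))  # 4
```

A parsimony method prefers the first topology because it explains the barcodes with fewer mutations. The search over many such topologies is where the computational cost grows with tree size.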
Affiliation(s)
- Alisa Prusokiene
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Renata Retkute
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
90
Belova T, Biondi N, Hsieh PH, Lutsik P, Chudasama P, Kuijjer M. Heterogeneity in the gene regulatory landscape of leiomyosarcoma. NAR Cancer 2023; 5:zcad037. [PMID: 37492373 PMCID: PMC10365024 DOI: 10.1093/narcan/zcad037] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/09/2023] [Revised: 07/06/2023] [Accepted: 07/18/2023] [Indexed: 07/27/2023]
Abstract
Characterizing inter-tumor heterogeneity is crucial for selecting suitable cancer therapy, as the presence of diverse molecular subgroups of patients can be associated with disease outcome or response to treatment. While cancer subtypes are often characterized by differences in gene expression, the mechanisms driving these differences are generally unknown. We set out to model the regulatory mechanisms driving sarcoma heterogeneity based on patient-specific, genome-wide gene regulatory networks. We developed a new computational framework, PORCUPINE, which combines knowledge on biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population. We applied PORCUPINE to patient-specific leiomyosarcoma networks modeled on data from The Cancer Genome Atlas and validated our results in an independent dataset from the German Cancer Research Center. PORCUPINE identified 37 heterogeneously regulated pathways, including pathways representing potential targets for treatment of subgroups of leiomyosarcoma patients, such as FGFR and CTLA4 inhibitory signaling. We validated the detected regulatory heterogeneity through analysis of networks and chromatin states in leiomyosarcoma cell lines. We showed that the heterogeneity identified with PORCUPINE is not associated with methylation profiles or clinical features, thereby suggesting an independent mechanism of patient heterogeneity driven by the complex landscape of gene regulatory interactions.
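The permutation idea behind PORCUPINE asks whether a pathway shows more regulatory heterogeneity across patients than random gene sets of the same size. The sketch below illustrates that style of test under simplified, hypothetical assumptions. Each gene has one regulatory score per patient, for example a targeting score derived from patient-specific networks. Heterogeneity is summarized as the mean across-patient variance of the pathway's genes. The empirical p-value compares this value with randomly drawn gene sets. PORCUPINE itself may use a different heterogeneity statistic; this sketch does not reproduce it, and all names and data are illustrative.

```python
import random
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def heterogeneity(scores: Dict[str, List[float]], genes: Sequence[str]) -> float:
    """Mean across-patient variance of the given genes' regulatory scores."""
    return fmean([pvariance(scores[g]) for g in genes])


def permutation_p_value(
    scores: Dict[str, List[float]],
    pathway: Sequence[str],
    n_permutations: int = 999,
    seed: int = 0,
) -> float:
    """Empirical p-value: how often random same-size gene sets are at least as heterogeneous."""
    rng = random.Random(seed)
    all_genes = sorted(scores)
    observed = heterogeneity(scores, pathway)
    hits = sum(
        heterogeneity(scores, rng.sample(all_genes, len(pathway))) >= observed
        for _ in range(n_permutations)
    )
    return (hits + 1) / (n_permutations + 1)  # +1 keeps the estimate above zero


if __name__ == "__main__":
    # Hypothetical data: 20 genes x 4 patients; three "pathway" genes vary strongly.
    scores = {f"g{i:02d}": [1.0, 1.0, 1.0, 1.0] for i in range(20)}
    for g in ("g00", "g01", "g02"):
        scores[g] = [0.0, 10.0, 0.0, 10.0]
    print(permutation_p_value(scores, ["g00", "g01", "g02"]))
    print(permutation_p_value(scores, ["g10", "g11", "g12"]))  # 1.0
```

Sampling gene sets of the same size as the pathway controls for the fact that statistics averaged over different numbers of genes are not directly comparable.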
Affiliation(s)
- Tatiana Belova
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Nicola Biondi
- Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
- Ping-Han Hsieh
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium
- Priya Chudasama
- Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
- Marieke L Kuijjer
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Leiden Center for Computational Oncology, Leiden University Medical Center, Leiden, the Netherlands
91
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Hillery Metz
- Invitae Corporation, San Francisco, California, USA
- Toby Manders
- Invitae Corporation, San Francisco, California, USA
- Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
- Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
92
Zhuravleva SI, Zadorozhny AD, Shilov BV, Lagunin AA. Prediction of Amino Acid Substitutions in ABL1 Protein Leading to Tumor Drug Resistance Based on "Structure-Property" Relationship Classification Models. Life (Basel) 2023; 13:1807. [PMID: 37763211 PMCID: PMC10532460 DOI: 10.3390/life13091807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/15/2023] [Accepted: 08/21/2023] [Indexed: 09/29/2023]
Abstract
Drug resistance to anticancer drugs is a serious complication in patients with cancer. Typically, drug resistance occurs due to amino acid substitutions (AAS) in drug target proteins. The study aimed at developing and validating a new approach to the creation of structure-property relationships (SPR) classification models to predict AASs leading to drug resistance to inhibitors of tyrosine-protein kinase ABL1. The approach was based on the representation of AASs as peptides described in terms of structural formulas. The data on drug-resistant and non-resistant variants of AAS for two isoforms of ABL1 were extracted from the COSMIC database. The given training sets (approximately 700 missense variants) were used for the creation of SPR models in MultiPASS software based on substructural atom-centric multiple neighborhoods of atom (MNA) descriptors for the description of the structural formula of protein fragments and a Bayesian-like algorithm for revealing structure-property relationships. It was found that MNA descriptors of the 6th level and peptides from 11 amino acid residues were the best combination for ABL1 isoform 1 with the prediction accuracy (AUC) of resistance to imatinib (0.897) and dasatinib (0.996). For ABL1 isoform 2 (resistance to imatinib), the best combination was MNA descriptors of the 6th level, peptides from 15 amino acids (AUC value was 0.909). The prediction of possible drug-resistant AASs was made for dbSNP and gnomAD data. The six selected most probable imatinib-resistant AASs were additionally validated by molecular modeling and docking, which confirmed the possibility of resistance for the E334V and T392I variants.
Affiliation(s)
- Svetlana I. Zhuravleva
- Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Anton D. Zadorozhny
- Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Boris V. Shilov
- Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Alexey A. Lagunin
- Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
93
Nguyen AH, Wang Z. Time-Distributed Framework for 3D Reconstruction Integrating Fringe Projection with Deep Learning. Sensors (Basel) 2023; 23:7284. [PMID: 37631820 PMCID: PMC10458373 DOI: 10.3390/s23167284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 08/07/2023] [Accepted: 08/18/2023] [Indexed: 08/27/2023]
Abstract
In recent years, integrating structured light with deep learning has gained considerable attention in three-dimensional (3D) shape reconstruction due to its high precision and suitability for dynamic applications. While previous techniques primarily focus on processing in the spatial domain, this paper proposes a novel time-distributed approach for temporal structured-light 3D shape reconstruction using deep learning. The proposed approach utilizes an autoencoder network and time-distributed wrapper to convert multiple temporal fringe patterns into their corresponding numerators and denominators of the arctangent functions. Fringe projection profilometry (FPP), a well-known temporal structured-light technique, is employed to prepare high-quality ground truth and depict the 3D reconstruction process. Our experimental findings show that the time-distributed 3D reconstruction technique achieves comparable outcomes with the dual-frequency dataset (p = 0.014) and higher accuracy than the triple-frequency dataset (p = 1.029 × 10⁻⁹), according to non-parametric statistical tests. Moreover, the proposed approach's straightforward implementation of a single training network for multiple converters makes it more practical for scientific research and industrial applications.
Affiliation(s)
- Andrew-Hieu Nguyen
- Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD 21224, USA
- Zhaoyang Wang
- Department of Mechanical Engineering, The Catholic University of America, Washington, DC 20064, USA
94
Liu C, Wu F, Jiang X, Hu Y, Shao K, Tang X, Qin B, Gao G. Climate Change Causes Salinity To Become Determinant in Shaping the Microeukaryotic Spatial Distribution among the Lakes of the Inner Mongolia-Xinjiang Plateau. Microbiol Spectr 2023; 11:e0317822. [PMID: 37306569 PMCID: PMC10434070 DOI: 10.1128/spectrum.03178-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 05/06/2023] [Indexed: 06/13/2023]
Abstract
Climate change greatly affects lake microorganisms in arid and semiarid zones, which alters ecosystem functions and the ecological security of lakes. However, the responses of lake microorganisms, especially microeukaryotes, to climate change are poorly understood. Here, using 18S ribosomal RNA (rRNA) high-throughput sequencing, we investigated the distribution patterns of microeukaryotic communities and whether and how climate change directly or indirectly affected the microeukaryotic communities on the Inner Mongolia-Xinjiang Plateau. Our results showed that climate change, as the main driving force of lake change, drives salinity to become a determinant of the microeukaryotic community among the lakes of the Inner Mongolia-Xinjiang Plateau. Salinity shapes the diversity and trophic level of the microeukaryotic community and further affects lake carbon cycling. Co-occurrence network analysis further revealed that increasing salinity reduced the complexity but improved the stability of microeukaryotic communities and changed ecological relationships. Meanwhile, increasing salinity enhanced the importance of deterministic processes in microeukaryotic community assembly, and the dominance of stochastic processes in freshwater lakes transformed into deterministic processes in salt lakes. Furthermore, we established lake biomonitoring and climate sentinel models by integrating microeukaryotic information, which would provide substantial improvements to our predictive ability of lake responses to climate change. IMPORTANCE Our findings have important implications for understanding the distribution patterns and the driving mechanisms of microeukaryotic communities among the lakes of the Inner Mongolia-Xinjiang Plateau and whether and how climate change directly or indirectly affects microeukaryotic communities. 
Our study also establishes the groundwork to use the lake microbiome for the assessment of aquatic ecological health and climate change, which is critical for ecosystem management and for projecting the ecological consequences of future climate warming.
Affiliation(s)
- Changqing Liu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- University of Chinese Academy of Sciences, Beijing, China
- Fan Wu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xingyu Jiang
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Yang Hu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Keqiang Shao
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Xiangming Tang
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Boqiang Qin
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Guang Gao
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
95
Kabir M, Stuart HM, Lopes FM, Fotiou E, Keavney B, Doig AJ, Woolf AS, Hentges KE. Predicting congenital renal tract malformation genes using machine learning. Sci Rep 2023; 13:13204. [PMID: 37580336 PMCID: PMC10425350 DOI: 10.1038/s41598-023-38110-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 07/03/2023] [Indexed: 08/16/2023]
Abstract
Congenital renal tract malformations (RTMs) are the major cause of severe kidney failure in children. Studies to date have identified defined genetic causes for only a minority of human RTMs. While some RTMs may be caused by poorly defined environmental perturbations affecting organogenesis, it is likely that numerous causative genetic variants have yet to be identified. Unfortunately, the speed of discovering further genetic causes for RTMs is limited by challenges in prioritising candidate genes harbouring sequence variants. Here, we exploited the computer-based artificial intelligence methodology of supervised machine learning to identify genes with a high probability of being involved in renal development. These genes, when mutated, are promising candidates for causing RTMs. With this methodology, the machine learning classifier determines which attributes are common to renal development genes and identifies genes possessing these attributes. Here we report the validation of an RTM gene classifier and provide predictions of the RTM association status for all protein-coding genes in the mouse genome. Overall, our predictions, whilst not definitive, can inform the prioritisation of genes when evaluating patient sequence data for genetic diagnosis. This knowledge of renal developmental genes will accelerate the processes of reaching a genetic diagnosis for patients born with RTMs.
Affiliation(s)
- Mitra Kabir
- Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK
- Helen M Stuart
- Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK
- Manchester Centre for Genomic Medicine, St. Mary's Hospital, Health Innovation Manchester, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Filipa M Lopes
- Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PL, UK
- Elisavet Fotiou
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, M13 9PL, UK
- C.B.B Lifeline Biotech Ltd, 5 Propontidos Street, Strovolos, 2033, Nicosia, Cyprus
- Bernard Keavney
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, M13 9PL, UK
- Manchester Heart Institute, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
- Andrew J Doig
- Division of Neuroscience, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Stopford Building, Manchester, M13 9BL, UK
- Adrian S Woolf
- Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PL, UK
- Department of Nephrology, Royal Manchester Children's Hospital, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
- Kathryn E Hentges
- Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
96
Komuro J, Kusumoto D, Hashimoto H, Yuasa S. Machine learning in cardiology: Clinical application and basic research. J Cardiol 2023; 82:128-133. [PMID: 37141938 DOI: 10.1016/j.jjcc.2023.04.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/08/2023] [Revised: 04/23/2023] [Accepted: 04/28/2023] [Indexed: 05/06/2023]
Abstract
Machine learning is a subfield of artificial intelligence. The quality and versatility of machine learning have been improving rapidly, and it now plays a critical role in many aspects of social life. This trend is also observed in the medical field. Generally, there are three main types of machine learning: supervised, unsupervised, and reinforcement learning. Each type of learning is selected according to the purpose and the type of data. In the field of medicine, various types of information are collected and used, and research using machine learning is becoming increasingly relevant. Many clinical studies are conducted using electronic health and medical records, including in the cardiovascular area. Machine learning has also been applied in basic research. Machine learning has been widely used for several types of data analysis, such as clustering in microarray and RNA sequencing analyses. Machine learning is essential for genome and multi-omics analyses. This review summarizes the recent advancements in the use of machine learning in clinical applications and basic cardiovascular research.
Affiliation(s)
- Jin Komuro
- Department of Cardiology, Keio University School of Medicine, Tokyo, Japan
- Dai Kusumoto
- Department of Cardiology, Keio University School of Medicine, Tokyo, Japan
- Hisayuki Hashimoto
- Department of Cardiology, Keio University School of Medicine, Tokyo, Japan
- Shinsuke Yuasa
- Department of Cardiology, Keio University School of Medicine, Tokyo, Japan.
97
Ong J, Waisberg E, Masalkhi M, Kamran SA, Lowry K, Sarker P, Zaman N, Paladugu P, Tavakkoli A, Lee AG. Artificial Intelligence Frameworks to Detect and Investigate the Pathophysiology of Spaceflight Associated Neuro-Ocular Syndrome (SANS). Brain Sci 2023; 13:1148. [PMID: 37626504 PMCID: PMC10452366 DOI: 10.3390/brainsci13081148] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/21/2023] [Revised: 07/24/2023] [Accepted: 07/28/2023] [Indexed: 08/27/2023]
Abstract
Spaceflight associated neuro-ocular syndrome (SANS) is a unique phenomenon that has been observed in astronauts who have undergone long-duration spaceflight (LDSF). The syndrome is characterized by distinct imaging and clinical findings including optic disc edema, hyperopic refractive shift, posterior globe flattening, and choroidal folds. SANS serves as a large barrier to planetary spaceflight such as a mission to Mars and has been noted by the National Aeronautics and Space Administration (NASA) as a high risk based on its likelihood to occur and its severity to human health and mission performance. While it is a large barrier to future spaceflight, the underlying etiology of SANS is not well understood. Current ophthalmic imaging onboard the International Space Station (ISS) has provided further insights into SANS. However, the spaceflight environment presents unique challenges and limitations to further understanding this microgravity-induced phenomenon. The advent of artificial intelligence (AI) has revolutionized the field of imaging in ophthalmology, particularly in detection and monitoring. In this manuscript, we describe the current hypothesized pathophysiology of SANS and the medical diagnostic limitations during spaceflight to further understand its pathogenesis. We then introduce and describe various AI frameworks that can be applied to ophthalmic imaging onboard the ISS to further understand SANS including supervised/unsupervised learning, generative adversarial networks, and transfer learning. We conclude by describing current research in this area to further understand SANS with the goal of enabling deeper insights into SANS and safer spaceflight for future missions.
Affiliation(s)
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI 48105, USA
- Mouayad Masalkhi
- University College Dublin School of Medicine, Belfield, Dublin 4, Ireland
- Sharif Amit Kamran
- Human-Machine Perception Laboratory, Department of Computer Science and Engineering, University of Nevada, Reno, NV 89512, USA
- Prithul Sarker
- Human-Machine Perception Laboratory, Department of Computer Science and Engineering, University of Nevada, Reno, NV 89512, USA
- Nasif Zaman
- Human-Machine Perception Laboratory, Department of Computer Science and Engineering, University of Nevada, Reno, NV 89512, USA
- Phani Paladugu
- Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Alireza Tavakkoli
- Human-Machine Perception Laboratory, Department of Computer Science and Engineering, University of Nevada, Reno, NV 89512, USA
- Andrew G. Lee
- Center for Space Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Ophthalmology, Blanton Eye Institute, Houston Methodist Hospital, Houston, TX 77030, USA
- The Houston Methodist Research Institute, Houston Methodist Hospital, Houston, TX 77030, USA
- Departments of Ophthalmology, Neurology, and Neurosurgery, Weill Cornell Medicine, New York, NY 10065, USA
- Department of Ophthalmology, University of Texas Medical Branch, Galveston, TX 77555, USA
- University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Texas A&M College of Medicine, Bryan, TX 77030, USA
- Department of Ophthalmology, The University of Iowa Hospitals and Clinics, Iowa City, IA 50010, USA
98
McDonnell KJ. Leveraging the Academic Artificial Intelligence Silecosystem to Advance the Community Oncology Enterprise. J Clin Med 2023; 12:4830. [PMID: 37510945 PMCID: PMC10381436 DOI: 10.3390/jcm12144830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
Over the last 75 years, artificial intelligence has evolved from a theoretical concept and novel paradigm describing the role that computers might play in our society to a tool with which we daily engage. In this review, we describe AI in terms of its constituent elements, the synthesis of which we refer to as the AI Silecosystem. Herein, we provide an historical perspective of the evolution of the AI Silecosystem, conceptualized and summarized as a Kuhnian paradigm. This manuscript focuses on the role that the AI Silecosystem plays in oncology and its emerging importance in the care of the community oncology patient. We observe that this important role arises out of a unique alliance between the academic oncology enterprise and community oncology practices. We provide evidence of this alliance by illustrating the practical establishment of the AI Silecosystem at the City of Hope Comprehensive Cancer Center and its team utilization by community oncology providers.
Affiliation(s)
- Kevin J McDonnell
- Center for Precision Medicine, Department of Medical Oncology & Therapeutics Research, City of Hope Comprehensive Cancer Center, Duarte, CA 91010, USA
99
Su YY, Liu YL, Huang HC, Lin CC. Ensemble learning model for identifying the hallmark genes of NFκB/TNF signaling pathway in cancers. J Transl Med 2023; 21:485. [PMID: 37475016 PMCID: PMC10357720 DOI: 10.1186/s12967-023-04355-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 07/13/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND The nuclear factor kappa B (NFκB) regulatory pathways downstream of tumor necrosis factor (TNF) play a critical role in carcinogenesis. However, the widespread influence of NFκB in cells can result in off-target effects, making it a challenging therapeutic target. Ensemble learning is a machine learning technique where multiple models are combined to improve the performance and robustness of the prediction. Accordingly, an ensemble learning model could uncover more precise targets within the NFκB/TNF signaling pathway for cancer therapy. METHODS In this study, we trained an ensemble learning model on the transcriptome profiles from 16 cancer types in the TCGA database to identify a robust set of genes that are consistently associated with the NFκB/TNF pathway in cancer. Our model uses cancer patients as features to predict the genes involved in the NFκB/TNF signaling pathway and can be adapted to predict the genes for different cancer types by switching the cancer type of patients. We also performed functional analysis, survival analysis, and a case study of triple-negative breast cancer to demonstrate our model's potential in translational cancer medicine. RESULTS Our model accurately identified genes regulated by NFκB in response to TNF in cancer patients. The downstream analysis showed that the identified genes are typically involved in the canonical NFκB-regulated pathways, particularly in adaptive immunity, anti-apoptosis, and cellular response to cytokine stimuli. These genes were found to have oncogenic properties and detrimental effects on patient survival. Our model also could distinguish patients with a specific cancer subtype, triple-negative breast cancer (TNBC), which is known to be influenced by NFκB-regulated pathways downstream of TNF. 
Furthermore, a functional module known as mononuclear cell differentiation was identified that accurately predicts TNBC patients and poor short-term survival in non-TNBC patients, providing a potential avenue for developing precision medicine for cancer subtypes. CONCLUSIONS In conclusion, our approach enables the discovery of genes in NFκB-regulated pathways in response to TNF and their relevance to carcinogenesis. We successfully categorized these genes into functional groups, providing valuable insights for discovering more precise and targeted cancer therapeutics.
Affiliation(s)
- Yin-Yuan Su
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yu-Ling Liu
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Division of General Surgery, Department of Surgery, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Hsuan-Cheng Huang
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chen-Ching Lin
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan.
100
Chen H, Liu Y, Balabani S, Hirayama R, Huang J. Machine Learning in Predicting Printable Biomaterial Formulations for Direct Ink Writing. Research (Wash D C) 2023; 6:0197. [PMID: 37469394 PMCID: PMC10353544 DOI: 10.34133/research.0197] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/06/2023] [Accepted: 06/29/2023] [Indexed: 07/21/2023]
Abstract
Three-dimensional (3D) printing is emerging as a transformative technology for biomedical engineering. The 3D printed product can be patient-specific by allowing customizability and direct control of the architecture. The trial-and-error approach currently used for developing the composition of printable inks is time- and resource-consuming due to the increasing number of variables requiring expert knowledge. Artificial intelligence has the potential to reshape the ink development process by forming a predictive model for printability from experimental data. In this paper, we constructed machine learning (ML) algorithms including decision tree, random forest (RF), and deep learning (DL) to predict the printability of biomaterials. A total of 210 formulations including 16 different bioactive and smart materials and 4 solvents were 3D printed, and their printability was assessed. All ML methods were able to learn and predict the printability of a variety of inks based on their biomaterial formulations. In particular, the RF algorithm has achieved the highest accuracy (88.1%), precision (90.6%), and F1 score (87.0%), indicating the best overall performance out of the 3 algorithms, while DL has the highest recall (87.3%). Furthermore, the ML algorithms have predicted the printability window of biomaterials to guide the ink development. The printability map generated with DL has finer granularity than other algorithms. ML has proven to be an effective and novel strategy for developing biomaterial formulations with desired 3D printability for biomedical engineering applications.
Affiliation(s)
- Hongyi Chen
- Department of Mechanical Engineering, University College London, London, UK
- Department of Computer Science, University College London, London, UK
- Yuanchang Liu
- Department of Mechanical Engineering, University College London, London, UK
- Stavroula Balabani
- Department of Mechanical Engineering, University College London, London, UK
- Wellcome-EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London, London, UK
- Ryuji Hirayama
- Department of Computer Science, University College London, London, UK
- Jie Huang
- Department of Mechanical Engineering, University College London, London, UK