1
Gokhale M, Mohanty SK, Ojha A. GeneViT: Gene Vision Transformer with Improved DeepInsight for cancer classification. Comput Biol Med 2023; 155:106643. [PMID: 36803792 DOI: 10.1016/j.compbiomed.2023.106643] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/14/2022] [Revised: 01/03/2023] [Accepted: 02/05/2023] [Indexed: 02/09/2023]
Abstract
Analysis of gene expression data is crucial for disease prognosis and diagnosis. Gene expression data is highly redundant and noisy, which makes it challenging to extract disease information. Over the past decade, several conventional machine learning and deep learning models have been developed for classification of diseases using gene expressions. In recent years, vision transformer networks have shown promising performance in many fields due to their powerful attention mechanism, which provides better insight into the data characteristics. However, these network models have not been explored for gene expression analysis. In this paper, a method that uses a vision transformer to classify cancer gene expression data is presented. The proposed method first performs dimensionality reduction using a stacked autoencoder, followed by an Improved DeepInsight algorithm that converts the data into image format. The data is then fed to the vision transformer to build the classification model. Performance of the proposed classification model is evaluated on ten benchmark datasets with binary or multiple classes. Its performance is also compared with nine existing classification models. The experimental results demonstrate that the proposed model outperforms existing methods. The t-SNE plots demonstrate the distinctive feature learning property of the model.
Affiliation(s)
- Madhuri Gokhale
- Department of Computer Science & Engineering, Jabalpur Engineering College, Jabalpur, 482001, India; Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India.
- Sraban Kumar Mohanty
- Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India.
- Aparajita Ojha
- Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India.
2
Kidwai-Khan F, Rentsch CT, Pulk R, Alcorn C, Brandt CA, Justice AC. Pharmacogenomics driven decision support prototype with machine learning: A framework for improving patient care. Front Big Data 2022; 5:1059088. [DOI: 10.3389/fdata.2022.1059088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 10/31/2022] [Indexed: 11/16/2022] Open
Abstract
Introduction A growing number of healthcare providers make complex treatment decisions guided by electronic health record (EHR) software interfaces. Many interfaces integrate multiple sources of data (e.g., labs, pharmacy, diagnoses) successfully, though relatively few have incorporated genetic data. Method This study utilizes informatics methods with predictive modeling to create and validate algorithms to enable informed pharmacogenomic decision-making at the point of care in near real-time. The proposed framework integrates EHR and genetic data relevant to the patient's current medications, including decision support mechanisms based on predictive modeling. We created a prototype with EHR and linked genetic data from the Department of Veterans Affairs (VA), the largest integrated healthcare system in the US. The EHR data included diagnoses, medication fills, and outpatient clinic visits for 2,600 people with HIV and matched uninfected controls linked to prototypic genetic data (variations in single or multiple positions in the DNA sequence). We then mapped the medications that patients were prescribed to medications defined in the drug-gene interaction mapping of the Clinical Pharmacogenomics Implementation Consortium's (CPIC) level A (i.e., sufficient evidence for at least one prescribing action) guidelines that predict adverse events. CPIC is a National Institutes of Health-funded group of experts who develop evidence-based pharmacogenomic guidelines. A preventable adverse event (PAE) can be defined as a harmful outcome from an intervention that could have been prevented. For this study, we focused on potential PAEs resulting from a medication-gene interaction. Results The final model showed an AUC of 0.972 and an F1 score of 0.97 with genetic data, compared to 0.766 and 0.73, respectively, without genetic data integration. Discussion Over 98% of people in the cohort were on at least one medication with a CPIC level A guideline in their lifetime.
We compared the predictive power of machine learning models to detect a PAE across five modeling methods: Random Forest, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), K Nearest Neighbors (KNN), and Decision Tree. We found that XGBoost performed best for the prototype when genetic data was added to the framework and improved prediction of PAE. We compared area under the curve (AUC) between the models in the testing dataset.
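As a rough illustration of the two metrics this abstract reports, the following minimal Python sketch computes an F1 score and a rank-based AUC from scratch. The function names and the four-sample toy data are assumptions made purely for illustration; they are not taken from the study, and none of its models or data are reproduced here.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count pairwise wins; ties contribute half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = adverse event, 0 = no adverse event.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

auc_value = roc_auc(y_true, scores)
f1_value = f1_score(y_true, y_pred)
```

AUC is computed from the raw scores and does not depend on a decision threshold, whereas F1 depends on the threshold used to turn scores into labels, which is one reason the two are often reported together.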
3
Gudur VY, Maheshwari S, Acharyya A, Shafik R. An FPGA Based Energy-Efficient Read Mapper With Parallel Filtering and In-Situ Verification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2697-2711. [PMID: 34415836 DOI: 10.1109/tcbb.2021.3106311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In the assembly pipeline of Whole Genome Sequencing (WGS), read mapping is a widely used method to re-assemble the genome. It employs approximate string matching and dynamic programming-based algorithms on a large volume of data and associated structures, making it a computationally intensive process. Currently, state-of-the-art data centers for genome sequencing incur substantial setup and energy costs for maintaining hardware, data storage and cooling systems. To enable low-cost genomics, we propose an energy-efficient architectural methodology for read mapping using a single system-on-chip (SoC) platform. The proposed methodology is based on the q-gram lemma and designed using a novel architecture for filtering and verification. The filtering algorithm is designed using a parallel sorted q-gram lemma based method for the first time, and it is complemented by an in-situ verification routine using a parallel Myers bit-vector algorithm. We have implemented our design on the Zynq Ultrascale+ XCZU9EG MPSoC platform. It is then extensively validated using real genomic data to demonstrate up to 7.8× energy reduction and up to 13.3× less resource utilization when compared with the state-of-the-art software and hardware approaches.
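The q-gram lemma that this abstract builds on can be sketched in a few lines of Python. The lemma states that if a read of length m matches a candidate window within edit distance k, the two share at least (m - q + 1) - k*q q-grams, so any window below that count can be discarded before the costly verification step. This is a plain software sketch of the lemma only, not the paper's parallel sorted hardware design; the function names and the example sequences are assumptions for illustration.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all overlapping substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_threshold(m, q, k):
    """Minimum number of shared q-grams implied by the q-gram lemma."""
    return (m - q + 1) - k * q


def passes_filter(read, window, q, k):
    """Return True if the window shares enough q-grams to possibly match the read."""
    # Counter intersection keeps the minimum count of each common q-gram.
    shared = sum((qgrams(read, q) & qgrams(window, q)).values())
    return shared >= qgram_threshold(len(read), q, k)


# Hypothetical example: a 10-base read, q = 3, at most k = 1 edit.
read = "ACGTACGTAC"
one_substitution = "ACGTTCGTAC"   # differs from the read at one position
unrelated = "TTTTTTTTTT"

keeps_close_match = passes_filter(read, one_substitution, q=3, k=1)
keeps_unrelated = passes_filter(read, unrelated, q=3, k=1)
```

The filter is a necessary condition only: a window that passes may still exceed k edits, which is why a verification step (the Myers bit-vector algorithm in the paper) follows it.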
4
Louarn M, Chatonnet F, Garnier X, Fest T, Siegel A, Faron C, Dameron O. Improving reusability along the data life cycle: a regulatory circuits case study. J Biomed Semantics 2022; 13:11. [PMID: 35346379 PMCID: PMC8962212 DOI: 10.1186/s13326-022-00266-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/02/2021] [Accepted: 03/07/2022] [Indexed: 12/22/2022] Open
Abstract
Background In life sciences, there has been a long-standing effort of standardization and integration of reference datasets and databases. Despite these efforts, many studies' data are provided in specific and non-standard formats. This hampers the capacity to reuse the studies' data in other pipelines, the capacity to reuse the pipelines' results in other studies, and the capacity to enrich the data with additional information. The Regulatory Circuits project is one of the largest efforts for integrating human cell genomics data to predict tissue-specific transcription factor-gene interaction networks. In spite of its success, it exhibits the usual shortcomings limiting its update, its reuse (as a whole or partially), and its extension with new data samples. To address these limitations, the resource has previously been integrated in an RDF triplestore so that TF-gene interaction networks could be generated with two SPARQL queries. However, this triplestore did not store the computed networks and did not integrate metadata about tissues and samples, therefore limiting the reuse of this dataset. In particular, it does not allow reusing only a portion of Regulatory Circuits if a study focuses on a subset of the tissues, nor combining the samples described in the datasets with samples from other studies. Overall, these limitations advocate for the design of a complete, flexible and reusable representation of the Regulatory Circuits dataset based on Semantic Web technologies. Results We provide a modular RDF representation of the Regulatory Circuits, called Linked Extended Regulatory Circuits (LERC). It consists of (i) descriptions of biological and experimental context mapped to the reference databases, (ii) annotations about TF-gene interactions at the sample level for 808 samples, (iii) annotations about TF-gene interactions at the tissue level for 394 tissues, (iv) metadata connecting the knowledge graphs cited above.
LERC is based on a modular organisation into 1,205 RDF named graphs for representing the biological data, the sample-specific and the tissue-specific networks, and the corresponding metadata. In total it contains 3,910,794,050 triples and is available as a SPARQL endpoint. Conclusion The flexible and modular architecture of LERC supports biologically relevant SPARQL queries. It allows easy and fast querying of the resources related to the initial Regulatory Circuits datasets and facilitates their reuse in other studies. Associated website https://regulatorycircuits-lod.genouest.org
5
Carter JL, Critchlow J, Jackson S, Sanghvi S, Feger H, Chaudhry A, Foley L, Sofat R. Pharmacogenomic alerts: developing guidance for use by healthcare professionals. Br J Clin Pharmacol 2022; 88:3201-3210. [PMID: 35060169 PMCID: PMC9305234 DOI: 10.1111/bcp.15234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/30/2021] [Revised: 12/06/2021] [Accepted: 12/31/2021] [Indexed: 12/04/2022] Open
Abstract
Aims For diseases with a genetic cause, genomics can deliver improved diagnostics and facilitate access to targeted treatments. Drug pharmacodynamics and pharmacokinetics are often dependent on genetic variation underlying these processes. As pharmacogenomics comes of age, it may be the first way in which genomics is utilised at a population level. Guidance and standards are still required on how genomic information can be communicated within the health record, and how clinicians should be alerted to variation impacting the use of medicines. Methods The Professional Record Standards Body, commissioned by NHS England, developed guidance on using pharmacogenomics information in clinical practice. We conducted research with those implementing pharmacogenomics in England and internationally to produce guidance and recommendations for a systems‐based approach. Results A consensus viewpoint is that systems need to be in place to ensure the safe provision of pharmacogenomics information that is curated, actionable and up‐to‐date. Standards should be established with respect to notification and information exchange, which could impact new or existing prescribing, and these must be in keeping with routine practice. Alerting systems should contribute to safer practices. Conclusion Ensuring pharmacogenetics information is available to make safer use of medicines will require a major effort, of which this guidance is a beginning. Standards are required to ensure useful genomic information within the health record can be communicated to clinicians in the right format and at the right times to be actioned successfully. A multidisciplinary group of stakeholders must be engaged in developing pharmacogenomic standards to support the most appropriate prescribing.
Affiliation(s)
- John‐Paul L. Carter
- Centre for Clinical Pharmacology and Therapeutics, Institute of Health Informatics University College London
- James Critchlow
- Centre for Clinical Pharmacology and Therapeutics, Institute of Health Informatics University College London
- Professional Record Standards Body London UK
- Afzal Chaudhry
- Professional Record Standards Body London UK
- Hospital & Renal Medicine Cambridge University Hospitals NHS Foundation Trust
- Reecha Sofat
- Centre for Clinical Pharmacology and Therapeutics, Institute of Health Informatics University College London
6
Johnson KB, Clayton EW, Starren J, Peterson J. The Implementation Chasm Hindering Genome-informed Health Care. The Journal of Law, Medicine & Ethics: A Journal of the American Society of Law, Medicine & Ethics 2020; 48:119-125. [PMID: 32342791 PMCID: PMC7395963 DOI: 10.1177/1073110520916999] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
The promises of precision medicine are often heralded in the medical and lay literature, but routine integration of genomics in clinical practice is still limited. While the "last mile" infrastructure to bring genomics to the bedside has been demonstrated in some healthcare settings, a number of challenges remain - both in the receptivity of today's health system and in its technical and educational readiness to respond to this evolution in care. To improve the impact of genomics on health and disease management, we will need to integrate both new knowledge and new care processes into existing workflows. This change will be onerous and time-consuming, but hopefully valuable to the provision of high quality, economically feasible care worldwide.
Affiliation(s)
- Kevin B Johnson
- Kevin B. Johnson, M.D., M.S., is Cornelius Vanderbilt Professor and Chair of Biomedical Informatics, with a joint appointment in the Department of Pediatrics at Vanderbilt University Medical Center. He received his M.D. from Johns Hopkins Hospital in Baltimore and his M.S. in Medical Informatics from Stanford University in 1992.
- Ellen Wright Clayton
- Ellen Wright Clayton, M.D., J.D., is the Craig-Weaver Professor of Pediatrics, Professor of Health Policy in the Center for Biomedical Ethics and Society at Vanderbilt University Medical Center, and Professor of Law at Vanderbilt University. She has been studying the ethical, legal, and social implications of genetics research and its translation to the clinic for many years. She is currently a PI of LawSeq as well as GetPreCiSe, a Center of Excellence in ELSI Research focused on genetic privacy and identity, and has been an investigator in the eMERGE Network since its inception.
- Justin Starren
- Justin Starren, M.D., M.S., Ph.D., is Professor of Preventive Medicine and Medical Social Sciences and Chief of the Division of Health and Biomedical Informatics at the Northwestern University Feinberg School of Medicine. He received his M.D. and M.S. in Immunogenetics from Washington University in St. Louis in 1987, and his Ph.D. in Biomedical Informatics from Columbia University in 1997.
- Josh Peterson
- Josh Peterson, M.D., M.P.H., is an Associate Professor of Biomedical Informatics and Medicine at Vanderbilt University Medical Center. He received his M.D. from Vanderbilt University in 1997 and his M.P.H. from Harvard University School of Public Health in 2002.
7
Dhombres F, Charlet J. Formal Medical Knowledge Representation Supports Deep Learning Algorithms, Bioinformatics Pipelines, Genomics Data Analysis, and Big Data Processes. Yearb Med Inform 2019; 28:152-155. [PMID: 31419827 PMCID: PMC6697514 DOI: 10.1055/s-0039-1677933] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE To select, present, and summarize the best papers published in 2018 in the field of Knowledge Representation and Management (KRM). METHODS A comprehensive and standardized review of the medical informatics literature was performed to select the most interesting papers published in 2018 in KRM, based on PubMed and ISI Web Of Knowledge queries. RESULTS Four best papers were selected among the 962 publications retrieved following the Yearbook review process. The research areas in 2018 were mainly related to the ontology-based data integration for phenotype-genotype association mining, the design of ontologies and their application, and the semantic annotation of clinical texts. CONCLUSION In the KRM selection for 2018, research on semantic representations demonstrated their added value for enhanced deep learning approaches in text mining and for designing novel bioinformatics pipelines based on graph databases. In addition, the ontology structure can enrich the analyses of whole genome expression data. Finally, semantic representations demonstrated promising results to process phenotypic big data.
Affiliation(s)
- Ferdinand Dhombres
- Sorbonne Université, Université Paris 13, Sorbonne Paris Cité, INSERM, UMR_S 1142, LIMICS, Paris, France; Médecine Sorbonne Université, Service de Médecine Fœtale, AP-HP/HUEP, Hôpital Armand Trousseau, Paris, France
- Jean Charlet
- Sorbonne Université, Université Paris 13, Sorbonne Paris Cité, INSERM, UMR_S 1142, LIMICS, Paris, France; AP-HP, Delegation for Clinical Research and Innovation, Paris, France