1. Pu J, Yu Y, Liu Y, Wang D, Gui S, Zhong X, Chen W, Chen X, Chen Y, Chen X, Qiao R, Jiang Y, Zhang H, Fan L, Ren Y, Chen X, Wang H, Xie P. ProMENDA: an updated resource for proteomic and metabolomic characterization in depression. Transl Psychiatry 2024; 14:229. [PMID: 38816410 PMCID: PMC11139925 DOI: 10.1038/s41398-024-02948-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] Open
Abstract
Depression is a prevalent mental disorder with a complex biological mechanism. Following the rapid development of systems biology technology, a growing number of studies have applied proteomics and metabolomics to explore the molecular profiles of depression. However, a standardized resource facilitating the identification and annotation of the available knowledge from these scattered studies associated with depression is currently lacking. This study presents ProMENDA, an upgraded resource that provides a platform for manual annotation of candidate proteins and metabolites linked to depression. Following the establishment of the protein dataset and the update of the metabolite dataset, the ProMENDA database was developed as a major extension of its initial release. A multi-faceted annotation scheme was employed to provide comprehensive knowledge of the molecules and studies. A new web interface was also developed to improve the user experience. The ProMENDA database now contains 43,366 molecular entries, comprising 20,847 protein entries and 22,519 metabolite entries, which were manually curated from 1,370 human, rat, mouse, and non-human primate studies. This represents a significant increase (more than 7-fold) in molecular entries compared to the initial release. To demonstrate the usage of ProMENDA, a case study identifying consistently reported proteins and metabolites in the brains of animal models of depression was presented. Overall, ProMENDA is a comprehensive resource that offers a panoramic view of proteomic and metabolomic knowledge in depression. ProMENDA is freely available at https://menda.cqmu.edu.cn.
Affiliation(s)
- Juncai Pu
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yue Yu
  - Department of Health Sciences Research, Mayo Clinic, MN, 55901, USA
- Yiyun Liu
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Dongfang Wang
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Siwen Gui
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiaogang Zhong
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Weiyi Chen
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiaopeng Chen
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yue Chen
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiang Chen
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Renjie Qiao
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yanyi Jiang
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Hanping Zhang
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Li Fan
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yi Ren
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiangyu Chen
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Haiyang Wang
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Peng Xie
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - The Jinfeng Laboratory, Chongqing, 401336, China
  - Chongqing Institute for Brain and Intelligence, Chongqing, 400072, China
2. Di Maria A, Bellomo L, Billeci F, Cardillo A, Alaimo S, Ferragina P, Ferro A, Pulvirenti A. NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph. Bioinformatics 2024; 40:btae194. [PMID: 38597890 DOI: 10.1093/bioinformatics/btae194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/11/2024]
Abstract
MOTIVATION The rapid increase of biomedical literature makes it harder and harder for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks about user-defined biomedical topics on top of the available literature is still challenging. RESULTS We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (i.e. full texts or abstracts of PubMed Central papers, free texts, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and distills well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks with significant Precision-Recall metrics when compared to state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION https://netme.click/.
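To make the idea of modeling extracted relations as a labeled graph concrete, here is a minimal sketch. It is not NetMe 2.0's implementation: the triples, the function names `build_labeled_graph` and `edges`, and the nested-dictionary layout are all assumptions chosen for illustration, and the extraction step itself is taken as given.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples standing in for the output
# of an entity/relation extraction step. Illustrative only, not NetMe 2.0 data.
triples = [
    ("TP53", "regulates", "MDM2"),
    ("MDM2", "inhibits", "TP53"),
    ("TP53", "associated_with", "breast cancer"),
    ("TP53", "regulates", "MDM2"),  # same relation mentioned in a second text
]


def build_labeled_graph(triples):
    """Return {source: {target: {label: mention_count}}} for directed edges."""
    graph = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for subj, rel, obj in triples:
        graph[subj][obj][rel] += 1
    return graph


def edges(graph):
    """Flatten the nested mapping into sorted (source, label, target, count) tuples."""
    return sorted(
        (source, label, target, count)
        for source, targets in graph.items()
        for target, labels in targets.items()
        for label, count in labels.items()
    )


graph = build_labeled_graph(triples)
for source, label, target, count in edges(graph):
    print(f"{source} -[{label} x{count}]-> {target}")
```

Counting repeated mentions on each labeled edge is one simple way to keep evidence from several input texts; a real system would likely also store provenance such as the source document.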
Affiliation(s)
- Antonio Di Maria
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Fabrizio Billeci
  - Department of Computer Science, University of Catania, Catania, 95125, Italy
- Alfio Cardillo
  - Department of Computer Science, University of Catania, Catania, 95125, Italy
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Paolo Ferragina
  - Department of Computer Science, University of Pisa, Pisa, 56126, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
3. Kokoli M, Karatzas E, Baltoumas FA, Schneider R, Pafilis E, Paragkamian S, Doncheva NT, Jensen L, Pavlopoulos G. Arena3D web: interactive 3D visualization of multilayered networks supporting multiple directional information channels, clustering analysis and application integration. NAR Genom Bioinform 2023; 5:lqad053. [PMID: 37260509 PMCID: PMC10227371 DOI: 10.1093/nargab/lqad053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 04/25/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023] Open
Abstract
Arena3Dweb is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3Dweb supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3Dweb for 3D multi-layer visualization. Arena3Dweb is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org.
Affiliation(s)
- Fotis A Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Reinhard Schneider
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
- Savvas Paragkamian
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
  - Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013 Heraklion, Crete, Greece
- Nadezhda T Doncheva
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
4. Vafiadaki E, Glijnis PC, Doevendans PA, Kranias EG, Sanoudou D. Phospholamban R14del disease: The past, the present and the future. Front Cardiovasc Med 2023; 10:1162205. [PMID: 37144056 PMCID: PMC10151546 DOI: 10.3389/fcvm.2023.1162205] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/09/2023] [Accepted: 04/03/2023] [Indexed: 05/06/2023] Open
Abstract
Arrhythmogenic cardiomyopathy affects a significant number of patients worldwide and is characterized by life-threatening ventricular arrhythmias and sudden cardiac death. Mutations in multiple genes with diverse functions have been reported to date, including phospholamban (PLN), a key regulator of sarcoplasmic reticulum (SR) Ca2+ homeostasis and cardiac contractility. The PLN-R14del variant in particular is recognized as the cause in an increasing number of patients worldwide, and extensive investigations have enabled rapid advances towards the delineation of PLN-R14del disease pathogenesis and the discovery of an effective treatment. We provide a critical overview of current knowledge on PLN-R14del disease pathophysiology, including clinical, animal model, cellular and biochemical studies, as well as the diverse therapeutic approaches that are being pursued. The milestones achieved in <20 years since the discovery of the PLN R14del mutation (2006) serve as a paradigm of international scientific collaboration and patient involvement towards finding a cure.
Affiliation(s)
- Elizabeth Vafiadaki
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
  - Correspondence: Elizabeth Vafiadaki; Despina Sanoudou
- Pieter C. Glijnis
  - Stichting Genetische Hartspierziekte PLN, Phospholamban Foundation, Wieringerwerf, Netherlands
- Pieter A. Doevendans
  - Netherlands Heart Institute, Utrecht, Netherlands
  - Department of Cardiology, University Medical Center Utrecht, Utrecht, Netherlands
- Evangelia G. Kranias
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
  - Department of Pharmacology and Systems Physiology, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Despina Sanoudou
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
  - Clinical Genomics and Pharmacogenomics Unit, 4th Department of Internal Medicine, Attikon Hospital, Medical School, National and Kapodistrian University of Athens, Athens, Greece
  - Center for New Biotechnologies and Precision Medicine, Medical School, National and Kapodistrian University of Athens, Athens, Greece
  - Correspondence: Elizabeth Vafiadaki; Despina Sanoudou
5. Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level: From metagenomic sequence reads to annotated protein clusters. Front Bioinform 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
  - *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
  - Hellenic Army Academy, Vari, Greece
  - *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
6. Eliopoulos AG, Angelis A, Liakakou A, Skaltsounis LA. In Vitro Anti-Influenza Virus Activity of Non-Polar Primula veris subsp. veris Extract. Pharmaceuticals (Basel) 2022; 15:ph15121513. [PMID: 36558964 PMCID: PMC9787935 DOI: 10.3390/ph15121513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 11/21/2022] [Accepted: 11/25/2022] [Indexed: 12/12/2022] Open
Abstract
Medicinal plants have long been recognized as a tremendous source of candidate compounds for the development of pharmaceuticals, including anti-viral agents. Herein, we report the identification of anti-influenza virus activity in non-polar Primula veris L. subsp. veris extracts. We show that P. veris subsp. veris flower extracts, obtained using supercritical fluid or ultrasound-based extraction, possess virucidal/virus inactivation properties and confer prophylactic and therapeutic effects against influenza virus-induced cytolysis in vitro. By GC-MS and UPLC-HRMS analysis of non-polar P. veris subsp. veris extracts we identified terpenes, flavones, tocopherols, and other classes of phytochemicals with known or putative anti-influenza properties. In silico prediction of cellular functions and molecular pathways affected by these phytochemicals suggests putative effects on signal transduction, inflammasome, and cell death pathways that are relevant to influenza virus pathogenesis. Combining P. veris subsp. veris with extracts of medicinal plants with proven anti-influenza activity such as Echinacea purpurea (L.) Moench and Cistus creticus L. subsp. creticus achieves an impressive protective effect against infection by influenza virus H1N1 in vitro and reduced progeny virus production by infected cells. Collectively, these findings uncover a previously uncharted biological property of non-polar P. veris flower extracts that warrants further studies to assess clinical efficacy.
Affiliation(s)
- Aristides G. Eliopoulos
  - Department of Biology, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
  - Correspondence: A.G.E.; L.A.S.
- Apostolis Angelis
  - Department of Pharmacy, Division of Pharmacognosy and Natural Products Chemistry, National and Kapodistrian University of Athens, 15771 Athens, Greece
- Anastasia Liakakou
  - Department of Pharmacy, Division of Pharmacognosy and Natural Products Chemistry, National and Kapodistrian University of Athens, 15771 Athens, Greece
- Leandros A. Skaltsounis
  - Department of Pharmacy, Division of Pharmacognosy and Natural Products Chemistry, National and Kapodistrian University of Athens, 15771 Athens, Greece
  - Correspondence: A.G.E.; L.A.S.
7. Prediction and Ranking of Biomarkers Using multiple UniReD. Int J Mol Sci 2022; 23:ijms231911112. [PMID: 36232413 PMCID: PMC9569535 DOI: 10.3390/ijms231911112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 09/06/2022] [Accepted: 09/17/2022] [Indexed: 11/23/2022] Open
Abstract
Protein–protein interactions (PPIs) are of key importance for understanding how cells and organisms function. Thus, in recent decades, many approaches have been developed for the identification and discovery of such interactions. These approaches have addressed the problem of PPI identification from either an experimental or a computational point of view. Here, we present an updated version of UniReD, a computational prediction tool that mines the biomedical literature to extract documented, already published protein associations and to predict undocumented ones. The usefulness of this computational tool has been previously evaluated by experimentally validating predicted interactions and by benchmarking it against public databases of experimentally validated PPIs. In its updated form, UniReD allows the user to provide a list of proteins with a known implication in, e.g., a particular disease, as well as another list of proteins that are potentially associated with the proteins of the first list. UniReD then automatically analyzes both lists and ranks the proteins of the second list by their association with the proteins of the first list, thus serving as a potential biomarker discovery/validation tool.
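The two-list ranking described in this abstract can be sketched as follows. This is only an illustration of the general idea, not UniReD's scoring method, which the abstract does not specify: the protein names, the `association` score table, and the mean-score rule in `rank_candidates` are all hypothetical placeholders.

```python
# Hypothetical pairwise association scores in [0, 1]. Placeholder values only;
# in practice these would come from a literature-mining step.
association = {
    frozenset({"REF_A", "CAND_1"}): 0.9,
    frozenset({"REF_B", "CAND_1"}): 0.5,
    frozenset({"REF_A", "CAND_2"}): 0.2,
    frozenset({"REF_A", "CAND_3"}): 0.8,
    frozenset({"REF_B", "CAND_3"}): 0.8,
}


def rank_candidates(reference, candidates, association):
    """Rank candidates by their mean association with the reference proteins.

    Pairs missing from the table count as 0.0 (an assumption of this sketch).
    Ties are broken alphabetically so the ordering is deterministic.
    """
    ranked = []
    for cand in candidates:
        scores = [association.get(frozenset({cand, ref}), 0.0) for ref in reference]
        ranked.append((cand, sum(scores) / len(scores)))
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


reference = ["REF_A", "REF_B"]            # proteins with a known implication
candidates = ["CAND_1", "CAND_2", "CAND_3"]  # proteins to be ranked

ranking = rank_candidates(reference, candidates, association)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

Averaging over the reference list is just one possible aggregation; taking the maximum or a weighted sum would favor candidates with a single strong link instead.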
8. Discovering Thematically Coherent Biomedical Documents Using Contextualized Bidirectional Encoder Representations from Transformers-Based Clustering. Int J Environ Res Public Health 2022; 19:ijerph19105893. [PMID: 35627429 PMCID: PMC9141535 DOI: 10.3390/ijerph19105893] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/20/2022] [Revised: 04/26/2022] [Accepted: 05/10/2022] [Indexed: 02/01/2023]
Abstract
The rapid growth of biomedical literature has increased the number of natural language text resources relevant to current applications. Over the last decade, there has also been great interest in extracting useful information from meaningful, coherent groupings of text documents. However, discovering informative representations and identifying relevant articles in the rapidly growing biomedical literature is challenging because document clustering is unsupervised. Moreover, empirical investigations have shown that traditional text clustering methods produce unsatisfactory results because their non-contextualized vector space representations neglect the semantic relationships between biomedical texts. Recently, pre-trained language models have proven successful in a wide range of natural language processing applications. In this paper, we propose an efficient Gaussian Mixture Model-based clustering framework that incorporates pre-trained, domain-specific BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) language representations to enhance clustering accuracy. The proposed framework consists of three main phases. First, classic text pre-processing techniques are applied to biomedical document data crawled from the PubMed repository. Second, representative vectors are extracted from a pre-trained BioBERT language model for biomedical text mining. Third, we employ the Gaussian Mixture Model as a clustering algorithm, which allows us to assign a label to each biomedical document. To demonstrate the effectiveness of the proposed model, we conducted a comprehensive experimental analysis using several clustering algorithms combined with diverse embedding techniques.
The experimental results show that the proposed model outperforms the benchmark models, reaching a Fowlkes-Mallows score of 0.7817, a silhouette coefficient of 0.3765, an adjusted Rand index of 0.4478, and a Davies-Bouldin score of 1.6849. We expect the outcomes of this study will assist domain specialists in comprehending thematically cohesive documents in the healthcare field.
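Two of the scores reported above, the Fowlkes-Mallows score and the adjusted Rand index, compare predicted clusters with reference labels by counting document pairs. The sketch below shows the standard pair-counting formulas in plain Python; it is not the authors' code, and the toy label lists are made up for illustration. The silhouette coefficient and the Davies-Bouldin score are not covered because they need the embedding vectors themselves.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count document pairs that share a cluster, via the contingency table."""
    if len(true_labels) != len(pred_labels) or len(true_labels) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs placed together in both the reference and the predicted clustering.
    both = sum(comb(n, 2) for n in Counter(zip(true_labels, pred_labels)).values())
    same_true = sum(comb(n, 2) for n in Counter(true_labels).values())
    same_pred = sum(comb(n, 2) for n in Counter(pred_labels).values())
    total = comb(len(true_labels), 2)
    return both, same_true, same_pred, total


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and pairwise recall."""
    both, same_true, same_pred, _ = pair_counts(true_labels, pred_labels)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return both / sqrt(same_true * same_pred)


def adjusted_rand(true_labels, pred_labels):
    """Rand index corrected for the agreement expected by chance."""
    both, same_true, same_pred, total = pair_counts(true_labels, pred_labels)
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (both - expected) / (max_index - expected)


# Toy example: six documents, two reference topics, three predicted clusters.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 2, 2]
print("Fowlkes-Mallows:", round(fowlkes_mallows(true_labels, pred_labels), 4))
print("Adjusted Rand:", round(adjusted_rand(true_labels, pred_labels), 4))
```

Both scores equal 1 for a perfect match regardless of how the cluster ids are named, which is why they suit unsupervised clustering where label numbering is arbitrary.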