1. Theodosiou T, Vrettos K, Baltsavia I, Baltoumas F, Papanikolaou N, Antonakis AN, Mossialos D, Ouzounis CA, Promponas VJ, Karaglani M, Chatzaki E, Brandau S, Pavlopoulos GA, Andreakos E, Iliopoulos I. BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery. Comput Struct Biotechnol J 2024; 23:3247-3253. [PMID: 39279874 PMCID: PMC11399685 DOI: 10.1016/j.csbj.2024.08.016] [Received: 04/19/2024] [Revised: 08/05/2024] [Accepted: 08/15/2024] [Indexed: 09/18/2024]
Abstract
The process of navigating through the landscape of biomedical literature and performing searches or combining them with bioinformatics analyses can be daunting, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related repositories. Herein, we present BioTextQuest v2.0, a tool for biomedical literature mining. BioTextQuest v2.0 is an open-source online web portal for document clustering based on sets of selected biomedical terms, offering efficient management of information derived from PubMed abstracts. Employing established machine learning algorithms, the tool facilitates document clustering while allowing users to customize the analysis by selecting terms of interest. BioTextQuest v2.0 streamlines the process of uncovering valuable insights from biomedical research articles, serving as an agent that connects the identification of key terms like genes/proteins, diseases, chemicals, Gene Ontology (GO) terms, functions, and others through named entity recognition, and their application in biological research. Instead of manually sifting through articles, researchers can enter their PubMed-like query and receive extracted information in two user-friendly formats, tables and word clouds, simplifying the comprehension of key findings. The latest update of BioTextQuest leverages the EXTRACT named entity recognition tagger, enhancing its ability to pinpoint various biological entities within text. BioTextQuest v2.0 acts as a research assistant, significantly reducing the time and effort required for researchers to identify and present relevant information from the biomedical literature.
Affiliation(s)
- Theodosios Theodosiou
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Konstantinos Vrettos
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Ismini Baltsavia
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Fotis Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Athens 16672, Greece
- Nikolas Papanikolaou
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Andreas N Antonakis
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Dimitrios Mossialos
  - Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
- Christos A Ouzounis
  - Biological Computation & Computational Biology Group, AIIA Lab, School of Informatics, Aristotle University of Thessalonica, 57001 Thessalonica, Greece
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia 1678, Cyprus
- Makrina Karaglani
  - Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Ekaterini Chatzaki
  - Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Sven Brandau
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Georgios A Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Athens 16672, Greece
- Evangelos Andreakos
  - Center for Immunology and Transplantation, Biomedical Research Foundation Academy of Athens, Athens, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
2. Grützmann K, Kraft T, Meinhardt M, Meier F, Westphal D, Seifert M. Network-based analysis of heterogeneous patient-matched brain and extracranial melanoma metastasis pairs reveals three homogeneous subgroups. Comput Struct Biotechnol J 2024; 23:1036-1050. [PMID: 38464935 PMCID: PMC10920107 DOI: 10.1016/j.csbj.2024.02.013] [Received: 11/06/2023] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 03/12/2024]
Abstract
Melanoma, the deadliest form of skin cancer, can metastasize to different organs. Molecular differences between brain and extracranial melanoma metastases are poorly understood. Here, promoter methylation and gene expression of 11 heterogeneous patient-matched pairs of brain and extracranial metastases were analyzed using melanoma-specific gene regulatory networks learned from public transcriptome and methylome data, followed by network-based impact propagation of patient-specific alterations. This innovative data analysis strategy made it possible to predict potential impacts of patient-specific driver candidate genes on other genes and pathways. The patient-matched metastasis pairs clustered into three robust subgroups with specific downstream targets with known roles in cancer, including melanoma (SG1: RBM38, BCL11B; SG2: GATA3, FES; SG3: SLAMF6, PYCARD). Patient subgroups and ranking of target gene candidates were confirmed in a validation cohort. In summary, computational network-based impact analyses of heterogeneous metastasis pairs predicted individual regulatory differences in melanoma brain metastases, cumulating into three consistent subgroups with specific downstream target genes.
Affiliation(s)
- Konrad Grützmann
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Theresa Kraft
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Matthias Meinhardt
  - Department of Pathology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- Friedegund Meier
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Dana Westphal
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Michael Seifert
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
3. Hsieh AR, Tsai CY. Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction. Eur J Med Res 2024; 29:404. [PMID: 39095899 PMCID: PMC11297645 DOI: 10.1186/s40001-024-01983-5] [Received: 10/05/2023] [Accepted: 07/17/2024] [Indexed: 08/04/2024]
Abstract
Supervised machine learning is often used for biomedical relationship extraction. Its disadvantage is that manually building an annotated dataset requires much time and money. With distant supervision, a knowledge base is combined with a corpus so that the training corpus can be annotated automatically. Because many biomedical databases provide knowledge bases while only a limited number of annotated corpora exist, this method is practical in biomedicine. The clinical significance of each patient's genetic makeup can be understood based on the healthcare provider's genetic database. Unfortunately, few previous biomedical relationship extraction studies have focused on gene-gene interaction. The main purpose of this study is to develop extraction methods for gene-gene interactions that can help explain the heritability of human complex diseases. The study drew on the gene-gene interaction information in the KEGG PATHWAY database, used PubMed abstracts to generate the training sample set, and applied the graph kernel method to extract gene-gene interactions. The best assessment result was an F1-score of 0.79. The distant supervision method developed here automatically finds sentences in the corpus for extracting gene-gene interactions without manual labeling, which can effectively reduce the time cost of manual data annotation; moreover, the relationship extraction method based on a graph kernel can be successfully applied to extract gene-gene interactions. In this way, the results of this study are expected to help achieve precision medicine.
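The distant supervision step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the knowledge base `KNOWN_INTERACTIONS`, the lexicon `GENE_LEXICON`, the helper names, and the example sentences are all hypothetical stand-ins, and the graph kernel classifier that would be trained on the labeled pairs is omitted. A sentence that mentions both genes of a known interacting pair is labeled positive; any other co-mentioned pair is labeled negative.

```python
import re
from itertools import combinations

# Hypothetical knowledge base of interacting gene pairs
# (a stand-in for pairs that would be derived from KEGG PATHWAY).
KNOWN_INTERACTIONS = {
    frozenset(("TP53", "MDM2")),
    frozenset(("BRCA1", "BARD1")),
}

# Hypothetical gene lexicon used to spot gene mentions in text.
GENE_LEXICON = {"TP53", "MDM2", "BRCA1", "BARD1", "EGFR"}


def find_genes(sentence):
    """Return the sorted, de-duplicated gene symbols found in a sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return sorted({token for token in tokens if token in GENE_LEXICON})


def label_sentences(sentences):
    """Label each co-mentioned gene pair: 1 if the pair is in the knowledge base, else 0."""
    examples = []
    for sentence in sentences:
        for gene_a, gene_b in combinations(find_genes(sentence), 2):
            label = 1 if frozenset((gene_a, gene_b)) in KNOWN_INTERACTIONS else 0
            examples.append((sentence, gene_a, gene_b, label))
    return examples


# Made-up sentences standing in for PubMed abstract sentences.
sentences = [
    "MDM2 binds TP53 and promotes its degradation.",
    "EGFR and BRCA1 were both measured in the cohort.",
    "No gene is mentioned here.",
]
training_examples = label_sentences(sentences)
```

Labels produced this way are noisy, since a sentence can mention two interacting genes without describing their interaction, which is the usual trade-off of distant supervision against manual annotation.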
Affiliation(s)
- Ai-Ru Hsieh
  - Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan
- Chen-Yu Tsai
  - Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan
4. Yao X, He Z, Liu Y, Wang Y, Ouyang S, Xia J. Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer. Sci Data 2024; 11:265. [PMID: 38431735 PMCID: PMC10908799 DOI: 10.1038/s41597-024-03083-9] [Received: 08/07/2023] [Accepted: 02/20/2024] [Indexed: 03/05/2024]
Abstract
It is vital to investigate the complex mechanisms underlying tumors to better understand cancer and develop effective treatments. Metabolic abnormalities and clinical phenotypes can serve as essential biomarkers for diagnosing this challenging disease. Additionally, genetic alterations provide profound insights into the fundamental aspects of cancer. This study introduces Cancer-Alterome, a literature-mined dataset that focuses on the regulatory events of an organism's biological processes or clinical phenotypes caused by genetic alterations. By proposing and leveraging a text-mining pipeline, we identify 16,681 thousand regulatory event records encompassing 21K genes, 157K genetic alterations and 154K downstream bio-concepts, extracted from 4,354K pan-cancer publications. The resulting dataset empowers a multifaceted investigation of cancer pathology, enabling the meticulous tracking of relevant literature support. Its potential applications extend to evidence-based medicine and precision medicine, yielding valuable insights for further advancements in cancer research.
Affiliation(s)
- Xinzhi Yao
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Zhihan He
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Yawen Liu
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Yuxing Wang
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, P.R. China
- Sizhuo Ouyang
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Jingbo Xia
  - Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China