1
Li Y, Wang Y, Chen S, Liu L. The landscape of miRNA-mRNA regulatory network and cellular sources in inflammatory bowel diseases: insights from text mining and single cell RNA sequencing analysis. Front Immunol 2024; 15:1454532. [PMID: 39238649; PMCID: PMC11374595; DOI: 10.3389/fimmu.2024.1454532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 08/05/2024] [Indexed: 09/07/2024] Open
Abstract
Background: Inflammatory Bowel Diseases (IBDs), encompassing Ulcerative Colitis (UC) and Crohn's Disease (CD), are chronic, recurrent inflammatory conditions of the gastrointestinal tract. The microRNA (miRNA)-mRNA regulatory network is pivotal in the initiation and progression of IBDs. Although individual studies provide valuable insights into miRNA mechanisms in IBDs, they often have limited scope due to constraints in population diversity, sample size, sequencing platform variability, batch effects, and potential researcher bias. Our study aimed to construct comprehensive miRNA-mRNA regulatory networks and determine the cellular sources and functions of key miRNAs in IBD pathogenesis.
Methods: To minimize potential bias from individual studies, we utilized a text mining-based approach on published scientific literature from PubMed and PMC databases to identify miRNAs and mRNAs associated with IBDs and their subtypes. We constructed miRNA-mRNA regulatory networks by integrating both predicted and experimentally validated results from DIANA, TargetScan, PicTar, miRanda, miRDB, and miRTarBase (all of which are databases for miRNA target annotation). The functions of miRNAs were determined through gene enrichment analysis of their target mRNAs. Additionally, we used two large-scale single-cell RNA sequencing datasets to identify the cellular sources of miRNAs and the association of their expression levels with clinical status and with molecular and functional alterations in CD and UC.
Results: Our analysis systematically summarized IBD-related genes using text-mining methodologies. We constructed three comprehensive miRNA-mRNA regulatory networks specific to IBD, CD, and UC. Through cross-analysis with two large-scale scRNA-seq datasets, we determined the cellular sources of the identified miRNAs. Despite originating from different cell types, hsa-miR-142, hsa-miR-145, and hsa-miR-146a were common to both CD and UC. Notably, hsa-miR-145 was identified as myofibroblast-specific in both CD and UC. Furthermore, we found that higher tissue repair and enhanced glucose and lipid metabolism were associated with hsa-miR-145 in myofibroblasts in both CD and UC contexts.
Conclusion: This comprehensive approach revealed common and distinct miRNA-mRNA regulatory networks in CD and UC, identified cell-specific miRNA expressions (notably hsa-miR-145 in myofibroblasts), and linked miRNA expression to functional alterations in IBD. These findings not only enhance our understanding of IBD pathogenesis but also offer promising diagnostic biomarkers and therapeutic targets for clinical practice in managing IBDs.
Affiliation(s)
- Yuan Li
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yao Wang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Simeng Chen
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Lijia Liu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China

2
Martin VP, Gauld C, Taillard J, Peter-Derex L, Lopez R, Micoulaud-Franchi JA. Sleepiness should be reinvestigated through the lens of clinical neurophysiology: A mixed expertal and big-data Natural Language Processing approach. Neurophysiol Clin 2024; 54:102937. [PMID: 38401240; DOI: 10.1016/j.neucli.2023.102937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 02/26/2024] Open
Abstract
Historically, the field of sleep medicine has revolved around electrophysiological tools. However, the use of these tools as a neurophysiological method of investigation seems to be underrepresented today, from both international recommendations and sleep centers, in contrast to behavioral and psychometric tools. The aim of this article is to combine a data-driven approach and neurophysiological and sleep medicine expertise to confirm or refute the hypothesis that neurophysiology has declined in favor of behavioral or self-reported dimensions in sleep medicine for the investigation of sleepiness, despite the use of electrophysiological tools. Using Natural Language Processing methods, we analyzed the abstracts of the 18,370 articles indexed by PubMed containing the terms 'sleepiness' or 'sleepy' in the title, abstract, or keywords. For this purpose, we examined these abstracts using two methods: a lexical network, enabling the identification of concepts (neurophysiological or clinical) related to sleepiness in these articles and their interconnections; furthermore, we analyzed the temporal evolution of these concepts to extract historical trends. These results confirm the hypothesis that neurophysiology has declined in favor of behavioral or self-reported dimensions in sleep medicine for the investigation of sleepiness. In order to bring sleepiness measurements closer to brain functioning and to reintroduce neurophysiology into sleep medicine, we discuss two strategies: the first is reanalyzing electrophysiological signals collected during the standard sleep electrophysiological test; the second takes advantage of the current trend towards dimensional models of sleepiness to situate clinical neurophysiology at the heart of the redefinition of sleepiness.
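The lexical-network idea mentioned in this abstract can be illustrated with a minimal sketch: two terms are linked whenever they appear in the same abstract, and the edge weight counts how many abstracts contain both. The vocabulary `TERMS`, the function name `cooccurrence_network`, and the sample abstracts are hypothetical and invented for illustration; the authors' actual pipeline is not described in the abstract.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of concepts to track (not taken from the paper).
TERMS = {"sleepiness", "eeg", "questionnaire", "narcolepsy"}

def cooccurrence_network(abstracts, terms=TERMS):
    """Return a Counter mapping alphabetically sorted term pairs to the
    number of abstracts in which both terms appear."""
    edges = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(tokens & terms)
        # Each unordered pair of tracked terms found in one abstract
        # adds 1 to the weight of the corresponding edge.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

# Toy abstracts, invented for illustration only.
sample = [
    "Sleepiness was assessed with a questionnaire.",
    "EEG markers of sleepiness in narcolepsy.",
    "A questionnaire study of sleepiness in narcolepsy.",
]
network = cooccurrence_network(sample)
```

Counting each pair once per abstract (rather than once per occurrence) keeps long abstracts from dominating the edge weights; other weighting schemes are equally plausible.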
Affiliation(s)
- Vincent P Martin
  - Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg; Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400 Talence, France; Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France
- Christophe Gauld
  - Service Psychopathologie du Développement de l'Enfant et de l'Adolescent, Hospices Civils de Lyon & Université de Lyon 1, France; Institut des Sciences Cognitives Marc Jeannerod, UMR 5229 CNRS & Université Claude Bernard Lyon 1, France
- Jacques Taillard
  - Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France
- Laure Peter-Derex
  - Lyon Neuroscience Research Centre, INSERM U1028, CNRS UMR 5292, Lyon, France; Centre for Sleep Medicine and Respiratory Diseases, Croix-Rousse Hospital, Hospices Civils de Lyon, Lyon 1 University, Lyon, France
- Régis Lopez
  - National Reference Centre for Orphan Diseases, Narcolepsy-Rare hypersomnias, Sleep Unit, Department of Neurology, CHU de Montpellier, University of Montpellier, Montpellier, France; Institute for Neurosciences of Montpellier (INM), University of Montpellier, Inserm, Montpellier, France
- Jean-Arthur Micoulaud-Franchi
  - Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France; University Sleep Clinic, University Hospital of Bordeaux, Place Amélie Raba-Leon, 33 076 Bordeaux, France

3
Méndez-Cruz CF, Rodríguez-Herrera J, Varela-Vega A, Mateo-Estrada V, Castillo-Ramírez S. Unsupervised learning and natural language processing highlight research trends in a superbug. Front Artif Intell 2024; 7:1336071. [PMID: 38576460; PMCID: PMC10991725; DOI: 10.3389/frai.2024.1336071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/11/2024] [Indexed: 04/06/2024] Open
Abstract
Introduction: Antibiotic-resistant Acinetobacter baumannii is a very important nosocomial pathogen worldwide. Thousands of studies have been conducted about this pathogen. However, there has not been any attempt to use all this information to highlight the research trends concerning this pathogen.
Methods: Here we use unsupervised learning and natural language processing (NLP), two areas of Artificial Intelligence, to analyse the most extensive database of articles created (5,500+ articles, from 851 different journals, published over 3 decades).
Results: K-means clustering found 113 theme clusters and these were defined with representative terms automatically obtained with topic modelling, summarising different research areas. The biggest clusters, all with over 100 articles, are biased toward multidrug resistance, carbapenem resistance, clinical treatment, and nosocomial infections. However, we also found that some research areas, such as ecology and non-human infections, have received very little attention. This approach allowed us to study research themes over time unveiling those of recent interest, such as the use of Cefiderocol (a recently approved antibiotic) against A. baumannii.
Discussion: In a broader context, our results show that unsupervised learning, NLP and topic modelling can be used to describe and analyse the research themes for important infectious diseases. This strategy should be very useful to analyse other ESKAPE pathogens or any other pathogens relevant to Public Health.
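To make the clustering step in this abstract concrete, the following is a minimal sketch of K-means over TF-IDF vectors, written in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: the toy documents are invented, the starting centroids are fixed by hand (`init_indices`) to keep the run deterministic, and the highest-weighted centroid terms stand in for the topic-modelling step the paper uses to label clusters.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenised documents into dense TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors

def kmeans(vectors, init_indices, iterations=10):
    """Lloyd's algorithm with squared Euclidean distance.
    init_indices selects the documents used as starting centroids."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its nearest centroid.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=2):
    """Return the n vocabulary terms with the largest centroid weight."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for _, term in ranked[:n]]

# Toy tokenised "articles", invented for illustration only.
docs = [
    ["carbapenem", "resistance", "hospital"],
    ["carbapenem", "resistance", "outbreak"],
    ["soil", "ecology", "isolate"],
    ["ecology", "soil", "animal"],
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, init_indices=(0, 2))
themes = [set(top_terms(vocab, c)) for c in centroids]
```

In practice a library implementation with randomised, repeated initialisation would normally be preferred; the fixed starting points here only serve to keep the example reproducible.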
Affiliation(s)
- Carlos-Francisco Méndez-Cruz
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Joel Rodríguez-Herrera
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Alfredo Varela-Vega
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Valeria Mateo-Estrada
  - Programa de Genómica Evolutiva, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Santiago Castillo-Ramírez
  - Programa de Genómica Evolutiva, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

4
Collins C, Baker S, Brown J, Zheng H, Chan A, Stenius U, Narita M, Korhonen A. Text mining for contexts and relationships in cancer genomics literature. Bioinformatics 2024; 40:btae021. [PMID: 38258418; PMCID: PMC10822582; DOI: 10.1093/bioinformatics/btae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 09/27/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
MOTIVATION: Scientific advances build on the findings of existing research. The 2001 publication of the human genome has led to the production of huge volumes of literature exploring the context-specific functions and interactions of genes. Technology is needed to perform large-scale text mining of research papers to extract the reported actions of genes in specific experimental contexts and cell states, such as cancer, thereby facilitating the design of new therapeutic strategies.
RESULTS: We present a new corpus and Text Mining methodology that can accurately identify and extract the most important details of cancer genomics experiments from biomedical texts. We build a Named Entity Recognition model that accurately extracts relevant experiment details from PubMed abstract text, and a second model that identifies the relationships between them. This system outperforms earlier models and enables the analysis of gene function in diverse and dynamically evolving experimental contexts.
AVAILABILITY AND IMPLEMENTATION: Code and data are available here: https://github.com/cambridgeltl/functional-genomics-ie.
Affiliation(s)
- Charlotte Collins
  - Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom
- Simon Baker
  - Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom
- Jason Brown
  - Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom
- Huiyuan Zheng
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Adelyne Chan
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom
- Ulla Stenius
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Masashi Narita
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom
- Anna Korhonen
  - Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom

5
Wu W, Zhang M. Exploring the motivations and obstacles of the public's garbage classification participation: evidence from Sina Weibo. Journal of Material Cycles and Waste Management 2023; 25:1-14. [PMID: 37360951; PMCID: PMC10105363; DOI: 10.1007/s10163-023-01659-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 03/24/2023] [Indexed: 06/28/2023]
Abstract
China has been implementing garbage classification to improve resource recycling for many years. Since garbage classification is essentially a social activity, it needs the active participation of the public. However, the phenomenon of "high practice, low effect" is widespread in most cities. Therefore, this paper uses the data from Sina Weibo to analyze the reasons for the poor garbage classification effect. First, the key factors affecting residents' willingness to participate in garbage classification are identified based on the text-mining method. Further, this paper analyzes the reasons that promote or hinder the residents' intention of garbage classification. Finally, the resident's attitude towards garbage classification is explored by the score of the text's emotional orientation, and further the reasons for the positive and negative emotional orientation are analyzed, respectively. The main conclusions are as follows: (1) The proportion of residents holding negative sentiment towards garbage classification is as high as 55%. (2) Residents' positive emotions are mainly caused by the public's sense of environmental protection inspired by publicity and education, and the incentive measures taken by the government. (3) The main reasons for negative emotions are imperfect infrastructure and unreasonable garbage sorting arrangements.
Affiliation(s)
- Wenqi Wu
  - School of Economics and Management, China University of Mining and Technology, Xuzhou, 221116 China
  - Center for Environmental Management and Economics Policy Research, China University of Mining and Technology, Xuzhou, 221116 China
- Ming Zhang
  - School of Economics and Management, China University of Mining and Technology, Xuzhou, 221116 China
  - Center for Environmental Management and Economics Policy Research, China University of Mining and Technology, Xuzhou, 221116 China

6
Gauld C, Pignon B, Fourneret P, Dubertret C, Tebeka S. Comparison of relative areas of interest between major depression disorder and postpartum depression. Prog Neuropsychopharmacol Biol Psychiatry 2023; 121:110671. [PMID: 36341842; DOI: 10.1016/j.pnpbp.2022.110671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Revised: 10/11/2022] [Accepted: 10/26/2022] [Indexed: 11/06/2022]
Abstract
INTRODUCTION: Postpartum depression (PPD) is defined as a major depressive disorder (MDD) beginning after childbirth. Wide debates aim to better understand PPD's specificities compared with MDD. One of the keys in differentiating PPD from MDD is to systematically study scientific "Areas Of Interest" (AOIs) of these disorders.
METHODS: In November 2021, we performed an extraction and textual computational analysis of associated terms for PPD and MDD, using the biomedical database PubMed. We performed an undirected lexical network analysis to map the first 150 terms in space. Then, we used an unsupervised machine learning technique to detect word patterns and automatically cluster AOIs with a topic-modeling analysis.
RESULTS: We identified 30,000 articles of the 554,724 articles for MDD and 15,642 articles for PPD. Four AOIs were detected in the MDD network: mood disorders and their treatments, risk factors, consequences and quality of life, and mental health and comorbidities. Five AOIs were detected in the PPD network: mood disorders and treatments, risk factors, consequences and child health, patient's background, and the challenges of screening.
DISCUSSION AND CONCLUSION: Limitations are both methodological, in particular due to the qualitative interpretation of AOIs, and related to the difficult transferability of these research results to clinical practice. The partial overlap between AOIs for MDD and for PPD suggests that the latter is a particular form of the former.
Affiliation(s)
- Christophe Gauld
  - Department of Psychopathology of Child and Adolescent Development, Hospices Civils de Lyon, Lyon 1, France; UMR CNRS 8590 IHPST, Sorbonne University, Paris 1, France
- Baptiste Pignon
  - Univ Paris-Est-Créteil (UPEC), AP-HP, Hôpitaux Universitaires « H. Mondor », France; DMU IMPACT, INSERM, IMRB, Translational Neuropsychiatry, Fondation FondaMental, F-94010 Creteil, France
- Pierre Fourneret
  - Department of Psychopathology of Child and Adolescent Development, Hospices Civils de Lyon, Lyon 1, France; Marc Jeannerod Institute of Cognitive Sciences UMR 5229, CNRS & Claude Bernard University, Lyon 1, France
- Caroline Dubertret
  - Université de Paris, INSERM UMR1266, Institute of Psychiatry and Neurosciences, Team 1, Paris, France; Department of Psychiatry, AP-HP, Louis Mourier Hospital, F-92700 Colombes, France
- Sarah Tebeka
  - Université de Paris, INSERM UMR1266, Institute of Psychiatry and Neurosciences, Team 1, Paris, France; Department of Psychiatry, AP-HP, Louis Mourier Hospital, F-92700 Colombes, France

7
Dorr RA, Casal JJ, Toriano R. Text Mining of Biomedical Articles Using the Konstanz Information Miner (KNIME) Platform: Hemolytic Uremic Syndrome as a Case Study. Healthc Inform Res 2022; 28:276-283. [PMID: 35982602; PMCID: PMC9388920; DOI: 10.4258/hir.2022.28.3.276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Accepted: 04/14/2022] [Indexed: 11/23/2022] Open
Abstract
Objectives: Automated systems for information extraction are becoming very useful due to the enormous scale of the existing literature and the increasing number of scientific articles published worldwide in the field of medicine. We aimed to develop an accessible method using the open-source platform KNIME to perform text mining (TM) on indexed publications. Material from scientific publications in the field of life sciences was obtained and integrated by mining information on hemolytic uremic syndrome (HUS) as a case study.
Methods: Text retrieved from Europe PubMed Central (PMC) was processed using specific KNIME nodes. The results were presented in the form of tables or graphical representations. Data could also be compared with those from other sources.
Results: By applying TM to the scientific literature on HUS as a case study, and by selecting various fields from scientific articles, it was possible to obtain a list of individual authors of publications, build bags of words and study their frequency and temporal use, discriminate topics (HUS vs. atypical HUS) in an unsupervised manner, and cross-reference information with a list of FDA-approved drugs.
Conclusions: Following the instructions in the tutorial, researchers without programming skills can successfully perform TM on the indexed scientific literature. This methodology, using KNIME, could become a useful tool for performing statistics, analyzing behaviors, following trends, and making forecasts related to medical issues. The advantages of TM using KNIME include enabling the integration of scientific information, helping to carry out reviews, and optimizing the management of resources dedicated to basic and clinical research.
Affiliation(s)
- Ricardo A Dorr
  - Facultad de Medicina, Instituto de Fisiología y Biofísica Bernardo Houssay (IFIBIO Houssay), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan J Casal
  - Facultad de Medicina, Instituto de Fisiología y Biofísica Bernardo Houssay (IFIBIO Houssay), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina
- Roxana Toriano
  - Facultad de Medicina, Instituto de Fisiología y Biofísica Bernardo Houssay (IFIBIO Houssay), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina

8
Van Meenen J, Leysen H, Chen H, Baccarne R, Walter D, Martin B, Maudsley S. Making Biomedical Sciences publications more accessible for machines. Medicine, Health Care, and Philosophy 2022; 25:179-190. [PMID: 35039972; DOI: 10.1007/s11019-022-10069-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/08/2022] [Indexed: 06/14/2023]
Abstract
With the rapidly expanding catalogue of scientific publications, especially within the Biomedical Sciences field, it is becoming increasingly difficult for researchers to search for, read or even interpret emerging scientific findings. PubMed, just one of the current biomedical data repositories, comprises over 33 million citations for biomedical research, and over 2500 publications are added each day. To further strengthen the impact of biomedical research, we suggest that there should be more synergy between publications and machines. By bringing machines into the realm of research and publication, we can greatly augment the assessment, investigation and cataloging of the biomedical literary corpus. The effective application of machine-based manuscript assessment and interpretation is now crucial, and potentially stands as the most effective way for researchers to comprehend and process the tsunami of biomedical data and literature. Many biomedical manuscripts are currently published online in poorly searchable document types, with figures and data presented in formats that are partially inaccessible to machine-based approaches. The structure and format of biomedical manuscripts should be adapted to facilitate machine-assisted interrogation of this important literary corpus. In this context, it is important to embrace the concept that biomedical scientists should also write manuscripts that can be read by machines. It is likely that an enhanced human-machine synergy in reading biomedical publications will greatly enhance biomedical data retrieval and reveal novel insights into complex datasets.
Affiliation(s)
- Joris Van Meenen
  - Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
  - Antwerp Research Group for Ocular Science, Department of Translational Neurosciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Hanne Leysen
  - Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Hongyu Chen
  - Weill Cornell Medical College, New York, NY, USA
- Rudi Baccarne
  - Anet Library Automation, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Deborah Walter
  - Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Bronwen Martin
  - Faculty of Pharmaceutical, Veterinary and Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Stuart Maudsley
  - Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium

9
An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering. Computational and Mathematical Methods in Medicine 2021; 2021:7937573. [PMID: 34795792; PMCID: PMC8594978; DOI: 10.1155/2021/7937573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2021] [Accepted: 10/11/2021] [Indexed: 01/03/2023]
Abstract
Semantic mining is always a challenge for big biomedical text data. Ontology has been widely proven and used to extract semantic information. However, the process of ontology-based semantic similarity calculation is so complex that it cannot be applied to big text data. To solve this problem, we propose a parallelized semantic similarity measurement method based on Hadoop MapReduce for big text data. First, we preprocess the documents and extract their semantic features. Then, we calculate the document semantic similarity based on the ontology network structure under the MapReduce framework. Finally, based on the generated semantic document similarity, document clusters are generated via clustering algorithms. To validate the effectiveness, we use two kinds of open datasets. The experimental results show that the traditional methods can hardly work for more than ten thousand biomedical documents. The proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.
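The general shape of an ontology-based document similarity computed in a map/reduce style can be sketched as follows. This is a single-process illustration under stated assumptions, not the paper's method: the toy ontology `PARENT`, the path-based concept similarity, the best-match averaging, and all function names are hypothetical, and no Hadoop API is used. The map step emits one score per concept for each document pair, and the reduce step averages the scores per pair.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy ontology: child -> parent (the root has parent None).
PARENT = {
    "disease": None,
    "infection": "disease",
    "cancer": "disease",
    "pneumonia": "infection",
    "sepsis": "infection",
    "leukemia": "cancer",
}

def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def concept_similarity(a, b):
    """Path-based similarity: 1 / (1 + number of edges between a and b)."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(path_b)}
    for i, c in enumerate(path_a):
        if c in depth_in_b:  # lowest common ancestor reached
            return 1.0 / (1 + i + depth_in_b[c])
    return 0.0

def map_pair(pair, docs):
    """Map step: for each concept in either document, emit the pair key
    together with the concept's best match in the other document."""
    a, b = pair
    for src, dst in ((a, b), (b, a)):
        for concept in docs[src]:
            best = max(concept_similarity(concept, other) for other in docs[dst])
            yield pair, best

def reduce_scores(records):
    """Reduce step: average all emitted scores for each document pair."""
    grouped = defaultdict(list)
    for key, score in records:
        grouped[key].append(score)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# Toy documents represented by their extracted ontology concepts.
docs = {
    "d1": ["pneumonia", "sepsis"],
    "d2": ["sepsis"],
    "d3": ["leukemia"],
}
records = [rec for pair in combinations(sorted(docs), 2)
           for rec in map_pair(pair, docs)]
similarity = reduce_scores(records)
```

Because each document pair is processed independently in the map step, the pairs could in principle be distributed across workers; the resulting pairwise similarities would then feed a clustering algorithm.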

10
Tewari S, Toledo Margalef P, Kareem A, Abdul-Hussein A, White M, Wazana A, Davidge ST, Delrieux C, Connor KL. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. J Pers Med 2021; 11:jpm11111064. [PMID: 34834416; PMCID: PMC8621659; DOI: 10.3390/jpm11111064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/20/2021] [Revised: 10/01/2021] [Accepted: 10/18/2021] [Indexed: 01/03/2023] Open
Abstract
The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, which were used to develop a revised model that finally captured 2098 articles that largely fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis on this labelling yielded mostly from fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later life human health.
Affiliation(s)
- Shrankhala Tewari
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Pablo Toledo Margalef
  - CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina; (P.T.M.); (C.D.)
- Ayesha Kareem
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Ayah Abdul-Hussein
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Marina White
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Ashley Wazana
  - Department of Psychiatry, McGill University, Montreal, QC H3A 0G4, Canada
- Sandra T. Davidge
  - Women and Children’s Health Research Institute, University of Alberta, Edmonton, AB T6G 1C9, Canada
- Claudio Delrieux
  - CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina; (P.T.M.); (C.D.)
  - DIEC - Electric and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca B8000, Argentina
- Kristin L. Connor
  - Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)

11
Leonardelli L, Lofano G, Selvaggio G, Parolo S, Giampiccolo S, Tomasoni D, Domenici E, Priami C, Song H, Medini D, Marchetti L, Siena E. Literature Mining and Mechanistic Graphical Modelling to Improve mRNA Vaccine Platforms. Front Immunol 2021; 12:738388. [PMID: 34557200; PMCID: PMC8454234; DOI: 10.3389/fimmu.2021.738388] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2021] [Accepted: 08/23/2021] [Indexed: 12/25/2022] Open
Abstract
RNA vaccines represent a milestone in the history of vaccinology. They provide several advantages over more traditional approaches to vaccine development, showing strong immunogenicity and an overall favorable safety profile. While preclinical testing has provided some key insights on how RNA vaccines interact with the innate immune system, their mechanism of action appears to be fragmented amid the literature, making it difficult to formulate new hypotheses to be tested in clinical settings and ultimately improve this technology platform. Here, we propose a systems biology approach, based on the combination of literature mining and mechanistic graphical modeling, to consolidate existing knowledge around the mode of action of mRNA vaccines and enhance the translatability of preclinical hypotheses into clinical evidence. A Natural Language Processing (NLP) pipeline for automated knowledge extraction retrieved key biological evidence that was joined into an interactive mechanistic graphical model representing the chain of immune events induced by mRNA vaccine administration. The achieved mechanistic graphical model will help the design of future experiments, foster the generation of new hypotheses and set the basis for the development of mathematical models capable of simulating and predicting the immune response to mRNA vaccines.
Collapse
Affiliation(s)
- Lorena Leonardelli
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Gianluca Selvaggio
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Silvia Parolo
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Stefano Giampiccolo
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Danilo Tomasoni
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
- Enrico Domenici
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy; Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Povo, Italy
- Corrado Priami
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy; Department of Computer Science, University of Pisa, Pisa, Italy
- Luca Marchetti
- Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy; Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Povo, Italy
- Emilio Siena
- Data Science and Computational Vaccinology, GSK, Siena, Italy
12
Rahaman T. Discovering New Trends & Connections: Current Applications of Biomedical Text Mining. Med Ref Serv Q 2021; 40:329-336. [PMID: 34495798 DOI: 10.1080/02763869.2021.1945869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The explosive growth of digital information in recent years has amplified the information overload experienced by today's health-care professionals. In particular, the wide variety of unstructured text makes it difficult for researchers to find meaningful data without spending a considerable amount of time reading. Text mining can be used to facilitate better discoverability and analysis, and aid researchers in identifying critical trends and connections. This column will introduce key text-mining terms, recent use cases of biomedical text mining, and current applications for this technology in medical libraries.
Affiliation(s)
- Tariq Rahaman
- Tampa Bay Regional Campus Library, Nova Southeastern University, Clearwater, Florida, USA
13
Gopal J, Prakash Sinnarasan VS, Venkatesan A. Identification of Repurpose Drugs by Computational Analysis of Disease-Gene-Drug Associations. J Comput Biol 2021; 28:975-984. [PMID: 34242526 DOI: 10.1089/cmb.2020.0356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Repurposing marketed drugs for new indications has become a productive alternative that circumvents the risks of traditional drug development. Among the many approaches, computational analysis has great potential to fuel the development of all-rounder drugs and to find new classes of medicine for neglected and rare diseases. Genes that explain disease-associated variation in drug response are especially important in drug therapeutics, which makes it necessary to elucidate the relationships among gene, drug, and disease. The proposed computational analysis facilitates the discovery of knowledge on both target-based and disease-based relationships from large sources of biomedical literature spread over different platforms. It uses text mining to automatically extract aggregated biomedical entities (disease, gene, and drug) from PubMed, which serve as input to association prediction. The top-ranked associations are considered for identifying drugs for repurposing, and hidden associations are identified using the concurrence principle to extrapolate new relationships. Such findings are reported as novel, contribute to the knowledge base for pharmacogenomics, and would strongly support the discovery and progress of novel therapeutic pathways and patient-segment biomarkers.
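The abstract does not spell out how hidden associations are derived. One common reading, assumed here, is that a drug and a disease are linked indirectly when both co-occur in the literature with the same gene but are never mentioned together. The Python sketch below illustrates that idea only; the function names, the ranking by number of bridging genes, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hidden_drug_disease_links(disease_gene, gene_drug, known_drug_disease):
    """Map each indirect (drug, disease) pair to the genes that bridge it.

    disease_gene: iterable of (disease, gene) co-mention pairs
    gene_drug: iterable of (gene, drug) co-mention pairs
    known_drug_disease: set of (drug, disease) pairs already co-mentioned
    """
    drugs_by_gene = defaultdict(set)
    for gene, drug in gene_drug:
        drugs_by_gene[gene].add(drug)

    bridges = defaultdict(set)
    for disease, gene in disease_gene:
        for drug in drugs_by_gene.get(gene, ()):
            # Keep only pairs that are not already directly associated.
            if (drug, disease) not in known_drug_disease:
                bridges[(drug, disease)].add(gene)
    return {pair: sorted(genes) for pair, genes in bridges.items()}


def rank_by_support(links):
    """Order candidate pairs by number of bridging genes (assumed score)."""
    return sorted(links.items(), key=lambda item: (-len(item[1]), item[0]))


# Toy placeholder data, not real biomedical findings.
disease_gene = [("disease_A", "GENE1"), ("disease_A", "GENE2"), ("disease_B", "GENE2")]
gene_drug = [("GENE1", "drug_X"), ("GENE2", "drug_X"), ("GENE2", "drug_Y")]
known = {("drug_Y", "disease_B")}

ranked = rank_by_support(hidden_drug_disease_links(disease_gene, gene_drug, known))
for (drug, disease), genes in ranked:
    print(drug, disease, genes)
```

Counting bridging genes is the simplest possible score; a real analysis would more likely weight each link by co-occurrence frequency or a statistical association measure.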
Affiliation(s)
- Jeyakodi Gopal
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
- Amouda Venkatesan
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
14
Wu Z, Zhang Y, Chen Q, Wang H. Attitude of Chinese public towards municipal solid waste sorting policy: A text mining study. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 756:142674. [PMID: 33071141 DOI: 10.1016/j.scitotenv.2020.142674] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 06/13/2020] [Revised: 09/23/2020] [Accepted: 09/25/2020] [Indexed: 05/21/2023]
Abstract
With the acceleration of urban development, the amount of municipal solid waste (MSW) has increased dramatically. In order to recycle MSW more efficiently, a compulsory policy of sorting MSW has been enacted in China. According to the existing literature, attitude is an important factor affecting the public's MSW sorting behavior. To explore Chinese residents' emotional tendency towards the MSW sorting policy, this study collected data on Sina Weibo users and their comments on related popular posts and analyzed them with text mining technology. Results showed that although a large proportion of the Chinese public has a positive attitude towards the MSW sorting policy, the proportion of people with negative emotions reached nearly half. In addition, it was found that people in different regions of China pay different levels of attention to the MSW sorting policy. Results further revealed that the main reasons for the public's negative emotions were fines, MSW sorting rules, fees, timing of throwing waste, and irregular recycling procedures. By providing a public sentiment analysis of MSW sorting, this study can serve as a policy guide for practitioners and policy-makers to link current research areas into social development.
Affiliation(s)
- Zezhou Wu
- Sino-Australia Joint Research Centre in BIM and Smart Construction, Shenzhen University, Shenzhen, China; Key Laboratory of Coastal Urban Resilient Infrastructures, Shenzhen University, Shenzhen, China
- Yan Zhang
- Sino-Australia Joint Research Centre in BIM and Smart Construction, Shenzhen University, Shenzhen, China
- Qiaohui Chen
- Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong
- Hao Wang
- School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
15
Karim MR, Beyan O, Zappa A, Costa IG, Rebholz-Schuhmann D, Cochez M, Decker S. Deep learning-based clustering approaches for bioinformatics. Brief Bioinform 2021; 22:393-415. [PMID: 32008043 PMCID: PMC7820885 DOI: 10.1093/bib/bbz170] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 06/15/2019] [Revised: 11/28/2019] [Accepted: 12/11/2019] [Indexed: 12/14/2022] Open
Abstract
Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps in analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts and images. Further, clustering is used to gain insights into biological processes at the genomics level, e.g. clustering of gene expressions provides insights into the natural structure inherent in the data and aids in understanding gene functions, cellular processes, subtypes of cells and gene regulation. Clustering approaches, including hierarchical, centroid-based, distribution-based, density-based and self-organizing maps, have long been studied and used in classical machine learning settings. In contrast, deep learning (DL)-based representation and feature learning for clustering have not been reviewed and employed extensively. Since the quality of clustering depends not only on the distribution of data points but also on the learned representation, deep neural networks can be an effective means of mapping a high-dimensional data space into a lower-dimensional feature space, leading to improved clustering results. In this paper, we review state-of-the-art DL-based approaches for cluster analysis that are based on representation learning, which we hope will be useful, particularly for bioinformatics research. Further, we explore in detail the training procedures of DL-based clustering algorithms, point out different clustering quality metrics and evaluate several DL-based approaches on three bioinformatics use cases, including bioimaging, cancer genomics and biomedical text mining. We believe this review and the evaluation results will provide valuable insights and serve as a starting point for researchers wanting to apply DL-based unsupervised methods to solve emerging bioinformatics research problems.
Affiliation(s)
- Md Rezaul Karim
- Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, Sankt Augustin, Germany
- Oya Beyan
- Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, Sankt Augustin, Germany
- Information Systems and Databases, RWTH Aachen University, Aachen, Germany
- Achille Zappa
- Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
- Ivan G Costa
- Institute for Computational Genomics, RWTH Aachen University Medical School, Aachen, Germany
- Michael Cochez
- Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, Sankt Augustin, Germany
- Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
- Stefan Decker
- Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, Sankt Augustin, Germany
- Information Systems and Databases, RWTH Aachen University, Aachen, Germany
16
Structure of communities in semantic networks of biomedical research on disparities in health and sexism. Biomedica 2020; 40:702-721. [PMID: 33275349 PMCID: PMC7808772 DOI: 10.7705/biomedica.5182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2019] [Indexed: 01/12/2023]
Abstract
Introduction. As an initiative to improve the quality of health care, biomedical research has seen a growing trend focused on the study of disparities in health and sexism. Objective. To characterize the scientific evidence on health disparity, defined as the gap in the distribution of health, and on possible sex bias in access to medical services. Materials and methods. The scientific literature in the Medline PubMed database was searched simultaneously for two key descriptors: Healthcare disparities and Sexism. A main semantic network was then built, and some structural subunits (communities) were identified in order to analyze the patterns in which the information is organized. The open-source program Cytoscape was used to analyze and visualize the networks, and MapEquation was used to detect communities. In addition, purpose-built code was developed and made available in a public repository. Results. The corpus of the main network showed that terms for heart diseases were the most frequently co-occurring descriptors of medical conditions. From the structural subunits, patterns of information related to public policies, health services, social determinants, and risk factors were identified, although these tended to remain only indirectly connected to the nodes related to medical conditions. Conclusions. The scientific evidence indicates that disparity by sex does matter for the quality of care for many diseases, especially those related to the circulatory system. However, a gap is still perceived between the medical and social factors that give rise to possible disparities by sex.
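To make the network-building step concrete, the Python sketch below counts how often pairs of descriptors appear in the same record and then groups descriptors that are linked by sufficiently frequent co-occurrence. It is an illustration under stated assumptions, not the authors' code: the study used Cytoscape and MapEquation, whereas this sketch substitutes thresholded connected components as a crude stand-in for community detection. The function names, the `min_weight` parameter, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(records):
    """Count the records in which each unordered pair of descriptors co-occurs."""
    edges = Counter()
    for terms in records:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def components(edges, min_weight=1):
    """Group descriptors connected by edges with weight >= min_weight.

    Connected components are a simple substitute for a real community
    detection method such as the map equation.
    """
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up descriptor lists standing in for indexed PubMed records.
records = [
    ["Healthcare Disparities", "Sexism", "Heart Diseases"],
    ["Healthcare Disparities", "Heart Diseases"],
    ["Sexism", "Health Policy"],
    ["Health Policy", "Health Services"],
    ["Health Policy", "Health Services"],
]
edges = cooccurrence_edges(records)
groups = components(edges, min_weight=2)
print(groups)
```

Raising `min_weight` prunes weak links and splits the network into smaller groups. A dedicated method is still needed to find communities inside a densely connected network, which is the case the abstract describes.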
17
Pandi MT, van der Spek PJ, Koromina M, Patrinos GP. A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature. Front Pharmacol 2020; 11:602030. [PMID: 33343371 PMCID: PMC7748107 DOI: 10.3389/fphar.2020.602030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2020] [Accepted: 09/30/2020] [Indexed: 11/13/2022] Open
Abstract
Text mining in biomedical literature is an emerging field which has already been shown to have a variety of implementations in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations. The code used toward this end was implemented in the R programming language, either through custom scripts, where needed, or through functions from existing libraries. Articles (abstracts or full texts) that correspond to a specified query were extracted from PubMed, while concept annotations were derived from PubTator Central. Terms that denote a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers were created and evaluated (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net regularized generalized linear models) with regard to their performance in identifying pharmacogenomics associations. Although further improvements are essential toward proper implementation of this text-mining approach in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for the identification and assessment of research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining in biomedical literature, whose resolution could substantially contribute to the further development of this field.
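The sentence-filtering step described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation: it assumes annotations are already available as (mention, concept type) pairs, uses a naive regex sentence splitter and case-insensitive substring matching, and keeps only sentences that mention at least one Gene or Mutation together with at least one Chemical. The function names and the placeholder text are hypothetical, and no PubMed or PubTator Central API is called.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_sentences(text, annotations):
    """Keep sentences that co-mention a Gene/Mutation and a Chemical.

    annotations: list of (mention, concept_type) pairs, with concept_type
    one of "Gene", "Mutation" or "Chemical" (assumed labels).
    """
    kept = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Concept types whose mentions appear in this sentence.
        types = {ctype for mention, ctype in annotations if mention.lower() in lowered}
        if "Chemical" in types and types & {"Gene", "Mutation"}:
            kept.append(sentence)
    return kept


# Placeholder text and annotations, not real findings.
text = (
    "GENE1 variants alter drugX response. "
    "DrugX is widely prescribed. "
    "GENE1 is expressed in the liver."
)
annotations = [("GENE1", "Gene"), ("drugX", "Chemical")]
hits = candidate_sentences(text, annotations)
print(hits)
```

Sentences selected this way would then be labeled and passed to a text classifier. Offset-based annotation matching and a proper sentence tokenizer would be more robust than the substring and regex shortcuts used here.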
Affiliation(s)
- Maria-Theodora Pandi
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece; Erasmus University Medical Center, Faculty of Medicine and Health Sciences, Department of Pathology, Bioinformatics Unit, Rotterdam, Netherlands
- Peter J van der Spek
- Erasmus University Medical Center, Faculty of Medicine and Health Sciences, Department of Pathology, Bioinformatics Unit, Rotterdam, Netherlands
- Maria Koromina
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
- George P Patrinos
- Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece; Erasmus University Medical Center, Faculty of Medicine and Health Sciences, Department of Pathology, Bioinformatics Unit, Rotterdam, Netherlands; Department of Pathology, College of Medicine and Health Sciences, United Arab Emirates University, Al-Ain, United Arab Emirates; Zayed Center of Health Sciences, United Arab Emirates University, Al-Ain, United Arab Emirates
18
Gauld C, Ouazzani K, Micoulaud-Franchi JA. To split or to lump? Classifying the central disorders of hypersomnolence: sleep split requires epistemological tools and systematic data-driven conceptual analysis. Sleep 2020; 43:5901528. [PMID: 32926158 DOI: 10.1093/sleep/zsaa091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Affiliation(s)
- Christophe Gauld
- Department of Psychiatry, University of Grenoble, Avenue du Maquis du Grésivaudan, Grenoble, France; UMR CNRS 8590 IHPST, Sorbonne University (Paris 1), Paris, France
- Kévin Ouazzani
- Bordeaux Population Health Research Center, U1219, University of Bordeaux, Inserm, Bordeaux, France
- Jean-Arthur Micoulaud-Franchi
- University Sleep Clinic, Services of Functional Exploration of the Nervous System, University Hospital of Bordeaux, Place Amélie Raba-Leon, Bordeaux, France; USR CNRS 3413 SANPSY, University Hospital Pellegrin, University of Bordeaux, Bordeaux, France