Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 56 (from Reference Citation Analysis)

Article PDFs: 27

Cited by > 0: 39

Searched Name: Knowledge discovery

Xing X, Sun M, Guo Z, Zhao Y, Cai Y, Zhou P, Wang H, Gao W, Li P, Yang H. Functional annotation map of natural compounds in traditional Chinese medicines library: TCMs with myocardial protection as a case. Acta Pharm Sin B 2023;13:3802-3816. [PMID: 37719385 PMCID: PMC10502289 DOI: 10.1016/j.apsb.2023.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/14/2023] [Revised: 05/14/2023] [Accepted: 05/31/2023] [Indexed: 09/19/2023] Open

Cuffy C, McInnes BT. Exploring a deep learning neural architecture for closed Literature-based discovery. J Biomed Inform 2023;143:104362. [PMID: 37146741 DOI: 10.1016/j.jbi.2023.104362] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 03/15/2023] [Accepted: 04/06/2023] [Indexed: 05/07/2023]

Abstract

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases with each passing year and publications are released, this contributes to an era where specialized fields of research are becoming more prevalent. As this trend continues, this further propagates the separation of interdisciplinary publications and makes keeping up to date with literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literature while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, studies of neural network-based methods for LBD remain to be explored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling representations into our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show the chosen representation as input into our model affects evaluation performance. We found feature scaling our input representations increases evaluation performance and decreases the necessary number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found reducing the model's output to capturing a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. We found these experiments confirm our method's suitability for LBD.
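The abstract above reports that feature scaling the input representations helped, but it does not say which scaling was used. As a minimal sketch only, assuming min-max scaling applied per feature, the idea might look like the following. The function name `min_max_scale` and the toy vectors are illustrative assumptions, not taken from the paper.

```python
def min_max_scale(vectors):
    """Scale every feature (column) to the range [0, 1].

    Assumption: min-max scaling stands in for whatever scaling the
    authors used. Constant columns are mapped to 0.0 to avoid dividing
    by zero.
    """
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in vectors:
        scaled.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical two-dimensional concept vectors, for illustration only.
concept_vectors = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled_vectors = min_max_scale(concept_vectors)
```

Putting all features on a common range is one commonly cited reason that gradient-based training can converge in fewer epochs, which would be consistent with the effect the abstract describes.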

Zaripova K, Cosmo L, Kazi A, Ahmadi SA, Bronstein MM, Navab N. Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications. Med Image Anal 2023;88:102839. [PMID: 37263109 DOI: 10.1016/j.media.2023.102839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 04/26/2023] [Accepted: 05/06/2023] [Indexed: 06/03/2023]

Li K, Marsic I, Sarcevic A, Yang S, Sullivan TM, Tempel PE, Milestone ZP, O'Connell KJ, Burd RS. Discovering interpretable medical process models: A case study in trauma resuscitation. J Biomed Inform 2023;140:104344. [PMID: 36940896 PMCID: PMC10111432 DOI: 10.1016/j.jbi.2023.104344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 01/20/2023] [Accepted: 03/13/2023] [Indexed: 03/23/2023]

Abstract

Understanding the actual work (i.e., "work-as-done") rather than theorized work (i.e., "work-as-imagined") during complex medical processes is critical for developing approaches that improve patient outcomes. Although process mining has been used to discover process models from medical activity logs, it often omits critical steps or produces cluttered and unreadable models. In this paper, we introduce a TraceAlignment-based ProcessDiscovery method called TAD Miner to build interpretable process models for complex medical processes. TAD Miner creates simple linear process models using a threshold metric that optimizes the consensus sequence to represent the backbone process, and then identifies both concurrent activities and uncommon-but-critical activities to represent the side branches. TAD Miner also identifies the locations of repeated activities, an essential feature for representing medical treatment steps. We conducted a study using activity logs of 308 pediatric trauma resuscitations to develop and evaluate TAD Miner. TAD Miner was used to discover process models for five resuscitation goals, including establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. We quantitatively evaluated the process models with several complexity and accuracy metrics, and performed qualitative evaluation with four medical experts to assess the accuracy and interpretability of the discovered models. Through these evaluations, we compared the performance of our method to that of two state-of-the-art process discovery algorithms: Inductive Miner and Split Miner. The process models discovered by TAD Miner had lower complexity and better interpretability than the state-of-the-art methods, and the fitness and precision of the models were comparable. We used the TAD process models to identify (1) the errors and (2) the best locations for the tentative steps in knowledge-driven expert models. The knowledge-driven models were revised based on the modifications suggested by the discovered models. The improved modeling using TAD Miner may enhance understanding of complex medical processes.
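The abstract above mentions a threshold applied to a consensus sequence over aligned traces, without giving the procedure. The sketch below is not TAD Miner. It only illustrates the general idea of a thresholded consensus, assuming the traces are already aligned to equal length with a gap symbol. The function name, the gap symbol, and the activity labels are all hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in pre-aligned traces


def consensus_sequence(aligned_traces, threshold):
    """Keep, per aligned column, the most common activity if it occurs
    in at least `threshold` (a fraction) of all traces.

    Illustrative assumption: a simple majority rule stands in for the
    threshold metric described in the abstract.
    """
    consensus = []
    for column in zip(*aligned_traces):
        counts = Counter(activity for activity in column if activity != GAP)
        if not counts:
            continue  # column contains only gaps
        activity, count = counts.most_common(1)[0]
        if count / len(column) >= threshold:
            consensus.append(activity)
    return consensus


# Hypothetical aligned activity traces, for illustration only.
traces = [
    ["prep", "insert", "-", "secure"],
    ["prep", "insert", "flush", "secure"],
    ["prep", "-", "-", "secure"],
    ["-", "insert", "-", "secure"],
]
backbone = consensus_sequence(traces, threshold=0.5)
```

In this toy case the rarely observed "flush" step falls below the threshold and is left out of the backbone. The abstract indicates that such uncommon-but-critical activities are handled separately as side branches.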

Kumari N, Acharjya DP. A hybrid rough set shuffled frog leaping knowledge inference system for diagnosis of lung cancer disease. Comput Biol Med 2023;155:106662. [PMID: 36805223 DOI: 10.1016/j.compbiomed.2023.106662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 01/13/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023]

Abstract

Abundant medical data are generated in the digital world every second. However, gathering helpful information from these data is difficult. Gathering useful information from the dataset is very advantageous and demanding. Besides, such data also contain many extraneous features that do not influence the prediction accuracy while diagnosing a disease. These extraneous features must be eliminated from the data to get a better diagnosis. Ultimately, the minimized information system will lead to a better diagnosis. In this paper, we have introduced an incremental rough set shuffled frog leaping algorithm for knowledge inference. The proposed algorithm helps find minimum features from an information system while handling complex databases with uncertainty and incompleteness. The proposed rough set shuffled frog leaping knowledge inference model works in two phases. In the initial phase, the incremental rough set shuffled frog leaping algorithm is used to get the most relevant features. Identifying the relevant features is carried out using a fitness function, which uses the rough degree of dependency. The use of the fitness function identifies the most information with the minimum number of features. The purpose of feature selection is to identify a feature subset from an original set of features without reducing the predictive accuracy and to scale back the computation overhead in the data processing. In the second phase, a rough set is utilized for knowledge discovery in perception with rule generation. The selection of decision rules is carried out based on the accuracy of the decision rule and a predefined threshold value. An empirical analysis of the lung disease information system and a comparative study is conducted. Experimental outcomes exhibit that hybrid techniques express the feasibility of the proposed model while achieving better classification accuracy.
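The fitness function in the abstract above relies on the rough degree of dependency. In standard rough set theory this is the fraction of objects in the positive region, that is, objects whose equivalence class under the chosen condition attributes maps to a single decision value. The sketch below shows that standard definition only. The function name, attribute names, and toy records are illustrative assumptions, and the shuffled frog leaping search itself is not shown.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Rough set dependency degree: |positive region| / |universe|.

    An object is in the positive region when every object that shares
    its values on `condition_attrs` also shares its decision value.
    """
    decisions_by_class = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in condition_attrs)
        decisions_by_class[key].append(row[decision_attr])
    positive = sum(
        len(decisions)
        for decisions in decisions_by_class.values()
        if len(set(decisions)) == 1
    )
    return positive / len(rows)


# Hypothetical toy information system, for illustration only.
records = [
    {"smoker": "yes", "cough": "yes", "diagnosis": "positive"},
    {"smoker": "yes", "cough": "no", "diagnosis": "negative"},
    {"smoker": "no", "cough": "yes", "diagnosis": "negative"},
    {"smoker": "no", "cough": "no", "diagnosis": "negative"},
]
gamma_full = dependency_degree(records, ["smoker", "cough"], "diagnosis")
gamma_smoker = dependency_degree(records, ["smoker"], "diagnosis")
```

A feature subset whose dependency degree matches that of the full attribute set preserves the ability to determine the decision, which is why this quantity is a natural fitness score for feature selection.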

Shu X, Ye Y. Knowledge Discovery: Methods from data mining and machine learning. Soc Sci Res 2023;110:102817. [PMID: 36796993 DOI: 10.1016/j.ssresearch.2022.102817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 06/18/2023]

Zha Y, Chong H, Yang P, Ning K. Microbial Dark Matter: from Discovery to Applications. Genomics Proteomics Bioinformatics 2022;20:867-881. [PMID: 35477055 PMCID: PMC10025686 DOI: 10.1016/j.gpb.2022.02.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/06/2021] [Revised: 09/28/2021] [Accepted: 03/22/2022] [Indexed: 01/12/2023]

Pačínková A, Popovici V. Using empirical biological knowledge to infer regulatory networks from multi-omics data. BMC Bioinformatics 2022;23:351. [PMID: 35996085 PMCID: PMC9396869 DOI: 10.1186/s12859-022-04891-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2022] [Accepted: 08/08/2022] [Indexed: 12/13/2022] Open

Abstract

Background

Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, the crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge.

Results

We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples.

Conclusions

We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04891-9.

Mohammadiun S, Hu G, Gharahbagh AA, Li J, Hewage K, Sadiq R. Evaluation of machine learning techniques to select marine oil spill response methods under small-sized dataset conditions. J Hazard Mater 2022;436:129282. [PMID: 35739791 DOI: 10.1016/j.jhazmat.2022.129282] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/12/2022] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]

Schutte D, Vasilakes J, Bompelli A, Zhou Y, Fiszman M, Xu H, Kilicoglu H, Bishop JR, Adam T, Zhang R. Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature. J Biomed Inform 2022;131:104120. [PMID: 35709900 PMCID: PMC9335448 DOI: 10.1016/j.jbi.2022.104120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Revised: 04/26/2022] [Accepted: 06/08/2022] [Indexed: 12/04/2022]

Abstract

Objective:

Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology.

Materials and Methods:

We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility.

Results:

SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible.

Discussion:

With added DS terminology to the UMLS, SemRepDS has the capability to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs.

Conclusion:

For the domain with limited coverage in the traditional terminology (e.g., UMLS), we demonstrated an approach to leverage domain terminology and improve existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Even though this study focuses on DSI, the method may be adapted to other domains.
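The abstract above describes discovery patterns such as DS-Gene-Drug applied to a directed knowledge graph. The sketch below shows, under simplifying assumptions, how a typed two-hop pattern of that shape could be enumerated over an edge list. It is not the SuppKG implementation. The function name and all node names are hypothetical placeholders, and edge predicates are ignored.

```python
def two_hop_paths(edges, node_types, start_type, mid_type, end_type):
    """Enumerate directed paths start -> mid -> end whose node types
    match the given pattern (for example DS -> Gene -> Drug)."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    paths = []
    for source, middles in successors.items():
        if node_types.get(source) != start_type:
            continue
        for middle in middles:
            if node_types.get(middle) != mid_type:
                continue
            for target in successors.get(middle, []):
                if node_types.get(target) == end_type:
                    paths.append((source, middle, target))
    return paths


# Hypothetical placeholder graph, for illustration only.
node_types = {
    "supplement_A": "DS",
    "gene_X": "Gene",
    "drug_Y": "Drug",
    "drug_Z": "Drug",
}
edges = [
    ("supplement_A", "gene_X"),
    ("gene_X", "drug_Y"),
    ("supplement_A", "drug_Z"),
]
candidates = two_hop_paths(edges, node_types, "DS", "Gene", "Drug")
```

Each returned triple is only a candidate interaction. In the study described above, such candidates were compared with an existing database and reviewed by medical professionals.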

Sassi S, Ivanovic M, Chbeir R, Prasath R, Manolopoulos Y. Collective intelligence and knowledge exploration: an introduction. Int J Data Sci Anal 2022;14:99-111. [PMID: 35730041 PMCID: PMC9205147 DOI: 10.1007/s41060-022-00338-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]

Sebro R, Kahn CE. Causal Associations Among Diseases and Imaging Findings in Radiology Reports. Stud Health Technol Inform 2022;294:411-412. [PMID: 35612109 DOI: 10.3233/shti220487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Raja K. Biomedical Literature Mining and Its Components. Methods Mol Biol 2022;2496:1-16. [PMID: 35713856 DOI: 10.1007/978-1-0716-2305-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Alzubaidi A, Tepper J. Deep Mining from Omics Data. Methods Mol Biol 2022;2449:349-386. [PMID: 35507271 DOI: 10.1007/978-1-0716-2095-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Abstract

Since the advent of high-throughput omics technologies, various molecular data such as genes, transcripts, proteins, and metabolites have been made widely available to researchers. This has afforded clinicians, bioinformaticians, statisticians, and data scientists the opportunity to apply their innovations in feature mining and predictive modeling to a rich data resource to develop a wide range of generalizable prediction models. What has become apparent over the last 10 years is that researchers have adopted deep neural networks (or "deep nets") as their preferred paradigm for complex data modeling due to the superiority of performance over more traditional statistical machine learning approaches, such as support vector machines. A key stumbling block, however, is that deep nets inherently lack transparency and are considered to be a "black box" approach. This naturally makes it very difficult for clinicians and other stakeholders to trust their deep learning models even though the model predictions appear to be highly accurate. In this chapter, we therefore provide a detailed summary of the deep net architectures typically used in omics research, together with a comprehensive summary of the notable "deep feature mining" techniques researchers have applied to open up this black box and provide some insights into the salient input features and why these models behave as they do. We group these techniques into the following three categories: (a) hidden layer visualization and interpretation; (b) input feature importance and impact evaluation; and (c) output layer gradient analysis. While we find that omics researchers have made some considerable gains in opening up the black box through interpretation of the hidden layer weights and node activations to identify salient input features, we highlight other approaches for omics researchers, such as employing deconvolutional network-based approaches and development of bespoke attribute impact measures to enable researchers to better understand the relationships between the input data and hidden layer representations formed and thus the output behavior of their deep nets.

Fana SE, Esmaeili F, Esmaeili S, Bandaryan F, Esfahani EN, Amoli MM, Razi F. Knowledge discovery in genetics of diabetes in Iran, a roadmap for future researches. J Diabetes Metab Disord 2021;20:1785-1791. [PMID: 34900825 DOI: 10.1007/s40200-021-00838-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 06/18/2021] [Indexed: 12/12/2022]

Abstract

Purpose

The pathogenesis of diabetes is considered polygenic as a result of complex interactions between genetic/epigenetic and environmental factors. This review aimed to evaluate the scientometrics and knowledge gaps of diabetes genetics research conducted in Iran, as a case of developing countries, and to draw up a roadmap for future studies.

Methods

We searched Scopus and PubMed databases from January 2015 until December 2019 using the keywords: (diabetes OR diabetic) AND (Iran). All publications were reviewed by two experts and after choosing relevant articles, they were categorized based on the subject, level of evidence, study design, publication year, and type of genetic studies.

Results

Of 10,540 records, 428 articles met the inclusion criteria. Generally, the number of studies on diabetes genetics has risen since 2015. Case-control/cross-sectional and animal studies were the most common study designs and, by subject, the most frequent studies concerned genetic factors involved in diabetes development (38%). Briefly, the top seven genes evaluated for T2DM were TCF7L2, APOAII, FTO, PON1, ADIPOQ, MTHFR, and PPARG, respectively, and CTL4 for T1DM. miR-21, miR-155, and miR-375, respectively, were the most frequently evaluated microRNAs. Furthermore, there were six studies about lncRNAs.

Discussion and Conclusion

Research on the genetics of diabetes has progressed, although there are limitations such as non-homogeneous data from Iran, heterogeneity of ethnicity, and the rationale of studies. Compared to the previous analysis in Iran, GWAS and large-scale studies are still required to achieve better policies for the management and control of diabetes.

Supplementary Information

The online version contains supplementary material available at 10.1007/s40200-021-00838-8.

Trautman A, Linchangco R, Walstead R, Jay JJ, Brouwer C. The Aliment to Bodily Condition knowledgebase (ABCkb): a database connecting plants and human health. BMC Res Notes 2021;14:433. [PMID: 34838100 PMCID: PMC8627056 DOI: 10.1186/s13104-021-05835-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2021] [Accepted: 11/03/2021] [Indexed: 11/10/2022] Open

Abstract

Objective

Overconsumption of processed foods has led to an increase in chronic diet-related diseases such as obesity and type 2 diabetes. Although diets high in fresh fruits and vegetables are linked with healthier outcomes, the specific mechanisms for these relationships are poorly understood. Experiments examining plant phytochemical production and breeding programs, or separately on the health effects of nutritional supplements have yielded results that are sparse, siloed, and difficult to integrate between the domains of human health and agriculture. To connect plant products to health outcomes through their molecular mechanism, an integrated computational resource is necessary.

Results

We created the Aliment to Bodily Condition Knowledgebase (ABCkb) to connect plants to human health by creating a stepwise path from plant → plant product → human gene → pathways → indication. ABCkb integrates 11 curated sources as well as relationships mined from Medline abstracts by loading into a graph database which is deployed via a Docker container. This new resource, provided in a queryable container with a user-friendly interface, connects plant products with human health outcomes for generating nutritive hypotheses. All scripts used are available on GitHub (https://github.com/atrautm1/ABCkb) along with basic directions for building the knowledgebase and a browsable interface is available (https://abckb.charlotte.edu).

Supplementary Information

The online version contains supplementary material available at 10.1186/s13104-021-05835-x.

Liu J, Stewart H, Wiens C, Mcnitt-Gray J, Liu B. Development of an integrated biomechanics informatics system with knowledge discovery and decision support tools for research of injury prevention and performance enhancement. Comput Biol Med 2021;141:105062. [PMID: 34836623 DOI: 10.1016/j.compbiomed.2021.105062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2021] [Revised: 11/11/2021] [Accepted: 11/20/2021] [Indexed: 11/03/2022]

Rafii F, Nasrabadi AN, Tehrani FJ. How Nurses Apply Patterns of Knowing in Clinical Practice: A Grounded Theory Study. Ethiop J Health Sci 2021;31:139-146. [PMID: 34158761 PMCID: PMC8188100 DOI: 10.4314/ejhs.v31i1.16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022] Open

Steiner B, Saalfeld B, Elgert L, Haux R, Wolf KH. OnTARi: an ontology for factors influencing therapy adherence to rehabilitation. BMC Med Inform Decis Mak 2021;21:153. [PMID: 33975585 PMCID: PMC8111729 DOI: 10.1186/s12911-021-01512-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 04/28/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Adherence and motivation are key factors for successful treatment of patients with chronic diseases, especially in long-term care processes like rehabilitation. However, only a few patients achieve good treatment adherence. The causes are manifold. Adherence-influencing factors vary depending on indications, therapies, and individuals. Positive and negative effects are rarely confirmed or even contradictory. An ontology seems to be convenient to represent existing knowledge in this domain and to make it available for information retrieval.

METHODS

First, a manual data extraction of current knowledge in the domain of treatment adherence in rehabilitation was conducted. Data was retrieved from various sources, including basic literature, scientific publications, and health behavior models. Second, all adherence and motivation factors identified were formalized according to the ontology development methodology METHONTOLOGY. This comprises the specification, conceptualization, formalization, and implementation of the ontology "Ontology for factors influencing therapy adherence to rehabilitation" (OnTARi) in Protégé. A taxonomy-oriented evaluation was conducted by two domain experts.

RESULTS

OnTARi includes 281 classes implemented in ontology web language, ten object properties, 22 data properties, 1440 logical axioms, 244 individuals, and 1023 annotations. Six higher-level classes are differentiated: (1) Adherence, (2) AdherenceFactors, (3) AdherenceFactorCategory, (4) Rehabilitation, (5) RehabilitationForm, and (6) RehabilitationType. The class AdherenceFactors represents 227 adherence factors, of which 49 are hard factors. Each factor involves a proper description, synonyms, possibly existing acronyms, and a German translation. OnTARi illustrates links between adherence factors through 160 influences-relations. Description logic queries implemented in Protégé allow multiple targeted requests, e.g., for the extraction of adherence factors in a specific rehabilitation area.

CONCLUSIONS

With OnTARi, a generic reference model was built to represent potential adherence and motivation factors and their interrelations in rehabilitation of patients with chronic diseases. In terms of information retrieval, this formalization can serve as a basis for implementation and adaptation of conventional rehabilitative measures, taking into account (patient-specific) adherence factors. OnTARi also enables the development of medical assistance systems to increase motivation and adherence in rehabilitation processes.

Eugenie R, Stattner E. DISGROU: an algorithm for discontinuous subgroup discovery. PeerJ Comput Sci 2021;7:e512. [PMID: 33987462 PMCID: PMC8093955 DOI: 10.7717/peerj-cs.512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Accepted: 04/07/2021] [Indexed: 06/12/2023]

Jacimovic J, Jakovljevic A, Nagendrababu V, Duncan HF, Dummer PMH. A bibliometric analysis of the dental scientific literature on COVID-19. Clin Oral Investig 2021;25:6171-6183. [PMID: 33822288 PMCID: PMC8022306 DOI: 10.1007/s00784-021-03916-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/18/2020] [Accepted: 03/25/2021] [Indexed: 02/06/2023]

Abstract

Objectives

The rapid production of a large volume of literature during the early phase of the COVID-19 outbreak created a substantial burden for clinicians and scientists. Therefore, this manuscript aims to identify and describe the scientific literature addressing COVID-19 from a dental research perspective, in terms of the manuscript origin, research domain, study type, and level of evidence (LoE).

Materials and methods

Data were retrieved from Web of Science, Scopus, and PubMed. A descriptive analysis of bibliographic data, collaboration network, and keyword co-occurrence analysis were performed. Articles were further classified according to the field of interest, main research question, type of study, and LoE.

Results

The present study identified 296 dental scientific COVID-19 original papers, published in 89 journals, and co-authored by 1331 individuals affiliated with 429 institutions from 53 countries. Although 81.4% were single-country papers, extensive collaboration among the institutions of single countries (Italian, British, and Brazilian institutions) was observed. The main research areas were as follows: the potential use of saliva and other oral fluids as promising samples for COVID-19 testing, dental education, and guidelines for the prevention of COVID-19 transmission in dental practice. The majority of articles were narrative reviews, cross-sectional studies, and short communications. The overall LoE in the analyzed dental literature was low, with only two systematic reviews with the highest LoE I.

Conclusion

The dental literature on the COVID-19 pandemic does not provide data relevant to the evidence-based decision-making process. Future studies with a high LoE are essential to gain precise knowledge on COVID-19 infection within the various fields of Dentistry.

Clinical relevance

The published dental literature on COVID-19 consists principally of articles with a low level of scientific evidence which do not provide sufficient reliable high-quality evidence that is essential for decision making in clinical dental practice.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00784-021-03916-6.

Hegazi MO, Al-Dossari Y, Al-Yahy A, Al-Sumari A, Hilal A. Preprocessing Arabic text on social media. Heliyon 2021;7:e06191. [PMID: 33644469 PMCID: PMC7895730 DOI: 10.1016/j.heliyon.2021.e06191] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/07/2019] [Revised: 05/19/2020] [Accepted: 02/01/2021] [Indexed: 11/04/2022] Open

Li X, Peng S, Du J. Towards medical knowmetrics: representing and computing medical knowledge using semantic predications as the knowledge unit and the uncertainty as the knowledge context. Scientometrics 2021;:1-27. [PMID: 33612884 DOI: 10.1007/s11192-021-03880-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/05/2020] [Accepted: 01/19/2021] [Indexed: 11/05/2022]

Abstract

In China, Prof. Hongzhou Zhao and Zeyuan Liu are the pioneers of the concepts “knowledge unit” and “knowmetrics” for measuring knowledge. However, the definition of a “computable knowledge object” remains controversial across fields. For example, it is defined as (1) a quantitative scientific concept in natural science and engineering, (2) a knowledge point in education research, and (3) a semantic predication, i.e., a Subject-Predicate-Object (SPO) triple, in biomedical fields. The Semantic MEDLINE Database (SemMedDB), a high-quality public repository of SPO triples extracted from medical literature, provides a basic data infrastructure for measuring medical knowledge. In general, the study of extracting SPO triples as computable knowledge units from unstructured scientific text has focused overwhelmingly on scientific knowledge per se. Because SPO triples may be extracted from hypothetical or speculative statements, or even from conflicting and contradictory assertions, the knowledge status (i.e., the uncertainty), which is an integral and critical part of scientific knowledge, has been largely overlooked. This article puts forward a framework for Medical Knowmetrics that uses SPO triples as the knowledge unit and uncertainty as the knowledge context. A lung cancer publications dataset is used to validate the proposed framework. The uncertainty of medical knowledge, and how its status evolves over time, indirectly reflects the strength of competing knowledge claims and the probability of certainty for a given SPO triple. We discuss the new insights offered by uncertainty-centric approaches for detecting research fronts and identifying knowledge claims with a high certainty level, in order to improve the efficacy of knowledge-driven decision support.

Moro G, Masseroli M. Gene function finding through cross-organism ensemble learning. BioData Min 2021;14:14. [PMID: 33579334 PMCID: PMC7879670 DOI: 10.1186/s13040-021-00239-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2020] [Accepted: 01/10/2021] [Indexed: 11/12/2022] Open

Abstract

Background

Structured biological information about genes and proteins is a valuable resource to improve discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and sometimes are incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from GO annotations of another source organism evolutionarily related and better studied.

Results

Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists in randomly deleting a fraction of known annotations in order to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm with the reduced annotation set to predict, namely to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for the organism. Moreover, the prediction model is also able to discover new unknown annotations in different target organisms without retraining. We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum. The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision. A Web application to browse both input annotations used and predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/.
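The perturbation step described in this abstract can be illustrated with a minimal sketch. This is not the GeFF implementation: the function names, the toy gene and term labels, and the 25% deletion fraction are all assumptions made for illustration. It only shows how a reduced annotation set could be paired with the original set to form supervised training examples.

```python
import random


def perturb_annotations(annotations, fraction, seed=0):
    """Randomly delete a fraction of (gene, term) annotations.

    Returns the reduced annotation set and the set of deleted pairs.
    """
    rng = random.Random(seed)
    # Enumerate every known annotation in a deterministic order.
    pairs = [(gene, term)
             for gene in sorted(annotations)
             for term in sorted(annotations[gene])]
    n_remove = int(len(pairs) * fraction)
    removed = set(rng.sample(pairs, n_remove))
    reduced = {gene: {term for term in terms if (gene, term) not in removed}
               for gene, terms in annotations.items()}
    return reduced, removed


def training_pairs(original, reduced, all_terms):
    """Build (input, target) binary vectors: reduced annotations -> original ones."""
    rows = []
    for gene in sorted(original):
        x = [int(term in reduced[gene]) for term in all_terms]
        y = [int(term in original[gene]) for term in all_terms]
        rows.append((x, y))
    return rows


# Hypothetical toy data; gene and term labels are placeholders, not real GO terms.
original = {
    "gene1": {"termA", "termB", "termC"},
    "gene2": {"termA", "termD"},
    "gene3": {"termB", "termC", "termD"},
}
all_terms = ["termA", "termB", "termC", "termD"]

reduced, removed = perturb_annotations(original, fraction=0.25, seed=42)
rows = training_pairs(original, reduced, all_terms)
```

Each row pairs a perturbed input vector with the full target vector, so any supervised classifier trained on such rows learns to restore deleted annotations. Which classifier and ensemble scheme to use is left open here.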

Conclusions

Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are valuable both for speeding up annotation curation, by focusing it on the prioritized new annotations predicted, and for complementing the known annotations available.

Yu J, Liu G. Extracting and inserting knowledge into stacked denoising auto-encoders. Neural Netw 2021;137:31-42. [PMID: 33545610 DOI: 10.1016/j.neunet.2021.01.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/29/2019] [Revised: 11/28/2020] [Accepted: 01/14/2021] [Indexed: 10/22/2022]

Shahmoradi L, Ramezani A, Atlasi R, Namazi N, Larijani B. Visualization of knowledge flow in interpersonal scientific collaboration network endocrinology and metabolism research institute. J Diabetes Metab Disord 2020;20:815-823. [PMID: 34222091 DOI: 10.1007/s40200-020-00644-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2020] [Accepted: 09/21/2020] [Indexed: 10/23/2022]

Piad-Morffis A, Gutiérrez Y, Almeida-Cruz Y, Muñoz R. A computational ecosystem to support eHealth Knowledge Discovery technologies in Spanish. J Biomed Inform 2020;109:103517. [PMID: 32712157 PMCID: PMC7377985 DOI: 10.1016/j.jbi.2020.103517] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Revised: 05/18/2020] [Accepted: 07/19/2020] [Indexed: 11/29/2022]

Matsuo R, Yamazaki T, Suzuki M, Toyama H, Araki K. A random forest algorithm-based approach to capture latent decision variables and their cutoff values. J Biomed Inform 2020;110:103548. [PMID: 32866626 DOI: 10.1016/j.jbi.2020.103548] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/27/2020] [Revised: 08/21/2020] [Accepted: 08/25/2020] [Indexed: 11/18/2022]

Abstract

Although reference intervals (RIs) and clinical decision limits (CDLs) are vital laboratory information for supporting the interpretation of numerical clinical pathology results, there is evidence that RIs and CDLs vary in certain contexts, as well as other evidence that they are flawed. We propose a random forest algorithm-based exploration methodology that uses phenotype transformation of independent variables in relation to dependent variables to capture latent decision variables and their cutoff values. We use the term latent CDLs for CDLs that lie within the RIs estimated by an indirect method and that affect diagnostics or outcomes in the context of specific patients' conditions. We then apply the proposed methodology to clinical laboratory data on bodily fluids, such as blood and urine, collected at patient admission, to explore latent CDLs of hospital length of stay (HLOS) for each patient condition, identified by the diseases of patients who underwent surgery. The exploration found that, for tachyarrhythmia, free thyroxine (T4) above five unique cutoff values (1.16 ng/dL, 1.19 ng/dL, 1.2 ng/dL, 1.23 ng/dL and 1.25 ng/dL) predicted longer HLOS, though these cutoff values fall within the estimated RIs as well as the hospital-determined RIs. In addition to the evidence that higher free T4 levels within the RIs are associated with the corresponding disease, on the whole the cutoff values other than 1.16 ng/dL tended to be associated with long HLOS, with significant differences. Clinical experts could discuss whether it is meaningful to raise alerts about the risk of patients' conditions and long HLOS at admission. If they consider such alerts meaningful in clinical practice, the alerts could be embedded in electronic medical records to handle those risks at admission.

Menychtas A, Tsanakas P, Maglogiannis I. Knowledge Discovery on IoT-Enabled mHealth Applications. Adv Exp Med Biol 2020;1194:181-91. [PMID: 32468534 DOI: 10.1007/978-3-030-32622-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]

Reid RW, Ferrier JW, Jay JJ. Automated gene data integration with Databio. BMC Res Notes 2020;13:195. [PMID: 32238171 PMCID: PMC7110638 DOI: 10.1186/s13104-020-05038-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2020] [Accepted: 03/20/2020] [Indexed: 02/04/2023] Open

Heo GE, Xie Q, Song M, Lee JH. Combining entity co-occurrence with specialized word embeddings to measure entity relation in Alzheimer's disease. BMC Med Inform Decis Mak 2019;19:240. [PMID: 31801521 PMCID: PMC6894106 DOI: 10.1186/s12911-019-0934-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022] Open

Cao XH, Han C, Glass LM, Kindman A, Obradovic Z. Time-to-event estimation by re-defining time. J Biomed Inform 2019;100:103326. [PMID: 31678589 DOI: 10.1016/j.jbi.2019.103326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2019] [Revised: 09/05/2019] [Accepted: 10/28/2019] [Indexed: 11/26/2022]

Baron JM, Kurant DE, Dighe AS. Machine Learning and Other Emerging Decision Support Tools. Clin Lab Med 2019;39:319-331. [PMID: 31036284 DOI: 10.1016/j.cll.2019.01.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]

Rodriguez JC, Merino GA, Llera AS, Fernández EA. Massive integrative gene set analysis enables functional characterization of breast cancer subtypes. J Biomed Inform 2019;93:103157. [PMID: 30928514 DOI: 10.1016/j.jbi.2019.103157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/22/2018] [Revised: 03/11/2019] [Accepted: 03/22/2019] [Indexed: 01/31/2023]

Razzak MI, Imran M, Xu G. Big data analytics for preventive medicine. Neural Comput Appl 2019;:1-35. [PMID: 32205918 DOI: 10.1007/s00521-019-04095-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 10/01/2018] [Accepted: 02/12/2019] [Indexed: 11/07/2022]

Chen YA, Tripathi LP, Mizuguchi K. Data Warehousing with TargetMine for Omics Data Analysis. Methods Mol Biol 2019;1986:35-64. [PMID: 31115884 DOI: 10.1007/978-1-4939-9442-7_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/25/2023]

Arji G, Safdari R, Rezaeizadeh H, Abbassian A, Mokhtaran M, Hossein Ayati M. A systematic literature review and classification of knowledge discovery in traditional medicine. Comput Methods Programs Biomed 2019;168:39-57. [PMID: 30392889 DOI: 10.1016/j.cmpb.2018.10.017] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/31/2018] [Revised: 10/14/2018] [Accepted: 10/26/2018] [Indexed: 06/08/2023]

Benhar H, Idri A, Fernández-Alemán JL. A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery. J Med Syst 2018;43:17. [PMID: 30542772 DOI: 10.1007/s10916-018-1134-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/26/2018] [Accepted: 12/03/2018] [Indexed: 01/25/2023]

Abstract

The increasing amount of data produced by various biomedical and healthcare systems has led to a need for methodologies related to knowledge data discovery. Data mining (DM) offers a set of powerful techniques that allow the identification and extraction of relevant information from medical datasets, thus enabling doctors and patients to greatly benefit from DM, particularly in the case of diseases with high mortality and morbidity rates, such as heart disease (HD). Nonetheless, the use of raw medical data implies several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has, therefore, recently begun on preparing raw healthcare data before knowledge extraction. In any knowledge data discovery (KDD) process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality, so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) on data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS consisted of a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies: solution proposal and evaluation research, and the most frequently used empirical type was that of historical-based evaluation.

Ostaszewski M, Kieffer E, Danoy G, Schneider R, Bouvry P. Clustering approaches for visual knowledge exploration in molecular interaction networks. BMC Bioinformatics 2018;19:308. [PMID: 30157777 PMCID: PMC6116538 DOI: 10.1186/s12859-018-2314-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/20/2017] [Accepted: 08/14/2018] [Indexed: 12/02/2022] Open

Abstract

BACKGROUND

Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually.

RESULTS

We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance using the underlying network and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach together with an evolutionary algorithm for informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge. We calculate the number of Gene and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Also, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing a bi-level optimization approach, we evaluated the relative importance of the distance functions and found that the order in which they are combined does affect clustering performance. Next, with the enrichment analysis of clustering results, we found that both hierarchical and bi-level clustering schemes discovered more Gene and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
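As a rough illustration of clustering on a pairwise combination of distance metrics, the sketch below normalizes two distance matrices, mixes them with a fixed weight, and runs a naive single-linkage agglomerative clustering. This is an assumed simplification, not the bi-level optimization or evolutionary algorithm of the paper. The weight, the linkage criterion, and the toy matrices are all hypothetical.

```python
def normalize(dist):
    """Scale a distance matrix so its largest entry is 1."""
    peak = max(max(row) for row in dist)
    return [[d / peak if peak else 0.0 for d in row] for row in dist]


def combine(dist_a, dist_b, weight):
    """Convex combination of two normalized distance matrices."""
    a, b = normalize(dist_a), normalize(dist_b)
    n = len(a)
    return [[weight * a[i][j] + (1 - weight) * b[i][j] for j in range(n)]
            for i in range(n)]


def single_linkage(dist, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] |= clusters[y]
        del clusters[y]
    return clusters


# Hypothetical distances between four diagram elements.
euclidean = [[0, 1, 8, 9], [1, 0, 7, 8], [8, 7, 0, 1], [9, 8, 1, 0]]
network = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]

combined = combine(euclidean, network, weight=0.5)
clusters = single_linkage(combined, k=2)
```

With these toy matrices, elements 0 and 1 end up in one cluster and elements 2 and 3 in the other. Varying `weight` or `k` gives a simple way to see how the choice of combination and clustering depth changes the grouping.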

CONCLUSIONS

In this work we examined the impact of different distance functions on clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering and may improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both the combination and the order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.

Xin J, Afrasiabi C, Lelong S, Adesara J, Tsueng G, Su AI, Wu C. Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration. BMC Bioinformatics 2018;19:30. [PMID: 29390967 PMCID: PMC5796402 DOI: 10.1186/s12859-018-2041-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/21/2017] [Accepted: 01/24/2018] [Indexed: 01/25/2023] Open

Silva JCF, Carvalho TFM, Basso MF, Deguchi M, Pereira WA, Sobrinho RR, Vidigal PMP, Brustolini OJB, Silva FF, Dal-Bianco M, Fontes RLF, Santos AA, Zerbini FM, Cerqueira FR, Fontes EPB. Geminivirus data warehouse: a database enriched with machine learning approaches. BMC Bioinformatics 2017;18:240. [PMID: 28476106 PMCID: PMC5420152 DOI: 10.1186/s12859-017-1646-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 12/23/2016] [Accepted: 04/25/2017] [Indexed: 03/28/2023] Open

Affiliation(s)

Jose Cleydson F Silva: Departamento de Informática, Universidade Federal de Viçosa, Viçosa, Brazil; National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Thales F M Carvalho: Departamento de Informática, Universidade Federal de Viçosa, Viçosa, Brazil
Marcos F Basso: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Michihito Deguchi: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Welison A Pereira: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Roberto R Sobrinho: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Pedro M P Vidigal: Núcleo de Biomoléculas, Universidade Federal de Viçosa, Viçosa, MG, Brazil
Otávio J B Brustolini: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Fabyano F Silva: Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Brazil
Maximiller Dal-Bianco: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil
Renildes L F Fontes: Departamento de Solos, Universidade Federal de Viçosa, Viçosa, Brazil
Anésia A Santos: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil; Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, Brazil
Francisco Murilo Zerbini: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil; Departamento de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG, Brazil
Fabio R Cerqueira: Departamento de Informática, Universidade Federal de Viçosa, Viçosa, Brazil; Departamento de Engenharia de Produção, Universidade Federal Fluminense, Petrópolis, Rio de Janeiro, Brazil
Elizabeth P B Fontes: National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, Viçosa, Brazil; Departamento de Bioquímica e Biologia Molecular, Universidade Federal de Viçosa, Viçosa, Brazil

Liou YF, Huang HL, Ho SY. A hydrophobic spine stabilizes a surface-exposed α-helix according to analysis of the solvent-accessible surface area. BMC Bioinformatics 2016;17:503. [PMID: 28155647 PMCID: PMC5259910 DOI: 10.1186/s12859-016-1368-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023] Open

Abstract

Background

Most hydrophilic and hydrophobic residues are thought to be exposed and buried in proteins, respectively. In contrast to the majority of existing studies on protein folding characteristics, which use protein structures, our aim in this study was to design predictors for estimating the relative solvent accessibility (RSA) of amino acid residues in order to discover protein folding characteristics from sequences.

Methods

The proposed 20 real-value RSA predictors were designed on the basis of the support vector regression method with a set of informative physicochemical properties (PCPs) obtained by means of an optimal feature selection algorithm. Then, molecular dynamics simulations were performed for validating the knowledge discovered by analysis of the selected PCPs.

Results

The RSA predictors had a mean absolute error of 14.11% and a correlation coefficient of 0.69, better than the existing predictors. The hydrophilic-residue predictors preferred PCPs of buried amino acid residues to PCPs of exposed ones as prediction features. A hydrophobic spine composed of exposed hydrophobic residues of an α-helix was discovered by analyzing the PCPs of RSA predictors corresponding to hydrophobic residues. For example, the results of a molecular dynamics simulation of wild-type sequences and their mutants showed that proteins 1MOF and 2WRP_H16I (Protein Data Bank IDs), which have a perfectly hydrophobic spine, have more stable structures than 1MOF_I54D and 2WRP, which do not.
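The two reported quality measures can be made concrete with a short sketch. It assumes that the correlation coefficient is Pearson's r, which the abstract does not state, and it uses invented RSA values (in percent) rather than any data from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson's r between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical RSA values in percent for three residues.
actual_rsa = [10.0, 40.0, 70.0]
predicted_rsa = [20.0, 40.0, 60.0]

mae = mean_absolute_error(actual_rsa, predicted_rsa)
r = pearson_correlation(actual_rsa, predicted_rsa)
```

The toy predictions are a linear function of the actual values, so they correlate perfectly while still having a nonzero error. This shows why the two measures are reported together: one captures the size of the errors, the other how well the predictions track the trend.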

Conclusions

We identified informative PCPs to design high-performance RSA predictors and to analyze these PCPs for identification of novel protein folding characteristics. A hydrophobic spine in a protein can help to stabilize exposed α-helices.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1368-z) contains supplementary material, which is available to authorized users.

Bui EN. Data-driven Critical Zone science: A new paradigm. Sci Total Environ 2016;568:587-593. [PMID: 26883371 DOI: 10.1016/j.scitotenv.2016.01.202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/24/2015] [Revised: 01/28/2016] [Accepted: 01/29/2016] [Indexed: 06/05/2023]

Rodriguez JC, González GA, Fresno C, Llera AS, Fernández EA. Improving information retrieval in functional analysis. Comput Biol Med 2016;79:10-20. [PMID: 27723507 DOI: 10.1016/j.compbiomed.2016.09.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/20/2016] [Revised: 09/21/2016] [Accepted: 09/22/2016] [Indexed: 12/20/2022]

Yimam SM, Biemann C, Majnaric L, Šabanović Š, Holzinger A. An adaptive annotation approach for biomedical entity and relation recognition. Brain Inform 2016;3:157-168. [PMID: 27747591 PMCID: PMC4999566 DOI: 10.1007/s40708-016-0036-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/25/2015] [Accepted: 01/25/2016] [Indexed: 12/14/2022] Open

Zaslavsky L, Ciufo S, Fedorov B, Tatusova T. Clustering analysis of proteins from microbial genomes at multiple levels of resolution. BMC Bioinformatics 2016;17 Suppl 8:276. [PMID: 27586436 PMCID: PMC5009818 DOI: 10.1186/s12859-016-1112-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/27/2022] Open

Abstract

Background

Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy.

Results

Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that limit the protein set included in global clustering.

The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters.
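The three levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the position-matching `similarity` function, the single-linkage grouping, the `tight` and `loose` thresholds, and all function names are assumptions made for illustration, and genome context and the filtering of conservative clusters are left out.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Toy similarity (assumption): fraction of positions with the same residue."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def single_linkage(items, threshold):
    """Group items connected by similarity >= threshold (union-find)."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if similarity(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def clustroid(cluster):
    """Member with the highest total similarity to its own cluster."""
    return max(cluster, key=lambda p: sum(similarity(p, q) for q in cluster))


def three_level_clusters(clades, tight=0.8, loose=0.6):
    """clades maps a clade name to a list of protein sequences."""
    # Level 1: tight clusters inside each clade, one clustroid per cluster.
    clustroids = []
    for proteins in clades.values():
        for cluster in single_linkage(proteins, tight):
            clustroids.append(clustroid(cluster))

    # Level 2: clustroids are grouped into seed global clusters.
    extended = [list(seed) for seed in single_linkage(clustroids, loose)]

    # Level 3: seed clusters are extended with the remaining related proteins.
    for proteins in clades.values():
        for p in proteins:
            if any(p in cluster for cluster in extended):
                continue
            best = max(extended, key=lambda c: max(similarity(p, q) for q in c))
            if max(similarity(p, q) for q in best) >= loose:
                best.append(p)
    return extended


if __name__ == "__main__":
    toy_clades = {
        "cladeA": ["AAAAAAAAAA", "AAAAAAAAAC", "GGGGGGGGGG"],
        "cladeB": ["AAAAAAAACC", "GGGGGGGGGT", "TTTTTCCCCC"],
    }
    for cluster in three_level_clusters(toy_clades):
        print(cluster)
```

Only the clustroids take part in the global step, which is where the compression mentioned in the abstract would come from in this sketch: the all-against-all comparison runs over one representative per in-clade cluster rather than over every protein.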

Conclusion

The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data to be obtained at different levels of detail and data redundancy to be eliminated while keeping biologically interesting variations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1112-8) contains supplementary material, which is available to authorized users.


Papanikolaou N, Pavlopoulos GA, Theodosiou T, Vizirianakis IS, Iliopoulos I. DrugQuest - a text mining workflow for drug association discovery. BMC Bioinformatics 2016;17 Suppl 5:182. [PMID: 27295093 PMCID: PMC4905607 DOI: 10.1186/s12859-016-1041-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]

Domeniconi G, Masseroli M, Moro G, Pinoli P. Cross-organism learning method to discover new gene functionalities. Comput Methods Programs Biomed 2016;126:20-34. [PMID: 26724853 DOI: 10.1016/j.cmpb.2015.12.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/04/2015] [Revised: 11/16/2015] [Accepted: 12/08/2015] [Indexed: 06/05/2023]

Abstract

BACKGROUND

Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors, and only some of them represent highly reliable human-curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount.

METHODS

Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism.
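A minimal sketch of the idea, under stated assumptions: annotations are held as binary gene-by-term matrices, some known annotations of the well-studied organism are randomly removed, and a model is trained to recover the unperturbed annotations from the perturbed ones before being applied to the target organism. The simple co-occurrence estimator below stands in for the supervised algorithms the abstract mentions, and every name and parameter (`perturb`, `train`, `predict`, `drop_prob`) is hypothetical.

```python
import random


def perturb(matrix, drop_prob, rng):
    """Randomly remove known annotations: each 1 becomes 0 with probability drop_prob."""
    return [
        [v if v == 0 or rng.random() >= drop_prob else 0 for v in row]
        for row in matrix
    ]


def train(perturbed, original):
    """Estimate cond[u][t] = P(term t in original | term u in perturbed)."""
    n_terms = len(original[0])
    cond = [[0.0] * n_terms for _ in range(n_terms)]
    for u in range(n_terms):
        rows = [i for i, row in enumerate(perturbed) if row[u] == 1]
        if not rows:
            continue  # no training evidence for term u
        for t in range(n_terms):
            cond[u][t] = sum(original[i][t] for i in rows) / len(rows)
    return cond


def predict(cond, target_matrix):
    """Rank unannotated (gene, term) pairs of the target organism by score."""
    ranked = []
    for g, row in enumerate(target_matrix):
        known = [u for u, v in enumerate(row) if v == 1]
        if not known:
            continue
        for t, v in enumerate(row):
            if v == 0:
                score = sum(cond[u][t] for u in known) / len(known)
                ranked.append((score, g, t))
    # Highest score first; ties broken by gene index, then term index.
    ranked.sort(key=lambda x: (-x[0], x[1], x[2]))
    return ranked


if __name__ == "__main__":
    rng = random.Random(0)
    # Source organism: terms 0 and 1 always co-occur, term 2 stands apart.
    source = [[1, 1, 0] for _ in range(20)] + [[0, 0, 1] for _ in range(10)]
    model = train(perturb(source, drop_prob=0.3, rng=rng), source)
    # Target organism: gene 0 has term 0 only, gene 1 has term 2 only.
    target = [[1, 0, 0], [0, 0, 1]]
    for score, gene, term in predict(model, target):
        print(f"gene {gene}, term {term}: {score:.2f}")
```

Training on perturbed inputs against unperturbed labels is what turns annotation discovery into a supervised problem here: the model sees examples where an annotation is missing from the input but present in the label, which mirrors the incomplete annotations of the target organism.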

RESULTS

We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted.


Girardi D, Küng J, Kleiser R, Sonnberger M, Csillag D, Trenkler J, Holzinger A. Interactive knowledge discovery with the doctor-in-the-loop: a practical example of cerebral aneurysms research. Brain Inform 2016;3:133-43. [PMID: 27747590 DOI: 10.1007/s40708-016-0038-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/15/2015] [Accepted: 02/03/2016] [Indexed: 12/02/2022]

Zare Hosseini Z, Mohammadzadeh M. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services. Iran J Pharm Res 2016;15:355-67. [PMID: 27610177 PMCID: PMC4986115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]