1
Upadhyaya P, Zhang K, Li C, Jiang X, Kim Y. Scalable Causal Structure Learning: Scoping Review of Traditional and Deep Learning Algorithms and New Opportunities in Biomedicine. JMIR Med Inform 2023; 11:e38266. [PMID: 36649070 PMCID: PMC9890349 DOI: 10.2196/38266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 08/30/2022] [Accepted: 09/18/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: Causal structure learning refers to a process of identifying causal structures from observational data, and it can have multiple applications in biomedicine and health care.
OBJECTIVE: This paper provides a practical review and tutorial on scalable causal structure learning models with examples of real-world data to help health care audiences understand and apply them.
METHODS: We reviewed traditional (combinatorial and score-based) methods for causal structure discovery and machine learning-based schemes. Various traditional approaches have been studied to tackle this problem, the most important among these being the Peter Spirtes and Clark Glymour (PC) algorithm. This was followed by analyzing the literature on score-based methods, which are computationally faster. Owing to the continuous constraint on acyclicity, there are new deep learning approaches to the problem in addition to traditional and score-based methods. Such methods can also offer scalability, particularly when there is a large amount of data involving multiple variables. Using our own evaluation metrics and experiments on linear, nonlinear, and benchmark Sachs data, we aimed to highlight the various advantages and disadvantages associated with these methods for the health care community. We also highlighted recent developments in biomedicine where causal structure learning can be applied to discover structures such as gene networks, brain connectivity networks, and those in cancer epidemiology.
RESULTS: We also compared the performance of traditional and machine learning-based algorithms for causal discovery over some benchmark data sets. Directed Acyclic Graph-Graph Neural Network has the lowest structural Hamming distance (19) and false positive rate (0.13) based on the Sachs data set, whereas Greedy Equivalence Search and Max-Min Hill Climbing have the best false discovery rate (0.68) and true positive rate (0.56), respectively.
CONCLUSIONS: Machine learning-based approaches, including deep learning, have many advantages over traditional approaches, such as scalability, including a greater number of variables, and potentially being applied in a wide range of biomedical applications, such as genetics, if sufficient data are available. Furthermore, these models are more flexible than traditional models and are poised to positively affect many applications in the future.
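The evaluation metrics quoted in this abstract (structural Hamming distance, true positive rate, false positive rate, false discovery rate) can be illustrated with a small sketch. This is not the authors' evaluation code: the function names, the toy three-node graphs, and the convention that a reversed edge counts once toward the distance are all assumptions made for illustration, and published definitions of these rates vary between papers.

```python
def structural_hamming_distance(true_adj, pred_adj):
    """Count node pairs whose edge status differs between two directed graphs.

    Assumed convention: a missing, extra, or reversed edge on a pair of
    nodes each adds 1 to the distance.
    """
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            pred_pair = (pred_adj[i][j], pred_adj[j][i])
            if true_pair != pred_pair:
                distance += 1
    return distance


def confusion_counts(true_adj, pred_adj):
    """Return (tp, fp, fn, tn) over all ordered node pairs i != j."""
    n = len(true_adj)
    tp = fp = fn = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            is_true, is_pred = bool(true_adj[i][j]), bool(pred_adj[i][j])
            if is_true and is_pred:
                tp += 1
            elif is_pred:
                fp += 1
            elif is_true:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def edge_rates(true_adj, pred_adj):
    """Directed-edge TPR, FPR, and FDR (0.0 when a denominator is empty)."""
    tp, fp, fn, tn = confusion_counts(true_adj, pred_adj)
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fdr": fp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical toy graphs: truth is 0 -> 1 -> 2; the estimate keeps 0 -> 1,
# reverses 1 -> 2, and adds a spurious edge 0 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
pred_adj = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]

print(structural_hamming_distance(true_adj, pred_adj))
print(edge_rates(true_adj, pred_adj))
```

A lower structural Hamming distance and false discovery rate indicate a closer match to the reference graph, whereas a higher true positive rate is better, which is why the abstract reports different algorithms as best on different metrics.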
Affiliation(s)
- Pulakesh Upadhyaya
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
  - Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, United States
- Kai Zhang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Can Li
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Yejin Kim
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
2
Kumari P, Wang Q, Khan F, Kwon JSI. A Direct Transfer Entropy-Based Multiblock Bayesian Network for Root Cause Diagnosis of Process Faults. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c02320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Pallavi Kumari
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843, United States
  - Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
- Qingsheng Wang
  - Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
- Faisal Khan
  - Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
- Joseph Sang-Il Kwon
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843, United States
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
3
A stochastic variance-reduced coordinate descent algorithm for learning sparse Bayesian network from discrete high-dimensional data. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01674-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2022]
4
Wang X, Ren H, Guo X. A novel discrete firefly algorithm for Bayesian network structure learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108426] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
5
Yu K, Cui Z, Sui X, Qiu X, Zhang J. Biological Network Inference With GRASP: A Bayesian Network Structure Learning Method Using Adaptive Sequential Monte Carlo. Front Genet 2021; 12:764020. [PMID: 34912373 PMCID: PMC8668238 DOI: 10.3389/fgene.2021.764020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022] Open
Abstract
Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures, we designed an adaptive SMC (adSMC) algorithm to increase the quality and diversity of sampled networks, which were further improved by a third stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.
Affiliation(s)
- Kaixian Yu
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Zihan Cui
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Xin Sui
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Xing Qiu
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
6
Touré V, Flobak Å, Niarakis A, Vercruysse S, Kuiper M. The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling. Brief Bioinform 2021; 22:bbaa390. [PMID: 33378765 PMCID: PMC8294520 DOI: 10.1093/bib/bbaa390] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/02/2020] [Revised: 11/26/2020] [Accepted: 11/27/2020] [Indexed: 12/16/2022] Open
Abstract
Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors by system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. It can therefore be challenging to efficiently explore various resources of causal interaction and maintain an overview of recorded contextual information that ensures valid use of the data. This review lists the different types of public resources with causal interactions, the different views on biological processes that they represent, the various data formats they use for data representation and storage, and the data exchange and conversion procedures that are available to extract and download these interactions. This may further raise awareness among the targeted audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, about the abundance and variety of causal molecular interaction data, and the variety of tools and approaches to convert them into one interoperable resource.
Affiliation(s)
- Vasundra Touré
  - Department of Biology of the Norwegian University of Science and Technology
- Anna Niarakis
  - Department of Biology, Univ Evry, University of Paris-Saclay, affiliated with the laboratory GenHotel in Genopole campus, and a delegate at the Lifeware Group, INRIA Saclay
- Steven Vercruysse
  - Researcher in computer science and computational biology and focuses on building a bridge between human and computer understanding
- Martin Kuiper
  - systems biology at the Department of Biology of the Norwegian University of Science and Technology
7
Aranyi SC, Nagy M, Opposits G, Berényi E, Emri M. Characterizing Network Search Algorithms Developed for Dynamic Causal Modeling. Front Neuroinform 2021; 15:656486. [PMID: 34177506 PMCID: PMC8222613 DOI: 10.3389/fninf.2021.656486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022] Open
Abstract
Dynamic causal modeling (DCM) is a widely used tool to estimate the effective connectivity of specified models of a brain network. Finding the model explaining measured data is one of the most important outstanding problems in Bayesian modeling. Using heuristic model search algorithms enables us to find an optimal model without having to define a model set a priori. However, the development of such methods is cumbersome in the case of large model-spaces. We aimed to utilize commonly used graph theoretical search algorithms for DCM to create a framework for characterizing them, and to investigate the relevance of such methods for single-subject and group-level studies. Because of the enormous computational demand of DCM calculations, we separated the model estimation procedure from the search algorithm by providing a database containing the parameters of all models in a full model-space. For test data, a publicly available fMRI dataset of 60 subjects was used. First, we reimplemented the deterministic bilinear DCM algorithm in the ReDCM R package, increasing computational speed during model estimation. Then, three network search algorithms were adapted for DCM, and we demonstrated how modifications to these methods, based on DCM posterior parameter estimates, can enhance search performance. Comparison of the results is based on model evidence, structural similarities and the number of model estimations needed during search. An analytical approach using Bayesian model reduction (BMR) for efficient network discovery is already available for DCM. Comparing model search methods, we found that topological algorithms often outperform analytical methods for single-subject analysis and achieve similar results for recovering common network properties of the winning model family, or set of models, obtained by multi-subject family-wise analysis. However, network search methods show their limitations in higher-level statistical analysis of parametric empirical Bayes. For optimizing such linear modeling schemes, the BMR methods are still considered the recommended approach. We envision the freely available database of estimated model-spaces to help further studies of the DCM model-space, and the ReDCM package to be a useful contribution for Bayesian inference within and beyond the field of neuroscience.
Affiliation(s)
- Sándor Csaba Aranyi
  - Division of Nuclear Medicine and Translational Imaging, Department of Medical Imaging, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Marianna Nagy
  - Division of Radiology and Imaging Science, Department of Medical Imaging, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Gábor Opposits
  - Division of Nuclear Medicine and Translational Imaging, Department of Medical Imaging, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Ervin Berényi
  - Division of Radiology and Imaging Science, Department of Medical Imaging, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Miklós Emri
  - Division of Nuclear Medicine and Translational Imaging, Department of Medical Imaging, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
8
Adabor ES, Acquaah-Mensah GK. DOKI: Domain knowledge-driven inference method for reverse-engineering transcriptional regulatory relationships among genes in cancer. Comput Biol Med 2020; 125:104017. [PMID: 33010618 DOI: 10.1016/j.compbiomed.2020.104017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/01/2020] [Revised: 09/16/2020] [Accepted: 09/20/2020] [Indexed: 11/18/2022]
Abstract
Efficient reverse-engineering methods are important for identifying transcriptional regulatory relationships among genes in cancer. These methods are becoming increasingly useful in this era where huge volumes of data are generated through the use of high-throughput technologies such as next-generation sequencing technologies and microarrays. However, it is important to improve current methods because of complications involved in modelling complex biological systems. In this paper, we present a novel approach, Domain Knowledge-driven Inference (DOKI), for identification of transcriptional regulatory relationships among genes, given a biological context such as cancer. Combining data normalization, the use of a probability distribution function and Kullback-Leibler Divergence, DOKI incorporates a domain knowledge-driven criterion to make determinations of the existence of regulatory relationships between given transcription factors and given specific gene targets. Characteristics of DOKI enable it to adequately handle complexities inherent in data, and accurately unearth linear and higher-order dependent relationships among genes. DOKI performed equally well with one established high-performing method and better than three other high-performing methods on relatively small data sets. However, it remarkably outperformed these methods on larger data sets to demonstrate its utility. Furthermore, we demonstrate the relevance of such inference algorithms for identifying novel relationships among genes in breast cancer, as some of the consensus results representing novel relationships were confirmed in previously published experimental results. Thus, DOKI will facilitate current efforts to gain etiological insights and help uncover new targeted therapies for various diseases.
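The Kullback-Leibler divergence mentioned in this abstract measures how far one probability distribution is from another. The sketch below shows only the standard discrete definition; it is not DOKI's procedure, and the function name and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical expression-level histograms over two bins.
uniform = [0.5, 0.5]
skewed = [0.9, 0.1]

print(kl_divergence(uniform, uniform))
print(kl_divergence(uniform, skewed))
print(kl_divergence(skewed, uniform))
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so the order of the arguments matters when it is used as a dependence criterion.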
Affiliation(s)
- Emmanuel S Adabor
  - School of Technology, Ghana Institute of Management and Public Administration, Achimota, Accra, Ghana
- George K Acquaah-Mensah
  - Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences (MCPHS University), 19 Foster Street, Worcester, MA, USA
9
Adabor ES, Acquaah-Mensah GK, Mazandu GK. MSclassifier: median-supplement model-based classification tool for automated knowledge discovery. F1000Res 2020; 9:1114. [PMID: 33456763 PMCID: PMC7788522 DOI: 10.12688/f1000research.25501.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/03/2020] [Indexed: 11/20/2022] Open
Abstract
High-throughput technologies have resulted in an exponential growth of publicly available and accessible datasets for biomedical research. Efficient computational models, algorithms and tools are required to exploit the datasets for knowledge discovery to aid medical decisions. Here, we introduce a new tool, MSclassifier, based on median-supplement approaches to machine learning to enable an automated and effective binary classification for optimal decision making. The MSclassifier package estimates medians of features (attributes) to deduce supplementary data, which is subsequently introduced into the training set for balancing and building superior models for classification. To test our approach, it is used to determine HER2 receptor expression status phenotypes in breast cancer and also predict protein subcellular localization (plasma membrane and nucleus). Using independent sample and cross-validation tests, the performance of MSclassifier is evaluated and compared with well-established tools that could perform such tasks. In the HER2 receptor expression status phenotype identification tasks, MSclassifier achieved statistically significantly higher classification rates than the best performing existing tool (90.30% versus 89.83%, p=8.62e-3). In the subcellular localization prediction tasks, MSclassifier and one other existing tool achieved equally high performances (93.42% versus 93.19%, p=0.06), although they both outperformed tools based on Naive Bayes classifiers. Overall, the application and evaluation of MSclassifier reveal its potential to be applied to varieties of binary classification problems. The MSclassifier package provides an R-portable and user-friendly application to a broad audience, enabling experienced end-users as well as non-programmers to perform an effective classification in biomedical and other fields of study.
Affiliation(s)
- Emmanuel S. Adabor
  - School of Technology, Ghana Institute of Management and Public Administration, Accra, Ghana
- George K. Acquaah-Mensah
  - Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences, Worcester, MA, USA
- Gaston K. Mazandu
  - African Institute for Mathematical Sciences and Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
10
Ramos J, Yoo C, Felty Q, Gong Z, Liuzzi JP, Poppiti R, Thakur IS, Goel R, Vaid AK, Komotar RJ, Ehtesham NZ, Hasnain SE, Roy D. Sensitivity to differential NRF1 gene signatures contributes to breast cancer disparities. J Cancer Res Clin Oncol 2020; 146:2777-2815. [PMID: 32705365 DOI: 10.1007/s00432-020-03320-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/21/2020] [Accepted: 07/09/2020] [Indexed: 01/12/2023]
Abstract
PURPOSE: Nuclear respiratory factor 1 (NRF1) drives estrogen-dependent breast tumorigenesis. Herein we examined the impact of NRF1 activity on the aggressiveness and disparate molecular signature of breast cancer in Black, White, Asian, and Hispanic women.
METHODS: NRF1 activity by transcription factor target enrichment analysis and causal NRF1-target gene signatures by Bayesian Network Inference with Java Objects (BANJO) and Markov Chain Monte Carlo (MCMC)-based gene order were examined in The Cancer Genome Atlas (TCGA) breast cancer cohorts.
RESULTS: We are the first to report increased NRF1 activity based on its differential effects on genome-wide transcription associated with luminal A and B, HER2+ and triple-negative (TN) molecular subtypes of breast cancer in women of different race/ethnicity. We observed disparate NRF1 motif-containing causal gene signatures unique to Black, White, Asian, and Hispanic women for luminal A breast cancer. Further gene order searches showed molecular heterogeneity of each subtype of breast cancer. Six different gene order sequences involving CDK1, HMMR, CCNB2, CCNB1, E2F1, CREB3L4, GTSE1, and LMNB1 with almost equal weight predicted the probability of luminal A breast cancer in whites. Three different gene order sequences consisting of CCNB1 and GTSE1, and CCNB1, LMNB1, CDK1 or CASP3 predicted almost 100% probability of luminal B breast cancer in whites; CCNB1 and LMNB1 or GTSE predicted 100% HER2+ breast cancer in whites. GTSE1 and TUBA1C combined together predicted 100% probability of developing TNBC in whites; NRF1, TUBA1B and BAX with EFNA4, and NRF1 and BTRC predicted 100% TNBC in blacks. High expressor NRF1 TN breast tumors showed unfavorable prognosis with a high risk of breast cancer death in white women.
CONCLUSION: Our findings showed how sensitivity to high NRF1 transcriptional activity coupled with its target gene signatures contributes to racial differences in luminal A and TN breast cancer subtypes. This knowledge may be useful in personalized intervention to prevent and treat this clinically challenging problem.
Affiliation(s)
- Jairo Ramos
  - Department of Environmental Health Sciences, Florida International University, Miami, USA
- Changwon Yoo
  - Department of Biostatistics, Florida International University, Miami, FL, 33199, USA
- Quentin Felty
  - Department of Environmental Health Sciences, Florida International University, Miami, USA
- Zhenghua Gong
  - Department of Biostatistics, Florida International University, Miami, FL, 33199, USA
- Juan P Liuzzi
  - Department of Dietetics and Nutrition, Florida International University, Miami, FL, 33199, USA
- Robert Poppiti
  - Department of Pathology, Florida International University, Miami, FL, USA
- Indu Shekhar Thakur
  - School of Environmental Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
- Ruchika Goel
  - Medanta Cancer Institute, Medanta-The Medicity, Gurugram, Haryana, 122001, India
- Ashok Kumar Vaid
  - Medanta Cancer Institute, Medanta-The Medicity, Gurugram, Haryana, 122001, India
- Ricardo Jorge Komotar
  - Department of Neurological Surgery, University of Miami School of Medicine, Miami, FL, USA
- Nasreen Z Ehtesham
  - ICMR-National Institute of Pathology, Safdarjung Hospital Campus, New Delhi, India
- Seyed E Hasnain
  - JH Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
- Deodutta Roy
  - Department of Environmental Health Sciences, Florida International University, Miami, USA
11
Sauta E, Demartini A, Vitali F, Riva A, Bellazzi R. A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks. BMC Bioinformatics 2020; 21:219. [PMID: 32471360 PMCID: PMC7257163 DOI: 10.1186/s12859-020-3510-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/09/2020] [Accepted: 04/22/2020] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in Systems Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information.
RESULTS: In this work, we propose a data fusion approach that exploits the integration of complementary omics-data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TF ChIP-Sequencing data and gene expression compendia to reconstruct TRNs in a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic regulatory activity of TFs. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods.
CONCLUSIONS: This Bayesian omics-data fusion based methodology makes it possible to gain a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions, which could be subsequently investigated. It represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high dimensional data.
Affiliation(s)
- Elisabetta Sauta
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
- Andrea Demartini
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
- Francesca Vitali
  - Center for Biomedical Informatics and Biostatistics, Dept. of Medicine, The University of Arizona Health Sciences, 1230 Cherry Ave, Tucson, AZ, 85719, USA
- Alberto Riva
  - Bioinformatics Core, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, 32610, USA
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
12
Liu J, Tian Z, Xiao Y, Liu H, Hao S, Zhang X, Wang C, Sun J, Yu H, Yan J. Gene Regulatory Relationship Mining Using Improved Three-Phase Dependency Analysis Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:339-346. [PMID: 30281476 DOI: 10.1109/tcbb.2018.2872993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Mining gene regulatory relationships and constructing gene regulatory networks (GRNs) is of utmost interest to the whole biological community, but it has remained a challenging problem because of the tremendous complexity of cellular systems. In the present work, we construct gene regulatory networks using an improved three-phase dependency analysis (TPDA) Bayesian network learning method, which includes the steps of Drafting, Thickening, and Thinning. To address the unreliable learning results caused by high-order conditional independence tests, we use an entropy estimation approach based on a Gaussian kernel probability density estimator to calculate the (conditional) mutual information between genes. Experiments on public benchmark data sets show that the improved method outperforms nine other Bayesian network learning methods when processing data with a large sample size, a small number of discrete values, and roughly equal frequencies of the different discrete values. In addition, the improved TPDA method was applied to a real, large RNA-seq gene expression data set from a global collection of 368 elite maize inbred lines. The experimental results show that it performs significantly better than the original TPDA method and the nine other Bayesian network learning algorithms.
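To make the idea of estimating mutual information with a Gaussian kernel density estimator more concrete, here is a minimal pure-Python sketch. It is not the TPDA implementation: it handles only the unconditional case, uses a resubstitution estimate with a product Gaussian kernel, and picks the bandwidth by the normal-reference rule of thumb. All function names and the synthetic data are assumptions made for illustration.

```python
import math
import random


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_mutual_information(xs, ys):
    """Resubstitution estimate of I(X; Y) in nats from paired samples."""
    n = len(xs)
    hx = rule_of_thumb_bandwidth(xs)
    hy = rule_of_thumb_bandwidth(ys)
    total = 0.0
    for i in range(n):
        sum_x = sum_y = sum_xy = 0.0
        for j in range(n):
            kx = gaussian_kernel((xs[i] - xs[j]) / hx)
            ky = gaussian_kernel((ys[i] - ys[j]) / hy)
            sum_x += kx
            sum_y += ky
            sum_xy += kx * ky
        # p(x, y) / (p(x) p(y)); the bandwidth factors cancel in the ratio.
        total += math.log(n * sum_xy / (sum_x * sum_y))
    return total / n


# Synthetic, hypothetical "expression" samples: one dependent pair, one independent pair.
rng = random.Random(0)
gene_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
gene_b = [a + 0.1 * rng.gauss(0.0, 1.0) for a in gene_a]  # driven by gene_a
gene_c = [rng.gauss(0.0, 1.0) for _ in range(200)]        # unrelated to gene_a

mi_dependent = kde_mutual_information(gene_a, gene_b)
mi_independent = kde_mutual_information(gene_a, gene_c)
print(mi_dependent, mi_independent)
```

In a dependency-analysis setting, a pair whose estimated mutual information exceeds a chosen threshold would be a candidate edge. The double loop is quadratic in the sample size, so this form suits small illustrations only.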
13
Liu J, Ji J, Jia X, Zhang A. Learning Brain Effective Connectivity Network Structure Using Ant Colony Optimization Combining With Voxel Activation Information. IEEE J Biomed Health Inform 2019; 24:2028-2040. [PMID: 31603829 DOI: 10.1109/jbhi.2019.2946676] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Learning brain effective connectivity (EC) networks from functional magnetic resonance imaging (fMRI) data has become a new hot topic in the neuroinformatics field. However, how to accurately and efficiently learn brain EC networks is still a challenging problem. In this paper, we propose a new algorithm, named VACOEC, that learns the brain EC network structure using an ant colony optimization (ACO) algorithm combined with voxel activation information. First, VACOEC uses the voxel activation information to measure the independence between each pair of brain regions and effectively restricts the space of candidate solutions, which avoids many unnecessary searches by the ants. Then, by combining the global score increase of a solution with the voxel activation information, a new heuristic function is designed to guide the ACO process in its search for the optimal solution. The experimental results on simulated datasets show that the proposed method can accurately and efficiently identify the directions of the brain EC networks. Moreover, the experimental results on real-world data show that patients with Alzheimer's disease (AD) exhibit decreased effective connectivity not only in the intra-network within the default mode network (DMN) and salience network (SN), but also in the inter-network between DMN and SN, compared with normal control (NC) subjects. The experimental results demonstrate that VACOEC is promising for practical applications in the neuroimaging studies of geriatric subjects and neurological patients.
14
Computational Health Engineering Applied to Model Infectious Diseases and Antimicrobial Resistance Spread. Applied Sciences-Basel 2019. [DOI: 10.3390/app9122486] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Infectious diseases are the primary cause of mortality worldwide. The dangers of infectious disease are compounded by antimicrobial resistance, which remains the greatest concern for human health. Although novel approaches are under investigation, the World Health Organization predicts that by 2050, septicaemia caused by antimicrobial resistant bacteria could result in 10 million deaths per year. One of the main challenges in medical microbiology is to develop novel experimental approaches, which enable a better understanding of bacterial infections and antimicrobial resistance. After the introduction of whole genome sequencing, there was a great improvement in bacterial detection and identification, which also enabled the characterization of virulence factors and antimicrobial resistance genes. Today, the use of in silico experiments jointly with computational and machine learning offers an in-depth understanding of systems biology, allowing us to use this knowledge for the prevention, prediction, and control of infectious disease. Herein, the aim of this review is to discuss the latest advances in human health engineering and their applicability in the control of infectious diseases. An in-depth knowledge of host–pathogen–protein interactions, combined with a better understanding of a host’s immune response and bacterial fitness, are key determinants for halting infectious diseases and antimicrobial resistance dissemination.
|
15
|
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] Open
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
| |
|
16
|
Adabor ES, Acquaah-Mensah GK. Restricted-derestricted dynamic Bayesian Network inference of transcriptional regulatory relationships among genes in cancer. Comput Biol Chem 2019; 79:155-164. [PMID: 30822674 DOI: 10.1016/j.compbiolchem.2019.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/28/2018] [Revised: 01/21/2019] [Accepted: 02/20/2019] [Indexed: 01/19/2023]
Abstract
Understanding transcriptional regulatory relationships among genes is important for gaining etiological insights into diseases such as cancer. To this end, high-throughput biological data have been generated through advancements in a variety of technologies. These rely on computational approaches to discover underlying structures in such data. Among these computational approaches, Bayesian networks (BNs) stand out because their probabilistic nature enables them to manage randomness in the dynamics of gene regulation and experimental data. Feedback loops inherent in networks of regulatory relationships are more tractable when enhancements to BNs are applied to them. Here, we propose Restricted-Derestricted dynamic BNs with a novel search technique, the Restricted-Derestricted Greedy Method, for such tasks. This approach relies on the Restricted-Derestricted Greedy search technique to infer transcriptional regulatory networks in two phases: restricted inference and derestricted inference. An application of this approach to real data sets reveals that it performs favourably compared with other well-performing existing dynamic BN approaches in terms of recovering true relationships among genes. In addition, it provides a balance between searching for optimal networks and keeping biologically relevant regulatory interactions among variables.
Affiliation(s)
- Emmanuel S Adabor
- School of Technology, Ghana Institute of Management and Public Administration, Achimota, Accra, Ghana.
| | - George K Acquaah-Mensah
- Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences (MCPHS University), 19 Foster Street, Worcester, MA, USA
| |
|
17
|
Li H, Wang F, Li H. Integrating expert knowledge for Bayesian network structure learning based on intuitionistic fuzzy set and Genetic Algorithm. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-183877] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
|
18
|
Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.
|
19
|
Park SB, Chung CK, Gonzalez E, Yoo C. Causal Inference Network of Genes Related with Bone Metastasis of Breast Cancer and Osteoblasts Using Causal Bayesian Networks. J Bone Metab 2018; 25:251-266. [PMID: 30574470 PMCID: PMC6288606 DOI: 10.11005/jbm.2018.25.4.251] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/07/2018] [Revised: 10/29/2018] [Accepted: 11/02/2018] [Indexed: 12/14/2022] Open
Abstract
Background The causal networks among genes that are commonly expressed in osteoblasts and during bone metastasis (BM) of breast cancer (BC) are not well understood. Here, we developed a machine learning method to obtain a plausible causal network of genes that are commonly expressed during BM and in osteoblasts in BC. Methods We selected BC genes that are commonly expressed during BM and in osteoblasts from the Gene Expression Omnibus database. Bayesian Network Inference with Java Objects (Banjo) was used to obtain the Bayesian network. Genes registered as BC related genes were included as candidate genes in the implementation of Banjo. Next, we obtained the Bayesian structure and assessed the prediction rate for BM, conditional independence among nodes, and causality among nodes. Furthermore, we reported the maximum relative risks (RRs) of combined gene expression of the genes in the model. Results We mechanistically identified 33 significantly related and plausibly involved genes in the development of BC BM. Further model evaluations showed that 16 genes were enough for a model to be statistically significant in terms of maximum likelihood of the causal Bayesian networks (CBNs) and for correct prediction of BM of BC. Maximum RRs of combined gene expression patterns showed that the expression levels of UBIAD1, HEBP1, BTNL8, TSPO, PSAT1, and ZFP36L2 significantly affected development of BM from BC. Conclusions The CBN structure can be used as a reasonable inference network for accurately predicting BM in BC.
Affiliation(s)
- Sung Bae Park
- Department of Neurosurgery, Seoul National University Boramae Medical Center, Seoul, Korea
| | - Chun Kee Chung
- Department of Neurosurgery, Seoul National University Hospital, Seoul National University College of Medicine, Clinical Research Institute, Seoul, Korea
| | - Efrain Gonzalez
- Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, Miami, FL, USA
| | - Changwon Yoo
- Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, Miami, FL, USA
| |
|
20
|
Adabor ES, Acquaah-Mensah GK. Machine learning approaches to decipher hormone and HER2 receptor status phenotypes in breast cancer. Brief Bioinform 2017; 20:504-514. [DOI: 10.1093/bib/bbx138] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/13/2017] [Revised: 09/27/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- Emmanuel S Adabor
- African Institute for Mathematical Sciences, Muizenberg, South Africa
| | - George K Acquaah-Mensah
- Massachusetts College of Pharmacy and Health Sciences, Pharmaceutical Sciences, Worcester, Massachusetts, United States
| |
|
21
|
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023] Open
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) to construct multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for learning network structure profiles, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| |
|
22
|
Nam S. Databases and tools for constructing signal transduction networks in cancer. BMB Rep 2017; 50:12-19. [PMID: 27502015 PMCID: PMC5319659 DOI: 10.5483/bmbrep.2017.50.1.135] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/08/2016] [Indexed: 12/22/2022] Open
Abstract
Traditionally, biologists have devoted their careers to studying individual biological entities of their own interest, partly due to lack of available data regarding that entity. Large, high-throughput data, too complex for conventional processing methods (i.e., “big data”), has accumulated in cancer biology and is freely available in public data repositories. Such challenges urge biologists to inspect their biological entities of interest using novel approaches, starting with repository data retrieval. Essentially, these revolutionary changes demand new interpretations of huge datasets at a systems level, through so-called “systems biology”. One of the representative applications of systems biology is to generate a biological network from high-throughput big data, providing a global map of molecular events associated with specific phenotype changes. In this review, we introduce the repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets.
Affiliation(s)
- Seungyoon Nam
- Department of Life Sciences, Gachon University, Seongnam 13120; Department of Genome Medicine and Science, College of Medicine, Gachon University; Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon 21565, Korea
| |
|
23
|
Variable neighborhood search for reverse engineering of gene regulatory networks. J Biomed Inform 2016; 65:120-131. [PMID: 27919733 DOI: 10.1016/j.jbi.2016.11.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/04/2015] [Revised: 11/16/2016] [Accepted: 11/27/2016] [Indexed: 01/08/2023]
Abstract
A new search heuristic, Divided Neighborhood Exploration Search, designed to be used with inference algorithms such as Bayesian networks to improve on the reverse engineering of gene regulatory networks is presented. The approach systematically moves through the search space to find topologies representative of gene regulatory networks that are more likely to explain microarray data. In empirical testing it is demonstrated that the novel method is superior to the widely employed greedy search techniques in both the quality of the inferred networks and computational time.
|
24
|
|
25
|
Nemzek JA, Hodges AP, He Y. Bayesian network analysis of multi-compartmentalized immune responses in a murine model of sepsis and direct lung injury. BMC Res Notes 2015; 8:516. [PMID: 26423575 PMCID: PMC4589912 DOI: 10.1186/s13104-015-1488-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/21/2014] [Accepted: 09/21/2015] [Indexed: 12/20/2022] Open
Abstract
Background Inflammatory disease processes involve complex and interrelated systems of mediators. Determining the causal relationships among these mediators becomes more complicated when two, concurrent inflammatory conditions occur. In those cases, the outcome may also be dependent upon the timing, severity and compartmentalization of the insults. Unfortunately, standard methods of experimentation and analysis of data sets may investigate a single scenario without uncovering many potential associations among mediators. However, Bayesian network analysis is able to model linear, nonlinear, combinatorial, and stochastic relationships among variables to explore complex inflammatory disease systems. In these studies, we modeled the development of acute lung injury from an indirect insult (sepsis induced by cecal ligation and puncture) complicated by a direct lung insult (aspiration). To replicate multiple clinical situations, the aspiration injury was delivered at different severities and at different time intervals relative to the septic insult. For each scenario, we measured numerous inflammatory cell types and cytokines in samples from the local compartments (peritoneal and bronchoalveolar lavage fluids) and the systemic compartment (plasma). We then analyzed these data by Bayesian networks and standard methods. Results Standard data analysis demonstrated that the lung injury was actually reduced when two insults were involved as compared to one lung injury alone. Bayesian network analysis determined that both the severity of lung insult and presence of sepsis influenced neutrophil recruitment and the amount of injury to the lung. However, the levels of chemoattractant cytokines responsible for neutrophil recruitment were more strongly linked to the timing and severity of the lung insult compared to the presence of sepsis. 
This suggests that something other than sepsis-driven exacerbation of chemokine levels was influencing the lung injury, contrary to previous theories. Conclusions To our knowledge, these studies are the first to use Bayesian networks together with experimental studies to examine the pathogenesis of sepsis-associated lung injury. Compared to standard statistical analysis and inference, these analyses elucidated more intricate relationships among the mediators, immune cells and insult-related variables (timing, compartmentalization and severity) that cause lung injury. Bayesian networks are an effective tool for evaluating complex models of inflammation.
Affiliation(s)
- Jean A Nemzek
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA.
| | - Andrew P Hodges
- Center for Computational Medicine and Biology, University of Michigan Medical School, Ann Arbor, MI, USA.
- Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, CA, USA.
| | - Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA.
- Center for Computational Medicine and Biology, University of Michigan Medical School, Ann Arbor, MI, USA.
| |
|
26
|
Li F, Li M, Guan P, Ma S, Cui L. Mapping publication trends and identifying hot spots of research on Internet health information seeking behavior: a quantitative and co-word biclustering analysis. J Med Internet Res 2015; 17:e81. [PMID: 25830358 PMCID: PMC4390616 DOI: 10.2196/jmir.3326] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Received: 02/15/2014] [Revised: 12/06/2014] [Accepted: 02/04/2015] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND The Internet has become an established source of health information for people seeking health information. In recent years, research on the health information seeking behavior of Internet users has become an increasingly important scholarly focus. However, there have been no long-term bibliometric studies to date on Internet health information seeking behavior. OBJECTIVE The purpose of this study was to map publication trends and explore research hot spots of Internet health information seeking behavior. METHODS A bibliometric analysis based on PubMed was conducted to investigate the publication trends of research on Internet health information seeking behavior. For the included publications, the annual publication number, the distribution of countries, authors, languages, journals, and annual distribution of highly frequent major MeSH (Medical Subject Headings) terms were determined. Furthermore, co-word biclustering analysis of highly frequent major MeSH terms was utilized to detect the hot spots in this field. RESULTS A total of 533 publications were included. The research output was gradually increasing. There were five authors who published four or more articles individually. A total of 271 included publications (50.8%) were written by authors from the United States, and 516 of the 533 articles (96.8%) were published in English. The eight most active journals published 34.1% (182/533) of the publications on this topic. 
Ten research hot spots were found: (1) behavior of Internet health information seeking about HIV infection or sexually transmitted diseases, (2) Internet health information seeking behavior of students, (3) behavior of Internet health information seeking via mobile phone and its apps, (4) physicians' utilization of Internet medical resources, (5) utilization of social media by parents, (6) Internet health information seeking behavior of patients with cancer (mainly breast cancer), (7) trust in or satisfaction with Web-based health information by consumers, (8) interaction between Internet utilization and physician-patient communication or relationship, (9) preference and computer literacy of people using search engines or other Web-based systems, and (10) attitude of people (especially adolescents) when seeking health information via the Internet. CONCLUSIONS The 10 major research hot spots could provide some hints for researchers when launching new projects. The output of research on Internet health information seeking behavior is gradually increasing. Compared to the United States, the relatively small number of publications indexed by PubMed from other developed and developing countries indicates to some extent that the field might be still underdeveloped in many countries. More studies on Internet health information seeking behavior could give some references for health information providers.
Affiliation(s)
- Fan Li
- Department of Medical Informatics, China Medical University, Shenyang, China
| |
|