1
Yang J, Peng Y, Ouyang D, Zhang W, Lin X, Zhao X. (p,q)-biclique counting and enumeration for large sparse bipartite graphs. The VLDB Journal 2023; 32:1-25. [PMID: 37362202 PMCID: PMC10008723 DOI: 10.1007/s00778-023-00786-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 11/02/2022] [Accepted: 12/15/2022] [Indexed: 06/28/2023]
Abstract
In this paper, we study the problem of (p, q)-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph G = (U, V, E) and two integer parameters p and q, we aim to efficiently count and enumerate all (p, q)-bicliques in G, where a (p, q)-biclique B(L, R) is a complete subgraph of G with L ⊆ U, R ⊆ V, |L| = p, and |R| = q. The problem of (p, q)-biclique counting and enumeration has many applications, such as graph neural network information aggregation, densest subgraph detection, and cohesive subgroup analysis. Despite the wide range of applications, to the best of our knowledge, there is no efficient and scalable solution to this problem in the literature. This problem is computationally challenging, due to the worst-case exponential number of (p, q)-bicliques. In this paper, we propose a competitive branch-and-bound baseline method, namely BCList, which explores the search space in a depth-first manner, together with a variety of pruning techniques. Although BCList offers a useful computation framework for our problem, its worst-case time complexity is exponential in p + q. To alleviate this, we propose an advanced approach, called BCList++. In particular, BCList++ applies a layer-based exploring strategy to enumerate (p, q)-bicliques by anchoring the search on either U or V only, which has a worst-case time complexity exponential in either p or q only. Consequently, a vital task is to choose a layer with the least computation cost. To this end, we develop a cost model, which is built upon an unbiased estimator for the density of the 2-hop graph induced by U or V. To improve computation efficiency, BCList++ exploits pre-allocated arrays and vertex labeling techniques such that the frequent subgraph creating operations can be substituted by array element switching operations.
We conduct extensive experiments on 16 real-life datasets, and the experimental results demonstrate that BCList++ significantly outperforms the baseline methods by up to 3 orders of magnitude. We show via a case study that (p, q)-bicliques optimize the efficiency of graph neural networks. In this paper, we also extend our techniques to count and enumerate (p, q)-bicliques on uncertain bipartite graphs. An efficient method, IUBCList, is developed on top of BCList++, together with a couple of pruning techniques, including common neighbor refinement and search branch early termination, to discard unpromising uncertain (p, q)-bicliques early. The experimental results demonstrate that IUBCList significantly outperforms the baseline method by up to 2 orders of magnitude.
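The idea of anchoring the search on one layer can be illustrated with a small brute-force sketch. This is not BCList or BCList++ (it has no pruning, cost model, or array-based labeling); it only shows why fixing p vertices on the U side reduces the remaining work to a binomial coefficient over their common neighbours. The function name `count_pq_bicliques` and the toy graph are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def count_pq_bicliques(adj_u, p, q):
    """Count (p, q)-bicliques by brute force, anchored on the U layer.

    adj_u maps each U-vertex to the set of its V-neighbours.
    Illustrative only: the cost grows like C(|U|, p).
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    total = 0
    for left in combinations(sorted(adj_u), p):
        # Every q-subset of the common neighbourhood completes a biclique.
        common = set.intersection(*(adj_u[u] for u in left))
        total += comb(len(common), q)
    return total


# Hypothetical toy graph with three U-vertices and three V-vertices.
adj_u = {"u1": {"v1", "v2", "v3"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"}}
print(count_pq_bicliques(adj_u, 2, 2))  # prints 2
```

With p = q = 1 the same function simply returns the number of edges, which is a convenient sanity check.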
Affiliation(s)
- Yun Peng
- Guangzhou University, Guangzhou, China
- Wenjie Zhang
- The University of New South Wales, Sydney, Australia
- Xuemin Lin
- Shanghai Jiao Tong University, Shanghai, China
- Xiang Zhao
- National University of Defense Technology, Changsha, China
2
Zhao J, Sun M, Chen F, Chiu P. Understanding Missing Links in Bipartite Networks With MissBiN. IEEE Transactions on Visualization and Computer Graphics 2022; 28:2457-2469. [PMID: 33090955 DOI: 10.1109/tvcg.2020.3032984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The analysis of bipartite networks is critical in a variety of application domains, such as exploring entity co-occurrences in intelligence analysis and investigating gene expression in bioinformatics. One important task is missing link prediction, which infers the existence of unseen links based on currently observed ones. In this article, we propose a visual analysis system, MissBiN, to involve analysts in the loop for making sense of link prediction results. MissBiN incorporates a novel method for link prediction in a bipartite network that leverages the information of bicliques in the network. It also provides an interactive visualization for understanding the algorithm outputs. The design of MissBiN is based on three high-level analysis questions (what, why, and how) regarding missing links, which are distilled from the literature and expert interviews. We conducted quantitative experiments to assess the performance of the proposed link prediction algorithm, and interviewed two experts from different domains to demonstrate the effectiveness of MissBiN as a whole. We also provide a comprehensive usage scenario to illustrate the usefulness of the tool in an application of intelligence analysis.
3
Aldewereld ZT, Zhang LA, Urbano A, Parker RS, Swigon D, Banerjee I, Gómez H, Clermont G. Identification of Clinical Phenotypes in Septic Patients Presenting With Hypotension or Elevated Lactate. Front Med (Lausanne) 2022; 9:794423. [PMID: 35665340 PMCID: PMC9160971 DOI: 10.3389/fmed.2022.794423] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/13/2021] [Accepted: 04/28/2022] [Indexed: 01/13/2023] Open
Abstract
Introduction: Targeted therapies for sepsis have failed to show benefit due to high variability among subjects. We sought to demonstrate different phenotypes of septic shock based solely on clinical features and show that these relate to outcome.
Methods: A retrospective analysis was performed of a 1,023-subject cohort with early septic shock from the ProCESS trial. Twenty-three clinical variables at baseline were analyzed using hierarchical clustering, with consensus clustering used to identify and validate the ideal number of clusters in a derivation cohort of 642 subjects from 20 hospitals. Clusters were visualized using heatmaps over 0, 6, 24, and 72 h. Clinical outcomes were 14-day all-cause mortality and organ failure pattern. Cluster robustness was confirmed in a validation cohort of 381 subjects from 11 hospitals.
Results: Five phenotypes were identified, each with unique organ failure patterns that persisted in time. By enrollment criteria, all patients had shock. The two high-risk phenotypes were characterized by distinct multi-organ failure patterns and cytokine signatures, with the highest mortality group characterized most notably by liver dysfunction and coagulopathy while the other group exhibited primarily respiratory failure, neurologic dysfunction, and renal dysfunction. The moderate risk phenotype was that of respiratory failure, while low-risk phenotypes did not have a high degree of additional organ failure.
Conclusions: Sepsis phenotypes with distinct biochemical abnormalities may be identified by clinical characteristics alone and likely provide an opportunity for early clinical actionability and prognosis.
Affiliation(s)
- Zachary T. Aldewereld
- UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA, United States; Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, United States; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, United States (corresponding author)
- Li Ang Zhang
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, United States
- Alisa Urbano
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, United States
- Robert S. Parker
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, United States
- David Swigon
- Department of Mathematics, University of Pittsburgh, Pittsburgh, PA, United States
- Ipsita Banerjee
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, United States
- Hernando Gómez
- Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Gilles Clermont
- Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, United States; Department of Mathematics, University of Pittsburgh, Pittsburgh, PA, United States
4
Thieme S, Walther D. Biclique extension as an effective approach to identify missing links in metabolic compound-protein interaction networks. Bioinformatics Advances 2022; 2:vbac001. [PMID: 36699348 PMCID: PMC9710583 DOI: 10.1093/bioadv/vbac001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2021] [Revised: 11/26/2021] [Accepted: 01/10/2022] [Indexed: 01/28/2023]
Abstract
Motivation: Metabolic networks are complex systems of chemical reactions proceeding via physical interactions between metabolites and proteins. We aimed to predict previously unknown compound-protein interactions (CPI) in metabolic networks by applying biclique extension, a network-structure-based prediction method.
Results: We developed a workflow, named BiPredict, to predict CPIs based on biclique extension and applied it to Escherichia coli and human using their respective known CPI networks as input. Depending on the chosen biclique size and using a STITCH-derived E. coli CPI network as input, a sensitivity of 39% and an associated precision of 59% were reached. For the larger human STITCH network, a sensitivity of 78% with a false-positive rate of <5% and precision of 75% was obtained. High performance was also achieved when using KEGG metabolic-reaction networks as input. Prediction performance significantly exceeded that of randomized controls and compared favorably to state-of-the-art deep-learning methods. Regarding metabolic process involvement, TCA-cycle and ribosomal processes were found enriched among predicted interactions. BiPredict can be used for network curation, may help increase the efficiency of experimental testing of CPIs, and can readily be applied to other species.
Availability and implementation: BiPredict and related datasets are available at https://github.com/SandraThieme/BiPredict.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
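One plausible reading of biclique extension as a link predictor can be sketched as follows. This is an assumption-laden illustration, not the BiPredict workflow: the function name `predict_by_biclique_extension`, the `max_missing` parameter, and the rule "a compound outside the biclique that lacks only a few of the biclique's proteins is predicted to bind the missing ones" are all hypothetical choices made here.

```python
def predict_by_biclique_extension(adj_u, bicliques, max_missing=1):
    """Suggest missing edges whose addition would extend a known biclique.

    adj_u     : dict mapping each U-vertex (e.g. compound) to its set of
                V-neighbours (e.g. proteins).
    bicliques : iterable of (left, right) pairs of sets, each a biclique
                already present in the graph.
    Hypothetical rule: a vertex u outside `left` that is adjacent to all of
    `right` except at most `max_missing` vertices gets those edges predicted.
    """
    predictions = set()
    for left, right in bicliques:
        for u, nbrs in adj_u.items():
            if u in left:
                continue
            missing = right - nbrs
            # Require at least one shared neighbour so u is not unrelated.
            if 0 < len(missing) <= max_missing and len(missing) < len(right):
                predictions.update((u, v) for v in missing)
    return predictions


# Toy compound-protein graph (illustrative names only).
adj_u = {
    "c1": {"p1", "p2", "p3"},
    "c2": {"p1", "p2", "p3"},
    "c3": {"p1", "p2"},
}
known = [({"c1", "c2"}, {"p1", "p2", "p3"})]
print(predict_by_biclique_extension(adj_u, known))
```

In this toy graph the only candidate is the edge between `c3` and `p3`, since adding it would grow the biclique's left side from two compounds to three.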
Affiliation(s)
- Sandra Thieme
- Max Planck Institute of Molecular Plant Physiology, Potsdam 14476, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam 14476, Germany (to whom correspondence should be addressed)
5
Zhang J, Liu L, Xu T, Zhang W, Zhao C, Li S, Li J, Rao N, Le TD. Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data. BMC Bioinformatics 2021; 22:578. [PMID: 34856921 PMCID: PMC8641245 DOI: 10.1186/s12859-021-04498-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022] Open
Abstract
Background: Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advances in single-cell miRNA-mRNA co-sequencing technology have opened a way for investigating miRNA regulation at the single-cell level. However, because single-cell miRNA-mRNA co-sequencing data are only just emerging and available only at small scale, there is a strong need for novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation.
Results: In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation), to combine single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 single cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single cell is unique. Moreover, we conduct a detailed analysis of the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, through exploring the cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single cells and helps to understand cell–cell crosstalk.
Conclusions: To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04498-6.
Affiliation(s)
- Junpeng Zhang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China; School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Taosheng Xu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China
- Wu Zhang
- School of Agriculture and Biological Sciences, Dali University, Dali, 671003, Yunnan, China
- Chunwen Zhao
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Sijing Li
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Nini Rao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China
- Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
6
Puelz D, Basse G, Feller A, Toulis P. A graph-theoretic approach to randomization tests of causal effects under general interference. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- David Puelz
- McCombs School of Business and Salem Center for Policy, The University of Texas at Austin, Austin, Texas, USA
- Avi Feller
- The University of California, Berkeley, California, USA
- Panos Toulis
- Booth School of Business, The University of Chicago, Chicago, Illinois, USA
7
Zhao X, Ji J, Wang S, Wang R, Yu Q, Li D. The regulatory pattern of target gene expression by aberrant enhancer methylation in glioblastoma. BMC Bioinformatics 2021; 22:420. [PMID: 34482818 PMCID: PMC8420065 DOI: 10.1186/s12859-021-04345-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Accepted: 08/23/2021] [Indexed: 12/21/2022] Open
Abstract
Background: Glioblastoma multiforme (GBM) is the most common and aggressive primary malignant brain tumor with a grim prognosis. Aberrant DNA methylation is an epigenetic mechanism that promotes GBM carcinogenesis, while the function of DNA methylation at enhancer regions in GBM remains poorly described.
Results: We integrated multi-omics data to identify differential methylation enhancer region (DMER)-genes and revealed global enhancer hypomethylation in GBM. In addition, a DMER-mediated target genes regulatory network and functional enrichment analysis of target genes that might be regulated by hypomethylation enhancer regions showed that aberrant enhancer regions could contribute to tumorigenesis and progression in GBM. Further, we identified 22 modules in which lncRNAs and mRNAs synergistically competed with each other. Finally, through the construction of drug-target association networks, our study identified potential small-molecule drugs for GBM treatment.
Conclusions: Our study provides novel insights for understanding the regulation of aberrant enhancer region methylation and developing methylation-based biomarkers for the diagnosis and treatment of GBM.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04345-8.
Affiliation(s)
- Xiaoxiao Zhao
- School of Biomedical Engineering, Capital Medical University, 10 You An Men Wai, Xi Tou Tiao, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical, Capital Medical University, Beijing, 100069, People's Republic of China
- Jianghuai Ji
- Department of Radiation Physics, Zhejiang Cancer Hospital, Hangzhou, 310022, People's Republic of China; Zhejiang Key Laboratory of Radiation Oncology, Hangzhou, 310022, People's Republic of China
- Shijia Wang
- School of Biomedical Engineering, Capital Medical University, 10 You An Men Wai, Xi Tou Tiao, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical, Capital Medical University, Beijing, 100069, People's Republic of China
- Rendong Wang
- School of Biomedical Engineering, Capital Medical University, 10 You An Men Wai, Xi Tou Tiao, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical, Capital Medical University, Beijing, 100069, People's Republic of China
- Qiuhong Yu
- Department of Hyperbaric Oxygen, Beijing Tiantan Hospital, Capital Medical University, 119 Nansihuan Xi Lu, Fengtai District, Beijing, 100070, People's Republic of China
- Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men Wai, Xi Tou Tiao, Beijing, 100069, People's Republic of China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical, Capital Medical University, Beijing, 100069, People's Republic of China
8
Jha K, Xun G, Zhang A. Continual Representation Learning For Evolving Biomedical Bipartite Networks. Bioinformatics 2021; 37:2190-2197. [PMID: 33532833 DOI: 10.1093/bioinformatics/btab067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/16/2020] [Revised: 12/14/2020] [Accepted: 01/27/2021] [Indexed: 11/12/2022] Open
Abstract
Motivation: Many real-world biomedical interactions such as 'gene-disease', 'disease-symptom', and 'drug-target' are modeled as a bipartite network structure. Learning meaningful representations for such networks is a fundamental problem in the research area of Network Representation Learning (NRL). NRL approaches aim to translate the network structure into low-dimensional vector representations that are useful to a variety of biomedical applications. Despite significant advances, the existing approaches still have certain limitations. First, a majority of these approaches do not model the unique topological properties of bipartite networks. Consequently, their straightforward application to the bipartite graphs yields unsatisfactory results. Second, the existing approaches typically learn representations from static networks. This is limiting for the biomedical bipartite networks that evolve at a rapid pace, and thus necessitate the development of approaches that can update the representations in an online fashion.
Results: In this research, we propose a novel representation learning approach that accurately preserves the intricate bipartite structure, and efficiently updates the node representations. Specifically, we design a customized autoencoder that captures the proximity relationship between nodes participating in the bipartite bicliques (2 × 2 sub-graph), while preserving both the global and local structures. Moreover, the proposed structure-preserving technique is carefully interleaved with the central tenets of continual machine learning to design an incremental learning strategy that updates the node representations in an online manner. Taken together, the proposed approach produces meaningful representations with high fidelity and computational efficiency. Extensive experiments conducted on several biomedical bipartite networks validate the effectiveness and rationality of the proposed approach.
Supplementary information: Supplementary data are available at Bioinformatics online.
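The 2 × 2 bicliques mentioned above (often called butterflies) can be counted with a standard wedge-based approach. The sketch below is only a counting illustration under assumed names (`count_butterflies`, a toy gene-disease edge list); it does not reproduce the autoencoder or the continual learning strategy described in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Count 2 x 2 bicliques in a bipartite graph given as (u, v) edges.

    For every pair of U-vertices, count how many V-vertices they share
    (wedges); c shared neighbours contribute C(c, 2) butterflies.
    """
    nbrs_of_v = {}
    for u, v in edges:
        nbrs_of_v.setdefault(v, set()).add(u)

    wedges = Counter()
    for us in nbrs_of_v.values():
        for pair in combinations(sorted(us), 2):
            wedges[pair] += 1

    return sum(comb(c, 2) for c in wedges.values())


# Toy gene-disease edges forming a complete 2 x 3 bipartite graph.
edges = [
    ("g1", "d1"), ("g1", "d2"), ("g1", "d3"),
    ("g2", "d1"), ("g2", "d2"), ("g2", "d3"),
]
print(count_butterflies(edges))  # prints 3
```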
Affiliation(s)
- Kishlay Jha
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, USA
- Guangxu Xun
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, USA
- Aidong Zhang
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, USA
9
Xiong C, Sun S, Jiang W, Ma L, Zhang J. ASDmiR: A Stepwise Method to Uncover miRNA Regulation Related to Autism Spectrum Disorder. Front Genet 2020; 11:562971. [PMID: 33173536 PMCID: PMC7591752 DOI: 10.3389/fgene.2020.562971] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/17/2020] [Accepted: 08/31/2020] [Indexed: 12/14/2022] Open
Abstract
Autism spectrum disorder (ASD) is a class of neurodevelopmental disorders characterized by genetic and environmental risk factors. The pathogenesis of ASD has a strong genetic basis, consisting of rare de novo or inherited variants among a variety of multiple molecules. Previous studies have shown that microRNAs (miRNAs) are involved in neurogenesis and brain development and are closely associated with the pathogenesis of ASD. However, the regulatory mechanisms of miRNAs in ASD are largely unclear. In this work, we present a stepwise method, ASDmiR, for the identification of underlying pathogenic genes, networks, and modules associated with ASD. First, we conduct a comparison study on 12 miRNA target prediction methods by using the matched miRNA, lncRNA, and mRNA expression data in ASD. In terms of the number of experimentally confirmed miRNA-target interactions predicted by each method, we choose the best method for identifying miRNA-target regulatory network. Based on the miRNA-target interaction network identified by the best method, we further infer miRNA-target regulatory bicliques or modules. In addition, by integrating high-confidence miRNA-target interactions and gene expression data, we identify three types of networks, including lncRNA-lncRNA, lncRNA-mRNA, and mRNA-mRNA related miRNA sponge interaction networks. To reveal the community of miRNA sponges, we further infer miRNA sponge modules from the identified miRNA sponge interaction network. Functional analysis results show that the identified hub genes, as well as miRNA-associated networks and modules, are closely linked with ASD. ASDmiR is freely available at https://github.com/chenchenxiong/ASDmiR.
Affiliation(s)
- Chenchen Xiong
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Shaoping Sun
- Department of Medical Engineering, People's Hospital of Yuxi City, Yuxi, China
- Weili Jiang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Lei Ma
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
10
Bubier JA, Philip VM, Dickson PE, Mittleman G, Chesler EJ. Discovery of a Role for Rab3b in Habituation and Cocaine Induced Locomotor Activation in Mice Using Heterogeneous Functional Genomic Analysis. Front Neurosci 2020; 14:721. [PMID: 32742255 PMCID: PMC7364128 DOI: 10.3389/fnins.2020.00721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2020] [Accepted: 06/16/2020] [Indexed: 12/21/2022] Open
Abstract
Substance use disorders are prevalent and present a tremendous societal cost but the mechanisms underlying addiction behavior are poorly understood and few biological treatments exist. One strategy to identify novel molecular mechanisms of addiction is through functional genomic experimentation. However, results from individual experiments are often noisy. To address this problem, the convergent analysis of multiple genomic experiments can discern signal from these studies. In the present study, we examine genetic loci that modulate the locomotor response to cocaine identified in the recombinant inbred (BXD RI) genetic reference population. We then applied the GeneWeaver software system for heterogeneous functional genomic analysis to integrate and aggregate multiple studies of addiction genomics, resulting in the identification of Rab3b as a functional correlate of the locomotor response to cocaine in rodents. This gene encodes a member of the RAB family of Ras-like GTPases known to be involved in trafficking of secretory and endocytic vesicles in eukaryotic cells. The convergent evidence for a role of Rab3b includes co-occurrence in previously published genetic mapping studies of cocaine related behaviors; methamphetamine response and cocaine- and amphetamine-regulated transcript prepropeptide (Cartpt) transcript abundance; evidence related to other addictive substances; density of polymorphisms; and its expression pattern in reward pathways. To evaluate this finding, we examined the effect of RAB3 complex perturbation in cocaine response. B6;129-Rab3btm1Sud Rab3ctm1sud Rab3dtm1sud triple null mice (Rab3bcd -/-) exhibited significant deficits in habituation, and increased acute and repeated cocaine responses. This previously unidentified mechanism of the behavioral predisposition and response to cocaine is an example of many that can be identified and validated using aggregate genomic studies.
Affiliation(s)
- Price E. Dickson
- The Jackson Laboratory, Bar Harbor, ME, United States
- Department of Biomedical Sciences, Marshall University, Huntington, WV, United States
- Guy Mittleman
- Department of Psychological Science, Ball State University, Muncie, IN, United States
11
Guo Q, Wang J, Gao Y, Li X, Hao Y, Ning S, Wang P. Dynamic TF-lncRNA Regulatory Networks Revealed Prognostic Signatures in the Development of Ovarian Cancer. Front Bioeng Biotechnol 2020; 8:460. [PMID: 32478062 PMCID: PMC7237576 DOI: 10.3389/fbioe.2020.00460] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/14/2020] [Accepted: 04/21/2020] [Indexed: 12/15/2022] Open
Abstract
The pathological development of ovarian cancer (OC) is a complex progression that depends on multiple alterations of coding and non-coding genes. Therefore, it is important to capture the transcriptional-regulating events during the progression of OC development and to identify reliable markers for predicting clinical outcomes in patients. A dataset of 399 ovarian serous cystadenocarcinoma patients at different stages from The Cancer Genome Atlas (TCGA) was analyzed. Stage-specific transcription factor (TF)-long non-coding RNA (lncRNA) regulatory networks were constructed by integrating high-throughput RNA molecular profiles and TF binding information. Systematic analysis was performed to characterize the TF-lncRNA-regulating behaviors across different stages of OC. Cox regression analysis and Kaplan-Meier survival curves were used to evaluate the prognostic efficiency of TF-lncRNA regulations and cliques. The stage-specific TF-lncRNA regulatory networks at three OC stages (II, III, and IV) exhibited common structures and specific topologies of risk TFs and lncRNAs. A TF-lncRNA activity profile across different stages revealed that TFs were highly stage-selective in regulating lncRNAs. Functional analysis indicated that groups of TF-lncRNA interactions were involved in specific pathological processes in the development of OC. In a STAT3-FOS co-regulating clique, the TFs STAT3 and FOS were selectively regulating target lncRNAs across different OC stages. Further survival analysis indicated that this TF-lncRNA biclique may have the potential for predicting OC prognosis. This study revealed the topological and dynamic principles of TF-lncRNA regulatory networks and provided a resource for further analysis of stage-specific regulating mechanisms of OC.
Affiliation(s)
- Qiuyan Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; Department of Gynecology, The First Affiliated Hospital, Harbin Medical University, Harbin, China
- Junwei Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; Department of Respiratory Medicine, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Yue Gao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xin Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yangyang Hao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Peng Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
12
Lu Y, Phillips CA, Langston MA. Biclique: an R package for maximal biclique enumeration in bipartite graphs. BMC Res Notes 2020; 13:88. [PMID: 32085812 PMCID: PMC7035696 DOI: 10.1186/s13104-020-04955-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/28/2019] [Accepted: 02/12/2020] [Indexed: 12/02/2022] Open
Abstract
Objective: Bipartite graphs are widely used to model relationships between pairs of heterogeneous data types. Maximal bicliques are foundational structures in such graphs, and their enumeration is an important task in systems biology, epidemiology and many other problem domains. Thus, there is a need for an efficient, general purpose, publicly available tool to enumerate maximal bicliques in bipartite graphs. The statistical programming language R is a logical choice for such a tool, but until now no R package has existed for this purpose. Our objective is to provide such a package, so that the research community can more easily perform this computationally demanding task.
Results: Biclique is an R package that takes as input a bipartite graph and produces a listing of all maximal bicliques in this graph. Input and output formats are straightforward, with examples provided both in this paper and in the package documentation. Biclique employs a state-of-the-art algorithm previously developed for basic research in functional genomics. This package, along with its source code and reference manual, is freely available from the CRAN public repository at https://cran.r-project.org/web/packages/biclique/index.html.
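To make the notion of a maximal biclique concrete, here is a small brute-force sketch in Python. It is not the algorithm used by the Biclique R package and does not mirror that package's interface; the function name `maximal_bicliques` and the toy graph are assumptions. It relies on the fact that a biclique (L, R) with both sides non-empty is maximal exactly when R is the common neighbourhood of L and L is the set of all vertices adjacent to every vertex of R.

```python
from itertools import combinations


def maximal_bicliques(adj_u):
    """Enumerate maximal bicliques (both sides non-empty) by brute force.

    adj_u maps each U-vertex to the set of its V-neighbours.
    Exponential in |U|, so suitable for toy inputs only.
    """
    found = set()
    us = sorted(adj_u)
    for k in range(1, len(us) + 1):
        for seed in combinations(us, k):
            # Right side: vertices adjacent to every seed vertex.
            right = set.intersection(*(adj_u[u] for u in seed))
            if not right:
                continue
            # Left side: close the seed to all vertices covering `right`.
            left = frozenset(u for u in us if right <= adj_u[u])
            found.add((left, frozenset(right)))
    return found


# Hypothetical toy graph.
adj_u = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"y", "z"}}
for left, right in sorted(maximal_bicliques(adj_u), key=lambda b: sorted(b[0])):
    print(sorted(left), sorted(right))
```

For this toy graph the sketch reports three maximal bicliques: ({a, b}, {x, y}), ({a, b, c}, {y}) and ({c}, {y, z}).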
Affiliation(s)
- Yuping Lu
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
| | - Charles A. Phillips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996-2250 USA
| | - Michael A. Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996-2250 USA
| |
|
13
|
Barabási DL, Barabási AL. A Genetic Model of the Connectome. Neuron 2020; 105:435-445.e5. [PMID: 31806491 PMCID: PMC7007360 DOI: 10.1016/j.neuron.2019.10.031] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/16/2019] [Revised: 10/07/2019] [Accepted: 10/24/2019] [Indexed: 11/18/2022]
Abstract
The connectomes of organisms of the same species show remarkable architectural and often local wiring similarity, raising the question: where and how is neuronal connectivity encoded? Here, we start from the hypothesis that the genetic identity of neurons guides synapse and gap-junction formation and show that such genetically driven wiring predicts the existence of specific biclique motifs in the connectome. We identify a family of large, statistically significant biclique subgraphs in the connectomes of three species and show that within many of the observed bicliques the neurons share statistically significant expression patterns and morphological characteristics, supporting our expectation of common genetic factors that drive the synapse formation within these subgraphs. The proposed connectome model offers a self-consistent framework to link the genetics of an organism to the reproducible architecture of its connectome, offering experimentally falsifiable predictions on the genetic factors that drive the formation of individual neuronal circuits.
Affiliation(s)
| | - Albert-László Barabási
- Network Science Institute, Northeastern University, Boston, MA 02115, USA; Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Department of Data and Network Science, Central European University, Budapest 1051, Hungary.
| |
|
14
|
Zhang J, Pham VVH, Liu L, Xu T, Truong B, Li J, Rao N, Le TD. Identifying miRNA synergism using multiple-intervention causal inference. BMC Bioinformatics 2019; 20:613. [PMID: 31881825 PMCID: PMC6933624 DOI: 10.1186/s12859-019-3215-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/08/2019] [Accepted: 11/12/2019] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Studying multiple microRNAs (miRNAs) synergism in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several existing methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level are working synergistically. However, it is unclear if miRNAs with shared targets are working in concert to regulate the targets or they individually regulate the targets at different time points or different biological processes. A standard method to test the synergistic activities is to knock-down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical as we would have too many sets of miRNAs to test. RESULTS In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics the multiple-intervention experiments, e.g. knocking-down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level are not working synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most of synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying miRNA synergistic network. CONCLUSIONS Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.
Affiliation(s)
- Junpeng Zhang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China; School of Engineering, Dali University, Dali, 671003, Yunnan, China
| | - Vu Viet Hoang Pham
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
| | - Lin Liu
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
| | - Taosheng Xu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
| | - Buu Truong
- Pham Ngoc Thach University of Medicine, Ho Chi Minh, Vietnam
| | - Jiuyong Li
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
| | - Nini Rao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China.
| | - Thuc Duy Le
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia.
| |
|
15
|
Dey L, Mukhopadhyay A. A Graph-Based Approach for Finding the Dengue Infection Pathways in Humans Using Protein-Protein Interactions. J Comput Biol 2019; 27:755-768. [PMID: 31486690 DOI: 10.1089/cmb.2019.0171] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023] Open
Abstract
Dengue virus (DENV) is one of the deadly arboviruses, which is primarily transmitted by Aedes aegypti, and causes dengue infection to the humans. According to WHO, every year around 390 million humans are affected by DENV, of which around 50 million deaths are reported. Knowledge of the various diseases caused by the DENV would greatly encourage to understand the infection mechanism and help to design new antiviral drug discovery. We propose a quasi-clique and quasi-biclique algorithm to classify infection gateway proteins of the human body and possible pathways of DENV leading to various diseases. For this, we have examined three networks, dengue-human protein-protein interaction network, human protein interaction network, and human proteins-disease association network. The prediction result states that DENV may lead to various diseases in the human body, including cancer, asthma, ulcerative colitis, multiple sclerosis, premature birth, and so on. Some of the results have recently been validated experimentally. This study may endow with potential targets for more effective anti-dengue remedial contribution.
Affiliation(s)
- Lopamudra Dey
- Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata, India
| | - Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
| |
|
16
|
Li Q, Yu Q, Ji J, Wang P, Li D. Comparison and analysis of lncRNA-mediated ceRNA regulation in different molecular subtypes of glioblastoma. Mol Omics 2019; 15:406-419. [DOI: 10.1039/c9mo00126c] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
LncRNA-mediated ceRNA regulation varies among different molecular subtypes in glioblastoma.
Affiliation(s)
- Qianpeng Li
- School of Biomedical Engineering
- Capital Medical University
- Beijing 100069
- People's Republic of China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical
| | - Qiuhong Yu
- Department of Hyperbaric Oxygen, Beijing Tiantan Hospital, Capital Medical University
- Beijing 100070
- People's Republic of China
| | - Jianghuai Ji
- School of Biomedical Engineering
- Capital Medical University
- Beijing 100069
- People's Republic of China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical
| | - Peng Wang
- College of Bioinformatics Science and Technology
- Harbin Medical University
- Harbin 150081
- People's Republic of China
| | - Dongguo Li
- School of Biomedical Engineering
- Capital Medical University
- Beijing 100069
- People's Republic of China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical
| |
|
17
|
Phillips CA, Wang K, Baker EJ, Bubier JA, Chesler EJ, Langston MA. On Finding and Enumerating Maximal and Maximum k-Partite Cliques in k-Partite Graphs. ALGORITHMS 2019; 12:23. [PMID: 31448059 PMCID: PMC6707360 DOI: 10.3390/a12010023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
Let k denote an integer greater than 2, let G denote a k-partite graph, and let S denote the set of all maximal k-partite cliques in G. Several open questions concerning the computation of S are resolved. A straightforward and highly-scalable modification to the classic recursive backtracking approach of Bron and Kerbosch is first described and shown to run in O(3^{n/3}) time. A series of novel graph constructions is then used to prove that this bound is best possible in the sense that it matches an asymptotically tight upper limit on |S|. The task of identifying a vertex-maximum element of S is also considered and, in contrast with the k = 2 case, shown to be NP-hard for every k ≥ 3. A special class of k-partite graphs that arises in the context of functional genomics and other problem domains is studied as well and shown to be more readily solvable via a polynomial-time transformation to bipartite graphs. Applications, limitations, potentials for faster methods, heuristic approaches, and alternate formulations are also addressed.
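For readers unfamiliar with the recursive backtracking approach of Bron and Kerbosch named in this abstract, a minimal Python sketch of the classic algorithm (with pivoting) for maximal cliques in an ordinary undirected graph follows. It is not the k-partite modification the paper describes; the function name `bron_kerbosch` and the adjacency-dictionary input are assumptions made for this illustration.

```python
def bron_kerbosch(graph):
    """Return all maximal cliques of an undirected graph.

    `graph` (hypothetical input format) maps each vertex to the set of its
    neighbours. This is the classic pivoting variant, shown for illustration.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Choose the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# A triangle 1-2-3 with a pendant edge 3-4.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
toy_cliques = set(bron_kerbosch(toy_graph))
```

Only non-neighbours of the pivot are branched on, which is the standard way to avoid re-deriving the same maximal clique along several branches.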
Affiliation(s)
- Charles A. Phillips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
| | - Kai Wang
- Department of Computer Science, Georgia Southern University, Statesboro, GA 30460, USA
| | - Erich J. Baker
- Computer Science Department, Baylor University, Waco, TX 76798, USA
| | | | | | - Michael A. Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
| |
|
18
|
Sreeja A, Vinayan KP. Multidimensional knowledge-based framework is an essential step in the categorization of gene sets in complex disorders. J Bioinform Comput Biol 2017; 15:1750022. [DOI: 10.1142/s0219720017500226] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
In complex disorders, collaborative role of several genes accounts for the multitude of symptoms and the discovery of molecular mechanisms requires proper understanding of pertinent genes. Majority of the recent techniques utilize either single information or consolidate the independent outlook from multiple knowledge sources for assisting the discovery of candidate genes. In any case, given that various sorts of heterogeneous sources are possibly significant for quality gene prioritization, every source bearing data not conveyed by another, we assert that a perfect strategy ought to give approaches to observe among them in a genuine integrative style that catches the degree of each, instead of utilizing a straightforward mix of sources. We propose a flexible approach that empowers multi-source information reconciliation for quality gene prioritization that augments the complementary nature of various learning sources so as to utilize the maximum information of aggregated data. To illustrate the proposed approach, we took Autism Spectrum Disorder (ASD) as a case study and validated the framework on benchmark studies. We observed that the combined ranking based on integrated knowledge reduces the false positive observations and boosts the performance when compared with individual rankings. The clinical phenotype validation for ASD shows that there is a significant linkage between top positioned genes and endophenotypes of ASD. Categorization of genes based on endophenotype associations by this method will be useful for further hypothesis generation leading to clinical and translational analysis. This approach may also be useful in other complex neurological and psychiatric disorders with a strong genetic component.
Affiliation(s)
- A. Sreeja
- Department of Computer Science & IT, School of Arts and Sciences, Amrita University, Kochi, Kerala, India
| | - K. P. Vinayan
- Division of Paediatric Neurology, Department of Neurology, Amrita Institute of Medical Sciences, Amrita University, Kochi, Kerala, India
| |
|
19
|
Kang M, Park J, Kim DC, Biswas AK, Liu C, Gao J. Multi-Block Bipartite Graph for Integrative Genomic Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1350-1358. [PMID: 27429442 DOI: 10.1109/tcbb.2016.2591521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Human diseases involve a sequence of complex interactions between multiple biological processes. In particular, multiple genomic data such as Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV), DNA Methylation (DM), and their interactions simultaneously play an important role in human diseases. However, despite the widely known complex multi-layer biological processes and increased availability of the heterogeneous genomic data, most research has considered only a single type of genomic data. Furthermore, recent integrative genomic studies for the multiple genomic data have also been facing difficulties due to the high-dimensionality and complexity, especially when considering their intra- and inter-block interactions. In this paper, we introduce a novel multi-block bipartite graph and its inference methods, MB2I and sMB2I, for the integrative genomic study. The proposed methods not only integrate multiple genomic data but also incorporate intra/inter-block interactions by using a multi-block bipartite graph. In addition, the methods can be used to predict quantitative traits (e.g., gene expression, survival time) from the multi-block genomic data. The performance was assessed by simulation experiments that implement practical situations. We also applied the method to the human brain data of psychiatric disorders. The experimental results were analyzed by maximum edge biclique and biclustering, and biological findings were discussed.
|
20
|
Walsh CJ, Hu P, Batt J, Dos Santos CC. Discovering MicroRNA-Regulatory Modules in Multi-Dimensional Cancer Genomic Data: A Survey of Computational Methods. Cancer Inform 2016; 15:25-42. [PMID: 27721651 PMCID: PMC5051584 DOI: 10.4137/cin.s39369] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/18/2016] [Revised: 08/14/2016] [Accepted: 08/16/2016] [Indexed: 12/20/2022] Open
Abstract
MicroRNAs (miRs) are small single-stranded noncoding RNA that function in RNA silencing and post-transcriptional regulation of gene expression. An increasing number of studies have shown that miRs play an important role in tumorigenesis, and understanding the regulatory mechanism of miRs in this gene regulatory network will help elucidate the complex biological processes at play during malignancy. Despite advances, determination of miR–target interactions (MTIs) and identification of functional modules composed of miRs and their specific targets remain a challenge. A large amount of data generated by high-throughput methods from various sources are available to investigate MTIs. The development of data-driven tools to harness these multi-dimensional data has resulted in significant progress over the past decade. In parallel, large-scale cancer genomic projects are allowing new insights into the commonalities and disparities of miR–target regulation across cancers. In the first half of this review, we explore methods for identification of pairwise MTIs, and in the second half, we explore computational tools for discovery of miR-regulatory modules in a cancer-specific and pan-cancer context. We highlight strengths and limitations of each of these tools as a practical guide for the computational biologists.
Affiliation(s)
- Christopher J Walsh
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
| | - Jane Batt
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Claudia C Dos Santos
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
| |
|
21
|
Platig J, Castaldi PJ, DeMeo D, Quackenbush J. Bipartite Community Structure of eQTLs. PLoS Comput Biol 2016; 12:e1005033. [PMID: 27618581 PMCID: PMC5019382 DOI: 10.1371/journal.pcbi.1005033] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 01/08/2016] [Accepted: 06/23/2016] [Indexed: 11/18/2022] Open
Abstract
Genome Wide Association Studies (GWAS) and expression quantitative trait locus (eQTL) analyses have identified genetic associations with a wide range of human phenotypes. However, many of these variants have weak effects and understanding their combined effect remains a challenge. One hypothesis is that multiple SNPs interact in complex networks to influence functional processes that ultimately lead to complex phenotypes, including disease states. Here we present CONDOR, a method that represents both cis- and trans-acting SNPs and the genes with which they are associated as a bipartite graph and then uses the modular structure of that graph to place SNPs into a functional context. In applying CONDOR to eQTLs in chronic obstructive pulmonary disease (COPD), we found the global network “hub” SNPs were devoid of disease associations through GWAS. However, the network was organized into 52 communities of SNPs and genes, many of which were enriched for genes in specific functional classes. We identified local hubs within each community (“core SNPs”) and these were enriched for GWAS SNPs for COPD and many other diseases. These results speak to our intuition: rather than single SNPs influencing single genes, we see groups of SNPs associated with the expression of families of functionally related genes and that disease SNPs are associated with the perturbation of those functions. These methods are not limited in their application to COPD and can be used in the analysis of a wide variety of disease processes and other phenotypic traits.
Large-scale studies have identified thousands of genetic variants associated with different phenotypes without explaining their function. Expression quantitative trait locus analysis associates the compendium of genetic variants with expression levels of individual genes, providing the opportunity to link those variants to functions. But the complexity of those associations has caused most analyses to focus solely on genetic variants immediately adjacent to the genes they may influence. We describe a method that embraces the complexity, representing all variant-gene associations as a bipartite graph. The graph contains highly modular, functional communities in which disease-associated variants emerge as those likely to perturb the structure of the network and the function of the genes in these communities.
Affiliation(s)
- John Platig
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Division of General Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Dawn DeMeo
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - John Quackenbush
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard Chan School of Public Health, Boston, Massachusetts, United States of America
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| |
|
22
|
Alonso R, Monroy R, Trejo LA. Mining IP to Domain Name Interactions to Detect DNS Flood Attacks on Recursive DNS Servers. SENSORS 2016; 16:s16081311. [PMID: 27548169 PMCID: PMC5017476 DOI: 10.3390/s16081311] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/01/2016] [Revised: 08/09/2016] [Accepted: 08/13/2016] [Indexed: 11/16/2022]
Abstract
The Domain Name System (DNS) is a critical infrastructure of any network, and, not surprisingly a common target of cybercrime. There are numerous works that analyse higher level DNS traffic to detect anomalies in the DNS or any other network service. By contrast, few efforts have been made to study and protect the recursive DNS level. In this paper, we introduce a novel abstraction of the recursive DNS traffic to detect a flooding attack, a kind of Distributed Denial of Service (DDoS). The crux of our abstraction lies on a simple observation: Recursive DNS queries, from IP addresses to domain names, form social groups; hence, a DDoS attack should result in drastic changes on DNS social structure. We have built an anomaly-based detection mechanism, which, given a time window of DNS usage, makes use of features that attempt to capture the DNS social structure, including a heuristic that estimates group composition. Our detection mechanism has been successfully validated (in a simulated and controlled setting) and with it the suitability of our abstraction to detect flooding attacks. To the best of our knowledge, this is the first time that work is successful in using this abstraction to detect these kinds of attacks at the recursive level. Before concluding the paper, we motivate further research directions considering this new abstraction, so we have designed and tested two additional experiments which exhibit promising results to detect other types of anomalies in recursive DNS servers.
Affiliation(s)
- Roberto Alonso
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Carretera al Lago de Guadalupe Km. 3.5, Atizapán, Estado de México 52926, Mexico.
- Department of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany.
| | - Raúl Monroy
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Carretera al Lago de Guadalupe Km. 3.5, Atizapán, Estado de México 52926, Mexico.
| | - Luis A Trejo
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Carretera al Lago de Guadalupe Km. 3.5, Atizapán, Estado de México 52926, Mexico.
| |
|
23
|
Saracco F, Di Clemente R, Gabrielli A, Squartini T. Detecting early signs of the 2007-2008 crisis in the world trade. Sci Rep 2016; 6:30286. [PMID: 27461469 PMCID: PMC4962096 DOI: 10.1038/srep30286] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 03/23/2016] [Accepted: 06/24/2016] [Indexed: 11/09/2022] Open
Abstract
Since 2007, several contributions have tried to identify early-warning signals of the financial crisis. However, the vast majority of analyses has focused on financial systems and little theoretical work has been done on the economic counterpart. In the present paper we fill this gap and employ the theoretical tools of network theory to shed light on the response of world trade to the financial crisis of 2007 and the economic recession of 2008-2009. We have explored the evolution of the bipartite World Trade Web (WTW) across the years 1995-2010, monitoring the behavior of the system both before and after 2007. Our analysis shows early structural changes in the WTW topology: since 2003, the WTW becomes increasingly compatible with the picture of a network where correlations between countries and products are progressively lost. Moreover, the WTW structural modification can be considered as concluded in 2010, after a seemingly stationary phase of three years. We have also refined our analysis by considering specific subsets of countries and products: the most statistically significant early-warning signals are provided by the most volatile macrosectors, especially when measured on developing countries, suggesting the emerging economies as being the most sensitive ones to the global economic cycles.
Affiliation(s)
- Fabio Saracco
- IMT School for Advanced Studies, Lucca, 55100, Italy
- Istituto dei Sistemi Complessi (ISC) - CNR, Dipartimento di Fisica, Università “Sapienza”, Rome, 00185, Italy
| | - Riccardo Di Clemente
- Istituto dei Sistemi Complessi (ISC) - CNR, Dipartimento di Fisica, Università “Sapienza”, Rome, 00185, Italy
- Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, Cambridge, 02139, MA, USA
| | - Andrea Gabrielli
- IMT School for Advanced Studies, Lucca, 55100, Italy
- Istituto dei Sistemi Complessi (ISC) - CNR, Dipartimento di Fisica, Università “Sapienza”, Rome, 00185, Italy
| | - Tiziano Squartini
- IMT School for Advanced Studies, Lucca, 55100, Italy
- Istituto dei Sistemi Complessi (ISC) - CNR, Dipartimento di Fisica, Università “Sapienza”, Rome, 00185, Italy
| |
|
24
|
Bubier JA, Wilcox TD, Jay JJ, Langston MA, Baker EJ, Chesler EJ. Cross-Species Integrative Functional Genomics in GeneWeaver Reveals a Role for Pafah1b1 in Altered Response to Alcohol. Front Behav Neurosci 2016; 10:1. [PMID: 26834590 PMCID: PMC4720795 DOI: 10.3389/fnbeh.2016.00001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/13/2015] [Accepted: 01/04/2016] [Indexed: 12/12/2022] Open
Abstract
Identifying the biological substrates of complex neurobehavioral traits such as alcohol dependency pose a tremendous challenge given the diverse model systems and phenotypic assessments used. To address this problem we have developed a platform for integrated analysis of high-throughput or genome-wide functional genomics studies. A wealth of such data exists, but it is often found in disparate, non-computable forms. Our interactive web-based software system, Gene Weaver (http://www.geneweaver.org), couples curated results from genomic studies to graph-theoretical tools for combinatorial analysis. Using this system we identified a gene underlying multiple alcohol-related phenotypes in four species. A search of over 60,000 gene sets in GeneWeaver's database revealed alcohol-related experimental results including genes identified in mouse genetic mapping studies, alcohol selected Drosophila lines, Rattus differential expression, and human alcoholic brains. We identified highly connected genes and compared these to genes currently annotated to alcohol-related behaviors and processes. The most highly connected gene not annotated to alcohol was Pafah1b1. Experimental validation using a Pafah1b1 conditional knock-out mouse confirmed that this gene is associated with an increased preference for alcohol and an altered thermoregulatory response to alcohol. Although this gene has not been previously implicated in alcohol-related behaviors, its function in various neural mechanisms makes a role in alcohol-related phenomena plausible. By making diverse cross-species functional genomics data readily computable, we were able to identify and confirm a novel alcohol-related gene that may have implications for alcohol use disorders and other effects of alcohol.
Affiliation(s)
| | | | - Jeremy J Jay
- The Jackson Laboratory, Bar Harbor, ME, USA; Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, North Carolina Research Campus, Kannapolis, NC, USA
| | - Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
| | - Erich J Baker
- School of Engineering and Department of Computer Science, Baylor University, Waco, TX, USA
| | | |
|
25
|
Kumari A, Kanchan S, Sinha RP, Kesheri M. Applications of Bio-molecular Databases in Bioinformatics. MEDICAL IMAGING IN CLINICAL APPLICATIONS 2016. [DOI: 10.1007/978-3-319-33793-7_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
|
26
|
Baker E, Bubier JA, Reynolds T, Langston MA, Chesler EJ. GeneWeaver: data driven alignment of cross-species genomics in biology and disease. Nucleic Acids Res 2015; 44:D555-9. [PMID: 26656951 PMCID: PMC4702926 DOI: 10.1093/nar/gkv1329] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/15/2015] [Accepted: 11/13/2015] [Indexed: 11/17/2022] Open
Abstract
The GeneWeaver data and analytics website (www.geneweaver.org) is a publically available resource for storing, curating and analyzing sets of genes from heterogeneous data sources. The system enables discovery of relationships among genes, variants, traits, drugs, environments, anatomical structures and diseases implicitly found through gene set intersections. Since the previous review in the 2012 Nucleic Acids Research Database issue, GeneWeaver's underlying analytics platform has been enhanced, its number and variety of publically available gene set data sources has been increased, and its advanced search mechanisms have been expanded. In addition, its interface has been redesigned to take advantage of flexible web services, programmatic data access, and a refined data model for handling gene network data in addition to its original emphasis on gene set data. By enumerating the common and distinct biological molecules associated with all subsets of curated or user submitted groups of gene sets and gene networks, GeneWeaver empowers users with the ability to construct data driven descriptions of shared and unique biological processes, diseases and traits within and across species.
Affiliation(s)
- Erich Baker
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, USA
- Institute for Biomedical Studies, Baylor University, Waco, TX 76798, USA
- Timothy Reynolds
- Institute for Biomedical Studies, Baylor University, Waco, TX 76798, USA
- Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
|
27
|
Bubier JA, Phillips CA, Langston MA, Baker EJ, Chesler EJ. GeneWeaver: finding consilience in heterogeneous cross-species functional genomics data. Mamm Genome 2015; 26:556-66. [PMID: 26092690 PMCID: PMC4602068 DOI: 10.1007/s00335-015-9575-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/21/2015] [Accepted: 06/03/2015] [Indexed: 01/20/2023]
Abstract
A persistent challenge lies in the interpretation of consensus and discord from functional genomics experimentation. Harmonizing and analyzing this data will enable investigators to discover relations of many genes to many diseases, and from many phenotypes and experimental paradigms to many diseases through their genomic substrates. The GeneWeaver.org system provides a platform for cross-species integration and interrogation of heterogeneous curated and experimentally derived functional genomics data. GeneWeaver enables researchers to store, share, analyze, and compare results of their own genome-wide functional genomics experiments in an environment containing rich companion data obtained from major curated repositories, including the Mouse Genome Database and other model organism databases, along with derived data from highly specialized resources, publications, and user submissions. The data, largely consisting of gene sets and putative biological networks, are mapped onto one another through gene identifiers and homology across species. A versatile suite of interactive tools enables investigators to perform a variety of set analysis operations to find consilience among these often noisy experimental results. Fast algorithms enable real-time analysis of large queries. Specific applications include prioritizing candidate genes for quantitative trait loci, identifying biologically valid mouse models and phenotypic assays for human disease, finding the common biological substrates of related diseases, classifying experiments and the biological concepts they represent from empirical data, and applying patterns of genomic evidence to implicate novel genes in disease. These results illustrate an alternative to strict emphasis on replicability, whereby researchers classify experimental results to identify the conditions that lead to their similarity.
Affiliation(s)
- Charles A Phillips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
- Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
- Erich J Baker
- Computer Science Department, Baylor University, Waco, TX, 76798, USA
|
28
|
Chen HC, Zou W, Lu TP, Chen JJ. A composite model for subgroup identification and prediction via bicluster analysis. PLoS One 2014; 9:e111318. [PMID: 25347824 PMCID: PMC4210136 DOI: 10.1371/journal.pone.0111318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/17/2014] [Accepted: 09/30/2014] [Indexed: 11/18/2022] Open
Abstract
Background: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model to classify subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response.
Methods: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all binary classifiers, is constructed to classify samples into several disjoint subgroups. The proposed composite model depends neither on any specific biclustering algorithm or pattern of biclusters, nor on any particular classification algorithm.
Results: The composite model was shown to have an overall accuracy of 97.4% for a synthetic dataset consisting of four subgroups. The model was applied to two datasets where the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating lung cancer adenocarcinoma and squamous carcinoma subtypes, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset.
Conclusion: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data. The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identification of unknown species or new biomarkers for targeted therapy.
Affiliation(s)
- Hung-Chia Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
- Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Tzu-Pin Lu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Department of Public Health, Graduate Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- James J. Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
|
29
|
Baker E, Culpepper C, Philips C, Bubier J, Langston M, Chesler E. Identifying common components across biological network graphs using a bipartite data model. BMC Proc 2014; 8:S4. [PMID: 25374613 PMCID: PMC4202189 DOI: 10.1186/1753-6561-8-s6-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Open
Abstract
The GeneWeaver bipartite data model provides an efficient means to evaluate shared molecular components from sets derived across diverse species, disease states and biological processes. In order to adapt this model for examining related molecular components and biological networks, such as pathway or gene network data, we have developed a means to leverage the bipartite data structure to extract and analyze shared edges. Using the Pathway Commons database we demonstrate the ability to rapidly identify shared connected components among a diverse set of pathways. In addition, we illustrate how results from maximal bipartite discovery can be decomposed into hierarchical relationships, allowing shared pathway components to be mapped through various parent-child relationships to help visualization and discovery of emergent kernel driven relationships. Interrogating common relationships among biological networks and conventional GeneWeaver gene lists will increase functional specificity and reliability of the shared biological components. This approach enables self-organization of biological processes through shared biological networks.
Affiliation(s)
- Ej Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
- C Culpepper
- Department of Computer Science, Baylor University, Waco, TX, USA
- C Philips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
- J Bubier
- Department of Bioinformatics and Computational Biology, The Jackson Laboratory, Bar Harbor, ME, USA
- M Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
- Ej Chesler
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
|
30
|
Identification of a QTL in Mus musculus for alcohol preference, withdrawal, and Ap3m2 expression using integrative functional genomics and precision genetics. Genetics 2014; 197:1377-93. [PMID: 24923803 DOI: 10.1534/genetics.114.166165] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022] Open
Abstract
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these is evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable, we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrated sufficiency of the genetic locus as a source of variation underlying two traits.
|
31
|
Baker EJ, Jay JJ, Bubier JA, Langston MA, Chesler EJ. GeneWeaver: a web-based system for integrative functional genomics. Nucleic Acids Res 2011; 40:D1067-76. [PMID: 22080549 PMCID: PMC3245070 DOI: 10.1093/nar/gkr968] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Indexed: 11/19/2022] Open
Abstract
High-throughput genome technologies have produced a wealth of data on the association of genes and gene products to biological functions. Investigators have discovered value in combining their experimental results with published genome-wide association studies, quantitative trait locus, microarray, RNA-sequencing and mutant phenotyping studies to identify gene-function associations across diverse experiments, species, conditions, behaviors or biological processes. These experimental results are typically derived from disparate data repositories, publication supplements or reconstructions from primary data stores. This leaves bench biologists with the complex and unscalable task of integrating data by identifying and gathering relevant studies, reanalyzing primary data, unifying gene identifiers and applying ad hoc computational analysis to the integrated set. The freely available GeneWeaver (http://www.GeneWeaver.org) powered by the Ontological Discovery Environment is a curated repository of genomic experimental results with an accompanying tool set for dynamic integration of these data sets, enabling users to interactively address questions about sets of biological functions and their relations to sets of genes. Thus, large numbers of independently published genomic results can be organized into new conceptual frameworks driven by the underlying, inferred biological relationships rather than a pre-existing semantic framework. An empirical ‘ontology’ is discovered from the aggregate of experimental knowledge around user-defined areas of biological inquiry.
Affiliation(s)
- Erich J. Baker
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, The Jackson Laboratory, Bar Harbor, ME 04609 and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Jeremy J. Jay
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, The Jackson Laboratory, Bar Harbor, ME 04609 and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Jason A. Bubier
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, The Jackson Laboratory, Bar Harbor, ME 04609 and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Michael A. Langston
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, The Jackson Laboratory, Bar Harbor, ME 04609 and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Elissa J. Chesler
- School of Engineering & Computer Science, Baylor University, Waco, TX 76798, The Jackson Laboratory, Bar Harbor, ME 04609 and Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- *To whom correspondence should be addressed. Tel: +1 207 288 6000; Fax: +1 207 288 6847
|
32
|
Baker EJ, Jay JJ, Philip VM, Zhang Y, Li Z, Kirova R, Langston MA, Chesler EJ. Ontological Discovery Environment: a system for integrating gene-phenotype associations. Genomics 2009; 94:377-87. [PMID: 19733230 DOI: 10.1016/j.ygeno.2009.08.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 06/03/2009] [Revised: 08/19/2009] [Accepted: 08/27/2009] [Indexed: 10/20/2022]
Abstract
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental systems or species domain.
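As a rough illustration of set-set matching over a bipartite gene-phenotype network, the following Python sketch builds the gene side of the bipartite graph and scores pairs of gene sets by overlap. The abstract does not name a similarity measure, so the Jaccard coefficient used here is an assumption, and the function names and toy identifiers are hypothetical rather than part of ODE.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite(gene_sets):
    """Return a gene -> {set names} adjacency map.

    Together with the input (set name -> genes) this gives both sides
    of the bipartite graph.
    """
    gene_to_sets = defaultdict(set)
    for name, genes in gene_sets.items():
        for gene in set(genes):
            gene_to_sets[gene].add(name)
    return gene_to_sets


def jaccard_similarities(gene_sets):
    """Score every pair of gene sets that shares at least one gene.

    Jaccard similarity is an assumed choice for illustration only.
    """
    sizes = {name: len(set(genes)) for name, genes in gene_sets.items()}
    shared = defaultdict(int)
    # Each gene contributes one shared member to every pair of sets it joins.
    for names in build_bipartite(gene_sets).values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {
        (a, b): count / (sizes[a] + sizes[b] - count)
        for (a, b), count in shared.items()
    }


# Toy data with placeholder identifiers (hypothetical, not real gene sets).
toy_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
similarities = jaccard_similarities(toy_sets)
```

Walking the gene side of the graph means only pairs of sets that actually share a gene are ever scored, which keeps the work proportional to the overlaps in sparse data.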
Affiliation(s)
- Erich J Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
|