1
|
Yang IS, Jang I, Yang JO, Choi J, Kim MS, Kim KK, Seung BJ, Cheong JH, Sur JH, Nam H, Lee B, Kim J, Kim S. CanISO: a database of genomic and transcriptomic variations in domestic dog (Canis lupus familiaris). BMC Genomics 2023; 24:613. [PMID: 37828501 PMCID: PMC10571338 DOI: 10.1186/s12864-023-09655-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 09/06/2023] [Indexed: 10/14/2023]
Abstract
BACKGROUND: The domestic dog, Canis lupus familiaris, is a companion animal for humans and also an animal model in cancer research, because cancers arise spontaneously in dogs much as they do in humans. Despite the social and biological importance of dogs, the catalogue of genomic variations and transcripts for dogs is relatively incomplete.
RESULTS: We developed CanISO, a new database that holds a large collection of transcriptome profiles and genomic variations for domestic dogs. CanISO provides 87,692 novel transcript isoforms and 60,992 known isoforms from whole transcriptome sequencing of canine tumors (N = 157) and their matched normal tissues (N = 64). CanISO also provides genomic variation information for 210,444 unique germline single nucleotide polymorphisms (SNPs) from the whole exome sequencing of 183 dogs, with a query system that searches gene- and transcript-level information as well as covered SNPs. Transcriptome profiles can be compared with corresponding human transcript isoforms at a tissue level, or between sample groups to identify tumor-specific gene expression and alternative splicing patterns.
CONCLUSIONS: CanISO is expected to increase understanding of the dog genome and transcriptome, as well as its functional associations with humans, such as shared or distinct mechanisms of cancer. CanISO is publicly available at https://www.kobic.re.kr/caniso/.
Affiliation(s)
- In Seok Yang
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, Korea
- Insu Jang
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology, Daejeon, 34141, Korea
- Jin Ok Yang
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology, Daejeon, 34141, Korea
- Jinhyuk Choi
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology, Daejeon, 34141, Korea
- Min-Seo Kim
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology, Daejeon, 34141, Korea
- Ka-Kyung Kim
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, Korea
- Byung-Joon Seung
  - Department of Veterinary Pathology, College of Veterinary Medicine, Konkuk University, Seoul, 05029, Korea
- Jae-Ho Cheong
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, Korea
- Jung-Hyang Sur
  - Department of Veterinary Pathology, College of Veterinary Medicine, Konkuk University, Seoul, 05029, Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005, Korea
- Byungwook Lee
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology, Daejeon, 34141, Korea
- Junho Kim
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- Sangwoo Kim
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, Korea
|
2
|
Zhao Y, Shin DG. Deep Pathway Analysis V2.0: A Pathway Analysis Framework Incorporating Multi-Dimensional Omics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:373-385. [PMID: 31603796 DOI: 10.1109/tcbb.2019.2945959] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Pathway analysis is essential in cancer research, particularly when scientists attempt to interpret genome-wide high-throughput experimental data. If pathway information is organized into a network topology, its use in interpreting omics data can become very powerful. In this paper, we propose a topology-based pathway analysis method, called DPA V2.0, which can combine multiple heterogeneous omics data types in its analysis. In this method, each pathway route is encoded as a Bayesian network, which is initialized with a sequence of conditional probabilities specifically designed to encode the directionality of the regulatory relationships defined in the pathway. Unlike other topology-based pathway tools, DPA is capable of identifying pathway routes as representatives of perturbed regulatory signals. We demonstrate the effectiveness of our model by applying it to two well-established TCGA data sets, namely the breast cancer study (BRCA) and the ovarian cancer study (OV). The analysis combines mRNA-seq, mutation, copy number variation, and phosphorylation data publicly available for both TCGA data sets. We performed survival analysis and patient subtype analysis, and the outcomes revealed the anticipated strengths of our model. We hope that the availability of our model encourages wet lab scientists to generate extra data sets to reap the benefits of using multiple data types in pathway analysis. The majority of the pathways distinguished can be confirmed by the biological literature. Moreover, the proportion of correctly identified pathways is 10 percent higher than in previous work, where only mRNA-seq and mutation data were incorporated for breast cancer patients. Consequently, such an in-depth pathway analysis incorporating more diverse data can improve the accuracy of perturbed pathway detection.
|
3
|
Zhao Y, Piekos S, Hoang TH, Shin DG. A framework using topological pathways for deeper analysis of transcriptome data. BMC Genomics 2020; 21:834. [PMID: 32138666 PMCID: PMC7057456 DOI: 10.1186/s12864-019-6155-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2019] [Accepted: 09/30/2019] [Indexed: 12/03/2022]
Abstract
BACKGROUND: Pathway analysis is one of the later-stage data analysis steps essential in interpreting high-throughput gene expression data. We propose a set of algorithms which, given gene expression data, can recognize which portions of sub-pathways are actively utilized in the biological system being studied. The degree of activation is measured by the conditional probability of the input expression data under the Bayesian network model constructed from the topological pathway.
RESULTS: We demonstrate the effectiveness of our pathway analysis method by conducting two case studies. The first applies our method to a well-studied temporal microarray data set for the cell cycle using the KEGG Cell Cycle pathway. Our method closely reproduces the biological claims associated with the data sets, but unlike the original work, ours can show how pathway routes interact with each other, above and beyond merely identifying which pathway routes are involved in the process. The second study applies the method to the p53 mutation microarray data to perform a comparative study.
CONCLUSIONS: We show that our method achieves comparable performance against all other pathway analysis systems included in this study in identifying p53-altered pathways. Our method could pave a new way of carrying out next-generation pathway analysis.
Affiliation(s)
- Yue Zhao
  - Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
- Stephanie Piekos
  - Department of Pharmaceutical Sciences, University of Connecticut, 69 North Eagleville Road, Unit 3092, Storrs, USA
- Tham H. Hoang
  - Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
- Dong-Guk Shin
  - Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
|
4
|
Tang Z, Yu Z, Wang C. A fast iterative algorithm for high-dimensional differential network. Comput Stat 2019. [DOI: 10.1007/s00180-019-00915-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
5
|
Ramsahai E, Tripathi V, John M. Cancer driver genes: a guilty by resemblance doctrine. PeerJ 2019; 7:e6979. [PMID: 31275738 PMCID: PMC6598669 DOI: 10.7717/peerj.6979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/22/2018] [Accepted: 04/16/2019] [Indexed: 11/30/2022]
Abstract
A major benefit of expansive cancer genome projects is the discovery of new targets for drug treatment and development. To date, cancer driver genes have been primarily identified by methods based on gene mutation frequency. This approach fails to identify culpable genes that are not mutated, are rarely mutated, or contribute to the development of rare forms of cancer. Due to the complexity of the disease and the sheer volume of data, computational methods may encounter an NP-complete problem. We have developed a novel pathway and reach (PAR) method that employs a guilty by resemblance approach to identify cancer driver genes while avoiding the above problems. Essentially, PAR sifts through the gene lists of biological pathways to find genes that share pathways with, and possess a similar 2-reach topology metric to, a reference set of recognized driver genes. This approach leads to faster processing times and eliminates any dependency on gene mutation frequency. From three pathways (signal transduction, immune system, and gene expression), a set of 50 candidate driver genes was identified, 30 of which were new. The top five were HGF, E2F1, C6, MIF, and CDK2.
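The abstract does not define the 2-reach topology metric. A minimal sketch, assuming it means the number of distinct genes reachable from a given gene in at most two directed steps of the pathway graph, might look like the following. The function name `k_reach` and the toy pathway edges are hypothetical and are not taken from the paper.

```python
from collections import deque


def k_reach(graph, source, k=2):
    """Count distinct nodes reachable from `source` in at most `k` directed steps.

    Assumption: this is one plausible reading of a "2-reach" metric; the
    source abstract does not give a formal definition.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k steps
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return len(seen) - 1  # exclude the source itself


# Hypothetical toy pathway (adjacency list); edges are invented for illustration.
toy_pathway = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
}

if __name__ == "__main__":
    for gene in ["A", "D", "F"]:
        print(gene, k_reach(toy_pathway, gene))
```

Under this reading, candidate genes could be ranked by how close their 2-reach value is to that of known driver genes in the same pathway; how PAR actually scores similarity is not stated in the abstract.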
Affiliation(s)
- Emilie Ramsahai
  - Department of Mathematics and Statistics, The University of the West Indies, St. Augustine, Trinidad and Tobago
- Vrijesh Tripathi
  - Department of Mathematics and Statistics, The University of the West Indies, St. Augustine, Trinidad and Tobago
- Melford John
  - Department of Preclinical Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago
|
6
|
BioTarget: A Computational Framework Identifying Cancer Type Specific Transcriptional Targets of Immune Response Pathways. Sci Rep 2019; 9:9029. [PMID: 31227749 PMCID: PMC6588588 DOI: 10.1038/s41598-019-45304-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/06/2018] [Accepted: 06/03/2019] [Indexed: 01/04/2023]
Abstract
Transcriptome data can provide information on signaling pathways active in cancers, but new computational tools are needed to quantify pathway activity more accurately and to identify tissue-specific pathway features. We developed a computational method called "BioTarget" that incorporates ChIP-seq data into cellular pathway analysis. This tool relates the expression of transcription factor (TF) target genes (based on ChIP-seq data) to the status of upstream signaling components for an accurate quantification of pathway activity. This analysis also reveals TF targets expressed in specific contexts or tissues. We applied BioTarget to assess the activity of the TBX21 and GATA3 pathways in cancers. TBX21 and GATA3 are TFs that control the differentiation of T cells into Th1 and Th2 helper cells, which mediate cell-based and humoral immune responses, respectively. Since tumor immune responses can impact cancer progression, the significance of our pathway scores should be revealed by effective patient stratification. We found that low Th1/Th2 activity ratios were associated with significantly poorer survival of stomach and breast cancer patients, whereas an unbalanced Th1/Th2 response was correlated with poorer survival of colon cancer patients. Lung adenocarcinoma and lung squamous cell carcinoma patients had the lowest survival rates when both Th1 and Th2 responses were high. Our method also identified context-specific target genes for TBX21 and GATA3. Applying the BioTarget tool to BCL6, a TF associated with germinal center lymphocytes, we observed that patients with an active BCL6 pathway had significantly improved survival for breast, colon, and stomach cancer. Our findings support the effectiveness of the BioTarget tool for transcriptome analysis and point to interesting associations between some immune-response pathways and cancer progression.
|
7
|
DynSig: Modelling Dynamic Signaling Alterations along Gene Pathways for Identifying Differential Pathways. Genes (Basel) 2018; 9:genes9070323. [PMID: 29954150 PMCID: PMC6071020 DOI: 10.3390/genes9070323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2018] [Revised: 06/25/2018] [Accepted: 06/25/2018] [Indexed: 11/16/2022]
Abstract
Although a number of methods have been proposed for identifying differentially expressed pathways (DEPs), few consider the dynamic components of pathway networks, i.e., gene links. Here we propose DynSig, a signaling dynamics detection method for identifying DEPs, which detects molecular signaling changes in cancerous cells along the pathway topology. Specifically, DynSig relies on gene links, instead of gene nodes, in pathways, and models the dynamic behavior of pathways with a Markov chain model (MCM). By incorporating the dynamics of molecular signaling, DynSig allows for an in-depth characterization of pathway activity. To identify DEPs, a novel statistic for the activity alteration of pathways was formulated as an overall signaling perturbation score between sample classes. Experimental results on both simulated and real-world datasets demonstrate the effectiveness and efficiency of the proposed method in identifying differential pathways.
|
8
|
Identification and characterization of some putative genes involved in arabinoxylan biosynthesis in Plantago ovata. 3 Biotech 2018; 8:266. [PMID: 29868304 DOI: 10.1007/s13205-018-1289-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/21/2018] [Accepted: 05/14/2018] [Indexed: 01/18/2023]
Abstract
Plantago ovata is an important source of Psyllium (Isabgol), which swells upon contact with water, forming a mucilaginous mass largely composed of arabinoxylans. In this study, we analyzed the expression pattern of arabinoxylan biosynthetic pathway genes at different stages of seed development in P. ovata. In addition, arabinoxylans were quantified at different stages of seed development in water-extractable and water-unextractable fractions. The expression analysis revealed a 5- to 8-fold increase in the expression of some genes involved in the arabinoxylan biosynthetic pathway, such as UDP-arabinopyranose mutase, UDP-xylosyltransferase 2, and xylan glucuronosyltransferase, in seed at 15 days after pollination. The xylose and arabinose units were analyzed at different stages of seed development and also in water-soluble (cold water and hot water), alkali, and ethanolic fractions. The concentration of xylose and arabinose units increased steadily after pollination. Overall, the alkali extract had a high concentration of xylose (0.70 ± 0.022 mg/g) and arabinose units (0.10 ± 0.01 mg/g) at 15 days after pollination.
|
9
|
Pérez-Valencia JA, Prosdocimi F, Cesari IM, da Costa IR, Furtado C, Agostini M, Rumjanek FD. Angiogenesis and evading immune destruction are the main related transcriptomic characteristics to the invasive process of oral tongue cancer. Sci Rep 2018; 8:2007. [PMID: 29386520 PMCID: PMC5792437 DOI: 10.1038/s41598-017-19010-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/01/2017] [Accepted: 12/19/2017] [Indexed: 01/29/2023]
Abstract
Metastasis of head and neck tumors is responsible for a high mortality rate. Understanding its biochemistry may allow insights into tumorigenesis. To that end, we carried out RNA-Seq analyses of 5 SCC9-derived oral cancer cell lines displaying increased invasive potential. Differentially expressed genes (DEGs) were annotated based on p-values and false discovery rate (q-values). All 292 KEGG pathways related to the human genome were compared in order to pinpoint the absolute and relative contributions to the invasive process, considering the 8 hallmarks of cancer plus 2 newly defined categories, and the same was done with our transcriptomic data. In terms of absolute contribution, the highest correlations were associated with the categories of evading immune destruction and energy metabolism; for relative contributions, with angiogenesis and evading immune destruction. DEGs were distributed into every possible mode of regulation (up, down, and continuum expression) along the 3 stages of metastatic progression. Twenty-six genes were consistently present along the tumoral progression when selected by p-value, and 4 when selected by q-value. Among the DEGs, we found 2 novel, potentially informative metastatic markers: PIGG and SLC8B1. Furthermore, interactome analysis showed that MYH14, ANGPTL4, PPARD and ENPP1 are amenable to pharmacological interventions.
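The abstract reports far fewer genes passing the q-value criterion than the p-value criterion, which is the usual effect of a multiple-testing correction. The abstract does not say which false discovery rate procedure was used; as a minimal sketch, assuming the common Benjamini-Hochberg adjustment, q-values can be derived from p-values as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: the source abstract does not name its FDR method; BH is used
    here only as a widely known example.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical p-values for four genes, invented for illustration.
    example_p = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(example_p, benjamini_hochberg(example_p)):
        print(f"p={p:.3f}  q={q:.3f}")
```

Because each q-value is at least as large as its p-value, a fixed threshold applied to q-values selects a subset of the genes selected by the same threshold on p-values.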
Affiliation(s)
- Juan Alberto Pérez-Valencia
  - Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Francisco Prosdocimi
  - Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Italo M Cesari
  - Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Igor Rodrigues da Costa
  - Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Michelle Agostini
  - Departamento de Patologia e Diagnóstico Oral, Faculdade de Odontologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Franklin David Rumjanek
  - Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
|
10
|
Ma T, Zhang A. Reconstructing context-specific gene regulatory network and identifying modules and network rewiring through data integration. Methods 2017; 124:36-45. [PMID: 28529066 DOI: 10.1016/j.ymeth.2017.05.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/02/2017] [Accepted: 05/05/2017] [Indexed: 12/01/2022]
Abstract
Reconstructing context-specific transcriptional regulatory networks is crucial for deciphering the principles of regulatory mechanisms underlying various conditions. Recent studies that reconstructed transcriptional networks have focused on individual organisms or cell types and relied on data repositories of context-free regulatory relationships. Here we present a comprehensive framework to systematically derive putative regulator-target pairs in any given context by integrating context-specific transcriptional profiling and public data repositories of gene regulatory networks. Moreover, our framework can identify core regulatory modules and signature genes underlying global regulatory circuitry, and detect network rewiring and core rewired modules in different contexts by considering gene modules and edge (gene interaction) modules collaboratively. We applied our methods to autism RNA-seq data and produced biologically meaningful results. In particular, all 11 hub genes in a predicted rewired autistic regulatory subnetwork have been linked to autism based on literature review. The predicted rewired autistic regulatory network may shed new light on the disease mechanism.
Affiliation(s)
- Tianle Ma
  - Department of Computer Science and Engineering, University at Buffalo (SUNY), Buffalo, NY 14260-2500, United States
- Aidong Zhang
  - Department of Computer Science and Engineering, University at Buffalo (SUNY), Buffalo, NY 14260-2500, United States
|