1
Panda A, Chaudhari NM, Tripathy S. Genome Annotator Light (GAL): A Docker-based package for genome analysis and visualization. Genomics 2019; 112:127-134. [PMID: 30926570 DOI: 10.1016/j.ygeno.2019.03.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/11/2018] [Revised: 03/05/2019] [Accepted: 03/25/2019] [Indexed: 11/15/2022]
Abstract
Next-generation sequencing techniques produce enormous amounts of data, but analyzing and visualizing these data remains a big challenge. To address this, we have developed Genome Annotator Light (GAL), a Docker-based package for genome analysis and data visualization. GAL integrates several existing tools and in-house programs inside a Docker container for systematic analysis and visualization of genomes through a web browser. GAL takes a variety of input types, ranging from raw FASTA files to fully annotated files, processes them through a standard annotation pipeline and visualizes the results in a web browser. Comparative genomic analysis is performed automatically within a given taxonomic class. GAL creates an interactive genome browser with clickable genomic feature tracks, a local BLAST-able database, a query page, and on-the-fly downstream data analysis using EMBOSS, among other features. Overall, GAL is convenient, portable and platform-independent. Fully integrated web resources can be easily created and deployed, e.g. www.eumicrobedb.org/cglab, for our in-house genomes. GAL is freely available at https://hub.docker.com/u/cglabiicb/.
Affiliation(s)
- Arijit Panda
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Jadavpur, Kolkata 700032, India; Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Jadavpur, Kolkata 700032, India
- Narendrakumar M Chaudhari
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Jadavpur, Kolkata 700032, India
- Sucheta Tripathy
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Jadavpur, Kolkata 700032, India; Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Jadavpur, Kolkata 700032, India
2
Suwinski P, Ong C, Ling MHT, Poh YM, Khan AM, Ong HS. Advancing Personalized Medicine Through the Application of Whole Exome Sequencing and Big Data Analytics. Front Genet 2019; 10:49. [PMID: 30809243 PMCID: PMC6379253 DOI: 10.3389/fgene.2019.00049] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 09/03/2018] [Accepted: 01/21/2019] [Indexed: 12/11/2022] Open
Abstract
There is growing attention toward personalized medicine. This is led by a fundamental shift from the ‘one size fits all’ paradigm for treatment of patients with conditions or predisposition to diseases, to one that embraces novel approaches, such as tailored target therapies, to achieve the best possible outcomes. Driven by this shift, several national and international genome projects have been initiated to reap the benefits of personalized medicine. Exome and targeted sequencing provide a balance between cost and benefit, in contrast to whole genome sequencing (WGS). Whole exome sequencing (WES) targets approximately 3% of the whole genome, which is the basis for protein-coding genes. Nonetheless, it has the characteristics of big data in large deployment. Herein, the application of WES and its relevance in advancing personalized medicine is reviewed. WES is mapped to the Big Data “10 Vs” and the resulting challenges are discussed. The application of existing biological databases and bioinformatics tools to address the bottleneck in data processing and analysis is presented, including the need for new-generation big data analytics for the multi-omics challenges of personalized medicine. This includes the incorporation of artificial intelligence (AI) in the clinical utility landscape of genomic information, and future considerations for creating a new frontier toward advancing the field of personalized medicine.
Affiliation(s)
- Pawel Suwinski
  - Malaysian Genomics Resource Centre Berhad, Kuala Lumpur, Malaysia
- ChuangKee Ong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Serdang, Malaysia; Centre of Genomics Research, Precision Medicine and Genomics, AstraZeneca UK Limited, London, United Kingdom
- Maurice H T Ling
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Serdang, Malaysia
- Yang Ming Poh
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Serdang, Malaysia
- Asif M Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Serdang, Malaysia; Graduate School of Medicine, Perdana University, Serdang, Malaysia
- Hui San Ong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Serdang, Malaysia
3
Paquette SM, Leinonen K, Longabaugh WJR. BioTapestry now provides a web application and improved drawing and layout tools. F1000Res 2016; 5:39. [PMID: 27134726 PMCID: PMC4841208 DOI: 10.12688/f1000research.7620.1] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Accepted: 12/22/2015] [Indexed: 01/17/2023] Open
Abstract
Gene regulatory networks (GRNs) control embryonic development, and to understand this process in depth, researchers need to have a detailed understanding of both the network architecture and its dynamic evolution over time and space. Interactive visualization tools better enable researchers to conceptualize, understand, and share GRN models. BioTapestry is an established application designed to fill this role, and recent enhancements released in Versions 6 and 7 have targeted two major facets of the program. First, we introduced significant improvements for network drawing and automatic layout that have now made it much easier for the user to create larger, more organized network drawings. Second, we revised the program architecture so it could continue to support the current Java desktop Editor program, while introducing a new BioTapestry GRN Viewer that runs as a JavaScript web application in a browser. We have deployed a number of GRN models using this new web application. These improvements will ensure that BioTapestry remains viable as a research tool in the face of the continuing evolution of web technologies, and as our understanding of GRN models grows.
4
Li MJ, Liu Z, Wang P, Wong MP, Nelson MR, Kocher JPA, Yeager M, Sham PC, Chanock SJ, Xia Z, Wang J. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 2015; 44:D869-76. [PMID: 26615194 PMCID: PMC4702921 DOI: 10.1093/nar/gkv1317] [Citation(s) in RCA: 142] [Impact Index Per Article: 15.8] [Received: 08/21/2015] [Accepted: 11/10/2015] [Indexed: 12/19/2022] Open
Abstract
Genome-wide association studies (GWASs), now a routine approach to studying single-nucleotide polymorphism (SNP)-trait associations, have uncovered over ten thousand significant trait/disease-associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb), which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0 × 10⁻³) from each original publication, and generated a total of 252 530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive functional annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of the website.
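The curation rule in item (ii) can be illustrated with a minimal Python sketch. It is not GWASdb's actual pipeline: the `Association` record, the `curate` helper, the deduplication policy (keep the smallest P-value per SNP-trait pair) and all sample values are assumptions made for illustration. Only the threshold of 1.0 × 10⁻³ comes from the abstract.

```python
from dataclasses import dataclass

# Threshold for "moderate" SNP-trait associations, as stated in the abstract.
MODERATE_P = 1.0e-3


@dataclass(frozen=True)
class Association:
    """Hypothetical record for one reported SNP-trait association."""
    snp: str
    trait: str
    p_value: float


def curate(records):
    """Keep associations with P < MODERATE_P and one entry per SNP-trait pair.

    When a pair is reported more than once, the smallest P-value is kept
    (an assumed policy, not one described in the abstract).
    """
    best = {}
    for rec in records:
        if rec.p_value >= MODERATE_P:
            continue
        key = (rec.snp, rec.trait)
        if key not in best or rec.p_value < best[key].p_value:
            best[key] = rec
    return sorted(best.values(), key=lambda r: r.p_value)


# Made-up records; the identifiers and values are placeholders.
records = [
    Association("rsEXAMPLE1", "trait_A", 5.0e-9),
    Association("rsEXAMPLE1", "trait_A", 2.0e-4),  # duplicate pair, weaker signal
    Association("rsEXAMPLE2", "trait_B", 4.0e-2),  # above the threshold
    Association("rsEXAMPLE3", "trait_A", 9.9e-4),
]

curated = curate(records)
for rec in curated:
    print(rec.snp, rec.trait, rec.p_value)
```

The strict inequality mirrors the "P-value < 1.0 × 10⁻³" wording, so an association reported at exactly the threshold would be excluded.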
Affiliation(s)
- Mulin Jun Li
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Zipeng Liu
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Panwen Wang
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Maria P Wong
  - Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Matthew R Nelson
  - Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- Jean-Pierre A Kocher
  - Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
- Meredith Yeager
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Pak Chung Sham
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; State Key Laboratory of Brain and Cognitive Sciences and Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Stephen J Chanock
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Zhengyuan Xia
  - Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
5
Cheng R, Leung RKK, Chen Y, Pan Y, Tong Y, Li Z, Ning L, Ling XB, He J. Virtual Pharmacist: A Platform for Pharmacogenomics. PLoS One 2015; 10:e0141105. [PMID: 26496198 PMCID: PMC4619711 DOI: 10.1371/journal.pone.0141105] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/11/2015] [Accepted: 10/03/2015] [Indexed: 01/15/2023] Open
Abstract
We present Virtual Pharmacist, a web-based platform that takes common types of high-throughput data, namely microarray SNP genotyping data, FASTQ and Variant Call Format (VCF) files as inputs, and reports potential drug responses in terms of efficacy, dosage and toxicity at a glance. Batch submission facilitates multivariate analysis or data mining of targeted groups. Individual analysis consists of a report that is readily comprehensible to patients and practitioners who have basic knowledge of pharmacology, a table that summarizes variants and potentially affected drug responses according to the US Food and Drug Administration pharmacogenomic biomarker labeled drug list and PharmGKB, and visualization of a gene-drug-target network. Group analysis provides the distribution of the variants and potentially affected drug responses of a target group, a sample-gene variant count table, and a sample-drug count table. Our analysis of genomes from the 1000 Genomes Project underlines the potentially differential drug responses among different human populations. Even within the same population, the findings from Watson's genome highlight the importance of personalized medicine. Virtual Pharmacist can be accessed freely at http://www.sustc-genome.org.cn/vp or installed as a local web server. The code and documentation are available at the GitHub repository (https://github.com/VirtualPharmacist/vp). Administrators can download the source code to customize access settings for further development.
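The VCF-to-drug-response lookup described above can be sketched as follows. This is not Virtual Pharmacist's code. The variant IDs, drug names and the `DRUG_TABLE` mapping are hypothetical placeholders rather than entries from the FDA list or PharmGKB, and the sketch assumes a single-sample VCF with a `GT` field.

```python
# Tiny single-sample VCF held in a string; all records are made up.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1
1\t1000\trsEXAMPLE1\tA\tG\t.\tPASS\t.\tGT\t0/1
1\t2000\trsEXAMPLE2\tC\tT\t.\tPASS\t.\tGT\t0/0
2\t3000\trsEXAMPLE3\tG\tA\t.\tPASS\t.\tGT\t1|1
"""

# Hypothetical variant -> (drug, kind of response) table.
DRUG_TABLE = {
    "rsEXAMPLE1": ("drug_A", "dosage"),
    "rsEXAMPLE2": ("drug_B", "toxicity"),
}


def carried_variant_ids(vcf_text):
    """Return IDs of variants for which the sample carries a non-reference allele."""
    ids = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        variant_id, fmt, sample = fields[2], fields[8], fields[9]
        # Locate the GT sub-field using the FORMAT column.
        gt = sample.split(":")[fmt.split(":").index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            ids.append(variant_id)
    return ids


def report(vcf_text, table):
    """Map each carried variant that appears in the table to its drug annotation."""
    return {vid: table[vid] for vid in carried_variant_ids(vcf_text) if vid in table}


summary = report(VCF_TEXT, DRUG_TABLE)
print(summary)
```

A production parser would more likely rely on an established VCF library and handle multi-sample files; the hand-rolled parsing here only keeps the sketch self-contained.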
Affiliation(s)
- Ronghai Cheng
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Ross Ka-Kit Leung
  - Division of Genomics and Bioinformatics, The Chinese University of Hong Kong, Hong Kong, China
- Yao Chen
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Yidan Pan
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Yin Tong
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Zhoufang Li
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Luwen Ning
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Xuefeng B. Ling
  - Departments of Surgery, Stanford University, Stanford, California, United States of America
- Jiankui He
  - Department of Biology, South University of Science and Technology of China, Shenzhen, China
6
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023] Open
Abstract
"A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable further enhancement of visualization.
Affiliation(s)
- Georgios A Pavlopoulos
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Nikolas Papanikolaou
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Theodosis Theodosiou
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Anton J Enright
  - EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Ioannis Iliopoulos
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
7
Juan L, Liu Y, Wang Y, Teng M, Zang T, Wang Y. Family genome browser: visualizing genomes with pedigree information. Bioinformatics 2015; 31:2262-8. [PMID: 25788626 DOI: 10.1093/bioinformatics/btv151] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/10/2014] [Accepted: 03/11/2015] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize family genome variants and their functions with integrated pedigree information remains a critical challenge. RESULTS: We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of the parental origin of variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP membership, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. AVAILABILITY AND IMPLEMENTATION: The FGB is available at http://mlg.hit.edu.cn/FGB/.
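Two of the family analyses named above, detection of de novo mutations and determination of parental origin, can be illustrated for a single site in a parent-child trio. This is a simplified sketch under textbook Mendelian assumptions, not the FGB's algorithm. The function names are hypothetical, genotypes are pairs of allele indices (0 = reference), and genotyping error, phasing and sex chromosomes are ignored.

```python
def is_de_novo_candidate(child, mother, father):
    """True if the child carries an allele that neither parent carries."""
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)


def parental_origin(child, mother, father):
    """Return (maternal_allele, paternal_allele) when it is unambiguous.

    Returns None when the genotypes are Mendelian-inconsistent or when
    more than one assignment of the child's alleles to the parents fits.
    """
    a, b = child
    options = set()
    if a in mother and b in father:
        options.add((a, b))
    if b in mother and a in father:
        options.add((b, a))
    return options.pop() if len(options) == 1 else None


# Made-up trio genotypes at three sites: (child, mother, father).
sites = {
    "site_1": ((0, 1), (0, 0), (0, 0)),  # alt allele absent from both parents
    "site_2": ((0, 1), (0, 0), (0, 1)),  # alt allele can only be paternal
    "site_3": ((0, 1), (0, 1), (0, 1)),  # origin is ambiguous
}

for name, (child, mother, father) in sites.items():
    print(name, is_de_novo_candidate(child, mother, father),
          parental_origin(child, mother, father))
```

In real data an apparent de novo call is often a sequencing or genotyping artifact, which is why the first helper reports candidates rather than confirmed mutations.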
Affiliation(s)
- Liran Juan
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yongzhuang Liu
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yongtian Wang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Mingxiang Teng
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tianyi Zang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China