1
Sablok G, Chen TW, Lee CC, Yang C, Gan RC, Wegrzyn JL, Porta NL, Nayak KC, Huang PJ, Varotto C, Tang P. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications. DNA Res 2017; 24:327-332. [PMID: 28419256 PMCID: PMC5499650 DOI: 10.1093/dnares/dsw044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/18/2016] [Accepted: 09/14/2016] [Indexed: 01/01/2023] Open
Abstract
Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been investigated in contexts ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. For overexpression, it is critical to analyze the codon usage patterns of organelle genomes systematically. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first curated, web-based reference catalog of codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU supports the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns from user-submitted transcripts or assembled organelle genes, and comparative analysis against the pre-compiled patterns for species of interest. ChloroMitoCU can increase our understanding of biased codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/
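To make "codon usage pattern" concrete, the following is a minimal Python sketch of two common summaries for a single coding sequence: codon frequency per thousand codons and relative synonymous codon usage (RSCU). It is an illustration only, not ChloroMitoCU's implementation; the function names and the toy sequence are assumptions. The synonymous codon family is passed in by the caller because organelle genomes may use translation tables that differ from the standard genetic code.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence given 5'->3' from its first codon."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts: Counter) -> dict:
    """Frequency of each observed codon per 1,000 codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def rscu(counts: Counter, synonymous: list) -> dict:
    """RSCU for one amino acid: observed count over the count expected under equal usage."""
    family_total = sum(counts.get(c, 0) for c in synonymous)
    if family_total == 0:
        return {c: 0.0 for c in synonymous}
    k = len(synonymous)
    return {c: k * counts.get(c, 0) / family_total for c in synonymous}


# Toy CDS (hypothetical): ATG GCT GCT GCC GCA TAA
counts = codon_counts("ATGGCTGCTGCCGCATAA")
alanine = ["GCT", "GCC", "GCA", "GCG"]  # alanine codons in the standard code
print(rscu(counts, alanine))
```

An RSCU value above 1 marks a codon used more often than expected if all synonymous codons were used equally; a value below 1 marks an under-used codon.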
Affiliation(s)
- Gaurav Sablok
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Ting-Wen Chen
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi-Ching Lee
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi Yang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Ruei-Chi Gan
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, CT 06269-3043, USA
- Nicola L Porta
- Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy; MOUNTFOR Project Centre, European Forest Institute, Via E. Mach 1, 38010 San Michele all'Adige, Trento, Italy
- Kinshuk C Nayak
- Bioinformatics Centre, Institute of Life Sciences, Department of Biotechnology, Govt. India, Nalco Square, Bhubaneswar - 751 023, India
- Po-Jung Huang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Claudio Varotto
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Petrus Tang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan; Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Kweishan, Taoyuan 333, Taiwan
2
Gan RC, Chen TW, Wu TH, Huang PJ, Lee CC, Yeh YM, Chiu CH, Huang HD, Tang P. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms. BMC Bioinformatics 2016; 17:513. [PMID: 28155708 PMCID: PMC5260104 DOI: 10.1186/s12859-016-1366-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/23/2023] Open
Abstract
Background Next-generation sequencing promises de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences, and even fewer have well-defined or curated annotations. For transcriptome studies of organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, the analysis becomes even more complicated when multiple transcriptomes are compared. Results Here, we propose a new analysis strategy and quantification methods for expression levels that not only generate a virtual reference from sequencing data but also enable comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values, including RC (read counts), eRPKM (estimated RPKM), and eTPM (estimated TPM), can be obtained that are comparable across transcriptome datasets. To demonstrate the feasibility of our strategy, we implemented it in the web service PARRoT (Pipeline for Analyzing RNA Reads of Transcriptomes), which analyzes gene expression profiles for two transcriptome sequencing datasets. To aid biological interpretation of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within three hours.
Conclusions In this study, we proposed and implemented a strategy for analyzing transcriptomes from non-reference organisms that makes it possible to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service, which is freely available at http://parrot.cgu.edu.tw.
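The normalized values named in the abstract build on the standard RPKM and TPM definitions. The Python sketch below shows those textbook formulas applied to read counts against a shared reference; it is an assumption-laden illustration, since the abstract does not specify how PARRoT's estimated variants (eRPKM, eTPM) are derived, and the transcript names and numbers are hypothetical.

```python
def rpkm(read_counts: dict, lengths_bp: dict) -> dict:
    """Reads per kilobase of transcript per million mapped reads (standard definition)."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        return {t: 0.0 for t in read_counts}
    return {
        t: 1e9 * count / (lengths_bp[t] * total_reads)
        for t, count in read_counts.items()
    }


def tpm(read_counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {t: count / lengths_bp[t] for t, count in read_counts.items()}
    rate_sum = sum(rates.values())
    if rate_sum == 0:
        return {t: 0.0 for t in read_counts}
    return {t: 1e6 * rate / rate_sum for t, rate in rates.items()}


# Hypothetical virtual transcripts shared by the datasets being compared
counts = {"vt1": 100, "vt2": 300}
lengths = {"vt1": 1000, "vt2": 3000}
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because both datasets are quantified against the same set of virtual transcripts, the resulting values refer to the same features and can be compared entry by entry.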
Affiliation(s)
- Ruei-Chi Gan
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan.,Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Ting-Wen Chen
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Timothy H Wu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan
- Po-Jung Huang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Chi-Ching Lee
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Yuan-Ming Yeh
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Cheng-Hsun Chiu
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan; Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
- Petrus Tang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan; Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Molecular Regulation & Bioinformatics Laboratory, Chang Gung University, Taoyuan, Taiwan
3
Huang PJ, Lee CC, Tan BCM, Yeh YM, Huang KY, Gan RC, Chen TW, Lee CY, Yang ST, Liao CS, Liu H, Tang P. Vanno: a visualization-aided variant annotation tool. Hum Mutat 2015; 36:167-74. [PMID: 25196204 DOI: 10.1002/humu.22684] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/12/2014] [Accepted: 08/25/2014] [Indexed: 01/20/2023]
Abstract
Next-generation sequencing (NGS) technologies have revolutionized the field of genetics and are trending toward clinical diagnostics. Exome and targeted sequencing in a disease context represent a major clinical application of NGS, given their utility and cost-effectiveness. With the ongoing discovery of disease-associated genes, various gene panels have been launched for both basic research and diagnostic tests. However, fundamental inconsistencies among the diverse annotation sources, software packages, and data formats have complicated the subsequent analysis. To manage disease-associated NGS data, we developed Vanno, a Web-based application for in-depth analysis and rapid evaluation of disease-causative genome sequence alterations. Vanno integrates information from biomedical databases, functional predictions from available evaluation models, and mutation landscapes from TCGA cancer types. A highly integrated framework that incorporates filtering, sorting, clustering, and visual analytic modules is provided to facilitate exploration of oncogenomics datasets at different levels, such as gene, variant, protein domain, or three-dimensional structure. Such a design is crucial for extracting knowledge from sequence alterations and translating biological insights into clinical applications. Taken together, Vanno supports almost all disease-associated gene tests and exome sequencing panels designed for NGS, providing a complete solution for targeted and exome sequencing analysis. Vanno is freely available at http://cgts.cgu.edu.tw/vanno.
Affiliation(s)
- Po-Jung Huang
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
4
Chen TW, Gan RC, Chang YF, Liao WC, Wu TH, Lee CC, Huang PJ, Lee CY, Chen YYM, Chiu CH, Tang P. Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing. BMC Genomics 2015; 16:648. [PMID: 26315384 PMCID: PMC4552406 DOI: 10.1186/s12864-015-1859-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/04/2015] [Accepted: 08/18/2015] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS We used 250 bp Illumina MiSeq paired-end reads of different library sizes, generated from genomic DNA of Escherichia coli DH1 and Streptococcus parasanguinis FW213, to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.
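The idea of overlapping paired-end reads can be sketched in a few lines of Python. This is a deliberately naive illustration, not the merging tool used in the study (the abstract does not name one): it reverse-complements the second read and joins the pair at the longest suffix/prefix overlap that satisfies a mismatch threshold. The function names, parameters, and toy fragment are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.0):
    """Merge a read pair at the longest acceptable overlap, or return None.

    read1 is the forward read; read2 is the reverse read as sequenced, so it is
    reverse-complemented before the suffix of read1 is compared with its prefix.
    """
    r1 = read1.upper()
    r2 = reverse_complement(read2.upper())
    for overlap in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        tail, head = r1[-overlap:], r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return r1 + r2[overlap:]
    return None


# Toy 28 bp fragment sequenced with 20 bp reads from both ends (12 bp overlap)
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTA"
forward = fragment[:20]
reverse = reverse_complement(fragment[-20:])
print(merge_pair(forward, reverse))
```

Because the sketch accepts the longest matching overlap, a repeat that spans the read ends can produce a merged read of the wrong length, which is consistent with the insertions and deletions the abstract reports for merged reads.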
Affiliation(s)
- Ting-Wen Chen
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Ruei-Chi Gan
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Yi-Feng Chang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.
- Wei-Chao Liao
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Chi-Ching Lee
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Po-Jung Huang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Cheng-Yang Lee
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Yi-Ywan M Chen
- Department of Microbiology and Immunology, Chang Gung University, Taoyuan, Taiwan.
- Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan.
- Cheng-Hsun Chiu
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
- Petrus Tang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
- Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan.
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
5
Huang PJ, Lee CC, Tan BCM, Yeh YM, Julie Chu L, Chen TW, Chang KP, Lee CY, Gan RC, Liu H, Tang P. CMPD: cancer mutant proteome database. Nucleic Acids Res 2014; 43:D849-55. [PMID: 25398898 PMCID: PMC4383976 DOI: 10.1093/nar/gku1182] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023] Open
Abstract
Whole-exome sequencing, which centres on the protein-coding regions of disease/cancer-associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomic datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.
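As a rough illustration of what a "mutated peptide sequence" is, the Python sketch below applies a single missense change written as reference residue, 1-based position, and alternate residue (for example `G12D`) to a protein sequence and extracts the surrounding window. The abstract does not describe how CMPD builds its entries, so the notation, function names, and toy sequence here are all assumptions.

```python
import re


def apply_missense(protein: str, change: str) -> str:
    """Return the protein with one substitution such as 'G12D' applied (1-based)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"unsupported change notation: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside the protein")
    if protein[pos - 1] != ref:
        raise ValueError(f"reference residue at {pos} is {protein[pos - 1]}, not {ref}")
    return protein[:pos - 1] + alt + protein[pos:]


def flanking_peptide(protein: str, pos: int, flank: int = 5) -> str:
    """Residues within `flank` positions of the 1-based position `pos`."""
    start = max(0, pos - 1 - flank)
    return protein[start:pos + flank]


# Toy 20-residue sequence (hypothetical example input)
wild_type = "MTEYKLVVVGAGGVGKSALT"
mutant = apply_missense(wild_type, "G12D")
print(mutant)
print(flanking_peptide(mutant, 12))
```

Checking the reference residue before substituting guards against applying a variant to the wrong isoform or coordinate system, a common source of mismatches when genomic and proteomic datasets are combined.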
Affiliation(s)
- Po-Jung Huang
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Chi-Ching Lee
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Yuan-Ming Yeh
- Bioinformatics Division, Tri-I Biotech, Inc., Taipei 221, Taiwan
- Lichieh Julie Chu
- Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Ting-Wen Chen
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Kai-Ping Chang
- Department of Otolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Lin-Kou, Taoyuan 333, Taiwan
- Cheng-Yang Lee
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Ruei-Chi Gan
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Hsuan Liu
- Department of Molecular and Cellular Biology, Chang Gung University, Taoyuan 333, Taiwan
- Petrus Tang
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
6
Chen TW, Li HP, Lee CC, Gan RC, Huang PJ, Wu TH, Lee CY, Chang YF, Tang P. ChIPseek, a web-based analysis tool for ChIP data. BMC Genomics 2014; 15:539. [PMID: 24974934 PMCID: PMC4092222 DOI: 10.1186/1471-2164-15-539] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/19/2014] [Accepted: 06/20/2014] [Indexed: 02/08/2023] Open
Abstract
Background Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors and epigenetic and chromatin modifiers regulate specific gene expression patterns, which may result in different phenotypes. To reveal the identity of the proteins associated with a specific region of DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. A ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis. Results We present ChIPseek, a web-based tool for ChIP data analysis that provides summary statistics in graphs and offers several commonly demanded analyses. ChIPseek can provide a statistical summary of the dataset, including a histogram of the peak length distribution, a histogram of distances to the nearest transcription start site (TSS), and a pie chart (or bar chart) of genomic locations, giving users a comprehensive view of the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek. Conclusions ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at http://chipseek.cgu.edu.tw.
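Two of the summary statistics mentioned above, peak length and distance to the nearest TSS, can be sketched as follows. This Python example is an illustration under stated assumptions rather than ChIPseek's code: peaks are taken as BED-like `(chrom, start, end)` tuples with half-open coordinates, TSS positions are supplied as a sorted list per chromosome, strand is ignored, and all names and coordinates are hypothetical.

```python
from bisect import bisect_left


def peak_lengths(peaks: list) -> list:
    """Length of each (chrom, start, end) peak; start is inclusive, end exclusive."""
    return [end - start for _chrom, start, end in peaks]


def signed_distance_to_nearest_tss(position: int, tss_sorted: list) -> int:
    """Position minus nearest TSS; negative when the position lies before the TSS."""
    if not tss_sorted:
        raise ValueError("no TSS positions supplied")
    i = bisect_left(tss_sorted, position)
    # Only the TSS just before and just after the position can be the nearest one.
    candidates = tss_sorted[max(0, i - 1):i + 1]
    return min((position - tss for tss in candidates), key=abs)


def tss_distances(peaks: list, tss_by_chrom: dict) -> list:
    """Distance from each peak midpoint to the nearest TSS on the same chromosome."""
    distances = []
    for chrom, start, end in peaks:
        midpoint = (start + end) // 2
        distances.append(signed_distance_to_nearest_tss(midpoint, tss_by_chrom[chrom]))
    return distances


peaks = [("chr1", 4700, 4900), ("chr1", 9400, 9600)]
tss_by_chrom = {"chr1": [1000, 5000, 9000]}
print(peak_lengths(peaks))
print(tss_distances(peaks, tss_by_chrom))
```

Binning the two returned lists gives the peak-length and TSS-distance histograms; a strand-aware version would flip the sign for genes on the reverse strand.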
Affiliation(s)
- Petrus Tang
- Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
7
Huang PJ, Yeh YM, Gan RC, Lee CC, Chen TW, Lee CY, Liu H, Chen SJ, Tang P. CPAP: Cancer Panel Analysis Pipeline. Hum Mutat 2013; 34:1340-6. [PMID: 23893859 DOI: 10.1002/humu.22386] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/16/2013] [Accepted: 07/12/2013] [Indexed: 12/14/2022]
Abstract
Targeted sequencing using next-generation sequencing technologies is being rapidly adopted for clinical sequencing and cancer marker tests. However, no existing bioinformatics tool is available for the analysis and visualization of multiple targeted sequencing datasets. In the present study, we use cancer panel targeted sequencing datasets generated by the Life Technologies Ion Personal Genome Machine Sequencer as an example to illustrate how to develop an automated pipeline for the comparative analysis of multiple datasets. Cancer Panel Analysis Pipeline (CPAP) uses standard output files from variant calling software to generate a distribution map of SNPs among all of the samples in a circular diagram generated by Circos. The diagram is hyperlinked to a dynamic HTML table that allows users to identify target SNPs by applying different filters. CPAP also integrates additional information about the identified SNPs by linking to an integrated SQL database compiled from SNP-related databases, including dbSNP, 1000 Genomes Project, COSMIC, and dbNSFP. CPAP takes only 17 min to complete a comparative analysis of 500 datasets. CPAP not only provides an automated platform for the analysis of multiple cancer panel datasets but can also serve as a model for any customized targeted sequencing project.
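The core of a cross-sample SNP distribution map is a table of which samples carry which variant. The Python sketch below builds such a table from VCF-style text lines and then filters for variants shared by a minimum number of samples. It is a simplified illustration, not CPAP's pipeline: it reads only the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT), and the function names, sample names, and records are hypothetical.

```python
from collections import defaultdict


def variant_keys(vcf_lines) -> set:
    """Set of (chrom, pos, ref, alt) tuples from VCF data lines; header lines are skipped."""
    keys = set()
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _variant_id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            keys.add((chrom, int(pos), ref, allele))
    return keys


def snp_distribution(samples: dict) -> dict:
    """Map each variant to the set of sample names in which it was called."""
    distribution = defaultdict(set)
    for name, lines in samples.items():
        for key in variant_keys(lines):
            distribution[key].add(name)
    return dict(distribution)


def shared_by_at_least(distribution: dict, n: int) -> list:
    """Variants observed in at least n samples, sorted by chromosome and position."""
    return sorted(key for key, names in distribution.items() if len(names) >= n)


samples = {
    "tumor_A": ["##fileformat=VCFv4.1", "chr7\t140453136\t.\tA\tT\t50\tPASS\t."],
    "tumor_B": ["chr7\t140453136\t.\tA\tT\t61\tPASS\t.",
                "chr12\t25398284\t.\tC\tA,T\t44\tPASS\t."],
}
distribution = snp_distribution(samples)
print(shared_by_at_least(distribution, 2))
```

The same variant-to-samples mapping could feed both a circular plot and a filterable table, since each row already records where a SNP lies and which samples share it.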
Affiliation(s)
- Po-Jung Huang
- Bioinformatics Center, Chang Gung University, Taoyuan, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan