1
Cai C, Wang L, Zhou L, He P, Jiao B. Complete chloroplast genome of green tide algae Ulva flexuosa (Ulvophyceae, Chlorophyta) with comparative analysis. PLoS One 2017; 12:e0184196. [PMID: 28863197 PMCID: PMC5581003 DOI: 10.1371/journal.pone.0184196] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 02/22/2017] [Accepted: 08/18/2017] [Indexed: 12/05/2022] Open
Abstract
Ulva flexuosa, a green tide alga, has formed outbreaks in the Yellow Sea of China during the past ten years. In the present study, we sequenced the chloroplast genome of U. flexuosa and then annotated it and compared it with related genomes. The results indicated that chloroplast genomes are highly conserved among Ulva spp. and highly rearranged relative to species outside the genus. Although U. flexuosa was closer to U. linza than to U. fasciata in the phylogenetic tree, the average Ka/Ks between U. flexuosa and U. linza, assessed from 67 protein-coding genes, was higher than that between U. flexuosa and other Ulva species, owing to variation in psbZ, psbM and ycf20. Our results lay the foundation for future studies on the evolution of Ulva chloroplast genomes, as well as for the molecular identification of U. flexuosa varieties.
Affiliation(s)
- Chuner Cai
- College of Marine Ecology and Environment, Shanghai Ocean University, Shanghai, China
- National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), Shanghai, China
- Marine Biomedicine Institute, Second Military Medical University, Shanghai, China
- Lingke Wang
- College of Marine Ecology and Environment, Shanghai Ocean University, Shanghai, China
- Lingjie Zhou
- College of Marine Ecology and Environment, Shanghai Ocean University, Shanghai, China
- Department of Marine Sciences, University of Connecticut, Groton, Connecticut, United States of America
- Peimin He
- College of Marine Ecology and Environment, Shanghai Ocean University, Shanghai, China
- National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), Shanghai, China
- * E-mail: (PH); (BJ)
- Binghua Jiao
- Marine Biomedicine Institute, Second Military Medical University, Shanghai, China
- * E-mail: (PH); (BJ)
2
da Silva RA, de Carvalho IMVG, de Matos RPA, Yamasaki LHT, Bittar C, Rahal P, Jardim ACG. Evidence of bottleneck effect on hepatitis C virus transmission between a couple under interferon based therapy. INFECTION GENETICS AND EVOLUTION 2016; 47:87-93. [PMID: 27888038 DOI: 10.1016/j.meegid.2016.11.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2016] [Revised: 09/20/2016] [Accepted: 11/11/2016] [Indexed: 01/06/2023]
Abstract
The correlation between viral genetic diversity and treatment response in hepatitis C infection remains uncertain. The bottleneck effect dictates the characteristics of the viral population that will establish the infection in a new host and is related to how effective the immune system and treatment will be against the virus. Here we evaluated the phylogenetic characteristics of the quasispecies population and the treatment response pattern of an HCV-infected couple. We also analyzed whether the viral populations of these patients indicated that they were exposed to the same source of primary infection. This study included two patients (P10 and P11) infected with HCV genotype 1b. The couple presented horizontal transmission. Viral RNA was isolated from serum samples collected before, during and after treatment, at specific time points. The HCV NS5A gene sequence was amplified, cloned and sequenced. Genetic and evolutionary analyses were performed to compare the quasispecies populations of these two patients and local control patients. Genetic distance and diversity were calculated. Phylogenetic analyses were performed using maximum likelihood and Bayesian methodologies. The analysis of the baseline samples showed that the genetic distance between the viral populations of patients P10 and P11 was significantly lower than the distance between these patients and a control group of sequences from local patients, supporting the horizontal transmission hypothesis. Phylogenetic analysis with sequences from all the time point samples also demonstrated two patterns of evolution depending on the treatment response. The Bayesian analysis showed that one isolate corresponding to the baseline sample of P10 was grouped into the P11 clade, suggesting a route of infection and a bottleneck effect. Our data suggest that the viral population of patient P11 may have originated from variants from patient P10, and consequently show that clinical differences between treatment responses can emerge from the bottleneck effect on viral populations.
Affiliation(s)
- Rafael Alves da Silva
- Laboratório de Hepatologia Molecular Aplicada, LHEMA, Disciplina de Gastroenterologia, Departamento de Medicina, Universidade Federal de São Paulo, Av. Pedro de Toledo n° 669, 5° Andar, SP, Brazil; Laboratório de Parasitologia, Instituto Butantan, Av. Vital Brazil, n° 1500, SP, Brazil
- Cíntia Bittar
- Laboratório de Estudos Genômicos, Ibilce, UNESP, São José do Rio Preto, SP, Brazil
- Paula Rahal
- Laboratório de Estudos Genômicos, Ibilce, UNESP, São José do Rio Preto, SP, Brazil
- Ana Carolina Gomes Jardim
- Laboratório de Estudos Genômicos, Ibilce, UNESP, São José do Rio Preto, SP, Brazil; Laboratório de Virologia, Instituto de Ciências Biomédicas, Universidade Federal de Uberlândia - UFU, Uberlândia, MG, Brazil
3
Brumm PJ, De Maayer P, Mead DA, Cowan DA. Genomic analysis of six new Geobacillus strains reveals highly conserved carbohydrate degradation architectures and strategies. Front Microbiol 2015; 6:430. [PMID: 26029180 PMCID: PMC4428132 DOI: 10.3389/fmicb.2015.00430] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/11/2015] [Accepted: 04/22/2015] [Indexed: 11/13/2022] Open
Abstract
In this work we report the whole genome sequences of six new Geobacillus xylanolytic strains along with the genomic analysis of their capability to degrade carbohydrates. The six sequenced Geobacillus strains described here have a range of GC contents from 43.9% to 52.5% and clade with named Geobacillus species throughout the entire genus. We have identified a ~200 kb unique super-cluster in all six strains, containing five to eight distinct carbohydrate degradation clusters in a single genomic region, a feature not seen in other genera. The Geobacillus strains rely on a small number of secreted enzymes located within distinct clusters for carbohydrate utilization, in contrast to most biomass-degrading organisms which contain numerous secreted enzymes located randomly throughout the genomes. All six strains are able to utilize fructose, arabinose, xylose, mannitol, gluconate, xylan, and α-1,6-glucosides. The gene clusters for utilization of these seven substrates have identical organization and the individual proteins have a high percent identity to their homologs. The strains show significant differences in their ability to utilize inositol, sucrose, lactose, α-mannosides, α-1,4-glucosides and arabinan.
Affiliation(s)
- Phillip J. Brumm
- C5•6 Technologies, Middleton, WI, USA
- Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI, USA
- Pieter De Maayer
- Centre for Microbial Ecology and Genomics, Genomics Research Institute, University of Pretoria, Pretoria, South Africa
- Department of Microbiology and Plant Pathology, University of Pretoria, Pretoria, South Africa
- David A. Mead
- C5•6 Technologies, Middleton, WI, USA
- Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI, USA
- Lucigen Corporation, Middleton, WI, USA
- Don A. Cowan
- Centre for Microbial Ecology and Genomics, Genomics Research Institute, University of Pretoria, Pretoria, South Africa
4
DeGiorgio M, Syring J, Eckert AJ, Liston A, Cronn R, Neale DB, Rosenberg NA. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines. BMC Evol Biol 2014; 14:67. [PMID: 24678701 PMCID: PMC4021425 DOI: 10.1186/1471-2148-14-67] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 08/14/2013] [Accepted: 02/10/2014] [Indexed: 12/26/2022] Open
Abstract
Background As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Results Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. Conclusions When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Affiliation(s)
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.
5
Wu K, Yang M, Liu H, Tao Y, Mei J, Zhao Y. Genetic analysis and molecular characterization of Chinese sesame (Sesamum indicum L.) cultivars using insertion-deletion (InDel) and simple sequence repeat (SSR) markers. BMC Genet 2014; 15:35. [PMID: 24641723 PMCID: PMC4234512 DOI: 10.1186/1471-2156-15-35] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 11/18/2013] [Accepted: 03/10/2014] [Indexed: 11/10/2022] Open
Abstract
Background Sesame is an important and ancient oil crop in tropical and subtropical areas. China is one of the most important sesame producing countries with many germplasm accessions and excellent cultivars. Domestication and modern plant breeding have presumably narrowed the genetic basis of cultivated sesame. Several modern sesame cultivars were bred with a limited number of landrace cultivars in their pedigree. The genetic variation was subsequently reduced by genetic drift and selection. Characterization of the genetic diversity of these cultivars by molecular markers is of great value to assist parental line selection and breeding strategy design. Results Three hundred and forty-nine simple sequence repeat (SSR) and 79 insertion-deletion (InDel) markers were developed from a cDNA library and from reduced-representation sequencing of the sesame cultivar Zhongzhi 14, respectively. Combined with previously published SSR markers, 88 polymorphic markers were used to assess the genetic diversity, phylogenetic relationships, population structure, and allele distribution among 130 Chinese sesame accessions including 82 cultivars, 44 landraces and 4 wild germplasm accessions. A total of 325 alleles were detected, with an average gene diversity of 0.432. Model-based structure analysis revealed the presence of five subgroups belonging to two main groups, which were consistent with the results from principal coordinate analysis (PCA), phylogenetic clustering and analysis of molecular variance (AMOVA). Several missing or unique alleles were identified from particular types, subgroups or families, even though they share one or both parental/progenitor lines. Conclusions This report presents by far the most comprehensive characterization of the molecular and genetic diversity of sesame cultivars in China. InDels are more polymorphic than SSRs, but their ability for deciphering genetic diversity compared to the latter. 
Improved sesame cultivars have narrower genetic basis than landraces, reflecting the effect of genetic drift or selection during breeding processes. Comparative analysis of allele distribution revealed genetic divergence between improved cultivars and landraces, as well as between cultivars released in different years. These results will be useful for assessing cultivars and for marker-assisted breeding in sesame.
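The "average gene diversity" figure reported above is the kind of statistic that can be illustrated with a short calculation. The sketch below assumes the common expected-heterozygosity form, H = 1 - sum(p_i^2) per marker, averaged over markers; the abstract does not state which estimator the authors used, and the allele frequencies in the example are hypothetical, not data from the study.

```python
from typing import Sequence


def gene_diversity(allele_freqs: Sequence[float]) -> float:
    """Expected heterozygosity at one marker: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_gene_diversity(markers: Sequence[Sequence[float]]) -> float:
    """Average the per-marker gene diversity over all markers."""
    if not markers:
        raise ValueError("at least one marker is required")
    return sum(gene_diversity(freqs) for freqs in markers) / len(markers)


if __name__ == "__main__":
    # Hypothetical allele frequencies for two markers (illustration only).
    example_markers = [[0.5, 0.5], [0.7, 0.2, 0.1]]
    print(round(mean_gene_diversity(example_markers), 3))
```

A marker with many alleles at similar frequencies scores close to 1, while a marker dominated by one allele scores close to 0, which is why a narrower genetic basis shows up as a lower average.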
Affiliation(s)
- Yingzhong Zhao
- Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Sesame Genetic Improvement Laboratory, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences (OCRI-CAAS), Wuhan, Hubei 430062, China.
6
Mead D, Drinkwater C, Brumm PJ. Genomic and enzymatic results show Bacillus cellulosilyticus uses a novel set of LPXTA carbohydrases to hydrolyze polysaccharides. PLoS One 2013; 8:e61131. [PMID: 23593409 PMCID: PMC3617157 DOI: 10.1371/journal.pone.0061131] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/13/2012] [Accepted: 03/08/2013] [Indexed: 11/19/2022] Open
Abstract
Background Alkaliphilic Bacillus species are intrinsically interesting due to the bioenergetic problems posed by growth at high pH and high salt. Three alkaline cellulases have been cloned, sequenced and expressed from Bacillus cellulosilyticus N-4 (Bcell) making it an excellent target for genomic sequencing and mining of biomass-degrading enzymes. Methodology/Principal Findings The genome of Bcell is a single chromosome of 4.7 Mb with no plasmids present and three large phage insertions. The most unusual feature of the genome is the presence of 23 LPXTA membrane anchor proteins; 17 of these are annotated as involved in polysaccharide degradation. These two values are significantly higher than seen in any other Bacillus species. This high number of membrane anchor proteins is seen only in pathogenic Gram-positive organisms such as Listeria monocytogenes or Staphylococcus aureus. Bcell also possesses four sortase D subfamily 4 enzymes that incorporate LPXTA-bearing proteins into the cell wall; three of these are closely related to each other and unique to Bcell. Cell fractionation and enzymatic assay of Bcell cultures show that the majority of polysaccharide degradation is associated with the cell wall LPXTA-enzymes, an unusual feature in Gram-positive aerobes. Genomic analysis and growth studies both strongly argue against Bcell being a truly cellulolytic organism, in spite of its name. Preliminary results suggest that fungal mycelia may be the natural substrate for this organism. Conclusions/Significance Bacillus cellulosilyticus N-4, in spite of its name, does not possess any of the genes necessary for crystalline cellulose degradation, demonstrating the risk of classifying microorganisms without the benefit of genomic analysis. Bcell is the first Gram-positive aerobic organism shown to use predominantly cell-bound, non-cellulosomal enzymes for polysaccharide degradation. 
The LPXTA-sortase system utilized by Bcell may have applications both in anchoring cellulases and other biomass-degrading enzymes to Bcell itself and in anchoring proteins to other Gram-positive organisms.
Affiliation(s)
- David Mead
- Lucigen Corporation, Middleton, Wisconsin, United States of America
- C5•6 Technologies, Middleton, Wisconsin, United States of America
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Colleen Drinkwater
- Lucigen Corporation, Middleton, Wisconsin, United States of America
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Phillip J. Brumm
- C5•6 Technologies, Middleton, Wisconsin, United States of America
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- * E-mail:
7
Complete genome sequence of the genotype 4 hepatitis E virus strain prevalent in swine in Jiangsu Province, China, reveals a close relationship with that from the human population in this area. J Virol 2012; 86:8334-5. [PMID: 22787267 DOI: 10.1128/jvi.01060-12] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/17/2022] Open
Abstract
Hepatitis E virus (HEV) is a zoonotic pathogen for which several animal species have been reported as reservoirs. Swine stand out as the major reservoir for HEV infection in humans, as suggested by the close genetic relationship of swine and human viruses. In a previous study, we sequenced the complete genome of a human genotype 4 HEV strain (HM439284) that is prevalent in Jiangsu Province, China. Here we report the complete genome of one genotype 4 HEV strain that is prevalent in swine herds in Jiangsu Province. Phylogenetic analysis indicated that the swine HEV strain in the present study has high sequence homology (>92%) with the genotype 4 HEV strains prevalent in the human population of Jiangsu Province. These results suggest that the genotype 4 HEV strain in the present study is involved in cross-species transmission between swine and humans in this area.
8
Abstract
The porcine enteroviruses (PEVs) belong to the family Picornaviridae. We report a complete genome sequence of a novel PEV strain that is widely prevalent in pigs at least in central and eastern China. The complete genome consists of 7,390 nucleotides, excluding the 3' poly(A) tail, and has an open reading frame that maps between nucleotide positions 812 and 7318 and encodes a 2,168-amino-acid polyprotein. Phylogenetic analysis based on the 3CD and VP1 regions reveals that this PEV strain belongs to a species of PEV9 but may represent a novel sero-/genotype in CPE group III. We also report the major findings from bootscan analysis based on the whole genomes of PEVs in the present study and those available in GenBank.
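The coordinates in this abstract can be cross-checked with simple arithmetic. The sketch below assumes the reported positions are 1-based and inclusive and that the last codon of the open reading frame is a stop codon; neither convention is stated in the abstract, so both are labeled as assumptions in the code.

```python
# Positions taken from the abstract; 1-based inclusive numbering is an assumption.
ORF_START = 812
ORF_END = 7318


def orf_summary(start: int, end: int) -> dict:
    """Return ORF length in nucleotides, codon count, and residue count.

    The residue count assumes exactly one terminal stop codon.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length_nt // 3
    return {"length_nt": length_nt, "codons": codons, "amino_acids": codons - 1}


if __name__ == "__main__":
    print(orf_summary(ORF_START, ORF_END))
```

Under these assumptions the frame spans 6,507 nucleotides, or 2,169 codons, which is consistent with the 2,168-amino-acid polyprotein reported above.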
9
Abstract
Summary: The Pine Alignment and SNP Identification Pipeline (PineSAP) provides a high-throughput solution to single nucleotide polymorphism (SNP) prediction using multiple sequence alignments from re-sequencing data. This pipeline integrates a hybrid of customized scripting, existing utilities and machine learning in order to increase the speed and accuracy of SNP calls. The implementation of this pipeline results in significantly improved multiple sequence alignments and SNP identifications when compared with existing solutions. The use of machine learning in the SNP identifications extends the pipeline's application to any eukaryotic species where full genome sequence information is unavailable. Availability: All code used for this pipeline is freely available at the Dendrome project website (http://dendrome.ucdavis.edu/adept2/resequencing.html). Contact: jlwegrzyn@ucdavis.edu
Affiliation(s)
- Jill L Wegrzyn
- Department of Plant Sciences, University of California, Davis, CA 95616, USA.
10
Grattapaglia D, Kirst M. Eucalyptus applied genomics: from gene sequences to breeding tools. THE NEW PHYTOLOGIST 2008; 179:911-929. [PMID: 18537893 DOI: 10.1111/j.1469-8137.2008.02503.x] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Indexed: 05/07/2023]
Abstract
Eucalyptus is the most widely planted hardwood crop in the tropical and subtropical world because of its superior growth, broad adaptability and multipurpose wood properties. Plantation forestry of Eucalyptus supplies high-quality woody biomass for several industrial applications while reducing the pressure on tropical forests and associated biodiversity. This review links current eucalypt breeding practices with existing and emerging genomic tools. A brief discussion provides a background to modern eucalypt breeding together with some current applications of molecular markers in support of operational breeding. Quantitative trait locus (QTL) mapping and genetical genomics are reviewed and an in-depth perspective is provided on the power of association genetics to dissect quantitative variation in this highly diverse organism. Finally, some challenges and opportunities to integrate genomic information into directional selective breeding are discussed in light of the upcoming draft of the Eucalyptus grandis genome. Given the extraordinary genetic variation that exists in the genus Eucalyptus, the ingenuity of most breeders, and the powerful genomic tools that have become available, the prospects of applied genomics in Eucalyptus forest production are encouraging.
Affiliation(s)
- Dario Grattapaglia
- Plant Genetics Laboratory, Embrapa - Genetic Resources and Biotechnology, Parque Estação Biológica, Brasília 70770-970 DF, Brazil
- Graduate Program in Genomic Sciences and Biotechnology, Universidade Católica de Brasília - SGAN 916 módulo B, Brasília 70790-160 DF, Brazil
- Matias Kirst
- School of Forest Resources and Conservation, Graduate Program in Plant Molecular and Cellular Biology, and University of Florida Genetics Institute, University of Florida, PO Box 110410, Gainesville, FL 32611, USA
11
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
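The idea of estimating an exon inclusion level from EST alignments can be made concrete with a small calculation. The sketch below assumes the simplest possible estimator, the fraction of informative ESTs that support inclusion; the chapter does not specify its formula, and the function name and the counts in the example are hypothetical.

```python
def inclusion_level(inclusion_ests: int, skipping_ests: int) -> float:
    """Fraction of informative ESTs that include the exon.

    inclusion_ests: ESTs whose alignment contains the exon.
    skipping_ests: ESTs whose alignment joins the flanking exons directly.
    """
    if inclusion_ests < 0 or skipping_ests < 0:
        raise ValueError("EST counts must be non-negative")
    total = inclusion_ests + skipping_ests
    if total == 0:
        raise ValueError("no informative ESTs cover this exon")
    return inclusion_ests / total


if __name__ == "__main__":
    # Hypothetical counts for one exon (illustration only).
    print(inclusion_level(30, 10))
```

With few ESTs per exon such a ratio is noisy, so a real analysis would normally attach a confidence interval or require a minimum number of informative ESTs before comparing tissues or species.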
12
Dereeper A, Argout X, Billot C, Rami JF, Ruiz M. SAT, a flexible and optimized Web application for SSR marker development. BMC Bioinformatics 2007; 8:465. [PMID: 18047663 PMCID: PMC2216045 DOI: 10.1186/1471-2105-8-465] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 05/22/2007] [Accepted: 11/29/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Simple Sequence Repeats (SSRs), or microsatellites, are among the most powerful genetic markers known. A common method for the development of SSR markers is the construction of genomic DNA libraries enriched for SSR sequences, followed by DNA sequencing. However, designing optimal SSR markers from bulk sequence data is a laborious and time-consuming process. RESULTS SAT (SSR Analysis Tool) is a user-friendly Web application developed to minimize tedious manual operations and reduce errors. This tool facilitates the integration, analysis and display of sequence data from SSR-enriched libraries. SAT is designed to successively perform base calling and quality evaluation of chromatograms, eliminate cloning vector, adaptors and low quality sequences, detect chimera or partially digested sequences, search for SSR motifs, cluster and assemble the redundant sequences, and design SSR primer pairs. An additional virtual PCR step establishes primer specificity. Users may modify the different parameters of each step of the SAT analysis. Although certain steps are compulsory, such as SSR motifs search and sequence assembly, users do not have to run the entire pipeline, and they can choose selectively which steps to perform. A database allows users to store and query results, and to redo individual steps of the workflow. CONCLUSION The SAT Web application is available at http://sat.cirad.fr/sat, and a standalone command-line version is also freely downloadable. Users must send an email to the SAT administrator tropgene@cirad.fr to request a login and password.
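The "search for SSR motifs" step mentioned in this abstract can be illustrated with a regular-expression scan for perfect tandem repeats. This is a minimal sketch, not SAT's implementation: the function name, the repeat-unit range (2 to 6 bases), and the minimum of five repeats are all assumptions chosen for the example.

```python
import re
from typing import List, Tuple


def find_ssrs(sequence: str, min_repeats: int = 5,
              min_unit: int = 2, max_unit: int = 6) -> List[Tuple[int, str, int]]:
    """Find perfect tandem repeats; return (0-based start, repeat unit, repeat count)."""
    seq = sequence.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "ATAT" is just "AT" twice).
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: six AT repeats and five GAA repeats.
    demo = "GGC" + "AT" * 6 + "CCGT" + "GAA" * 5 + "TTC"
    for start, unit, count in find_ssrs(demo):
        print(start, unit, count)
```

A production tool would also handle imperfect and compound repeats and would record flanking sequence for primer design; those steps are outside the scope of this sketch.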
Affiliation(s)
- Alexis Dereeper
- CIRAD, UMR DAP, TA A-96/03, Avenue Agropolis, Montpellier, France.
13
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow. BMC Bioinformatics 2006; 7:513. [PMID: 17123449 PMCID: PMC1676024 DOI: 10.1186/1471-2105-7-513] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 08/18/2006] [Accepted: 11/23/2006] [Indexed: 11/25/2022] Open
Abstract
Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from or .