51
Yan J, Kurgan L. DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues. Nucleic Acids Res 2017; 45:e84. [PMID: 28132027 PMCID: PMC5449545 DOI: 10.1093/nar/gkx059] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 06/10/2016] [Accepted: 01/24/2017] [Indexed: 01/18/2023]
Abstract
Protein-DNA and protein-RNA interactions are part of many diverse and essential cellular functions and yet most of them remain to be discovered and characterized. Recent research shows that sequence-based predictors of DNA-binding residues accurately find these residues but also cross-predict many RNA-binding residues as DNA-binding, and vice versa. Most of these methods are also relatively slow, prohibiting applications on the whole-genome scale. We describe a novel sequence-based method, DRNApred, which accurately and in high-throughput predicts and discriminates between DNA- and RNA-binding residues. DRNApred was designed using a new dataset with both DNA- and RNA-binding proteins, regression that penalizes cross-predictions, and a novel two-layered architecture. DRNApred outperforms state-of-the-art predictors of DNA- or RNA-binding residues on a benchmark test dataset by substantially reducing the cross predictions and predicting arguably higher quality false positives that are located nearby the native binding residues. Moreover, it also more accurately predicts the DNA- and RNA-binding proteins. Application on the human proteome confirms that DRNApred reduces the cross predictions among the native nucleic acid binders. Also, novel putative DNA/RNA-binding proteins that it predicts share similar subcellular locations and residue charge profiles with the known native binding proteins. Webserver of DRNApred is freely available at http://biomine.cs.vcu.edu/servers/DRNApred/.
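The abstract says DRNApred uses regression that penalizes cross-predictions, but it does not spell out the formulation. As a minimal sketch, assuming a hypothetical target encoding that is not taken from the paper, a DNA-specific regressor can be trained with +1 for DNA-binding residues, -1 for RNA-binding residues and 0 for non-binding residues, so that least squares pushes RNA-binding residues toward negative DNA scores. The two features and all function names below are illustrative only.

```python
import numpy as np

def cross_penalized_targets(labels, target):
    """Hypothetical encoding: +1 for the target nucleic acid, -1 for the other
    one (the cross-prediction penalty), 0 for non-binding residues."""
    other = {"DNA": "RNA", "RNA": "DNA"}[target]
    return np.array([1.0 if lab == target else -1.0 if lab == other else 0.0
                     for lab in labels])

def add_intercept(X):
    """Append a column of ones so the linear model has a bias term."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_least_squares(X, y):
    """Ordinary least-squares weights (last entry is the intercept)."""
    w, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return w

def predict(X, w):
    """Linear regression score for each residue."""
    return add_intercept(X) @ w

# Toy residues described by two made-up features.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
labels = ["DNA", "RNA", "none"]
y_dna = cross_penalized_targets(labels, "DNA")
w_dna = fit_least_squares(X, y_dna)
scores = predict(X, w_dna)
print(np.round(scores, 3))  # the RNA-binding residue receives a negative DNA score
```

With this encoding, thresholding the DNA score at a positive cutoff keeps RNA-binding residues out of the DNA predictions. DRNApred's actual features and two-layered architecture are not reproduced here.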
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, USA
52
Bryan K, McGivney BA, Farries G, McGettigan PA, McGivney CL, Gough KF, MacHugh DE, Katz LM, Hill EW. Equine skeletal muscle adaptations to exercise and training: evidence of differential regulation of autophagosomal and mitochondrial components. BMC Genomics 2017; 18:595. [PMID: 28793853 PMCID: PMC5551008 DOI: 10.1186/s12864-017-4007-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 02/01/2017] [Accepted: 08/02/2017] [Indexed: 12/23/2022]
Abstract
BACKGROUND A single bout of exercise induces changes in gene expression in skeletal muscle. Regular exercise results in an adaptive response involving changes in muscle architecture and biochemistry, and is an effective way to manage and prevent common human diseases such as obesity, cardiovascular disorders and type II diabetes. However, the biomolecular mechanisms underlying such responses still need to be fully elucidated. Here we performed a transcriptome-wide analysis of skeletal muscle tissue in a large cohort of untrained Thoroughbred horses (n = 51) before and after a bout of high-intensity exercise and again after an extended period of training. We hypothesized that regular high-intensity exercise training primes the transcriptome for the demands of high-intensity exercise. RESULTS An extensive set of genes was observed to be significantly differentially regulated in response to a single bout of high-intensity exercise in the untrained cohort (3241 genes) and following multiple bouts of high-intensity exercise training over a six-month period (3405 genes). Approximately one-third of these genes (1025) and several biological processes related to energy metabolism were common to both the exercise and training responses. We then developed a novel network-based computational analysis pipeline to test the hypothesis that these transcriptional changes also influence the contextual molecular interactome and its dynamics in response to exercise and training. The contextual network analysis identified several important hub genes, including the autophagosomal-related gene GABARAPL1, and dynamic functional modules, including those enriched for mitochondrial respiratory chain complexes I and V, that were differentially regulated and had their putative interactions 're-wired' in the exercise and/or training responses. 
CONCLUSION Here we have generated, for the first time, a comprehensive set of genes that are differentially expressed in Thoroughbred skeletal muscle in response to both exercise and training. These data indicate that consecutive bouts of high-intensity exercise result in a priming of the skeletal muscle transcriptome for the demands of the next exercise bout. Furthermore, this may also lead to an extensive 're-wiring' of the molecular interactome in both exercise and training, involving key genes and functional modules related to autophagy and the mitochondrion.
Affiliation(s)
- Kenneth Bryan
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- Beatrice A. McGivney
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- Gabriella Farries
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- Paul A. McGettigan
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- Charlotte L. McGivney
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- Katie F. Gough
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- David E. MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Ireland
- Lisa M. Katz
- UCD School of Veterinary Medicine, University College Dublin, Belfield, D04 V1W8 Ireland
- Emmeline W. Hill
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
53
Du N, Sun Y. Improve homology search sensitivity of PacBio data by correcting frameshifts. Bioinformatics 2017; 32:i529-i537. [PMID: 27587671 DOI: 10.1093/bioinformatics/btw458] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may lead only to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. RESULTS In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://sourceforge.net/projects/frame-pro/ CONTACT yannisun@msu.edu.
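The frameshift problem described in this abstract can be illustrated with a toy example: deleting a single base changes how every downstream base is grouped into codons, so the translated sequence diverges from the true protein. The sketch below uses the standard genetic code; the sequence is invented, and the code only demonstrates the effect. It is not Frame-Pro's correction algorithm.

```python
from itertools import product

# Standard genetic code, ordered TCAG x TCAG x TCAG (third base varies fastest).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}

def codons(seq, frame=0):
    """Split a DNA string into complete codons starting at the given offset."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def translate(seq, frame=0):
    """Translate complete codons; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(seq, frame))

true_gene = "ATGGCTGCTAAA"
read_with_deletion = true_gene[:4] + true_gene[5:]  # one base lost, mimicking an indel error

print(translate(true_gene))           # MAAK
print(translate(read_with_deletion))  # MVL: codons from the deletion onward are regrouped
```

Only the first codon survives the deletion unchanged, which is why an uncorrected read aligns to the true protein family over a short stretch with a marginal score.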
Affiliation(s)
- Nan Du
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
54
Abstract
MOTIVATION Omics studies aim to find significant changes due to biological or functional perturbation. However, gene and protein expression profiling experiments contain inherent technical variation. In discovery proteomics studies where the number of samples is typically small, technical variation plays an important role because it contributes considerably to the observed variation. Previous methods place both technical and biological variations in tightly integrated mathematical models that are difficult to adapt for different technological platforms. Our aim is to derive a statistical framework that allows the inclusion of a wide range of technical variability. RESULTS We introduce a new method called the simulated linear test, or the s-test, that is easy to implement and easy to adapt for different models of technical variation. It generates virtual data points from the observed values according to a pre-defined technical distribution and subsequently employs linear modeling for significance analysis. We demonstrate the flexibility of the proposed approach by deriving a new significance test for quantitative discovery proteomics for which missing values have been a major issue for traditional methods such as the t-test. We evaluate the result on two label-free (phospho) proteomics datasets based on ion-intensity quantitation. AVAILABILITY AND IMPLEMENTATION Available at http://www.oncoproteomics.nl/software/stest.html CONTACT : t.pham@vumc.nl.
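The two-step idea in this abstract, simulating virtual data points from each observed value under an assumed technical distribution and then fitting a linear model, can be sketched as below. The Poisson technical model, the number of virtual points and the function names are assumptions made for illustration. The published s-test defines its own technical distributions and significance calculation, which this sketch does not reproduce; it only estimates the group effect.

```python
import numpy as np

def simulate_virtual(observed, n_virtual, rng):
    """Draw n_virtual Poisson samples centered on each observed intensity
    (the Poisson technical model is an assumption for this sketch)."""
    lam = np.repeat(np.asarray(observed, dtype=float), n_virtual)
    return rng.poisson(lam=lam).astype(float)

def linear_group_effect(group_a, group_b, n_virtual=1000, seed=0):
    """Fit y = b0 + b1 * [sample in group B] on the virtual data and return b1."""
    rng = np.random.default_rng(seed)
    ya = simulate_virtual(group_a, n_virtual, rng)
    yb = simulate_virtual(group_b, n_virtual, rng)
    y = np.concatenate([ya, yb])
    group = np.concatenate([np.zeros(ya.size), np.ones(yb.size)])
    X = np.column_stack([np.ones(y.size), group])  # intercept + group indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Hypothetical intensities for one protein in three control and three treated samples.
effect = linear_group_effect([10, 12, 11], [20, 22, 21], n_virtual=2000)
print(round(effect, 2))  # close to the observed mean difference of 10
```

The virtual points are not independent biological replicates, so a naive standard error or p-value from this fit would overstate confidence; the sketch deliberately stops at the effect estimate.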
Affiliation(s)
- T V Pham
- OncoProteomics Laboratory, Department of Medical Oncology, VU University Medical Center, 1081 HV Amsterdam, The Netherlands
- C R Jimenez
- OncoProteomics Laboratory, Department of Medical Oncology, VU University Medical Center, 1081 HV Amsterdam, The Netherlands
55
Ye W, Chen Y, Zhang Y, Xu Y. H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs. Bioinformatics 2017; 33:1130-1138. [PMID: 28087515 DOI: 10.1093/bioinformatics/btw769] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/06/2016] [Accepted: 12/12/2016] [Indexed: 11/15/2022]
Abstract
Motivation Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose with over 118 000 citations in the past two decades. As the size of bio-sequence databases grows exponentially, the computational speed of alignment software must be improved. Results We develop the heterogeneous BLAST (H-BLAST), a fast parallel search tool for a heterogeneous computer that couples CPUs and GPUs, to accelerate BLASTX and BLASTP-basic tools of NCBI-BLAST. H-BLAST employs a locally decoupled seed-extension algorithm for better performance on GPUs, and offers a performance tuning mechanism for better efficiency among various CPU and GPU combinations. H-BLAST produces the same alignment results as NCBI-BLAST and its computational speed is much faster than that of NCBI-BLAST. Speedups achieved by H-BLAST over sequential NCBI-BLASTP (resp. NCBI-BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H-BLAST can be faster than 16-threaded NCBI-BLASTX. Furthermore, H-BLAST is 1.5-4 times faster than GPU-BLAST. Availability and Implementation https://github.com/Yeyke/H-BLAST.git. Contact yux06@syr.edu. Supplementary information Supplementary data are available at Bioinformatics online.
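H-BLAST accelerates BLAST's seed-and-extend search, but its locally decoupled GPU variant is not described here in enough detail to reproduce. As background, the classic pattern is sketched below: exact k-mer seeds are found with a hash index, then each seed is extended without gaps in both directions until the running score drops more than an X-drop threshold below the best score seen. The +1/-1 scoring, the word length k and the X-drop value are simplifying assumptions; BLASTP itself scores with substitution matrices such as BLOSUM62 and uses neighborhood words rather than exact matches only.

```python
def find_seeds(query, subject, k):
    """Return (query_pos, subject_pos) pairs sharing an identical k-mer."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits

def pair_score(a, b, match=1, mismatch=-1):
    """Toy scoring scheme; real BLASTP uses a substitution matrix."""
    return match if a == b else mismatch

def xdrop_extend(query, subject, qi, sj, k, xdrop=2):
    """Ungapped extension of a seed at (qi, sj) in both directions.
    Each direction stops once its running score falls more than xdrop below
    the best score seen in that direction; the best prefix is kept.
    Returns (score, (q_start, q_end), (s_start, s_end)) with end exclusive."""
    seed = sum(pair_score(query[qi + t], subject[sj + t]) for t in range(k))

    best_right = run = right = 0
    t = 0
    while qi + k + t < len(query) and sj + k + t < len(subject):
        run += pair_score(query[qi + k + t], subject[sj + k + t])
        t += 1
        if run > best_right:
            best_right, right = run, t
        elif best_right - run > xdrop:
            break

    best_left = run = left = 0
    t = 1
    while qi - t >= 0 and sj - t >= 0:
        run += pair_score(query[qi - t], subject[sj - t])
        if run > best_left:
            best_left, left = run, t
        elif best_left - run > xdrop:
            break
        t += 1

    score = seed + best_left + best_right
    return score, (qi - left, qi + k + right), (sj - left, sj + k + right)

for qi, sj in find_seeds("KVLAMMM", "KVLGMMM", k=3):
    print(qi, sj, xdrop_extend("KVLAMMM", "KVLGMMM", qi, sj, k=3))
```

Because the X-drop rule tolerates a temporary dip, the extension from the KVL seed passes the single A/G mismatch and recovers the matching MMM tail. Each seed is extended independently, which is the kind of work a GPU implementation can parallelize.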
Affiliation(s)
- Weicai Ye
- School of Data and Computer Science, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Ying Chen
- School of Data and Computer Science, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Yongdong Zhang
- School of Data and Computer Science, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Yuesheng Xu
- School of Data and Computer Science, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Professor Emeritus of Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA
56
Zhuo B, Jiang D. MEACA: efficient gene-set interpretation of expression data using mixed models. [DOI: 10.1101/106781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Competitive gene-set analysis, or enrichment analysis, is widely used for functional interpretation of gene expression data. It tests a known category (e.g. pathway) of genes for enriched differential expression signals. Current methods do not properly capture inter-gene correlations and heterogeneity, resulting in mis-calibration and power loss. We propose MEACA, a new gene-set method based on mixed-effects models. MEACA flexibly incorporates unknown heterogeneity and correlations across genes, and does not need time-consuming permutations. Compared to existing methods, MEACA substantially improves type 1 error control and power in widely ranging scenarios. Real data applications demonstrate MEACA’s ability to recover biologically meaningful relationships.
57
Bowers EC, McCullough SD. Linking the Epigenome with Exposure Effects and Susceptibility: The Epigenetic Seed and Soil Model. Toxicol Sci 2017; 155:302-314. [PMID: 28049737 PMCID: PMC5291212 DOI: 10.1093/toxsci/kfw215] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]
Abstract
The epigenome is a dynamic mediator of gene expression that shapes the way that cells, tissues, and organisms respond to their environment. Initial studies in the emerging field of "toxicoepigenetics" have described either the impact of an environmental exposure on the epigenome or the association of epigenetic signatures with the onset or progression of disease; however, the majority of these pioneering studies examined the relationship between discrete epigenetic modifications and the effects of a single environmental factor. Although these data provide critical blocks with which we construct our understanding of the role of the epigenome in susceptibility and disease, they are akin to individual letters in a complex alphabet that is used to compose the language of the epigenome. Advancing the use of epigenetic data to gain a more comprehensive understanding of the mechanisms underlying exposure effects, identify susceptible populations, and inform the next generation risk assessment depends on our ability to integrate these data in a way that accounts for their cumulative impact on gene regulation. Here we will review current examples demonstrating associations between the epigenetic impacts of intrinsic factors, such as age, genetics, and sex, and environmental exposures shape the epigenome and susceptibility to exposure effects and disease. We will also demonstrate how the "epigenetic seed and soil" model can be used as a conceptual framework to explain how epigenetic states are shaped by the cumulative impacts of intrinsic and extrinsic factors and how these in turn determine how an individual responds to subsequent exposure to environmental stressors.
Affiliation(s)
- Emma C Bowers
- Curriculum in Toxicology, University of North Carolina, Chapel Hill, North Carolina 27599
- Shaun D McCullough
- Environmental Public Health Division, National Health and Environmental Effects Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711
58
Ripoche H, Laine E, Ceres N, Carbone A. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures. Nucleic Acids Res 2017; 45:D236-D242. [PMID: 27899675 PMCID: PMC5210541 DOI: 10.1093/nar/gkw1053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/23/2016] [Revised: 10/18/2016] [Accepted: 10/20/2016] [Indexed: 11/13/2022]
Abstract
The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET2 at large scale on more than 20 000 chains. The JET2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interaction modulation and interaction surface redesign.
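The abstract reports JET2's performance as four percentages. Assuming the standard confusion-matrix definitions over residues (the abstract does not restate them), the metrics can be computed as below; the counts in the usage line are made up.

```python
def binding_site_metrics(tp, fp, tn, fn):
    """Sensitivity, PPV, specificity and accuracy, as percentages."""
    return {
        "Sen": 100.0 * tp / (tp + fn),                   # found among true interface residues
        "PPV": 100.0 * tp / (tp + fp),                   # correct among predicted interface residues
        "Spe": 100.0 * tn / (tn + fp),                   # correct among non-interface residues
        "Acc": 100.0 * (tp + tn) / (tp + fp + tn + fn),  # correct among all residues
    }

# Hypothetical residue counts for one protein chain.
print(binding_site_metrics(tp=50, fp=50, tn=150, fn=50))
```

On pooled counts, accuracy is a prevalence-weighted average of Sen and Spe, so when interface residues are a minority of each chain, Acc sits much closer to Spe than to Sen.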
Affiliation(s)
- Hugues Ripoche
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Elodie Laine
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Nicoletta Ceres
- CNRS UMR 5086/University Lyon I, Institut de Biologie et Chimie des Proteines, 69367 Lyon, France
- Alessandra Carbone
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Institut Universitaire de France, 75005 Paris, France
59
Mangiacotti M, Fumagalli M, Scali S, Zuffi MAL, Cagnone M, Salvini R, Sacchi R. Inter- and intra-population variability of the protein content of femoral gland secretions from a lacertid lizard. Curr Zool 2016; 63:657-665. [PMID: 29492027 PMCID: PMC5804213 DOI: 10.1093/cz/zow113] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/12/2016] [Accepted: 11/18/2016] [Indexed: 11/30/2022]
Abstract
Femoral glands of male lizards produce waxy secretions that are involved in inter- and intraspecific chemical communication. The main components of these secretions are proteins and lipids, the latter having been extensively studied and already associated with male quality. By contrast, the composition and role of proteins are nearly unknown, the only available information coming from a few studies on iguanids. These studies concluded that proteins might have a communicative function; notably, they could signal individual identity. A generalization of these findings requires the extension of protein analysis to other lizard families, and the primary detection of some patterns of individual variability. Using the common wall lizard Podarcis muralis as a model species, the protein fraction of the femoral pore secretions was investigated to provide the first characterization of this component in a lacertid lizard and to explore its sources of variability, as a first step to support the hypothesized communicative role. Samples of proteins from femoral secretions were collected from 6 Italian populations and subjected to 1-dimensional electrophoresis. The binary vector of band presence/absence was used to define the individual profiles. The protein fraction was found to have a structured pattern, with both an individual and a population component. Although the former supports the potential communicative role of proteins, the latter offers a double interpretation, phylogenetic or environmental, even though the phylogenetic effect seems more likely given the climatic resemblance of the considered sites. Further studies are necessary to shed light on both these issues.
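The abstract encodes each lizard's electrophoretic profile as a binary vector of band presence/absence but does not state which dissimilarity measure was used. As an assumed example, the Jaccard distance is a common choice for such data because it ignores bands absent in both individuals; the profiles below are invented.

```python
import numpy as np

def jaccard_distance_matrix(profiles):
    """Pairwise Jaccard distances between binary band presence/absence rows."""
    P = np.asarray(profiles, dtype=bool)
    n = P.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = np.logical_and(P[i], P[j]).sum()  # bands present in both
            either = np.logical_or(P[i], P[j]).sum()   # bands present in at least one
            d = 0.0 if either == 0 else 1.0 - shared / either
            D[i, j] = D[j, i] = d
    return D

# Invented profiles: rows are individuals, columns are electrophoretic bands.
profiles = [[1, 1, 0, 0],
            [1, 0, 1, 0],
            [1, 1, 0, 0]]
print(jaccard_distance_matrix(profiles).round(3))
```

A distance matrix like this could then feed an ordination or a permutation test of individual versus population structure; which analysis the authors actually ran is not stated in the abstract.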
Affiliation(s)
- Marco Mangiacotti
- Department of Earth and Environmental Sciences, University of Pavia, Via Taramelli 24, Pavia I-27100, Italy
- Museo Civico di Storia Naturale di Milano, Corso Venezia 55, Milano I-20121, Italy
- Marco Fumagalli
- Department of Biology and Biotechnology "L. Spallanzani", Biochemistry Unit, University of Pavia, Via Taramelli 3, Pavia I-27100, Italy
- Stefano Scali
- Museo Civico di Storia Naturale di Milano, Corso Venezia 55, Milano I-20121, Italy
- Marco A L Zuffi
- Museo di Storia Naturale, Università di Pisa, Via Roma 79, Calci, Pisa I-56011, Italy
- Maddalena Cagnone
- Department of Molecular Medicine, Biochemistry Unit, University of Pavia, Via Taramelli 3, Pavia I-27100, Italy
- Roberta Salvini
- Department of Molecular Medicine, Biochemistry Unit, University of Pavia, Via Taramelli 3, Pavia I-27100, Italy
- Roberto Sacchi
- Department of Earth and Environmental Sciences, University of Pavia, Via Taramelli 24, Pavia I-27100, Italy
60
Alcaraz LD, Hernández AM, Peimbert M. Exploring the cockatiel (Nymphicus hollandicus) fecal microbiome, bacterial inhabitants of a worldwide pet. PeerJ 2016; 4:e2837. [PMID: 28028487 PMCID: PMC5183021 DOI: 10.7717/peerj.2837] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/14/2016] [Accepted: 11/28/2016] [Indexed: 01/22/2023]
Abstract
BACKGROUND Cockatiels (Nymphicus hollandicus) were originally endemic to Australia; now they are popular pets with a global distribution. It is now possible to conduct detailed molecular studies on cultivable and uncultivable bacteria that are part of the intestinal microbiome of healthy animals. These studies show that bacteria are an essential part of the metabolic capacity of animals. There are few studies on bird microbiomes and, to the best of our knowledge, this is the first report on the cockatiel microbiome. METHODS In this paper, we analyzed the gut microbiome from fecal samples of three healthy adult cockatiels by massive sequencing of the 16S rRNA gene. Additionally, we compared the cockatiel fecal microbiomes with those of other bird species, including poultry and wild birds. RESULTS The vast majority of the bacteria found in cockatiels were Firmicutes, while Proteobacteria and Bacteroidetes were poorly represented. A total of 19,280 different OTUs were detected, of which 8,072 belonged to the Erysipelotrichaceae family. DISCUSSION It is relevant to study the microbiomes of cockatiels owing to their wide geographic distribution and close human contact. This study serves as a reference for cockatiel bacterial diversity. Despite the large OTU numbers, the diversity is not even and is dominated by Firmicutes of the Erysipelotrichaceae family. Cockatiels and other wild birds are almost depleted of Bacteroidetes, which happen to be abundant in poultry-related birds, and this is probably associated with the intensive human manipulation of poultry bird diets. Some probable pathogenic bacteria, such as Clostridium and Serratia, appeared to be frequent inhabitants of the fecal microbiome of cockatiels, whereas other potential pathogens were not detected.
Affiliation(s)
- Luis David Alcaraz
- Laboratorio Nacional de Ciencias de la Sostenibilidad, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Apolinar M. Hernández
- Departamento de Ciencias Naturales, Unidad Cuajimalpa, Universidad Autónoma Metropolitana, Mexico City, Mexico
- Mariana Peimbert
- Departamento de Ciencias Naturales, Unidad Cuajimalpa, Universidad Autónoma Metropolitana, Mexico City, Mexico
61
Seuradge BJ, Oelbermann M, Neufeld JD. Depth-dependent influence of different land-use systems on bacterial biogeography. FEMS Microbiol Ecol 2016; 93:fiw239. [PMID: 27915285 DOI: 10.1093/femsec/fiw239] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Revised: 04/15/2016] [Accepted: 11/29/2016] [Indexed: 11/12/2022]
Abstract
Despite progress in understanding microbial biogeography of surface soils, few studies have investigated depth-dependent distributions of terrestrial microorganisms in subsoils. We leveraged high-throughput sequencing of 16S rRNA genes obtained from soils collected from the RARE: Charitable Research Reserve (Cambridge, ON, Canada) to assess the influence of depth on bacterial communities across various land-use types. Although bacterial communities were strongly influenced by depth across all sites, the magnitude of this influence was variable and demonstrated that land-use attributes also played a significant role in shaping soil bacterial communities. Soil pH exhibited a large gradient across samples and strongly influenced shifts in bacterial communities with depth and across different land-use systems, especially considering that physicochemical conditions showed generally consistent trends with depth. We observed significant (p ≤ 0.001) and strongly correlated taxa with depth and pH, with a strong predominance of positively depth-correlated OTUs without cultured representatives. These findings highlight the importance of depth in soil biogeographical surveys and that subsurface soils harbour understudied bacterial members with potentially unique and important functions in deeper soil horizons that remain to be characterized.
Affiliation(s)
- Brent J Seuradge
- Department of Biology, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Maren Oelbermann
- School of Environment, Resources and Sustainability, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Josh D Neufeld
- Department of Biology, University of Waterloo, Waterloo, ON N2L 3G1, Canada
62
Zhou W, Zhang Y, Li YH, Wang S, Zhang JJ, Zhang CX, Zhang ZS. Investigating dysregulated pathways in Staphylococcus aureus (SA) exposed macrophages based on pathway interaction network. Comput Biol Chem 2016; 66:21-25. [PMID: 27866052 DOI: 10.1016/j.compbiolchem.2016.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2016] [Revised: 11/02/2016] [Accepted: 11/11/2016] [Indexed: 12/11/2022]
Abstract
OBJECTIVE This work aimed to identify dysregulated pathways for Staphylococcus aureus (SA) exposed macrophages based on pathway interaction network (PIN). METHODS The inference of dysregulated pathways comprised four steps: preparing gene expression data, protein-protein interaction (PPI) data and pathway data; constructing a PIN dependent on the data and Pearson correlation coefficient (PCC); selecting a seed pathway from the PIN by computing an activity score for each pathway according to the principal component analysis (PCA) method; and investigating dysregulated pathways in a minimum set of pathways (MSP) utilizing the seed pathway and the area under the receiver operating characteristics curve (AUC) index implemented in a support vector machines (SVM) model. RESULTS A total of 20,545 genes, 449,833 interactions and 1189 pathways were obtained in the gene expression data, PPI data and pathway data, respectively. The PIN consisted of 8388 interactions and 1189 nodes, and Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins was identified as the seed pathway. Finally, 15 dysregulated pathways in MSP (AUC=0.999) were obtained for SA infected samples, such as Respiratory electron transport and DNA Replication. CONCLUSIONS We have identified 15 dysregulated pathways for SA infected macrophages based on PIN. The findings might provide potential biomarkers for early detection and therapy of SA infection, and give insights into the molecular mechanism underlying SA infections. However, how these dysregulated pathways work together still needs to be studied.
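Two ingredients of this pipeline, a PCA-based activity score per pathway and Pearson correlation between pathways, can be sketched with NumPy. The abstract does not say exactly how the paper combines them with the PPI data, so the scoring rule (projection on the first principal component), the edge rule (|PCC| at or above a threshold), the function names and the toy data are all assumptions.

```python
import numpy as np

def pathway_activity(expr, gene_idx):
    """Per-sample activity score: projection onto the first principal component
    of the pathway's genes (expr rows = genes, columns = samples).
    The sign of a principal component is arbitrary."""
    sub = np.asarray(expr, dtype=float)[gene_idx, :]
    centered = sub - sub.mean(axis=1, keepdims=True)  # center each gene across samples
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] @ centered  # PC1 gene loadings applied to every sample

def pathway_network(expr, pathways, threshold):
    """Connect two pathways when |Pearson r| of their activity scores >= threshold."""
    names = list(pathways)
    scores = np.array([pathway_activity(expr, pathways[n]) for n in names])
    r = np.corrcoef(scores)
    return [(names[i], names[j], float(r[i, j]))
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(r[i, j]) >= threshold]

base = np.arange(1.0, 7.0)        # a trend across six samples
alt = np.array([1.0, -1.0] * 3)   # an unrelated alternating pattern
expr = np.vstack([base, 2 * base, base + 1, 3 * base, alt, 2 * alt])
pathways = {"P1": [0, 1], "P2": [2, 3], "P3": [4, 5]}
print(pathway_network(expr, pathways, threshold=0.8))
```

Taking the absolute correlation sidesteps the arbitrary sign of each principal component. The later stage described in the abstract, choosing a seed pathway and growing a minimum pathway set with SVM-based AUC, is omitted.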
Affiliation(s)
- Wei Zhou
- College of Food Science and Technology, Agricultural University of Hebei, Baoding, 071000, Hebei Province, China; Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Yan Zhang
- Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Yue-Hua Li
- Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Shuang Wang
- Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Jing-Jing Zhang
- Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Cui-Xia Zhang
- Department of Biological Safety Inspection, Hebei Food Inspection and Research Institute, Shijiazhuang, 050091, Hebei Province, China
- Zhi-Sheng Zhang
- College of Food Science and Technology, Agricultural University of Hebei, Baoding, 071000, Hebei Province, China.
63
Brbić M, Piškorec M, Vidulin V, Kriško A, Šmuc T, Supek F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res 2016; 44:10074-10090. [PMID: 27915291 PMCID: PMC5137458 DOI: 10.1093/nar/gkw964] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/12/2016] [Revised: 09/21/2016] [Accepted: 10/11/2016] [Indexed: 12/31/2022]
Abstract
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations were assigned by a computational pipeline that associates microbes with phenotypes by text-mining the scientific literature and the broader World Wide Web, while also being able to define novel concepts from unstructured text. Moreover, the ProTraits pipeline assigns phenotypes by drawing extensively on comparative genomics, capturing patterns in gene repertoires, codon usage biases, proteome composition and co-occurrence in metagenomes. Notably, we find that gene synteny is highly predictive of many phenotypes, and highlight examples of gene neighborhoods associated with spore-forming ability. A global analysis of trait interrelatedness outlined clusters in the microbial phenotype network, suggesting common genetic underpinnings. Our extended set of phenotype annotations allows detection of 57 088 high confidence gene-trait links, which recover many known associations involving sporulation, flagella, catalase activity, aerobicity, photosynthesis and other traits. Over 99% of the commonly occurring gene families are involved in genetic interactions conditional on at least one phenotype, suggesting that epistasis has a major role in shaping microbial gene content.
Affiliation(s)
- Maria Brbić
- Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Matija Piškorec
- Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Vedrana Vidulin
- Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Anita Kriško
- Mediterranean Institute of Life Sciences, 21000 Split, Croatia
- Tomislav Šmuc
- Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Fran Supek
- Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia; EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
64
Du C, Pan P, Jiang Y, Zhang Q, Bao J, Liu C. Microarray data analysis to identify crucial genes regulated by CEBPB in human SNB19 glioma cells. World J Surg Oncol 2016; 14:258. [PMID: 27716259 PMCID: PMC5054626 DOI: 10.1186/s12957-016-0997-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/07/2016] [Accepted: 08/30/2016] [Indexed: 01/18/2023]
Abstract
Background Glioma is one of the most common primary malignancies of the brain or spine. The transcription factor (TF) CCAAT/enhancer binding protein beta (CEBPB) is important for maintaining tumor-initiating capacity and invasive ability. To investigate the regulatory mechanism of CEBPB in glioma, the microarray dataset GSE47352 was analyzed. Methods GSE47352 was downloaded from Gene Expression Omnibus and includes three samples of SNB19 human glioma cells transduced with non-target control small hairpin RNA (shRNA) lentiviral vectors for 72 h (normal glioma cells) and three samples of SNB19 human glioma cells transduced with CEBPB shRNA lentiviral vectors for 72 h (CEBPB-silenced glioma cells). Differentially expressed genes (DEGs) were screened using the limma package and then annotated. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was then used to perform enrichment analysis of the DEGs. Furthermore, the protein-protein interaction (PPI) network and the transcriptional regulatory network were constructed using Cytoscape software. Results A total of 529 DEGs were identified in the normal glioma cells compared with the CEBPB-silenced glioma cells, including 336 up-regulated and 193 down-regulated genes. The significantly enriched pathways included the chemokine signaling pathway (involving CCL2), focal adhesion (involving THBS1 and THBS2), the TGF-beta signaling pathway (involving THBS1, THBS2, SMAD5, and SMAD6) and chronic myeloid leukemia (involving TGFBR2 and CCND1). In the PPI network, CCND1 (degree = 29) and CCL2 (degree = 12) were hub nodes. Additionally, CEBPB and TCF12 might function in glioma by targeting other genes (CEBPB → TCF12, CEBPB → TGFBR2, and TCF12 → TGFBR2). Conclusions CEBPB might act in glioma by regulating CCL2, CCND1, THBS1, THBS2, SMAD5, SMAD6, TGFBR2, and TCF12.
Affiliation(s)
- Chenghua Du
- Department of Neurosurgery, The Affiliated Hospital of Inner Mongolia University for the Nationalities, Huolinhe Street No.1742, Tongliao, Inner Mongolia, 028007, China
- Pan Pan
- Department of Hepatology, Tongliao City Hospital for Infectious Diseases, Tongliao, Inner Mongolia, 028007, China
- Yan Jiang
- Department of Neurosurgery, The Affiliated Hospital of Inner Mongolia University for the Nationalities, Huolinhe Street No.1742, Tongliao, Inner Mongolia, 028007, China
- Qiuli Zhang
- Department of Neurosurgery, The Affiliated Hospital of Inner Mongolia University for the Nationalities, Huolinhe Street No.1742, Tongliao, Inner Mongolia, 028007, China
- Jinsuo Bao
- Department of Neurosurgery, The Affiliated Hospital of Inner Mongolia University for the Nationalities, Huolinhe Street No.1742, Tongliao, Inner Mongolia, 028007, China
- Chang Liu
- Department of Neurosurgery, The Affiliated Hospital of Inner Mongolia University for the Nationalities, Huolinhe Street No.1742, Tongliao, Inner Mongolia, 028007, China
65
Redwan RM, Saidin A, Kumar SV. The draft genome of MD-2 pineapple using hybrid error correction of long reads. DNA Res 2016; 23:427-439. [PMID: 27374615 PMCID: PMC5066169 DOI: 10.1093/dnares/dsw026] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/11/2016] [Accepted: 05/18/2016] [Indexed: 11/12/2022]
Abstract
The introduction of the elite pineapple variety MD-2 has caused a significant market shift in the pineapple industry. Better productivity, an overall increase in fruit quality and taste, resilience to chilled storage and resistance to internal browning are among the key advantages of MD-2 over its predecessor, Smooth Cayenne. Here, we present the genome sequence of the MD-2 pineapple (Ananas comosus (L.) Merr.) obtained by hybrid sequencing on two highly reputable platforms, i.e. PacBio long reads and accurate Illumina short reads. Our draft genome achieved 99.6% genome coverage with 27,017 predicted protein-coding genes, and 45.21% of the genome was identified as repetitive elements. Furthermore, differential expression analysis of a ripening RNA-Seq library of pineapple fruits revealed ethylene-related transcripts, believed to be involved in regulating non-climacteric pineapple fruit ripening. The MD-2 pineapple draft genome serves as an example of how a complex heterozygous genome is amenable to whole-genome sequencing using a hybrid technology that is both economical and accurate. The genome will make genomic applications more feasible as a means of understanding complex biological processes specific to pineapple.
Affiliation(s)
- Raimi M. Redwan
- Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia
- Akzam Saidin
- Novocraft Technology Sdn. Bhd., C-23A-05, Jalan 19/1, Seksyen 19, 46300 Petaling Jaya, Selangor, Malaysia
- S. Vijay Kumar
- Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia
66
67
Mansouri A, Cregut M, Abbes C, Durand MJ, Landoulsi A, Thouand G. The Environmental Issues of DDT Pollution and Bioremediation: a Multidisciplinary Review. Appl Biochem Biotechnol 2016; 181:309-339. [PMID: 27591882 DOI: 10.1007/s12010-016-2214-5] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 03/23/2016] [Accepted: 08/12/2016] [Indexed: 12/01/2022]
Abstract
DDT (1,1,1-trichloro-2,2-bis(4-chlorophenyl)ethane) is probably the best-known and most useful organochlorine insecticide in the world. It was used from 1945 for agricultural purposes and from 1955 for the control of vector-borne diseases such as malaria, until it was banned in most countries by the Stockholm Convention for ecological reasons. However, in 2006 the World Health Organization allowed its reintroduction, solely for the control of vector-borne diseases in some tropical countries. Owing to its physicochemical properties, especially its persistence (a half-life of up to 30 years), DDT has been linked to several health and social problems arising from its accumulation in the environment and its biomagnification in living organisms. This manuscript compiles a multidisciplinary review that primarily evaluates (i) the worldwide contamination by DDT and (ii) its (eco)toxicological impact on living organisms. Secondly, several approaches to the bioremediation of DDT-contaminated environments are discussed. To this end, reports on the DDT biodegradation capabilities of microorganisms and on ways to enhance bioremediation strategies for removing DDT are presented. The existing strategies for DDT bioremediation are evaluated in terms of their efficiency and their limitations in combating this contaminant. Finally, emerging approaches and technological bottlenecks in promoting DDT bioremediation are discussed.
Affiliation(s)
- Ahlem Mansouri
- University of Nantes, UMR CNRS 6144 GEPEA, CBAC group, 18 Bvd Gaston Defferre, 85000, La Roche sur Yon, France; Faculty of Sciences of Bizerte, Laboratory of Biochemistry and Molecular Biology, University of Carthage, Zarzouna, 7021, Tunisia
- Mickael Cregut
- University of Nantes, UMR CNRS 6144 GEPEA, CBAC group, 18 Bvd Gaston Defferre, 85000, La Roche sur Yon, France
- Chiraz Abbes
- Faculty of Sciences of Bizerte, Laboratory of Biochemistry and Molecular Biology, University of Carthage, Zarzouna, 7021, Tunisia
- Marie-Jose Durand
- University of Nantes, UMR CNRS 6144 GEPEA, CBAC group, 18 Bvd Gaston Defferre, 85000, La Roche sur Yon, France
- Ahmed Landoulsi
- Faculty of Sciences of Bizerte, Laboratory of Biochemistry and Molecular Biology, University of Carthage, Zarzouna, 7021, Tunisia
- Gerald Thouand
- University of Nantes, UMR CNRS 6144 GEPEA, CBAC group, 18 Bvd Gaston Defferre, 85000, La Roche sur Yon, France
68
Zhang L, Qiao M, Gao H, Hu B, Tan H, Zhou X, Li CM. Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model and experimental optimization/validation. Nanoscale 2016; 8:14877-87. [PMID: 27460959 PMCID: PMC10150920 DOI: 10.1039/c6nr01637e] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 05/03/2023]
Abstract
Herein, we have developed a novel approach to investigate the mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model, experimental optimization of key parameters and experimental data validation of the predictive power of the model. The advantages of this study are that the impact of mechanical stimulation on bone regeneration in a porous biodegradable CaP scaffold is considered, experimental design is used to investigate the optimal combination of growth factors loaded on the porous biodegradable CaP scaffold to promote bone regeneration and the training, testing and analysis of the model are carried out by using experimental data, a data-mining algorithm and related sensitivity analysis. The results reveal that mechanical stimulation has a great impact on bone regeneration in a porous biodegradable CaP scaffold and the optimal combination of growth factors that are encapsulated in nanospheres and loaded into porous biodegradable CaP scaffolds layer-by-layer can effectively promote bone regeneration. Furthermore, the model is robust and able to predict the development of bone regeneration under specified conditions.
Affiliation(s)
- Le Zhang
- College of Computer and Information Science, Southwest University, Chongqing 400715, P. R. China.
69
Barsacchi M, Novoa EM, Kellis M, Bechini A. SwiSpot: modeling riboswitches by spotting out switching sequences. Bioinformatics 2016; 32:3252-3259. [PMID: 27378291 DOI: 10.1093/bioinformatics/btw401] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/03/2015] [Accepted: 06/20/2016] [Indexed: 12/31/2022]
Abstract
MOTIVATION Riboswitches are cis-regulatory elements in mRNA, mostly found in Bacteria, that exhibit two main secondary structure conformations. One of them prevents the gene from being expressed, whereas the other allows its expression, and this switching process is typically driven by the presence of a specific ligand. Although a handful of riboswitches are known, our knowledge in this field has been greatly limited by our inability to identify their alternate structures from their sequences. Indeed, current methods cannot predict the presence of the two functionally distinct conformations from the plain RNA nucleotide sequence alone. Whether this is possible, in which cases, and with what prediction accuracy are currently open questions. RESULTS Here we show that the two alternate secondary structures of riboswitches can be accurately predicted once the 'switching sequence' of the riboswitch has been properly identified. The proposed SwiSpot approach can identify the switching sequence inside a putative, complete riboswitch sequence on the basis of pairing behaviors, which are evaluated on appropriate sets of configurations. Moreover, it can model the switching behavior of riboswitches whose generated ensemble covers both alternate configurations. Beyond structural predictions, the approach can also be paired with homology-based riboswitch searches. AVAILABILITY AND IMPLEMENTATION SwiSpot software, along with the reference dataset files, is available at: http://www.iet.unipi.it/a.bechini/swispot/ SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT a.bechini@ing.unipi.it.
Affiliation(s)
- Marco Barsacchi
- Department of Information Engineering, University of Pisa, Largo L. Lazzarino, Pisa, IT 56122, Italy
- Eva Maria Novoa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA; The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA; The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Alessio Bechini
- Department of Information Engineering, University of Pisa, Largo L. Lazzarino, Pisa, IT 56122, Italy
70
Whittemore K, Johnston SA, Sykes K, Shen L. A General Method to Discover Epitopes from Sera. PLoS One 2016; 11:e0157462. [PMID: 27300760 PMCID: PMC4907474 DOI: 10.1371/journal.pone.0157462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/16/2015] [Accepted: 05/01/2016] [Indexed: 11/19/2022]
Abstract
Antigen-antibody complexes are central players in an effective immune response. However, finding those interactions relevant to a particular disease state can be arduous. Nonetheless, many paths to discovery have been explored, since deciphering these interactions can greatly facilitate the development of new diagnostics, therapeutics, and vaccines. In silico B cell epitope mapping approaches have been widely pursued, though with inconsistent success. Antibody mixtures in immune sera have been used as handles for biologically relevant antigens, but these and other experimental approaches have proven resource-intensive and time-consuming. In addition, these methods are often tailored to individual diseases or a specific proteome, rather than providing a universal platform. Most of these methods cannot identify a specific antibody's epitopes on unknown antigens, such as unannotated neoantigens in cancer. Alternatively, a peptide library composed of sequences unrestricted by naturally occurring protein space enables a universal search for mimotopes of an antibody's epitope. Here we demonstrate the utility of such a non-natural random-sequence library of 10,000 peptides, physically addressed on a microarray, for mimotope discovery without sequence information on the specific antigen. The peptide arrays were probed with serum from an antigen-immunized rabbit, or alternatively with serum pre-absorbed with the same immunizing antigen. With this positive and negative screening scheme, we identified library peptides that act as mimotopes of the antigen. These unique library peptides were successfully used to isolate antigen-specific antibodies from complete immune serum. Sequence analysis of these peptides revealed the epitopes in the immunizing antigen. We present this as an inexpensive, efficient method for identifying mimotopes of any antibody's targets. These mimotopes should be useful in defining both components of the antigen-antibody complex.
Affiliation(s)
- Kurt Whittemore
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85287, United States of America
- Stephen Albert Johnston
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85287, United States of America
- Kathryn Sykes
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85287, United States of America
- Luhui Shen
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85287, United States of America
71
Singh S, Kaur S, Goel N. A Review of Computational Intelligence Methods for Eukaryotic Promoter Prediction. Nucleosides Nucleotides Nucleic Acids 2016; 34:449-62. [PMID: 26158565 DOI: 10.1080/15257770.2015.1013126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Abstract
In past decades, gene prediction in DNA sequences has attracted the attention of many researchers, but the complex structure of DNA makes it extremely difficult to locate genes correctly. DNA contains a large number of regulatory regions that help in the transcription of a gene. The promoter is one such region, and finding its location is a challenging problem. Various computational methods for promoter prediction have been developed over the past few years. This paper reviews these promoter prediction methods. Several difficulties and pitfalls encountered by these methods are also detailed, along with future research directions.
Affiliation(s)
- Shailendra Singh
- Department of Computer Science and Engineering, PEC University of Technology, Chandigarh, India
72
Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016; 32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022]
Abstract
MOTIVATION Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. Meanwhile, recent computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling in continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly rather than in the experimentally suggested stepwise manner. RESULTS Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP.
AVAILABILITY AND IMPLEMENTATION Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science; Informatics Institute, University of Missouri, Columbia, MO 65211, USA
73
Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J. Exploiting single-cell expression to characterize co-expression replicability. Genome Biol 2016; 17:101. [PMID: 27165153 PMCID: PMC4862082 DOI: 10.1186/s13059-016-0964-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 02/29/2016] [Accepted: 04/25/2016] [Indexed: 01/25/2023]
Abstract
Background Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks. Results We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that also occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. We show this occurs generally through re-analysis of the BrainSpan RNA-seq data. Conclusions Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework. 
Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-0964-6) contains supplementary material, which is available to authorized users.
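Neighbor voting (guilt by association) scores each gene by the weighted fraction of its network neighbors that carry a function label; in cross-validation some labels are hidden and performance is summarized as an AUROC. The sketch below is a minimal pure-Python illustration under assumed choices (3 folds, degree-normalized scores, held-out positives ranked against all non-annotated genes, AUROC computed by pairwise comparison). It is not the authors' code, and the function names are hypothetical.

```python
import random


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def neighbor_voting_auroc(weights, labels, n_folds=3, seed=0):
    """Mean cross-validated AUROC of neighbor voting for one gene set.

    weights: symmetric n x n list of lists of co-expression edge weights
    labels:  list of 0/1 flags marking genes annotated to the function
    """
    n = len(labels)
    positives = [i for i in range(n) if labels[i]]
    negatives = [i for i in range(n) if not labels[i]]
    if len(positives) < n_folds or not negatives:
        raise ValueError("need at least n_folds positives and one negative")
    random.Random(seed).shuffle(positives)
    folds = [positives[k::n_folds] for k in range(n_folds)]
    fold_aurocs = []
    for held_out in folds:
        hidden = set(held_out)
        # Training labels: known positives minus the held-out fold.
        train = [1 if labels[i] and i not in hidden else 0 for i in range(n)]
        scores = []
        for i in range(n):
            degree = sum(weights[i])
            votes = sum(w * t for w, t in zip(weights[i], train))
            scores.append(votes / degree if degree else 0.0)
        # Held-out positives are ranked against all non-annotated genes;
        # training positives are excluded from the ranking.
        fold_aurocs.append(auroc([scores[i] for i in held_out],
                                 [scores[i] for i in negatives]))
    return sum(fold_aurocs) / len(fold_aurocs)


if __name__ == "__main__":
    # Toy network: two tightly connected modules with weak cross-links.
    W = [[0, 1, 1, 0.1, 0.1, 0.1],
         [1, 0, 1, 0.1, 0.1, 0.1],
         [1, 1, 0, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0, 1, 1],
         [0.1, 0.1, 0.1, 1, 0, 1],
         [0.1, 0.1, 0.1, 1, 1, 0]]
    labels = [1, 1, 1, 0, 0, 0]  # the function is confined to the first module
    print(neighbor_voting_auroc(W, labels))  # prints 1.0
```

When labels align with network modules, each held-out gene is recovered from its neighbors and the AUROC approaches 1; labels scattered across modules push it toward 0.5.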
Affiliation(s)
- Megan Crow
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Anirban Paul
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Sara Ballouz
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Z Josh Huang
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Jesse Gillis
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
74
Pulido-Tamayo S, Duitama J, Marchal K. EXPLoRA-web: linkage analysis of quantitative trait loci using bulk segregant analysis. Nucleic Acids Res 2016; 44:W142-6. [PMID: 27105844 PMCID: PMC4987886 DOI: 10.1093/nar/gkw298] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/29/2016] [Accepted: 04/11/2016] [Indexed: 11/13/2022]
Abstract
Identification of genomic regions associated with a phenotype of interest is a fundamental step toward solving questions in biology and improving industrial research. Bulk segregant analysis (BSA) combined with high-throughput sequencing is a technique to efficiently identify these genomic regions associated with a trait of interest. However, distinguishing true from spuriously linked genomic regions and accurately delineating the genomic positions of these truly linked regions requires complex statistical models, currently implemented in software tools that are generally difficult for non-expert users to operate. To facilitate the exploration and analysis of data generated by bulked segregant analysis, we present EXPLoRA-web, a web service wrapped around our previously published algorithm EXPLoRA, which exploits linkage disequilibrium to increase the power and accuracy of quantitative trait loci identification in BSA analysis. EXPLoRA-web provides a user-friendly interface that enables easy data upload and parallel processing of different parameter configurations. Results are provided graphically and as BED and/or text files, and the input is expected in widely used formats, enabling straightforward BSA data analysis. The web server is available at http://bioinformatics.intec.ugent.be/explora-web/.
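Bulk segregant analysis contrasts allele frequencies between pools of individuals with extreme phenotypes: loci linked to the trait show frequencies that diverge between the bulks, while unlinked loci stay near zero difference. EXPLoRA itself models linkage disequilibrium and is not reproduced here. As a hedged illustration of the underlying signal only, the sketch below computes the simpler, widely used per-SNP allele-frequency difference between a high and a low bulk, smoothed over neighboring SNPs; the input tuple layout, window size and function name are assumptions.

```python
def delta_allele_frequency(snps, window=5):
    """Per-SNP difference in alternative-allele frequency between two bulks,
    smoothed with a centered moving average over `window` neighboring SNPs.

    snps: list of (position, alt_high, depth_high, alt_low, depth_low) tuples,
          sorted by position (hypothetical input layout)
    Returns a list of (position, smoothed_delta) tuples.
    """
    deltas = []
    for pos, alt_hi, depth_hi, alt_lo, depth_lo in snps:
        if depth_hi == 0 or depth_lo == 0:
            continue  # skip SNPs without coverage in one of the bulks
        deltas.append((pos, alt_hi / depth_hi - alt_lo / depth_lo))
    half = window // 2
    smoothed = []
    for i, (pos, _) in enumerate(deltas):
        # The window is truncated at the ends of the SNP list.
        start, stop = max(0, i - half), min(len(deltas), i + half + 1)
        values = [d for _, d in deltas[start:stop]]
        smoothed.append((pos, sum(values) / len(values)))
    return smoothed


if __name__ == "__main__":
    snps = [(100, 10, 20, 10, 20), (200, 20, 20, 0, 20), (300, 10, 20, 10, 20)]
    print(delta_allele_frequency(snps, window=3))
    # → [(100, 0.5), (200, 0.3333333333333333), (300, 0.5)]
```

Smoothing trades resolution for robustness to sequencing noise; a model that exploits linkage disequilibrium, as EXPLoRA is described to do, can delineate linked regions more sharply than a fixed window.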
Affiliation(s)
- Sergio Pulido-Tamayo
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Jorge Duitama
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), 763537 Cali, Colombia
- Kathleen Marchal
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa
75
Lv Y, Liu Y, Zhao H. mInDel: a high-throughput and efficient pipeline for genome-wide InDel marker development. BMC Genomics 2016; 17:290. [PMID: 27079510 PMCID: PMC4832496 DOI: 10.1186/s12864-016-2614-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/20/2015] [Accepted: 04/06/2016] [Indexed: 11/29/2022]
Abstract
Background Rich in genetic information and cost-effective to genotype, the Insertion-Deletion (InDel) molecular marker system is an important tool for studies in genetics and genomics and for marker-assisted breeding. The advent of next-generation sequencing (NGS) revolutionized the speed and throughput of sequence data generation and enabled genome-wide identification of insertion and deletion variation. However, current NGS-based InDel mining tools, such as Samtools, GATK and Atlas2, all rely on a reference genome for variant calling, which hinders their application to unsequenced organisms, and their output of short InDels compromises their use on gel-based genotyping platforms. To address these issues, an enhanced platform is needed to identify longer InDels and develop markers in the absence of a reference genome. Results Here we present mInDel (multiple InDel), a next-generation variant calling tool specifically designed for InDel marker discovery. By taking in raw sequence reads and assembling them into contigs de novo, this software identifies InDel polymorphisms using a sliding-window alignment of the assembled contigs, which gives it a unique advantage when a reference genome is unavailable. By providing an option to combine multiple discovered InDels in its output, mInDel is amenable to gel-based genotyping platforms, where markers with large polymorphisms are preferred. We demonstrated the usability and performance of this software through a case study using a set of maize NGS data, and experimentally validated the accuracy of markers generated by mInDel. Conclusions mInDel is a novel and practical tool that enables rapid genome-wide InDel marker discovery. Its independence from a reference genome and its flexibility with downstream genotyping platforms will allow a broad range of applications across genetics research and plant breeding. The mInDel pipeline is freely available at www.github.com/lyd0527/mInDel.
Affiliation(s)
- Yuanda Lv
- Institute of Agricultural Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Yuhe Liu
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Han Zhao
- Institute of Agricultural Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
76
Kuznetsov IB. Identification of non-random sequence properties in groups of signature peptides obtained in random sequence peptide microarray experiments. Biopolymers 2016; 106:318-29. [PMID: 27037995 DOI: 10.1002/bip.22845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2015] [Revised: 02/16/2016] [Accepted: 03/28/2016] [Indexed: 11/09/2022]
Abstract
Immunosignaturing is an emerging experimental technique that uses random sequence peptide microarrays to detect antibodies produced by the immune system in response to a particular disease. Two important questions regarding immunosignaturing are "Do microarray peptides that exhibit a strong affinity to a given type of antibodies share common sequence properties?" and "If so, what are those properties?" In this work, three statistical tests designed to detect non-random patterns in the amino acid makeup of a group of microarray peptides are presented. One test detects patterns of significantly biased amino acid usage, whereas the other two detect patterns of significant bias in the biochemical properties. These tests do not require a large number of peptides per group. The tests were applied to analyze 19 groups of peptides identified in immunosignaturing experiments as being specific for antibodies produced in response to various types of cancer and other diseases. The positional distribution of the biochemical properties of the amino acids in these 19 peptide groups was also studied. Remarkably, despite the random nature of the sequence libraries used to design the microarrays, a unique group-specific non-random pattern was identified in the majority of the peptide groups studied. © 2016 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 318-329, 2016.
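The abstract does not spell out the form of the three tests. As a hedged illustration of the general idea behind the first one (detecting biased amino-acid usage in a small peptide group), the sketch below uses a Monte Carlo test against the library's own composition: random groups of the same size are drawn from the library, and the p-value is the fraction that deviate from the background at least as much as the observed group. The chi-square-style statistic, the resampling scheme and the function names are assumptions, not the author's method.

```python
import random
from collections import Counter


def composition_distance(peptides, background):
    """Chi-square-style distance between a group's residue counts and the
    counts expected from background frequencies (residues absent from the
    background are ignored)."""
    counts = Counter(aa for peptide in peptides for aa in peptide)
    total = sum(counts.values())
    distance = 0.0
    for aa, freq in background.items():
        expected = freq * total
        if expected > 0:
            distance += (counts.get(aa, 0) - expected) ** 2 / expected
    return distance


def composition_bias_pvalue(group, library, n_samples=2000, seed=0):
    """Monte Carlo p-value: how often a random same-size group drawn from the
    library deviates from the library composition at least as much as `group`."""
    library_counts = Counter(aa for peptide in library for aa in peptide)
    library_total = sum(library_counts.values())
    background = {aa: c / library_total for aa, c in library_counts.items()}
    observed = composition_distance(group, background)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(library, len(group))
        if composition_distance(sample, background) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    library = ["".join(rng.choice(amino_acids) for _ in range(12)) for _ in range(500)]
    tryptophan_rich = ["WWWWWWWWWWWW"] * 5  # hypothetical, strongly biased group
    print(composition_bias_pvalue(tryptophan_rich, library))  # well below 0.01
```

Resampling from the library itself, rather than assuming uniform residue frequencies, keeps the null distribution honest when the "random" library was not synthesized with perfectly even amino-acid usage, and it works for the small group sizes the abstract mentions.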
Affiliation(s)
- Igor B Kuznetsov
- Cancer Research Center and Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, 12144
77
Identification of repetitive units in protein structures with ReUPred. Amino Acids 2016; 48:1391-400. [PMID: 26898549 DOI: 10.1007/s00726-016-2187-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/22/2016] [Accepted: 01/23/2016] [Indexed: 01/02/2023]
Abstract
Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat (TR) proteins in many biological processes. A plethora of new repeat structures have also been solved. The recently published RepeatsDB provides information on TR proteins. However, a detailed structural characterization of repetitive elements is largely missing, as repeat unit annotation is manually curated and currently covers only 3 % of the bona fide TR proteins. Repeat Protein Unit Predictor (ReUPred) is a novel method for the fast automatic prediction of repeat units and repeat classification using an extensive Structure Repeat Unit Library (SRUL) derived from RepeatsDB. ReUPred uses an iterative structural search against the SRUL to find repetitive units. On a test set of solenoid proteins, ReUPred is able to correctly detect 92 % of the proteins. Unlike previous methods, it is also able to correctly classify solenoid repeats in 89 % of cases. It also outperforms two recent state-of-the-art methods for the repeat unit identification problem. The accurate prediction of repeat units increases the number of annotated repeat units by an order of magnitude compared to the sequence-based Pfam classification. ReUPred is implemented in Python for Linux and freely available from the URL: http://protein.bio.unipd.it/reupred/ .
78
Shivashankar N, Patil S, Bhosle A, Chandra N, Natarajan V. MS3ALIGN: an efficient molecular surface aligner using the topology of surface curvature. BMC Bioinformatics 2016; 17:26. [PMID: 26753741 PMCID: PMC4710026 DOI: 10.1186/s12859-015-0874-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/15/2015] [Accepted: 12/15/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Aligning similar molecular structures is an important step in the process of bio-molecular structure and function analysis. Molecular surfaces are simple representations of molecular structure that are easily constructed from various forms of molecular data such as 3D atomic coordinates (PDB) and Electron Microscopy (EM) data. Methods: We present a Multi-Scale Morse-Smale Molecular-Surface Alignment tool, MS3ALIGN, which aligns molecular surfaces based on significant protrusions on the molecular surface. The input is a pair of molecular surfaces represented as triangle meshes. A key advantage of MS3ALIGN is computational efficiency, which is achieved because it processes only a few carefully chosen protrusions on the molecular surface. Furthermore, the alignments are partial in nature and therefore allow inexact surfaces to be aligned. Results: The method is evaluated in four settings. First, we establish performance using known alignments with varying overlap and noise values. Second, we compare the method with SurfComp, an existing surface alignment method. We show that we are able to determine alignments reported by SurfComp, as well as report relevant alignments not found by SurfComp. Third, we validate the ability of MS3ALIGN to determine alignments in the case of structurally dissimilar binding sites. Fourth, we demonstrate the ability of MS3ALIGN to align iso-surfaces derived from cryo-electron microscopy scans. Conclusions: We have presented an algorithm that aligns molecular surfaces based on the topology of surface curvature. A webserver and standalone software implementation of the algorithm are available at http://vgl.serc.iisc.ernet.in/ms3align. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0874-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nithin Shivashankar
- Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560012, India.
- Sonali Patil
- Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560012, India
- Amrisha Bhosle
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Nagasuma Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Vijay Natarajan
- Department of Computer Science and Automation, and Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, 560012, India.
79
Palmer A, Ovchinnikova E, Thuné M, Lavigne R, Guével B, Dyatlov A, Vitek O, Pineau C, Borén M, Alexandrov T. Using collective expert judgements to evaluate quality measures of mass spectrometry images. Bioinformatics 2015; 31:i375-84. [PMID: 26072506 PMCID: PMC4765867 DOI: 10.1093/bioinformatics/btv266] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023] Open
Abstract
Motivation: Imaging mass spectrometry (IMS) is a maturing technique of molecular imaging. Confidence in the reproducible quality of IMS data is essential for its integration into routine use. However, the predominant method for assessing quality is visual examination, a time-consuming, unstandardized and non-scalable approach. So far, the problem of assessing the quality has only been marginally addressed and existing measures do not account for the spatial information of IMS data. Importantly, no approach exists for unbiased evaluation of potential quality measures. Results: We propose a novel approach for evaluating potential measures by creating a gold-standard set using collective expert judgements upon which we evaluated image-based measures. To produce a gold standard, we engaged 80 IMS experts, each asked to rate the relative quality of the two images in each of 52 pairs of ion images from MALDI-TOF IMS datasets of rat brain coronal sections. Experts’ optional feedback on their expertise, the task and the survey showed that (i) they had diverse backgrounds and sufficient expertise, (ii) the task was properly understood, and (iii) the survey was comprehensible. A moderate inter-rater agreement was achieved, with Krippendorff’s alpha of 0.5. A gold-standard set of 634 pairs of images with accompanying ratings was constructed and showed a high agreement of 0.85. Eight families of potential measures with a range of parameters and statistical descriptors, giving 143 in total, were evaluated. Both signal-to-noise and spatial chaos-based measures performed well, with correlations of 0.7 to 0.9 with the gold-standard ratings. Moreover, we showed that a composite measure with linear coefficients (trained on the gold standard with regularized least squares optimization and lasso) achieved a strong linear correlation of 0.94 and an accuracy of 0.98 in predicting which image in a pair was of higher quality.
Availability and implementation: The anonymized data collected from the survey and the Matlab source code for data processing can be found at: https://github.com/alexandrovteam/IMS_quality. Contact: theodore.alexandrov@embl.de
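The 0.98 accuracy reported above is a pairwise score: how often a measure ranks the two images of a pair the same way the gold standard does. A minimal sketch of such a score follows, assuming each image is a flat list of ion intensities and each gold-standard pair carries a label of +1 (first image better) or -1 (second image better). `mean_intensity` is a hypothetical toy measure for illustration only, not one of the 143 measures evaluated in the paper, and counting ties as disagreements is a choice made for this sketch.

```python
from typing import Callable, Sequence, Tuple

# (image_a, image_b, +1 if image_a was rated better else -1)
Pair = Tuple[Sequence[float], Sequence[float], int]


def pairwise_accuracy(pairs: Sequence[Pair],
                      measure: Callable[[Sequence[float]], float]) -> float:
    """Fraction of pairs where the measure agrees with the gold-standard preference.

    A tie in the measure counts as a disagreement.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    agree = 0
    for image_a, image_b, preferred in pairs:
        diff = measure(image_a) - measure(image_b)
        predicted = 1 if diff > 0 else -1 if diff < 0 else 0
        if predicted == preferred:
            agree += 1
    return agree / len(pairs)


def mean_intensity(image: Sequence[float]) -> float:
    """Hypothetical toy quality measure: higher mean intensity is treated as better."""
    return sum(image) / len(image)
```

Any image-based measure with the same signature can be swapped in for `mean_intensity`; the gold-standard pairs stay fixed, which is what makes measures comparable under this kind of evaluation.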
Affiliation(s)
- Andrew Palmer
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Ekaterina Ovchinnikova
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Mikael Thuné
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Régis Lavigne
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Blandine Guével
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Andrey Dyatlov
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Olga Vitek
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Charles Pineau
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Mats Borén
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Theodore Alexandrov
- European Molecular Biology Laboratory, Heidelberg, Germany; Center for Industrial Mathematics, University of Bremen, Bremen, Germany; High Performance Humanoid Technologies Lab, Institute for Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany; Denator, Uppsala, Sweden; Protim, Inserm U1085 - Irset, University of Rennes 1, Rennes, France; SCiLS GmbH, Bremen, Germany; College of Computer and Information Science, Northeastern University, Boston, MA, USA; and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
80
Kersey PJ, Allen JE, Armean I, Boddu S, Bolt BJ, Carvalho-Silva D, Christensen M, Davis P, Falin LJ, Grabmueller C, Humphrey J, Kerhornou A, Khobova J, Aranganathan NK, Langridge N, Lowy E, McDowall MD, Maheswari U, Nuhn M, Ong CK, Overduin B, Paulini M, Pedro H, Perry E, Spudich G, Tapanari E, Walts B, Williams G, Tello-Ruiz M, Stein J, Wei S, Ware D, Bolser DM, Howe KL, Kulesha E, Lawson D, Maslen G, Staines DM. Ensembl Genomes 2016: more genomes, more complexity. Nucleic Acids Res 2015; 44:D574-80. [PMID: 26578574 PMCID: PMC4702859 DOI: 10.1093/nar/gkv1209] [Citation(s) in RCA: 371] [Impact Index Per Article: 41.2] [Received: 10/08/2015] [Accepted: 10/27/2015] [Indexed: 12/14/2022] Open
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Affiliation(s)
- Paul Julian Kersey
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- James E Allen
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Irina Armean
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Sanjay Boddu
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Bruce J Bolt
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Denise Carvalho-Silva
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Mikkel Christensen
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Paul Davis
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lee J Falin
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christoph Grabmueller
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Jay Humphrey
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Arnaud Kerhornou
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Julia Khobova
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Naveen K Aranganathan
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Nicholas Langridge
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Ernesto Lowy
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Mark D McDowall
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Uma Maheswari
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Michael Nuhn
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Chuang Kee Ong
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Bert Overduin
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Michael Paulini
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Helder Pedro
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Emily Perry
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Giulietta Spudich
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Electra Tapanari
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Brandon Walts
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Gareth Williams
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Marcela Tello-Ruiz
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Joshua Stein
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Sharon Wei
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA; USDA-ARS NAA Plant, Soil and Nutrition Laboratory Research Unit, Cornell University, Ithaca, NY 14853, USA
- Daniel M Bolser
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Kevin L Howe
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Eugene Kulesha
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Daniel Lawson
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Gareth Maslen
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Daniel M Staines
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
81
Hugall AF, O'Hara TD, Hunjan S, Nilsen R, Moussalli A. An Exon-Capture System for the Entire Class Ophiuroidea. Mol Biol Evol 2015; 33:281-94. [PMID: 26474846 PMCID: PMC4693979 DOI: 10.1093/molbev/msv216] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Indexed: 02/06/2023] Open
Abstract
Exon-capture studies have typically been restricted to relatively shallow phylogenetic scales due primarily to hybridization constraints. Here, we present an exon-capture system for an entire class of marine invertebrates, the Ophiuroidea, built upon a phylogenetically diverse transcriptome foundation. The system captures approximately 90% of the 1,552 exon target, across all major lineages of the quarter-billion-year-old extant crown group. Key features of our system are 1) basing the target on an alignment of orthologous genes determined from 52 transcriptomes spanning the phylogenetic diversity and trimmed to remove anything difficult to capture, map, or align; 2) use of multiple artificial representatives based on ancestral state reconstructions rather than exemplars to improve capture and mapping of the target; 3) mapping reads to a multi-reference alignment; and 4) using patterns of site polymorphism to distinguish among paralogy, polyploidy, allelic differences, and sample contamination. The resulting data give a well-resolved tree (currently standing at 417 samples, 275,352 sites, 91% data-complete) that will transform our understanding of ophiuroid evolution and biogeography.
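The "91% data-complete" figure refers to the share of the sample-by-site matrix that actually contains data. One way to compute such a figure for an aligned matrix is sketched below; which characters count as missing (here gaps, `?` and `N`) is an assumption of this sketch rather than the study's stated convention, and the function name is hypothetical.

```python
from typing import Dict

# Characters treated as missing data. Gap, unknown and ambiguous-base codes
# are an assumption of this sketch, not the study's stated convention.
MISSING = set("-?Nn")


def data_completeness(alignment: Dict[str, str]) -> float:
    """Fraction of sample-by-site cells in an aligned matrix that hold an observed base."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment with equal-length sequences")
    total = observed = 0
    for seq in alignment.values():
        total += len(seq)
        observed += sum(1 for base in seq if base not in MISSING)
    if total == 0:
        raise ValueError("alignment has no sites")
    return observed / total
```

For instance, `data_completeness({"s1": "ACGT", "s2": "AC--", "s3": "NNGT"})` counts 8 observed cells out of 12, giving about 0.667.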
Affiliation(s)
- Roger Nilsen
- Georgia Genomics Facility, University of Georgia
82
Anderson D, Ferreras E, Trindade M, Cowan D. A novel bacterial Water Hypersensitivity-like protein shows in vivo protection against cold and freeze damage. FEMS Microbiol Lett 2015; 362:fnv110. [PMID: 26187747 DOI: 10.1093/femsle/fnv110] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Accepted: 07/04/2015] [Indexed: 11/13/2022] Open
Abstract
Metagenomic library screening, by functional or sequence analysis, has become an established method for the identification of novel genes and gene products, including genetic elements implicated in microbial stress response and adaptation. We have identified, using a sequence-based approach, a fosmid clone from an Antarctic desert soil metagenome library containing a novel gene which codes for a protein homologous to a Water Hypersensitivity domain (WHy). The WHy domain is typically found as a component of specific LEA (Late Embryogenesis Abundant) proteins, particularly the LEA-14 (LEA-8) variants, which occur widely in plants, nematodes, bacteria and archaea and which are typically induced by exposure to stress conditions. The novel WHy-like protein (165 amino acids, 18.6 kDa) exhibits a largely invariant NPN motif at the N-terminus and has high sequence identity to genes identified in Pseudomonas genomes. Expression of this protein in Escherichia coli significantly protected the recombinant host against cold and freeze stress.
Affiliation(s)
- Dominique Anderson
- Institute for Microbial Biotechnology and Metagenomics, Department of Biotechnology, University of the Western Cape, Bellville 7535, Cape Town, South Africa
- Eloy Ferreras
- Centre for Microbial Ecology and Genomics, Department of Genetics, University of Pretoria, Hatfield 0028, Pretoria, South Africa
- Marla Trindade
- Institute for Microbial Biotechnology and Metagenomics, Department of Biotechnology, University of the Western Cape, Bellville 7535, Cape Town, South Africa
- Don Cowan
- Institute for Microbial Biotechnology and Metagenomics, Department of Biotechnology, University of the Western Cape, Bellville 7535, Cape Town, South Africa; Centre for Microbial Ecology and Genomics, Department of Genetics, University of Pretoria, Hatfield 0028, Pretoria, South Africa
83
O'Donnell B, Maurer A, Papandreou-Suppappola A, Stafford P. Time-Frequency Analysis of Peptide Microarray Data: Application to Brain Cancer Immunosignatures. Cancer Inform 2015; 14:219-33. [PMID: 26157331 PMCID: PMC4476374 DOI: 10.4137/cin.s17285] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/25/2014] [Revised: 03/02/2015] [Accepted: 03/06/2015] [Indexed: 12/21/2022] Open
Abstract
One of the gravest dangers facing cancer patients is an extended symptom-free lull between tumor initiation and the first diagnosis. Early detection of tumors is critical for effective intervention. Using the body's immune system to detect and amplify tumor-specific signals may enable detection of cancer using an inexpensive immunoassay. Immunosignatures are one such assay: they provide a map of antibody interactions with random-sequence peptides. They enable detection of disease-specific patterns using classic train/test methods. However, to date, very little effort has gone into extracting information from the sequences of peptides that interact with disease-specific antibodies. Because it is difficult to represent all possible antigen peptides in a microarray format, we chose to synthesize only 330,000 peptides on a single immunosignature microarray. The 330,000 random-sequence peptides on the microarray represent 83% of all tetramers and 27% of all pentamers, creating an unbiased but substantial gap in the coverage of total sequence space. We therefore chose to examine many relatively short motifs from these random-sequence peptides. Time-variant analysis of recurrent subsequences provided a means to dissect amino acid sequences from the peptides while simultaneously retaining the antibody–peptide binding intensities. We first used a simple experiment in which monoclonal antibodies with known linear epitopes were exposed to these random-sequence peptides, and their binding intensities were used to develop our algorithm. We then demonstrated the performance of the proposed algorithm by examining immunosignatures from patients with glioblastoma multiforme (GBM), an aggressive form of brain cancer. Eight different frameshift targets were identified from the random-sequence peptides using this technique. If immune-reactive antigens can be identified using a relatively simple immune assay, it might enable a diagnostic test with sufficient sensitivity to detect tumors in a clinically useful way.
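The coverage figures above are fractions of all possible k-mers: with the 20 standard amino acids there are 20^4 = 160,000 tetramers and 20^5 = 3,200,000 pentamers. The sketch below shows one way such a coverage fraction could be measured for a peptide library. It assumes a 20-letter alphabet, and the helper name `kmer_coverage` and the toy library are hypothetical; it does not reproduce the 83% and 27% figures, which depend on the actual array design.

```python
import random

# Assumption: the 20 standard amino acids; a real array library may use a reduced alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_coverage(peptides, k, alphabet=AMINO_ACIDS):
    """Fraction of all len(alphabet)**k possible k-mers present in at least one peptide."""
    allowed = set(alphabet)
    seen = set()
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            if set(kmer) <= allowed:  # ignore k-mers containing letters outside the alphabet
                seen.add(kmer)
    return len(seen) / len(alphabet) ** k


# Number of possible tetramers and pentamers over the 20-letter alphabet.
assert len(AMINO_ACIDS) ** 4 == 160_000
assert len(AMINO_ACIDS) ** 5 == 3_200_000

# Hypothetical toy library: 2,000 random 12-mers (not the real array design).
rng = random.Random(0)
library = ["".join(rng.choice(AMINO_ACIDS) for _ in range(12)) for _ in range(2000)]
print(f"3-mer coverage of the toy library: {kmer_coverage(library, 3):.1%}")
```

Using a set of observed k-mers keeps the count of distinct motifs independent of how often each motif recurs across peptides.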
Affiliation(s)
- Brian O'Donnell
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
- Alexander Maurer
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
- Phillip Stafford
- Center for Innovations in Medicine, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
84
Neumann RS, Kumar S, Shalchian-Tabrizi K. BLAST output visualization in the new sequencing era. Brief Bioinform 2015; 15:484-503. [PMID: 23603091 DOI: 10.1093/bib/bbt009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
The Basic Local Alignment Search Tool (BLAST) algorithm remains one of the most widely used bioinformatic programs. For many projects, new sequencing technologies and increased database sizes will increase the BLAST output significantly. Frequently, this output is so large that it can no longer be processed manually. As BLAST users are increasingly recruited from mainstream biology without any bioinformatic background, user-friendly programs capable of BLAST output visualization, analysis and post-processing are in demand. In this review, freely available BLAST output processing programs are categorized as BLAST output interpreters, BLAST environments, BLAST output parsers or specialized tools. They are evaluated according to their user-friendliness, analysis features and high-throughput data processing capabilities.
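As a minimal illustration of what a BLAST output parser does (a sketch, not one of the reviewed programs), the code below reads BLAST+ tabular output (`-outfmt 6`, with its standard 12 columns named in `FIELDS`) and keeps the highest-scoring hit per query. The E-value cutoff, the helper names and the sample rows are hypothetical.

```python
import csv
import io

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_tabular(handle):
    """Yield one dict per hit; skips blank lines and '#' comment lines (as in -outfmt 7)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        for key in FLOAT_FIELDS:
            rec[key] = float(rec[key])
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        yield rec


def best_hits(records, max_evalue=1e-5):
    """Keep the hit with the highest bit score for each query, after an E-value filter."""
    best = {}
    for rec in records:
        if rec["evalue"] > max_evalue:
            continue
        current = best.get(rec["qseqid"])
        if current is None or rec["bitscore"] > current["bitscore"]:
            best[rec["qseqid"]] = rec
    return best


# Hypothetical sample output.
sample = io.StringIO(
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-80\t350\n"
    "q1\tsB\t80.0\t150\t30\t1\t1\t150\t5\t154\t1e-20\t120\n"
    "q2\tsC\t45.0\t60\t33\t2\t5\t64\t1\t60\t0.5\t30\n"
)
hits = best_hits(parse_tabular(sample))
for query, rec in hits.items():
    print(query, rec["sseqid"], rec["bitscore"])
```

Streaming records through a generator means large result files never have to be loaded into memory at once, which matters for the high-throughput use case the review describes.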
85
Palmer AD, Bunch J, Styles IB. The use of random projections for the analysis of mass spectrometry imaging data. J Am Soc Mass Spectrom 2015; 26:315-22. [PMID: 25522725 PMCID: PMC4320302 DOI: 10.1007/s13361-014-1024-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/13/2014] [Revised: 08/28/2014] [Accepted: 10/08/2014] [Indexed: 05/04/2023]
Abstract
The 'curse of dimensionality' imposes fundamental limits on the analysis of the large, information-rich datasets that are produced by mass spectrometry imaging. Additionally, such datasets are often too large to be analyzed as a whole, so dimensionality reduction is required before further analysis can be performed. We investigate the use of simple random projections for the dimensionality reduction of mass spectrometry imaging data and examine how they enable efficient and fast segmentation using k-means clustering. The method is computationally efficient and can be implemented such that only one spectrum is needed in memory at any time. We use this technique to reveal histologically significant regions within MALDI images of diseased human liver. Segmentation results achieved after reducing the dimensionality of the data by more than 99% (without peak picking) showed that histologic changes due to disease can be automatically visualized from molecular images.
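The pipeline described above (random projection, then k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian projection matrix with entries drawn from N(0, 1/k), applies it one spectrum at a time (mirroring the one-spectrum-in-memory property), and clusters with plain Lloyd's k-means using a deterministic farthest-point initialisation. The synthetic spectra, the 1,000-channel size, k = 20 and all helper names are hypothetical.

```python
import math
import random


def make_projection(n_channels, k, seed=0):
    """Gaussian random projection matrix R (n_channels x k) with entries N(0, 1/k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(n_channels)]


def project_stream(spectra, R):
    """Yield y = R^T x for each spectrum x, consuming the spectra one at a time."""
    k = len(R[0])
    for spectrum in spectra:
        y = [0.0] * k
        for intensity, row in zip(spectrum, R):
            if intensity:  # skip empty channels
                for j in range(k):
                    y[j] += intensity * row[j]
        yield y


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, n_clusters, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def synthetic_spectrum(peak_channel, height, n_channels=1000):
    """Hypothetical single-peak spectrum standing in for one pixel."""
    x = [0.0] * n_channels
    x[peak_channel] = height
    return x


# Two hypothetical tissue types, distinguished by the channel of their main peak.
spectra = ([synthetic_spectrum(100, 9.0 + 0.2 * i) for i in range(10)]
           + [synthetic_spectrum(500, 9.0 + 0.2 * i) for i in range(10)])
R = make_projection(1000, 20)
reduced = list(project_stream(spectra, R))  # 1,000 channels -> 20 dimensions (98% reduction)
labels = kmeans(reduced, 2)
print(labels)
```

Because the projection is linear, each pixel's reduced vector depends only on that pixel's spectrum, so pixels can be streamed from disk and only the small reduced matrix is held for clustering.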
Affiliation(s)
- Andrew D. Palmer
- PSIBS Doctoral Training Centre, University of Birmingham, Edgbaston B15 2TT Birmingham, UK
- Zentrum für Technomathematik, Fachbereich 3, Universität Bremen, Postfach 33 04 40, 28334 Bremen, Germany
- Josephine Bunch
- National Physical Laboratory, Hampton Road, Teddington, TW11 0LW Middlesex, UK
- School of Pharmacy, University of Nottingham, University Park NG7 2RD Nottingham, UK
- Iain B. Styles
- School of Computer Science, University of Birmingham, Edgbaston B15 2TT Birmingham, UK
86
Peptide based diagnostics: Are random-sequence peptides more useful than tiling proteome sequences? J Immunol Methods 2015; 417:10-21. [DOI: 10.1016/j.jim.2014.12.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 09/10/2014] [Revised: 11/08/2014] [Accepted: 12/05/2014] [Indexed: 11/19/2022]
87
Dmytrenko O, Russell SL, Loo WT, Fontanez KM, Liao L, Roeselers G, Sharma R, Stewart FJ, Newton ILG, Woyke T, Wu D, Lang JM, Eisen JA, Cavanaugh CM. The genome of the intracellular bacterium of the coastal bivalve, Solemya velum: a blueprint for thriving in and out of symbiosis. BMC Genomics 2014; 15:924. [PMID: 25342549 PMCID: PMC4287430 DOI: 10.1186/1471-2164-15-924] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/03/2014] [Accepted: 09/23/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Symbioses between chemoautotrophic bacteria and marine invertebrates are rare examples of living systems that are virtually independent of photosynthetic primary production. These associations have evolved multiple times in marine habitats, such as deep-sea hydrothermal vents and reducing sediments, characterized by steep gradients of oxygen and reduced chemicals. Due to difficulties associated with maintaining these symbioses in the laboratory and culturing the symbiotic bacteria, studies of chemosynthetic symbioses rely heavily on culture-independent methods. The symbiosis between the coastal bivalve, Solemya velum, and its intracellular symbiont is a model for chemosynthetic symbioses given its accessibility in intertidal environments and the ability to maintain it under laboratory conditions. To better understand this symbiosis, the genome of the S. velum endosymbiont was sequenced.
RESULTS: Relative to the genomes of obligate symbiotic bacteria, which commonly undergo erosion and reduction, the S. velum symbiont genome was large (2.7 Mb), GC-rich (51%), and contained a large number (78) of mobile genetic elements. Comparative genomics identified sets of genes specific to the chemosynthetic lifestyle and necessary to sustain the symbiosis. In addition, a number of inferred metabolic pathways and cellular processes, including heterotrophy, branched electron transport, and motility, suggested that besides the ability to function as an endosymbiont, the bacterium may have the capacity to live outside the host.
CONCLUSIONS: The physiological dexterity indicated by the genome substantially improves our understanding of the genetic and metabolic capabilities of the S. velum symbiont and the breadth of niches the partners may inhabit during their lifecycle.
Affiliation(s)
- Jonathan A Eisen
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, 4081 Biological Laboratories, Cambridge, MA 02138, USA.
88
Abstract
Although the search for disease biomarkers continues, the clinical return has thus far been disappointing. The complexity of the body's response to disease makes it difficult to represent this response with only a few biomarkers, particularly when many are present at low levels. An alternative to the typical reductionist biomarker paradigm is an assay we call an "immunosignature." This approach leverages the response of antibodies to disease-related changes, as well as the inherent signal amplification associated with antigen-stimulated B-cell proliferation. To perform an immunosignature assay, the antibodies in diluted blood are incubated with a microarray of thousands of random-sequence peptides. The pattern of binding to these peptides is the immunosignature. Because the peptide sequences are completely random, the assay is effectively disease-agnostic, potentially providing a comprehensive diagnostic for multiple diseases simultaneously. To explore the ability of an immunosignature to detect and identify multiple diseases simultaneously, 20 samples from each of five cancer cohorts collected from multiple sites and 20 noncancer samples (120 total) were used as a training set to develop a reference immunosignature. A blinded evaluation of 120 samples covering the same diseases gave 95% classification accuracy. To investigate the breadth of the approach and test sensitivity to biological diversity further, immunosignatures of >1,500 historical samples comprising 14 different diseases were examined by training on 75% of the samples and testing on the remaining 25%. The average accuracy was >98%. These results demonstrate the potential power of the immunosignature approach in the accurate, simultaneous classification of disease.
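The 75%/25% train/test protocol above can be illustrated with a small sketch. The abstract does not say which classifier was used, so the nearest-centroid classifier here is an assumption chosen for brevity, as are the stratified split, the helper names and the synthetic six-class data (mirroring five cancer cohorts plus one noncancer group of 20 samples each).

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=0):
    """Split sample indices so each class contributes ~train_frac of its samples to training."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def fit_centroids(X, labels, idx):
    """Mean binding profile per class over the training indices (nearest-centroid model)."""
    sums, counts = {}, defaultdict(int)
    for i in idx:
        acc = sums.setdefault(labels[i], [0.0] * len(X[i]))
        for j, v in enumerate(X[i]):
            acc[j] += v
        counts[labels[i]] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(X, labels, idx, centroids):
    return sum(predict(centroids, X[i]) == labels[i] for i in idx) / len(idx)


# Hypothetical data: 6 classes x 20 samples; each class binds a different "peptide" strongly.
labels = [f"class{c}" for c in range(6) for _ in range(20)]
X = [[1.0 + 0.01 * (i % 20) if j == int(y[-1]) else 0.0 for j in range(6)]
     for i, y in enumerate(labels)]
train, test = stratified_split(labels)
centroids = fit_centroids(X, labels, train)
print(len(train), len(test), accuracy(X, labels, test, centroids))  # 90 30 1.0
```

Stratifying the split keeps every disease represented in both the training and test sets in the same 75:25 proportion, so accuracy is not distorted by a class that happens to be missing from one side.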
89
Williams S, Stafford P, Hoffman SA. Diagnosis and early detection of CNS-SLE in MRL/lpr mice using peptide microarrays. BMC Immunol 2014; 15:23. [PMID: 24908187 PMCID: PMC4065311 DOI: 10.1186/1471-2172-15-23] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 10/16/2013] [Accepted: 05/20/2014] [Indexed: 12/20/2022]
Abstract
Background: An accurate method that can diagnose and predict lupus and its neuropsychiatric manifestations is essential, since currently there are no reliable methods. Autoantibodies to a varied panel of antigens in the body are characteristic of lupus. In this study we investigated whether serum autoantibody binding patterns on random-sequence peptide microarrays (immunosignaturing) can be used for diagnosing and predicting the onset of lupus and its central nervous system (CNS) manifestations. We also tested the technique for identifying potentially pathogenic autoantibodies in CNS-Lupus. We used the well-characterized MRL/lpr lupus animal model in two studies as a first step to develop and evaluate future studies in humans.
Results: In study one we identified possible diagnostic peptides for both lupus and altered behavior in the forced swim test. When comparing the results of study one to those of study two (carried out in a similar manner), we further identified potential peptides that may be diagnostic and predictive of both lupus and altered behavior in the forced swim test. We also characterized five potentially pathogenic brain-reactive autoantibodies and suggested possible brain targets.
Conclusions: These results indicate that immunosignaturing could predict and diagnose lupus and its CNS manifestations. It can also be used to characterize pathogenic autoantibodies, which may help to better understand the underlying mechanisms of CNS-Lupus.
Affiliation(s)
- Stephanie Williams
- Neuroimmunology Labs, School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA.
90
Shrestha AMS, Frith MC, Horton P. A bioinformatician's guide to the forefront of suffix array construction algorithms. Brief Bioinform 2014; 15:138-54. [PMID: 24413184 PMCID: PMC3956071 DOI: 10.1093/bib/bbt081] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]
Abstract
The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support 'spaced seeds' and 'subset seeds' used in many biological applications.
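To make the data structure concrete, here is a minimal sketch of a suffix array and exact-match lookup. It is explicitly not SA-IS: it builds the array naively by sorting suffixes (quadratic memory and O(n² log n) worst-case time), which is only suitable for short texts; SA-IS achieves linear time. The helper names are hypothetical, and the lookup uses two binary searches over the sorted suffixes.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix each one begins.

    Not SA-IS; this slices out every suffix, so use it only on small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern` via two binary searches over `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                     # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

All suffixes that start with the pattern form one contiguous block of the suffix array, which is why two binary searches suffice to find every occurrence.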
91
Ramsak Ž, Baebler Š, Rotter A, Korbar M, Mozetic I, Usadel B, Gruden K. GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology. Nucleic Acids Res 2013; 42:D1167-75. [PMID: 24194592 PMCID: PMC3965006 DOI: 10.1093/nar/gkt1056] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Indexed: 12/11/2022]
Abstract
GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) provision of the gathered information to high-throughput analysis tools via dynamically generated export files. All GoMapMan functionalities are openly available except the curation functions, which require prior registration to ensure that implemented changes are traceable.
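Because the ontology is a hierarchical tree, a gene annotated to a specific concept also belongs to every broader concept above it. The sketch below rolls annotations up the tree and counts distinct genes per node. It assumes dotted numeric codes in the style of MapMan BINs (for example "1.1.1" as a child of "1.1"); the gene IDs, bin codes and helper names are hypothetical and do not reflect GoMapMan's actual data model or export format.

```python
from collections import defaultdict


def ancestors(bin_code):
    """All nodes on the path from the root to bin_code, e.g. '1.1.2' -> ['1', '1.1', '1.1.2']."""
    parts = bin_code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def genes_per_bin(annotations):
    """Count distinct genes per bin, propagating each annotation to all ancestor bins.

    Sets ensure a gene annotated to two sibling bins is counted once in their parent.
    """
    members = defaultdict(set)
    for gene, bins in annotations.items():
        for b in bins:
            for node in ancestors(b):
                members[node].add(gene)
    return {node: len(genes) for node, genes in sorted(members.items())}


# Hypothetical annotations.
annotations = {
    "geneA": ["1.1.1"],
    "geneB": ["1.1.2"],
    "geneC": ["1.2", "1.1.1"],
    "geneD": ["2.1"],
}
for node, n in genes_per_bin(annotations).items():
    print(node, n)
```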
Affiliation(s)
- Živa Ramsak
- Department of Biotechnology and Systems Biology, National Institute of Biology, 1000 Ljubljana, Slovenia, Department of Knowledge Technologies, Jožef Stefan Institute, 1000 Ljubljana, Slovenia, Department of Biology, Institute for Biology I, RWTH Aachen University, D-52056 Aachen, Germany and IBG-2: Plant Sciences, Institute for Bio- and Geosciences, Forschungszentrum Jülich, 52425 Jülich, Germany
92
Abstract
The development of new vaccines would be greatly facilitated by having effective methods to predict vaccine performance. Such methods could also be helpful in monitoring individual vaccine responses to existing vaccines. We have developed "immunosignaturing" as a simple, comprehensive, chip-based method to display the antibody diversity in an individual on peptide arrays. Here we examined whether this technology could be used to develop correlates for predicting vaccine effectiveness. By using a mouse influenza infection, we show that the immunosignaturing of a natural infection can be used to discriminate a protective from nonprotective vaccine. Further, we demonstrate that an immunosignature can determine which mice receiving the same vaccine will survive. Finally, we show that the peptides comprising the correlate signatures of protection can be used to identify possible epitopes in the influenza virus proteome that are correlates of protection.
93
A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. Biomed Res Int 2013; 2013:432375. [PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/17/2013] [Revised: 08/26/2013] [Accepted: 08/27/2013] [Indexed: 01/04/2023]
Abstract
Currently, the greatest statistical and computational challenge in genetic epidemiology is to identify and characterize genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. Such gene-gene interactions, also termed epistasis, cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the presence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs) and random forests (RFs), have been used to identify susceptibility genes in such common, multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, it discusses the strengths and weaknesses of each method in detecting gene-gene interactions in complex human disease.
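Why single-locus tests can miss epistasis is easiest to see in a toy "pure epistasis" model. The sketch below assumes a hypothetical, deliberately simplified model (not taken from the review): each individual either carries (1) or lacks (0) the risk allele at loci A and B, and is affected exactly when one locus carries it (an XOR pattern), with all four genotype combinations equally common. Each locus on its own then shows no association at all, while the two-locus genotype determines disease status completely.

```python
from itertools import product

# Hypothetical pure-epistasis model: (locus A, locus B, disease status), with
# disease = A XOR B and 25 individuals for each of the four genotype combinations.
population = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(25)]


def marginal_risk(pop, locus, allele):
    """Disease frequency among individuals with `allele` at one locus (0 = A, 1 = B)."""
    status = [d for *genotype, d in pop if genotype[locus] == allele]
    return sum(status) / len(status)


def joint_risk(pop, a, b):
    """Disease frequency among individuals with the two-locus genotype (a, b)."""
    status = [d for x, y, d in pop if (x, y) == (a, b)]
    return sum(status) / len(status)


for locus in (0, 1):
    print("locus", "AB"[locus], [marginal_risk(population, locus, g) for g in (0, 1)])
for a, b in product((0, 1), repeat=2):
    print("genotype", (a, b), joint_risk(population, a, b))
```

Every marginal risk is 0.5, so one-SNP-at-a-time tests see nothing, whereas the joint risks are 0 or 1. Methods that model combinations of loci, such as the NNs, SVMs and RFs surveyed here, are aimed at exactly this kind of signal.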
94
Sykes KF, Legutki JB, Stafford P. Immunosignaturing: a critical review. Trends Biotechnol 2013; 31:45-51. [DOI: 10.1016/j.tibtech.2012.10.012] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 08/25/2012] [Revised: 10/26/2012] [Accepted: 10/29/2012] [Indexed: 01/08/2023]
95
Restrepo L, Stafford P, Johnston SA. Feasibility of an early Alzheimer's disease immunosignature diagnostic test. J Neuroimmunol 2012; 254:154-60. [PMID: 23084373 DOI: 10.1016/j.jneuroim.2012.09.014] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 08/08/2012] [Revised: 09/27/2012] [Accepted: 09/28/2012] [Indexed: 01/28/2023]
Abstract
A practical diagnostic test is needed for early Alzheimer's disease (AD) detection. Immunosignaturing, a technology that employs antibody binding to a random-sequence peptide microarray, generates profiles that distinguish transgenic mice engineered with familial AD mutations (APPswe/PSEN1-dE9) from non-transgenic littermates. It can also detect an AD-like signature in humans. Here, we assess the changes in the immunosignature at different time points of the disease in mice and humans. We also evaluate the accuracy of the late-stage signature as a test to discriminate young mice with familial AD mutations from non-transgenic littermates. Plasma samples from AD patients were assayed 3-12 months apart, while APPswe/PSEN1-dE9 and non-transgenic controls supplied plasma at monthly intervals until they reached 15 months of age. Microarrays with 10,000 random-sequence peptides were used to compare antibody binding patterns. These patterns gradually changed over the life-span of the mice. Strong, characteristic signatures were observed in transgenic mice at early, mid and late stages, but these profiles had minimal overlap. The signature of young transgenic mice had an error rate of 18% at classifying plasma samples from late-stage transgenic mice. Conversely, the late-stage transgenic mouse signature discriminated between young transgenic mice and littermates with an error rate of 21%. Less distinctive profiles were recognizable throughout the lifespan of the transgenic mice and were detectable as early as 2 months of age. The human signature changed minimally on short-term follow-up. Our results call for a reappraisal of the way incipient AD is studied, as biomarkers seen in late stages of the disease may not be relevant in earlier stages.
Affiliation(s)
- Lucas Restrepo
- Center for Innovations in Medicine, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5901, United States