251
|
The common marmoset genome provides insight into primate biology and evolution. Nat Genet 2014; 46:850-7. [PMID: 25038751 PMCID: PMC4138798 DOI: 10.1038/ng.3042] [Citation(s) in RCA: 162] [Impact Index Per Article: 16.2] [Received: 11/27/2013] [Accepted: 06/27/2014] [Indexed: 02/06/2023]
Abstract
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras. We observed positive selection or non-synonymous substitutions for genes encoding growth hormone/insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
|
252
|
Spouge JL, Mariño-Ramírez L, Sheetlin SL. Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. International Journal of Bioinformatics Research and Applications 2014; 10:384-408. [PMID: 24989859 DOI: 10.1504/ijbra.2014.062991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
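The maximal-segment idea described in this abstract can be illustrated with a minimal Python sketch of the classic, ungapped Ruzzo-Tompa procedure. This is not the generalised, gap-permitting algorithm of the paper, and it is not RepWords; the function name `maximal_segments` is hypothetical. For brevity the backward search is a plain scan, so this sketch does not reach the linear running time of the published algorithm.

```python
def maximal_segments(scores):
    """Return maximal scoring segments as (start, end, score), end exclusive.

    Illustrative sketch of the classic (ungapped) Ruzzo-Tompa procedure.
    Each candidate stores L (cumulative score before its first element)
    and R (cumulative score through its last element).
    """
    candidates = []  # each entry: [start, end, L, R]
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a candidate
        cur = [i, i + 1, before, total]
        while True:
            # Find the rightmost earlier candidate with a smaller L.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cur[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= cur[3]:
                candidates.append(cur)  # cannot be profitably extended left
                break
            # Otherwise merge candidates j..end into cur and re-check.
            cur = [candidates[j][0], cur[1], candidates[j][2], cur[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


segments = maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5])
print(segments)
```

The generalisation described in the abstract replaces the linear score sequence with a weighted, totally ordered graph; the candidate-merging logic above is only the starting point for that extension.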
Affiliation(s)
- John L Spouge
- Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
| | - Leonardo Mariño-Ramírez
- Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
| | - Sergey L Sheetlin
- Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
| |
|
253
|
Jiang Y, Xie M, Chen W, Talbot R, Maddox JF, Faraut T, Wu C, Muzny DM, Li Y, Zhang W, Stanton JA, Brauning R, Barris WC, Hourlier T, Aken BL, Searle SMJ, Adelson DL, Bian C, Cam GR, Chen Y, Cheng S, DeSilva U, Dixen K, Dong Y, Fan G, Franklin IR, Fu S, Guan R, Highland MA, Holder ME, Huang G, Ingham AB, Jhangiani SN, Kalra D, Kovar CL, Lee SL, Liu W, Liu X, Lu C, Lv T, Mathew T, McWilliam S, Menzies M, Pan S, Robelin D, Servin B, Townley D, Wang W, Wei B, White SN, Yang X, Ye C, Yue Y, Zeng P, Zhou Q, Hansen JB, Kristensen K, Gibbs RA, Flicek P, Warkup CC, Jones HE, Oddy VH, Nicholas FW, McEwan JC, Kijas J, Wang J, Worley KC, Archibald AL, Cockett N, Xu X, Wang W, Dalrymple BP. The sheep genome illuminates biology of the rumen and lipid metabolism. Science 2014; 344:1168-1173. [PMID: 24904168 DOI: 10.1126/science.1252806] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Indexed: 12/23/2022]
Abstract
Sheep (Ovis aries) are a major source of meat, milk, and fiber in the form of wool and represent a distinct class of animals that have a specialized digestive organ, the rumen, that carries out the initial digestion of plant material. We have developed and analyzed a high-quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants compared with nonruminant animals.
Affiliation(s)
- Yu Jiang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia; College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Min Xie
- BGI-Shenzhen, Shenzhen 518083, China
| | | | - Richard Talbot
- Edinburgh Genomics, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Jillian F Maddox
- Department of Veterinary Science, University of Melbourne, Victoria 3010, Australia
| | - Thomas Faraut
- INRA, Laboratoire de Génétique Cellulaire, UMR 444, Castanet-Tolosan F-31326, France
| | - Chunhua Wu
- Utah State University, Logan, UT 84322-1435, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | | | - Wenguang Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Inner Mongolia Agricultural University, Hohhot 010018, China; Institute of ATCG, Nei Mongol Bio-Information, Hohhot, China
| | - Jo-Ann Stanton
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
| | - Rudiger Brauning
- AgResearch, Invermay Agricultural Centre, Mosgiel 9053, New Zealand
| | - Wesley C Barris
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Thibaut Hourlier
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Bronwen L Aken
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Stephen M J Searle
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - David L Adelson
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Chao Bian
- BGI-Shenzhen, Shenzhen 518083, China
| | - Graham R Cam
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Yulin Chen
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | | | - Udaya DeSilva
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Karen Dixen
- Department of Biology, University of Copenhagen, DK-2100 Copenhagen Ø, Denmark
| | - Yang Dong
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
| | | | - Ian R Franklin
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Shaoyin Fu
- Inner Mongolia Agricultural University, Hohhot 010018, China
| | - Rui Guan
- BGI-Shenzhen, Shenzhen 518083, China
| | - Margaret A Highland
- USDA-ARS Animal Disease Research Unit, Pullman, WA 99164, USA; Department of Veterinary Microbiology & Pathology, Washington State University, Pullman, WA 99164, USA
| | - Michael E Holder
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | | | - Aaron B Ingham
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Christie L Kovar
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sandra L Lee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | | | - Xin Liu
- BGI-Shenzhen, Shenzhen 518083, China
| | | | - Tian Lv
- BGI-Shenzhen, Shenzhen 518083, China
| | - Tittu Mathew
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sean McWilliam
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Moira Menzies
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | | | - David Robelin
- INRA, Laboratoire de Génétique Cellulaire, UMR 444, Castanet-Tolosan F-31326, France
| | - Bertrand Servin
- INRA, Laboratoire de Génétique Cellulaire, UMR 444, Castanet-Tolosan F-31326, France
| | - David Townley
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | | | - Bin Wei
- BGI-Shenzhen, Shenzhen 518083, China; Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
| | - Stephen N White
- USDA-ARS Animal Disease Research Unit, Pullman, WA 99164, USA; Department of Veterinary Microbiology & Pathology, Washington State University, Pullman, WA 99164, USA
| | | | - Chen Ye
- BGI-Shenzhen, Shenzhen 518083, China
| | - Yaojing Yue
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Lanzhou 730050, China
| | - Peng Zeng
- BGI-Shenzhen, Shenzhen 518083, China
| | - Qing Zhou
- BGI-Shenzhen, Shenzhen 518083, China
| | - Jacob B Hansen
- Department of Biology, University of Copenhagen, DK-2100 Copenhagen Ø, Denmark
| | - Karsten Kristensen
- Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom
| | | | - Huw E Jones
- Biosciences KTN, The Roslin Institute, Easter Bush, Midlothian, EH25 9RG, UK
| | - V Hutton Oddy
- School of Environmental and Rural Science, University of New England, Armidale, NSW 2351, Australia
| | - Frank W Nicholas
- Faculty of Veterinary Science, University of Sydney, NSW 2006, Australia
| | - John C McEwan
- AgResearch, Invermay Agricultural Centre, Mosgiel 9053, New Zealand
| | - James Kijas
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Jun Wang
- BGI-Shenzhen, Shenzhen 518083, China; Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark; Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Macau University of Science and Technology, Macau 999078, China
| | - Kim C Worley
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Alan L Archibald
- The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | | | - Xun Xu
- BGI-Shenzhen, Shenzhen 518083, China
| | - Wen Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
| | - Brian P Dalrymple
- CSIRO Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| |
|
254
|
Hawley ER, Malfatti SA, Pagani I, Huntemann M, Chen A, Foster B, Copeland A, del Rio TG, Pati A, Jansson JR, Gilbert JA, Tringe SG, Lorenson TD, Hess M. Metagenomes from two microbial consortia associated with Santa Barbara seep oil. Mar Genomics 2014; 18 Pt B:97-9. [PMID: 24958360 DOI: 10.1016/j.margen.2014.06.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 06/07/2014] [Revised: 06/10/2014] [Accepted: 06/10/2014] [Indexed: 11/16/2022]
Abstract
The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean offshore the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel.
Affiliation(s)
| | - Stephanie A Malfatti
- Lawrence Livermore National Laboratory, Biosciences and Biotechnology Division, Livermore, CA, USA
| | | | | | - Amy Chen
- DOE Joint Genome Institute, Walnut Creek, CA, USA
| | - Brian Foster
- DOE Joint Genome Institute, Walnut Creek, CA, USA
| | | | | | - Amrita Pati
- DOE Joint Genome Institute, Walnut Creek, CA, USA
| | - Janet R Jansson
- DOE Joint Genome Institute, Walnut Creek, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Jack A Gilbert
- Argonne National Laboratory, Lemont, IL, USA; University of Chicago, Chicago, IL, USA
| | - Susannah Green Tringe
- DOE Joint Genome Institute, Walnut Creek, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Matthias Hess
- Washington State University, Richland, WA, USA; DOE Joint Genome Institute, Walnut Creek, CA, USA; Pacific Northwest National Laboratory, Chemical & Biological Process Development Group, Richland, WA, USA; Environmental Molecular Sciences Laboratory, Richland, WA, USA.
| |
|
255
|
Contrant M, Fender A, Chane-Woon-Ming B, Randrianjafy R, Vivet-Boudou V, Richer D, Pfeffer S. Importance of the RNA secondary structure for the relative accumulation of clustered viral microRNAs. Nucleic Acids Res 2014; 42:7981-96. [PMID: 24831544 PMCID: PMC4081064 DOI: 10.1093/nar/gku424] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022] Open
Abstract
Micro (mi)RNAs are small non-coding RNAs with key regulatory functions. Recent advances in the field allowed researchers to identify their targets. However, much less is known regarding the regulation of miRNAs themselves. The accumulation of these tiny regulators can be modulated at various levels during their biogenesis from the transcription of the primary transcript (pri-miRNA) to the stability of the mature miRNA. Here, we studied the importance of the pri-miRNA secondary structure for the regulation of mature miRNA accumulation. To this end, we used the Kaposi's sarcoma herpesvirus, which encodes a cluster of 12 pre-miRNAs. Using small RNA profiling and quantitative northern blot analysis, we measured the absolute amount of each mature miRNA in different cellular contexts. We found that the difference in expression between the least and most expressed viral miRNAs could be as high as 60-fold. Using high-throughput selective 2′-hydroxyl acylation analyzed by primer extension, we then determined the secondary structure of the long primary transcript. We found that highly expressed miRNAs derived from optimally structured regions within the pri-miRNA. Finally, we confirmed the importance of the local structure by swapping stem-loops or by targeted mutagenesis of selected miRNAs, which resulted in a perturbed accumulation of the mature miRNA.
Affiliation(s)
- Maud Contrant
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Aurélie Fender
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Béatrice Chane-Woon-Ming
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Ramy Randrianjafy
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Valérie Vivet-Boudou
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Delphine Richer
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| | - Sébastien Pfeffer
- Architecture et Réactivité de l'ARN - UPR 9002, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
| |
|
256
|
Lopes IDON, Schliep A, de Carvalho ACPDLF. The discriminant power of RNA features for pre-miRNA recognition. BMC Bioinformatics 2014; 15:124. [PMID: 24884650 PMCID: PMC4046174 DOI: 10.1186/1471-2105-15-124] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 10/25/2013] [Accepted: 04/08/2014] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Computational discovery of microRNAs (miRNA) is based on pre-determined sets of features from miRNA precursors (pre-miRNA). Some feature sets are composed of sequence-structure patterns commonly found in pre-miRNAs, while others are a combination of more sophisticated RNA features. In this work, we analyze the discriminant power of seven feature sets, which are used in six pre-miRNA prediction tools. The analysis is based on the classification performance achieved with these feature sets for the training algorithms used in these tools. We also evaluate feature discrimination through the F-score and feature importance in the induction of random forests. RESULTS Small or non-significant differences were found among the estimated classification performances of classifiers induced using sets with diversification of features, despite the wide differences in their dimension. Inspired by these results, we obtained a lower-dimensional feature set, which achieved a sensitivity of 90% and a specificity of 95%. These estimates are within 0.1% of the maximal values obtained with any feature set (SELECT, Section "Results and discussion") while it is 34 times faster to compute. Even compared to another feature set (FS2, see Section "Results and discussion"), which is the computationally least expensive feature set of those from the literature which perform within 0.1% of the maximal values, it is 34 times faster to compute. The results obtained by the tools used as references in the experiments carried out showed that five out of these six tools have lower sensitivity or specificity. CONCLUSION In miRNA discovery the number of putative miRNA loci is in the order of millions. Analysis of putative pre-miRNAs using a computationally expensive feature set would be wasteful or even unfeasible for large genomes. In this work, we propose a relatively inexpensive feature set and explore most of the learning aspects implemented in current ab-initio pre-miRNA prediction tools, which may lead to the development of efficient ab-initio pre-miRNA discovery tools. The material to reproduce the main results from this paper can be downloaded from http://bioinformatics.rutgers.edu/Static/Software/discriminant.tar.gz.
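The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below shows how such values could be computed for a binary pre-miRNA classifier; the function name and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels (1 = pre-miRNA).

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up for illustration): 10 true pre-miRNAs, 20 negatives.
y_true = [1] * 10 + [0] * 20
y_pred = [1] * 9 + [0] * 1 + [0] * 19 + [1] * 1
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With millions of candidate loci, even a small drop in specificity translates into a large absolute number of false positives, which is why the two measures are reported separately.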
Affiliation(s)
- Ivani de O N Lopes
- Empresa Brasileira de Pesquisa Agropecuária, Embrapa Soja, Caixa Postal 231, Londrina-PR, CEP 86001-970, Brasil.
| | | | | |
|
257
|
The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Commun 2014; 5:3657. [PMID: 24755649 PMCID: PMC4071752 DOI: 10.1038/ncomms4657] [Citation(s) in RCA: 591] [Impact Index Per Article: 59.1] [Received: 01/09/2014] [Accepted: 03/14/2014] [Indexed: 02/07/2023] Open
Abstract
Vertebrate evolution has been shaped by several rounds of whole-genome duplications (WGDs) that are often suggested to be associated with adaptive radiations and evolutionary innovations. Due to an additional round of WGD, the rainbow trout genome offers a unique opportunity to investigate the early evolutionary fate of a duplicated vertebrate genome. Here we show that after 100 million years of evolution the two ancestral subgenomes have remained extremely collinear, despite the loss of half of the duplicated protein-coding genes, mostly through pseudogenization. In striking contrast is the fate of miRNA genes that have almost all been retained as duplicated copies. The slow and stepwise rediploidization process characterized here challenges the current hypothesis that WGD is followed by massive and rapid genomic reorganizations and gene deletions. Although whole-genome duplications (WGDs) are rare events, they have an important role in shaping vertebrate evolution. Here, the authors sequence the rainbow trout genome and show that rediploidization after WGD occurs in a slow and stepwise manner.
|
258
|
Small RNA cloning and sequencing strategy affects host and viral microRNA expression signatures. J Biotechnol 2014; 181:35-44. [PMID: 24746587 DOI: 10.1016/j.jbiotec.2014.04.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/21/2014] [Revised: 03/26/2014] [Accepted: 04/04/2014] [Indexed: 01/04/2023]
Abstract
The establishment of microRNA (miRNA) expression signatures is the basic element for investigating the role played by these regulatory molecules in the biology of an organism. Marek's disease virus 1 (MDV-1) is an avian herpesvirus that naturally infects chickens and induces T-cell lymphomas. During latency, MDV-1, like other herpesviruses, expresses a limited subset of transcripts. These include three miRNA clusters. Several studies identified the expression of virus- and host-encoded miRNAs from MDV-1 infected cell cultures and chickens. However, a high discrepancy was observed when miRNA cloning frequencies obtained from different cloning and sequencing protocols were compared. Thus, we analyzed the effect of small RNA library preparation and sequencing on the miRNA frequencies obtained from the same RNA samples collected during MDV-1 infection of chickens at different steps of the oncoviral pathogenesis. Qualitative and quantitative variations were found in the data, depending on the strategy used. One of the mature miRNAs derived from the latency-associated transcript (LAT), mdv1-miR-M7-5p, showed the highest variation. Its cloning frequency was 50% of the viral miRNA counts when a small-scale sequencing approach was used. Its frequency was 100 times lower when determined through the deep sequencing approach. Northern blot (NB) analysis showed a better correlation with the miRNA frequencies found by the small-scale sequencing approach. By analyzing the cellular miRNA repertoire, we also found a gap between the two sequencing approaches. Collectively, our study indicates that next-generation sequencing data considered alone are limited for assessing the absolute copy number of transcripts. Thus, the quantification of small RNA should be addressed by compiling data obtained by using different techniques such as microarrays, qRT-PCR and NB analysis in support of high-throughput sequencing data. These observations should be considered when miRNA variations are studied prior to addressing functional studies.
|
259
|
Microbiome analysis of a microalgal mass culture growing in municipal wastewater in a prototype OMEGA photobioreactor. Algal Res 2014. [DOI: 10.1016/j.algal.2013.11.006] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022]
|
260
|
Rosa BA, Jasmer DP, Mitreva M. Genome-wide tissue-specific gene expression, co-expression and regulation of co-expressed genes in adult nematode Ascaris suum. PLoS Negl Trop Dis 2014; 8:e2678. [PMID: 24516681 PMCID: PMC3916258 DOI: 10.1371/journal.pntd.0002678] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 06/21/2013] [Accepted: 12/18/2013] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Caenorhabditis elegans has traditionally been used as a model for studying nematode biology, but its small size limits the ability of researchers to perform some experiments such as high-throughput tissue-specific gene expression studies. However, the dissection of individual tissues is possible in the parasitic nematode Ascaris suum due to its relatively large size. Here, we take advantage of the recent genome sequencing of Ascaris suum and the ability to physically dissect its separate tissues to produce wide-scale tissue-specific nematode RNA-seq datasets, including data on three non-reproductive tissues (head, pharynx, and intestine) in both male and female worms, as well as four reproductive tissues (testis, seminal vesicle, ovary, and uterus). We obtained fundamental information about the biology of diverse cell types and potential interactions among tissues within this multicellular organism. METHODOLOGY/PRINCIPAL FINDINGS Overexpression and functional enrichment analyses identified many putative biological functions enriched in each tissue studied, including functions which have not been previously studied in detail in nematodes. Putative tissue-specific transcriptional factors and corresponding binding motifs that regulate expression in each tissue were identified, including the intestine-enriched ELT-2 motif/transcription factor previously described in nematode intestines. Constitutively expressed and novel genes were also characterized, with the largest number of novel genes found to be overexpressed in the testis. Finally, a putative acetylcholine-mediated transcriptional network connecting biological activity in the head to the male reproductive system is described using co-expression networks, along with a similar ecdysone-mediated system in the female. CONCLUSIONS/SIGNIFICANCE The expression profiles, co-expression networks and co-expression regulation of the 10 tissues studied and the tissue-specific analysis presented here are a valuable resource for studying tissue-specific biological functions in nematodes.
Affiliation(s)
- Bruce A. Rosa
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Douglas P. Jasmer
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, Washington, United States of America
| | - Makedonka Mitreva
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| |
|
261
|
Abstract
MOTIVATION Since 1990, the basic local alignment search tool (BLAST) has become one of the most popular and fundamental bioinformatics tools for sequence similarity searching, receiving extensive attention from the research community. The two pioneering papers on BLAST have received over 96 000 citations. Given the huge population of BLAST users and the increasing size of sequence databases, an urgent topic of study is how to improve the speed. Recently, graphics processing units (GPUs) have been widely used as low-cost, high-performance computing platforms. The existing GPU-BLAST is a promising software tool that uses a GPU to accelerate protein sequence alignment. Unfortunately, there is still no GPU-accelerated software tool for BLAST-based nucleotide sequence alignment. RESULTS We developed G-BLASTN, a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. G-BLASTN can produce exactly the same results as NCBI-BLAST, and it has very similar user commands. Compared with the sequential NCBI-BLAST, G-BLASTN can achieve an overall speedup of 14.80X under 'megablast' mode. More impressively, it achieves an overall speedup of 7.15X over the multithreaded NCBI-BLAST running on 4 CPU cores. When running under 'blastn' mode, the overall speedups are 4.32X (against 1-core) and 1.56X (against 4-core). G-BLASTN also supports a pipeline mode that further improves the overall performance by up to 44% when handling a batch of queries as a whole. Currently G-BLASTN is best optimized for databases with long sequences. We plan to optimize its performance on short database sequences in our future work. AVAILABILITY http://www.comp.hkbu.edu.hk/~chxw/software/G-BLASTN.html CONTACT chxw@comp.hkbu.edu.hk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
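The paired speedup figures in this abstract can be related to each other with simple arithmetic. The sketch below assumes, and this is an assumption not stated in the abstract, that each "overall speedup" is a ratio of total runtimes measured on the same workload, so that dividing the 1-core speedup by the 4-core speedup gives the scaling that multithreaded NCBI-BLAST itself obtained from 4 cores. The function name is hypothetical.

```python
def implied_cpu_scaling(speedup_vs_1core, speedup_vs_4core):
    """Estimate how much faster 4-core NCBI-BLAST is than 1-core NCBI-BLAST.

    If T1/Tgpu = speedup_vs_1core and T4/Tgpu = speedup_vs_4core,
    then T1/T4 = speedup_vs_1core / speedup_vs_4core.
    """
    return speedup_vs_1core / speedup_vs_4core


# Figures quoted in the abstract.
megablast_scaling = implied_cpu_scaling(14.80, 7.15)
blastn_scaling = implied_cpu_scaling(4.32, 1.56)
print(f"megablast: {megablast_scaling:.2f}x, blastn: {blastn_scaling:.2f}x")
```

Under that assumption, the 4-core CPU baseline is roughly 2.1 times faster than 1 core in 'megablast' mode and roughly 2.8 times faster in 'blastn' mode, which would partly explain why the GPU advantage over the multithreaded baseline is smaller in 'blastn' mode.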
Affiliation(s)
- Kaiyong Zhao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong, China
| | | |
|
262
|
Hawley ER, Piao H, Scott NM, Malfatti S, Pagani I, Huntemann M, Chen A, Glavina Del Rio T, Foster B, Copeland A, Jansson J, Pati A, Tringe S, Gilbert JA, Lorenson TD, Hess M. Metagenomic analysis of microbial consortium from natural crude oil that seeps into the marine ecosystem offshore Southern California. Stand Genomic Sci 2014; 9:1259-74. [PMID: 25197496 PMCID: PMC4149020 DOI: 10.4056/sigs.5029016] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/29/2023] Open
Abstract
Crude oils can be major contaminants of the marine ecosystem and microorganisms play a significant role in the degradation of their main constituents. To increase our understanding of the microbial hydrocarbon degradation process in the marine ecosystem, we collected crude oil from an active seep area located in the Santa Barbara Channel (SBC) and generated a total of about 52 Gb of raw metagenomic sequence data. The assembled data comprised ~500 Mb, representing ~1.1 million genes derived primarily from chemolithoautotrophic bacteria. Members of Oceanospirillales, a bacterial order belonging to the Gammaproteobacteria, recruited less than 2% of the assembled genes within the SBC metagenome. In contrast, the microbial community associated with the oil plume that developed in the aftermath of the Deepwater Horizon (DWH) blowout in 2010 was dominated by Oceanospirillales, which comprised more than 60% of the metagenomic data generated from the DWH oil plume. This suggests that Oceanospirillales might play a less significant role in the microbially mediated hydrocarbon conversion within the SBC seep oil compared to the DWH plume oil. We hypothesize that this difference results from the SBC oil seep being mostly anaerobic, while the DWH oil plume is aerobic. Within the Archaea, the phylum Euryarchaeota recruited more than 95% of the assembled archaeal sequences from the SBC oil seep metagenome, with more than 50% of the sequences assigned to members of the orders Methanomicrobiales and Methanosarcinales. These orders contain organisms capable of methanogenesis and anaerobic oxidation of methane (AOM), and we hypothesize that these orders, and their metabolic capabilities, may be fundamental to the ecology of the SBC oil seep.
Affiliation(s)
- Erik R Hawley
  - Washington State University Tri-Cities, Richland, WA, USA
- Hailan Piao
  - Washington State University Tri-Cities, Richland, WA, USA
- Stephanie Malfatti
  - Lawrence Livermore National Laboratory, Biosciences and Biotechnology Division, Livermore, CA, USA
- Amy Chen
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Brian Foster
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Janet Jansson
  - DOE Joint Genome Institute, Walnut Creek, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Amrita Pati
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Susannah Tringe
  - DOE Joint Genome Institute, Walnut Creek, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jack A Gilbert
  - Argonne National Laboratory, Lemont, IL, USA; University of Chicago, Chicago, IL, USA
- Matthias Hess
  - Washington State University Tri-Cities, Richland, WA, USA; DOE Joint Genome Institute, Walnut Creek, CA, USA; Washington State University, Pullman, WA, USA; Pacific Northwest National Laboratory, Chemical & Biological Process Development Group, Richland, WA, USA; Environmental Molecular Sciences Laboratory, Richland, WA, USA
|
263
|
Schmutzer T, Ma L, Pousarebani N, Bull F, Stein N, Houben A, Scholz U. Kmasker--a tool for in silico prediction of single-copy FISH probes for the large-genome species Hordeum vulgare. Cytogenet Genome Res 2013; 142:66-78. [PMID: 24335088 DOI: 10.1159/000356460] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Accepted: 07/08/2013] [Indexed: 11/19/2022]
Abstract
Specific localization of large genomic fragments by fluorescence in situ hybridization (FISH) is challenging in large-genome plant species due to the high content of repetitive sequences. We report the automated workflow (Kmasker) for in silico extraction of unique genomic sequences of large genomic fragments suitable for FISH in barley. This method can be widely used for the integration of genetic and cytogenetic maps in plants and other species with large and complex genomes if the probe sequence (e.g. BACs, sequence contigs) and a low coverage (8-fold) of unassembled sequences of the species of interest are available. Kmasker has been made publicly available as a web tool at http://webblast.ipk-gatersleben.de/kmasker.
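The idea of extracting unique probe sequence from k-mer frequencies in low-coverage reads can be illustrated with a minimal Python sketch. It is not Kmasker's implementation: the function names, the k-mer length, and the count threshold are all assumptions chosen for illustration, and strand handling is omitted.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def mask_repeats(probe, counts, k, max_count):
    """Replace with 'N' every probe position covered by a k-mer seen more than max_count times."""
    masked = [False] * len(probe)
    for i in range(len(probe) - k + 1):
        if counts[probe[i:i + k]] > max_count:  # hypothetical repeat threshold
            for j in range(i, i + k):
                masked[j] = True
    return "".join("N" if m else base for base, m in zip(probe, masked))

def unique_segments(masked_probe, min_len):
    """List (start, end) intervals of unmasked sequence that are at least min_len long."""
    segments, start = [], None
    for i, base in enumerate(masked_probe + "N"):  # trailing sentinel closes the last run
        if base != "N" and start is None:
            start = i
        elif base == "N" and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```

With reads taken from unassembled low-coverage sequence, `mask_repeats` followed by `unique_segments` yields candidate single-copy intervals of a BAC or contig; in practice the threshold would need to be scaled to the sequencing coverage.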
Affiliation(s)
- T Schmutzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
|
264
|
Carvalho AB, Clark AG. Efficient identification of Y chromosome sequences in the human and Drosophila genomes. Genome Res 2013; 23:1894-907. [PMID: 23921660 PMCID: PMC3814889 DOI: 10.1101/gr.156034.113] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 02/05/2013] [Accepted: 07/25/2013] [Indexed: 12/25/2022]
Abstract
Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; due to its high content of repetitive DNA, in most genome projects, the Y chromosome sequence is fragmented into a large number of small, unmapped scaffolds. Identification of Y-linked genes among these fragments has yielded important insights about the origin and evolution of Y chromosomes, but the process is labor intensive, restricting studies to a small number of species. Apart from these fragmentary assemblies, in a few mammalian species, the euchromatic sequence of the Y is essentially complete, owing to painstaking BAC mapping and sequencing. Here we use female short-read sequencing and k-mer comparison to identify Y-linked sequences in two very different genomes, Drosophila virilis and human. Using this method, essentially all D. virilis scaffolds were unambiguously classified as Y-linked or not Y-linked. We found 800 new scaffolds (totaling 8.5 Mbp), and four new genes in the Y chromosome of D. virilis, including JYalpha, a gene involved in hybrid male sterility. Our results also strongly support the preponderance of gene gains over gene losses in the evolution of the Drosophila Y. In the intensively studied human genome, used here as a positive control, we recovered all previously known genes or gene families, plus a small amount (283 kb) of new, unfinished sequence. Hence, this method works in large and complex genomes and can be applied to any species with sex chromosomes.
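The k-mer comparison against female reads can be sketched as follows: a scaffold whose k-mers are mostly absent from female sequence is a candidate Y-linked scaffold. This is a simplified illustration only, not the authors' pipeline; the function names, the default k, and the 0.7 threshold are assumptions, and the handling of repetitive k-mers is left out.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def canonical(kmer):
    """Strand-independent form of a k-mer: the smaller of it and its reverse complement."""
    return min(kmer, revcomp(kmer))

def kmer_set(seqs, k):
    """Set of canonical k-mers found in a collection of sequences."""
    out = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            out.add(canonical(s[i:i + k]))
    return out

def unmatched_fraction(scaffold, female_kmers, k):
    """Fraction of the scaffold's k-mers that never occur in the female reads."""
    kmers = [canonical(scaffold[i:i + k]) for i in range(len(scaffold) - k + 1)]
    if not kmers:
        return 0.0
    missing = sum(1 for km in kmers if km not in female_kmers)
    return missing / len(kmers)

def classify(scaffolds, female_reads, k=15, threshold=0.7):
    """Label each scaffold by comparing its unmatched fraction with a hypothetical threshold."""
    female = kmer_set(female_reads, k)
    return {
        name: "Y-linked" if unmatched_fraction(seq, female, k) >= threshold else "not Y-linked"
        for name, seq in scaffolds.items()
    }
```

Autosomal and X-linked scaffolds should be almost fully covered by female k-mers, so their unmatched fraction stays near zero, while Y-linked scaffolds approach one.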
Affiliation(s)
- Antonio Bernardo Carvalho
  - Departamento de Genética, Universidade Federal do Rio de Janeiro, Caixa Postal 68011, CEP 21941-971, Rio de Janeiro, Brazil
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Andrew G. Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
|
265
|
Markowitz VM, Chen IMA, Chu K, Szeto E, Palaniappan K, Pillay M, Ratner A, Huang J, Pagani I, Tringe S, Huntemann M, Billis K, Varghese N, Tennessen K, Mavromatis K, Pati A, Ivanova NN, Kyrpides NC. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res 2013; 42:D568-73. [PMID: 24136997 PMCID: PMC3964948 DOI: 10.1093/nar/gkt919] [Citation(s) in RCA: 196] [Impact Index Per Article: 17.8] [Indexed: 01/22/2023]
Abstract
IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M’s data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M’s database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp).
Affiliation(s)
- Victor M Markowitz
  - Biological Data Management and Technology Center, Computational Research Division Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, 94720 USA and Microbial Genome and Metagenome Program, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598 USA
|
266
|
Abstract
Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler, called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good quality reads as seeds. Second, it extends each seed by a whole-read extension process, which expedites the extension process and can jump over short repeats. Third, it uses a dynamic back trimming process to avoid extension termination due to sequencing errors. Fourth, it remaps reads to each assembled sequence, and if an assembly error occurs by the presence of a repeat, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it merges contigs by unused reads. An extensive comparison of JR-Assembler with current assemblers using datasets from small, medium, and large genomes shows that JR-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less central processing unit time, especially for large genomes. Finally, a simulation study shows that JR-Assembler achieves a superior performance on memory use and central processing unit time than most current assemblers when the read length is 150 bp or longer, indicating that the advantages of JR-Assembler over current assemblers will increase as the read length increases with advances in next generation sequencing technology.
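The first two steps, seed selection by read count and whole-read extension, can be illustrated with a deliberately simplified Python sketch. It is not the JR-Assembler algorithm: back trimming, remapping, repeat handling, and contig merging are all omitted, extension runs to the right only, and the function names and the `min_overlap` parameter are assumptions.

```python
from collections import Counter

def pick_seed(reads):
    """Use the most frequently observed read as the seed (a proxy for read quality)."""
    return Counter(reads).most_common(1)[0][0]

def extend_right(seed, reads, min_overlap):
    """Greedily extend the seed by appending the non-overlapping tail of whole reads."""
    contig = seed
    unused = set(reads) - {seed}
    while True:
        best = None  # (read, overlap length) with the longest suffix-prefix overlap
        for read in unused:
            max_ov = min(len(read) - 1, len(contig))
            for ov in range(max_ov, min_overlap - 1, -1):
                if contig.endswith(read[:ov]):
                    if best is None or ov > best[1]:
                        best = (read, ov)
                    break
        if best is None:
            return contig  # no read overlaps the contig end any more
        read, ov = best
        contig += read[ov:]  # whole-read step: the entire unmatched tail is added at once
        unused.remove(read)
```

Appending the full tail of a read in one step, rather than one base at a time, is what lets an extension of this kind move quickly and step across repeats shorter than the read.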
|
267
|
Taylor CM, Wang Q, Rosa BA, Huang SCC, Powell K, Schedl T, Pearce EJ, Abubucker S, Mitreva M. Discovery of anthelmintic drug targets and drugs using chokepoints in nematode metabolic pathways. PLoS Pathog 2013; 9:e1003505. [PMID: 23935495 PMCID: PMC3731235 DOI: 10.1371/journal.ppat.1003505] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Received: 01/28/2013] [Accepted: 06/03/2013] [Indexed: 12/19/2022]
Abstract
Parasitic roundworm infections plague more than 2 billion people (1/3 of humanity) and cause drastic losses in crops and livestock. New anthelmintic drugs are urgently needed as new drug resistance and environmental concerns arise. A “chokepoint reaction” is defined as a reaction that either consumes a unique substrate or produces a unique product. A chokepoint analysis provides a systematic method of identifying novel potential drug targets. Chokepoint enzymes were identified in the genomes of 10 nematode species, and the intersection and union of all chokepoint enzymes were found. By studying and experimentally testing available compounds known to target proteins orthologous to nematode chokepoint proteins in public databases, this study uncovers features of chokepoints that make them successful drug targets. Chemogenomic screening was performed on drug-like compounds from public drug databases to find existing compounds that target homologs of nematode chokepoints. The compounds were prioritized based on chemical properties frequently found in successful drugs and were experimentally tested using Caenorhabditis elegans. Several drugs that are already known anthelmintic drugs and novel candidate targets were identified. Seven of the compounds were tested in Caenorhabditis elegans and three yielded a detrimental phenotype. One of these three drug-like compounds, Perhexiline, also yielded a deleterious effect in Haemonchus contortus and Onchocerca lienalis, two nematodes with divergent forms of parasitism. Perhexiline, known to affect the fatty acid oxidation pathway in mammals, caused a reduction in oxygen consumption rates in C. elegans and genome-wide gene expression profiles provided an additional confirmation of its mode of action. Computational modeling of Perhexiline and its target provided structural insights regarding its binding mode and specificity. 
Our lists of prioritized drug targets and drug-like compounds have potential to expedite the discovery of new anthelmintic drugs with broad-spectrum efficacy. The World Health Organization estimates that 2.9 billion people are infected with parasitic roundworms, causing high morbidity and mortality rates, developmental delays in children, and low productivity of affected individuals. The agricultural industry experiences drastic losses in crops and livestock due to parasitic worm infections. Therefore, there is an urgent need to identify new targets and drugs to fight parasitic nematode infection. This study identified metabolic chokepoint compounds that were either produced or consumed by a single reaction and elucidated the chokepoint enzyme that drives the reaction. If the enzyme that catalyzes that reaction is blocked, a toxic build-up of a compound or lack of a compound necessary for a subsequent reaction will occur, potentially causing adverse effects to the parasite organism. Compounds that target some of the chokepoint enzymes were tested in C. elegans and several compounds showed efficacy. One drug-like compound, Perhexiline, showed efficacy in two different parasitic worms and yielded expected physiological effects, indicating that this drug-like compound may have efficacy on a pan-phylum level through the predicted mode of action. The methodology to find and prioritize metabolic chokepoint targets and prioritize compounds could be applied to other parasites.
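The chokepoint definition (a reaction that is the only consumer of some substrate or the only producer of some product) translates directly into a small Python sketch. The data layout and function name are assumptions for illustration; a real analysis would work on curated pathway data and would probably need extra filtering, for example of dead-end metabolites.

```python
from collections import defaultdict

def find_chokepoints(reactions):
    """Return the names of reactions that uniquely consume or uniquely produce a compound.

    reactions maps a reaction name to a (substrates, products) pair of compound lists.
    """
    consumers = defaultdict(set)  # compound -> reactions that consume it
    producers = defaultdict(set)  # compound -> reactions that produce it
    for name, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(name)
        for compound in products:
            producers[compound].add(name)

    chokepoints = set()
    for reaction_names in list(consumers.values()) + list(producers.values()):
        if len(reaction_names) == 1:  # exactly one reaction touches this compound on this side
            chokepoints |= reaction_names
    return chokepoints
```

Running the function per species and then taking the set intersection or union of the results mirrors the cross-species comparison described in the abstract.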
Affiliation(s)
- Christina M. Taylor
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Qi Wang
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Bruce A. Rosa
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Stanley Ching-Cheng Huang
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Kerrie Powell
  - SCYNEXIS, Inc, Research Triangle Park, North Carolina, United States of America
- Tim Schedl
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Edward J. Pearce
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Sahar Abubucker
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Makedonka Mitreva
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Division of Infectious Diseases, Department of Internal Medicine, Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
268
|
Mohorianu I, Stocks MB, Wood J, Dalmay T, Moulton V. CoLIde: a bioinformatics tool for CO-expression-based small RNA Loci Identification using high-throughput sequencing data. RNA Biol 2013; 10:1221-30. [PMID: 23851377 DOI: 10.4161/rna.25538] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/26/2022]
Abstract
Small RNAs (sRNAs) are 20-25 nt non-coding RNAs that act as guides for the highly sequence-specific regulatory mechanism known as RNA silencing. Due to the recent increase in sequencing depth, a highly complex and diverse population of sRNAs in both plants and animals has been revealed. However, the exponential increase in sequencing data has also made the identification of individual sRNA transcripts corresponding to biological units (sRNA loci) more challenging when based exclusively on the genomic location of the constituent sRNAs, hindering existing approaches to identify sRNA loci. To infer the location of significant biological units, we propose an approach for sRNA loci detection called CoLIde (Co-expression based sRNA Loci Identification) that combines genomic location with the analysis of other information such as variation in expression levels (expression pattern) and size class distribution. For CoLIde, we define a locus as a union of regions sharing the same pattern and located in close proximity on the genome. Biological relevance, detected through the analysis of size class distribution, is also calculated for each locus. CoLIde can be applied on ordered (e.g., time-dependent) or un-ordered (e.g., organ, mutant) series of samples both with or without biological/technical replicates. The method reliably identifies known types of loci and shows improved performance on sequencing data from both plants (e.g., A. thaliana, S. lycopersicum) and animals (e.g., D. melanogaster) when compared with existing locus detection techniques. CoLIde is available for use within the UEA Small RNA Workbench which can be downloaded from: http://srna-workbench.cmp.uea.ac.uk.
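The locus definition given here (a union of nearby regions sharing one expression pattern) can be sketched in Python for an ordered sample series. This is an illustration under stated assumptions, not the CoLIde implementation: the up/down/straight encoding, the 10% tolerance, the 50 nt gap, and the function names are hypothetical, and the size-class significance step is not shown.

```python
def expression_pattern(levels, tolerance=0.1):
    """Encode the change between consecutive samples as U (up), D (down) or S (straight)."""
    pattern = []
    for prev, curr in zip(levels, levels[1:]):
        change = (curr - prev) / max(prev, 1e-9)  # relative change, guarded against zero
        if change > tolerance:
            pattern.append("U")
        elif change < -tolerance:
            pattern.append("D")
        else:
            pattern.append("S")
    return "".join(pattern)

def merge_into_loci(regions, max_gap=50, tolerance=0.1):
    """Merge neighbouring regions on one chromosome that share an expression pattern.

    regions is a list of (start, end, levels) tuples, where levels holds one
    expression value per sample in series order.
    """
    loci = []
    for start, end, levels in sorted(regions, key=lambda r: r[0]):
        pattern = expression_pattern(levels, tolerance)
        last = loci[-1] if loci else None
        if last and last["pattern"] == pattern and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)  # extend the current locus
        else:
            loci.append({"start": start, "end": end, "pattern": pattern})
    return loci
```

Two adjacent regions with the same pattern collapse into one locus, while a nearby region with a different pattern, or a matching region further away than `max_gap`, starts a new one.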
Affiliation(s)
- Irina Mohorianu
  - University of East Anglia; School of Computing Sciences; Norwich, UK
|
269
|
Depew J, Zhou B, McCorrison JM, Wentworth DE, Purushe J, Koroleva G, Fouts DE. Sequencing viral genomes from a single isolated plaque. Virol J 2013; 10:181. [PMID: 23742765 PMCID: PMC3693891 DOI: 10.1186/1743-422x-10-181] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/13/2013] [Accepted: 05/31/2013] [Indexed: 02/05/2023]
Abstract
Background: Whole genome sequencing of viruses and bacteriophages is often hindered because of the need for large quantities of genomic material. A method is described that combines single plaque sequencing with an optimization of Sequence Independent Single Primer Amplification (SISPA). This method can be used for de novo whole genome next-generation sequencing of any cultivable virus without the need for large-scale production of viral stocks or viral purification using centrifugal techniques.
Methods: A single viral plaque of a variant of the 2009 pandemic H1N1 human Influenza A virus was isolated and amplified using the optimized SISPA protocol. The sensitivity of the SISPA protocol presented here was tested with bacteriophage F_HA0480sp/Pa1651 DNA. The amplified products were sequenced with 454 and Illumina HiSeq platforms. Mapping and de novo assemblies were performed to analyze the quality of data produced from this optimized method.
Results: Analysis of the sequence data demonstrated that from a single viral plaque of Influenza A, a mapping assembly with 3590-fold average coverage representing 100% of the genome could be produced. The de novo assembled data produced contigs with 30-fold average sequence coverage, representing 96.5% of the genome. Using only 10 pg of starting DNA from bacteriophage F_HA0480sp/Pa1651 in the SISPA protocol resulted in sequencing data that gave a mapping assembly with 3488-fold average sequence coverage, representing 99.9% of the reference and a de novo assembly with 45-fold average sequence coverage, representing 98.1% of the genome.
Conclusions: The optimized SISPA protocol presented here produces amplified product that when sequenced will give high quality data that can be used for de novo assembly. The protocol requires only a single viral plaque or as little as 10 pg of DNA template, which will facilitate rapid identification of viruses during an outbreak and viruses that are difficult to propagate.
Affiliation(s)
- Jessica Depew
  - Department of Genomic Medicine, The J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, MD 20850, USA
|
270
|
Swei A, Russell BJ, Naccache SN, Kabre B, Veeraraghavan N, Pilgard MA, Johnson BJB, Chiu CY. The genome sequence of Lone Star virus, a highly divergent bunyavirus found in the Amblyomma americanum tick. PLoS One 2013; 8:e62083. [PMID: 23637969 PMCID: PMC3639253 DOI: 10.1371/journal.pone.0062083] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 01/09/2013] [Accepted: 03/17/2013] [Indexed: 12/17/2022]
Abstract
Viruses in the family Bunyaviridae infect a wide range of plant, insect, and animal hosts. Tick-borne bunyaviruses in the Phlebovirus genus, including Severe Fever with Thrombocytopenia Syndrome virus (SFTSV) in China, Heartland virus (HRTV) in the United States, and Bhanja virus in Eurasia and Africa have been associated with acute febrile illness in humans. Here we sought to characterize the growth characteristics and genome of Lone Star virus (LSV), an unclassified bunyavirus originally isolated from the lone star tick Amblyomma americanum. LSV was able to infect both human (HeLa) and monkey (Vero) cells. Cytopathic effects were seen within 72 h in both cell lines; vacuolization was observed in infected Vero, but not HeLa, cells. Viral culture supernatants were examined by unbiased deep sequencing and analysis using an in-house developed rapid computational pipeline for viral discovery, which definitively identified LSV as a phlebovirus. De novo assembly of the full genome revealed that LSV is highly divergent, sharing <61% overall amino acid identity with any other bunyavirus. Despite this sequence diversity, LSV was found by phylogenetic analysis to be part of a well-supported clade that includes members of the Bhanja group viruses, which are most closely related to SFTSV/HRTV. The genome sequencing of LSV is a critical first step in developing diagnostic tools to determine the risk of arbovirus transmission by A. americanum, a tick of growing importance given its expanding geographic range and competence as a disease vector. This study also underscores the power of deep sequencing analysis in rapidly identifying and sequencing the genomes of viruses of potential clinical and public health significance.
Affiliation(s)
- Andrea Swei
  - Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
- Brandy J. Russell
  - Division of Vector-borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Samia N. Naccache
  - Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Beniwende Kabre
  - Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Narayanan Veeraraghavan
  - Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Mark A. Pilgard
  - Division of Vector-borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Barbara J. B. Johnson
  - Division of Vector-borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Charles Y. Chiu
  - Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
|
271
|
Bellos E, Johnson MR, Coin LJM. cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data. Genome Biol 2012; 13:R120. [PMID: 23259578 PMCID: PMC4056371 DOI: 10.1186/gb-2012-13-12-r120] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/25/2012] [Accepted: 12/22/2012] [Indexed: 02/08/2023]
Abstract
Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq.
|
272
|
Matveeva OV, Nazipova NN, Ogurtsov AY, Shabalina SA. Optimized models for design of efficient miR30-based shRNAs. Front Genet 2012; 3:163. [PMID: 22952469 PMCID: PMC3429853 DOI: 10.3389/fgene.2012.00163] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/10/2012] [Accepted: 08/10/2012] [Indexed: 11/13/2022]
Abstract
Small hairpin RNAs (shRNAs) have become an important research tool in cell biology. Reliable design of these molecules is essential for large functional genomics projects. To optimize the design of efficient shRNAs, we performed comparative, thermodynamic, and correlation analyses of ~18,000 miR30-based shRNAs with known functional efficiencies, derived from the Sensor Assay project (Fellmann et al., 2011). We identified features of the shRNA guide strand that significantly correlate with the silencing efficiency and performed multiple regression analysis, using 4/5 of the data for training purposes and 1/5 for cross validation. A model that included the position-dependent nucleotide preferences was predictive in the cross-validation data subset (R = 0.39). However, a model that, in addition to the nucleotide preferences, included thermodynamic shRNA features such as thermodynamic duplex stability and a position-dependent thermodynamic profile (dinucleotide free energy) performed better (R = 0.53). The software "miR_Scan" was developed based upon the optimized models. Calculated mRNA target secondary structure stability showed correlation with shRNA silencing efficiency but failed to improve the model. Correlation analysis demonstrates that our algorithm for identification of efficient miR30-based shRNA molecules performs better than approaches that were developed for design of chemically synthesized siRNAs (R(max) = 0.36).
Affiliation(s)
- Olga V Matveeva
  - Department of Human Genetics, University of Utah Salt Lake City, UT, USA
|
273
|
Winfield MO, Wilkinson PA, Allen AM, Barker GLA, Coghill JA, Burridge A, Hall A, Brenchley RC, D'Amore R, Hall N, Bevan MW, Richmond T, Gerhardt DJ, Jeddeloh JA, Edwards KJ. Targeted re-sequencing of the allohexaploid wheat exome. PLANT BIOTECHNOLOGY JOURNAL 2012; 10:733-42. [PMID: 22703335 DOI: 10.1111/j.1467-7652.2012.00713.x] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 05/02/2023]
Abstract
Bread wheat, Triticum aestivum, is an allohexaploid composed of the three distinct ancestral genomes, A, B and D. The polyploid nature of the wheat genome together with its large size has limited our ability to generate the significant amount of sequence data required for whole genome studies. Even with the advent of next-generation sequencing technology, it is still relatively expensive to generate whole genome sequences for more than a few wheat genomes at any one time. To overcome this problem, we have developed a targeted-capture re-sequencing protocol based upon NimbleGen array technology to capture and characterize 56.5 Mb of genomic DNA with sequence similarity to over 100 000 transcripts from eight different UK allohexaploid wheat varieties. Using this procedure in conjunction with a carefully designed bioinformatic procedure, we have identified more than 500 000 putative single-nucleotide polymorphisms (SNPs). While 80% of these were variants between the homoeologous genomes, A, B and D, a significant number (20%) were putative varietal SNPs between the eight varieties studied. A small number of these latter polymorphisms were experimentally validated using KASPar technology and 94% proved to be genuine. The procedures described here to sequence a large proportion of the wheat genome, and the various SNPs identified should be of considerable use to the wider wheat community.
Affiliation(s)
- Mark O Winfield
  - School of Biological Sciences, University of Bristol, Bristol, UK
|
274
|
Kim WC, Lee KH, Shin KS, You RN, Lee YK, Cho K, Cho DH. REMiner-II: a tool for rapid identification and configuration of repetitive element arrays from large mammalian chromosomes as a single query. Genomics 2012; 100:131-40. [PMID: 22750555 DOI: 10.1016/j.ygeno.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/17/2012] [Revised: 06/04/2012] [Accepted: 06/12/2012] [Indexed: 01/17/2023]
Abstract
Genes occupy ~3% of the human and mouse genomes whereas repetitive elements (REs), whose biologic functions are largely uncharacterized, constitute greater than 50%. A heterogeneous population of RE arrays (arrangement structures) is formed by combinations of various REs in mammalian genomes. In this study, REMiner-II was refined from the original REMiner for a more efficient identification and configuration of RE arrays from large queries (e.g., human chromosomes) using an unbiased self-alignment protocol. Chromosome-wide RE array profiles for the entire sets of human and mouse chromosomes were obtained using REMiner-II on a personal computer. REMiner-II provides 10 adjustable parameters and three data output modes to accommodate different experimental settings and/or goals. Examination of the human and mouse chromosome data using the REMiner-II viewer revealed species-specific libraries of complexly organized RE arrays. In conclusion, REMiner-II is an efficient tool for chromosome-wide identification and characterization of RE arrays from mammalian genomes.
Affiliation(s)
- Woo-Chan Kim
  - Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
|
275
|
Wylie KM, Truty RM, Sharpton TJ, Mihindukulasuriya KA, Zhou Y, Gao H, Sodergren E, Weinstock GM, Pollard KS. Novel bacterial taxa in the human microbiome. PLoS One 2012; 7:e35294. [PMID: 22719826 PMCID: PMC3374617 DOI: 10.1371/journal.pone.0035294] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Received: 12/12/2011] [Accepted: 03/14/2012] [Indexed: 12/21/2022]
Abstract
The human gut harbors thousands of bacterial taxa. A profusion of metagenomic sequence data has been generated from human stool samples in the last few years, raising the question of whether more taxa remain to be identified. We assessed metagenomic data generated by the Human Microbiome Project Consortium to determine if novel taxa remain to be discovered in stool samples from healthy individuals. To do this, we established a rigorous bioinformatics pipeline that uses sequence data from multiple platforms (Illumina GAIIX and Roche 454 FLX Titanium) and approaches (whole-genome shotgun and 16S rDNA amplicons) to validate novel taxa. We applied this approach to stool samples from 11 healthy subjects collected as part of the Human Microbiome Project. We discovered several low-abundance, novel bacterial taxa, which span three major phyla in the bacterial tree of life. We determined that these taxa are present in a larger set of Human Microbiome Project subjects and are found in two sampling sites (Houston and St. Louis). We show that the number of false-positive novel sequences (primarily chimeric sequences) would have been two orders of magnitude higher than the true number of novel taxa without validation using multiple datasets, highlighting the importance of establishing rigorous standards for the identification of novel taxa in metagenomic data. The majority of novel sequences are related to the recently discovered genus Barnesiella, further encouraging efforts to characterize the members of this genus and to study their roles in the microbial communities of the gut. A better understanding of the effects of less-abundant bacteria is important as we seek to understand the complex gut microbiome in healthy individuals and link changes in the microbiome to disease.
Affiliation(s)
- Kristine M. Wylie (corresponding author): The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Rebecca M. Truty (corresponding author): Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America
- Thomas J. Sharpton: Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America
- Kathie A. Mihindukulasuriya: The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Yanjiao Zhou: The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Hongyu Gao: The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Erica Sodergren: The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- George M. Weinstock: The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Katherine S. Pollard: Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America; Division of Biostatistics, Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
|
276
|
Sequence analysis of the human virome in febrile and afebrile children. PLoS One 2012; 7:e27735. [PMID: 22719819 PMCID: PMC3374612 DOI: 10.1371/journal.pone.0027735] [Citation(s) in RCA: 145] [Impact Index Per Article: 12.1] [Received: 08/12/2011] [Accepted: 10/23/2011] [Indexed: 01/21/2023] Open
Abstract
Unexplained fever (UF) is a common problem in children under 3 years old. Although virus infection is suspected to be the cause of most of these fevers, a comprehensive analysis of viruses in samples from children with fever and healthy controls is important for establishing a relationship between viruses and UF. We used unbiased, deep sequencing to analyze 176 nasopharyngeal swabs (NP) and plasma samples from children with UF and afebrile controls, generating an average of 4.6 million sequences per sample. An analysis pipeline was developed to detect viral sequences, which resulted in the identification of sequences from 25 viral genera. These genera included expected pathogens, such as adenoviruses, enteroviruses, and roseoloviruses, plus viruses with unknown pathogenicity. Viruses that were unexpected in NP and plasma samples, such as the astrovirus MLB-2, were also detected. Sequencing allowed identification of virus subtype for some viruses, including roseoloviruses. Highly sensitive PCR assays detected low levels of viruses that were not detected in approximately 5 million sequences, but greater sequencing depth improved sensitivity. On average, NP and plasma samples from febrile children contained 1.5- and 5-fold more viral sequences, respectively, than samples from afebrile children. Samples from febrile children contained a broader range of viral genera and contained multiple viral genera more frequently than samples from children without fever. Differences between febrile and afebrile groups were most striking in the plasma samples, where detection of viral sequence may be associated with a disseminated infection. These data indicate that virus infection is associated with UF. Further studies are important in order to establish the range of viral pathogens associated with fever and to understand the role of viral infection in fever. Ultimately these studies may improve the medical treatment of children with UF by helping avoid antibiotic therapy for children with viral infections.
|
277
|
Audemard E, Schiex T, Faraut T. Detecting long tandem duplications in genomic sequences. BMC Bioinformatics 2012; 13:83. [PMID: 22568762 PMCID: PMC3464658 DOI: 10.1186/1471-2105-13-83] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/27/2011] [Accepted: 05/08/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. RESULTS In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate-to-long regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. CONCLUSIONS ReD Tandem allows the identification of large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.
Affiliation(s)
- Eric Audemard: Unité de Biométrie et Intelligence Artificielle, UR 875, INRA, Toulouse, France.
|
278
|
Murchison EP, Schulz-Trieglaff OB, Ning Z, Alexandrov LB, Bauer MJ, Fu B, Hims M, Ding Z, Ivakhno S, Stewart C, Ng BL, Wong W, Aken B, White S, Alsop A, Becq J, Bignell GR, Cheetham RK, Cheng W, Connor TR, Cox AJ, Feng ZP, Gu Y, Grocock RJ, Harris SR, Khrebtukova I, Kingsbury Z, Kowarsky M, Kreiss A, Luo S, Marshall J, McBride DJ, Murray L, Pearse AM, Raine K, Rasolonjatovo I, Shaw R, Tedder P, Tregidgo C, Vilella AJ, Wedge DC, Woods GM, Gormley N, Humphray S, Schroth G, Smith G, Hall K, Searle SMJ, Carter NP, Papenfuss AT, Futreal PA, Campbell PJ, Yang F, Bentley DR, Evers DJ, Stratton MR. Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell 2012; 148:780-91. [PMID: 22341448 PMCID: PMC3281993 DOI: 10.1016/j.cell.2011.11.065] [Citation(s) in RCA: 238] [Impact Index Per Article: 19.8] [Received: 09/09/2011] [Revised: 11/03/2011] [Accepted: 11/29/2011] [Indexed: 01/23/2023]
Abstract
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
|
279
|
Marcinowski L, Tanguy M, Krmpotic A, Rädle B, Lisnić VJ, Tuddenham L, Chane-Woon-Ming B, Ruzsics Z, Erhard F, Benkartek C, Babic M, Zimmer R, Trgovcich J, Koszinowski UH, Jonjic S, Pfeffer S, Dölken L. Degradation of cellular mir-27 by a novel, highly abundant viral transcript is important for efficient virus replication in vivo. PLoS Pathog 2012; 8:e1002510. [PMID: 22346748 PMCID: PMC3276556 DOI: 10.1371/journal.ppat.1002510] [Citation(s) in RCA: 157] [Impact Index Per Article: 13.1] [Received: 10/13/2011] [Accepted: 12/13/2011] [Indexed: 12/11/2022] Open
Abstract
Cytomegaloviruses express large amounts of viral miRNAs during lytic infection, yet they only modestly alter the cellular miRNA profile. The most prominent alteration upon lytic murine cytomegalovirus (MCMV) infection is the rapid degradation of the cellular miR-27a and miR-27b. Here, we report that this regulation is mediated by the ∼1.7 kb spliced and highly abundant MCMV m169 transcript. Specificity to miR-27a/b is mediated by a single, apparently optimized, miRNA binding site located in its 3′-UTR. This site is easily and efficiently retargeted to other cellular and viral miRNAs by target site replacement. Expression of the 3′-UTR of m169 by an adenoviral vector was sufficient to mediate its function, indicating that no other viral factors are essential in this process. Degradation of miR-27a/b was found to be accompanied by 3′-tailing and -trimming. Despite its dramatic effect on miRNA stability, we found this interaction to be mutual, indicating potential regulation of m169 by miR-27a/b. Most interestingly, three mutant viruses no longer able to target miR-27a/b, either due to miRNA target site disruption or target site replacement, showed significant attenuation in multiple organs as early as 4 days post infection, indicating that degradation of miR-27a/b is important for efficient MCMV replication in vivo.
MicroRNAs are small, non-coding RNAs which shape and fine-tune gene expression of at least a third of our genes. During millions of years of coevolution with their hosts, herpesviruses have both usurped the host cell miRNA machinery by expressing their own sets of miRNAs, and learned to modify host miRNA expression for their own needs. Recently, we reported on the rapid degradation of two cellular miRNAs upon lytic murine cytomegalovirus (MCMV) infection, namely miR-27a and miR-27b. In this paper, we show that their regulation is mediated by the highly abundant viral transcript m169. It targets miR-27a/b via a single binding site in its 3′-UTR, which can be efficiently retargeted to other cellular and viral miRNAs, enabling the efficient knock-down of individual miRNAs of interest. Degradation of miR-27a/b is preceded by its 3′-tailing and -trimming. Most interestingly, three mutant viruses unable to target miR-27a/b showed significantly lower virus titers in various organs during acute MCMV infection, indicating that degradation of miR-27a/b is important for efficient virus replication in vivo.
Affiliation(s)
- Lisa Marcinowski: Max von Pettenkofer-Institute, Ludwig-Maximilians-University Munich, Munich, Germany
- Mélanie Tanguy: Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Astrid Krmpotic: Department of Histology and Embryology, Faculty of Medicine University of Rijeka, Rijeka, Croatia
- Bernd Rädle: Max von Pettenkofer-Institute, Ludwig-Maximilians-University Munich, Munich, Germany
- Vanda J. Lisnić: Department of Histology and Embryology, Faculty of Medicine University of Rijeka, Rijeka, Croatia
- Lee Tuddenham: Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Béatrice Chane-Woon-Ming: Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Zsolt Ruzsics: Max von Pettenkofer-Institute, Ludwig-Maximilians-University Munich, Munich, Germany
- Florian Erhard: Institute for Informatics, Ludwig-Maximilians-University Munich, Munich, Germany
- Marina Babic: Department of Histology and Embryology, Faculty of Medicine University of Rijeka, Rijeka, Croatia
- Ralf Zimmer: Institute for Informatics, Ludwig-Maximilians-University Munich, Munich, Germany
- Joanne Trgovcich: Department of Pathology, The Ohio State University, Columbus, Ohio, United States of America
- Ulrich H. Koszinowski: Max von Pettenkofer-Institute, Ludwig-Maximilians-University Munich, Munich, Germany
- Stipan Jonjic (corresponding author): Department of Histology and Embryology, Faculty of Medicine University of Rijeka, Rijeka, Croatia
- Sébastien Pfeffer (corresponding author): Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Lars Dölken (corresponding author): Max von Pettenkofer-Institute, Ludwig-Maximilians-University Munich, Munich, Germany; Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom
|
280
|
Benkel BF, Smith A, Christensen K, Anistoroaei R, Zhang Y, Sensen CW, Farid H, Paterson L, Teather RM. A comparative, BAC end sequence enabled map of the genome of the American mink (Neovison vison). Genes Genomics 2012. [DOI: 10.1007/s13258-011-0160-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|
281
|
Mittal VK, McDonald JF. R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data. Nucleic Acids Res 2012; 40:e67. [PMID: 22287631 PMCID: PMC3351179 DOI: 10.1093/nar/gks047] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023] Open
Abstract
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist and most of them lack inherent parallel processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision making procedure to accurately characterize various classes of transcripts and achieves a near linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
Affiliation(s)
- Vinay K Mittal: School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
282
|
Szcześniak MW, Deorowicz S, Gapski J, Kaczyński Ł, Makalowska I. miRNEST database: an integrative approach in microRNA search and annotation. Nucleic Acids Res 2011; 40:D198-204. [PMID: 22135287 PMCID: PMC3245016 DOI: 10.1093/nar/gkr1159] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Indexed: 11/28/2022] Open
Abstract
Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. Many microRNAs, including those from model organisms, remain undiscovered. As a result, there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was based on sequence similarity, and as many as 10 004 miRNA candidates in 221 animal and 199 plant species were discovered. Of these, only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from the literature and 13 databases, which include miRNA sequences, small RNA sequencing data, expression, polymorphism and target data, as well as links to external miRNA resources, whenever applicable. All this makes miRNEST a considerable miRNA resource in terms of the number of species covered (544), integrating scattered miRNA data into a uniform format with a user-friendly web interface.
Affiliation(s)
- Michał Wojciech Szcześniak: Laboratory of Bioinformatics, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland.
|
283
|
Sparks ME, Gundersen-Rindal DE. The Lymantria dispar IPLB-Ld652Y cell line transcriptome comprises diverse virus-associated transcripts. Viruses 2011; 3:2339-50. [PMID: 22163348 PMCID: PMC3230855 DOI: 10.3390/v3112339] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 09/30/2011] [Revised: 11/12/2011] [Accepted: 11/14/2011] [Indexed: 12/28/2022] Open
Abstract
The enhanced viral susceptibility of the gypsy moth (Lymantria dispar)-derived IPLB-Ld652Y cell line has made it a popular in vitro system for studying virus-related phenomena in the Lepidoptera. Using both single-pass EST sequencing and 454-based pyrosequencing, a transcriptomic library of 14,368 putatively unique transcripts (PUTs) was produced comprising 8,476,050 high-quality, informative bases. The gene content of the IPLB-Ld652Y transcriptome was broadly assessed via comparison with the NCBI non-redundant protein database, and more detailed functional annotation was inferred by comparison to the Swiss-Prot subset of UniProtKB. In addition to L. dispar cellular transcripts, a diverse array of both RNA and DNA virus-associated transcripts was identified within the dataset, suggestive of a high level of viral expression and activity in IPLB-Ld652Y cells. These sequence resources will provide a sound basis for developing testable experimental hypotheses by insect virologists, and suggest a number of avenues for potential research.
Affiliation(s)
- Michael E Sparks: USDA-ARS Invasive Insect Biocontrol and Behavior Laboratory, Beltsville, MD 20705, USA.
|
284
|
Mayoral RJ, Deho L, Rusca N, Bartonicek N, Saini HK, Enright AJ, Monticelli S. MiR-221 influences effector functions and actin cytoskeleton in mast cells. PLoS One 2011; 6:e26133. [PMID: 22022537 PMCID: PMC3192147 DOI: 10.1371/journal.pone.0026133] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 05/10/2011] [Accepted: 09/20/2011] [Indexed: 02/01/2023] Open
Abstract
Mast cells have essential effector and immunoregulatory functions in IgE-associated allergic disorders and certain innate and adaptive immune responses, but the role of miRNAs in regulating mast cell functions is almost completely unexplored. To examine the role of the activation-induced miRNA miR-221 in mouse mast cells, we developed robust lentiviral systems for miRNA overexpression and depletion. While miR-221 favored mast cell adhesion and migration towards SCF or antigen in trans-well migration assays, as well as cytokine production and degranulation in response to IgE-antigen complexes, neither miR-221 overexpression nor its ablation interfered with mast cell differentiation. Transcriptional profiling of miR-221-overexpressing mast cells revealed modulation of many transcripts, including several associated with the cytoskeleton; indeed, miR-221 overexpression was associated with reproducible increases in cortical actin in mast cells, and with altered cellular shape and cell cycle in murine fibroblasts. Our bioinformatics analysis showed that this effect was likely mediated by the composite effect of miR-221 on many primary and secondary targets in resting cells. Indeed, miR-221-induced cellular alterations could not be recapitulated by knockdown of one of the major targets of miR-221. We propose a model in which miR-221 has two different roles in mast cells: in resting cells, basal levels of miR-221 contribute to the regulation of the cell cycle and cytoskeleton, a general mechanism probably common to other miR-221-expressing cell types, such as fibroblasts. Conversely, upon induction in response to mast cell stimulation, miR-221 effects are mast cell-specific and activation-dependent, contributing to the regulation of degranulation, cytokine production and cell adherence. Our studies provide new insights into the roles of miR-221 in mast cell biology, and identify novel mechanisms that may contribute to mast cell-related pathological conditions, such as asthma, allergy and mastocytosis.
Affiliation(s)
- Lorenzo Deho: Institute for Research in Biomedicine, Bellinzona, Switzerland
- Nicole Rusca: Institute for Research in Biomedicine, Bellinzona, Switzerland
- Nenad Bartonicek: EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Harpreet Kaur Saini: EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Anton J. Enright: EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Silvia Monticelli (corresponding author): Institute for Research in Biomedicine, Bellinzona, Switzerland
|
285
|
Fleetwood DJ, Khan AK, Johnson RD, Young CA, Mittal S, Wrenn RE, Hesse U, Foster SJ, Schardl CL, Scott B. Abundant degenerate miniature inverted-repeat transposable elements in genomes of epichloid fungal endophytes of grasses. Genome Biol Evol 2011; 3:1253-64. [PMID: 21948396 PMCID: PMC3227409 DOI: 10.1093/gbe/evr098] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Accepted: 09/20/2011] [Indexed: 12/20/2022] Open
Abstract
Miniature inverted-repeat transposable elements (MITEs) are abundant repeat elements in plant and animal genomes; however, there are few analyses of these elements in fungal genomes. Analysis of the draft genome sequence of the fungal endophyte Epichloë festucae revealed 13 MITE families that make up almost 1% of the E. festucae genome, and relics of putative autonomous parent elements were identified for three families. Sequence and DNA hybridization analyses suggest that at least some of the MITEs identified in the study were active early in the evolution of Epichloë but are not found in closely related genera. Analysis of MITE integration sites showed that these elements have a moderate integration site preference for 5' genic regions of the E. festucae genome and are particularly enriched near genes for secondary metabolism. Copies of the EFT-3m/Toru element appear to have mediated recombination events that may have abolished synthesis of two fungal alkaloids in different epichloae. This work provides insight into the potential impact of MITEs on epichloae evolution and provides a foundation for analysis in other fungal genomes.
Affiliation(s)
- Damien J Fleetwood: Forage Biotechnology Section, AgResearch, Palmerston North, New Zealand.
|
286
|
Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries. BMC Genomics 2011; 12:94. [PMID: 21291514 PMCID: PMC3039614 DOI: 10.1186/1471-2164-12-94] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/06/2010] [Accepted: 02/03/2011] [Indexed: 11/21/2022] Open
Abstract
Background Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. Conclusion We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.
|
287
|
Abstract
Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form, and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis. Availability and Implementation: This open-source application was implemented in Perl and can be used as a standalone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://prinseq.sourceforge.net/. Contact: rschmied@sciences.sdsu.edu; redwards@cs.sdsu.edu
Affiliation(s)
- Robert Schmieder: Department of Computer Science, Computational Science Research Center, San Diego State University, San Diego, CA 92182, USA.
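The kind of preprocessing the PRINSEQ abstract describes (summary statistics followed by filtering of FASTA records) can be illustrated with a minimal Python sketch. This is not PRINSEQ's code, which the abstract says is written in Perl; the function names `parse_fasta`, `summarize` and `filter_records`, and the parameters `min_len` and `max_n_fraction`, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Return basic length and GC statistics for a non-empty list of records."""
    lengths = [len(seq) for _, seq in records]
    gc_fractions = [
        (seq.upper().count("G") + seq.upper().count("C")) / len(seq)
        for _, seq in records
        if seq
    ]
    return {
        "count": len(records),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": mean(lengths),
        "mean_gc": mean(gc_fractions),
    }


def filter_records(records, min_len=0, max_n_fraction=1.0):
    """Keep records that are long enough and not dominated by ambiguous 'N' bases."""
    kept = []
    for header, seq in records:
        if len(seq) < min_len:
            continue
        n_fraction = seq.upper().count("N") / len(seq) if seq else 1.0
        if n_fraction > max_n_fraction:
            continue
        kept.append((header, seq))
    return kept
```

Under these assumptions, one would call `summarize` on the parsed records before and after `filter_records` to see how a given length or ambiguity threshold changes the dataset.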
|
288
|
Sullivan PF, Allander T, Lysholm F, Goh S, Persson B, Jacks A, Evengård B, Pedersen NL, Andersson B. An unbiased metagenomic search for infectious agents using monozygotic twins discordant for chronic fatigue. BMC Microbiol 2011; 11:2. [PMID: 21194495 PMCID: PMC3022642 DOI: 10.1186/1471-2180-11-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 09/09/2010] [Accepted: 01/02/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chronic fatigue syndrome is an idiopathic syndrome widely suspected of having an infectious or immune etiology. We applied an unbiased metagenomic approach to try to identify known or novel infectious agents in the serum of 45 cases with chronic fatigue syndrome or idiopathic chronic fatigue. Controls were the unaffected monozygotic co-twins of cases, and serum samples were obtained at the same place and time. RESULTS No novel DNA or RNA viral signatures were confidently identified. Four affected twins and no unaffected twins evidenced viremia with GB virus C (8.9% vs. 0%, p = 0.019), and one affected twin had previously undetected hepatitis C viremia. An excess of GB virus C viremia in cases with chronic fatigue requires confirmation. CONCLUSIONS Current, impairing chronic fatigue was not robustly associated with viremia detectable in serum.
Affiliation(s)
- Patrick F Sullivan: Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
|
289
|
Development and application of bovine and porcine oligonucleotide arrays with protein-based annotation. J Biomed Biotechnol 2010; 2010:453638. [PMID: 21197395 PMCID: PMC3010673 DOI: 10.1155/2010/453638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/03/2010] [Accepted: 11/01/2010] [Indexed: 12/11/2022] Open
Abstract
The design of oligonucleotide sequences for the detection of gene expression in species with disparate volumes of genome and EST sequence information has been broadly studied. However, a congruous strategy has yet to emerge to allow the design of sensitive and specific gene expression detection probes. This study explores the use of a phylogenomic approach to align transcribed sequences to vertebrate protein sequences for the detection of gene families to design genomewide 70-mer oligonucleotide probe sequences for bovine and porcine. The bovine array contains 23,580 probes that target the transcripts of 16,341 genes, about 72% of the total number of bovine genes. The porcine array contains 19,980 probes targeting 15,204 genes, about 76% of the genes in the Ensembl annotation of the pig genome. An initial experiment using the bovine array demonstrates the specificity and sensitivity of the array.
|
290
|
Abstract
Biological sequences are often analyzed by detecting homologous regions between them. Homology search is confounded by simple repeats, which give rise to strong similarities that are not homologies. Standard repeat-masking methods fail to eliminate this problem, and they are especially ill-suited to AT-rich DNA such as malaria and slime-mould genomes. We present a new repeat-masking method, tantan, which is motivated by the mechanisms that create simple repeats. This method thoroughly eliminates spurious homology predictions for DNA–DNA, protein–protein and DNA–protein comparisons. Moreover, it enables accurate homology search for non-coding DNA with extreme A + T composition.
Affiliation(s)
- Martin C Frith: Computational Biology Research Center, Institute for Advanced Industrial Science and Technology, Sequence Analysis Team, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
291
|
Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, Watson CT, Morahan JM, Giovannoni G, Ponting CP, Ebers GC, Knight JC. A ChIP-seq defined genome-wide map of vitamin D receptor binding: associations with disease and evolution. Genome Res 2010; 20:1352-60. [PMID: 20736230 PMCID: PMC2945184 DOI: 10.1101/gr.107920.110] [Citation(s) in RCA: 612] [Impact Index Per Article: 43.7] [Received: 03/16/2010] [Accepted: 07/13/2010] [Indexed: 02/06/2023]
Abstract
Initially thought to play a restricted role in calcium homeostasis, the pleiotropic actions of vitamin D in biology and their clinical significance are only now becoming apparent. However, the mode of action of vitamin D, through its cognate nuclear vitamin D receptor (VDR), and its contribution to diverse disorders, remain poorly understood. We determined VDR binding throughout the human genome using chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq). After calcitriol stimulation, we identified 2776 genomic positions occupied by the VDR and 229 genes with significant changes in expression in response to vitamin D. VDR binding sites were significantly enriched near autoimmune and cancer associated genes identified from genome-wide association (GWA) studies. Notable genes with VDR binding included IRF8, associated with MS, and PTPN2 associated with Crohn's disease and T1D. Furthermore, a number of single nucleotide polymorphism associations from GWA were located directly within VDR binding intervals, for example, rs13385731 associated with SLE and rs947474 associated with T1D. We also observed significant enrichment of VDR intervals within regions of positive selection among individuals of Asian and European descent. ChIP-seq determination of transcription factor binding, in combination with GWA data, provides a powerful approach to further understanding the molecular bases of complex diseases.
Affiliation(s)
- Sreeram V. Ramagopalan
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
  - Blizard Institute of Cell and Molecular Science, Queen Mary University of London, Barts and The London School of Medicine and Dentistry, London E1 2AT, United Kingdom
- Andreas Heger
  - MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
- Antonio J. Berlanga
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Narelle J. Maugeri
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
- Matthew R. Lincoln
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Amy Burrell
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Lahiru Handunnetthi
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Adam E. Handel
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Giulio Disanto
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Sarah-Michelle Orton
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Corey T. Watson
  - Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Julia M. Morahan
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Gavin Giovannoni
  - Blizard Institute of Cell and Molecular Science, Queen Mary University of London, Barts and The London School of Medicine and Dentistry, London E1 2AT, United Kingdom
- Chris P. Ponting
  - MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
- George C. Ebers
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
  - Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Julian C. Knight
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, Oxford OX3 7BN, United Kingdom
|
292
|
Abstract
Motivation: High-throughput sequencing technologies have recently made deep interrogation of expressed transcript sequences practical, both economically and temporally. Identification of intron/exon boundaries is an essential part of genome annotation, yet remains a challenge. Here, we present supersplat, a method for unbiased splice-junction discovery through empirical RNA-seq data. Results: Using a genomic reference and RNA-seq high-throughput sequencing datasets, supersplat empirically identifies potential splice junctions at a rate of ∼11.4 million reads per hour. We further benchmark the performance of the algorithm by mapping Illumina RNA-seq reads to identify introns in the genome of the reference dicot plant Arabidopsis thaliana and we demonstrate the utility of supersplat for de novo empirical annotation of splice junctions using the reference monocot plant Brachypodium distachyon. Availability: Implemented in C++, supersplat source code and binaries are freely available on the web at http://mocklerlab-tools.cgrb.oregonstate.edu/ Contact: tmockler@cgrb.oregonstate.edu
Affiliation(s)
- Douglas W Bryant
  - Department of Botany and Plant Pathology and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331, USA
|
293
|
Sierra R, Rodríguez-R LM, Chaves D, Pinzón A, Grajales A, Rojas A, Mutis G, Cárdenas M, Burbano D, Jiménez P, Bernal A, Restrepo S. Discovery of Phytophthora infestans genes expressed in planta through mining of cDNA libraries. PLoS One 2010; 5:e9847. [PMID: 20352100 PMCID: PMC2844423 DOI: 10.1371/journal.pone.0009847] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/09/2009] [Accepted: 03/04/2010] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Phytophthora infestans (Mont.) de Bary causes late blight of potato and tomato, and has a broad host range within the Solanaceae family. Most studies of the Phytophthora-Solanum pathosystem have focused on gene expression in the host and have not analyzed pathogen gene expression in planta. METHODOLOGY/PRINCIPAL FINDINGS We describe in detail an in silico approach to mine ESTs from inoculated host plants deposited in a database in order to identify particular pathogen sequences associated with disease. We identified candidate effector genes through mining of 22,795 ESTs corresponding to P. infestans cDNA libraries in compatible and incompatible interactions with hosts from the Solanaceae family. CONCLUSIONS/SIGNIFICANCE We annotated genes of P. infestans expressed in planta associated with late blight using different approaches and assigned putative functions to 373 out of the 501 sequences found in the P. infestans genome draft, including putative secreted proteins, domains associated with pathogenicity and poorly characterized proteins ideal for further experimental studies. Our study provides a methodology for analyzing cDNA libraries and provides an understanding of the plant-oomycete pathosystems that is independent of the host, condition, or type of sample by identifying genes of the pathogen expressed in planta.
Affiliation(s)
- Roberto Sierra
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Luis M. Rodríguez-R
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Diego Chaves
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Andrés Pinzón
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Alejandro Grajales
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Alejandro Rojas
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Gabriel Mutis
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Martha Cárdenas
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Daniel Burbano
  - Dirección de Tecnologías de Información, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Pedro Jiménez
  - Programa de Biología Aplicada, Universidad Militar Nueva Granada, Bogotá Distrito Capital, Colombia
- Adriana Bernal
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
- Silvia Restrepo
  - Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá Distrito Capital, Colombia
|
294
|
Frith MC, Hamada M, Horton P. Parameters for accurate genome alignment. BMC Bioinformatics 2010; 11:80. [PMID: 20144198 PMCID: PMC2829014 DOI: 10.1186/1471-2105-11-80] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Received: 10/16/2009] [Accepted: 02/09/2010] [Indexed: 11/25/2022] Open
Abstract
Background Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours http://last.cbrc.jp/.
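The remark that higher X-drop values are not always better is easier to follow with the parameter in front of you. The following is a minimal sketch of ungapped X-drop extension, not the LAST implementation: the function name `xdrop_extend`, the +1/-1 scores, and the toy sequences are assumptions chosen for illustration. Extension stops once the running score falls more than `x_drop` below the best score seen so far.

```python
def xdrop_extend(a: str, b: str, match: int = 1, mismatch: int = -1, x_drop: int = 3):
    """Extend an ungapped alignment from the start of a and b.

    Returns (best_score, best_length). Illustrative scoring only; real
    aligners use tuned score matrices and also handle gaps.
    """
    score = 0
    best_score = 0
    best_length = 0
    for length, (x, y) in enumerate(zip(a, b), start=1):
        score += match if x == y else mismatch
        if score > best_score:
            # New best prefix: remember where it ends.
            best_score, best_length = score, length
        elif best_score - score > x_drop:
            # Score dropped too far below the best; give up extending.
            break
    return best_score, best_length


if __name__ == "__main__":
    # A single mismatch in the middle of otherwise identical toy sequences.
    print(xdrop_extend("AAXAAAA", "AAYAAAA", x_drop=0))
    print(xdrop_extend("AAXAAAA", "AAYAAAA", x_drop=3))
```

With `x_drop=0` the extension stops at the first mismatch, while `x_drop=3` lets it continue through the mismatch and recover the matching bases beyond it. A larger value also lets an extension wander further into unrelated sequence before stopping, which is one way to read the abstract's caution.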
Affiliation(s)
- Martin C Frith
  - Computational Biology Research Center, Institute for Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.
|
295
|
Croning MDR, Fricker DG, Komiyama NH, Grant SGN. Automated design of genomic Southern blot probes. BMC Genomics 2010; 11:74. [PMID: 20113467 PMCID: PMC2830989 DOI: 10.1186/1471-2164-11-74] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/28/2009] [Accepted: 01/29/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Southern blotting is a DNA analysis technique that has found widespread application in molecular biology. It has been used for gene discovery and mapping and has diagnostic and forensic applications, including mutation detection in patient samples and DNA fingerprinting in criminal investigations. Southern blotting has been employed as the definitive method for detecting transgene integration, and successful homologous recombination in gene targeting experiments. The technique employs a labeled DNA probe to detect a specific DNA sequence in a complex DNA sample that has been separated by restriction-digest and gel electrophoresis. Critically, for the technique to succeed, the probe must be unique to the target locus so as not to cross-hybridize to other endogenous DNA within the sample. Investigators routinely employ a manual approach to probe design. A genome browser is used to extract DNA sequence from the locus of interest, which is searched against the target genome using a BLAST-like tool. Ideally a single perfect match is obtained to the target, with little cross-reactivity caused by homologous DNA sequence present in the genome and/or repetitive and low-complexity elements in the candidate probe. This is a labor-intensive process often requiring several attempts to find a suitable probe for laboratory testing. RESULTS We have written an informatic pipeline to automatically design genomic Southern blot probes that specifically attempts to optimize the resultant probe, employing a brute-force strategy of generating many candidate probes of acceptable length in the user-specified design window, searching all against the target genome, then scoring and ranking the candidates by uniqueness and repetitive DNA element content. Using these in silico measures we can automatically design probes that we predict to perform as well as, or better than, our previous manual designs, while considerably reducing design time. We went on to experimentally validate a number of these automated designs by Southern blotting. The majority of probes we tested performed well, confirming our in silico prediction methodology and the general usefulness of the software for automated genomic Southern probe design. CONCLUSIONS Software and supplementary information are freely available at: http://www.genes2cognition.org/software/southern_blot.
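The brute-force strategy described in this abstract (tile candidates across a design window, search each against the genome, then rank by uniqueness and repeat content) can be sketched roughly as follows. This is not the authors' pipeline: the names `Candidate`, `tile_candidates`, `rank_candidates` and `make_counter` are hypothetical, the genome search is replaced by exact substring counting on a toy string, and repeat content is approximated by the share of soft-masked (lowercase) bases.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int              # 0-based offset within the design window
    sequence: str
    genome_hits: int        # matches in the target genome (toy exact counting)
    repeat_fraction: float  # share of soft-masked (lowercase) bases


def tile_candidates(window: str, length: int, step: int):
    """Yield (start, subsequence) for every candidate of the given length."""
    for start in range(0, len(window) - length + 1, step):
        yield start, window[start:start + length]


def repeat_fraction(seq: str) -> float:
    """Fraction of lowercase bases, a stand-in for repeat annotation."""
    return sum(base.islower() for base in seq) / len(seq)


def make_counter(genome: str):
    """Return a hit counter using exact matching (stand-in for a BLAST-like search)."""
    def count_hits(seq: str) -> int:
        return sum(genome.startswith(seq, i) for i in range(len(genome) - len(seq) + 1))
    return count_hits


def rank_candidates(window: str, length: int, step: int, count_hits):
    """Score all candidates and sort best-first: fewest hits, then least repeat."""
    candidates = [
        Candidate(start, seq, count_hits(seq.upper()), repeat_fraction(seq))
        for start, seq in tile_candidates(window, length, step)
    ]
    return sorted(candidates, key=lambda c: (c.genome_hits, c.repeat_fraction))


if __name__ == "__main__":
    toy_genome = "ACGTTTACGTGGCCAA"
    ranked = rank_candidates("ACGTacgtGGCC", length=4, step=4,
                             count_hits=make_counter(toy_genome))
    for cand in ranked:
        print(cand)
```

Sorting on the tuple `(genome_hits, repeat_fraction)` treats uniqueness as the primary criterion and repeat content as a tie-breaker; a real scoring scheme could weight the two differently.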
Affiliation(s)
- Mike D R Croning
  - Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB101SA, UK
|
296
|
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 2009. [PMID: 20003500 DOI: 10.1186/1471-2105-10-421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. RESULTS We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. CONCLUSION The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
Affiliation(s)
- Christiam Camacho
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
297
|
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 2009; 10:421. [PMID: 20003500 PMCID: PMC2803857 DOI: 10.1186/1471-2105-10-421] [Citation(s) in RCA: 11126] [Impact Index Per Article: 741.7] [Received: 07/28/2009] [Accepted: 12/15/2009] [Indexed: 01/13/2023] Open
Abstract
Background Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. Results We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. Conclusion The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
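The idea of breaking a long query into chunks can be illustrated with a short sketch. It is not taken from the BLAST+ source: the function names, the chunk size and the overlap are all assumptions, and the abstract does not say how BLAST+ chooses them. Overlapping the chunks means a hit that straddles a boundary is still contained in at least one chunk, provided it is no longer than the overlap.

```python
def split_query(query: str, chunk_size: int, overlap: int):
    """Return (offset, chunk) pairs that cover the query with overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for offset in range(0, len(query), step):
        chunks.append((offset, query[offset:offset + chunk_size]))
        if offset + chunk_size >= len(query):
            break  # this chunk already reaches the end of the query
    return chunks


def to_query_coords(offset: int, hit_start: int, hit_end: int):
    """Map a hit found inside a chunk back to coordinates on the full query."""
    return offset + hit_start, offset + hit_end


if __name__ == "__main__":
    for offset, chunk in split_query("ABCDEFGHIJ", chunk_size=4, overlap=1):
        print(offset, chunk)
```

Each chunk can then be searched independently, and hits reported in chunk coordinates are shifted by the chunk offset before results are merged.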
Affiliation(s)
- Christiam Camacho
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
298
|
Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong WK, Mockler TC. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res 2009; 20:45-58. [PMID: 19858364 DOI: 10.1101/gr.093302.109] [Citation(s) in RCA: 670] [Impact Index Per Article: 44.7] [Indexed: 11/24/2022]
Abstract
Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least approximately 42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC(+) isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC(+) isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC(+) and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression.
Affiliation(s)
- Sergei A Filichkin
  - Department of Botany and Plant Pathology and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331, USA
|
299
|
Ye Y, Tang H. An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 2009; 7:455-71. [PMID: 19507285 DOI: 10.1142/s0219720009004151] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 07/15/2008] [Revised: 11/04/2008] [Accepted: 11/06/2008] [Indexed: 11/18/2022]
Abstract
Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low-coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increases the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for metagenomic projects when the genome assembly does not work because of the low sequence coverage.
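The first of the three steps, annotating each read with putative ORFs, can be sketched as a six-frame translation that keeps every stop-free peptide above a minimum length. The abstract does not say how MetaORFA predicts ORFs, so this is an assumed stand-in: the function names and the `min_len` threshold are hypothetical, and the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (non-ACGT symbols are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def putative_orfs(read: str, min_len: int = 30):
    """Return (strand, frame, peptide) for stop-free stretches in all six frames.

    Reads are fragments, so stretches running to the read edge are kept
    even without a start or stop codon.
    """
    read = read.upper()
    peptides = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) >= min_len:
                    peptides.append((strand, frame, peptide))
    return peptides


if __name__ == "__main__":
    for orf in putative_orfs("ATGGCCAAATAA", min_len=3):
        print(orf)
```

The resulting peptides would then be the input to the assembly step; the threshold trades sensitivity on short reads against the number of spurious frames kept.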
Affiliation(s)
- Yuzhen Ye
  - School of Informatics, Indiana University, Bloomington, IN 47408, USA.
|
300
|
Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing. PLoS One 2009; 4:e6659. [PMID: 19684856 PMCID: PMC2722027 DOI: 10.1371/journal.pone.0006659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/06/2009] [Accepted: 07/26/2009] [Indexed: 11/19/2022] Open
Abstract
Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
|