1
|
In Vitro and In Silico Based Approaches to Identify Potential Novel Bacteriocins from the Athlete Gut Microbiome of an Elite Athlete Cohort. Microorganisms 2022; 10:701. [PMID: 35456752 PMCID: PMC9025905 DOI: 10.3390/microorganisms10040701] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/07/2022] [Revised: 03/09/2022] [Accepted: 03/22/2022] [Indexed: 12/30/2022] Open
Abstract
Exercise reduces inflammation and fatigue and aids overall health. Additionally, physical fitness has been associated with desirable changes in the community composition of the athlete gut microbiome, with health-associated taxa shown to be increased in active individuals. Here, using a combination of in silico and in vitro methods, we investigate the antimicrobial activity of the athlete gut microbiome. In vitro approaches yielded 284 gut isolates with inhibitory activity against Clostridioides difficile and/or Fusobacterium nucleatum. The most potent isolates were further characterized, and potential bacteriocins were predicted using both MALDI-TOF MS and whole-genome sequencing. Additionally, metagenomic reads from the faecal samples were used to recover 770 Metagenome Assembled Genomes (MAGs), of which 148 were classified as high-quality MAGs and screened for putative bacteriocin gene clusters using BAGEL4 software, which identified 339 gene clusters of interest. Class I was the most abundant bacteriocin class predicted, accounting for 91.3% of predictions; Class III accounted for 7.5%, and Class II for just 1% of all predictions.
|
2
|
Sohrabi SS, Ismaili A, Nazarian-Firouzabadi F, Fallahi H, Hosseini SZ. Identification of key genes and molecular mechanisms associated with temperature stress in lentil. Gene 2022; 807:145952. [PMID: 34500049 DOI: 10.1016/j.gene.2021.145952] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/11/2021] [Revised: 08/24/2021] [Accepted: 09/03/2021] [Indexed: 02/03/2023]
Abstract
Extreme temperature is one of the serious threats to crop production under present and future scenarios of global climate change. Lentil (Lens culinaris) is an important crop, yet there is a serious lack of genetic information regarding its responses to environmental and temperature stresses. This study is the first report evaluating key genes and molecular mechanisms related to temperature stresses in lentil using RNA sequencing. De novo transcriptome assembly created 44,673 contigs, and differential gene expression analysis revealed 7494 differentially expressed genes between the temperature-stress and control groups. Basic annotation of the transcriptome assembly led to the identification of 2765 novel transcripts that have not yet been identified in lentil genome draft v1.2. In addition, several unigenes involved in temperature sensing, calcium and hormone signaling, and DNA-binding transcription factor activity were identified. Common mechanisms in response to temperature stresses, including proline biosynthesis, balancing of the photosynthetic light reactions, chaperone activity and circadian rhythms, were also determined from the hub genes through protein-protein interaction network analysis. Deciphering the mechanisms of extreme temperature tolerance would open a new way of developing crops with enhanced plasticity against climate change. Overall, this study has identified a set of mechanisms and various genes related to cold and heat stresses that will be useful for better understanding the lentil's reaction to temperature stresses.
Affiliation(s)
- Seyed Sajad Sohrabi
- Department of Plant Production and Genetic Engineering, Faculty of Agriculture, Lorestan University, Khorramabad, Iran.
| | - Ahmad Ismaili
- Department of Plant Production and Genetic Engineering, Faculty of Agriculture, Lorestan University, Khorramabad, Iran.
| | - Farhad Nazarian-Firouzabadi
- Department of Plant Production and Genetic Engineering, Faculty of Agriculture, Lorestan University, Khorramabad, Iran.
| | - Hossein Fallahi
- Department of Biology, School of Sciences, Razi University, Kermanshah, Iran.
| | - Seyedeh Zahra Hosseini
- Department of Plant Production and Genetic Engineering, Faculty of Agriculture, Lorestan University, Khorramabad, Iran.
| |
|
3
|
Turner D, Adriaenssens EM, Tolstoy I, Kropinski AM. Phage Annotation Guide: Guidelines for Assembly and High-Quality Annotation. Phage (New Rochelle) 2021; 2:170-182. [PMID: 35083439 PMCID: PMC8785237 DOI: 10.1089/phage.2021.0013] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/23/2022]
Abstract
All sequencing projects of bacteriophages (phages) should seek to report an accurate and comprehensive annotation of their genomes. This article defines 14 questions for those new to phage genomics that should be addressed before submitting a genome sequence to the International Nucleotide Sequence Database Collaboration or writing a publication.
Affiliation(s)
- Dann Turner
- Department of Applied Sciences, Faculty of Health and Applied Sciences, University of the West of England, Bristol, United Kingdom
| | | | - Igor Tolstoy
- Viral Resources, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, Maryland, USA
| | - Andrew M Kropinski
- Department of Food Science, University of Guelph, Guelph, Ontario, Canada; Department of Pathobiology, University of Guelph, Guelph, Ontario, Canada
| |
|
4
|
Ahmad SS, Samia NSN, Khan AS, Turjya RR, Khan MAAK. Bidirectional promoters: an enigmatic genome architecture and their roles in cancers. Mol Biol Rep 2021; 48:6637-6644. [PMID: 34378109 DOI: 10.1007/s11033-021-06612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 07/29/2021] [Indexed: 11/28/2022]
Abstract
Bidirectional promoters are the transcription regulatory regions of genes positioned head-to-head on opposite strands. Specific sequence signals, chromatin modifications and three-dimensional structures of the transcription site facilitate the unconventional yet tightly regulated transcription that proceeds in both directions from these promoters. Mutations or aberrant epigenetic changes can lead to abnormally enhanced or reduced expression of either of the bidirectionally transcribed genes, resulting in tumorigenesis. Moreover, bidirectionally transcribed genes might also contribute to immune regulation in the tumor microenvironment. In this review, we aimed to expound the characteristic features of bidirectional promoters alongside their transcriptional regulation and, ultimately, the association of these enigmatic genomic elements with different cancers.
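The head-to-head arrangement described in this abstract can be illustrated with a minimal sketch that scans a gene annotation for divergent neighbours. The `Gene` record, the toy coordinates and the 1,000 bp cut-off between transcription start sites are assumptions made for illustration only; they are not taken from the review.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical annotation record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def tss(gene: Gene) -> int:
    """Transcription start site: left end on '+', right end on '-'."""
    return gene.start if gene.strand == "+" else gene.end


def head_to_head_pairs(genes: List[Gene], max_gap: int = 1000) -> List[Tuple[str, str, int]]:
    """Return (minus_gene, plus_gene, gap) for neighbouring divergent pairs.

    A pair qualifies when a minus-strand gene is immediately followed
    (ordered by TSS) by a plus-strand gene on the same chromosome and
    their start sites are at most max_gap bases apart (assumed cut-off).
    """
    ordered = sorted(genes, key=lambda g: (g.chrom, tss(g)))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue
        if left.strand == "-" and right.strand == "+":
            gap = tss(right) - tss(left)
            if 0 < gap <= max_gap:
                pairs.append((left.name, right.name, gap))
    return pairs


# Toy annotation, invented for illustration.
toy_genes = [
    Gene("A", "chr1", 100, 500, "-"),
    Gene("B", "chr1", 800, 1500, "+"),
    Gene("C", "chr1", 5000, 6000, "+"),
    Gene("D", "chr2", 100, 500, "-"),
    Gene("E", "chr2", 5000, 6000, "+"),
]
print(head_to_head_pairs(toy_genes))
```

Only the A/B pair is reported here: D and E are also divergent but lie further apart than the assumed cut-off, so the region between them would not be treated as a shared promoter.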
Affiliation(s)
- Sheikh Shafin Ahmad
- Department of Mathematics and Natural Sciences, Brac University, Dhaka, Bangladesh
| | | | - Auroni Semonti Khan
- Department of Genetic Engineering and Biotechnology, Jagannath University, Dhaka, Bangladesh
| | - Rafeed Rahman Turjya
- Department of Mathematics and Natural Sciences, Brac University, Dhaka, Bangladesh
| | | |
|
5
|
Marla SS, Mishra P, Maurya R, Singh M, Wankhede DP, Kumar A, Yadav MC, Subbarao N, Singh SK, Kumar R. Refinement of Draft Genome Assemblies of Pigeonpea (Cajanus cajan). Front Genet 2020; 11:607432. [PMID: 33384719 PMCID: PMC7770131 DOI: 10.3389/fgene.2020.607432] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/17/2020] [Accepted: 11/23/2020] [Indexed: 11/13/2022] Open
Abstract
Genome assembly of short reads from large plant genomes remains a challenge in computational biology despite major developments in next generation sequencing. Several draft assemblies of sequenced plant genomes have been reported of late. The reported draft genome assemblies of Cajanus cajan have different levels of genome completeness and a large number of repeats, gaps, and segmental duplications. Draft assemblies with portions of the genome missing are shorter than the referenced original genome. These assemblies come with low map accuracy, affecting further functional annotation and the prediction of gene components as desired by crop researchers. Genome coverage, i.e., the number of sequenced raw reads mapped onto a certain location of the genome, is an important quality indicator of completeness and assembly quality in draft assemblies. The present work aimed to improve the coverage in reported de novo sequenced draft genomes (GCA_000340665.1 and GCA_000230855.2) of pigeonpea, a legume widely cultivated in India. The two recently sequenced assemblies, A1 and A2, comprised 72% and 75% of the estimated coverage of the genome, respectively. We employed an assembly reconciliation approach to compare the draft assemblies and merge them, filling the gaps by employing an algorithm size sorting mate-pair library to generate a high quality and near complete assembly with enhanced contiguity. The majority of gaps present within scaffolds were filled with right-sized mate-pair reads. The improved assembly had fewer gaps than reported in the draft assemblies, resulting in an improved genome coverage of 82.4%. Map accuracy of the improved assembly was evaluated using various quality metrics and for the presence of specific trait-related functional genes. The paired-end and mate-pair local libraries employed helped us to reduce gaps, repeats, and other sequence errors, resulting in longer scaffolds compared to the two draft assemblies. We reported the prediction of putative host resistance genes against Fusarium wilt disease by their performance and evaluated them both in wet laboratory and field phenotypic conditions.
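The abstract uses coverage in two senses: the number of reads mapped onto a location (depth) and the percentage of the genome represented (breadth). The sketch below shows how both could be computed from read alignments. It is only an illustration under stated assumptions: the function names, the 0-based half-open interval convention and the toy alignments are hypothetical, and this is not the authors' pipeline.

```python
from typing import Iterable, List, Tuple


def per_base_depth(genome_length: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Depth at each position, given (start, end) alignments.

    Coordinates are assumed to be 0-based and half-open. A difference
    array is used so each alignment costs O(1) regardless of its length.
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def breadth_of_coverage(depth: List[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    if not depth:
        return 0.0
    return sum(d >= min_depth for d in depth) / len(depth)


# Toy example: a 10 bp "genome" and three aligned reads.
toy_depth = per_base_depth(10, [(0, 4), (2, 6), (8, 10)])
print(toy_depth)
print(breadth_of_coverage(toy_depth))
```

In this toy case positions 6 and 7 receive no reads, so breadth is 8 of 10 positions even though the mean depth is 1.0; a gap in an assembly shows up in the same way, as a run of zero-depth positions.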
Affiliation(s)
- Soma S. Marla
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Pallavi Mishra
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Ranjeet Maurya
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Mohar Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | | | - Anil Kumar
- Directorate of Education, Rani Lakshmi Bai Central Agricultural University, Jhansi, India
| | - Mahesh C. Yadav
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - N. Subbarao
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Sanjeev K. Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Rajesh Kumar
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| |
|
6
|
Luo Y, Liao X, Wu FX, Wang J. Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190410155603] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. With the increase of speed and the decrease in cost, massive sequencing data continues to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy becomes a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. The examples of different species are used to illustrate that long reads produced by the third-generation sequencing technologies can cover full-length transcripts without assemblies. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.
Affiliation(s)
- Yuwen Luo
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Xingyu Liao
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
| |
|
7
|
Marla SS, Mishra P, Maurya R, Singh M, Wankhede DP, Kumar A, Yadav MC, Subbarao N, Singh SK, Kumar R. Refinement of Draft Genome Assemblies of Pigeonpea (Cajanus cajan). Front Genet 2020. [PMID: 33384719 DOI: 10.1101/2020.08.10.243949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 04/28/2023] Open
|
8
|
|
9
|
Lu ZH, Archibald AL, Ait-Ali T. Beyond the whole genome consensus: unravelling of PRRSV phylogenomics using next generation sequencing technologies. Virus Res 2014; 194:167-74. [PMID: 25312450 PMCID: PMC4275598 DOI: 10.1016/j.virusres.2014.10.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/25/2014] [Revised: 10/01/2014] [Accepted: 10/01/2014] [Indexed: 02/05/2023]
Abstract
NGS allows the whole genome sequencing of PRRSV without any prior knowledge. Low frequency variants within the co-evolving quasispecies can be detected. Both macro- and micro-evolutionary events can be followed using NGS.
The highly heterogeneous porcine reproductive and respiratory syndrome virus (PRRSV) is the causative agent of an economically important pig disease characterised by reproductive losses in breeding sows and respiratory illnesses in young piglets. The virus can be broadly divided into the European-like and North American-like genotypes 1 and 2, respectively. In addition to this intra-strain variability, the impact of coexisting viral quasispecies on disease development has recently gained much attention, owing largely to the advent of next-generation sequencing (NGS) technologies. Genomic data produced by the massive sequencing capacities of NGS have enabled the study of PRRSV at an unprecedented rate and level of detail. Unlike conventional sequencing methods, which require knowledge of conserved regions, NGS allows de novo assembly of full viral genomes. Evolutionary variations gained from different genotypic strains provide valuable insights into functionally important regions of the virus. Together with the advancement of sophisticated bioinformatics tools, ultra-deep NGS technologies make the detection of low frequency co-evolving quasispecies possible. This short review gives an overview, including a proposed workflow, of the use of NGS to explore the genetic diversity of PRRSV at both macro- and micro-evolutionary levels.
Affiliation(s)
- Zen H Lu
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom.
| | - Alan L Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom
| | - Tahar Ait-Ali
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom.
| |
|
10
|
Schellenberg JJ, Verbeke TJ, McQueen P, Krokhin OV, Zhang X, Alvare G, Fristensky B, Thallinger GG, Henrissat B, Wilkins JA, Levin DB, Sparling R. Enhanced whole genome sequence and annotation of Clostridium stercorarium DSM8532T using RNA-seq transcriptomics and high-throughput proteomics. BMC Genomics 2014; 15:567. [PMID: 24998381 PMCID: PMC4102724 DOI: 10.1186/1471-2164-15-567] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/18/2013] [Accepted: 06/26/2014] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Growing interest in cellulolytic clostridia with potential for consolidated biofuels production is mitigated by low conversion of raw substrates to desired end products. Strategies to improve conversion are likely to benefit from emerging techniques to define molecular systems biology of these organisms. Clostridium stercorarium DSM8532T is an anaerobic thermophile with demonstrated high ethanol production on cellulose and hemicellulose. Although several lignocellulolytic enzymes in this organism have been well-characterized, details concerning carbohydrate transporters and central metabolism have not been described. Therefore, the goal of this study is to define an improved whole genome sequence (WGS) for this organism using in-depth molecular profiling by RNA-seq transcriptomics and tandem mass spectrometry-based proteomics.
RESULTS A paired-end Roche/454 WGS assembly was closed through application of an in silico algorithm designed to resolve repetitive sequence regions, resulting in a circular replicon with one gap and a region of 2 kilobases with 10 ambiguous bases. RNA-seq transcriptomics resulted in nearly complete coverage of the genome, identifying errors in homopolymer length attributable to 454 sequencing. Peptide sequences resulting from high-throughput tandem mass spectrometry of trypsin-digested protein extracts were mapped to 1,755 annotated proteins (68% of all protein-coding regions). Proteogenomic analysis confirmed the quality of annotation and improvement pipelines, identifying a missing gene and an alternative reading frame. Peptide coverage of genes hypothetically involved in substrate hydrolysis, transport and utilization confirmed multiple pathways for glycolysis, pyruvate conversion and recycling of intermediates. No sequences homologous to transaldolase, a central enzyme in the pentose phosphate pathway, were observed by any method, despite demonstrated growth of this organism on xylose and xylan hemicellulose.
CONCLUSIONS Complementary omics techniques confirm the quality of genome sequence assembly, annotation and error-reporting. Nearly complete genome coverage by RNA-seq likely indicates background DNA in RNA extracts; however, these preps resulted in WGS enhancement and transcriptome profiling in a single Illumina run. No detection of transaldolase by any method despite xylose utilization by this organism indicates an alternative pathway for sedoheptulose-7-phosphate degradation. This report combines next-generation omics techniques to elucidate previously undefined features of substrate transport and central metabolism for this organism and its potential for consolidated biofuels production from lignocellulose.
Affiliation(s)
| | - Tobin J Verbeke
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
| | - Peter McQueen
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
| | - Oleg V Krokhin
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
| | - Xiangli Zhang
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
| | - Graham Alvare
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
| | - Brian Fristensky
- Department of Plant Sciences, University of Manitoba, Winnipeg, Canada
| | - Gerhard G Thallinger
- Core Facility Bioinformatics, Austrian Centre of Industrial Biotechnology (ACIB), Graz, Austria
- Institute for Genomics and Bioinformatics, Graz University of Technology, Graz, Austria
| | - Bernard Henrissat
- Architecture et Fonction des Macromolécules Biologiques, Université Aix-Marseille, Marseille, France
- UMR 7257, Centre National de Recherche Scientifique, 163 ave. de Luminy, Marseille, 13288 France
| | - John A Wilkins
- Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Winnipeg, Canada
| | - David B Levin
- Department of Biosystems Engineering, University of Manitoba, Winnipeg, Canada
| | - Richard Sparling
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
| |
|
11
|
El-Metwally S, Hamza T, Zakaria M, Helmy M. Next-generation sequence assembly: four stages of data processing and computational challenges. PLoS Comput Biol 2013; 9:e1003345. [PMID: 24348224 PMCID: PMC3861042 DOI: 10.1371/journal.pcbi.1003345] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 01/19/2023] Open
Abstract
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
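The first three of the four stages named in this abstract can be sketched in a few lines. The example below is a toy illustration, not any published assembler: it assumes a de Bruijn graph as the graph model, treats reads containing `N` as the only preprocessing filter, and uses a hypothetical `min_count` threshold to prune weakly supported edges. Postprocessing filtering is omitted.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def filter_reads(reads: Iterable[str]) -> List[str]:
    """Stage 1 (preprocessing): drop empty reads and reads with ambiguous bases."""
    return [r.upper() for r in reads if r and "N" not in r.upper()]


def build_de_bruijn(reads: Iterable[str], k: int) -> Counter:
    """Stage 2 (graph construction): each k-mer becomes an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix, counted by multiplicity."""
    edges: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def prune_weak_edges(edges: Counter, min_count: int = 2) -> Counter:
    """Stage 3 (graph simplification): remove edges seen fewer than
    min_count times, a simple stand-in for error removal."""
    return Counter({edge: n for edge, n in edges.items() if n >= min_count})


# Toy reads, invented for illustration; the last one carries a likely error.
toy_reads = ["ACGTAC", "ACGTAC", "ACGTTN", "ACGAAC"]
clean_reads = filter_reads(toy_reads)
graph = build_de_bruijn(clean_reads, k=3)
pruned = prune_weak_edges(graph, min_count=2)
print(sorted(pruned.items()))
```

Real assemblers go further at the simplification stage, for example by removing tips and bubbles and collapsing unbranched paths into contigs; count-based pruning is used here only because it is the smallest step that shows why simplification follows construction.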
Affiliation(s)
- Sara El-Metwally
- Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Taher Hamza
- Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Magdi Zakaria
- Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Mohamed Helmy
- Botany Department, Faculty of Agriculture, Al-Azhar University, Cairo, Egypt
- Biotechnology Department, Faculty of Agriculture, Al-Azhar University, Cairo, Egypt
| |
|
12
|
Ferrarini M, Moretto M, Ward JA, Šurbanovski N, Stevanović V, Giongo L, Viola R, Cavalieri D, Velasco R, Cestaro A, Sargent DJ. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics 2013; 14:670. [PMID: 24083400 PMCID: PMC3853357 DOI: 10.1186/1471-2164-14-670] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 05/29/2013] [Accepted: 09/26/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage, leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome.
RESULTS Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage) compared to seven contigs (90.59% coverage) recovered from the Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously.
CONCLUSIONS This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.
Affiliation(s)
- Marco Ferrarini
- Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 San Michele all'Adige, Italy.
| | | | | | | | | | | | | | | | | | | | | |
|
13
|
Huang Y, Zhao Z, Xu H, Shyr Y, Zhang B. Advances in systems biology: computational algorithms and applications. BMC Syst Biol 2012; 6 Suppl 3:S1. [PMID: 23281622 PMCID: PMC3524016 DOI: 10.1186/1752-0509-6-s3-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Abstract
The 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference featured six technical sessions, one tutorial session, one workshop, and 3 keynote presentations that covered state-of-the-art research activities in genomics, systems biology, and intelligent computing. In addition to a major emphasis on the next generation sequencing (NGS)-driven informatics, ICIBM 2012 aligned significant interests in systems biology and its applications in medicine. We highlight in this editorial the selected papers from the meeting that address the developments of novel algorithms and applications in systems biology.
Affiliation(s)
- Yufei Huang
- Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
| | | | | | | | | |
|