1
Accumulation of endosymbiont genomes in an insect autosome followed by endosymbiont replacement. Curr Biol 2022; 32:2786-2795.e5. [PMID: 35671755 DOI: 10.1016/j.cub.2022.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/12/2022] [Accepted: 05/10/2022] [Indexed: 12/01/2022]
Abstract
Eukaryotic genomes can acquire bacterial DNA via lateral gene transfer (LGT) [1]. A prominent source of LGT is Wolbachia [2], a widespread endosymbiont of arthropods and nematodes that is transmitted maternally through female germline cells [3,4]. The DNA transfer from the Wolbachia endosymbiont wAna to Drosophila ananassae is extensive [5-7] and has been localized to chromosome 4, contributing to chromosome expansion in this lineage [6]. As has happened frequently with claims of bacteria-to-eukaryote LGT, the contribution of wAna transfers to the expanded size of D. ananassae chromosome 4 has been specifically contested [8] owing to an assembly where Wolbachia sequences were classified as contaminants and removed [9]. Here, long-read sequencing with DNA from a Wolbachia-cured line enabled assembly of 4.9 Mbp of nuclear Wolbachia transfers (nuwts) in D. ananassae and a 24-kbp nuclear mitochondrial transfer. The nuwts are <8,000 years old in at least two locations in chromosome 4 with at least one whole-genome integration followed by rapid extensive duplication of most of the genome with regions that have up to 10 copies. The genes in nuwts are accumulating small indels and mobile element insertions. Among the highly duplicated genes are cifA and cifB, two genes associated with Wolbachia-mediated Drosophila cytoplasmic incompatibility. The wAna strain that was the source of nuwts was subsequently replaced by a different wAna endosymbiont. Direct RNA Nanopore sequencing of Wolbachia-cured lines identified nuwt transcripts, including spliced transcripts, but functionality, if any, remains elusive.
2
X-treme loss of sequence diversity linked to neo-X chromosomes in filarial nematodes. PLoS Negl Trop Dis 2021; 15:e0009838. [PMID: 34705823 PMCID: PMC8575316 DOI: 10.1371/journal.pntd.0009838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2020] [Revised: 11/08/2021] [Accepted: 09/24/2021] [Indexed: 11/19/2022] Open
Abstract
The sequence diversity of natural and laboratory populations of Brugia pahangi and Brugia malayi was assessed with Illumina resequencing followed by mapping in order to identify single nucleotide variants and insertions/deletions. In natural and laboratory Brugia populations, there is a lack of sequence diversity on chromosome X relative to the autosomes (πX/πA = 0.2), which is lower than the expected (πX/πA = 0.75). A reduction in diversity is also observed in other filarial nematodes with neo-X chromosome fusions in the genera Onchocerca and Wuchereria, but not those without neo-X chromosome fusions in the genera Loa and Dirofilaria. In the species with neo-X chromosome fusions, chromosome X is abnormally large, containing a third of the genetic material such that a sizable portion of the genome is lacking sequence diversity. Such profound differences in genetic diversity can be consequential, having been associated with drug resistance and adaptability, with the potential to affect filarial eradication.
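To make the πX/πA comparison concrete, the following minimal Python sketch computes nucleotide diversity (π) as the mean number of pairwise differences per site over a set of aligned sequences, then forms the X-to-autosome ratio. The toy sequences and the function names are illustrative assumptions, not data or code from the study. The 0.75 constant reflects the standard neutral expectation that, with an equal sex ratio, a population carries three X copies for every four autosomal copies.

```python
from itertools import combinations

# Neutral expectation with an equal sex ratio: 3 X copies per 4 autosomal copies.
EXPECTED_NEUTRAL_RATIO = 3 / 4


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def diversity_ratio(x_seqs, autosome_seqs):
    """Ratio of X-linked to autosomal nucleotide diversity (pi_X / pi_A)."""
    return nucleotide_diversity(x_seqs) / nucleotide_diversity(autosome_seqs)


# Hypothetical toy alignments (not real Brugia data).
autosome = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGTACGTCC"]
chrom_x = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]

ratio = diversity_ratio(chrom_x, autosome)
print(f"pi_X/pi_A = {ratio:.2f} (neutral expectation {EXPECTED_NEUTRAL_RATIO})")
```

In practice π would be estimated from called variants across many samples and callable sites; the pairwise definition is used here only because it is the simplest form to check by hand.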
3
Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes. G3 (BETHESDA, MD.) 2021; 11:jkab083. [PMID: 33768248 PMCID: PMC8495745 DOI: 10.1093/g3journal/jkab083] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2021] [Accepted: 03/07/2021] [Indexed: 12/14/2022]
Abstract
The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.
4
Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
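The contiguity statistic quoted above ("minimum contig length needed to cover 50% of the genome") is commonly called N50. As a minimal sketch, and assuming that "the genome" is taken to be the total assembled length (the study may instead have used a fixed genome size), it can be computed from a list of contig lengths as follows. The function name and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the contig length at which the longest contigs reach half the total.

    Contigs are visited from longest to shortest; the length of the contig
    that brings the running sum to at least 50% of the total is the N50.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Hypothetical contig lengths in base pairs.
example = [10, 20, 30, 40]
print(n50(example))
```

The abstract reports an average of this quantity across the 64 haplotype assemblies, so a per-assembly value like the one computed here would be averaged over all haplotypes.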
5
Soluble Sema4D in Plasma of Head and Neck Squamous Cell Carcinoma Patients Is Associated With Underlying Non-Inflamed Tumor Profile. Front Immunol 2021; 12:596646. [PMID: 33776991 PMCID: PMC7991916 DOI: 10.3389/fimmu.2021.596646] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/19/2020] [Accepted: 01/20/2021] [Indexed: 11/19/2022] Open
Abstract
Semaphorin 4D (Sema4D) is a glycoprotein that is expressed by several tumors and immune cells. It can function as a membrane bound protein or as a cleaved soluble protein (sSema4D). We sought to investigate the translational potential of plasma sSema4D as an immune marker in plasma of patients with head and neck squamous cell carcinoma (HNSCC). Paired peripheral blood and tumor tissue samples of 104 patients with HNSCC were collected at the same time point to allow for real time analysis. Scoring of the histological inflammatory subtype (HIS) was carried out using Sema4D immunohistochemistry on the tumor tissue. sSema4D was detected in plasma using direct ELISA assay. Defining elevated sSema4D as values above the 95th percentile in healthy controls, our data showed that sSema4D levels in plasma were elevated in 25.0% (95% CI, 16.7–34.9%) of the patients with HNSCC and showed significant association with HIS immune excluded (HIS-IE) (p = 0.007), Sema4D+ve tumor cells (TCs) (p = 0.018) and PD-L1+ve immune cells (ICs) (p = 0.038). A multi-variable logistic regression analysis showed that HIS was significantly (P = 0.004) associated with elevated sSema4D, an association not explained by available patient-level factors. Using the IO-360 nanoString platform, differential gene expression (DGE) analysis of 10 HNSCC tumor tissues showed that patients with high sSema4D in plasma (HsS4D) clustered as IFN-γ negative tumor immune signature and were mostly HIS-IE. The IC type in the HsS4D paired tumor tissue was predominantly myeloid, while the lymphoid compartment was higher in the low sSema4D (LsS4D). The Wnt signaling pathway was upregulated in the HsS4D group. Further analysis using the IO-360, 770 gene set, showed significant non-inflamed profile of the HsS4D tumors compared to the LsS4D. In conclusion, our data reveals an association between sSema4D and the histological inflammatory subtype.
6
Comparative Analysis of Genome of Ehrlichia sp. HF, a Model Bacterium to Study Fatal Human Ehrlichiosis. BMC Genomics 2021; 22:11. [PMID: 33407096 PMCID: PMC7789307 DOI: 10.1186/s12864-020-07309-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/09/2020] [Accepted: 12/07/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The genus Ehrlichia consists of tick-borne obligatory intracellular bacteria that can cause deadly diseases of medical and agricultural importance. Ehrlichia sp. HF, isolated from Ixodes ovatus ticks in Japan [also referred to as I. ovatus Ehrlichia (IOE) agent], causes acute fatal infection in laboratory mice that resembles acute fatal human monocytic ehrlichiosis caused by Ehrlichia chaffeensis. As there is no small laboratory animal model to study fatal human ehrlichiosis, Ehrlichia sp. HF provides a needed disease model. However, the inability to culture Ehrlichia sp. HF and the lack of genomic information have been a barrier to advance this animal model. In addition, Ehrlichia sp. HF has several designations in the literature as it lacks a taxonomically recognized name. RESULTS We stably cultured Ehrlichia sp. HF in canine histiocytic leukemia DH82 cells from the HF strain-infected mice, and determined its complete genome sequence. Ehrlichia sp. HF has a single double-stranded circular chromosome of 1,148,904 bp, which encodes 866 proteins with a similar metabolic potential as E. chaffeensis. Ehrlichia sp. HF encodes homologs of all virulence factors identified in E. chaffeensis, including 23 paralogs of P28/OMP-1 family outer membrane proteins, type IV secretion system apparatus and effector proteins, two-component systems, ankyrin-repeat proteins, and tandem repeat proteins. Ehrlichia sp. HF is a novel species in the genus Ehrlichia, as demonstrated through whole genome comparisons with six representative Ehrlichia species, subspecies, and strains, using average nucleotide identity, digital DNA-DNA hybridization, and core genome alignment sequence identity. CONCLUSIONS The genome of Ehrlichia sp. HF encodes all known virulence factors found in E. chaffeensis, substantiating it as a model Ehrlichia species to study fatal human ehrlichiosis. Comparisons between Ehrlichia sp. HF and E. chaffeensis will enable identification of in vivo virulence factors that are related to host specificity, disease severity, and host inflammatory responses. We propose to name Ehrlichia sp. HF as Ehrlichia japonica sp. nov. (type strain HF), to denote the geographic region where this bacterium was initially isolated.
7
Complete Genome Sequence of wBp, the Wolbachia Endosymbiont of Brugia pahangi FR3. Microbiol Resour Announc 2020; 9:e00480-20. [PMID: 32616636 PMCID: PMC7330238 DOI: 10.1128/mra.00480-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2020] [Accepted: 06/02/2020] [Indexed: 12/31/2022] Open
Abstract
Lymphatic filariasis is a devastating disease caused by filarial nematode roundworms, which contain obligate Wolbachia endosymbionts. Here, we assembled the genome of wBp, the Wolbachia endosymbiont of the filarial nematode Brugia pahangi, from Illumina, Pacific Biosciences, and Oxford Nanopore data. The complete, circular genome is 1,072,967 bp.
8
Comparative Metagenome-Assembled Genome Analysis of "Candidatus Lachnocurva vaginae", Formerly Known as Bacterial Vaginosis-Associated Bacterium-1 (BVAB1). Front Cell Infect Microbiol 2020; 10:117. [PMID: 32296647 PMCID: PMC7136613 DOI: 10.3389/fcimb.2020.00117] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 09/17/2019] [Accepted: 03/02/2020] [Indexed: 01/07/2023] Open
Abstract
Bacterial vaginosis-associated bacterium 1 (BVAB1) is an as-yet uncultured bacterial species found in the human vagina that belongs to the family Lachnospiraceae within the order Clostridiales. As its name suggests, this bacterium is often associated with bacterial vaginosis (BV), a common vaginal disorder that has been shown to increase a woman's risk for HIV, Chlamydia trachomatis, and Neisseria gonorrhoeae infections as well as preterm birth. BVAB1 has been further associated with the persistence of BV following metronidazole treatment, increased vaginal inflammation, and adverse obstetrics outcomes. There is no available complete genome sequence of BVAB1, which has made it difficult to mechanistically understand its role in disease. We present here a circularized metagenome-assembled genome (cMAG) of BVAB1 as well as a comparative analysis including an additional six metagenome-assembled genomes (MAGs) of this species. These sequences were derived from cervicovaginal samples of seven separate women. The cMAG was obtained from a metagenome sequenced with long-read technology on a PacBio Sequel II instrument while the others were derived from metagenomes sequenced on the Illumina HiSeq platform. The cMAG is 1.649 Mb in size and encodes 1,578 genes. We propose to rename BVAB1 to "Candidatus Lachnocurva vaginae" based on phylogenetic analyses, and provide genomic and metabolomic evidence that this candidate species may metabolize D-lactate, produce trimethylamine (one of the chemicals responsible for BV-associated odor), and be motile. The cMAG and the six MAGs are valuable resources that will further contribute to our understanding of the heterogeneous etiology of bacterial vaginosis.
9
Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential. Genome Med 2020; 12:6. [PMID: 31915075 PMCID: PMC6950926 DOI: 10.1186/s13073-019-0708-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/26/2019] [Accepted: 12/19/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Plasmodium falciparum (Pf) whole-organism sporozoite vaccines have been shown to provide significant protection against controlled human malaria infection (CHMI) in clinical trials. Initial CHMI studies showed significantly higher durable protection against homologous than heterologous strains, suggesting the presence of strain-specific vaccine-induced protection. However, interpretation of these results and understanding of their relevance to vaccine efficacy have been hampered by the lack of knowledge on genetic differences between vaccine and CHMI strains, and how these strains are related to parasites in malaria endemic regions. METHODS Whole genome sequencing using long-read (Pacific Biosciences) and short-read (Illumina) sequencing platforms was conducted to generate de novo genome assemblies for the vaccine strain, NF54, and for strains used in heterologous CHMI (7G8 from Brazil, NF166.C8 from Guinea, and NF135.C10 from Cambodia). The assemblies were used to characterize sequences in each strain relative to the reference 3D7 (a clone of NF54) genome. Strains were compared to each other and to a collection of clinical isolates (sequenced as part of this study or from public repositories) from South America, sub-Saharan Africa, and Southeast Asia. RESULTS While few variants were detected between 3D7 and NF54, we identified tens of thousands of variants between NF54 and the three heterologous strains. These variants include SNPs, indels, and small structural variants that fall in regulatory and immunologically important regions, including transcription factors (such as PfAP2-L and PfAP2-G) and pre-erythrocytic antigens that may be key for sporozoite vaccine-induced protection. Additionally, these variants directly contributed to diversity in immunologically important regions of the genomes as detected through in silico CD8+ T cell epitope predictions. 
Of all heterologous strains, NF135.C10 had the highest number of unique predicted epitope sequences when compared to NF54. Comparison to global clinical isolates revealed that these four strains are representative of their geographic origin despite long-term culture adaptation; of note, NF135.C10 is from an admixed population, and not part of recently formed subpopulations resistant to artemisinin-based therapies present in the Greater Mekong Sub-region. CONCLUSIONS These results will assist in the interpretation of vaccine efficacy of whole-organism vaccines against homologous and heterologous CHMI.
10
Intratumor genetic heterogeneity in squamous cell carcinoma of the oral cavity. Head Neck 2019; 41:2514-2524. [PMID: 30869813 DOI: 10.1002/hed.25719] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/14/2018] [Revised: 01/03/2019] [Accepted: 02/07/2019] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND We sought to evaluate intratumor heterogeneity in squamous cell carcinoma of the oral cavity (OCC) and specifically determine the effect of physical separation and histologic differentiation within the same tumor. METHODS We performed whole exome sequencing on five biopsy sites-two from well-differentiated, two from poorly differentiated regions, and one from normal parenchyma-from five primary OCC specimens. RESULTS We found high levels of intratumor heterogeneity and, in four primary tumors, identified only 0 to 2 identical mutations in all subsites. We found that the heterogeneity inversely correlated with physical separation and that pairs of well-differentiated samples were more similar to each other than analogous poorly differentiated specimens. Only TP53 mutations, but not other purported "driver mutations" in head and neck squamous cell carcinoma, were found in multiple biopsy sites. CONCLUSION These data highlight the challenges to characterization of the mutational landscape of OCC with single site biopsy and have implications for personalized medicine.
11
Extra-Chromosomal DNA Sequencing Reveals Episomal Prophages Capable of Impacting Virulence Factor Expression in Staphylococcus aureus. Front Microbiol 2018; 9:1406. [PMID: 30013526 PMCID: PMC6036120 DOI: 10.3389/fmicb.2018.01406] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/18/2018] [Accepted: 06/07/2018] [Indexed: 01/20/2023] Open
Abstract
Staphylococcus aureus is a major human pathogen with well-characterized bacteriophage contributions to its virulence potential. Recently, we identified plasmidial and episomal prophages in S. aureus strains using an extra-chromosomal DNA (exDNA) isolation and sequencing approach, uncovering the plasmidial phage ϕBU01, which was found to encode important virulence determinants. Here, we expanded our extra-chromosomal sequencing of S. aureus, selecting 15 diverse clinical isolates with known chromosomal sequences for exDNA isolation and next-generation sequencing. We uncovered the presence of additional episomal prophages in 5 of 15 samples, but did not identify any plasmidial prophages. exDNA isolation was found to enrich for circular prophage elements, and qPCR characterization of the strains revealed that such prophage enrichment is detectable only in exDNA samples and would likely be missed in whole-genome DNA preparations (e.g., detection of episomal prophages did not correlate with higher prophage excision rates nor higher excised prophage copy numbers in qPCR experiments using whole-genome DNA). In S. aureus MSSA476, we found that enrichment and excision of the prophage ϕSa4ms into the cytoplasm was temporal and that episomal prophage localization did not appear to be a precursor to lytic cycle replication, suggesting ϕSa4ms excision into the cytoplasm may be part of a novel lysogenic switch. For example, we show that ϕSa4ms excision alters the promoter and transcription of htrA2, encoding a stress-response serine protease, and that alternative promotion of htrA2 confers increased heat-stress survival in S. aureus COL. Overall, exDNA isolation and focused sequencing may offer a more complete genomic picture for bacterial pathogens, offering insights into important chromosomal dynamics likely missed with whole-genome DNA-based approaches.
12
Streptococcus pneumoniae in the heart subvert the host response through biofilm-mediated resident macrophage killing. PLoS Pathog 2017; 13:e1006582. [PMID: 28841717 PMCID: PMC5589263 DOI: 10.1371/journal.ppat.1006582] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/11/2017] [Revised: 09/07/2017] [Accepted: 08/15/2017] [Indexed: 11/18/2022] Open
Abstract
For over 130 years, invasive pneumococcal disease has been associated with the presence of extracellular planktonic pneumococci, i.e. diplococci or short chains in affected tissues. Herein, we show that Streptococcus pneumoniae that invade the myocardium instead replicate within cellular vesicles and transition into non-purulent biofilms. Pneumococci within mature cardiac microlesions exhibited salient biofilm features including intrinsic resistance to antibiotic killing and the presence of an extracellular matrix. Dual RNA-seq and subsequent principal component analyses of heart- and blood-isolated pneumococci confirmed the biofilm phenotype in vivo and revealed stark anatomical site-specific differences in virulence gene expression; the latter having major implications on future vaccine antigen selection. Our RNA-seq approach also identified three genomic islands as exclusively expressed in vivo. Deletion of one such island, Region of Diversity 12, resulted in a biofilm-deficient and highly inflammogenic phenotype within the heart; indicating a possible link between the biofilm phenotype and a dampened host-response. We subsequently determined that biofilm pneumococci released greater amounts of the toxin pneumolysin than did planktonic or RD12 deficient pneumococci. This allowed heart-invaded wildtype pneumococci to kill resident cardiac macrophages and subsequently subvert cytokine/chemokine production and neutrophil infiltration into the myocardium. This is the first report for pneumococcal biofilm formation in an invasive disease setting. We show that biofilm pneumococci actively suppress the host response through pneumolysin-mediated immune cell killing. As such, our findings contradict the emerging notion that biofilm pneumococci are passively immunoquiescent.
13
Aligner optimization increases accuracy and decreases compute times in multi-species sequence data. Microb Genom 2017; 3:e000122. [PMID: 29114401 PMCID: PMC5643015 DOI: 10.1099/mgen.0.000122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/20/2017] [Accepted: 06/04/2017] [Indexed: 01/01/2023] Open
Abstract
As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium-human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
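The recommendation above (one combined reference plus a 23 nt seed length) can be sketched as a small Python helper that assembles the corresponding shell commands. This is only an illustration under stated assumptions: the file names are hypothetical, the helper itself is not from the paper, and the commands are built as strings rather than executed. The `-k` option is bwa mem's minimum seed length setting. Note also that the separately mapped percentages sum to 107.7 %, so at least 7.7 % of reads must have aligned to both references.

```python
import shlex


def combined_reference_commands(reference_fastas, combined_fasta,
                                reads_1, reads_2, seed_length=23):
    """Build shell commands for aligning paired reads to one combined reference.

    Returns three command strings: concatenate the FASTA files, index the
    combined file with bwa, and align with bwa mem using the given minimum
    seed length (-k). Nothing is executed here.
    """
    cat_cmd = (
        "cat "
        + " ".join(shlex.quote(path) for path in reference_fastas)
        + " > "
        + shlex.quote(combined_fasta)
    )
    index_cmd = shlex.join(["bwa", "index", combined_fasta])
    mem_cmd = shlex.join(
        ["bwa", "mem", "-k", str(seed_length), combined_fasta, reads_1, reads_2]
    )
    return [cat_cmd, index_cmd, mem_cmd]


# Hypothetical file names, for illustration only.
commands = combined_reference_commands(
    ["plasmodium.fa", "human.fa"], "combined.fa", "reads_1.fq", "reads_2.fq"
)
for command in commands:
    print(command)
```

Keeping the seed length as a parameter makes it easy to compare the 18 nt and 23 nt settings discussed in the abstract on the same input.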
14
Analysis of complete genome sequence and major surface antigens of Neorickettsia helminthoeca, causative agent of salmon poisoning disease. Microb Biotechnol 2017; 10:933-957. [PMID: 28585301 PMCID: PMC5481527 DOI: 10.1111/1751-7915.12731] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/16/2016] [Revised: 03/09/2017] [Accepted: 04/25/2017] [Indexed: 12/31/2022] Open
Abstract
Neorickettsia helminthoeca, the type species of the genus Neorickettsia, is an endosymbiont of digenetic trematodes of veterinary importance. Upon ingestion of salmonid fish parasitized with infected trematodes, canids develop salmon poisoning disease (SPD), an acute febrile illness that is particularly severe and often fatal in dogs without adequate treatment. We determined and analysed the complete genome sequence of N. helminthoeca: a single small circular chromosome of 884 232 bp encoding 774 potential proteins. N. helminthoeca is unable to synthesize lipopolysaccharides and most amino acids, but is capable of synthesizing vitamins, cofactors, nucleotides and bacterioferritin. N. helminthoeca is, however, distinct from the majority of the family Anaplasmataceae to which it belongs, as it encodes nearly all enzymes required for peptidoglycan biosynthesis, suggesting its structural hardiness and inflammatory potential. Using sera from dogs that were experimentally infected by feeding with parasitized fish or naturally infected in southern California, Western blot analysis revealed that among five predicted N. helminthoeca outer membrane proteins, P51 and strain‐variable surface antigen were uniformly recognized. Our findings will aid understanding of the pathogenesis and prevalence of N. helminthoeca infection among trematodes, canids and potentially other animals in nature, and the development of effective SPD diagnostic and preventive measures. Recent progress in large‐scale genome sequencing has been uncovering a broad distribution of Neorickettsia spp.; comparative genomics will facilitate understanding of the biology and natural history of these elusive environmental bacteria.
15
Efficient Enrichment of Bacterial mRNA from Host-Bacteria Total RNA Samples. Sci Rep 2016; 6:34850. [PMID: 27713560 PMCID: PMC5054355 DOI: 10.1038/srep34850] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 08/21/2015] [Accepted: 09/21/2016] [Indexed: 01/19/2023] Open
Abstract
Despite numerous advances in genomics and bioinformatics, technological hurdles remain to examine host-microbe transcriptomics. Sometimes the transcriptome of either or both can be ascertained merely by generating more sequencing reads. However, many cases exist where bacterial mRNA needs to be enriched further to enable cost-effective sequencing of the pathogen or endosymbiont. While a suitable method is commercially available for mammalian samples of this type, development of such methods has languished for invertebrate samples. Furthermore, a common method across multiple taxa would facilitate comparisons between bacteria in invertebrate vectors and their vertebrate hosts. Here, a method is described to concurrently remove polyadenylated transcripts, prokaryotic rRNA, and eukaryotic rRNA, including those with low amounts of starting material (e.g. 100 ng). In a Wolbachia-Drosophila system, this bacterial mRNA enrichment yielded a 3-fold increase in Wolbachia mRNA abundance and a concomitant 3.3-fold increase in the percentage of transcripts detected. More specifically, 70% of the genome could be recovered by transcriptome sequencing compared to 21% in the total RNA. Sequencing of similar bacterial mRNA-enriched samples generated from Ehrlichia-infected canine cells covers 93% of the Ehrlichia genome, suggesting ubiquitous transcription across the entire Ehrlichia chaffeensis genome. This technique can potentially be used to enrich bacterial mRNA in many studies of host-microbe interactions.
16
Abstract
BACKGROUND Next-generation sequencing of transposon-genome junctions from a saturated bacterial mutant library (Tn-seq) is a powerful tool that permits genome-wide determination of the contribution of genes to fitness of the organism under a wide range of experimental conditions. We report development, testing, and results from a Tn-seq system for use in Streptococcus agalactiae (group B Streptococcus; GBS), an important cause of neonatal sepsis. METHODS Our method uses a Himar1 mini-transposon that inserts at genomic TA dinucleotide sites, delivered to GBS on a temperature-sensitive plasmid that is subsequently cured from the bacterial population. In order to establish the GBS essential genome, we performed Tn-seq on DNA collected from three independent mutant libraries-with at least 135,000 mutants per library-at serial 24 h time points after outgrowth in rich media. RESULTS After statistical analysis of transposon insertion density and distribution, we identified 13.5 % of genes as essential and 1.2 % as critical, with high levels of reproducibility. Essential and critical genes are enriched for fundamental cellular housekeeping functions, such as acyl-tRNA biosynthesis, nucleotide metabolism, and glycolysis. We further validated our system by comparing fitness assignments of homologous genes in GBS and a close bacterial relative, Streptococcus pyogenes, which demonstrated 93 % concordance. Finally, we used our fitness assignments to identify signal transduction pathway components predicted to be essential or critical in GBS. CONCLUSIONS We believe that our baseline fitness assignments will be a valuable tool for GBS researchers and that our system has the potential to reveal key pathogenesis gene networks and potential therapeutic/preventative targets.
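Because the Himar1 mini-transposon inserts only at TA dinucleotides, the raw signal behind a Tn-seq fitness call is the fraction of a gene's TA sites that carry insertions. The sketch below illustrates that idea only; the function names, the toy sequence, and the use of a simple fraction are assumptions, and the statistical analysis used in the study is not described in the abstract and is not reproduced here.

```python
def ta_sites(sequence):
    """Return 0-based start positions of every TA dinucleotide in a sequence.

    TA is its own reverse complement, so scanning one strand finds all sites.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def insertion_density(sequence, insertion_positions):
    """Fraction of TA sites in the sequence that carry at least one insertion.

    Returns None when the sequence has no TA sites, since density is undefined.
    """
    sites = ta_sites(sequence)
    if not sites:
        return None
    hit_sites = set(insertion_positions) & set(sites)
    return len(hit_sites) / len(sites)


# Hypothetical toy gene and observed insertion positions.
gene = "ATATTAGC"
print(ta_sites(gene))
print(insertion_density(gene, [1]))
```

Genes with many TA sites but few or no insertions across independent libraries are the candidates for essential or critical status; short genes with few TA sites give little evidence either way, which is one reason a statistical model is needed on top of the raw density.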
|
17
|
Drosophila anti-nematode and antibacterial immune regulators revealed by RNA-Seq. BMC Genomics 2015; 16:519. [PMID: 26162375 PMCID: PMC4499211 DOI: 10.1186/s12864-015-1690-2] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 10/16/2014] [Accepted: 06/05/2015] [Indexed: 12/27/2022] Open
Abstract
Background Drosophila melanogaster activates a variety of immune responses against microbial infections. However, information on the Drosophila immune response to entomopathogenic nematode infections is currently limited. The nematode Heterorhabditis bacteriophora is an insect parasite that forms a mutualistic relationship with the gram-negative bacteria Photorhabdus luminescens. Following infection, the nematodes release the bacteria that quickly multiply within the insect and produce several toxins that eventually kill the host. Although we currently know that the insect immune system interacts with Photorhabdus, information on interaction with the nematode vector is scarce. Results Here we have used next generation RNA-sequencing to analyze the transcriptional profile of wild-type adult flies infected by axenic Heterorhabditis nematodes (lacking Photorhabdus bacteria), symbiotic Heterorhabditis nematodes (carrying Photorhabdus bacteria), and Photorhabdus bacteria alone. We have obtained approximately 54 million reads from the different infection treatments. Bioinformatic analysis shows that infection with Photorhabdus alters the transcription of a large number of Drosophila genes involved in translational repression as well as in response to stress. However, Heterorhabditis infection alters the transcription of several genes that participate in lipid homeostasis and metabolism, stress responses, DNA/protein synthesis and neuronal functions. We have also identified genes in the fly with potential roles in nematode recognition, anti-nematode activity and nociception. Conclusions These findings provide fundamental information on the molecular events that take place in Drosophila upon infection with the two pathogens, either separately or together. Such large-scale transcriptomic analyses set the stage for future functional studies aimed at identifying the exact role of key factors in the Drosophila immune response against nematode-bacteria complexes.
|
18
|
Transcriptional attenuation controls macrolide inducible efflux and resistance in Streptococcus pneumoniae and in other Gram-positive bacteria containing mef/mel(msr(D)) elements. PLoS One 2015; 10:e0116254. [PMID: 25695510 PMCID: PMC4335068 DOI: 10.1371/journal.pone.0116254] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 08/06/2014] [Accepted: 12/04/2014] [Indexed: 01/30/2023] Open
Abstract
Macrolide resistance, emerging in Streptococcus pneumoniae and other Gram-positive bacteria, is increasingly due to efflux pumps encoded by mef/mel(msr) operons found on discrete mobile genetic elements. The regulation of mef/mel(msr) in these elements is not well understood. We identified the mef(E)/mel transcriptional start, localized the mef(E)/mel promoter, and demonstrated attenuation of transcription as a mechanism of regulation of macrolide-inducible mef-mediated macrolide resistance in S. pneumoniae. The mef(E)/mel transcriptional start site was a guanine 327 bp upstream of mef(E). Consensus pneumococcal promoter -10 (5′-TATACT-3′) and -35 (5′-TTGAAC-3′) boxes separated by 17 bp were identified 7 bp upstream of the start site. Analysis of the predicted secondary structure of the 327-bp 5′ region identified four pairs of inverted repeats R1-R8 predicted to fold into stem-loops, a small leader peptide [MTASMRLR, (Mef(E)L)] required for macrolide induction and a Rho-independent transcription terminator. RNA-seq analyses provided confirmation of transcriptional attenuation. In addition, expression of mef(E)L was also influenced by mef(E)L-dependent mRNA stability. The regulatory region 5′ of mef(E) was highly conserved in other mef/mel(msr)-containing elements including Tn1207.1 and the 5612IQ complex in pneumococci and Tn1207.3 in Group A streptococci, indicating a regulatory mechanism common to a wide variety of Gram-positive bacteria containing mef/mel(msr) elements.
|
19
|
Extensive duplication of the Wolbachia DNA in chromosome four of Drosophila ananassae. BMC Genomics 2014; 15:1097. [PMID: 25496002 PMCID: PMC4299567 DOI: 10.1186/1471-2164-15-1097] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/08/2014] [Accepted: 12/03/2014] [Indexed: 12/03/2022] Open
Abstract
Background Lateral gene transfer (LGT) from bacterial Wolbachia endosymbionts has been detected in ~20% of arthropod and nematode genome sequencing projects. Many of these transfers are large and contain a substantial part of the Wolbachia genome. Results Here, we re-sequenced three D. ananassae genomes from Asia and the Pacific that contain large LGTs from Wolbachia. We find that multiple copies of the Wolbachia genome are transferred to the Drosophila nuclear genome in all three lines. In the D. ananassae line from Indonesia, the copies of Wolbachia DNA in the nuclear genome are nearly identical in size and sequence, yielding an even coverage of mapped reads over the Wolbachia genome. In contrast, the D. ananassae lines from Hawaii and India show an uneven coverage of mapped reads over the Wolbachia genome, suggesting that different parts of these LGTs are present in different copy numbers. In the Hawaii line, we find that this LGT is underrepresented in third instar larvae, indicative of being heterochromatic. Fluorescence in situ hybridization of mitotic chromosomes confirms that the LGT in the Hawaii line is heterochromatic and represents ~20% of the sequence on chromosome 4 (dot chromosome, Muller element F). Conclusions This collection of related lines contains large lateral gene transfers composed of multiple Wolbachia genomes that constitute >2% of the D. ananassae genome (~5 Mbp) and partially explain the abnormally large size of chromosome 4 in D. ananassae.
|
21
|
Cassava genome from a wild ancestor to cultivated varieties. Nat Commun 2014; 5:5110. [PMID: 25300236 PMCID: PMC4214410 DOI: 10.1038/ncomms6110] [Citation(s) in RCA: 154] [Impact Index Per Article: 15.4] [Received: 03/20/2014] [Accepted: 08/27/2014] [Indexed: 11/10/2022] Open
Abstract
Cassava is a major tropical food crop in the Euphorbiaceae family that has high carbohydrate production potential and adaptability to diverse environments. Here we present the draft genome sequences of a wild ancestor and a domesticated variety of cassava and comparative analyses with a partial inbred line. We identify 1,584 and 1,678 gene models specific to the wild and domesticated varieties, respectively, and discover high heterozygosity and millions of single-nucleotide variations. Our analyses reveal that genes involved in photosynthesis, starch accumulation and abiotic stresses have been positively selected, whereas those involved in cell wall biosynthesis and secondary metabolism, including cyanogenic glucoside formation, have been negatively selected in the cultivated varieties, reflecting the result of natural selection and domestication. Differences in microRNA genes and retrotransposon regulation could partly explain an increased carbon flux towards starch accumulation and reduced cyanogenic glucoside accumulation in domesticated cassava. These results may contribute to genetic improvement of cassava through better understanding of its biology.
|
22
|
Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis. BMC Genomics 2014; 15:788. [PMID: 25217238 PMCID: PMC4175631 DOI: 10.1186/1471-2164-15-788] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 04/10/2014] [Accepted: 09/02/2014] [Indexed: 12/31/2022] Open
Abstract
Background More than 20% of the world’s population is at risk for infection by filarial nematodes and >180 million people worldwide are already infected. Along with infection comes significant morbidity that has a socioeconomic impact. The eight filarial nematodes that infect humans are Wuchereria bancrofti, Brugia malayi, Brugia timori, Onchocerca volvulus, Loa loa, Mansonella perstans, Mansonella streptocerca, and Mansonella ozzardi, of which three have published draft genome sequences. Since all have humans as the definitive host, standard avenues of research that rely on culturing and genetics have often not been possible. Therefore, genome sequencing provides an important window into understanding the biology of these parasites. The need for large amounts of high quality genomic DNA from homozygous, inbred lines; the availability of only short sequence reads from next-generation sequencing platforms at a reasonable expense; and the lack of random large insert libraries has limited our ability to generate high quality genome sequences for these parasites. However, the Pacific Biosciences single molecule, real-time sequencing platform holds great promise in reducing input amounts and generating sufficiently long sequences that bypass the need for large insert paired libraries. Results Here, we report on efforts to generate a more complete genome assembly for L. loa using genetically heterogeneous DNA isolated from a single clinical sample and sequenced on the Pacific Biosciences platform. To obtain the best assembly, numerous assemblers and sequencing datasets were analyzed, combined, and compared. Quiver-informed trimming of an assembly of only Pacific Biosciences reads by HGAP2 was selected as the final assembly of 96.4 Mbp in 2,250 contigs. This results in ~9% more of the genome in ~85% fewer contigs from ~80% less starting material at a fraction of the cost of previous Roche 454-based sequencing efforts. 
Conclusions The result is the most complete filarial nematode assembly produced thus far and demonstrates the utility of single molecule sequencing on the Pacific Biosciences platform for genetically heterogeneous metazoan genomes.
|
23
|
Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug Halyomorpha halys. BMC Genomics 2014; 15:738. [PMID: 25168586 PMCID: PMC4174608 DOI: 10.1186/1471-2164-15-738] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 05/16/2014] [Accepted: 08/21/2014] [Indexed: 12/23/2022] Open
Abstract
Background Halyomorpha halys (Stål) (Insecta:Hemiptera;Pentatomidae), commonly known as the Brown Marmorated Stink Bug (BMSB), is an invasive pest of the mid-Atlantic region of the United States, causing economically important damage to a wide range of crops. Native to Asia, BMSB was first observed in Allentown, PA, USA, in 1996, and this pest is now well-established throughout the US mid-Atlantic region and beyond. In addition to the serious threat BMSB poses to agriculture, BMSB has become a nuisance to homeowners, invading home gardens and congregating in large numbers in human-made structures, including homes, to overwinter. Despite its significance as an agricultural pest with limited control options, only 100 bp of BMSB sequence data was available in public databases when this project began. Results Transcriptome sequencing was undertaken to provide a molecular resource to the research community to inform the development of pest control strategies and to provide molecular data for population genetics studies of BMSB. Using normalized, strand-specific libraries, we sequenced pools of all BMSB life stages on the Illumina HiSeq. Trinity was used to assemble 200,000 putative transcripts in >100,000 components. A novel bioinformatic method that analyzed the strand-specificity of the data reduced this to 53,071 putative transcripts from 18,573 components. By integrating multiple other data types, we narrowed this further to 13,211 representative transcripts. Conclusions Bacterial endosymbiont genes were identified in this dataset, some of which have a copy number consistent with being lateral gene transfers between endosymbiont genomes and Hemiptera, including ankyrin-repeat related proteins, lysozyme, and mannanase. Such genes and endosymbionts may provide novel targets for BMSB-specific biocontrol. 
This study demonstrates the utility of strand-specific sequencing in generating shotgun transcriptomes and shows that rapid sequencing of shotgun transcriptomes is possible without the need for extensive inbreeding to generate homozygous lines. Such sequencing can provide a rapid response to pest invasions similar to that already described for disease epidemiology.
|
24
|
Insights into the role of DNA methylation in diatoms by genome-wide profiling in Phaeodactylum tricornutum. Nat Commun 2013; 4:2091. [PMID: 23820484 DOI: 10.1038/ncomms3091] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 09/17/2012] [Accepted: 05/31/2013] [Indexed: 02/07/2023] Open
Abstract
DNA cytosine methylation is a widely conserved epigenetic mark in eukaryotes that appears to have critical roles in the regulation of genome structure and transcription. Genome-wide methylation maps have so far only been established from the supergroups Archaeplastida and Unikont. Here we report the first whole-genome methylome from a stramenopile, the marine model diatom Phaeodactylum tricornutum. Around 6% of the genome is intermittently methylated in a mosaic pattern. We find extensive methylation in transposable elements. We also detect methylation in over 320 genes. Extensive gene methylation correlates strongly with transcriptional silencing and differential expression under specific conditions. By contrast, we find that genes with partial methylation tend to be constitutively expressed. These patterns contrast with those found previously in other eukaryotes. By going beyond plants, animals and fungi, this stramenopile methylome adds significantly to our understanding of the evolution of DNA methylation in eukaryotes.
|
25
|
Extensively duplicated and transcriptionally active recent lateral gene transfer from a bacterial Wolbachia endosymbiont to its host filarial nematode Brugia malayi. BMC Genomics 2013; 14:639. [PMID: 24053607 PMCID: PMC3849323 DOI: 10.1186/1471-2164-14-639] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 01/11/2013] [Accepted: 09/17/2013] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Lymphatic filariasis is a neglected tropical disease afflicting more than 120 million people, while another 1.3 billion people are at risk of infection. The nematode worm Brugia malayi is one of the causative agents of the disease and exists in a mutualistic symbiosis with Wolbachia bacteria. Since extensive lateral gene transfer occurs frequently between Wolbachia and its hosts, we sought to measure the extent of such LGT in B. malayi by whole genome sequencing of Wolbachia-depleted worms. RESULTS A considerable fraction (at least 115.4-kbp, or 10.6%) of the 1.08-Mbp Wolbachia wBm genome has been transferred to its nematode host and retains high levels of similarity, including 227 wBm genes and gene fragments. Complete open reading frames were transferred for 32 of these genes, meaning they have the potential to produce functional proteins. Moreover, four transfers have evidence of life stage-specific regulation of transcription at levels similar to other nematode transcripts, strengthening the possibility that they are functional. CONCLUSIONS There is extensive and ongoing transfer of Wolbachia DNA to the worm genome and some transfers are transcribed in a stage-specific manner at biologically relevant levels.
|
26
|
Phenotypic, genomic, and transcriptional characterization of Streptococcus pneumoniae interacting with human pharyngeal cells. BMC Genomics 2013; 14:383. [PMID: 23758733 PMCID: PMC3708772 DOI: 10.1186/1471-2164-14-383] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/10/2012] [Accepted: 05/24/2013] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Streptococcus pneumoniae is a leading cause of childhood morbidity and mortality worldwide, despite the availability of effective pneumococcal vaccines. Understanding the molecular interactions between the bacterium and the host will contribute to the control and prevention of pneumococcal disease. RESULTS We used a combination of adherence assays, mutagenesis and functional genomics to identify novel factors involved in adherence. By contrasting these processes in two pneumococcal strains, TIGR4 and G54, we showed that adherence and invasion capacities vary markedly by strain. Electron microscopy showed more adherent bacteria in association with membranous pseudopodia in the TIGR4 strain. Operons for cell wall phosphorylcholine incorporation (lic), manganese transport (psa) and phosphate utilization (phn) were up-regulated in both strains on exposure to epithelial cells. Pneumolysin, pili, stress protection genes (adhC-czcD) and genes of the type II fatty acid synthesis pathway were highly expressed in the naturally more invasive strain, TIGR4. Deletion mutagenesis of five gene regions identified as regulated in this study revealed attenuation in adherence. Most strikingly, ∆SP_1922 which was predicted to contain a B-cell epitope and revealed significant attenuation in adherence, appeared to be expressed as a part of an operon that includes the gene encoding the cytoplasmic pore-forming toxin and vaccine candidate, pneumolysin. CONCLUSION This work identifies a list of novel potential pneumococcal adherence determinants.
|
27
|
Abstract
MOTIVATION A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT salzberg@jhu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
28
|
Exome sequencing reveals SCO2 mutations in a family presented with fatal infantile hyperthermia. J Hum Genet 2013; 58:226-8. [PMID: 23364397 DOI: 10.1038/jhg.2012.156] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
We applied whole-exome sequencing (WES) to identify the underlying genetic cause of disease in a family presenting with fatal infantile hyperthermia. Analysis of WES results revealed novel, deleterious compound missense mutations, Val160Ala and Pro233Thr, in the synthesis of cytochrome C oxidase 2 gene (SCO2) encoding a mitochondrial protein, Sco2, which is important for cytochrome C oxidase (COX) synthesis. Autosomal recessive mutations in SCO2 are known to be associated with COX deficiency recognized as fatal infantile cardio-encephalomyopathy (604272, OMIM). The Val160Ala and Pro233Thr mutations occurred in the conserved thioredoxin domain of Sco2 and are predicted to disrupt protein folding and interaction of Sco2 with other proteins. Our results show the applicability of WES in identifying disease-causing mutations and in establishing the molecular diagnosis of a severe, infantile-onset disorder with a challenging diagnosis.
|
29
|
Efficient subtraction of insect rRNA prior to transcriptome analysis of Wolbachia-Drosophila lateral gene transfer. BMC Res Notes 2012; 5:230. [PMID: 22583543 PMCID: PMC3424148 DOI: 10.1186/1756-0500-5-230] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 01/31/2012] [Accepted: 05/14/2012] [Indexed: 12/03/2022] Open
Abstract
Background Numerous methods exist for enriching bacterial or mammalian mRNA prior to transcriptome experiments. Yet there persists a need for methods to enrich for mRNA in non-mammalian animal systems. For example, insects contain many important and interesting obligate intracellular bacteria, including endosymbionts and vector-borne pathogens. Such obligate intracellular bacteria are difficult to study by traditional methods. Therefore, genomics has greatly increased our understanding of these bacteria. Efficient subtraction methods are needed for removing both bacterial and insect rRNA in these systems to enable transcriptome-based studies. Findings A method is described that efficiently removes >95% of insect rRNA from total RNA samples, as determined by microfluidics and transcriptome sequencing. This subtraction yielded a 6.2-fold increase in mRNA abundance. Such a host rRNA-depletion strategy, in combination with bacterial rRNA depletion, is necessary to analyze transcription of obligate intracellular bacteria. Here, transcripts were identified that arise from a lateral gene transfer of an entire Wolbachia bacterial genome into a Drosophila ananassae chromosome. In this case, an rRNA depletion strategy is preferred over polyA-based enrichment since transcripts arising from bacteria-to-animal lateral gene transfer may not be poly-adenylated. Conclusions This enrichment method yields a significant increase in mRNA abundance when poly-A selection is not suitable. It can be used in combination with bacterial rRNA subtraction to enable experiments that simultaneously measure bacterial and insect mRNA in vector and endosymbiont biology.
|
30
|
Abstract
The genetic architecture of ischemic stroke is complex and is likely to include rare or low frequency variants with high penetrance and large effect sizes. Such variants are likely to provide important insights into disease pathogenesis compared to common variants with small effect sizes. Because a significant portion of human functional variation may derive from the protein-coding portion of genes we undertook a pilot study to identify variation across the human exome (i.e., the coding exons across the entire human genome) in 10 ischemic stroke cases. Our efforts focused on evaluating the feasibility and identifying the difficulties in this type of research as it applies to ischemic stroke. The cases included 8 African-Americans and 2 Caucasians selected on the basis of similar stroke subtypes and by implementing a case selection algorithm that emphasized the genetic contribution of stroke risk. Following construction of paired-end sequencing libraries, all predicted human exons in each sample were captured and sequenced. Sequencing generated an average of 25.5 million read pairs (75 bp×2) and 3.8 Gbp per sample. After passing quality filters, screening the exomes against dbSNP demonstrated an average of 2839 novel SNPs among African-Americans and 1105 among Caucasians. In an aggregate analysis, 48 genes were identified to have at least one rare variant across all stroke cases. One gene, CSN3, identified by screening our prior GWAS results in conjunction with our exome results, was found to contain an interesting coding polymorphism as well as containing excess rare variation as compared with the other genes evaluated. In conclusion, while rare coding variants may predispose to the risk of ischemic stroke, this fact has yet to be definitively proven. Our study demonstrates the complexities of such research and highlights that while exome data can be obtained, the optimal analytical methods have yet to be determined.
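The per-sample yield quoted above can be checked with simple arithmetic: read pairs times two reads per pair times read length. The sketch below is only a back-of-the-envelope check; the function name `sequenced_bases` is hypothetical, and it assumes every read is the full 75 bp with no trimming.

```python
def sequenced_bases(read_pairs: float, read_length_bp: int) -> float:
    """Total bases from paired-end sequencing, assuming untrimmed reads.

    Hypothetical helper: each pair contributes two reads of read_length_bp.
    """
    return read_pairs * 2 * read_length_bp


# Figures taken from the abstract: 25.5 million read pairs, 75 bp x 2.
total_bp = sequenced_bases(25.5e6, 75)
print(f"{total_bp / 1e9:.3f} Gbp per sample")
```

The result is consistent with the reported average of 3.8 Gbp per sample.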
|
31
|
Abstract
Neurotrophin-dependent activation of the tyrosine kinase receptor trkB.FL modulates neuromuscular synapse maintenance and function; however, it is unclear what role the alternative splice variant, truncated trkB (trkB.T1), may have in the peripheral neuromuscular axis. We examined this question in trkB.T1 null mice and demonstrate that in vivo neuromuscular performance and nerve-evoked muscle tension are significantly increased. In vitro assays indicated that the gain-in-function in trkB.T1(-/-) animals resulted specifically from an increased muscle contractility, and increased electrically evoked calcium release. In the trkB.T1 null muscle, we identified an increase in Akt activation in resting muscle as well as a significant increase in trkB.FL and Akt activation in response to contractile activity. On the basis of these findings, we conclude that the trkB signaling pathway might represent a novel target for intervention across diseases characterized by deficits in neuromuscular function.
|
32
|
Full-genome sequence and analysis of a novel human rhinovirus strain within a divergent HRV-A clade. Arch Virol 2009; 155:83-7. [PMID: 19936613 PMCID: PMC2910715 DOI: 10.1007/s00705-009-0549-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/27/2009] [Accepted: 11/03/2009] [Indexed: 11/25/2022]
Abstract
Genome sequences of human rhinoviruses (HRV) have primarily been from stocks collected in the 1960s, with genomes and phylogeny of modern HRVs remaining undefined. Here, two modern isolates (hrv-A101 and hrv-A101-v1) collected approximately 8 years apart were sequenced in their entirety. Incorporation into our full-genome HRV alignment with subsequent phylogenetic network inference indicated that these represent a unique HRV-A, localized within a distinct divergent clade. They appear to have resulted from recombination of the hrv-65 and hrv-78 lineages. These results support our contention that there are unrecognized distinct HRV-A strains, and that recombination is evident in currently circulating strains.
|
33
|
Complete genome sequence of the aerobic CO-oxidizing thermophile Thermomicrobium roseum. PLoS One 2009; 4:e4207. [PMID: 19148287 PMCID: PMC2615216 DOI: 10.1371/journal.pone.0004207] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Received: 10/23/2008] [Accepted: 11/07/2008] [Indexed: 12/02/2022] Open
Abstract
In order to enrich the phylogenetic diversity represented in the available sequenced bacterial genomes and as part of an “Assembling the Tree of Life” project, we determined the genome sequence of Thermomicrobium roseum DSM 5159. T. roseum DSM 5159 is a red-pigmented, rod-shaped, Gram-negative extreme thermophile isolated from a hot spring that possesses both an atypical cell wall composition and an unusual cell membrane that is composed entirely of long-chain 1,2-diols. Its genome is composed of two circular DNA elements, one of 2,006,217 bp (referred to as the chromosome) and one of 919,596 bp (referred to as the megaplasmid). Strikingly, though few standard housekeeping genes are found on the megaplasmid, it does encode a complete system for chemotaxis including both chemosensory components and an entire flagellar apparatus. This is the first known example of a complete flagellar system being encoded on a plasmid and suggests a straightforward means for lateral transfer of flagellum-based motility. Phylogenomic analyses support the recent rRNA-based analyses that led to T. roseum being removed from the phylum Thermomicrobia and assigned to the phylum Chloroflexi. Because T. roseum is a deep-branching member of this phylum, analysis of its genome provides insights into the evolution of the Chloroflexi. In addition, even though this species is not photosynthetic, analysis of the genome provides some insight into the origins of photosynthesis in the Chloroflexi. Metabolic pathway reconstructions and experimental studies revealed new aspects of the biology of this species. For example, we present evidence that T. roseum oxidizes CO aerobically, making it the first thermophile known to do so. In addition, we propose that glycosylation of its carotenoids plays a crucial role in the adaptation of the cell membrane to this bacterium's thermophilic lifestyle. 
Analyses of published metagenomic sequences from two hot springs similar to the one from which this strain was isolated show that close relatives of T. roseum DSM 5159 are present but differ from the sequenced strain in some key respects.
|
34
|
Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps. Genome Biol 2008; 9:R183. [PMID: 19116010 PMCID: PMC2646287 DOI: 10.1186/gb-2008-9-12-r183] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Received: 07/14/2008] [Accepted: 12/30/2008] [Indexed: 02/04/2023] Open
Abstract
Comparative genome analysis of two endosymbiotic polydnaviruses from Glyptapanteles parasitic wasps reveals new insights into the evolutionary arms race between host and parasite. Background Polydnaviruses, double-stranded DNA viruses with segmented genomes, have evolved as obligate endosymbionts of parasitoid wasps. Virus particles are replication deficient and produced by female wasps from proviral sequences integrated into the wasp genome. These particles are co-injected with eggs into caterpillar hosts, where viral gene expression facilitates parasitoid survival and, thereby, survival of proviral DNA. Here we characterize and compare the encapsidated viral genome sequences of bracoviruses in the family Polydnaviridae associated with Glyptapanteles gypsy moth parasitoids, along with near complete proviral sequences from which both viral genomes are derived. Results The encapsidated Glyptapanteles indiensis and Glyptapanteles flavicoxis bracoviral genomes, each composed of 29 different size segments, total approximately 517 and 594 kbp, respectively. They are generated from a minimum of seven distinct loci in the wasp genome. Annotation of these sequences revealed numerous novel features for polydnaviruses, including insect-like sugar transporter genes and transposable elements. Evolutionary analyses suggest that positive selection is widespread among bracoviral genes. Conclusions The structure and organization of G. indiensis and G. flavicoxis bracovirus proviral segments as multiple loci containing one to many viral segments, flanked and separated by wasp gene-encoding DNA, is confirmed. Rapid evolution of bracovirus genes supports the hypothesis of bracovirus genes in an 'arms race' between bracovirus and caterpillar. Phylogenetic analyses of the bracoviral genes encoding sugar transporters provides the first robust evidence of a wasp origin for some polydnavirus genes. 
We hypothesize that transposable elements, such as those described here, could facilitate transfer of genes between proviral segments and host DNA.
|
35
|
Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure. BMC Genomics 2008; 9:562. [PMID: 19036158 PMCID: PMC2612030 DOI: 10.1186/1471-2164-9-562] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 08/08/2008] [Accepted: 11/26/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. RESULTS We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. CONCLUSION We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. 
Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes.
|
36
|
Abstract
Only five bacterial phyla with members capable of chlorophyll (Chl)-based phototrophy are presently known. Metagenomic data from the phototrophic microbial mats of alkaline siliceous hot springs in Yellowstone National Park revealed the existence of a distinctive bacteriochlorophyll (BChl)-synthesizing, phototrophic bacterium. A highly enriched culture of this bacterium grew photoheterotrophically, synthesized BChls a and c under oxic conditions, and had chlorosomes and type 1 reaction centers. "Candidatus Chloracidobacterium thermophilum" is a BChl-producing member of the poorly characterized phylum Acidobacteria.
|
37
|
Structure and evolution of a proviral locus of Glyptapanteles indiensis bracovirus. BMC Microbiol 2007; 7:61. [PMID: 17594494 PMCID: PMC1919376 DOI: 10.1186/1471-2180-7-61] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 02/05/2007] [Accepted: 06/26/2007] [Indexed: 11/18/2022] Open
Abstract
Background Bracoviruses (BVs), a group of double-stranded DNA viruses with segmented genomes, are mutualistic endosymbionts of parasitoid wasps. Virus particles are replication deficient and are produced only by female wasps from proviral sequences integrated into the wasp genome. Virus particles are injected along with eggs into caterpillar hosts, where viral gene expression facilitates parasitoid survival and therefore perpetuation of proviral DNA. Here we describe a 223 kbp region of Glyptapanteles indiensis genomic DNA which contains a part of the G. indiensis bracovirus (GiBV) proviral genome. Results Eighteen of ~24 GiBV viral segment sequences are encoded by 7 non-overlapping sets of BAC clones, revealing that some proviral segment sequences are separated by long stretches of intervening DNA. Two overlapping BACs, which contain a locus of 8 tandemly arrayed proviral segments flanked on either side by ~35 kbp of non-packaged DNA, were sequenced and annotated. Structural and compositional analyses of this cluster revealed it exhibits a G+C and nucleotide composition distinct from the flanking DNA. By analyzing sequence polymorphisms in the 8 GiBV viral segment sequences, we found evidence for widespread selection acting on both protein-coding and non-coding DNA. Comparative analysis of viral and proviral segment sequences revealed a sequence motif involved in the excision of proviral genome segments which is highly conserved in two other bracoviruses. Conclusion Contrary to current concepts of bracovirus proviral genome organization our results demonstrate that some but not all GiBV proviral segment sequences exist in a tandem array. Unexpectedly, non-coding DNA in the 8 proviral genome segments which typically occupies ~70% of BV viral genomes is under selection pressure suggesting it serves some function(s). 
We hypothesize that selection acting on GiBV proviral sequences maintains the genetic island-like nature of the cluster of proviral genome segments described herein. In contrast to large differences in the predicted gene composition of BV genomes, sequences that appear to mediate processes of viral segment formation, such as proviral segment excision and circularization, appear to be highly conserved, supporting the hypothesis of a single origin for BVs.
|
38
|
Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol 2006; 4:e188. [PMID: 16729848 PMCID: PMC1472245 DOI: 10.1371/journal.pbio.0040188] [Citation(s) in RCA: 296] [Impact Index Per Article: 17.4] [Received: 10/21/2005] [Accepted: 04/10/2006] [Indexed: 11/19/2022] Open
Abstract
Mutualistic intracellular symbiosis between bacteria and insects is a widespread phenomenon that has contributed to the global success of insects. The symbionts, by provisioning nutrients lacking from diets, allow various insects to occupy or dominate ecological niches that might otherwise be unavailable. One such insect is the glassy-winged sharpshooter (Homalodisca coagulata), which feeds on xylem fluid, a diet exceptionally poor in organic nutrients. Phylogenetic studies based on rRNA have shown two types of bacterial symbionts to be coevolving with sharpshooters: the gamma-proteobacterium Baumannia cicadellinicola and the Bacteroidetes species Sulcia muelleri. We report here the sequencing and analysis of the 686,192-base-pair genome of B. cicadellinicola and approximately 150 kilobase pairs of the small genome of S. muelleri, both isolated from H. coagulata. Our study, which to our knowledge is the first genomic analysis of an obligate symbiosis involving multiple partners, suggests striking complementarity in the biosynthetic capabilities of the two symbionts: B. cicadellinicola devotes a substantial portion of its genome to the biosynthesis of vitamins and cofactors required by animals and lacks most amino acid biosynthetic pathways, whereas S. muelleri apparently produces most or all of the essential amino acids needed by its host. This finding, along with other results of our genome analysis, suggests the existence of metabolic codependency among the two unrelated endosymbionts and their insect host. This dual symbiosis provides a model case for studying correlated genome evolution and genome reduction involving multiple organisms in an intimate, obligate mutualistic relationship. In addition, our analysis provides insight for the first time into the differences in symbionts between insects (e.g., aphids) that feed on phloem versus those like H. coagulata that feed on xylem. Finally, the genomes of these two symbionts provide potential targets for controlling plant pathogens such as Xylella fastidiosa, a major agroeconomic problem, for which H. coagulata and other sharpshooters serve as vectors of transmission.
Sequence data from two obligate bacterial endosymbionts of an insect, the glassy-winged sharpshooter, suggest there is metabolic co-dependency among them and their insect host.
|
39
|
Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol 2006; 4:e286. [PMID: 16933976 PMCID: PMC1557398 DOI: 10.1371/journal.pbio.0040286] [Citation(s) in RCA: 545] [Impact Index Per Article: 32.1] [Received: 01/04/2006] [Accepted: 06/23/2006] [Indexed: 01/05/2023] Open
Abstract
The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.
|
40
|
Complete plastid genome sequence of Daucus carota: implications for biotechnology and phylogeny of angiosperms. BMC Genomics 2006; 7:222. [PMID: 16945140 PMCID: PMC1579219 DOI: 10.1186/1471-2164-7-222] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 05/19/2006] [Accepted: 08/31/2006] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently, efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible taproots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. RESULTS The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analyses of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. CONCLUSION The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade.
Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
|
41
|
Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell 2006; 18:1348-59. [PMID: 16632643 PMCID: PMC1475499 DOI: 10.1105/tpc.106.041665] [Citation(s) in RCA: 280] [Impact Index Per Article: 15.6] [Received: 02/02/2006] [Revised: 03/21/2006] [Accepted: 03/28/2006] [Indexed: 05/08/2023]
Abstract
We sequenced 2.2 Mb representing triplicated genome segments of Brassica oleracea, which are each paralogous with one another and homologous with a segmentally duplicated region of the Arabidopsis thaliana genome. Sequence annotation identified 177 conserved collinear genes in the B. oleracea genome segments. Analysis of synonymous base substitution rates indicated that the triplicated Brassica genome segments diverged from a common ancestor soon after divergence of the Arabidopsis and Brassica lineages. This conclusion was corroborated by phylogenetic analysis of protein families. Using A. thaliana as an outgroup, 35% of the genes inferred to be present when genome triplication occurred in the Brassica lineage have been lost, most likely via a deletion mechanism, in an interspersed pattern. Genes encoding proteins involved in signal transduction or transcription were not found to be significantly more extensively retained than those encoding proteins classified with other functions, but putative proteins predicted in the A. thaliana genome were underrepresented in B. oleracea. We identified one example of gene loss from the Arabidopsis lineage. We found evidence for the frequent insertion of gene fragments of nuclear genomic origin and identified four apparently intact genes in noncollinear positions in the B. oleracea and A. thaliana genomes.
|
42
|
The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms. BMC Genomics 2006; 7:61. [PMID: 16553962 PMCID: PMC1513215 DOI: 10.1186/1471-2164-7-61] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Received: 11/21/2005] [Accepted: 03/23/2006] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Cotton (Gossypium hirsutum) is the most important fiber crop grown in 90 countries. In 2004-2005, US farmers planted 79% of the 5.7-million hectares of nuclear transgenic cotton. Unfortunately, genetically modified cotton has the potential to hybridize with other cultivated and wild relatives, resulting in geographical restrictions to cultivation. However, chloroplast genetic engineering offers the possibility of containment because of maternal inheritance of transgenes. The complete chloroplast genome of cotton provides essential information required for genetic engineering. In addition, the sequence data were used to assess phylogenetic relationships among the major clades of rosids using cotton and 25 other completely sequenced angiosperm chloroplast genomes. RESULTS The complete cotton chloroplast genome is 160,301 bp in length, with 112 unique genes and 19 duplicated genes within the IR, containing a total of 131 genes. There are four ribosomal RNAs, 30 distinct tRNA genes and 17 intron-containing genes. The gene order in cotton is identical to that of tobacco but lacks rpl22 and infA. There are 30 direct and 24 inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Most of the direct repeats are within intergenic spacer regions and introns, and a 72-bp direct repeat is within the psaA and psaB genes. Comparison of protein coding sequences with expressed sequence tags (ESTs) revealed nucleotide substitutions resulting in amino acid changes in ndhC, rpl23, rpl20, rps3 and clpP. Phylogenetic analyses of a data set including 61 protein-coding genes using both maximum likelihood and maximum parsimony were performed for 28 taxa, including cotton and five other angiosperm chloroplast genomes that were not included in any previous phylogenies. CONCLUSION The cotton chloroplast genome lacks rpl22 and infA and contains a number of dispersed direct and inverted repeats.
RNA editing resulted in amino acid changes with significant impact on their hydropathy. Phylogenetic analysis provides strong support for the position of cotton in the Malvales in the eurosids II clade sister to Arabidopsis in the Brassicales. Furthermore, there is strong support for the placement of the Myrtales sister to the eurosid I clade, although expanded taxon sampling is needed to further test this relationship.
|
43
|
Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 2005; 1:e65. [PMID: 16311624 PMCID: PMC1287953 DOI: 10.1371/journal.pgen.0010065] [Citation(s) in RCA: 198] [Impact Index Per Article: 10.4] [Received: 08/08/2005] [Accepted: 10/19/2005] [Indexed: 11/20/2022] Open
Abstract
We report here the sequencing and analysis of the genome of the thermophilic bacterium Carboxydothermus hydrogenoformans Z-2901. This species is a model for studies of hydrogenogens, which are diverse bacteria and archaea that grow anaerobically utilizing carbon monoxide (CO) as their sole carbon source and water as an electron acceptor, producing carbon dioxide and hydrogen as waste products. Organisms that make use of CO do so through carbon monoxide dehydrogenase complexes. Remarkably, analysis of the genome of C. hydrogenoformans reveals the presence of at least five highly differentiated anaerobic carbon monoxide dehydrogenase complexes, which may in part explain how this species is able to grow so much more rapidly on CO than many other species. Analysis of the genome also has provided many general insights into the metabolism of this organism which should make it easier to use it as a source of biologically produced hydrogen gas. One surprising finding is the presence of many genes previously found only in sporulating species in the Firmicutes Phylum. Although this species is also a Firmicutes, it was not known to sporulate previously. Here we show that it does sporulate and because it is missing many of the genes involved in sporulation in other species, this organism may serve as a “minimal” model for sporulation studies. In addition, using phylogenetic profile analysis, we have identified many uncharacterized gene families found in all known sporulating Firmicutes, but not in any non-sporulating bacteria, including a sigma factor not known to be involved in sporulation previously. Carboxydothermus hydrogenoformans, a bacterium isolated from a Russian hotspring, is studied for three major reasons: it grows at very high temperature, it lives almost entirely on a diet of carbon monoxide (CO), and it converts water to hydrogen gas as part of its metabolism. 
Understanding this organism's unique biology gets a boost from the decoding of its genome, reported in this issue of PLoS Genetics. For example, genome analysis reveals that it encodes five different forms of the protein machine carbon monoxide dehydrogenase (CODH). Most species have no CODH, and even species that utilize CO usually have only one or two. The five CODHs in C. hydrogenoformans likely allow it both to use CO for diverse cellular processes and to outcompete other organisms for CO when it is limiting. The genome sequence also led the researchers to experimentally document new aspects of this species' biology, including the ability to form spores. The researchers then used comparative genomic analysis to identify conserved genes found in all spore-forming species, including Bacillus anthracis, and not in any other species. Finally, the genome sequence and analysis reported here will aid those trying to develop this and other species into systems to biologically produce hydrogen gas from water.
|
44
|
The R1 resistance gene cluster contains three groups of independently evolving, type I R1 homologues and shows substantial structural variation among haplotypes of Solanum demissum. Plant J 2005; 44:37-51. [PMID: 16167894 DOI: 10.1111/j.1365-313x.2005.02506.x] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 05/04/2023]
Abstract
Cultivated and wild potatoes contain a major disease-resistance cluster on the short arm of chromosome V, including the R1 resistance (R) gene against potato late blight. To explore the functional and evolutionary significance of clustering in the generation of novel disease-resistance genes, we constructed three approximately 1 Mb physical maps in the R1 gene region, one for each of the three genomes (haplotypes) of allohexaploid Solanum demissum, the wild potato progenitor of the R1 locus. Totals of 691, 919 and 559 kb were sequenced for each haplotype, and three distinct resistance-gene families were identified, one homologous to the potato R1 gene and two others homologous to either the Prf or the Bs4 R-gene of tomato. The regions with R1 homologues are highly divergent among the three haplotypes, in contrast to the conserved flanking non-resistance gene regions. The R1 locus shows dramatic variation in overall length and R1 homologue number among the three haplotypes. Sequence comparisons of the R1 homologues show that they form three distinct clades in a distance tree. Frequent sequence exchanges were detected among R1 homologues within each clade, but not among those in different clades. These frequent sequence exchanges homogenized the intron sequences of homologues within each clade, but did not homogenize the coding sequences. Our results suggest that the R1 homologues represent three independent groups of fast-evolving type I resistance genes, characterized by chimeric structures resulting from frequent sequence exchanges among group members. Such genes were first identified among clustered RGC2 genes in lettuce, where they were distinguished from slow-evolving type II R-genes. Our findings at the R1 locus in S. demissum may indicate that a common or similar mechanism underlies the previously reported differentiation of type I and type II R-genes and the differentiation of type I R-genes into distinct groups, identified here.
|
45
|
Abstract
The genome of the flowering plant Arabidopsis thaliana has five chromosomes. Here we report the sequence of the largest, chromosome 1, in two contigs of around 14.2 and 14.6 megabases. The contigs extend from the telomeres to the centromeric borders, regions rich in transposons, retrotransposons and repetitive elements such as the 180-base-pair repeat. The chromosome represents 25% of the genome and contains about 6,850 open reading frames, 236 transfer RNAs (tRNAs) and 12 small nuclear RNAs. There are two clusters of tRNA genes at different places on the chromosome. One consists of 27 tRNA(Pro) genes and the other contains 27 tandem repeats of tRNA(Tyr)-tRNA(Tyr)-tRNA(Ser) genes. Chromosome 1 contains about 300 gene families with clustered duplications. There are also many repeat elements, representing 8% of the sequence.
|
46
|
Abstract
Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes.
|
47
|
Abstract
Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
|