126
Li R, Yang P, Li M, Fang W, Yue X, Nanaei HA, Gan S, Du D, Cai Y, Dai X, Yang Q, Cao C, Deng W, He S, Li W, Ma R, Liu M, Jiang Y. A Hu sheep genome with the first ovine Y chromosome reveal introgression history after sheep domestication. SCIENCE CHINA-LIFE SCIENCES 2020; 64:1116-1130. [PMID: 32997330 DOI: 10.1007/s11427-020-1807-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/04/2020] [Accepted: 08/25/2020] [Indexed: 01/21/2023]
Abstract
The Y chromosome plays key roles in male fertility and reflects the evolutionary history of paternal lineages. Here, we present a de novo genome assembly of the Hu sheep with the first draft assembly of the ovine Y chromosome (oMSY), using nanopore sequencing and Hi-C technologies. The oMSY that we generated spans 10.6 Mb, from which 775 Y-SNPs were identified by applying a large panel of whole-genome sequences from worldwide sheep and wild Iranian mouflons. Three major paternal lineages (HY1a, HY1b and HY2) were defined across domestic sheep, of which HY2 was newly detected. Surprisingly, HY2 forms a monophyletic clade with the Iranian mouflons and is highly divergent from both HY1a and HY1b. Demographic analysis of Y chromosomes, mitochondrial and nuclear genomes confirmed that HY2 and the maternal counterpart of lineage C represented a distinct wild mouflon population in Iran that diverged from the direct ancestor of domestic sheep, the wild mouflons in Southeastern Anatolia. Our results suggest that wild Iranian mouflons had introgressed into domestic sheep and thereby introduced this Iranian mouflon-specific lineage carrying HY2 to both East Asian and African sheep populations.
127
Rodriguez-Caro L, Fenner J, Benson C, Van Belleghem SM, Counterman BA. Genome Assembly of the Dogface Butterfly Zerene cesonia. Genome Biol Evol 2020; 12:3580-3585. [PMID: 31755926 PMCID: PMC6944212 DOI: 10.1093/gbe/evz254] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 11/14/2019] [Indexed: 02/04/2023] Open
Abstract
Comparisons of high-quality reference butterfly and moth genomes have been instrumental in advancing our understanding of how hybridization and natural selection drive genomic change during the origin of new species and novel traits. Here, we present a genome assembly of the Southern Dogface butterfly, Zerene cesonia (Pieridae), whose brilliant wing colorations have been implicated in developmental plasticity, hybridization, sexual selection, and speciation. We assembled 266,407,278 bp of the Z. cesonia genome, which accounts for 98.3% of the estimated 271 Mb genome size. The final haploid genome was assembled using a hybrid approach involving Chicago libraries with Hi-Rise assembly and a diploid Meraculous assembly. In the final assembly, nearly all autosomes and the Z chromosome were assembled into single scaffolds. The largest 29 scaffolds accounted for 91.4% of the genome assembly, with the remaining ∼8% distributed among another 247 scaffolds and an overall N50 of 9.2 Mb. Tissue-specific RNA-seq informed annotations identified 16,442 protein-coding genes, which included 93.2% of the arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO). About 9% of the Z. cesonia genome assembly was identified as repetitive elements, with a transposable element landscape rich in helitrons. Similar to other Lepidoptera genomes, Z. cesonia showed a high conservation of chromosomal synteny. The Z. cesonia assembly provides a high-quality reference for studies of chromosomal arrangements in the Pierid family, as well as for population, phylo, and functional genomic studies of adaptation and speciation.
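The N50 and assembled-fraction figures quoted in this abstract follow from simple arithmetic. The sketch below is illustrative only: the `n50` helper and the toy scaffold lengths are assumptions made for this example and are not taken from the Z. cesonia assembly; only the two base-pair totals come from the abstract.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical values, for illustration only).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_scaffolds))

# Assembled fraction, using the two totals reported in the abstract.
assembled_bp = 266_407_278
estimated_bp = 271_000_000
print(f"assembled fraction: {100 * assembled_bp / estimated_bp:.1f}%")
```

For the toy list the total is 300, and the two longest scaffolds (80 + 70) already reach half of it, so the N50 is 70.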
128
The Complete Genome Sequence of the Staphylococcus Bacteriophage Metroid. G3-GENES GENOMES GENETICS 2020; 10:2975-2979. [PMID: 32727926 PMCID: PMC7466978 DOI: 10.1534/g3.120.401365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Phages infecting bacteria of the genus Staphylococcus play an important role in their host’s ecology and evolution. On one hand, horizontal gene transfer from phage can encourage the rapid adaptation of pathogenic Staphylococcus enabling them to escape host immunity or access novel environments. On the other hand, lytic phages are promising agents for the treatment of bacterial infections, especially those resistant to antibiotics. As part of an ongoing effort to gain novel insights into bacteriophage diversity, we characterized the complete genome of the Staphylococcus bacteriophage Metroid, a cluster C phage with a genome size of 151kb, encompassing 254 predicted protein-coding genes as well as 4 tRNAs. A comparative genomic analysis highlights strong similarities – including a conservation of the lysis cassette – with other Staphylococcus cluster C bacteriophages, several of which were previously characterized for therapeutic applications.
129
The Genome Sequence of the Octocoral Paramuricea clavata - A Key Resource To Study the Impact of Climate Change in the Mediterranean. G3-GENES GENOMES GENETICS 2020; 10:2941-2952. [PMID: 32660973 PMCID: PMC7467007 DOI: 10.1534/g3.120.401371] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
The octocoral Paramuricea clavata is a habitat-forming anthozoan with a key ecological role in rocky benthic and biodiversity-rich communities in the Mediterranean and Eastern Atlantic. Shallow populations of P. clavata in the North-Western Mediterranean are severely affected by warming-induced mass mortality events (MMEs). These MMEs have differentially impacted individuals and populations of P. clavata (i.e., varied levels of tissue necrosis and mortality rates) over thousands of kilometers of coastal areas. The eco-evolutionary processes, including genetic factors, contributing to these differential responses remain to be characterized. Here, we sequenced a P. clavata individual with short- and long-read technologies, producing 169.98 Gb of Illumina paired-end and 3.55 Gb of Oxford Nanopore Technologies (ONT) reads. We obtained a de novo genome assembly accounting for 607 Mb in 64,145 scaffolds. The contig and scaffold N50s are 19.15 Kb and 23.92 Kb, respectively. Despite the low contiguity of the assembly, its gene completeness is relatively high, including 75.8% complete and 9.4% fragmented genes out of the 978 metazoan genes contained in the metazoa_odb9 database. A total of 62,652 protein-coding genes have been annotated. This assembly is one of the few octocoral genomes currently available. It is a valuable resource for characterizing the genetic bases of the differential responses to thermal stress and for the identification of thermo-resistant individuals and populations. Overall, having the genome of P. clavata will facilitate studies of various aspects of its evolutionary ecology and the elaboration of effective conservation plans, such as active restoration, to overcome the threats of global change.
130
Padovani de Souza K, Setubal JC, Ponce de Leon F de Carvalho AC, Oliveira G, Chateau A, Alves R. Machine learning meets genome assembly. Brief Bioinform 2020; 20:2116-2129. [PMID: 30137230 DOI: 10.1093/bib/bby072] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/18/2018] [Revised: 07/11/2018] [Accepted: 07/22/2018] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION: With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges that emerge from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial time hard) problem, for which no efficient exact algorithm with reasonable execution time is known. However, several tools that produce approximate solutions have been used with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been one of the approaches used in recent years in an attempt to find better solutions to the DNA fragment assembly problem, although still at a low scale. RESULTS: This paper presents a broad review of pioneering literature comprising artificial intelligence-based DNA assemblers, particularly the ones that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
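To make the idea of an approximate solution to the fragment assembly problem concrete, here is a minimal sketch of a classic greedy overlap-merge heuristic. It is not one of the machine learning methods covered by the review, nor any specific tool it cites. The function names, the `min_len` threshold, and the toy reads are all assumptions chosen for illustration, and the sketch ignores sequencing errors, reverse complements, and repeats.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len` long; otherwise 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    This is a heuristic: it runs in polynomial time but is not
    guaranteed to find the shortest common superstring.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Toy reads (hypothetical) that tile a 12-base sequence.
contigs = greedy_assemble(["ATGGCG", "GCGTAC", "TACGGA"])
print(contigs)
```

Greedy merging can be misled by repeated sequence, which is one reason practical assemblers rely on more elaborate graph-based heuristics.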
131
Sicilia A, Santoro DF, Testa G, Cosentino SL, Lo Piero AR. Transcriptional response of giant reed (Arundo donax L.) low ecotype to long-term salt stress by unigene-based RNAseq. PHYTOCHEMISTRY 2020; 177:112436. [PMID: 32563719 DOI: 10.1016/j.phytochem.2020.112436] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/07/2020] [Revised: 06/03/2020] [Accepted: 06/07/2020] [Indexed: 06/11/2023]
Abstract
The giant reed is a fast-growing herbaceous non-food crop considered an eligible alternative energy source to reduce the usage of fossil fuels. Tolerance of this plant to abiotic stress has been demonstrated across a range of stressful conditions, thus allowing cultivation in marginal or poorly cultivated land in order not to compromise food security and to overcome land use controversies. In this work, we de novo sequenced, assembled and analyzed the A. donax low G34 ecotype leaf transcriptome (RNAseq analysis) subjected to severe long-term salt stress (256.67 mM NaCl, corresponding to an electrical conductivity of 32 dS m⁻¹). In order to shed light upon the response to high salinity of this non-model plant, we analyzed clusters related to salt sensory and signaling transduction, transcription factors, hormone regulation, Reactive Oxygen Species (ROS) scavenging and osmolyte biosynthesis, all of them showing different regulation compared to untreated plants. The analysis of clusters related to ethylene biosynthesis and signaling indicated that gene transcription is modulated towards the minimization of ethylene negative effects upon plant growth. Certainly, photosynthesis is strongly affected, since genes involved in Rubisco biosynthesis and assembly are down-regulated. However, a shift towards C4 photosynthesis is likely to occur, as gene regulation is aimed at activating the primary CO2 fixation to PEP (phosphoenolpyruvate). The analysis of the "carbon metabolism" category revealed that the G34 ecotype under salt stress induces the expression of glycolysis and Krebs cycle related genes, this being consistent with the hypothesis that some sort of salt avoidance might have occurred in the A. donax G34 low ecotype. By comparing our results with findings obtained with another giant reed ecotype, we identified several differences in the response to salt that are in accordance with the possibility that heritable phenotypic differences among clones of A. donax might accumulate, especially in ecotypes originating from distant geographical areas, despite their asexual reproduction modality. Additionally, 26,838 simple sequence repeat (SSR) markers were identified and validated. This SSR dataset expands the marker catalogue of A. donax, facilitating the genotypic characterization of this species.
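The abstract does not say how the SSR markers were detected. As a rough illustration of what SSR (microsatellite) detection involves, the sketch below scans a sequence for perfect tandem repeats using regular-expression backreferences. The `find_ssrs` function, the minimum-repeat thresholds, and the toy sequence are assumptions made for this example, not the authors' pipeline or parameters.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for each perfect tandem repeat found.

    Only exact repeats are reported; compound or interrupted SSRs are
    outside the scope of this sketch.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{n}) captures the motif; \1{m,} requires m more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AAA" or "ATAT"), so each locus is reported once.
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence (hypothetical): an (AT)7 repeat and a (CAG)5 repeat.
toy_seq = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TT"
for start, motif, copies in find_ssrs(toy_seq):
    print(start, motif, copies)
```

On the toy sequence this reports the (AT)7 repeat at position 3 and the (CAG)5 repeat at position 21 (0-based).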
132
Upadhyay M, Hauser A, Kunz E, Krebs S, Blum H, Dotsev A, Okhlopkov I, Bagirov V, Brem G, Zinovieva N, Medugorac I. The First Draft Genome Assembly of Snow Sheep (Ovis nivicola). Genome Biol Evol 2020; 12:1330-1336. [PMID: 32592471 PMCID: PMC7487135 DOI: 10.1093/gbe/evaa124] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 06/08/2020] [Indexed: 12/30/2022] Open
Abstract
The snow sheep, Ovis nivicola, which is endemic to the mountain ranges of northeastern Siberia, is well adapted to the harsh cold climatic conditions of its habitat. In this study, using long reads of Nanopore sequencing technology, whole-genome sequencing, assembly, and gene annotation of a snow sheep were carried out. Additionally, RNA-seq reads from several tissues were generated to supplement the gene prediction in the snow sheep genome. The assembled genome was ∼2.62 Gb in length and was represented by 7,157 scaffolds with an N50 of about 2 Mb. Repetitive sequences comprised 41% of the total genome. BUSCO analysis revealed that the snow sheep assembly contained full-length or partial fragments of 97% of mammalian universal single-copy orthologs (n = 4,104), illustrating the completeness of the assembly. In addition, a total of 20,045 protein-coding sequences were identified using a comprehensive gene prediction pipeline, of which 19,240 (∼96%) were annotated using protein databases. Moreover, homology-based searches and de novo identification detected 1,484 tRNAs; 243 rRNAs; 1,931 snRNAs; and 782 miRNAs in the snow sheep genome. To conclude, we generated the first de novo genome of the snow sheep using long reads; these data are expected to contribute significantly to our understanding of evolution and adaptation within the Ovis genus.
133
Maggi J, Roberts L, Koller S, Rebello G, Berger W, Ramesar R. De Novo Assembly-Based Analysis of RPGR Exon ORF15 in an Indigenous African Cohort Overcomes Limitations of a Standard Next-Generation Sequencing (NGS) Data Analysis Pipeline. Genes (Basel) 2020; 11:genes11070800. [PMID: 32679846 PMCID: PMC7396994 DOI: 10.3390/genes11070800] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/11/2020] [Revised: 06/24/2020] [Accepted: 07/13/2020] [Indexed: 01/10/2023] Open
Abstract
RPGR exon ORF15 variants are one of the most frequent causes of inherited retinal disorders (IRDs), in particular retinitis pigmentosa. The low sequence complexity of this mutation hotspot makes it prone to indels and challenging for sequence data analysis. Whole-exome sequencing generally fails to provide adequate coverage in this region. Therefore, complementary methods are needed to avoid false positive as well as false negative results. In this study, next-generation sequencing (NGS) was used to sequence long-range PCR amplicons for an IRD cohort of African ancestry. By developing a novel secondary analysis pipeline based on de novo assembly, we were able to avoid the miscalling of variants generated by standard NGS analysis tools. We identified pathogenic variants in 11 patients (13% of the cohort), two of which have not been reported previously. We provide a novel, alternative end-to-end secondary analysis pipeline for targeted NGS of ORF15 that is less prone to false positive and false negative variant calls.
134
Yamasaki YY, Kakioka R, Takahashi H, Toyoda A, Nagano AJ, Machida Y, Møller PR, Kitano J. Genome-wide patterns of divergence and introgression after secondary contact between Pungitius sticklebacks. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190548. [PMID: 32654635 DOI: 10.1098/rstb.2019.0548] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022] Open
Abstract
Speciation is a continuous process. Although it is known that differential adaptation can initiate divergence even in the face of gene flow, we know relatively little about the mechanisms driving complete reproductive isolation and the genomic patterns of divergence and introgression at the later stages of speciation. Sticklebacks contain many pairs of sympatric species differing in levels of reproductive isolation and divergence history. Nevertheless, most previous studies have focused on young species pairs. Here, we investigated two sympatric stickleback species, Pungitius pungitius and P. sinensis, whose habitats overlap in eastern Hokkaido; these species show hybrid male sterility, suggesting that they may be at a late stage of speciation. Our demographic analysis using whole-genome sequence data showed that these species split 1.73 Ma and came into secondary contact 37 200 years ago after a period of allopatry. This long period of allopatry might have promoted the evolution of intrinsic incompatibility. Although we detected on-going gene flow and signatures of introgression, overall genomic divergence was high, with considerable heterogeneity across the genome. The heterogeneity was significantly associated with variation in recombination rate. This sympatric pair provides new avenues to investigate the late stages of the stickleback speciation continuum. This article is part of the theme issue 'Towards the completion of speciation: the evolution of reproductive isolation beyond the first barriers'.
135
Hiltbrunner M, Heckel G. Assessing Genome-Wide Diversity in European Hantaviruses through Sequence Capture from Natural Host Samples. Viruses 2020; 12:v12070749. [PMID: 32664593 PMCID: PMC7412162 DOI: 10.3390/v12070749] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/27/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 12/19/2022] Open
Abstract
Research on the ecology and evolution of viruses is often hampered by the limitation of sequence information to short parts of the genomes or single genomes derived from cultures. In this study, we use hybrid sequence capture enrichment in combination with high-throughput sequencing to provide efficient access to full genomes of European hantaviruses from rodent samples obtained in the field. We applied this methodology to Tula (TULV) and Puumala (PUUV) orthohantaviruses for which analyses from natural host samples are typically restricted to partial sequences of their tri-segmented RNA genome. We assembled a total of ten novel hantavirus genomes de novo with very high coverage (on average >99%) and sequencing depth (average >247×). A comparison with partial Sanger sequences indicated an accuracy of >99.9% for the assemblies. An analysis of two common vole (Microtus arvalis) samples infected with two TULV strains each allowed for the de novo assembly of all four TULV genomes. Combining the novel sequences with all available TULV and PUUV genomes revealed very similar patterns of sequence diversity along the genomes, except for remarkably higher diversity in the non-coding region of the S-segment in PUUV. The genomic distribution of polymorphisms in the coding sequence was similar between the species, but differed between the segments with the highest sequence divergence of 0.274 for the M-segment, 0.265 for the S-segment, and 0.248 for the L-segment (overall 0.258). Phylogenetic analyses showed the clustering of genome sequences consistent with their geographic distribution within each species. Genome-wide data yielded extremely high node support values, despite the impact of strong mutational saturation that is expected for hantavirus sequences obtained over large spatial distances. 
We conclude that genome sequencing based on capture enrichment protocols provides an efficient means for ecological and evolutionary investigations of hantaviruses at an unprecedented completeness and depth.
136
Jiao F, Luo R, Dai X, Liu H, Yu G, Han S, Lu X, Su C, Chen Q, Song Q, Meng C, Li F, Sun H, Zhang R, Hui T, Qian Y, Zhao A, Jiang Y. Chromosome-Level Reference Genome and Population Genomic Analysis Provide Insights into the Evolution and Improvement of Domesticated Mulberry (Morus alba). MOLECULAR PLANT 2020; 13:1001-1012. [PMID: 32422187 DOI: 10.1016/j.molp.2020.05.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 12/22/2019] [Revised: 04/08/2020] [Accepted: 05/12/2020] [Indexed: 05/16/2023]
Abstract
Mulberry (Morus spp.) is the sole plant consumed by the domesticated silkworm. However, the genome of domesticated mulberry has not yet been sequenced, and the ploidy level of this species remains unclear. Here, we report a high-quality, chromosome-level domesticated mulberry (Morus alba) genome. Analysis of genomic data and karyotype analyses confirmed that M. alba is a diploid with 28 chromosomes (2n = 2x = 28). Population genomic analysis based on resequencing of 134 mulberry accessions classified domesticated mulberry into three geographical groups, namely, Taihu Basin of southeastern China (Hu mulberry), northern and southwestern China, and Japan. Hu mulberry had the lowest nucleotide diversity among these accessions and demonstrated obvious signatures of selection associated with environmental adaptation. Further phylogenetic analysis supports a previous proposal that multiple domesticated mulberry accessions previously classified as different species actually belong to one species. This study expands our understanding of genome evolution of the genus Morus and population structure of domesticated mulberry, which would facilitate mulberry breeding and improvement.
137
A First Insight into North American Plant Pathogenic Fungi Armillaria Sinapina Transcriptome. BIOLOGY 2020; 9:biology9070153. [PMID: 32635577 PMCID: PMC7407180 DOI: 10.3390/biology9070153] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/15/2020] [Revised: 07/01/2020] [Accepted: 07/03/2020] [Indexed: 12/02/2022]
Abstract
Armillaria sinapina, a fungal pathogen of primary timber species of North American forests, causes white root rot disease that ultimately kills the trees. A more detailed understanding of the molecular mechanisms underlying this disease will support future developments in disease resistance and management, as well as in the decomposition of cellulosic material for further use. In this study, RNA-Seq technology was used to compare the transcriptome profiles of A. sinapina fungal cultures grown in yeast malt broth medium with or without betulin, a natural compound of the terpenoid group found in abundance in white birch bark. This was done to identify enzyme transcripts involved in the metabolism (redox reaction) of betulin into betulinic acid, a potent anticancer drug. De novo assembly and characterization of the A. sinapina transcriptome were performed using Illumina technology. A total of 170,592,464 reads were generated, and 273,561 transcripts were characterized. Approximately 53% of transcripts could be identified using public databases, with several metabolic pathways represented. A total of 11 transcripts involved in terpenoid biosynthesis were identified. In addition, 25 gene transcripts that could play a significant role in lignin degradation were uncovered, as well as several redox enzymes of the cytochrome P450 family. To our knowledge, this research is the first transcriptomic study carried out on A. sinapina.
138
Marques JP, Seixas FA, Farelo L, Callahan CM, Good JM, Montgomery WI, Reid N, Alves PC, Boursot P, Melo-Ferreira J. An Annotated Draft Genome of the Mountain Hare (Lepus timidus). Genome Biol Evol 2020; 12:3656-3662. [PMID: 31834364 PMCID: PMC6951464 DOI: 10.1093/gbe/evz273] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 12/07/2019] [Indexed: 12/25/2022] Open
Abstract
Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77x coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 194 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein coding genes by combining ab-initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares.
139
Shu R, Zhang J, Meng Q, Zhang H, Zhou G, Li M, Wu P, Zhao Y, Chen C, Qin Q. A New High-Quality Draft Genome Assembly of the Chinese Cordyceps Ophiocordyceps sinensis. Genome Biol Evol 2020; 12:1074-1079. [PMID: 32579174 PMCID: PMC7486949 DOI: 10.1093/gbe/evaa112] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 05/21/2020] [Indexed: 01/07/2023] Open
Abstract
Ophiocordyceps sinensis (Berk.) is an entomopathogenic fungus endemic to the Qinghai-Tibet Plateau. It parasitizes and mummifies underground ghost moth larvae, then produces a fruiting body. The fungus-insect complex, called Chinese cordyceps or "DongChongXiaCao," is not only a valuable traditional Chinese medicine, but also a major source of income for numerous Himalayan residents. Here, taking advantage of rapid advances in single-molecule sequencing, we generated a highly contiguous genome assembly of O. sinensis. The assembly of 23 contigs was ∼110.8 Mb with an N50 length of 18.2 Mb. We used RNA-seq and homologous protein sequences to identify 8,916 protein-coding genes in the IOZ07 assembly. Moreover, 63 secondary metabolite gene clusters were identified in the improved assembly. The improved assembly and genome features described in this study will further inform the evolutionary study and resource utilization of Chinese cordyceps.
140
Ma X, Agudelo P, Richards VP, Baeza JA. The complete mitochondrial genome of the Columbia lance nematode, Hoplolaimus columbus, a major agricultural pathogen in North America. Parasit Vectors 2020; 13:321. [PMID: 32571423 PMCID: PMC7310197 DOI: 10.1186/s13071-020-04187-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/07/2020] [Accepted: 06/13/2020] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND: The plant-parasitic nematode Hoplolaimus columbus is a pathogen that uses a wide range of hosts and causes substantial yield loss in agricultural fields in North America. This study describes, for the first time, the complete mitochondrial genome of H. columbus from South Carolina, USA. METHODS: The mitogenome of H. columbus was assembled from Illumina 300 bp paired-end reads. It was annotated and compared to other published mitogenomes of plant-parasitic nematodes in the superfamily Tylenchoidea. The phylogenetic relationships between H. columbus and 6 other genera of plant-parasitic nematodes were examined using protein-coding genes (PCGs). RESULTS: The mitogenome of H. columbus is a circular AT-rich DNA molecule 25,228 bp in length. The annotation comprises 12 PCGs, 2 ribosomal RNA genes, and 19 transfer RNA genes. No atp8 gene was found in the mitogenome of H. columbus, but long non-coding regions were observed, in agreement with what has been reported for other plant-parasitic nematodes. The mitogenomic phylogeny of plant-parasitic nematodes in the superfamily Tylenchoidea agreed with previous molecular phylogenies. Mitochondrial gene synteny in H. columbus was unique but similar to that reported for other closely related species. CONCLUSIONS: The mitogenome of H. columbus is unique within the superfamily Tylenchoidea but exhibits similarities in both gene content and synteny to other closely related nematodes. Among other uses, this new resource will facilitate population genomic studies in lance nematodes from North America and beyond.
141
Köhler M, Reginato M, Souza-Chies TT, Majure LC. Insights Into Chloroplast Genome Evolution Across Opuntioideae (Cactaceae) Reveals Robust Yet Sometimes Conflicting Phylogenetic Topologies. FRONTIERS IN PLANT SCIENCE 2020; 11:729. [PMID: 32636853 PMCID: PMC7317007 DOI: 10.3389/fpls.2020.00729] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/27/2020] [Accepted: 05/06/2020] [Indexed: 05/22/2023]
Abstract
Chloroplast genomes (plastomes) are frequently treated as highly conserved among land plants. However, many lineages of vascular plants have experienced extensive structural rearrangements, including inversions and modifications to the size and content of genes. Cacti are one of these lineages, containing the smallest plastome known for an obligately photosynthetic angiosperm, including the loss of one copy of the inverted repeat (∼25 kb) and the ndh gene suite, but only a few cacti from the subfamily Cactoideae have been sufficiently characterized. Here, we investigated the variation of plastome sequences across the second-major lineage of the Cactaceae, the subfamily Opuntioideae, to address (1) how variable the content and arrangement of chloroplast genome sequences are across the subfamily, and (2) how phylogenetically informative the plastome sequences are for resolving major relationships among the clades of Opuntioideae. Our de novo assembly of the Opuntia quimilo plastome recovered an organelle of 150,347 bp in length with both copies of the inverted repeat and the complete ndh gene suite. An expansion of the large single copy unit and a reduction of the small single copy unit were observed, including translocations and inversion of genes, as well as the putative pseudogenization of some loci. Comparative analyses among all clades within Opuntioideae suggested that plastome structure and content vary across taxa of this subfamily, with putative independent losses of the ndh gene suite and pseudogenization of genes across disparate lineages, further demonstrating the dynamic nature of plastomes in Cactaceae. Our plastome dataset was robust in resolving three tribes with high support within Opuntioideae: Cylindropuntieae, Tephrocacteae and Opuntieae. However, conflicting topologies were recovered among major clades when exploring different assemblies of markers. A plastome-wide survey for highly informative phylogenetic markers revealed previously unused regions for future use in Sanger-based studies, presenting a valuable dataset with primers designed for continued evolutionary studies across Cactaceae. These results bring new insights into the evolution of plastomes in cacti, suggesting that further analyses should be carried out to address how ecological drivers, physiological constraints and morphological traits of cacti may be related to the common rearrangements in plastomes that have been reported across the family.
|
142
|
Mastrochirico-Filho VA, Hata ME, Kuradomi RY, de Freitas MV, Ariede RB, Pinheiro DG, Robledo D, Houston R, Hashimoto DT. Transcriptome Profiling of Pacu (Piaractus mesopotamicus) Challenged With Pathogenic Aeromonas hydrophila: Inference on Immune Gene Response. Front Genet 2020; 11:604. [PMID: 32582300 PMCID: PMC7295981 DOI: 10.3389/fgene.2020.00604] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/28/2020] [Accepted: 05/18/2020] [Indexed: 11/13/2022]
Abstract
Pacu (Piaractus mesopotamicus) is a Neotropical fish of major importance for South American aquaculture. Septicemia caused by the bacterium Aeromonas hydrophila is currently considered a substantial threat to pacu aquaculture and has provoked infectious disease outbreaks with high economic losses. Understanding of the molecular aspects of the progress of A. hydrophila infection and of the pacu immune response is scarce, which has limited the development of genomic selection for resistance to this infection. The present study aimed to generate information on the transcriptome of pacu in the face of A. hydrophila infection, and to compare the transcriptomic responses between two time-series groups from a disease resistance challenge: the peak mortality (HM) and mortality plateau (PM) groups of individuals. Nine RNA sequencing (RNA-Seq) libraries were prepared from liver tissue of challenged individuals, generating ∼160 million 150 bp paired-end reads. After quality trimming/cleanup, these reads were assembled de novo, generating 211,259 contigs. When gene expression in individuals of the HM group was compared with that in the control group, a total of 4,413 differentially expressed transcripts were found (2,000 upregulated and 2,413 downregulated candidate genes). Additionally, 433 transcripts were differentially expressed when individuals from the PM group were compared with those in the control group (155 upregulated and 278 downregulated candidate genes). The resulting differentially expressed transcripts were clustered into the following functional categories: cytokines and signaling, epithelial protection, antigen processing and presentation, apoptosis, phagocytosis, complement system cascades and pattern recognition receptors. These results reveal relevant differential gene expression in the HM and PM groups, which will contribute to a better understanding of the molecular defense mechanisms during A. hydrophila infection.
|
143
|
Yoshida N, Kaito C. Dataset for de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. Data Brief 2020; 30:105388. [PMID: 32211462 PMCID: PMC7082503 DOI: 10.1016/j.dib.2020.105388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2020] [Revised: 02/28/2020] [Accepted: 02/28/2020] [Indexed: 11/18/2022]
Abstract
In this article, we report the first de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. In this dataset, 75,320,390 raw reads were acquired from African bullfrog mRNA using an Illumina paired-end sequencing platform. De novo assembly resulted in a total of 136,958 unigenes. Among the obtained unigenes, 30,039 open reading frames (ORFs) were detected. This dataset provides basic information for molecular-level analysis of this species, which undergoes a state of dormancy under dry conditions at ordinary temperatures called estivation.
|
144
|
Linder RA, Majumder A, Chakraborty M, Long A. Two Synthetic 18-Way Outcrossed Populations of Diploid Budding Yeast with Utility for Complex Trait Dissection. Genetics 2020; 215:323-342. [PMID: 32241804 PMCID: PMC7268983 DOI: 10.1534/genetics.120.303202] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/06/2020] [Accepted: 03/31/2020] [Indexed: 02/07/2023]
Abstract
Advanced-generation multiparent populations (MPPs) are a valuable tool for dissecting complex traits, having more power than genome-wide association studies to detect rare variants and higher resolution than F2 linkage mapping. To extend the advantages of MPPs in budding yeast, we describe the creation and characterization of two outbred MPPs derived from 18 genetically diverse founding strains. We carried out de novo assemblies of the genomes of the 18 founder strains, such that virtually all variation segregating between these strains is known, and represented those assemblies as Santa Cruz Genome Browser tracks. We discovered complex patterns of structural variation segregating among the founders, including a large deletion within the vacuolar ATPase VMA1, several different deletions within the osmosensor MSB2, a series of deletions and insertions at PRM7 and the adjacent BSC1, as well as copy number variation at the dehydrogenase ALD2. Resequenced haploid recombinant clones from the two MPPs have a median unrecombined block size of 66 kb, demonstrating that the population is highly recombined. We pool-sequenced the two MPPs to 3270× and 2226× coverage and demonstrated that we can accurately estimate local haplotype frequencies using pooled data. We further downsampled the pool-sequenced data to ∼20-40× and showed that local haplotype frequency estimates remained accurate, with median error rates of 0.8% and 0.6% at 20× and 40×, respectively. Haplotype frequencies are estimated much more accurately than SNP frequencies obtained directly from the same data. Deep sequencing of the two populations revealed that 10 or more founders are present at a detectable frequency for >98% of the genome, validating the utility of this resource for the exploration of the role of standing variation in the architecture of complex traits.
|
145
|
Cogne Y, Gouveia D, Chaumot A, Degli-Esposti D, Geffard O, Pible O, Almunia C, Armengaud J. Proteogenomics-Guided Evaluation of RNA-Seq Assembly and Protein Database Construction for Emergent Model Organisms. Proteomics 2020; 20:e1900261. [PMID: 32249536 DOI: 10.1002/pmic.201900261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2019] [Revised: 03/24/2020] [Indexed: 11/10/2022]
Abstract
Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species for which the genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments for Illumina RNA-seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used, and the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
|
146
|
Das P, Sahoo L, Das SP, Bit A, Joshi CG, Kushwaha B, Kumar D, Shah TM, Hinsu AT, Patel N, Patnaik S, Agarwal S, Pandey M, Srivastava S, Meher PK, Jayasankar P, Koringa PG, Nagpure NS, Kumar R, Singh M, Iquebal MA, Jaiswal S, Kumar N, Raza M, Das Mahapatra K, Jena J. De novo Assembly and Genome-Wide SNP Discovery in Rohu Carp, Labeo rohita. Front Genet 2020; 11:386. [PMID: 32373166 PMCID: PMC7186481 DOI: 10.3389/fgene.2020.00386] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/25/2019] [Accepted: 03/27/2020] [Indexed: 11/24/2022]
|
147
|
Louha S, Ray DA, Winker K, Glenn TC. A High-Quality Genome Assembly of the North American Song Sparrow, Melospiza melodia. G3 (BETHESDA, MD.) 2020; 10:1159-1166. [PMID: 32075855 PMCID: PMC7144075 DOI: 10.1534/g3.119.400929] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Accepted: 02/13/2020] [Indexed: 01/25/2023]
Abstract
The song sparrow, Melospiza melodia, is one of the most widely distributed species of songbirds found in North America. It has been used in a wide range of behavioral and ecological studies. This species' pronounced morphological and behavioral diversity across populations makes it a favorable candidate in several areas of biomedical research. We have generated a high-quality de novo genome assembly of M. melodia using Illumina short read sequences from genomic and in vitro proximity-ligation libraries. The assembled genome is 978.3 Mb, with a physical coverage of 24.9×, N50 scaffold size of 5.6 Mb and N50 contig size of 31.7 Kb. Our genome assembly is highly complete, with 87.5% full-length genes present out of a set of 4,915 universal single-copy orthologs present in most avian genomes. We annotated our genome assembly and constructed 15,086 gene models, a majority of which have high homology to related birds, Taeniopygia guttata and Junco hyemalis. In total, 83% of the annotated genes are assigned putative functions. Furthermore, only ∼7% of the genome is found to be repetitive; these regions and other non-coding functional regions are also identified. The high-quality M. melodia genome assembly and annotations we report will serve as a valuable resource for facilitating studies on genome structure and evolution that can contribute to biomedical research and serve as a reference in population genomic and comparative genomic studies of closely related species.
|
148
|
Jayakumar V, Sakakibara Y. Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data. Brief Bioinform 2020; 20:866-876. [PMID: 29112696 PMCID: PMC6585154 DOI: 10.1093/bib/bbx147] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 06/29/2017] [Revised: 09/22/2017] [Indexed: 12/20/2022]
Abstract
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.
|
149
|
De Miccolis Angelini RM, Romanazzi G, Pollastro S, Rotolo C, Faretra F, Landi L. New High-Quality Draft Genome of the Brown Rot Fungal Pathogen Monilinia fructicola. Genome Biol Evol 2020; 11:2850-2855. [PMID: 31560373 PMCID: PMC6795239 DOI: 10.1093/gbe/evz207] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 09/25/2019] [Indexed: 12/29/2022]
Abstract
Brown rot is a worldwide fungal disease of stone and pome fruit that is caused by several Monilinia species. Among these, Monilinia fructicola can cause severe preharvest and postharvest losses, especially for stone fruit. Here, we present a high-quality draft genome assembly of M. fructicola Mfrc123 strain obtained using both Illumina and PacBio sequencing technologies. The genome assembly comprised 20 scaffolds, including 29 telomere sequences at both ends of 10 scaffolds, and at a single end of 9 scaffolds. The total length was 44.05 Mb, with a scaffold N50 of 2,592 kb. Annotation of the M. fructicola assembly identified a total of 12,118 genes and 13,749 proteins that were functionally annotated. This newly generated reference genome is expected to significantly contribute to comparative analysis of genome biology and evolution within Monilinia species.
|
150
|
Bushmanova E, Antipov D, Lapidus A, Prjibelski AD. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 2020; 8:5559527. [PMID: 31494669 PMCID: PMC6736328 DOI: 10.1093/gigascience/giz100] [Citation(s) in RCA: 303] [Impact Index Per Article: 75.8] [Received: 11/16/2018] [Revised: 04/20/2019] [Accepted: 08/01/2019] [Indexed: 12/18/2022]
Abstract
Background: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the performed comparison between different assembly methods, we infer that it is not possible to identify an absolute leader according to all quality metrics and all datasets used. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average compared to the closest competitors.
|