151
|
Tao YT, Suo F, Tusso S, Wang YK, Huang S, Wolf JBW, Du LL. Intraspecific Diversity of Fission Yeast Mitochondrial Genomes. Genome Biol Evol 2020; 11:2312-2329. [PMID: 31364709 PMCID: PMC6736045 DOI: 10.1093/gbe/evz165] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Accepted: 07/18/2019] [Indexed: 02/07/2023]
Abstract
The fission yeast Schizosaccharomyces pombe is an important model organism, but its natural diversity and evolutionary history remain under-studied. In particular, the population genomics of the S. pombe mitochondrial genome (mitogenome) has not been thoroughly investigated. Here, we assembled the complete circular-mapping mitogenomes of 192 S. pombe isolates de novo, and found that these mitogenomes belong to 69 nonidentical sequence types ranging from 17,618 to 26,910 bp in length. Using the assembled mitogenomes, we identified 20 errors in the reference mitogenome and discovered two previously unknown mitochondrial introns. Analyzing sequence diversity of these 69 types of mitogenomes revealed two highly distinct clades, with only three mitogenomes exhibiting signs of inter-clade recombination. This diversity pattern suggests that currently available S. pombe isolates descend from two long-separated ancestral lineages. This conclusion is corroborated by the diversity pattern of the recombination-repressed K-region located between donor mating-type loci mat2 and mat3 in the nuclear genome. We estimated that the two ancestral S. pombe lineages diverged about 31 million generations ago. These findings shed new light on the evolution of S. pombe and the data sets generated in this study will facilitate future research on genome evolution.
|
152
|
Vanillin Production in Pseudomonas: Whole-Genome Sequencing of Pseudomonas sp. Strain 9.1 and Reannotation of Pseudomonas putida CalA as a Vanillin Reductase. Appl Environ Microbiol 2020; 86:AEM.02442-19. [PMID: 31924622 PMCID: PMC7054097 DOI: 10.1128/aem.02442-19] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/22/2019] [Accepted: 12/21/2019] [Indexed: 02/06/2023]
Abstract
Microbial degradation of lignin and its related aromatic compounds has great potential for the sustainable production of chemicals and bioremediation of contaminated soils. We previously isolated Pseudomonas sp. strain 9.1 from historical waste deposits (forming so-called fiber banks) released from pulp and paper mills along the Baltic Sea coast. The strain accumulated vanillyl alcohol during growth on vanillin, and while reported in other microbes, this phenotype is less common in wild-type pseudomonads. As the reduction of vanillin to vanillyl alcohol is an undesired trait in Pseudomonas strains engineered to accumulate vanillin, connecting the strain 9.1 phenotype with a genotype would increase the fundamental understanding and genetic engineering potential of microbial vanillin metabolism. The genome of Pseudomonas sp. 9.1 was sequenced and assembled. Annotation identified oxidoreductases with homology to Saccharomyces cerevisiae alcohol dehydrogenase ScADH6p, known to reduce vanillin to vanillyl alcohol, in both the 9.1 genome and the model strain Pseudomonas putida KT2440. Recombinant expression of the Pseudomonas sp. 9.1 FEZ21_09870 and P. putida KT2440 PP_2426 (calA) genes in Escherichia coli revealed that these open reading frames encode aldehyde reductases that convert vanillin to vanillyl alcohol, and that P. putida KT2440 PP_3839 encodes a coniferyl alcohol dehydrogenase that oxidizes coniferyl alcohol to coniferyl aldehyde (i.e., the function previously assigned to calA). The deletion of PP_2426 in P. putida GN442 engineered to accumulate vanillin resulted in a decrease in by-product (vanillyl alcohol) yield from 17% to ∼1%. Based on these results, we propose the reannotation of PP_2426 and FEZ21_09870 as areA and PP_3839 as calA-II. IMPORTANCE Valorization of lignocellulose (nonedible plant matter) is of key interest for the sustainable production of chemicals from renewable resources.
Lignin, one of the main constituents of lignocellulose, is a heterogeneous aromatic biopolymer that can be chemically depolymerized into a heterogeneous mixture of aromatic building blocks; those can be further converted by certain microbes into value-added aromatic chemicals, e.g., the flavoring agent vanillin. We previously isolated a Pseudomonas sp. strain with the (for the genus) unusual trait of vanillyl alcohol production during growth on vanillin. Whole-genome sequencing of the isolate led to the identification of a vanillin reductase candidate gene whose deletion in a recombinant vanillin-accumulating P. putida strain almost completely alleviated the undesired vanillyl alcohol by-product yield. These results represent an important step toward biotechnological production of vanillin from lignin using bacterial cell factories.
|
153
|
Shen F, Long Y, Li F, Ge G, Song G, Li Q, Qiao Z, Cui Z. De novo transcriptome assembly and sex-biased gene expression in the gonads of Amur catfish (Silurus asotus). Genomics 2020; 112:2603-2614. [PMID: 32109564 DOI: 10.1016/j.ygeno.2020.01.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/09/2019] [Revised: 01/11/2020] [Accepted: 01/15/2020] [Indexed: 11/28/2022]
Abstract
Amur catfish is extensively distributed and cultured in Asian countries. Despite its economic importance, genomic information for this species remains limited. A reference transcriptome of Amur catfish was assembled and the sex-biased gene expression in the gonads was characterized using RNA-sequencing. The assembled transcriptome of Amur catfish consisted of 74,840 transcripts. The N50, mean length and maximum length of the transcripts are 1970, 1235 and 16,748 bp, respectively. Putative sex-specific transcripts were identified and sex-specific expression of the representative genes was verified by RT-PCR. Differential expression analysis identified 5401 ovary-biased and 5618 testis-biased genes. The ovary-biased genes were mainly enriched in pathways such as RNA transport and ribosome biogenesis in eukaryotes. The testis-biased genes were enriched in pathways such as calcium signaling and cytokine-cytokine receptor interaction. Our data provide a valuable genomic resource for further investigating the genetic basis of sex determination, sex differentiation and sexual dimorphism of catfish.
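The N50 quoted in this abstract is a length-weighted median: with sequences sorted from longest to shortest, it is the length at which the running total first reaches half of the assembly size. A minimal sketch of that calculation follows; the function name and the example lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical transcript lengths, for illustration only.
    print(n50([10, 20, 30, 40]))
```

Because the statistic is weighted by length, a few long transcripts can raise the N50 well above the mean length, which is consistent with the N50 (1970 bp) exceeding the mean (1235 bp) in the abstract.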
|
154
|
Tang L, Li M, Wu FX, Pan Y, Wang J. MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification. Front Genet 2020; 10:1396. [PMID: 32082361 PMCID: PMC7005248 DOI: 10.3389/fgene.2019.01396] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/23/2019] [Accepted: 12/20/2019] [Indexed: 12/13/2022]
Abstract
With the generation of a large amount of sequencing data, different assemblers have emerged to perform de novo genome assembly. Because a single strategy can hardly accommodate the various biases of datasets, none of these tools outperforms the others on all species. The process of assembly reconciliation is to merge multiple assemblies and generate a high-quality consensus assembly. Several assembly reconciliation tools have been proposed. However, the existing reconciliation tools cannot produce a merged assembly that simultaneously has better contiguity and contains fewer errors, and the results of these tools usually depend on the ranking of input assemblies. In this study, we propose a novel assembly reconciliation tool, MAC, which merges assemblies by using the adjacency algebraic model and classification. To solve the problem of uneven sequencing depth and sequencing errors, MAC identifies consensus blocks between contig sets to construct an adjacency graph. To solve the problem of repetitive regions, MAC employs classification to optimize the adjacency algebraic model. Moreover, MAC designs an overall scoring function to solve the problem of unknown ranking of input assembly sets. The experimental results from four species of GAGE-B demonstrate that MAC outperforms other assembly reconciliation tools.
|
155
|
Marrano A, Palmer AE, Moyers BT. Stacking up RADSeq assembly programs: From complete hit to completely abysmal. Mol Ecol Resour 2020; 20:357-359. [PMID: 32012467 DOI: 10.1111/1755-0998.13140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2019] [Revised: 01/06/2020] [Accepted: 01/30/2020] [Indexed: 11/29/2022]
Abstract
Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One of these methods is the exploitation of restriction enzyme cut sites to generate genome-wide but reduced representation sequencing libraries (RRLs), alternatively termed genotyping by sequencing or restriction-site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. There are many possible assembly programs, most not explicitly developed for RRL data, and we know little of their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high-performing assembler is infrequently used in their literature survey (CD-HIT; Li and Godzik, 2006), while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices in parameter choice and evaluation of future assembly programs, advice that molecular ecologists working to assemble sequences of all kinds should take to heart.
|
156
|
Abstract
By using next-generation sequencing technologies, it is possible to quickly and inexpensively generate large numbers of relatively short reads from both the nuclear and mitochondrial DNA (mtDNA) contained in a biological sample. Unfortunately, assembling such whole-genome sequencing (WGS) data with standard de novo assemblers often fails to generate high-quality mitochondrial genome sequences due to the large difference in copy number (and hence sequencing depth) between the mitochondrial and nuclear genomes. Assembly of complete mitochondrial genome sequences is further complicated by the fact that many de novo assemblers are not designed for circular genomes and by the presence of repeats in the mitochondrial genomes of some species. In this article, we describe the Statistical Mitogenome Assembly with RepeaTs (SMART) pipeline for automated assembly of mitochondrial genomes from WGS data. SMART uses an efficient coverage-based filter to first select a subset of reads enriched in mtDNA sequences. Contigs produced by an initial assembly step are filtered using the Basic Local Alignment Search Tool searches against a comprehensive mitochondrial genome database and are used as "baits" for an alignment-based filter that produces the set of reads used in a second de novo assembly and scaffolding step. In the presence of repeats, the possible paths through the assembly graph are evaluated using a maximum likelihood model. Additionally, the assembly process is repeated for a user-specified number of times on resampled subsets of reads to select for annotation of the reconstructed sequences with highest bootstrap support. Experiments on WGS data sets from a variety of species show that the SMART pipeline produces complete circular mitochondrial genome sequences with a higher success rate than current state-of-the-art tools, particularly for low-coverage WGS data sets.
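The coverage-based read filter mentioned in this abstract relies on mtDNA being sequenced far more deeply than nuclear DNA. The sketch below illustrates that general idea with k-mer counts; it is not SMART's actual filter, and the function names, the value of `k`, and the threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def high_coverage_reads(reads, k=4, min_median_count=3):
    """Keep reads whose k-mers are abundant across the whole read set.

    Reads from a high-copy genome (such as mtDNA) share k-mers with many
    other reads, so the median count of their k-mers is high. The values
    of k and min_median_count here are illustrative placeholders.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers and median(counts[km] for km in read_kmers) >= min_median_count:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Toy data: three copies of a "high-copy" read and one unique read.
    toy_reads = ["ACGTAC"] * 3 + ["TTGGCCAA"]
    print(high_coverage_reads(toy_reads))
```

Using the median rather than the mean keeps a read from being retained merely because it contains a single very common k-mer, such as a low-complexity repeat.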
|
157
|
Razo-Mendivil FG, Martínez O, Hayano-Kanashiro C. Compacta: a fast contig clustering tool for de novo assembled transcriptomes. BMC Genomics 2020; 21:148. [PMID: 32046653 PMCID: PMC7014741 DOI: 10.1186/s12864-020-6528-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/02/2019] [Accepted: 01/22/2020] [Indexed: 12/25/2022]
Abstract
Background RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies. Results Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can determine the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted according to experimental aims. We compared the performance of Compacta against state of the art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims. Conclusions Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.
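Grouping contigs by the proportion of reads they share, as this abstract describes, can be pictured as finding connected components in a graph whose edges join contigs that share enough reads. The sketch below is not Compacta's implementation: the definition of the shared proportion (shared reads divided by the smaller read set), the threshold, and all names are assumptions chosen for illustration.

```python
from itertools import combinations


def cluster_contigs(reads_by_contig, min_shared=0.5):
    """Cluster contigs that share a sufficient proportion of reads.

    reads_by_contig maps a contig name to the set of read IDs aligned to it.
    Two contigs are linked when the shared reads make up at least
    min_shared of the smaller read set (an assumed definition). Clusters
    are the connected components, tracked here with a union-find structure.
    """
    parent = {contig: contig for contig in reads_by_contig}

    def find(contig):
        # Follow parent links to the root, shortening the path as we go.
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for a, b in combinations(reads_by_contig, 2):
        reads_a, reads_b = reads_by_contig[a], reads_by_contig[b]
        smaller = min(len(reads_a), len(reads_b))
        if smaller == 0:
            continue
        if len(reads_a & reads_b) / smaller >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for contig in reads_by_contig:
        clusters.setdefault(find(contig), []).append(contig)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical read assignments, for illustration only.
    toy = {"c1": {1, 2, 3, 4}, "c2": {3, 4, 5}, "c3": {9, 10}}
    print(cluster_contigs(toy))
```

Raising `min_shared` yields more, smaller clusters and lowering it merges more contigs, which mirrors the adjustable degree of transcriptome compression the abstract attributes to the user-set threshold.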
|
158
|
Eisfeldt J, Mårtensson G, Ameur A, Nilsson D, Lindstrand A. Discovery of Novel Sequences in 1,000 Swedish Genomes. Mol Biol Evol 2020; 37:18-30. [PMID: 31560401 PMCID: PMC6984370 DOI: 10.1093/molbev/msz176] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022]
Abstract
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NSs across the chimpanzee genome, we find that 2,807 NSs align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NSs common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NSs, as well as the previously published Pan-African NSs, revealing that both the Swedish and Pan-African NSs are widespread, and that the Swedish NSs are largely a subset of the Pan-African NSs. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NSs may be of ancestral origin.
|
159
|
Landi L, Pollastro S, Rotolo C, Romanazzi G, Faretra F, De Miccolis Angelini RM. Draft Genomic Resources for the Brown Rot Fungal Pathogen Monilinia laxa. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2020; 33:145-148. [PMID: 31687915 DOI: 10.1094/mpmi-08-19-0225-a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 05/03/2023]
Abstract
Monilinia laxa is the causal agent of brown rot on stone fruit, and it can cause heavy yield losses during field production and postharvest storage. This article reports the draft genome assembly of the M. laxa Mlax316 strain, obtained using a hybrid genome assembly with both Illumina short-reads and PacBio long-reads sequencing technologies. The complete draft genome consists of 49 scaffolds with total size of 42.81 Mb, and scaffold N50 of 2,449.4 kb. Annotation of the M. laxa assembly identified 11,163 genes and 12,424 proteins which were functionally annotated. This new genome draft improves current genomic resources available for M. laxa and represents a useful tool for further research into its interactions with host plants and into evolution in the Monilinia genus.
|
160
|
Wang J, Zhao A, Sun H. The complete mitochondrial genome of the least horseshoe bat (Rhinolophus pusillus). MITOCHONDRIAL DNA PART B-RESOURCES 2020; 5:881-882. [PMID: 33366795 PMCID: PMC7748457 DOI: 10.1080/23802359.2020.1717389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
In this study, we generated the complete mitochondrial genome of Rhinolophus pusillus using next-generation sequencing. The mitochondrial genome was 16,833 bp in length and contained 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and a non-coding control region. Phylogenetic analyses supported the taxonomic status of Rhinolophus pusillus within the genus Rhinolophus, and its grouping with the sister taxon R. monoceros, which is highly restricted to Taiwan Island.
|
161
|
Therkildsen NO, Baumann H. A comprehensive non-redundant reference transcriptome for the Atlantic silverside Menidia menidia. Mar Genomics 2020; 53:100738. [PMID: 32883435 DOI: 10.1016/j.margen.2019.100738] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/14/2019] [Revised: 10/25/2019] [Accepted: 12/20/2019] [Indexed: 11/16/2022]
Abstract
The Atlantic silverside (Menidia menidia) has been the focus of extensive research efforts in ecology, evolutionary biology, and physiology over the past three decades, but lack of genomic resources has so far hindered examination of the molecular basis underlying the remarkable patterns of phenotypic variation described in this species. We here present the first reference transcriptome for M. menidia. We sought to capture a single representative sequence from as many genes as possible by first using a combination of Trinity and the CLC Genomics Workbench to de novo assemble contigs based on RNA-seq data from multiple individuals, tissue types, and life stages. To reduce redundancy, we passed the combined raw assemblies through a stringent filtering pipeline based both on sequence similarity to related species and computational predictions of transcript quality, condensing an initial set of >480,000 contigs to a final set of 20,998 representative contigs, amounting to a total length of 53.3 Mb. In this final assembly, 91% of the contigs were functionally annotated with putative gene function and gene ontology (GO) terms and/or InterProScan identifiers. The assembly contains complete or nearly complete copies of >95% of 248 highly conserved core genes present in low copy number across higher eukaryotes, and partial copies of another 3.8%, suggesting that our assembly provides relatively comprehensive coverage of the M. menidia transcriptome. The assembly provided here will be an important resource for future research.
|
162
|
De Novo Transcriptome Sequencing of Serangium japonicum (Coleoptera: Coccinellidae) and Application of Two Assembled Unigenes. G3-GENES GENOMES GENETICS 2020; 10:247-254. [PMID: 31722887 PMCID: PMC6945030 DOI: 10.1534/g3.119.400785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The ladybird beetle Serangium japonicum is an important predator of whiteflies. Investigations of the molecular mechanisms of this predatory beetle have been hindered by the scarcity of gene sequence data. To obtain gene sequences for the ladybird beetle and determine differences in gene expression between the summer and winter seasons, paired-end sequencing was performed. Real-time PCR was used to validate differences in Krueppel homolog 1 gene (Kr-h1) mRNA expression in summer vs. winter samples. To determine the diversity of the population, annotated cytochrome c oxidase subunit I (COX1) gene fragments were amplified from several ladybird beetle populations. The analysis yielded 191,246 assembled unigenes, 127,016 of which (66.4%) were annotated. These functional annotations of gene sequences are currently available from the National Center for Biotechnology Information (NCBI), and will provide a basis for studying the molecular mechanisms underlying the biological characteristics of S. japonicum. We found a change in expression of ribosome-associated genes across seasons, and postulate that this change is because of seasonal variation in temperature and photoperiod. The differential expression of Kr-h1 suggests that S. japonicum can successfully overwinter because the adults enter diapause. To explain the effects of season on Kr-h1 gene expression, we hypothesize a model in which a short photoperiod affects the density of Ca2+, the subsequent activity of methyl farnesoate epoxidase and the synthesis of JH, and in turn Kr-h1 gene expression. COX1 annotation was concordant with the morphological ID. The same COX1 sequence was found in the samples from several provinces in China. Therefore, the COX1 sequence is worth further study to distinguish beetle species and populations.
|
163
|
Thole V, Bassard JE, Ramírez-González R, Trick M, Ghasemi Afshar B, Breitel D, Hill L, Foito A, Shepherd L, Freitag S, Nunes dos Santos C, Menezes R, Bañados P, Naesby M, Wang L, Sorokin A, Tikhonova O, Shelenga T, Stewart D, Vain P, Martin C. RNA-seq, de novo transcriptome assembly and flavonoid gene analysis in 13 wild and cultivated berry fruit species with high content of phenolics. BMC Genomics 2019; 20:995. [PMID: 31856735 PMCID: PMC6924045 DOI: 10.1186/s12864-019-6183-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/02/2019] [Accepted: 10/15/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND Flavonoids are produced in all flowering plants in a wide range of tissues including in berry fruits. These compounds are of considerable interest for their biological activities, health benefits and potential pharmacological applications. However, transcriptomic and genomic resources for wild and cultivated berry fruit species are often limited, despite their value in underpinning the in-depth study of metabolic pathways, fruit ripening as well as in the identification of genotypes rich in bioactive compounds. RESULTS To access the genetic diversity of wild and cultivated berry fruit species that accumulate high levels of phenolic compounds in their fleshy berry(-like) fruits, we selected 13 species from Europe, South America and Asia representing eight genera, seven families and seven orders within three clades of the kingdom Plantae. RNA from either ripe fruits (ten species) or three ripening stages (two species) as well as leaf RNA (one species) were used to construct, assemble and analyse de novo transcriptomes. The transcriptome sequences are deposited in the BacHBerryGEN database (http://jicbio.nbi.ac.uk/berries) and were used, as a proof of concept, via its BLAST portal (http://jicbio.nbi.ac.uk/berries/blast.html) to identify candidate genes involved in the biosynthesis of phenylpropanoid compounds. Genes encoding regulatory proteins of the anthocyanin biosynthetic pathway (MYB and basic helix-loop-helix (bHLH) transcription factors and WD40 repeat proteins) were isolated using the transcriptomic resources of wild blackberry (Rubus genevieri) and cultivated red raspberry (Rubus idaeus cv. Prestige) and were shown to activate anthocyanin synthesis in Nicotiana benthamiana. Expression patterns of candidate flavonoid gene transcripts were also studied across three fruit developmental stages via the BacHBerryEXP gene expression browser (http://www.bachberryexp.com) in R. genevieri and R. idaeus cv. Prestige. 
CONCLUSIONS We report a transcriptome resource that includes data for a wide range of berry(-like) fruit species that has been developed for gene identification and functional analysis to assist in berry fruit improvement. These resources will enable investigations of metabolic processes in berries beyond the phenylpropanoid biosynthetic pathway analysed in this study. The RNA-seq data will be useful for studies of berry fruit development and to select wild plant species useful for plant breeding purposes.
|
164
|
Wang J, Wang N, Zhao WD, Zhao LX, Jing YG, Yang LJ, He J, Li J. RNA-Seq Analysis Identified XLOC_009190 as Potential Therapeutic Target for Lung Adenocarcinoma. Onco Targets Ther 2019; 12:11221-11229. [PMID: 31908488 PMCID: PMC6927261 DOI: 10.2147/ott.s225532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2019] [Accepted: 10/31/2019] [Indexed: 11/23/2022]
Abstract
BACKGROUND Abnormal regulation of lncRNA expression has been linked to multiple kinds of cancers, including lung adenocarcinoma. METHODS In this study, we carried out RNA-Seq on three tumors and their paired normal samples from Chinese patients with lung adenocarcinoma. All the transcripts were de novo assembled, among which all the possible lncRNAs were predicted by tools including PLEK, CNCI, CPC, Blastp, hmmscan, and so forth. Their expression levels, together with those of the annotated mRNAs, were quantified. Weighted correlation network analysis and differential expression analysis were carried out to explain the biological function of these novel lncRNAs. RESULTS The weighted correlation network analysis showed that the lncRNAs, which were highly correlated with protein-coding genes, participated in various pathways, including PI3K kinase pathways. These lncRNAs were important regulators in biological processes. Next, the differentially expressed lncRNAs were identified, including four known lncRNAs and one novel lncRNA (XLOC_009190). The cis-regulation of this novel lncRNA might act on MGST1, which protects cells through conjugation and glutathione peroxidase functions. The trans-regulation of this lncRNA was investigated through its correlated mRNAs. The results showed that it possibly plays a role in transmembrane receptors such as G protein-coupled receptors and potassium channels. CONCLUSION We propose a potential biological function for XLOC_009190, but further experiments are needed to elucidate its roles and its potential as a therapeutic target.
|
165
|
Mogodiniyai Kasmaei K, Sundh J. Identification of Novel Putative Bacterial Feruloyl Esterases From Anaerobic Ecosystems by Use of Whole-Genome Shotgun Metagenomics and Genome Binning. Front Microbiol 2019; 10:2673. [PMID: 31824458 PMCID: PMC6879456 DOI: 10.3389/fmicb.2019.02673] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/24/2019] [Accepted: 11/04/2019] [Indexed: 12/20/2022]
Abstract
Feruloyl esterases (FAEs) can reduce the recalcitrance of lignocellulosic biomass to enzymatic hydrolysis, thereby enhancing the biorefinery potential or animal feeding value of the biomass. In addition, ferulic acid, a product of FAE activity, has applications in the pharmaceutical and food/beverage industries. It is therefore of great interest to identify new FAEs to enhance understanding of this enzyme family. For this purpose, we used whole-genome shotgun metagenomics and genome binning to explore rumens of dairy cows, large intestines of horses, freshwater sediments and forest topsoils to identify novel prokaryotic FAEs and trace the responsible microorganisms. A number of prokaryotic genomes were recovered, of which genomes of the order Clostridiales and the genus Candidatus Rhabdochlamydia showed FAE coding capacities. In total, five sequences were deemed putative FAEs. A BLASTP search against the NCBI non-redundant protein database indicated that these putative FAEs represent novel sequences within this enzyme family. The phylogenetic analysis showed that at least three putative sequences share an evolutionary lineage with type A FAEs and thus could possess specific activities similar to this type of FAE, something that has not previously been found outside the fungal kingdom. We nominate the genus Candidatus Rhabdochlamydia as a novel FAE-producing taxonomic unit.
|
166
|
Li R, Fu W, Su R, Tian X, Du D, Zhao Y, Zheng Z, Chen Q, Gao S, Cai Y, Wang X, Li J, Jiang Y. Towards the Complete Goat Pan-Genome by Recovering Missing Genomic Segments From the Reference Genome. Front Genet 2019; 10:1169. [PMID: 31803240 PMCID: PMC6874019 DOI: 10.3389/fgene.2019.01169] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/12/2018] [Accepted: 10/23/2019] [Indexed: 01/08/2023]
Abstract
It is broadly expected that next-generation sequencing will ultimately generate a complete genome, as with the latest goat reference genome (ARS1), which is considered to be one of the most continuous assemblies in livestock. However, the rich diversity of worldwide goat breeds indicates that a genome from one individual would be insufficient to represent the whole genomic content of goats. By comparing nine de novo assemblies from seven sibling species of domestic goat with ARS1 and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequences that were absent in ARS1. The pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as a reference genome, variation calling efficacy can be appreciably improved. A total of 56,657 spurious SNPs per individual were repressed and 24,414 novel SNPs per individual on average were recovered as a result of better read mapping quality. The transcriptomic mapping rate was also increased by ∼1.15%. Our study demonstrated that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding missing sequences from the reference genome and could be applicable to other species. A pan-genome can serve as an improved reference genome in animals for a better exploration of the underlying genomic variations and could increase the probability of finding genotype-phenotype associations assessed by a comprehensive variation database containing many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization (http://animal.nwsuaf.edu.cn/panGoat).
|
167
|
Li FD, Tong W, Xia EH, Wei CL. Optimized sequencing depth and de novo assembler for deeply reconstructing the transcriptome of the tea plant, an economically important plant species. BMC Bioinformatics 2019; 20:553. [PMID: 31694521 PMCID: PMC6836513 DOI: 10.1186/s12859-019-3166-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/23/2019] [Accepted: 10/21/2019] [Indexed: 11/10/2022] Open
Abstract
Background Tea is the oldest and one of the world’s most popular non-alcoholic beverages, with important economic, health and cultural value. Tea is commonly produced from the leaves of tea plants (Camellia sinensis), which belong to the genus Camellia of the family Theaceae. In the last decade, many studies have generated the transcriptomes of tea plants at different developmental stages or under abiotic and/or biotic stresses to investigate the genetic basis of secondary metabolites that determine tea quality. However, these results exhibited large differences, particularly in the total number of reconstructed transcripts and the quality of the assembled transcriptomes. These differences largely result from limited knowledge regarding the optimal sequencing depth and assembler for transcriptome assembly of plant species with structurally complex genomes. Results We employed different amounts of RNA-sequencing data, ranging from 4 to 84 Gb, to assemble the tea plant transcriptome using five well-known and representative transcript assemblers. Although the total number of assembled transcripts increased with increasing sequencing data, the proportion of unassembled transcripts became saturated as revealed by plant BUSCO datasets. Among the five representative assemblers, the Bridger package showed the best performance in both assembly completeness and accuracy as evaluated by the BUSCO datasets and genome alignment. In addition, we showed that Bridger and BinPacker had the shortest runtimes, followed by SOAPdenovo and Trans-ABySS. Conclusions The present study compares the performance of five representative transcript assemblers and investigates the key factors that affect the assembly quality of the tea plant transcriptome. This study will help the tea research community obtain better sequencing and assembly of tea plant transcriptomes under conditions of interest and may thus help to answer major biological questions currently facing the tea industry.
|
168
|
LaPierre N, Egan R, Wang W, Wang Z. De novo Nanopore read quality improvement using deep learning. BMC Bioinformatics 2019; 20:552. [PMID: 31694525 PMCID: PMC6833143 DOI: 10.1186/s12859-019-3103-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/02/2019] [Accepted: 09/20/2019] [Indexed: 11/23/2022] Open
Abstract
Background Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. Results Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. Conclusions MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
|
169
|
De novo transcriptome sequencing of radish (Raphanus sativus L.) fleshy roots: analysis of major genes involved in the anthocyanin synthesis pathway. BMC Mol Cell Biol 2019; 20:45. [PMID: 31646986 PMCID: PMC6813128 DOI: 10.1186/s12860-019-0228-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/02/2019] [Accepted: 09/20/2019] [Indexed: 01/07/2023] Open
Abstract
Background The HongXin radish (Raphanus sativus L.), which contains the natural red pigment (red radish pigment), is grown in the Fuling district of Chongqing City. However, the molecular mechanisms underlying anthocyanin synthesis for the formation of natural red pigment in the fleshy roots of HongXin radish are not well studied. Results The de novo transcriptome of HX-1 radish, as well as those of the advanced inbred lines HX-2 and HX-3, were characterized using next generation sequencing (NGS) technology. In total, approximately 66.22 million paired-end reads comprising 34,927 unigenes (N50 = 1,621 bp) were obtained. Based on sequence similarity searches against known proteins, a total of 30,127 (about 86.26%) unigenes were identified. Additionally, functional annotation and classification of these unigenes indicated that most of them were enriched in metabolic process-related terms, especially the biosynthetic pathways of secondary metabolites. Moreover, the majority of the anthocyanin biosynthesis-related genes (ABRGs) involved in the regulation of anthocyanin biosynthesis were identified by a targeted search of their annotations. Subsequently, the expression of 15 putative ABRGs involved in the anthocyanin synthesis-related pathways was validated using quantitative real-time polymerase chain reaction (qRT-PCR). Of those, RsPAL2, RsCHS-B2, RsDFR1, RsDFR2, RsFLS, RsMT3 and RsUFGT73B2-like were identified as significantly associated with anthocyanin biosynthesis. In particular, RsDFR1 and RsDFR2 were most highly enriched in HX-3 and WG-3, whereas RsFLS was down-regulated in HX-3 and WG-3. We proposed that the transcripts of RsDFR1, RsDFR2 and RsFLS might act as key regulators in the anthocyanin biosynthesis pathway. Conclusions The assembled radish transcript sequences were analysed to identify the key ABRGs involved in the regulation of anthocyanin biosynthesis. Additionally, the expression patterns of candidate ABRGs involved in the anthocyanin biosynthetic pathway were validated by qRT-PCR. We proposed that the transcripts of RsDFR1, RsDFR2 and RsFLS might act as key regulators in the anthocyanin biosynthesis pathway. This study will enhance our understanding of the biosynthesis and metabolism of anthocyanin in radish.
|
170
|
Camacho E, Rastrojo A, Sanchiz Á, González-de la Fuente S, Aguado B, Requena JM. Leishmania Mitochondrial Genomes: Maxicircle Structure and Heterogeneity of Minicircles. Genes (Basel) 2019; 10:genes10100758. [PMID: 31561572 PMCID: PMC6826401 DOI: 10.3390/genes10100758] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/27/2019] [Revised: 09/21/2019] [Accepted: 09/24/2019] [Indexed: 01/27/2023] Open
Abstract
The mitochondrial DNA (mtDNA), which is present in almost all eukaryotic organisms, is a useful marker for phylogenetic studies due to its relatively high conservation and its mode of inheritance. In Leishmania and other trypanosomatids, the mtDNA (also referred to as kinetoplast DNA or kDNA) is composed of thousands of minicircles and a few maxicircles, catenated together into a complex network. Maxicircles are functionally similar to other eukaryotic mtDNAs, whereas minicircles are involved in RNA editing of some maxicircle-encoded transcripts. Next-generation sequencing (NGS) is increasingly used for assembling nuclear genomes and, currently, a large number of genomic sequences are available. However, most of the time, the mitochondrial genome is ignored in the genome assembly process. The aim of this study was to develop a pipeline to assemble Leishmania minicircle and maxicircle DNA molecules, exploiting the raw data generated in NGS projects. As a result, the maxicircle molecules and the plethora of minicircle classes for Leishmania major, Leishmania infantum and Leishmania braziliensis have been characterized. We have observed that whereas the heterogeneity of minicircle sequences existing in a single cell hampers their use for Leishmania typing and classification, maxicircles emerge as an extremely robust genetic marker for taxonomic studies within the clade of kinetoplastids.
|
171
|
Yuan H, Atta C, Tornabene L, Li C. Assexon: Assembling Exon Using Gene Capture Data. Evol Bioinform Online 2019; 15:1176934319874792. [PMID: 31523128 PMCID: PMC6732846 DOI: 10.1177/1176934319874792] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 12/30/2022] Open
Abstract
Exon capture across species has been one of the most broadly applied approaches to acquire multi-locus data in phylogenomic studies of non-model organisms. Methods for assembling loci from short-read sequences (eg, Illumina platforms) that rely on mapping reads to a reference genome may not be suitable for studies comprising species across a wide phylogenetic spectrum; thus, de novo assembling methods are more generally applied. Current approaches for assembling targeted exons from short reads are not particularly optimized as they cannot (1) assemble loci with low read depth, (2) handle large files efficiently, and (3) reliably address issues with paralogs. Thus, we present Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads. We tested our method using reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb), which were captured using baits designed based on the genome sequences of Lepisosteus oculatus and Oreochromis niloticus, respectively. We compared the performance of Assexon to PHYLUCE and HybPiper, which are commonly used pipelines to assemble ultra-conserved element (UCE) and Hyb-seq data. A custom exon capture analysis pipeline (CP) developed by Yuan et al was compared as well. Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence. Assexon ran at least twice as fast as PHYLUCE and HybPiper. The number of loci assembled using CP was comparable with Assexon in both tests, while Assexon ran at least 7 times faster than CP. In addition, some steps of CP require the user’s interaction and are not fully automated, and this user time was not counted in our calculation. Both Assexon and CP retrieved no paralogs in the testing runs, but PHYLUCE and HybPiper did. In conclusion, Assexon is a tool for accurate and efficient assembling of large read sets from exon capture experiments. Furthermore, Assexon includes scripts to filter poorly aligned coding regions and flanking regions, calculate summary statistics of loci, and select loci with reliable phylogenetic signal. Assexon is available at https://github.com/yhadevol/Assexon.
|
172
|
Liao YC, Cheng HW, Wu HC, Kuo SC, Lauderdale TLY, Chen FJ. Completing Circular Bacterial Genomes With Assembly Complexity by Using a Sampling Strategy From a Single MinION Run With Barcoding. Front Microbiol 2019; 10:2068. [PMID: 31551994 PMCID: PMC6737777 DOI: 10.3389/fmicb.2019.02068] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/03/2019] [Accepted: 08/22/2019] [Indexed: 11/13/2022] Open
Abstract
The Oxford Nanopore MinION is an affordable and portable DNA sequencer that can produce very long reads (tens of kilobase pairs), which enable de novo bacterial genome assembly. Although many algorithms and tools have been developed for base calling, read mapping, de novo assembly, and polishing, an automated pipeline is not available for one-stop analysis for circular bacterial genome reconstruction. In this paper, we present the pipeline CCBGpipe for completing circular bacterial genomes. Raw current signals are demultiplexed and base called to generate sequencing data. Sequencing reads are de novo assembled several times by using a sampling strategy to produce circular contigs that have a sequence in common between their start and end. The circular contigs are polished by using raw signals and sequencing reads; then, duplicated sequences are removed to form a linear representation of circular sequences. The circularized contigs are finally rearranged to start at the start position of dnaA/repA or a replication origin based on the GC skew. CCBGpipe implemented in Python is available at https://github.com/jade-nhri/CCBGpipe. Using sequencing data produced from a single MinION run, we obtained 48 circular sequences, comprising 12 chromosomes and 36 plasmids of 12 bacteria, including Acinetobacter nosocomialis, Acinetobacter pittii, and Staphylococcus aureus. With adequate quantities of sequencing reads (80×), CCBGpipe can provide a complete and automated assembly of circular bacterial genomes.
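The last rearrangement step mentioned in this abstract, rotating a circular contig so that it starts near a replication origin inferred from GC skew, can be illustrated with a minimal sketch. This is not CCBGpipe's code: the function names are hypothetical, and the sketch assumes the common heuristic that the origin lies near the minimum of the cumulative GC skew (G counted as +1, C as -1).

```python
def min_cumulative_gc_skew(seq: str) -> int:
    """Return the 0-based position where cumulative GC skew is lowest.

    Hypothetical helper: G adds 1, C subtracts 1, other bases are ignored.
    Position 0 means "before the first base".
    """
    running = 0
    best_val = 0
    best_pos = 0
    for i, base in enumerate(seq.upper(), start=1):
        if base == "G":
            running += 1
        elif base == "C":
            running -= 1
        # Strict comparison keeps the first position that reaches the minimum.
        if running < best_val:
            best_val, best_pos = running, i
    return best_pos % len(seq) if seq else 0


def rotate_circular(seq: str, start: int) -> str:
    """Rewrite a circular sequence so that it begins at index `start`."""
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    contig = "GGGGCCCCCCGG"  # toy circular contig
    origin = min_cumulative_gc_skew(contig)
    print(origin, rotate_circular(contig, origin))
```

Real genomes are usually analysed in windows of several kilobases rather than per base, and a dnaA/repA gene hit, where available, takes precedence according to the abstract.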
|
173
|
Khalkhali-Evrigh R, Hedayat-Evrigh N, Hafezian SH, Farhadi A, Bakhtiarizadeh MR. Genome-Wide Identification of Microsatellites and Transposable Elements in the Dromedary Camel Genome Using Whole-Genome Sequencing Data. Front Genet 2019; 10:692. [PMID: 31404266 PMCID: PMC6675863 DOI: 10.3389/fgene.2019.00692] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/04/2019] [Accepted: 07/02/2019] [Indexed: 01/09/2023] Open
Abstract
Transposable elements (TEs) along with simple sequence repeats (SSRs) are prevalent in eukaryotic genomes, especially in mammals. Repetitive sequences form approximately one-third of the camelid genomes, so studying this part of the genome can provide deeper information about the genome and its evolutionary path. Here, in order to improve our understanding of the camel genome architecture, the whole genomes of two dromedaries (Yazdi and Trodi camels) were sequenced. In total, 92 and 84.3 Gb of sequence data were obtained and assembled into 137,772 and 149,997 contigs with N50 lengths of 54,626 and 54,031 bp in Yazdi and Trodi camels, respectively. Results showed that 30.58% of the Yazdi camel genome and 30.50% of the Trodi camel genome were covered by TEs. Contrary to the observed results in the genomes of cattle, sheep, horse, and pig, no endogenous retrovirus-K (ERVK) elements were found in the camel genome. The distribution pattern of DNA transposons in the genomes of dromedary, Bactrian, and cattle was similar, in contrast with the LINE, SINE, and long terminal repeat (LTR) families. Elements such as RTE-BovB, which belong to the LINE family, are dramatically more abundant in the cattle and sheep genomes than in the dromedary genome. However, LINE1 (L1) and LINE2 (L2) elements cover a higher percentage of the LINE family in the dromedary genome than in the cattle genome. Also, 540,133 and 539,409 microsatellites were identified from the assembled contigs of Yazdi and Trodi dromedary camels, respectively. In both samples, di-(393,196) and tri-(65,313) nucleotide repeats contributed to about 42.5% of the microsatellites. The findings of the present study revealed that the non-repetitive content of mammalian genomes is approximately similar. Results showed that 9.1 Mb (0.47% of the whole assembled genome) of the Iranian dromedary's genome length is made up of SSRs. Annotation of the repetitive content of the Iranian dromedary camel genome revealed that 9,068 and 11,544 genes contain different types of TEs and SSRs, respectively. SSR markers identified in the present study can be used as a valuable resource for genetic diversity investigations and marker-assisted selection (MAS) in camel-breeding programs.
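The N50 contig lengths quoted in this abstract summarise assembly contiguity. As a minimal sketch of the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length), assuming a plain list of contig lengths and a hypothetical function name:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one contig length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

The toy lengths here are illustrative only and are unrelated to the camel assemblies.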
|
174
|
Tierney BT, Yang Z, Luber JM, Beaudin M, Wibowo MC, Baek C, Mehlenbacher E, Patel CJ, Kostic AD. The Landscape of Genetic Content in the Gut and Oral Human Microbiome. Cell Host Microbe 2019; 26:283-295.e8. [PMID: 31415755 PMCID: PMC6716383 DOI: 10.1016/j.chom.2019.07.008] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 01/14/2019] [Revised: 05/01/2019] [Accepted: 06/19/2019] [Indexed: 02/06/2023]
Abstract
Despite substantial interest in the species diversity of the human microbiome and its role in disease, the scale of its genetic diversity, which is fundamental to deciphering human-microbe interactions, has not been quantified. Here, we conducted a cross-study meta-analysis of metagenomes from two human body niches, the mouth and gut, covering 3,655 samples from 13 studies. We found staggering genetic heterogeneity in the dataset, identifying a total of 45,666,334 non-redundant genes (23,961,508 oral and 22,254,436 gut) at the 95% identity level. Fifty percent of all genes were "singletons," or unique to a single metagenomic sample. Singletons were enriched for different functions (compared with non-singletons) and arose from sub-population-specific microbial strains. Overall, these results provide potential bases for the unexplained heterogeneity observed in microbiome-derived human phenotypes. On the basis of these data, we built a resource, which can be accessed at https://microbial-genes.bio.
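The "singleton" statistic described above, the share of non-redundant genes seen in exactly one sample, can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes genes have already been clustered into non-redundant identifiers, and the function name and toy sample data are hypothetical.

```python
from collections import Counter
from typing import Mapping, Set


def singleton_fraction(sample_genes: Mapping[str, Set[str]]) -> float:
    """Fraction of distinct genes that occur in exactly one sample.

    `sample_genes` maps a sample identifier to the set of non-redundant
    gene identifiers detected in that sample.
    """
    # Count in how many samples each gene appears.
    counts = Counter(gene for genes in sample_genes.values() for gene in genes)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


if __name__ == "__main__":
    toy = {
        "sample1": {"geneA", "geneB"},
        "sample2": {"geneB", "geneC"},
        "sample3": {"geneB", "geneD"},
    }
    print(singleton_fraction(toy))
```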
|
175
|
Mukherjee S, Cai Z, Mukherjee A, Longkumer I, Mech M, Vupru K, Khate K, Rajkhowa C, Mitra A, Guldbrandtsen B, Lund MS, Sahana G. Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis). BMC Genomics 2019; 20:617. [PMID: 31357931 PMCID: PMC6664528 DOI: 10.1186/s12864-019-5980-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/18/2018] [Accepted: 07/16/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Mithun (Bos frontalis), also called gayal, is an endangered bovine species of the tribe Bovini with 2n = 58 XX chromosome complements, reared in the tropical rain forest regions of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that the availability of its whole genome sequence data and assembly will go a long way towards resolving this problem and help to generate much information, including the phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still be of benefit for understanding genetic variation in mithun populations reared in diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun. RESULTS We generated ≈300 Gigabyte (Gb) of raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb and contains 5,015 scaffolds with an N50 value of 1 Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun and cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes present in the mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun-specific coding sequences were found compared to those in cattle. CONCLUSION Here we present the first de novo draft genome assembly of Indian mithun, which has better coverage, is less fragmented and better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly for the research community to elucidate the evolutionary history of mithun across its distinct geographical locations.
|