1
The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024:gkae410. [PMID: 38769056 DOI: 10.1093/nar/gkae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/18/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024] Open
Abstract
Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.
2
Identification of a viral gene essential for the genome replication of a domesticated endogenous virus in ichneumonid parasitoid wasps. PLoS Pathog 2024; 20:e1011980. [PMID: 38662774 DOI: 10.1371/journal.ppat.1011980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 05/07/2024] [Accepted: 03/22/2024] [Indexed: 05/08/2024] Open
Abstract
Thousands of endoparasitoid wasp species in the families Braconidae and Ichneumonidae harbor "domesticated endogenous viruses" (DEVs) in their genomes. This study focuses on ichneumonid DEVs, named ichnoviruses (IVs). Large quantities of DNA-containing IV virions are produced in ovary calyx cells during the pupal and adult stages of female wasps. Females parasitize host insects by injecting eggs and virions into the body cavity. After injection, virions rapidly infect host cells, followed by expression of IV genes that promote the successful development of wasp offspring. IV genomes consist of two components: proviral segment loci that serve as templates for circular dsDNAs that are packaged into capsids, and genes from an ancestral virus that produce virions. In this study, we generated a chromosome-scale genome assembly for Hyposoter didymator that harbors H. didymator ichnovirus (HdIV). We identified a total of 67 HdIV loci that are amplified in calyx cells during the wasp pupal stage. We then focused on an HdIV gene, U16, which is transcribed in calyx cells during the initial stages of replication. Sequence analysis indicated that U16 contains a conserved domain found in primases from select other viruses. Knockdown of U16 by RNA interference inhibited virion morphogenesis in calyx cells. Genome-wide analysis indicated U16 knockdown also inhibited amplification of HdIV loci in calyx cells. Altogether, our results identified several previously unknown HdIV loci, demonstrated that all HdIV loci are amplified in calyx cells during the pupal stage, and showed that U16 is required for amplification and virion morphogenesis.
3
First chromosome scale genomes of ithomiine butterflies (Nymphalidae: Ithomiini): Comparative models for mimicry genetic studies. Mol Ecol Resour 2023; 23:872-885. [PMID: 36533297 DOI: 10.1111/1755-0998.13749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 11/30/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
The ithomiine butterflies (Nymphalidae: Danainae) represent the largest known radiation of Müllerian mimetic butterflies. They dominate by number the mimetic butterfly communities, which include species such as the iconic neotropical Heliconius genus. Recent studies on the ecology and genetics of speciation in Ithomiini have suggested that sexual pheromones, colour pattern and perhaps hostplant could drive reproductive isolation. However, no reference genome was available for Ithomiini, which has hindered further exploration on the genetic architecture of these candidate traits, and more generally on the genomic patterns of divergence. Here, we generated high-quality, chromosome-scale genome assemblies for two Melinaea species, M. marsaeus and M. menophilus, and a draft genome of the species Ithomia salapia. We obtained genomes with a size ranging from 396 to 503 Mb across the three species and scaffold N50 of 40.5 and 23.2 Mb for the two chromosome-scale assemblies. Using collinearity analyses we identified massive rearrangements between the two closely related Melinaea species. An annotation of transposable elements and gene content was performed, as well as a specialist annotation to target chemosensory genes, which is crucial for host plant detection and mate recognition in mimetic species. A comparative genomic approach revealed independent gene expansions in ithomiines and particularly in gustatory receptor genes. These first three genomes of ithomiine mimetic butterflies constitute a valuable addition and a welcome comparison to existing biological models such as Heliconius, and will enable further understanding of the mechanisms of adaptation in butterflies.
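The scaffold N50 values quoted in the abstract above (40.5 and 23.2 Mb) summarize assembly contiguity. As a minimal sketch, assuming the usual definition of N50 (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), the statistic could be computed as follows. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Illustrative helper, not code from the cited study.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest scaffold down, accumulating length
    # until at least half of the total assembly size is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy scaffold lengths (arbitrary units): total is 54, half is 27,
    # and 10 + 9 + 8 reaches 27, so the N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 for a similar total size indicates that more of the assembly sits in long scaffolds, which is why the statistic is commonly reported alongside genome size.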
4
Training Infrastructure as a Service. Gigascience 2023; 12:giad048. [PMID: 37395629 PMCID: PMC10316688 DOI: 10.1093/gigascience/giad048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 07/04/2023] Open
Abstract
BACKGROUND Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. FINDINGS Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. CONCLUSIONS TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.
5
Corrigendum to “Nutritive value of dehydrated sainfoin (Onobrychis viciifolia) for growing rabbits, according to the harvesting stage” [Anim. Feed Sci. Technol. 279 (2021) 114995]. Anim Feed Sci Technol 2022. [DOI: 10.1016/j.anifeedsci.2022.115439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
6
Spodoptera littoralis genome mining brings insights on the dynamic of expansion of gustatory receptors in polyphagous noctuidae. G3 (Bethesda, Md.) 2022; 12:6598846. [PMID: 35652787 PMCID: PMC9339325 DOI: 10.1093/g3journal/jkac131] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/01/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]
Abstract
The bitter taste, triggered via gustatory receptors, serves as an important natural defense against the ingestion of poisonous foods in animals, and the increased host breadth is usually linked to an increase in the number of gustatory receptor genes. This has been especially observed in polyphagous insect species, such as noctuid species from the Spodoptera genus. However, the dynamic and physical mechanisms leading to these gene expansions and the evolutionary pressures behind them remain elusive. Among major drivers of genome dynamics are the transposable elements but, surprisingly, their potential role in insect gustatory receptor expansion has not been considered yet. In this work, we hypothesized that transposable elements and possibly positive selection would be involved in the highly dynamic evolution of gustatory receptor in Spodoptera spp. We first sequenced de novo the full 465 Mb genome of S. littoralis, and manually annotated the main chemosensory genes, including a large repertoire of 373 gustatory receptor genes (including 19 pseudogenes). We also improved the completeness of S. frugiperda and S. litura gustatory receptor gene repertoires. Then, we annotated transposable elements and revealed that a particular category of class I retrotransposons, the SINE transposons, was significantly enriched in the vicinity of gustatory receptor gene clusters, suggesting a transposon-mediated mechanism for the formation of these clusters. Selection pressure analyses indicated that positive selection within the gustatory receptor gene family is cryptic, only 7 receptors being identified as positively selected. Altogether, our data provide a new good quality Spodoptera genome, pinpoint interesting gustatory receptor candidates for further functional studies and bring valuable genomic information on the mechanisms of gustatory receptor expansions in polyphagous insect species.
7
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022; 50:W345-W351. [PMID: 35446428 PMCID: PMC9252830 DOI: 10.1093/nar/gkac247] [Citation(s) in RCA: 250] [Impact Index Per Article: 125.0] [Received: 02/03/2022] [Revised: 03/17/2022] [Accepted: 03/30/2022] [Indexed: 01/19/2023] Open
Abstract
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
8
Nutritive value of dehydrated sainfoin (Onobrychis viciifoliae) for growing rabbits, according to the harvesting stage. Anim Feed Sci Technol 2021. [DOI: 10.1016/j.anifeedsci.2021.114995] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
9
Author Correction: Chromosomal scale assembly of parasitic wasp genome reveals symbiotic virus colonization. Commun Biol 2021; 4:940. [PMID: 34331006 PMCID: PMC8324771 DOI: 10.1038/s42003-021-02480-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022] Open
10
Chromosomal scale assembly of parasitic wasp genome reveals symbiotic virus colonization. Commun Biol 2021; 4:104. [PMID: 33483589 PMCID: PMC7822920 DOI: 10.1038/s42003-020-01623-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/17/2020] [Accepted: 12/10/2020] [Indexed: 02/06/2023] Open
Abstract
Endogenous viruses form an important proportion of eukaryote genomes and a source of novel functions. How large DNA viruses integrated into a genome evolve when they confer a benefit to their host, however, remains unknown. Bracoviruses are essential for the parasitism success of parasitoid wasps, into whose genomes they integrated ~103 million years ago. Here we show, from the assembly of a parasitoid wasp genome at a chromosomal scale, that bracovirus genes colonized all ten chromosomes of Cotesia congregata. Most form clusters of genes involved in particle production or parasitism success. Genomic comparison with another wasp, Microplitis demolitor, revealed that these clusters were already established ~53 mya and thus belong to remarkably stable genomic structures, the architectures of which are evolutionary constrained. Transcriptomic analyses highlight temporal synchronization of viral gene expression without resulting in immune gene induction, suggesting that no conflicts remain between ancient symbiotic partners when benefits to them converge.
11
Metage2Metabo, microbiota-scale metabolic complementarity for the identification of key species. eLife 2020; 9:61968. [PMID: 33372654 PMCID: PMC7861615 DOI: 10.7554/elife.61968] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 08/10/2020] [Accepted: 12/25/2020] [Indexed: 12/13/2022] Open
Abstract
To capture the functional diversity of microbiota, one must identify metabolic functions and species of interest within hundreds or thousands of microorganisms. We present Metage2Metabo (M2M), a resource that meets the need for de novo functional screening of genome-scale metabolic networks (GSMNs) at the scale of a metagenome, and the identification of critical species with respect to metabolic cooperation. M2M comprises a flexible pipeline for the characterisation of individual metabolisms and collective metabolic complementarity. In addition, M2M identifies key species that are meaningful members of the community for functions of interest. We demonstrate that M2M is applicable to collections of genomes as well as metagenome-assembled genomes, permits an efficient GSMN reconstruction with Pathway Tools, and assesses the cooperation potential between species. M2M identifies key organisms by reducing the complexity of a large-scale microbiota into minimal communities with equivalent properties, suitable for further analyses.
All the microbes that live in a specific environment, for example an organ, are collectively called the microbiota. In humans, the microbiota of the gut has been extensively studied by sequencing the DNA of the different microbes to identify them and determine the roles they play in health and disease. The DNA sequences of all the members of the microbiota are collectively called the metagenome. The chemical reactions that the gut microbiota perform to produce energy and make the biomolecules they need to survive are collectively referred to as the metabolism of these microbes. Studying the metabolism of the gut microbiota can help researchers understand the roles of the different microbes. However, the large variety of species in the gut microbiota and gaps in the information about them render these studies difficult, despite technology improving quickly.
To tackle this issue, Belcour, Frioux et al. developed a new piece of software called Metage2Metabo (M2M) that simulates the metabolism of the gut microbiota and describes the metabolic relationships between the different microbes. Metage2Metabo analyses the roles of the metabolic genes of a large number of microbe species to establish how they complement each other metabolically. Then, it can calculate the minimum number of species needed to perform a metabolic role of interest within that microbiota, and which key species are associated with that role. To test the new software, Belcour, Frioux et al. used Metage2Metabo to analyse genomes from the human gut microbiota and from the cow rumen (one of the cow’s stomachs). They showed that even if the metagenome was incomplete, the software was able to make stable predictions of key species involved in metabolic complementarity. Additionally, they illustrated how the method can be used to study the gut microbiota of individuals. This work presents a new method for determining the metabolic relationships between species within a microbiota. The software is highly flexible and could be adapted to identify key members within different communities. In the context of the gut microbiota, the predictions of Metage2Metabo could shed light on the interactions between the host and the microbes and contribute to a better understanding of microbe environments.
12
Positive selection alone is sufficient for whole genome differentiation at the early stage of speciation process in the fall armyworm. BMC Evol Biol 2020; 20:152. [PMID: 33187468 PMCID: PMC7663868 DOI: 10.1186/s12862-020-01715-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/08/2020] [Accepted: 10/28/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND The process of speciation involves differentiation of whole genome sequences between a pair of diverging taxa. In the absence of a geographic barrier and in the presence of gene flow, genomic differentiation may occur when the homogenizing effect of recombination is overcome across the whole genome. The fall armyworm is observed as two sympatric strains with different host-plant preferences across the entire habitat. These two strains exhibit a very low level of genetic differentiation across the whole genome, suggesting that genomic differentiation occurred at an early stage of speciation. In this study, we aim to identify critical evolutionary forces responsible for genomic differentiation in the fall armyworm. RESULTS These two strains exhibit a low level of genomic differentiation (FST = 0.0174), while 99.2% of 200 kb windows have genetically differentiated sequences (FST > 0). We found that the combined effect of mild positive selection and genetic linkage to selectively targeted loci is responsible for the genomic differentiation. However, a single event of very strong positive selection appears not to be responsible for genomic differentiation. The contribution of chromosomal inversions or tight genetic linkage among positively selected loci causing reproductive barriers is not supported by our data. Phylogenetic analysis shows that the genomic differentiation occurred by sub-setting of genetic variants in one strain from the other. CONCLUSIONS From these results, we concluded that genomic differentiation may occur at the early stage of a speciation process in the fall armyworm and that mild positive selection targeting many loci alone is a sufficient evolutionary force for generating the pattern of genomic differentiation. This genomic differentiation may provide a condition for accelerated genomic differentiation by synergistic effects among linkage disequilibrium generated by following events of positive selection.
Our study highlights genomic differentiation as a key evolutionary factor connecting positive selection to divergent selection.
13
Adaptation by copy number variation increases insecticide resistance in the fall armyworm. Commun Biol 2020; 3:664. [PMID: 33184418 PMCID: PMC7661717 DOI: 10.1038/s42003-020-01382-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/10/2020] [Accepted: 10/14/2020] [Indexed: 12/19/2022] Open
Abstract
Understanding the genetic basis of insecticide resistance is a key topic in agricultural ecology. The adaptive evolution of multi-copy detoxification genes has been interpreted as a cause of insecticide resistance, yet the same pattern can also be generated by the adaptation to host-plant defense toxins. In this study, we tested in the fall armyworm, Spodoptera frugiperda (Lepidoptera: Noctuidae), if adaptation by copy number variation caused insecticide resistance in two geographically distinct populations with different levels of resistance and the two host-plant strains. We observed a significant allelic differentiation of genomic copy number variations between the two geographic populations, but not between host-plant strains. A locus with positively selected copy number variation included a CYP gene cluster. Toxicological tests supported a central role for CYP enzymes in deltamethrin resistance. Our results indicate that copy number variation of detoxification genes might be responsible for insecticide resistance in fall armyworm and that evolutionary forces causing insecticide resistance could be independent of host-plant adaptation.
14
Correction to: The genome sequence of the grape phylloxera provides insights into the evolution, adaptation, and invasion routes of an iconic pest. BMC Biol 2020; 18:123. [PMID: 32917281 PMCID: PMC7488435 DOI: 10.1186/s12915-020-00864-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022] Open
15
Genomic architecture of endogenous ichnoviruses reveals distinct evolutionary pathways leading to virus domestication in parasitic wasps. BMC Biol 2020; 18:89. [PMID: 32703219 PMCID: PMC7379367 DOI: 10.1186/s12915-020-00822-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/07/2020] [Accepted: 06/29/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Polydnaviruses (PDVs) are mutualistic endogenous viruses inoculated by some lineages of parasitoid wasps into their hosts, where they facilitate successful wasp development. PDVs include the ichnoviruses and bracoviruses that originate from independent viral acquisitions in ichneumonid and braconid wasps respectively. PDV genomes are fully incorporated into the wasp genomes and consist of (1) genes involved in viral particle production, which derive from the viral ancestor and are not encapsidated, and (2) proviral segments harboring virulence genes, which are packaged into the viral particle. To help elucidate the mechanisms that have facilitated viral domestication in ichneumonid wasps, we analyzed the structure of the viral insertions by sequencing the whole genome of two ichnovirus-carrying wasp species, Hyposoter didymator and Campoletis sonorensis. RESULTS Assemblies with long scaffold sizes allowed us to unravel the organization of the endogenous ichnovirus and revealed considerable dispersion of the viral loci within the wasp genomes. Proviral segments contained species-specific sets of genes and occupied distinct genomic locations in the two ichneumonid wasps. In contrast, viral machinery genes were organized in clusters showing highly conserved gene content and order, with some loci located in collinear wasp genomic regions. This genomic architecture clearly differs from the organization of PDVs in braconid wasps, in which proviral segments are clustered and viral machinery elements are more dispersed. CONCLUSIONS The contrasting structures of the two types of ichnovirus genomic elements are consistent with their different functions: proviral segments are vehicles for virulence proteins expected to adapt according to different host defense systems, whereas the genes involved in virus particle production in the wasp are likely more stable and may reflect ancestral viral architecture.
The distinct genomic architectures seen in ichnoviruses versus bracoviruses reveal different evolutionary trajectories that have led to virus domestication in the two wasp lineages.
16
The genome sequence of the grape phylloxera provides insights into the evolution, adaptation, and invasion routes of an iconic pest. BMC Biol 2020; 18:90. [PMID: 32698880 PMCID: PMC7376646 DOI: 10.1186/s12915-020-00820-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/16/2019] [Accepted: 06/22/2020] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Although native to North America, the invasion of the aphid-like grape phylloxera Daktulosphaira vitifoliae across the globe altered the course of grape cultivation. For the past 150 years, viticulture relied on grafting-resistant North American Vitis species as rootstocks, thereby limiting genetic stocks tolerant to other stressors such as pathogens and climate change. Limited understanding of the insect genetics resulted in successive outbreaks across the globe when rootstocks failed. Here we report the 294-Mb genome of D. vitifoliae as a basic tool to understand host plant manipulation, nutritional endosymbiosis, and enhance global viticulture. RESULTS Using a combination of genome, RNA, and population resequencing, we found grape phylloxera showed high duplication rates since its common ancestor with aphids, but similarity in most metabolic genes, despite lacking obligate nutritional symbioses and feeding from parenchyma. Similarly, no enrichment occurred in development genes in relation to viviparity. However, phylloxera evolved > 2700 unique genes that resemble putative effectors and are active during feeding. Population sequencing revealed the global invasion began from the upper Mississippi River in North America, spread to Europe and from there to the rest of the world. CONCLUSIONS The grape phylloxera genome reveals genetic architecture relative to the evolution of nutritional endosymbiosis, viviparity, and herbivory. The extraordinary expansion in effector genes also suggests novel adaptations to plant feeding and how insects induce complex plant phenotypes, for instance galls. Finally, our understanding of the origin of this invasive species and its genome provide genetics resources to alleviate rootstock bottlenecks restricting the advancement of viticulture.
17
Functional insights from the GC-poor genomes of two aphid parasitoids, Aphidius ervi and Lysiphlebus fabarum. BMC Genomics 2020; 21:376. [PMID: 32471448 PMCID: PMC7257214 DOI: 10.1186/s12864-020-6764-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/16/2020] [Accepted: 04/30/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Parasitoid wasps have fascinating life cycles and play an important role in trophic networks, yet little is known about their genome content and function. Parasitoids that infect aphids are an important group with the potential for biological control. Their success depends on adapting to develop inside aphids and overcoming both host aphid defenses and their protective endosymbionts. RESULTS We present the de novo genome assemblies, detailed annotation, and comparative analysis of two closely related parasitoid wasps that target pest aphids: Aphidius ervi and Lysiphlebus fabarum (Hymenoptera: Braconidae: Aphidiinae). The genomes are small (139 and 141 Mbp) and the most AT-rich reported thus far for any arthropod (GC content: 25.8 and 23.8%). This nucleotide bias is accompanied by skewed codon usage and is stronger in genes with adult-biased expression. AT-richness may be the consequence of reduced genome size, a near absence of DNA methylation, and energy efficiency. We identify missing desaturase genes, whose absence may underlie mimicry in the cuticular hydrocarbon profile of L. fabarum. We highlight key gene groups including those underlying venom composition, chemosensory perception, and sex determination, as well as potential losses in immune pathway genes. CONCLUSIONS These findings are of fundamental interest for insect evolution and biological control applications. They provide a strong foundation for further functional studies into coevolution between parasitoids and their hosts. Both genomes are available at https://bipaa.genouest.org.
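The GC content figures in the abstract above (25.8 and 23.8%) are simple base-composition percentages. As a minimal sketch, assuming GC content is computed as the share of G and C among unambiguous A, C, G and T bases (ambiguous symbols such as N are ignored here, which is one common convention and not necessarily the one used in the paper), the calculation could look like this. The function name `gc_content` and the toy sequences are illustrative assumptions.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    Illustrative helper, not code from the cited study. Only A, C, G
    and T are counted; other symbols (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    called = sum(counts.values())
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / called


if __name__ == "__main__":
    # Toy sequences, not real genome data.
    print(gc_content("ATGC"))      # two of four bases are G or C
    print(gc_content("AATTAATT"))  # no G or C at all
```

For a whole assembly, the same counts would be accumulated across all scaffolds before dividing, so that long scaffolds weigh proportionally more than short ones.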
18
Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database: The Journal of Biological Databases and Curation 2019; 2019:5532788. [PMID: 31328773 PMCID: PMC6643302 DOI: 10.1093/database/baz077] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Revised: 05/12/2019] [Accepted: 05/22/2019] [Indexed: 12/20/2022]
Abstract
Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.
|
19
|
Community-Driven Data Analysis Training for Biology. Cell Syst 2019; 6:752-758.e1. [PMID: 29953864 DOI: 10.1016/j.cels.2018.05.012] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 11/28/2017] [Revised: 03/10/2018] [Accepted: 05/18/2018] [Indexed: 01/12/2023]
Abstract
The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.
|
20
|
Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa (Lepidoptera: Notodontidae). Mol Ecol Resour 2018; 18:602-619. [PMID: 29352511 DOI: 10.1111/1755-0998.12756] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/25/2017] [Revised: 12/23/2017] [Accepted: 01/03/2018] [Indexed: 12/15/2022]
Abstract
The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are a prerequisite for understanding the genetic architecture underlying these traits. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the CEGMA and 88% of the BUSCO highly conserved eukaryotic genes and 84% of the BUSCO arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (http://bipaa.genouest.org/sp/thaumetopoea_pityocampa/).
|
21
|
De novo genome and transcriptome resources of the Adzuki bean borer Ostrinia scapulalis (Lepidoptera: Crambidae). Data Brief 2018; 17:781-787. [PMID: 29785409 PMCID: PMC5958680 DOI: 10.1016/j.dib.2018.01.073] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/23/2017] [Revised: 01/23/2018] [Accepted: 01/25/2018] [Indexed: 11/25/2022] Open
Abstract
We present a draft genome assembly with a de novo prediction and automated functional annotation of coding genes, and a reference transcriptome of the Adzuki bean borer, Ostrinia scapulalis, based on RNA sequencing of various tissues and developmental stages. The genome assembly spans 419 Mb, has a GC content of 37.4% and includes 26,120 predicted coding genes. The reference transcriptome holds 33,080 unigenes and contains a high proportion of a set of genes conserved in eukaryotes and arthropods, used to assess the quality of the reconstructed transcripts. The new genomic and transcriptomic data presented here significantly enrich the public sequence databases for the Crambidae and Lepidoptera, and represent useful resources for future research on the evolution and adaptation of phytophagous moths. The genome and transcriptome assemblies have been deposited and made accessible via an NCBI BioProject (id PRJNA390510) and the LepidoDB database (http://bipaa.genouest.org/sp/ostrinia_scapulalis/).
|
22
|
Two genomes of highly polyphagous lepidopteran pests (Spodoptera frugiperda, Noctuidae) with different host-plant ranges. Sci Rep 2017; 7:11816. [PMID: 28947760 PMCID: PMC5613006 DOI: 10.1038/s41598-017-10461-4] [Citation(s) in RCA: 169] [Impact Index Per Article: 24.1] [Received: 02/07/2017] [Accepted: 04/19/2017] [Indexed: 12/30/2022] Open
Abstract
Emergence of polyphagous herbivorous insects entails significant adaptation to recognize, detoxify and digest a variety of host-plants. Despite its biological and practical importance, since insects eat 20% of crops, no exhaustive analysis of gene repertoires required for adaptations in generalist insect herbivores has previously been performed. The noctuid moth Spodoptera frugiperda ranks as one of the world’s worst agricultural pests. This insect is polyphagous while the majority of other lepidopteran herbivores are specialists. It consists of two morphologically indistinguishable strains (“C” and “R”) that have different host plant ranges. To describe the evolutionary mechanisms that both enable the emergence of polyphagous herbivory and lead to the shift in the host preference, we analyzed whole genome sequences from laboratory and natural populations of both strains. We observed huge expansions of genes associated with chemosensation and detoxification compared with specialist Lepidoptera. These expansions are largely due to tandem duplication, a possible adaptation mechanism enabling polyphagy. Individuals from natural C and R populations show significant genomic differentiation. We found signatures of positive selection in genes involved in chemoreception, detoxification and digestion, and copy number variation in the two latter gene families, suggesting an adaptive role for structural variation.
|
23
|
Genome scans on experimentally evolved populations reveal candidate regions for adaptation to plant resistance in the potato cyst nematode Globodera pallida. Mol Ecol 2017; 26:4700-4711. [PMID: 28734070 DOI: 10.1111/mec.14240] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/04/2017] [Revised: 07/13/2017] [Accepted: 07/17/2017] [Indexed: 12/30/2022]
Abstract
Improving resistance durability requires the ability to predict the adaptation speed of pathogen populations. Identifying the genetic bases of pathogen adaptation to plant resistances is a useful step to better understand and anticipate this phenomenon. Globodera pallida is a major pest of potato crops for which a resistance QTL, GpaVvrn, has been identified in Solanum vernei. However, its durability is threatened as G. pallida populations are able to adapt to the resistance in a few generations. The aim of this study was to investigate the genomic regions involved in the resistance breakdown by coupling experimental evolution and high-density genome scan. We performed whole-genome resequencing of pools of individuals (Pool-Seq) belonging to G. pallida lineages derived from two independent populations that had experimentally evolved on susceptible and resistant potato cultivars. About 1.6 million SNPs were used to perform the genome scan using a recent model testing for adaptive differentiation and association with population-specific covariables. We identified 275 outliers, and 31 of them, which also showed a significant reduction in diversity in adapted lineages, were investigated for their genic environment. Some candidate genomic regions contained genes putatively encoding effectors and were enriched in SPRYSECs, known in cyst nematodes to be involved in pathogenicity and in (a)virulence. Validated candidate SNPs will provide a useful molecular tool to follow frequencies of virulence alleles in natural G. pallida populations and to define efficient strategies for deploying potato resistances that maximize their durability.
|
24
|
Abstract
Background Heterogametic species display a differential number of sex chromosomes resulting in imbalanced transcription levels for these chromosomes between males and females. To correct this disequilibrium, dosage compensation mechanisms involving gene expression and chromatin accessibility regulations have emerged throughout evolution. In insects, these mechanisms have been extensively characterized only in Drosophila but not in insects of agronomical importance. Aphids are indeed major pests of a wide range of crops. Their remarkable ability to switch from asexual to sexual reproduction during their life cycle largely explains the economic losses they can cause. As heterogametic insects, male aphids are X0, while females (asexual and sexual) are XX. Results Here, we analyzed transcriptomic and open chromatin data obtained from whole male and female individuals to evaluate the putative existence of a dosage compensation mechanism involving differential chromatin accessibility of the pea aphid’s X chromosome. Transcriptomic analyses first showed X/AA and XX/AA expression ratios for expressed genes close to 1 in males and females, respectively, suggesting dosage compensation in the pea aphid. Analyses of open chromatin data obtained by Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-seq) revealed that X chromosome chromatin accessibility is globally and significantly higher in males than in females, while autosomal chromatin accessibility is similar between sexes. Moreover, the chromatin environment of X-linked genes displaying similar expression levels in males and females, and thus likely to be compensated, is significantly more accessible in males. Conclusions Our results suggest the existence of an underlying epigenetic mechanism enhancing the X chromosome chromatin accessibility in males to allow X-linked gene dose correction between sexes in the pea aphid, similar to Drosophila. Our study provides new evidence on dosage compensation in relation to chromatin biology in insects, and for the first time in a major crop pest, drawing on both transcriptomic and open chromatin data. Electronic supplementary material The online version of this article (doi:10.1186/s13072-017-0137-1) contains supplementary material, which is available to authorized users.
|
25
|
Erratum to: Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome Biol 2017; 18:63. [PMID: 28376841 PMCID: PMC5381131 DOI: 10.1186/s13059-017-1202-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/29/2017] [Accepted: 03/29/2017] [Indexed: 11/10/2022] Open
|
26
|
Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome Biol 2017; 18:27. [PMID: 28190401 PMCID: PMC5304397 DOI: 10.1186/s13059-016-1145-3] [Citation(s) in RCA: 161] [Impact Index Per Article: 23.0] [Received: 07/07/2016] [Accepted: 12/22/2016] [Indexed: 12/04/2022] Open
Abstract
Background The prevailing paradigm of host-parasite evolution is that arms races lead to increasing specialisation via genetic adaptation. Insect herbivores are no exception and the majority have evolved to colonise a small number of closely related host species. Remarkably, the green peach aphid, Myzus persicae, colonises plant species across 40 families and single M. persicae clonal lineages can colonise distantly related plants. This remarkable ability makes M. persicae a highly destructive pest of many important crop species. Results To investigate the exceptional phenotypic plasticity of M. persicae, we sequenced the M. persicae genome and assessed how one clonal lineage responds to host plant species of different families. We show that genetically identical individuals are able to colonise distantly related host species through the differential regulation of genes belonging to aphid-expanded gene families. Multigene clusters collectively upregulate in single aphids within two days upon host switch. Furthermore, we demonstrate the functional significance of this rapid transcriptional change using RNA interference (RNAi)-mediated knock-down of genes belonging to the cathepsin B gene family. Knock-down of cathepsin B genes reduced aphid fitness, but only on the host that induced upregulation of these genes. Conclusions Previous research has focused on the role of genetic adaptation of parasites to their hosts. Here we show that the generalist aphid pest M. persicae is able to colonise diverse host plant species in the absence of genetic specialisation. This is achieved through rapid transcriptional plasticity of genes that have duplicated during aphid evolution. Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-1145-3) contains supplementary material, which is available to authorized users.
|
27
|
De novo transcriptome assembly of the grapevine phylloxera allows identification of genes differentially expressed between leaf- and root-feeding forms. BMC Genomics 2016; 17:219. [PMID: 26968158 PMCID: PMC4787006 DOI: 10.1186/s12864-016-2530-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/05/2015] [Accepted: 02/24/2016] [Indexed: 11/28/2022] Open
Abstract
Background Grapevine phylloxera, an insect related to true aphids, is a major historic pest of viticulture only controlled through the selection of resistant rootstocks or through quarantine regulations where grapevine is cultivated own-rooted. Transcriptomic data could help understand the bases of its original life-traits, including a striking case of polyphenism, with forms feeding on roots and forms feeding in leaf-galls. Comparisons with true aphids (for which complete genomes have been sequenced) should also make it possible to link differences in life-traits of the two groups with changes in gene repertoires or shifts in patterns of expression. Results We sequenced transcriptomes of the grapevine phylloxera (Illumina technology), choosing three life-stages (adults on roots or on leaf galls, and eggs) to cover a large catalogue of transcripts, and performed a de novo assembly. This resulted in 105,697 contigs, which were annotated: most contigs had a best blastx hit to the pea aphid (phylogenetically closest complete genome), while very few bacterial hits were recorded (except for Propionibacterium acnes). Coding sequences were predicted from this data set (17,372 sequences), revealing an extremely high AT-bias (at the third codon position). Differential expression (DE) analysis between root-feeding and gall-feeding forms showed that i) the root-feeding form displayed a much larger number of differentially expressed transcripts; ii) root-feeding-biased genes were enriched in some categories, for example cuticular proteins and genes associated with cell-cell signaling; iii) leaf-galling-biased genes were enriched in genes associated with the nucleus and DNA replication, suggesting a metabolism more oriented towards fast and active multiplication. We also identified a gene family with a very high expression level (copies totaling nearly 10% of the reads) in the grapevine phylloxera (both in root and leaf galling forms), but usually expressed at very low levels in true aphids (except in sexual oviparous females). These transcripts thus appear to be associated with oviparity. Conclusions Our study illustrated major intraspecific changes in transcriptome profiles, related to different lifestyles (feeding on roots versus in leaf-galls). At a different scale, we could also illustrate one major shift in expression levels associated with changes in life-traits that occurred during evolution and that respectively characterize (strictly oviparous) grapevine phylloxera and (mostly viviparous) true aphids. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2530-8) contains supplementary material, which is available to authorized users.
|
28
|
Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Res 2015; 44:D38-47. [PMID: 26538599 PMCID: PMC4702812 DOI: 10.1093/nar/gkv1116] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 09/17/2015] [Accepted: 10/13/2015] [Indexed: 01/24/2023] Open
Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
|
29
|
BioMAJ2Galaxy: automatic update of reference data in Galaxy using BioMAJ. Gigascience 2015; 4:22. [PMID: 25960870 PMCID: PMC4425870 DOI: 10.1186/s13742-015-0063-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/22/2014] [Accepted: 04/22/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many bioinformatics tools use reference data, such as genome assemblies or sequence databanks. Galaxy offers multiple ways to give access to this data through its web interface. However, the process of adding new reference data was customarily manual and time consuming, even more so when this data needed to be indexed in a variety of formats (e.g. Blast, Bowtie, BWA, or 2bit). BioMAJ is widely used, stable software designed to automate the download and transformation of data from various sources. This data can be used directly from the command line, in more complex systems, such as Mobyle, or by using a REST API. FINDINGS To ease the process of giving access to reference data in Galaxy, we have developed the BioMAJ2Galaxy module, which bridges the gap between BioMAJ and Galaxy. With this module, it is now possible to configure BioMAJ to automatically download reference data, convert and/or index it in various formats, and then make it available in a Galaxy server using data libraries or data managers. CONCLUSIONS The developments presented in this paper allow us to integrate the reference data in Galaxy in an automatic, reliable, and diskspace-saving way. The code is freely available on the GenOuest GitHub account (https://github.com/genouest/biomaj2galaxy).
|
30
|
Establishment and analysis of a reference transcriptome for Spodoptera frugiperda. BMC Genomics 2014; 15:704. [PMID: 25149648 PMCID: PMC4150953 DOI: 10.1186/1471-2164-15-704] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/04/2014] [Accepted: 08/15/2014] [Indexed: 12/20/2022] Open
Abstract
Background Spodoptera frugiperda (Noctuidae) is a major agricultural pest throughout the American continent. The highly polyphagous larvae frequently devastate important crops such as corn, sorghum, cotton and grass. In addition, the Sf9 cell line, widely used in biochemistry for in vitro protein production, is derived from S. frugiperda tissues. Many research groups are using S. frugiperda as a model organism to investigate questions such as plant adaptation, pest behavior or resistance to pesticides. Results In this study, we constructed a reference transcriptome assembly (Sf_TR2012b) of RNA sequences obtained from more than 35 S. frugiperda developmental time-points and tissue samples. We assessed the quality of this reference transcriptome by annotating a ubiquitous gene family, ribosomal proteins, as well as gene families that have a more constrained spatio-temporal expression and are involved in development, immunity and olfaction. We also provide a time-course of expression that we used to characterize the transcriptional regulation of the gene families studied. Conclusion We conclude that the Sf_TR2012b transcriptome is a valid reference transcriptome. While its reliability decreases for the detection and annotation of genes under strong transcriptional constraint, we still recover a fair percentage of tissue-specific transcripts. That allowed us to explore the spatial and temporal expression of genes and to observe that some olfactory receptors are expressed in antennae and palps but also in other unrelated tissues such as fat bodies. Similarly, we observed an interesting interplay of gene families involved in immunity between fat bodies and antennae. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-704) contains supplementary material, which is available to authorized users.
|
31
|
Seqcrawler: biological data indexing and browsing platform. BMC Bioinformatics 2012; 13:175. [PMID: 22827839 PMCID: PMC3481441 DOI: 10.1186/1471-2105-13-175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2011] [Accepted: 06/19/2012] [Indexed: 11/10/2022] Open
|
32
|
The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes. PLoS One 2012; 7:e50653. [PMID: 23209799 PMCID: PMC3508997 DOI: 10.1371/journal.pone.0050653] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 04/28/2012] [Accepted: 10/24/2012] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS The Duplicated Genes Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Genes Database can be freely accessed through the DGD website at http://dgd.genouest.org.
|
33
|
Abstract
CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building blocks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function and their presence/absence in all genomes of the database (phyletic profiles), and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers.
|
34
|
PHYMYCO-DB: a curated database for analyses of fungal diversity and evolution. PLoS One 2012; 7:e43117. [PMID: 23028445 PMCID: PMC3441585 DOI: 10.1371/journal.pone.0043117] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/15/2012] [Accepted: 07/16/2012] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND In environmental sequencing studies, fungi can be identified based on nucleic acid sequences, using either highly variable sequences as species barcodes or conserved sequences containing a high-quality phylogenetic signal. For the latter, identification relies on phylogenetic analyses and the adoption of the phylogenetic species concept. Such analysis requires that the reference sequences are well identified and deposited in public-access databases. However, many entries in the public sequence databases are problematic in terms of quality and reliability and these data require screening to ensure correct phylogenetic interpretation. METHODS AND PRINCIPAL FINDINGS To facilitate phylogenetic inferences and phylogenetic assignment, we introduce a fungal sequence database. The database PHYMYCO-DB comprises fungal sequences from GenBank that have been filtered to satisfy stringent sequence quality criteria. For the first release, two widely used molecular taxonomic markers were chosen: the nuclear SSU rRNA and EF1-α gene sequences. Following the automatic extraction and filtration, a manual curation is performed to remove problematic sequences while preserving relevant sequences useful for phylogenetic studies. As a result of curation, ~20% of the automatically filtered sequences have been removed from the database. To demonstrate how PHYMYCO-DB can be employed, we test a set of environmental Chytridiomycota sequences obtained from deep sea samples. CONCLUSION PHYMYCO-DB offers the tools necessary to: (i) extract high quality fungal sequences for each of the 5 fungal phyla, at all taxonomic levels, (ii) extract already performed alignments, to act as 'reference alignments', (iii) launch alignments of personal sequences along with stored data. A total of 9120 SSU rRNA and 672 EF1-α high-quality fungal sequences are now available. The PHYMYCO-DB is accessible through the URL http://phymycodb.genouest.org/.
|
35
|
AnnotQTL: a new tool to gather functional and comparative information on a genomic region. Nucleic Acids Res 2011; 39:W328-33. [PMID: 21596783 PMCID: PMC3125768 DOI: 10.1093/nar/gkr361] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022] Open
Abstract
AnnotQTL is a web tool designed to aggregate functional annotations from different prominent web sites by minimizing the redundancy of information. Although thousands of QTL regions have been identified in livestock species, most of them are large and contain many genes. This tool was therefore designed to assist the characterization of genes in a QTL interval region as a step towards selecting the best candidate genes. It localizes the gene to a specific region (using NCBI and Ensembl data) and adds the functional annotations available from other databases (Gene Ontology, Mammalian Phenotype, HGNC and Pubmed). Both human genome and mouse genome can be aligned with the studied region to detect synteny and segment conservation, which is useful for running inter-species comparisons of QTL locations. Finally, custom marker lists can be included in the results display to select the genes that are closest to your most significant markers. We use examples to demonstrate that in just a couple of hours, AnnotQTL is able to identify all the genes located in regions identified by a full genome scan, with some highlighted based on both location and function, thus considerably increasing the chances of finding good candidate genes. AnnotQTL is available at http://annotqtl.genouest.org.
|