1
Ye X, Yang Y, Fang Q, Ye G. Genomics of insect natural enemies in agroecosystems. Curr Opin Insect Sci 2025; 68:101298. [PMID: 39547440 DOI: 10.1016/j.cois.2024.101298] [Received: 03/11/2024] [Revised: 09/26/2024] [Accepted: 11/10/2024] [Indexed: 11/17/2024]
Abstract
A wealth of genomic data is now available for numerous insect natural enemies, providing valuable resources that deepen our understanding of the genetic basis of biocontrol traits in these organisms. We summarize the current state of genome sequencing and highlight candidate genes related to biocontrol traits that hold promise for genetic improvement. We also review recent population genomic studies in biological control and the discovery of potential insecticidal genes in parasitoid wasps. Collectively, genomic studies to date have proven powerful for identifying candidate genes responsible for desirable traits or promising effectors. However, further functional studies are needed to gain a mechanistic understanding of these genes, and suitable approaches must still be developed to translate genomic insights into field applications.
Affiliation(s)
- Xinhai Ye
- College of Advanced Agriculture Science, Zhejiang A&F University, Hangzhou 311300, China; Zhejiang Key Laboratory of Biology and Ecological Regulation of Crop Pathogens and Insects, Zhejiang A&F University, Hangzhou 311300, China.
- Yi Yang
- State Key Laboratory of Rice Biology and Breeding & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, China
- Qi Fang
- State Key Laboratory of Rice Biology and Breeding & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, China
- Gongyin Ye
- State Key Laboratory of Rice Biology and Breeding & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, China
2
Fields O, Hammond MJ, Xu X, O'Neill EC. Advances in euglenoid genomics: unravelling the fascinating biology of a complex clade. Trends Genet 2025; 41:251-260. [PMID: 39147613 DOI: 10.1016/j.tig.2024.07.007] [Received: 06/27/2024] [Revised: 07/23/2024] [Accepted: 07/23/2024] [Indexed: 08/17/2024]
Abstract
Euglenids have long been studied for their unique physiology and versatile metabolism, which have underpinned much of our understanding of photosynthesis and biochemistry and offer a growing opportunity in biotechnology. Until recently, genetic studies were scarce because of their large and complex genomes, but new technologies have begun to unveil their genetic capabilities. Whilst much research has focused on the model organism Euglena gracilis, other members of the euglenids have now started to receive due attention. Currently, only poor-quality nuclear genome assemblies of E. gracilis and Rhabdomonas costata are available, but there are many more plastid genome sequences and an increasing number of transcriptomes. As more assemblies become available, there will be great opportunities to understand the fundamental biology of these organisms and to exploit them for biotechnology.
Affiliation(s)
- Oskar Fields
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; Biodiscovery Institute, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; These authors contributed equally
- Michael J Hammond
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice (Budweis), Czech Republic; Faculty of Science, University of South Bohemia, České Budějovice (Budweis), Czech Republic; These authors contributed equally
- Xiao Xu
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; Biodiscovery Institute, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; These authors contributed equally
- Ellis C O'Neill
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; Biodiscovery Institute, University of Nottingham, University Park, Nottingham, NG7 2RD, UK.
3
Hoile AE, Holland PWH, Mulhair PO. Gene novelty and gene family expansion in the early evolution of Lepidoptera. BMC Genomics 2025; 26:161. [PMID: 39966712 PMCID: PMC11837612 DOI: 10.1186/s12864-025-11338-x] [Received: 12/06/2024] [Accepted: 02/10/2025] [Indexed: 02/20/2025]
Abstract
BACKGROUND Almost 10% of all known animal species belong to Lepidoptera: moths and butterflies. To understand how this incredible diversity evolved, we assess the role of gene gain in driving early lepidopteran evolution. Here, we compared the complete genomes of 115 insect species, including 99 Lepidoptera, to search for novel genes coincident with the emergence of Lepidoptera. RESULTS We find 217 orthogroups or gene families which emerged on the branch leading to Lepidoptera; of these, 177 likely arose by gene duplication followed by extensive sequence divergence, 2 are candidates for origin by horizontal gene transfer, and 38 have no known homology outside of Lepidoptera and possibly arose via de novo gene genesis. We focus on two new gene families that are conserved across all lepidopteran species and underwent extensive duplication, suggesting important roles in lepidopteran biology. One encodes a family of sugar and ion transporter molecules, potentially involved in the evolution of diverse feeding behaviours in early Lepidoptera. The second encodes a family of unusual propeller-shaped proteins that likely originated by horizontal gene transfer from Spiroplasma bacteria; we name these the Lepidoptera propellin genes. CONCLUSION We provide the first insights into the role of genetic novelty in the early evolution of Lepidoptera. This gives new insight into the rate of gene gain during the evolution of the order as well as providing context on the likely mechanisms of origin. We describe examples of new genes which were retained and duplicated further in all lepidopteran species, suggesting their importance in Lepidoptera evolution.
Affiliation(s)
- Asia E Hoile
- Department of Biology, University of Oxford, Mansfield Road, Oxford, OX1 3SZ, UK
- Peter W H Holland
- Department of Biology, University of Oxford, Mansfield Road, Oxford, OX1 3SZ, UK.
- Peter O Mulhair
- Department of Biology, University of Oxford, Mansfield Road, Oxford, OX1 3SZ, UK.
4
Munn PR, Chia J, Danko CG. Accurate de novo transcription unit annotation from run-on and sequencing data. bioRxiv 2025:2025.02.12.637853. [PMID: 40027686 PMCID: PMC11870431 DOI: 10.1101/2025.02.12.637853] [Indexed: 03/05/2025]
Abstract
Functional element annotations are critical tools used to provide insight into the molecular processes governing cell development, differentiation, and disease. Run-on and sequencing assays measure the production of nascent RNAs and can provide an effective data source for discovering functional elements. However, the accurate inference of functional elements from run-on sequencing data remains an open problem because the signal is noisy and challenging to model. Here we investigated computational approaches that convert run-on and sequencing data into annotations representing transcription units, including genes and non-coding RNAs. We developed a convolutional neural network, called convolutional discovery of gene anatomy using PRO-seq (CGAP), trained to identify different anatomical features of a transcription unit, which were then stitched together into transcript annotations using a hidden Markov model (HMM). Comparison with existing methods showed a significant performance improvement using our novel CGAP-HMM approach. We developed a voting system that ensembles the top three annotation strategies, resulting in large and significant improvements in transcription unit annotation accuracy over the best performing individual method. Finally, we also report a conditional generative adversarial network (cGAN) as a generative approach to transcription unit annotation that shows promise for further development. Collectively our work provides novel tools for de novo transcription unit annotation from run-on and sequencing data that are accurate enough to be useful in many applications.
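The abstract does not say how the voting system combines the top three annotation strategies. A minimal sketch, assuming each strategy emits one label per genomic position (the labels "B" for background and "G" for transcribed are hypothetical) and that ties go to the first track, is a per-position majority vote:

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tracks: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-position labels from several annotation tracks.

    Each position takes the label chosen by most tracks; ties are broken
    in favour of the earliest track (an assumption, e.g. ordering tracks
    from best- to worst-performing method).
    """
    if not tracks:
        raise ValueError("need at least one annotation track")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must cover the same positions")

    combined = []
    for position in range(length):
        labels = [track[position] for track in tracks]
        counts = Counter(labels)
        best = max(counts.values())
        # First label (in track order) that reaches the highest vote count.
        combined.append(next(lab for lab in labels if counts[lab] == best))
    return combined


# Three hypothetical per-base annotations of the same 5-bp window.
print("".join(majority_vote(["BGGGB", "BGGBB", "GGGGB"])))  # BGGGB
```

Voting on whole transcription-unit intervals rather than single positions is an equally plausible design; the per-position version is simply the shortest way to show the idea.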
5
Mouratidis I, Konnaris MA, Chantzi N, Chan CSY, Patsakis M, Provatas K, Montgomery A, Baltoumas FA, Sha CM, Mareboina M, Pavlopoulos GA, Chartoumpekis DV, Georgakopoulos-Soares I. Identification of the shortest species-specific oligonucleotide sequences. Genome Res 2025; 35:279-295. [PMID: 39746719 PMCID: PMC11874967 DOI: 10.1101/gr.280070.124] [Received: 10/07/2024] [Accepted: 11/27/2024] [Indexed: 01/04/2025]
Abstract
Despite the exponential increase in sequencing information driven by massively parallel DNA sequencing technologies, universal and succinct genomic fingerprints for each organism are still missing. Identifying the shortest species-specific nucleotide sequences offers insights into species evolution and holds potential practical applications in agriculture, wildlife conservation, and healthcare. We propose a new method for sequence analysis based on nucleic "quasi-primes": the shortest sequences that occur in one of 45,076 organismal reference genomes and are absent from every other examined genome. In the human genome, we find that the genomic loci of nucleic quasi-primes are most enriched for genes associated with brain development and cognitive function. In a single-cell case study focusing on the human primary motor cortex, nucleic quasi-prime genes account for a significantly larger proportion of the variation based on average gene expression. Nonneuronal cell types, including astrocytes, endothelial cells, microglia perivascular-macrophages, oligodendrocytes, and vascular and leptomeningeal cells, exhibit significant activation of quasi-prime-containing gene associations related to cancer, while simultaneously suppressing quasi-prime-containing genes associated with cognitive, mental, and developmental disorders. We also show that human disease-causing variants, eQTLs, mQTLs, and sQTLs are 4.43-fold, 4.34-fold, 4.29-fold, and 4.21-fold enriched at human quasi-prime loci, respectively. These findings indicate that nucleic quasi-primes are genomic loci linked to the evolution of species-specific traits, and in humans, they provide insights into the development of cognitive traits and human diseases, including neurodevelopmental disorders.
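As a toy illustration of the quasi-prime definition (not the authors' pipeline, which covered 45,076 genomes and would need far more efficient k-mer indexing), the brute-force sketch below increases k until some k-mer is present in one genome and absent from all the others. It assumes forward-strand k-mers only; whether the original method also merges reverse complements is not stated in the abstract, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def kmer_set(sequence: str, k: int) -> Set[str]:
    """All distinct length-k substrings of `sequence` (forward strand only)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_primes(genomes: Dict[str, str], max_k: int = 32) -> Dict[str, Tuple[int, Set[str]]]:
    """For each genome, find the smallest k at which some k-mer occurs in that
    genome and in no other genome, and return (k, those k-mers).

    Genomes with no unique k-mer up to max_k are omitted.
    """
    result = {}
    for name, sequence in genomes.items():
        for k in range(1, max_k + 1):
            # Union of k-mers from every other genome in the collection.
            others: Set[str] = set()
            for other_name, other_sequence in genomes.items():
                if other_name != name:
                    others |= kmer_set(other_sequence, k)
            unique = kmer_set(sequence, k) - others
            if unique:
                result[name] = (k, unique)
                break
    return result


toy = {"a": "ACGTACGT", "b": "ACGTTTTT", "c": "GGGG"}
print(quasi_primes(toy))
# {'a': (2, {'TA'}), 'b': (2, {'TT'}), 'c': (2, {'GG'})}
```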
Affiliation(s)
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Maxwell A Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Candace S Y Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California 94143, USA
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- National Technical University of Athens, School of Electrical and Computer Engineering, Athens 15772, Greece
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- National Technical University of Athens, School of Electrical and Computer Engineering, Athens 15772, Greece
- Austin Montgomery
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming," Vari 16672, Greece
- Congzhou M Sha
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Manvita Mareboina
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming," Vari 16672, Greece
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens 11527, Greece
- Dionysios V Chartoumpekis
- Service of Endocrinology, Diabetology and Metabolism, Lausanne University Hospital, 1005 Lausanne, Switzerland
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
6
Jenike KM, Campos-Domínguez L, Boddé M, Cerca J, Hodson CN, Schatz MC, Jaron KS. k-mer approaches for biodiversity genomics. Genome Res 2025; 35:219-230. [PMID: 39890468 PMCID: PMC11874746 DOI: 10.1101/gr.279452.124] [Received: 04/11/2024] [Accepted: 01/09/2025] [Indexed: 02/03/2025]
Abstract
The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure and is quickly expanding thanks to several global biodiversity genomics initiatives. However, sequencing of genomes, even with the latest technologies, can still be challenging for both technical (e.g., small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular to overcome some of these challenges. They are based on the simple process of dividing the analyzed sequences (e.g., raw reads or genomes) into a set of subsequences of length k, called k-mers, and then analyzing the frequency or sequences of those k-mers. Analyses based on k-mers allow for a rapid and intuitive assessment of complex sequencing data sets. Here, we provide a comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics, with a special focus on genome modeling.
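To make the k-mer idea concrete, here is a minimal sketch (not code from the review) that splits reads into k-mers and builds the k-mer spectrum, the histogram of how many distinct k-mers occur at each multiplicity. Counting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is a common convention, assumed here because reads come from either strand; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts: Counter) -> Dict[int, int]:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return dict(sorted(Counter(counts.values()).items()))


counts = count_kmers(["ACGT", "ACGT"], k=2)
print(counts)                 # Counter({'AC': 4, 'CG': 2})
print(kmer_spectrum(counts))  # {2: 1, 4: 1}
```

On real read sets, the shape of this histogram (for example, its heterozygous and homozygous coverage peaks) is what genome-modeling approaches fit to estimate properties such as genome size and heterozygosity.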
Affiliation(s)
- Katharine M Jenike
- Johns Hopkins University, School of Medicine, Baltimore, Maryland 21205, USA
- Lucía Campos-Domínguez
- Centre for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, 08193 Barcelona, Spain
- Marilou Boddé
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- José Cerca
- Center for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0313 Oslo, Norway
- Christina N Hodson
- University College London, UCL Department of Genetics, Evolution & Environment, London, WC1E 6BT, United Kingdom
- Michael C Schatz
- Johns Hopkins University, School of Medicine, Baltimore, Maryland 21205, USA
- Kamil S Jaron
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
7
Lukhtanov VA. Telomere DNA in the insect order Dermaptera and the first evidence for the non-canonical telomeric motif TTCGG in Arthropoda. Comp Cytogenet 2025; 19:13-18. [PMID: 39958913 PMCID: PMC11829193 DOI: 10.3897/compcytogen.19.142613] [Received: 11/24/2024] [Accepted: 01/04/2025] [Indexed: 02/18/2025]
Abstract
Despite recent advances in telomere research, the telomere DNA organization remains unknown for representatives of several insect orders. In this study, analysis of the chromosome-level genome assembly shows that the telomeric DNA of the earwig Labia minor (Linnaeus, 1758) (Polyneoptera, Dermaptera, Spongiphoridae) consists of repeats of the 5 bp motif TTCGG/CCGAA. This is the first record describing the structure of telomeric DNA in the order Dermaptera. This record expands the spectrum of known telomeric sequences, since the TTCGG motif has not previously been reported for insects.
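The TTCGG/CCGAA notation lists the motif together with its reverse complement, because in an assembled chromosome the repeat reads as one or the other depending on which end it sits at. The toy check below is an illustration only, not the author's procedure; the function name and the `window` size are hypothetical. It counts both orientations in the first and last bases of a sequence:

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def end_motif_counts(sequence: str, motif: str, window: int = 1000) -> Dict[str, int]:
    """Count non-overlapping copies of `motif` or its reverse complement in
    the first and last `window` bases of a chromosome sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    forward = motif.upper()
    reverse = reverse_complement(forward)
    # Avoid double counting a motif that is its own reverse complement.
    strands = [forward] if reverse == forward else [forward, reverse]

    def count(region: str) -> int:
        return sum(region.count(m) for m in strands)

    return {"start": count(seq[:window]), "end": count(seq[-window:])}


print(reverse_complement("TTCGG"))  # CCGAA
toy_chromosome = "CCGAA" * 10 + "ACGT" * 100 + "TTCGG" * 10
print(end_motif_counts(toy_chromosome, "TTCGG", window=50))  # {'start': 10, 'end': 10}
```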
Affiliation(s)
- Vladimir A. Lukhtanov
- Department of Karyosystematics, Zoological Institute of the Russian Academy of Sciences, Universitetskaya nab. 1, St. Petersburg 199034, Russia
8
Brown MR, Manuel Gonzalez de La Rosa P, Blaxter M. tidk: a toolkit to rapidly identify telomeric repeats from genomic datasets. Bioinformatics 2025; 41:btaf049. [PMID: 39891350 PMCID: PMC11814493 DOI: 10.1093/bioinformatics/btaf049] [Received: 02/27/2024] [Revised: 12/10/2024] [Accepted: 01/29/2025] [Indexed: 02/03/2025]
Abstract
SUMMARY "tidk" (short for telomere identification toolkit) uses a simple, fast algorithm to scan long DNA reads for the presence of short tandemly repeated DNA in runs, and to aggregate them based on canonical DNA string representation. These are telomeric repeat candidates. Our algorithm is shown to be accurate in genomes for which the telomeric repeat unit is known and is tested across a wide variety of newly assembled genomes to uncover new telomeric repeat units. Tools are provided to identify telomeric repeats de novo, scan genomes for known telomeric repeats, and to visualize telomeric repeats on the assembly. "tidk" is implemented in Rust and is available as a command line tool which can be compiled using the Rust toolchain or downloaded as a binary from bioconda. AVAILABILITY AND IMPLEMENTATION The "tidk" Rust crate is freely available under the MIT license (https://crates.io/crates/tidk), and the source code is available at https://github.com/tolkit/telomeric-identifier.
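The abstract says candidate repeats are aggregated by a canonical DNA string representation but does not define it. One common convention, assumed here and not confirmed as tidk's own, is to take the lexicographically smallest rotation of a repeat unit or of its reverse complement, so that TTCGG, CGGTT and CCGAA (the earwig motif in the previous entry) all collapse to one key. The sketch pairs that with a naive scan for exact tandem runs; the function names and the `min_copies` threshold are hypothetical.

```python
from collections import Counter
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit: str) -> List[str]:
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_repeat(unit: str) -> str:
    """Smallest rotation of the unit or of its reverse complement."""
    return min(rotations(unit) + rotations(reverse_complement(unit)))


def tandem_repeat_candidates(sequence: str, unit_length: int, min_copies: int = 3) -> Counter:
    """Total copies per canonical unit, over runs of >= min_copies exact tandem copies.

    Units with a shorter internal period (e.g. homopolymers) are not filtered out.
    """
    seq = sequence.upper()
    counts: Counter = Counter()
    i = 0
    while i + unit_length <= len(seq):
        unit = seq[i:i + unit_length]
        copies, j = 1, i + unit_length
        while seq[j:j + unit_length] == unit:  # extend the run copy by copy
            copies += 1
            j += unit_length
        if copies >= min_copies:
            counts[canonical_repeat(unit)] += copies
            i = j  # skip past the whole run
        else:
            i += 1
    return counts


print(canonical_repeat("TTCGG"), canonical_repeat("CCGAA"))  # AACCG AACCG
print(tandem_repeat_candidates("ACGTAC" + "TTCGG" * 4 + "GATC", 5))  # Counter({'AACCG': 4})
```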
Affiliation(s)
- Max R Brown
- School of Life Sciences, Anglia Ruskin University, Cambridge, CB1 1PT, United Kingdom
- Mark Blaxter
- Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1RQ, United Kingdom
9
Law CT, Burns KH. Comparative Genomics Reveals LINE-1 Recombination with Diverse RNAs. bioRxiv 2025:2025.02.02.635956. [PMID: 39975348 PMCID: PMC11838501 DOI: 10.1101/2025.02.02.635956] [Indexed: 02/21/2025]
Abstract
Long interspersed element-1 (LINE-1, L1) retrotransposons are the most abundant protein-coding transposable elements (TE) in mammalian genomes, and have shaped genome content over 170 million years of evolution. LINE-1 is self-propagating and mobilizes other sequences, including Alu elements. Occasionally, LINE-1 forms chimeric insertions with non-coding RNAs and mRNAs. U6 spliceosomal small nuclear RNA/LINE-1 chimeras are best known, though there are no comprehensive catalogs of LINE-1 chimeras. To address this, we developed TiMEstamp, a computational pipeline that leverages multiple sequence alignments (MSA) to estimate the age of LINE-1 insertions and identify candidate chimeric insertions where an adjacent sequence arrives contemporaneously. Candidates were refined by detecting hallmark features of L1 retrotransposition, such as target site duplication (TSD). Applying this pipeline to the human genome, we recovered all known species of LINE-1 chimeras and discovered new chimeric insertions involving small RNAs, Alu elements, and mRNA fragments. Some insertions are compatible with known mechanisms, such as RNA ligation. Other structures nominate novel mechanisms, such as trans-splicing. We also see evidence that LINE-1 loci with defunct promoters can acquire regulatory elements from nearby genes to restore retrotransposition activity. These discoveries highlight the recombinatory potential of LINE-1 RNA with implications for genome evolution and TE domestication.
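The abstract names target site duplications (TSDs) as one hallmark used to refine candidates: retrotransposition duplicates a short stretch of the target site, so the same sequence appears immediately before and after the insertion. TiMEstamp's actual TSD search is not described here; the exact-match sketch below, with hypothetical length bounds and function name, only illustrates the check. Real TSDs can carry mismatches, which this version would miss.

```python
from typing import Optional


def find_tsd(left_flank: str, right_flank: str,
             min_len: int = 5, max_len: int = 30) -> Optional[str]:
    """Return the longest exact sequence that both ends `left_flank` and
    begins `right_flank` (a candidate target site duplication), or None.

    `left_flank` is the genomic sequence immediately 5' of the insertion and
    `right_flank` the sequence immediately 3' of it.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left, right = left_flank.upper(), right_flank.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest possible duplication first, then shrink.
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


print(find_tsd("CCGTAAAAGAAATT", "AAAAGAAATTGCAT"))  # AAAAGAAATT
print(find_tsd("ACGTACGT", "TTTTTTTT"))              # None
```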
Affiliation(s)
- Cheuk-Ting Law
- Corresponding authors: Cheuk-Ting Law, Kathleen H. Burns
10
Li Z, Xu Z, Zhu L, Qin T, Ma J, Feng Z, Yue H, Guan Q, Zhou B, Han G, Zhang G, Li C, Jia S, Qiu Q, Hao D, Wang Y, Wang W. High-quality sika deer omics data and integrative analysis reveal genic and cellular regulation of antler regeneration. Genome Res 2025; 35:188-201. [PMID: 39542648 PMCID: PMC11789637 DOI: 10.1101/gr.279448.124] [Received: 04/07/2024] [Accepted: 10/28/2024] [Indexed: 11/17/2024]
Abstract
The antler is the only organ that can fully regenerate annually in mammals. However, the regulatory pattern and mechanism of gene expression and cell differentiation during this process remain largely unknown. Here, we obtain comprehensive assembly and gene annotation of the sika deer (Cervus nippon) genome. We construct, together with large-scale chromatin accessibility and gene expression data, gene regulatory networks involved in antler regeneration, identifying four transcription factors, MYC, KLF4, NFE2L2, and JDP2, with high regulatory activity across the whole regeneration process. Comparative studies and luciferase reporter assay suggest the MYC expression driven by a cervid-specific regulatory element might be important for antler regenerative ability. We further develop a model called combinatorial TF Oriented Program (cTOP), which integrates single-cell data with bulk regulatory networks and find PRDM1, FOSL1, BACH1, and NFATC1 as potential pivotal factors in antler stem cell activation and osteogenic differentiation. Additionally, we uncover interactions within and between cell programs and pathways during the regeneration process. These findings provide insights into the gene and cell regulatory mechanisms of antler regeneration, particularly in stem cell activation and differentiation.
Affiliation(s)
- Zihe Li
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Ziyu Xu
- CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100049, China
- Lei Zhu
- Department of Spine Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710054, China
- Shaanxi Key Laboratory of Spine Bionic Treatment, Xi'an, Shaanxi 710054, China
- Tao Qin
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Jinrui Ma
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Zhanying Feng
- CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, California 94305, USA
- Huishan Yue
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Qing Guan
- Key Laboratory of Genetic Evolution & Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Botong Zhou
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Ge Han
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Guokun Zhang
- Institute of Antler Science and Product Technology, Changchun Sci-Tech University, 130600 Changchun, China
- Chunyi Li
- Institute of Antler Science and Product Technology, Changchun Sci-Tech University, 130600 Changchun, China
- Shuaijun Jia
- Department of Spine Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710054, China
- Shaanxi Key Laboratory of Spine Bionic Treatment, Xi'an, Shaanxi 710054, China
- Qiang Qiu
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Dingjun Hao
- Department of Spine Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710054, China
- Shaanxi Key Laboratory of Spine Bionic Treatment, Xi'an, Shaanxi 710054, China
- Yong Wang
- CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Wen Wang
- New Cornerstone Science Laboratory, Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710072, China
- Key Laboratory of Genetic Evolution & Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
11
Lewin TD, Liao IJY, Chen ME, Bishop JDD, Holland PWH, Luo YJ. Fusion, fission, and scrambling of the bilaterian genome in Bryozoa. Genome Res 2025; 35:78-92. [PMID: 39762050 PMCID: PMC11789643 DOI: 10.1101/gr.279636.124] [Received: 05/29/2024] [Accepted: 10/31/2024] [Indexed: 01/24/2025]
Abstract
Groups of orthologous genes are commonly found together on the same chromosome over vast evolutionary distances. This extensive physical gene linkage, known as macrosynteny, is seen between bilaterian phyla as divergent as Chordata, Echinodermata, Mollusca, and Nemertea. Here, we report a unique pattern of genome evolution in Bryozoa, an understudied phylum of colonial invertebrates. Using comparative genomics, we reconstruct the chromosomal evolutionary history of five bryozoans. Multiple ancient chromosome fusions followed by gene mixing led to the near-complete loss of bilaterian linkage groups in the ancestor of extant bryozoans. A second wave of rearrangements, including chromosome fission, then occurred independently in two bryozoan classes, further scrambling bryozoan genomes. We also discover at least five derived chromosomal fusion events shared between bryozoans and brachiopods, supporting the traditional but highly debated Lophophorata hypothesis and suggesting macrosynteny to be a potentially powerful source of phylogenetic information. Finally, we show that genome rearrangements led to the dispersion of genes from bryozoan Hox clusters onto multiple chromosomes. Our findings demonstrate that the canonical bilaterian genome structure has been lost across all studied representatives of an entire phylum, and reveal that linkage group fission can occur very frequently in specific lineages.
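The loss of bilaterian linkage groups can be pictured as a chromosome-by-linkage-group count table. As a rough illustration (a hypothetical summary, not the authors' analysis, which the abstract does not detail), the sketch below tabulates orthologues by chromosome and ancestral linkage group (ALG) and reports what fraction of each chromosome's genes come from its dominant ALG. A chromosome carrying a single intact ALG scores 1.0 and fusions lower the score; telling fusion alone apart from fusion followed by gene mixing would also need gene order, which this count ignores. The ALG labels and function names are placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def alg_composition(genes: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tabulate (chromosome, ALG) assignments of orthologous genes into
    {chromosome: Counter(ALG -> number of genes)}."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for chromosome, alg in genes:
        table[chromosome][alg] += 1
    return dict(table)


def dominant_alg_fraction(table: Dict[str, Counter]) -> Dict[str, float]:
    """Fraction of each chromosome's ALG-assigned genes that come from its
    most common ALG (1.0 = a single intact linkage group)."""
    return {chrom: max(c.values()) / sum(c.values()) for chrom, c in table.items()}


# Hypothetical assignments: chr1 is mostly one ALG, chr2 is an even mix of two.
genes = (
    [("chr1", "ALG_X")] * 8 + [("chr1", "ALG_Y")] * 2
    + [("chr2", "ALG_Y")] * 5 + [("chr2", "ALG_Z")] * 5
)
print(dominant_alg_fraction(alg_composition(genes)))  # {'chr1': 0.8, 'chr2': 0.5}
```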
Affiliation(s)
- Thomas D Lewin
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
- Mu-En Chen
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
- John D D Bishop
- Marine Biological Association, Plymouth PL1 2PB, United Kingdom
- Peter W H Holland
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Yi-Jyun Luo
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
12
Crowley LM, Wawman DC. The genome sequence of the Ruddy Darter, Sympetrum sanguineum (Müller, 1764). Wellcome Open Res 2025; 10:23. [PMID: 40027405 PMCID: PMC11868746 DOI: 10.12688/wellcomeopenres.23466.1] [Accepted: 11/27/2024] [Indexed: 03/05/2025]
Abstract
We present a genome assembly from a male specimen of Sympetrum sanguineum (Ruddy Darter; Arthropoda; Insecta; Odonata; Libellulidae). The haplotype-resolved assembly contains two haplotypes with total lengths of 1,500.53 megabases and 1,304.05 megabases. Most of haplotype 1 is scaffolded into 13 chromosomal pseudomolecules, including the X sex chromosome, while haplotype 2 is scaffolded into 12 autosomes.
13
Hunter T. The genome sequence of the tawny cockroach, Ectobius (Ectobius) pallidus (Olivier, 1789). Wellcome Open Res 2025; 10:22. [PMID: 39866809 PMCID: PMC11754957 DOI: 10.12688/wellcomeopenres.23463.1] [Accepted: 11/27/2024] [Indexed: 01/28/2025]
Abstract
We present a genome assembly from a specimen of Ectobius pallidus (tawny cockroach; Arthropoda; Insecta; Blattodea; Ectobiidae). The assembly contains two haplotypes with total lengths of 2,087.55 megabases and 2,124.67 megabases, respectively. Most of haplotype 1 (98.55%) is scaffolded into 11 chromosomal pseudomolecules, while haplotype 2 is assembled to scaffold level. The mitochondrial genome has also been assembled and is 15.75 kilobases in length.
Affiliation(s)
- Tony Hunter
- Entomology Section, World Museum, Liverpool, England, UK
14
Halstead A, Falk S. The genome sequence of a sawfly, Athalia cordata Serville, 1823. Wellcome Open Res 2025; 10:15. [PMID: 39990998 PMCID: PMC11842964 DOI: 10.12688/wellcomeopenres.23456.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/27/2024] [Indexed: 02/25/2025]
Abstract
We present a genome assembly from a female specimen of Athalia cordata (sawfly; Arthropoda; Insecta; Hymenoptera; Athaliidae). The genome sequence has a total length of 169.00 megabases. Most of the assembly (99.98%) is scaffolded into 4 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 27.47 kilobases in length.
Affiliation(s)
- Andrew Halstead
- Independent researcher, Knaphill, Woking, Surrey, England, UK
- Steven Falk
- Independent researcher, Kenilworth, Warwickshire, England, UK
- Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
15
McCulloch J, Crowley LM. The genome sequence of a cranefly, Diogma glabrata (Meigen, 1818). Wellcome Open Res 2025; 10:20. [PMID: 40027408 PMCID: PMC11868750 DOI: 10.12688/wellcomeopenres.23462.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/27/2024] [Indexed: 03/05/2025]
Abstract
We present a genome assembly from a specimen of Diogma glabrata (cranefly; Arthropoda; Insecta; Diptera; Cylindrotomidae). The genome sequence has a total length of 1,328.70 megabases. Most of the assembly (90.7%) is scaffolded into 4 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 17.5 kilobases in length.
Affiliation(s)
- James McCulloch
- University of Oxford, Oxford, England, UK
- Wellcome Sanger Institute, Hinxton, England, UK
16
Fowler K. The genome sequence of a leafhopper, Allygus modestus Scott, 1876. Wellcome Open Res 2025; 10:9. [PMID: 39881685 PMCID: PMC11775446 DOI: 10.12688/wellcomeopenres.23451.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/27/2024] [Indexed: 01/31/2025]
Abstract
We present a genome assembly from an individual male specimen of Allygus modestus (leafhopper; Arthropoda; Insecta; Hemiptera; Cicadellidae). The genome sequence has a total length of 1,819.90 megabases. Most of the assembly (99.86%) is scaffolded into 7 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 16.69 kilobases in length.
17
Cunningham A, Halstead A. The genome sequence of a sawfly, Abia candens Konow, 1887. Wellcome Open Res 2025; 10:2. [PMID: 39912117 PMCID: PMC11795027 DOI: 10.12688/wellcomeopenres.23449.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/27/2024] [Indexed: 02/07/2025]
Abstract
We present a genome assembly from an individual male specimen of Abia candens (sawfly; Arthropoda; Insecta; Hymenoptera; Cimbicidae). The genome sequence has a total length of 261.00 megabases. Most of the assembly (82.7%) is scaffolded into 16 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 19.72 kilobases in length.
Affiliation(s)
- Andrew Halstead
- Independent researcher, Knaphill, Woking, Surrey, England, UK
- Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
- Independent researcher, Tiverton, Devon, England, UK
18
Dewar AE, Belcher LJ, West SA. A phylogenetic approach to comparative genomics. Nat Rev Genet 2025:10.1038/s41576-024-00803-0. [PMID: 39779997 PMCID: PMC7617348 DOI: 10.1038/s41576-024-00803-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/05/2024] [Indexed: 01/11/2025]
Abstract
Comparative genomics, whereby the genomes of different species are compared, has the potential to address broad and fundamental questions at the intersection of genetics and evolution. However, species, genomes and genes cannot be considered as independent data points within statistical tests. Closely related species tend to be similar because they share genes by common descent, which must be accounted for in analyses. This problem of non-independence may be exacerbated when examining genomes or genes but can be addressed by applying phylogeny-based methods to comparative genomic analyses. Here, we review how controlling for phylogeny can change the conclusions of comparative genomics studies. We address common questions on how to apply these methods and illustrate how they can be used to test causal hypotheses. The combination of rapidly expanding genomic datasets and phylogenetic comparative methods is set to revolutionize the biological insights possible from comparative genomic studies.
Affiliation(s)
- Anna E Dewar
- Department of Biology, University of Oxford, Oxford, UK.
- St John's College, Oxford, UK.
- Stuart A West
- Department of Biology, University of Oxford, Oxford, UK
19
Xian W, Bezrukov I, Bao Z, Vorbrugg S, Gautam A, Weigel D. TIPPo: A User-Friendly Tool for De Novo Assembly of Organellar Genomes with High-Fidelity Data. Mol Biol Evol 2025; 42:msae247. [PMID: 39800935 PMCID: PMC11725521 DOI: 10.1093/molbev/msae247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 01/16/2025]
Abstract
Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPPo, a user-friendly, reference-free assembly tool that uses PacBio high-fidelity long-read data and does not rely on genomes from related species or on nuclear genome information for the assembly of organellar genomes. TIPPo employs a deep learning model for initial read classification and leverages k-mer counting for further refinement, significantly reducing the impact of nuclear insertions of organellar DNA on the assembly process. We used TIPPo to assemble a set of 54 complete chloroplast genomes; no other tool was able to completely assemble this set. TIPPo is comparable with PMAT in assembling mitochondrial genomes from most species but achieves higher completeness for several species. We also used the assembled organelle genomes to identify insertions of nuclear plastid DNA (NUPTs) and nuclear mitochondrial DNA (NUMTs). The cumulative length of NUPTs/NUMTs correlates positively with the size of the nuclear genome, suggesting that insertions occur stochastically. NUPTs/NUMTs show predominantly C:G to T:A changes, with the mutated cytosines typically found in CG and CHG contexts, suggesting that degradation of NUPT and NUMT sequences is driven by the known elevated mutation rate of methylated cytosines. Small interfering RNA loci are enriched in NUPTs and NUMTs, consistent with the RNA-directed DNA methylation (RdDM) pathway mediating DNA methylation in these sequences.
Affiliation(s)
- Wenfei Xian
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Ilja Bezrukov
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Zhigui Bao
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Sebastian Vorbrugg
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Anupam Gautam
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- International Max Planck Research School “From Molecules to Organisms”, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
20
Rigden DJ, Fernández XM. The 2025 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2025; 53:D1-D9. [PMID: 39658041 PMCID: PMC11701706 DOI: 10.1093/nar/gkae1220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2024] [Accepted: 11/26/2024] [Indexed: 12/12/2024]
Abstract
The 2025 Nucleic Acids Research database issue contains 185 papers spanning biology and related areas. Seventy-three new databases are covered, while resources previously described in the issue account for 101 update articles. Databases most recently published elsewhere account for a further 11 papers. Nucleic acid databases include EXPRESSO for multi-omics of 3D genome structure (this issue's chosen Breakthrough Resource and Article) and NAIRDB for Fourier transform infrared data. New protein databases include structure predictions for human isoforms at ASpdb and for viral proteins at BFVD. UniProt, Pfam and InterPro have all provided updates; metabolism and signalling are covered by new descriptions of STRING, KEGG and CAZy, while updated microbe-oriented databases include Enterobase, VFDB and PHI-base. Biomedical research is supported, among others, by ClinVar, PubChem and DrugMAP. Genomics-related resources include Ensembl, UCSC Genome Browser and dbSNP. New plant databases cover the Solanaceae (SolR) and Asteraceae (AMIR) families, while an update from NCBI Taxonomy is also included. The Database Issue is freely available on the Nucleic Acids Research website (https://academic.oup.com/nar). At the NAR online Molecular Biology Database Collection (http://www.oxfordjournals.org/nar/database/c/), 932 entries have been reviewed in the last year, 74 new resources added and 226 discontinued URLs eliminated, bringing the current total to 2236 databases.
Affiliation(s)
- Daniel J Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
21
Bronner IF, Dawson E, Park N, Piepenburg O, Quail MA. Evaluation of controls, quality control assays, and protocol optimisations for PacBio HiFi sequencing on diverse and challenging samples. Front Genet 2025; 15:1505839. [PMID: 39845189 PMCID: PMC11752452 DOI: 10.3389/fgene.2024.1505839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 11/19/2024] [Indexed: 01/24/2025]
Abstract
The Darwin Tree of Life (DToL) project aims to generate high-quality reference genomes for all eukaryotic organisms in Britain and Ireland. At the time of writing, PacBio HiFi reads are generated for all samples on Sequel IIe systems by the Wellcome Sanger Institute's Scientific Operations teams; however, we expect lessons from this work to apply directly to the Revio system too, as the core principles of SMRT sequencing remain the same. We observed that HiFi yield is highly variable for DToL samples, and we investigated what drives this variation and how it might be mitigated. To support these investigations, a number of controls were evaluated to ensure that the library and sequencing preparation procedures, reagents, consumables, and Sequel IIe instruments were performing as expected. Our findings indicate that a primary factor driving variability in HiFi yield is the quality of the DNA prior to library construction (e.g., purity, size, and damage). We investigated whether quality assessment assays could link measurable DNA damage or purity to sequencing yield. Some correlation could be established; however, no assay was predictive of sequencing yield for all samples, indicating that the variability is driven by multiple factors that may interact. We demonstrate that contaminants present in some samples are the cause of very low HiFi yield, and show that these contaminants can negatively affect the PacBio internal sequencing control and samples multiplexed on the same SMRT Cell. We found that consistently high yields could be obtained if an amplification workflow was used, namely PacBio's ultra-low input library preparation protocol.
Affiliation(s)
- Emma Dawson
- Wellcome Sanger Institute (WT), Hinxton, United Kingdom
22
Dyer SC, Austine-Orimoloye O, Azov AG, Barba M, Barnes I, Barrera-Enriquez VP, Becker A, Bennett R, Beracochea M, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju SB, Campbell LI, Martinez MC, Charkhchi M, Cortes LA, Davidson C, Denni S, Dodiya K, Donaldson S, El Houdaigui B, El Naboulsi T, Falola O, Fatima R, Genez T, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hunt T, Kay M, Kaykala V, Lemos D, Lodha D, Mathlouthi N, Merino GA, Merritt R, Mirabueno LP, Mushtaq A, Hossain SN, Pérez-Silva JG, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Salam AI, Saraf S, Saraiva-Agostinho N, Sinha S, Sipos B, Sitnik V, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Tsang I, Urbina-Gómez D, Veidenberg A, Walsh TA, Willhoft NL, Allen J, Alvarez-Jarreta J, Chakiachvili M, Cheema J, da Rocha JB, De Silva NH, Giorgetti S, Haggerty L, Ilsley GR, Keatley J, Loveland JE, Moore B, Mudge JM, Naamati G, Tate J, Trevanion SJ, Winterbottom A, Flint B, Frankish A, Hunt SE, Finn RD, Freeberg MA, Harrison PW, Martin FJ, Yates AD. Ensembl 2025. Nucleic Acids Res 2025; 53:D948-D957. [PMID: 39656687 PMCID: PMC11701638 DOI: 10.1093/nar/gkae1071] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2024] [Revised: 10/14/2024] [Accepted: 10/22/2024] [Indexed: 12/17/2024]
Abstract
Ensembl (www.ensembl.org) is an open platform integrating publicly available genomics data across the tree of life with a focus on eukaryotic species related to human health, agriculture and biodiversity. This year has seen a continued expansion in the number of species represented, with >4800 eukaryotic and >31 300 prokaryotic genomes available. The new Ensembl site, currently in beta, has continued to develop, currently holding >2700 eukaryotic genome assemblies. The new site provides genome, gene, transcript, homology and variation views, and will replace the current Rapid Release site; this represents a key step towards provision of a single integrated Ensembl site. Additional activities have included developing improved regulatory annotation for human, mouse and agricultural species, and expanding the Ensembl Variant Effect Predictor tool. To learn more about Ensembl, help and documentation are available along with an extensive training program that can be accessed via our training pages.
Affiliation(s)
- Sarah C Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vianey Paola Barrera-Enriquez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Martin Beracochea
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lucas A Cortes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sukanya Denni
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Université de Rouen Normandie, UFR Sciences et Techniques, 3 Av. Pasteur, 76000 Rouen, France
- Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oluwadamilare Falola
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tatiana Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nourhen Mathlouthi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ryan Merritt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian Tsang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- NIAB, Lawrence Weaver Road, Cambridge CB3 0LE, UK
- University of Nottingham, Department of Plant Science, Plant Sciences Building, Sutton Bonington Campus, Nottingham LE12 5RD, UK
- David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jitender Cheema
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Batista da Rocha
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mallory A Freeberg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
23
Blake DP. Eimeria of chickens: the changing face of an old foe. Avian Pathol 2025:1-12. [PMID: 39743984 DOI: 10.1080/03079457.2024.2441180] [Received: 11/01/2024] [Revised: 11/22/2024] [Accepted: 12/03/2024] [Indexed: 01/04/2025]
Abstract
Research highlights:
- The cost of coccidiosis in chickens fluctuates considerably, peaking in 2022.
- Three new Eimeria species can infect chickens and escape current vaccines.
- Eimeria infection exerts wide-ranging effects on enteric microbiota.
Affiliation(s)
- Damer P Blake
- Pathobiology and Population Sciences, Royal Veterinary College, North Mymms, UK
24
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
- Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
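The graph idea summarized in the Secomandi et al. abstract can be illustrated with a minimal sketch. It assumes a simplified, GFA-like model in which sequence segments are nodes and each genome is stored as a path, an ordered list of segment IDs read on the forward strand only. The `PangenomeGraph` class and its methods are hypothetical and are not the API of any tool covered by the review.

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeGraph:
    """Toy sequence graph: segments are nodes, genomes are paths through them.

    Hypothetical simplification: every segment is traversed on the forward
    strand, so reverse-complement orientations are not modelled.
    """

    segments: dict[str, str] = field(default_factory=dict)  # segment ID -> sequence
    paths: dict[str, list[str]] = field(default_factory=dict)  # genome -> segment IDs

    def add_segment(self, seg_id: str, seq: str) -> None:
        self.segments[seg_id] = seq

    def add_path(self, genome: str, seg_ids: list[str]) -> None:
        # Refuse paths that walk through segments the graph does not contain.
        missing = [s for s in seg_ids if s not in self.segments]
        if missing:
            raise KeyError(f"unknown segments: {missing}")
        self.paths[genome] = list(seg_ids)

    def edges(self) -> set[tuple[str, str]]:
        """Links implied by consecutive segments along every path."""
        return {(a, b) for p in self.paths.values() for a, b in zip(p, p[1:])}

    def spell(self, genome: str) -> str:
        """Reconstruct one genome's linear sequence by walking its path."""
        return "".join(self.segments[s] for s in self.paths[genome])

    def shared_segments(self) -> set[str]:
        """Segments traversed by every genome in the graph."""
        sets = [set(p) for p in self.paths.values()]
        return set.intersection(*sets) if sets else set()


# Two toy haplotypes that differ by a single-base substitution (s2 vs s3).
g = PangenomeGraph()
for sid, seq in [("s1", "ACGT"), ("s2", "A"), ("s3", "G"), ("s4", "TTCA")]:
    g.add_segment(sid, seq)
g.add_path("hapA", ["s1", "s2", "s4"])
g.add_path("hapB", ["s1", "s3", "s4"])

print(g.spell("hapA"))  # ACGTATTCA
print(g.spell("hapB"))  # ACGTGTTCA
print(sorted(g.shared_segments()))  # ['s1', 's4']
```

In this toy graph the two haplotypes share segments `s1` and `s4` and diverge at a single-base bubble (`s2` versus `s3`), so one structure represents both alleles instead of privileging whichever allele a single-individual linear reference happened to carry.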
25
Hauff L, Rasoanaivo NE, Razafindrakoto A, Ravelonjanahary H, Wright PC, Rakotoarivony R, Bergey CM. De Novo Genome Assembly for an Endangered Lemur Using Portable Nanopore Sequencing in Rural Madagascar. Ecol Evol 2025; 15:e70734. [PMID: 39777412 PMCID: PMC11705420 DOI: 10.1002/ece3.70734] [Received: 05/22/2024] [Revised: 12/02/2024] [Accepted: 12/03/2024] [Indexed: 01/11/2025]
Abstract
As one of the most threatened mammalian taxa, lemurs of Madagascar are facing unprecedented anthropogenic pressures. To address conservation imperatives such as this, researchers have increasingly relied on conservation genomics to identify populations of particular concern. However, many of these genomic approaches require high-quality genomes. While the advent of next-generation sequencing technologies and the resulting reduction in associated costs have led to the proliferation of genomic data and high-quality reference genomes, global discrepancies in genomic sequencing capabilities often result in biological samples from biodiverse host countries being exported to facilities in the Global North, creating inequalities in access and training within genomic research. Here, we present the first published reference genome for the endangered red-fronted brown lemur (Eulemur rufifrons), produced from sequencing efforts conducted entirely within the host country using portable Oxford Nanopore sequencing. Using an archived E. rufifrons specimen, we conducted long-read nanopore sequencing at the Centre ValBio Research Station near Ranomafana National Park, in rural Madagascar, generating over 750 Gb of sequencing data from 10 MinION flow cells. Using exclusively these long-read data, we produced a 2.157-gigabase, 2980-contig nuclear assembly with an N50 of 101.6 Mb, as well as a 17,108 bp mitogenome. The nuclear assembly had 30× average coverage and was comparable in completeness to other primate reference genomes, with a 96.1% BUSCO completeness score for primate-specific genes. As the first published reference genome for E. rufifrons and the only annotated genome available for the speciose genus Eulemur, this resource will prove vital for conservation genomic studies, and our efforts demonstrate the potential of this protocol to address research inequalities and build genomic capacity.
Affiliation(s)
- Lindsey Hauff
- Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, New Jersey, USA
- Center for Human Evolutionary Studies, Rutgers University, New Brunswick, New Jersey, USA
- Human Genetics Institute of New Jersey, Piscataway, New Jersey, USA
- Noa Elosmie Rasoanaivo
- Department of Zoology and Animal Biodiversity, University of Antananarivo, Antananarivo, Madagascar
- Patricia C. Wright
- Centre ValBio, Ranomafana National Park, Ifanadiana, Madagascar
- Department of Anthropology, Stony Brook University, Stony Brook, New York, USA
- Rindra Rakotoarivony
- Department of Biological Anthropology and Sustainable Development, University of Antananarivo, Antananarivo, Madagascar
- Christina M. Bergey
- Center for Human Evolutionary Studies, Rutgers University, New Brunswick, New Jersey, USA
- Human Genetics Institute of New Jersey, Piscataway, New Jersey, USA
- Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
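The N50 of 101.6 Mb reported for the E. rufifrons nuclear assembly is a standard contiguity statistic: after sorting contigs from longest to shortest, it is the length of the contig at which the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows; the contig lengths are invented toy values, not data from the paper.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least 50% of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable: the running total always reaches 50%")


# Hypothetical toy assembly (lengths in bp), not data from the paper.
lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(lengths))  # 70
```

Here the total is 300 bp and the two longest contigs (80 + 70 = 150 bp) reach exactly half of it, so the N50 is 70.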
26
Krabberød AK, Stokke E, Thoen E, Skrede I, Kauserud H. The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies. Mol Ecol Resour 2025; 25:e14031. [PMID: 39428982 DOI: 10.1111/1755-0998.14031] [Received: 04/27/2024] [Revised: 06/27/2024] [Accepted: 09/27/2024] [Indexed: 10/22/2024]
Abstract
Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16S/18S marker or the internal transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the Ribosomal Operon Database (ROD), which includes eukaryotic full-length rDNA operons retrieved from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947, in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. Operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S and its V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.
Affiliation(s)
- Anders K Krabberød
- Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Embla Stokke
- Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Ella Thoen
- Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Inger Skrede
- Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Håvard Kauserud
- Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
27
Orozco-Arias S, Sierra P, Durbin R, González J. MCHelper automatically curates transposable element libraries across eukaryotic species. Genome Res 2024; 34:2256-2268. [PMID: 39653419 PMCID: PMC11694758 DOI: 10.1101/gr.278821.123] [Received: 12/05/2023] [Accepted: 09/18/2024] [Indexed: 12/25/2024]
Abstract
The number of species with high-quality genome sequences continues to increase, in part due to the scaling up of multiple large-scale biodiversity sequencing projects. While the need to annotate genic sequences in these genomes is widely acknowledged, the parallel need to annotate transposable element (TE) sequences, which have been shown to alter genome architecture, rewire gene regulatory networks, and contribute to the evolution of host traits, is becoming ever more evident. However, accurate genome-wide annotation of TE sequences is still technically challenging. Several de novo TE identification tools are now available, but the libraries they produce need manual curation to generate high-quality genome annotations. Manual curation is time-consuming, which makes it impractical for large-scale genomic studies, and it lacks reproducibility. In this work, we present the Manual Curator Helper tool MCHelper, which automates the TE library curation process. By applying MCHelper's fully automated mode to the outputs of three de novo TE identification tools (RepeatModeler2, EDTA, and REPET) in the fruit fly, rice, hooded crow, zebrafish, maize, and human, we show a substantial improvement in the quality of the TE libraries and genome annotations. MCHelper libraries are less redundant, with up to a 65% reduction in the number of consensus sequences, and have up to 11.4% fewer false-positive sequences and up to ∼48% fewer "unclassified/unknown" TE consensus sequences. Genome-wide TE annotations are also improved, including larger unfragmented insertions. Moreover, MCHelper is easy to install and easy to use.
Affiliation(s)
- Pío Sierra
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Josefa González
- Institute of Evolutionary Biology, CSIC, UPF, 08003 Barcelona, Spain
- Institut Botànic de Barcelona (IBB), CSIC-CMCNB, 08038 Barcelona, Spain
28
Sanita Lima M, Silva Domingues D, Rossi Paschoal A, Smith DR. Long-read RNA sequencing can probe organelle genome pervasive transcription. Brief Funct Genomics 2024; 23:695-701. [PMID: 38880995 DOI: 10.1093/bfgp/elae026] [Received: 03/28/2024] [Revised: 05/20/2024] [Accepted: 05/30/2024] [Indexed: 06/18/2024]
Abstract
Forty years ago, organelle genomes were assumed to be streamlined and, perhaps, unexciting remnants of their prokaryotic past. However, the field of organelle genomics has exposed an unparalleled diversity in genome architecture (i.e. genome size, structure, and content). The transcription of these eccentric genomes can be just as elaborate: organelle genomes are pervasively transcribed into a plethora of RNA types. Yet, while organelle protein-coding genes are known to produce polycistronic transcripts that undergo heavy posttranscriptional processing, the nature of organelle noncoding transcriptomes is still poorly resolved. Here, we review how wet-lab experiments and second-generation sequencing data (i.e. short reads) have been useful for identifying certain types of organelle RNAs, particularly noncoding RNAs. We then explain how third-generation (long-read) RNA-Seq data represent the new frontier in organelle transcriptomics. We show that public repositories (e.g. NCBI SRA) already contain enough data for inter-phyla comparative studies and argue that organelle biologists can benefit from such data. We discuss the prospects of using publicly available sequencing data for organelle-focused studies and examine the challenges of such an approach. We highlight that the lack of a comprehensive database dedicated to organelle genomics/transcriptomics is a major impediment to the development of a field with implications for basic and applied science.
Affiliation(s)
- Matheus Sanita Lima
- Department of Biology, Western University, 1151 Richmond Street, London, Ontario N6A 5B7, Canada
- Douglas Silva Domingues
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Avenida Padua Dias 11, Piracicaba, SP 13418-900, Brazil
- Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics and Pattern Recognition Group (BIOINFO-CP), Federal University of Technology - Paraná - UTFPR, Avenida Alberto Carazzai 1640, Cornélio Procópio, PR 86300000, Brazil
- David Roy Smith
- Department of Biology, Western University, 1151 Richmond Street, London, Ontario N6A 5B7, Canada
29
Mikalsen SO, Í Hjøllum J, Salter I, Djurhuus A, Í Kongsstovu S. A Faroese perspective on decoding life for sustainable use of nature and protection of biodiversity. NPJ BIODIVERSITY 2024; 3:37. [PMID: 39632982 PMCID: PMC11618374 DOI: 10.1038/s44185-024-00068-0] [Received: 09/27/2024] [Accepted: 11/11/2024] [Indexed: 12/07/2024]
Abstract
Biodiversity is under pressure, mainly due to human activities and climate change. At the international policy level, it is now recognised that genetic diversity is an important part of biodiversity. High-quality reference genomes provide the best basis for using genetics and genetic diversity towards the global aims of (1) protecting species, biodiversity, and nature, and (2) managing biodiversity to achieve sustainable harvesting of nature. Protecting biodiversity is a global responsibility that also rests on small nations such as the Faroe Islands. Situated in the middle of the North Atlantic Ocean and with substantial fisheries activity, the nation has a particular responsibility towards maritime matters. Here we provide the reasoning behind the Genome Atlas of Faroese Ecology (Gen@FarE), a project based on our participation in the European Reference Genome Atlas (ERGA) consortium. Gen@FarE has three major aims: (1) to acquire high-quality genomes of all eukaryotic species in the Faroe Islands and Faroese waters; (2) to establish population genetics for species of commercial or ecological interest; and (3) to establish an information databank for all Faroese species, combined with a citizen-science registration database, making it possible for the public to participate in acquiring and maintaining an overview of Faroese species in both terrestrial and marine environments. Altogether, we believe that this will enhance society's interest in and awareness of biodiversity, thereby protecting the foundations of our lives. Furthermore, the combination of a broad and highly competent ERGA umbrella with more targeted national projects will help fulfil the formal and moral responsibilities that all nations, including those with limited resources, have in protecting biodiversity and achieving sustainability in harvesting from nature.
Affiliation(s)
- Svein-Ole Mikalsen
- Faculty of Science and Technology, University of the Faroe Islands, Tórshavn, Faroe Islands
- Jari Í Hjøllum
- Faculty of Science and Technology, University of the Faroe Islands, Tórshavn, Faroe Islands
- Ian Salter
- Faroe Marine Research Institute, Tórshavn, Faroe Islands
- Anni Djurhuus
- Faculty of Science and Technology, University of the Faroe Islands, Tórshavn, Faroe Islands
- Sunnvør Í Kongsstovu
- Faculty of Science and Technology, University of the Faroe Islands, Tórshavn, Faroe Islands
30
Prescott T, Hill D, Bence S. The genome sequence of the Gold Spot moth, Plusia festucae (Linnaeus, 1758). Wellcome Open Res 2024; 9:704. [PMID: 39925654 PMCID: PMC11803378 DOI: 10.12688/wellcomeopenres.23409.1] [Accepted: 11/12/2024] [Indexed: 02/11/2025]
Abstract
We present a genome assembly from an individual male specimen of Plusia festucae (Gold Spot; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence has a total length of 422.50 megabases. Most of the assembly (99.92%) is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.3 kilobases in length. Gene annotation of this assembly on Ensembl identified 19,273 protein-coding genes.
Affiliation(s)
- Tom Prescott
- Butterfly Conservation Scotland, Stirling, Scotland, UK
- David Hill
- Butterfly Conservation Scotland, Stirling, Scotland, UK
31
Augustijnen H, Arias-Sardá C, Farré M, Lucek K. A Genomic Update on the Evolutionary Impact of Chromosomal Rearrangements. Mol Ecol 2024; 33:e17602. [PMID: 39585199 DOI: 10.1111/mec.17602] [Received: 08/12/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 11/26/2024]
Affiliation(s)
- Hannah Augustijnen
- Unit of Ecology and Evolution, Department of Biology, University of Fribourg, Fribourg, Switzerland
- Marta Farré
- School of Biosciences, University of Kent, Kent, UK
- Kay Lucek
- Biodiversity Genomics Laboratory, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
32
Provatas K, Chantzi N, Patsakis M, Nayak A, Mouratidis I, Georgakopoulos-Soares I. Microsatellites explorer: A database of short tandem repeats across genomes. Comput Struct Biotechnol J 2024; 23:3817-3826. [PMID: 39525087 PMCID: PMC11550718 DOI: 10.1016/j.csbj.2024.10.041] [Received: 08/26/2024] [Revised: 10/24/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024]
Abstract
Short tandem repeats (STRs) are widespread repetitive elements that have a number of biological functions and are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups across the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies, coupled with decreasing costs, have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated the generation of Telomere-to-Telomere (T2T) assemblies of complete genomes. Nevertheless, there is no comprehensive database that encompasses the STRs found in each genome across different organisms and across human genomes of diverse ancestries. Here we introduce Microsatellites Explorer, a database of STRs found in the genomes of 117,253 organisms across all major taxonomic groups, 15 T2T genome assemblies of different organisms, and 94 human haplotypes from the human pangenome. The database currently hosts 406,758,798 STR sequences and serves as a centralized, user-friendly repository for running searches and interactive visualizations and for downloading existing STR data for independent analysis. Microsatellites Explorer is implemented as a web portal for browsing, analyzing and downloading STR data, and is publicly available at https://www.microsatellitesexplorer.com.
Affiliation(s)
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Akshatha Nayak
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
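The Microsatellites Explorer abstract does not describe how STRs are detected, so the following is only an illustrative sketch of what a perfect short tandem repeat is: a short unit, here assumed to be 1 to 6 bp, repeated back to back at least three times. The function names, the motif-length range and the three-copy minimum are assumptions chosen for the example. Because the scanner takes non-overlapping regex matches per unit length and ignores interrupted (imperfect) repeats, it can miss repeats that a dedicated STR caller would report.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_strs(seq: str, max_motif: int = 6, min_copies: int = 3):
    """Report perfect (uninterrupted) tandem repeats of 1..max_motif bp units.

    Returns (start, end, motif, copies) tuples with 0-based, end-exclusive
    coordinates, sorted by start position. Hypothetical helper for illustration.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-base unit followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})(?:\1){{{min_copies - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip non-primitive units such as 'TT' or 'ATAT'; the shorter
            # unit length reports the same repeat.
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_perfect_strs("GGACACACACTTTTTTGC"))
# [(2, 10, 'AC', 4), (10, 16, 'T', 6)]
```

The primitive-motif check keeps a run such as `TTTTTT` from being reported once as a mononucleotide repeat and again as `TT` or `TTT` repeats.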
33
Kalbfleisch TS, Smith ML, Ciosek JL, Li K, Doris PA. Three decades of rat genomics: approaching the finish(ed) line. Physiol Genomics 2024; 56:807-818. [PMID: 39348459 PMCID: PMC11573253 DOI: 10.1152/physiolgenomics.00110.2024] [Received: 07/26/2024] [Revised: 09/11/2024] [Accepted: 09/26/2024] [Indexed: 10/02/2024]
Abstract
The rat, Rattus norvegicus, has provided an important model for investigating a range of characteristics of biomedical importance. Here we survey the origins of this species, its introduction into laboratory research, and the emergence of genetic and genomic methods that use this model organism. Genomic studies have yielded important progress and provided new insight into several biologically important traits. However, some studies have been impeded by the lack of a complete and accurate reference genome for this species. New sequencing and genome assembly methods applied to the rat have produced a new reference genome assembly, GRCr8, which is a near telomere-to-telomere assembly of high base-level accuracy that incorporates several elements not captured in prior assemblies. As genome assembly methods continue to advance and production costs become a less significant obstacle, genome assemblies for multiple inbred rat strains are emerging. These assemblies will allow the construction of a rat pangenome assembly that captures all the genetic variation in strains selected for their utility in research, overcoming reference bias, a limitation associated with reliance on a single reference assembly. By this means, the full utility of this model organism for genomic studies will begin to be revealed.
Affiliation(s)
- Theodore S Kalbfleisch
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
- Melissa L Smith
- Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, Kentucky, United States
- Julia L Ciosek
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
- Kai Li
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
- Peter A Doris
- Center for Human Genetics, Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas, United States
34
Nguyen AK, Schall PZ, Kidd JM. A map of canine sequence variation relative to a Greenland wolf outgroup. Mamm Genome 2024; 35:565-576. [PMID: 39088040 DOI: 10.1007/s00335-024-10056-1] [Received: 06/03/2024] [Accepted: 07/25/2024] [Indexed: 08/02/2024]
Abstract
For over 15 years, canine genetics research relied on a reference assembly from a Boxer named Tasha (canFam3.1). Recent advances in long-read sequencing and genome assembly have led to the development of numerous high-quality assemblies from diverse canines. These assemblies represent notable improvements in completeness, contiguity, and the representation of gene promoters and gene models. Although genome graph and pan-genome approaches show promise, most genetic analyses in canines rely on mapping Illumina sequencing reads to a single reference. The Dog10K consortium, and others, have generated deep catalogs of genetic variation by aligning Illumina sequencing reads to a reference genome obtained from a German Shepherd Dog named Mischka (canFam4, UU_Cfam_GSD_1.0). However, alignment to a breed-derived genome may introduce bias in genotype calling across samples. Because using an outgroup reference genome may remove this effect, we reprocessed 1929 samples analyzed by the Dog10K consortium using a Greenland wolf (mCanLor1.2) as the reference. We performed remapping and variant calling efficiently using a GPU implementation of common analysis tools. The resulting call set removes the variability in genetic differences seen across samples, and breed relationships revealed by principal component analysis are not affected by the choice of reference genome. Using these sequence data, we inferred the history of population sizes and found that village dog populations experienced a 9- to 13-fold reduction in historic effective population size relative to wolves.
Affiliation(s)
- Anthony K Nguyen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Peter Z Schall
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Jeffrey M Kidd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
35
Patsakis M, Provatas K, Mouratidis I, Georgakopoulos-Soares I. MAFcounter: An efficient tool for counting the occurrences of k-mers in MAF files. ARXIV 2024:arXiv:2411.19427v1. [PMID: 39650609 PMCID: PMC11623707] [Indexed: 12/11/2024]
Abstract
Motivation: With the rapid expansion of large-scale biological datasets, DNA and protein sequence alignments have become essential for comparative genomics and proteomics. These alignments facilitate the exploration of sequence similarity patterns, providing valuable insights into sequence conservation and evolutionary relationships and supporting functional analyses. Sequence alignments are typically stored in formats such as the Multiple Alignment Format (MAF). Counting k-mer occurrences is a crucial task in many computational biology applications, but currently no algorithm is designed for k-mer counting in alignment files.
Results: We have developed MAFcounter, the first k-mer counter dedicated to alignment files. MAFcounter is multithreaded, fast, and memory efficient, enabling k-mer counting in DNA and protein sequence alignment files.
Availability: The MAFcounter package and its Python bindings are released under the GPL license as a multi-platform application and are available at: https://github.com/Georgakopoulos-Soares-lab/MAFcounter.
Affiliation(s)
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
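MAFcounter's own interface and gap handling are not described in its abstract, so the sketch below is a hypothetical, single-threaded illustration of k-mer counting over a MAF alignment, not MAFcounter's algorithm or its Python bindings. It assumes the standard MAF sequence-line layout (`s src start size strand srcSize text`), removes gap characters so that k-mers are taken from each row's ungapped sequence, upper-cases soft-masked bases, and skips k-mers containing `N`. A real tool may treat gaps, masking and reverse complements differently.

```python
from collections import Counter


def count_kmers_in_maf(maf_text: str, k: int) -> Counter:
    """Count k-mers across all 's' (sequence) lines of a MAF alignment.

    Hypothetical helper. Assumptions: gaps ('-') are removed before counting,
    lowercase (soft-masked) bases are upper-cased, and k-mers with 'N' are skipped.
    """
    if k < 1:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for line in maf_text.splitlines():
        fields = line.split()
        # MAF sequence lines have 7 fields: s src start size strand srcSize text
        if len(fields) != 7 or fields[0] != "s":
            continue
        seq = fields[6].replace("-", "").upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Toy alignment block with invented source names and coordinates.
MAF_EXAMPLE = """##maf version=1
a score=0
s speciesA.chr1 100 10 + 1000 ACGTACGT-AC
s speciesB.chr7 200 9 + 2000 ACGTAC--TAC
"""

counts = count_kmers_in_maf(MAF_EXAMPLE, 3)
print(counts.most_common(1))  # [('TAC', 4)]
```

Removing gaps first means a k-mer can span an alignment column where another species has an insertion, which matches each genome's own sequence; skipping any window that touches a gap would be an equally plausible alternative.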
36
Boyes D. The genome sequence of the Poplar Grey moth, Subacronicta megacephala (Denis & Schiffermüller, 1775). Wellcome Open Res 2024; 9:696. [PMID: 39822595 PMCID: PMC11736114 DOI: 10.12688/wellcomeopenres.23371.1] [Accepted: 11/08/2024] [Indexed: 01/19/2025]
Abstract
We present a genome assembly from an individual male Subacronicta megacephala (Poplar Grey moth; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence has a total length of 424.20 megabases. Most of the assembly (99.02%) is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.35 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,189 protein-coding genes.
Affiliation(s)
- Douglas Boyes
- UK Centre for Ecology & Hydrology, Wallingford, England, UK
37
Vazquez JM, Lauterbur ME, Mottaghinia S, Bucci M, Fraser D, Gray-Sandoval G, Gaucherand L, Haidar ZR, Han M, Kohler W, Lama TM, Le Corf A, Loyer C, Maesen S, McMillan D, Li S, Lo J, Rey C, Capel SLR, Singer M, Slocum K, Thomas W, Tyburec JD, Villa S, Miller R, Buchalski M, Vazquez-Medina JP, Pfeffer S, Etienne L, Enard D, Sudmant PH. Extensive longevity and DNA virus-driven adaptation in nearctic Myotis bats. bioRxiv 2024:2024.10.10.617725. [PMID: 39416019 PMCID: PMC11482938 DOI: 10.1101/2024.10.10.617725] [Indexed: 10/19/2024]
Abstract
The genus Myotis is one of the largest clades of bats, and exhibits some of the most extreme variation in lifespans among mammals alongside unique adaptations to viral tolerance and immune defense. To study the evolution of longevity-associated traits and infectious disease, we generated near-complete genome assemblies and cell lines for 8 closely related species of Myotis. Using genome-wide screens of positive selection, analyses of structural variation, and functional experiments in primary cell lines, we identify new patterns of adaptation contributing to longevity, cancer resistance, and viral interactions in bats. We find that Myotis bats have some of the most significant variation in cancer risk across mammals and demonstrate a unique DNA damage response in primary cells of the long-lived M. lucifugus. We also find evidence of abundant adaptation in response to DNA viruses - but not RNA viruses - in Myotis and other bats in sharp contrast with other mammals, potentially contributing to the role of bats as reservoirs of zoonoses. Together, our results demonstrate how genomics and primary cells derived from diverse taxa uncover the molecular bases of extreme adaptations in non-model organisms.
Affiliation(s)
- Juan M Vazquez
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA USA
- These authors contributed equally
- M. Elise Lauterbur
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ USA
- Current affiliation: Department of Biology, University of Vermont, Burlington, VT USA
- These authors contributed equally
- Saba Mottaghinia
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Melanie Bucci
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ USA
- Devaughn Fraser
- Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States
- Current affiliation: Wildlife Diversity Program, Wildlife Division, Connecticut Department of Energy and Environmental Protection, Burlington, CT, United States
- Léa Gaucherand
- Université de Strasbourg, Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Zeinab R Haidar
- Department of Biology, California State Polytechnic University, Humboldt, Arcata, CA USA
- Current affiliation: Western EcoSystems Technology Inc, Cheyenne, WY USA
- Melissa Han
- Department of Pathology and Clinical Laboratories, University of Michigan, Ann Arbor, MI USA
- William Kohler
- Department of Pathology and Clinical Laboratories, University of Michigan, Ann Arbor, MI USA
- Tanya M. Lama
- Department of Biological Sciences, Smith College, Northampton, MA USA
- Amandine Le Corf
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Clara Loyer
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Sarah Maesen
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Dakota McMillan
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA USA
- Department of Science and Biotechnology, Berkeley City College, Berkeley, CA USA
- Stacy Li
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA USA
- Johnathan Lo
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA USA
- Carine Rey
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Samantha LR Capel
- Current affiliation: Wildlife Diversity Program, Wildlife Division, Connecticut Department of Energy and Environmental Protection, Burlington, CT, United States
- Michael Singer
- Department of Molecular and Cellular Biology, University of California, Berkeley, Berkeley, CA USA
- William Thomas
- Department of Ecology and Evolution, Stony Brook University, Stony Brook NY USA
- Sarah Villa
- Department of Molecular and Cellular Biology, University of California, Berkeley, Berkeley, CA USA
- Richard Miller
- Department of Pathology and Clinical Laboratories, University of Michigan, Ann Arbor, MI USA
- Michael Buchalski
- Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States
- Sébastien Pfeffer
- Université de Strasbourg, Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
- Lucie Etienne
- Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, UCBL1, CNRS UMR5308, Ecole Normale Supérieure ENS de Lyon, Université de Lyon, Lyon, France
- Senior author
- David Enard
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ USA
- Senior author
- These authors contributed equally
- Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA USA
- Senior author
- These authors contributed equally
- Lead contact
38
Ryan H, Vernes SC, Teeling EC, Mai M. The genome sequence of the whiskered bat, Myotis mystacinus (Kuhl, 1817). Wellcome Open Res 2024; 9:684. [PMID: 39635244 PMCID: PMC11615438 DOI: 10.12688/wellcomeopenres.23345.1] [Accepted: 11/04/2024] [Indexed: 12/07/2024]
Abstract
We present a genome assembly from an individual male Myotis mystacinus (whiskered bat; Chordata; Mammalia; Chiroptera; Vespertilionidae). The genome sequence has a total length of 2,081.20 megabases. Most of the assembly (97.52%) is scaffolded into 23 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 16.93 kilobases in length.
Affiliation(s)
- Sonja C. Vernes
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Emma C Teeling
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Meike Mai
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Natural History Museum Genome Acquisition Lab
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Darwin Tree of Life Barcoding collective
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Wellcome Sanger Institute Scientific Operations: Sequencing Operations
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Wellcome Sanger Institute Tree of Life Core Informatics team
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
- Tree of Life Core Informatics collective
- Kent Bat Group, Whitstable, England, UK
- School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Wellcome Sanger Institute, Hinxton, England, UK
39
Calcino A, Cooke I, Cowman P, Higgie M, Massault C, Schmitz U, Whittaker M, Field MA. Harnessing genomic technologies for one health solutions in the tropics. Global Health 2024; 20:78. [PMID: 39543642 PMCID: PMC11566161 DOI: 10.1186/s12992-024-01083-3] [Received: 04/23/2024] [Accepted: 11/01/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND: The targeted application of cutting-edge high-throughput molecular data technologies provides an enormous opportunity to address key health, economic and environmental issues in the tropics within the One Health framework. The Earth's tropical regions are projected to contain > 50% of the world's population by 2050, coupled with 80% of its biodiversity; however, these regions are relatively less developed economically, with agricultural productivity substantially lower than in temperate zones, a large percentage of their population having limited health care options, and much of their biodiversity understudied and undescribed. The generation of high-throughput molecular data and bespoke bioinformatics capability to address these unique challenges offers an enormous opportunity for people living in the tropics. MAIN: In this review we discuss in depth solutions to the challenges facing populations living in tropical zones across three critical One Health areas: human health, biodiversity and food production. This review will examine how some of the challenges in the tropics can be addressed through the targeted application of advanced omics and bioinformatics, and will discuss how local populations can embrace these technologies through strategic outreach and education, ensuring that the benefits of the One Health approach are fully realised through local engagement. CONCLUSION: Within the context of the One Health framework, we will demonstrate how genomic technologies can be utilised to improve the overall quality of life for half the world's population.
Affiliation(s)
- Andrew Calcino
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
- Ira Cooke
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
- Pete Cowman
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- Queensland Museum, Townsville, QLD, Australia
- Megan Higgie
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
- Cecile Massault
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
- Ulf Schmitz
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
- Sydney Medical School, University of Sydney, Sydney, NSW, Australia
- Maxine Whittaker
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
- Matt A Field
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW, Australia
40
Falk S. The genome sequence of the long-horned nomad bee, Nomada hirtipes Pérez, 1884. Wellcome Open Res 2024; 9:665. [PMID: 39931109 PMCID: PMC11809185 DOI: 10.12688/wellcomeopenres.23163.1] [Accepted: 11/01/2024] [Indexed: 02/13/2025]
Abstract
We present a genome assembly from an individual female Nomada hirtipes (the long-horned nomad bee; Arthropoda; Insecta; Hymenoptera; Apidae). The genome sequence has a total length of 316.5 megabases. Most of the assembly (90.79%) is scaffolded into 16 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 29.88 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,693 protein-coding genes.
Affiliation(s)
- Steven Falk
- Independent researcher, Kenilworth, England, UK
41
Crowley LM. The genome sequence of the jet ant, Lasius fuliginosus (Latreille, 1798). Wellcome Open Res 2024; 9:668. [PMID: 39931112 PMCID: PMC11809159 DOI: 10.12688/wellcomeopenres.23347.1] [Accepted: 11/04/2024] [Indexed: 02/13/2025]
Abstract
We present a genome assembly from an individual female Lasius fuliginosus (the jet ant; Arthropoda; Insecta; Hymenoptera; Formicidae). The genome sequence is 256.2 megabases in span. Most of the assembly is scaffolded into 14 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 18.75 kilobases in length.
42
Griffiths A, Prescott T. The genome sequence of the Scalloped Hook-tip moth, Falcaria lacertinaria (Linnaeus, 1758). Wellcome Open Res 2024; 9:659. [PMID: 39649622 PMCID: PMC11624438 DOI: 10.12688/wellcomeopenres.23258.1] [Accepted: 10/16/2024] [Indexed: 12/11/2024]
Abstract
We present a genome assembly from an individual female Falcaria lacertinaria (the Scalloped Hook-tip; Arthropoda; Insecta; Lepidoptera; Drepanidae). The genome sequence has a total length of 300.20 megabases. Most of the assembly is scaffolded into 32 chromosomal pseudomolecules, including the W and Z sex chromosomes. The mitochondrial genome has also been assembled and is 16.07 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,709 protein-coding genes.
Affiliation(s)
- Andy Griffiths
- Wellcome Sanger Institute, Hinxton, England, UK
- Royal Botanic Garden Edinburgh, Edinburgh, Scotland, UK
- Tom Prescott
- Butterfly Conservation Scotland, Stirling, Scotland, UK
- Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
- Wellcome Sanger Institute, Hinxton, England, UK
- Royal Botanic Garden Edinburgh, Edinburgh, Scotland, UK
- Butterfly Conservation Scotland, Stirling, Scotland, UK
43
Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (Bethesda) 2024; 14:jkae187. [PMID: 39148415 PMCID: PMC11540323 DOI: 10.1093/g3journal/jkae187] [Received: 05/30/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites, and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualizing two-dimensional representations of read tetranucleotide composition learned by a variational autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels, allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualization tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
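As a minimal sketch of the input to such a composition-based approach, the Python below computes a normalised tetranucleotide (4-mer) frequency profile for a single read. It assumes that each 4-mer is merged with its reverse complement into one of 136 canonical classes (a common convention for strand-agnostic composition, not a detail stated in the abstract). The function names are hypothetical, and the variational autoencoder that the study uses to project these profiles into two dimensions is not shown.

```python
from itertools import product

# Translation table for complementing bases; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Merge a k-mer with its reverse complement (lexicographically smaller wins)."""
    return min(kmer, revcomp(kmer))


# 256 possible 4-mers collapse to 136 canonical classes
# (120 reverse-complement pairs + 16 palindromes such as ACGT).
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL_4MERS)}


def tetranucleotide_profile(seq: str) -> list[float]:
    """Frequencies of canonical 4-mers in `seq`, summing to 1 (all zeros if no valid 4-mer)."""
    seq = seq.upper()
    counts = [0] * len(CANONICAL_4MERS)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if not set(kmer) <= set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        counts[INDEX[canonical(kmer)]] += 1
        total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Each read yields a 136-dimensional vector; stacking one vector per read gives the matrix that a dimensionality-reduction step (the variational autoencoder, in the study) would embed. Because canonical classes are used, a read and its reverse complement receive identical profiles, so reads from both strands of the same organism fall together.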
Affiliation(s)
- Claudia C Weber
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
44
Hartley M, Anita L, Babalola K, Russell C, Yoldaş AK, Zulueta-Coarasa T. Pictures at an exhibition: How to share your imaging data. J Microsc 2024; 296:145-149. [PMID: 37648214 DOI: 10.1111/jmi.13221] [Received: 06/30/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/01/2023]
Abstract
Open access to data underpinning published results is a key pillar of scientific reproducibility. Making data available at scale also provides opportunities for data reuse, encouraging the development of new analysis approaches. In this poster article, accompanying a recorded talk, we will explain the benefits of publicly archiving your image data alongside your published manuscripts, as well as highlight what resources are available to do this. This will include the BioImage Archive, EMBL-EBI's new resource for biological image data, https://www.ebi.ac.uk/bioimage-archive/. We will look at how image data submission works, how to prepare in advance for archiving your data and upcoming developments.
Affiliation(s)
- Matthew Hartley
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Liviu Anita
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kolawole Babalola
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Craig Russell
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Aybüke Küpcü Yoldaş
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Teresa Zulueta-Coarasa
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
45
Chung G, Piano F, Gunsalus KC. TeloSearchLR: an algorithm to detect novel telomere repeat motifs using long sequencing reads. bioRxiv 2024:2024.10.29.617943. [PMID: 39554068 PMCID: PMC11565940 DOI: 10.1101/2024.10.29.617943] [Indexed: 11/19/2024]
Abstract
Telomeres are eukaryotic chromosome end structures that guard against sequence loss and aberrant chromosome fusions. Telomeric repeat motifs (TRMs), the minimal repeating unit of a telomere, vary from species to species, with some evolutionary clades experiencing a rapid sequence divergence. To explore the full scope of this evolutionary divergence, many bioinformatic tools have been developed to infer novel TRMs using repetitive sequence search on short sequencing reads. However, novel telomeric motifs remain unidentified in up to half of the sequencing libraries assayed with these tools. A possible reason may be that short reads, derived from extensively sheared DNA, preserve little to no positional context of the repetitive sequences assayed. On the other hand, if a sequencing read is sufficiently long, telomeric sequences must appear at either end rather than in the middle. The TeloSearchLR algorithm relies on this to help identify novel TRMs on long reads, in many cases where short-read search tools have failed. In addition, we demonstrate that TeloSearchLR can reveal unusually long telomeric motifs not maintained by telomerase, and it can also be used to anchor terminal scaffolds in new genome assemblies.
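The positional idea described above (on a long read, telomeric repeats sit at the ends rather than in the middle) can be illustrated with a simplified Python sketch. This is not the published TeloSearchLR algorithm: the function names, the fixed repeat period of 6 and the 200-base terminal window are hypothetical choices for illustration. The sketch counts candidate repeat units in the two terminal windows of each read and in an equally sized interior window, then ranks units by their enrichment at the ends. Each unit is collapsed to one canonical form across its rotations and its reverse complement, so TTAGGG, GGGTTA and CCCTAA are counted together. A real search would scan a range of periods, since telomeric repeat motifs differ in length between species.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(unit: str) -> str:
    """Smallest rotation of a repeat unit or of its reverse complement, so that
    phase shifts (TTAGGG vs GGGTTA) and strands (TTAGGG vs CCCTAA) coincide."""
    rc = unit.translate(COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (unit, rc) for i in range(len(s)))


def _count_units(segment: str, period: int, counter: Counter) -> None:
    """Add every ACGT-only window of length `period` in `segment` to `counter`."""
    for i in range(len(segment) - period + 1):
        unit = segment[i:i + period]
        if set(unit) <= set("ACGT"):
            counter[canonical_motif(unit)] += 1


def rank_terminal_motifs(reads, period: int = 6, window: int = 200) -> list[str]:
    """Rank candidate repeat units by enrichment at read ends versus read middles."""
    terminal, interior = Counter(), Counter()
    for read in reads:
        read = read.upper()
        if len(read) < 3 * window:
            continue  # too short to separate the ends from the middle
        mid = len(read) // 2
        _count_units(read[:window], period, terminal)
        _count_units(read[-window:], period, terminal)
        # Interior window of 2 * window bases, matching the two terminal windows in size.
        _count_units(read[mid - window:mid + window], period, interior)
    # Pseudocount of 1 avoids division by zero for units never seen in the interior.
    scores = {m: terminal[m] / (interior[m] + 1) for m in terminal}
    return sorted(scores, key=scores.get, reverse=True)
```

Applied to a set of long reads, `rank_terminal_motifs(reads)[0]` gives the canonical form of the most end-enriched unit. A short-read search has no equivalent positional signal, because a sheared fragment carries no information about where on the chromosome it came from.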
46
Kaur G, Perteghella T, Carbonell-Sala S, Gonzalez-Martinez J, Hunt T, Mądry T, Jungreis I, Arnan C, Lagarde J, Borsari B, Sisu C, Jiang Y, Bennett R, Berry A, Cerdán-Vélez D, Cochran K, Vara C, Davidson C, Donaldson S, Dursun C, González-López S, Gopal Das S, Hardy M, Hollis Z, Kay M, Montañés JC, Ni P, Nurtdinov R, Palumbo E, Pulido-Quetglas C, Suner MM, Yu X, Zhang D, Loveland JE, Albà MM, Diekhans M, Tanzer A, Mudge JM, Flicek P, Martin FJ, Gerstein M, Kellis M, Kundaje A, Paten B, Tress ML, Johnson R, Uszczynska-Ratajczak B, Frankish A, Guigó R. GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing. bioRxiv 2024:2024.10.29.620654. [PMID: 39554180 PMCID: PMC11565817 DOI: 10.1101/2024.10.29.620654] [Indexed: 11/19/2024]
Abstract
Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, becoming a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet poorly-understood gene classes like long non-coding RNAs (lncRNAs) remain incomplete and scattered across multiple, uncoordinated catalogs, slowing down progress in the field. To address these issues, GENCODE has undertaken the most comprehensive lncRNA annotation effort to date. This is founded on the manual annotation of full-length targeted long-read sequencing, on matched embryonic and adult tissues, of orthologous regions in human and mouse. Altogether 17,931 novel human genes (140,268 novel transcripts) and 22,784 novel mouse genes (136,169 novel transcripts) have been added to the GENCODE catalog, representing a 2-fold and 6-fold increase in transcripts, respectively, the greatest increase since the sequencing of the human genome. Novel gene annotations display evolutionary constraints, have well-formed promoter regions, and link to phenotype-associated genetic variants. They greatly enhance the functional interpretability of the human genome, as they help explain millions of previously-mapped "orphan" omics measurements corresponding to transcription start sites, chromatin modifications and transcription factor binding sites. Crucially, our targeted design assigned human-mouse orthologs at a rate beyond previous studies, tripling the number of human disease-associated lncRNAs with mouse orthologs. The expanded and enhanced GENCODE lncRNA annotations mark a critical step towards deciphering the human and mouse genomes.
Affiliation(s)
- Gazaldeep Kaur
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Tamara Perteghella
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF)
- Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Jose Gonzalez-Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomasz Mądry
- Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Irwin Jungreis
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Carme Arnan
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Flomics Biotech, SL, Carrer de Roc Boronat 31, 08005 Barcelona, Catalonia, Spain
- Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Cristina Sisu
- Department of Life Sciences, Brunel University London, Uxbridge, London, UB8 3PH, UK
- Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Kelly Cochran
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Covadonga Vara
- Hospital del Mar Research Institute, Dr. Aiguader 88, Barcelona 08003, Spain
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cagatay Dursun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Silvia González-López
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF)
- Sasti Gopal Das
- Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Ramil Nurtdinov
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Emilio Palumbo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Carlos Pulido-Quetglas
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Xuezhu Yu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Dingyao Zhang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- M Mar Albà
- Hospital del Mar Research Institute, Dr. Aiguader 88, Barcelona 08003, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
- Mark Diekhans
- UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA
- Andrea Tanzer
- University of Vienna, Research Network Data Science, Kolingasse 14-16, 1090 Vienna, Austria
- University of Vienna, Faculty of Computer Science, Research Group Visualization and Data Analysis, Waehringerstrasse 29, 1090 Vienna, Austria
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Barbara Uszczynska-Ratajczak
- Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF)
47
Adkins P, Harley J, Brittain R, Scott-Somme K, Azzopardi F. The genome sequence of the John Dory, Zeus faber Linnaeus, 1758. Wellcome Open Res 2024; 9:150. [PMID: 38881949 PMCID: PMC11179049 DOI: 10.12688/wellcomeopenres.21140.1] [Accepted: 10/11/2024] [Indexed: 06/18/2024]
Abstract
We present a genome assembly from an individual Zeus faber (the John Dory; Chordata; Actinopteri; Zeiformes; Zeidae). The genome sequence is 804.7 megabases in span. Most of the assembly is scaffolded into 22 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 16.72 kilobases in length.
Affiliation(s)
- Patrick Adkins
- The Marine Biological Association, Plymouth, England, UK
- Joanna Harley
- The Marine Biological Association, Plymouth, England, UK
48
Beaven R, Denholm B. The cryptonephridial/rectal complex: an evolutionary adaptation for water and ion conservation. Biol Rev Camb Philos Soc 2024. [PMID: 39438273 DOI: 10.1111/brv.13156] [Received: 05/02/2024] [Revised: 10/08/2024] [Accepted: 10/10/2024] [Indexed: 10/25/2024]
Abstract
Arthropods have integrated digestive and renal systems, which function to acquire and maintain homeostatically the substances they require for survival. The cryptonephridial complex (CNC) is an evolutionary novelty in which the renal organs and gut have been dramatically reorganised. Parts of the renal or Malpighian tubules (MpTs) form a close association with the surface of the rectum, and are surrounded by a novel tissue, the perinephric membrane, which acts to insulate the system from the haemolymph and thus allows tight regulation of ions and water into and out of the CNC. The CNC can reclaim water and solutes from the rectal contents and recycle these back into the haemolymph. Fluid flow in the MpTs runs counter to flow within the rectum. It is this countercurrent arrangement that underpins its powerful recycling capabilities, and represents one of the most efficient water conservation mechanisms in nature. CNCs appear to have evolved multiple times, and are present in some of the largest and most evolutionarily successful insect groups including the larvae of most Lepidoptera and in a major beetle lineage (Cucujiformia + Bostrichoidea), suggesting that the CNC is an important adaptation. Here we review the knowledge of this remarkable organ system gained over the past 200 years. We first focus on the CNCs of tenebrionid beetles, for which we have an in-depth understanding from physiological, structural and ultrastructural studies (primarily in Tenebrio molitor), which are now being extended by studies in Tribolium castaneum enabled by advances in molecular and microscopy approaches established for this species. These recent studies are beginning to illuminate CNC development, physiology and endocrine control. We then take a broader view of arthropod CNCs, phylogenetically mapping their reported occurrence to assess their distribution and likely evolutionary origins. 
We explore CNCs from an ecological viewpoint, put forward evidence that CNCs may primarily be adaptations for facing the challenges of larval life, and argue that their loss in many aquatic species could point to a primary function in conserving water in terrestrial species. Finally, by considering the functions of renal and digestive epithelia in insects lacking CNCs, as well as the typical architecture of these organs in relation to one another, we propose that ancestral features of these organs predispose them for the evolution of CNCs.
Affiliation(s)
- Robin Beaven
- Hugh Robson Building, George Square, Deanery of Biomedical Sciences, The University of Edinburgh, Edinburgh, EH8 9XD, UK
- Barry Denholm
- Hugh Robson Building, George Square, Deanery of Biomedical Sciences, The University of Edinburgh, Edinburgh, EH8 9XD, UK
49
Blaxter M, Pauperio J, Schoch C, Howe K. Taxonomy Identifiers (TaxId) for Biodiversity Genomics: a guide to getting TaxId for submission of data to public databases. Wellcome Open Res 2024; 9:591. [PMID: 39526195 PMCID: PMC11544195 DOI: 10.12688/wellcomeopenres.22949.1] [Accepted: 09/26/2024] [Indexed: 11/16/2024]
Abstract
Biodiversity genomics critically depends on correct taxonomic identification of the sample from which data are derived, and on tracking that taxonomic information through the systems that archive data and report on genome sequencing efforts. For submission of data to the International Nucleotide Sequence Database Collaboration (INSDC) databases (DNA Data Bank of Japan [DDBJ], European Nucleotide Archive [ENA] and National Center for Biotechnology Information [NCBI]), samples and data derived from them must be assigned a species-level NCBI Taxonomy taxonomic identifier (TaxId, sometimes referred to as taxId or txid). We thus need to be able to identify the TaxId for a target species efficiently. Because the NCBI Taxonomy does not include all known species and cannot preemptively represent unknown taxa, we also need an efficient process for generating new TaxIds for species not yet listed. This document provides workflows for different kinds of TaxId acquisition scenarios and was created to guide users in these processes. Although developed for European projects such as Darwin Tree of Life and the European Reference Genome Atlas, the workflows are universally applicable and describe the use of ENA in resolving taxonomic issues.
Too Long: Didn't Read (TL;DR):
- Use the ENA REST API programmatically to retrieve TaxIds for target species and confirm that sequence data can be submitted to those TaxIds.
- Use the NCBI Web interface to NCBI Taxonomy to identify potential homotypic synonyms.
- Request a new TaxId from ENA for a species not yet in NCBI Taxonomy, and for species-like entries for which the full Linnaean binomen is not determined (see https://ena-docs.readthedocs.io/en/latest/faq/taxonomy_requests.html#creating-taxon-requests).
- Discuss directly with the NCBI Taxonomy curators or the curators at ENA and NCBI whenever you think there is an opportunity to improve their database.
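The first TL;DR step, retrieving TaxIds from the ENA REST API and checking that data can be submitted to them, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' workflow: the endpoint path `https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/{name}`, the JSON fields `taxId`, `scientificName` and `submittable`, and the use of HTTP 404 for an unknown name are all assumptions to verify against the current ENA documentation, and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed ENA taxonomy lookup endpoint; confirm against the current ENA docs.
ENA_TAXONOMY_URL = "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/{}"


def build_query_url(scientific_name: str) -> str:
    """Percent-encode the name (spaces become %20) and insert it into the URL."""
    return ENA_TAXONOMY_URL.format(urllib.parse.quote(scientific_name))


def parse_taxon_records(payload: str) -> list[dict]:
    """Keep only records flagged as submittable.

    Assumes the response is a JSON array of objects with 'taxId',
    'scientificName' and 'submittable' fields. The flag may arrive as the
    string "true" or as a JSON boolean, so both forms are normalised here.
    """
    records = json.loads(payload)
    return [
        {"taxId": str(rec["taxId"]), "scientificName": rec["scientificName"]}
        for rec in records
        if str(rec.get("submittable", "false")).lower() == "true"
    ]


def fetch_submittable_taxids(scientific_name: str, timeout: float = 30.0) -> list[dict]:
    """Query ENA for a scientific name and return its submittable TaxIds."""
    try:
        with urllib.request.urlopen(build_query_url(scientific_name), timeout=timeout) as resp:
            return parse_taxon_records(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: an unknown name is reported as HTTP 404; treat it as
        # "no TaxId yet", the case in which a new TaxId must be requested.
        if err.code == 404:
            return []
        raise
```

Calling `fetch_submittable_taxids("Zeus faber")` needs network access. Keeping the JSON parsing in its own function lets it be checked offline against a saved response. An empty result means no submittable record was returned, which is the situation in which the guide recommends checking NCBI Taxonomy for synonyms or requesting a new TaxId.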
Affiliation(s)
- Mark Blaxter
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
- Joana Pauperio
- European Nucleotide Archive, European Bioinformatics Institute, Hinxton, England, UK
- Conrad Schoch
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
50
Patsakis M, Provatas K, Baltoumas FA, Chantzi N, Mouratidis I, Pavlopoulos GA, Georgakopoulos-Soares I. MAFin: Motif Detection in Multiple Alignment Files. ARXIV 2024:arXiv:2410.11021v1. [PMID: 39483349 PMCID: PMC11527099] [Indexed: 11/03/2024]
Abstract
Motivation: Genome and proteome alignments, represented in the Multiple Alignment Format (MAF), have become a standard approach in comparative genomics and proteomics. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.
Results: We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin supports multithreaded searches for conserved motifs using three approaches: (1) user-specified k-mers, (2) regular expressions, with one or more patterns searched, and (3) predefined Position Weight Matrices. Once a motif has been found, MAFin detects its instances and calculates a conservation percentage for each motif across the aligned sequences, based on the number of matches relative to the length of the motif. A set of statistics enables interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses.
Availability: MAFin is released as a multi-platform Python package under the GPL license and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.
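The conservation percentage described above (matches relative to motif length) can be illustrated with a short Python sketch. It is not MAFin's implementation: it assumes a single alignment block already parsed into a dictionary mapping each row name to its aligned sequence, searches only the ungapped reference row with a regular expression (a k-mer search reuses the same function with an escaped literal), leaves out Position Weight Matrices and multithreading, and counts gaps as mismatches. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MotifHit:
    start: int  # first alignment column of the motif (0-based)
    end: int    # one past the last alignment column of the motif
    conservation: dict[str, float] = field(default_factory=dict)  # row -> % matching


def find_conserved_motifs(block: dict[str, str], reference: str, pattern: str) -> list[MotifHit]:
    """Search the ungapped reference row and score every other aligned row.

    Conservation per row = 100 * (motif columns identical to the reference)
    / motif length. Gaps and mismatches both count as non-matches. Case is
    ignored, since lowercase in MAF sequences often marks repeat-masked bases.
    """
    ref_row = block[reference].upper()
    # Alignment columns that hold a reference base rather than a gap '-'.
    cols = [i for i, base in enumerate(ref_row) if base != "-"]
    ref_seq = "".join(ref_row[i] for i in cols)

    hits = []
    for m in re.finditer(pattern, ref_seq, flags=re.IGNORECASE):
        if m.end() == m.start():
            continue  # skip empty regex matches: they have no length to score
        # Columns of the reference bases in this motif instance; columns where
        # only other rows carry an insertion are not scored.
        motif_cols = cols[m.start():m.end()]
        hit = MotifHit(start=motif_cols[0], end=motif_cols[-1] + 1)
        for name, row in block.items():
            if name == reference:
                continue
            row = row.upper()
            matches = sum(1 for c in motif_cols if row[c] == ref_row[c])
            hit.conservation[name] = 100.0 * matches / len(motif_cols)
        hits.append(hit)
    return hits


def find_kmer(block: dict[str, str], reference: str, kmer: str) -> list[MotifHit]:
    """k-mer mode: the same search with the k-mer escaped as a literal pattern."""
    return find_conserved_motifs(block, reference, re.escape(kmer))
```

On a toy block whose reference row is `ACGT-ACGT`, the pattern `ACGT` is found twice, at alignment columns 0 to 4 and 5 to 9 (end-exclusive), and a row reading `AC--` over the first instance scores 50%. Because `re.finditer` returns non-overlapping matches, overlapping motif instances are not reported by this sketch.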
Affiliation(s)
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA