1. Cox E, Tsuchiya MTN, Ciufo S, Torcivia J, Falk R, Anderson WR, Holmes JB, Hem V, Breen L, Davis E, Ketter A, Zhang P, Soussov V, Schoch CL, O'Leary NA. NCBI taxonomy: enhanced access via NCBI datasets. Nucleic Acids Res 2024:gkae967. [PMID: 39470745 DOI: 10.1093/nar/gkae967] [Received: 09/17/2024] [Revised: 10/07/2024] [Accepted: 10/17/2024] [Indexed: 11/01/2024] Open
Abstract
The NCBI Taxonomy resource (https://www.ncbi.nlm.nih.gov/taxonomy) has long been a trusted, curated hub for organism names, classifications, and links to related data for all taxonomic nodes. NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/) is an improved way to leverage the rich data available at NCBI so users can effectively browse, search, and download information. While taxonomy data has been a cornerstone of NCBI Datasets since its inception, we recently extended the taxonomy information available via NCBI Datasets by updating the existing NCBI Datasets taxonomy page, implementing a new taxonomy name details page, expanding programmatic access to taxonomic information via command-line tools and APIs and improving the way we handle taxonomic queries to connect users to gene and genome data. This paper highlights these improvements and provides examples to help users effectively harness these new features.
Affiliation(s)
- Eric Cox
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mirian T N Tsuchiya
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Stacy Ciufo
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- John Torcivia
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Robert Falk
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- W Ray Anderson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- J Bradley Holmes
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Vichet Hem
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Laurie Breen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Emily Davis
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Anne Ketter
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Peifen Zhang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Vladimir Soussov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Conrad L Schoch
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Nuala A O'Leary
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
2. Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024; 16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Accepted: 10/11/2024] [Indexed: 11/01/2024] Open
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrating diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Affiliation(s)
- Felix Langschied
  - Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Salvatore Cosentino
  - Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
- Diego Fuentes-Palacios
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Natasha Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Michael Hiller
  - Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Yanhui Hu
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Wataru Iwasaki
  - Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Saioa Manzano-Morales
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Emma Persson
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Toni Gabaldón
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
  - CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Erik Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Ingo Ebersberger
  - Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
3. Kodama Y, Ara T, Fukuda A, Tokimatsu T, Mashima J, Kosuge T, Tanizawa Y, Tanjo T, Ogasawara O, Fujisawa T, Nakamura Y, Arita M. DDBJ update in 2024: the DDBJ Group Cloud service for sharing pre-publication data. Nucleic Acids Res 2024:gkae882. [PMID: 39380489 DOI: 10.1093/nar/gkae882] [Received: 09/10/2024] [Revised: 09/19/2024] [Accepted: 09/24/2024] [Indexed: 10/10/2024] Open
Abstract
The Bioinformation and DNA Data Bank of Japan Center (DDBJ Center, https://www.ddbj.nig.ac.jp) provides public databases that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), the DDBJ Center accepts and distributes nucleotide sequence data ranging from raw reads to assembled and annotated sequences with the National Center for Biotechnology Information and the European Bioinformatics Institute. Besides INSDC databases, the DDBJ Center provides databases for functional genomics (Genomic Expression Archive), metabolomics (MetaboBank), human genetic variations (TogoVar-repository) and human genetic and phenotypic data (Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics supercomputer, which is also a platform for the DDBJ Group Cloud (DGC) services for sharing and analysis of pre-publication data among research groups. This paper reports recent updates on the databases and the services of the DDBJ Center, highlighting the DGC service.
Affiliation(s)
- Yuichi Kodama
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Takeshi Ara
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Asami Fukuda
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Toshiaki Tokimatsu
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Takehide Kosuge
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Yasuhiro Tanizawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Tomoya Tanjo
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Osamu Ogasawara
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Masanori Arita
  - Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
4. Calhoun VC, Hatcher EL, Yankie L, Nawrocki EP. Influenza sequence validation and annotation using VADR. Database (Oxford) 2024; 2024:baae091. [PMID: 39297389 PMCID: PMC11411204 DOI: 10.1093/database/baae091] [Received: 03/20/2024] [Revised: 08/02/2024] [Accepted: 08/08/2024] [Indexed: 09/25/2024]
Abstract
Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLu ANnotation tool (FLAN) has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions and has been publicly available as a webserver but not as a standalone tool. Viral Annotation DefineR (VADR) is a general sequence validation and annotation software package used by GenBank for norovirus, dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree, VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use. Database URL: https://bitbucket.org/nawrockie/vadr-models-flu.
Affiliation(s)
- Vincent C Calhoun
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Eneida L Hatcher
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Linda Yankie
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Eric P Nawrocki
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, United States
5. Bu C, Zheng X, Zhao X, Xu T, Bai X, Jia Y, Chen M, Hao L, Xiao J, Zhang Z, Zhao W, Tang B, Bao Y. GenBase: A Nucleotide Sequence Database. Genomics Proteomics Bioinformatics 2024; 22:qzae047. [PMID: 38913867 PMCID: PMC11434157 DOI: 10.1093/gpbjnl/qzae047] [Received: 09/27/2023] [Revised: 05/28/2024] [Accepted: 06/10/2024] [Indexed: 06/26/2024]
Abstract
The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and on time. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Out of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.
Affiliation(s)
- Congfan Bu
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Xinchang Zheng
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xuetong Zhao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Tianyi Xu
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Xue Bai
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Yaokai Jia
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Meili Chen
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Lili Hao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Jingfa Xiao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenming Zhao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Bixia Tang
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Yiming Bao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
6. Nanduri S, Black A, Bedford T, Huddleston J. Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2. bioRxiv 2024:2024.02.07.579374. [PMID: 39253501 PMCID: PMC11383015 DOI: 10.1101/2024.02.07.579374] [Indexed: 09/11/2024]
Abstract
Public health researchers and practitioners commonly infer phylogenies from viral genome sequences to understand transmission dynamics and identify clusters of genetically-related samples. However, viruses that reassort or recombine violate phylogenetic assumptions and require more sophisticated methods. Even when phylogenies are appropriate, they can be unnecessary or difficult to interpret without specialty knowledge. For example, pairwise distances between sequences can be enough to identify clusters of related samples or assign new samples to existing phylogenetic clusters. In this work, we tested whether dimensionality reduction methods could capture known genetic groups within two human pathogenic viruses that cause substantial human morbidity and mortality and frequently reassort or recombine, respectively: seasonal influenza A/H3N2 and SARS-CoV-2. We applied principal component analysis (PCA), multidimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) to sequences with well-defined phylogenetic clades and either reassortment (H3N2) or recombination (SARS-CoV-2). For each low-dimensional embedding of sequences, we calculated the correlation between pairwise genetic and Euclidean distances in the embedding and applied a hierarchical clustering method to identify clusters in the embedding. We measured the accuracy of clusters compared to previously defined phylogenetic clades, reassortment clusters, or recombinant lineages. We found that MDS embeddings accurately represented pairwise genetic distances including the intermediate placement of recombinant SARS-CoV-2 lineages between parental lineages. Clusters from t-SNE embeddings accurately recapitulated known phylogenetic clades, H3N2 reassortment groups, and SARS-CoV-2 recombinant lineages. 
We show that simple statistical methods without a biological model can accurately represent known genetic relationships for relevant human pathogenic viruses. Our open source implementation of these methods for analysis of viral genome sequences can be easily applied when phylogenetic methods are either unnecessary or inappropriate.
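The evaluation step described in this abstract (embed sequences, then correlate pairwise genetic distances with Euclidean distances in the embedding) can be illustrated with a small sketch. This is not the authors' implementation: it assumes classical (Torgerson) MDS, Hamming distance as the genetic distance, and a handful of invented toy sequences; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def hamming_matrix(seqs):
    """Pairwise Hamming distances (mismatched sites) between aligned sequences."""
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2).astype(float)

def classical_mds(dist, n_components=2):
    """Classical MDS: eigendecompose the double-centered squared distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise for non-Euclidean distances.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def euclidean_matrix(points):
    """Pairwise Euclidean distances between rows of a coordinate array."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

def distance_correlation(dist, embedding):
    """Pearson correlation of genetic vs. embedding distances over unique pairs."""
    upper = np.triu_indices(dist.shape[0], k=1)
    return float(np.corrcoef(dist[upper], euclidean_matrix(embedding)[upper])[0, 1])

# Toy aligned sequences (invented for illustration): two groups of related samples.
toy_seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
            "GGGGGAAAAA", "GGGGGAAAAT", "GGGGGCAAAA"]
genetic = hamming_matrix(toy_seqs)
embedding = classical_mds(genetic, n_components=2)
r = distance_correlation(genetic, embedding)
print(f"Pearson r between genetic and embedding distances: {r:.3f}")
```

A correlation near 1 would indicate that the two-dimensional embedding preserves the pairwise genetic distances well; real analyses would use full genome alignments and may use a different MDS variant or distance measure.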
Affiliation(s)
- Sravani Nanduri
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Allison Black
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Trevor Bedford
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
- John Huddleston
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
7. Renner SS, Scherz MD, Schoch CL, Gottschling M, Vences M. Improving the gold standard in NCBI GenBank and related databases: DNA sequences from type specimens and type strains. Syst Biol 2024; 73:486-494. [PMID: 37956405 PMCID: PMC11502950 DOI: 10.1093/sysbio/syad068] [Received: 05/16/2023] [Revised: 08/21/2023] [Accepted: 11/11/2023] [Indexed: 11/15/2023] Open
Abstract
Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 739 species of arthropods, 1542 vertebrates, and 125 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.
Affiliation(s)
- Susanne S Renner
  - Department of Biology, Washington University, Saint Louis, MO 63130, USA
- Mark D Scherz
  - Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, Copenhagen 2100, Denmark
- Conrad L Schoch
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Marc Gottschling
  - Faculty of Biology, GeoBio-Center, Ludwig-Maximilians-University, Munich 80333, Germany
- Miguel Vences
  - Division of Evolutionary Biology, Zoological Institute, University of Technology, Mendelssohnstr. 4, 38106 Braunschweig, Germany
8. Sharaf A, Nesengani LT, Hayah I, Kuja JO, Mdyogolo S, Omotoriogun TC, Odogwu BA, Beedessee G, Smith RM, Barakat A, Moila AM, El Hamouchi A, Benkahla A, Boukteb A, Elmouhtadi A, Mafwila AL, Abushady AM, Elsherif AK, Ahmed B, Wairuri C, Ndiribe CC, Ebuzome C, Kinnear CJ, Ndlovu DF, Iraqi D, El Fahime E, Assefa E, Ouardi F, Belharfi FZ, Tmimi FZ, Markey FB, Radouani F, Zeukeng F, Mvumbi GL, Ganesan H, Hanachi M, Nigussie H, Charoute H, Benamri I, Mkedder I, Haddadi I, Meftah-Kadmiri I, Mubiru JF, Domelevo Entfellner JBK, Rokani JB, Ogwang J, Daiga JB, Omumbo J, Ideozu JE, Errafii K, Labuschagne K, Komi KK, Tonfack LB, Hadjeras L, Ramantswana M, Chaisi M, Botes MW, Kilian M, Kvas M, Melloul M, Chaouch M, Khyatti M, Abdo M, Phasha-Muchemenye M, Hijri M, Mediouni MR, Hassan MA, Piro M, Mwale M, Maaloum M, Mavhunga M, Olivier NA, Aminou O, Arbani O, Souiai O, Djocgoue PF, Mentag R, Zipfel RD, Tata RB, Megnekou R, Muzemil S, Paez S, Salifu SP, Kagame SP, Selka S, Edwards S, Gaouar SBS, Reda SRA, Fellahi S, Khayi S, Ayed S, Madisha T, Sahil T, Udensi OU, Ras V, Ezebuiro V, Duru VC, David X, Geberemichael Y, Tchiechoua YH, Mungloo-Dilmohamud Z, Chen Z, Happi C, Kariuki T, Ziyomo C, Djikeng A, Badaoui B, Mapholi N, Muigai A, Osuji JO, Ebenezer TE. Establishing African genomics and bioinformatics programs through annual regional workshops. Nat Genet 2024:10.1038/s41588-024-01807-6. [PMID: 38977855 DOI: 10.1038/s41588-024-01807-6] [Received: 01/27/2024] [Accepted: 05/22/2024] [Indexed: 07/10/2024]
Abstract
The African BioGenome Project (AfricaBP) Open Institute for Genomics and Bioinformatics aims to overcome barriers to capacity building through its distributed African regional workshops and prioritizes the exchange of grassroots knowledge and innovation in biodiversity genomics and bioinformatics. In 2023, we implemented 28 workshops on biodiversity genomics and bioinformatics, covering 11 African countries across the 5 African geographical regions. These regional workshops trained 408 African scientists in hands-on molecular biology, genomics and bioinformatics techniques as well as the ethical, legal and social issues associated with acquiring genetic resources. Here, we discuss the implementation of transformative strategies, such as expanding the regional workshop model of AfricaBP to involve multiple countries, institutions and partners, including the proposed creation of an African digital database with sequence information relating to both biodiversity and agriculture. This will ultimately help create a critical mass of skilled genomics and bioinformatics scientists across Africa.
Affiliation(s)
- Abdoallah Sharaf
  - SequAna Core Facility, Department of Biology, University of Konstanz, Konstanz, Germany
  - Genetics Department, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
- Lucky Tendani Nesengani
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Sinebongo Mdyogolo
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Taiwo Crossby Omotoriogun
  - Department of Biological Sciences, Elizade University, Ilara-Mokin, Nigeria
  - A. P. Leventis Ornithological Research Institute, University of Jos, Jos, Nigeria
- Blessing Adanta Odogwu
  - Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria
  - South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria
- Girish Beedessee
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle-upon-Tyne, UK
- Rae Marvin Smith
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Adil El Hamouchi
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Alia Benkahla
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Amal Boukteb
  - Field Crops Laboratory, National Institute of Agricultural Research of Tunisia (INRAT), University of Carthage, Tunis, Tunisia
- Amine Elmouhtadi
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
- Antoine Lusala Mafwila
  - Laboratory of Molecular Biology, Department of Basic Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Asmaa Mohammed Abushady
  - Genetics Department, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
  - Biotechnology School, Nile University, Giza, Egypt
- Bulbul Ahmed
  - African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
- Craig J Kinnear
  - South African Medical Research Council Genomics Platform, Cape Town, South Africa
- Driss Iraqi
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
| | | | - Ermias Assefa
- Bio and Emerging Technology Institute, Addis Ababa, Ethiopia
| | - Faissal Ouardi
- Faculty of Sciences, Mohammed V University, Rabat, Morocco
| | - Fatima Zohra Belharfi
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | | | - Fatu Badiane Markey
- Science for Africa Foundation, Nairobi, Kenya
- Rutgers University-Newark, Newark, NJ, USA
| | - Fouzia Radouani
- Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
| | - Francis Zeukeng
- Biotechnology Centre, University of Yaoundé 1, Yaoundé, Cameroon
| | - Georges Lelo Mvumbi
- Laboratory of Molecular Biology, Department of Basic Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
| | | | - Mariem Hanachi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Helen Nigussie
- Department of Microbial Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
| | - Hicham Charoute
- Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
| | - Ichrak Benamri
- Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
| | - Ikram Mkedder
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | - Imane Haddadi
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | - Issam Meftah-Kadmiri
- Plant and Microbial Biotechnology Center, Moroccan Foundation for Advanced Science, Innovation and Research, University Mohammed VI Polytechnic, Ben Guerir, Morocco
| | - Jackson Franco Mubiru
- Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
| | | | - Joan Bayowa Rokani
- Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
| | - Joel Ogwang
- Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
| | | | - Judy Omumbo
- Science for Africa Foundation, Nairobi, Kenya
| | | | - Khaoula Errafii
- African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
| | - Kim Labuschagne
- Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
| | - Komi Koukoura Komi
- Laboratoire des Sciences Biomédicales, Alimentaires et de Santé Environnementale (LaSBASE), Département des Analyses Biomédicales (AMB), Ecole Supérieure des Techniques Biologiques et Alimentaires (ESTBA), Université de Lomé, Lomé, Togo
| | | | | | | | - Mamohale Chaisi
- Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
| | - Marietjie W Botes
- Division of Medicine, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | | | - Marija Kvas
- Separations (Pty) Ltd, Johannesburg, South Africa
| | - Marouane Melloul
- National Center for Scientific and Technical Research, Rabat, Morocco
| | - Melek Chaouch
- Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Meriem Khyatti
- Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
| | | | | | - Mohamed Hijri
- African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
| | - Mohammed Rida Mediouni
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | | | - Mohammed Piro
- Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
| | - Monica Mwale
- Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
| | | | - Mudzuli Mavhunga
- Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
| | - Nicholas Abraham Olivier
- Department of Plant and Soil Sciences, University of Pretoria, Pretoria, South Africa
- Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa
| | - Oumaima Aminou
- Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
| | - Oumayma Arbani
- Department of Veterinary Pathology and Public Health, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
| | - Oussema Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | | | - Rachid Mentag
- Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
| | - Renate Dorothea Zipfel
- Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
| | | | - Rosette Megnekou
- Biotechnology Centre, University of Yaoundé 1, Yaoundé, Cameroon
| | | | - Sadye Paez
- Department of Neurogenetics of Language, Rockefeller University, New York, NY, USA
| | - Samson Pandam Salifu
- Faculty of Bioscience, College of Science, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | | | - Sarra Selka
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | | | - Semir Bechir Suheil Gaouar
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | | | - Siham Fellahi
- Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
| | - Slimane Khayi
- Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
| | - Soumia Ayed
- Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
| | - Thabang Madisha
- Agricultural Research Council, Biotechnology Platform, Pretoria, South Africa
| | | | | | - Verena Ras
- University of Cape Town, Cape Town, South Africa
| | - Victor Ezebuiro
- Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria
- South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria
| | - Vincent C Duru
- Department of Parasitology and Entomology, Nnamdi Azikiwe University, Awka, Nigeria
| | | | | | - Yves H Tchiechoua
- Department of Biology, Chemistry and Pharmacy, Free University Berlin, Berlin, Germany
| | | | | | - Christian Happi
- African Centre of Excellence for Genomics of Infectious Diseases, Redeemer's University, Ede, Nigeria
| | | | | | - Appolinaire Djikeng
- College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- International Livestock Research Institute, Nairobi, Kenya
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Edinburgh, UK
| | - Bouabid Badaoui
- Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco.
- African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco.
| | - Ntanganedzeni Mapholi
- College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa.
| | - Anne Muigai
- National Defence University-Kenya, Nakuru, Kenya.
- Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya.
| | - Julian O Osuji
- Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria.
- South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria.
- Department of Plant Science and Biotechnology, University of Port Harcourt, Port Harcourt, Nigeria.
| | - ThankGod Echezona Ebenezer
- Early Cancer Institute, Department of Oncology, School of Clinical Medicine, University of Cambridge, Cambridge, UK.
|
9
|
Lin D, McAuliffe M, Pruitt KD, Gururaj A, Melchior C, Schmitt C, Wright SN. Biomedical Data Repository Concepts and Management Principles. Sci Data 2024; 11:622. [PMID: 38871749 PMCID: PMC11176378 DOI: 10.1038/s41597-024-03449-z] [Received: 01/03/2024] [Accepted: 05/31/2024] [Indexed: 06/15/2024] Open
Abstract
The demand for open data and open science is on the rise, fueled by expectations from the scientific community, calls to increase transparency and reproducibility in research findings, and developments such as the Final Data Management and Sharing Policy from the U.S. National Institutes of Health and a memorandum on increasing public access to federally funded research, issued by the U.S. Office of Science and Technology Policy. This paper explores the pivotal role of data repositories in biomedical research and open science, emphasizing their importance in managing, preserving, and sharing research data. Our objective is to familiarize readers with the functions of data repositories, set expectations for their services, and provide an overview of methods to evaluate their capabilities. The paper serves to introduce fundamental concepts and community-based guiding principles and aims to equip researchers, repository operators, funders, and policymakers with the knowledge to select appropriate repositories for their data management and sharing needs and foster a foundation for the open sharing and preservation of research data.
Affiliation(s)
- Dawei Lin
- National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health, Bethesda, Maryland, USA.
| | - Matthew McAuliffe
- Center of Information Technology (CIT), National Institutes of Health, Bethesda, Maryland, USA.
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
| | - Anupama Gururaj
- National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health, Bethesda, Maryland, USA
| | - Christine Melchior
- Center for Scientific Review (CSR), National Institutes of Health, Bethesda, Maryland, USA
| | - Charles Schmitt
- National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health, Durham, North Carolina, USA
| | - Susan N Wright
- National Institute on Drug Abuse (NIDA), National Institutes of Health, Bethesda, Maryland, USA
|
10
|
Masuda Y, Mise K, Xu Z, Zhang Z, Shiratori Y, Senoo K, Itoh H. Global soil metagenomics reveals distribution and predominance of Deltaproteobacteria in nitrogen-fixing microbiome. Microbiome 2024; 12:95. [PMID: 38790049 PMCID: PMC11127431 DOI: 10.1186/s40168-024-01812-1] [Received: 04/28/2023] [Accepted: 04/09/2024] [Indexed: 05/26/2024]
Abstract
BACKGROUND: Biological nitrogen fixation is a fundamental process sustaining all life on earth. While distribution and diversity of N2-fixing soil microbes have been investigated by numerous PCR amplicon sequencing of nitrogenase genes, their comprehensive understanding has been hindered by lack of de facto standard protocols for amplicon surveys and possible PCR biases. Here, by fully leveraging the planetary collections of soil shotgun metagenomes along with recently expanded culture collections, we evaluated the global distribution and diversity of terrestrial diazotrophic microbiome. RESULTS: After the extensive analysis of 1,451 soil metagenomic samples, we revealed that the Anaeromyxobacteraceae and Geobacteraceae within Deltaproteobacteria are ubiquitous groups of diazotrophic microbiome in the soils with different geographic origins and land usage types, with particular predominance in anaerobic soils (paddy soils and sediments). CONCLUSION: Our results indicate that Deltaproteobacteria is a core bacterial taxon in the potential soil nitrogen fixation population, especially in anaerobic environments, which encourages a careful consideration on deltaproteobacterial diazotrophs in understanding terrestrial nitrogen cycling.
Affiliation(s)
- Yoko Masuda
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.
| | - Kazumori Mise
- National Institute of Advanced Industrial Science and Technology (AIST) Hokkaido, 2-17-2-1 Tsukisamu-higashi, Toyohira, Sapporo, Hokkaido, 062-8517, Japan.
| | - Zhenxing Xu
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
| | - Zhengcheng Zhang
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
| | - Yutaka Shiratori
- Niigata Agricultural Research Institute, 857 Nagakura-machi, Nagaoka, Niigata, 940-0826, Japan
| | - Keishi Senoo
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
| | - Hideomi Itoh
- National Institute of Advanced Industrial Science and Technology (AIST) Hokkaido, 2-17-2-1 Tsukisamu-higashi, Toyohira, Sapporo, Hokkaido, 062-8517, Japan.
|
11
|
Basenko EY, Shanmugasundram A, Böhme U, Starns D, Wilkinson PA, Davison HR, Crouch K, Maslen G, Harb OS, Amos B, McDowell MA, Kissinger JC, Roos DS, Jones A. What is new in FungiDB: a web-based bioinformatics platform for omics-scale data analysis for fungal and oomycete species. Genetics 2024; 227:iyae035. [PMID: 38529759 PMCID: PMC11075537 DOI: 10.1093/genetics/iyae035] [Received: 12/18/2023] [Accepted: 02/15/2024] [Indexed: 03/27/2024] Open
Abstract
FungiDB (https://fungidb.org) serves as a valuable online resource that seamlessly integrates genomic and related large-scale data for a wide range of fungal and oomycete species. As an integral part of the VEuPathDB Bioinformatics Resource Center (https://veupathdb.org), FungiDB continually integrates both published and unpublished data addressing various aspects of fungal biology. Established in early 2011, the database has evolved to support 674 datasets. The datasets include over 300 genomes spanning various taxa (e.g. Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Mucoromycota, as well as Albuginales, Peronosporales, Pythiales, and Saprolegniales). In addition to genomic assemblies and annotation, over 300 extra datasets encompassing diverse information, such as expression and variation data, are also available. The resource also provides an intuitive web-based interface, facilitating comprehensive approaches to data mining and visualization. Users can test their hypotheses and navigate through omics-scale datasets using a built-in search strategy system. Moreover, FungiDB offers capabilities for private data analysis via the integrated VEuPathDB Galaxy platform. FungiDB also permits genome improvements by capturing expert knowledge through the User Comments system and the Apollo genome annotation editor for structural and functional gene curation. FungiDB facilitates data exploration and analysis and contributes to advancing research efforts by capturing expert knowledge for fungal and oomycete species.
Affiliation(s)
- Evelina Y Basenko
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
| | - Achchuthan Shanmugasundram
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
- Genomics England Limited, London E14 5AB, UK
| | - Ulrike Böhme
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
| | - David Starns
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
| | - Paul A Wilkinson
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
| | - Helen R Davison
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
| | - Kathryn Crouch
- School of Infection and Immunity, University of Glasgow, Glasgow G12 8QQ, UK
| | | | - Omar S Harb
- University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | | | | | - David S Roos
- University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Andrew Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK
|
12
|
Fryssouli V, Polemis E, Typas MA, Zervakis GI. Revisiting the phylogeny and taxonomy of the genus Sidera (Hymenochaetales, Basidiomycota) with particular emphasis on S. vulgaris. MycoKeys 2024; 105:119-137. [PMID: 38752164 PMCID: PMC11094396 DOI: 10.3897/mycokeys.105.121601] [Received: 02/24/2024] [Accepted: 04/01/2024] [Indexed: 05/18/2024] Open
Abstract
The genus Sidera (Hymenochaetales, Basidiomycota) comprises white-rot, mono- or dimitic fungi with poroid or hydnoid hymenophore. It has a worldwide distribution albeit with fewer species present in the Southern Hemisphere. Although recent studies revealed the existence of several new Sidera species, there are still taxonomic inconsistencies and obscure phylogenetic relationships amongst certain taxa of the genus. In this work, a large number of Sidera collections were used to obtain an updated phylogeny, based on ITS and 28S rDNA sequences by including new material from Mediterranean Europe. The monophyly of the genus was strongly supported and all species with poroid hymenophore formed a highly-supported lineage with two major subclades. In total, 23 putative species were recognised. Amongst those, five are considered to possibly represent entities new to science, but further work is required since they are represented by single specimens or environmental sequences. Examined collections originally named S. lenis from southern Europe were grouped within S. vulgaris. Similarly, several collections under various names were hereby identified as S. vulgaris, including those of the recently described species S. tibetica. Furthermore, a critical discussion (based on morphoanatomical findings) is made on the key features that could be used to distinguish S. lenis from S. vulgaris.
Affiliation(s)
- Vassiliki Fryssouli
- Agricultural University of Athens, Laboratory of General and Agricultural Microbiology, Iera Odos 75, 11855 Athens, Greece
| | - Elias Polemis
- Agricultural University of Athens, Laboratory of General and Agricultural Microbiology, Iera Odos 75, 11855 Athens, Greece
| | - Milton A. Typas
- National and Kapodistrian University of Athens, Department of Genetics and Biotechnology, Faculty of Biology, Panepistemiopolis, Athens 15701, Greece
| | - Georgios I. Zervakis
- Agricultural University of Athens, Laboratory of General and Agricultural Microbiology, Iera Odos 75, 11855 Athens, Greece
|
13
|
Meng A, Li X, Li Z, Miao F, Ma L, Li S, Sun W, Huang J, Yang G. Genome assembly of Melilotus officinalis provides a new reference genome for functional genomics. BMC Genom Data 2024; 25:37. [PMID: 38637749 PMCID: PMC11025269 DOI: 10.1186/s12863-024-01224-y] [Received: 01/02/2024] [Accepted: 04/10/2024] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND: Sweet yellow clover (Melilotus officinalis) is a diploid plant (2n = 16) that is native to Europe. It is an excellent legume forage. It can both fix nitrogen and serve as a medicine. A genome assembly of Melilotus officinalis that was collected from Best corporation in Beijing is available based on Nanopore sequencing. The genome of Melilotus officinalis was sequenced, assembled, and annotated. RESULTS: The latest PacBio third generation HiFi assembly and sequencing strategies were used to produce a Melilotus officinalis genome assembly size of 1,066 Mbp, contig N50 = 5 Mbp, scaffold N50 = 130 Mbp, and complete benchmarking universal single-copy orthologs (BUSCOs) = 96.4%. This annotation produced 47,873 high-confidence gene models, which will substantially aid in our research on molecular breeding. A collinear analysis showed that Melilotus officinalis and Medicago truncatula shared conserved synteny. The expansion and contraction of gene families showed that Melilotus officinalis expanded by 565 gene families and shrank by 56 gene families. The contracted gene families were associated with response to stimulus, nucleotide binding, and small molecule binding. Thus, it is related to a family of genes associated with peptidase activity, which could lead to better stress tolerance in plants. CONCLUSIONS: In this study, the latest PacBio technology was used to assemble and sequence the genome of Melilotus officinalis and annotate its protein-coding genes. These results will expand the genomic resources available for Melilotus officinalis and should assist in subsequent research on sweet yellow clover plants.
Affiliation(s)
- Aoran Meng
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Xinru Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Zhiguang Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Fuhong Miao
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Lichao Ma
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Shuo Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | - Wenfei Sun
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
| | | | - Guofeng Yang
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China.
|
14
|
Iketani S, Ho DD. SARS-CoV-2 resistance to monoclonal antibodies and small-molecule drugs. Cell Chem Biol 2024; 31:632-657. [PMID: 38640902 PMCID: PMC11084874 DOI: 10.1016/j.chembiol.2024.03.008] [Received: 09/07/2023] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 04/21/2024]
Abstract
Over four years have passed since the beginning of the COVID-19 pandemic. The scientific response has been rapid and effective, with many therapeutic monoclonal antibodies and small molecules developed for clinical use. However, given the ability for viruses to become resistant to antivirals, it is perhaps no surprise that the field has identified resistance to nearly all of these compounds. Here, we provide a comprehensive review of the resistance profile for each of these therapeutics. We hope that this resource provides an atlas for mutations to be aware of for each agent, particularly as a springboard for considerations for the next generation of antivirals. Finally, we discuss the outlook and thoughts for moving forward in how we continue to manage this, and the next, pandemic.
Affiliation(s)
- Sho Iketani
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA; Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
| | - David D Ho
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA; Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA.
|
15
|
Santangelo BE, Apgar M, Colorado ASB, Martin CG, Sterrett J, Wall E, Joachimiak MP, Hunter LE, Lozupone CA. Integrating biological knowledge for mechanistic inference in the host-associated microbiome. Front Microbiol 2024; 15:1351678. [PMID: 38638909 PMCID: PMC11024261 DOI: 10.3389/fmicb.2024.1351678] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 04/20/2024] Open
Abstract
Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.
Affiliation(s)
- Brook E. Santangelo
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| | - Madison Apgar
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| | | | - Casey G. Martin
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| | - John Sterrett
- Department of Integrative Physiology, University of Colorado, Boulder, CO, United States
| | - Elena Wall
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| | - Marcin P. Joachimiak
- Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Biosystems Data Science Department, Berkeley, CA, United States
| | - Lawrence E. Hunter
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| | - Catherine A. Lozupone
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
| |
|
16
|
Calhoun VC, Hatcher EL, Yankie L, Nawrocki EP. Influenza sequence validation and annotation using VADR. bioRxiv 2024:2024.03.21.585980. [PMID: 38712272 PMCID: PMC11071281 DOI: 10.1101/2024.03.21.585980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLAN has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions, and has been publicly available as a webserver but not as a standalone tool. VADR is a general sequence validation and annotation software package used by GenBank for Norovirus, Dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use.
Affiliation(s)
- Vincent C. Calhoun
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, United States
| | - Eneida L. Hatcher
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, United States
| | - Linda Yankie
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, United States
| | - Eric P. Nawrocki
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, United States
| |
|
17
|
Olbrich M, Bartels L, Wohlers I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Frontiers in Bioinformatics 2024; 4:1384497. [PMID: 38567256 PMCID: PMC10985184 DOI: 10.3389/fbinf.2024.1384497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024] Open
Affiliation(s)
- Michael Olbrich
- Center for Biotechnology, Khalifa University for Science and Technology, Abu Dhabi, United Arab Emirates
| | - Lennart Bartels
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
| | - Inken Wohlers
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
- University of Lübeck, Lübeck, Germany
| |
|
18
|
Miller JR, Adjeroh DA. Machine learning on alignment features for parent-of-origin classification of simulated hybrid RNA-seq. BMC Bioinformatics 2024; 25:109. [PMID: 38475727 DOI: 10.1186/s12859-024-05728-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/01/2024] [Indexed: 03/14/2024] Open
Abstract
BACKGROUND: Parent-of-origin allele-specific gene expression (ASE) can be detected in interspecies hybrids by virtue of RNA sequence variants between the parental haplotypes. ASE is detectable by differential expression analysis (DEA) applied to the counts of RNA-seq read pairs aligned to parental references, but aligners do not always choose the correct parental reference.
RESULTS: We used public data for species that are known to hybridize. We measured our ability to assign RNA-seq read pairs to their proper transcriptome or genome references. We tested software packages that assign each read pair to a reference position and found that they often favored the incorrect species reference. To address this problem, we introduce a post process that extracts alignment features and trains a random forest classifier to choose the better alignment. On each simulated hybrid dataset tested, our machine-learning post-processor achieved higher accuracy than the aligner by itself at choosing the correct parent-of-origin per RNA-seq read pair.
CONCLUSIONS: For the parent-of-origin classification of RNA-seq, machine learning can improve the accuracy of alignment-based methods. This approach could be useful for enhancing ASE detection in interspecies hybrids, though RNA-seq from real hybrids may present challenges not captured by our simulations. We believe this is the first application of machine learning to this problem domain.
Affiliation(s)
- Jason R Miller
- Department of Computer Science, Mathematics, Engineering, Shepherd University, Shepherdstown, WV, USA
- EVOGENE, Department of Biosciences, University of Oslo, Oslo, Norway
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
| | - Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
| |
|
19
|
Ara T, Kodama Y, Tokimatsu T, Fukuda A, Kosuge T, Mashima J, Tanizawa Y, Tanjo T, Ogasawara O, Fujisawa T, Nakamura Y, Arita M. DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata. Nucleic Acids Res 2024; 52:D67-D71. [PMID: 37971299 PMCID: PMC10767850 DOI: 10.1093/nar/gkad1046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 10/21/2023] [Accepted: 10/27/2023] [Indexed: 11/19/2023] Open
Abstract
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.
Affiliation(s)
- Takeshi Ara
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Yuichi Kodama
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Toshiaki Tokimatsu
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Asami Fukuda
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Takehide Kosuge
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Jun Mashima
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Yasuhiro Tanizawa
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Tomoya Tanjo
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Osamu Ogasawara
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Takatomo Fujisawa
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Yasukazu Nakamura
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Masanori Arita
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| |
|
20
|
Rigden DJ, Fernández XM. The 2024 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2024; 52:D1-D9. [PMID: 38035367 PMCID: PMC10767945 DOI: 10.1093/nar/gkad1173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 12/02/2023] Open
Abstract
The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven. Nucleic acid databases include the new NAKB for structural information and updates from GenBank, ENA, GEO, TarBase and JASPAR. The Issue's Breakthrough Article concerns NMPFamsDB for novel prokaryotic protein families and the AlphaFold Protein Structure Database has an important update. Metabolism is covered by updates from Reactome, WikiPathways and MetaboLights. Microbes are covered by RefSeq, UNITE, SPIRE and P10K; viruses by ViralZone and PhageScope. Medically-oriented databases include the familiar COSMIC, DrugBank and TTD. Genomics-related resources include Ensembl, UCSC Genome Browser and Monarch. New arrivals cover plant imaging (OPIA and PlantPAD) and crop plants (SoyMD, TCOD and CropGS-Hub). The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Over the last year the NAR online Molecular Biology Database Collection has been updated, reviewing 1060 entries, adding 97 new resources and eliminating 388 discontinued URLs bringing the current total to 1959 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
| |
|
21
|
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023] Open
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Affiliation(s)
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
| | - Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dionysios Grigoriadis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tatiana A Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| |
|
22
|
Sayers E, Beck J, Bolton E, Brister J, Chan J, Comeau D, Connor R, DiCuccio M, Farrell C, Feldgarden M, Fine A, Funk K, Hatcher E, Hoeppner M, Kane M, Kannan S, Katz K, Kelly C, Klimke W, Kim S, Kimchi A, Landrum M, Lathrop S, Lu Z, Malheiro A, Marchler-Bauer A, Murphy T, Phan L, Prasad A, Pujar S, Sawyer A, Schmieder E, Schneider V, Schoch C, Sharma S, Thibaud-Nissen F, Trawick B, Venkatapathi T, Wang J, Pruitt K, Sherry S. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2024; 52:D33-D43. [PMID: 37994677 PMCID: PMC10767890 DOI: 10.1093/nar/gkad1044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/24/2023] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jeff Beck
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jessica Chan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Michael DiCuccio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Michael Feldgarden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Anna M Fine
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Eneida Hatcher
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Marilu Hoeppner
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Megan Kane
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sivakumar Kannan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Christopher Kelly
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Avi Kimchi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Melissa Landrum
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Adriana Malheiro
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Arjun B Prasad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Amanda Sawyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Erin Schmieder
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Conrad L Schoch
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shobha Sharma
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Barton W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Thilakam Venkatapathi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| |
|
23
|
Haft DH, Badretdin A, Coulouris G, DiCuccio M, Durkin A, Jovenitti E, Li W, Mersha M, O’Neill K, Virothaisakun J, Thibaud-Nissen F. RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes. Nucleic Acids Res 2024; 52:D762-D769. [PMID: 37962425 PMCID: PMC10767926 DOI: 10.1093/nar/gkad988] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/15/2023] [Revised: 10/13/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023] Open
Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best-quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.
Affiliation(s)
- Daniel H Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Azat Badretdin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - George Coulouris
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Michael DiCuccio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - A Scott Durkin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Eric Jovenitti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Wenjun Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Megdelawit Mersha
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Kathleen R O’Neill
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Joel Virothaisakun
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
24
|
Bai X, Bao Y, Bei S, Bu C, Cao R, Cao Y, Cen H, Chao J, Chen F, Chen H, Chen K, Chen M, Chen M, Chen M, Chen Q, Chen R, Chen S, Chen T, Chen X, Chen X, Cheng Y, Chu Y, Cui Q, Dong L, Du Z, Duan G, Fan S, Fan Z, Fang X, Fang Z, Feng Z, Fu S, Gao F, Gao G, Gao H, Gao W, Gao X, Gao X, Gao X, Gong J, Gong J, Gou Y, Gu S, Guo AY, Guo G, Guo X, Han C, Hao D, Hao L, He Q, He S, He S, Hu W, Huang K, Huang T, Huang X, Huang Y, Jia P, Jia Y, Jiang C, Jiang M, Jiang S, Jiang T, Jiang X, Jin E, Jin W, Kang H, Kang H, Kong D, Lan L, Lei W, Li CY, Li C, Li C, Li H, Li J, Li J, Li L, Li P, Li R, Li X, Li Y, Li Y, Li Z, Liao X, Lin S, Lin Y, Ling Y, Liu B, Liu CJ, Liu D, Liu GH, Liu L, Liu S, Liu W, Liu X, Liu X, Liu Y, Liu Y, Lu M, Lu T, Luo H, Luo H, Luo M, Luo S, Luo X, Ma L, Ma Y, Mai J, Meng J, Meng X, Meng Y, Meng Y, Miao W, Miao YR, Ni L, Nie Z, Niu G, Niu X, Niu Y, Pan R, Pan S, Peng D, Peng J, Qi J, Qi Y, Qian Q, Qin Y, Qu H, Ren J, Ren J, Sang Z, Shang K, Shen WK, Shen Y, Shi Y, Song S, Song T, Su T, Sun J, Sun Y, Sun Y, Sun Y, Tang B, Tang D, Tang Q, Tang Z, Tian D, Tian F, Tian W, Tian Z, Wang A, Wang G, Wang G, Wang J, Wang J, Wang P, Wang P, Wang W, Wang Y, Wang Y, Wang Y, Wang Y, Wang Z, Wei H, Wei Y, Wei Z, Wu D, Wu G, Wu S, Wu S, Wu W, Wu W, Wu Z, Xia Z, Xiao J, Xiao L, Xiao Y, Xie G, Xie GY, Xie J, Xie Y, Xiong J, Xiong Z, Xu D, Xu S, Xu T, Xu T, Xue Y, Xue Y, Yan C, Yang D, Yang F, Yang F, Yang H, Yang J, Yang K, Yang N, Yang QY, Yang S, Yang X, Yang X, Yang X, Yang YG, Ye W, Yu C, Yu F, Yu S, Yuan C, Yuan H, Zeng J, Zhai S, Zhang C, Zhang F, Zhang G, Zhang M, Zhang P, Zhang Q, Zhang R, Zhang S, Zhang W, Zhang W, Zhang W, Zhang X, Zhang X, Zhang Y, Zhang Y, Zhang Y, Zhang YE, Zhang Y, Zhang Z, Zhang Z, Zhao D, Zhao F, Zhao G, Zhao M, Zhao W, Zhao W, Zhao X, Zhao Y, Zhao Y, Zhao Z, Zheng X, Zheng Y, Zhou C, Zhou H, Zhou X, Zhou X, Zhou Y, Zhou Y, Zhu J, Zhu L, Zhu R, Zhu T, Zong W, Zou D, Zuo Z. 
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res 2024; 52:D18-D32. [PMID: 38018256 PMCID: PMC10767964 DOI: 10.1093/nar/gkad1078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/12/2023] [Accepted: 10/27/2023] [Indexed: 11/30/2023] Open
Abstract
The National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support the global academic and industrial communities. With the rapid accumulation of multi-omics data at an unprecedented pace, CNCB-NGDC continuously expands and updates core database resources through big data archiving, integrative analysis and value-added curation. Importantly, NGDC collaborates closely with major international databases and initiatives to ensure seamless data exchange and interoperability. Over the past year, significant efforts have been dedicated to integrating diverse omics data, synthesizing expanding knowledge, developing new resources, and upgrading major existing resources. In particular, several database resources have been newly developed for the biodiversity of protists (P10K), bacteria (NTM-DB, MPA) and plants (PPGR, SoyOmics, PlantPan), as well as for disease/trait association (CROST, HervD Atlas, HALL, MACdb, BioKA, RePoS, PGG.SV, NAFLDkb). All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.
|
25
|
De Castro E, Hulo C, Masson P, Auchincloss A, Bridge A, Le Mercier P. ViralZone 2024 provides higher-resolution images and advanced virus-specific resources. Nucleic Acids Res 2024; 52:D817-D821. [PMID: 37897348 PMCID: PMC10767872 DOI: 10.1093/nar/gkad946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 10/09/2023] [Accepted: 10/12/2023] [Indexed: 10/30/2023] Open
Abstract
ViralZone (http://viralzone.expasy.org) is a knowledge repository for viruses that links biological knowledge and databases. It contains data on virion structure, genome, proteome, replication cycle and host-virus interactions. The new update provides better access to the data through contextual popups and higher resolution images in Scalable Vector Graphics (SVG) format. These images are designed to be dynamic and interactive with human viruses to give users better access to the data. In addition, a new coronavirus-specific resource provides regularly updated data on variants and molecular biology of SARS-CoV-2. Other virus-specific resources have been added to the database, particularly for HIV, herpesviruses and poxviruses.
Affiliation(s)
- Edouard De Castro
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| | - Chantal Hulo
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| | - Patrick Masson
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| | - Andrea Auchincloss
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| | - Alan Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| | - Philippe Le Mercier
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
| |
|
26
|
Yuan D, Ahamed A, Burgin J, Cummins C, Devraj R, Gueye K, Gupta D, Gupta V, Haseeb M, Ihsan M, Ivanov E, Jayathilaka S, Kadhirvelu VB, Kumar M, Lathi A, Leinonen R, McKinnon J, Meszaros L, O’Cathail C, Ouma D, Paupério J, Pesant S, Rahman N, Rinck G, Selvakumar S, Suman S, Sunthornyotin Y, Ventouratou M, Vijayaraja S, Waheed Z, Woollard P, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2023. Nucleic Acids Res 2024; 52:D92-D97. [PMID: 37956313 PMCID: PMC10767888 DOI: 10.1093/nar/gkad1067] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/18/2023] [Revised: 10/23/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.
Affiliation(s)
- David Yuan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alisha Ahamed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Josephine Burgin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rajkumar Devraj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Khadim Gueye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dipayan Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vikas Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Muhammad Haseeb
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Maira Ihsan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eugene Ivanov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suran Jayathilaka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Manish Kumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ankur Lathi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rasko Leinonen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jasmine McKinnon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Lili Meszaros
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Colman O’Cathail
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dennis Ouma
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Joana Paupério
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephane Pesant
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nadim Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Gabriele Rinck
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sandeep Selvakumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Swati Suman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yanisa Sunthornyotin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marianna Ventouratou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Senthilnathan Vijayaraja
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahra Waheed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter Woollard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ahmad Zyoud
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
27
|
Abarenkov K, Nilsson RH, Larsson KH, Taylor AS, May T, Frøslev TG, Pawlowska J, Lindahl B, Põldmaa K, Truong C, Vu D, Hosoya T, Niskanen T, Piirmann T, Ivanov F, Zirk A, Peterson M, Cheeke T, Ishigami Y, Jansson A, Jeppesen T, Kristiansson E, Mikryukov V, Miller J, Oono R, Ossandon F, Paupério J, Saar I, Schigel D, Suija A, Tedersoo L, Kõljalg U. The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered. Nucleic Acids Res 2024; 52:D791-D797. [PMID: 37953409 PMCID: PMC10767974 DOI: 10.1093/nar/gkad1039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023] Open
Abstract
UNITE (https://unite.ut.ee) is a web-based database and sequence management environment for molecular identification of eukaryotes. It targets the nuclear ribosomal internal transcribed spacer (ITS) region and offers nearly 10 million such sequences for reference. These are clustered into ∼2.4M species hypotheses (SHs), each assigned a unique digital object identifier (DOI) to promote unambiguous referencing across studies. UNITE users have contributed over 600 000 third-party sequence annotations, which are shared with a range of databases and other community resources. Recent improvements facilitate the detection of cross-kingdom biological associations and the integration of undescribed groups of organisms into everyday biological pursuits. Serving as a digital twin for eukaryotic biodiversity and communities worldwide, the latest release of UNITE offers improved avenues for biodiversity discovery, precise taxonomic communication and integration of biological knowledge across platforms.
Affiliation(s)
- Kessy Abarenkov
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
| | - R Henrik Nilsson
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden
- Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden
| | - Karl-Henrik Larsson
- Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden
- Natural History Museum, University of Oslo, Box 1172 Blindern, 0318 Oslo, Norway
| | - Andy F S Taylor
- The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, UK
- Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, St Machar Drive, Aberdeen AB24 3UU, UK
| | - Tom W May
- Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, VIC 3004, Australia
| | - Tobias Guldberg Frøslev
- Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Julia Pawlowska
- Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw, ul. Zwirki i Wigury 101, 02-089 Warsaw, Poland
| | - Björn Lindahl
- Swedish University of Agricultural Sciences, Department of Soil and Environment, Box 7014, SE-750 07 Uppsala, Sweden
| | - Kadri Põldmaa
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Camille Truong
- Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, VIC 3004, Australia
| | - Duong Vu
- Westerdijk Fungal Biodiversity Institute, The Netherlands
| | | | - Tuula Niskanen
- Botany Unit, Finnish Museum of Natural History, P.O.Box 7, 00014 University of Helsinki, Finland
| | - Timo Piirmann
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
| | - Filipp Ivanov
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
| | - Allan Zirk
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
| | - Marko Peterson
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Tanya E Cheeke
- School of Biological Sciences, Washington State University, 2710 Crimson Way, Richland, WA 9935, USA
| | - Yui Ishigami
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Arnold Tobias Jansson
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden
| | - Thomas Stjernegaard Jeppesen
- Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Erik Kristiansson
- Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
| | - Vladimir Mikryukov
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Joseph T Miller
- Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Ryoko Oono
- Department of Ecology, Evolution, and Marine Biology, University of California at Santa Barbara, USA
| | | | - Joana Paupério
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Irja Saar
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Dmitry Schigel
- Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Ave Suija
- Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia
| | - Leho Tedersoo
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| | - Urmas Kõljalg
- Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
| |
|
28
|
Eloe-Fadrosh EA, Mungall CJ, Miller MA, Smith M, Patil SS, Kelliher JM, Johnson LYD, Rodriguez FE, Chain PSG, Hu B, Thornton MB, McCue LA, McHardy AC, Harris NL, Reddy TBK, Mukherjee S, Hunter CI, Walls R, Schriml LM. A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. Methods Mol Biol 2024; 2802:587-609. [PMID: 38819573 DOI: 10.1007/978-1-0716-3838-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we describe its structure and terminology, explain how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.
Affiliation(s)
- Emiley A Eloe-Fadrosh
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - Christopher J Mungall
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Mark Andrew Miller
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Montana Smith
- Pacific Northwest National Laboratory, Richland, WA, USA
| | - Sujay Sanjeev Patil
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Julia M Kelliher
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Leah Y D Johnson
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | | | - Patrick S G Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Michael B Thornton
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Lee Ann McCue
- Pacific Northwest National Laboratory, Richland, WA, USA
| | - Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Nomi L Harris
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - T B K Reddy
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Supratim Mukherjee
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Christopher I Hunter
- GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
| | | | - Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
|
29
|
Zhao K, Farrell K, Mashiku M, Abay D, Tang K, Oberste MS, Burns CC. A search-based geographic metadata curation pipeline to refine sequencing institution information and support public health. Front Public Health 2023; 11:1254976. [PMID: 38035280 PMCID: PMC10683794 DOI: 10.3389/fpubh.2023.1254976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 10/19/2023] [Indexed: 12/02/2023] Open
Abstract
Background The National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) has amassed a vast reservoir of genetic data since its inception in 2007. These public data hold immense potential for supporting pathogen surveillance and control. However, the lack of standardized metadata and inconsistent submission practices in SRA may impede the data's utility in public health. Methods To address this issue, we introduce the Search-based Geographic Metadata Curation (SGMC) pipeline. SGMC utilized Python and web scraping to extract geographic data of sequencing institutions from NCBI SRA in the Cloud and its website. It then harnessed ChatGPT to refine the sequencing institution and location assignments. To illustrate the pipeline's utility, we examined the geographic distribution of the sequencing institutions and their countries relevant to polio eradication and categorized them. Results SGMC successfully identified 7,649 sequencing institutions and their global locations from a random selection of 2,321,044 SRA accessions. These institutions were distributed across 97 countries, with strong representation in the United States, the United Kingdom and China. However, there was a lack of data from African, Central Asian, and Central American countries, indicating potential disparities in sequencing capabilities. Comparison with manually curated data for U.S. institutions reveals SGMC's accuracy rates of 94.8% for institutions, 93.1% for countries, and 74.5% for geographic coordinates. Conclusion SGMC may represent a novel approach using a generative AI model to enhance geographic data (country and institution assignments) for large numbers of samples within SRA datasets. This information can be utilized to bolster public health endeavors.
Affiliation(s)
- Kun Zhao
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Katie Farrell
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Melchizedek Mashiku
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Dawit Abay
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
| | - Kevin Tang
- Division of Scientific Resources, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - M Steven Oberste
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Cara C Burns
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
|
30
|
Ma B, Lu C, Wang Y, Yu J, Zhao K, Xue R, Ren H, Lv X, Pan R, Zhang J, Zhu Y, Xu J. A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources. Nat Commun 2023; 14:7318. [PMID: 37951952 PMCID: PMC10640626 DOI: 10.1038/s41467-023-43000-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/08/2023] [Accepted: 10/27/2023] [Indexed: 11/14/2023] Open
Abstract
Soil harbors a vast expanse of unidentified microbes, termed microbial dark matter, which represent an untapped reservoir of microbial biodiversity and genetic resources that has yet to be fully explored. In this study, we conduct a large-scale excavation of soil microbial dark matter by reconstructing 40,039 metagenome-assembled genome bins (the SMAG catalogue) from 3304 soil metagenomes. We identify 16,530 of 21,077 species-level genome bins (SGBs) as unknown SGBs (uSGBs), which expand archaeal and bacterial diversity across the tree of life. We also illustrate the pivotal role of uSGBs in augmenting the soil microbiome's functional landscape and intra-species genome diversity, providing large proportions of the 43,169 biosynthetic gene clusters and 8545 CRISPR-Cas genes. Additionally, we determine that uSGBs contributed 84.6% of previously unexplored viral-host associations from the SMAG catalogue. The SMAG catalogue provides a useful genomic resource for further studies investigating soil microbial biodiversity and genetic resources.
Affiliation(s)
- Bin Ma
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Zhejiang University, Hangzhou, 310058, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Caiyu Lu
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Zhejiang University, Hangzhou, 310058, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Yiling Wang
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Zhejiang University, Hangzhou, 310058, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Jingwen Yu
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Kankan Zhao
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Zhejiang University, Hangzhou, 310058, China
| | - Ran Xue
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Hao Ren
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Xiaofei Lv
- Department of Environmental Engineering, China Jiliang University, Hangzhou, 310018, China
| | - Ronghui Pan
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
| | - Jiabao Zhang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing, 210008, China
| | - Yongguan Zhu
- Research Center for Eco-environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
| | - Jianming Xu
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Zhejiang University, Hangzhou, 310058, China
|
31
|
Grigson SR, Giles SK, Edwards RA, Papudeshi B. Knowing and Naming: Phage Annotation and Nomenclature for Phage Therapy. Clin Infect Dis 2023; 77:S352-S359. [PMID: 37932119 PMCID: PMC10627814 DOI: 10.1093/cid/ciad539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023] Open
Abstract
Bacteriophages, or phages, are viruses that infect bacteria, shaping microbial communities and ecosystems. They have gained attention as potential agents against antibiotic resistance. In phage therapy, lytic phages are preferred for their bacteria-killing ability, while temperate phages, which can transfer antibiotic resistance or toxin genes, are avoided. Selection relies on plaque morphology and genome sequencing. This review outlines annotating genomes, identifying critical genomic features, and assigning functional labels to protein-coding sequences. These annotations help prevent the transfer of unwanted genes, such as antimicrobial resistance or toxin genes, during phage therapy. Additionally, it covers the International Committee on Taxonomy of Viruses (ICTV), an established phage nomenclature system for simplified classification and communication. Accurate phage genome annotation and nomenclature provide insights into phage-host interactions, replication strategies, and evolution, accelerating our understanding of the diversity and evolution of phages and facilitating the development of phage-based therapies.
Affiliation(s)
- Susanna R Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
| | - Sarah K Giles
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
| | - Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
| | - Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
|
32
|
Avila Santos AP, Kabiru Nata'ala M, Kasmanas JC, Bartholomäus A, Keller-Costa T, Jurburg SD, Tal T, Camarinha-Silva A, Saraiva JP, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Rocha U. The AnimalAssociatedMetagenomeDB reveals a bias towards livestock and developed countries and blind spots in functional-potential studies of animal-associated microbiomes. Anim Microbiome 2023; 5:48. [PMID: 37798675 PMCID: PMC10552293 DOI: 10.1186/s42523-023-00267-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data, and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. From the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. 
The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/ .
Affiliation(s)
- Anderson Paulo Avila Santos
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
| | - Muhammad Kabiru Nata'ala
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
| | - Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, Brazil
| | - Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section 3.7 Geomicrobiology, 14473, Telegrafenberg, Potsdam, Germany
| | - Tina Keller-Costa
- Institute for Bioengineering and Biosciences (iBB) and Institute for Health and Bioeconomy (i4HB), Instituto Superior Tecnico (IST), Universidade de Lisboa, Lisbon, 1049-001, Portugal
| | - Stephanie D Jurburg
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
- German Centre of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, Leipzig, 04103, Germany
| | - Tamara Tal
- Department of Bioanalytical Ecotoxicology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Amélia Camarinha-Silva
- Hohenheim Center for Livestock Microbiome Research (HoLMiR), University of Hohenheim, Stuttgart, Germany
- Institute of Animal Science, University of Hohenheim, Stuttgart, Germany
| | - João Pedro Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
| | | | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Saxony, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße, 04103, Leipzig, Germany
- Institute for Theoretical Chemistry, Universität Wien, Währingerstraße 17, Vienna, A-1090, Austria
- Center for Scalable Data Analytics and Artificial Intelligence Dresden-Leipzig, Leipzig University, Leipzig, Germany
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Bogotá, Colombia
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
| | | | - Ulisses Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ GmbH, 04318, Leipzig, Germany
|
33
|
Rosenzweig AF, Burian J, Brady SF. Present and future outlooks on environmental DNA-based methods for antibiotic discovery. Curr Opin Microbiol 2023; 75:102335. [PMID: 37327680 PMCID: PMC11076179 DOI: 10.1016/j.mib.2023.102335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 04/28/2023] [Accepted: 05/17/2023] [Indexed: 06/18/2023]
Abstract
Novel antibiotics are in constant demand to combat a global increase in antibiotic-resistant infections. Bacterial natural products have been a long-standing source of antibiotic compounds, and metagenomic mining of environmental DNA (eDNA) has increasingly provided new antibiotic leads. The metagenomic small-molecule discovery pipeline can be divided into three main steps: surveying eDNA, retrieving a sequence of interest, and accessing the encoded natural product. Improvements in sequencing technology, bioinformatic algorithms, and methods for converting biosynthetic gene clusters into small molecules are steadily increasing our ability to discover metagenomically encoded antibiotics. We predict that, over the next decade, ongoing technological improvements will dramatically increase the rate at which antibiotics are discovered from metagenomes.
Affiliation(s)
- Adam F Rosenzweig
- Laboratory of Genetically Encoded Small Molecules, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
| | - Ján Burian
- Laboratory of Genetically Encoded Small Molecules, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
| | - Sean F Brady
- Laboratory of Genetically Encoded Small Molecules, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
|
34
|
Bao Y, Xue Y. From BIG Data Center to China National Center for Bioinformation. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:900-903. [PMID: 37832784 PMCID: PMC10928365 DOI: 10.1016/j.gpb.2023.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 09/30/2023] [Accepted: 10/07/2023] [Indexed: 10/15/2023]
Affiliation(s)
- Yiming Bao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Yongbiao Xue
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
|
35
|
Unal M, Bostanci E, Ozkul C, Acici K, Asuroglu T, Guzel MS. Crohn's Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome. Diagnostics (Basel) 2023; 13:2835. [PMID: 37685376 PMCID: PMC10486516 DOI: 10.3390/diagnostics13172835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 08/24/2023] [Accepted: 08/31/2023] [Indexed: 09/10/2023] Open
Abstract
Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been discovered to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI preprocessed graph representations and converted into a structured form. Seven well-known Machine Learning frameworks, including Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor, were used. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using various metrics such as accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar's test was conducted to assess the statistical significance of the experiment. The data was constructed using k-mer lengths of 3, 4 and 5. The Light Gradient Boosting Machine model outperformed the other models with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results predicting disease from raw sequence data. Finally, McNemar's test results found statistically significant differences between different Machine Learning approaches.
Affiliation(s)
- Metehan Unal
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
| | - Erkan Bostanci
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
| | - Ceren Ozkul
- Department of Pharmaceutical Microbiology, Faculty of Pharmacy, Hacettepe University, 06230 Ankara, Turkey
| | - Koray Acici
- Department of Artificial Intelligence and Data Engineering, Ankara University, 06830 Ankara, Turkey
| | - Tunc Asuroglu
- Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland
| | - Mehmet Serdar Guzel
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
|
36
|
Zhao Y, He B, Xu F, Li C, Xu Z, Su X, He H, Huang Y, Rossjohn J, Song J, Yao J. DeepAIR: A deep learning framework for effective integration of sequence and 3D structure to enable adaptive immune receptor analysis. SCIENCE ADVANCES 2023; 9:eabo5128. [PMID: 37556545 PMCID: PMC10411891 DOI: 10.1126/sciadv.abo5128] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/16/2022] [Accepted: 07/06/2023] [Indexed: 08/11/2023]
Abstract
Structural docking between the adaptive immune receptors (AIRs), including T cell receptors (TCRs) and B cell receptors (BCRs), and their cognate antigens is one of the most fundamental processes in adaptive immunity. However, current methods for predicting AIR-antigen binding largely rely on sequence-derived features of AIRs, omitting the structure features that are essential for binding affinity. In this study, we present a deep learning framework, termed DeepAIR, for the accurate prediction of AIR-antigen binding by integrating both sequence and structure features of AIRs. DeepAIR achieves a Pearson's correlation of 0.813 in predicting the binding affinity of TCR, and a median area under the receiver-operating characteristic curve (AUC) of 0.904 and 0.942 in predicting the binding reactivity of TCR and BCR, respectively. Meanwhile, using TCR and BCR repertoire, DeepAIR correctly identifies every patient with nasopharyngeal carcinoma and inflammatory bowel disease in test data. Thus, DeepAIR improves the AIR-antigen binding prediction that facilitates the study of adaptive immunity.
Affiliation(s)
- Yu Zhao
- AI Lab, Tencent, Shenzhen, China
| | - Bing He
- AI Lab, Tencent, Shenzhen, China
| | - Fan Xu
- AI Lab, Tencent, Shenzhen, China
| | - Chen Li
- Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | | | | | | | | | - Jamie Rossjohn
- Infection and Immunity Program and Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Institute of Infection and Immunity, Cardiff University School of Medicine, Heath Park, Cardiff, UK
| | - Jiangning Song
- AI Lab, Tencent, Shenzhen, China
- Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
|
37
|
Jackson KC, Pachter L. A standard for sharing spatial transcriptomics data. CELL GENOMICS 2023; 3:100374. [PMID: 37601972 PMCID: PMC10435375 DOI: 10.1016/j.xgen.2023.100374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2023]
Abstract
Spatial transcriptomic technologies have the potential to reveal critical relationships between the function of genes and cells and their spatial organization. Here, we provide a sharing model for spatial transcriptomics data with the aim of establishing a set of primary data and metadata needed to reproduce analyses and facilitate computational methods development.
Affiliation(s)
- Kayla C. Jackson
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
|
38
|
Karatzas E, Baltoumas FA, Aplakidou E, Kontou PI, Stathopoulos P, Stefanis L, Bagos PG, Pavlopoulos GA. Flame (v2.0): advanced integration and interpretation of functional enrichment results from multiple sources. Bioinformatics 2023; 39:btad490. [PMID: 37540207 PMCID: PMC10423032 DOI: 10.1093/bioinformatics/btad490] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/17/2023] [Revised: 05/31/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
Functional enrichment is the process of identifying implicated functional terms from a given input list of genes or proteins. In this article, we present Flame (v2.0), a web tool which offers a combinatorial approach through merging and visualizing results from widely used functional enrichment applications while also allowing various flexible input options. In this version, Flame utilizes the aGOtool, g:Profiler, WebGestalt, and Enrichr pipelines and presents their outputs separately or in combination following a visual analytics approach. For intuitive representations and easier interpretation, it uses interactive plots such as parameterizable networks, heatmaps, barcharts, and scatter plots. Users can also: (i) handle multiple protein/gene lists and analyse union and intersection sets simultaneously through interactive UpSet plots, (ii) automatically extract genes and proteins from free text through text-mining and Named Entity Recognition (NER) techniques, (iii) upload single nucleotide polymorphisms (SNPs) and extract their relative genes, or (iv) analyse multiple lists of differentially expressed proteins/genes after selecting them interactively from a parameterizable volcano plot. Compared to the previous version of 197 supported organisms, Flame (v2.0) currently allows enrichment for 14 436 organisms. AVAILABILITY AND IMPLEMENTATION Web Application: http://flame.pavlopouloslab.info. Code: https://github.com/PavlopoulosLab/Flame. Docker: https://hub.docker.com/r/pavlopouloslab/flame.
Affiliation(s)
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari (Athens), 16672, Greece
| | - Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari (Athens), 16672, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari (Athens), 16672, Greece
| | - Panagiota I Kontou
- Department of Mathematics, University of Thessaly, Lamia, 35100, Greece
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, 35131, Greece
| | - Panos Stathopoulos
- 1st Department of Neurology, Eginition Hospital, Athens, 11528, Greece
- School of Medicine, National and Kapodistrian University of Athens, Athens, 11527, Greece
| | - Leonidas Stefanis
- 1st Department of Neurology, Eginition Hospital, Athens, 11528, Greece
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, 35131, Greece
| | - Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari (Athens), 16672, Greece
- Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, Athens, 11527, Greece
- Hellenic Army Academy, Vari, 16673, Greece
|
39
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
40
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
| | - Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
| | - Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
| | - Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
| | - Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
41
Choi YM, Choi DH, Lee YQ, Koduru L, Lewis NE, Lakshmanan M, Lee DY. Mitigating biomass composition uncertainties in flux balance analysis using ensemble representations. Comput Struct Biotechnol J 2023; 21:3736-3745. [PMID: 37547082 PMCID: PMC10400880 DOI: 10.1016/j.csbj.2023.07.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 07/04/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023]
Abstract
The biomass equation is a critical component in genome-scale metabolic models (GEMs): it is used as the de facto objective function in flux balance analysis (FBA). This equation accounts for the quantities of all known biomass precursors that are required for cell growth based on the macromolecular and monomer compositions measured at certain conditions. However, it is often reported that the macromolecular composition of cells could change across different environmental conditions and thus the use of the same single biomass equation in FBA, under multiple conditions, is questionable. Herein, we first investigated the qualitative and quantitative variations of macromolecular compositions of three representative host organisms, Escherichia coli, Saccharomyces cerevisiae and Cricetulus griseus, across different environmental/genetic variations. While macromolecular building blocks such as RNA, protein, and lipid composition vary notably, changes in fundamental biomass monomer units such as nucleotides and amino acids are not appreciable. We also observed that flux predictions through FBA are quite sensitive to macromolecular compositions but not to the monomer compositions. Based on these observations, we propose ensemble representations of the biomass equation in FBA to account for the natural variation of cellular constituents. Such ensemble representations of biomass better predicted the flux through anabolic reactions, as they allow for flexibility in the biosynthetic demands of the cells. The current study clearly highlights that certain components of the biomass equation indeed vary across different conditions, and that an ensemble representation of the biomass equation in FBA that accounts for such natural variations could avoid inaccuracies that may arise from in silico simulations.
Affiliation(s)
- Yoon-Mi Choi
- School of Chemical Engineering, Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea
- Bioprocessing Technology Institute (BTI), Agency for Science, Technology and Research (A*STAR), Singapore
| | - Dong-Hyuk Choi
- School of Chemical Engineering, Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea
| | - Yi Qing Lee
- School of Chemical Engineering, Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea
| | - Lokanand Koduru
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore
| | - Nathan E. Lewis
- Departments of Pediatrics and Bioengineering, University of California, La Jolla, San Diego, USA
| | - Meiyappan Lakshmanan
- Bioprocessing Technology Institute (BTI), Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, and Centre for Integrative Biology and Systems medicinE (IBSE), Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - Dong-Yup Lee
- School of Chemical Engineering, Sungkyunkwan University, Suwon-si, Gyeonggi-do, Republic of Korea
- Bitwinners Pte. Ltd., Singapore
42
Kim PY, Kim AY, Newman JJ, Cella E, Bishop TC, Huwe PJ, Uchakina ON, McKallip RJ, Mack VL, Hill MP, Ogungbe IV, Adeyinka O, Jones S, Ware G, Carroll J, Sawyer JF, Densmore KH, Foster M, Valmond L, Thomas J, Azarian T, Queen K, Kamil JP. A collaborative approach to improving representation in viral genomic surveillance. PLOS Glob Public Health 2023; 3:e0001935. [PMID: 37467165 PMCID: PMC10355392 DOI: 10.1371/journal.pgph.0001935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 06/05/2023] [Indexed: 07/21/2023]
Abstract
The lack of routine viral genomic surveillance delayed the initial detection of SARS-CoV-2, allowing the virus to spread unfettered at the outset of the U.S. epidemic. Over subsequent months, poor surveillance enabled variants to emerge unnoticed. Against this backdrop, long-standing social and racial inequities have contributed to a greater burden of cases and deaths among minority groups. To begin to address these problems, we developed a new variant surveillance model geared toward building 'next generation' genome sequencing capacity at universities in or near rural areas and engaging the participation of their local communities. The resulting genomic surveillance network has generated more than 1,000 SARS-CoV-2 genomes to date, including the first confirmed case in northeast Louisiana of Omicron, and the first and sixth confirmed cases in Georgia of the emergent BA.2.75 and BQ.1.1 variants, respectively. In agreement with other studies, significantly higher viral gene copy numbers were observed in Delta variant samples compared to those from Omicron BA.1 variant infections, and lower copy numbers were seen in asymptomatic infections relative to symptomatic ones. Collectively, the results and outcomes from our collaborative work demonstrate that establishing genomic surveillance capacity at smaller academic institutions in rural areas and fostering relationships between academic teams and local health clinics represent a robust pathway to improve pandemic readiness.
Affiliation(s)
- Paul Y. Kim
- Department of Biological Sciences, Grambling State University, Grambling, LA, United States of America
| | - Audrey Y. Kim
- Department of Biological Sciences, Grambling State University, Grambling, LA, United States of America
| | - Jamie J. Newman
- School of Biological Sciences, Louisiana Tech University, Ruston, LA, United States of America
| | - Eleonora Cella
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, United States of America
| | - Thomas C. Bishop
- Physics and Chemistry Programs, Louisiana Tech University, Ruston, LA, United States of America
| | - Peter J. Huwe
- Mercer University School of Medicine, Macon, GA, United States of America
| | - Olga N. Uchakina
- Mercer University School of Medicine, Macon, GA, United States of America
| | - Robert J. McKallip
- Mercer University School of Medicine, Macon, GA, United States of America
| | - Vance L. Mack
- Mercer Medicine, Macon, GA, United States of America
| | | | - Ifedayo Victor Ogungbe
- Department of Chemistry, Jackson State University, Jackson, MS, United States of America
| | - Olawale Adeyinka
- Department of Chemistry, Jackson State University, Jackson, MS, United States of America
| | - Samuel Jones
- Health Services Center, Jackson State University, Jackson, MS, United States of America
| | - Gregory Ware
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
| | - Jennifer Carroll
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
| | - Jarrod F. Sawyer
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
| | - Kenneth H. Densmore
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
| | - Michael Foster
- School of Biological Sciences, Louisiana Tech University, Ruston, LA, United States of America
| | - Lescia Valmond
- Department of Biological Sciences, Grambling State University, Grambling, LA, United States of America
| | - John Thomas
- Department of Biological Sciences, Grambling State University, Grambling, LA, United States of America
| | - Taj Azarian
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, United States of America
| | - Krista Queen
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
| | - Jeremy P. Kamil
- Center of Excellence for Emerging Viral Threats, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
- Department of Microbiology and Immunology, Louisiana State University Health Shreveport, Shreveport, LA, United States of America
43
Cuzick A, Seager J, Wood V, Urban M, Rutherford K, Hammond-Kosack KE. A framework for community curation of interspecies interactions literature. eLife 2023; 12:e84658. [PMID: 37401199 DOI: 10.7554/elife.84658] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/02/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023]
Abstract
The quantity and complexity of data being generated and published in biology have increased substantially, but few methods exist for capturing knowledge about phenotypes derived from molecular interactions between diverse groups of species in a way that is amenable to data-driven biology and research. To improve access to this knowledge, we have constructed a framework for the curation of the scientific literature studying interspecies interactions, using data curated for the Pathogen-Host Interactions database (PHI-base) as a case study. The framework provides a curation tool, phenotype ontology, and controlled vocabularies to curate pathogen-host interaction data, at the level of the host, pathogen, strain, gene, and genotype. The concept of a multispecies genotype, the 'metagenotype,' is introduced to facilitate capturing changes in the disease-causing abilities of pathogens, and host resistance or susceptibility, observed by gene alterations. We report on this framework and describe PHI-Canto, a community curation tool for use by publication authors.
Affiliation(s)
- Alayne Cuzick
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - James Seager
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Martin Urban
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - Kim Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Kim E Hammond-Kosack
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
44
Malý M, Kolenko P, Stránský J, Švecová L, Dušková J, Koval’ T, Skálová T, Trundová M, Adámková K, Černý J, Božíková P, Dohnálek J. Tetracycline-modifying enzyme SmTetX from Stenotrophomonas maltophilia. Acta Crystallogr F Struct Biol Commun 2023; 79:180-192. [PMID: 37405486 PMCID: PMC10327574 DOI: 10.1107/s2053230x23005381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 06/16/2023] [Indexed: 07/06/2023]
Abstract
The resistance of the emerging human pathogen Stenotrophomonas maltophilia to tetracycline antibiotics mainly depends on multidrug efflux pumps and ribosomal protection enzymes. However, the genomes of several strains of this Gram-negative bacterium code for a FAD-dependent monooxygenase (SmTetX) homologous to tetracycline destructases. This protein was recombinantly produced and its structure and function were investigated. Activity assays using SmTetX showed its ability to modify oxytetracycline with a catalytic rate comparable to those of other destructases. SmTetX shares its fold with the tetracycline destructase TetX from Bacteroides thetaiotaomicron; however, its active site possesses an aromatic region that is unique in this enzyme family. A docking study confirmed tetracycline and its analogues to be the preferred binders amongst various classes of antibiotics.
Affiliation(s)
- Martin Malý
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
- Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Břehová 7, 115 19 Prague 1, Czech Republic
| | - Petr Kolenko
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
- Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Břehová 7, 115 19 Prague 1, Czech Republic
| | - Jan Stránský
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Leona Švecová
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Jarmila Dušková
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Tomáš Koval’
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Tereza Skálová
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Mária Trundová
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Kristýna Adámková
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Jiří Černý
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Paulína Božíková
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
| | - Jan Dohnálek
- Institute of Biotechnology, Czech Academy of Sciences, v.v.i., BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic
45
Qin ZX, Chen GZ, Yang QQ, Wu YJ, Sun CQ, Yang XM, Luo M, Yi CR, Zhu J, Chen WH, Liu Z. Cross-Platform Transcriptomic Data Integration, Profiling, and Mining in Vibrio cholerae. Microbiol Spectr 2023; 11:e0536922. [PMID: 37191528 PMCID: PMC10269641 DOI: 10.1128/spectrum.05369-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/24/2023] [Indexed: 05/17/2023]
Abstract
A large number of transcriptome studies generate important data and information for the study of pathogenic mechanisms of pathogens, including Vibrio cholerae. V. cholerae transcriptome data include RNA-seq and microarray: microarray data mainly include clinical human and environmental samples, and RNA-seq data mainly focus on laboratory processing conditions, including different stresses and experimental animals in vivo. In this study, we integrated the data sets of both platforms using Rank-in and the normalizeBetweenArrays function of the limma R package, achieving the first cross-platform transcriptome data integration of V. cholerae. By integrating the entire transcriptome data, we obtained the profiles of the most active or silent genes. By transferring the integrated expression profiles into the weighted correlation network analysis (WGCNA) pipeline, we identified the important functional modules of V. cholerae in vitro stress treatment, gene manipulation, and in vitro culture as DNA transposon, chemotaxis and signaling, signal transduction, and secondary metabolic pathways, respectively. The analysis of functional module hub genes revealed the uniqueness of clinical human samples; however, under specific expression patterning, the Δhns, ΔoxyR1 strains, and tobramycin treatment group showed high expression profile similarity with human samples. By constructing a protein-protein interaction (PPI) network, we discovered several previously unreported protein interactions within transposon functional modules. IMPORTANCE We used two techniques to integrate RNA-seq data for laboratory studies with clinical microarray data for the first time. The interactions between V. cholerae genes were obtained from a global perspective, the similarity between clinical human samples and the current experimental conditions was compared, and the functional modules that play a major role under different conditions were uncovered. We believe that this data integration can provide some insight and a basis for elucidating the pathogenesis and clinical control of V. cholerae.
Affiliation(s)
- Zi-Xin Qin
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Guo-Zhong Chen
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Qian-Qian Yang
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Ying-Jian Wu
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China
| | - Chu-Qing Sun
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China
| | - Xiao-Man Yang
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Mei Luo
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Chun-Rong Yi
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Jun Zhu
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Wei-Hua Chen
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China
| | - Zhi Liu
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
46
Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies in the context of the vast amount of collectable data obtained from high-throughput sequencing has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology with a tone of precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets and open-source software of AI and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. In addition, we also highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
Affiliation(s)
- Zhe Zhang
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, PR China
| | - Xiawei Wei
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
47
Nilsson RH, Ryberg M, Wurzbacher C, Tedersoo L, Anslan S, Põlme S, Spirin V, Mikryukov V, Svantesson S, Hartmann M, Lennartsdotter C, Belford P, Khomich M, Retter A, Corcoll N, Gómez Martinez D, Jansson T, Ghobad-Nejhad M, Vu D, Sanchez-Garcia M, Kristiansson E, Abarenkov K. How, not if, is the question mycologists should be asking about DNA-based typification. MycoKeys 2023; 96:143-157. [PMID: 37214179 PMCID: PMC10194844 DOI: 10.3897/mycokeys.96.102669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 03/28/2023] [Indexed: 05/24/2023]
Abstract
Fungal metabarcoding of substrates such as soil, wood, and water is uncovering an unprecedented number of fungal species that do not seem to produce tangible morphological structures and that defy our best attempts at cultivation, thus falling outside the scope of the International Code of Nomenclature for algae, fungi, and plants. The present study uses the new, ninth release of the species hypotheses of the UNITE database to show that species discovery through environmental sequencing vastly outpaces traditional, Sanger sequencing-based efforts in a strongly increasing trend over the last five years. Our findings challenge the present stance of some in the mycological community - that the current situation is satisfactory and that no change is needed to "the code" - and suggest that we should be discussing not whether to allow DNA-based descriptions (typifications) of species and by extension higher ranks of fungi, but what the precise requirements for such DNA-based typifications should be. We submit a tentative list of such criteria for further discussion. The present authors hope for a revitalized and deepened discussion on DNA-based typification, because to us it seems harmful and counter-productive to intentionally deny the overwhelming majority of extant fungi a formal standing under the International Code of Nomenclature for algae, fungi, and plants.
Affiliation(s)
- R. Henrik Nilsson
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
| | - Martin Ryberg
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
| | - Christian Wurzbacher
- Chair of Urban Water Systems Engineering, Technical University of Munich, Am Coulombwall 3, 85748 Garching, Germany
| | - Leho Tedersoo
- Mycology and Microbiology Center, University of Tartu, Liivi 2, 50409 Tartu, Estonia
- College of Science, King Saud University, 1145 Riyadh, Saudi Arabia
| | - Sten Anslan
- Mycology and Microbiology Center, University of Tartu, Liivi 2, 50409 Tartu, Estonia
| | - Sergei Põlme
- Mycology and Microbiology Center, University of Tartu, Liivi 2, 50409 Tartu, Estonia
| | - Viacheslav Spirin
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
- Institute of Ecology and Earth Sciences, University of Tartu, Liivi 2, 50409 Tartu, Estonia
| | - Vladimir Mikryukov
- Mycology and Microbiology Center, University of Tartu, Liivi 2, 50409 Tartu, Estonia
| | - Sten Svantesson
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
| | - Martin Hartmann
- Botany Unit (Mycology), Finnish Museum of Natural History, University of Helsinki, P.O. Box 7, FI-00014, Helsinki, Finland
| | - Charlotte Lennartsdotter
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
| | - Pauline Belford
- Department of Environmental Systems Science, ETH Zürich, Universitätstrasse 2, 8092 Zürich, Switzerland
| | - Maryia Khomich
- Interaction Design and Software Engineering, Chalmers University of Technology, Lindholmsplatsen 1, 417 56 Göteborg, Sweden
| | - Alice Retter
- Department of Clinical Science, University of Bergen, Box 7804, 5020 Bergen, Norway
| | - Natàlia Corcoll
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
| | - Daniela Gómez Martinez
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
| | - Tobias Jansson
- Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
| | - Masoomeh Ghobad-Nejhad
- Department of Functional and Evolutionary Ecology, University of Vienna, Djerassiplatz 1, A-1030 Vienna, Austria
| | - Duong Vu
- Department of Biotechnology, Iranian Research Organization for Science and Technology, PO Box 3353-5111, Tehran 3353136846, Iran
| | | | - Erik Kristiansson
- Department of Environmental Systems Science, ETH Zürich, Universitätstrasse 2, 8092 Zürich, Switzerland
| | - Kessy Abarenkov
- Mycology and Microbiology Center, University of Tartu, Liivi 2, 50409 Tartu, Estonia
48
Bremer E, Calteau A, Danchin A, Harwood C, Helmann JD, Médigue C, Palsson BO, Sekowska A, Vallenet D, Zuniga A, Zuniga C. A model industrial workhorse: Bacillus subtilis strain 168 and its genome after a quarter of a century. Microb Biotechnol 2023; 16:1203-1231. [PMID: 37002859 DOI: 10.1111/1751-7915.14257] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/01/2023] [Accepted: 03/20/2023] [Indexed: 04/04/2023]
Abstract
The vast majority of genomic sequences are automatically annotated using various software programs. The accuracy of these annotations depends heavily on the very few manual annotation efforts that combine verified experimental data with genomic sequences from model organisms. Here, we summarize the updated functional annotation of Bacillus subtilis strain 168, a quarter century after its genome sequence was first made public. Since the last such effort 5 years ago, 1168 genetic functions have been updated, allowing the construction of a new metabolic model of this organism of environmental and industrial interest. The emphasis in this review is on new metabolic insights, the role of metals in metabolism and macromolecule biosynthesis, functions involved in biofilm formation, features controlling cell growth, and finally, protein agents that allow class discrimination, thus allowing maintenance management, and accuracy of all cell processes. New 'genomic objects' and an extensive updated literature review have been included for the sequence, now available at the International Nucleotide Sequence Database Collaboration (INSDC: AccNum AL009126.4).
Affiliation(s)
- Erhard Bremer
- Department of Biology, Laboratory for Microbiology and Center for Synthetic Microbiology (SYNMIKRO) Philipps‐University Marburg Marburg Germany
| | - Alexandra Calteau
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine Hong Kong University Pokfulam SAR Hong Kong China
| | - Colin Harwood
- Centre for Bacterial Cell Biology, Biosciences Institute Newcastle University Baddiley Clark Building Newcastle upon Tyne UK
| | - John D. Helmann
- Department of Microbiology Cornell University Ithaca New York USA
| | - Claudine Médigue
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Bernhard O. Palsson
- Department of Bioengineering University of California San Diego La Jolla USA
| | | | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Abril Zuniga
- Department of Biology San Diego State University San Diego California USA
| | - Cristal Zuniga
- Bioinformatics and Medical Informatics Graduate Program San Diego State University San Diego California USA
| |
|
49
|
Lee B, Hwang S, Kim PG, Ko G, Jang K, Kim S, Kim JH, Jeon J, Kim H, Jung J, Yoon BH, Byeon I, Jang I, Song W, Choi J, Kim SY. Introduction of the Korea BioData Station (K-BDS) for sharing biological data. Genomics Inform 2023; 21:e12. [PMID: 37037470 PMCID: PMC10085736 DOI: 10.5808/gi.22073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/06/2023] [Indexed: 04/03/2023] Open
Abstract
A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.
|
50
|
Nawrocki EP. Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR. NAR Genom Bioinform 2023; 5:lqad002. [PMID: 36685728 PMCID: PMC9853093 DOI: 10.1093/nargab/lqad002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/28/2022] [Accepted: 01/03/2023] [Indexed: 01/22/2023] Open
Abstract
In 2020 and 2021, >1.5 million SARS-CoV-2 sequences were submitted to GenBank. The initial version (v1.0) of the VADR (Viral Annotation DefineR) software package that GenBank uses to automatically validate and annotate incoming viral sequences is too slow and memory intensive to process many thousands of SARS-CoV-2 sequences in a reasonable amount of time. Additionally, long stretches of ambiguous N nucleotides, which are common in many SARS-CoV-2 sequences, prevent VADR from accurate validation and annotation. VADR has been updated to more accurately and rapidly annotate SARS-CoV-2 sequences. Stretches of consecutive Ns are now identified and temporarily replaced with expected nucleotides to facilitate processing, and the slowest steps have been overhauled using blastn and glsearch, increasing speed, reducing the memory requirement from 64Gb to 2Gb per thread, and allowing simple, coarse-grained parallelization on multiple processors per host. VADR is now nearly 1000 times faster than it was in early 2020 for SARS-CoV-2 sequence processing. It has been used to screen and annotate more than 1.5 million SARS-CoV-2 sequences since June 2020, and it is now efficient enough to cope with the current rate of hundreds of thousands of submitted sequences per month.
Affiliation(s)
- Eric P Nawrocki
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| |
|