1
Wang B, Chougule K, Jiao Y, Olson A, Kumar V, Gladman N, Huang J, Llaca V, Fengler K, Wei X, Wang L, Wang X, Regulski M, Drenkow J, Gingeras T, Hayes C, Armstrong J, Huang Y, Xin Z, Ware D. High-quality chromosome scale genome assemblies of two important Sorghum inbred lines, Tx2783 and RTx436. NAR Genom Bioinform 2024; 6:lqae097. [PMID: 39131819 PMCID: PMC11310780 DOI: 10.1093/nargab/lqae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 07/01/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Sorghum bicolor (L.) Moench is a significant grass crop globally, known for its genetic diversity. High-quality genome sequences are needed to capture this diversity. We constructed high-quality, chromosome-level genome assemblies for two vital sorghum inbred lines, Tx2783 and RTx436. Using advanced single-molecule techniques, long-read sequencing and optical maps, we improved average sequence continuity 19-fold and 11-fold over the existing BTx623 v3.0 reference genome and obtained 19 and 18 scaffolds (N50 of 25.6 and 14.4) for Tx2783 and RTx436, respectively. Our gene annotation efforts resulted in 29 612 protein-coding genes for the Tx2783 genome and 29 265 protein-coding genes for the RTx436 genome. Comparative analyses with 26 plant genomes, which included 18 sorghum genomes and 8 outgroup species, identified around 31 210 protein-coding gene families, with about 13 956 specific to sorghum. Using representative models from gene trees across the 18 sorghum genomes, a total of 72 579 pan-genes were identified, with 14% core, 60% softcore and 26% shell genes. We identified 99 genes in Tx2783 and 107 genes in RTx436 that showed functional enrichment specifically in binding and metabolic processes, as revealed by the GO enrichment Pearson Chi-Square test. We detected 36 potential large inversions in the comparison between the BTx623 Bionano map and the BTx623 v3.1 reference sequence. Strikingly, these inversions were absent when comparing Tx2783 or RTx436 with the BTx623 Bionano map. The inversions were mostly in the pericentromeric region, which is known to contain low-complexity regions and to be harder to assemble, suggesting the presence of potential artifacts in the public BTx623 reference assembly. Furthermore, in comparison to Tx2783, RTx436 exhibited 324 883 additional Single Nucleotide Polymorphisms (SNPs) and 16 506 more Insertions/Deletions (INDELs) when using BTx623 as the reference genome. We also characterized approximately 348 nucleotide-binding leucine-rich repeat (NLR) disease resistance genes in the two genomes. These high-quality genomes serve as valuable resources for discovering agronomic traits and structural variation studies.
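The scaffold N50 values quoted in the abstract above are a standard contiguity statistic: the length of the scaffold at which the cumulative length, taken from the longest scaffold downward, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the example lengths are hypothetical, and the abstract itself does not state the units of its N50 values.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Because the statistic depends only on the sorted lengths, it rewards a few long scaffolds and is insensitive to how many short ones remain, which is why it is usually reported together with the scaffold count.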
Affiliation(s)
- Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Yinping Jiao
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Texas Tech University, 1006 Canton Ave, Lubbock, TX 79409-2122, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Vivek Kumar
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Nicholas Gladman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- USDA ARS Robert W. Holley Center for Agriculture and Health Cornell University, Ithaca, NY, USA
| | - Jian Huang
- Department of Plant and Soil Sciences, Oklahoma State University, Stillwater, OK 74078-6028, USA
| | - Victor Llaca
- Corteva Agriscience™, 8325 NW 62nd Avenue, Johnston, IA 50131, USA
| | - Kevin Fengler
- Corteva Agriscience™, 8325 NW 62nd Avenue, Johnston, IA 50131, USA
| | - Xuehong Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Liya Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Xiaofei Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Jorg Drenkow
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Chad Hayes
- U.S. Department of Agriculture-Agricultural Research Service, Plant Stress and Germplasm Development Unit, Cropping Systems Research Laboratory, Lubbock, TX 79415, USA
| | - J Scott Armstrong
- Peanut and Small Grains Research Unit, 1301 N. Western Rd. Stillwater, OK 74075, USA
| | - Yinghua Huang
- USDA-ARS Plant Science Research Laboratory, 1301 N. Western Road, Stillwater, OK 74075-2714, USA
- Dept. of Plant Biology, Ecology, and Evolution, 301 Physical Sciences, Stillwater, OK 74078-3013, USA
| | - Zhanguo Xin
- U.S. Department of Agriculture-Agricultural Research Service, Plant Stress and Germplasm Development Unit, Cropping Systems Research Laboratory, Lubbock, TX 79415, USA
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- USDA ARS Robert W. Holley Center for Agriculture and Health Cornell University, Ithaca, NY, USA
2
Pflughaupt P, Sahakyan AB. Generalised interrelations among mutation rates drive the genomic compliance of Chargaff's second parity rule. Nucleic Acids Res 2023; 51:7409-7423. [PMID: 37293966 PMCID: PMC10415130 DOI: 10.1093/nar/gkad477] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 05/05/2023] [Accepted: 05/17/2023] [Indexed: 06/10/2023] Open
Abstract
Chargaff's second parity rule (PR-2), where the complementary base and k-mer contents match within the same strand of a double-stranded DNA (dsDNA), is a phenomenon that has invited many explanations. The strict compliance of nearly all nuclear dsDNA with PR-2 implies that the explanation should be similarly adamant. In this work, we revisited the possibility of mutation rates driving PR-2 compliance. Starting from an assumption-free approach, we constructed kinetic equations for unconstrained simulations. The results were analysed for their PR-2 compliance by employing symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations, in place in most species, that allow for their full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes outside the scope of the prior explanations based on equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core, now shown, under our formulation, to be tolerant of previously noted strand biases and incomplete compositional equilibration. We further investigate the time for any genome to reach PR-2, showing that it is generally earlier than the compositional equilibrium, and well within the age of life on Earth.
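The PR-2 property described above can be checked directly on a single strand by counting k-mers. The sketch below is illustrative only and is not the kinetic model or analysis from the paper: it assumes that the "complementary" k-mer means the reverse complement, and the function names and the deviation score (0 for perfect compliance, 1 for none) are hypothetical choices.

```python
from collections import Counter

# Translation table for complementing an upper-case DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers on the given strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pr2_deviation(seq: str, k: int = 1) -> float:
    """Score how far one strand is from PR-2 at k-mer size k.

    Each k-mer count is compared with the count of its reverse complement on
    the same strand. The score is 0.0 when every pair matches exactly and 1.0
    when no k-mer has its reverse complement present at all.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include reverse complements that never occur, so absent partners count.
    keys = set(counts) | {reverse_complement(kmer) for kmer in counts}
    diff = sum(abs(counts[kmer] - counts[reverse_complement(kmer)]) for kmer in keys)
    # Every unordered pair is visited twice, hence the factor of 2.
    return diff / (2 * total)
```

For k = 1 the score compares A with T and C with G on the same strand. A strand built as a sequence followed by its own reverse complement scores exactly 0 at every k, while a homopolymer such as "AAAA" scores 1.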
Affiliation(s)
- Patrick Pflughaupt
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Aleksandr B Sahakyan
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
3
Zhou Y, Yu Z, Chebotarov D, Chougule K, Lu Z, Rivera LF, Kathiresan N, Al-Bader N, Mohammed N, Alsantely A, Mussurova S, Santos J, Thimma M, Troukhan M, Fornasiero A, Green CD, Copetti D, Kudrna D, Llaca V, Lorieux M, Zuccolo A, Ware D, McNally K, Zhang J, Wing RA. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun 2023; 14:1567. [PMID: 36944612 PMCID: PMC10030860 DOI: 10.1038/s41467-023-37004-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/01/2022] [Accepted: 02/27/2023] [Indexed: 03/23/2023] Open
Abstract
Understanding and exploiting genetic diversity is a key factor for the productive and stable production of rice. Here, we utilize 73 high-quality genomes that encompass the subpopulation structure of Asian rice (Oryza sativa), plus the genomes of two wild relatives (O. rufipogon and O. punctata), to build a pan-genome inversion index of 1769 non-redundant inversions that span an average of ~29% of the O. sativa cv. Nipponbare reference genome sequence. Using this index, we estimate an inversion rate of ~700 inversions per million years in Asian rice, which is 16 to 50 times higher than previously estimated for plants. Detailed analyses of these inversions show evidence of their effects on gene expression, recombination rate, and linkage disequilibrium. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice and hints at their largely unexplored role in functional biology and crop performance.
Affiliation(s)
- Yong Zhou
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA
| | - Zhichao Yu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
| | - Dmytro Chebotarov
- International Rice Research Institute (IRRI), Los Baños, 4031, Laguna, Philippines
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Luis F Rivera
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Nagarajan Kathiresan
- Supercomputing Core Lab, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Noor Al-Bader
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Nahed Mohammed
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Aseel Alsantely
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Saule Mussurova
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - João Santos
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Manjula Thimma
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | | | - Alice Fornasiero
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Carl D Green
- Information Technology Department, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Dario Copetti
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA
| | - David Kudrna
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA
| | - Victor Llaca
- Research and Development, Corteva Agriscience, Johnston, IA, 50131, USA
| | - Mathias Lorieux
- DIADE, University of Montpellier, CIRAD, IRD, Montpellier, France
| | - Andrea Zuccolo
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
- Crop Science Research Center (CSRC), Scuola Superiore Sant'Anna, Pisa, 56127, Italy.
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
- USDA ARS NEA Plant, Soil & Nutrition Laboratory Research Unit, Ithaca, NY, 14853, USA.
| | - Kenneth McNally
- International Rice Research Institute (IRRI), Los Baños, 4031, Laguna, Philippines.
| | - Jianwei Zhang
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA.
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.
| | - Rod A Wing
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA.
- International Rice Research Institute (IRRI), Los Baños, 4031, Laguna, Philippines.
4
Voelker WG, Krishnan K, Chougule K, Alexander LC, Lu Z, Olson A, Ware D, Songsomboon K, Ponce C, Brenton ZW, Boatwright JL, Cooper EA. Ten new high-quality genome assemblies for diverse bioenergy sorghum genotypes. Front Plant Sci 2023; 13:1040909. [PMID: 36684744 PMCID: PMC9846640 DOI: 10.3389/fpls.2022.1040909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
Introduction: Sorghum (Sorghum bicolor (L.) Moench) is an agriculturally and economically important staple crop that has immense potential as a bioenergy feedstock due to its relatively high productivity on marginal lands. To capitalize on and further improve sorghum as a potential source of sustainable biofuel, it is essential to understand the genomic mechanisms underlying complex traits related to yield, composition, and environmental adaptations.
Methods: Expanding on a recently developed mapping population, we generated de novo genome assemblies for 10 parental genotypes from this population and identified a comprehensive set of over 24 thousand large structural variants (SVs) and over 10.5 million single nucleotide polymorphisms (SNPs).
Results: We show that SVs and nonsynonymous SNPs are enriched in different gene categories, emphasizing the need for long read sequencing in crop species to identify novel variation. Furthermore, we highlight SVs and SNPs occurring in genes and pathways with known associations to critical bioenergy-related phenotypes and characterize the landscape of genetic differences between sweet and cellulosic genotypes.
Discussion: These resources can be integrated into both ongoing and future mapping and trait discovery for sorghum and its myriad uses including food, feed, bioenergy, and increasingly as a carbon dioxide removal mechanism.
Affiliation(s)
- William G. Voelker
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
| | - Krittika Krishnan
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
| | - Kapeel Chougule
- Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
| | - Louie C. Alexander
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
| | - Zhenyuan Lu
- Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
| | - Andrew Olson
- Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
| | - Doreen Ware
- Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
- United States Department of Agriculture - Agricultural Research Service in the North Atlantic Area (USDA-ARS NAA), Robert W. Holley Center for Agriculture and Health, Ithaca, NY, United States
| | - Kittikun Songsomboon
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
| | - Cristian Ponce
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
| | - Zachary W. Brenton
- Carolina Seed Systems, Darlington, SC, United States
- Advanced Plant Technology, Clemson University, Clemson, SC, United States
| | - J. Lucas Boatwright
- Advanced Plant Technology, Clemson University, Clemson, SC, United States
- Dept. of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
| | - Elizabeth A. Cooper
- Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- North Carolina Research Campus, Kannapolis, NC, United States
5
Lefranc MP, Lefranc G. Antibody Sequence and Structure Analyses Using IMGT®: 30 Years of Immunoinformatics. Methods Mol Biol 2023; 2552:3-59. [PMID: 36346584 DOI: 10.1007/978-1-0716-2609-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org, the global reference in immunogenetics and immunoinformatics, was created in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS) to manage the huge diversity of the antigen receptors, immunoglobulins (IG) or antibodies, and T cell receptors (TR) of the adaptive immune responses. The founding of IMGT® marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. IMGT® standardized analysis of the IG, TR, and major histocompatibility (MH) genes and proteins bridges the gap between sequences and three-dimensional (3D) structures, for all jawed vertebrates from fish to humans. This is achieved through the IMGT Scientific chart rules, based on the IMGT-ONTOLOGY axioms, and primarily CLASSIFICATION (IMGT gene and allele nomenclature) and NUMEROTATION (IMGT unique numbering and IMGT Colliers de Perles). IMGT® comprises seven databases (IMGT/LIGM-DB for nucleotide sequences, IMGT/GENE-DB for genes and alleles, etc.), 17 tools (IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/HighV-QUEST for NGS, etc.), and more than 20,000 Web resources. In this chapter, the focus is on the tools for amino acid sequences per domain (IMGT/DomainGapAlign and IMGT/Collier-de-Perles), and on the databases for receptors (IMGT/2Dstructure-DB and IMGT/3D-structure-DB) described per receptor, chain, and domain and, for 3D, with contact analysis, paratope, and epitope. The IMGT/mAb-DB is the query interface for monoclonal antibodies (mAb), fusion proteins for immune applications (FPIA), composite proteins for clinical applications (CPCA), and related proteins of interest (RPI) with links to IMGT® 2D and 3D databases and to the World Health Organization (WHO) International Nonproprietary Names (INN) program lists. The chapter includes the human IG allotypes and antibody engineered variants for effector properties used in the description of therapeutical mAb.
Affiliation(s)
- Marie-Paule Lefranc
- IMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UMR 9002 CNRS, Université de Montpellier, Montpellier cedex 5, France.
| | - Gérard Lefranc
- IMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UMR 9002 CNRS, Université de Montpellier, Montpellier cedex 5, France.
6
Contreras-Moreira B, Filippi CV, Naamati G, Girón CG, Allen JE, Flicek P. K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes. Plant Genome 2021; 14:e20143. [PMID: 34562304 PMCID: PMC7614178 DOI: 10.1002/tpg2.20143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
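The first step of the two-step approach above, calling repeats by k-mer counting, can be illustrated with a toy example. This is a deliberately simplified sketch and is not the algorithm used by Red or by the scripts in the linked repository: it assumes that any k-mer seen at least `min_count` times marks a repeat, and the function name, `k`, and `min_count` are hypothetical parameters chosen for illustration.

```python
from collections import Counter


def soft_mask_repeats(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Soft-mask (lower-case) every base covered by a frequently seen k-mer.

    A k-mer is treated as repetitive when it occurs at least min_count times
    in the sequence. Real repeat callers use more elaborate scoring; this
    threshold rule is an assumption made for the sake of a short example.
    """
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_kmers))

    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= min_count:
            # Mark all k positions covered by this repetitive k-mer.
            for j in range(i, i + k):
                masked[j] = True

    return "".join(base.lower() if flag else base for base, flag in zip(seq, masked))


# Toy sequence: three tandem copies of ACGT followed by unique sequence.
toy_masked = soft_mask_repeats("ACGTACGTACGTTTGCA", k=4, min_count=3)
```

In the toy sequence only the k-mer ACGT reaches the threshold, so the three tandem copies are lower-cased and the unique tail is left in upper case. Masked intervals produced this way carry no annotation, which is why the protocol described above follows up with a comparison against curated repeat libraries.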
Affiliation(s)
- Bruno Contreras-Moreira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carla V Filippi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Instituto de Biotecnología, Centro de Investigaciones en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA); Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) Nicolas Repetto y Los Reseros s/n (1686), Hurlingham, Buenos Aires, Argentina
- CONICET, Av Rivadavia 1917, C1033AAJ Ciudad de Buenos Aires, Argentina
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - James E Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7
Hufford MB, Seetharam AS, Woodhouse MR, Chougule KM, Ou S, Liu J, Ricci WA, Guo T, Olson A, Qiu Y, Della Coletta R, Tittes S, Hudson AI, Marand AP, Wei S, Lu Z, Wang B, Tello-Ruiz MK, Piri RD, Wang N, Kim DW, Zeng Y, O'Connor CH, Li X, Gilbert AM, Baggs E, Krasileva KV, Portwood JL, Cannon EKS, Andorf CM, Manchanda N, Snodgrass SJ, Hufnagel DE, Jiang Q, Pedersen S, Syring ML, Kudrna DA, Llaca V, Fengler K, Schmitz RJ, Ross-Ibarra J, Yu J, Gent JI, Hirsch CN, Ware D, Dawe RK. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 2021; 373:655-662. [PMID: 34353948 PMCID: PMC8733867 DOI: 10.1126/science.abg5289] [Citation(s) in RCA: 247] [Impact Index Per Article: 82.3] [Received: 01/14/2021] [Accepted: 06/24/2021] [Indexed: 12/24/2022]
Abstract
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Affiliation(s)
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Arun S Seetharam
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Genome Informatics Facility, Iowa State University, Ames, IA 50011, USA
| | - Margaret R Woodhouse
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | | | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Jianing Liu
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - William A Ricci
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Tingting Guo
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Silas Tittes
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - Asher I Hudson
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | | | - Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Rebecca D Piri
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Na Wang
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Dong Won Kim
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Yibing Zeng
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Christine H O'Connor
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA
| | - Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Amanda M Gilbert
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Erin Baggs
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - John L Portwood
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Ethalinda K S Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Carson M Andorf
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Nancy Manchanda
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Samantha J Snodgrass
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David E Hufnagel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
| | - Qiuhan Jiang
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Sarah Pedersen
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Michael L Syring
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David A Kudrna
- Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
| | | | | | - Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Jeffrey Ross-Ibarra
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Jonathan I Gent
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Doreen Ware
- USDA-ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - R Kelly Dawe
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA.
8
Luo H, Liu H, Zhang J, Hu B, Zhou C, Xiang M, Yang Y, Zhou M, Jing T, Li Z, Zhou X, Lv G, He W, Zeng B, Xiao S, Li Q, Ye H. Full-length transcript sequencing accelerates the transcriptome research of Gymnocypris namensis, an iconic fish of the Tibetan Plateau. Sci Rep 2020; 10:9668. [PMID: 32541658 PMCID: PMC7296019 DOI: 10.1038/s41598-020-66582-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/22/2019] [Accepted: 05/25/2020] [Indexed: 12/11/2022] Open
Abstract
Gymnocypris namensis, the only commercial fish in Namtso Lake of Tibet in China, is rated as a near-threatened species in the Red List of China's Vertebrates. As one of the highest-altitude schizothorax fish in China, G. namensis is strongly adapted to the harsh plateau environment. Although it is an indigenous economic fish with high research value, the biological characterization, genetic diversity, and plateau adaptability of G. namensis are still unclear. Here, we used Pacific Biosciences single-molecule real-time long-read sequencing technology to generate full-length transcripts of G. namensis. Sequence clustering analysis and error correction with Illumina-produced short reads yielded 319,044 polished isoforms. After removing redundant reads, 125,396 non-redundant isoforms were obtained. Among all transcripts, 103,286 were annotated to public databases. Natural selection has acted on 42 genes in G. namensis, which were enriched in the functions of mismatch repair and glutathione metabolism. In total, 89,736 open reading frames, 95,947 microsatellites, and 21,360 long non-coding RNAs were identified across all transcripts. This is the first transcriptome study of G. namensis using PacBio Iso-seq. The acquisition of full-length transcript isoforms might accelerate transcriptome research on G. namensis and provide a basis for further research.
Affiliation(s)
- Hui Luo
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Haiping Liu
  - Institute of Fisheries Science, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850000, China
- Jie Zhang
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
- Bingjie Hu
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
- Chaowei Zhou
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Mengbin Xiang
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
- Yuejing Yang
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Mingrui Zhou
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Tingsen Jing
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Zhe Li
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
- Xinghua Zhou
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Guangjun Lv
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Wenping He
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
- Benhe Zeng
  - Institute of Fisheries Science, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850000, China
- Shijun Xiao
  - Department of Computer Science, Wuhan University of Technology, Wuhan, 430070, China
- Qinglu Li
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
- Hua Ye
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University College of Animal Sciences, Chongqing, 402460, China
  - Key Laboratory of Aquatic Science of Chongqing, 400175, Chongqing, China
|
9
|
Abstract
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS) to manage the huge diversity of the antigen receptors, immunoglobulins (IG) or antibodies, and T cell receptors (TR). The founding of IMGT® marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. Standardized sequence and structure analysis of antibodies using IMGT® databases and tools allows one to bridge, for the first time, the gap between antibody sequences and three-dimensional (3D) structures. This is achieved through the IMGT Scientific chart rules, based on the IMGT-ONTOLOGY concepts of classification (IMGT gene and allele nomenclature), description (IMGT standardized labels), and numerotation (IMGT unique numbering and IMGT Collier de Perles). IMGT® is acknowledged as the global reference for immunogenetics and immunoinformatics, and its standards are particularly useful for antibody engineering and humanization. IMGT® databases for antibody nucleotide sequences and genes include IMGT/LIGM-DB and IMGT/GENE-DB, respectively; nucleotide sequence analysis is performed by the IMGT/V-QUEST and IMGT/JunctionAnalysis tools and, for NGS, by IMGT/HighV-QUEST. In this chapter, we focus on IMGT® databases and tools for amino acid sequences and for two-dimensional (2D) and three-dimensional (3D) structures: the IMGT/DomainGapAlign and IMGT Collier de Perles tools and the IMGT/2Dstructure-DB and IMGT/3Dstructure-DB databases.
IMGT/mAb-DB provides the query interface for monoclonal antibodies (mAb), fusion proteins for immune applications (FPIA), composite proteins for clinical applications (CPCA), and related proteins of interest (RPI). It links to the proposed and recommended lists of the World Health Organization International Nonproprietary Name (WHO INN) programme, to IMGT/2Dstructure-DB for amino acid sequences, and to IMGT/3Dstructure-DB and its associated tools (IMGT/StructuralQuery, IMGT/DomainSuperimpose) for crystallized antibodies.
|
10
|
Ruffier M, Kähäri A, Komorowska M, Keenan S, Laird M, Longden I, Proctor G, Searle S, Staines D, Taylor K, Vullo A, Yates A, Zerbino D, Flicek P. Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation. Database (Oxford) 2017; 2017:3074789. [PMID: 28365736 PMCID: PMC5467575 DOI: 10.1093/database/bax020] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/28/2016] [Revised: 02/07/2017] [Accepted: 02/20/2017] [Indexed: 01/09/2023]
Abstract
The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Although the framework was initially intended to provide annotation for the reference human genome, we have extended it to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework, which stores a large diversity of genome annotations in one location, serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list (http://www.ensembl.org/info/about/contact/index.html). Database URL: http://www.ensembl.org.
Affiliation(s)
- Magali Ruffier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andreas Kähäri
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Monika Komorowska
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen Keenan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Laird
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian Longden
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Glenn Proctor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Steve Searle
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Daniel Staines
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alessandro Vullo
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
11
|
Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Juettemann T, Keenan S, Laird MR, Lavidas I, Maurel T, McLaren W, Moore B, Murphy DN, Nag R, Newman V, Nuhn M, Ong CK, Parker A, Patricio M, Riat HS, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Wilder SP, Zadissa A, Kostadima M, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Cunningham F, Yates A, Zerbino DR, Flicek P. Ensembl 2017. Nucleic Acids Res 2016; 45:D635-D642. [PMID: 27899575 PMCID: PMC5210575 DOI: 10.1093/nar/gkw1104] [Citation(s) in RCA: 409] [Impact Index Per Article: 51.1] [Received: 10/08/2016] [Revised: 10/25/2016] [Accepted: 10/28/2016] [Indexed: 12/12/2022] Open
Abstract
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
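The programmatic access mentioned in this abstract can be illustrated with a minimal sketch against the public Ensembl REST server. It assumes the `lookup/symbol/:species/:symbol` endpoint on `https://rest.ensembl.org`; this endpoint is chosen only as an example and is not necessarily one of the new end points the abstract refers to. The helper names `lookup_symbol_url` and `lookup_symbol` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public Ensembl REST server (an assumption of this sketch; the abstract does not name it)
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the URL for the lookup/symbol/:species/:symbol endpoint."""
    return f"{REST_SERVER}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def lookup_symbol(species: str, symbol: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON record for a gene symbol (needs network access)."""
    request = Request(
        lookup_symbol_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask the server for JSON
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `lookup_symbol("homo_sapiens", "BRCA2")` would request the record for that gene symbol; the fields returned depend on the Ensembl release being served.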
Affiliation(s)
- Bronwen L Aken
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Premanand Achuthan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wasiu Akanni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- M Ridwan Amode
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Friederike Bernsdorff
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyothish Bhai
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Denise Carvalho-Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla Cummins
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Clapham
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laurent Gil
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos García Girón
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Leo Gordon
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sophie H Janacek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Juettemann
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen Keenan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew R Laird
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ilias Lavidas
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Maurel
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- William McLaren
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin Moore
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel N Murphy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rishi Nag
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Victoria Newman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michael Nuhn
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chuang Kee Ong
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anne Parker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mateus Patricio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harpreet Singh Riat
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Sheppard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helen Sparrow
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anja Thormann
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alessandro Vullo
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brandon Walts
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Steven P Wilder
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Amonida Zadissa
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Myrto Kostadima
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthieu Muffato
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Perry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Magali Ruffier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel M Staines
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen J Trevanion
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel R Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
12
|
Howe KL, Bolt BJ, Shafie M, Kersey P, Berriman M. WormBase ParaSite - a comprehensive resource for helminth genomics. Mol Biochem Parasitol 2016; 215:2-10. [PMID: 27899279 PMCID: PMC5486357 DOI: 10.1016/j.molbiopara.2016.11.005] [Citation(s) in RCA: 395] [Impact Index Per Article: 49.4] [Received: 10/07/2016] [Revised: 11/24/2016] [Accepted: 11/25/2016] [Indexed: 12/02/2022]
Abstract
WormBase ParaSite is a new resource for helminth genomics. The resource provides access to over 100 nematode and platyhelminth genomes. The genomes are consistently annotated, organised and presented. A variety of views and tools for exploring and querying the data are provided.
The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data.
Affiliation(s)
- Kevin L Howe
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bruce J Bolt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Myriam Shafie
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Paul Kersey
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Berriman
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
13
|
Lim JH, Latysheva NS, Iggo RD, Barker D. Cluster Analysis of p53 Binding Site Sequences Reveals Subsets with Different Functions. Cancer Inform 2016; 15:199-209. [PMID: 27812278 PMCID: PMC5081245 DOI: 10.4137/cin.s39968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2016] [Revised: 08/31/2016] [Accepted: 09/09/2016] [Indexed: 11/05/2022] Open
Abstract
p53 is an important regulator of cell cycle arrest, senescence, apoptosis and metabolism, and is frequently mutated in tumors. It functions as a tetramer, where each component dimer binds to a decameric DNA region known as a response element. We identify p53 binding site subtypes and examine the functional and evolutionary properties of these subtypes. We start with over 1700 known binding sites and, with no prior labeling, identify two sets of response elements by unsupervised clustering. When combined, they give rise to three types of p53 binding sites. We find that probabilistic and alignment-based assessments of cross-species conservation show no strong evidence of differential conservation between types of binding sites. In contrast, functional analysis of the genes most proximal to the binding sites provides strong bioinformatic evidence of functional differentiation between the three types of binding sites. Our results are consistent with recent structural data identifying two conformations of the L1 loop in the DNA binding domain, suggesting that they reflect biologically meaningful groups imposed by the p53 protein structure.
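One plausible reading of how two sets of response elements give rise to three types of binding sites can be sketched as a small counting exercise. It assumes that a full binding site pairs two response elements (one per dimer of the tetramer) and that the order of the pair is ignored; the cluster labels `A` and `B` and the function name `binding_site_types` are hypothetical and do not come from the paper.

```python
from itertools import combinations_with_replacement

# Hypothetical labels for the two response-element clusters
RESPONSE_ELEMENT_CLUSTERS = ("A", "B")


def binding_site_types(clusters=RESPONSE_ELEMENT_CLUSTERS):
    """Return the unordered pairs of clusters that could form a full site."""
    return ["".join(pair) for pair in combinations_with_replacement(clusters, 2)]


print(binding_site_types())  # ['AA', 'AB', 'BB']
```

Under these assumptions, two clusters yield exactly three site types; whether the study treats the two mixed orientations as equivalent is not stated in the abstract.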
Affiliation(s)
- Ji-Hyun Lim
  - School of Biology, University of St Andrews, St Andrews, UK
  - School of Medicine, University of St Andrews, St Andrews, UK
  - Current address: Alacris Theranostics GmbH, Berlin, Germany
- Natasha S. Latysheva
  - School of Biology, University of St Andrews, St Andrews, UK
  - Current address: MRC Laboratory of Molecular Biology, Cambridge, UK
- Richard D. Iggo
  - School of Medicine, University of St Andrews, St Andrews, UK
  - INSERM Unit U1218, University of Bordeaux, Institut Bergonie, Bordeaux, France
- Daniel Barker
  - School of Biology, University of St Andrews, St Andrews, UK
  - Current address: Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
|
14
|
The Tetraodon nigroviridis reference transcriptome: developmental transition, length retention and microsynteny of long non-coding RNAs in a compact vertebrate genome. Sci Rep 2016; 6:33210. [PMID: 27628538 PMCID: PMC5024134 DOI: 10.1038/srep33210] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/15/2016] [Accepted: 07/28/2016] [Indexed: 01/03/2023] Open
Abstract
Pufferfish such as fugu and tetraodon carry the smallest genomes among all vertebrates and are ideal for studying genome evolution. However, comparative genomics using these species is hindered by the poor annotation of their genomes. We performed RNA sequencing during key stages of the maternal-to-zygotic transition of Tetraodon nigroviridis and report its first developmental transcriptome. We assembled 61,033 transcripts (23,837 loci) representing 80% of the annotated gene models and 3816 novel coding transcripts from 2667 loci. We demonstrate the similarities of gene expression profiles between pufferfish and zebrafish during the maternal-to-zygotic transition and annotated 1120 long non-coding RNAs (lncRNAs), many of which are differentially expressed during development. The promoters for 60% of the assembled transcripts were validated by CAGE-seq. Despite the extreme compaction of the tetraodon genome and the dramatic loss of transposons, the length of lncRNA exons remains comparable to that of other vertebrates, and a small set of lncRNAs appears enriched for transposable elements, suggesting a selective pressure acting on lncRNA length and composition. Finally, a set of lncRNAs is microsyntenic between teleosts and vertebrates, which indicates potential regulatory interactions between lncRNAs and their flanking coding genes. Our work provides a fundamental molecular resource for vertebrate comparative genomics and embryogenesis studies.
|
15
|
Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, García Girón C, Hourlier T, Howe K, Kähäri A, Kokocinski F, Martin FJ, Murphy DN, Nag R, Ruffier M, Schuster M, Tang YA, Vogel JH, White S, Zadissa A, Flicek P, Searle SMJ. The Ensembl gene annotation system. Database (Oxford) 2016; 2016:baw093. [PMID: 27337980 PMCID: PMC4919035 DOI: 10.1093/database/baw093] [Citation(s) in RCA: 690] [Impact Index Per Article: 86.3] [Received: 01/11/2016] [Accepted: 05/09/2016] [Indexed: 12/12/2022]
Abstract
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html.
Affiliation(s)
- Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Sarah Ayling
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK Present addresses: The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Daniel Barrell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK Eagle Genomics Ltd, Babraham Research Campus, Cambridge CB22 3AT, UK
| | - Laura Clarke
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Valery Curwen
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Susan Fairley
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julio Fernandez Banet
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Pfizer Inc, 10646 Science Center Dr, San Diego, CA 92121, USA
| | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Kevin Howe
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andreas Kähäri
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Institutionen för cell-och molekylärbiologi, Uppsala University, Husargatan 3, Uppsala 752 37, Sweden
| | - Felix Kokocinski
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Daniel N Murphy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Rishi Nag
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Magali Ruffier
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Michael Schuster
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna a-1090, Austria
| | - Y Amy Tang
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jan-Hinnerk Vogel
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Genentech Inc, 1 DNA Way, South San Francisco, CA 94080, USA
| | - Simon White
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Amonida Zadissa
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Stephen M J Searle
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| |
|
16
|
Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SMJ, Amode R, Brent S, Spooner W, Kulesha E, Yates A, Flicek P. Ensembl comparative genomics resources. Database (Oxford) 2016; 2016:bav096. [PMID: 26896847 PMCID: PMC4761110 DOI: 10.1093/database/bav096] [Citation(s) in RCA: 191] [Impact Index Per Article: 23.9] [Received: 05/02/2015] [Revised: 08/10/2015] [Accepted: 09/04/2015] [Indexed: 01/08/2023]
Abstract
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.
Affiliation(s)
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Kathryn Beal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Stephen Fitzgerald
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Miguel Pignatelli
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Albert J. Vilella
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | | | - Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
| | - Simon Brent
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
| | - William Spooner
- Eagle Genomics Ltd., Babraham Research Campus, Cambridge, CB22 3AT, UK, and
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
| | - Eugene Kulesha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
| |
|
17
|
Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SMJ, Amode R, Brent S, Spooner W, Kulesha E, Yates A, Flicek P. Ensembl comparative genomics resources. Database (Oxford) 2016; 2016:bav096. [PMID: 26896847 PMCID: PMC4761110 DOI: 10.1093/database/bav096 10.1093/database/baw053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/02/2015] [Revised: 08/10/2015] [Accepted: 09/04/2015] [Indexed: 08/10/2024]
|
18
|
Akahori H, Guindon S, Yoshizaki S, Muto Y. Molecular Evolution of the TET Gene Family in Mammals. Int J Mol Sci 2015; 16:28472-85. [PMID: 26633372 PMCID: PMC4691057 DOI: 10.3390/ijms161226110] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/02/2015] [Revised: 11/10/2015] [Accepted: 11/18/2015] [Indexed: 11/21/2022] Open
Abstract
Ten-eleven translocation (TET) proteins, a family of Fe2+- and 2-oxoglutarate-dependent dioxygenases, are involved in DNA demethylation. They also help regulate various cellular functions. Three TET paralogs have been identified (TET1, TET2, and TET3) in humans. This study focuses on the evolution of mammalian TET genes. Distinct patterns in TET1 and TET2 vs. TET3 were revealed by codon-based tests of positive selection. Results indicate that TET1 and TET2 genes have experienced positive selection more frequently than TET3 gene, and that the majority of codon sites evolved under strong negative selection. These findings imply that the selective pressure on TET3 may have been relaxed in several lineages during the course of evolution. Our analysis of convergent amino acid substitutions also supports the different evolutionary dynamics among TET gene subfamily members. All of the five amino acid sites that are inferred to have evolved under positive selection in the catalytic domain of TET2 are localized at the protein’s outer surface. The adaptive changes of these positively selected amino acid sites could be associated with dynamic interactions between other TET-interacting proteins, and positive selection thus appears to shift the regulatory scheme of TET enzyme function.
Affiliation(s)
- Hiromichi Akahori
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1 Yanagido, Gifu 501-1194, Japan.
| | - Stéphane Guindon
- Department of Statistics, the University of Auckland, Auckland 1010, New Zealand.
| | - Sumio Yoshizaki
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1 Yanagido, Gifu 501-1194, Japan.
| | - Yoshinori Muto
- United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1 Yanagido, Gifu 501-1194, Japan.
- Department of Functional Bioscience, Gifu University School of Medicine, 1-1 Yanagido, Gifu 501-1193, Japan.
| |
|
19
|
Mbandi SK, Hesse U, van Heusden P, Christoffels A. Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms. BMC Bioinformatics 2015; 16:58. [PMID: 25880035 PMCID: PMC4344733 DOI: 10.1186/s12859-015-0492-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/30/2014] [Accepted: 02/06/2015] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrags annotations and propagation of mis-assemblies.

RESULTS: Here, we propose a structured framework that consists of a few steps in pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error tolerant CDS prediction, 3) identification of coding potential, and 4) complements BLAST with a multiple domain architecture annotation that reduces non-specific domain annotation. We demonstrate that independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequence and gene loci to Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4-6% are lost when orphan transfrags are excluded and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5kb (Trinity) to 2kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33-50%, which is also high for domain based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci.

CONCLUSIONS: IFRAT simplifies post-assembly processing providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organism.
Affiliation(s)
- Stanley Kimbung Mbandi
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
| | - Uljana Hesse
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
| | - Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
| | - Alan Christoffels
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
| |
|
20
|
Adaptive evolution of formyl peptide receptors in mammals. J Mol Evol 2015; 80:130-41. [PMID: 25627928 DOI: 10.1007/s00239-015-9666-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/12/2014] [Accepted: 01/19/2015] [Indexed: 01/06/2023]
Abstract
The formyl peptide receptors (FPRs) are a family of chemoattractant receptors with important roles in host defense and the regulation of inflammatory reactions. In humans, three FPR paralogs have been identified (FPR1, FPR2, and FPR3) and may have functionally diversified by gene duplication and adaptive evolution. However, the evolutionary mechanisms operating in the diversification of FPR family genes and the changes in selection pressures have not been characterized to date. Here, we have made a comprehensive evolutionary analysis of FPR genes from mammalian species. Phylogenetic analysis showed that an early duplication was responsible for FPR1 and FPR2/FPR3 splitting, and FPR3 originated from the latest duplication event near the origin of primates. Codon-based tests of positive selection reveal interesting patterns in FPR1 and FPR2 versus FPR3, with the first two genes showing clear evidence of positive selection at some sites while the majority of them evolve under strong negative selection. In contrast, our results suggest that the selective pressure may be relaxed in the FPR3 lineage. Of the six amino acid sites inferred to evolve under positive selection in FPR1 and FPR2, four sites were located in extracellular loops of the protein. The electrostatic potential of the extracellular surface of FPR might be affected more frequently with amino acid substitutions in positively selected sites. Thus, positive selection of FPRs among mammals may reflect a link between changes in the sequence and surface structure of the proteins and is likely to be important in the host's defense against invading pathogens.
|
21
|
Lefranc MP. Immunoglobulins: 25 years of immunoinformatics and IMGT-ONTOLOGY. Biomolecules 2014; 4:1102-39. [PMID: 25521638 PMCID: PMC4279172 DOI: 10.3390/biom4041102] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/05/2014] [Revised: 12/02/2014] [Accepted: 12/03/2014] [Indexed: 11/17/2022] Open
Abstract
IMGT®, the international ImMunoGeneTics information system® (CNRS and Montpellier University) is the global reference in immunogenetics and immunoinformatics. By its creation in 1989, IMGT® marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. IMGT® is specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility (MH), and IgSF and MhSF superfamilies. IMGT® has been built on the IMGT-ONTOLOGY axioms and concepts, which bridged the gap between genes, sequences and three-dimensional (3D) structures. The concepts include the IMGT® standardized keywords (identification), IMGT® standardized labels (description), IMGT® standardized nomenclature (classification), IMGT unique numbering and IMGT Colliers de Perles (numerotation). IMGT® comprises seven databases, 15,000 pages of web resources and 17 tools. IMGT® tools and databases provide a high-quality analysis of the IG from fish to humans, for basic, veterinary and medical research, and for antibody engineering and humanization. They include, as examples: IMGT/V-QUEST and IMGT/JunctionAnalysis for nucleotide sequence analysis and their high-throughput version IMGT/HighV-QUEST for next generation sequencing, IMGT/DomainGapAlign for amino acid sequence analysis of IG domains, IMGT/3Dstructure-DB for 3D structures, contact analysis and paratope/epitope interactions of IG/antigen complexes, and the IMGT/mAb-DB interface for therapeutic antibodies and fusion proteins for immunological applications (FPIA).
Affiliation(s)
- Marie-Paule Lefranc
- IMGT®, the international ImMunoGenetics information system®, Laboratoire d'ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UPR CNRS 1142, Montpellier University, 141 rue de la Cardonille, 34396 Montpellier cedex 5, France.
| |
|
22
|
Alamyar E, Giudicelli V, Duroux P, Lefranc MP. Antibody V and C domain sequence, structure, and interaction analysis with special reference to IMGT®. Methods Mol Biol 2014; 1131:337-81. [PMID: 24515476 DOI: 10.1007/978-1-62703-992-5_21] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), created in 1989 (Centre National de la Recherche Scientifique, Montpellier University), is acknowledged as the global reference in immunogenetics and immunoinformatics. The accuracy and the consistency of the IMGT® data are based on IMGT-ONTOLOGY which bridges the gap between genes, sequences, and three-dimensional (3D) structures. Thus, receptors, chains, and domains are characterized with the same IMGT® rules and standards (IMGT standardized labels, IMGT gene and allele nomenclature, IMGT unique numbering, IMGT Collier de Perles), independently from the molecule type (genomic DNA, complementary DNA, transcript, or protein) or from the species. More particularly, IMGT® tools and databases provide a highly standardized analysis of the immunoglobulin (IG) or antibody and T cell receptor (TR) V and C domains. IMGT/V-QUEST analyzes the V domains of IG or TR rearranged nucleotide sequences, integrates the IMGT/JunctionAnalysis and IMGT/Automat tools, and provides IMGT Collier de Perles. IMGT/HighV-QUEST analyzes sequences from high-throughput sequencing (HTS) (up to 150,000 sequences per batch) and performs statistical analysis on up to 450,000 results, with the same resolution and high quality as IMGT/V-QUEST online. IMGT/DomainGapAlign analyzes amino acid sequences of V and C domains and IMGT/3Dstructure-DB and associated tools provide information on 3D structures, contact analysis, and paratope/epitope interactions. These IMGT® tools and databases, and the IMGT/mAb-DB interface with access to therapeutical antibody data, provide an invaluable help for antibody engineering and antibody humanization.
Affiliation(s)
- Eltaf Alamyar
- The International ImMunoGenetics information system, Laboratoire d'ImmunoGénétique Moléculaire, Institut de Génétique Humaine IGH, Université Montpellier 2, Montpellier, France
| |
|
23
|
Abstract
Antibody informatics, a part of immunoinformatics, refers to the concepts, databases, and tools developed and used to explore and to analyze the particular properties of the immunoglobulins (IG) or antibodies, compared with conventional genes and proteins. Antibody informatics is based on a unique ontology, IMGT-ONTOLOGY, created in 1989 by IMGT, the international ImMunoGeneTics information system (http://www.imgt.org). IMGT-ONTOLOGY defined, for the first time, the concept of ‘genes’ for the IG and the T cell receptors (TR), which led to their gene and allele nomenclature and allowed their entry in databases and tools. A second IMGT-ONTOLOGY revolutionizing and definitive concept was the IMGT unique numbering that bridged the gap between sequences and structures for the variable (V) and constant (C) domains of the IG and TR, and for the groove (G) domains of the major histocompatibility (MH). These breakthroughs contributed to the development of IMGT databases and tools for antibody informatics and its diverse applications, such as repertoire analysis in infectious diseases, antibody engineering and humanization, and study of antibody/antigen interactions. Nucleotide sequences of antibody V domains from deep sequencing (Next Generation Sequencing or High Throughput Sequencing) are analyzed with IMGT/HighV-QUEST, the high-throughput version of IMGT/V-QUEST and IMGT/JunctionAnalysis. Amino acid sequences of V and C domains are represented with the IMGT/Collier-de-Perles tool and analyzed with IMGT/DomainGapAlign. Three-dimensional (3D) structures (including contact analysis and paratope/epitope) are described in IMGT/3Dstructure-DB. Based on a friendly interface, IMGT/mAb-DB contains therapeutic monoclonal antibodies (INN suffix–mab) that can be queried on their specificity, for example, in infectious diseases, on bacterial or viral targets.
|
24
|
Lefranc MP. Immunoglobulin and T Cell Receptor Genes: IMGT® and the Birth and Rise of Immunoinformatics. Front Immunol 2014; 5:22. [PMID: 24600447 PMCID: PMC3913909 DOI: 10.3389/fimmu.2014.00022] [Citation(s) in RCA: 165] [Impact Index Per Article: 16.5] [Received: 12/17/2013] [Accepted: 01/15/2014] [Indexed: 11/13/2022] Open
Abstract
IMGT®, the international ImMunoGeneTics information system® (1), (CNRS and Université Montpellier 2) is the global reference in immunogenetics and immunoinformatics. By its creation in 1989, IMGT® marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. IMGT® is specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility (MH), and proteins of the IgSF and MhSF superfamilies. IMGT® has been built on the IMGT-ONTOLOGY axioms and concepts, which bridged the gap between genes, sequences, and three-dimensional (3D) structures. The concepts include the IMGT® standardized keywords (concepts of identification), IMGT® standardized labels (concepts of description), IMGT® standardized nomenclature (concepts of classification), IMGT unique numbering, and IMGT Colliers de Perles (concepts of numerotation). IMGT® comprises seven databases, 15,000 pages of web resources, and 17 tools, and provides a high-quality and integrated system for the analysis of the genomic and expressed IG and TR repertoire of the adaptive immune responses. Tools and databases are used in basic, veterinary, and medical research, in clinical applications (mutation analysis in leukemia and lymphoma) and in antibody engineering and humanization. They include, for example IMGT/V-QUEST and IMGT/JunctionAnalysis for nucleotide sequence analysis and their high-throughput version IMGT/HighV-QUEST for next-generation sequencing (500,000 sequences per batch), IMGT/DomainGapAlign for amino acid sequence analysis of IG and TR variable and constant domains and of MH groove domains, IMGT/3Dstructure-DB for 3D structures, contact analysis and paratope/epitope interactions of IG/antigen and TR/peptide-MH complexes and IMGT/mAb-DB interface for therapeutic antibodies and fusion proteins for immune applications (FPIA).
Affiliation(s)
- Marie-Paule Lefranc
- The International ImMunoGenetics Information System (IMGT), Laboratoire d’ImmunoGénétique Moléculaire (LIGM), Institut de Génétique Humaine, UPR CNRS, Université Montpellier 2, Montpellier, France
| |
|
25
|
Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt S, Johnson N, Juettemann T, Kähäri AK, Keenan S, Kulesha E, Martin FJ, Maurel T, McLaren WM, Murphy DN, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ruffier M, Sheppard D, Taylor K, Thormann A, Trevanion SJ, Vullo A, Wilder SP, Wilson M, Zadissa A, Aken BL, Birney E, Cunningham F, Harrow J, Herrero J, Hubbard TJ, Kinsella R, Muffato M, Parker A, Spudich G, Yates A, Zerbino DR, Searle SM. Ensembl 2014. Nucleic Acids Res 2013; 42:D749-55. [PMID: 24316576 PMCID: PMC3964975 DOI: 10.1093/nar/gkt1196] [Citation(s) in RCA: 1059] [Impact Index Per Article: 96.3] [Indexed: 02/06/2023] Open
Abstract
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
Affiliation(s)
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- *To whom correspondence should be addressed. Tel: +44 1223 492 581; Fax: +44 1223 494 494;
- M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Konstantinos Billis, Simon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Stephen Fitzgerald, Laurent Gil, Carlos García Girón, Leo Gordon, Thibaut Hourlier, Sarah Hunt, Nathan Johnson, Thomas Juettemann, Andreas K. Kähäri, Stephen Keenan, Eugene Kulesha, Fergal J. Martin, Thomas Maurel, William M. McLaren, Daniel N. Murphy, Rishi Nag, Bert Overduin, Miguel Pignatelli, Bethan Pritchard, Emily Pritchard, Harpreet S. Riat, Magali Ruffier, Daniel Sheppard, Kieron Taylor, Anja Thormann, Stephen J. Trevanion, Alessandro Vullo, Steven P. Wilder, Mark Wilson, Amonida Zadissa, Bronwen L. Aken, Ewan Birney, Fiona Cunningham, Jennifer Harrow, Javier Herrero, Tim J.P. Hubbard, Rhoda Kinsella, Matthieu Muffato, Anne Parker, Giulietta Spudich, Andy Yates, Daniel R. Zerbino, Stephen M.J. Searle
- Same affiliation as Paul Flicek (shared by all co-authors listed above)
26
Minelli C, De Grandi A, Weichenberger CX, Gögele M, Modenese M, Attia J, Barrett JH, Boehnke M, Borsani G, Casari G, Fox CS, Freina T, Hicks AA, Marroni F, Parmigiani G, Pastore A, Pattaro C, Pfeufer A, Ruggeri F, Schwienbacher C, Taliun D, Pramstaller PP, Domingues FS, Thompson JR. Importance of different types of prior knowledge in selecting genome-wide findings for follow-up. Genet Epidemiol 2013; 37:205-13. [PMID: 23307621 DOI: 10.1002/gepi.21705] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/10/2012] [Revised: 10/28/2012] [Accepted: 11/22/2012] [Indexed: 12/14/2022]
Abstract
Biological plausibility and other prior information could help select genome-wide association (GWA) findings for further follow-up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts' opinions and empirical evidence to estimate the relative importance of 15 types of information at the single-nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two-round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta-analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta-analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research.
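The empirical comparison described in this abstract (how often a characteristic occurs among established SNPs versus a random SNP set) can be sketched as a simple frequency ratio. This is a minimal illustration only: the abstract does not state the exact statistic the authors used, so the ratio, the function names, and the toy data below are all assumptions.

```python
def characteristic_frequency(snps, characteristic):
    """Fraction of SNPs in `snps` that carry `characteristic`.

    Each SNP is represented as a set of characteristic labels
    (a hypothetical encoding chosen for this sketch).
    """
    if not snps:
        raise ValueError("SNP set is empty")
    return sum(1 for snp in snps if characteristic in snp) / len(snps)


def frequency_ratio(established, random_snps, characteristic):
    """Ratio of a characteristic's frequency in established vs. random SNPs.

    A ratio above 1 means the characteristic is more common among
    established associations than in the random background.
    """
    f_established = characteristic_frequency(established, characteristic)
    f_random = characteristic_frequency(random_snps, characteristic)
    if f_random == 0:
        # No background occurrences: the ratio is undefined or unbounded.
        return float("inf") if f_established > 0 else float("nan")
    return f_established / f_random


# Toy data (invented for illustration, not taken from the study).
established = [{"protein_domain"}, {"protein_domain", "conserved"}, set(), {"conserved"}]
random_snps = [{"protein_domain"}, set(), set(), set()]

ratio = frequency_ratio(established, random_snps, "protein_domain")
print(ratio)
```

With these toy sets the characteristic appears in half of the established SNPs and a quarter of the random ones, so the ratio is 2.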
Affiliation(s)
- Cosetta Minelli
- Center for Biomedicine, European Academy Bozen/Bolzano (EURAC), Bolzano, Italy.
27
Magadán-Mompó S, Zimmerman AM, Sánchez-Espinel C, Gambón-Deza F. Immunoglobulin light chains in medaka (Oryzias latipes). Immunogenetics 2013; 65:387-96. [PMID: 23417322 DOI: 10.1007/s00251-013-0678-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/04/2012] [Accepted: 01/11/2013] [Indexed: 11/26/2022]
Abstract
The gene segments encoding antibodies have been studied in many capacities and represent some of the best-characterized gene families in traditional animal disease models (mice and humans). To date, multiple immunoglobulin light chain (IgL) isotypes have been found in vertebrates and it is unclear as to which isotypes might be more primordial in nature. Sequence data emerging from an array of fish genome projects is a valuable resource for discerning complex multigene assemblages in this critical branch point of vertebrate phylogeny. Herein, we have analyzed the genomic organization of medaka (Oryzias latipes) IgL gene segments based on recently released genome data. The medaka IgL locus located on chromosome 11 contains at least three clusters of IgL gene segments comprised of multiple gene assemblages of the kappa light chain isotype. These data suggest that medaka IgL gene segments may undergo both intra- and inter-cluster rearrangements as a means to generate additional diversity. Alignments of expressed sequence tags to concordant gene segments revealed that each of the three IgL clusters is expressed. Collectively, these data provide a genomic framework for IgL genes in medaka and indicate that Ig diversity in this species is achieved from at least three distinct chromosomal regions.
Affiliation(s)
- Susana Magadán-Mompó
- Virologie et Immunologie Moleculaires, Institut National de la Recherche Agronomique (INRA), Jouy-en-Josas, France.
28
Use of IMGT(®) databases and tools for antibody engineering and humanization. Methods Mol Biol 2012; 907:3-37. [PMID: 22907343 DOI: 10.1007/978-1-61779-974-7_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 10/27/2022]
Abstract
IMGT(®), the international ImMunoGeneTics information system(®) (http://www.imgt.org), was created in 1989 to manage the huge diversity of the antigen receptors, immunoglobulins (IG) or antibodies, and T cell receptors (TR). Standardized sequence and structure analysis of antibody using IMGT(®) databases and tools allows one to bridge, for the first time, the gap between antibody sequences and three-dimensional (3D) structures. This is achieved through the IMGT Scientific chart rules, based on the IMGT-ONTOLOGY concepts of classification (IMGT gene and allele nomenclature), description (IMGT standardized labels), and numerotation (IMGT unique numbering and IMGT Colliers de Perles). IMGT(®) is the international reference for immunogenetics and immunoinformatics and its standards are particularly useful for antibody humanization and evaluation of immunogenicity. IMGT(®) databases for antibody nucleotide sequences and genes include IMGT/LIGM-DB and IMGT/GENE-DB, respectively, whereas nucleotide sequence analysis is performed by the IMGT/V-QUEST, IMGT/HighV-QUEST, and IMGT/JunctionAnalysis tools. In this chapter, we focus on IMGT(®) databases and tools for amino acid sequences, two-dimensional (2D) and three-dimensional (3D) structures: the IMGT/DomainGapAlign and IMGT/Collier-de-Perles tools, the IMGT/2Dstructure-DB database for amino acid sequences of monoclonal antibodies (mAb, suffix -mab) and fusion proteins for immune applications (FPIA, suffix -cept) of the World Health Organization/International Nonproprietary Name (WHO/INN) programme and other proteins of interest, and the IMGT/3Dstructure-DB database for crystallized antibodies and its associated tools (IMGT/StructuralQuery, IMGT/DomainSuperimpose).
29
Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ritchie GRS, Ruffier M, Schuster M, Sheppard D, Sobral D, Taylor K, Thormann A, Trevanion S, White S, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Harrow J, Herrero J, Hubbard TJP, Johnson N, Kinsella R, Parker A, Spudich G, Yates A, Zadissa A, Searle SMJ. Ensembl 2013. Nucleic Acids Res 2012. [PMID: 23203987 PMCID: PMC3531136 DOI: 10.1093/nar/gks1236] [Citation(s) in RCA: 791] [Impact Index Per Article: 65.9] [Indexed: 11/12/2022] Open
Abstract
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
Affiliation(s)
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.
30
Iwama H, Kato K, Imachi H, Murao K, Masaki T. Human microRNAs originated from two periods at accelerated rates in mammalian evolution. Mol Biol Evol 2012; 30:613-26. [PMID: 23171859 PMCID: PMC3563971 DOI: 10.1093/molbev/mss262] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022] Open
Abstract
MicroRNAs (miRNAs) are short, noncoding RNAs that modulate genes posttranscriptionally. Frequent gains and losses of miRNA genes have been reported to occur during evolution. However, little is known systematically about the periods of evolutionary origin of the present miRNA gene repertoire of an extant mammalian species. Thus, in this study, we estimated the evolutionary periods during which each of 1,433 present human miRNA genes originated within 15 periods, from human to platypus-human common ancestral branch and a class "conserved beyond theria," primarily using multiple genome alignments of 38 species, plus the pairwise genome alignments of five species. The results showed two peak periods in which the human miRNA genes originated at significantly accelerated rates. The most accelerated rate appeared in the period of the initial phase of hominoid lineage, and the second appeared shortly before Laurasiatherian divergence. Approximately 53% of the present human miRNA genes have originated within the simian lineage to human. In particular, approximately 28% originated within the hominoid lineage. The early phase of placental mammal radiation comprises approximately 28%, while no more than 15% of human miRNAs have been conserved beyond placental mammals. We also clearly showed a general trend, in which the miRNA expression level decreases as the miRNA becomes younger. Intriguingly, amid this decreasing trend of expression, we found one significant rise in the expression level that corresponded to the initial phase of the hominoid lineage, suggesting that increased functional acquisitions of miRNAs originated at this particular period.
Affiliation(s)
- Hisakazu Iwama
- Life Science Research Center, Kagawa University, Kita-gun, Kagawa, Japan.
31
Paterson T, Law A. JEnsembl: a version-aware Java API to Ensembl data systems. Bioinformatics 2012; 28:2724-31. [PMID: 22945789 PMCID: PMC3476335 DOI: 10.1093/bioinformatics/bts525] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/12/2012] [Revised: 08/16/2012] [Accepted: 08/20/2012] [Indexed: 11/21/2022] Open
Abstract
MOTIVATION The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. RESULTS The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. AVAILABILITY Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).
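The idea of versioned mappings from database schema to code objects can be illustrated with a small lookup that picks the mapping in force for a given release. This is a hedged sketch in Python rather than Java, and it is not JEnsembl's actual configuration format: the `MAPPINGS` table, the mapper names, and the release numbers are all hypothetical.

```python
import bisect

# Hypothetical configuration: for each entity, the first release at which
# a given mapping applies, sorted by release number.
MAPPINGS = {
    "gene": [(50, "GeneMapperV1"), (62, "GeneMapperV2"), (67, "GeneMapperV3")],
}


def resolve_mapper(entity, release):
    """Return the mapper name that applies to `entity` at `release`.

    The mapping chosen is the one with the highest starting release
    that is less than or equal to the requested release.
    """
    entries = MAPPINGS[entity]
    starts = [start for start, _ in entries]
    idx = bisect.bisect_right(starts, release) - 1
    if idx < 0:
        raise LookupError(f"no mapping for {entity!r} at release {release}")
    return entries[idx][1]


# The same installation can serve a current and an archived release.
print(resolve_mapper("gene", 67))
print(resolve_mapper("gene", 65))
```

Keeping the version boundaries in data rather than in code is what lets one installation talk to several database releases at once, which is the property the abstract highlights.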
Affiliation(s)
- Trevor Paterson
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK.
32
Xie C, Zhang YE, Chen JY, Liu CJ, Zhou WZ, Li Y, Zhang M, Zhang R, Wei L, Li CY. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet 2012; 8:e1002942. [PMID: 23028352 PMCID: PMC3441637 DOI: 10.1371/journal.pgen.1002942] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Received: 02/17/2012] [Accepted: 07/24/2012] [Indexed: 01/08/2023] Open
Abstract
Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
Ever since the pre-genomic era, people believed that “mother gene”-based mechanisms such as gene duplication were the major means of creating new genes. Recently, we and others reported several “motherless” protein-coding genes in human, challenging the conventional idea in that some protein-coding genes might have emerged de novo from ancestral non-coding DNAs. However, how these interesting proteins originated is a question that remained unaddressed. The ancestral non-coding DNA must become transcribed and gain a translatable open reading frame before becoming a protein-coding gene, but either order of these two steps is possible. Here, we performed a comparative transcriptome study in human, chimpanzee, and rhesus macaque to address these fundamental questions. We found that most of the hominoid-specific de novo protein-coding genes encoded long non-coding RNAs in rhesus macaque or chimpanzee, with similar transcript structure and correlated tissue expression profile, but the protein-coding genes often had higher transcriptional abundance. According to the rule of parsimony, we conclude that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes that are then further optimized at the transcriptional level, a pattern insensitive to the parameters used in the identification and analysis of de novo genes.
Affiliation(s)
- Chen Xie
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- Yong E. Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
- Institute of Molecular Medicine, Peking University, Beijing, China
- Chu-Jun Liu
- Institute of Molecular Medicine, Peking University, Beijing, China
- Wei-Zhen Zhou
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- Ying Li
- Institute of Molecular Medicine, Peking University, Beijing, China
- Mao Zhang
- Institute of Molecular Medicine, Peking University, Beijing, China
- Rongli Zhang
- Institute of Molecular Medicine, Peking University, Beijing, China
- Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
- * E-mail: (C-YL); (LW)
- Chuan-Yun Li
- Institute of Molecular Medicine, Peking University, Beijing, China
- * E-mail: (C-YL); (LW)
33
Testori A, Caizzi L, Cutrupi S, Friard O, De Bortoli M, Cora' D, Caselle M. The role of Transposable Elements in shaping the combinatorial interaction of Transcription Factors. BMC Genomics 2012; 13:400. [PMID: 22897927 PMCID: PMC3478180 DOI: 10.1186/1471-2164-13-400] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 04/12/2012] [Accepted: 06/28/2012] [Indexed: 12/22/2022] Open
Abstract
Background In the last few years several studies have shown that Transposable Elements (TEs) in the human genome are significantly associated with Transcription Factor Binding Sites (TFBSs) and that in several cases their expansion within the genome led to a substantial rewiring of the regulatory network. Another important feature of the regulatory network which has been thoroughly studied is the combinatorial organization of transcriptional regulation. In this paper we combine these two observations and suggest that TEs, besides rewiring the network, also played a central role in the evolution of particular patterns of combinatorial gene regulation. Results To address this issue we searched for TEs overlapping Estrogen Receptor α (ERα) binding peaks in two publicly available ChIP-seq datasets from the MCF7 cell line corresponding to different modalities of exposure to estrogen. We found a remarkable enrichment of a few specific classes of Transposons. Among these a prominent role was played by MIR (Mammalian Interspersed Repeats) transposons. These TEs underwent a dramatic expansion at the beginning of the mammalian radiation and then stabilized. We conjecture that the special affinity of ERα for the MIR class of TEs could be at the origin of the important role assumed by ERα in mammals. We then searched for TFBSs within the TEs overlapping ChIP-seq peaks. We found a strong enrichment of a few precise combinations of TFBSs. In several cases the corresponding Transcription Factors (TFs) were known cofactors of ERα, thus supporting the idea of a co-regulatory role of TFBSs within the same TE. Moreover, most of these correlations turned out to be strictly associated with specific classes of TEs, thus suggesting the presence of a well-defined "transposon code" within the regulatory network. Conclusions In this work we tried to shed light on the role of Transposable Elements (TEs) in shaping the regulatory network of higher eukaryotes. To test this idea we focused on a particular transcription factor, the Estrogen Receptor α (ERα), and we found that ERα preferentially targets a well-defined set of TEs and that these TEs host combinations of transcriptional regulators involving several known co-regulators of ERα. Moreover, a significant number of these TEs turned out to be conserved between human and mouse and located in the vicinity (and thus candidate regulators) of important estrogen-related genes.
Affiliation(s)
- Alessandro Testori
- Center for Molecular Systems Biology, University of Turin, Turin, Candiolo I-10060, Italy.
|
34
|
Singh DD, Jain A. Multipurpose instantaneous microarray detection of acute encephalitis causing viruses and their expression profiles. Curr Microbiol 2012; 65:290-303. [PMID: 22674173 PMCID: PMC7080014 DOI: 10.1007/s00284-012-0154-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/28/2012] [Accepted: 05/14/2012] [Indexed: 01/15/2023]
Abstract
Detection of multiple viruses is important for global analysis of gene or protein content and expression, opening up new prospects in terms of molecular and physiological systems for pathogenic diagnosis. Early diagnosis is crucial for disease treatment and control as it reduces inappropriate use of antiviral therapy and focuses surveillance activity. This requires the ability to detect and accurately diagnose infection at or close to the source/outbreak with minimum delay and the need for specific, accessible point-of-care diagnosis able to distinguish causative viruses and their subtypes. None of the available viral diagnostic assays combine a point-of-care format with the complex capability to identify a large range of human and animal viruses. Microarray detection provides a useful, labor-saving tool for detection of multiple viruses with several advantages, such as convenience and prevention of cross-contamination of polymerase chain reaction (PCR) products, which is of foremost importance in such applications. Recently, real-time PCR assays with the ability to confirm the amplification product and quantitate the target concentration have been developed. Furthermore, nucleotide sequence analysis of amplification products has facilitated epidemiological studies of infectious disease outbreaks and monitoring of treatment outcomes for infections, in particular for viruses that mutate at high frequency. This review discusses applications of microarray technology as a potential new tool for detection and identification of acute encephalitis-causing viruses in human serum, plasma, and cell cultures.
Affiliation(s)
- Desh Deepak Singh
- Virology Laboratory, Department of Microbiology, C S M Medical University, Lucknow, UP 226003, India.
|
35
|
Sana J, Faltejskova P, Svoboda M, Slaby O. Novel classes of non-coding RNAs and cancer. J Transl Med 2012; 10:103. [PMID: 22613733 PMCID: PMC3434024 DOI: 10.1186/1479-5876-10-103] [Citation(s) in RCA: 229] [Impact Index Per Article: 19.1] [Received: 01/02/2012] [Accepted: 05/21/2012] [Indexed: 12/12/2022] Open
Abstract
For many years, the central dogma of molecular biology has been that RNA functions mainly as an informational intermediate between a DNA sequence and its encoded protein. But one of the great surprises of modern biology was the discovery that protein-coding genes represent less than 2% of the total genome sequence, followed by the finding that at least 90% of the human genome is actively transcribed. Thus, the human transcriptome was found to be more complex than a collection of protein-coding genes and their splice variants. Although initially argued to be spurious transcriptional noise or accumulated evolutionary debris arising from the early assembly of genes and/or the insertion of mobile genetic elements, recent evidence suggests that the non-coding RNAs (ncRNAs) may play major biological roles in cellular development, physiology and pathologies. NcRNAs can be grouped into two major classes based on transcript size: small ncRNAs and long ncRNAs. Each of these classes can be further divided, and novel subclasses are still being discovered and characterized. Although in recent years the small ncRNAs called microRNAs have been studied most frequently, with more than ten thousand hits in the PubMed database, evidence has recently begun to accumulate describing the molecular mechanisms by which a wide range of novel RNA species function, providing insight into their functional roles in cellular biology and in human disease. In this review, we summarize newly discovered classes of ncRNAs, and highlight their functioning in cancer biology and potential usage as biomarkers or therapeutic targets.
Affiliation(s)
- Jiri Sana
- Masaryk Memorial Cancer Institute, Department of Comprehensive Cancer Care, Zluty kopec 7, Brno, Czech Republic, Europe
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic, Europe
- Petra Faltejskova
- Masaryk Memorial Cancer Institute, Department of Comprehensive Cancer Care, Zluty kopec 7, Brno, Czech Republic, Europe
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic, Europe
- Marek Svoboda
- Masaryk Memorial Cancer Institute, Department of Comprehensive Cancer Care, Zluty kopec 7, Brno, Czech Republic, Europe
- Ondrej Slaby
- Masaryk Memorial Cancer Institute, Department of Comprehensive Cancer Care, Zluty kopec 7, Brno, Czech Republic, Europe
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic, Europe
- Masaryk Memorial Cancer Institute, Department of Comprehensive Cancer Care, Zluty kopec 7, 656 53, Brno, Czech Republic, Europe
|
36
|
Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D, Warren L, Aponte J, Zawistowski M, Liu X, Zhang H, Zhang Y, Li J, Li Y, Li L, Woollard P, Topp S, Hall MD, Nangle K, Wang J, Abecasis G, Cardon LR, Zöllner S, Whittaker JC, Chissoe SL, Novembre J, Mooser V. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 2012; 337:100-4. [PMID: 22604722 DOI: 10.1126/science.1217876] [Citation(s) in RCA: 483] [Impact Index Per Article: 40.3] [Indexed: 01/25/2023]
Abstract
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
Affiliation(s)
- Matthew R Nelson
- Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
|
37
|
Bergerson RJ, Collier LS, Sarver AL, Been RA, Lugthart S, Diers MD, Zuber J, Rappaport AR, Nixon MJ, Silverstein KAT, Fan D, Lamblin AFJ, Wolff L, Kersey JH, Delwel R, Lowe SW, O'Sullivan MG, Kogan SC, Adams DJ, Largaespada DA. An insertional mutagenesis screen identifies genes that cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 2012; 119:4512-23. [PMID: 22427200 PMCID: PMC3362364 DOI: 10.1182/blood-2010-04-281428] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 04/26/2010] [Accepted: 03/03/2012] [Indexed: 11/20/2022] Open
Abstract
Patients with a t(9;11) translocation (MLL-AF9) develop acute myeloid leukemia (AML), and while in mice the expression of this fusion oncogene also results in the development of myeloid leukemia, it is with long latency. To identify mutations that cooperate with Mll-AF9, we infected neonatal wild-type (WT) or Mll-AF9 mice with a murine leukemia virus (MuLV). MuLV-infected Mll-AF9 mice succumbed to disease significantly faster than controls presenting predominantly with myeloid leukemia while infected WT animals developed predominantly lymphoid leukemia. We identified 88 candidate cancer genes near common sites of proviral insertion. Analysis of transcript levels revealed significantly elevated expression of Mn1, and a trend toward increased expression of Bcl11a and Fosb in Mll-AF9 murine leukemia samples with proviral insertions proximal to these genes. Accordingly, FOSB and BCL11A were also overexpressed in human AML harboring MLL gene translocations. FOSB was revealed to be essential for growth in mouse and human myeloid leukemia cells using shRNA lentiviral vectors in vitro. Importantly, MN1 cooperated with Mll-AF9 in leukemogenesis in an in vivo BM viral transduction and transplantation assay. Together, our data identified genes that define transcription factor networks and important genetic pathways acting during progression of leukemia induced by MLL fusion oncogenes.
Affiliation(s)
- Rachel J Bergerson
- Department of Genetics, Cell Biology and Development, Masonic Cancer Center, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
|
38
|
A Sleeping Beauty mutagenesis screen reveals a tumor suppressor role for Ncoa2/Src-2 in liver cancer. Proc Natl Acad Sci U S A 2012; 109:E1377-86. [PMID: 22556267 DOI: 10.1073/pnas.1115433109] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Indexed: 01/04/2023] Open
Abstract
The Sleeping Beauty (SB) transposon mutagenesis system is a powerful tool that facilitates the discovery of mutations that accelerate tumorigenesis. In this study, we sought to identify mutations that cooperate with MYC, one of the most commonly dysregulated genes in human malignancy. We performed a forward genetic screen with a mouse model of MYC-induced liver cancer using SB-mediated mutagenesis. We sequenced insertions in 63 liver tumor nodules and identified at least 16 genes/loci that contribute to accelerated tumor development. RNAi-mediated knockdown in a liver progenitor cell line further validated three of these genes, Ncoa2/Src-2, Zfx, and Dtnb, as tumor suppressors in liver cancer. Moreover, deletion of Ncoa2/Src-2 in mice predisposes to diethylnitrosamine-induced liver tumorigenesis. These findings reveal genes and pathways that functionally restrain MYC-mediated liver tumorigenesis and therefore may provide targets for cancer therapy.
|
39
|
Characterization of rainbow trout gonad, brain and gill deep cDNA repertoires using a Roche 454-Titanium sequencing approach. Gene 2012; 500:32-9. [PMID: 22465513 DOI: 10.1016/j.gene.2012.03.053] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/22/2011] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 11/23/2022]
Abstract
Rainbow trout, Oncorhynchus mykiss, is an important aquaculture species worldwide and, in addition to being of commercial interest, it is also a research model organism of considerable scientific importance. Transcriptomic analyses of this species have often been hindered by the lack of a whole genome sequence. Using next-generation sequencing (NGS) technologies, we sought to fill these informational gaps. Here, using Roche 454-Titanium technology, we provide new tissue-specific cDNA repertoires from several rainbow trout tissues. Non-normalized cDNA libraries were constructed from testis, ovary, brain and gill rainbow trout tissue samples, and these different libraries were sequenced in 10 separate half-runs of 454-Titanium. Overall, we produced a total of 3 million quality sequences with an average size of 328 bp, representing more than 1 Gb of expressed sequence information. These sequences have been combined with all publicly available rainbow trout sequences, resulting in a total of 242,187 clusters of putative transcript groups and 22,373 singletons. To identify the predominantly expressed genes in different tissues of interest, we developed a Digital Differential Display (DDD) approach. This approach allowed us to characterize the genes that are predominantly expressed within each tissue of interest. Of these genes, some were already known to be tissue-specific, thereby validating our approach. Many others, however, were novel candidates, demonstrating the usefulness of our strategy and of such tissue-specific resources. This new sequence information, acquired using NGS 454-Titanium technology, deeply enriched our current knowledge of the expressed genes in rainbow trout through the identification of an increased number of tissue-specific sequences. This identification allowed a precise cDNA tissue repertoire to be characterized in several important rainbow trout tissues.
The rainbow trout contig browser can be accessed at the following publicly available web site (http://www.sigenae.org/).
|
40
|
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, Ritchie GRS, Ruffier M, Schuster M, Sobral D, Tang YA, Taylor K, Trevanion S, Vandrovcova J, White S, Wilson M, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Harrow J, Herrero J, Hubbard TJP, Parker A, Proctor G, Spudich G, Vogel J, Yates A, Zadissa A, Searle SMJ. Ensembl 2012. Nucleic Acids Res 2011; 40:D84-90. [PMID: 22086963 PMCID: PMC3245178 DOI: 10.1093/nar/gkr991] [Citation(s) in RCA: 806] [Impact Index Per Article: 62.0] [Indexed: 12/19/2022] Open
Abstract
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
Affiliation(s)
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK.
|
41
|
Zhang YE, Landback P, Vibranovski MD, Long M. Accelerated recruitment of new brain development genes into the human genome. PLoS Biol 2011; 9:e1001179. [PMID: 22028629 PMCID: PMC3196496 DOI: 10.1371/journal.pbio.1001179] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.8] [Received: 03/25/2011] [Accepted: 09/08/2011] [Indexed: 11/24/2022] Open
Abstract
How the human brain evolved has attracted tremendous interest for decades. Motivated by case studies of primate-specific genes implicated in brain function, we examined whether or not the young genes, those emerging genome-wide in the lineages specific to the primates or rodents, showed distinct spatial and temporal patterns of transcription compared to old genes, which had existed before the primate and rodent split. We found consistent patterns across different sources of expression data: there is a significantly larger proportion of young genes expressed in the fetal or infant brain of humans than in mouse, and more young genes in humans have expression biased toward early developing brains than old genes. Most of these young genes are expressed in the evolutionarily newest part of the human brain, the neocortex. Remarkably, we also identified a number of human-specific genes which are expressed in the prefrontal cortex, which is implicated in complex cognitive behaviors. The young genes upregulated in the early developing human brain play diverse functional roles, with a significant enrichment of transcription factors. Genes originating from different mechanisms show a similar expression bias in the developing brain. Moreover, we found that the young genes upregulated in early brain development showed rapid protein evolution compared to old genes also expressed in the fetal brain. Strikingly, genes expressed in the neocortex arose soon after its morphological origin. These four lines of evidence suggest that positive selection for brain function may have contributed to the origination of young genes expressed in the developing brain. These data demonstrate a striking recruitment of new genes into the early development of the human brain.
Affiliation(s)
- Yong E. Zhang
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Patrick Landback
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Maria D. Vibranovski
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
|
42
|
Moss SP, Joyce DA, Humphries S, Tindall KJ, Lunt DH. Comparative analysis of teleost genome sequences reveals an ancient intron size expansion in the zebrafish lineage. Genome Biol Evol 2011; 3:1187-96. [PMID: 21920901 PMCID: PMC3205604 DOI: 10.1093/gbe/evr090] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022] Open
Abstract
We have developed a bioinformatics pipeline for the comparative evolutionary analysis of Ensembl genomes and have used it to analyze the introns of the five available teleost fish genomes. We show our pipeline to be a powerful tool for revealing variation between genomes that may otherwise be overlooked with simple summary statistics. We identify that the zebrafish, Danio rerio, has an unusual distribution of intron sizes, with a greater number of larger introns in general and a notable peak in the frequency of introns of approximately 500 to 2,000 bp compared with the monotonically decreasing frequency distributions of the other fish. We determine that 47% of D. rerio introns are composed of repetitive sequences, although the remainder, over 331 Mb, is not. Because repetitive elements may be the origin of the majority of all noncoding DNA, it is likely that the remaining D. rerio intronic sequence has an ancient repetitive origin and has since accumulated so many mutations that it can no longer be recognized as such. Studying such an ancient expansion of repeats in the Danio lineage will require further comparative analysis of fish genomes incorporating a broader distribution of teleost lineages.
|
43
|
Washington NL, Stinson EO, Perry MD, Ruzanov P, Contrino S, Smith R, Zha Z, Lyne R, Carr A, Lloyd P, Kephart E, McKay SJ, Micklem G, Stein LD, Lewis SE. The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database (Oxford) 2011; 2011:bar023. [PMID: 21856757 PMCID: PMC3170170 DOI: 10.1093/database/bar023] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.
Affiliation(s)
- Nicole L Washington
- Lawrence Berkeley National Laboratory, Genomics Division, 1 Cyclotron Road MS64-121, Berkeley, CA 94720, USA
|
44
|
Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, Kersey P, Flicek P. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford) 2011; 2011:bar030. [PMID: 21785142 PMCID: PMC3170168 DOI: 10.1093/database/bar030] [Citation(s) in RCA: 895] [Impact Index Per Article: 68.8] [Received: 04/20/2011] [Revised: 06/12/2011] [Accepted: 06/16/2011] [Indexed: 11/20/2022]
Abstract
For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives. Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/.
Affiliation(s)
- Rhoda J. Kinsella
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Andreas Kähäri
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Syed Haider
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Jorge Zamora
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Glenn Proctor
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Giulietta Spudich
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Jeff Almeida-King
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Daniel Staines
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Paul Derwent
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Arnaud Kerhornou
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Paul Kersey
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
|
45
|
Caffrey DR, Zhao J, Song Z, Schaffer ME, Haney SA, Subramanian RR, Seymour AB, Hughes JD. siRNA off-target effects can be reduced at concentrations that match their individual potency. PLoS One 2011; 6:e21503. [PMID: 21750714 PMCID: PMC3130022 DOI: 10.1371/journal.pone.0021503] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 04/28/2011] [Accepted: 05/29/2011] [Indexed: 11/19/2022] Open
Abstract
Small interfering RNAs (siRNAs) are routinely used to reduce mRNA levels for a specific gene with the goal of studying its function. Several studies have demonstrated that siRNAs are not always specific and can have many off-target effects. The 3′ UTRs of off-target mRNAs are often enriched in sequences that are complementary to the seed-region of the siRNA. We demonstrate that siRNA off-targets can be significantly reduced when cells are treated with a dose of siRNA that is relatively low (e.g. 1 nM), but sufficient to effectively silence the intended target. The reduction in off-targets was demonstrated for both modified and unmodified siRNAs that targeted either STAT3 or hexokinase II. Low concentrations reduced silencing of transcripts with complementarity to the seed region of the siRNA. Similarly, off-targets that were not complementary to the siRNA were reduced at lower doses, including up-regulated genes that are involved in immune response. Importantly, the unintended induction of caspase activity following treatment with a siRNA that targeted hexokinase II was also shown to be a concentration-dependent off-target effect. We conclude that off-targets and their related phenotypic effects can be reduced for certain siRNA that potently silence their intended target at low concentrations.
|
46
|
Busset J, Cabau C, Meslin C, Pascal G. PhyleasProg: a user-oriented web server for wide evolutionary analyses. Nucleic Acids Res 2011; 39:W479-85. [PMID: 21531699 PMCID: PMC3125726 DOI: 10.1093/nar/gkr243] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
Evolutionary analyses of biological data are becoming a prerequisite in many fields of biology. At a time of high-throughput data analysis, phylogenetics is often a necessary complementary tool for biologists to understand, compare and identify the functions of sequences. But available bioinformatics tools are frequently not easy for non-specialists to use. We developed PhyleasProg (http://phyleasprog.inra.fr), a user-friendly web server as a turnkey tool dedicated to evolutionary analyses. PhyleasProg can help biologists with little experience in evolutionary methodologies by analysing their data in a simple and robust way, using methods corresponding to robust standards. Via a very intuitive web interface, users only need to enter a list of Ensembl protein IDs and a list of species as inputs. After dynamic computations, users have access to phylogenetic trees, positive/purifying selection data (on site and branch-site models), with a display of these results on the protein sequence and on a 3D structure model, and the synteny environment of related genes. This connection between different domains of phylogenetics opens the way to new biological analyses for the discovery of the function and structure of proteins.
Affiliation(s)
- Joël Busset
- INRA, UMR85, Physiologie de la Reproduction et des Comportements, F-37380 Nouzilly, France
|
47
|
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GRS, Ruffier M, Schuster M, Sobral D, Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJP, Parker A, Proctor G, Vogel J, Searle SMJ. Ensembl 2011. Nucleic Acids Res 2011; 39:D800-6. [PMID: 21045057 PMCID: PMC3013672 DOI: 10.1093/nar/gkq1064] [Citation(s) in RCA: 564] [Impact Index Per Article: 43.4] [Received: 10/07/2010] [Accepted: 10/13/2010] [Indexed: 11/13/2022] Open
Abstract
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
Affiliation(s)
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
48
|
Palidwor GA, Perkins TJ, Xia X. A general model of codon bias due to GC mutational bias. PLoS One 2010; 5:e13431. [PMID: 21048949 PMCID: PMC2965080 DOI: 10.1371/journal.pone.0013431] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Received: 05/14/2010] [Accepted: 09/10/2010] [Indexed: 12/04/2022]
Abstract
Background: In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC-content-changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C-ending codon usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns.
Principal Findings: In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human, respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human, respectively.
Conclusions: The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
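For readers unfamiliar with GC3, the measure of third-position GC content named in this abstract, the following is a minimal Python sketch of how it could be computed for one coding sequence. It is an illustration only and is not taken from the paper: the function name `gc3` is hypothetical, the input is assumed to be an in-frame DNA coding sequence, and the choices to drop a trailing partial codon and to skip ambiguous third bases are assumptions.

```python
def gc3(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumptions (not from the cited paper): `cds` is an in-frame DNA
    coding sequence; a trailing partial codon is ignored; codons whose
    third base is not A, C, G or T are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    # Third codon positions sit at indices 2, 5, 8, ...
    third_bases = [seq[i] for i in range(2, usable, 3)]
    third_bases = [b for b in third_bases if b in "ACGT"]
    if not third_bases:
        raise ValueError("no complete codons with an unambiguous third base")
    return sum(b in "GC" for b in third_bases) / len(third_bases)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA, TTG have third bases G, C, A, G.
    print(gc3("ATGGCCAAATTG"))
```

Averaging this value over all genes of a genome would give one genome-level GC3 figure; how the authors aggregated it is not stated in the abstract.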
|
49
|
Zhang YE, Vibranovski MD, Landback P, Marais GAB, Long M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol 2010; 8. [PMID: 20957185 PMCID: PMC2950125 DOI: 10.1371/journal.pbio.1000494] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.9] [Received: 03/15/2010] [Accepted: 08/16/2010] [Indexed: 01/20/2023]
Abstract
Mammalian X chromosomes evolved under various mechanisms including sexual antagonism, the faster-X process, and meiotic sex chromosome inactivation (MSCI). These forces may contribute to nonrandom chromosomal distribution of sex-biased genes. In order to understand the evolution of gene content on the X chromosome and autosome under these forces, we dated human and mouse protein-coding genes and miRNA genes on the vertebrate phylogenetic tree. We found that the X chromosome recently acquired a burst of young male-biased genes, which is consistent with fixation of recessive male-beneficial alleles by sexual antagonism. For genes originating earlier, however, this pattern diminishes and finally reverses with an overrepresentation of the oldest male-biased genes on autosomes. MSCI contributes to this dynamic since it silences X-linked old genes but not X-linked young genes. This demasculinization process seems to be associated with feminization of the X chromosome with more X-linked old genes expressed in ovaries. Moreover, we detected another burst of gene originations after the split of eutherian mammals and opossum, and these genes were quickly incorporated into transcriptional networks of multiple tissues. Preexisting X-linked genes also show significantly higher protein-level evolution during this period compared to autosomal genes, suggesting positive selection accompanied the early evolution of mammalian X chromosomes. These two findings cast new light on the evolutionary history of the mammalian X chromosome in terms of gene gain, sequence, and expressional evolution.
Affiliation(s)
- Yong E. Zhang
- Department of Ecology and Evolution, the University of Chicago, Chicago, Illinois, United States of America
- Maria D. Vibranovski
- Department of Ecology and Evolution, the University of Chicago, Chicago, Illinois, United States of America
- Patrick Landback
- Department of Ecology and Evolution, the University of Chicago, Chicago, Illinois, United States of America
- Gabriel A. B. Marais
- Université de Lyon, Centre National de la Recherche Scientifique, Laboratoire de Biométrie et Biologie évolutive, Villeurbanne, France
- Manyuan Long
- Department of Ecology and Evolution, the University of Chicago, Chicago, Illinois, United States of America
|
50
|
Davidson WS, Koop BF, Jones SJM, Iturra P, Vidal R, Maass A, Jonassen I, Lien S, Omholt SW. Sequencing the genome of the Atlantic salmon (Salmo salar). Genome Biol 2010; 11:403. [PMID: 20887641 PMCID: PMC2965382 DOI: 10.1186/gb-2010-11-9-403] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Indexed: 01/03/2023]
Abstract
The International Collaboration to Sequence the Atlantic Salmon Genome (ICSASG) will produce a genome sequence that identifies and physically maps all genes in the Atlantic salmon genome and acts as a reference sequence for other salmonids.
Affiliation(s)
- William S Davidson
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby BC, V5A 1S6, Canada.
|