1. Yu T, Fife JD, Bhat V, Adzhubey I, Sherwood R, Cassa CA. FUSE: Improving the estimation and imputation of variant impacts in functional screening. Cell Genomics 2024; 4:100667. [PMID: 39389016 DOI: 10.1016/j.xgen.2024.100667] [Received: 12/15/2022] [Revised: 06/28/2024] [Accepted: 09/05/2024] [Indexed: 10/12/2024]
Abstract
Deep mutational scanning enables high-throughput functional assessment of genetic variants. While phenotypic measurements from screening assays generally align with clinical outcomes, experimental noise may affect the accuracy of individual variant estimates. We developed the FUSE (functional substitution estimation) pipeline, which leverages measurements collectively within screening assays to improve the estimation of variant impacts. Drawing data from 115 published functional assays, FUSE assesses the mean functional effect per amino acid position and makes estimates for individual allelic variants. It enhances the correlation of variant functional effects from different assay platforms and increases the classification accuracy of missense variants in ClinVar across 29 genes (area under the receiver operating characteristic [ROC] curve [AUC] from 0.83 to 0.90). In UK Biobank patients with rare missense variants in BRCA1, LDLR, or TP53, FUSE improves the classification accuracy of associated phenotypes. FUSE can also impute variant effects for substitutions not experimentally screened. This approach improves accuracy and broadens the utility of data from functional screening.
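The idea of pooling measurements within an assay can be illustrated with a small sketch. The Python below is not the FUSE pipeline: it assumes a hypothetical list of `(position, substitution, score)` records and a simple fixed-weight shrinkage rule (the `weight` parameter is an assumption) that pulls each noisy variant score toward the mean score at its amino acid position. The position mean also serves as a fallback estimate for substitutions that were not screened.

```python
from collections import defaultdict
from statistics import mean


def position_means(records):
    """Mean functional score per amino acid position."""
    by_pos = defaultdict(list)
    for pos, _alt, score in records:
        by_pos[pos].append(score)
    return {pos: mean(scores) for pos, scores in by_pos.items()}


def estimate(records, weight=0.5):
    """Blend each raw score with its position mean (hypothetical fixed weight)."""
    means = position_means(records)
    return {
        (pos, alt): weight * score + (1 - weight) * means[pos]
        for pos, alt, score in records
    }


def impute(pos, means):
    """Fallback for an unscreened substitution: the position mean, or None if no data."""
    return means.get(pos)


# Hypothetical toy data: two substitutions at position 10, one at position 11.
records = [(10, "A", -1.0), (10, "G", -3.0), (11, "L", 0.2)]
means = position_means(records)
scores = estimate(records, weight=0.5)
print(scores[(10, "A")], impute(10, means), impute(99, means))
```

A fixed weight is the simplest possible choice; a more careful estimator would set the weight from the measurement noise of each variant.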
Affiliation(s)
- Tian Yu
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- James D Fife
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Vineel Bhat
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ivan Adzhubey
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Richard Sherwood
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Christopher A Cassa
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
2. Xavier JM, Magno R, Russell R, de Almeida BP, Jacinta-Fernandes A, Besouro-Duarte A, Dunning M, Samarajiwa S, O'Reilly M, Maia AM, Rocha CL, Rosli N, Ponder BAJ, Maia AT. Identification of candidate causal variants and target genes at 41 breast cancer risk loci through differential allelic expression analysis. Sci Rep 2024; 14:22526. [PMID: 39341862 PMCID: PMC11438911 DOI: 10.1038/s41598-024-72163-y] [Received: 02/01/2024] [Accepted: 09/04/2024] [Indexed: 10/01/2024] Open
Abstract
Understanding breast cancer genetic risk relies on identifying causal variants and candidate target genes in risk loci identified by genome-wide association studies (GWAS), which remains challenging. Since most loci fall in active gene regulatory regions, we developed a novel approach facilitated by pinpointing the variants with greater regulatory potential in the disease's tissue of origin. Through genome-wide differential allelic expression (DAE) analysis, using microarray data from 64 normal breast tissue samples, we mapped the variants associated with DAE (daeQTLs). Then, we intersected these with GWAS data to reveal candidate risk regulatory variants and analysed their cis-acting regulatory potential. Finally, we validated our approach by extensive functional analysis of the 5q14.1 breast cancer risk locus. We observed widespread gene expression regulation by cis-acting variants in breast tissue, with 65% of coding and noncoding expressed genes displaying DAE (daeGenes). We identified over 54 K daeQTLs for 6761 (26%) daeGenes, including 385 daeGenes harbouring variants previously associated with BC risk. We found 1431 daeQTLs mapped to 93 different loci in strong linkage disequilibrium with risk-associated variants (risk-daeQTLs), suggesting a link between risk-causing variants and cis-regulation. There were 122 risk-daeQTL with stronger cis-acting potential in active regulatory regions with protein binding evidence. These variants mapped to 41 risk loci, of which 29 had no previous report of target genes and were candidates for regulating the expression levels of 65 genes. As validation, we identified and functionally characterised five candidate causal variants at the 5q14.1 risk locus targeting the ATG10 and ATP6AP1L genes, likely acting via modulation of alternative transcription and transcription factor binding. 
Our study demonstrates the power of DAE analysis and daeQTL mapping to identify causal regulatory variants and target genes at breast cancer risk loci, including those with complex regulatory landscapes. It additionally provides a genome-wide resource of variants associated with DAE for future functional studies.
Affiliation(s)
- Joana M Xavier
  - Cintesis@Rise, Universidade do Algarve, Faro, Portugal
  - Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Faro, Portugal
- Ramiro Magno
  - Cintesis@Rise, Universidade do Algarve, Faro, Portugal
  - Pattern Institute PT, Faro, Portugal
- Roslin Russell
  - Cambridge Institute - CRUK, University of Cambridge, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Bernardo P de Almeida
  - Faculdade de Medicina e Ciências Biomédicas (FMCB), Universidade do Algarve, Faro, Portugal
  - Faculdade de Medicina, Instituto de Medicina Molecular, Universidade de Lisboa, Lisbon, Portugal
  - InstaDeep, Paris, France
- Ana Jacinta-Fernandes
  - Faculdade de Medicina e Ciências Biomédicas (FMCB), Universidade do Algarve, Faro, Portugal
- Mark Dunning
  - Cambridge Institute - CRUK, University of Cambridge, Cambridge, UK
  - Sheffield Bioinformatics Core, The School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Shamith Samarajiwa
  - Medical Research Council (MRC) Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, UK
  - Genetics and Genomics Section, Imperial College London, London, UK
- Martin O'Reilly
  - Cambridge Institute - CRUK, University of Cambridge, Cambridge, UK
- Cátia L Rocha
  - Faculdade de Medicina e Ciências Biomédicas (FMCB), Universidade do Algarve, Faro, Portugal
  - Faculty of Medicine, Instituto de Saúde Ambiental (ISAMB), University of Lisbon, Lisbon, Portugal
- Nordiana Rosli
  - Faculdade de Medicina e Ciências Biomédicas (FMCB), Universidade do Algarve, Faro, Portugal
  - Training Division, Ministry of Health Malaysia, Putrajaya, Malaysia
  - Biometrology Group, Division of Chemical and Biological Metrology, Korea Research Institute of Standards and Science, Daejeon, South Korea
- Bruce A J Ponder
  - Cambridge Institute - CRUK, University of Cambridge, Cambridge, UK
- Ana-Teresa Maia
  - Cintesis@Rise, Universidade do Algarve, Faro, Portugal
  - Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Faro, Portugal
  - Faculdade de Medicina e Ciências Biomédicas (FMCB), Universidade do Algarve, Faro, Portugal
3. Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. EMBO J 2024:10.1038/s44318-024-00200-7. [PMID: 39256561 DOI: 10.1038/s44318-024-00200-7] [Received: 10/25/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 09/12/2024] Open
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known. Here, we use Saccharomyces cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, the resulting tyrosine phosphorylation is biologically spurious. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3500 proteins. The number of spurious pY sites generated correlates strongly with decreased growth, and we predict over 1000 pY events to be deleterious. However, we also find that many of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with tyrosine kinases. Our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexander Hogrebe
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rohan Dandage
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ugo Dionne
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexis Chang
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Judit Villén
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christian R Landry
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
4. Rosen Y, Brbić M, Roohani Y, Swanson K, Li Z, Leskovec J. Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN. Nat Methods 2024; 21:1492-1500. [PMID: 38366243 PMCID: PMC11310084 DOI: 10.1038/s41592-024-02191-z] [Received: 02/03/2023] [Accepted: 01/22/2024] [Indexed: 02/18/2024]
Abstract
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.
Affiliation(s)
- Yanay Rosen
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Maria Brbić
  - School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Yusuf Roohani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Kyle Swanson
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziang Li
  - Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
5. Pfennig A, Lachance J. The evolutionary fate of Neanderthal DNA in 30,780 admixed genomes with recent African-like ancestry. bioRxiv: the preprint server for biology 2024:2024.07.25.605203. [PMID: 39091830 PMCID: PMC11291122 DOI: 10.1101/2024.07.25.605203] [Indexed: 08/04/2024]
Abstract
Following introgression, Neanderthal DNA was initially purged from non-African genomes, but the evolutionary fate of remaining introgressed DNA has not been explored yet. To fill this gap, we analyzed 30,780 admixed genomes with African-like ancestry from the All of Us research program, in which Neanderthal alleles encountered novel genetic backgrounds during the last 15 generations. Observed amounts of Neanderthal DNA approximately match expectations based on ancestry proportions, suggesting neutral evolution. Nevertheless, we identified genomic regions that have significantly less or more Neanderthal ancestry than expected and are associated with spermatogenesis, innate immunity, and other biological processes. We also identified three novel introgression desert-like regions in recently admixed genomes, whose genetic features are compatible with hybrid incompatibilities and intrinsic negative selection. Overall, we find that much of the remaining Neanderthal DNA in human genomes is not under strong selection, and complex evolutionary dynamics have shaped introgression landscapes in our species.
Affiliation(s)
- Aaron Pfennig
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, 30332, GA, USA
- Joseph Lachance
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, 30332, GA, USA
6. Kornrumpf K, Kurz N, Drofenik K, Krauß L, Schneider C, Koch R, Beißbarth T, Dönitz J. SeqCAT: Sequence Conversion and Analysis Toolbox. Nucleic Acids Res 2024; 52:W116-W120. [PMID: 38801081 PMCID: PMC11223787 DOI: 10.1093/nar/gkae422] [Received: 03/23/2024] [Revised: 04/30/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024] Open
Abstract
Dealing with sequence coordinates in different formats and reference genomes is challenging in genetic research. This complexity arises from the need to convert and harmonize datasets of different sources using alternating nomenclatures. Since manual processing is time-consuming and requires specialized knowledge, the Sequence Conversion and Analysis Toolbox (SeqCAT) was developed for daily work with genetic datasets. Our tool provides a range of functions designed to standardize and convert gene variant coordinates based on various sequence types. Its user-friendly web interface provides easy access to all functionalities, while the Application Programming Interface (API) enables automation within pipelines. SeqCAT provides access to human genomic, protein and transcript data, utilizing various data resources and packages and extending them with its own unique features. The platform covers a wide range of genetic research needs with its 14 different applications and 3 info points, including search for transcript and gene information, transition between reference genomes, variant mapping, and genetic event review. Notable examples are 'Convert Protein to DNA Position' for translation of amino acid changes into genomic single nucleotide variants, or 'Fusion Check' for frameshift determination in gene fusions. SeqCAT is an excellent resource for converting sequence coordinate data into the required formats and is available at: https://mtb.bioinf.med.uni-goettingen.de/SeqCAT/.
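The protein-to-DNA conversion mentioned above rests on simple codon arithmetic, sketched here in Python. This is not SeqCAT's API; the function names are hypothetical, and the sketch covers only the mapping between a 1-based amino acid position and 1-based coding sequence (CDS) positions. Mapping CDS positions on to genomic coordinates additionally needs the exon structure of a specific transcript, which is omitted.

```python
def codon_cds_range(aa_pos):
    """Return the 1-based CDS positions (first, last) of the codon for an amino acid position."""
    if aa_pos < 1:
        raise ValueError("amino acid positions are 1-based")
    start = 3 * (aa_pos - 1) + 1
    return start, start + 2


def aa_pos_from_cds(cds_pos):
    """Return the 1-based amino acid position whose codon contains a 1-based CDS position."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1


# The codon for residue 600 spans CDS positions 1798 to 1800.
print(codon_cds_range(600))
print(aa_pos_from_cds(1799))
```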
Affiliation(s)
- Kevin Kornrumpf
  - Department of Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
- Nadine S Kurz
  - Department of Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
- Klara Drofenik
  - Department of Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
- Lukas Krauß
  - Department of General, Visceral and Pediatric Surgery, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Carolin Schneider
  - Department of General, Visceral and Pediatric Surgery, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Raphael Koch
  - Department of Hematology and Medical Oncology, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Tim Beißbarth
  - Department of Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
  - Campus Institute Data Science (CIDAS), Göttingen, Germany
- Jürgen Dönitz
  - Department of Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
  - Göttingen Comprehensive Cancer Center (G-CCC), 37075 Göttingen, Germany
  - Campus Institute Data Science (CIDAS), Göttingen, Germany
7. Schmidt AF, Finan C, Chopade S, Ellmerich S, Rossor MN, Hingorani AD, Pepys M. Genetic evidence for serum amyloid P component as a drug target in neurodegenerative disorders. Open Biol 2024; 14:230419. [PMID: 39013416 PMCID: PMC11251762 DOI: 10.1098/rsob.230419] [Received: 11/14/2023] [Accepted: 05/23/2024] [Indexed: 07/18/2024] Open
Abstract
The mechanisms responsible for neuronal death causing cognitive loss in Alzheimer's disease (AD) and many other dementias are not known. Serum amyloid P component (SAP) is a constitutive plasma protein, which is cytotoxic for cerebral neurones and also promotes formation and persistence of cerebral Aβ amyloid and neurofibrillary tangles. Circulating SAP, which is produced exclusively by the liver, is normally almost completely excluded from the brain. Conditions increasing brain exposure to SAP increase dementia risk, consistent with a causative role in neurodegeneration. Furthermore, neocortex content of SAP is strongly and independently associated with dementia at death. Here, seeking genomic evidence for a causal link of SAP with neurodegeneration, we meta-analysed three genome-wide association studies of 44 288 participants, then conducted cis-Mendelian randomization assessment of associations with neurodegenerative diseases. Higher genetically instrumented plasma SAP concentrations were associated with AD (odds ratio 1.07, 95% confidence interval (CI) 1.02; 1.11, p = 1.8 × 10⁻³), Lewy body dementia (odds ratio 1.37, 95% CI 1.19; 1.59, p = 1.5 × 10⁻⁵) and plasma tau concentration (0.06 log₂(ng l⁻¹), 95% CI 0.03; 0.08, p = 4.55 × 10⁻⁶). These genetic findings are consistent with neuropathogenicity of SAP. Depletion of SAP from the blood and the brain, by the safe, well tolerated, experimental drug miridesap may thus be neuroprotective.
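As a reading aid, the relationship between a reported odds ratio, its 95% CI, and its p-value can be checked with a short Python sketch. This is not the authors' analysis: it assumes a Wald interval that is symmetric on the log-odds scale and a normal test statistic, and the function name is hypothetical. Because the published estimates are rounded, the recomputed p-values agree with the reported ones only in order of magnitude.

```python
import math


def wald_p_from_or(or_est, ci_low, ci_high, z_crit=1.959964):
    """Two-sided p-value implied by an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log-odds scale (Wald interval).
    """
    # Standard error of log(OR), recovered from the width of the interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


p_ad = wald_p_from_or(1.07, 1.02, 1.11)   # Alzheimer's disease estimate
p_lbd = wald_p_from_or(1.37, 1.19, 1.59)  # Lewy body dementia estimate
print(f"AD: p ~ {p_ad:.1e}; Lewy body dementia: p ~ {p_lbd:.1e}")
```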
Affiliation(s)
- A. Floriaan Schmidt
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, 69-75 Chenies Mews, London WC1E 6HX, UK
  - UCL British Heart Foundation Research Accelerator, 69-75 Chenies Mews, London WC1E 6HX, UK
  - Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam UMC, locatie AMC Postbus 22660, 1100 DD Amsterdam, Zuidoost, The Netherlands
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Chris Finan
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, 69-75 Chenies Mews, London WC1E 6HX, UK
  - UCL British Heart Foundation Research Accelerator, 69-75 Chenies Mews, London WC1E 6HX, UK
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Sandesh Chopade
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, 69-75 Chenies Mews, London WC1E 6HX, UK
  - UCL British Heart Foundation Research Accelerator, 69-75 Chenies Mews, London WC1E 6HX, UK
- Stephan Ellmerich
  - Wolfson Drug Discovery Unit, Division of Medicine, University College London, Royal Free Campus, Rowland Hill Street, London NW3 2PF, UK
- Martin N. Rossor
  - UCL Queen Square Institute of Neurology, Faculty of Brain Sciences, University College London, Queen Square, London WC1N 3BG, UK
- Aroon D. Hingorani
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, 69-75 Chenies Mews, London WC1E 6HX, UK
  - UCL British Heart Foundation Research Accelerator, 69-75 Chenies Mews, London WC1E 6HX, UK
- Mark B. Pepys
  - Wolfson Drug Discovery Unit, Division of Medicine, University College London, Royal Free Campus, Rowland Hill Street, London NW3 2PF, UK
8. Lee J, Achuthan M, Chen L, Carmona-Mora P. A customizable secure DIY web application for accessing, sharing, and browsing aggregate experimental results and metadata. Bioinformatics Advances 2024; 4:vbae087. [PMID: 39027642 PMCID: PMC11257709 DOI: 10.1093/bioadv/vbae087] [Received: 01/03/2024] [Revised: 04/24/2024] [Accepted: 06/27/2024] [Indexed: 07/20/2024]
Abstract
Summary: A problem spanning across many research fields is that processed data and research results are often scattered, which makes data access, analysis, extraction, and team sharing more challenging. We have developed a platform for researchers to easily manage tabular data with features like browsing, bookmarking, and linking to external open knowledge bases. The source code, originally designed for genomics research, is customizable for use by other fields or data, providing a no- to low-cost DIY system for research teams. Availability and implementation: The source code of our DIY app is available on https://github.com/Carmona-MoraUCD/Human-Genomics-Browser. It can be downloaded and run by anyone with a web browser, Python3, and Node.js on their machine. The web application is licensed under the MIT license.
Affiliation(s)
- Jaewoo Lee
  - College of Engineering, University of California at Davis, Davis, CA 95616, United States
- Mehita Achuthan
  - College of Engineering, University of California at Davis, Davis, CA 95616, United States
- Lucas Chen
  - College of Engineering, University of California at Davis, Davis, CA 95616, United States
- Paulina Carmona-Mora
  - Department of Neurology, School of Medicine, University of California at Davis, Sacramento, CA 95817, United States
9. Won S, Yu J, Kim H. Identifying genes within pathways in unannotated genomes with PaGeSearch. Genome Res 2024; 34:784-795. [PMID: 38858086 PMCID: PMC11216310 DOI: 10.1101/gr.278566.123] [Received: 09/26/2023] [Accepted: 04/01/2024] [Indexed: 06/12/2024]
Abstract
In biological research, the identification and comparison of genes within specific pathways across the genomes of various species are invaluable. However, annotating the entire genome is resource intensive, and sequence similarity searches often yield results that are not actually genes. To address these limitations, we introduce Pathway Gene Search (PaGeSearch), a tool designed to identify genes from predefined lists, especially those in specific pathways, within genomes. The tool uses an initial sequence similarity search to identify relevant genomic regions, followed by targeted gene prediction and neural network-based result filtering. PaGeSearch suggests the regions that are most likely the orthologs of the genes in the query and is designed to be applicable for species within five classes: mammals, fish, birds, eudicotyledons, and Liliopsida. PaGeSearch generally outperforms GeMoMa and miniprot in sensitivity, positive predictive value, and negative predictive value. The exon coverage of gene models from PaGeSearch is also higher than that of gene models from GeMoMa and miniprot. Although its performance shows increased variability when applied to actual biological pathways, it nonetheless maintains an acceptable level of accuracy. Evaluating PaGeSearch across different assembly levels (chromosome, scaffold, and contig) shows minimal variation in outcomes, indicating that PaGeSearch is resilient to variations in assembly quality.
Affiliation(s)
- Sohyoung Won
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea, 08826
  - eGnome, Incorporated, Seoul, Republic of Korea, 05836
- Jaewoong Yu
  - eGnome, Incorporated, Seoul, Republic of Korea, 05836
  - UNGENE, Incorporated, Seoul, Republic of Korea, 14556
- Heebal Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea, 08826
  - eGnome, Incorporated, Seoul, Republic of Korea, 05836
  - Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea, 08826
10. Livesey BJ, Badonyi M, Dias M, Frazer J, Kumar S, Lindorff-Larsen K, McCandlish DM, Orenbuch R, Shearer CA, Muffley L, Foreman J, Glazer AM, Lehner B, Marks DS, Roth FP, Rubin AF, Starita LM, Marsh JA. Guidelines for releasing a variant effect predictor. arXiv 2024:arXiv:2404.10807v1. [PMID: 38699161 PMCID: PMC11065047] [Indexed: 05/05/2024]
Abstract
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Affiliation(s)
- Benjamin J. Livesey
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mihaly Badonyi
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mafalda Dias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jonathan Frazer
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sushant Kumar
- Department of Medical Biophysics, University of Toronto; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Rose Orenbuch
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Lara Muffley
- Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ben Lehner
- Wellcome Sanger Institute, Cambridge, UK; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Debora S. Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Boston, MA, USA
- Frederick P. Roth
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Alan F. Rubin
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research; Department of Medical Biology, University of Melbourne, Parkville, Australia
- Lea M. Starita
- Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Joseph A. Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
11
Duncan AG, Mitchell JA, Moses AM. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 2024; 40:btae190. [PMID: 38588559 PMCID: PMC11042905 DOI: 10.1093/bioinformatics/btae190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/12/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024] Open
Abstract
MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence to regulatory function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a real-world small dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
Affiliation(s)
- Andrew G Duncan
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
- Alan M Moses
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
12
Liu Q, Hu Q, Liu S, Hutson A, Morgan M. ReUseData: an R/Bioconductor tool for reusable and reproducible genomic data management. BMC Bioinformatics 2024; 25:8. [PMID: 38172657 PMCID: PMC10765726 DOI: 10.1186/s12859-023-05626-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND: The increasing volume and complexity of genomic data pose significant challenges for effective data management and reuse. Public genomic data often undergo similar preprocessing across projects, leading to redundant or inconsistent datasets and inefficient use of computing resources. This is especially pertinent for bioinformaticians engaged in multiple projects. Tools have been created to address challenges in managing and accessing curated genomic datasets; however, such tools are most useful to users who work with specific types of data or who prefer a particular programming language. Currently, there is no R-specific solution for efficient data management and versatile data reuse. RESULTS: Here we present ReUseData, an R software tool that overcomes some of the limitations of existing solutions and provides a versatile and reproducible approach to effective data management within R. ReUseData facilitates the transformation of ad hoc scripts for data preprocessing into Common Workflow Language (CWL)-based data recipes, allowing for the reproducible generation of curated data files in their generic formats. The data recipes are standardized and self-contained, making them portable and reproducible across various computing platforms. ReUseData also streamlines the reuse of curated data files and their integration into downstream analysis tools and workflows with different frameworks. CONCLUSIONS: ReUseData provides a reliable and reproducible approach for genomic data management within the R environment to enhance the accessibility and reusability of genomic data. The package is available at Bioconductor (https://bioconductor.org/packages/ReUseData/) with additional information on the project website (https://rcwl.org/dataRecipes/).
Affiliation(s)
- Qian Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA.
- Qiang Hu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Alan Hutson
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Martin Morgan
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
13
Mönttinen HAM, Frilander MJ, Löytynoja A. Generation of de novo miRNAs from template switching during DNA replication. Proc Natl Acad Sci U S A 2023; 120:e2310752120. [PMID: 38019864 PMCID: PMC10710096 DOI: 10.1073/pnas.2310752120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023] Open
Abstract
The mechanisms generating novel genes and genetic information are poorly known, even for microRNA (miRNA) genes with an extremely constrained design. All miRNA primary transcripts need to fold into a stem-loop structure to yield short gene products (~22 nt) that bind and repress their mRNA targets. While a substantial number of miRNA genes are ancient and highly conserved, short secondary structures coding for entirely novel miRNA genes have been shown to emerge in a lineage-specific manner. Template switching is a DNA-replication-related mutation mechanism that can introduce complex changes and generate perfect base pairing for entire hairpin structures in a single event. Here, we show that the template-switching mutations (TSMs) have participated in the emergence of over 6,000 suitable hairpin structures in the primate lineage to yield at least 18 new human miRNA genes, that is 26% of the miRNAs inferred to have arisen since the origin of primates. While the mechanism appears random, the TSM-generated miRNAs are enriched in introns where they can be expressed with their host genes. The high frequency of TSM events provides raw material for evolution. Being orders of magnitude faster than other mechanisms proposed for de novo creation of genes, TSM-generated miRNAs enable near-instant rewiring of genetic information and rapid adaptation to changing environments.
Affiliation(s)
- Heli A. M. Mönttinen
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Mikko J. Frilander
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Ari Löytynoja
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
14
Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. bioRxiv 2023:2023.10.08.561337. [PMID: 37873463 PMCID: PMC10592693 DOI: 10.1101/2023.10.08.561337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known, but quantifying this is required to understand the constraints faced by cell systems as they evolve. Here, we use the model organism S. cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, most of the resulting tyrosine phosphorylation is spurious. This provides a suitable system to measure the impact of artificial protein interactions on fitness. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3,500 proteins. Examination of the fitness costs in each strain revealed a strong correlation between the number of spurious pY sites and decreased growth. Moreover, the analysis of pY effects on protein structure and on protein function revealed over 1000 pY events that we predict to be deleterious. However, we also find that a large number of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with bona fide tyrosine kinases. Taken together, our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexander Hogrebe
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rohan Dandage
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Mario Leutert
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexis Chang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
15
Rosen Y, Brbić M, Roohani Y, Swanson K, Li Z, Leskovec J. Towards Universal Cell Embeddings: Integrating Single-cell RNA-seq Datasets across Species with SATURN. bioRxiv 2023:2023.02.03.526939. [PMID: 36778387 PMCID: PMC9915700 DOI: 10.1101/2023.02.03.526939] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, inter-species genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here, we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN has a unique ability to detect functionally related genes co-expressed across species, redefining differential expression for cross-species analysis. We apply SATURN to three-species whole-organism atlases and to frog and zebrafish embryogenesis datasets. We show that cell embeddings learnt in SATURN can be effectively used to transfer annotations across species and identify both homologous and species-specific cell types, even across evolutionarily remote species. Finally, we use SATURN to reannotate the five-species Cell Atlas of Human Trabecular Meshwork and Aqueous Outflow Structures and find evidence of potentially divergent functions between glaucoma-associated genes in humans and other species.
Affiliation(s)
- Yanay Rosen
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Maria Brbić
- School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Yusuf Roohani
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Kyle Swanson
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziang Li
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
16
Fairbrother-Browne A, García-Ruiz S, Hertfelder Reynolds R, Ryten M, Hodgkinson A. ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R. GIGABYTE 2023; 2023:1-10. [PMID: 37732134 PMCID: PMC10507293 DOI: 10.46471/gigabyte.91] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 09/11/2023] [Indexed: 09/22/2023] Open
Abstract
We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR uses functions that are intuitive and amenable to custom code integration, takes and returns familiar R object types, and provides parallelisation functionality. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an existing tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and Singularity images, making this tool widely accessible to the scientific community.
Affiliation(s)
- Aine Fairbrother-Browne
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King’s College London, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- Sonia García-Ruiz
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Regina Hertfelder Reynolds
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Mina Ryten
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Alan Hodgkinson
- Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King’s College London, London, UK
17
Sakthikumar S, Facista S, Whitley D, Byron SA, Ahmed Z, Warrier M, Zhu Z, Chon E, Banovich K, Haworth D, Hendricks WPD, Wang G. Standing in the canine precision medicine knowledge gap: Improving annotation of canine cancer genomic biomarkers through systematic comparative analysis of human cancer mutations in COSMIC. Vet Comp Oncol 2023; 21:482-491. [PMID: 37248814 DOI: 10.1111/vco.12911] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/07/2022] [Revised: 04/26/2023] [Accepted: 05/08/2023] [Indexed: 05/31/2023]
Abstract
The accrual of cancer mutation data and related functional and clinical associations have revolutionised human oncology, enabling the advancement of precision medicine and biomarker-guided clinical management. The catalogue of cancer mutations is also growing in canine cancers. However, without direct high-powered functional data in dogs, it remains challenging to interpret and utilise them in research and clinical settings. It is well-recognised that canine and human cancers share genetic, molecular and phenotypic similarities. Therefore, leveraging the massive wealth of human mutation data may help advance canine oncology. Here, we present a structured analysis of sequence conservation and conversion of human mutations to the canine genome through a 'caninisation' process. We applied this analysis to COSMIC, the Catalogue of Somatic Mutations in Cancer, the most prominent human cancer mutation database. For the project's initial phase, we focused on the subset of the COSMIC data corresponding to Cancer Gene Census (CGC) genes. A total of 670 canine orthologs were found for 721 CGC genes. In these genes, 365 K unique mutations across 160 tumour types were converted successfully to canine coordinates. We identified shared putative cancer-driving mutations, including pathogenic and hotspot mutations and mutations bearing similar biomarker associations with diagnostic, prognostic and therapeutic utility. Thus, this structured caninisation of human cancer mutations facilitates the interpretation and annotation of canine mutations and helps bridge the knowledge gap to enable canine precision medicine.
Affiliation(s)
- Derick Whitley
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- Sara A Byron
- Translational Genomics Research Institute, Phoenix, Arizona, USA
- Zeeshan Ahmed
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- Manisha Warrier
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- Zhanyang Zhu
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- Esther Chon
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- David Haworth
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
- Guannan Wang
- Vidium Animal Health, a TGen Subsidiary, Phoenix, Arizona, USA
18
Schmidt AF, Finan C, Chopade S, Ellmerich S, Rossor MN, Hingorani AD, Pepys MB. Genetic evidence for serum amyloid P component as a drug target for treatment of neurodegenerative disorders. medRxiv 2023:2023.08.15.23293564. [PMID: 37645746 PMCID: PMC10462209 DOI: 10.1101/2023.08.15.23293564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
The direct causes of neurodegeneration underlying Alzheimer's disease (AD) and many other dementias are not known. Here we identify serum amyloid P component (SAP), a constitutive plasma protein normally excluded from the brain, as a potential drug target. After meta-analysis of three genome-wide association studies, comprising 44,288 participants, cis-Mendelian randomization showed that genes responsible for higher plasma SAP values are significantly associated with AD, Lewy body dementia and plasma tau concentration. These genetic findings are consistent with experimental evidence of SAP neurotoxicity and the strong, independent association of neocortex SAP content with dementia at death. Depletion of SAP from the blood and from the brain, as is provided by the safe, well-tolerated experimental drug miridesap, may therefore contribute to treatment of neurodegeneration.
Affiliation(s)
- A Floriaan Schmidt
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, the Netherlands
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Chris Finan
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Sandesh Chopade
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Stephan Ellmerich
- Wolfson Drug Discovery Unit, Division of Medicine, University College London, London, United Kingdom
- Martin N Rossor
- UCL Queen Square Institute of Neurology, Faculty of Brain Sciences, University College London, United Kingdom
- Aroon D Hingorani
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Mark B Pepys
- Wolfson Drug Discovery Unit, Division of Medicine, University College London, London, United Kingdom
19
Fife JD, Cassa CA. Estimating clinical risk in gene regions from population sequencing cohort data. Am J Hum Genet 2023; 110:940-949. [PMID: 37236177 PMCID: PMC10257006 DOI: 10.1016/j.ajhg.2023.05.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2023] [Revised: 05/04/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023] Open
Abstract
While pathogenic variants can significantly increase disease risk, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare missense variants collectively. Here, we introduce REGatta, a method to estimate clinical risk from variants in smaller segments of individual genes. We first define these regions by using the density of pathogenic diagnostic reports and then calculate the relative risk in each region by using over 200,000 exome sequences in the UK Biobank. We apply this method in 13 genes with established roles across several monogenic disorders. In genes with no significant difference at the gene level, this approach significantly separates disease risk for individuals with rare missense variants at higher or lower risk (BRCA2 regional model OR = 1.46 [1.12, 1.79], p = 0.0036 vs. BRCA2 gene model OR = 0.96 [0.85, 1.07] p = 0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare our method with existing methods and the use of protein domains (Pfam) as regions and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors and are potentially useful for improving risk assessment for genes associated with monogenic diseases.
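For readers unfamiliar with the "OR = 1.46 [1.12, 1.79]" notation in the abstract above, the following is a minimal sketch of how an odds ratio and a 95% Wald confidence interval can be computed from a 2x2 table of carriers versus non-carriers. It is a textbook calculation shown for illustration only; it is not the REGatta method, the function name `odds_ratio_wald` is hypothetical, and the counts are invented rather than taken from the UK Biobank.

```python
import math


def odds_ratio_wald(case_carriers, control_carriers,
                    case_noncarriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the standard Wald interval on the log-odds scale.
    All four cell counts must be positive.
    """
    a, b, c, d = (case_carriers, control_carriers,
                  case_noncarriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for two hypothetical regions of one gene:
# (case carriers, control carriers, case non-carriers, control non-carriers)
hypothetical_regions = {
    "region_A": (30, 970, 200, 9800),
    "region_B": (10, 990, 200, 9800),
}

for name, counts in hypothetical_regions.items():
    or_, lo, hi = odds_ratio_wald(*counts)
    print(f"{name}: OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

An interval that excludes 1 corresponds to a nominally significant difference in risk between carriers and non-carriers for that region.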
Affiliation(s)
- James D Fife
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Christopher A Cassa
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
20
Reich T, Adato O, Kofman NS, Faiglin A, Unger R. TREM2 has a significant, gender-specific, effect on human obesity. Sci Rep 2023; 13:482. [PMID: 36627355 PMCID: PMC9832124 DOI: 10.1038/s41598-022-27272-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/29/2022] [Accepted: 12/29/2022] [Indexed: 01/11/2023] Open
Abstract
Triggering Receptor Expressed On Myeloid Cells 2 (TREM2) is a membrane protein expressed on immune cells, involved in neurodegenerative diseases and cancer. Recently, it was shown that TREM2 is expressed on lipid-associated macrophages in adipose tissue, and that TREM2 knockout mice suffer from metabolic symptoms. Here, a computational study using public databases brings direct evidence for the involvement of TREM2 in human obesity. First, we show a significant correlation between TREM2 expression levels and BMI in adipose tissues in samples from the GTEx database. This association was evident for males but not for females. Second, we identified in the UK Biobank cohort a coding SNP in TREM2 with a significant effect on BMI. Compared to previously identified SNPs associated with BMI, this SNP (rs2234256, L211P) has the strongest association, reflected in significantly higher BMI values in heterozygous carriers and even higher values in homozygous carriers. Strikingly, this association was evident only for females. These observations suggest a novel gender-specific role of TREM2 in human obesity, and call for further studies to elucidate the mechanism by which this gene correlates with an obese phenotype.
Affiliation(s)
- Tzila Reich
- Faculty of Life Sciences, The Mina and Everard Goodman, Bar-Ilan University, 52900, Ramat Gan, Israel
- Orit Adato
- Faculty of Life Sciences, The Mina and Everard Goodman, Bar-Ilan University, 52900, Ramat Gan, Israel
- Naomi Schneid Kofman
- Faculty of Life Sciences, The Mina and Everard Goodman, Bar-Ilan University, 52900, Ramat Gan, Israel
- Ariel Faiglin
- Faculty of Life Sciences, The Mina and Everard Goodman, Bar-Ilan University, 52900, Ramat Gan, Israel
- Ron Unger
- Faculty of Life Sciences, The Mina and Everard Goodman, Bar-Ilan University, 52900, Ramat Gan, Israel.
21
Fife JD, Cassa CA. Estimating clinical risk in gene regions from population sequencing cohort data. medRxiv 2023:2023.01.06.23284281. [PMID: 36711752 PMCID: PMC9882564 DOI: 10.1101/2023.01.06.23284281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
While pathogenic variants significantly increase disease risk in many genes, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare germline missense variants collectively. Here we introduce REGatta, a method to improve the estimation of clinical risk in gene segments. We define gene regions using the density of pathogenic diagnostic reports, and then calculate the relative risk in each of these regions using 109,581 exome sequences from women in the UK Biobank. We apply this method in seven established breast cancer genes, and identify regions in each gene with statistically significant differences in breast cancer incidence for rare missense carriers. Even in genes with no significant difference at the gene level, this approach significantly separates rare missense variant carriers at higher or lower risk (BRCA2 regional model OR=1.46 [1.12, 1.79], p=0.0036 vs. BRCA2 gene model OR=0.96 [0.85, 1.07], p=0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare with existing methods and the use of protein domains (Pfam) as regions, and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors which can potentially be used to improve risk assessment and clinical management.
Affiliation(s)
- James D Fife
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Christopher A Cassa
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
22
Yu T, Fife JD, Adzhubey I, Sherwood R, Cassa CA. Joint estimation and imputation of variant functional effects using high throughput assay data. medRxiv 2023:2023.01.06.23284280. [PMID: 36711907 PMCID: PMC9882428 DOI: 10.1101/2023.01.06.23284280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Deep mutational scanning assays enable the functional assessment of variants in high throughput. Phenotypic measurements from these assays are broadly concordant with clinical outcomes but are prone to noise at the individual variant level. We develop a framework to exploit related measurements within and across experimental assays to jointly estimate variant impact. Drawing from a large corpus of deep mutational scanning data, we collectively estimate the mean functional effect per AA residue position within each gene, normalize observed functional effects by substitution type, and make estimates for individual allelic variants with a pipeline called FUSE (Functional Substitution Estimation). FUSE improves the correlation of functional screening datasets covering the same variants, better separates estimated functional impacts for known pathogenic and benign variants (ClinVar BRCA1, p=2.24×10⁻⁵¹), and increases the number of variants for which predictions can be made (2,741 to 10,347) by inferring additional variant effects for substitutions not experimentally screened. For UK Biobank patients who carry a rare variant in TP53, FUSE significantly improves the separation of patients who develop cancer syndromes from those without cancer (p=1.77×10⁻⁶). These approaches promise to improve estimates of variant impact and broaden the utility of screening data generated from functional assays.
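The idea of combining a per-position mean with a substitution-type adjustment, and using the two to estimate unscreened substitutions, can be sketched in a few lines of Python. This is a deliberately simplified additive model and not the FUSE pipeline itself: the function names `fit_additive` and `impute`, the dictionary layout, and the plain unweighted means are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_additive(scores):
    """Fit position means and substitution-type offsets.

    `scores` maps (position, ref_aa, alt_aa) -> observed functional score.
    Hypothetical simplification: FUSE's actual estimators are not
    described in the abstract.
    """
    by_position = defaultdict(list)
    for (pos, _ref, _alt), score in scores.items():
        by_position[pos].append(score)
    position_mean = {pos: mean(vals) for pos, vals in by_position.items()}

    # Offset of each substitution type relative to its position mean.
    by_substitution = defaultdict(list)
    for (pos, ref, alt), score in scores.items():
        by_substitution[(ref, alt)].append(score - position_mean[pos])
    substitution_offset = {k: mean(v) for k, v in by_substitution.items()}
    return position_mean, substitution_offset


def impute(pos, ref, alt, position_mean, substitution_offset):
    """Estimate a score for a substitution that was not screened."""
    if pos not in position_mean:
        return None  # no measurements at this position to borrow from
    return position_mean[pos] + substitution_offset.get((ref, alt), 0.0)


# Toy scores, not data from any assay.
toy = {(1, "A", "V"): 1.0, (1, "A", "D"): -1.0, (2, "A", "V"): 3.0}
pos_mean, sub_offset = fit_additive(toy)
estimate = impute(2, "A", "D", pos_mean, sub_offset)
```

In this toy case the unscreened A-to-D substitution at position 2 borrows the position 2 mean and the A-to-D offset learned at position 1, which is the kind of information sharing the abstract describes in general terms.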
Affiliation(s)
- Tian Yu
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
| | - James D. Fife
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
| | - Ivan Adzhubey
- Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, Massachusetts
| | - Richard Sherwood
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
| | - Christopher A. Cassa
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
|
23
|
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2023; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022]
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sílvia Carbonell-Sala
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
| | - James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Carme Arnan
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Abhimanyu Banerjee
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carles Boix
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Ferriol Calvet
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cagatay Dursun
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos García Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin James
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Surag Nair
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Pengyu Ni
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Vivek Ramalingam
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jacob M Schreiber
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Barbara Uszczynska-Ratajczak
- Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahoor Zafrulla
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Roderic Guigo
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
24
|
Rybarczyk A, Lehmann T, Iwańczyk-Skalska E, Juzwa W, Pławski A, Kopciuch K, Blazewicz J, Jagodziński PP. In silico and in vitro analysis of the impact of single substitutions within EXO-motifs on Hsa-MiR-1246 intercellular transfer in breast cancer cell. J Appl Genet 2023; 64:105-124. [PMID: 36394782 PMCID: PMC9837009 DOI: 10.1007/s13353-022-00730-y] [Received: 07/27/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 11/19/2022]
Abstract
MiR-1246 has recently gained much attention, and many studies have shown its oncogenic role in colorectal, breast, lung, and ovarian cancers. However, miR-1246 processing, stability, and the mechanisms directing miR-1246 into neighboring cells remain unclear. In this study, we aimed to determine the effect of single-nucleotide substitutions within short exosome sorting motifs (so-called EXO-motifs, GGAG and GCAG, present in the miR-1246 sequence) on its intracellular stability and extracellular transfer. We applied in silico methods such as 2D and 3D structure analysis and modeling of protein interactions. We also performed in vitro validation through the transfection of fluorescently labeled miRNA into MDA-MB-231 cells, which we analyzed by flow cytometry and fluorescence microscopy. Our results suggest that nucleotide alterations that disturbed the miR-1246 EXO-motifs were able to modulate miR-1246 stability and its level of transfer to neighboring cells, suggesting that the molecular mechanisms of RNA stability and intercellular transfer can be closely related.
Affiliation(s)
- Agnieszka Rybarczyk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
| | - Tomasz Lehmann
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
| | - Ewa Iwańczyk-Skalska
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
| | - Wojciech Juzwa
- Biotechnology and Food Microbiology, Poznan University of Life Sciences, Wojska Polskiego 48, 60-627 Poznan, Poland
| | - Andrzej Pławski
- Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479 Poznan, Poland
| | - Kamil Kopciuch
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
| | - Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Paweł P. Jagodziński
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
|
25
|
Lo Giudice C, Zambelli F, Chiara M, Pavesi G, Tangaro M, Picardi E, Pesole G. UTRdb 2.0: a comprehensive, expert curated catalog of eukaryotic mRNAs untranslated regions. Nucleic Acids Res 2022; 51:D337-D344. [PMID: 36399486 PMCID: PMC9825521 DOI: 10.1093/nar/gkac1016] [Received: 09/15/2022] [Revised: 10/19/2022] [Accepted: 10/25/2022] [Indexed: 11/19/2022]
Abstract
The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization, and message stability. Since 1996, we have developed and maintained UTRdb, a specialized database of UTR sequences. Here we present UTRdb 2.0, a major update of UTRdb featuring an extensive collection of eukaryotic 5' and 3' UTR sequences, including over 26 million entries from over 6 million genes and 573 species, enriched with a curated set of functional annotations. Annotations include CAGE tags and polyA signals to label the completeness of 5' and 3'UTRs, respectively. In addition, uORFs and IRES are annotated in 5'UTRs as well as experimentally validated miRNA targets in 3'UTRs. Further annotations include evolutionarily conserved blocks, Rfam motifs, ADAR-mediated RNA editing events, and m6A modifications. A web interface allowing a flexible selection and retrieval of specific subsets of UTRs, selected according to a combination of criteria, has been implemented which also provides comprehensive download facilities. UTRdb 2.0 is accessible at http://utrdb.cloud.ba.infn.it/utrdb/.
Affiliation(s)
- Claudio Lo Giudice
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, 70126 Bari, Italy
| | - Federico Zambelli
- Department of Biosciences, University of Milan, 20133 Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
| | - Matteo Chiara
- Department of Biosciences, University of Milan, 20133 Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
| | - Giulio Pavesi
- Department of Biosciences, University of Milan, 20133 Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
| | - Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
| | - Ernesto Picardi
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, 70126 Bari, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
| | - Graziano Pesole
- To whom correspondence should be addressed. Tel: +39 0805443588;
|
26
|
Accounting for small variations in the tracrRNA sequence improves sgRNA activity predictions for CRISPR screening. Nat Commun 2022; 13:5255. [PMID: 36068235 PMCID: PMC9448816 DOI: 10.1038/s41467-022-33024-2] [Received: 05/21/2022] [Accepted: 08/30/2022] [Indexed: 12/17/2022]
Abstract
CRISPR technology is a powerful tool for studying genome function. To aid in picking sgRNAs that have maximal efficacy against a target of interest from many possible options, several groups have developed models that predict sgRNA on-target activity. Although multiple tracrRNA variants are commonly used for screening, no existing models account for this feature when nominating sgRNAs. Here we develop an on-target model, Rule Set 3, that makes optimal predictions for multiple tracrRNA variants. We validate Rule Set 3 on a new dataset of sgRNAs tiling essential and non-essential genes, demonstrating substantial improvement over prior prediction models. By analyzing the differences in sgRNA activity between tracrRNA variants, we show that Pol III transcription termination is a strong determinant of sgRNA activity. We expect these results to improve the performance of CRISPR screening and inform future research on tracrRNA engineering and sgRNA modeling.
|
27
|
Advancing FDSTools by integrating STRNaming 1.1. Forensic Sci Int Genet 2022; 61:102768. [PMID: 35994887 DOI: 10.1016/j.fsigen.2022.102768] [Received: 05/09/2022] [Revised: 07/25/2022] [Accepted: 08/15/2022] [Indexed: 11/22/2022]
Abstract
The introduction of massively parallel sequencing in forensic analysis has been facilitated with typing kits, analysis software and allele naming tools such as the ForenSeq DNA Signature Prep (DSP) kit, FDSTools and STRNaming respectively. Here we describe how FDSTools 2.0 with integrated and refined STRNaming nomenclature was validated for implementation under ISO 17025 accreditation for the ForenSeq DSP kit. Newly-added options result in efficient automatic allele calling for the majority of markers while specific settings are applied for 'novel' sequence variants to avoid the calling of remaining variable noise observed in samples sequenced with the ForenSeq DSP kit that seem to arise in the PCR. Genome-wide built-in reference data allows for greatly simplified configuration of allele naming for human targets.
|
28
|
Khan AR, Shah SH, Ajaz S, Firasat S, Abid A, Raza A. The Prevalence of Pharmacogenomics Variants and Their Clinical Relevance Among the Pakistani Population. Evol Bioinform Online 2022; 18:11769343221095834. [PMID: 35497687 PMCID: PMC9047794 DOI: 10.1177/11769343221095834] [Received: 10/16/2021] [Accepted: 04/04/2022] [Indexed: 11/28/2022]
Abstract
Background: Pharmacogenomics (PGx), forming the basis of precision medicine, has revolutionized traditional medical practice. Currently, drug responses such as drug efficacy, drug dosage, and drug adverse reactions can be anticipated based on the genetic makeup of the patients. The pharmacogenomic data of Pakistani populations are limited. This study investigates the frequencies of pharmacogenetic variants and their clinical relevance among ethnic groups in Pakistan. Methods: The Pharmacogenomics Knowledge Base (PharmGKB) database was used to extract pharmacogenetic variants that are involved in medical conditions with high (1A + 1B) to moderate (2A + 2B) clinical evidence. Subsequently, the allele frequencies of these variants were searched among multiethnic groups of Pakistan (Balochi, Brahui, Burusho, Hazara, Kalash, Pashtun, Punjabi, and Sindhi) using the 1000 Genomes Project (1KGP) and ALlele FREquency Database (ALFRED). Furthermore, the published Pharmacogenomics literature on the Pakistani population was reviewed in PubMed and Google Scholar. Results: Our search retrieved (n = 29) pharmacogenetic genes and their (n = 44) variants with high to moderate evidence of clinical association. These pharmacogenetic variants correspond to drug-metabolizing enzymes (n = 22), drug-metabolizing transporters (n = 8), and PGx gene regulators, etc. (n = 14). We found 5 pharmacogenetic variants present at >50% among 8 ethnic groups of Pakistan. These pharmacogenetic variants include CYP2B6 (rs2279345, C; 70%-86%), CYP3A5 (rs776746, C; 64%-88%), FLT3 (rs1933437, T; 54%-74%), CETP (rs1532624, A; 50%-70%), and DPP6 (rs6977820, C; 61%-86%) genes that are involved in drug response for acquired immune deficiency syndrome, transplantation, cancer, heart disease, and mental health therapy, respectively. Conclusions: This study highlights the frequency of important clinical pharmacogenetic variants (1A, 1B, 2A, and 2B) among multi-ethnic Pakistani populations. The high prevalence (>50%) of single nucleotide pharmacogenetic variants may contribute to the drug response/diseases outcome. These PGx data could be used as pharmacogenetic markers in the selection of appropriate therapeutic regimens for specific ethnic groups of Pakistan.
Affiliation(s)
- Abdul Rafay Khan
- Center for Human Genetics and Molecular Medicine, Sindh Institute of Urology and Transplantation, Karachi, Pakistan
| | - Sayed Hajan Shah
- Center for Human Genetics and Molecular Medicine, Sindh Institute of Urology and Transplantation, Karachi, Pakistan
| | - Sadia Ajaz
- Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
| | - Sadaf Firasat
- Center for Human Genetics and Molecular Medicine, Sindh Institute of Urology and Transplantation, Karachi, Pakistan
| | - Aiysha Abid
- Center for Human Genetics and Molecular Medicine, Sindh Institute of Urology and Transplantation, Karachi, Pakistan
| | - Ali Raza
- Center for Human Genetics and Molecular Medicine, Sindh Institute of Urology and Transplantation, Karachi, Pakistan
|
29
|
Nielsen RL, Wolthers BO, Helenius M, Albertsen BK, Clemmensen L, Nielsen K, Kanerva J, Niinimäki R, Frandsen TL, Attarbaschi A, Barzilai S, Colombini A, Escherich G, Aytan-Aktug D, Liu HC, Möricke A, Samarasinghe S, van der Sluis IM, Stanulla M, Tulstrup M, Yadav R, Zapotocka E, Schmiegelow K, Gupta R. Can Machine Learning Models Predict Asparaginase-associated Pancreatitis in Childhood Acute Lymphoblastic Leukemia. J Pediatr Hematol Oncol 2022; 44:e628-e636. [PMID: 35226426 PMCID: PMC8946594 DOI: 10.1097/mph.0000000000002292] [Received: 01/27/2021] [Accepted: 06/21/2021] [Indexed: 11/26/2022]
Abstract
Asparaginase-associated pancreatitis (AAP) frequently affects children treated for acute lymphoblastic leukemia (ALL), causing severe acute and persisting complications. Known risk factors such as asparaginase dosing, older age, and single nucleotide polymorphisms (SNPs) have insufficient odds ratios to allow personalized asparaginase therapy. In this study, we explored machine learning strategies for prediction of individual AAP risk. We integrated information on age, sex, and SNPs based on Illumina Omni2.5exome-8 arrays of patients with childhood ALL (N=1564, 244 with AAP, aged 1.0 to 17.9 years) from 10 international ALL consortia into machine learning models including regression, random forest, AdaBoost, and artificial neural networks. A model with only age and sex had an area under the receiver operating characteristic curve (ROC-AUC) of 0.62. Inclusion of 6 pancreatitis candidate gene SNPs or 4 validated pancreatitis SNPs boosted ROC-AUC somewhat (0.67), while 30 SNPs, identified through our AAP genome-wide association study cohort, boosted performance further (0.80). The most predictive features included rs10273639 (PRSS1-PRSS2), rs10436957 (CTRC), rs13228878 (PRSS1/PRSS2), rs1505495 (GALNTL6), rs4655107 (EPHB2), and age (1 to 7 years). A second AAP following asparaginase re-exposure was predicted with a ROC-AUC of 0.65. The machine learning models assist individual-level risk assessment of AAP for future prevention trials, and may legitimize asparaginase re-exposure when AAP risk is predicted to be low.
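For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The Python sketch below computes it that way, by pairwise comparison. It only illustrates the metric, not the study's evaluation code: the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper written for
    illustration; it is quadratic in the number of samples.
    """
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Toy predictions, not data from the study: 3 of 4 pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

On this reading, the reported values of 0.62, 0.67, and 0.80 correspond to progressively better ranking of patients who developed AAP above those who did not, with 0.5 being chance level.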
Affiliation(s)
- Rikke L. Nielsen
- Departments of Health Technology
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet
- Sino-Danish Center for Education and Research, University of Chinese Academy of Sciences, Huairou, China
| | - Benjamin O. Wolthers
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet
| | | | - Birgitte K. Albertsen
- Department of Pediatrics and Adolescent Medicine, Aarhus University Hospital, Aarhus, Denmark
| | - Line Clemmensen
- Department of Applied Mathematics and Computer Science, Kgs. Lyngby
| | - Kasper Nielsen
- Center for Biological Sequence Analysis, Technical University of Denmark
| | - Jukka Kanerva
- Children’s Hospital, Helsinki University Central Hospital, University of Helsinki, Helsinki
| | - Riitta Niinimäki
- Oulu University Hospital, Department of Children and Adolescents, and University of Oulu, PEDEGO Research Unit, Oulu, Finland
| | - Thomas L. Frandsen
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet
| | - Andishe Attarbaschi
- Department of Pediatric Hematology and Oncology, St Anna Children’s Hospital and Department of Pediatric and Adolescent Medicine, Medical University of Vienna, Wien, Austria
| | - Shlomit Barzilai
- Pediatric Hematology and Oncology, Schneider Children’s Medical Center of Israel, Petah-Tikva, Israel and Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Antonella Colombini
- Department of Pediatrics, Ospedale San Gerardo, University of Milano-Bicocca, Fondazione MBBM, Monza, Italy
| | - Gabriele Escherich
- Clinic of Pediatric Hematology and Oncology, University Medical Center Eppendorf, Hamburg
| | | | - Hsi-Che Liu
- Division of Pediatric Hematology-Oncology, Mackay Memorial Hospital, Taipei, Taiwan
| | - Anja Möricke
- Department of Pediatrics, Christian-Albrechts-University Kiel and University Medical Center Schleswig-Holstein, Kiel
| | | | - Inge M. van der Sluis
- Dutch Childhood Oncology Group, The Hague and Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Martin Stanulla
- Department of Pediatric Hematology and Oncology, Hannover Medical School, Hannover, Germany
| | - Morten Tulstrup
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet
| | - Rachita Yadav
- Center for Biological Sequence Analysis, Technical University of Denmark
| | - Ester Zapotocka
- Department of Pediatric Hematology/Oncology, University Hospital Motol, Prague, Czech Republic
| | - Kjeld Schmiegelow
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet
- Institute of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen
| | | |
|
30
|
Tsagiopoulou M, Pechlivanis N, Maniou M, Psomopoulos F. InterTADs: integration of multi-omics data on topologically associated domains, application to chronic lymphocytic leukemia. NAR Genom Bioinform 2022; 4:lqab121. [PMID: 35047813 PMCID: PMC8759567 DOI: 10.1093/nargab/lqab121] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Revised: 10/21/2021] [Accepted: 12/13/2021] [Indexed: 11/25/2022] Open
Abstract
The integration of multi-omics data can greatly facilitate the advancement of research in Life Sciences by highlighting new interactions. However, there is currently no widespread procedure for meaningful multi-omics data integration. Here, we present a robust framework, called InterTADs, for integrating multi-omics data derived from the same sample, and considering the chromatin configuration of the genome, i.e. the topologically associating domains (TADs). Following the integration process, statistical analysis highlights the differences between the groups of interest (normal versus cancer cells) relating to (i) independent and (ii) integrated events through TADs. Finally, enrichment analysis using the KEGG database, Gene Ontology and transcription factor binding sites, as well as visualization approaches, are available. We applied InterTADs to multi-omics datasets from 135 patients with chronic lymphocytic leukemia (CLL) and found that the integration through TADs resulted in a dramatic reduction of heterogeneity compared to individual events. Significant differences for individual events and at the TAD level were identified between patients differing in the somatic hypermutation status of the clonotypic immunoglobulin genes, the core biological stratifier in CLL, attesting to the biomedical relevance of InterTADs. In conclusion, our approach suggests a new perspective towards analyzing multi-omics data, by offering reasonable execution time, biological benchmarking and potentially contributing to pattern discovery through TADs.
|
31
|
Hasaart KA, Manders F, Ubels J, Verheul M, van Roosmalen MJ, Groenen NM, Oka R, Kuijk E, Lopes SMCDS, Boxtel RV. Human induced pluripotent stem cells display a similar mutation burden as embryonic pluripotent cells in vivo. iScience 2022; 25:103736. [PMID: 35118356 PMCID: PMC8792070 DOI: 10.1016/j.isci.2022.103736] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/01/2021] [Revised: 12/03/2021] [Accepted: 01/02/2022] [Indexed: 11/30/2022] Open
Abstract
Induced pluripotent stem cells (iPSCs) hold great promise for regenerative medicine, but genetic instability is a major concern. Embryonic pluripotent cells also accumulate mutations during early development, but how this relates to the mutation burden in iPSCs remains unknown. Here, we directly compared the mutation burden of cultured iPSCs with their isogenic embryonic cells during human embryogenesis. We generated developmental lineage trees of human fetuses by phylogenetic inference from somatic mutations in the genomes of multiple stem cells, which were derived from different germ layers. Using this approach, we characterized the mutations acquired pre-gastrulation and found a rate of 1.65 mutations per cell division. When cultured in hypoxic conditions, iPSCs generated from fetal stem cells of the assessed fetuses displayed a similar mutation rate and spectrum. Our results show that iPSCs maintain genomic integrity during culture to a similar degree as their pluripotent counterparts do in vivo.
Affiliation(s)
- Karlijn A.L. Hasaart
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Freek Manders
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Joske Ubels
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Mark Verheul
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Markus J. van Roosmalen
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Niels M. Groenen
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Rurika Oka
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| | - Ewart Kuijk
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
- Center for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, the Netherlands
| | | | - Ruben van Boxtel
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
- Oncode Institute, Jaarbeursplein 6, 3521 AL Utrecht, the Netherlands
| |
|
32
|
Mitchell BL, Saklatvala JR, Dand N, Hagenbeek FA, Li X, Min JL, Thomas L, Bartels M, Jan Hottenga J, Lupton MK, Boomsma DI, Dong X, Hveem K, Løset M, Martin NG, Barker JN, Han J, Smith CH, Rentería ME, Simpson MA. Genome-wide association meta-analysis identifies 29 new acne susceptibility loci. Nat Commun 2022; 13:702. [PMID: 35132056 PMCID: PMC8821634 DOI: 10.1038/s41467-022-28252-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/27/2021] [Accepted: 01/13/2022] [Indexed: 02/08/2023] Open
Abstract
Acne vulgaris is a highly heritable skin disorder that primarily impacts facial skin. Severely inflamed lesions may leave permanent scars that have been associated with long-term psychosocial consequences. Here, we perform a GWAS meta-analysis comprising 20,165 individuals with acne from nine independent European ancestry cohorts. We identify 29 novel genome-wide significant loci and replicate 14 of the 17 previously identified risk loci, bringing the total number of reported acne risk loci to 46. Using fine-mapping and eQTL colocalisation approaches, we identify putative causal genes at several acne susceptibility loci that have previously been implicated in Mendelian hair and skin disorders, including pustular psoriasis. We identify shared genetic aetiology between acne, hormone levels, hormone-sensitive cancers and psychiatric traits. Finally, we show that a polygenic risk score calculated from our results explains up to 5.6% of the variance in acne liability in an independent cohort.
Affiliation(s)
- Brittany L Mitchell
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, Australia
| | - Jake R Saklatvala
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | - Nick Dand
- Department of Medical and Molecular Genetics, King's College London, London, UK
- Health Data Research UK, London, UK
| | - Fiona A Hagenbeek
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health research institute, Amsterdam, The Netherlands
| | - Xin Li
- Department of Epidemiology, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, US
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, US
| | - Josine L Min
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Laurent Thomas
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
- BioCore - Bioinformatics Core Facility, Norwegian University of Science and Technology, Trondheim, Norway
| | - Meike Bartels
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health research institute, Amsterdam, The Netherlands
| | - Jouke Jan Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health research institute, Amsterdam, The Netherlands
| | - Michelle K Lupton
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Dorret I Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health research institute, Amsterdam, The Netherlands
| | - Xianjun Dong
- Genomics and Bioinformatics Hub, Brigham and Women's Hospital, Boston, MA, USA
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
| | - Kristian Hveem
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- HUNT Research Centre, Department of Public Health and Nursing, Norwegian University of Science and Technology, Levanger, Norway
- Levanger Hospital, Nord-Trøndelag Hospital Trust, Levanger, Norway
| | - Mari Løset
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Dermatology, Clinic of Orthopaedy, Rheumatology and Dermatology, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
| | - Nicholas G Martin
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Jonathan N Barker
- St John's Institute of Dermatology, Faculty of Life Sciences & Medicine, King's College London, London, UK
| | - Jiali Han
- Department of Epidemiology, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, US
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, US
| | - Catherine H Smith
- St John's Institute of Dermatology, Faculty of Life Sciences & Medicine, King's College London, London, UK
| | - Miguel E Rentería
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, Australia
| | - Michael A Simpson
- Department of Medical and Molecular Genetics, King's College London, London, UK
| |
|
33
|
Template switching in DNA replication can create and maintain RNA hairpins. Proc Natl Acad Sci U S A 2022; 119:2107005119. [PMID: 35046021 PMCID: PMC8794818 DOI: 10.1073/pnas.2107005119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/14/2021] [Indexed: 11/18/2022] Open
Abstract
The evolutionary origin of RNA stem structures and the preservation of their base pairing under a spontaneous and random mutation process have puzzled theoretical evolutionary biologists. DNA replication-related template switching is a mutation mechanism that creates reverse-complement copies of sequence regions within a genome by replicating briefly along either the complementary or nascent DNA strand. Depending on the relative positions and context of the four switch points, this process may produce a reverse-complement repeat capable of forming the stem of a perfect DNA hairpin or fix the base pairing of an existing stem. Template switching is typically thought to trigger large structural changes, and its possible role in the origin and evolution of RNA genes has not been studied. Here, we show that the reconstructed ancestral histories of RNA genes contain mutation patterns consistent with DNA replication-related template switching. In addition to multibase compensatory mutations, the mechanism can explain complex sequence changes, although mutations breaking the structure rarely get fixed in evolution. Our results suggest a solution for the long-standing dilemma of RNA gene evolution and demonstrate how template switching can both create perfect stems with a single mutation event and help maintain the stem structure over time. Interestingly, template switching also provides an elegant explanation for the asymmetric base pair frequencies within RNA stems.
|
34
|
Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode M, Armean I, Austine-Orimoloye O, Azov A, Barnes I, Bennett R, Berry A, Bhai J, Bignell A, Billis K, Boddu S, Brooks L, Charkhchi M, Cummins C, Da Rin Fioretto L, Davidson C, Dodiya K, Donaldson S, El Houdaigui B, El Naboulsi T, Fatima R, Giron CG, Genez T, Martinez J, Guijarro-Clarke C, Gymer A, Hardy M, Hollis Z, Hourlier T, Hunt T, Juettemann T, Kaikala V, Kay M, Lavidas I, Le T, Lemos D, Marugán JC, Mohanan S, Mushtaq A, Naven M, Ogeh D, Parker A, Parton A, Perry M, Piližota I, Prosovetskaia I, Sakthivel M, Salam A, Schmitt B, Schuilenburg H, Sheppard D, Pérez-Silva J, Stark W, Steed E, Sutinen K, Sukumaran R, Sumathipala D, Suner MM, Szpak M, Thormann A, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh T, Walts B, Willhoft N, Winterbottom A, Wass E, Chakiachvili M, Flint B, Frankish A, Giorgetti S, Haggerty L, Hunt S, IIsley G, Loveland J, Martin F, Moore B, Mudge J, Muffato M, Perry E, Ruffier M, Tate J, Thybert D, Trevanion S, Dyer S, Harrison P, Howe K, Yates A, Zerbino D, Flicek P. Ensembl 2022. Nucleic Acids Res 2022; 50:D988-D995. [PMID: 34791404 PMCID: PMC8728283 DOI: 10.1093/nar/gkab1049] [Citation(s) in RCA: 1018] [Impact Index Per Article: 509.0] [Received: 09/15/2021] [Revised: 10/14/2021] [Accepted: 10/19/2021] [Indexed: 12/29/2022] Open
Abstract
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.
Affiliation(s)
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - James E Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina M Armean
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Luca Da Rin Fioretto
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Guijarro-Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Arthur Gymer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas Juettemann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vinay Kaikala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ilias Lavidas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - José Carlos Marugán
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marc Naven
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Parton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Helen Schuilenburg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ranjit Sukumaran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Michal Szpak
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anja Thormann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Brandon Walts
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Natalie Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Garth R IIsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kevin L Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
35
|
Valentini S, Gandolfi F, Carolo M, Dalfovo D, Pozza L, Romanel A. OUP accepted manuscript. Nucleic Acids Res 2022; 50:1335-1350. [PMID: 35061909 PMCID: PMC8860573 DOI: 10.1093/nar/gkac024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Revised: 01/03/2022] [Accepted: 01/07/2022] [Indexed: 11/21/2022] Open
Abstract
In recent years, many studies have identified associations between common genetic variants and complex diseases. However, the mechanistic biological links explaining these associations are still mostly unknown. Common variants are usually associated with a relatively small effect size, suggesting that interactions among multiple variants might be a major genetic component of complex diseases. Hence, elucidating the presence of functional relations among variants may be fundamental to identifying putative variant interactions. To this end, we developed Polympact, a web-based resource for exploring functional relations among common human variants by exploiting the variants' functional element landscape, their impact on transcription factor binding motifs, and their effect on transcript levels of protein-coding genes. Polympact characterizes over 18 million common variants and supports the exploration of putative relations by combining clustering analysis and innovative similarity and interaction network models. The properties of the network models were studied and the utility of Polympact was demonstrated by analysing the rich sets of Breast Cancer and Alzheimer's GWAS variants. We identified relations among multiple variants, suggesting putative interactions. Polympact is freely available at bcglab.cibio.unitn.it/polympact.
Affiliation(s)
- Samuel Valentini
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Francesco Gandolfi
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Mattia Carolo
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Davide Dalfovo
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Lara Pozza
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Alessandro Romanel
- To whom correspondence should be addressed. Tel: +39 0461 285217; Fax: +39 0461 283937;
| |
|
36
|
Contreras-Moreira B, Naamati G, Rosello M, Allen JE, Hunt SE, Muffato M, Gall A, Flicek P. Scripting Analyses of Genomes in Ensembl Plants. Methods Mol Biol 2022; 2443:27-55. [PMID: 35037199 PMCID: PMC7614177 DOI: 10.1007/978-1-0716-2067-0_2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/27/2022]
Abstract
Ensembl Plants (http://plants.ensembl.org) offers genome-scale information for plants, with four releases per year. As of release 47 (April 2020) it features 79 species and includes genome sequence, gene models, and functional annotation. Comparative analyses help reconstruct the evolutionary history of gene families, genomes, and components of polyploid genomes. Some species have gene expression baseline reports or variation across genotypes. While the data can be accessed through the Ensembl genome browser, here we review specifically how our plant genomes can be interrogated programmatically and the data downloaded in bulk. These access routes are generally consistent across Ensembl for other non-plant species, including plant pathogens, pests, and pollinators.
Affiliation(s)
- Bruno Contreras-Moreira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Marc Rosello
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - James E Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Astrid Gall
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
| |
|
37
|
Valentini S, Marchioretti C, Bisio A, Rossi A, Zaccara S, Romanel A, Inga A. TranSNPs: A class of functional SNPs affecting mRNA translation potential revealed by fraction-based allelic imbalance. iScience 2021; 24:103531. [PMID: 34917903 PMCID: PMC8666669 DOI: 10.1016/j.isci.2021.103531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/08/2021] [Revised: 10/27/2021] [Accepted: 11/23/2021] [Indexed: 12/23/2022]
Abstract
Few studies have explored the association between SNPs and alterations in mRNA translation potential. We developed an approach to identify SNPs that can mark allele-specific protein expression levels and could represent sources of inter-individual variation in disease risk. Using MCF7 cells under different treatments, we performed polysomal profiling followed by RNA sequencing of total or polysome-associated mRNA fractions and designed a computational approach to identify SNPs showing a significant change in the allelic balance between total and polysomal mRNA fractions. We identified 147 SNPs, 39 of which are located in UTRs. Allele-specific differences at the translation level were confirmed in transfected MCF7 cells by reporter assays. Exploiting breast cancer data from TCGA, we identified UTR SNPs demonstrating distinct prognosis features and altering binding sites of RNA-binding proteins. Our approach produced a catalog of tranSNPs, a class of functional SNPs associated with allele-specific translation and potentially endowed with prognostic value for disease risk.
Affiliation(s)
- Samuel Valentini
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
| | - Caterina Marchioretti
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Department of Biomedical Sciences (DBS), University of Padova, 35131 Padova, Italy
| | - Alessandra Bisio
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
| | - Annalisa Rossi
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
| | - Sara Zaccara
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Weill Medical College, Cornell University, New York 10065, NY, USA
| | - Alessandro Romanel
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
| | - Alberto Inga
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
| |
|
38
|
Gordillo-Marañón M, Zwierzyna M, Charoen P, Drenos F, Chopade S, Shah T, Engmann J, Chaturvedi N, Papacosta O, Wannamethee G, Wong A, Sofat R, Kivimaki M, Price JF, Hughes AD, Gaunt TR, Lawlor DA, Gaulton A, Hingorani AD, Schmidt AF, Finan C. Validation of lipid-related therapeutic targets for coronary heart disease prevention using human genetics. Nat Commun 2021; 12:6120. [PMID: 34675202 PMCID: PMC8531035 DOI: 10.1038/s41467-021-25731-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/19/2020] [Accepted: 08/26/2021] [Indexed: 12/14/2022]
Abstract
Drug target Mendelian randomization (MR) studies use DNA sequence variants in or near a gene encoding a drug target, that alter the target's expression or function, as a tool to anticipate the effect of drug action on the same target. Here we apply MR to prioritize drug targets for their causal relevance for coronary heart disease (CHD). The targets are further prioritized using independent replication, co-localization, protein expression profiles and data from the British National Formulary and clinicaltrials.gov. Out of the 341 drug targets identified through their association with blood lipids (HDL-C, LDL-C and triglycerides), we robustly prioritize 30 targets that might elicit beneficial effects in the prevention or treatment of CHD, including NPC1L1 and PCSK9, the targets of drugs used in CHD prevention. We discuss how this approach can be generalized to other targets, disease biomarkers and endpoints to help prioritize and validate targets during the drug development process.
Affiliation(s)
- María Gordillo-Marañón
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK.
| | - Magdalena Zwierzyna
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
| | - Pimphen Charoen
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- Department of Tropical Hygiene, Faculty of Tropical Medicine, Mahidol University, Bangkok, 10400, Thailand
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Bangkok, 10400, Thailand
| | - Fotios Drenos
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- Department of Life Sciences, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UK
| | - Sandesh Chopade
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
| | - Tina Shah
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
| | - Jorgen Engmann
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
| | - Nishi Chaturvedi
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- MRC Unit for Lifelong Health and Ageing, University College London, London, WC1E 7HB, UK
| | - Olia Papacosta
- Primary Care and Population Health, University College London, London, NW3 2PF, UK
| | - Goya Wannamethee
- Primary Care and Population Health, University College London, London, NW3 2PF, UK
| | - Andrew Wong
- MRC Unit for Lifelong Health and Ageing, University College London, London, WC1E 7HB, UK
| | - Reecha Sofat
- Institute of Health Informatics, University College London, London, WC1E 6BT, UK
| | - Mika Kivimaki
- Department of Epidemiology and Public Health, University College London, London, WC1E 6BT, UK
| | - Jackie F Price
- Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, UK
| | - Alun D Hughes
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
- MRC Unit for Lifelong Health and Ageing, University College London, London, WC1E 7HB, UK
| | - Tom R Gaunt
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, BS8 2BN, UK
- Population Health, Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Bristol NIHR Bristol Biomedical Research Centre, University Hospitals Bristol National Health Service Foundation Trust and University of Bristol, Bristol, BS8 2BN, UK
| | - Deborah A Lawlor
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, BS8 2BN, UK
- Population Health, Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Bristol NIHR Bristol Biomedical Research Centre, University Hospitals Bristol National Health Service Foundation Trust and University of Bristol, Bristol, BS8 2BN, UK
| | - Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Aroon D Hingorani
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
| | - Amand F Schmidt
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Chris Finan
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, WC1E 6BT, UK
- UCL British Heart Foundation Research Accelerator, London, UK
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| |
|
39
|
Ireland SM, Martin ACR. GraphQL for the delivery of bioinformatics web APIs and application to ZincBind. Bioinformatics Advances 2021; 1:vbab023. [PMID: 35585947 PMCID: PMC9108989 DOI: 10.1093/bioadv/vbab023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Revised: 09/06/2021] [Accepted: 09/23/2021] [Indexed: 01/27/2023]
Abstract
Motivation: Many bioinformatics resources are provided as 'web services', with large databases and analysis software stored on a central server, and clients interacting with them using the hypertext transfer protocol (HTTP). While some provide only a visual HTML interface, requiring a web browser to use them, many provide programmatic access using a web application programming interface (API) which returns XML, JSON or plain text that computer programs can interpret more easily. This allows access to be automated. Initially, many bioinformatics APIs used the 'simple object access protocol' (SOAP) and, more recently, representational state transfer (REST). Results: GraphQL is a novel alternative to REST and SOAP that represents the available data in the form of a graph to which any conceivable query can be submitted, and which is seeing increasing adoption in industry. Here, we review the principles of GraphQL, outline its particular suitability to the delivery of bioinformatics resources and describe its implementation in our ZincBind resource. Availability and implementation: https://api.zincbind.net. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Sam M Ireland
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Andrew C R Martin
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- To whom correspondence should be addressed.
| |
|
40
|
Boer CG, Hatzikotoulas K, Southam L, Stefánsdóttir L, Zhang Y, Coutinho de Almeida R, Wu TT, Zheng J, Hartley A, Teder-Laving M, Skogholt AH, Terao C, Zengini E, Alexiadis G, Barysenka A, Bjornsdottir G, Gabrielsen ME, Gilly A, Ingvarsson T, Johnsen MB, Jonsson H, Kloppenburg M, Luetge A, Lund SH, Mägi R, Mangino M, Nelissen RRGHH, Shivakumar M, Steinberg J, Takuwa H, Thomas LF, Tuerlings M, Babis GC, Cheung JPY, Kang JH, Kraft P, Lietman SA, Samartzis D, Slagboom PE, Stefansson K, Thorsteinsdottir U, Tobias JH, Uitterlinden AG, Winsvold B, Zwart JA, Davey Smith G, Sham PC, Thorleifsson G, Gaunt TR, Morris AP, Valdes AM, Tsezou A, Cheah KSE, Ikegawa S, Hveem K, Esko T, Wilkinson JM, Meulenbelt I, Lee MTM, van Meurs JBJ, Styrkársdóttir U, Zeggini E. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 2021; 184:4784-4818.e17. [PMID: 34450027 PMCID: PMC8459317 DOI: 10.1016/j.cell.2021.07.038] [Citation(s) in RCA: 159] [Impact Index Per Article: 53.0] [Received: 10/09/2020] [Revised: 03/26/2021] [Accepted: 07/30/2021] [Indexed: 12/19/2022]
Abstract
Osteoarthritis affects over 300 million people worldwide. Here, we conduct a genome-wide association study meta-analysis across 826,690 individuals (177,517 with osteoarthritis) and identify 100 independently associated risk variants across 11 osteoarthritis phenotypes, 52 of which have not been associated with the disease before. We report thumb and spine osteoarthritis risk variants and identify differences in genetic effects between weight-bearing and non-weight-bearing joints. We identify sex-specific and early age-at-onset osteoarthritis risk loci. We integrate functional genomics data from primary patient tissues (including articular cartilage, subchondral bone, and osteophytic cartilage) and identify high-confidence effector genes. We provide evidence for genetic correlation with phenotypes related to pain, the main disease symptom, and identify likely causal genes linked to neuronal processes. Our results provide insights into key molecular players in disease processes and highlight attractive drug targets to accelerate translation.
Affiliation(s)
- Cindy G Boer
- Department of Internal Medicine, Erasmus MC, Medical Center, 3015CN Rotterdam, the Netherlands
| | - Konstantinos Hatzikotoulas
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Lorraine Southam
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | | | - Yanfei Zhang
- Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Rodrigo Coutinho de Almeida
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Postzone S05-P Leiden University Medical Center, 2333ZC Leiden, the Netherlands
| | - Tian T Wu
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, China
| | - Jie Zheng
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
| | - April Hartley
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK; Musculoskeletal Research Unit, Translation Health Sciences, Bristol Medical School, University of Bristol, Southmead Hospital, Bristol BS10 5NB, UK
| | - Maris Teder-Laving
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | - Anne Heidi Skogholt
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan
| | - Eleni Zengini
- 4th Psychiatric Department, Dromokaiteio Psychiatric Hospital, 12461 Athens, Greece
| | - George Alexiadis
- 1st Department of Orthopaedics, KAT General Hospital, 14561 Athens, Greece
| | - Andrei Barysenka
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | | | - Maiken E Gabrielsen
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway
| | - Arthur Gilly
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Thorvaldur Ingvarsson
- Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland; Department of Orthopedic Surgery, Akureyri Hospital, 600 Akureyri, Iceland
| | - Marianne B Johnsen
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, 0316 Oslo, Norway; Research and Communication Unit for Musculoskeletal Health (FORMI), Department of Research, Innovation and Education, Division of Clinical Neuroscience, Oslo University Hospital, 0424 Oslo, Norway
| | - Helgi Jonsson
- Department of Medicine, Landspitali The National University Hospital of Iceland, 108 Reykjavik, Iceland; Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Margreet Kloppenburg
- Departments of Rheumatology and Clinical Epidemiology, Leiden University Medical Center, 9600, 2300RC Leiden, the Netherlands
| | - Almut Luetge
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway
| | | | - Reedik Mägi
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | - Massimo Mangino
- Department of Twin Research and Genetic Epidemiology, King's College London, London SE1 7EH, UK
| | - Rob R G H H Nelissen
- Department of Orthopaedics, Leiden University Medical Center, 9600, 2300RC Leiden, the Netherlands
| | - Manu Shivakumar
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Julia Steinberg
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany; Daffodil Centre, The University of Sydney, a joint venture with Cancer Council NSW, Sydney, NSW 1340, Australia
| | - Hiroshi Takuwa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo 108-8639, Japan; Department of Orthopedic Surgery, Shimane University, Shimane 693-8501, Japan
| | - Laurent F Thomas
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, 7491 Trondheim, Norway; BioCore-Bioinformatics Core Facility, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Clinic of Laboratory Medicine, St. Olavs Hospital, Trondheim University Hospital, 7030 Trondheim, Norway
| | - Margo Tuerlings
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Postzone S05-P Leiden University Medical Center, 2333ZC Leiden, the Netherlands
| | - George C Babis
- 2nd Department of Orthopaedics, National and Kapodistrian University of Athens, Medical School, Nea Ionia General Hospital Konstantopouleio, 14233 Athens, Greece
| | - Jason Pui Yin Cheung
- Department of Orthopaedics and Traumatology, The University of Hong Kong, Pokfulam, Hong Kong, China
| | - Jae Hee Kang
- Department of Medicine, Brigham and Women's Hospital, 181 Longwood Ave, Boston, MA 02115, USA
| | - Peter Kraft
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
| | - Steven A Lietman
- Musculoskeletal Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Dino Samartzis
- Department of Orthopaedics and Traumatology, The University of Hong Kong, Pokfulam, Hong Kong, China; Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL 60612, USA
| | - P Eline Slagboom
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Postzone S05-P Leiden University Medical Center, 2333ZC Leiden, the Netherlands
| | - Kari Stefansson
- deCODE Genetics/Amgen Inc., 102 Reykjavik, Iceland; Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Unnur Thorsteinsdottir
- deCODE Genetics/Amgen Inc., 102 Reykjavik, Iceland; Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
| | - Jonathan H Tobias
- Musculoskeletal Research Unit, Translation Health Sciences, Bristol Medical School, University of Bristol, Southmead Hospital, Bristol BS10 5NB, UK; MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
| | - André G Uitterlinden
- Department of Internal Medicine, Erasmus MC, Medical Center, 3015CN Rotterdam, the Netherlands
| | - Bendik Winsvold
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Department of Research, Innovation and Education, Division of Clinical Neuroscience, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway; Department of Neurology, Oslo University Hospital, 0424 Oslo, Norway
| | - John-Anker Zwart
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Department of Research, Innovation and Education, Division of Clinical Neuroscience, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway
| | - George Davey Smith
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK; Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
| | - Pak Chung Sham
- Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, China
| | | | - Tom R Gaunt
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
| | - Andrew P Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester M13 9LJ, UK
| | - Ana M Valdes
- Faculty of Medicine and Health Sciences, School of Medicine, University of Nottingham, Nottingham, Nottinghamshire NG5 1PB, UK
| | - Aspasia Tsezou
- Laboratory of Cytogenetics and Molecular Genetics, Faculty of Medicine, University of Thessaly, Larissa 411 10, Greece
| | - Kathryn S E Cheah
- School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
| | - Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo 108-8639, Japan
| | - Kristian Hveem
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway; HUNT Research Center, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, 7600 Levanger, Norway
| | - Tõnu Esko
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | - J Mark Wilkinson
- Department of Oncology and Metabolism and Healthy Lifespan Institute, University of Sheffield, Sheffield S10 2RX, UK
| | - Ingrid Meulenbelt
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Postzone S05-P Leiden University Medical Center, 2333ZC Leiden, the Netherlands
| | - Ming Ta Michael Lee
- Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822, USA; Institute of Biomedical Sciences, Academia Sinica, 115 Taipei, Taiwan
| | - Joyce B J van Meurs
- Department of Internal Medicine, Erasmus MC, Medical Center, 3015CN Rotterdam, the Netherlands
| | | | - Eleftheria Zeggini
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany; TUM School of Medicine, Technical University of Munich and Klinikum Rechts der Isar, 81675 Munich, Germany.
| |
|
41
|
Johnsson M, Jungnickel MK. Evidence for and localization of proposed causative variants in cattle and pig genomes. Genet Sel Evol 2021; 53:67. [PMID: 34461824 PMCID: PMC8404348 DOI: 10.1186/s12711-021-00662-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/22/2021] [Accepted: 08/20/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND: This paper reviews the localization of published potential causative variants in contemporary pig and cattle reference genomes, and the evidence for their causality. In spite of the difficulties inherent to the identification of causative variants from genetic mapping and genome-wide association studies, researchers in animal genetics have proposed putative causative variants for several traits relevant to livestock breeding. RESULTS: For this review, we read the literature that supports potential causative variants in 13 genes (ABCG2, DGAT1, GHR, IGF2, MC4R, MSTN, NR6A1, PHKG1, PRKAG3, PRLR, RYR1, SYNGR2 and VRTN) in cattle and pigs, and localized them in contemporary reference genomes. We review the evidence for their causality, by aiming to separate the evidence for the locus, the proposed causative gene and the proposed causative variant, and report the bioinformatic searches and tactics needed to localize the sequence variants in the cattle or pig genome. CONCLUSIONS: Taken together, there is usually good evidence for the association at the locus level, some evidence for a specific causative gene at eight of the loci, and some experimental evidence for a specific causative variant at six of the loci. We recommend that researchers who report new potential causative variants use referenced coordinate systems, show local sequence context, and submit variants to repositories.
Affiliation(s)
- Martin Johnsson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, 750 07 Uppsala, Sweden
| | - Melissa K. Jungnickel
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG Scotland, UK
| |
|
42
|
Genomic selection signatures in autism spectrum disorder identifies cognitive genomic tradeoff and its relevance in paradoxical phenotypes of deficits versus potentialities. Sci Rep 2021; 11:10245. [PMID: 33986442 PMCID: PMC8119484 DOI: 10.1038/s41598-021-89798-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2020] [Accepted: 04/26/2021] [Indexed: 11/18/2022]
Abstract
Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental disorder characterized by paradoxical phenotypes of deficits as well as gains in brain function. To address this, a genomic tradeoff hypothesis was tested and followed up with the biological interaction and evolutionary significance of positively selected ASD risk genes. The SFARI database was used to retrieve the ASD risk genes, while 1000 Genomes data was used for the population datasets. Common risk SNPs were subjected to machine learning as well as independent tests for selection, followed by Bayesian analysis to identify the cumulative effect of selection on risk SNPs. The functional implications of these positively selected risk SNPs were assessed and subjected to ontology analysis, pertaining to their interaction and enrichment of biological and cellular functions. This was followed by comparative analysis with ancient genomes to identify their evolutionary patterns. Our results identified significant positive selection signals in 18 ASD risk SNPs. Functional and ontology analyses indicate the role of biological and cellular processes associated with various brain functions. The core of the biological interaction network comprises genes for cognition and learning, while genes in the periphery of the network had a direct or indirect impact on brain function. Ancient genome analysis identified de novo and conserved evolutionary selection clusters. The de novo evolutionary cluster represented genes involved in cognitive function. Relative enrichment of the ASD risk SNPs from the respective evolutionary cluster or biological interaction networks may help in addressing the phenotypic diversity in ASD. These cognitive genomic tradeoff signatures impacting the biological networks can explain the paradoxical phenotypes in ASD.
43
Martin FJ, Gall A, Szpak M, Flicek P. Accessing Livestock Resources in Ensembl. Front Genet 2021; 12:650228. [PMID: 33995484 PMCID: PMC8115729 DOI: 10.3389/fgene.2021.650228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/06/2021] [Accepted: 03/18/2021] [Indexed: 12/12/2022]
Abstract
Genome assembly is cheaper, more accurate and more automated than it has ever been. This is due to a combination of more cost-efficient chemistries, new sequencing technologies and better algorithms. The livestock community has been at the forefront of this new wave of genome assembly, generating some of the highest quality vertebrate genome sequences. Ensembl's goal is to add functional and comparative annotation to these genomes, through our gene annotation, genomic alignments, gene trees, regulatory, and variation data. We run computationally complex analyses in a high throughput and consistent manner to help accelerate downstream science. Our livestock resources are continuously growing in both breadth and depth. We annotate reference genome assemblies for newly sequenced species and regularly update annotation for existing genomes. We are the only major resource to support the annotation of breeds and other non-reference assemblies. We currently provide resources for 13 pig breeds, maternal and paternal haplotypes for hybrid cattle and various other non-reference or wild type assemblies for livestock species. Here, we describe the livestock data present in Ensembl and provide protocols for how to view data in our genome browser, download it via our FTP site, manipulate it via our tools and interact with it programmatically via our REST API.
Affiliation(s)
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Astrid Gall
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Michal Szpak
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
44
Cormier MJ, Belyeu JR, Pedersen BS, Brown J, Köster J, Quinlan AR. Go Get Data (GGD) is a framework that facilitates reproducible access to genomic data. Nat Commun 2021; 12:2151. [PMID: 33846313 PMCID: PMC8041854 DOI: 10.1038/s41467-021-22381-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/14/2020] [Accepted: 03/09/2021] [Indexed: 12/05/2022]
Abstract
The rapid increase in the amount of genomic data provides researchers with an opportunity to integrate diverse datasets and annotations when addressing a wide range of biological questions. However, genomic datasets are deposited on different platforms and are stored in numerous formats from multiple genome builds, which complicates the task of collecting, annotating, transforming, and integrating data as needed. Here, we developed Go Get Data (GGD) as a fast, reproducible approach to installing standardized data recipes. GGD is available on GitHub (https://gogetdata.github.io/), is extendable to other data types, and can streamline the complexities typically associated with data integration, saving researchers time and improving research reproducibility.
Affiliation(s)
- Michael J Cormier
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Johannes Köster
- Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
45
Munz M, Khodaygani M, Aherrahrou Z, Busch H, Wohlers I. In silico candidate variant and gene identification using inbred mouse strains. PeerJ 2021; 9:e11017. [PMID: 33763305 PMCID: PMC7956000 DOI: 10.7717/peerj.11017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Accepted: 02/06/2021] [Indexed: 12/05/2022]
Abstract
Mice are the most widely used animal model to study genotype to phenotype relationships. Inbred mice are genetically identical, which eliminates genetic heterogeneity and makes them particularly useful for genetic studies. Many different strains have been bred over decades and a vast amount of phenotypic data has been generated. In addition, recently whole genome sequencing-based genome-wide genotype data for many widely used inbred strains has been released. Here, we present an approach for in silico fine-mapping that uses genotypic data of 37 inbred mouse strains together with phenotypic data provided by the user to propose candidate variants and genes for the phenotype under study. Public genome-wide genotype data covering more than 74 million variant sites is queried efficiently in real-time to provide those variants that are compatible with the observed phenotype differences between strains. Variants can be filtered by molecular consequences and by corresponding molecular impact. Candidate gene lists can be generated from variant lists on the fly. Fine-mapping together with annotation or filtering of results is provided in a Bioconductor package called MouseFM. In order to characterize candidate variant lists under various settings, MouseFM was applied to two expression data sets across 20 inbred mouse strains, one from neutrophils and one from CD4+ T cells. Fine-mapping was assessed for about 10,000 genes, respectively, and identified candidate variants and haplotypes for many expression quantitative trait loci (eQTLs) reported previously based on these data. For albinism, MouseFM reports only one variant allele of moderate or high molecular impact that only albino mice share: a missense variant in the Tyr gene, reported previously to be causal for this phenotype. Performing in silico fine-mapping for interfrontal bone formation in mice using four strains with and five strains without interfrontal bone results in 12 genes. Of these, three are related to skull shaping abnormality. Finally, performing fine-mapping for dystrophic cardiac calcification by comparing nine strains showing the phenotype with eight strains lacking it, we identify only one moderate impact variant in the known causal gene Abcc6. In summary, this illustrates the benefit of using MouseFM for candidate variant and gene identification.
Affiliation(s)
- Matthias Munz
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
- Mohammad Khodaygani
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
- Hauke Busch
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
- Inken Wohlers
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
46
Protein context shapes the specificity of SH3 domain-mediated interactions in vivo. Nat Commun 2021; 12:1597. [PMID: 33712617 PMCID: PMC7954794 DOI: 10.1038/s41467-021-21873-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/02/2020] [Accepted: 02/17/2021] [Indexed: 02/07/2023]
Abstract
Protein–protein interactions (PPIs) between modular binding domains and their target peptide motifs are thought to largely depend on the intrinsic binding specificities of the domains. The large family of SRC Homology 3 (SH3) domains contribute to cellular processes via their ability to support such PPIs. While the intrinsic binding specificities of SH3 domains have been studied in vitro, whether each domain is necessary and sufficient to define PPI specificity in vivo is largely unknown. Here, by combining deletion, mutation, swapping and shuffling of SH3 domains and measurements of their impact on protein interactions in yeast, we find that most SH3s do not dictate PPI specificity independently from their host protein in vivo. We show that the identity of the host protein and the position of the SH3 domains within their host are critical for PPI specificity, for cellular functions and for key biophysical processes such as phase separation. Our work demonstrates the importance of the interplay between a modular PPI domain such as SH3 and its host protein in establishing specificity to wire PPI networks. These findings will aid understanding how protein networks are rewired during evolution and in the context of mutation-driven diseases such as cancer. The SRC Homology 3 (SH3) domains mediate protein–protein interactions (PPIs). Here, the authors assess the SH3-mediated PPIs in yeast, and show that the identity of the protein itself and the position of the SH3 both affect the interaction specificity and thus the PPI-dependent cellular functions.
47
Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, Billis K, Boddu S, Charkhchi M, Cummins C, Da Rin Fioretto L, Davidson C, Dodiya K, El Houdaigui B, Fatima R, Gall A, Garcia Giron C, Grego T, Guijarro-Clarke C, Haggerty L, Hemrom A, Hourlier T, Izuogu OG, Juettemann T, Kaikala V, Kay M, Lavidas I, Le T, Lemos D, Gonzalez Martinez J, Marugán JC, Maurel T, McMahon AC, Mohanan S, Moore B, Muffato M, Oheh DN, Paraschas D, Parker A, Parton A, Prosovetskaia I, Sakthivel MP, Salam AIA, Schmitt BM, Schuilenburg H, Sheppard D, Steed E, Szpak M, Szuba M, Taylor K, Thormann A, Threadgold G, Walts B, Winterbottom A, Chakiachvili M, Chaubal A, De Silva N, Flint B, Frankish A, Hunt SE, IIsley GR, Langridge N, Loveland JE, Martin FJ, Mudge JM, Morales J, Perry E, Ruffier M, Tate J, Thybert D, Trevanion SJ, Cunningham F, Yates AD, Zerbino DR, Flicek P. Ensembl 2021. Nucleic Acids Res 2021; 49:D884-D891. [PMID: 33137190 PMCID: PMC7778975 DOI: 10.1093/nar/gkaa942] [Citation(s) in RCA: 1004] [Impact Index Per Article: 334.7] [Received: 09/22/2020] [Revised: 10/05/2020] [Accepted: 10/07/2020] [Indexed: 12/12/2022]
Abstract
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Affiliation(s)
- Kevin L Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Premanand Achuthan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- James Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina M Armean
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Luca Da Rin Fioretto
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Astrid Gall
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Guijarro-Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anmol Hemrom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Osagie G Izuogu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Juettemann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vinay Kaikala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ilias Lavidas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- José Carlos Marugán
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Maurel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aoife C McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Denye N Oheh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dimitrios Paraschas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Parton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Manoj P Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ahamed I Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helen Schuilenburg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michal Szpak
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marek Szuba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anja Thormann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Glen Threadgold
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brandon Walts
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ameya Chaubal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nishadi De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Garth R IIsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nick Langridge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joanella Morales
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
48
Procter JB, Carstairs GM, Soares B, Mourão K, Ofoegbu TC, Barton D, Lui L, Menard A, Sherstnev N, Roldan-Martinez D, Duce S, Martin DMA, Barton GJ. Alignment of Biological Sequences with Jalview. Methods Mol Biol 2021; 2231:203-224. [PMID: 33289895 PMCID: PMC7116599 DOI: 10.1007/978-1-0716-1036-7_13] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 12/18/2022]
Abstract
In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview's core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the command line. The accompanying notes provide background information on the underlying methods and discuss additional options for working with Jalview to perform multiple sequence alignment, functional site analysis, and publication of alignments on the web.
Affiliation(s)
- Ben Soares
- University of Dundee, Dundee, Scotland, UK
- Kira Mourão
- University of Dundee, Dundee, Scotland, UK
- Synpromics Ltd., Edinburgh, Scotland, UK
- Daniel Barton
- University of Dundee, Dundee, Scotland, UK
- Institute of Physics, Chinese Academy of Sciences, Beijing, China
- Lauren Lui
- University of Dundee, Dundee, Scotland, UK
- UC Santa Cruz, Santa Cruz, CA, USA
- Natasha Sherstnev
- University of Dundee, Dundee, Scotland, UK
- U. Paris Sud, Orsay, France
49
Biglari N, Gaziano I, Schumacher J, Radermacher J, Paeger L, Klemm P, Chen W, Corneliussen S, Wunderlich CM, Sue M, Vollmar S, Klöckener T, Sotelo-Hitschfeld T, Abbasloo A, Edenhofer F, Reimann F, Gribble FM, Fenselau H, Kloppenburg P, Wunderlich FT, Brüning JC. Functionally distinct POMC-expressing neuron subpopulations in hypothalamus revealed by intersectional targeting. Nat Neurosci 2021; 24:913-929. [PMID: 34002087 PMCID: PMC8249241 DOI: 10.1038/s41593-021-00854-0] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 11/26/2019] [Accepted: 03/31/2021] [Indexed: 02/03/2023]
Abstract
Pro-opiomelanocortin (POMC)-expressing neurons in the arcuate nucleus of the hypothalamus represent key regulators of metabolic homeostasis. Electrophysiological and single-cell sequencing experiments have revealed a remarkable degree of heterogeneity of these neurons. However, the exact molecular basis and functional consequences of this heterogeneity have not yet been addressed. Here, we have developed new mouse models in which intersectional Cre/Dre-dependent recombination allowed for successful labeling, translational profiling and functional characterization of distinct POMC neurons expressing the leptin receptor (Lepr) and glucagon-like peptide 1 receptor (Glp1r). Our experiments reveal that POMC^Lepr+ and POMC^Glp1r+ neurons represent largely nonoverlapping subpopulations with distinct basic electrophysiological properties. They exhibit a specific anatomical distribution within the arcuate nucleus and differentially express receptors for energy-state communicating hormones and neurotransmitters. Finally, we identify a differential ability of these subpopulations to suppress feeding. Collectively, we reveal a notably distinct functional microarchitecture of critical metabolism-regulatory neurons.
Affiliation(s)
- Nasim Biglari
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Isabella Gaziano
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Jonas Schumacher
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Jan Radermacher
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Institute for Zoology, Biocenter, University of Cologne, Cologne, Germany
- Lars Paeger
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Institute for Zoology, Biocenter, University of Cologne, Cologne, Germany
- Paul Klemm
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Weiyi Chen
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Svenja Corneliussen
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Institute for Zoology, Biocenter, University of Cologne, Cologne, Germany
- Claudia M. Wunderlich
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Michael Sue
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Stefan Vollmar
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Tim Klöckener
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Tamara Sotelo-Hitschfeld
- Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
- Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany
- Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
| | - Amin Abbasloo
- grid.418034.a0000 0004 4911 0702Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany
| | - Frank Edenhofer
- grid.5771.40000 0001 2151 8122Leopold-Franzens-Universität Innsbruck, Institute for Molecular Biology, Innsbruck, Austria
| | - Frank Reimann
- grid.120073.70000 0004 0622 5016Cambridge Institute for Medical Research and Medical Research Council Metabolic Diseases Unit, Addenbrooke’s Hospital, Cambridge, UK
| | - Fiona M. Gribble
- grid.120073.70000 0004 0622 5016Cambridge Institute for Medical Research and Medical Research Council Metabolic Diseases Unit, Addenbrooke’s Hospital, Cambridge, UK
| | - Henning Fenselau
- grid.411097.a0000 0000 8852 305XPoliclinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany ,grid.6190.e0000 0000 8580 3777Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany ,grid.418034.a0000 0004 4911 0702Max Planck Institute for Metabolism Research, Research Group Synaptic Transmission in Energy Homeostasis, Cologne, Germany
| | - Peter Kloppenburg
- grid.6190.e0000 0000 8580 3777Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany ,grid.6190.e0000 0000 8580 3777Institute for Zoology, Biocenter, University of Cologne, Cologne, Germany
| | - Frank T. Wunderlich
- grid.418034.a0000 0004 4911 0702Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany ,grid.411097.a0000 0000 8852 305XPoliclinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany ,grid.6190.e0000 0000 8580 3777Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
| | - Jens C. Brüning
- grid.418034.a0000 0004 4911 0702Max Planck Institute for Metabolism Research, Department of Neuronal Control of Metabolism, Cologne, Germany ,grid.411097.a0000 0000 8852 305XPoliclinic for Endocrinology, Diabetes and Preventive Medicine (PEDP), University Hospital Cologne, Cologne, Germany ,grid.6190.e0000 0000 8580 3777Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases (CECAD) and Center of Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany ,National Center for Diabetes Research (DZD), Ingolstädter Landstrasse 1, Neuherberg, Germany
50
Veidenberg A, Löytynoja A. Evolutionary Sequence Analysis and Visualization with Wasabi. Methods Mol Biol 2021; 2231:225-240. [PMID: 33289896 DOI: 10.1007/978-1-0716-1036-7_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Wasabi is an open-source, web-based graphical environment for evolutionary sequence analysis and visualization, designed to work with multiple sequence alignments within their phylogenetic context. Its interactive user interface provides convenient access to external data sources and computational tools and is easily extendable with custom tools and pipelines using a plugin system. Wasabi stores intermediate editing and analysis steps as workflow histories and provides direct-access web links to datasets, allowing for reproducible, collaborative research and easy dissemination of the results. In addition to shared analyses and installation-free usage, the web-based design allows Wasabi to be run as a cross-platform, stand-alone application and makes its integration with other web services straightforward. This chapter gives a detailed description and guidelines for the use of Wasabi's analysis environment. Example use cases give step-by-step instructions for practical application of the public Wasabi, from quick data visualization to branched analysis pipelines and publishing of results. We end with a brief discussion of advanced usage of Wasabi, including command-line communication, interface extension, offline usage, and integration with local and public web services. The public Wasabi application, its source code, documentation, and other materials are available at http://wasabiapp.org.
Affiliation(s)
- Andres Veidenberg
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Ari Löytynoja
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland