1
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Received: 01/21/2023] [Accepted: 11/17/2023]
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.
RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.
CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
2
Goar W, Babb L, Chamala S, Cline M, Freimuth RR, Hart RK, Kuzma K, Lee J, Nelson T, Prlić A, Riehle K, Smith A, Stahl K, Yates AD, Rehm HL, Wagner AH. Development and application of a computable genotype model in the GA4GH Variation Representation Specification. Pac Symp Biocomput 2023; 28:383-394. [PMID: 36540993 PMCID: PMC9782714]
Abstract
As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.
Affiliation(s)
- Wesley Goar
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
3
Wagner AH, Babb L, Alterovitz G, Baudis M, Brush M, Cameron DL, Cline M, Griffith M, Griffith OL, Hunt SE, Kreda D, Lee JM, Li S, Lopez J, Moyer E, Nelson T, Patel RY, Riehle K, Robinson PN, Rynearson S, Schuilenburg H, Tsukanov K, Walsh B, Konopko M, Rehm HL, Yates AD, Freimuth RR, Hart RK. The GA4GH Variation Representation Specification: A computational framework for variation representation and federated identification. Cell Genom 2021; 1. [PMID: 35311178 PMCID: PMC8929418 DOI: 10.1016/j.xgen.2021.100027]
Abstract
Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced "verse"), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers. The VRS framework includes a terminology and information model, machine-readable schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. VRS was developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH).
Affiliation(s)
- Alex H. Wagner
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43215, USA
  - Corresponding author
- Lawrence Babb
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Corresponding author
- Gil Alterovitz
  - Harvard Medical School, Boston, MA 02115, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Michael Baudis
  - University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
- Matthew Brush
  - Oregon Health & Science University, Portland, OR 97239, USA
- Daniel L. Cameron
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Melissa Cline
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA
- Malachi Griffith
  - Washington University School of Medicine, St. Louis, MO 63108, USA
- Obi L. Griffith
  - Washington University School of Medicine, St. Louis, MO 63108, USA
- Sarah E. Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Kreda
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Jennifer M. Lee
  - Essex Management LLC and National Cancer Institute, Rockville, MD 20850, USA
- Stephanie Li
  - The Global Alliance for Genomics and Health, Toronto, ON, Canada
- Eric Moyer
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kevin Riehle
  - Baylor College of Medicine, Houston, TX 77030, USA
- Shawn Rynearson
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
- Helen Schuilenburg
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kirill Tsukanov
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brian Walsh
  - Oregon Health & Science University, Portland, OR 97239, USA
- Melissa Konopko
  - The Global Alliance for Genomics and Health, Toronto, ON, Canada
- Heidi L. Rehm
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Cambridge, MA 02142, USA
- Andrew D. Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert R. Freimuth
  - Center for Individualized Medicine, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
- Reece K. Hart
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - MyOme, Inc., Menlo Park, CA 94070, USA
  - Corresponding author
4
Rehm HL, Page AJ, Smith L, Adams JB, Alterovitz G, Babb LJ, Barkley MP, Baudis M, Beauvais MJ, Beck T, Beckmann JS, Beltran S, Bernick D, Bernier A, Bonfield JK, Boughtwood TF, Bourque G, Bowers SR, Brookes AJ, Brudno M, Brush MH, Bujold D, Burdett T, Buske OJ, Cabili MN, Cameron DL, Carroll RJ, Casas-Silva E, Chakravarty D, Chaudhari BP, Chen SH, Cherry JM, Chung J, Cline M, Clissold HL, Cook-Deegan RM, Courtot M, Cunningham F, Cupak M, Davies RM, Denisko D, Doerr MJ, Dolman LI, Dove ES, Dursi LJ, Dyke SO, Eddy JA, Eilbeck K, Ellrott KP, Fairley S, Fakhro KA, Firth HV, Fitzsimons MS, Fiume M, Flicek P, Fore IM, Freeberg MA, Freimuth RR, Fromont LA, Fuerth J, Gaff CL, Gan W, Ghanaim EM, Glazer D, Green RC, Griffith M, Griffith OL, Grossman RL, Groza T, Guidry Auvil JM, Guigó R, Gupta D, Haendel MA, Hamosh A, Hansen DP, Hart RK, Hartley DM, Haussler D, Hendricks-Sturrup RM, Ho CW, Hobb AE, Hoffman MM, Hofmann OM, Holub P, Hsu JS, Hubaux JP, Hunt SE, Husami A, Jacobsen JO, Jamuar SS, Janes EL, Jeanson F, Jené A, Johns AL, Joly Y, Jones SJ, Kanitz A, Kato K, Keane TM, Kekesi-Lafrance K, Kelleher J, Kerry G, Khor SS, Knoppers BM, Konopko MA, Kosaki K, Kuba M, Lawson J, Leinonen R, Li S, Lin MF, Linden M, Liu X, Liyanage IU, Lopez J, Lucassen AM, Lukowski M, Mann AL, Marshall J, Mattioni M, Metke-Jimenez A, Middleton A, Milne RJ, Molnár-Gábor F, Mulder N, Munoz-Torres MC, Nag R, Nakagawa H, Nasir J, Navarro A, Nelson TH, Niewielska A, Nisselle A, Niu J, Nyrönen TH, O’Connor BD, Oesterle S, Ogishima S, Ota Wang V, Paglione LA, Palumbo E, Parkinson HE, Philippakis AA, Pizarro AD, Prlic A, Rambla J, Rendon A, Rider RA, Robinson PN, Rodarmer KW, Rodriguez LL, Rubin AF, Rueda M, Rushton GA, Ryan RS, Saunders GI, Schuilenburg H, Schwede T, Scollen S, Senf A, Sheffield NC, Skantharajah N, Smith AV, Sofia HJ, Spalding D, Spurdle AB, Stark Z, Stein LD, Suematsu M, Tan P, Tedds JA, Thomson AA, Thorogood A, Tickle TL, Tokunaga K, Törnroos J, Torrents D, Upchurch S, Valencia A, 
Guimera RV, Vamathevan J, Varma S, Vears DF, Viner C, Voisin C, Wagner AH, Wallace SE, Walsh BP, Williams MS, Winkler EC, Wold BJ, Wood GM, Woolley JP, Yamasaki C, Yates AD, Yung CK, Zass LJ, Zaytseva K, Zhang J, Goodhand P, North K, Birney E. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom 2021; 1:100029. [PMID: 35072136 PMCID: PMC8774288 DOI: 10.1016/j.xgen.2021.100029]
Abstract
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Affiliation(s)
- Heidi L. Rehm
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Angela J.H. Page
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
- Lindsay Smith
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Jeremy B. Adams
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Gil Alterovitz
  - Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael Baudis
  - University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael J.S. Beauvais
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - McGill University, Montreal, QC, Canada
- Tim Beck
  - University of Leicester, Leicester, UK
- Sergi Beltran
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Universitat de Barcelona, Barcelona, Spain
- David Bernick
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tiffany F. Boughtwood
  - Australian Genomics, Parkville, VIC, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Guillaume Bourque
  - McGill University, Montreal, QC, Canada
  - Canadian Center for Computational Genomics, Montreal, QC, Canada
- Michael Brudno
  - Canadian Center for Computational Genomics, Montreal, QC, Canada
  - University of Toronto, Toronto, ON, Canada
  - University Health Network, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
  - Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- David Bujold
  - McGill University, Montreal, QC, Canada
  - Canadian Center for Computational Genomics, Montreal, QC, Canada
  - Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Tony Burdett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Daniel L. Cameron
  - Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
- Bimal P. Chaudhari
  - Nationwide Children’s Hospital, Columbus, OH, USA
  - The Ohio State University, Columbus, OH, USA
- Shu Hui Chen
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Justina Chung
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Melissa Cline
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- L. Jonathan Dursi
  - University Health Network, Toronto, ON, Canada
  - Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Susan Fairley
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Khalid A. Fakhro
  - Sidra Medicine, Doha, Qatar
  - Weill Cornell Medicine - Qatar, Doha, Qatar
- Helen V. Firth
  - Wellcome Sanger Institute, Hinxton, UK
  - Addenbrooke’s Hospital, Cambridge, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ian M. Fore
  - National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Mallory A. Freeberg
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Lauren A. Fromont
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Clara L. Gaff
  - Australian Genomics, Parkville, VIC, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
- Weiniu Gan
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Elena M. Ghanaim
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- David Glazer
  - Verily Life Sciences, South San Francisco, CA, USA
- Robert C. Green
  - Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Malachi Griffith
  - Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Obi L. Griffith
  - Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Roderic Guigó
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Dipayan Gupta
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ada Hamosh
  - Johns Hopkins University, Baltimore, MD, USA
- David P. Hansen
  - Australian Genomics, Parkville, VIC, Australia
  - The Australian e-Health Research Centre, CSIRO, Herston, QLD, Australia
- Reece K. Hart
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Invitae, San Francisco, CA, USA
  - MyOme, Inc, San Bruno, CA, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Michael M. Hoffman
  - University of Toronto, Toronto, ON, Canada
  - University Health Network, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Oliver M. Hofmann
  - University of Toronto, Toronto, ON, Canada
  - University of Melbourne, Melbourne, VIC, Australia
- Petr Holub
  - BBMRI-ERIC, Graz, Austria
  - Masaryk University, Brno, Czech Republic
- Sarah E. Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ammar Husami
  - Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Saumya S. Jamuar
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Republic of Singapore
- Elizabeth L. Janes
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - University of Waterloo, Waterloo, ON, Canada
- Aina Jené
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Amber L. Johns
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Yann Joly
  - McGill University, Montreal, QC, Canada
- Steven J.M. Jones
  - Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Alexander Kanitz
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University of Basel, Basel, Switzerland
- Thomas M. Keane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
  - University of Nottingham, Nottingham, UK
- Kristina Kekesi-Lafrance
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - McGill University, Montreal, QC, Canada
- Giselle Kerry
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Seik-Soon Khor
  - National Center for Global Health and Medicine Hospital, Tokyo, Japan
  - University of Tokyo, Tokyo, Japan
- Rasko Leinonen
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stephanie Li
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
- Mikael Linden
  - CSC–IT Center for Science, Espoo, Finland
  - ELIXIR Finland, Espoo, Finland
- Isuru Udara Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alice L. Mann
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Wellcome Sanger Institute, Hinxton, UK
- Anna Middleton
  - Wellcome Connecting Science, Hinxton, UK
  - University of Cambridge, Cambridge, UK
- Richard J. Milne
  - Wellcome Connecting Science, Hinxton, UK
  - University of Cambridge, Cambridge, UK
- Nicola Mulder
  - H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Rishi Nag
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Hidewaki Nakagawa
  - Japan Agency for Medical Research & Development (AMED), Tokyo, Japan
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Arcadi Navarro
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Ania Niewielska
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Amy Nisselle
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
  - Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia
- Jeffrey Niu
  - University Health Network, Toronto, ON, Canada
- Tommi H. Nyrönen
  - CSC–IT Center for Science, Espoo, Finland
  - ELIXIR Finland, Espoo, Finland
- Sabine Oesterle
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vivian Ota Wang
  - National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Emilio Palumbo
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen E. Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Jordi Rambla
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Renee A. Rider
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Peter N. Robinson
  - The Jackson Laboratory, Farmington, CT, USA
  - University of Connecticut, Farmington, CT, USA
- Kurt W. Rodarmer
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alan F. Rubin
  - Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
- Manuel Rueda
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen Schuilenburg
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Torsten Schwede
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University of Basel, Basel, Switzerland
- Neerjah Skantharajah
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Heidi J. Sofia
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Dylan Spalding
  - CSC–IT Center for Science, Espoo, Finland
  - ELIXIR Finland, Espoo, Finland
- Zornitza Stark
  - Australian Genomics, Parkville, VIC, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
- Lincoln D. Stein
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- Patrick Tan
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore
  - Precision Health Research Singapore, Singapore, Republic of Singapore
  - Genome Institute of Singapore, Singapore, Republic of Singapore
- Alastair A. Thomson
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Adrian Thorogood
  - McGill University, Montreal, QC, Canada
  - University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Katsushi Tokunaga
  - University of Tokyo, Tokyo, Japan
  - National Center for Global Health and Medicine, Tokyo, Japan
- Juha Törnroos
  - CSC–IT Center for Science, Espoo, Finland
  - ELIXIR Finland, Espoo, Finland
- David Torrents
  - Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
  - Barcelona Supercomputing Center, Barcelona, Spain
- Sean Upchurch
  - California Institute of Technology, Pasadena, CA, USA
- Alfonso Valencia
  - Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
  - Barcelona Supercomputing Center, Barcelona, Spain
- Jessica Vamathevan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Susheel Varma
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
  - Health Data Research UK, London, UK
- Danya F. Vears
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - University of Melbourne, Melbourne, VIC, Australia
  - Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia
  - Melbourne Law School, University of Melbourne, Parkville, VIC, Australia
- Coby Viner
  - University of Toronto, Toronto, ON, Canada
  - University Health Network, Toronto, ON, Canada
- Alex H. Wagner
  - Nationwide Children’s Hospital, Columbus, OH, USA
  - The Ohio State University, Columbus, OH, USA
- Eva C. Winkler
  - Section of Translational Medical Ethics, University Hospital Heidelberg, Heidelberg, Germany
- Andrew D. Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Christina K. Yung
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
  - Indoc Research, Toronto, ON, Canada
- Lyndon J. Zass
  - H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Ksenia Zaytseva
  - McGill University, Montreal, QC, Canada
  - Canadian Centre for Computational Genomics, Montreal, QC, Canada
- Junjun Zhang
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Peter Goodhand
  - Global Alliance for Genomics and Health, Toronto, ON, Canada
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
- Kathryn North
  - Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - University of Toronto, Toronto, ON, Canada
  - University of Melbourne, Melbourne, VIC, Australia
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
  - European Molecular Biology Laboratory, Heidelberg, Germany
|
5
|
Abstract
MOTIVATION Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is inconsequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility. RESULTS Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to use with remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available. 
Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets. AVAILABILITY SeqRepo is released under the Apache License 2.0 and is available on GitHub and PyPI. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.
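The sha512t24u convention named in this abstract can be sketched with the Python standard library alone. This is a minimal illustration and not SeqRepo's own code: it assumes that "t24" means truncating the SHA-512 digest to its first 24 bytes and that "u" means the URL-safe Base64 alphabet. The function names `sha512t24u` and `collision_probability` are chosen here for illustration, and the collision estimate uses the standard birthday-bound approximation rather than the analysis reported in the paper.

```python
import base64
import hashlib
import math


def sha512t24u(blob: bytes) -> str:
    """Return SHA-512 of blob, truncated to 24 bytes, as URL-safe Base64 text."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 Base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def collision_probability(n_objects: int, bits: int = 192) -> float:
    """Birthday-bound estimate that any two of n_objects digests collide."""
    pairs = n_objects * (n_objects - 1) / 2
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2.0**bits)


if __name__ == "__main__":
    print(sha512t24u(b"ACGT"))            # a 32-character identifier
    print(collision_probability(10**12))  # estimate for a trillion objects
```

Truncating to 24 bytes (192 bits) keeps the identifier at 32 characters with no padding, which is what makes the representation compact enough to embed in URLs and database keys.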
Affiliation(s)
- Reece K. Hart
- Biocommons, San Francisco, CA, United States of America
- * E-mail:
| | - Andreas Prlić
- Invitae, Inc., San Francisco, CA, United States of America
| |
|
6
|
Wagner AH, Hart RK, Babb L, Freimuth RR, Coffman A, Liang Y, Pitel B, Roy A, Brush M, Lee J, Lu A, Coard T, Rao S, Ritter D, Walsh B, Mockus S, Horak P, King I, Sonkin D, Madhavan S, Raca G, Chakravarty D, Griffith M, Griffith OL. Abstract 1096: Harmonization standards from the Variant Interpretation for Cancer Consortium. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-1096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The use of clinical gene sequencing is now commonplace, and genome analysts and molecular pathologists are often tasked with the labor-intensive process of interpreting the clinical significance of large numbers of tumor variants. Numerous independent knowledgebases have been constructed to alleviate this manual burden; however, these knowledgebases are non-interoperable. As a result, the analyst is left with a difficult tradeoff: for each knowledgebase used the analyst must understand the nuances particular to that resource and integrate its evidence accordingly when generating the clinical report, but for each knowledgebase omitted there is increased potential for missed findings of clinical significance. The Variant Interpretation for Cancer Consortium (VICC; cancervariants.org) was formed as a driver project of the Global Alliance for Genomics and Health (GA4GH; ga4gh.org) to address this concern. VICC members include representatives from several major somatic interpretation knowledgebases including CIViC, OncoKB, Jax-CKB, the Weill Cornell PMKB, the IRB-Barcelona Cancer Biomarkers Database, and others. Previously, the VICC built and reported on a harmonized meta-knowledgebase of 19,551 biomarker associations of harmonized variants, diseases, drugs, and evidence across the constituent resources. In that study, we analyzed the frequency with which the tumor samples from the AACR Project GENIE cohort would match to harmonized associations. Variant matches increased dramatically from 57% to 86% when broader matching to regions describing categorical variants was allowed. Unlike precise sequence variants with specified alternate alleles, categorical variants describe a collection of potential variants with a common feature, such as “V600” (non-valine alleles at residue 600), “Exon 20 mutations” (all non-silent mutations in exon 20), or “Gain-of-function” (hypermorphic alterations that activate or amplify gene activity). 
However, matching observed sequence variants to categorical variants is challenging, as the latter are typically only described as unstructured text. Here we describe the expressive and computational GA4GH Variation Representation specification (vr-spec.readthedocs.io), which we co-developed as members of the GA4GH Genomic Knowledge Standards work stream. This specification provides a schema for common, precise forms of variation (e.g. SNVs and Indels) and the method for computing identifiers from these objects. We highlight key aspects of the specification and our work to apply it to the characterization of categorical variation, showcasing the variant terminology and classification tools developed by the VICC to support this effort. These standards and tools are free, open-source, and extensible, overcoming barriers to standardized variant knowledge sharing and search.
Citation Format: Alex H. Wagner, Reece K. Hart, Larry Babb, Robert R. Freimuth, Adam Coffman, Yonghao Liang, Beth Pitel, Angshumoy Roy, Matthew Brush, Jennifer Lee, Anna Lu, Thomas Coard, Shruti Rao, Deborah Ritter, Brian Walsh, Susan Mockus, Peter Horak, Ian King, Dmitriy Sonkin, Subha Madhavan, Gordana Raca, Debyani Chakravarty, Malachi Griffith, Obi L. Griffith. Harmonization standards from the Variant Interpretation for Cancer Consortium [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 1096.
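The distinction this abstract draws between precise and categorical variants can be made concrete with a small sketch. The code below is purely illustrative: the class names `ProteinSubstitution` and `ResidueCategory` are hypothetical, the gene BRAF is used only as a familiar example, and nothing here reflects the GA4GH Variation Representation schema or the VICC tools. It models only the “V600” style of category, that is, any non-reference allele at one residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinSubstitution:
    """An observed, precise variant with a specified alternate allele."""
    gene: str
    ref: str       # reference amino acid, one-letter code
    position: int  # 1-based residue number
    alt: str       # observed amino acid, one-letter code


@dataclass(frozen=True)
class ResidueCategory:
    """A categorical variant: any non-reference allele at a single residue."""
    gene: str
    ref: str
    position: int

    def matches(self, observed: ProteinSubstitution) -> bool:
        return (
            observed.gene == self.gene
            and observed.position == self.position
            and observed.ref == self.ref
            and observed.alt != self.ref  # the category excludes the reference allele
        )


if __name__ == "__main__":
    v600 = ResidueCategory(gene="BRAF", ref="V", position=600)
    print(v600.matches(ProteinSubstitution("BRAF", "V", 600, "E")))
    print(v600.matches(ProteinSubstitution("BRAF", "G", 469, "A")))
```

A structured category like this can be tested against every observed variant mechanically, which is the step that free-text descriptions make difficult.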
Affiliation(s)
- Alex H. Wagner
- Washington University School of Medicine, Saint Louis, MO
| | | | | | | | - Adam Coffman
- Washington University School of Medicine, Saint Louis, MO
| | - Yonghao Liang
- Washington University School of Medicine, Saint Louis, MO
| | | | | | | | | | - Anna Lu
- National Cancer Institute, Bethesda, MD
| | | | | | | | - Brian Walsh
- Oregon Health and Science University, Portland, OR
| | - Susan Mockus
- The Jackson Laboratory for Genomic Medicine, Farmington, CT
| | - Peter Horak
- National Center for Tumor Diseases, Heidelberg, Germany
| | - Ian King
- University of Toronto, Toronto, Ontario, Canada
| | | | | | - Gordana Raca
- University of Southern California, Los Angeles, CA
| | | | | | | |
|
7
|
Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox NK, Freeman PJ, Jacobs KB, Kaleta P, McMurry AJ, Prlić A, Rajaraman V, Hart RK. hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update. Hum Mutat 2018; 39:1803-1813. [PMID: 30129167 PMCID: PMC6282708 DOI: 10.1002/humu.23615] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/07/2018] [Revised: 07/15/2018] [Accepted: 08/13/2018] [Indexed: 11/29/2022]
Abstract
The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.
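To give a sense of what parsing and formatting an HGVS description involves, the following is a toy sketch using only the Python standard library. It is not the hgvs package and does not use its API: the names `SimpleSubstitution`, `parse_substitution`, and `format_substitution` are hypothetical, the example variant string is illustrative, and the pattern handles only single-nucleotide DNA substitutions at a simple integer position. The real nomenclature covers far more (intronic offsets, deletions, duplications, protein-level changes, and so on).

```python
import re
from typing import NamedTuple


class SimpleSubstitution(NamedTuple):
    accession: str  # versioned sequence accession, e.g. a RefSeq transcript
    seq_type: str   # 'c', 'g', 'm', or 'n'
    position: int
    ref: str
    alt: str


# Covers only descriptions shaped like ACCESSION:TYPE.POSITION REF>ALT.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):(?P<seq_type>[cgmn])\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(text: str) -> SimpleSubstitution:
    """Parse a simple substitution or raise ValueError for anything else."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a simple DNA substitution: {text!r}")
    fields = match.groupdict()
    return SimpleSubstitution(
        accession=fields["accession"],
        seq_type=fields["seq_type"],
        position=int(fields["position"]),
        ref=fields["ref"],
        alt=fields["alt"],
    )


def format_substitution(var: SimpleSubstitution) -> str:
    """Render the parsed fields back into the same textual form."""
    return f"{var.accession}:{var.seq_type}.{var.position}{var.ref}>{var.alt}"


if __name__ == "__main__":
    variant = parse_substitution("NM_000551.3:c.292T>C")
    print(variant)
    print(format_substitution(variant))
```

Rejecting anything the pattern does not cover, rather than guessing, mirrors the abstract's emphasis on validation: a tool that silently accepts malformed descriptions would defeat the purpose of a standard nomenclature.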
Affiliation(s)
- Meng Wang
- School of Life Sciences, Peking University, Beijing, China
| | | | - Raymond Dalgleish
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | | | | | - Peter J Freeman
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | | | | | | | | | | | | |
|
8
|
Freeman PJ, Hart RK, Gretton LJ, Brookes AJ, Dalgleish R. VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions. Hum Mutat 2017; 39:61-68. [PMID: 28967166 PMCID: PMC5765404 DOI: 10.1002/humu.23348] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 04/25/2017] [Revised: 09/21/2017] [Accepted: 09/25/2017] [Indexed: 12/14/2022]
Abstract
The Human Genome Variation Society (HGVS) variant nomenclature is widely used to describe sequence variants in scientific publications, clinical reports, and databases. However, the HGVS recommendations are complex, and this often results in inaccurate variant descriptions being reported. The open‐source hgvs Python package (https://github.com/biocommons/hgvs) provides a programmatic interface for parsing, manipulating, formatting, and validating variants according to the HGVS recommendations, but does not provide a user‐friendly Web interface. We have developed a Web‐based variant validation tool, VariantValidator (https://variantvalidator.org/), which utilizes the hgvs Python package and provides additional functionality to assist users who wish to accurately describe and report sequence‐level variations that are compliant with the HGVS recommendations. VariantValidator was designed to ensure that users are guided through the intricacies of the HGVS nomenclature; for example, if the user makes a mistake, VariantValidator automatically corrects the mistake if it can, or provides helpful guidance if it cannot. In addition, VariantValidator has the facility to interconvert genomic variant descriptions in HGVS and Variant Call Format with a degree of accuracy that surpasses most competing solutions.
Affiliation(s)
- Peter J Freeman
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Reece K Hart
- Invitae, Inc., San Francisco, California; Genome Medical, Inc., San Francisco, California
| | - Liam J Gretton
- IT Services, University of Leicester, Leicester, United Kingdom
| | - Anthony J Brookes
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Raymond Dalgleish
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| |
|
9
|
Ritter DI, Roychowdhury S, Roy A, Rao S, Landrum MJ, Sonkin D, Shekar M, Davis CF, Hart RK, Micheel C, Weaver M, Van Allen EM, Parsons DW, McLeod HL, Watson MS, Plon SE, Kulkarni S, Madhavan S. Somatic cancer variant curation and harmonization through consensus minimum variant level data. Genome Med 2016; 8:117. [PMID: 27814769 PMCID: PMC5095986 DOI: 10.1186/s13073-016-0367-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 07/16/2016] [Accepted: 10/13/2016] [Indexed: 01/12/2023]
Abstract
BACKGROUND To truly achieve personalized medicine in oncology, it is critical to catalog and curate cancer sequence variants for their clinical relevance. The Somatic Working Group (WG) of the Clinical Genome Resource (ClinGen), in cooperation with ClinVar and multiple cancer variant curation stakeholders, has developed a consensus set of minimal variant level data (MVLD). MVLD is a framework of standardized data elements to curate cancer variants for clinical utility. With implementation of MVLD standards, and in a working partnership with ClinVar, we aim to streamline the somatic variant curation efforts in the community and reduce redundancy and time burden for the interpretation of cancer variants in clinical practice. METHODS We developed MVLD through a consensus approach by i) reviewing clinical actionability interpretations from institutions participating in the WG, ii) conducting an extensive literature search of clinical somatic interpretation schemas, and iii) surveying cancer variant web portals. A forthcoming guideline on cancer variant interpretation, from the Association for Molecular Pathology (AMP), can be incorporated into MVLD. RESULTS Along with harmonizing standardized terminology for allele interpretive and descriptive fields that are collected by many databases, the MVLD includes unique fields for cancer variants such as Biomarker Class, Therapeutic Context and Effect. In addition, MVLD includes recommendations for controlled semantics and ontologies. The Somatic WG is collaborating with ClinVar to evaluate MVLD use for somatic variant submissions. ClinVar is an open and centralized repository where sequencing laboratories can report summary-level variant data with clinical significance, and ClinVar accepts cancer variant data. 
CONCLUSIONS We expect the use of the MVLD to streamline clinical interpretation of cancer variants, enhance interoperability among multiple redundant curation efforts, and increase submission of somatic variants to ClinVar, all of which will enhance translation to clinical oncology practice.
Affiliation(s)
- Deborah I Ritter
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
| | | | - Angshumoy Roy
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
| | - Shruti Rao
- Innovation Center for Biomedical Informatics and Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
| | | | | | | | | | | | | | - Meredith Weaver
- American College of Medical Genetics and Genomics, Bethesda, MD, USA
| | | | - Donald W Parsons
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
| | | | - Michael S Watson
- American College of Medical Genetics and Genomics, Bethesda, MD, USA
| | - Sharon E Plon
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
| | | | - Subha Madhavan
- Innovation Center for Biomedical Informatics and Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
| |
|
10
|
den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, Roux AF, Smith T, Antonarakis SE, Taschner PEM. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat 2016; 37:564-9. [PMID: 26931183 DOI: 10.1002/humu.22981] [Citation(s) in RCA: 990] [Impact Index Per Article: 123.8] [Received: 12/07/2015] [Accepted: 02/18/2016] [Indexed: 01/19/2023]
Abstract
The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen.
Affiliation(s)
- Johan T den Dunnen
- Human Genetics & Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Raymond Dalgleish
- Department of Genetics, University of Leicester, Leicester, United Kingdom
| | - Donna R Maglott
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | | | | | - Jean McGowan-Jordan
- Children's Hospital of Eastern Ontario and University of Ottawa, Ottawa, Ontario, Canada
| | | | - Timothy Smith
- Human Variome Project International Coordinating Office, Melbourne, Australia
| | | | - Peter E M Taschner
- Generade Centre of Expertise Genomics and University of Applied Sciences Leiden, Leiden, The Netherlands
| |
|
11
|
Oetting WS, Brenner SE, Brookes AJ, Greenblatt MS, Hart RK, Karchin R, Sunyaev SR, Taschner PE. Pathogenicity Interpretation in the Age of Precision Medicine: The 2015 Annual Scientific Meeting of the Human Genome Variation Society. Hum Mutat 2016; 37:406-11. [PMID: 26791113 DOI: 10.1002/humu.22958] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/10/2015] [Accepted: 01/10/2016] [Indexed: 11/05/2022]
Affiliation(s)
- William S Oetting
- Department of Experimental and Clinical Pharmacology, University of Minnesota, Minneapolis, Minnesota
| | - Steven E Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California
| | | | | | | | - Rachel Karchin
- Departments of Biomedical Engineering/Oncology and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Peter E Taschner
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Generade Centre of Expertise Genomics and University of Applied Sciences, Leiden, The Netherlands
| |
|
12
|
Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA. A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature. Bioinformatics 2014; 31:268-70. [PMID: 25273102 PMCID: PMC4287946 DOI: 10.1093/bioinformatics/btu630] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available, comprehensive programming libraries exist. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. AVAILABILITY AND IMPLEMENTATION The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Reece K Hart
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| | - Rudolph Rico
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| | - Emily Hare
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| | - John Garcia
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| | - Jody Westbrook
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| | - Vincent A Fusaro
- Invitae Inc., San Francisco, CA 94107 and 23andMe Inc., Mountain View, CA 94043, USA
| |
|
13
|
Brownstein CA, Beggs AH, Homer N, Merriman B, Yu TW, Flannery KC, DeChene ET, Towne MC, Savage SK, Price EN, Holm IA, Luquette LJ, Lyon E, Majzoub J, Neupert P, McCallie D, Szolovits P, Willard HF, Mendelsohn NJ, Temme R, Finkel RS, Yum SW, Medne L, Sunyaev SR, Adzhubey I, Cassa CA, de Bakker PIW, Duzkale H, Dworzyński P, Fairbrother W, Francioli L, Funke BH, Giovanni MA, Handsaker RE, Lage K, Lebo MS, Lek M, Leshchiner I, MacArthur DG, McLaughlin HM, Murray MF, Pers TH, Polak PP, Raychaudhuri S, Rehm HL, Soemedi R, Stitziel NO, Vestecka S, Supper J, Gugenmus C, Klocke B, Hahn A, Schubach M, Menzel M, Biskup S, Freisinger P, Deng M, Braun M, Perner S, Smith RJH, Andorf JL, Huang J, Ryckman K, Sheffield VC, Stone EM, Bair T, Black-Ziegelbein EA, Braun TA, Darbro B, DeLuca AP, Kolbe DL, Scheetz TE, Shearer AE, Sompallae R, Wang K, Bassuk AG, Edens E, Mathews K, Moore SA, Shchelochkov OA, Trapane P, Bossler A, Campbell CA, Heusel JW, Kwitek A, Maga T, Panzer K, Wassink T, Van Daele D, Azaiez H, Booth K, Meyer N, Segal MM, Williams MS, Tromp G, White P, Corsmeier D, Fitzgerald-Butt S, Herman G, Lamb-Thrush D, McBride KL, Newsom D, Pierson CR, Rakowsky AT, Maver A, Lovrečić L, Palandačić A, Peterlin B, Torkamani A, Wedell A, Huss M, Alexeyenko A, Lindvall JM, Magnusson M, Nilsson D, Stranneheim H, Taylan F, Gilissen C, Hoischen A, van Bon B, Yntema H, Nelen M, Zhang W, Sager J, Zhang L, Blair K, Kural D, Cariaso M, Lennon GG, Javed A, Agrawal S, Ng PC, Sandhu KS, Krishna S, Veeramachaneni V, Isakov O, Halperin E, Friedman E, Shomron N, Glusman G, Roach JC, Caballero J, Cox HC, Mauldin D, Ament SA, Rowen L, Richards DR, San Lucas FA, Gonzalez-Garay ML, Caskey CT, Bai Y, Huang Y, Fang F, Zhang Y, Wang Z, Barrera J, Garcia-Lobo JM, González-Lamuño D, Llorca J, Rodriguez MC, Varela I, Reese MG, De La Vega FM, Kiruluta E, Cargill M, Hart RK, Sorenson JM, Lyon GJ, Stevenson DA, Bray BE, Moore BM, Eilbeck K, Yandell M, Zhao H, Hou L, Chen X, Yan X, Chen M, Li C, Yang C, Gunel 
M, Li P, Kong Y, Alexander AC, Albertyn ZI, Boycott KM, Bulman DE, Gordon PMK, Innes AM, Knoppers BM, Majewski J, Marshall CR, Parboosingh JS, Sawyer SL, Samuels ME, Schwartzentruber J, Kohane IS, Margulies DM. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol 2014; 15:R53. [PMID: 24667040 PMCID: PMC4073084 DOI: 10.1186/gb-2014-15-3-r53] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 09/03/2013] [Accepted: 03/25/2014] [Indexed: 12/30/2022]
Abstract
Background There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
|
14
|
Hart RK, Mukhyala K. Unison: an integrated platform for computational biology discovery. Pac Symp Biocomput 2009:403-414. [PMID: 19209718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
This paper describes the design and applications of Unison, a comprehensive and integrated warehouse of protein sequences, diverse precomputed predictions, and other biological data. Unison provides a practical solution to the burden of preparing data for computational discovery projects, enables holistic feature-based mining queries regarding protein composition and functions, and provides a foundation for the development of new tools. Unison is available for immediate use online via direct database connections and a web interface. In addition, the database schema, command line tools, web interface, and non-proprietary precomputed predictions are released under the Academic Free License and available for download at http://unison-db.org/. This project has resulted in a system that significantly reduces several practical impediments to the initiation of computational biology discovery projects.
Affiliation(s)
- Reece K Hart
- Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
| | | |
|
15
|
|
16
|
Abstract
We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Third, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.
Affiliation(s)
- R K Hart
- IBM Computational Biology Center, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
| | | | | | | |
|
17
|
Affiliation(s)
- Rohit V. Pappu
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Reece K. Hart
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Jay W. Ponder
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
| |
|
18
|
|