1
|
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] [Open Access]
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.
RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.
CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
|
2
|
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024] [Open Access]
Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Thorben Maass
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Röner
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
|
3
|
Agarwal V, Inoue F, Schubach M, Martin BK, Dash PM, Zhang Z, Sohota A, Noble WS, Yardimci GG, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types. bioRxiv 2023:2023.03.05.531189. [PMID: 36945371 PMCID: PMC10028905 DOI: 10.1101/2023.03.05.531189] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/11/2023]
Abstract
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA 02451, USA
- Fumitaka Inoue
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Max Schubach
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Beth K. Martin
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Pyaree Mohan Dash
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Zicong Zhang
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Ajuni Sohota
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Galip Gürkan Yardimci
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
  - Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Martin Kircher
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98195, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
  - Allen Center for Cell Lineage Tracing, University of Washington, Seattle, WA 98195, USA
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
|
4
|
Abstract
BACKGROUND: Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow.
RESULTS: Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup.
CONCLUSIONS: Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and the website; UCSC genome browser tracks, and an API are available at https://remm.bihealth.org.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
|
5
|
Cappelletti L, Petrini A, Gliozzo J, Casiraghi E, Schubach M, Kircher M, Valentini G. Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques. BMC Bioinformatics 2022; 23:154. [PMID: 36510125 PMCID: PMC9743524 DOI: 10.1186/s12859-022-04582-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 01/20/2022] [Indexed: 12/14/2022] [Open Access]
Abstract
BACKGROUND: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that finely control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell-lines/tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases. Unfortunately, the problem is still open, though some promising results have already been reported by (deep) machine-learning based methods that predict active promoters and enhancers in specific tissues or cell lines by encoding epigenetic or spectral features directly extracted from DNA sequences.
RESULTS: We present the experiments we performed to compare two Deep Neural Networks, a Feed-Forward Neural Network model working on epigenomic features, and a Convolutional Neural Network model working only on genomic sequence, targeted to the identification of enhancer- and promoter-activity in specific cell lines. While performing experiments to understand how the experimental setup influences the prediction performance of the methods, we particularly focused on (1) automatic model selection performed by Bayesian optimization and (2) exploring different data rebalancing setups for reducing negative unbalancing effects.
CONCLUSIONS: Results show that (1) automatic model selection by Bayesian optimization improves the quality of the learner; (2) data rebalancing considerably impacts the prediction performance of the models; test set rebalancing may provide over-optimistic results, and should therefore be cautiously applied; (3) despite working only on sequence data, convolutional models obtain performance close to that of feed-forward models working on epigenomic information, which suggests that sequence data also carries informative content for CRR-activity prediction. We therefore suggest combining both models/data types in future works.
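The caution about test-set rebalancing can be illustrated with a small calculation. The sketch below assumes a hypothetical classifier with fixed true- and false-positive rates (the rates and class sizes are made-up illustration values, not results from the study) and compares precision on an imbalanced test set with precision after the negatives are downsampled to a 1:1 ratio.

```python
def precision(tpr: float, fpr: float, n_pos: int, n_neg: int) -> float:
    """Expected precision of a classifier with the given rates on a test set."""
    true_pos = tpr * n_pos    # positives correctly called active
    false_pos = fpr * n_neg   # negatives wrongly called active
    return true_pos / (true_pos + false_pos)


# Hypothetical illustration values (not taken from the paper).
TPR, FPR = 0.80, 0.05
N_POS, N_NEG = 1_000, 100_000

imbalanced = precision(TPR, FPR, N_POS, N_NEG)  # 1:100 class ratio
rebalanced = precision(TPR, FPR, N_POS, N_POS)  # negatives downsampled to 1:1

print(f"precision, imbalanced test set: {imbalanced:.3f}")
print(f"precision, rebalanced test set: {rebalanced:.3f}")
```

Under these assumptions the same classifier appears far more precise on the rebalanced test set, although nothing about the model has changed.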
Affiliation(s)
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
- Alessandro Petrini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
- Jessica Gliozzo
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
- Max Schubach
  - Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany (grid.6363.0, 0000 0001 2218 4662)
- Martin Kircher
  - Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany (grid.6363.0, 0000 0001 2218 4662)
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
  - European Laboratory for Learning and Intelligent Systems (ELLIS), Berlin, Germany
  - CINI National Laboratory of Artificial Intelligence and Intelligent Systems (AIIS), Rome, Italy
  - Data Science Research Center, Università degli Studi di Milano, Milan, Italy (grid.4708.b, 0000 0004 1757 2822)
|
6
|
Oheim R, Tsourdi E, Seefried L, Beller G, Schubach M, Vettorazzi E, Stürznickel J, Rolvien T, Ehmke N, Delsmann A, Genest F, Krüger U, Zemojtel T, Barvencik F, Schinke T, Jakob F, Hofbauer LC, Mundlos S, Kornak U. Genetic Diagnostics in Routine Osteological Assessment of Adult Low Bone Mass Disorders. J Clin Endocrinol Metab 2022; 107:e3048-e3057. [PMID: 35276006 PMCID: PMC9202726 DOI: 10.1210/clinem/dgac147] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/17/2021] [Indexed: 12/17/2022]
Abstract
CONTEXT: Many different inherited and acquired conditions can result in premature bone fragility/low bone mass disorders (LBMDs).
OBJECTIVE: We aimed to elucidate the impact of genetic testing on differential diagnosis of adult LBMDs and to define clinical criteria for predicting monogenic forms.
METHODS: Four clinical centers broadly recruited a cohort of 394 unrelated adult women before menopause and men younger than 55 years with a bone mineral density (BMD) Z-score < -2.0 and/or pathological fractures. After exclusion of secondary causes or unequivocal clinical/biochemical hallmarks of monogenic LBMDs, all participants were genotyped by targeted next-generation sequencing.
RESULTS: In total, 20.8% of the participants carried rare disease-causing variants (DCVs) in genes known to cause osteogenesis imperfecta (COL1A1, COL1A2), hypophosphatasia (ALPL), and early-onset osteoporosis (LRP5, PLS3, and WNT1). In addition, we identified rare DCVs in ENPP1, LMNA, NOTCH2, and ZNF469. Three individuals had autosomal recessive, 75 autosomal dominant, and 4 X-linked disorders. A total of 9.7% of the participants harbored variants of unknown significance. A regression analysis revealed that the likelihood of detecting a DCV correlated with a positive family history of osteoporosis, peripheral fractures (> 2), and a high normal body mass index (BMI). In contrast, mutation frequencies did not correlate with age, prevalent vertebral fractures, BMD, or biochemical parameters. In individuals without monogenic disease-causing rare variants, common variants predisposing for low BMD (eg, in LRP5) were overrepresented.
CONCLUSION: The overlapping spectra of monogenic adult LBMD can be easily disentangled by genetic testing and the proposed clinical criteria can help to maximize the diagnostic yield.
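The reported diagnostic yield can be cross-checked against the inheritance counts given in the abstract. This short calculation uses only numbers stated above (394 participants; 3 autosomal recessive, 75 autosomal dominant, and 4 X-linked cases); the variable names are illustrative, and it assumes each individual is counted in exactly one inheritance category.

```python
# Figures taken from the abstract; variable names are illustrative.
cohort_size = 394
recessive, dominant, x_linked = 3, 75, 4

# Assumption: each carrier falls into exactly one inheritance category.
carriers = recessive + dominant + x_linked
yield_pct = 100 * carriers / cohort_size

print(f"{carriers} of {cohort_size} participants = {yield_pct:.1f}%")
```

Under that assumption the sum of the three categories reproduces the stated 20.8%.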
Affiliation(s)
- Ralf Oheim
  - Ralf Oheim, MD, Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, Lottestraße 59, 22529 Hamburg, Germany.
- Elena Tsourdi
  - Department of Medicine III, Technische Universität Dresden Medical Center, 01307 Dresden, Germany
  - Center for Healthy Aging, Technische Universität Dresden Medical Center, 01307 Dresden, Germany
- Lothar Seefried
  - Orthopedic Center for Musculoskeletal Research, Orthopedic Department, University of Würzburg, 97070 Würzburg, Germany
- Gisela Beller
  - Centre of Muscle and Bone Research, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany
- Max Schubach
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 13353 Berlin, Germany
- Eik Vettorazzi
  - Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Julian Stürznickel
  - Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, 22529 Hamburg, Germany
  - Department of Orthopaedics and Trauma Surgery, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Tim Rolvien
  - Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, 22529 Hamburg, Germany
  - Department of Orthopaedics and Trauma Surgery, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Nadja Ehmke
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany
- Alena Delsmann
  - Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, 22529 Hamburg, Germany
- Franca Genest
  - Orthopedic Center for Musculoskeletal Research, Orthopedic Department, University of Würzburg, 97070 Würzburg, Germany
- Ulrike Krüger
  - Core Facility Genomics, Berlin Institute of Health (BIH), 10178 Berlin, Germany
- Tomasz Zemojtel
  - Core Facility Genomics, Berlin Institute of Health (BIH), 10178 Berlin, Germany
- Florian Barvencik
  - Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, 22529 Hamburg, Germany
- Thorsten Schinke
  - Department of Osteology and Biomechanics, University Medical Center Hamburg-Eppendorf, 22529 Hamburg, Germany
- Franz Jakob
  - Orthopedic Center for Musculoskeletal Research, Orthopedic Department, University of Würzburg, 97070 Würzburg, Germany
- Lorenz C Hofbauer
  - Department of Medicine III, Technische Universität Dresden Medical Center, 01307 Dresden, Germany
  - Center for Healthy Aging, Technische Universität Dresden Medical Center, 01307 Dresden, Germany
- Stefan Mundlos
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany
  - BIH Center for Regenerative Therapies, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10178 Berlin, Germany
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Uwe Kornak
  - Correspondence: Uwe Kornak, PhD, Institute of Human Genetics, Universitätsmedizin Göttingen, Heinrich-Düker-Weg 12, 37073 Göttingen, Germany.
|
7
|
Rentzsch P, Schubach M, Shendure J, Kircher M. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 2021; 13:31. [PMID: 33618777 PMCID: PMC7901104 DOI: 10.1186/s13073-021-00835-9] [Citation(s) in RCA: 309] [Impact Index Per Article: 103.0] [Received: 06/29/2020] [Accepted: 01/20/2021] [Indexed: 02/08/2023] [Open Access]
Abstract
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-021-00835-9.
Affiliation(s)
- Philipp Rentzsch
  - Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Max Schubach
  - Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Jay Shendure
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Martin Kircher
  - Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), 10178, Berlin, Germany
|
8
|
Krützfeldt LM, Schubach M, Kircher M. The impact of different negative training data on regulatory sequence predictions. PLoS One 2020; 15:e0237412. [PMID: 33259518 PMCID: PMC7707526 DOI: 10.1371/journal.pone.0237412] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/24/2020] [Accepted: 11/12/2020] [Indexed: 01/08/2023] [Open Access]
Abstract
Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization.
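The two negative-set strategies differ in how the negatives are produced. As a minimal sketch of the shuffling strategy only, the code below permutes the bases of each positive sequence, which preserves length and base composition but destroys any motifs. This is a plain mononucleotide shuffle chosen for illustration; the abstract does not specify the shuffling procedure, and the example sequences and the function name are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the bases in seq (mononucleotide shuffle)."""
    bases = list(seq)
    rng.shuffle(bases)  # in-place permutation; composition is unchanged
    return "".join(bases)


# Hypothetical positive sequences standing in for open-chromatin regions.
positives = ["ACGTGGCATTACGA", "TTGACGCGCATATA"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
negatives = [shuffle_sequence(seq, rng) for seq in positives]

for pos, neg in zip(positives, negatives):
    # Each negative has the same length and base counts as its positive.
    print(pos, neg, Counter(pos) == Counter(neg))
```

A genomic-background negative set would instead sample real genomic regions matched to the positives, which is not shown here.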
Affiliation(s)
- Louisa-Marie Krützfeldt
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
| | - Max Schubach
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
| | - Martin Kircher
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- * E-mail:
| |
|
9
|
Gordon MG, Inoue F, Martin B, Schubach M, Agarwal V, Whalen S, Feng S, Zhao J, Ashuach T, Ziffra R, Kreimer A, Georgakopoulos-Soares I, Yosef N, Ye CJ, Pollard KS, Shendure J, Kircher M, Ahituv N. Author Correction: lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements. Nat Protoc 2020; 16:3736. [PMID: 33128032 DOI: 10.1038/s41596-020-00422-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- M Grace Gordon
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Biological and Medical Informatics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | | | - Max Schubach
- Berlin Institute of Health (BIH), Berlin, Germany
| | - Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Calico Life Sciences LLC, South San Francisco, CA, USA
| | - Sean Whalen
- Gladstone Institutes, San Francisco, CA, USA
| | - Shiyun Feng
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Ryan Ziffra
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
| | - Chun Jimmie Ye
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Katherine S Pollard
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Gladstone Institutes, San Francisco, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics and Institute of Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| | - Martin Kircher
- Berlin Institute of Health (BIH), Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| |
|
10
|
Gordon MG, Inoue F, Martin B, Schubach M, Agarwal V, Whalen S, Feng S, Zhao J, Ashuach T, Ziffra R, Kreimer A, Georgakopoulos-Soares I, Yosef N, Ye CJ, Pollard KS, Shendure J, Kircher M, Ahituv N. lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements. Nat Protoc 2020; 15:2387-2412. [PMID: 32641802 PMCID: PMC7550205 DOI: 10.1038/s41596-020-0333-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 01/08/2020] [Accepted: 04/17/2020] [Indexed: 12/22/2022]
Abstract
Massively parallel reporter assays (MPRAs) can simultaneously measure the function of thousands of candidate regulatory sequences (CRSs) in a quantitative manner. In this method, CRSs are cloned upstream of a minimal promoter and reporter gene, alongside a unique barcode, and introduced into cells. If the CRS is a functional regulatory element, it will lead to the transcription of the barcode sequence, which is measured via RNA sequencing and normalized for cellular integration via DNA sequencing of the barcode. This technology has been used to test thousands of sequences and their variants for regulatory activity, to decipher the regulatory code and its evolution, and to develop genetic switches. Lentivirus-based MPRA (lentiMPRA) produces 'in-genome' readouts and enables the use of this technique in hard-to-transfect cells. Here, we provide a detailed protocol for lentiMPRA, along with MPRAflow, a user-friendly Nextflow-based computational pipeline for quantifying CRS activity from different MPRA designs. The lentiMPRA protocol takes ~2 months, which includes sequencing turnaround time and data processing with MPRAflow.
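The normalization described in the abstract (barcode RNA counts normalized by barcode DNA counts) can be sketched as a log2 RNA/DNA ratio per CRS. This is a minimal illustration under stated assumptions, not MPRAflow's actual computation: the function name `crs_activity`, the pseudocount, the pooling of barcodes per CRS, and the example counts are all hypothetical, and sequencing-depth normalization is omitted.

```python
import math
from collections import defaultdict


def crs_activity(barcode_counts, pseudocount=1.0):
    """Compute a log2(RNA/DNA) activity score per CRS.

    barcode_counts: iterable of (crs_id, dna_count, rna_count), one per barcode.
    Assumption: counts of all barcodes assigned to a CRS are summed first,
    and a pseudocount guards against division by zero.
    """
    dna = defaultdict(float)
    rna = defaultdict(float)
    for crs_id, dna_count, rna_count in barcode_counts:
        dna[crs_id] += dna_count
        rna[crs_id] += rna_count
    return {
        crs_id: math.log2((rna[crs_id] + pseudocount) / (dna[crs_id] + pseudocount))
        for crs_id in dna
    }


# Hypothetical barcode counts: (CRS id, DNA count, RNA count).
counts = [("crs1", 10, 40), ("crs1", 5, 23), ("crs2", 20, 9), ("crs2", 11, 6)]
activity = crs_activity(counts)
```

With these made-up counts, `crs1` pools to (63 + 1) / (15 + 1) = 4 and `crs2` to (15 + 1) / (31 + 1) = 0.5, so a positive score marks a CRS whose barcodes are transcribed more than their integration level alone would suggest.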
Affiliation(s)
- M Grace Gordon
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Biological and Medical Informatics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Max Schubach
- Berlin Institute of Health (BIH), Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Calico Life Sciences LLC, South San Francisco, CA, USA
| | - Sean Whalen
- Gladstone Institutes, San Francisco, CA, USA
| | - Shiyun Feng
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Ryan Ziffra
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
| | - Chun Jimmie Ye
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Katherine S Pollard
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Gladstone Institutes, San Francisco, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics and Institute of Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.
| | - Martin Kircher
- Berlin Institute of Health (BIH), Berlin, Germany.
- Charité-Universitätsmedizin Berlin, Berlin, Germany.
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| |
|
11
|
Petrini A, Mesiti M, Schubach M, Frasca M, Danis D, Re M, Grossi G, Cappelletti L, Castrignanò T, Robinson PN, Valentini G. parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. Gigascience 2020; 9:giaa052. [PMID: 32444882 PMCID: PMC7244787 DOI: 10.1093/gigascience/giaa052] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/13/2019] [Revised: 10/31/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data.
RESULTS To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.
Affiliation(s)
- Alessandro Petrini
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Marco Mesiti
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Max Schubach
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178 Berlin, Germany
- Charité – Universitätsmedizin Berlin, Chariteplatz 1, 10117 Berlin, Germany
| | - Marco Frasca
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Daniel Danis
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington (CT) - 06032, United States of America
| | - Matteo Re
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Giuliano Grossi
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Luca Cappelletti
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
| | - Tiziana Castrignanò
- CINECA, SCAI SuperComputing Applications and Innovation Department, Via dei Tizii 6, 00185 Roma, Italy
- University of Tuscia, Department of Ecological and Biological Sciences (DEB), Largo dell'Università snc, 01100 Viterbo, Italy
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington (CT) - 06032, United States of America
| | - Giorgio Valentini
- Università degli Studi di Milano, AnacletoLab - Dipartimento di Informatica, via Giovanni Celoria 18, 20135 Milano, Italy
- CINI National Laboratory in Artificial Intelligence and Intelligent Systems - AIIS, Università di Roma, Via Ariosto 25, 00185 Roma, Italy
| |
|
12
|
Shigaki D, Adato O, Adhikar AN, Dong S, Hawkins-Hooker A, Inoue F, Juven-Gershon T, Kenlay H, Martin B, Patra A, Penzar DP, Schubach M, Xiong C, Yan Z, Boyle AP, Kreimer A, Kulakovskiy IV, Reid J, Unger R, Yosef N, Shendure J, Ahituv N, Kircher M, Beer MA. Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. Hum Mutat 2019; 40:1280-1291. [PMID: 31106481 PMCID: PMC6879779 DOI: 10.1002/humu.23797] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/18/2019] [Revised: 04/17/2019] [Accepted: 05/15/2019] [Indexed: 12/25/2022]
Abstract
The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
Affiliation(s)
- Dustin Shigaki
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Orit Adato
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Aashish N. Adhikar
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Shengcheng Dong
- Department of Computational Medicine and Bioinformatics and Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | | | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences and Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Tamar Juven-Gershon
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Henry Kenlay
- MRC Biostatistics Unit, University of Cambridge, UK
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle WA
| | - Ayoti Patra
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Dmitry P. Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| | - Max Schubach
- Berlin Institute of Health (BIH), Berlin, Germany
- Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Chenling Xiong
- Department of Bioengineering and Therapeutic Sciences and Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Zhongxia Yan
- Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Alan P. Boyle
- Department of Computational Medicine and Bioinformatics and Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences and Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, CA
| | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Institute of Mathematical Problems of Biology, Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - John Reid
- MRC Biostatistics Unit, University of Cambridge, UK
- Alan Turing Institute, British Library, London, UK
| | - Ron Unger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Nir Yosef
- Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, CA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle WA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences and Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Martin Kircher
- Department of Genome Sciences, University of Washington, Seattle WA
- Berlin Institute of Health (BIH), Berlin, Germany
- Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Michael A. Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland
| |
|
13
|
Holtgrewe M, Knaus A, Hildebrand G, Pantel JT, Santos MRDL, Neveling K, Goldmann J, Schubach M, Jäger M, Coutelier M, Mundlos S, Beule D, Sperling K, Krawitz PM. Multisite de novo mutations in human offspring after paternal exposure to ionizing radiation. Sci Rep 2018; 8:14611. [PMID: 30279461 PMCID: PMC6168503 DOI: 10.1038/s41598-018-33066-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/25/2018] [Accepted: 09/12/2018] [Indexed: 12/30/2022] Open
Abstract
A genome-wide evaluation of the effects of ionizing radiation on mutation induction in the mouse germline has identified multisite de novo mutations (MSDNs) as a marker for previous exposure. Here we present the results of a small pilot study of whole genome sequencing in offspring of soldiers who served in radar units on weapon systems that were emitting high-frequency radiation. We found cases of exceptionally high MSDN rates as well as an increased mean in our cohort: while an MSDN is detected on average in 1 out of 5 offspring of unexposed controls, we observed 12 MSDNs in a total of 18 offspring, including a family with 6 MSDNs in 3 offspring. Moreover, we found two translocations, also resulting from neighboring mutations. Our findings indicate that MSDNs might in principle also be suited for assessing DNA damage from ionizing radiation in humans. However, as exact person-related dose values in risk groups are usually not available, the interpretation of MSDNs in single families would benefit from larger molecular epidemiologic studies on this new biomarker.
Affiliation(s)
- Manuel Holtgrewe
- Berlin Institute of Health (BIH), Core Unit Bioinformatics, Berlin, 10178, Germany
- Charité - Universitätsmedizin Berlin, Berlin, 10117, Germany
| | - Alexej Knaus
- Institute for Genomic Statistics and Bioinformatics, Rheinische Friedrich-Wilhelms Universität, Bonn, 53127, Germany
| | - Gabriele Hildebrand
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | - Jean-Tori Pantel
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | | | - Kornelia Neveling
- Department for Human Genetics, Radboud University, Nijmegen, 6525, Netherlands
| | - Jakob Goldmann
- Department for Human Genetics, Radboud University, Nijmegen, 6525, Netherlands
| | - Max Schubach
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
- Berlin Institute of Health (BIH), JRG Computational Genome Biology, 10178, Berlin, Germany
| | - Marten Jäger
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | - Marie Coutelier
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | - Stefan Mundlos
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | - Dieter Beule
- Berlin Institute of Health (BIH), Core Unit Bioinformatics, Berlin, 10178, Germany
- Max Delbrück Center for Molecular Medicine, 13125, Berlin, Germany
| | - Karl Sperling
- Charité - Universitätsmedizin Berlin, Institute of Medical and Human Genetics, 13353, Berlin, Germany
| | - Peter Michael Krawitz
- Institute for Genomic Statistics and Bioinformatics, Rheinische Friedrich-Wilhelms Universität, Bonn, 53127, Germany.
| |
|
14
|
Sonntag K, Hashimoto H, Eyrich M, Menzel M, Schubach M, Döcker D, Battke F, Courage C, Lambertz H, Handgretinger R, Biskup S, Schilbach K. Immune monitoring and TCR sequencing of CD4 T cells in a long term responsive patient with metastasized pancreatic ductal carcinoma treated with individualized, neoepitope-derived multipeptide vaccines: a case report. J Transl Med 2018; 16:23. [PMID: 29409514 PMCID: PMC5801813 DOI: 10.1186/s12967-018-1382-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/14/2017] [Accepted: 01/10/2018] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Cancer vaccines can effectively establish clinically relevant tumor immunity. Novel sequencing approaches rapidly identify the mutational fingerprint of tumors, allowing personalized tumor vaccines to be generated within a few weeks of diagnosis. Here, we report the case of a 62-year-old patient receiving a four-peptide vaccine targeting the two sole mutations of his pancreatic tumor, identified via exome sequencing.
METHODS Vaccination started during chemotherapy in second complete remission and continued monthly thereafter. We tracked IFN-γ+ T cell responses against vaccine peptides in peripheral blood after 12, 17 and 34 vaccinations by analyzing T-cell receptor (TCR) repertoire diversity and epitope-binding regions of peptide-reactive T-cell lines and clones. By restricting analysis to sorted IFN-γ-producing T cells we could ensure epitope-specificity, functionality, and TH1 polarization.
RESULTS A peptide-specific T-cell response against three of the four vaccine peptides could be detected sequentially. Molecular TCR analysis revealed a broad vaccine-reactive TCR repertoire with clones of discernible specificity. Four identical or convergent TCR sequences could be identified at more than one time-point, indicating persistence of vaccine-reactive T cells over time. One dominant TCR expressing a dual TCRVα chain could be found in three T-cell clones. The observed T-cell responses possibly contributed to clinical outcome: the patient is alive 6 years after initial diagnosis and in complete remission for 4 years now.
CONCLUSIONS Therapeutic vaccination with a neoantigen-derived four-peptide vaccine resulted in a diverse and long-lasting immune response against these targets which was associated with prolonged clinical remission. These data warrant confirmation in a larger proof-of-concept clinical trial.
Affiliation(s)
- Katja Sonntag
- Department of Pediatric Hematology and Oncology, University Children's Hospital Tübingen, Hoppe-Seyler Street 1, 72076, Tübingen, Germany
| | - Hisayoshi Hashimoto
- Department of Pediatric Hematology and Oncology, University Children's Hospital Tübingen, Hoppe-Seyler Street 1, 72076, Tübingen, Germany
| | - Matthias Eyrich
- Department of Pediatric Hematology, Oncology and Stem Cell Transplantation, University Medical Center Würzburg, Josef-Schneider Street 2, 97080, Würzburg, Germany
| | - Moritz Menzel
- Center for Genomics and Transcriptomics (CeGaT) GmbH and Practice for Human Genetics, Paul-Ehrlich-Straße 23, 72076, Tübingen, Germany
| | - Max Schubach
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
| | - Dennis Döcker
- Center for Genomics and Transcriptomics (CeGaT) GmbH and Practice for Human Genetics, Paul-Ehrlich-Straße 23, 72076, Tübingen, Germany
| | - Florian Battke
- Center for Genomics and Transcriptomics (CeGaT) GmbH and Practice for Human Genetics, Paul-Ehrlich-Straße 23, 72076, Tübingen, Germany
| | - Carolina Courage
- Folkhälsan Institute of Genetics, Haartmaninkatu 8, 00014, Helsinki, Finland
| | - Helmut Lambertz
- Klinikum Garmisch-Partenkirchen GmbH, Zentrum für Innere Medizin, 82467, Garmisch-Partenkirchen, Germany
| | - Rupert Handgretinger
- Department of Pediatric Hematology and Oncology, University Children's Hospital Tübingen, Hoppe-Seyler Street 1, 72076, Tübingen, Germany
| | - Saskia Biskup
- Center for Genomics and Transcriptomics (CeGaT) GmbH and Practice for Human Genetics, Paul-Ehrlich-Straße 23, 72076, Tübingen, Germany
| | - Karin Schilbach
- Department of Pediatric Hematology and Oncology, University Children's Hospital Tübingen, Hoppe-Seyler Street 1, 72076, Tübingen, Germany
- University Children's Hospital, University Medical Center Tübingen, Hoppe-Seyler-Street 1, 72076, Tübingen, Germany
| |
|
15
|
Knaus A, Pantel JT, Pendziwiat M, Hajjir N, Zhao M, Hsieh TC, Schubach M, Gurovich Y, Fleischer N, Jäger M, Köhler S, Muhle H, Korff C, Møller RS, Bayat A, Calvas P, Chassaing N, Warren H, Skinner S, Louie R, Evers C, Bohn M, Christen HJ, van den Born M, Obersztyn E, Charzewska A, Endziniene M, Kortüm F, Brown N, Robinson PN, Schelhaas HJ, Weber Y, Helbig I, Mundlos S, Horn D, Krawitz PM. Characterization of glycosylphosphatidylinositol biosynthesis defects by clinical features, flow cytometry, and automated image analysis. Genome Med 2018; 10:3. [PMID: 29310717 PMCID: PMC5759841 DOI: 10.1186/s13073-017-0510-5] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 08/02/2017] [Accepted: 12/11/2017] [Indexed: 12/17/2022] Open
Abstract
Background Glycosylphosphatidylinositol biosynthesis defects (GPIBDs) cause a group of phenotypically overlapping recessive syndromes with intellectual disability, for which pathogenic mutations have been described in 16 genes of the corresponding molecular pathway. An elevated serum activity of alkaline phosphatase (AP), a GPI-linked enzyme, has been used to assign GPIBDs to the phenotypic series of hyperphosphatasia with mental retardation syndrome (HPMRS) and to distinguish them from another subset of GPIBDs, termed multiple congenital anomalies hypotonia seizures syndrome (MCAHS). However, the increasing number of individuals with a GPIBD shows that hyperphosphatasia is a variable feature that is not ideal for a clinical classification.
Methods We studied the discriminatory power of multiple GPI-linked substrates that were assessed by flow cytometry in blood cells and fibroblasts of 39 and 14 individuals with a GPIBD, respectively. On the phenotypic level, we evaluated the frequency of occurrence of clinical symptoms and analyzed the performance of computer-assisted image analysis of the facial gestalt in 91 individuals.
Results We found that certain malformations such as Morbus Hirschsprung and diaphragmatic defects are more likely to be associated with particular gene defects (PIGV, PGAP3, PIGN). However, especially at the severe end of the clinical spectrum of HPMRS, there is a high phenotypic overlap with MCAHS. Elevation of AP has also been documented in some of the individuals with MCAHS, namely those with PIGA mutations. Although the impairment of GPI-linked substrates is supposed to play the key role in the pathophysiology of GPIBDs, we could not observe gene-specific profiles for flow cytometric markers or a correlation between their cell surface levels and the severity of the phenotype. In contrast, it was facial recognition software that achieved the highest accuracy in predicting the disease-causing gene in a GPIBD.
Conclusions Due to the overlapping clinical spectrum of both HPMRS and MCAHS in the majority of affected individuals, the elevation of AP and the reduced surface levels of GPI-linked markers in both groups, a common classification as GPIBDs is recommended. The effectiveness of computer-assisted gestalt analysis for the correct gene inference in a GPIBD and probably beyond is remarkable and illustrates how the information contained in human faces is pivotal in the delineation of genetic entities. Electronic supplementary material The online version of this article (doi:10.1186/s13073-017-0510-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alexej Knaus
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany; Berlin-Brandenburg School for Regenerative Therapies, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127, Bonn, Germany
- Jean Tori Pantel
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany
- Manuela Pendziwiat
- Department of Neuropediatrics, University Medical Center Schleswig Holstein, 24105, Kiel, Germany
- Nurulhuda Hajjir
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany
- Max Zhao
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany
- Tzung-Chien Hsieh
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127, Bonn, Germany
- Max Schubach
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Marten Jäger
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Sebastian Köhler
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany
- Hiltrud Muhle
- Department of Neuropediatrics, University Medical Center Schleswig Holstein, 24105, Kiel, Germany
- Christian Korff
- Unité de Neuropédiatrie, Université de Genève, CH-1211, Genève, Switzerland
- Rikke S Møller
- Danish Epilepsy Centre, DK-4293, Dianalund, Denmark; Institute for Regional Health Services Research, University of Southern Denmark, DK-5000, Odense, Denmark
- Allan Bayat
- Department of Pediatrics, University Hospital of Hvidovre, 2650, Hvidovre, Denmark
- Patrick Calvas
- Service de Génétique Médicale, Hôpital Purpan, CHU, 31059, Toulouse, France
- Nicolas Chassaing
- Service de Génétique Médicale, Hôpital Purpan, CHU, 31059, Toulouse, France
- Christina Evers
- Genetische Poliklinik, Universitätsklinik Heidelberg, 69120, Heidelberg, Germany
- Marc Bohn
- St. Bernward Krankenhaus, 31134, Hildesheim, Germany
- Hans-Jürgen Christen
- Kinderkrankenhaus auf der Bult, Hannoversche Kinderheilanstalt, 30173, Hannover, Germany
- Ewa Obersztyn
- Institute of Mother and Child, Department of Molecular Genetics, 01-211, Warsaw, Poland
- Agnieszka Charzewska
- Institute of Mother and Child, Department of Molecular Genetics, 01-211, Warsaw, Poland
- Milda Endziniene
- Neurology Department, Lithuanian University of Health Sciences, 50009, Kaunas, Lithuania
- Fanny Kortüm
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
- Natasha Brown
- Victorian Clinical Genetics Services, Royal Children's Hospital, MCRI, Parkville, Australia; Department of Clinical Genetics, Austin Health, Heidelberg, Australia
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 06032, Farmington, USA
- Helenius J Schelhaas
- Department of Neurology, Academic Center for Epileptology, 5590, Heeze, The Netherlands
- Yvonne Weber
- Department of Neurology and Epileptology and Hertie Institute for Clinical Brain Research, University Tübingen, 72076, Tübingen, Germany
- Ingo Helbig
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127, Bonn, Germany; Pediatric Neurology, Children's Hospital of Philadelphia, 3401, Philadelphia, USA
- Stefan Mundlos
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany
- Denise Horn
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany
- Peter M Krawitz
- Institut für Medizinische Genetik und Humangenetik, Charité Universitätsmedizin Berlin, 13353, Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany; Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127, Bonn, Germany
16
Notaro M, Schubach M, Robinson PN, Valentini G. Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods. BMC Bioinformatics 2017; 18:449. [PMID: 29025394 PMCID: PMC5639780 DOI: 10.1186/s12859-017-1854-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/31/2017] [Accepted: 10/02/2017] [Indexed: 03/12/2023] Open
Abstract
Background: The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even if for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover most of the methods proposed in literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions.
Results: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, that consists in a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity.
Conclusions: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO terms predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
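The two-step idea in this abstract (flat scores first, then a hierarchical combination) can be illustrated with a minimal sketch. This is not the authors' R package or their exact algorithm. It assumes a simple top-down correction in which each term's score is capped by the lowest corrected score among its parents, which is one way to make predictions obey the true path rule. The function name `top_down_correct`, the toy hierarchy, and the scores are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_correct(parents, flat_scores):
    """Cap each term's flat score by its parents' corrected scores.

    parents: dict mapping term -> list of parent terms (roots map to []).
    flat_scores: dict mapping term -> score in [0, 1] from any flat learner.
    Returns a dict in which no term scores higher than any of its parents.
    """
    corrected = {}
    # static_order() yields every term after all of its parents.
    for term in TopologicalSorter(parents).static_order():
        cap = min((corrected[p] for p in parents[term]), default=1.0)
        corrected[term] = min(flat_scores[term], cap)
    return corrected


# Hypothetical toy hierarchy: C has two parents, A and B.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat = {"root": 0.9, "A": 0.4, "B": 0.8, "C": 0.7}
hier = top_down_correct(parents, flat)
```

In this toy case the flat score of `C` (0.7) exceeds that of its parent `A` (0.4), so the correction lowers `C` to 0.4. Any flat learner can supply `flat_scores`, which mirrors the modularity the abstract describes.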
Affiliation(s)
- Marco Notaro
- Anacleto Lab - Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39, Milan, 20135 Italy
- Max Schubach
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353 Germany
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, Berlin, 10178 Germany
- Peter N. Robinson
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353 Germany
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, Berlin, 14195 Germany
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Dr, Farmington, 06032 CT USA
- Institute for Systems Genomics, University of Connecticut, 10 Discovery Dr, Farmington, 06032 CT USA
- Giorgio Valentini
- Anacleto Lab - Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39, Milan, 20135 Italy
17
Jäger M, Schubach M, Zemojtel T, Reinert K, Church DM, Robinson PN. Alternate-locus aware variant calling in whole genome sequencing. Genome Med 2016; 8:130. [PMID: 27964746 PMCID: PMC5155401 DOI: 10.1186/s13073-016-0383-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/08/2016] [Accepted: 11/23/2016] [Indexed: 01/09/2023] Open
Abstract
Background: The last two human genome assemblies have extended the previous linear golden-path paradigm of the human genome to a graph-like model to better represent regions with a high degree of structural variability. The new model offers opportunities to improve the technical validity of variant calling in whole-genome sequencing (WGS).
Methods: We developed an algorithm that analyzes the patterns of variant calls in the 178 structurally variable regions of the GRCh38 genome assembly, and infers whether a given sample is most likely to contain sequences from the primary assembly, an alternate locus, or their heterozygous combination at each of these 178 regions. We investigate 121 in-house WGS datasets that have been aligned to the GRCh37 and GRCh38 assemblies.
Results: We show that stretches of sequences that are largely but not entirely identical between the primary assembly and an alternate locus can result in multiple variant calls against regions of the primary assembly. In WGS analysis, this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (ASDPs). In 121 in-house genomes, on average 51.8±3.8 of the 178 regions were found to correspond best to an alternate locus rather than the primary assembly sequence, and filtering these genomes with our algorithm led to the identification of 7863 variant calls per genome that colocalized with ASDPs. Additionally, we found that 437 of 791 genome-wide association study hits located within one of the regions corresponded to ASDPs.
Conclusions: Our algorithm uses the information contained in the 178 structurally variable regions of the GRCh38 genome assembly to avoid spurious variant calls in cases where samples contain an alternate locus rather than the corresponding segment of the primary assembly. These results suggest the great potential of fully incorporating the resources of graph-like genome assemblies into variant calling, but also underscore the importance of developing computational resources that will allow a full reconstruction of the genotype in personal genomes. Our algorithm is freely available at https://github.com/charite/asdpex.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-016-0383-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Marten Jäger
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Knut Reinert
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Deanna M Church
- 10x Genomics, 7068 Koll Center Parkway, Suite 401, Pleasanton, 94566, CA, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany; The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, 06032, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, 06032, CT, USA
18
Smedley D, Schubach M, Jacobsen J, Köhler S, Zemojtel T, Spielmann M, Jäger M, Hochheiser H, Washington N, McMurry J, Haendel M, Mungall C, Lewis S, Groza T, Valentini G, Robinson P. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease. Am J Hum Genet 2016; 99:595-606. [PMID: 27569544 DOI: 10.1016/j.ajhg.2016.07.005] [Citation(s) in RCA: 157] [Impact Index Per Article: 19.6] [Received: 04/04/2016] [Accepted: 07/01/2016] [Indexed: 12/17/2022] Open
Abstract
The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants to specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated to specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.
19
Filges I, Bruder E, Brandal K, Meier S, Undlien DE, Waage TR, Hoesli I, Schubach M, de Beer T, Sheng Y, Hoeller S, Schulzke S, Røsby O, Miny P, Tercanli S, Oppedal T, Meyer P, Selmer KK, Strømme P. Strømme Syndrome Is a Ciliary Disorder Caused by Mutations in CENPF. Hum Mutat 2016; 37:711. [DOI: 10.1002/humu.22997] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Affiliation(s)
- Isabel Filges
- Medical Genetics; University Hospital Basel; Basel Switzerland
- Kristin Brandal
- Department of Medical Genetics; Oslo University Hospital and University of Oslo; Oslo Norway
- Stephanie Meier
- Medical Genetics; University Hospital Basel; Basel Switzerland
- Dag Erik Undlien
- Department of Medical Genetics; Oslo University Hospital and University of Oslo; Oslo Norway
- Trine Rygvold Waage
- Section of Paediatric Neurohabilitation; Department of Clinical Neurosciences for Children; Oslo University Hospital; Ullevål, Oslo Norway
- Irene Hoesli
- Obstetrics and Gynecology; University Hospital Basel; Basel Switzerland
- Max Schubach
- Institute for Medical and Human Genetics; Charité-Universitätsmedizin Berlin; Berlin Germany
- Tjaart de Beer
- Biozentrum and Swiss Institute of Bioinformatics; University of Basel; Basel Switzerland
- Ying Sheng
- Department of Medical Genetics; Oslo University Hospital and University of Oslo; Oslo Norway
- Sven Schulzke
- Neonatology; University Children's Hospital Basel; Basel Switzerland
- Oddveig Røsby
- Department of Medical Genetics; Oslo University Hospital and University of Oslo; Oslo Norway
- Peter Miny
- Medical Genetics; University Hospital Basel; Basel Switzerland
- Truls Oppedal
- Department of Ophthalmology; Section for Pediatric Ophthalmology; Oslo University Hospital; Ullevål, Oslo Norway
- Peter Meyer
- Pathology; University Hospital Basel; Basel Switzerland
- Kaja Kristine Selmer
- Department of Medical Genetics; Oslo University Hospital and University of Oslo; Oslo Norway
- Petter Strømme
- Section for Clinical Neurosciences; Department of Pediatrics; Oslo University Hospital and University of Oslo; Oslo Norway
20
Filges I, Bruder E, Brandal K, Meier S, Undlien DE, Waage TR, Hoesli I, Schubach M, de Beer T, Sheng Y, Hoeller S, Schulzke S, Røsby O, Miny P, Tercanli S, Oppedal T, Meyer P, Selmer KK, Strømme P. Strømme Syndrome Is a Ciliary Disorder Caused by Mutations in CENPF. Hum Mutat 2016; 37:359-63. [PMID: 26820108 DOI: 10.1002/humu.22960] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/21/2015] [Accepted: 01/08/2016] [Indexed: 11/10/2022]
Abstract
Strømme syndrome was first described by Strømme et al. (1993) in siblings presenting with "apple peel" type intestinal atresia, ocular anomalies and microcephaly. The etiology remains unknown to date. We describe the long-term clinical follow-up data for the original pair of siblings as well as two previously unreported siblings with a severe phenotype overlapping that of the Strømme syndrome including fetal autopsy results. Using family-based whole-exome sequencing, we identified truncating mutations in the centrosome gene CENPF in the two nonconsanguineous Caucasian sibling pairs. Compound heterozygous inheritance was confirmed in both families. Recently, mutations in this gene were shown to cause a fetal lethal phenotype, the phenotype and functional data being compatible with a human ciliopathy [Waters et al., 2015]. We show for the first time that Strømme syndrome is an autosomal-recessive disease caused by mutations in CENPF that can result in a wide phenotypic spectrum.
Affiliation(s)
- Isabel Filges
- Medical Genetics, University Hospital Basel, Basel, Switzerland
- Kristin Brandal
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Stephanie Meier
- Medical Genetics, University Hospital Basel, Basel, Switzerland
- Dag Erik Undlien
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Trine Rygvold Waage
- Section of Paediatric Neurohabilitation, Department of Clinical Neurosciences for Children, Oslo University Hospital, Ullevål, Oslo, Norway
- Irene Hoesli
- Obstetrics and Gynecology, University Hospital Basel, Basel, Switzerland
- Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Tjaart de Beer
- Biozentrum and Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Ying Sheng
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Sylvia Hoeller
- Pathology, University Hospital Basel, Basel, Switzerland
- Sven Schulzke
- Neonatology, University Children's Hospital Basel, Basel, Switzerland
- Oddveig Røsby
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Peter Miny
- Medical Genetics, University Hospital Basel, Basel, Switzerland
- Truls Oppedal
- Department of Ophthalmology, Section for Pediatric Ophthalmology, Oslo University Hospital, Ullevål, Oslo, Norway
- Peter Meyer
- Pathology, University Hospital Basel, Basel, Switzerland
- Kaja Kristine Selmer
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Petter Strømme
- Section for Clinical Neurosciences, Department of Pediatrics, Oslo University Hospital and University of Oslo, Oslo, Norway
21
Weisschuh N, Mayer AK, Strom TM, Kohl S, Glöckle N, Schubach M, Andreasson S, Bernd A, Birch DG, Hamel CP, Heckenlively JR, Jacobson SG, Kamme C, Kellner U, Kunstmann E, Maffei P, Reiff CM, Rohrschneider K, Rosenberg T, Rudolph G, Vámos R, Varsányi B, Weleber RG, Wissinger B. Mutation Detection in Patients with Retinal Dystrophies Using Targeted Next Generation Sequencing. PLoS One 2016; 11:e0145951. [PMID: 26766544 PMCID: PMC4713063 DOI: 10.1371/journal.pone.0145951] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 09/09/2015] [Accepted: 12/10/2015] [Indexed: 11/24/2022] Open
Abstract
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using a RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.
Affiliation(s)
- Nicole Weisschuh
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- Anja K. Mayer
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- Tim M. Strom
- Institute of Human Genetics, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- Susanne Kohl
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- Max Schubach
- Institute of Medical Genetics and Human Genetics, Charité – Universitaetsmedizin Berlin, Berlin, Germany
- Antje Bernd
- University Eye Hospital, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- David G. Birch
- The Retina Foundation of the Southwest, Dallas, Texas, United States of America
- John R. Heckenlively
- Department of Ophthalmology and Visual Sciences, Kellogg Eye Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Samuel G. Jacobson
- Scheie Eye Institute, Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Ulrich Kellner
- Rare Retinal Disease Center, AugenZentrum Siegburg, MVZ ADTC Siegburg GmbH, Siegburg, Germany
- Erdmute Kunstmann
- Institute of Human Genetics, Julius-Maximilian-University, Wuerzburg, Germany
- Pietro Maffei
- Department of Medicine, University Hospital of Padua, Padua, Italy
- Thomas Rosenberg
- National Eye Clinic, Department of Ophthalmology, Glostrup Hospital, Glostrup, Denmark
- Günther Rudolph
- University Eye Hospital, Ludwig Maximilians University, Munich, Germany
- Rita Vámos
- Department of Ophthalmology, Semmelweis University, Budapest, Hungary
- Balázs Varsányi
- Department of Ophthalmology, Semmelweis University, Budapest, Hungary
- Department of Ophthalmology, University of Pécs Medical School, Pécs, Hungary
- Richard G. Weleber
- Casey Eye Institute, Oregon Retinal Degeneration Center, Oregon Health & Science University, Portland, Oregon, United States of America
- Bernd Wissinger
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
22
Smedley D, Jacobsen JOB, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc 2015; 10:2004-15. [PMID: 26562621 DOI: 10.1038/nprot.2015.124] [Citation(s) in RCA: 212] [Impact Index Per Article: 23.6] [Indexed: 12/23/2022]
Abstract
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
Affiliation(s)
- Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Hinxton, UK
- Marten Jäger
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Manuel Holtgrewe
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany
- Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Enrico Siragusa
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; Labor Berlin - Charité Vivantes, Humangenetik, Berlin, Germany
- Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Nicole L Washington
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- William P Bone
- The National Institutes of Health (NIH) Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, Bethesda, Maryland, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
23
Peveling-Oberhag J, Wolters F, Döring C, Walter D, Sellmann L, Scholtysik R, Lucioni M, Schubach M, Paulli M, Biskup S, Zeuzem S, Küppers R, Hansmann ML. Whole exome sequencing of microdissected splenic marginal zone lymphoma: a study to discover novel tumor-specific mutations. BMC Cancer 2015; 15:773. [PMID: 26498442 PMCID: PMC4619476 DOI: 10.1186/s12885-015-1766-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 04/06/2015] [Accepted: 10/10/2015] [Indexed: 12/14/2022] Open
Abstract
Background: Splenic marginal zone lymphoma (SMZL) is an indolent B-cell non-Hodgkin lymphoma and represents the most common primary malignancy of the spleen. Its precise molecular pathogenesis is still unknown and specific molecular markers for diagnosis or possible targets for causal therapies are lacking.
Methods: We performed whole exome sequencing (WES) and copy number analysis from laser-microdissected tumor cells of two primary SMZL discovery cases. Selected somatic single nucleotide variants (SNVs) were analyzed using pyrosequencing and Sanger sequencing in an independent validation cohort.
Results: Overall, 25 nonsynonymous somatic SNVs were identified, including known mutations in the NOTCH2 and MYD88 genes. Twenty-three of the mutations have not been associated with SMZL before. Many of these seem to be subclonal. Screening of 24 additional SMZL for mutations at the same positions found mutated in the WES approach revealed no recurrence of mutations for ZNF608 and PDE10A, whereas the MYD88 L265P missense mutation was identified in 15% of cases. An analysis of the NOTCH2 PEST domain and the whole coding region of the transcription factor SMYD1 in eight cases identified no additional case with a NOTCH2 mutation, but two additional cases with SMYD1 alterations.
Conclusions: In this first WES approach from microdissected SMZL tissue we confirmed known mutations and discovered new somatic variants. Recurrence of MYD88 mutations in SMZL was validated, but NOTCH2 PEST domain mutations were relatively rare (10% of cases). Recurrent mutations in the transcription factor SMYD1 have not been described in SMZL before and warrant further investigation.
Electronic supplementary material: The online version of this article (doi:10.1186/s12885-015-1766-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jan Peveling-Oberhag
- Medizinische Klinik 1, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany.
- Franziska Wolters
- Medizinische Klinik 1, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany.
- Claudia Döring
- Senckenbergisches Institut für Pathologie, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany.
- Dirk Walter
- Medizinische Klinik 1, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany.
- Ludger Sellmann
- Institute of Cell Biology (Cancer Research), Medical School, University of Duisburg-Essen, Essen, Germany.
- René Scholtysik
- Institute of Cell Biology (Cancer Research), Medical School, University of Duisburg-Essen, Essen, Germany.
- Marco Lucioni
- Department of Human Pathology, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy.
- Max Schubach
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, Germany.
- Marco Paulli
- Department of Human Pathology, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy.
- Saskia Biskup
- CeGaT GmbH, Paul-Ehrlich-Straße 23, Tübingen, Germany.
- Stefan Zeuzem
- Medizinische Klinik 1, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany.
- Ralf Küppers
- Institute of Cell Biology (Cancer Research), Medical School, University of Duisburg-Essen, Essen, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany.
- Martin-Leo Hansmann
- Senckenbergisches Institut für Pathologie, Klinikum der Johann Wolfgang Goethe-Universität, Theodor-Stern-Kai 7, Frankfurt am Main, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany.
| |
|
24
|
Kettwig M, Schubach M, Zimmermann FA, Klinge L, Mayr JA, Biskup S, Sperl W, Gärtner J, Huppke P. From ventriculomegaly to severe muscular atrophy: Expansion of the clinical spectrum related to mutations in AIFM1. Mitochondrion 2015; 21:12-8. [DOI: 10.1016/j.mito.2015.01.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 04/01/2014] [Revised: 09/02/2014] [Accepted: 01/05/2015] [Indexed: 12/20/2022]
|
25
|
Gadzicki D, Döcker D, Schubach M, Menzel M, Schmorl B, Stellmer F, Biskup S, Bartholdi D. Expanding the phenotype of a recurrent de novo variant in PACS1 causing intellectual disability. Clin Genet 2014; 88:300-2. [PMID: 25522177 DOI: 10.1111/cge.12544] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/02/2014] [Revised: 11/18/2014] [Accepted: 11/19/2014] [Indexed: 02/02/2023]
Affiliation(s)
- D Gadzicki: MVZ Endokrinologikum Hannover, Hannover, Germany
- D Döcker: Institute of Clinical Genetics, Klinikum Stuttgart, Stuttgart, Germany
- B Schmorl: MVZ Endokrinologikum Hannover, Hannover, Germany
- S Biskup: Institute of Clinical Genetics, Klinikum Stuttgart, Stuttgart, Germany; CeGaT GmbH, Tübingen, Germany
- D Bartholdi: Institute of Clinical Genetics, Klinikum Stuttgart, Stuttgart, Germany
|
26
|
Brownstein CA, Beggs AH, Homer N, Merriman B, Yu TW, Flannery KC, DeChene ET, Towne MC, Savage SK, Price EN, Holm IA, Luquette LJ, Lyon E, Majzoub J, Neupert P, McCallie D, Szolovits P, Willard HF, Mendelsohn NJ, Temme R, Finkel RS, Yum SW, Medne L, Sunyaev SR, Adzhubey I, Cassa CA, de Bakker PIW, Duzkale H, Dworzyński P, Fairbrother W, Francioli L, Funke BH, Giovanni MA, Handsaker RE, Lage K, Lebo MS, Lek M, Leshchiner I, MacArthur DG, McLaughlin HM, Murray MF, Pers TH, Polak PP, Raychaudhuri S, Rehm HL, Soemedi R, Stitziel NO, Vestecka S, Supper J, Gugenmus C, Klocke B, Hahn A, Schubach M, Menzel M, Biskup S, Freisinger P, Deng M, Braun M, Perner S, Smith RJH, Andorf JL, Huang J, Ryckman K, Sheffield VC, Stone EM, Bair T, Black-Ziegelbein EA, Braun TA, Darbro B, DeLuca AP, Kolbe DL, Scheetz TE, Shearer AE, Sompallae R, Wang K, Bassuk AG, Edens E, Mathews K, Moore SA, Shchelochkov OA, Trapane P, Bossler A, Campbell CA, Heusel JW, Kwitek A, Maga T, Panzer K, Wassink T, Van Daele D, Azaiez H, Booth K, Meyer N, Segal MM, Williams MS, Tromp G, White P, Corsmeier D, Fitzgerald-Butt S, Herman G, Lamb-Thrush D, McBride KL, Newsom D, Pierson CR, Rakowsky AT, Maver A, Lovrečić L, Palandačić A, Peterlin B, Torkamani A, Wedell A, Huss M, Alexeyenko A, Lindvall JM, Magnusson M, Nilsson D, Stranneheim H, Taylan F, Gilissen C, Hoischen A, van Bon B, Yntema H, Nelen M, Zhang W, Sager J, Zhang L, Blair K, Kural D, Cariaso M, Lennon GG, Javed A, Agrawal S, Ng PC, Sandhu KS, Krishna S, Veeramachaneni V, Isakov O, Halperin E, Friedman E, Shomron N, Glusman G, Roach JC, Caballero J, Cox HC, Mauldin D, Ament SA, Rowen L, Richards DR, San Lucas FA, Gonzalez-Garay ML, Caskey CT, Bai Y, Huang Y, Fang F, Zhang Y, Wang Z, Barrera J, Garcia-Lobo JM, González-Lamuño D, Llorca J, Rodriguez MC, Varela I, Reese MG, De La Vega FM, Kiruluta E, Cargill M, Hart RK, Sorenson JM, Lyon GJ, Stevenson DA, Bray BE, Moore BM, Eilbeck K, Yandell M, Zhao H, Hou L, Chen X, Yan X, Chen M, Li C, Yang C, Gunel 
M, Li P, Kong Y, Alexander AC, Albertyn ZI, Boycott KM, Bulman DE, Gordon PMK, Innes AM, Knoppers BM, Majewski J, Marshall CR, Parboosingh JS, Sawyer SL, Samuels ME, Schwartzentruber J, Kohane IS, Margulies DM. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol 2014; 15:R53. [PMID: 24667040 PMCID: PMC4073084 DOI: 10.1186/gb-2014-15-3-r53] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 09/03/2013] [Accepted: 03/25/2014] [Indexed: 12/30/2022]
Abstract
Background There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
|
27
|
Menzel M, Scheurenbrand T, Sprecher A, Schubach M, Battke F, Biskup S. Diagnostic Next-Generation Sequencing Panel for Hereditary Breast and Ovarian Cancer. Ann Oncol 2013. [DOI: 10.1093/annonc/mdt078.8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
|
28
|
Lemke JR, Riesch E, Scheurenbrand T, Schubach M, Wilhelm C, Steiner I, Hansen J, Courage C, Gallati S, Bürki S, Strozzi S, Simonetti BG, Grunt S, Steinlin M, Alber M, Wolff M, Klopstock T, Prott EC, Lorenz R, Spaich C, Rona S, Lakshminarasimhan M, Kröll J, Dorn T, Krämer G, Synofzik M, Becker F, Weber YG, Lerche H, Böhm D, Biskup S. Targeted next generation sequencing as a diagnostic tool in epileptic disorders. Epilepsia 2012; 53:1387-98. [PMID: 22612257 DOI: 10.1111/j.1528-1167.2012.03516.x] [Citation(s) in RCA: 242] [Impact Index Per Article: 20.2] [Indexed: 12/12/2022]
Abstract
PURPOSE Epilepsies have a highly heterogeneous background with a strong genetic contribution. The variety of unspecific and overlapping syndromic and nonsyndromic phenotypes often hampers a clear clinical diagnosis and prevents straightforward genetic testing. Knowing the genetic basis of a patient's epilepsy can be valuable not only for diagnosis but also for guiding treatment and estimating recurrence risks. METHODS To overcome these diagnostic restrictions, we composed a panel of genes for Next Generation Sequencing containing the most relevant epilepsy genes and covering the most relevant epilepsy phenotypes known so far. With this method, 265 genes were analyzed per patient in a single step. We evaluated this panel on a pilot cohort of 33 index patients with concise epilepsy phenotypes or with a severe but unspecific seizure disorder covering both sporadic and familial cases. KEY FINDINGS We identified presumed disease-causing mutations in 16 of 33 patients comprising sequence alterations in frequently as well as in less commonly affected genes. The detected aberrations encompassed known and unknown point mutations (SCN1A p.R222X, p.E289V, p.379R, p.R393H; SCN2A p.V208E; STXBP1 p.R122X; KCNJ10 p.L68P, p.I129V; KCTD7 p.L108M; KCNQ3 p.P574S; ARHGEF9 p.R290H; SMS p.F58L; TPP1 p.Q278R, p.Q422H; MFSD8 p.T294K), a putative splice site mutation (SCN1A c.693A>T/p.P231P) and small deletions (SCN1A p.F1330Lfs3X [1 bp]; MFSD8 p.A138Dfs10X [7 bp]). All mutations have been confirmed by conventional Sanger sequencing and, where possible, validated by parental testing and segregation analysis. In three patients with either Dravet syndrome or myoclonic epilepsy, we detected SCN1A mutations (p.R222X, p.P231P, p.R393H), even though other laboratories had previously excluded aberrations of this gene by Sanger sequencing or high-resolution melting analysis.
SIGNIFICANCE We have developed a fast and cost-efficient diagnostic screening method to analyze the genetic basis of epilepsies. We were able to detect mutations in patients with clear and with unspecific epilepsy phenotypes, to uncover the genetic basis of many so far unresolved cases with epilepsy including mutation detection in cases in which previous conventional methods yielded falsely negative results. Our approach thus proved to be a powerful diagnostic tool that may contribute to collecting information on both common and unknown epileptic disorders and in delineating associated phenotypes of less frequently mutated genes.
Affiliation(s)
- Johannes R Lemke: Division of Human Genetics, University Children's Hospital Inselspital, Bern, Switzerland.
|
29
|
Abstract
Background Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies. Results This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs. Conclusion This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.
Affiliation(s)
- Suparna Mitra: Center for Bioinformatics ZBIT, Tübingen University, Sand 14, 72076 Tübingen, Germany.
|